Overview
We're seeking an experienced AI/ML Engineer specializing in Text-to-Speech (TTS) technologies to join our team. You'll fine-tune, optimize, and deploy TTS models at scale and build robust streaming architectures that deliver high-quality voice synthesis in production environments.
What You'll Do
Fine-tune and optimize state-of-the-art TTS models (e.g., VITS, Tacotron, FastSpeech, Bark, or similar architectures), including through toolkits such as Coqui TTS
Design and implement streaming TTS architectures for low-latency, real-time voice synthesis
Deploy and scale TTS models in production environments using cloud infrastructure (AWS/GCP/Azure)
Build efficient inference pipelines using optimization techniques and tooling such as model quantization, ONNX, and TensorRT
Develop highly available, high-performance APIs and microservices for serving TTS models
Implement voice cloning and custom voice synthesis solutions
Monitor model performance, latency, and quality metrics in production
Collaborate with product and engineering teams to integrate TTS capabilities into applications
Stay current with the latest research in speech synthesis and neural audio generation
Required Qualifications
4-6 years of experience in AI/ML engineering, with at least 2 years focused on speech/audio ML
Strong expertise in TTS model architectures and deep learning frameworks (PyTorch/TensorFlow)
Hands-on experience fine-tuning TTS models and working with audio processing libraries
Proven track record of deploying ML models at scale in production environments
Deep understanding of streaming architectures and real-time inference systems
Experience with containerization (Docker, Kubernetes) and ML orchestration tools
Strong programming skills in Python and proficiency with ML frameworks
Knowledge of audio signal processing, vocoders, and speech synthesis techniques
Experience with model optimization techniques for low-latency inference