Overview
About the Role
We are looking for a passionate and highly skilled Machine Learning Engineer (TTS) to lead the development of our multilingual and multi-dialect Text-to-Speech systems. The ideal candidate has hands-on experience with modern TTS architectures, audio processing pipelines, and model fine-tuning techniques.
This role is critical to our long-term vision of building and self-hosting advanced TTS models tailored for multiple industry use cases across India. Since we work extensively with regional Indian languages and dialects, we are looking for someone who can take ownership of the TTS domain with strong technical leadership and commitment.
Key Responsibilities
Research, develop, fine-tune, and optimize modern Text-to-Speech models
Build multilingual and multi-dialect TTS systems for Indian regional languages
Work on speaker adaptation, voice cloning, prosody control, and low-latency inference
Train and self-host TTS models using large-scale internal audio datasets
Design robust audio preprocessing and postprocessing pipelines
Improve speech naturalness, pronunciation accuracy, and dialect adaptation
Work with audio codecs, speech enhancement, and compression techniques
Collaborate with product, engineering, and data teams to deploy production-ready speech systems
Evaluate model quality using both objective metrics and subjective listening evaluations
Stay updated with the latest advancements in speech AI and generative audio
Required Skills & Qualifications
2+ years of experience in Machine Learning, Speech AI, or related domains
Strong understanding of modern TTS architectures such as Tacotron, FastSpeech, VITS, Bark, XTTS, YourTTS, StyleTTS, or similar
Experience in fine-tuning and deploying TTS models
Good understanding of audio codecs, signal processing, spectrograms, and vocoders such as HiFi-GAN, WaveGlow, or BigVGAN
Hands-on experience with Python and deep learning frameworks such as PyTorch or TensorFlow
Experience handling multilingual datasets and speech corpora
Understanding of GPU training, inference optimization, and model serving
Familiarity with Linux environments, Docker, and cloud/GPU infrastructure
Ability to independently own and drive TTS research and production initiatives
Preferred Qualifications
Experience with Indian language datasets and dialect adaptation
Knowledge of ASR, Speech-to-Speech, or Voice AI pipelines
Experience with self-hosted inference systems and scalable deployment
Familiarity with conversational AI and voice-based applications
Research contributions, open-source projects, or published work in speech synthesis
What We Offer
Opportunity to work on cutting-edge multilingual speech AI systems
Access to large proprietary audio datasets
Freedom to experiment, research, and build production-grade TTS infrastructure
High-impact role with ownership and leadership opportunities
Collaborative and innovation-driven work environment
Ideal Candidate
We are looking for someone who is deeply interested in speech and audio AI, comfortable owning the complete TTS pipeline, excited about solving challenges in Indian languages and dialects, and committed to leading TTS initiatives end to end.