Overview
dBee is a fullstack voice AI agent platform with unified memory and a self-learning framework. We aren't building demos; we're live with enterprise clients, handling real call volumes in multiple languages, across real-world conditions that polished demos never account for.
We sit at the intersection of LLM inference, real-time speech, and stateful agent orchestration. The problems we solve daily don't have Stack Overflow answers. If you're looking for a place where your architectural instincts actually matter, and where your name is on the things that work, this is that place.
We're a small, focused core team. Everyone here owns a significant piece of what ships.
About the Role
We're scaling our infrastructure to handle 100+ concurrent enterprise voice calls — without degradation, without noticeable latency spikes, and with the same conversational smoothness across Hindi, English, and multilingual code-switching that our clients expect.
The voice AI enterprise stack is a different beast from what most engineers have touched. If you've already built in this domain, you know the seams — the gap between STT confidence thresholds and LLM turn-taking, the prosody issues that surface at scale, the orchestration failures that only appear at high concurrency. Those are the exact problems we're solving.
This role is for someone who wants to work at the edge of what's currently possible in voice AI — not maintain what's already there.
What You'll Do
Design and optimize the orchestration layer between LLM, STT, and TTS for enterprise-grade concurrency (100+ simultaneous calls)
Build and tune pipelines for low-latency, high-reliability voice agents that stay consistent across long, multi-turn sessions
Solve multilingual voice AI challenges: code-switching, regional accent handling, prosody consistency, and natural turn-taking in non-English contexts
Architect session memory and context management so agents behave coherently across calls, not just within them
Work directly on the self-learning feedback loop that makes dBee agents improve over time from live interactions
Identify where vendor defaults break down and build around them — we're at a scale and complexity where off-the-shelf doesn't always hold
What We're Looking For
Shipped enterprise voice AI to production. Not a prototype. Actual clients, actual scale, actual incident response.
Hands-on with the full voice stack — STT, TTS, and LLM orchestration frameworks.
Deep understanding of real-time audio pipelines — VAD, barge-in handling, jitter buffers, silence detection, and why these matter at scale.
Multilingual nuance — building voice agents that work beyond English, especially in Indian language contexts, requires a different mental model. You should already have that model.
High-agency engineering style. You scope your own problems, propose solutions, and execute without hand-holding.
Nice to Have
Experience with WebRTC / SIP / telephony infrastructure at scale
Familiarity with distributed tracing and observability tooling for real-time audio systems
Opinions on eval frameworks for voice AI quality (latency, naturalness, task completion)
Prior experience at a voice AI startup, conversational AI company, or enterprise contact center platform
What We Offer
Salary: Industry-standard
Equity: meaningful ESOP allocation tied to performance, where applicable
Location flexibility: Fully remote if you prefer, or on-site at our offices in GIFT City, South Delhi, or Bengaluru. We don't manage by presence.
Autonomy: You'll have access to the core product, the team, and the roadmap. No layers.
Stage: Early enough that your work shapes the architecture. Late enough that we have real enterprise clients and real revenue.
How to Apply
Skip the cover letter. Send us one or more of the following:
🔗 Resume with GitHub profile
🔗 Live project or deployed voice AI system
🔗 Work profile or portfolio
🔗 A write-up or case study of a hard voice AI problem you solved
We review everything. If your work speaks for itself, so will your application.