Overview
About Deccan AI
Deccan AI is a high-growth, venture-backed AI model training and evaluation company headquartered in the Bay Area. Founded by alumni of IIT Bombay and IIM Ahmedabad and former Google employees, we partner with the world’s top frontier AI labs, including Google DeepMind, Snowflake, and several cutting-edge research groups. We are backed by Prosus Ventures, and our India office is based in Hyderabad.
We’re not just participating in the AI race; we’re building the infrastructure that powers it.
With 1M+ global experts, advanced automation, and vertically integrated platforms, we deliver the gold-standard data that world-class AI models rely on. The AI data annotation market is exploding, set to quadruple by 2032. The opportunity? Massive, and you can help define the future.
Role Overview
We are looking for an ML Researcher focused on human alignment and real-world usability evaluation of AI systems. In this role, you will design benchmarks and datasets that assess how well models align with human intent, preferences, safety norms, and day-to-day task expectations.
1. Background
As AI and machine learning evolve rapidly, Deccan AI must stay at the forefront of research, evaluation, and benchmarking to maintain a competitive edge. Currently, ML Engineers (MLEs) are spread across research, benchmark ideation, dataset creation, pipeline design, and implementation. This dilution of focus has led to bottlenecks, delays, and inconsistent quality, particularly in benchmark creation and novel evaluation datasets.
To address these challenges, we propose a dedicated ML Lead role focused exclusively on research, conceptualization, and direction-setting for benchmarks and evaluations, without direct implementation responsibilities. This role will define the novelty, quality, and strategic direction of benchmarks, while leading a small team of specialized MLEs responsible for execution and pipeline development.
This structure enables faster delivery, higher-quality benchmarks, and stronger alignment with both research goals and client-facing requirements.
2. Purpose of the Role
The ML Lead will be responsible for driving innovation in AI evaluation at Deccan AI by:
Designing and proposing novel benchmarks and evaluation datasets aligned with the latest AI research.
Identifying gaps in existing evaluation methodologies across coding and non-coding domains.
Leading and mentoring a focused MLE team responsible for implementation and pipeline execution.
Ensuring consistent, high-quality benchmark output for both research initiatives and client deliverables.
3. Key Responsibilities
1. Benchmark & Evaluation Design (Primary Responsibility)
Propose at least one novel benchmark or evaluation project per week across coding (e.g., algorithmic tasks, agentic coding) and non-coding domains (e.g., NLP, CV, multimodal, agent frameworks).
Identify limitations in existing benchmarks and design new evaluation criteria that push current AI evaluation standards.
Conceptualize and propose novel evaluation datasets to support these benchmarks.
2. High-Level Requirements & Documentation
Produce a clear, high-level requirement document for every proposed benchmark, including:
Objective and motivation
Evaluation criteria and metrics
Testing methodology
Expected outcomes
Ensure documentation provides sufficient clarity for MLEs to implement efficiently without ambiguity.
3. Technical Leadership & Team Management
Lead and mentor MLEs responsible for pipeline design, validation, and execution.
Ensure alignment of MLE efforts with organizational research goals and client requirements.
Coordinate execution across multiple benchmark and pipeline initiatives.
4. Client Interaction & Requirement Alignment
Participate in client discussions to understand evaluation needs and expectations.
Translate client requirements into benchmark-level specifications.
Enable client-facing MLEs with the necessary context and guidance for successful delivery.
Provide clients with constructive, actionable feedback on task feasibility and requirements.
5. Continuous Innovation & Improvement
Stay current with advances in AI research, benchmarks, and evaluation methodologies.
Regularly refine benchmarks based on research insights, client feedback, and evolving market needs.
Collaborate with internal teams to evolve existing benchmarks and datasets.
6. Strategic Recommendations
Propose improvements to benchmarks, datasets, and evaluation pipelines based on research insights and stakeholder feedback.
Contribute to long-term strategy around AI evaluation and benchmarking at Deccan AI.
4. Team Structure Under the ML Lead
Pipeline-Focused MLEs
Responsible for implementing, validating, and executing pipelines based on ML Lead proposals.
Support both research and client-facing benchmark deployments.
Benchmark Research MLEs
Conduct supporting research and collaborate with the ML Lead to propose novel benchmarks and evaluations on a weekly basis.
5. Expected Output & Delivery Cadence
Two new benchmarks every two weeks (starting with one):
One coding-domain benchmark
One non-coding-domain benchmark
High-level requirement documentation for every benchmark.
Client-ready deliverables, validated through robust pipelines and aligned with defined evaluation criteria.
6. Clear Separation of Responsibilities
ML Lead (Ownership):
Research, ideation, and conceptual design of benchmarks and evaluation datasets.
Defining novelty, quality standards, and evaluation direction.
High-level documentation and requirements definition.
Pipeline-Focused MLEs (Execution):
Pipeline development and integration.
Benchmark implementation and validation.
Direct client-facing execution and requirement clarification.
This separation ensures deep focus on innovation at the ML Lead level while enabling efficient and scalable execution by MLEs.
7. Seniority & Skillset Requirements
The ideal ML Lead will have:
Deep expertise in AI/ML domains including agent frameworks, NLP, deep learning, multimodal systems, and computer vision.
Proven experience designing benchmarks, evaluation datasets, and research-driven evaluation frameworks.
Strong leadership and mentorship skills for guiding MLE teams.
Excellent documentation and communication skills, capable of explaining complex ideas to both technical and non-technical stakeholders.
A strong understanding of existing AI benchmarks across coding and non-coding domains, with the ability to identify and address evaluation gaps.