Overview
JD: Machine Learning Engineer
Location: Hyderabad
About Us:
Deccan AI, founded by IIT Bombay and IIM Ahmedabad alumni, specializes in LLM development and AI-first scaled operations. Based in San Francisco and Hyderabad, we aim to create AI for Good, driving innovation with positive societal impact.
About the Role
We are seeking a Machine Learning Engineer focused on Data Quality to ensure our model training data meets the highest standards of reliability, relevance, and safety. This role plays a pivotal part in the ML lifecycle, from automated QA of training data to developing evaluation strategies and leading rater workflows, ensuring that the data we ship aligns closely with client expectations and model performance objectives.
You will sit at the intersection of engineering, research, and client success, acting as the final quality gatekeeper for datasets powering LLM fine-tuning, reward modeling, and evaluation.
Key Responsibilities
Dataset Quality Automation
- Automate quality assurance pipelines for SFT transcripts and RLHF preference pairs.
- Implement schema validation, semantic overlap checks, and embedding-based deduplication (see the sketch after this list).
- Integrate filters for safety, toxicity, and reward-signal balance in datasets.
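As a concrete illustration of the deduplication bullet above, here is a minimal sketch of embedding-based near-duplicate filtering for transcripts. The encoder name ("all-MiniLM-L6-v2") and the 0.95 similarity threshold are illustrative assumptions, not a prescribed stack.

```python
# Minimal sketch: drop transcripts whose embedding is near-identical to
# an earlier one. Encoder and threshold are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def deduplicate(texts: list[str], threshold: float = 0.95) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    emb = model.encode(texts, normalize_embeddings=True)  # unit vectors
    kept: list[int] = []
    for i in range(len(texts)):
        # On unit vectors, cosine similarity is a plain dot product.
        if all(float(np.dot(emb[i], emb[j])) < threshold for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]

print(deduplicate(["hello world", "hello  world!", "goodbye"]))
```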
Training & Benchmarking
- Execute proxy fine-tuning (LoRA/QLoRA) on open-source LLMs using QA-approved datasets (a minimal setup is sketched after this list).
- Train lightweight reward models and track performance via public/internal benchmarks and calibration metrics.
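For the proxy fine-tuning item above, a minimal LoRA setup with Hugging Face PEFT might look like the sketch below. The base model ("facebook/opt-350m") and all hyperparameters are placeholder assumptions for illustration.

```python
# Illustrative LoRA configuration for a proxy fine-tune on a QA-approved
# dataset. Base model and hyperparameters are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "facebook/opt-350m"  # assumed small open-source base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # sanity check: only a small fraction trains
```

From here the adapted model can be passed to a standard Trainer loop over the QA-approved dataset.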
LLM Evaluation
- Orchestrate human and LLM-as-judge evaluations, including generation of critiques and scoring.
- Design evaluation rubrics focused on consistency, helpfulness, and alignment with reward models.
- Calculate and interpret statistical measures such as binomial confidence intervals for evaluation scores (see the sketch below).
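Since the bullet above calls for binomial confidence intervals, here is one standard construction, the Wilson score interval, assuming each evaluation reduces to a pass/fail judgment per response.

```python
# Wilson score interval for a binomial pass rate, one way to attach
# uncertainty to an evaluation score (fraction of responses judged "good").
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - margin, center + margin)

# e.g., 42 of 50 responses rated helpful -> roughly (0.71, 0.92)
print(wilson_interval(42, 50))
```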
Annotation & Rater Management
- Build a continuous feedback loop with annotation teams, resolve disputes, and maintain high annotation quality (one way to track agreement is sketched after this list).
- Manage human evaluation workflows to maximize consistency and throughput.
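As one hypothetical way to monitor annotation quality, the snippet below computes Cohen's kappa between two raters on a shared overlap set. The labels and the threshold mentioned in the comment are toy assumptions; the role description does not prescribe this specific metric.

```python
# One possible quality signal: chance-corrected agreement (Cohen's kappa)
# between two raters on overlapping items, tracked over time to catch drift.
from sklearn.metrics import cohen_kappa_score

rater_a = ["good", "bad", "good", "good", "bad"]  # toy overlap set
rater_b = ["good", "bad", "bad", "good", "bad"]
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # values above ~0.6 are often read as substantial
```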
Research & Tooling
- Prototype new signal-to-noise metrics (e.g., reward model entropy, preference flip rate); a toy flip-rate definition is sketched after this list.
- Package tooling into reproducible notebooks and integrate it into CI pipelines (Airflow/Dagster).
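"Preference flip rate" is named above only as an example metric, so the following definition is hypothetical: the fraction of preference pairs whose reward margin (reward of the chosen response minus reward of the rejected one) changes sign when the prompt is lightly perturbed. A high rate would suggest a noisy reward signal.

```python
# Hypothetical "preference flip rate": fraction of (chosen, rejected) pairs
# whose reward margin changes sign after a light prompt perturbation.
def preference_flip_rate(original_margins: list[float],
                         perturbed_margins: list[float]) -> float:
    flips = sum(
        1 for a, b in zip(original_margins, perturbed_margins) if a * b < 0
    )
    return flips / max(len(original_margins), 1)

# e.g., two of four pairs flip -> 0.5
print(preference_flip_rate([0.8, -0.1, 0.3, 0.6], [0.7, 0.2, -0.4, 0.5]))
```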
End Value to the Company
You will serve as the client-end MLE advocate, ensuring that all training and evaluation datasets are aligned with downstream needs. Your work will directly influence model performance, client satisfaction, and data-driven improvements to our ML systems.
Required Skills & Qualifications
- Strong understanding of LLM training and evaluation pipelines (SFT, RLHF, reward modeling).
- Experience with model performance diagnostics, identifying root causes in model behavior (e.g., data flaws, prompt issues).
- Skilled in prompt engineering, dataset schema design, and annotation guideline development.
- Proficient in Python, with experience using PyTorch, Hugging Face Transformers, and FastAPI.
- Comfortable building evaluation frameworks, including leaderboards and domain-specific test sets.
- Familiarity with model evaluation metrics, clustering techniques, embedding models, and data drift detection.
- Strong communication skills, especially in translating technical findings into actionable client insights.
- Self-starter with a consultative mindset who can operate across technical and business domains.
Nice-to-Have
- Experience with embedding similarity, data deduplication, or dataset filtering for toxicity/safety.
- Prior work in LLM-as-a-Judge systems or human alignment evaluations.
- Familiarity with CI/CD for data workflows and orchestration tools like Airflow or Dagster.