Overview
Senior Machine Learning Engineer – LLM Evaluation / Task Creation (India Based)
Hourly Contract | Remote | $35 per hour
Role Description
Mercor is hiring Senior Machine Learning Engineers to collaborate with a leading AI research lab on the design, evaluation, and benchmarking of advanced machine learning systems. In this role, you will create high-quality ML tasks, datasets, and evaluation workflows that directly support the training and assessment of next-generation AI and LLM-based systems.
This position is ideal for engineers with strong applied ML experience and a competitive ML background (e.g., Kaggle) who can translate real-world problem statements into robust, reproducible machine learning pipelines. You will work closely with researchers and engineers to ensure dataset quality, sound evaluation methodology, and impactful experimentation.
Key Responsibilities
- Frame and design novel ML problems to enhance the reasoning and performance of LLMs
- Build, optimize, and evaluate machine learning models across classification, prediction, NLP, recommendation, and generative tasks
- Run rapid experimentation cycles and iterate on model performance
- Perform advanced feature engineering and data preprocessing
- Conduct robustness testing, adversarial evaluation, and bias analysis
- Fine-tune and evaluate transformer-based models when required
- Maintain clear documentation for datasets, experiments, and modeling decisions
- Stay up to date with the latest ML research, tools, and best practices
Required Qualifications
- 3+ years of full-time experience in applied machine learning
- Technical degree in Computer Science, Electrical Engineering, Statistics, Mathematics, or a related field
- Demonstrated competitive ML experience (Kaggle, DrivenData, or equivalent)
- Evidence of strong performance in ML competitions (leaderboard rankings, medals, finalist placements)
- Strong proficiency in Python, PyTorch/TensorFlow, and modern ML/NLP frameworks
- Solid understanding of statistics, optimization, model architectures, and evaluation techniques
- Experience with ML pipelines, experiment tracking, and distributed training
- Strong problem-solving, analytical, and communication skills
- Experience working with cloud platforms (AWS, GCP, or Azure)
- Fluency in English
- Must be based in India
Preferred / Nice to Have
- Kaggle Grandmaster/Master or multiple Gold Medals
- Experience creating ML benchmarks, evaluations, or challenge problems
- Background in LLMs, generative models, or multimodal learning
- Experience with large-scale distributed training
- Prior work in AI research, ML platforms, or infrastructure teams
- Contributions to open-source projects, blogs, or research publications
- Experience with LLM fine-tuning, vector databases, or generative AI workflows
- Familiarity with MLOps tools (Weights & Biases, MLflow, Airflow, Docker, etc.)
- Experience optimizing inference performance and deploying models at scale
Compensation & Contract
- Rate: $35 per hour
- Engagement: Independent contractor
- Work Mode: Fully remote and asynchronous
- Payments: Weekly via Stripe Connect
⚡ PS: Mercor reviews applications daily. Please complete your interview and onboarding steps to be considered for this opportunity. ⚡