Mumbai, Maharashtra, India
Information Technology
Full-Time
Infinite Computer Solutions
Overview
Senior MLOps / LLMOps Engineer (Databricks Expert) - Job Description
Introduction
Join an amazing company where you can work with cutting-edge technologies and platforms. Give your career an Infinite edge, with a stimulating environment and a global work culture. Be a part of an organization where we celebrate integrity, innovation, collaboration, teamwork, and passion. A culture where every employee is a leader delivering ideas that make a difference to this world we live in.
The MLOps / LLMOps Engineer's responsibilities include, but are not limited to:
- Design, build, and operate end-to-end MLOps and LLMOps pipelines for training, deployment, monitoring, and lifecycle management of ML and generative AI models.
- Lead Databricks-based ML and LLM platforms using MLflow, Model Registry, Feature Store, and Databricks Workflows.
- Deploy and operate ML and LLM models in production with scalability, reliability, and high availability.
- Architect and optimize high-performance distributed ML and LLM training pipelines on Databricks using advanced Spark tuning, autoscaling policies, optimized cluster configurations, and Photon execution.
- Implement high-performance inference architectures, including GPU-accelerated model serving, vector search indexing optimization, and low-latency LLM deployments.
- Build mission-critical ML/LLM systems with strict SLAs for throughput, latency, scalability, and resilience, ensuring 24/7 production readiness.
- Lead implementation of automated retraining and evaluation frameworks with configurable thresholds for drift, quality degradation, and model reliability.
- Implement cost-efficient ML and LLM operations, leveraging cluster policy enforcement, job orchestration patterns, caching strategies, and compute-aware model design.
- Implement CI/CD pipelines for ML workflows including model versioning, testing, validation, and automated deployment.
- Operationalize LLM-based applications including RAG pipelines, embeddings, vector search, and prompt lifecycle management.
- Monitor model performance, drift, latency, bias, and cost with alerting and retraining strategies.
- Collaborate with data scientists, data engineers, and platform teams for secure and reproducible ML solutions.
- Define governance, lineage, reproducibility, and compliance standards for ML and LLM systems.
- Integrate Databricks ML workloads with Azure services such as Azure ML, ADLS Gen2, Key Vault, and Azure DevOps.
- Troubleshoot distributed ML pipelines and production inference services.
- Mentor teams on MLOps and LLMOps best practices.
In addition to the qualifications listed below, the ideal candidate will demonstrate the following traits:
- Experience with advanced MLOps/LLMOps reliability engineering, including rate limiting, autoscaling, circuit breaking, caching, and SLA management.
- Ownership mindset for production-grade ML systems.
- Ability to bridge experimentation and enterprise deployment.
- Passion for automation and reliability.
- Strong communication and collaboration skills.
- Proactive approach to performance, cost, and scalability.
- Curiosity for evolving Generative AI technologies.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, AI, or related field.
- 7+ years of experience in ML engineering, data engineering, or platform engineering.
- Hands-on experience with MLOps and LLMOps pipelines in production.
- Strong expertise with Databricks for ML workloads.
- Experience deploying ML and LLM models in Azure environments.
- Proficiency in Python and ML frameworks.
- Experience with CI/CD for ML systems.
- Knowledge of model monitoring, drift detection, and retraining.
- Experience with Docker and Kubernetes.
- Understanding of AI security, governance, and compliance.
- Strong English communication skills.
- Experience with RAG architectures, vector databases, embeddings, and prompt engineering.
- Advanced Databricks capabilities including Unity Catalog and Lakehouse AI.
- Familiarity with Azure AI and enterprise AI governance.
- Responsible AI and ethics experience.
- Agile/Scrum delivery experience.
- Relevant certifications in Databricks or Azure AI.
BE
Range of experience (years): minimum 4, maximum 8
Talk to us
Feel free to call, email, or hit us up on our social media accounts.
Email
info@antaltechjobs.in