Chennai, Tamil Nadu, India
Information Technology
Full-Time
Positive Synergy
Overview
Job Description
Seeking a Lead Data Scientist (Generative AI) to spearhead the development of advanced AI-powered classification and matching systems on Databricks. You will contribute to flagship programs like the Diageo AI POC by building RAG pipelines, deploying agentic AI workflows, and scaling LLM-based solutions for high-precision entity matching and MDM modernization.
Key Responsibilities
- Design and implement end-to-end AI pipelines for product classification, fuzzy matching, and deduplication using LLMs, RAG, and Databricks-native workflows.
- Develop scalable, reproducible AI solutions within Databricks notebooks and job clusters, leveraging Delta Lake, MLflow, and Unity Catalog.
- Engineer Retrieval-Augmented Generation (RAG) workflows using vector search and integrate with Python-based matching logic.
- Build agent-based automation pipelines (rule-driven + GenAI agents) for anomaly detection, compliance validation, and harmonization logic.
- Implement explainability, audit trails, and governance-first AI workflows aligned with enterprise-grade MDM needs.
- Collaborate with data engineers, BI teams, and product owners to integrate GenAI outputs into downstream systems.
- Contribute to modular system design and documentation for long-term scalability.
Qualifications
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, or a related field.
- 5-7 years of overall Data Science experience, including 2+ years in Generative AI / LLM-based applications.
- Deep experience with the Databricks ecosystem: Delta Lake, MLflow, DBFS, Databricks Jobs & Workflows.
- Strong Python and PySpark skills, with the ability to build scalable data pipelines and AI workflows in Databricks.
- Experience with LLMs (e.g., OpenAI, LLaMA, Mistral) and frameworks like LangChain or LlamaIndex.
- Working knowledge of vector databases (e.g., FAISS, Chroma) and prompt engineering.
- Exposure to MDM platforms (e.g., Stibo STEP) and familiarity with data harmonization challenges.
- Experience with explainability frameworks (e.g., SHAP, LIME) and AI audit tooling.
- Knowledge of agentic AI architectures and multi-agent orchestration.
- Familiarity with Azure Data Hub and enterprise data ingestion frameworks.
- Understanding of data governance, lineage, and regulatory compliance in AI systems.