Overview
We are looking for a forward-thinking Data Scientist with expertise in Natural Language Processing (NLP), Large Language Models (LLMs), Prompt Engineering, and Knowledge Graph construction. You will be instrumental in designing intelligent NLP pipelines involving Named Entity Recognition (NER), Relationship Extraction, and semantic knowledge representation. The ideal candidate will also have practical experience in deploying Python-based APIs for model and service integration.
This is a hands-on, cross-functional role where you’ll work at the intersection of cutting-edge AI models and domain-driven knowledge extraction.
Key Responsibilities:
Develop and fine-tune LLM-powered NLP pipelines for tasks such as NER, coreference resolution, entity linking, and relationship extraction.
Design and build Knowledge Graphs by structuring information from unstructured or semi-structured text.
Apply Prompt Engineering techniques to improve LLM performance in few-shot, zero-shot, and fine-tuned scenarios.
Evaluate and optimize LLMs (e.g., OpenAI GPT, Claude, LLaMA, Mistral, or Falcon) for custom domain-specific NLP tasks.
Build and deploy Python APIs (using Flask/FastAPI) to serve ML/NLP models and access data from a graph database.
Collaborate with teams to translate business problems into structured use cases for model development.
Understand custom ontologies and entity schemas for the relevant domain.
Work with graph databases such as Neo4j, and query them using Cypher or SPARQL.
Evaluate and track performance using both standard metrics and graph-based KPIs.
Required Skills & Qualifications:
Strong programming experience in Python and libraries such as PyTorch, TensorFlow, spaCy, scikit-learn, Hugging Face Transformers, LangChain, and OpenAI APIs.
Deep understanding of NER, relationship extraction, coreference resolution, and semantic parsing.
Practical experience in working with or integrating LLMs for NLP applications, including prompt engineering and prompt tuning.
Hands-on experience with graph database design and knowledge graph generation.
Proficient in Python API development (Flask/FastAPI) for serving models and utilities.
Strong background in data preprocessing, text normalization, and annotation frameworks.
Understanding of LLM orchestration with tools like LangChain or workflow automation frameworks.
Familiarity with version control, ML lifecycle tools (e.g., MLflow), and containerization (Docker).
Nice to Have:
Experience using LLMs for Information Extraction, summarization, or question answering over knowledge bases.
Exposure to Graph Embeddings, GNNs, or semantic web technologies (RDF, OWL).
Experience with cloud-based model deployment (AWS/GCP/Azure).
Understanding of retrieval-augmented generation (RAG) pipelines and vector databases (e.g., Chroma, FAISS, Pinecone).
Job Type: Full-time
Pay: ₹1,200,000.00 - ₹2,400,000.00 per year
Ability to commute/relocate:
- Chennai, Tamil Nadu: Reliably commute or plan to relocate before starting work (Preferred)
Education:
- Bachelor's (Preferred)
Experience:
- Natural Language Processing (NLP): 3 years (Preferred)
Language:
- English & Tamil (Preferred)
Location:
- Chennai, Tamil Nadu (Preferred)
Work Location: In person