Overview
Job Title: AI Developer
Location: Mumbai (On-site)
Experience: 3–5 years
Role Summary:
We are seeking an experienced AI Developer to lead the fine-tuning, deployment, and optimization of the custom Proniti AI model, built on prevailing AI model architectures (26B-A4B MoE / 31B dense). You will be responsible for transforming the base model into a highly secure, autonomous reasoning engine capable of executing complex standard operating procedure (SOP) gap analyses and regulatory reporting.
Key Responsibilities:
Model Fine-Tuning: Configure and execute Supervised Fine-Tuning (SFT) pipelines using Parameter-Efficient Fine-Tuning (PEFT) methodologies. Utilize Quantized Low-Rank Adaptation (QLoRA) with frameworks like Hugging Face TRL and Unsloth (using bitsandbytes nf4 quantization) to adapt the model without catastrophic forgetting.
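For candidates unfamiliar with the stack, the kind of configuration this involves can be sketched as follows. This is an illustrative snippet only: the hyperparameters, target modules, and compute dtype are placeholder assumptions, not Proniti's actual settings, and the base model itself is not named here.

```python
# Illustrative QLoRA configuration sketch (placeholder values, not
# Proniti's actual fine-tuning settings).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# bitsandbytes NF4 quantization: load the frozen base weights in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# LoRA adapters: only these small low-rank matrices are trained, which is
# what keeps PEFT from overwriting the base model's knowledge
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

In a TRL-based pipeline, these config objects would be passed to the model loader and `SFTTrainer` respectively; Unsloth wraps the same ideas behind its own loader.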
Sovereign Infrastructure Deployment: Manage the deployment of the model on sovereign Indian cloud infrastructure. Work directly with dedicated NVIDIA H100 or L40S GPU clusters hosted in Mumbai-based Tier IV data centers to ensure data privacy and ultra-low latency.
Inference Optimization: Deploy and configure the vLLM inference engine. You will tune server flags such as --gpu-memory-utilization to reserve KV-cache headroom for long-context workloads, and enable Gemma 4's model-specific reasoning parser (--reasoning-parser gemma4).
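A typical launch of such a server might look like the following. This is a configuration sketch only: the model path and numeric values are placeholders, and the gemma4 parser value is taken from the role description above.

```shell
# Illustrative vLLM launch; model path and values are placeholders.
vllm serve /models/proniti-base \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.90 \
    --max-model-len 32768 \
    --reasoning-parser gemma4
```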
Agentic Tool Orchestration: Implement native tool-calling capabilities by mapping Proniti's backend APIs to the model's <|toolcall> and <|toolresponse> control tokens, powering the autonomous Reporting Agent.
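The shape of this mapping can be sketched with standard-library Python. Note the assumptions: the control token names come from the posting, but the JSON payload format between them, the API name, and its signature are all hypothetical illustrations.

```python
import json

# Hypothetical backend API exposed as a tool; the name, signature, and
# the JSON payload format after the control token are assumptions.
def get_sop_gap_report(department: str) -> dict:
    return {"department": department, "gaps": 3}

TOOLS = {"get_sop_gap_report": get_sop_gap_report}

def dispatch_tool_call(model_output: str) -> str:
    """Extract the JSON payload after the <|toolcall> control token,
    invoke the mapped backend API, and wrap the result in a
    <|toolresponse> block to feed back to the model."""
    start = model_output.index("<|toolcall>") + len("<|toolcall>")
    call = json.loads(model_output[start:].strip())
    result = TOOLS[call["name"]](**call["arguments"])
    return "<|toolresponse>" + json.dumps(result)

reply = dispatch_tool_call(
    '<|toolcall>{"name": "get_sop_gap_report", "arguments": {"department": "QA"}}'
)
```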
Constrained Decoding: Implement structured JSON output generation via vLLM's guided decoding engine to guarantee that the AI generates perfectly structured data payloads for the Proniti Compliance Dashboard.
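Concretely, a request against vLLM's OpenAI-compatible endpoint carries the JSON Schema alongside the prompt; the engine then constrains sampling so only schema-valid tokens can be emitted. The schema fields and model name below are illustrative assumptions, not the actual Compliance Dashboard contract.

```python
import json

# Illustrative JSON Schema for a dashboard payload (field names assumed).
REPORT_SCHEMA = {
    "type": "object",
    "properties": {
        "sop_id": {"type": "string"},
        "gap_severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "remediation": {"type": "string"},
    },
    "required": ["sop_id", "gap_severity", "remediation"],
}

# Body for a direct POST to vLLM's OpenAI-compatible /v1/chat/completions
# endpoint; "guided_json" is vLLM's guided-decoding extension field.
request_body = {
    "model": "proniti-base",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarise the gap analysis for SOP-114."}
    ],
    "guided_json": REPORT_SCHEMA,
}

payload = json.dumps(request_body)
```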
Security Governance: Integrate the open-source Agent Governance Toolkit to provide deterministic, sub-millisecond policy enforcement, preventing risks like tool misuse or prompt injections.
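The toolkit's own API is not documented here, but the kind of deterministic check such a layer performs before any tool call is dispatched can be illustrated with plain string logic; every name and pattern below is a made-up example.

```python
# Minimal illustration of deterministic pre-dispatch policy enforcement;
# this is NOT the Agent Governance Toolkit API, just the idea behind it.
ALLOWED_TOOLS = {"get_sop_gap_report", "file_regulatory_report"}
BLOCKED_PATTERNS = ("ignore previous instructions", "system prompt")

def enforce_policy(tool_name: str, user_input: str) -> bool:
    """Return True only if the tool is allowlisted and the input contains
    no known prompt-injection markers. Pure string checks, so the decision
    is deterministic and effectively instantaneous."""
    if tool_name not in ALLOWED_TOOLS:
        return False
    lowered = user_input.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)
```

Because no model inference is involved, the check always yields the same verdict for the same input, which is what makes sub-millisecond enforcement feasible.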
Requirements:
3–5+ years of experience in Deep Learning, NLP, and AI Systems Engineering.
Strong proficiency in Python, PyTorch, and the Hugging Face ecosystem.
Proven hands-on experience with LLM/SLM fine-tuning techniques (LoRA, QLoRA) and quantization.
Deep understanding of inference servers (specifically vLLM) and GPU memory optimization (KV caching, PagedAttention).
Experience building autonomous AI agents and utilizing JSON schemas for strict output decoding.