Overview
We are a fast-growing startup based in Pune, India, specializing in cutting-edge Data Science and Data Engineering solutions. Our team of dedicated professionals is committed to solving complex data challenges for companies worldwide.
Our Culture
We foster a vibrant startup culture that values:
- Intellectual curiosity
- Continuous learning
- Positive work environment
- Collaborative problem-solving
Role Overview
We are seeking a versatile and proactive Data Scientist to join our dynamic team. The ideal candidate combines technical expertise in modern AI/ML technologies with strategic planning and effective communication skills. This role demands critical thinking, the ability to apply data science and problem-solving skills to a wide variety of real-world problems, adaptability to rapidly evolving technologies, and a strong foundation in both traditional and generative AI principles.
Key Responsibilities
- Deliver end-to-end data science projects by applying Machine Learning and Deep Learning fundamentals to solve complex problems
- Derive actionable insights for a variety of problems, industries, and domains using statistical analysis and advanced data science techniques
- Develop high-quality software solutions in Python and other programming languages, collaborating with developers to understand and improve existing code or create new solutions
- Build and deploy production-ready LLM applications using modern frameworks and best practices
- Design and implement RAG (Retrieval-Augmented Generation) architectures using vector databases and embedding models
- Perform prompt engineering and optimization to maximize LLM performance for specific use cases
- Implement agentic AI systems and multi-agent workflows for complex automation tasks
- Evaluate and benchmark LLM outputs using appropriate metrics and testing frameworks
- Build sophisticated data pipelines for large-scale data processing using modern orchestration tools
- Optimize database performance and create efficient SQL queries
- Deploy and monitor ML models in production using MLOps practices and containerization
- Practice active listening to understand project requirements and team inputs
- Collaborate with clients to translate business requirements into data science solutions
- Communicate complex ideas and results clearly to stakeholders through both verbal and written formats
- Apply responsible AI principles and ensure ethical considerations in model development
- Demonstrate punctuality and a strong sense of ownership in all tasks
- Plan strategically and multitask efficiently to meet project deadlines
- Employ critical thinking to break down problems and debug effectively
- Take initiative and maintain a bias toward action to drive project progress
Required Skills
Core Programming & ML
- Strong Python programming skills with hands-on project experience
- Expertise in Machine Learning and Deep Learning algorithms (Random Forests, GBMs, Neural Networks, CNNs, RNNs, Transformers, Ensemble methods)
- Proficiency in TensorFlow or PyTorch, along with scikit-learn and pandas
- Familiarity with modern ML techniques: Transfer Learning, Few-shot Learning, Self-supervised Learning
- Experience with NLP, Computer Vision, or Time Series Analysis
Generative AI & LLMs
- Hands-on experience with LLM providers (OpenAI, Anthropic Claude, Google Gemini, or open-source models)
- Proficiency with GenAI orchestration frameworks (LangChain, LangGraph, LlamaIndex, or DSPy)
- Experience building RAG applications with vector databases (Pinecone, Weaviate, Chroma, FAISS)
- Strong prompt engineering skills and understanding of prompt optimization techniques
- Knowledge of fine-tuning techniques (LoRA, QLoRA) and when to apply them
- Understanding of LLM evaluation metrics and benchmarking methodologies
- Familiarity with agentic AI architectures and multi-agent systems
MLOps & Deployment
- Experience with MLOps practices and tools (MLflow, Kubeflow, Weights & Biases)
- Proficiency with containerization using Docker and orchestration with Kubernetes
- Experience with cloud platforms (AWS, Azure, or GCP) for ML model deployment and monitoring
- Understanding of CI/CD pipelines for ML applications
- Knowledge of model serving frameworks and API development (FastAPI, Flask, or Django)
Data Engineering & Databases
- Solid understanding of SQL, including advanced concepts like window functions and query optimization
- Experience with data pipeline orchestration tools (Airflow, Prefect, or similar)
- Familiarity with both SQL and NoSQL databases
Soft Skills & Professional Attributes
- Strong critical thinking and problem-solving skills
- Excellent written and verbal communication abilities
- Demonstrated ability to work well in a team and independently
- High degree of flexibility and adaptability to rapidly evolving technologies
- Understanding of AI safety principles and responsible AI practices
Nice-to-Have
- Experience with big data technologies (Spark, Hadoop, Databricks)
- Familiarity with BI tools and dashboard creation (Tableau, Power BI, Looker)
- Knowledge of graph databases and knowledge graph construction
- Experience with real-time streaming data processing
- Active participation in data science competitions (Kaggle, DrivenData)
- Contributions to open-source AI/ML projects or a technical blog
- Experience with multimodal AI models (vision-language models, audio processing)
- Published research papers or conference presentations
Qualifications
- Data Scientist I: 0-2 years of hands-on experience in Data Science projects
- Data Scientist II: 2-5 years of hands-on experience in Data Science projects
- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related technical field
- Demonstrated commitment to continuous learning through courses, certifications, or self-study (especially in GenAI and modern ML techniques)
What We Offer
- Competitive salary commensurate with experience
- Opportunity to work on diverse, cutting-edge AI/ML projects
- Collaborative and innovation-driven work environment
- Rapid growth and continuous learning opportunities
- Exposure to the latest AI technologies and industry best practices