Overview
Project Role : Custom Software Engineer
Project Role Description : Develop custom software solutions to design, code, and enhance components across systems or applications. Use modern frameworks and agile practices to deliver scalable, high-performing solutions tailored to specific business needs.
Must have skills : Python (Programming Language)
Good to have skills : NA
A minimum of 5 years of experience is required
Educational Qualification : 15 years of full-time education
We are looking for a skilled Data Scientist with strong experience in traditional machine learning techniques such as regression, classification, and statistical modeling. The ideal candidate should have deep expertise in Python and SQL, strong analytical thinking, and hands-on experience in building and validating predictive models.
This role is not focused on GenAI/NLP, and candidates with heavy LLM-focused profiles without strong core Data Science fundamentals may not be suitable.
Key Responsibilities
Develop, train, and deploy machine learning models for classification, regression, and predictive analytics use cases
Perform data exploration, preprocessing, and feature engineering on large and complex datasets
Apply statistical techniques (hypothesis testing, distributions, variance analysis) to derive meaningful insights
Design and execute robust model validation frameworks (cross-validation, A/B testing, performance tuning)
Conduct hyperparameter tuning to improve model accuracy and efficiency
Write optimized and scalable SQL queries for data extraction, transformation, and analysis
Build reusable and scalable Python-based data science pipelines
Collaborate with data engineers and business stakeholders to understand requirements and translate them into analytical solutions
Ensure model performance monitoring, documentation, and reproducibility
Communicate findings, insights, and model outcomes clearly to technical and non-technical stakeholders
Mandatory Skills (strong proficiency required):
o Python (Pandas, NumPy, Scikit-learn, etc.)
o SQL (advanced querying, optimization, transformations)
o Solid understanding of statistics and probability: distributions, hypothesis testing, variance, confidence intervals
o Hands-on experience with machine learning algorithms: logistic regression, linear regression, and tree-based models (Random Forest, Gradient Boosting, XGBoost)
o Model validation techniques (cross-validation, train-test strategies)
o Hyperparameter tuning approaches (grid search, random search, Bayesian optimization)
o Data cleansing, transformation, and feature engineering
o Good understanding of algorithm and data structure fundamentals: sorting, searching, and basic graph concepts
o Ability to work independently and handle end-to-end problem-solving
Preferred Qualifications:
o Hands-on experience with Databricks, Apache Spark, or PySpark
o Exposure to the healthcare analytics domain: claims data, CMS reimbursement, population health, Medicare Advantage
o Experience building and deploying inference/prediction pipelines
o Familiarity with ML lifecycle tools (MLflow, Azure ML, etc.)
o Exposure to cloud platforms (Azure/AWS/GCP)