Bangalore, KA, India
Information Technology
Other
RealPage, Inc.
Overview
We are looking for an end-to-end Data Scientist to design, build, and maintain ML-powered systems that solve core data quality and classification problems across the business. You will own the full lifecycle — from exploratory analysis and feature engineering through model training, deployment, and ongoing performance monitoring. The work spans entity resolution (identifying duplicate records across large datasets) and multi-class classification models that drive decision-making across a variety of business domains.
What You'll Do
- Own the end-to-end model lifecycle: problem framing, data exploration, feature engineering, model training, evaluation, deployment, and monitoring
- Build and maintain entity resolution systems that detect duplicate records using supervised ML and string similarity techniques
- Develop classification models that categorize unstructured or semi-structured data into meaningful business categories
- Engineer features from messy, real-world text data — names, addresses, free-text fields — using string matching algorithms, phonetic encoding, n-grams, and other NLP techniques
- Design candidate retrieval and indexing strategies to make models performant at scale
- Tune thresholds, scoring logic, and rule-based overrides to balance precision and recall for production use cases
- Maintain production model artifacts and data pipelines, ensuring models stay current as underlying data evolves
- Collaborate with engineering and product teams to understand requirements and translate business problems into well-scoped modeling tasks
Requirements
- 10+ years of experience building and deploying ML models end-to-end, in production and not just in notebooks
- Strong Python skills — pandas, NumPy, scikit-learn, XGBoost or similar gradient boosting frameworks
- Hands-on experience with record linkage, entity resolution, or deduplication problems
- Experience building classification models (binary and multi-class) on structured and semi-structured data
- Deep familiarity with string similarity algorithms: edit distance, sequence matching, phonetic encoding, shingling
- Strong feature engineering instincts — ability to extract signal from noisy, inconsistently formatted data
- Comfort working with large serialized data structures, and an understanding of memory/performance tradeoffs in production contexts
- Experience with SQL and relational databases (PostgreSQL or similar)
- Clear communication skills — ability to explain model behavior and tradeoffs to non-technical stakeholders
Nice to Have
- Experience with blocking and indexing strategies for scalable record linkage
- Background in NLP, text normalization, or information extraction
- Familiarity with model serving in API contexts (Flask, FastAPI, or similar)
- Experience in data quality, master data management, or marketplace domains
- Exposure to deep learning frameworks (PyTorch, TensorFlow) for text classification