Overview
About the Role
We’re hiring a Data Engineer to build and scale the data pipelines and core datasets that power Bharat AI’s agentic AI platform. Your work will directly enable product analytics, model evaluation, safety and reliability systems, and business decision-making. You’ll partner closely with Product, Data Science, Infrastructure, Marketing, Finance, and AI/Research teams to ensure the platform has trustworthy, timely, and well-modeled data as the company scales.
What You’ll Do (Responsibilities)
Design, build, and operate data pipelines to ingest product and user event data into the data warehouse with high reliability.
Create canonical datasets that track key product metrics such as user growth, engagement, and revenue.
Implement fault-tolerant ingestion and processing systems with clear monitoring and recovery paths.
Collaborate with cross-functional teams to gather data requirements, define metric logic, and deliver usable datasets.
Contribute to data architecture decisions: schemas, partitioning strategies, pipeline patterns, and data quality controls.
Ensure data security, integrity, and compliance in line with company and industry standards.
Must-Haves (Requirements)
3+ years of experience as a Data Engineer working on production pipelines.
Proficiency in at least one of Python, Scala, or Java.
Strong experience with distributed processing and storage ecosystems (e.g., Hadoop or Flink for processing; HDFS or similar for distributed storage).
Hands-on expertise with orchestration/scheduling tools such as Airflow.
Solid Spark fundamentals with the ability to write, debug, and optimize Spark jobs.
Nice to Have (Bonus)
Hands-on experience with Databricks in production.
Familiarity with the GCP data stack (e.g., Pub/Sub, Dataflow, BigQuery, GCS).