Job Description
We are seeking a highly skilled and experienced Senior Data Engineer to join our growing data team. The ideal candidate will have a deep understanding of Azure Databricks, Apache Spark, and the Azure data ecosystem. You will lead the design and implementation of scalable data pipelines, enforce best practices, and ensure delivery of high-quality data products using the Medallion Architecture (Bronze, Silver, Gold).
Key Responsibilities
- Lead the design and implementation of end-to-end data pipelines using Azure Databricks and Delta Lake.
- Build scalable, high-performance ETL/ELT processes across structured and unstructured data sources.
- Implement and optimize the Bronze, Silver, and Gold layers of the Medallion Architecture for data curation.
- Integrate with Azure Data Factory, Data Lake, Synapse, and other Azure services.
- Apply best practices for data modeling, partitioning, Z-ordering, and performance tuning.
- Ensure high data quality through validation frameworks (e.g., Great Expectations).
- Enforce data governance, access control, and compliance using Azure Key Vault, RBAC, and Azure Purview.
- Collaborate with data analysts, data scientists, and product teams to understand data needs.
- Mentor junior engineers and participate in architecture/code reviews.
- Contribute to CI/CD pipelines for data workflows using Azure DevOps or similar tools.
Required Qualifications
5 to 8 years of experience in data engineering, with at least 2 years of hands-on experience using Azure Databricks for pipeline development and data modelling.
Technical Skills:
- Azure Ecosystem Expertise:
  - Azure Databricks: End-to-end development including Delta Lake, notebook workflows, and ML integrations.
  - Azure Data Factory: Orchestration and data pipeline integration.
  - Azure Storage (Blob/Data Lake Gen2): Working with storage accounts for structured and unstructured data.
  - Azure Key Vault: Securing secrets and managing credentials in pipelines.
  - Azure DevOps: CI/CD for data pipelines and notebooks.
- Big Data & Distributed Processing:
  - Apache Spark (via Databricks): Advanced knowledge of Spark SQL, PySpark, DataFrames, RDDs, and performance tuning.
  - Delta Lake: Understanding of ACID transactions, schema enforcement, time travel, and data format optimization.
- Data Modelling and Warehousing:
  - Dimensional modelling (Star/Snowflake schemas).
  - Knowledge of modern data warehousing principles.
  - Experience implementing the Medallion Architecture (Bronze/Silver/Gold layers).
  - Experience working with Parquet, JSON, CSV, and other data formats.
- Programming Languages:
  - Python: Data transformation, notebook development, and automation.
  - SQL: Strong grasp of querying and performance tuning.
  - Scala (nice to have): Useful for certain Spark applications.
- CI/CD and Infrastructure Automation:
  - Experience with Git repositories (branching, pull requests).
  - Automated deployments via Azure DevOps Pipelines.
- Data Engineering & Analytical Skills:
  - ETL/ELT pipeline design and optimization.
  - Data quality and validation frameworks.
- Security & Governance:
  - RBAC and access controls within Azure and Databricks.
  - Data encryption and secure key management.
  - GDPR, HIPAA, or other compliance-aware data handling.
Soft Skills & Leadership:
- Stakeholder Communication: Translate technical concepts into business value.
- Project Ownership: End-to-end delivery including design, implementation, and monitoring.
- Mentorship: Guide junior engineers and establish best practices.
- Agile Practices: Work in sprints; participate in scrum ceremonies and story estimation.
Education:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- Certifications such as Azure Data Engineer Associate or Databricks Certified Data Engineer Professional are a plus.
Job Types: Full-time, Permanent
Pay: ₹1,500,000.00 - ₹2,200,000.00 per year
Benefits:
- Health insurance
- Provident Fund
Schedule:
- Day shift
- Morning shift
Ability to commute/relocate:
- Hyderabad, Telangana: Reliably commute or planning to relocate before starting work (Required)
Experience:
- Azure Data Engineering: 5 years (Required)
- Python Programming: 4 years (Required)
- Data governance: 3 years (Required)
Work Location: In person
Application Deadline: 25/04/2025
Expected Start Date: 01/05/2025