Overview
Key Responsibilities
1. Develop, maintain, and optimize ETL/ELT pipelines using Azure Databricks (PySpark/SQL).
2. Load and transform data across Bronze, Silver, and Gold layers in accordance with the medallion architecture (a brief illustrative sketch follows this list).
3. Integrate data from diverse sources (databases, APIs, flat files) into Azure Data Lake.
4. Work with Azure Data Factory for orchestration and monitoring of data workflows.
5. Implement Delta Lake features: time travel, schema evolution, partitioning.
6. Ensure data quality, validation, and performance tuning of Spark jobs.
7. Collaborate with data analysts, scientists, and architects to support analytical use cases.
8. Write clean, maintainable code with proper version control using Git.
9. Follow Agile practices and contribute to sprint planning and retrospectives.
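As a minimal, hedged sketch of the Bronze-to-Silver hop referenced in responsibility 2, the PySpark snippet below lands raw CSV files in a Bronze Delta path and then de-duplicates and partitions them into Silver. The storage account ("examplelake"), container, mount paths, and column names (order_id, order_ts) are illustrative assumptions, not the team's actual environment.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

    # Bronze: land raw CSV files as-is in a Delta table
    raw = (spark.read
           .option("header", "true")
           .csv("abfss://landing@examplelake.dfs.core.windows.net/orders/"))
    raw.write.format("delta").mode("append").save("/mnt/bronze/orders")

    # Silver: de-duplicate and add a typed partition column
    silver = (spark.read.format("delta").load("/mnt/bronze/orders")
              .dropDuplicates(["order_id"])
              .withColumn("ingest_date", F.to_date("order_ts")))
    (silver.write.format("delta")
           .mode("overwrite")
           .partitionBy("ingest_date")
           .save("/mnt/silver/orders"))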
Required Qualifications
3 to 5 years of experience in data engineering roles, including 2+ years of hands-on experience with Azure Databricks and Apache Spark.
Comfortable with CI/CD basics and job scheduling in Databricks.
Core Competencies:
Foundational Data Engineering Skills:
▪ ETL/ELT Pipeline Development: Design and build data pipelines using Spark (in Databricks).
▪ Data Cleansing & Transformation: Handling raw data ingestion, transformation logic, and performance optimization.
▪ Batch and Streaming Processing: Working with batch datasets and understanding the basics of real-time/streaming ingestion (e.g., Structured Streaming; a minimal sketch follows this list).
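For the streaming point above, here is a short illustrative sketch using Databricks Auto Loader with Structured Streaming to pick up newly arriving JSON files and append them to a Bronze Delta path. The landing path, schema location, and checkpoint location are assumed placeholder names.

    # Assumes a Databricks notebook where `spark` is predefined.
    events = (spark.readStream
              .format("cloudFiles")                       # Databricks Auto Loader
              .option("cloudFiles.format", "json")
              .option("cloudFiles.schemaLocation", "/mnt/bronze/events/_schema")
              .load("abfss://landing@examplelake.dfs.core.windows.net/events/"))

    (events.writeStream
           .format("delta")
           .option("checkpointLocation", "/mnt/bronze/events/_checkpoint")
           .outputMode("append")
           .start("/mnt/bronze/events"))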
Azure Platform Knowledge:
o Azure Services Familiarity:
▪ Azure Databricks: Notebook development, job scheduling, Delta Lake tables, and workspace management.
▪ Azure Data Factory (ADF): Build and monitor pipelines for orchestration.
▪ Azure Storage (Blob/Data Lake Gen2): Working with storage accounts for structured and unstructured data.
▪ Azure Key Vault: Using secrets securely in pipelines (see the sketch below).
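For example, secrets stored in a Key Vault-backed Databricks secret scope can be read with dbutils.secrets and passed to a JDBC source. The scope name ("kv-scope"), secret key, server, database, table, and user below are placeholders for illustration only.

    # Assumes a Databricks notebook where `spark` and `dbutils` are predefined,
    # and a Key Vault-backed secret scope named "kv-scope" (placeholder).
    sql_password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:sqlserver://example.database.windows.net:1433;database=sales")
                 .option("dbtable", "dbo.customers")
                 .option("user", "etl_user")
                 .option("password", sql_password)
                 .load())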
Technical Toolset:
o Languages & Frameworks:
▪ Python: For data transformation, notebook development, and automation.
▪ SQL: Strong grasp of SQL for querying and performance tuning.
Design Principles:
- Delta Lake Basics: Schema enforcement, partitioning, versioning (illustrated in the sketch after this list).
- Understanding of medallion architecture (Bronze, Silver, Gold layers).
- Experience working with Parquet, JSON, CSV, or other data formats.
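Two of the Delta Lake basics listed above, shown as a short hedged sketch; the table path, version number, and sample rows are assumptions that reuse the illustrative /mnt/silver/orders path from the earlier snippet.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Versioning (time travel): read the table as of an earlier version
    orders_v3 = (spark.read.format("delta")
                 .option("versionAsOf", 3)
                 .load("/mnt/silver/orders"))

    # Schema evolution: allow a new column to be merged into the existing schema
    new_batch = spark.createDataFrame([("o-1001", "EU")], ["order_id", "region"])
    (new_batch.write.format("delta")
     .mode("append")
     .option("mergeSchema", "true")
     .save("/mnt/silver/orders"))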
Security & Governance:
o Access and Permissions:
- Basic knowledge of RBAC, storage access keys, and Databricks cluster permissions.
- Familiarity with data privacy policies (GDPR basics) and encryption at rest/in transit.
Deployment & Monitoring:
o DevOps and Automation:
- Experience using Git for version control and collaboration.
- Ability to schedule and monitor jobs in Databricks Job Workflows.
Soft Skills:
o Communication & Collaboration:
- Work closely with analysts, data scientists, and architects.
- Ability to document pipelines and transformations clearly.
- Basic Agile/Scrum familiarity – working in sprints and logging tasks.
Nice to Have:
o Exposure to MLflow for basic ML model tracking.
o Data quality tooling.
o Azure Purview or other cataloging tools.
Education:
o Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
o Certifications such as Azure Data Engineer or Databricks Certified Data Engineer Professional are a plus.
Job Types: Full-time, Permanent
Pay: ₹800,000.00 - ₹1,458,475.65 per year
Application Question(s):
- Experience in Azure Data Engineering with Azure Databricks and Apache Spark?
- Knowledge of medallion architecture?
- Experience in Python Programming Language?
- Experience in Azure Storage and Key Vault?
- Current location?
Work Location: In person