ABOUT THE ROLE
We are hiring a Senior Data Engineer with deep hands-on expertise in Apache Spark, Scala, and Azure Synapse to design, build, and maintain large-scale data pipelines and data warehouse solutions. As a core member of an offshore delivery team, you will work closely with onshore stakeholders to deliver robust, performant, and scalable data engineering solutions across enterprise data platforms.
This is a code-first role — you are expected to write production-grade Spark and Scala code, not just configure tools.
KEY RESPONSIBILITIES
Design, develop, and maintain scalable data pipelines using Apache Spark (Scala/Java)
Build and optimise data warehouse solutions on Azure Synapse Analytics
Write complex, high-performance SQL for data transformation, aggregation, and reporting layers
Architect and implement large-scale data integration solutions across structured and semi-structured data sources
Collaborate with onshore data architects, product owners, and business stakeholders to translate requirements into technical data solutions
Manage and version code in GitHub, following branching strategies, pull request workflows, and code review standards
Optimise Spark job performance through partitioning strategies, caching, broadcast joins, and cluster resource management
Ensure data quality, lineage, and observability across the data platform
Participate in Agile ceremonies, sprint planning, and technical design discussions
Support data platform migration, modernisation, and cloud adoption initiatives
REQUIRED SKILLS & EXPERIENCE
Core Data Engineering:
8–10 years of hands-on data engineering experience in production environments
Strong, demonstrable Apache Spark expertise, covering core concepts, DAG and stage optimisation, shuffle management, and execution plan analysis
Proficiency in Scala and/or Java for production Spark development; a code-first background is mandatory, and low-code-only profiles will not be considered
Advanced SQL — complex joins, window functions, CTEs, performance tuning, query plan analysis
Strong experience with data warehousing concepts — dimensional modelling, slowly changing dimensions, star/snowflake schemas
Hands-on experience with large-scale data integration — batch and streaming pipelines, ETL/ELT patterns
Cloud & Platform:
Hands-on experience with Azure Synapse Analytics, including dedicated SQL pools, serverless SQL pools, Apache Spark pools, and pipeline orchestration
Working knowledge of Azure Data Lake Storage (ADLS Gen2) and Delta Lake or similar lakehouse formats
Experience using GitHub for source control, including branching, merging, and CI/CD for data pipelines
Delivery & Collaboration:
Proven ability to work in an offshore delivery model coordinated with onshore teams, including asynchronous communication, documentation, and sprint delivery
Experience translating onshore business requirements into offshore technical delivery
Comfortable with cross-time-zone collaboration, written communication, and delivery accountability
GOOD TO HAVE
Microsoft Fabric — experience with Lakehouses, Notebooks, Data Warehouses, or Pipelines within the Fabric ecosystem
Background in marketing data, consumer goods analytics, or retail domain data
Familiarity with metadata-driven architectures — configuration-driven pipelines, framework-based ETL
Exposure to PySpark alongside Scala, demonstrating polyglot data engineering experience
Azure Data Factory or Synapse Pipelines orchestration experience
Databricks experience, such as Delta Live Tables, Unity Catalog, and Databricks Workflows
Experience with data quality frameworks — Great Expectations, Deequ, or similar
EXPERIENCE & QUALIFICATIONS
8–10 years of production data engineering experience; a background limited to BI or analytics reporting is not sufficient
Degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent practical experience)
Demonstrated ownership of end-to-end data pipeline delivery — from ingestion through to consumption layer
Prior experience in Agile/Scrum data delivery teams