
Overview
Role: Data Engineer / DataOps Specialist
Location: Remote (India)
Team: Data & Analytics
Mode: Permanent
Position Summary
The Mid-Level Data Engineer/DataOps Specialist is responsible for the end-to-end design and implementation of our data architecture, spanning both the data engineering and data modelling lifecycles. You will develop and maintain AWS-native and Hadoop/Spark pipelines, apply relational and dimensional modelling best practices, and collaborate with analysts, data scientists, and actuaries to deliver high-quality data products. This role demands a pragmatic “data-as-a-product” mindset, strong Python and SQL skills, and the ability to optimize cloud infrastructure for performance, scale, and governance.
Key Responsibilities
· Data Architecture & Modelling: Define logical and dimensional schemas; ensure normalization, referential integrity, and optimized designs for analytics and reporting.
· AWS Pipeline Development: Build and operate ETL/ELT workflows with AWS Glue, Amazon Managed Workflows for Apache Airflow (MWAA), and AWS Data Pipeline.
· Spark & Hadoop Ecosystem: Develop and tune Spark applications (PySpark/Scala) on EMR/Databricks; manage Hadoop clusters (HDFS, YARN, Hive).
· Data Lake & Warehousing: Design S3-based data lakes (Lake Formation) and Redshift warehouses, optimizing distribution/sort keys and partitioning (a partitioned-write sketch follows this list).
· Infrastructure as Code: Provision and maintain AWS resources (VPCs, EMR/Spark clusters, Glue jobs) using Terraform or CloudFormation (an IaC sketch follows this list).
· Streaming & Messaging: Implement real-time pipelines with Spark Structured Streaming, Amazon Kinesis, or Apache Kafka (MSK); a streaming sketch follows this list.
· Data Quality & Governance: Embed tests and documentation in dbt workflows; enforce data quality via AWS Glue Data Quality or Deequ (see the PyDeequ sketch after this list); maintain data lineage in the Glue Data Catalog.
· Performance & Monitoring: Profile and optimize pipelines, SQL queries, and Spark jobs; configure CloudWatch and Spark UI dashboards with alerts for anomalies (a CloudWatch alarm sketch follows this list).
· Collaboration & Mentorship: Partner with cross-functional teams (BI, analytics, actuarial) to translate requirements; mentor junior engineers on best practices.
· Continuous Improvement: Research and pilot new technologies (EMR Studio, Glue Studio, Amazon Athena, Databricks Delta Lake) to enhance our data platform.
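
The short sketches below illustrate the kind of hands-on work these responsibilities involve; every bucket, topic, schema, and threshold in them is a hypothetical placeholder, not a reference to our actual platform. First, a minimal PySpark batch job that lands raw claims data as partitioned Parquet in an S3 data lake, assuming a headered CSV source with a claim_date column:

    # Sketch: daily batch job, raw S3 CSV -> partitioned Parquet data lake.
    # All paths and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("claims-to-lake")
        .config("spark.sql.shuffle.partitions", "200")  # tune to cluster size
        .getOrCreate()
    )

    claims = (
        spark.read
        .option("header", "true")
        .csv("s3://example-raw-zone/claims/")           # hypothetical source
        .withColumn("claim_date", F.to_date("claim_date"))
        .withColumn("year", F.year("claim_date"))
        .withColumn("month", F.month("claim_date"))
    )

    (
        claims
        .repartition("year", "month")                   # bound the small-file count
        .write
        .mode("overwrite")
        .partitionBy("year", "month")                   # Hive-style lake layout
        .parquet("s3://example-curated-zone/claims/")   # hypothetical target
    )

Hive-style year/month partitions keep downstream Athena and Redshift Spectrum scans cheap and make per-partition backfills straightforward.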
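The IaC bullet names Terraform and CloudFormation; to keep all sketches in one language, the following uses AWS CDK in Python, which synthesizes to CloudFormation. It provisions a versioned curated-zone bucket and a Glue Data Catalog database; all resource names are placeholders:

    # Sketch: a tiny AWS CDK (Python) stack that synthesizes to CloudFormation.
    import aws_cdk as cdk
    from aws_cdk import aws_s3 as s3
    from aws_cdk import aws_glue as glue
    from constructs import Construct

    class DataLakeStack(cdk.Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            # Curated-zone bucket, versioned for recoverability, never public.
            s3.Bucket(
                self, "CuratedZone",
                versioned=True,
                block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            )

            # Glue Data Catalog database for the lake (low-level Cfn construct).
            glue.CfnDatabase(
                self, "LakeDb",
                catalog_id=self.account,
                database_input=glue.CfnDatabase.DatabaseInputProperty(
                    name="claims_lake",
                ),
            )

    app = cdk.App()
    DataLakeStack(app, "DataLakeStack")
    app.synth()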
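For the streaming responsibility, a minimal Spark Structured Streaming sketch that reads JSON events from an MSK (Kafka) topic and lands them in S3 with checkpointing. The broker address, topic name, and event schema are assumptions, and the job needs the spark-sql-kafka connector on its classpath:

    # Sketch: MSK (Kafka) topic -> S3 Parquet in micro-batches.
    # Requires the spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("policy-events-stream").getOrCreate()

    event_schema = StructType([
        StructField("policy_id", StringType()),
        StructField("event_type", StringType()),
        StructField("premium", DoubleType()),
    ])

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "b-1.example.kafka.amazonaws.com:9092")
        .option("subscribe", "policy-events")           # hypothetical topic
        .option("startingOffsets", "latest")
        .load()
        .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
        .select("e.*")
    )

    query = (
        events.writeStream
        .format("parquet")
        .option("path", "s3://example-stream-zone/policy-events/")
        .option("checkpointLocation", "s3://example-stream-zone/_chk/policy-events/")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()  # blocks; the stream runs until stopped

The checkpoint location is what lets the file sink recover exactly-once output after a restart, so it should live outside the data path itself.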
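For data quality, Deequ checks can be driven from Python through the pydeequ wrapper. A minimal sketch, assuming pydeequ is installed and using an inline stand-in DataFrame where the real curated claims table would be:

    # Sketch: Deequ constraint checks on a claims DataFrame via pydeequ.
    # pydeequ pulls the Deequ jar onto the Spark classpath; newer releases
    # also expect a SPARK_VERSION environment variable at import time.
    from pyspark.sql import SparkSession
    import pydeequ
    from pydeequ.checks import Check, CheckLevel
    from pydeequ.verification import VerificationSuite, VerificationResult

    spark = (
        SparkSession.builder
        .appName("claims-quality")
        .config("spark.jars.packages", pydeequ.deequ_maven_coord)
        .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
        .getOrCreate()
    )

    claims_df = spark.createDataFrame(
        [("C-1", 1200.0), ("C-2", 350.5)],
        ["claim_id", "paid_amount"],
    )  # stand-in for the real curated claims table

    check = Check(spark, CheckLevel.Error, "claims integrity")
    result = (
        VerificationSuite(spark)
        .onData(claims_df)
        .addCheck(
            check.isComplete("claim_id")        # no NULL claim ids
                 .isUnique("claim_id")          # primary-key style uniqueness
                 .isNonNegative("paid_amount")  # monetary sanity check
        )
        .run()
    )
    VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)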
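Finally, for monitoring, one lightweight pattern is to publish a custom row-count metric from each pipeline run and alarm when a run looks anomalously small. The namespace, pipeline name, region, and threshold below are all illustrative:

    # Sketch: custom pipeline metric + CloudWatch alarm via boto3.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="ap-south-1")  # example region

    # Emit the row count from a pipeline run as a custom metric.
    cloudwatch.put_metric_data(
        Namespace="DataPlatform/Pipelines",
        MetricData=[{
            "MetricName": "RowsLoaded",
            "Dimensions": [{"Name": "Pipeline", "Value": "claims-daily"}],
            "Value": 1_250_000,
            "Unit": "Count",
        }],
    )

    # Alarm if a run loads suspiciously few rows (possible upstream outage).
    cloudwatch.put_metric_alarm(
        AlarmName="claims-daily-low-volume",
        Namespace="DataPlatform/Pipelines",
        MetricName="RowsLoaded",
        Dimensions=[{"Name": "Pipeline", "Value": "claims-daily"}],
        Statistic="Sum",
        Period=86400,                  # one day, in seconds
        EvaluationPeriods=1,
        Threshold=100_000,
        ComparisonOperator="LessThanThreshold",
        TreatMissingData="breaching",  # no data at all is also an incident
    )
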
Required Skills & Qualifications
· 3–5 years in data engineering or DataOps roles.
· Strong AWS experience (S3, Glue, Redshift, EMR, Kinesis/MSK, Lambda, IAM, CloudWatch).
· Proficient in Python and SQL (Redshift, Athena, Hive, Spark SQL).
· Spark expertise (PySpark or Scala) and Hadoop cluster management.
· Deep understanding of relational and dimensional modelling.
· Infrastructure as Code with Terraform or CloudFormation.
· Experience with dbt for transformation workflows and automated testing.
· Excellent communication and stakeholder management skills.
Job Type: Permanent
Pay: ₹1,500,000 – ₹3,000,000 per year
Benefits:
- Work from home
Schedule:
- Day shift
Experience:
- Data Engineering: 4 years (Required)
- DataOps: 3 years (Required)
- AWS experience (S3, Glue, Redshift, EMR, Lambda): 3 years (Required)
- Python and SQL: 3 years (Required)
- Spark expertise (PySpark or Scala) and Hadoop: 3 years (Required)
- Infrastructure as Code: 3 years (Required)
- dbt (data build tool): 3 years (Required)
- Data Architecture & Modelling: 3 years (Required)
- Insurance domain (claims or catastrophe modelling): 2 years (Required)
Work Location: Remote