Overview
ETL Data Engineer
Skills to Evaluate: ETL, ETL Developer, GCP, Big Data, BigQuery, Kafka, Hive, Data Modeling, Python, PySpark, SQL
Experience: 5 to 6 Years
Location: Lower Parel, Mumbai (3 Days WFO)
BGV: Education, Address, Employment, Criminal
About the Role
We are looking for a passionate and experienced Data Engineer to join our team and
help build scalable, reliable, and efficient data pipelines, primarily on Google
Cloud Platform (GCP) and secondarily on Amazon Web Services (AWS). You will work
with cutting-edge technologies to process structured and unstructured data,
enabling data-driven decision-making across the organization.
Key Responsibilities
- Design, develop, and maintain robust data pipelines and ETL/ELT workflows using PySpark, Python, and SQL (see the sketch after this list).
- Build and manage data ingestion and transformation processes from various sources, including Hive, Kafka, and cloud-native services.
- Orchestrate workflows using Apache Airflow and ensure timely and reliable data delivery.
- Work with large-scale big data systems to process structured and unstructured datasets.
- Implement data quality checks, monitoring, and alerting mechanisms.
- Collaborate with cross-functional teams, including data scientists, analysts, and product managers, to understand data requirements.
- Optimize data processing for performance, scalability, and cost-efficiency.
- Ensure compliance with data governance, security, and privacy standards.
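To give a flavor of the pipeline work described above, here is a minimal PySpark sketch of one ETL step: extract from a Hive table, apply a quality filter and a daily aggregate, and load the result into a warehouse table. All table and column names (raw_events, event_ts, analytics.daily_events) are hypothetical, illustrative assumptions, not part of any actual codebase for this role.

```python
# Illustrative ETL step only; table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-etl")
    .enableHiveSupport()  # allows reading source tables from the Hive metastore
    .getOrCreate()
)

# Extract: read raw events from a Hive table.
raw = spark.table("raw_events")

# Transform: drop rows with no timestamp, then aggregate per day and event type.
daily = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .groupBy("event_date", "event_type")
       .agg(F.count("*").alias("event_count"))
)

# Load: write the aggregate to a managed warehouse table.
daily.write.mode("overwrite").saveAsTable("analytics.daily_events")
```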
Required Skills & Qualifications
- 5+ years of experience in data engineering or related roles.
- Strong programming skills in Python and PySpark.
- Proficiency in SQL and experience with Hive.
- Hands-on experience with Apache Airflow for workflow orchestration (a minimal DAG sketch follows this list).
- Experience with Kafka for real-time data streaming.
- Solid understanding of big data ecosystems and distributed computing.
- Experience with GCP services (BigQuery, Dataflow, Dataproc).
- Ability to work with both structured (e.g., relational databases) and unstructured (e.g., logs, images, documents) data.
- Familiarity with CI/CD tools and version control systems (e.g., Git).
- Knowledge of containerization (Docker) and orchestration (Kubernetes).
- Exposure to data cataloging and governance tools (e.g., AWS Lake Formation, Google Data Catalog).
- Understanding of data modeling and architecture principles.
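To illustrate the Airflow orchestration skill above, here is a minimal DAG sketch wiring an extract, transform, load sequence. The DAG id, schedule, and task bodies are hypothetical placeholders, and the example assumes Airflow 2.4+ (for the `schedule` argument).

```python
# Illustrative DAG only; dag_id, schedule, and task bodies are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull data from source systems")


def transform():
    print("clean and aggregate the extracted data")


def load():
    print("write results to the warehouse")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ name for schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three steps strictly in order.
    extract_task >> transform_task >> load_task
```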
Soft Skills
- Strong analytical and problem-solving abilities.
- Excellent communication and collaboration skills.
- Ability to work in Agile/Scrum environments.
- Ownership mindset and attention to detail.