Mumbai, Maharashtra, India
Information Technology
Full-Time
Monocept
Overview
Role Title: Data Engineer
Experience: 4–5 years
Key Responsibilities
- Develop and maintain ETL/ELT pipelines using PySpark and orchestration tools (e.g., Airflow/AWS Step Functions/Azure Data Factory).
- Implement data models (staging, core, and mart layers) aligned with data warehouse concepts (star/snowflake schemas).
- Optimize Spark jobs (partitioning, caching, broadcast joins) for performance and cost.
- Build and manage data lakes/warehouses (e.g., S3/ADLS + Hive/Delta Lake/Redshift/Synapse).
- Author robust SQL (CTEs, window functions) and stored procedures/functions for complex transformations.
- Integrate Kafka/Kinesis/Event Hub for near real-time ingestion and streaming pipelines.
- Ensure data reliability via unit tests, data quality checks (Great Expectations/Deequ), and monitoring.
- Manage CI/CD for data workflows; collaborate on code reviews and versioning (Git).
- Implement security, governance, and metadata management (IAM/Key Vault/Lake Formation/Unity Catalog).
- Create operational documentation and runbooks, and contribute to production support (on-call rotations).
3) Mandatory Skills
- Big Data Ecosystem: Hadoop, HDFS, Hive/Delta Lake, Spark fundamentals
- Cloud: AWS (S3, EMR/Glue, Athena, Redshift) or Azure (ADLS, Databricks/Synapse, ADF)
- PySpark: RDD/DataFrame APIs, UDFs, Spark SQL, optimization
- SQL & Databases: Advanced SQL, indexing, query tuning, relational modeling
- ETL & DWH Concepts: Slowly Changing Dimensions, CDC, partitioning, schema evolution
- Orchestration: Airflow/ADF/Step Functions (at least one)
- Version Control & CI/CD: Git, branching, pipelines
- Data Quality & Testing: Unit tests, validation frameworks, logging & observability
4) Good-to-Have Skills
- NoSQL: MongoDB, Cassandra, DynamoDB, Cosmos DB
- Streaming: Kafka, Kinesis, Event Hub, Spark Structured Streaming
- Procedures & Functions: PL/pgSQL, T-SQL
- Performance Tuning: Cost-aware design, cluster sizing, job profiling
- Security: IAM/Role-based access, encryption at rest/in transit
- Exposure: dbt, Lakehouse patterns, data governance tools