3 days ago
Senior Data Engineer (AWS Data Lakes & Forecasting Pipelines) - Remote, India
Chennai, Tamil Nadu, India
Information Technology
Full-Time
LeewayHertz
Job Description
This is a remote position.
Job Summary
We are looking for an experienced Senior Data Engineer to lead the development of scalable, AWS-native data lake pipelines with a strong focus on time series forecasting and upsert-ready architectures. This role carries end-to-end ownership of the data lifecycle, from ingestion through partitioning and versioning to BI delivery. The ideal candidate is highly proficient in AWS data services, PySpark, and versioned storage formats such as Apache Hudi and Iceberg, and understands the nuances of data quality and observability in large-scale analytics systems.
Responsibilities
- Design and implement data lake zoning (Raw → Clean → Modeled) using Amazon S3, AWS Glue, and Athena.
- Ingest structured and unstructured datasets including POS, USDA, Circana, and internal sales data.
- Build versioned and upsert-friendly ETL pipelines using Apache Hudi or Iceberg.
- Create forecast-ready datasets with lagged, rolling, and trend features for revenue and occupancy modeling (see the PySpark sketch after this list).
- Optimize Athena datasets with partitioning, CTAS queries, and metadata tagging.
- Implement S3 lifecycle policies, intelligent file partitioning, and audit logging.
- Build reusable transformation logic using dbt-core or PySpark to support KPIs and time series outputs.
- Integrate robust data quality checks using custom logs, AWS CloudWatch, or other DQ tooling.
- Design and manage a forecast feature registry with metrics versioning and traceability.
- Collaborate with BI and business teams to finalize schema design and deliverables for dashboard consumption.
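As a rough, non-authoritative sketch of the kind of work these bullets describe, the PySpark snippet below derives lagged and rolling revenue features and upserts them into an Apache Hudi table. All paths, table names, and columns (sales, store_id, ds, revenue) are hypothetical placeholders, not details from this posting.

```python
# Illustrative sketch only: builds lag/rolling/trend features and upserts
# them to a Hudi table. Names and S3 paths are hypothetical.
from pyspark.sql import SparkSession, Window, functions as F

spark = (SparkSession.builder
         .appName("forecast-features")
         # Hudi writes assume the hudi-spark bundle is on the classpath
         .config("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

sales = spark.read.parquet("s3://example-lake/clean/sales/")  # hypothetical path

w = Window.partitionBy("store_id").orderBy("ds")
features = (sales
    .withColumn("revenue_lag_7", F.lag("revenue", 7).over(w))                 # lag feature
    .withColumn("revenue_roll_28", F.avg("revenue").over(w.rowsBetween(-27, 0)))  # rolling mean
    .withColumn("revenue_trend", F.col("revenue") - F.col("revenue_lag_7")))  # simple trend

# Record-level upsert keyed on (store_id, ds); composite keys need the
# ComplexKeyGenerator.
(features.write.format("hudi")
    .option("hoodie.table.name", "sales_features")
    .option("hoodie.datasource.write.recordkey.field", "store_id,ds")
    .option("hoodie.datasource.write.keygenerator.class",
            "org.apache.hudi.keygen.ComplexKeyGenerator")
    .option("hoodie.datasource.write.precombine.field", "ds")
    .option("hoodie.datasource.write.partitionpath.field", "ds")
    .option("hoodie.datasource.write.operation", "upsert")
    .mode("append")
    .save("s3://example-lake/modeled/sales_features/"))  # hypothetical path
```

In practice the record key, precombine field, and partition path would follow the team's actual Hudi table design rather than these placeholder choices.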
Essential Skills:
- Deep hands-on experience with AWS Glue, Athena, S3, Step Functions, and Glue Data Catalog.
- Strong command of PySpark, dbt-core, CTAS query optimization, and partition strategies (see the CTAS sketch after this list).
- Working knowledge of Apache Hudi, Iceberg, or Delta Lake for versioned ingestion.
- Experience in S3 metadata tagging and scalable data lake design patterns.
- Expertise in feature engineering and forecasting dataset preparation (lags, trends, windows).
- Proficiency in Git-based workflows (Bitbucket), CI/CD, and deployment automation.
- Strong understanding of time series KPIs, such as revenue forecasts, occupancy trends, or demand volatility.
- Knowledge of data observability best practices, including field-level logging, anomaly alerts, and classification tagging.
- Independent, critical thinker with the ability to design for scale and evolving business logic.
- Strong communication and collaboration with BI, QA, and business stakeholders.
- High attention to detail in ensuring data accuracy, quality, and documentation.
- Comfortable interpreting business-level KPIs and transforming them into technical pipelines.
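As a minimal illustration of the CTAS and partitioning skills listed above, the boto3 sketch below submits an Athena CTAS query that rewrites a raw table as partitioned, Snappy-compressed Parquet. Database, table, and bucket names are hypothetical placeholders.

```python
# Illustrative sketch only: Athena CTAS via boto3 to produce a partitioned,
# compressed Parquet dataset. All names are hypothetical.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

ctas = """
CREATE TABLE analytics.sales_partitioned
WITH (
    format = 'PARQUET',
    write_compression = 'SNAPPY',
    external_location = 's3://example-lake/modeled/sales_partitioned/',
    partitioned_by = ARRAY['sale_month']
) AS
SELECT store_id, ds, revenue,
       date_format(date_parse(ds, '%Y-%m-%d'), '%Y-%m') AS sale_month
FROM analytics.sales_raw
"""

resp = athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution for completion
```

Partition columns must appear last in the SELECT list, and a low-cardinality key (here, month) keeps Athena's partition pruning effective without exploding S3 object counts.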
Additional Skills & Qualifications:
- Experience with statistical forecasting frameworks such as Prophet, GluonTS, or related libraries.
- Familiarity with Superset or Streamlit for QA visualization and UAT reporting.
- Understanding of macroeconomic datasets (USDA, Circana) and third-party data ingestion.
- Proactive, ownership-driven mindset with a collaborative approach.
- Strong analytical and problem-solving skills with attention to detail.
- Ability to work under stringent deadlines in fast-paced, delivery-focused client environments.
- Strong mentoring and documentation skills for scaling the platform.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Minimum of 9 years of experience in data engineering and architecture.
- This role offers the flexibility of working remotely in India.