Overview
Duties
We are seeking a highly skilled and motivated Data Engineer to join our team. The ideal candidate will have a strong background in building and managing data pipelines, with expertise in AWS Glue (Spark-based jobs), Snowflake, and Kafka. This role involves designing, implementing, and optimizing data workflows, as well as integrating real-time and batch data pipelines to support business-critical analytics and notifications. The candidate will also work closely with cross-functional teams to ensure seamless data operations and scalability.
Key Responsibilities
Data Pipeline Development and Management
- Design, develop, and maintain AWS Glue Spark jobs to process and transform large-scale datasets.
- Build and optimize ETL/ELT pipelines for data ingestion, transformation, and loading into Snowflake.
- Implement robust error handling, logging, and monitoring mechanisms for data workflows.
Real-Time Data Integration
- Develop and maintain AWS Lambda functions to process Kafka events and push them into Snowflake.
- Ensure low-latency, high-throughput data ingestion for real-time analytics and event-driven architectures.
Data Analysis and Optimization
- Collaborate with data analysts and business stakeholders to understand data requirements and deliver actionable insights.
- Optimize Snowflake schemas and queries to ensure efficient, performant data analysis.
Automation and Notifications
- Implement automated workflows to send email notifications based on data processing outcomes or business triggers.
- Ensure timely and accurate communication of critical events to relevant stakeholders.
Collaboration and Best Practices
- Work closely with cross-functional teams, including data scientists, analysts, and DevOps engineers, to ensure seamless data operations.
- Establish and enforce best practices for data engineering, including version control, CI/CD pipelines, and documentation.
Skills
Technical Skills
- AWS Glue: Proficiency in developing and managing Spark-based Glue jobs.
- Snowflake: Strong experience in Snowflake data warehousing, including schema design, query optimization, and performance tuning.
- Kafka: Hands-on experience with Kafka for real-time data streaming and integration.
- AWS Lambda: Experience in building serverless functions for event-driven workflows.
- Programming: Proficiency in Python.
- SQL: Advanced SQL skills for data transformation and analysis.
- Cloud Infrastructure: Familiarity with AWS services such as S3, CloudWatch, SES, and IAM.
- CI/CD: Familiarity with Terraform and GitHub Actions.
Soft Skills
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration abilities.
- Ability to work in a fast-paced, dynamic environment with minimal supervision.
Education
Bachelor's degree in a quantitative field such as Computer Science, Engineering, Statistics, Mathematics, or a related discipline is required. An advanced degree is a strong plus.