Mumbai, Maharashtra, India
Information Technology
Full-Time
Sigmoid
Overview
We are looking for a skilled Data Engineer with 6+ years of experience in big data technologies, particularly Python, PySpark, SQL, and data lakehouse architectures. The ideal candidate will have a strong background in building scalable data pipelines and experience with modern data storage formats, including Apache Iceberg. You will work closely with cross-functional teams to design and implement efficient data solutions in a cloud-based environment.
Data Pipeline Development
The core responsibilities for the job include the following:
- Design, build, and optimize scalable data pipelines using Apache Spark (see the sketch after this list).
- Implement and manage large-scale data processing solutions across data lakehouses.
- Work with modern data lakehouse platforms (e.g., Apache Iceberg) to handle large datasets.
- Optimize data storage, partitioning, and versioning to ensure efficient access and querying.
- Write complex SQL queries to extract, manipulate, and transform data.
- Develop performance-optimized queries for analytical and reporting purposes.
- Integrate various structured and unstructured data sources into the lakehouse environment.
- Work with stakeholders to define data needs and ensure data is available for downstream consumption.
- Implement data quality checks and ensure the reliability and accuracy of data.
- Contribute to metadata management and data cataloging efforts.
- Monitor and optimize the performance of Spark jobs, SQL queries, and overall data infrastructure.
- Work with cloud infrastructure teams to optimize costs and scale as needed.
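To give candidates a concrete feel for the pipeline and lakehouse work described above, here is a minimal, illustrative PySpark sketch: it reads raw data, applies basic transformations, and writes a partitioned Apache Iceberg table. The catalog name, S3 paths, table name, and column names are hypothetical placeholders, not details of the actual role.

```python
# Minimal, illustrative PySpark -> Iceberg pipeline sketch.
# Assumes the iceberg-spark-runtime JAR is on the classpath and that a
# catalog named "lake" is configured; all names/paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-load")  # hypothetical job name
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse")
    .getOrCreate()
)

# Extract: read raw JSON events (path is a placeholder).
orders = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: deduplicate, derive a partition column, drop invalid rows.
cleaned = (
    orders
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("created_at"))
    .filter(F.col("amount") > 0)
)

# Load: write an Iceberg table partitioned by order_date for efficient
# pruning; Iceberg tracks snapshots, enabling versioned reads and rollback.
(
    cleaned.writeTo("lake.sales.orders")
    .partitionedBy(F.col("order_date"))
    .createOrReplace()
)
```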
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
- 8+ years of experience in data engineering, with a focus on Java/Python, Spark, and SQL.
- Hands-on experience with Apache Iceberg, Snowflake, or similar technologies.
- Strong understanding of data lakehouse architectures and data warehousing principles.
- Proficiency in AWS data services.
- Experience with version control systems like Git and CI/CD pipelines.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Experience with containerization (Docker, Kubernetes) and orchestration tools like Airflow.
- Certifications in AWS cloud technologies.