Overview
Description
We are seeking a highly motivated Data Engineer with experience building scalable data lake architectures and high-performance data pipelines. The ideal candidate will have strong expertise in Apache Spark/PySpark, the AWS data ecosystem, Apache Iceberg, and large-scale distributed data processing.
Working as a Data Engineer at Navtech, you will:
- Design, build, and maintain scalable ETL/ELT data pipelines using Apache Spark and PySpark
- Develop and optimize large-scale distributed data processing workflows
- Build and manage modern data lake architectures using a Bronze/Silver/Gold layered approach
- Implement Apache Iceberg features including schema evolution, upserts, and time-travel queries
- Optimize Spark jobs for performance, memory usage, and compute efficiency
- Perform Parquet file optimization and small-file management for efficient storage and query performance
- Design and maintain OpenSearch indexing strategies and cluster architecture
- Write advanced SQL queries and perform query optimization
- Implement orchestration workflows using modern scheduling tools
- Ensure data governance, security, and compliance using AWS IAM roles and best practices
- Automate workflows using Python scripting and CI/CD pipelines
- Collaborate with Data Scientists, Analysts, and Product teams to deliver high-quality data solutions
- Monitor and troubleshoot production pipelines and data systems
Who Are We Looking For
- 1.5 to 4 years of experience in Data Engineering or Big Data Engineering
- Strong hands-on experience with Apache Spark and PySpark
- Experience with AWS services including S3, Glue, EMR, Athena, Lambda, EC2, OpenSearch
- Hands-on experience with Apache Iceberg (schema evolution, upserts, time travel)
- Strong knowledge of data lake architecture and layered design
- Advanced SQL skills and query optimization techniques
- Experience in Spark performance tuning
- Strong understanding of Parquet file optimization and small-file handling
- Experience with OpenSearch indexing and cluster design
- Strong Python programming for data processing and automation
- Experience with ETL/ELT pipeline orchestration tools
- Knowledge of AWS IAM roles, security practices, and data governance
- Experience with CI/CD pipelines and managed AWS services
- Hands-on experience handling large-scale distributed datasets
- Should have excellent logical, analytical and communication skills
- Should have a Master's or Bachelor's (BS) degree in Computer Science, Information Technology, Data Science or a related field, with education completed throughout in the English medium
- Performance review and appraisal twice a year
- Competitive pay package with additional bonus and benefits
- Work with industry-renowned clients based in the US, UK and Europe for exponential technical growth
- Opportunities to work on multiple projects
- Medical insurance cover for self and immediate family
- Work with a culturally diverse team from different geographies.
About us
Navtech is a premier IT software and services provider. Navtech's mission is to increase public cloud adoption and build cloud-first solutions that become trendsetting platforms of the future. We have been recognized as the Best Cloud Service Provider at GoodFirms for ensuring good results with quality services.
Here, we strive to innovate and push technology and service boundaries to provide best-in-class technology solutions to our clients at scale. We deliver to our clients globally from our state-of-the-art design and development centers in the US, Hyderabad, and Pune. We're a fast-growing company with clients in the United States, UK and Europe.
You will join a team of talented developers, quality engineers and product managers whose mission is to impact more than 100 million people across the world with technological services by the year 2030.
(ref:hirist.tech)