Chennai, Tamil Nadu, India
Information Technology
Full-Time
Mphasis
Overview
Job Summary:
We are seeking an experienced Data Engineer with a strong background in Scala development, advanced SQL, and big data technologies, particularly Apache Spark. The candidate will be responsible for designing, building, optimizing, and maintaining highly scalable and reliable data pipelines and data infrastructure.
Key Responsibilities:
- Data Pipeline Development: Design, develop, test, and deploy robust, high-performance, and scalable ETL/ELT data pipelines using Scala and Apache Spark to ingest, process, and transform large volumes of structured and unstructured data from diverse sources (see the sketch after this list).
- Big Data Expertise: Leverage expertise in the Hadoop ecosystem (HDFS, Hive, etc.) and distributed computing principles to build efficient and fault-tolerant data solutions.
- Advanced SQL: Write complex, optimized SQL queries and stored procedures.
- Performance Optimization: Continuously monitor, analyze, and optimize the performance of data pipelines and data stores. Troubleshoot complex data-related issues, identify bottlenecks, and implement solutions for improved efficiency and reliability.
- Data Quality & Governance: Implement data quality checks, validation rules, and reconciliation processes to ensure the accuracy, completeness, and consistency of data. Contribute to data governance and security best practices.
- Automation & CI/CD: Implement automation for data pipeline deployment, monitoring, and alerting using tools like Apache Airflow, Jenkins, or similar CI/CD platforms.
- Documentation: Create and maintain comprehensive technical documentation for data architectures, pipelines, and processes.
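As a rough picture of the day-to-day work in the pipeline bullet above, the following is a minimal Scala/Spark ETL sketch. It assumes a JSON landing zone on HDFS; the paths, column names, and the EventPipeline object are illustrative placeholders, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal ETL sketch: ingest raw events, apply basic cleansing,
// and write the result partitioned by date. All paths and columns
// here are hypothetical.
object EventPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-pipeline")
      .getOrCreate()

    // Extract: read semi-structured JSON from a hypothetical landing zone.
    val raw = spark.read.json("hdfs:///landing/events/")

    // Transform: drop malformed records, derive a partition column,
    // and guard against duplicate ingestion.
    val cleaned = raw
      .filter(col("event_id").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))
      .dropDuplicates("event_id")

    // Load: columnar output, partitioned for downstream query pruning.
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///warehouse/events_clean/")

    spark.stop()
  }
}
```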
Required Skills & Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.
- Minimum 5 years of professional experience in Data Engineering, with a strong focus on big data technologies.
- Proficiency in Scala for developing big data applications and transformations, especially with Apache Spark.
- Expert-level proficiency in SQL; ability to write complex queries, optimize performance, and understand database internals.
- Extensive hands-on experience with Apache Spark (Spark SQL, DataFrames, RDDs) for large-scale data processing and analytics (see the Spark SQL sketch after this list).
- Solid understanding of distributed computing concepts and experience with the Hadoop ecosystem (HDFS, Hive).
- Experience with building and optimizing ETL/ELT processes and data warehousing concepts.
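To illustrate the Spark SQL and window-function proficiency listed above, a windowed deduplication is a typical example of the "complex, optimized SQL" the role describes. This is a sketch only: the orders dataset, its columns, and the LatestOrderPerCustomer object are assumptions, not part of the posting.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: the orders table and its columns are hypothetical.
object LatestOrderPerCustomer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("latest-order-per-customer")
      .getOrCreate()

    // Expose a Parquet dataset to Spark SQL as a temporary view.
    spark.read.parquet("hdfs:///warehouse/orders/")
      .createOrReplaceTempView("orders")

    // Window function keeps only the most recent order per customer.
    val latest = spark.sql(
      """
        |SELECT customer_id, order_id, order_ts, amount
        |FROM (
        |  SELECT *,
        |         ROW_NUMBER() OVER (
        |           PARTITION BY customer_id ORDER BY order_ts DESC
        |         ) AS rn
        |  FROM orders
        |) t
        |WHERE rn = 1
      """.stripMargin)

    latest.show(20, truncate = false)
    spark.stop()
  }
}
```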