Infosys Limited
Overview
Job Description:
- Build and scale data solutions that power smarter decisions
- In this role, you'll work at the intersection of software engineering and data engineering, using Python, PySpark, and ETL to transform raw, complex datasets into reliable, analytics-ready assets
- You'll collaborate closely with data engineers, analysts, and stakeholders to understand requirements, design efficient pipelines, and deliver high-quality outputs on time
- If you enjoy solving performance challenges, improving data quality, and creating maintainable code that runs in production, this is a great opportunity to grow your impact
- Expect a supportive, collaborative environment where ownership is encouraged, learning is continuous, and your contributions directly improve how teams access and trust data
Key Responsibilities:
- Data Pipeline Development
  - Develop and maintain scalable batch ETL pipelines using Python and PySpark for data ingestion, transformation, and loading
  - Implement reusable transformation logic, ensuring pipelines are modular, testable, and easy to maintain
  - Optimize Spark jobs for performance (partitioning, caching, joins, shuffles) and cost efficiency
- Data Quality and Reliability
  - Apply data validation checks, handle schema evolution, and ensure accuracy and completeness of processed datasets
  - Troubleshoot pipeline failures, analyze logs, and implement robust error handling and retry mechanisms
  - Monitor job runs and support operational stability through alerts, runbooks, and timely incident resolution
- Collaboration and Delivery
  - Work with cross-functional teams to gather requirements, define data mappings, and deliver datasets aligned to business needs
  - Participate in code reviews, follow engineering best practices, and contribute to continuous improvement of standards and tooling
  - Document pipeline logic, dependencies, and operational procedures for smooth handovers and long-term maintainability
Technical Requirements:
- Technology->Analytics - Packages->Python - Big Data, Technology->Big Data - Data Processing->PySpark, ETL
Additional Responsibilities:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent practical experience
- 2-5 years of hands-on experience building data pipelines using Python and PySpark
- Strong understanding of ETL concepts, data transformations, and handling large-scale datasets
- Proficiency in writing clean, maintainable code and debugging production issues
- Working knowledge of data structures, algorithms, and software development best practices
Preferred Skills:
Technology->Analytics - Packages->Python - Big Data, Technology->Big Data - Data Processing->PySpark