Overview
8+ years of experience in data engineering, with a focus on cloud environments such as AWS.
Proficiency in PySpark for distributed data processing and transformation.
Solid experience with AWS Glue for ETL jobs and managing data workflows.
Hands-on experience with AWS Data Pipeline for workflow orchestration.
Strong experience with AWS services such as S3, Lambda, Redshift, RDS, and EC2.
Technical Skills:
Proficiency in Python and PySpark for data processing and transformation tasks.
Deep understanding of ETL concepts and best practices.
Familiarity with AWS Glue (ETL jobs, Data Catalog, and Crawlers).
Experience building and maintaining data pipelines with AWS Data Pipeline or similar orchestration tools.
Familiarity with AWS S3 for data storage and management, including file formats (CSV, Parquet, Avro).
Strong knowledge of SQL for querying and manipulating relational and semi-structured data.
Experience with Data Warehousing and Big Data technologies, specifically within AWS.
Additional Skills:
Experience with AWS Lambda for serverless data processing and orchestration.
Understanding of AWS Redshift for data warehousing and analytics.
Familiarity with Data Lakes, Amazon EMR, and Kinesis for streaming data processing.
Knowledge of data governance practices, including data lineage and auditing.
Familiarity with CI/CD pipelines and Git for version control.
Experience with Docker and containerization for building and deploying applications.
Responsibilities:
Design and Build Data Pipelines: Design, implement, and optimize data pipelines on AWS using PySpark, AWS Glue, and AWS Data Pipeline to automate data integration, transformation, and storage processes.
ETL Development: Develop and maintain Extract, Transform, and Load (ETL) processes using AWS Glue and PySpark to efficiently process large datasets.
Data Workflow Automation: Build and manage automated data workflows using AWS Data Pipeline, ensuring seamless scheduling, monitoring, and management of data jobs.
Data Integration: Work with different AWS data storage services (e.g., S3, Redshift, RDS) to ensure smooth integration and movement of data across platforms.
Optimization and Scaling: Optimize and scale data pipelines for high performance and cost efficiency, using AWS services such as Lambda, S3, and EC2.
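As a rough illustration of the extract-transform-load pattern behind the responsibilities above, here is a minimal plain-Python sketch. In production this work would run as PySpark code in an AWS Glue job reading from and writing to S3; the column names, sample data, and helper functions below are purely hypothetical.

```python
import csv
import io

def extract(csv_text):
    """Extract: parse raw CSV text into rows (stand-in for reading a file from S3)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: cast types and drop records with a missing amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # data-quality rule: skip incomplete records
        cleaned.append({"id": int(row["id"]), "amount": float(row["amount"])})
    return cleaned

def load(rows):
    """Load: aggregate into a summary (stand-in for writing Parquet or loading Redshift)."""
    return {"count": len(rows), "total": sum(r["amount"] for r in rows)}

# Hypothetical input: three records, one with a missing amount.
raw = "id,amount\n1,10.5\n2,\n3,4.5\n"
summary = load(transform(extract(raw)))
```

The same three-stage shape carries over to PySpark, where each stage becomes a DataFrame read, a chain of transformations, and a write.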
About Virtusa
Teamwork, quality of life, professional and personal development: values that Virtusa is proud to embody. When you join us, you join a global team of 27,000 people that cares about your growth — one that seeks to provide you with exciting projects and opportunities, and the chance to work with state-of-the-art technologies throughout your career with us.
Great minds, great potential: it all comes together at Virtusa. We value collaboration and the team environment of our company, and seek to provide great minds with a dynamic place to nurture new ideas and foster excellence.
Virtusa was founded on principles of equal opportunity for all, and so does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.