Bangalore, Karnataka, India
Information Technology
Full-Time
Virtusa
Overview
Primary Skills
Experience in data engineering, with a proven focus on data ingestion and extraction using Python/PySpark.
Extensive AWS experience is mandatory, with proficiency in Glue, Lambda, SQS, SNS, AWS IAM, AWS Step Functions, S3, and RDS (Oracle, Aurora Postgres).
4+ years of experience working with both relational and non-relational/NoSQL databases is required.
Strong SQL experience is necessary, demonstrating the ability to write complex queries from scratch. Experience in Redshift is also required, along with other SQL database experience.
Strong scripting experience with the ability to build intricate data pipelines using AWS serverless architecture, and an understanding of building an end-to-end data pipeline.
Secondary Skills
Strong understanding of Kinesis, Kafka, and CDK. Experience with Kafka and ECS is also required.
A strong understanding of data concepts related to data warehousing, business intelligence (BI), data security, data quality, and data profiling is required.
Experience in Node.js and CDK.
JD
Responsibilities
Lead the architectural design and development of a scalable, reliable, and flexible metadata-driven data ingestion and extraction framework on AWS using Python/PySpark.
Design and implement a customizable data processing framework using Python/PySpark. This framework should be capable of handling diverse scenarios and evolving data processing requirements.
Implement data pipelines for data ingestion, transformation, and extraction using AWS cloud services.
Seamlessly integrate a variety of services, including S3, Glue, Kafka, Lambda, SQS, SNS, Athena, EC2, RDS (Oracle, Postgres, MySQL), and AWS Glue crawlers, to construct a highly scalable and reliable data ingestion and extraction pipeline.
Facilitate configuration and extensibility of the framework to adapt to evolving data needs and processing scenarios.
Develop and maintain rigorous data quality checks and validation processes to safeguard the integrity of ingested data.
Implement robust error handling, logging, monitoring, and alerting mechanisms to ensure the reliability of the entire data pipeline.
Qualifications
Must Have
Over 6 years of hands-on experience in data engineering, with a proven focus on data ingestion and extraction using Python/PySpark.
Extensive AWS experience is mandatory, with proficiency in Glue, Lambda, SQS, SNS, AWS IAM, AWS Step Functions, S3, and RDS (Oracle, Aurora Postgres).
4+ years of experience working with both relational and non-relational/NoSQL databases is required.
Strong SQL experience is necessary, demonstrating the ability to write complex queries from scratch.
Strong working experience in Redshift is required along with other SQL DB experience.
Strong scripting experience with the ability to build intricate data pipelines using AWS serverless architecture.
Complete understanding of building an end-to-end data pipeline.
Nice to have
Strong understanding of Kinesis, Kafka, and CDK.
A strong understanding of data concepts related to data warehousing, business intelligence (BI), data security, data quality, and data profiling.
Experience in Node.js and CDK.