Overview
Job Title: Data Pipeline & Cloud Infrastructure Engineer
Location: Bengaluru
Experience: 4 to 6 Years
Education Qualification: Bachelor of Engineering (B.E.)
Mandatory Skills
- Data Pipeline
- AWS SageMaker
- AWS Glue
- AWS (EC2, Lambda, S3, Athena, CloudFormation, CodePipeline, CodeDeploy)
- Python
- PySpark
- DevOps
- Ansible
- Web Application Hosting
- Terraform
- Boto3
Skills to Evaluate
- AWS
- Data Pipelines
- Cloud Infrastructure
- AWS Glue
- Python
- PySpark
- Athena
- ETL
- DevOps
- CodePipeline
Job Description
We are looking for an experienced Data Pipeline & Cloud Infrastructure Engineer to design, build, and maintain scalable data pipelines and cloud infrastructure. The role requires expertise in AWS services, Python-based automation, and DevOps practices to support machine learning deployments and analytics platforms.
Roles & Responsibilities
1. Data Pipeline Development & Maintenance
- Design and maintain robust ETL pipelines that load and enrich data for consumption by machine learning models.
- Build on AWS Glue, Athena, and S3, working primarily in Python (90%) with PySpark (10%); see the sketch after this list.
- Implement logging, error handling, and notifications.
- Collaborate with Data Scientists to troubleshoot and resolve data pipeline issues.
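To make the Glue and PySpark side of this work concrete, here is a minimal sketch of a Glue job that reads a catalog table, adds a derived column, and writes Parquet back to S3. The database, table, and bucket names are placeholders invented for illustration, not references to any real pipeline, and the script runs only inside the AWS Glue job runtime.

```python
# Minimal AWS Glue job sketch: read a catalog table, add a derived column,
# and write the result to S3 as Parquet. All names are placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered in the Glue Data Catalog.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_db",   # placeholder database
    table_name="raw_events",   # placeholder table
)

# Enrich: add an ingestion timestamp on the underlying Spark DataFrame.
df = dyf.toDF().withColumn("ingested_at", F.current_timestamp())

# Write enriched data back to S3 for downstream ML consumption.
df.write.mode("overwrite").parquet("s3://example-bucket/enriched/events/")  # placeholder bucket

job.commit()
```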
2. Workload & Infrastructure Management
- Manage and secure EC2 workloads including Linux setup, SSH access, patching, and deployment of ML models.
- Provision and maintain infrastructure using AWS CloudFormation and Terraform.
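Because the posting lists Boto3 alongside CloudFormation, a hedged sketch of driving a stack deployment from Python follows; the stack name and template file are hypothetical.

```python
# Sketch: create a CloudFormation stack with Boto3 and wait for completion.
# Stack name and template file are placeholders for illustration only.
import boto3

cfn = boto3.client("cloudformation")

with open("network-stack.yaml") as f:  # placeholder template
    template_body = f.read()

cfn.create_stack(
    StackName="analytics-network",     # placeholder stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until provisioning finishes (raises WaiterError on failure/timeout).
cfn.get_waiter("stack_create_complete").wait(StackName="analytics-network")
```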
3. Web Application Hosting
- Host analytics applications on EC2 using Python frameworks such as Streamlit (see the sketch below).
- Manage ELB, SSL/TLS certificates, and Linux-based web hosting environments.
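For a flavor of the hosting work, here is a minimal Streamlit app of the kind that might run on EC2 behind an ELB; the data is fabricated purely for illustration.

```python
# Minimal Streamlit analytics app (placeholder data).
# Run with: streamlit run app.py --server.port 8501
import pandas as pd
import streamlit as st

st.title("Pipeline Run Summary")

# Placeholder data standing in for Athena/Glue query results.
runs = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=7),
    "rows_loaded": [120, 95, 143, 110, 130, 90, 150],
})

st.line_chart(runs.set_index("date"))
st.dataframe(runs)
```

In a setup like the one described here, SSL/TLS would typically terminate at the load balancer, with the app listening on a plain HTTP port behind it.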
4. Monitoring & Reporting
- Generate and maintain regular reports (billing, cost consolidation, data usage statistics, activity logs); a cost-report sketch follows this list.
- Monitor infrastructure health and performance metrics proactively.
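As one illustration of the reporting work, here is a sketch that pulls a month of unblended cost from the Cost Explorer API via Boto3; the date range is arbitrary.

```python
# Sketch: pull one month's unblended cost for a consolidation report.
import boto3

ce = boto3.client("ce")  # Cost Explorer

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},  # illustrative dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
)
for period in resp["ResultsByTime"]:
    amount = period["Total"]["UnblendedCost"]["Amount"]
    print(f"{period['TimePeriod']['Start']}: ${float(amount):.2f}")
```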
5. Ad-hoc Research Support
- Provision EC2 instances for temporary research projects and manage their lifecycle (access, teardown, cost control).
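A hedged sketch of that lifecycle with Boto3, using a placeholder AMI and tags that support cost attribution:

```python
# Sketch: provision a short-lived research EC2 instance, then tear it down.
# AMI ID and tag values are placeholders.
import boto3

ec2 = boto3.resource("ec2")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [
            {"Key": "Purpose", "Value": "adhoc-research"},
            {"Key": "Owner", "Value": "data-team"},  # supports cost attribution
        ],
    }],
)
instance = instances[0]
instance.wait_until_running()
print(f"Provisioned {instance.id}")

# Teardown at the end of the project lifecycle.
instance.terminate()
instance.wait_until_terminated()
```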
6. SageMaker Support
- Manage user roles, permissions, and domain configurations in AWS SageMaker.
- Provide infrastructure support for model deployment (no direct ML work).
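For a sense of the access-management side, a minimal Boto3 sketch that adds a user profile to an existing SageMaker domain; the domain ID, profile name, and role ARN are placeholders.

```python
# Sketch: add a user profile to an existing SageMaker domain.
# Domain ID, profile name, and execution role ARN are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_user_profile(
    DomainId="d-xxxxxxxxxxxx",           # placeholder domain ID
    UserProfileName="data-scientist-1",  # placeholder user
    UserSettings={
        "ExecutionRole": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    },
)
```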
7. Network & Access Management
- Manage VPC configurations, provision VPN access, and handle IP whitelisting (see the sketch below).
- Support network security compliance requirements.
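A minimal sketch of the whitelisting step with Boto3; the security group ID and CIDR are placeholders (203.0.113.0/24 is a documentation-only range).

```python
# Sketch: whitelist an IP range for HTTPS on a security group.
# Group ID and CIDR are placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "office VPN egress"}],
    }],
)
```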
8. CI/CD & Deployment
- Maintain and deploy infrastructure using CloudFormation templates.
- Manage CI/CD pipelines with AWS CodePipeline and CodeDeploy (see the sketch below).
- Version-control ML and infrastructure codebases with Git.
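To illustrate the pipeline-management side, a short Boto3 sketch that reports the latest status of each stage in a CodePipeline; the pipeline name is a placeholder.

```python
# Sketch: report the latest stage statuses of a CodePipeline.
# Pipeline name is a placeholder.
import boto3

cp = boto3.client("codepipeline")

state = cp.get_pipeline_state(name="ml-infra-pipeline")  # placeholder pipeline
for stage in state["stageStates"]:
    status = stage.get("latestExecution", {}).get("status", "UNKNOWN")
    print(f"{stage['stageName']}: {status}")
```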
Qualifications
- Strong hands-on experience with AWS services: Glue, Athena, S3, EC2, Lambda, CloudFormation, VPC
- Proficiency in Python and PySpark for automation and data workflows
- Experience hosting web applications on AWS using Python frameworks
- Good understanding of DevOps tools and processes including CI/CD, Git, and Ansible
- Exposure to AWS SageMaker for infrastructure and access management
- Strong Linux system administration and troubleshooting skills
- Familiarity with security, monitoring, and reporting in cloud environments
- Good understanding of networking fundamentals: VPC, VPN, IP management
Additional Information
- Flexible working hours aligned to project milestones
- Collaborative work environment with cross-functional data science and engineering teams
- Opportunity to build secure, scalable, and production-grade infrastructure for analytics and ML workloads