Overview
We are seeking an experienced DevOps Engineer with deep expertise in Amazon Web Services (AWS) to design, implement, and maintain our cloud infrastructure for AI/ML workloads. You will work closely with our Data Science and Engineering teams to deploy and scale machine learning models in Amazon SageMaker, manage compute environments with EC2, and build secure, highly available storage solutions with S3.
You will own the deployment pipelines, automate provisioning and scaling, and ensure reliable CI/CD workflows for model iteration. This role demands hands-on knowledge of infrastructure as code, monitoring and alerting, and best practices in cost optimization and security for AWS environments.
Responsibilities:
- Build, manage, and optimize SageMaker endpoints for real-time inference and batch processing.
- Configure auto-scaling policies for endpoints to handle varying traffic loads efficiently; see the auto-scaling sketch after this list.
- Implement multi-AZ deployments and blue/green or canary release strategies for model updates with zero downtime; a canary-rollout sketch follows this list.
- Integrate SageMaker pipelines with CI/CD systems (e.g., CodePipeline, GitHub Actions, or Jenkins).
- Design and maintain EC2 compute environments (instance fleets, Spot/On-Demand mix, AMI management, Auto Scaling Groups).
- Manage S3 buckets for data storage, versioning, and lifecycle policies (data pipelines, model artifacts, logs); a lifecycle-policy sketch follows this list.
- Set up and monitor VPCs, subnets, security groups, and IAM policies for secure access control.
- Use CloudFormation or Terraform to define reproducible infrastructure as code.
- Develop and maintain CI/CD pipelines for ML model builds, tests, and deployments.
- Integrate automated testing and validation steps in the deployment process.
- Create scripts and tooling (Python, Bash, etc.) to automate repetitive operational tasks.
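
To give a sense of the endpoint auto-scaling work listed above, here is a minimal sketch using boto3 and Application Auto Scaling. The endpoint name, variant name, capacity bounds, and target value are illustrative placeholders, not values specified by this posting.

```python
import boto3

# Hypothetical endpoint and variant names; replace with real values.
ENDPOINT_NAME = "my-model-endpoint"
VARIANT_NAME = "AllTraffic"
resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"

autoscaling = boto3.client("application-autoscaling")

# Register the endpoint variant as a scalable target (1-4 instances).
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target-tracking policy: keep invocations per instance around 70.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```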
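For the blue/green and canary release item, the sketch below shows one way to shift traffic gradually when updating a SageMaker endpoint to a new endpoint configuration. The endpoint, endpoint-config, and alarm names are hypothetical, and the canary size, wait interval, and rollback alarm are assumptions for illustration.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Update the endpoint to a new (hypothetical) endpoint config using a
# canary rollout: shift 10% of capacity first, then the rest after the wait.
sagemaker.update_endpoint(
    EndpointName="my-model-endpoint",
    EndpointConfigName="my-model-config-v2",
    DeploymentConfig={
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": 10},
                "WaitIntervalInSeconds": 300,
            },
            "TerminationWaitInSeconds": 120,
        },
        "AutoRollbackConfiguration": {
            # Roll back automatically if this (hypothetical) CloudWatch alarm fires.
            "Alarms": [{"AlarmName": "endpoint-5xx-alarm"}]
        },
    },
)
```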
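For the S3 versioning and lifecycle item, this sketch enables bucket versioning and applies simple lifecycle rules. The bucket name, prefixes, storage class, and retention periods are assumptions for illustration only.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-ml-artifacts-bucket"  # hypothetical bucket name

# Keep versioning on so model artifacts can be rolled back if needed.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rules: tier off older artifacts, expire logs and stale versions.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-model-artifacts",
                "Filter": {"Prefix": "model-artifacts/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            },
            {
                "ID": "expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Expiration": {"Days": 365},
            },
        ]
    },
)
```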
Job Types: Full-time, Permanent
Pay: ₹25,000.00 - ₹70,000.00 per month
Benefits:
- Work from home
Schedule:
- Day shift
Supplemental Pay:
- Performance bonus
Work Location: Remote