Overview
Key Responsibilities:
• AWS Stability & Scalability: Design, implement, and maintain highly available,
fault-tolerant architecture on AWS (EC2, ECS/EKS, RDS, S3, CloudFront, etc.)
• Automated Deployments: Build and own CI/CD pipelines (GitHub Actions, Jenkins, or
similar) for microservices and data pipelines
• Infrastructure as Code: Define, version, and manage cloud resources using Terraform or
CloudFormation
• Cost Optimization: Continuously monitor and optimize AWS spend by right-sizing instances,
leveraging Spot/Reserved Instances, and identifying waste
• Monitoring & Alerting: Implement end-to-end observability (CloudWatch,
Prometheus/Grafana, ELK) and set up meaningful SLOs/SLIs/SLAs
• Incident Management: Lead incident response: triage alerts, debug production issues,
perform root-cause analysis, and implement preventive measures
• Security & Compliance: Enforce best practices for IAM, network security (VPC, Security
Groups), and data encryption at rest and in transit
• Collaboration: Work closely with Dev, Data, and Product teams to ensure reliable feature
rollouts and performance tuning
Skills & Qualifications:
• 5+ years of SRE/DevOps experience in a product-led company or high-growth startup
• Strong hands-on AWS expertise (compute, storage, networking, DNS, CI/CD)
• Proven track record of building and maintaining CI/CD pipelines
• Expertise in Infrastructure as Code (Terraform or CloudFormation)
• Solid understanding of containers and container orchestration (Docker, Kubernetes/ECS)
• Experience with monitoring tools (CloudWatch, Prometheus, ELK, Grafana)
• Demonstrable cost-optimization wins on cloud platforms
• Strong scripting skills (Python, Bash, or Go) and familiarity with Git
• Excellent troubleshooting, communication, and documentation skills
Good-to-Have:
• Experience with service mesh (Istio, Linkerd)
• Familiarity with message brokers (Kafka, RabbitMQ)
• Knowledge of SecOps/Supply Chain Security (HashiCorp Vault, AWS KMS)
• Previous exposure to FinTech or regulated