Overview
Job Description
We are seeking a highly skilled and experienced Platform Engineer to manage and enhance our entire application delivery platform, from CloudFront to the underlying EKS clusters and their associated components.
The ideal candidate will possess deep expertise across cloud infrastructure, networking, Kubernetes, and service mesh technologies, coupled with strong programming skills. This role involves maintaining the stability, scalability, and performance of our production environment, including day-to-day operations, upgrades, troubleshooting, and developing in-house tools.
Main Responsibilities
- Perform regular upgrades and patching of EKS clusters and associated components, and oversee the health, performance, and scalability of the clusters.
- Manage and optimize related components such as Karpenter (cluster autoscaling) and ArgoCD (GitOps continuous delivery).
- Implement and manage service mesh solutions (e.g., Istio, Linkerd) for enhanced traffic management, security, and observability.
- Participate in an on-call rotation to provide 24/7 support for critical platform issues, monitor the platform for potential issues, and implement preventative measures.
- Develop, maintain, and automate in-house tools and scripts using programming languages like Python or Go to improve platform operations and efficiency.
- Configure and manage CloudFront distributions and WAF policies for efficient and secure content delivery and routing.
- Develop and maintain documentation for platform architecture, processes, and troubleshooting guides.
Tech Stack
- AWS:
- VPC, EC2, ECS, EKS, Lambda, CloudFront, WAF, MWAA, RDS, ElastiCache, DynamoDB, OpenSearch, S3, CloudWatch, Cognito, SQS, KMS, Secrets Manager, MSK
- Terraform, GitHub Actions, Prometheus, Grafana, Atlantis, ArgoCD, OpenTelemetry
Required Skills and Experiences
- 6+ years of proven experience as a Platform Engineer, Site Reliability Engineer (SRE), or in a similar role, with a focus on end-to-end platform ownership.
- In-depth knowledge of, and at least 4 years of hands-on experience with, Amazon EKS and Kubernetes.
- Strong understanding of and practical experience with Karpenter, ArgoCD, and Terraform.
- Solid grasp of core networking concepts and at least 5 years of extensive experience with AWS networking services (VPC, Security Groups, Network ACLs, CloudFront, WAF, ALB, DNS).
- Demonstrable experience with SSL/TLS certificate management.
- Proficiency in programming languages such as Python or Go for developing and maintaining automation scripts and internal tools.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Excellent problem-solving and debugging skills across complex distributed systems.
- Strong communication and collaboration abilities.
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
Preferred Qualifications
- Prior experience working with service mesh technologies (preferably Istio) in a production environment.
- Experience building or contributing to Kubernetes Controllers.
- Experience with multi-cluster Kubernetes architectures.
- Experience building AZ-isolated, DR architectures.
Remarks
*Please note that you cannot apply for PayPay (Japan-based jobs) or other positions in parallel or in duplicate.
PayPay 5 senses
- Please refer to the PayPay 5 senses to learn what we value at work.
Working Conditions
Employment Status
- Full Time
Office Location
- Gurugram (WeWork)
※The development center requires you to work in the Gurugram office to establish a strong core team.