POSITION SUMMARY
Platform at EarnIn strives to operate as a platform-as-product organization. We build the foundation that empowers product teams to ship quickly and safely: golden-path CI/CD, Kubernetes runtimes, observability, self-service workflows, and paved paths exposed through our developer control plane. We hold a high bar for operational excellence and measure our impact through developer velocity, reliability, and cost efficiency.
We’re seeking a Senior Platform Engineer to design, build, and evolve the internal platform that powers EarnIn’s microservices and data systems. You’ll lead work across Kubernetes (AWS EKS), GitOps (Argo CD), CI/CD (GitHub Actions), service mesh (Linkerd), observability (Datadog), and our developer portal (Cortex), with a focus on safe change delivery, self-service, and reliability at scale.
You’ll collaborate with DevX, SRE, and Product Engineering peers to simplify the SDLC, raise our reliability posture, and turn fragmented workflows into intuitive golden paths. This work aligns with our Platform Engineer growth matrix (autonomy, reliability, team contribution, and responsible AI fluency).
This is a hybrid position based at our Bengaluru office, which is an expanding site. EarnIn provides excellent employee benefits, including healthcare, internet/cell phone reimbursement, a learning and development stipend, and opportunities to collaborate with and travel to our Palo Alto HQ and Bangkok site. Our salary ranges are determined by role, level, and location.
WHAT YOU'LL DO
- Design and evolve GitOps-based continuous delivery with Argo CD and Argo Rollouts (progressive delivery, automated rollbacks), integrating with our CI pipelines and standardized Helm/Kustomize workflows.
- Advance our Kubernetes platform on AWS EKS with strong multi-environment hygiene, security (Pod Identity/RBAC), and rollout strategies; partner on cluster-level upgrades and “cluster vending” patterns for safer blue/green upgrades over time.
- Leverage our developer control plane (Cortex) to expose paved paths, scorecards, and self-service actions (bootstrap, deploy, SLOs, operations) so teams can move from idea to production smoothly.
- Strengthen observability and operational excellence: SLOs/error budgets, Datadog metrics/traces/logs, RUM (where applicable), and blameless postmortems that lead to preventive actions and automation.
- Partner with SRE to embed reliability gates into pipelines (pre-merge and pre-deploy validation, canary policies), and improve MTTD/MTTR through better telemetry and predictable rollback strategies.
- Contribute to and maintain service scaffolds, templates, and shared frameworks that encode standards (testing, security, telemetry), and keep supported language/framework versions aligned to platform baselines.
- Apply AI responsibly across diagnostics, validation, and CI/CD/ops workflows (e.g., anomaly detection, test generation, performance triage), measuring outcomes and iterating for impact.
WHAT WE'RE LOOKING FOR
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 4+ years in platform, infrastructure, or backend engineering with deep hands-on experience in Kubernetes (preferably EKS) and cloud-native architectures on AWS.
- Expertise in GitOps and CD (Argo CD/Argo Rollouts) and CI (GitHub Actions; reusable workflows, shared actions) for multi-service systems at scale.
- Strong coding skills in Go and/or Python (Java/Kotlin is a plus), treating infrastructure as software; fluency with Helm/Kustomize and Terraform (IaC) for platform automation.
- Solid observability skills (Datadog APM/metrics/tracing/logs) with a track record of improving reliability and driving SLO/error-budget culture.
- Experience with service mesh (e.g., Linkerd) and traffic management patterns for progressive delivery and resilience.
- Demonstrated AI fluency applied to the SDLC (validation, diagnostics, automation) with a bias for measurement and iteration.
- Excellent communication and collaboration skills; mentoring mindset and ability to influence teams toward consistent standards and safer delivery at speed.
- Experience evolving multi-cluster strategies (blue/green upgrades, cluster vending), automated validation (e.g., Testkube), and DR/chaos practices is a plus.
- Hands-on contributions to developer productivity insights (lead time, change fail rate) and FinOps observability for cost-aware engineering decisions are a plus.