2 Days ago

AWS DevOps / Platform Engineer (Product Experience)

2000000 - 2500000 INR - Yearly

Mumbai, Maharashtra, IN

Information Technology

Full-Time

Ekshvaku Tech Innovations

Overview

DevOps Engineer

About Us

We build a collaborative, real-time workspace platform enabling teams to organize content, manage projects, and communicate at scale. Our platform is a cloud-native SaaS product running on AWS, serving users across multiple regions through a microservices architecture.

Our engineering team moves fast. We ship continuously to a Kubernetes-based infrastructure with a fully automated CI/CD pipeline and take infrastructure quality as seriously as product quality. We value engineers who treat infrastructure as code, own reliability end-to-end, and proactively improve the systems they work on.

About the Role

Experience Level: Mid to Senior | Minimum: 5+ years in DevOps / Platform Engineering

We are looking for an experienced DevOps Engineer to own and evolve the infrastructure that powers our platform. You will work closely with our backend and frontend engineers to keep our systems reliable, secure, observable, and cost-efficient.

You will manage a production-grade AWS environment spanning 16+ microservices (Go, Node.js) on Kubernetes (EKS), with infrastructure provisioned entirely through Terraform and deployments managed via Helm and GitHub Actions. The role covers everything from infrastructure design and CI/CD pipelines to monitoring, incident response, and security hardening.

This is a hands-on engineering role — you will write real Terraform, maintain Helm charts, build and improve CI/CD pipelines, and debug production issues from CloudWatch logs to Kubernetes pod events.

What You'll Do

Infrastructure as Code (Terraform)

Own, maintain, and evolve a large library of Terraform modules that provision the entire AWS environment across development and production accounts

Manage EKS cluster configurations including managed node groups and spot/fleet instance node groups (cost-optimized, achieving up to 70% savings vs on-demand)

Provision and maintain supporting infrastructure: VPC, subnets, security groups, ALB, ACM certificates, Route53 DNS, SQS queues, SES email, and EFS volumes

Add new modules for evolving infrastructure requirements and ensure all resources are reproducible and version-controlled

Apply Terraform changes safely across environments using Terraform workspaces and remote state backends

Kubernetes & Container Orchestration

Operate and maintain the AWS EKS cluster with both spot/fleet and on-demand worker node groups

Deploy and manage 16+ microservices on Kubernetes using Helm charts (4 custom charts: generic deployments, one-time jobs, cron jobs, and ingress)

Configure and tune Horizontal Pod Autoscalers (HPA), Pod Disruption Budgets (PDB), and Persistent Volume Claims (PVC) per service

Manage Kubernetes ingress, service accounts, RBAC, and ConfigMaps/Secrets

Maintain the Helm chart repository (versioning, publishing, GitHub Actions pipeline)

Debug pod failures, resource constraints, and node scheduling issues

CI/CD Pipeline Management

Own multiple GitHub Actions workflows covering PR validation, auto-deployment to dev, and production releases

Enforce a two-part release flow: (1) PR checks (build, unit tests, commit linting, manual approvals), then (2) auto-deploy to the dev environment on merge to the development branch, and semver-tagged (vx.y.z) releases for production

Maintain build pipelines for Go microservices (multi-stage Docker builds), Node.js services, and Helm charts

Manage AWS ECR image repositories — pushing, tagging, lifecycle policies

Configure Slack notifications for deployment failures and pipeline events

Build and improve deployment automation, reducing manual intervention in release processes
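
As a rough illustration of the release flow described above, the routing rule can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the team's actual pipeline: the function name deploy_target is hypothetical, the ref strings follow Git's refs/heads/... and refs/tags/... naming, and the tag pattern assumes plain vMAJOR.MINOR.PATCH tags with no pre-release suffix.

```python
import re
from typing import Optional

# Assumed tag shape: "v" followed by MAJOR.MINOR.PATCH, e.g. v1.4.2.
# Pre-release or build suffixes are deliberately not accepted in this sketch.
SEMVER_TAG = re.compile(r"refs/tags/v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def deploy_target(git_ref: str) -> Optional[str]:
    """Map a full Git ref to a deployment environment (hypothetical helper).

    - a semver tag (vx.y.z)           -> "production"
    - the development branch          -> "dev"
    - anything else (feature branch,
      malformed tag, ...)             -> None, meaning no deployment
    """
    if SEMVER_TAG.fullmatch(git_ref):
        return "production"
    if git_ref == "refs/heads/development":
        return "dev"
    return None


if __name__ == "__main__":
    for ref in ("refs/tags/v1.4.2", "refs/heads/development", "refs/tags/v1.4"):
        print(ref, "->", deploy_target(ref))
```

Keeping the rule in one small, testable function makes it easy to check that only well-formed version tags can ever reach production.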

Monitoring & Observability

Operate SigNoz for APM — configure service traces, metrics dashboards, and alerts across all microservices

Manage CloudWatch log groups per service (integrated via Fluent Bit log shipping from Kubernetes)

Maintain Grafana dashboards for infrastructure-level metrics

Monitor Prometheus metrics exposed by backend services

Maintain StatusPage.io public status pages for our services

Define alerting rules and on-call runbooks; own incident response and post-mortems

Security & Secrets Management

Manage AWS Secrets Manager for all service credentials (MongoDB, Wasabi, application configs)

Administer AWS Client VPN with SSO integration for secure developer access to private infrastructure

Maintain IAM roles, policies, and service accounts following least-privilege principles

Manage ACM certificates and ensure TLS is enforced across all ingress endpoints

Operate ClamAV for malware scanning of user-uploaded files

Support the SpiceDB fine-grained authorization service and its migration tooling

Participate in compliance reviews and apply security best practices across the AWS account

Networking & Cloud Architecture

Manage multi-VPC architecture: separate VPCs for dev and production environments with VPC peering for controlled cross-environment access

Configure MongoDB Atlas PrivateLink connectivity ensuring database clusters are accessible only from within the designated VPC

Maintain bastion host configuration for emergency database access

Design and implement network segmentation, security group rules, and NACLs

Manage DNS via Route53 and ALB routing rules

Collaboration with Engineering Teams

Partner with Go and Node.js backend engineers to containerize new services and onboard them to the deployment pipeline

Work with frontend engineers on AWS Amplify deployments for the Nuxt.js / Vue 3 PWA

Provide runbooks and documentation for common debugging workflows (e.g., CloudWatch log tailing, VPN access, EKS pod debugging)

Define and enforce infrastructure standards, naming conventions, and tagging strategies across environments

Our Stack — You'll Work With These Every Day

Cloud Platform — AWS

EKS (Kubernetes managed control plane)

EC2 (managed and custom/fleet node groups)

ECR (container image registry)

ALB (Application Load Balancer)

CloudWatch (logging and metrics)

Secrets Manager

SQS (message queues)

SES (transactional email)

ACM (SSL/TLS certificates)

Route53 (DNS)

EFS (persistent storage for Kubernetes)

Client VPN (developer access)

AWS SSO (identity federation)

AWS FIS (Fault Injection Simulator — chaos engineering)

AWS Amplify (frontend CI/CD and hosting)

Container Orchestration & Packaging

Kubernetes (EKS) — fleet/spot + on-demand node groups

Helm (4 custom charts: generic deployments, one-time jobs, cron jobs, ingress)

Docker (multi-stage builds for Go and Node.js services)

HPA, PDB, PVC, Ingress, RBAC

Infrastructure as Code

Terraform — modular components, multi-environment (dev + prod), remote state backend

CI/CD & Automation

GitHub Actions (multiple workflows)

Semver-based release tagging (vx.y.z) for production promotions

Slack for pipeline notifications

Monitoring & Observability

SigNoz (APM, distributed tracing, dashboards, alerts)

CloudWatch (log aggregation — per-service log streams)

Fluent Bit (Kubernetes log shipping to CloudWatch)

Grafana (infrastructure dashboards)

Prometheus (per-service metrics)

StatusPage.io (public incident communication)

Data & Storage

MongoDB Atlas (cloud MongoDB with PrivateLink, per-environment isolation)

Aurora PostgreSQL and MySQL (via Amazon RDS)

Redis (ElastiCache — single-instance and cluster mode)

Wasabi (S3-compatible object storage with HA configuration)

EFS (Elastic File System for Kubernetes PVCs)

Security & Access

AWS Secrets Manager

AWS Client VPN + AWS SSO

IAM (service roles, least-privilege policies)

ACM (TLS certificates)

Security Groups and NACLs

SpiceDB (fine-grained authorization service)

ClamAV (antivirus scanning)

Services Architecture

14 Go microservices (gRPC inter-service communication via Protocol Buffers)

1 Node.js service (document generation)

gRPC (primary inter-service transport)

REST/HTTP (client-facing APIs)

MongoDB change streams (event-driven data sync)

Asynq/Redis (async task queues)

Frontend Deployment

AWS Amplify (Nuxt.js 3 / Vue 3 PWA — web application frontend)

Node.js 22+

GitHub Actions for Amplify CI/CD

What We're Looking For

Minimum Experience Requirements at a Glance

Area: Minimum

DevOps / Platform Engineering (overall): 5+ years

Terraform (module-level IaC): 2+ years

Kubernetes in production: 2+ years

AWS (EKS, ECR, CloudWatch, IAM, etc.): 2+ years

CI/CD pipeline ownership (GitHub Actions or equivalent): 1+ year

Must Have

Experience & General Skills

5+ years of hands-on DevOps or Platform Engineering experience in a production environment

Strong ownership mentality — you don't wait to be asked to fix something that's broken

Comfortable working in a fast-moving startup environment with evolving infrastructure requirements

Clear written communication (runbooks, post-mortems, documentation)

Cloud — AWS (2+ years)

Solid experience with AWS core services: EKS, EC2, ALB, ECR, CloudWatch, Secrets Manager, IAM, SQS, Route53, ACM

Understanding of AWS networking: VPC design, subnets, security groups, VPC peering, PrivateLink

Experience managing multi-environment AWS accounts (dev / prod separation)

Kubernetes & Containers (2+ years)

Production Kubernetes experience — deploying, scaling, and debugging workloads

Helm chart authoring and maintenance (not just helm install)

Docker — writing efficient multi-stage Dockerfiles for compiled (Go) and interpreted (Node.js) applications

Familiarity with HPA, PDB, resource limits/requests, and pod scheduling

Infrastructure as Code — Terraform (2+ years)

2+ years writing and maintaining Terraform at module level

Experience with remote state, workspaces, and multi-environment Terraform layouts

Ability to read existing module code, understand dependencies, and extend it safely

CI/CD (1+ year)

GitHub Actions — building and maintaining workflows (jobs, steps, secrets, environments, reusable workflows)

Experience implementing gated release pipelines with automated checks and manual approval gates

Container build and push pipelines to ECR or similar registries

Monitoring & Observability

Practical experience with log aggregation (CloudWatch, Fluent Bit, or similar)

Alerting configuration — defining meaningful alerts (not alert fatigue)

Experience debugging production issues from logs and metrics

Security

Secrets management best practices (Secrets Manager or Vault)

IAM least-privilege design

VPN and SSO administration basics

Nice to Have

Experience with SigNoz or OpenTelemetry-based APM platforms

Experience with MongoDB Atlas including PrivateLink and cluster management

Familiarity with SpiceDB or Zanzibar-style authorization systems

Experience with AWS FIS or other chaos engineering tools

Knowledge of Wasabi or S3-compatible storage beyond AWS native S3

Experience with AWS Amplify for frontend deployments

Exposure to gRPC service-based architectures (understanding of how Protocol Buffer services are deployed and scaled)

Experience running cost optimization programs on EKS using spot/fleet instances

Familiarity with ClamAV integration in Kubernetes environments

Go or Node.js — enough to read service code, identify issues in Dockerfiles, and help debug build failures

What We Offer

Ownership over a production-grade, cloud-native infrastructure stack — not just ticket execution

Exposure to a modern, well-structured microservices architecture with 16+ services

A team that treats infrastructure quality as a first-class concern

Flexible, asynchronous-friendly work culture

Opportunity to shape DevOps practices and tooling from an early stage

To apply, please send your resume and a short note about a complex infrastructure problem you've solved — specifically what the challenge was, what you built or changed, and what the outcome was.
