130000 - 140000 INR - Yearly
Bangalore, Karnataka, IN
Information Technology
Contract
LearnShiz Techies
Overview
Role overview
You will own core platform and DevOps responsibilities across AWS and GCP, ensuring our agentic AI and data workloads run securely, reliably, and efficiently. You’ll design cloud infrastructure, improve developer experience, and drive automation across the SDLC.
Key responsibilities
- Design, build, and maintain cloud infrastructure on AWS and GCP using Infrastructure as Code (Terraform / Pulumi / CloudFormation).
- Own Kubernetes‑based platforms (EKS, GKE, or self‑managed clusters), including cluster configuration, upgrades, autoscaling, and add‑ons.
- Implement and manage CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, ArgoCD, etc.) for microservices and AI workloads.
- Build secure, scalable networking: VPCs/VPC‑SC, subnets, peering, private endpoints, ingress/egress controls, and service mesh where appropriate.
- Set up and operate observability stacks: logging, metrics, tracing (e.g. Prometheus, Grafana, Loki, OpenTelemetry, Datadog, New Relic).
- Implement security best practices: IAM roles & policies, secrets management, key management (KMS), image scanning, and compliance controls.
- Optimise performance and cost for compute, storage, and network workloads across clouds.
- Support data and AI teams with infra for GPUs, model serving, batch jobs, and data pipelines (e.g. managed Kafka, Pub/Sub, Dataflow, EMR, BigQuery, Redshift).
- Create internal platform tooling and self‑service workflows to improve developer productivity (templates, CLIs, golden paths).
- Participate in on‑call rotations, incident response, and post‑incident reviews; drive reliability improvements (SLOs, error budgets, capacity planning).
Required experience and skills
- 6+ years in DevOps / SRE / Platform Engineering roles running production systems.
- Deep hands‑on experience with both AWS and GCP, including core services (compute, networking, storage, IAM).
- Strong Kubernetes experience (EKS/GKE): deployments, stateful workloads, Helm/Kustomize, autoscaling, and troubleshooting.
- Proficiency with Infrastructure as Code (Terraform strongly preferred) and Git‑based workflows.
- Strong CI/CD experience, including environment promotion strategies, blue/green or canary deployments, and rollback patterns.
- Solid Linux, containers, and networking fundamentals (TCP/IP, DNS, TLS, load balancing, service discovery).
- Experience implementing monitoring, logging, alerting, and SLOs for distributed systems.
- Strong scripting skills (Python, Go, or Bash) for automation and tooling.
- Practical understanding of cloud security, including least‑privilege IAM, network segmentation, and secrets management.
- Excellent collaboration skills with dev teams, plus a pragmatic approach to reliability vs. delivery speed.
Nice to have
- Experience operating AI/ML infrastructure (GPU clusters, model serving, feature stores, vector databases).
- Experience with service meshes (Istio/Linkerd), API gateways, or zero‑trust networking patterns.
- Background with Argo (CD/Workflows), Crossplane, or other platform engineering tools.
- Prior experience in highly regulated or security‑sensitive environments (finance, healthcare, enterprise SaaS).
- Experience mentoring engineers and contributing to platform strategy/roadmaps.
Talk to us
Feel free to call, email, or hit us up on our social media accounts.
Email
info@antaltechjobs.in