Overview
About Plum
Plum is an employee insurance and health benefits platform focused on making health insurance simple, accessible and inclusive for modern organizations. Healthcare in India is undergoing a major shift, with healthcare costs rising at 3x the rate of general inflation. A majority of Indians cannot afford health insurance on their own, so as many as 600 million Indians will likely have to depend on employer-sponsored insurance.
Plum is on a mission to provide the highest quality insurance and healthcare to 10 million lives by FY2030, through companies that care. Plum is backed by Tiger Global and Peak XV Partners.
Position Overview
We are seeking an experienced Senior DevOps Engineer with 5+ years of expertise to lead our cloud infrastructure and DevOps initiatives. This role is critical in scaling our platform to serve millions of users while maintaining the highest standards of security, reliability, and performance in the healthcare domain.
Key Responsibilities
Infrastructure & Container Management
Design, implement, and upgrade enterprise-grade container infrastructure including Kubernetes clusters, node pools, and service mesh architectures
Lead the migration and optimization of legacy systems to cloud-native containerized solutions
Create, maintain, and optimize deployment manifests for microservices using Helm charts and advanced templating
Architect and implement multi-environment Kubernetes strategies (dev, staging, production) with proper resource allocation and security boundaries
Troubleshoot complex container infrastructure issues and implement preventive measures
CI/CD & Automation Leadership
Design and maintain sophisticated CI/CD pipelines using ArgoCD for GitOps-based deployments
Lead the implementation of advanced pipeline strategies including blue-green deployments, canary releases, and feature flagging
Develop comprehensive automation frameworks using multiple scripting languages (Groovy, Go, Python, Shell, PowerShell)
Implement and enforce software development best practices including quality gates, automated testing, vulnerability scanning, and penetration testing integration
Mentor junior team members on DevOps best practices and tooling
DevSecOps & Security
Champion DevSecOps practices by integrating security tools and processes throughout the software development lifecycle
Implement comprehensive security scanning, compliance monitoring, and vulnerability management workflows
Design and maintain secure artifact repositories using Nexus and JFrog Artifactory with proper access controls and vulnerability scanning
Ensure HIPAA and healthcare data compliance requirements are met across all infrastructure components
Lead security incident response and post-mortem activities
Observability & Performance Engineering
Design and implement comprehensive observability solutions using distributed tracing (Jaeger), service mesh monitoring (Kiali), and cloud-native monitoring tools
Architect centralized logging solutions using Elastic Stack (Elasticsearch, Logstash, Kibana) and Fluentd for high-volume healthcare data
Implement advanced monitoring and alerting strategies using Prometheus, Grafana, Datadog, and New Relic with custom dashboards and SLA tracking
Optimize application and infrastructure performance based on observability insights
Design disaster recovery and business continuity strategies
Infrastructure as Code & Cloud Architecture
Lead infrastructure design and implementation using Terraform and CloudFormation for multi-cloud environments
Architect scalable, fault-tolerant cloud solutions capable of handling healthcare workloads with strict uptime requirements
Design and implement auto-scaling strategies for handling variable healthcare enrollment periods and claim processing loads
Manage complex networking configurations including VPCs, load balancers, CDNs, and firewall rules for high-volume traffic
Required Qualifications
Experience & Education
5+ years of hands-on experience in Cloud & DevOps engineering with a proven track record of scaling production systems
Bachelor's or Master's degree in Computer Science, Engineering, or related technical field
Experience in healthcare, fintech, or other regulated industries is highly preferred
Technical Expertise
Cloud Platforms: Expert-level proficiency in at least one major cloud platform:
GCP: Compute Engine, IAM, VPC, Cloud Storage, Cloud Functions, Cloud SQL, GKE, Pub/Sub, Operations Suite, Security Command Center
AWS: EC2, IAM, VPC, S3, Lambda, RDS, EKS, SNS, CloudWatch, CloudTrail, AWS Config
Container Orchestration: Advanced Kubernetes experience including custom operators, RBAC, network policies, and multi-cluster management
Infrastructure as Code: Expert proficiency in Terraform with experience in complex, multi-environment deployments
CI/CD Tools: Advanced experience with Jenkins, GitLab CI, ArgoCD, and GitOps methodologies
Monitoring & Observability: Hands-on experience with the full observability stack and custom metric development
Programming: Strong scripting and automation skills in multiple languages
Preferred Qualifications
Must have: Cloud platform certification (GCP Professional Cloud Architect or AWS Certified Solutions Architect)
Experience with service mesh technologies (Istio, Linkerd)
Knowledge of healthcare compliance standards (HIPAA, SOC 2)
Experience with chaos engineering and reliability practices
Experience with FinOps and cloud cost optimization
Background in microservices architecture and API gateway management
Leadership & Soft Skills
Proven ability to lead technical initiatives and mentor junior engineers
Experience with incident management and on-call responsibilities
Strong communication skills for collaborating with cross-functional teams
Experience with agile methodologies and project management
Ability to work in a fast-paced startup environment while maintaining attention to detail