Site Reliability Engineer (SRE) IT Platform Engineering | Antal Tech Jobs
1 Day ago

Site Reliability Engineer (SRE) IT Platform Engineering

Gurugram, Haryana, India
Information Technology
Other
Omnissa

Overview

Job Description

Site Reliability Engineer (SRE) – Observability, Tracing & Platform Operations

About Omnissa — Our Company, Mission & Vision

Omnissa is an independent global leader in digital work platforms, originating from the former VMware End User Computing business and now operating under KKR ownership. Omnissa empowers organizations to deliver seamless, flexible, and secure digital work experiences to employees everywhere. Our platform supports over 26,000 customers worldwide, including 7 of the top 10 Fortune 500 enterprises.

Our Purpose Is Clear

Guided by decades of innovation and strengthened by significant investment in AI, open APIs, and next-generation digital workspace technologies, Omnissa is building the industry’s first autonomous workspace experience: a platform that simplifies operations for IT while unlocking higher productivity for employees.

Platform Engineering at Omnissa (IT Organization)

The Platform Engineering team within Omnissa’s internal IT organization is responsible for architecting, operating, and continuously improving our enterprise-grade infrastructure platforms. Our mission is to deliver highly resilient, scalable, and secure systems that power Omnissa’s internal operations and customer-facing services.

Our Environment Includes

Core Platforms

  • On-premises cloud environments built on:
      • VMware Cloud Foundation (vCF)
      • Apache CloudStack
      • Proxmox Virtualization Stack
  • Kubernetes-based orchestration for containerized workloads
  • Open-source S3-compatible object storage systems

Observability Infrastructure

We have built an extensive observability infrastructure that monitors all Omnissa internal services and is managed automatically using:

  • Prometheus
  • Grafana
  • Loki
  • Ansible

AI Driven Automation & Incident Response

We have developed an internal AI-powered incident diagnosis and first-response platform, leveraging cutting-edge open technologies including:

  • Ollama
  • n8n
  • Various Model Context Protocol (MCP) servers

These systems help us reduce mean time to detect (MTTD) and mean time to resolve (MTTR) through automated analysis, enrichment, and intelligent triage.

Role Overview — Site Reliability Engineer (SRE)

We are seeking a highly skilled SRE with deep expertise in Observability, particularly in:

  • Automation
  • Grafana
  • Loki
  • Prometheus
  • Development/Scripting

This role is critical to maintaining the reliability, performance, and operational integrity of our platforms. You will support both planned and unplanned workstreams, collaborating closely with engineering, incident management, and service owners.

The role includes participation in an on-call rotation, including nights and weekends, to ensure continuous coverage for mission critical systems.

Key Responsibilities

Observability Engineering

  • Design, deploy, and maintain Loki, Grafana, Prometheus, and integrated observability pipelines.
  • Contribute to new monitoring initiatives by developing new monitoring checks for services.
  • Maintain and improve our automation workflows that manage the infrastructure.
  • Develop and refine AI workflows for incident analysis and auto-remediation.
  • Continuously enhance logging, metrics, and tracing coverage across services.

Reliability, Resilience & Performance

  • Ensure high availability, capacity planning, and performance optimization across platforms.
  • Drive reliability improvements through automation, SLIs/SLOs, and root cause analysis.
  • Partner with development and platform teams to embed reliability best practices.

Incident Management & On-Call

  • Participate in the global on-call rotation, including weekends.
  • Leverage our AI-driven incident diagnosis tools (Ollama, n8n, MCP) to accelerate response.
  • Manage unplanned work such as production incidents, outages, and high-urgency escalations, and participate in postmortem reviews.
  • Coordinate post-incident reviews and continuous improvement initiatives.

Planned & Unplanned Work Management

  • Utilize the Atlassian toolset (Jira, Confluence, Opsgenie, etc.) for structured task, change, and incident management.
  • Manage planned maintenance, releases, and platform improvements.
  • Collaborate with cross-functional teams to prioritize backlog and operational tasks.

Platform Operations

  • Support and enhance internal clouds based on vCF, CloudStack, and Proxmox.
  • Operate Kubernetes clusters and improve the reliability of containerized workloads.
  • Maintain S3-compatible storage platforms used across the enterprise.

Required Skills & Experience

  • Familiarity with at least one scripting/programming language.
  • Strong hands-on expertise with:
      • Grafana, Loki, Tempo (or similar tracing systems), Prometheus
  • Experience with Configuration Management tools (e.g., Ansible/Saltstack).
  • Proficiency in operating modern Linux-based distributed systems.
  • Experience supporting large-scale, highly available architectures.
  • Familiarity with Kubernetes, CI/CD pipelines, and Infrastructure as Code.
  • Comfortable with on-call participation and incident leadership.
  • Experience with Atlassian tools (Jira, Confluence, Opsgenie).
  • Proficiency in Linux & Windows.

Nice to Have

  • Exposure to Ollama, n8n, or similar AI orchestration/automation tooling.
  • Experience with S3 storage internals or open source object stores (e.g., SeaweedFS, Ceph).
  • Understanding of virtualization stacks such as Proxmox, vSphere/vCF, or CloudStack.
  • Background in an SRE-driven culture, including SLIs/SLOs and error budgeting.