Senior Site Reliability Engineer (SRE) & Support Lead | Antal Tech Jobs
3 Days ago

Senior Site Reliability Engineer (SRE) & Support Lead

Hyderabad, Telangana, India
Information Technology
Full-Time
TechGrove by Banyan Software

Overview

TechGrove is the Centre of Excellence for Banyan Software, based in Chennai, India. It plays a key role in supporting Banyan’s global businesses through technology, security, and software development. TechGrove brings together India’s deep pool of technical talent with Banyan’s long-term approach to growth, creating a trusted, developer-focused environment where people can do their best work.

Senior Site Reliability Engineer (SRE) & Support Lead (Touchstream)

Location: Chennai, India
Reports to: Head of Integrations
Role Type: Hands-on senior individual contributor with support leadership responsibilities

Company & Core Product Snapshot

Touchstream is the OTT Operations Hub: a cloud-native SaaS platform for independent, end-to-end monitoring of streaming video systems (CDNs, origin, delivery chain). We serve some of the world’s largest broadcasters, telco/OTT services, and streaming platforms—monitoring tens of thousands of live streams in real time.

Touchstream now unifies its best-selling CDN Monitoring and VirtualNOC products into a single platform delivering:

  • Unified data & end-to-end visibility across the streaming workflow
  • Best-in-class incident intelligence and RCA tooling (including timestamped evidence packs)
  • Operating-model improvements via shared views, collaboration, AI MCP Servers and rich knowledge bases
  • Business value and ROI reporting for capacity optimization and performance insights

Role Summary

As Senior SRE & Support Lead, you will own production health for Touchstream’s customer-facing platform and data plane, while also leading the global technical support function as part of your SRE responsibilities. Your mission is twofold:

  1. Reliability ownership: ensure high availability, performance, and change safety across the system (UI/API and ingest, process & query pipelines), with strong SLO discipline and continuous improvement.

  2. Support leadership: run and evolve the support operation—triage, escalation, incident response coordination, tooling, and (over time) building a strong support team in Chennai to deliver world-class customer outcomes.

This is a highly impactful role at the intersection of SRE, incident management, observability engineering, and customer-facing support.

Responsibilities:

1) Reliability Ownership (Primary)
  • Define and maintain SLOs, error budgets, and service health reporting.
  • Own availability and performance of:
    • Customer-facing system: UI/API
    • Data plane: ingest, process & query pipelines
  • Drive capacity planning for live-event spikes, load testing, and scaling strategies.
  • Prevent recurring issues through high-quality RCAs and rigorous follow-through.

2) On-Call & Incident Management (Run the Room)
  • Build and evolve the on-call operating model: severity levels, paging rules, escalation paths, comms templates.
  • Lead high-severity incidents end-to-end: triage, mitigation, rollback, “stop the bleeding” decisions, stakeholder comms.
  • Track MTTA/MTTR and implement systemic improvements over time.

3) Observability for the Observability Platform (“Meta-Observability”)
  • Own “who watches the watcher?”—monitoring and alerting for Touchstream’s monitoring pipeline itself.
  • Standardize telemetry conventions (logs/metrics/traces) across services.
  • Build and maintain dashboards for:
    • ingest health (per customer / per source)
    • pipeline lag
    • query performance
    • alerting health
  • Tune alerting to reduce noise: dedupe, routing, “symptom vs cause,” threshold hygiene.

4) Release Engineering & Change Safety (Bulletproof Change Management)
  • Implement guardrails: feature flags, progressive delivery/canaries, automated rollback triggers.
  • Maintain release readiness practices: migration checks, backfills, customer impact assessment, capacity impacts.
  • Drive change metrics: deploy frequency, change failure rate, recovery time from deploys.

5) Cost & Efficiency Ownership (Cloud Economics)
  • Monitor and optimize cost per GB ingested/stored/queried.
  • Enforce retention policies, tiering, sampling, and query limits without breaking customer value.
  • Make explicit capacity vs. cost tradeoffs—especially around large live events and heavy dashboards.

6) Security & Resilience Basics (Small-Team Practicality)
  • Baseline controls: access reviews, secrets management, least privilege, dependency scanning.
  • Rate limiting / abuse guardrails, audit logging, security incident response readiness.
  • Backup/restore and lightweight-but-real disaster recovery drills.

7) Support Leadership & Operations (Explicitly Part of the Role)
  • Serve as the senior escalation point for critical customer issues and high-impact outages.
  • Own the support operating model:
    • ticket triage, prioritization, SLAs, escalation paths, and shift handovers
    • runbooks, playbooks, FAQs, and knowledge base (including formats suitable for AI-assisted support / RAG)
  • Establish and monitor support KPIs (SLA compliance, backlog, customer satisfaction, MTTx) and implement process improvements.
  • Partner with Engineering/Product/Integrations to turn support learnings into reliability fixes and product improvements.
  • Over time: help build, mentor, and lead a team of support/NOC engineers in Chennai.

8) Customer-Impact Focus (Tenant Health & Trust)
  • Maintain per-tenant “customer health views”: SLO compliance, noisy sources, top offenders, recurring incident patterns.
  • Collaborate with Product on operator workflows: service health panels, incident summaries, status updates.

Required Qualifications & Skills

Technical / SRE Foundation
  • 8+ years in SRE, production operations, technical support for SaaS, or NOC/ops roles with strong reliability ownership.
  • Strong Linux fundamentals; comfort with debugging distributed systems.
  • Strong understanding of cloud infrastructure (AWS and/or GCP) and service operations.
  • Experience with monitoring/alerting/logging stacks, incident management, and RCA practices.
  • Ability to automate operational work (Python and/or shell scripting); comfort with APIs and CLI tooling.

Streaming / OTT Domain (Nice to Have)
  • Strong understanding of video streaming and delivery concepts: HLS, DASH, CMAF, ABR, CDNs, origin, HTTP, caching, DNS, SSL/TLS. Familiarity with AWS Media Services is a big plus.

Support Leadership & Customer Communication
  • Proven ability to run escalations and communicate clearly in high-pressure incidents.
  • Experience designing support workflows, SLAs, escalation paths, and operational KPIs.
  • Strong written and verbal English; confidence presenting incident status and RCAs to customers.

Working Style
  • Comfortable with flexible hours to support global customers (overlap with Europe/US time zones as needed).
  • Bias for action, continuous improvement mindset, and strong ownership.

Desired / Nice-to-Have
  • Prior experience supporting high-scale, always-on streaming events and live operations.
  • Experience with progressive delivery, canarying, feature-flag platforms, and release automation.
  • Familiarity with IT service management frameworks (e.g., ITIL).
  • Security operations exposure (secrets management, vulnerability management, audit logging).

What You’ll Gain & Why Join
  • A senior, high-ownership role shaping reliability + support for a mission-critical observability platform in OTT streaming.
  • Direct impact on global broadcasters and streaming services—improving viewer experience at scale.
  • Opportunity to build the SRE/support operating model and grow the Chennai support function over time.
  • Collaboration with a globally distributed team across engineering, integrations, operations, and product.

Beware of Recruitment Scams

We have been made aware of individuals fraudulently posing as members of our Talent Acquisition team and extending fake job offers. These scams may involve requests for personal information or payment for equipment. 

Protect yourself by following these steps:

  • Verify that all communications from our recruiting team come from an @banyansoftware.com email address.
  • Remember, employers will never request payment or banking information during the hiring process.
  • If you receive a suspicious message, do not respond — instead, forward it to careers@banyansoftware.com and/or report it to the platform where you received it.

Your safety and security are important to us. Thank you for staying vigilant.
