11 Weeks ago

Site Reliability Engineer (SRE)

Udaipur, Rajasthan, India

Information Technology

Other

Reflections Info Systems

Overview

Introduction

As a Site Reliability Engineer (SRE), you will be responsible for improving the overall reliability of applications by ensuring their availability, performance, and scalability. You should be able to gather technical requirements from the DevOps team and operational requirements from the Application Support team. Because the Site Reliability Engineer role is at the heart of solving production problems, you should take a holistic approach to troubleshooting while delving deeply into technical details. You must also acquire the domain knowledge needed to troubleshoot and recover from outages effectively, monitor applications in production, and build alerts as required.

Responsibilities include:

Work closely with the application support team.
Monitor critical applications and services to minimize downtime and ensure their availability.
Collaborate with DevOps teams to maintain and monitor CI/CD pipelines.
Deploy new versions to production environments.
Work with project teams to ensure the reliability and maintainability of new and modified releases.
Provide input to risk management practices that will anticipate reliability-related incidents that could adversely impact operations.
Document processes and monitor application performance metrics.
Continuously improve proactive monitoring, alert configuration, and incident response processes to increase reliability and reduce Mean Time to Recovery (MTTR).
Optimize performance and cost efficiency through continuous monitoring, trend analysis, and fine-tuning.
Monitor any abnormal usage that can impact the cost or performance and take corrective actions.
Proactively implement preventive measures to improve system reliability.
Maintain runbooks, Standard Operating Procedures (SOPs), diagrams, and documentation for swift incident response.
Conduct post-incident reviews to improve reliability and contribute to the development of resilience strategies.
Meet the Service Level Indicator (SLI) targets that are set to achieve reliability objectives.

Certifications:

Azure Solutions Architect Expert (Microsoft)
AWS Certified Solutions Architect (AWS)
Open Group Certified Enterprise Architect (TOGAF)
PMP or PRINCE2 in Project Management

Primary Skills:

Monitoring and Analysis

Continuously monitor CDC dashboards to track service performance and analyze reports.
Oversee production and DevOps infrastructure dashboards, ensuring system stability and identifying potential issues.
Observe alerts from New Relic and escalate them to the respective teams as needed.
Identify duplicated New Relic alerts and optimize alert configurations to reduce noise and improve efficiency.
Track daily alerts in production to enhance alert optimization strategies.
Maintain and update a list of dashboards monitored, including details such as widgets, metrics, and threshold values.
Create and manage dashboards for validating and monitoring CPU optimizations for Rapid and CDC services.
Perform sanity checks on Container Memory Utilization, Missing Pods, Container Restarts, Container CPU Utilization, Active Pods, Node Resource Consumption, and Pod Network Status to ensure system health.

Release and Deployment Management

Coordinate and execute weekly production releases, ensuring services are deployed with optimized CPU values.
Update central repositories with the latest service configurations and CPU requests.
Perform post-deployment sanity checks to validate service stability after production releases.
Redeploy CDC services with optimized CPU values, ensuring system performance improvements.
Monitor new CPU optimizations for Rapid and CDC services, tracking performance improvements and resource utilization.

Incident Management and RCA Documentation

Conduct incident analysis, identifying root causes and documenting findings for continuous improvement.
Maintain detailed Root Cause Analysis (RCA) documentation to track incidents and resolutions.
Provide reports on incident trends, helping improve response times and preventive measures.

Collaboration and Communication

Participate in daily sync-ups and internal meetings to discuss ongoing tasks, challenges, and improvements.
Sync up with the Network Operations Center (NOC) team to align on monitoring strategies and escalations.
Collaborate with the Database (DB) team for performance tuning and issue resolution.
Conduct knowledge transfer (KT) sessions on Rapid Resource Optimization and related best practices.

Optimization and Continuous Improvement

Track CPU optimization efforts, ensuring proper resource allocation and utilization for Rapid and CDC services.
Analyze performance data to refine resource allocation strategies and improve system efficiency.
Identify and implement best practices for reducing alert noise and optimizing monitoring configurations.

Secondary Skills:

Technical Knowledge

Fluent in key AWS services (EBS, S3, compute, storage, RDS, etc.).
Expertise in Kubernetes or another container orchestration system.
Knowledge of Infrastructure as Code (IaC).
Linux system administration knowledge.
Knowledge of RDBMS and document databases.
Knowledge of monitoring tools, including AWS CloudWatch and New Relic.
Additional certification in Microsoft, Linux, Cisco, AWS, or similar technologies is a plus.

Behavioral competencies

Communication
Customer Centricity
Business & Market Acumen
Psychological Safety
Empathy
Growth Mindset & Learning Agility
Ethical and Vigilant
Digital Mindset
Operational Excellence
Teamwork
Analytical thinking

Job Details

Role:

Site Reliability Engineer (SRE)

Location:

9A2, Carnival Technopark, Trivandrum

Close Date:

14-03-2025

Interested candidates may forward their detailed resumes to Careers@reflectionsinfos.com along with their notice period and their current and expected CTC details.

This is to notify jobseekers that some fraudsters are promising jobs with Reflections Info Systems for a fee. Please note that no payment is ever sought for jobs at Reflections. We contact our candidates only through our official website or LinkedIn, and all employment-related emails are sent from the official HR email ID. Please contact careers@reflectionsinfos.com for any clarification or alerts on this subject.
