Hyderabad, Telangana, India
Information Technology
Full-Time
Principal Global Services
Overview
Responsibilities
Implement engineering practices and mindset, with a focus on continuous improvement, experimentation, and learning, to solve customer problems in SRE (Site Reliability Engineering).
This is an individual contributor role.
Reporting Relationship
This role reports to the Delivery Manager.
Education
- Graduate - Bachelor’s degree in Engineering/Technology (preferred)
- Any SRE/Cloud technology certifications (good to have)
Mandatory skills
Observability - Dynatrace, Elastic Stack or any other logging tools
Key Responsibilities:
- Hands-on experience of APM - application monitoring tools (mentioned above).
- At least 5 years of relevant experience in setting up monitors, integrating with alerting tools, and integrating with third-party tools via API/REST services.
- Experience troubleshooting issues, performing RCA, and creating/maintaining dashboards of system/application issues in APM tools.
- Deep understanding of monitoring tool set-up (installation, tuning, vendor management, patching/upgrades).
- Develop tools/solutions to automate and standardize deployments and operations for system/application/infrastructure monitoring.
- Write scripts for setting up monitors in Application Performance Management tools such as Dynatrace, AppMon, etc.
- Develop dashboards within the APM tool for researching application issues.
- Integration with systems over APIs/REST services (Nice to have).
- Apply a software development and engineering mindset to sysadmin activities.
- Focus on improving monitoring/observability of systems in a measurable way.
- Provide monitoring services for systems so that teams can begin to track their SLOs and SLIs. Also assist in providing realistic objectives for the future and advise on proper SLAs for customers.
- Set-up/tune monitors for application health/status/performance.
- Improve monitoring based on symptoms instead of outages.
- Recommend, support, and train teams in the use of effective monitoring tools and processes that allow real-time system monitoring as well as analysis of long-term reliability trends.
- Create new alerts to find anomalies and understand the root cause of system failures.
- Integrate monitoring tools with third-party tools such as XMatters, alerting tools, and SMS tools.
- Document every action to convert findings into repeatable actions and then into automation.
- Improve efficiency by automating repetitive toil such as watching dashboards, executing scripts, and other manual tasks.
- Help optimize the on-call rotation by adding automation and context to alerts, leading to a better real-time collaborative response from on-call responders.
- Understand and apply incident management practices.
- Assist stakeholders in examining incidents and establishing processes to help prevent or minimize similar problems from arising.
- Develop procedures and policies by which technical support teams will operate. These will be applied in areas such as service failures and security threats.
- Train IT support staff.
- Provide RCA & solutions.
- Derive trends from recurring system/application issues using dashboards.
- Good written and verbal communication skills.
- Highly motivated, with a positive and proactive attitude to work and a willingness to improve operational efficiency through innovation, changes to process and procedure, and adopting and adapting ideas and practices from elsewhere.
- Ready to work in shifts, on weekends, and on a flexible schedule.
- Excellent team skills with ability to listen and contribute to discussions and meetings.
- Ability to motivate staff
Additional Information
- Collaborative team player
- Continuous learners
- Focused on building the right thing with the right technology
- Solution-based, problem solvers
- Critical and independent thinkers
- Curious/experimenters
- Accountable for their deliverables