
Overview
Responsibilities
Monitor and troubleshoot infrastructure components hosted in cloud environments such as AWS, Microsoft Azure, and Google Cloud, as well as in on-premises data centers
Work closely with the DevOps or development teams to identify and resolve issues promptly across environments
Respond to reported issues within the agreed SLAs and escalate to appropriate stakeholders based on the incident management process
Detect and analyze alarms to provide basic to moderate fault isolation and remote troubleshooting, escalating if necessary
Build and maintain tools and frameworks that support automation, application health checks, and patching activities
Automate daily tasks using configuration management tools such as Ansible or Chef, or scripts written in languages such as Python, Perl, or Bash
Support and troubleshoot high availability, performance, monitoring, backup, and restoration of different environments
Design and build dashboards that provide visibility into infrastructure health, using tools such as Grafana or Kibana
Work closely with client stakeholders to resolve issues, and participate in regular cadence calls for business improvement
Establish a good working relationship with customers and other professionals
Evaluate tools, technologies, and processes to improve the efficiency and scalability of continuous integration environments
Requirements
B.E/B.Tech/MCA with 2+ years of experience in managing cloud infrastructure
Experience working in 24x7 support environments: handling help desk tickets, administering Linux servers in virtualized environments, using monitoring tools such as Nagios and ticketing tools such as JIRA, Freshdesk, and ServiceNow (preferred), and creating runbooks and KB articles for help desk support
Familiarity with the fundamentals of Linux and scripting languages
Experience in AWS Administration
Effective written and verbal communication skills