Chennai, Tamil Nadu, India
Information Technology
Full-Time
Accenture services Pvt Ltd
Overview
Project Role : Cloud Migration Engineer
Project Role Description : Provides assessment of existing solutions and infrastructure to migrate to the cloud. Plan, deliver, and implement application and data migration with scalable, high-performance solutions using private and public cloud technologies driving next-generation business outcomes.
Must have skills : NetApp Network Attached Storage (NAS) Administration
Good to have skills : Storage Area Networks (SAN) Architecture and Design, EMC Storage Area Network (SAN) Administration
Minimum 2 Year(s) Of Experience Is Required
Educational Qualification : 15 years full time education
ROLE SUMMARY: As a highly skilled Site Reliability Engineer III, you will manage and scale our application observability platforms and tools. The ideal candidate understands how to implement tooling, monitoring, and logging to maintain optimal system and application health, measured by availability and performance. You will also script, troubleshoot, and set up alerting for monitors and logs. This role requires a solid foundation in automation, scripting, and problem-solving to ensure efficient and reliable information delivery through our tools.
KEY RESPONSIBILITIES:
- Implement and maintain an observability platform using OpenTelemetry with Grafana Cloud (logs, metrics, and traces).
- Integrate additional monitoring tools such as Pingdom/SolarWinds, PagerDuty, and Nagios to monitor and analyze system health and events.
- Collaborate with development and operations teams to improve system reliability, scalability, and performance.
- Analyze system and application performance metrics to identify and resolve performance bottlenecks.
- Develop and execute automation scripts for health checks, alerts, and auto-remediation of events.
- Manage SaaS and cloud hosting environments to ensure optimal performance, security, and compliance.
- Conduct root cause analysis for system failures and implement preventive measures.
- Create and maintain comprehensive technical documentation for standards, templates, and processes.
- Provide technical guidance to team members on migration and critical projects.
REQUIRED QUALIFICATIONS:
- 5-8 years of prior experience in an SRE role.
- BS degree in Computer Science, Software Engineering, or a related software engineering field.
- Extensive experience with observability tools and practices, including Grafana Cloud, Prometheus, Loki, and related technologies.
- Proficiency with OpenTelemetry for comprehensive monitoring and performance tracking with metrics and traces.
- Advanced knowledge of, and experience with, monitoring tools such as Pingdom/SolarWinds Observability, Nagios, and PagerDuty.
- Advanced skills in managing infrastructure on Windows Server 2016 and 2022, as well as Linux distributions (CentOS 7+, AlmaLinux).
- Advanced skills in scripting and automation (e.g., PowerShell, Python), including health checks and automated restoration.
- Strong understanding of SaaS and cloud hosting environments with virtualization technologies (VMware) and Azure DevOps.
- Experience with incident/outage response, disaster recovery, and service restoration.
- Strong problem-solving and troubleshooting skills, and the ability to resolve complex outage and performance issues.
- Ability to work independently and drive projects and tasks effectively.
- Willingness to work off-hours when necessary.