Overview
Introduction
A career in IBM Consulting is rooted in long-term relationships and close collaboration with clients across the globe.
You'll work with visionaries across multiple industries to improve the hybrid cloud and AI journey for the most innovative and valuable companies in the world. Your ability to accelerate impact and make meaningful change for your clients is enabled by our strategic partner ecosystem and our robust technology platforms across the IBM portfolio, including Software and Red Hat.
Curiosity and a constant quest for knowledge serve as the foundation for success in IBM Consulting. In your role, you'll be encouraged to challenge the norm, investigate ideas outside of your role, and come up with creative solutions resulting in groundbreaking impact for a wide network of clients. Our culture of evolution and empathy centers on long-term career growth and development opportunities in an environment that embraces your unique skills and experience.
Your Role And Responsibilities
The Incident Manager is responsible for driving the end-to-end management of major incidents, ensuring minimal impact to business operations, restoring services within agreed SLAs, and preventing recurrence through effective post-incident reviews. The role requires coordination across global technical teams, vendors, and business stakeholders in a high-pressure, 24x7 environment.
Responsibilities
- Incident Ownership: Act as the single point of contact (SPOC) for all major and critical incidents (P1/P2).
- Restoration Management: Drive resolution by coordinating with technical support teams, vendors, and third parties to restore services within defined SLAs.
- Impact Assessment: Evaluate business impact and prioritize incident response accordingly.
- Communication: Provide timely and transparent communication to stakeholders during the incident lifecycle — including incident updates, business impact statements, and recovery progress.
- Escalation Management: Proactively escalate critical issues to senior management and ensure timely decision-making.
- Root Cause Analysis (RCA): Facilitate post-incident reviews, ensure RCA documentation, and track preventive and corrective actions to closure.
- Process Governance: Enforce ITIL-aligned incident and problem management processes, ensuring compliance and continuous improvement.
- Reporting: Generate daily, weekly, and monthly incident metrics, trend analyses, and SLA reports for management.
- Continuous Improvement: Identify process gaps and work with service delivery teams to enhance operational resilience and reduce incident frequency.
- Collaboration: Work closely with Service Delivery Managers, Change Managers, and Problem Managers to ensure service stability.
- Shift Operations: Support 24x7 operations with on-call availability for major incidents.
Master's Degree
Required Technical And Professional Expertise
- ITIL Foundation Certification (mandatory).
- Experience managing Major Incidents (P1/P2) in large-scale enterprise IT environments.
- Familiarity with ITSM tools such as ServiceNow, BMC Remedy, or IBM Control Desk.
- Understanding of infrastructure components — servers, storage, networks, databases, cloud (AWS, Azure, IBM Cloud), and application layers.
- Ability to analyze logs, monitoring alerts, and performance metrics to support technical teams during triage.
- Experience with automation and monitoring tools (e.g., Splunk, AppDynamics, Dynatrace, Nagios).
- Basic understanding of change and problem management processes to coordinate cross-functional dependencies.
- Proven experience in service delivery environments supporting BFSI, Telecom, or Manufacturing domains preferred.
- Familiarity with major outage management frameworks and command center operations.
- 5–10 years of overall IT experience with at least 3 years in an Incident Management or Service Management role in a global environment.
- Prior experience in managing incidents across hybrid infrastructure and cloud environments (IBM Cloud, Azure, AWS, GCP).