Incident Management is the IT Service Management process responsible for logging, recording, and resolving incidents. A ticketing tool is usually needed for this process so that everything is tracked properly and handovers between support groups are more efficient. The main goal of Incident Management is to restore normal service after a business outage or disruption as quickly as possible, using a “workaround” (temporary fix). Think of the workaround as a patch over a hole. The permanent fix, filling the hole with cement, is called a “permanent resolution” and is handled separately by Problem Management.
So, why should we use Incident Management? Major benefits include:
- Improved reliability of the system and of ongoing support, since a team addresses business disruptions right away
- Confidence that all reported issues are logged and being worked on
- Regular updates to users about the issues they reported
- Improved monitoring of IT service quality
- Easier regular reporting of all issues and of overall IT health
- A single point of contact for all issues raised
- An easier escalation process
- An organized process that helps the Incident Management team immediately identify the affected or related Configuration Items via the ticketing tool
At the end of the day, the main purpose of this process is to make business users feel secure and confident in the IT service they are using, knowing that someone out there will work on a fix whenever an issue happens. You should also check whether your service is big and critical enough to require a process like this. A simple cost-versus-business-value analysis will show whether it is reasonable to implement the Incident Management process for your service.
The Incident Management process consists of the following steps:
- Incident Identification
- Incident Logging
- Incident Categorization
- Incident Prioritization
- Initial Diagnosis
- Investigation & Diagnosis
- Resolution and Recovery
- Incident Closure
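The steps above can be sketched as a simple forward-only workflow. This is a minimal sketch that assumes every ticket passes through each step strictly in the listed order. The `Step` and `Incident` names are hypothetical and do not come from any particular ticketing tool, and a real tool may allow some steps to be skipped.

```python
from enum import IntEnum


class Step(IntEnum):
    """Incident Management steps, numbered in the order listed above."""
    IDENTIFICATION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INITIAL_DIAGNOSIS = 5
    INVESTIGATION_AND_DIAGNOSIS = 6
    RESOLUTION_AND_RECOVERY = 7
    CLOSURE = 8


class Incident:
    """Hypothetical ticket that can only move forward one step at a time."""

    def __init__(self, ticket_id: str) -> None:
        self.ticket_id = ticket_id
        self.step = Step.IDENTIFICATION
        self.history = [self.step]  # every step the ticket has passed through

    def advance(self) -> Step:
        """Move to the next step; refuse to move past closure."""
        if self.step is Step.CLOSURE:
            raise ValueError(f"{self.ticket_id} is already closed")
        self.step = Step(self.step + 1)
        self.history.append(self.step)
        return self.step


if __name__ == "__main__":
    inc = Incident("INC-0001")
    while inc.step is not Step.CLOSURE:
        inc.advance()
    print([s.name for s in inc.history])
```

Running the script prints the eight step names in order, from `IDENTIFICATION` to `CLOSURE`.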
An important point to note is that incidents can be identified from various sources: the Helpdesk, the technical user team, proactive detection by the support team, or even a web interface. All of these reported issues are incidents that we should work on. It is therefore key to have a proper prioritization matrix and criteria that determine which incidents your limited resources should work on first. An IT service seldom has unlimited resources, because that would be far too costly.
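The article does not spell out a specific prioritization matrix. One common approach is to rate each incident's impact and urgency and combine the two ratings. The sketch below assumes a hypothetical three-level scale and the rule priority = impact + urgency - 1, which gives priorities from 1 (most critical) to 5. Given a fixed team capacity, the highest-priority tickets are picked first. The names `Level`, `priority_of`, `Ticket`, and `work_queue` are illustrative only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level rating; a lower value means more severe."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority_of(impact: Level, urgency: Level) -> int:
    """Map impact x urgency to a priority from 1 (critical) to 5 (lowest).

    Assumed matrix: HIGH/HIGH -> 1, HIGH/MEDIUM -> 2,
    HIGH/LOW or MEDIUM/MEDIUM -> 3, MEDIUM/LOW -> 4, LOW/LOW -> 5.
    """
    return int(impact) + int(urgency) - 1


@dataclass
class Ticket:
    ticket_id: str
    impact: Level
    urgency: Level

    @property
    def priority(self) -> int:
        return priority_of(self.impact, self.urgency)


def work_queue(tickets: list[Ticket], capacity: int) -> list[Ticket]:
    """Return the tickets the team can take on now, most critical first.

    sorted() is stable, so tickets with equal priority keep their input order.
    """
    ranked = sorted(tickets, key=lambda t: t.priority)
    return ranked[:capacity]


if __name__ == "__main__":
    backlog = [
        Ticket("INC-1", Level.LOW, Level.LOW),        # priority 5
        Ticket("INC-2", Level.HIGH, Level.MEDIUM),    # priority 2
        Ticket("INC-3", Level.MEDIUM, Level.MEDIUM),  # priority 3
    ]
    print([t.ticket_id for t in work_queue(backlog, capacity=2)])  # ['INC-2', 'INC-3']
```

With capacity for only two tickets, the low-impact, low-urgency INC-1 waits until resources free up.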
Two terms you should know are the two types of incident tickets, which we have already touched on: (1) reactive and (2) proactive. Reactive incidents are reported by the users of your service, while proactive incidents are detected by the support team before users feel any impact. In the long term, we want to increase the ratio of proactive to reactive tickets so that users rarely experience issues at all.
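Tracking the proactive-to-reactive ratio is simple arithmetic: divide the number of proactive tickets by the number of reactive tickets raised in a period. This minimal sketch assumes each ticket carries a hypothetical source label, either `"proactive"` or `"reactive"`.

```python
from collections import Counter
from typing import Iterable


def proactive_ratio(sources: Iterable[str]) -> float:
    """Return the proactive / reactive ticket ratio for one period.

    Each item is a hypothetical source label: "proactive" or "reactive".
    Returns 0.0 for an empty period and infinity when every ticket was proactive.
    """
    counts = Counter(sources)
    unknown = set(counts) - {"proactive", "reactive"}
    if unknown:
        raise ValueError(f"unexpected ticket types: {sorted(unknown)}")
    proactive, reactive = counts["proactive"], counts["reactive"]
    if reactive == 0:
        return float("inf") if proactive else 0.0
    return proactive / reactive


if __name__ == "__main__":
    january = ["reactive"] * 40 + ["proactive"] * 10
    june = ["reactive"] * 20 + ["proactive"] * 30
    print(proactive_ratio(january), proactive_ratio(june))  # 0.25 1.5
```

With 10 proactive and 40 reactive tickets in one month the ratio is 0.25; with 30 proactive and 20 reactive tickets in a later month it rises to 1.5, the upward trend the paragraph above describes.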
The last aspect of Incident Management is ticket reporting and monitoring. An IT service needs this activity to periodically check the status and trends of its own processes. One way to do this is by tracking KPIs (Key Performance Indicators); examples include CWTT (Closed within Target Time), Ticket Aging, and Quality Days. Overall, this helps IT management decide whether ongoing operations are working well or need some fine-tuning, such as increasing or decreasing resources.
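The article names CWTT and Ticket Aging without giving formulas. The sketch below assumes CWTT is the percentage of closed tickets resolved within a per-priority target time, and that ticket aging is how long each still-open ticket has been open. The `TARGET` durations are hypothetical placeholders, not values from any standard. Quality Days is left out because its definition is not given here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical resolution targets per priority (1 = most critical).
TARGET = {
    1: timedelta(hours=4),
    2: timedelta(hours=8),
    3: timedelta(days=1),
    4: timedelta(days=3),
    5: timedelta(days=5),
}


@dataclass
class Ticket:
    ticket_id: str
    priority: int
    opened: datetime
    closed: Optional[datetime] = None  # None while the ticket is still open


def cwtt(tickets: list[Ticket]) -> float:
    """Percentage of closed tickets resolved within their target time."""
    closed = [t for t in tickets if t.closed is not None]
    if not closed:
        return 0.0
    on_time = sum(1 for t in closed if t.closed - t.opened <= TARGET[t.priority])
    return 100.0 * on_time / len(closed)


def ticket_aging(tickets: list[Ticket], now: datetime) -> dict[str, timedelta]:
    """Age of each still-open ticket, oldest first."""
    still_open = sorted((t for t in tickets if t.closed is None), key=lambda t: t.opened)
    return {t.ticket_id: now - t.opened for t in still_open}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    tickets = [
        Ticket("INC-1", 1, t0, t0 + timedelta(hours=3)),  # within the 4 h target
        Ticket("INC-2", 1, t0, t0 + timedelta(hours=6)),  # missed the 4 h target
        Ticket("INC-3", 3, t0),                           # still open
    ]
    print(cwtt(tickets))  # 50.0
    print(ticket_aging(tickets, t0 + timedelta(days=2)))
```

A falling CWTT or a growing list of old open tickets is the kind of signal that can prompt management to adjust resources.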
This is Incident Management in a nutshell. Remember that in any IT service, this is one of the main processes needed as soon as the service is set up.