Problem Management's Role in an MSP

By Jeff Umberger, August 10, 2021

Problem Management Defined

Problem management works to prevent incidents and limit their impact, and to keep incidents that have already occurred from happening again. This starts with identifying a problem and finding ways to mitigate or fix the underlying cause. A “problem,” according to the ITIL definition, is “the underlying cause of one or more incidents.” From an IT perspective, problem management identifies problems in the technology environment, researches and confirms the root cause, then applies fixes or workarounds to prevent future incidents and business impact.

Utilization in an MSP

Implementing problem management in an MSP presents unique considerations because the MSP may support multiple clients. Processes may differ from client to client, and support levels for different types of technology may vary by contract. The monitoring an MSP implements may differ from what a client uses internally, and the alarm sensitivity of the two monitoring services may differ as well. For example, MSP monitoring may detect ping loss for a server that appears to be caused by a network issue. If the network is not supported by the MSP under contract, the client may not prioritize or investigate the issue because they don’t necessarily see it as a business impact. If nothing is done about the known error, this can easily lead to recurring ping-loss incidents for this server.

Challenges

Once the problem is defined and problem management determines a root cause, the next step is to implement a fix or a workaround. This can be challenging for the reasons described earlier: processes, contracted support and monitoring may differ from client to client. Implementing a fix will generally involve a change, which may require additional steps to follow the client’s change management process. Implementing a workaround may require navigating the contracted support and monitoring. This could involve creating process documentation specific to the scenario, which the MSP follows to mitigate business impact and quickly resolve the issue if it occurs, or adjusting MSP monitoring to tune out the recurring incident if it is not business impacting or not supported under contract.

Uses

Despite the challenges listed above, problem management can provide many benefits when applied within an MSP. Focused problem management should quickly reduce the number of incidents generated, cutting back on the “noise” from recurring alerts. Less time is wasted on simple tasks, so resources can focus on more urgent incidents. This also helps the bottom line of the related contracts by cutting back on the resources needed to support some of the clients. Problem management can also allow lessons learned from one client to be applied across all clients that the MSP supports. For example, a problem related to a specific vendor’s software or hardware may be identified for one client, and it is then found that multiple clients are using the same technology. The MSP can then proactively prevent business-impacting incidents by applying the necessary fix or notifying clients of the problem so they can correct it. This proactive prevention helps build a stronger relationship with the client.
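
The cross-client check described above can be illustrated with a short sketch. It assumes the MSP keeps some inventory of which technologies each client runs; the `inventory` structure, the technology labels and the `clients_using` helper are hypothetical names chosen for illustration, not a description of any particular tool.

```python
from typing import Dict, List, Set


def clients_using(inventory: Dict[str, Set[str]], technology: str) -> List[str]:
    """Return the clients whose inventory contains the affected technology.

    `inventory` maps a client name to the set of technologies it runs.
    Both the mapping and the labels are assumptions made for this sketch.
    """
    return sorted(
        client for client, technologies in inventory.items()
        if technology in technologies
    )


# Hypothetical inventory: a problem found at client-a involves "vendor-x firmware 1.2".
inventory = {
    "client-a": {"vendor-x firmware 1.2", "vendor-y backup"},
    "client-b": {"vendor-x firmware 1.2"},
    "client-c": {"vendor-z storage"},
}

# Clients to fix proactively or notify about the problem.
to_notify = clients_using(inventory, "vendor-x firmware 1.2")
```

In practice the inventory would likely come from a configuration management database or asset records rather than a hand-written dictionary.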

ivision’s Application

ivision places a strong emphasis on problem management. ivision has implemented automation that generates a problem record when an incident recurs multiple times within a certain time frame. This automatically triggers an investigation by the problem management team to find the root cause and apply a fix or workaround. We’ve implemented additional automation to close out incidents that auto-resolve without troubleshooting; this relies on the problem record creation automation to catch those incidents if they occur multiple times.
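
As a rough illustration of this kind of recurrence rule, the sketch below counts incidents with the same signature inside a sliding time window and signals when a problem record should be opened. It is not ivision’s implementation: the `Incident` fields, the `RecurrenceDetector` class, the threshold of three and the 24-hour window are all assumptions made for the example, and it assumes incidents arrive in chronological order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; a real ticketing system will have its own schema.
    client: str
    config_item: str   # e.g. a server name
    alert_type: str    # e.g. "ping_loss"
    opened_at: datetime


class RecurrenceDetector:
    """Signals a problem when the same incident signature repeats within a window."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(hours=24)):
        self.threshold = threshold          # assumed value, tune per contract
        self.window = window                # assumed value, tune per contract
        self._recent = defaultdict(deque)   # signature -> timestamps inside the window
        self._open_problems = set()         # signatures that already have a problem record

    def observe(self, incident: Incident) -> bool:
        """Record an incident; return True if a new problem record should be opened."""
        signature = (incident.client, incident.config_item, incident.alert_type)
        times = self._recent[signature]
        times.append(incident.opened_at)
        # Drop timestamps that have fallen out of the window.
        while times and incident.opened_at - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold and signature not in self._open_problems:
            self._open_problems.add(signature)
            return True
        return False


# Example: three ping-loss incidents for the same server within one hour.
detector = RecurrenceDetector(threshold=3, window=timedelta(hours=1))
start = datetime(2021, 8, 10, 9, 0)
results = [
    detector.observe(Incident("client-a", "srv01", "ping_loss", start + timedelta(minutes=m)))
    for m in (0, 10, 20, 30)
]
```

Tracking which signatures already have an open problem keeps the rule from creating a duplicate record for every further occurrence. Because the count is kept regardless of how each incident was closed, incidents that auto-resolve without troubleshooting still contribute to the threshold.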

ivision has also implemented internal processes for our engineers and analysts to create problem records manually. This allows a proactive approach to identifying problems: engineers and analysts may recognize a developing pattern or issue that could lead to business impact or incidents further down the road. They include the details of the pattern or potential issue in the record, which gives the problem management team a starting point for investigating the issue and finding the root cause. ivision continues to fine-tune and streamline the problem management process, incorporating industry best practices and implementing automation where applicable.