Reducing Mean-time-to-Repair (MTTR)
When Adding Resources Isn’t an Option
Updated for 2019
Cable providers are doing much more these days than just delivering broadcast TV channels: operators are now focused on integrating a collection of IP-based services such as voice and Internet access, as well as supporting tablets and other connected devices for video-on-demand and video streaming services. According to a recent IDG survey, the average home is also seeing an explosion of mobile devices: 75% of consumers use a smartphone to watch online videos, compared with 61% in 2012.
How to reduce your Mean-Time-To-Repair
As these new services are rolled out, the challenge is addressing the inevitable network outages and bandwidth congestion events. Traditionally, customer-impacting events have been measured by the mean-time-to-repair (MTTR). Unfortunately, when MTTR is treated as a single number, the only way to reduce it is to either have shorter incidents or commit a greater number of resources. Sadly, operators don’t get to pick the nature of service issues, and dedicating more resources isn’t an option.
That doesn’t mean operators can’t impact MTTR. In fact, operators can significantly impact both the economics of service assurance and the MTTR. To do this, they need to focus on the components, or stages, of MTTR.
What Are the Four Stages of MTTR?
There are four stages that make up MTTR (fig. 1): identification, knowledge, fix and verify. Not all of these stages require the same amount of time and resources; therefore, by breaking down MTTR into these stages, it is possible to gain a better understanding of where most of the time is spent. Let’s look at each stage:
The first stage is identification. Mean-time-to-identify (MTTI) is the period of time from the start of an outage or service degradation until a NOC (Network Operations Center) or SOC (Service Operations Center) becomes aware of the issue. In the case of a fiber cut, the MTTI can be fairly short. In the case of a DNS issue, it could be hours; or, in the case of an Access Point (AP) firmware issue, it could be weeks. Having advance warning of service degradations can significantly reduce the MTTI and give operators insight into a degraded or poor user experience.
The next stage of MTTR is knowledge. Mean-time-to-knowledge (MTTK) is the period of time after a problem has been identified, but before repairs have begun. This stage is generally spent in the “War Room,” trying to establish where the problem is and which network elements are involved. In most cases, this is the most time-consuming aspect of the resolution process. It is also the one where advanced service assurance technology, such as NETSCOUT’s nGeniusONE platform, can be game-changing in terms of shortening MTTK. nGeniusONE offers both early warning of service issues and the ability to quickly drill down and understand which elements in the network are the root cause of the problems.
The third stage is actually fixing the problem. Reducing the Mean-Time-to-Fix (MTTF) can be accomplished by using element tools such as log files, session trace and packet capture to guide operations and engineering teams to the precise cause of the problem, enabling rapid resolution.
The last stage is Mean-Time-to-Verify (MTTV), the time spent ensuring that the fix that has been applied is actually working. Once a fix or patch has been made, the appropriate teams need to work together to verify that the fix works. A real-time network monitoring solution is needed to provide views and reports that prove the fix worked and that the network, application or service is functioning normally.
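To make the four-stage breakdown concrete, here is a minimal Python sketch of how the stage averages could be computed from incident timestamps. It is an illustration only: the Incident record, its field names, and the sample timestamps are assumptions made up for this example, not data or an API from nGeniusONE or any other tool. It also assumes the stages run back to back, so MTTR is simply the sum of the four stage means.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List

STAGES = ("MTTI", "MTTK", "MTTF", "MTTV")


@dataclass
class Incident:
    """Timestamps for one incident (hypothetical field names)."""
    started: datetime     # outage or degradation begins
    identified: datetime  # NOC/SOC becomes aware of the issue
    diagnosed: datetime   # cause located; repairs can begin
    fixed: datetime       # fix or patch applied
    verified: datetime    # fix confirmed to be working


def _minutes(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in minutes."""
    return (end - start).total_seconds() / 60


def stage_minutes(incident: Incident) -> Dict[str, float]:
    """Minutes one incident spent in each of the four stages."""
    return {
        "MTTI": _minutes(incident.started, incident.identified),
        "MTTK": _minutes(incident.identified, incident.diagnosed),
        "MTTF": _minutes(incident.diagnosed, incident.fixed),
        "MTTV": _minutes(incident.fixed, incident.verified),
    }


def stage_means(incidents: List[Incident]) -> Dict[str, float]:
    """Mean minutes per stage across incidents, plus their sum as MTTR.

    Raises statistics.StatisticsError if the incident list is empty.
    """
    per_incident = [stage_minutes(i) for i in incidents]
    means = {s: mean(d[s] for d in per_incident) for s in STAGES}
    means["MTTR"] = sum(means.values())
    return means


# Made-up sample incidents, for illustration only.
SAMPLE = [
    Incident(
        started=datetime(2019, 1, 7, 10, 0),
        identified=datetime(2019, 1, 7, 10, 30),
        diagnosed=datetime(2019, 1, 7, 12, 30),
        fixed=datetime(2019, 1, 7, 13, 0),
        verified=datetime(2019, 1, 7, 13, 20),
    ),
    Incident(
        started=datetime(2019, 1, 15, 9, 0),
        identified=datetime(2019, 1, 15, 9, 10),
        diagnosed=datetime(2019, 1, 15, 10, 10),
        fixed=datetime(2019, 1, 15, 10, 40),
        verified=datetime(2019, 1, 15, 10, 50),
    ),
]

for stage, minutes in stage_means(SAMPLE).items():
    print(f"{stage}: {minutes:.0f} min")
```

With these made-up timestamps, the knowledge stage accounts for 90 of the 155 minutes of average MTTR, which is the kind of imbalance the stage-by-stage view is meant to expose.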
Addressing Service Issues Before They Impact Customers
Reducing MTTR, MTTI, MTTK, MTTF and MTTV is imperative for operators looking to remain competitive in a rapidly evolving marketplace. Fortunately, the latest service assurance tools are enabling operators to find and address issues before they impact customers, ensuring the highest quality user experience and the best opportunity for the carrier to shine.