Cable providers are doing much more these days than just delivering broadcast TV channels: operators are now focused on integrating a collection of IP-based services such as voice and Internet access, as well as supporting tablets and other connected devices for video-on-demand and video streaming services. According to a recent IDG survey, the average home is also seeing an explosion of mobile devices: 75% of consumers use a smartphone to watch online videos, compared with 61% in 2012.
As these new services are rolled out, the challenge is addressing the inevitable network outages or bandwidth congestion events. Traditionally, customer-impacting events have been measured by the mean-time-to-repair (MTTR). Unfortunately, the only direct way to reduce the mean-time is to either have shorter incidents or commit more resources. Sadly, operators don’t get to pick the nature of service issues, and dedicating more resources isn’t an option.
But that doesn’t mean operators can’t impact MTTR. In fact, operators can significantly impact the economics of service assurance and the MTTR. To do this, they need to focus on the components, or stages, of MTTR.
The Four Stages Of MTTR
There are four stages that make up MTTR (fig. 1): identification, knowledge, fix and verify. Not all of these stages require the same amount of time and resources; therefore, by breaking down MTTR into these stages, it is possible to gain a better understanding of where most of the time is spent. Let’s look at each stage:
Mean-time-to-identify (MTTI): This is the period of time from the start of an outage or service degradation until a NOC (Network Operations Center) or SOC (Service Operations Center) becomes aware of the issue. In the case of a fiber cut, the MTTI can be fairly short. In the case of a DNS issue, it could be hours; or, in the case of an Access Point (AP) firmware issue, it could be weeks. Advance warning of service degradations can significantly reduce the MTTI and gives operators insight into a degraded or poor user experience.
Mean-time-to-know (MTTK): This is the period of time from when an issue has been identified until work to fix the problem starts. This is the time typically spent in the “War Room” trying to figure out where the problem is and which network elements are involved. Typically, MTTK requires the most time and the most people. Fortunately, today’s service assurance technology, like NETSCOUT’s nGeniusONE Service Assurance platform, provides both early warning of service issues and the ability to quickly drill down and understand which elements in the network are having problems. Technology like this allows operators to significantly reduce the most time- and labor-intensive part of MTTR by shortening the MTTK.
Mean-time-to-fix (MTTF): This is the time it takes operations, engineering, and suppliers to actually fix the problem. At this stage, element tools such as log files are helpful in addressing the problem. With the aid of session trace and packet capture, these teams will be able to quickly resolve the issue.
Mean-time-to-verify (MTTV): This is the final step in MTTR. Once a fix or patch has been made, the appropriate teams need to work together to verify that the fix works. A real-time network monitoring solution is needed to provide views and reports that prove the fix worked and the network, application or service is functioning normally.
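To make the breakdown concrete, here is a minimal Python sketch of how the four stages could be measured from incident records. It is only an illustration: the Incident record, its field names, and the two sample incidents are invented for this example and do not come from any particular service assurance product. It assumes each incident carries a timestamp for the end of each stage, so that the four stage averages add up to the overall MTTR.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative."""
    started: datetime     # outage or degradation begins
    identified: datetime  # NOC/SOC becomes aware of it (end of MTTI)
    diagnosed: datetime   # cause located, fix work starts (end of MTTK)
    fixed: datetime       # fix or patch applied (end of MTTF)
    verified: datetime    # fix confirmed to work (end of MTTV)


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def stage_means(incidents):
    """Return the mean duration in hours of each stage and of the whole repair."""
    return {
        "MTTI": mean(hours(i.identified - i.started) for i in incidents),
        "MTTK": mean(hours(i.diagnosed - i.identified) for i in incidents),
        "MTTF": mean(hours(i.fixed - i.diagnosed) for i in incidents),
        "MTTV": mean(hours(i.verified - i.fixed) for i in incidents),
        "MTTR": mean(hours(i.verified - i.started) for i in incidents),
    }


# Two made-up incidents, purely for illustration.
SAMPLE = [
    Incident(
        started=datetime(2024, 1, 1, 0, 0),
        identified=datetime(2024, 1, 1, 1, 0),
        diagnosed=datetime(2024, 1, 1, 5, 0),
        fixed=datetime(2024, 1, 1, 6, 0),
        verified=datetime(2024, 1, 1, 6, 30),
    ),
    Incident(
        started=datetime(2024, 1, 2, 0, 0),
        identified=datetime(2024, 1, 2, 3, 0),
        diagnosed=datetime(2024, 1, 2, 9, 0),
        fixed=datetime(2024, 1, 2, 11, 0),
        verified=datetime(2024, 1, 2, 11, 30),
    ),
]

means = stage_means(SAMPLE)
for stage, value in means.items():
    print(f"{stage}: {value:.1f} h")
```

In this invented sample, diagnosis averages 5 of the 9 hours of overall repair time, which is the kind of imbalance a single MTTR figure would hide.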
If operators want to get serious about reducing the duration of service issues, they need to move beyond the simple MTTR metric. Operators need to focus on the components of MTTR: MTTI, MTTK, MTTF and MTTV. By focusing on these elements, operators will be able to reduce their overall mean-time. Fortunately, today’s service assurance tools are allowing operators to make significant strides, and even to find and address issues before they impact customers.