Understanding Service Performance in IP Networks
Application and Network Performance Monitoring in Real-Time
As service providers move to an IP-based infrastructure for greater flexibility and reduced operational costs, all carriers face a fundamental reality: they must understand service performance in a network where communication is broken into packets, and each packet may or may not follow the same path from sender to receiver. What does this mean on a practical level? IP networking lets the network route packets dynamically over multiple paths (non-deterministic) and reassembles them at the end point, so the receiver is completely unaware of the route each packet traveled. In earlier network architectures, all communication traveled over a dedicated path (deterministic), in sequence from start to finish.
In a deterministic network, the path could be engineered to deliver 99.999% (five-nines) reliability. In today’s IP world, there is no “one” path to engineer or monitor. In fact, because packets travel multiple paths, an operator needs a global, end-to-end view of the network.
Given the complexity of IP networks and the non-deterministic flow of traffic, carriers need a new approach to service assurance: a service triage approach. Much like medical triage, service triage requires quickly assessing all the elements in the service-delivery process and focusing on those areas that are problematic while eliminating those that are not.
Let me use a simple diagram of a network (fig. 1) to demonstrate. As a colleague of mine likes to say, if you measure the collective performance of all the elements in the service-delivery process (fig. 1) as A+B+C+D+E, you may see relatively high end-to-end service-delivery performance when, in fact, a large number of subscribers are having service issues. Let’s suppose that one of the ‘C’ servers is down, and that this server provides authentication services. Users routed to that server would fail to authenticate and be denied access to the service. With the A+B+C+D+E assurance approach, many subscribers may be denied service or experience poor service and never be counted or recognized, because their failures are averaged in with the successful traffic everywhere else. A better approach is to measure each domain separately: the performance of A, the performance of B, the performance of C, and so on. By measuring performance as A, B, C, D, E (“A comma B comma C…”), the operator would be able to see issues within the ‘C’ domain. This service triage approach allows operators to quickly rule out parts of the network that are operating well.
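To make the arithmetic concrete, here is a minimal Python sketch of the two ways of measuring. Everything in it is an illustrative assumption rather than data from a real network or the behavior of any product: the element names (a1, c10, and so on), the transaction counts, the ten-server ‘C’ domain, and the 95% alert threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementStats:
    domain: str      # service-delivery domain from fig. 1, e.g. "C"
    element: str     # hypothetical element name, e.g. "c10"
    attempts: int    # transactions the element was asked to handle
    successes: int   # transactions it completed successfully


def success_rate(stats):
    """Pooled success rate over a collection of elements."""
    attempts = sum(s.attempts for s in stats)
    successes = sum(s.successes for s in stats)
    # Assumption: a group that saw no traffic is reported as healthy.
    return successes / attempts if attempts else 1.0


def rates_by(stats, key):
    """Group elements by key(element) and compute one rate per group."""
    groups = {}
    for s in stats:
        groups.setdefault(key(s), []).append(s)
    return {name: success_rate(group) for name, group in groups.items()}


def below(rates, threshold):
    """Names of the groups whose success rate is under the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


# Hypothetical counts: one of ten 'C' (authentication) servers is down.
stats = [
    ElementStats("A", "a1", 10_000, 9_990),
    ElementStats("B", "b1", 10_000, 9_985),
    *[ElementStats("C", f"c{i}", 1_000, 998) for i in range(1, 10)],
    ElementStats("C", "c10", 1_000, 0),
    ElementStats("D", "d1", 9_000, 8_990),
    ElementStats("E", "e1", 9_000, 8_991),
]

THRESHOLD = 0.95  # assumed alerting threshold

# "A+B+C+D+E": a single pooled number for the whole delivery chain.
aggregate = success_rate(stats)

# "A, B, C, D, E": one number per domain, then per element for drill-down.
domain_rates = rates_by(stats, key=lambda s: s.domain)
element_rates = rates_by(stats, key=lambda s: s.element)

flagged_domains = below(domain_rates, THRESHOLD)
flagged_elements = below(element_rates, THRESHOLD)

print(f"aggregate: {aggregate:.1%}")
for name, rate in sorted(domain_rates.items()):
    print(f"domain {name}: {rate:.1%}")
print("flagged domains:", flagged_domains)
print("flagged elements:", flagged_elements)
```

With these made-up counts the pooled figure stays near 98%, which clears the assumed threshold, while the ‘C’ domain alone falls to roughly 90% and the drill-down isolates the single failed server. The other four domains can be ruled out immediately, which is the point of triage.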
In the real world, IP-based services are much more complex than the simple scenario above. Whether offering TV Everywhere, Voice over LTE (VoLTE), or Carrier WiFi, there are many layers of the network (fig. 2) that packets must traverse to create and maintain a successful user session. While some of the elements serve a dedicated purpose, many of them serve multiple purposes.
Because of the nature of IP networks, problems identified at the service triage level tend to affect far more subscribers, and can be found far more quickly, than problems found with a bottom-up approach. The older, reactive, bottom-up approach focuses on individual subscriber problems and works up to see whether the problem exists for larger communities of users; service triage, a proactive, top-down approach, looks for systemic problems affecting large numbers or whole communities of users. To understand non-automated service triage of today’s IP networks, one need only look to the “war room” when problems arise. The war room in essence pulls together all key players responsible for services in the network. In the war room, teams go methodically through each facet of the network, incrementally either eliminating or retaining key personnel until those ultimately responsible are able to address the problem.
In an automated approach with advanced analytics, service triage is performed automatically and reported proactively on dashboards. From the dashboard, operations teams should be able to quickly click to drill down into the elements involved and the forensic details of the problem. With this approach, war rooms become a thing of the past, and “situation rooms” providing proactive early warning become the norm.
Today, service providers around the globe are using NETSCOUT’s nGeniusONE service assurance platform for real-time, automated service triage, saving Operations and Engineering teams time and money while increasing the quality of the services they provide their customers. For more information please visit: www.NETSCOUT.com.