Fixing Service Failures with Real-Time Solutions

Before addressing how service failures can be repaired with real-time solutions, the conflicting definitions of what real-time means in an on-demand world need to be clarified so that enterprises and service providers share the same expectations of what can be achieved. So-called real-time tools are not instant, because time is needed to collect and analyze data and arrive at an actionable insight. Increased automation and the use of technologies such as artificial intelligence and anomaly detection can quickly strip non-relevant data out of the process, accelerating it and reducing costs by not performing analytics on irrelevant inputs.

This approach is sometimes described as smart data and involves selecting only the most relevant data for analysis in order to accelerate decision-making so it is close to real-time. This might involve targeting seemingly anomalous data, or using machine learning to select the kinds of data that have previously proved useful. However, this is not done in absolute real-time. Instead, results in the form of actionable business insights are revealed in a timeframe that can be described as rapid. A loose definition would be business insights delivered in a timely manner so the organization can react before the issue or its impact is experienced by the customer. Typically this will be a timeframe fast enough that the user experience is not impacted in a detectable way – fractions of a second or just a few seconds.
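As a rough illustration of targeting seemingly anomalous data, the sketch below keeps only the samples that deviate strongly from a recent baseline and passes nothing else on for analysis. The function name `select_anomalous`, the baseline size and the threshold of three standard deviations are assumptions chosen for the example, not something prescribed by any particular smart data product.

```python
from statistics import mean, stdev


def select_anomalous(samples, baseline_size=20, threshold=3.0):
    """Return (index, value) pairs that deviate sharply from the baseline.

    The first `baseline_size` samples are treated as normal behaviour
    (an assumption for this sketch); later samples are kept only if they
    lie more than `threshold` standard deviations from the baseline mean.
    """
    baseline = samples[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)

    selected = []
    for i, value in enumerate(samples[baseline_size:], start=baseline_size):
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            is_anomalous = value != mu
        else:
            is_anomalous = abs(value - mu) / sigma > threshold
        if is_anomalous:
            selected.append((i, value))
    return selected


# Hypothetical latency readings in milliseconds: steady, then one spike.
readings = [10.0, 11.0] * 10 + [10.5, 50.0, 10.2]
anomalies = select_anomalous(readings)
```

Here only the spike is forwarded for deeper analysis; the other 22 readings are never processed further, which is where the time and compute savings come from.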

The acceleration of the process by smart data tools does not mean that the system is ignoring potentially useful data; it is simply selecting the most relevant data. Organizations can therefore rely on smart data tools not to miss key data points in the haste to get to actionable insights. Such tools incorporate advanced data analytics capabilities with built-in intelligence to ensure only relevant data – rather than all data – is analyzed. This automated capability is vital because of the costs associated with processing vast volumes of irrelevant information. The shortage of data analysts and data scientists means their time should be spent only on the most relevant data, and the cost of compute capacity – remember, cloud isn’t free – can be contained by only analyzing data that is likely to be relevant.

Fast-reacting, accurate smart data tools that are aware of the context in which they are operating are therefore foundational to enabling service failures to be fixed in near real-time. For organizations that are deploying SD-WAN, for example, there will be added advantages because the technology extends a service provider’s ability to understand what’s going on in the network and, by extension, to have some insight into the activities going on at each of an organization’s premises. Knowing what is likely to be the root cause of a service failure means specific data sources can be targeted to reach a speedy resolution with minimal analysis of non-relevant data.

This is an improvement because, before SD-WAN, the access pipe could only be described as either working or not working. Degradation was hard to diagnose until a failure occurred, at which point a truck roll and field technicians were required to locate the problem, causing cost and delay. With SD-WAN, the service provider can identify both degradation and outright failures, giving its operations teams greater insight and enabling it to support customer organizations better.

For example, knowing that performance is degrading at a specific item of network equipment means it can be proactively repaired or replaced. That might involve replacing it when an engineer is next on site, thereby avoiding a costly truck roll. Alternatively, when a failure does occur, the problem can be identified quickly because the system knows performance had been degrading at a specific node, so that node is the likeliest location of the fault and a fix can be made there.
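One simple way to picture how degradation at a specific node might be spotted before it fails is to fit a trend line to each node's recent measurements and flag nodes whose latency is steadily rising. The sketch below does this with an ordinary least-squares slope; the node names, the sample values and the `min_slope` threshold are all hypothetical, and real SD-WAN telemetry would be richer than a single latency series.

```python
def slope(values):
    """Least-squares slope of values against their sample index (0, 1, 2, ...).

    Requires at least two values.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def degrading_nodes(latency_by_node, min_slope=1.0):
    """Return {node: slope} for nodes whose latency rises by at least
    `min_slope` milliseconds per sample (threshold is an assumption)."""
    flagged = {}
    for node, series in latency_by_node.items():
        if len(series) < 2:
            continue  # not enough history to estimate a trend
        trend = slope(series)
        if trend >= min_slope:
            flagged[node] = trend
    return flagged


# Hypothetical per-node latency history in milliseconds.
history = {
    "node-a": [20, 20, 21, 20, 20],  # stable with a little noise
    "node-b": [20, 23, 26, 29, 32],  # climbing steadily
}
flagged = degrading_nodes(history)
```

In this example only `node-b` is flagged, so it could be scheduled for replacement at the next site visit, or checked first if a failure is later reported on that path.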

Smart data relies on end-to-end pervasive visibility across all data and networks. It relies on service assurance data as one key indicator, but service assurance itself relies on smart data to enable fixes to be made rapidly. Without smart data, operators are left to analyze vast volumes of data as they seek out the cause of an issue. Typically this involves working through misleading indicators, because some data sources reveal faults that are a consequence of the root cause of an issue rather than the source of the issue itself. Sifting these erroneous indicators out of the process takes time and consumes computing resources.

Smart data tools that are deployed across the organization and that continually learn the likeliest causes of faults can enable service assurance fixes to be made in near real-time. This ensures maximized uptime and network utilization, minimized truck roll and, ultimately, an improved customer experience delivered by the service provider at lower cost.

~Written by George Malim. George is a freelance journalist who covers the telecoms and internet markets.