Troubleshooting a “Join Audio & Video” Delay with UCaaS using NETSCOUT Smart Edge Monitoring
Unified Communications-as-a-Service (UCaaS) adoption surged during the COVID-19 pandemic. Cisco Webex, Microsoft Teams, and Zoom became household names and a standard part of many business continuity plans for meetings and corporate communications during that period. Because work from home (WFH) will remain part of future business models, use of these UCaaS technologies will continue, which makes a high-quality user experience essential to employee and business productivity.
Like many enterprises, this company was using Cisco Webex for conference meetings (audio, video, collaboration). Over time, IT started to receive reports of slow audio connections from users joining meetings from locations in the Western United States. Normally, invitees to a meeting would log into the Webex session from an email invitation. Users could successfully access the session; however, when they clicked the “Join Meeting” button, the audio and video took between 30 seconds and a few minutes to start. Some users were also experiencing poor-quality sessions overall. It was no surprise that IT began receiving complaints.
The affected attendees were frustrated with the delay, and so were the hosts and other invitees, who had to wait for all participants to join and be able to contribute before the meeting could start. This was unproductive for everyone and embarrassing for the company when customers and partners were also participating.
The impact on IT was multifaceted. For a paid UCaaS solution, they knew they were protected by service level agreements for quality delivery of service. However, the UCaaS vendor was not the only technology involved. The problem could be anywhere in the communications path, from the end-user/client side through to the UCaaS hosting locations. There were gaps in visibility throughout the path, and the delay could be in one of those areas or with a protocol or application dependency. IT needed a way to improve their mean time to knowledge (MTTK), reduce time lost to finger-pointing among the various third-party vendors, and isolate the true cause of the audio problem.
Using their Smart Edge Monitoring solution for visibility, the IT team first deployed software nGenius®PULSE 3000 Series Virtual nPoints on several of the laptops used by affected employees. The IT team configured a series of business transaction tests (BTTs) for the nPoints to perform against the Webex service. Because the WFH users were all using full VPN services that ran through the corporate network, their activity was also being monitored by the NETSCOUT® InfiniStreamNG® (ISNG) appliances in the data center for real-time analysis and views in the nGeniusONE® Service Assurance platform.
Once the BTTs were scheduled for regular, consistent analysis, the nPoints were able to orchestrate the steps a user would take throughout a Webex meeting lifecycle: navigate to a meeting, start the audio, start the video, attend the meeting, and leave (log out of) the meeting. Each nPoint would send the smart data collected for each test to the nGeniusONE server with Edge Adaptor so the details for each step could be tracked and trended over time (as exhibited in Figure 1). Should delays be detected, such as a 30-second “join audio” time, the nPoint could be configured to send an alert to nGeniusONE.
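The idea of timing each step of a scripted transaction and flagging a step that crosses a threshold can be sketched in a few lines of Python. This is only an illustration of the concept, not the nPoint implementation or any NETSCOUT API. The names `StepResult`, `run_transaction`, and `alerts` are hypothetical, as is the assumption that each step is a plain callable; the step actions themselves are placeholders to be supplied by the reader.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StepResult:
    name: str        # step label, e.g. "start_audio" (hypothetical naming)
    seconds: float   # measured duration of the step
    breached: bool   # True if the step met or exceeded its threshold


def run_transaction(
    steps: List[Tuple[str, Callable[[], None]]],
    thresholds: Dict[str, float],
    clock: Callable[[], float] = time.perf_counter,
) -> List[StepResult]:
    """Run each step in order, timing it and comparing it to its threshold."""
    results = []
    for name, action in steps:
        start = clock()
        action()                      # placeholder for the real step logic
        elapsed = clock() - start
        limit = thresholds.get(name)  # steps without a threshold never breach
        breached = limit is not None and elapsed >= limit
        results.append(StepResult(name, elapsed, breached))
    return results


def alerts(results: List[StepResult]) -> List[str]:
    """Build one alert message per step that breached its threshold."""
    return [f"{r.name} took {r.seconds:.1f}s" for r in results if r.breached]
```

With a threshold such as `{"start_audio": 30.0}`, a run in which the audio step takes 30 seconds or longer would yield one alert message for that step, while the faster steps are recorded for trending without raising anything.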
Testing did find delays for the Start Audio step, which confirmed the user experience complaints (Figure 2). The next challenge was determining what was causing the delay and why. The value of the Smart Edge Monitoring solution lies in combining nGeniusPULSE nPoint synthetic test monitoring, which detects what problems are emerging and at which specific point in a transaction, with InfiniStreamNG appliances and the nGeniusONE Service Assurance solution with Edge Adaptor, which reveal why the problem is occurring.
nGeniusONE, with smart data from the ISNG appliances at the data center, can be used to investigate several potential root causes, from analyzing media metrics (e.g., MOS, QoS, jitter, latency, packet loss) to evaluating other protocols and applications in the same network segments, including those the UCaaS service depends on to operate effectively.
In this case, the IT team’s analysis with nGeniusONE traced the slow start of audio and video to DNS. Using the Universal Monitor feature, they evaluated the Session Overview details, which showed that two DNS requests were being sent for a Webex session, each to a different DNS server (Figure 3). The request to the first server was timing out, essentially a failure, whereas the second request completed in a couple hundred milliseconds. The problem was determined to be the first DNS server.
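The same symptom, one resolver timing out while another answers quickly, can be reproduced with a small standalone check. The following is a minimal sketch using only the Python standard library, assuming plain UDP DNS on port 53 and an A-record lookup. It is not how nGeniusONE performs its analysis; the function names (`build_query`, `time_dns_query`, `classify`) and the 1000 ms "slow" cutoff are illustrative assumptions.

```python
import socket
import struct
import time
from typing import Optional


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_dns_query(server: str, hostname: str,
                   timeout: float = 2.0, port: int = 53) -> Optional[float]:
    """Return the response time in milliseconds, or None on timeout or error."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers socket timeouts and unreachable servers
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Accept the reply only if its transaction ID matches the query
    if reply[:2] != query[:2]:
        return None
    return elapsed_ms


def classify(elapsed_ms: Optional[float], slow_ms: float = 1000.0) -> str:
    """Label a measurement; the slow_ms cutoff is an arbitrary assumption."""
    if elapsed_ms is None:
        return "timeout"
    return "slow" if elapsed_ms > slow_ms else "ok"
```

Calling `classify(time_dns_query(server, hostname))` once per configured resolver, with the resolver addresses and Webex hostname supplied by the reader, would show a pattern like the one described above: "timeout" for the failing server and "ok" for the healthy one.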
The fix was a simple reconfiguration to target the properly operating DNS server rather than the one timing out, which eliminated the delay in joining audio and video in the Webex sessions. Had the IT team found the problem to be associated with the ISP or their UCaaS provider, they could have shared the evidence found using Smart Edge Monitoring to investigate and resolve the issue collaboratively, saving time and avoiding protracted frustration for end users.
This use case highlights the power of Smart Edge Monitoring. The improved MTTK and mean time to repair (MTTR) were made possible by the integration of active and passive monitoring technologies: nGeniusPULSE nPoints provided smart data from BTTs to the nGeniusONE server with Edge Adaptor for analysis, alerts, and views in nGeniusONE, and nGeniusONE delivered more detail from the enterprise environment to determine why the problem existed and pinpoint the root cause of the degradation. IT can say, “Hey, there is an issue impacting end-user experience.” And with this powerful combination in Smart Edge Monitoring, they can confidently add, “And here’s why!”