Company Reduces MTTR from 15 Hours to 15 Minutes With NETSCOUT Smart Edge Monitoring
Customer Profile
- Thousands of clients globally access applications online
The Challenge
- Company experienced several outages, some lasting up to 15 hours
- Protracted war room efforts revealed a lack of proactive application visibility in key network edge locations
Solution in Action
- nGeniusONE® Service Assurance platform
- InfiniStreamNG® 9800 series hardware appliances and vSTREAM® virtual appliances
- nGenius® Packet Flow Operating System (PFOS) for Certified 7100 Series Packet Brokers
- NETSCOUT® Onsite Engineer
The Results
- Dramatic reduction in MTTR to resolve customer-facing application degradations
- Elimination of costly, inefficient war room sessions
Customer Profile
From its beginnings in the mid-1900s, this focused, driven corporation grew from a small local business into a major international leader in business process outsourcing (BPO), human resources, and human capital management (HCM). Supporting clients around the world, the company provides cloud-based applications for a variety of human resources and employee management activities.
At the forefront of most major technology advancements in its industry, the company has made digital transformation a point of distinction. It is acutely aware of the importance of high-quality performance in every client interaction: disruptions are unacceptable when customer deadlines are at stake. Its IT organization is committed to ensuring that all users have a high-quality experience with every application, wherever they access it.
The Challenge
The company’s online application services require constant availability, with consistent responsiveness and performance for tens of thousands of clients and millions of end-users accessing essential information. The company had recently experienced more than a dozen outages that each took more than five hours to triage, troubleshoot, and resolve. In some cases, a war room had to be spun up involving 20 or more IT staff, vendors, and third-party service providers at all hours of the day and night. Not surprisingly, some of these protracted outages drew the scrutiny of the new CIO and led to several action plans to reduce and avoid similar incidents.
The objectives set by the executive management team focused on providing the highest quality of experience for the company’s global online customers.
The goals included:
- Rapid advancement from a reactive network management approach to a proactive application service assurance model.
- Operational improvements, including performance baselining and visibility into capacity trends.
- Monitoring visibility to fill the gaps at key network, data center, and cloud service edges created by migration to a complex hybrid-cloud environment.
- Proactive detection and notification of emerging problems across all service tiers, before they could degrade service or become broad-reaching.
- Measurable reduction in mean time to triage, troubleshoot, and restore services (MTTR).
- Quantification of the impact of degradations by baselining normal network and application utilization and performance.
- Reduction in lengthy, costly troubleshooting efforts, with one immediate priority: “No more war rooms!”
Lacking visibility and capable monitoring tools, the IT team had struggled in the past to troubleshoot disruptions and degradations quickly and effectively. A new, more comprehensive solution would be required to meet these goals and objectives and to achieve the service standards that executive leadership and IT staff wanted for employees, clients, and the millions of users of the company’s online human resources applications.
Solution in Action
After rigorously analyzing the network and application performance management options available to the organization, the IT team determined that NETSCOUT, with its recently introduced Smart Edge Monitoring solution, provided the detailed visibility, proactive analysis, early detection of emerging problems, troubleshooting workflows, and trending for baselining and capacity planning needed to meet the CIO’s directives. In collaboration with NETSCOUT, the team designed a Smart Edge Monitoring visibility strategy that included:
- nGeniusONE Service Assurance solution, providing single-pane-of-glass analysis for triage, alarming, troubleshooting, trending, and network and application performance assurance.
- nGenius Packet Flow Operating System (PFOS) for Certified 7000 series packet brokers, to distribute packets from the high-capacity 40 Gbps links in the data centers to downstream devices, including InfiniStreamNG appliances and other tools.
- InfiniStreamNG 9000 series certified appliances, to fill the visibility gap at the network edges where the WAN and Internet connect and at the data center core’s 40 Gbps segments, monitoring north-south traffic into and out of the data center environment and through key network switches.
- vSTREAM virtual appliances, to monitor east-west traffic at the service edges within the VMware virtualized environment, where there had been a visibility gap, providing details on activity between Web, application, and database servers.
- NETSCOUT Onsite Engineer (OSE) for dedicated support of the overall NETSCOUT solution, from configuring dashboards, workflows, and alarms to creating scheduled reports.
The InfiniStreamNG and vSTREAM appliances use NETSCOUT’s Adaptive Service Intelligence® (ASI) technology to convert packet-based traffic into smart data, which nGeniusONE consumes for performance analytics that provide real-time visibility into and insight about the company’s networks and applications.
The Results
Although the deployment is still in its early stages, the collaboration between the company’s IT team and the NETSCOUT OSE is already detecting issues and identifying their causes. While monitoring the network before and after the added traffic load of a regularly scheduled stress test of one of their customer-facing applications, they discovered bottlenecks on the Internet links at the network edge that could have affected thousands of end-users connecting to their accounts. The evidence in nGeniusONE showed that the firewalls were oversubscribed. Without the need for an inefficient, costly war room session, the visibility delivered by NETSCOUT’s Smart Edge Monitoring solution during this software pre-launch test gave the IT team the details needed to upgrade the firewalls and fix the problem. It also averted a potentially challenging, and embarrassing, customer-impacting slowdown during a peak session.
Implementing a comprehensive network and application performance management solution for end-to-end visibility not only helped this company’s IT team meet its goal of eliminating war rooms, but also reduced the average time to research, troubleshoot, and remediate problems from 10 to 15 hours per incident to 15 to 20 minutes. A dramatic reduction in MTTR! The improvement is noticeable to employees, customers, and the executive leadership team.
With the help of the NETSCOUT OSE, the IT team is realizing quick time to value from the overall Smart Edge Monitoring solution, and their coordinated efforts led them to identify the potential slowdown at the firewalls on their ISP links. Additionally, the OSE and IT team used nGeniusONE visibility into the Web portal that customers use for a variety of services to ensure that expectations for a quality user experience were met and sustained. When services must perform flawlessly for millions of users around the world, expertise and visibility are essential to success.