We have all heard by now about the outages experienced yesterday by United Airlines (UAL), the New York Stock Exchange, and The Wall Street Journal. What we do not know is what exactly happened or what the economic impact was or will be.
What we do know is that systems are more complex and manage more variables than ever before. Take UAL, for instance. Nearly every aspect of its systems is inextricably linked. When we fly today we make reservations through agents, web sites, mobile apps and airline call centers. We may change our travel, our times and our seats. We may carry a web-printed paper ticket, a smart phone or a smart watch to get through security. We may print a ticket at a kiosk or get one from a skycap or agent. When we go through security we are scanned by TSA and our ticket is validated. When we board the plane this happens again, and if any one part of this chain fails, nearly all of it can come to a crash landing.
I have no doubt that UAL has some of the brightest technology minds in the business. In addition to keeping this super complex network running 24/7/365 worldwide they are now faced with merging it all with Continental Airlines, no small feat for any company and especially complex in the airline space. So what is the point?
There is a lot more riding on our systems than ever before. The Wall Street Journal said, “Reliability remains elusive,” yet most American companies enjoy excellent uptime records. The issue here is change. We are changing the way we build, manage and operate systems to improve service quality, add services and reduce operational costs. This means companies incur risk, and they need to be careful to instrument both before and after new initiatives are introduced. You cannot measure what you do not instrument, and you cannot manage what you do not measure.
Forrester Research reported that 90% of the time spent fixing problems goes to identifying exactly what the problem is. The reason this takes so long is that there are too many purpose-built tools and no single version of the truth. If companies want to radically reduce Mean Time to Know (MTTK), they need to take a holistic, all-seeing approach.
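To see why the 90% figure matters so much, here is a rough back-of-the-envelope sketch in Python. The 10-hour incident and the 50% cut in identification time are made-up numbers for illustration only, and the function name is hypothetical; the only input taken from the Forrester figure is the 90% share.

```python
def total_resolution_hours(total_hours, know_share, mttk_reduction):
    """Return end-to-end resolution time after shortening the identification phase.

    total_hours:    current time to resolve an incident (hypothetical value)
    know_share:     fraction of that time spent identifying the problem
    mttk_reduction: fraction by which identification time is cut (assumption)
    """
    know_hours = total_hours * know_share   # time spent finding the problem
    fix_hours = total_hours - know_hours    # time spent actually fixing it
    return know_hours * (1 - mttk_reduction) + fix_hours


baseline = 10.0  # hypothetical incident length in hours
after = total_resolution_hours(baseline, know_share=0.9, mttk_reduction=0.5)
print(f"{baseline:.1f} h -> {after:.1f} h")
```

Under these assumptions, halving the time to know takes the incident from 10 hours to 5.5, while halving the fix itself would only take it to 9.5. When identification dominates, that is where the leverage is.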
This is where a network traffic-based monitoring approach would reign supreme. All of you NETSCOUT users know what I am talking about.
In sum, I respect all of the professionals involved in running these systems. Systems are complex, dynamic, growing, changing and under constantly varying workloads and still they need to work, always.
We have all come to depend on the airlines, newspapers and stock exchanges. Their absence is not an option. Just as companies are spending millions on security, it is imperative that they invest wisely in a total Service Assurance strategy. This is the primary reason NETSCOUT has acquired Arbor Networks, Tektronix Communications, Fluke Networks and VSS Monitoring in one fell swoop.
If there is one thing we do well, it is instrument, monitor and manage complex systems. 80% of surveyed NETSCOUT customers reported reducing MTTK by 80%. So while The Wall Street Journal may say reliability remains elusive, we know how to impact it and the corporate bottom line. As far as NETSCOUT is concerned… #ThereIsNoOff.