Network Troubleshooting: It’s Time to Rethink Our Methods
Network Performance Management Leaders
If we do what we always did, we’re going to get what we’ve always gotten.
Fixing network problems can be a real pain - especially as IT environments get more complex. Many issues today are intermittent and find new ways to hide from detection, compounding the difficulty in finding the root cause.
Despite the increase in complexity, we find that in most networks, engineers use the same old tricks and believe the same old myths when it comes to resolving problems. It’s no secret that these issues cost businesses a lot of money and can sometimes even cost engineers their jobs. It is time for a change. Let’s dump the old ways and adopt some new ones.
In this post we will take a look at three common misconceptions in troubleshooting and how we can improve our approach when a problem strikes.
Myth 1: Upgrades will improve performance
While upgrades are definitely necessary from time to time, be very careful and selective when choosing how to direct budget dollars. When applications are slow, buying more bandwidth or a new core switch may do very little to improve end-user performance.
Before upgrading anything as a troubleshooting step, make sure to collect as much information about the issue as possible. Is low bandwidth or network congestion really the root cause? Or is the application lagging because of insufficient server resources or bad code? Tools that make use of deep packet inspection can help to clearly answer these questions and direct attention to the true root cause.
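One quick sanity check before blaming bandwidth is to work out how busy the link actually is. The sketch below is only an illustration, not a replacement for packet-level analysis: it assumes you can read an interface octet counter twice (for example over SNMP), and the function name, parameters, and sample values are all hypothetical.

```python
def link_utilization(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Return average link utilization (0.0 to 1.0) between two octet-counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # Counters wrap at 2**counter_bits; modulo arithmetic absorbs a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_s
    return bits_per_second / link_bps


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart on a 1 Gbps link.
    util = link_utilization(1_000_000_000, 1_750_000_000, 60, 1_000_000_000)
    print(f"Average utilization: {util:.1%}")
```

Keep in mind that an average over a long interval can hide short bursts, so a low number here suggests, rather than proves, that bandwidth is not the bottleneck.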
Myth 2: A certain level of Ethernet errors is acceptable
Hey, this was true back in the day. Cabling systems were not as reliable, interface hardware was still developing, and switch and network configurations were still being ironed out. However, it's 2016 – things have greatly improved.
By now we should have very few, if any, Ethernet errors on switch and router interfaces. FCS errors, misaligned packets, oversized packets, undersized packets, late collisions, and misconfigured VLAN tags should be behind us, at least on infrastructure devices that we control. These errors cause dropped packets, which cause data retransmissions, which cause application delays – i.e., things get slow.
I can't count the number of environments I have been called into where the root cause turned out to be Ethernet errors on an interface. In almost all of these cases, the assumption was that there is an error threshold, a percentage of acceptable errors. Nope. Chase every one of them down.
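A zero-tolerance policy is easy to automate. As a rough sketch, assume you already collect per-interface error counters into dictionaries at two points in time; the counter names, interface names, and data layout below are made up for illustration and will differ by vendor and collection tool. Any counter that has grown gets reported, with no "acceptable" threshold.

```python
# Hypothetical counter names; real devices and tools label these differently.
ERROR_COUNTERS = ("fcs_errors", "alignment_errors", "giants", "runts", "late_collisions")


def new_errors(before, after):
    """Return (interface, counter, increase) for every error counter that grew."""
    findings = []
    for interface, counters in sorted(after.items()):
        previous = before.get(interface, {})
        for name in ERROR_COUNTERS:
            delta = counters.get(name, 0) - previous.get(name, 0)
            # A negative delta usually means the counters were cleared; ignore it.
            if delta > 0:
                findings.append((interface, name, delta))
    return findings


if __name__ == "__main__":
    # Illustrative snapshots, not real device output.
    before = {"Gi1/0/1": {"fcs_errors": 10, "runts": 0}, "Gi1/0/2": {"fcs_errors": 0}}
    after = {"Gi1/0/1": {"fcs_errors": 14, "runts": 0}, "Gi1/0/2": {"fcs_errors": 0}}
    for interface, name, delta in new_errors(before, after):
        print(f"{interface}: {name} increased by {delta}")
```

Comparing two snapshots matters more than the raw totals, because a counter that has sat unchanged for months tells a different story from one that is still climbing.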
Myth 3: A link light means the connection is tested
Buy switch, install and configure, plug in cables, look for link lights.
Ready to go? No.
Before delivering a link, do more than just check a link light. A linky-blinky does not mean that the connection can support 1Gbps or more. Core connections especially should be validated for throughput, jitter, loss, and errors before you assume that they are ready to go.
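To make "validated for jitter and loss" a little more concrete, here is a minimal sketch of how those two numbers can be derived from a run of timestamped test probes. It assumes you have send and receive timestamps keyed by sequence number from some probe tool of your own; the function name and inputs are hypothetical. Jitter uses the smoothed interarrival estimator described in RFC 3550, which works on differences in transit time, so a constant clock offset between sender and receiver cancels out.

```python
def summarize_probes(sent, received):
    """Return (loss_fraction, jitter_seconds) for a set of test probes.

    sent and received map sequence number -> timestamp in seconds.
    """
    if not sent:
        raise ValueError("no probes were sent")
    delivered = sorted(seq for seq in sent if seq in received)
    loss = 1 - len(delivered) / len(sent)

    jitter = 0.0
    for prev, cur in zip(delivered, delivered[1:]):
        # Change in one-way transit time between consecutive delivered probes.
        transit_diff = (received[cur] - sent[cur]) - (received[prev] - sent[prev])
        # RFC 3550 smoothing: move 1/16 of the way toward the latest sample.
        jitter += (abs(transit_diff) - jitter) / 16
    return loss, jitter


if __name__ == "__main__":
    # Illustrative timestamps only: probe 2 never arrives.
    sent = {0: 0.00, 1: 0.02, 2: 0.04, 3: 0.06}
    received = {0: 0.010, 1: 0.031, 3: 0.072}
    loss, jitter = summarize_probes(sent, received)
    print(f"loss={loss:.1%} jitter={jitter * 1000:.3f} ms")
```

This only covers loss and jitter. Throughput and interface errors still need to be checked separately, ideally with a purpose-built test tool under sustained load.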
These are just a few of the common misconceptions in network troubleshooting today. For a more complete list, please check out THIS WHITEPAPER. When battling network performance problems, make sure you are equipped with the right tools and the right visibility to make the best decision for the business and, ultimately, for you!