I have lost count of how many times I’ve heard IT professionals state they’ve eliminated human error because they’re using automation. Personally, I believe this is a dangerous assumption, and I will explain why with an example from the world of economics.
In early 2010, two Harvard economists, Carmen Reinhart and Kenneth Rogoff, published the results of analysis they’d performed on the relationship between a country’s debt level and its long-term economic growth. They predicted that if a country’s ratio of debt to gross domestic product (GDP) exceeded 90 percent, then that country’s growth would be negative. Many governments, still grappling with the fallout of the 2008 economic crisis, used this data to launch and justify austerity programs that sought to avoid the 90 percent tipping point predicted by Reinhart and Rogoff.
It took three years before Reinhart and Rogoff admitted there were errors in their calculations, and the admission came only after researchers at the University of Massachusetts Amherst were unable to replicate their results. Once the errors were fixed, the predictions were substantially less pessimistic. Many observers subsequently argued that the original, flawed predictions had led to unnecessarily aggressive austerity measures in multiple countries.
‘Exactly as Requested’ Versus Exactly Correct
So, what does this have to do with automation? Well, Reinhart and Rogoff used a very common form of automation to perform their calculations: a Microsoft Excel spreadsheet. You may not think of it this way, but a spreadsheet is a great example of automation: it takes a task that is tedious and error-prone for a human to complete, in this case performing a sequence of mathematical calculations, and performs that same task rapidly and exactly as requested.
One of Reinhart and Rogoff’s errors will be very familiar to regular users of spreadsheets: there was a mistake in the range of a formula. In this case, the spreadsheet was supposed to average data from 19 countries, but in fact it included only 14. This small coding error led to a large error in the overall prediction.
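The failure mode is easy to reproduce outside of Excel. The sketch below is a toy Python illustration (the growth figures are invented, not Reinhart and Rogoff's data) of how a formula range that stops five rows short silently changes the answer:

```python
# Hypothetical growth rates for 19 countries (illustrative values only).
growth = [3.1, 2.4, -0.5, 1.8, 2.2, 0.9, 1.5, 2.8, -1.2, 2.0,
          1.1, 3.4, 0.3, 2.6, 1.9, 2.7, 0.8, 1.4, 2.5]

# Intended calculation: average across all 19 countries.
correct_avg = sum(growth) / len(growth)

# The bug: the range stops five entries short, so only the first
# 14 countries are averaged. Nothing fails; the number is just wrong.
buggy_avg = sum(growth[:14]) / 14

print(f"correct: {correct_avg:.2f}")
print(f"buggy:   {buggy_avg:.2f}")
```

Both versions run "exactly as requested" and neither raises an error, which is precisely why a mistake like this can survive for years.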
Note that when I described the operation of a spreadsheet, I wrote “exactly as requested” and not “without error.” A spreadsheet will execute the calculations that have been entered without error, but that does not prevent the human who creates the spreadsheet from introducing errors into the equations and formulas. This is a critical distinction—one that naturally leads to the importance of observability and visibility in any automated system.
Observability is the property that makes visibility possible: it is the degree to which a system's internal state can be observed and understood, whether by design or through instrumentation and monitoring.
The Importance of Observability and Continuous Monitoring
The vast majority of automated systems in an IT environment are ultimately driven by instructions provided by a human being, whether via a user interface, a configuration file, or a script. All of these mechanisms allow for the introduction of human error. For example, it is generally accepted that programmers introduce, on average, between 15 and 50 bugs per 1,000 lines of code they write. Most of those bugs are discovered and fixed through testing, but some inevitably go undetected. It is also worth remembering that the automation frameworks used in IT systems are themselves implemented as software, and are therefore subject to the same error rates.
Reinhart and Rogoff declined to share their spreadsheet for review by their peers. This meant there was no observability and no visibility. As a consequence, the effects of their coding errors went undiscovered for three years, arguably with profound effects on the lives of millions of people.
This cautionary tale provides a startling example of how small errors in the programming of an automated system can have far-reaching consequences. It also reinforces the enormous importance of maintaining a high level of observability in automated systems. Such observability, coupled with continuous monitoring, allows organizations to catch errors early and quickly, whether those errors arise from the instructions given to the system or from defects in the automation software itself.
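To make the idea concrete, here is a minimal sketch of what observability can look like in practice. It is not any particular monitoring product, just an illustrative Python function with a runtime sanity check of the kind that would have flagged an average computed over 14 data points instead of the expected 19:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("growth-model")

# The analysis is supposed to cover 19 countries (assumed for illustration).
EXPECTED_COUNT = 19

def average_growth(rates):
    """Average a list of growth rates, logging enough internal state
    to make the calculation observable from the outside."""
    if len(rates) != EXPECTED_COUNT:
        # Surface the discrepancy instead of silently computing.
        log.warning("expected %d data points, got %d",
                    EXPECTED_COUNT, len(rates))
    result = sum(rates) / len(rates)
    log.info("averaged %d data points -> %.2f", len(rates), result)
    return result
```

A check like this does not prevent the human error, but it makes the error visible the first time the code runs rather than three years later.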