Case Study

Forte Data Solutions Relies on NETSCOUT to Ensure Availability & Performance of Applications in AWS Cloud


The Challenge
  • Slowdowns, instability and intermittent freezes with Apache, Oracle and Java applications hosted in AWS 
  • Poor user experience 
  • Delays and failures in database backups 
The Solution
  • vSTREAM™ software appliances were deployed along with virtual nGeniusONE® in the AWS infrastructure and configured to analyze network traffic on Apache, Oracle, and Java application ports 
  • The nGeniusONE platform dashboards and top-down workflows provided immediate insights into the database, web, and application tiers and their dependencies 
The Results
  • Shortened mean time to knowledge (MTTK) by over 70 percent, accelerating problem resolution 
  • Improved customer experience by reducing web-tier latency from seconds to milliseconds 
  • Enabled efficiencies in scaling in the cloud 

Customer Profile

Forte Data Solutions helps organizations successfully navigate complex digital transformations. The company is a well-regarded expert in database and application migrations and offers web-based statistical applications that generate sales reports for given time periods, specific products, and other customizable criteria. Forte has successfully built and delivered integrated technology solutions for organizations of all sizes throughout the world. 

The Challenge

Forte specializes in migrating database, application, and storage infrastructures to virtual cloud environments, so its reputation depends on the seamless and reliable performance of its solutions. When customers began experiencing issues with the web-based statistical applications, Forte’s IT teams had to resolve the problem quickly, before it could affect the bottom line and each customer’s quality of service. 

Users reported that saving their work took several seconds, delaying reports by anywhere from several milliseconds to several seconds. This had a cascading impact on other applications that relied on these reports. Ultimately, more than 50 percent of customers’ reports were severely delayed, degrading user experience, costing Forte revenue, and damaging its reputation. 

Forte’s web-based statistical application runs on two web servers behind Elastic Load Balancing (ELB), which provides both Network Load Balancer and Application Load Balancer functionality. As the server load grew, customers experienced slowdowns when running queries, as well as instability, intermittent application freezes, and timeouts while creating reports. 

In response to the growing load, the Oracle database, which runs as a multi-node Real Application Cluster (RAC) on Amazon Elastic Compute Cloud (Amazon EC2) RHEL instances, was placed in an Auto Scaling Group. This allowed activity peaks to be absorbed by automatically scaling out new RAC nodes whenever CPU and RAM usage thresholds were exceeded. The Auto Scaling Group was configured with a minimum of two nodes and a maximum of six. 
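
The threshold-driven behavior described above can be modeled in a few lines. This is a minimal, simplified sketch and not the AWS Auto Scaling API: the two-node minimum and six-node maximum come from the case study, while the CPU and RAM threshold values and the one-node-per-step rule are hypothetical assumptions chosen for illustration.

```python
# Simplified model of threshold-based scaling bounded by an Auto Scaling
# Group's min/max size. MIN_NODES and MAX_NODES come from the case study;
# all threshold values below are HYPOTHETICAL placeholders.
from dataclasses import dataclass

MIN_NODES = 2
MAX_NODES = 6

# Hypothetical thresholds (the case study does not state the real values).
SCALE_OUT_CPU = 75.0
SCALE_OUT_RAM = 80.0
SCALE_IN_CPU = 30.0
SCALE_IN_RAM = 40.0


@dataclass
class Metrics:
    cpu_percent: float
    ram_percent: float


def desired_capacity(current: int, m: Metrics) -> int:
    """Return the next RAC node count, clamped to [MIN_NODES, MAX_NODES]."""
    if m.cpu_percent > SCALE_OUT_CPU or m.ram_percent > SCALE_OUT_RAM:
        target = current + 1  # either metric over threshold: add a node
    elif m.cpu_percent < SCALE_IN_CPU and m.ram_percent < SCALE_IN_RAM:
        target = current - 1  # both metrics low: remove a node
    else:
        target = current      # within the band: hold steady
    return max(MIN_NODES, min(MAX_NODES, target))
```

For example, `desired_capacity(2, Metrics(90.0, 50.0))` returns 3, while a fully loaded group stays capped at six. A rule like this reacts only to resource metrics, not to why they rose, so a fault elsewhere in the stack can surface as repeated scaling activity at the database tier.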

However, Forte’s AWS administrators noticed that the Auto Scaling Group was regularly scaling the number of active RAC nodes in and out beyond the established limit. The additional Amazon EC2 instances were constantly starting up and shutting down, raising frequent Amazon CloudWatch alarms. Database backups were also delayed, and some failed outright because of the database performance issues. 

Forte’s IT team attempted to address these issues in a variety of ways: 
  • Testing the web tiers and adjusting some Apache parameters 
  • Utilizing new Amazon EC2 instance types with more memory and CPU processing power 
  • Collaborating with database administrators to align the database settings with the changed Amazon EC2 settings, which did reduce the number of instances scaling out 

Unfortunately, these changes did not reduce the frequency of the Auto Scaling Group scaling in and out.

Solution in Action

Forte’s IT team turned to NETSCOUT® to help solve the infrastructure and application slowdowns, instability, and intermittent freezes that plagued mission-critical business functions. NETSCOUT worked with Forte to deploy an application and service assurance solution for its AWS infrastructure through AWS Marketplace. This included vSTREAM software appliances with virtual nGeniusONE, configured to monitor the Apache, Oracle, and Java applications by analyzing network traffic on their respective ports. 
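
To illustrate the general idea of monitoring applications "on their respective ports," the sketch below labels network flows by server port and totals traffic per application. It is a hypothetical example, not how vSTREAM or nGeniusONE is configured: the `Flow` type and the port map are assumptions, using common defaults (80/443 for Apache, 8080 for a Java application server, 1521 for the Oracle listener) rather than Forte’s actual settings.

```python
# Hypothetical port-based flow classification. The port numbers are common
# defaults assumed for illustration, NOT Forte's or NETSCOUT's configuration.
from collections import Counter
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    src_port: int
    dst_port: int
    bytes_sent: int


# Assumed port-to-application map.
APP_PORTS = {
    80: "Apache (HTTP)",
    443: "Apache (HTTPS)",
    8080: "Java application server",
    1521: "Oracle (listener)",
}


def classify(flow: Flow) -> str:
    """Label a flow by whichever endpoint uses a known application port."""
    for port in (flow.dst_port, flow.src_port):
        if port in APP_PORTS:
            return APP_PORTS[port]
    return "other"


def bytes_by_application(flows: Iterable[Flow]) -> Counter:
    """Sum bytes per application label."""
    totals: Counter = Counter()
    for f in flows:
        totals[classify(f)] += f.bytes_sent
    return totals
```

Checking the destination port first matches client-to-server traffic; the source port catches server responses. Real monitoring tools go further and decode the application payload, which is how errors such as the Java exceptions described below can be recovered from packets.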

The nGeniusONE dashboard workflows provided insights into database, web, and application details and their dependencies. Database monitoring revealed evidence of persistent scaling in and out, while web monitoring uncovered persistent latencies on the web servers, eliminating the RAC as the root cause. Application session analysis showed multiple Java and embedded SQL-related errors. The Java errors extracted from packet data pointed to version-related issues that had begun after recent Java upgrades. 

Armed with these insights, Forte’s IT team reverted to a previous Java version by restoring from an earlier Amazon EC2 snapshot. The Java configuration and libraries were successfully restored and downgraded on both web-tier Amazon EC2 machines. Once this fix was applied, the errors disappeared and the RAC Auto Scaling Group returned to its normal baseline of two machines. 

The Results

NETSCOUT’s monitoring solution has allowed Forte to address the slowdowns, instability, intermittent freezes, and timeouts plaguing customers. Their IT team is now able to proactively monitor and troubleshoot application performance in their AWS environment. 

The nGeniusONE dashboard workflows shortened mean time to knowledge (MTTK) by over 70 percent, enabling the IT organization to quickly identify the root cause of issues and resolve problems faster. 

By using the NETSCOUT solution, Forte achieved tangible benefits, including: 

  • Improved customer experience by reducing web-tier latency from seconds to milliseconds. 
  • Enabled efficiencies in scaling by stopping the unnecessary workload-driven node scaling at the database tier, which had been occurring every 5 to 7 minutes. 
  • Reduced time lost to unnecessary attention and analysis by eliminating hundreds of redundant daily CloudWatch Alarms.
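
A quick back-of-the-envelope check shows why scaling every 5 to 7 minutes translates into "hundreds" of daily alarms. This sketch assumes the scaling cycle ran continuously around the clock and that each scaling event raised at least one alarm; the case study states neither explicitly.

```python
# Scaling events per day implied by a fixed 5-7 minute cycle.
# ASSUMPTION: the cycle ran continuously for 24 hours.
MINUTES_PER_DAY = 24 * 60  # 1440


def events_per_day(interval_minutes: float) -> float:
    """Number of scaling events in one day at a fixed interval."""
    return MINUTES_PER_DAY / interval_minutes


low = events_per_day(7)   # roughly 206
high = events_per_day(5)  # 288
print(f"{low:.0f} to {high:.0f} scaling events per day")
```

Under these assumptions, the cycle implies roughly 206 to 288 scaling events per day, which is consistent with the scale of redundant CloudWatch alarms described above.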