Troubleshoot to repair, or predict and prevent?

By Steve Henning, Network World, 06/10/2008
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

It sounds simple. Instead of spending hours or days troubleshooting an application slowdown or system outage, why not just avoid it to begin with?

Until recently, the only way for IT organizations to resolve problems was to sift through alerts, log files and trouble tickets and burn the midnight oil on conference calls. Today, powerful analytics and automation capabilities built into system management tools can help organizations identify and resolve issues before they become problems.

Interconnected business services have made management exponentially more difficult. Collecting more data isn’t the answer because:

* Monitoring static thresholds triggers a flood of alerts, most of which do not represent actual problems.
* Problems are identified by groups of abnormal behaviors, not a solitary metric.
* With tens of thousands of devices and millions of metrics, manually correlating the data to identify problems is impractical.

This deterministic approach of fixed thresholds and manual correlation is not only ineffective but also cannot scale to accommodate increasing complexity. Highly complex service infrastructures demand a new, probabilistic approach.
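
The first two points in the list above can be illustrated with a small Python sketch. It is not taken from any particular product: the metric names, the limits in `STATIC_LIMITS` and the `min_abnormal` parameter are all assumptions made for the example. It counts how many alerts a per-metric static threshold raises, and how many remain when an alert requires several metrics to be abnormal at once.

```python
from typing import Dict, List

# Hypothetical static limits, chosen only for this illustration.
STATIC_LIMITS: Dict[str, float] = {
    "cpu_pct": 80.0,
    "response_ms": 500.0,
    "db_connections": 90.0,
}


def violations(sample: Dict[str, float]) -> List[str]:
    """Return the names of metrics that exceed their static limit."""
    return [name for name, limit in STATIC_LIMITS.items()
            if sample.get(name, 0.0) > limit]


def per_metric_alerts(samples: List[Dict[str, float]]) -> int:
    """One alert per violating metric per sample: the alert-flood case."""
    return sum(len(violations(s)) for s in samples)


def grouped_alerts(samples: List[Dict[str, float]], min_abnormal: int = 2) -> int:
    """Alert only when several metrics are abnormal in the same sample."""
    return sum(1 for s in samples if len(violations(s)) >= min_abnormal)


samples = [
    {"cpu_pct": 85.0, "response_ms": 120.0, "db_connections": 40.0},  # CPU only
    {"cpu_pct": 82.0, "response_ms": 130.0, "db_connections": 42.0},  # CPU only
    {"cpu_pct": 91.0, "response_ms": 900.0, "db_connections": 95.0},  # all three
    {"cpu_pct": 30.0, "response_ms": 110.0, "db_connections": 35.0},  # normal
]

print(per_metric_alerts(samples))  # 5
print(grouped_alerts(samples))     # 1
```

In this toy data, four of the five per-metric alerts come from isolated CPU readings; only the third sample, where all three metrics are abnormal together, looks like a real problem.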

Intelligent system-management solutions now employ sophisticated correlation algorithms that sample subsets of metric data and deliver accurate information about likely system behavior. In addition, new learning technologies continuously refine alert thresholds, providing dynamic thresholds that recognize and accommodate the normal ebbs and flows of business. A probabilistic approach allows organizations to solve problems faster and with far less manual effort.
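
As a rough illustration of a dynamic threshold that follows the ebbs and flows of business, the sketch below learns a separate band for each hour of the day from past readings. The band of mean plus or minus `k` standard deviations, the hourly bucketing and the function names are assumptions for the example; commercial tools may use quite different models.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_hourly_bands(history, k=3.0):
    """Build {hour: (low, high)} from an iterable of (hour_of_day, value)."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    bands = {}
    for hour, values in by_hour.items():
        if len(values) < 2:
            continue  # not enough data to estimate the spread
        m, s = mean(values), stdev(values)
        bands[hour] = (m - k * s, m + k * s)
    return bands


def is_abnormal(bands, hour, value):
    """True if value falls outside the learned band for that hour."""
    if hour not in bands:
        return False  # no model yet for this hour, so stay quiet
    low, high = bands[hour]
    return not (low <= value <= high)


# Made-up request rates: quiet at 03:00, busy at 14:00.
history = [(3, v) for v in (10, 12, 11, 9)] + \
          [(14, v) for v in (200, 210, 190, 205)]
bands = learn_hourly_bands(history)

print(is_abnormal(bands, 14, 205))  # False
print(is_abnormal(bands, 3, 205))   # True
```

The same reading of 205 is normal for mid-afternoon and abnormal at 03:00, a distinction that a single static limit cannot make.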

Intelligent management solutions integrate with existing monitoring infrastructures, automatically collecting and analyzing metrics from across all tiers of an application, such as the Web server, application server and database tiers.

The first job for the intelligent management solution is to learn the normal behavior of the application. It should be possible to build a behavior model for each resource in your infrastructure by applying dynamic thresholding algorithms to continuously collected data. Real-time measurements of each metric can then be compared with its expected range of values to determine when the metric should trigger a threshold violation.
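
One way to picture a per-resource behavior model is a running mean and variance that is updated with every new measurement. The sketch below uses Welford's online algorithm for this; the `BehaviorModel` class, the `observe` helper, the resource and metric names and the three-standard-deviation range are all hypothetical choices for illustration, not a description of any vendor's algorithm.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BehaviorModel:
    """Running mean and variance for one metric (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def expected_range(self, k: float = 3.0) -> Optional[Tuple[float, float]]:
        if self.count < 2:
            return None  # still learning
        std = math.sqrt(self.m2 / (self.count - 1))
        return (self.mean - k * std, self.mean + k * std)

    def violates(self, value: float, k: float = 3.0) -> bool:
        rng = self.expected_range(k)
        if rng is None:
            return False
        low, high = rng
        return value < low or value > high


# One model per (resource, metric) pair.
models: Dict[Tuple[str, str], BehaviorModel] = {}


def observe(resource: str, metric: str, value: float) -> bool:
    """Check a real-time measurement against the model, then learn from it."""
    model = models.setdefault((resource, metric), BehaviorModel())
    violated = model.violates(value)
    model.update(value)
    return violated


warmup = [100, 102, 98, 101, 99, 100, 103, 97]
flags = [observe("db01", "response_ms", v) for v in warmup]
spike = observe("db01", "response_ms", 250)
print(any(flags), spike)  # False True
```

As a simplification, this sketch lets an abnormal value update the model as well; a production system would probably exclude or down-weight such values so that an outage does not become the new "normal".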
