Failure management and performance optimization need scalable, intelligent methods to analyze large volumes of monitoring data and to apply optimizations that mitigate problems and keep resource management dynamic despite frequent component failures. At this scale, different system components operate on different time scales, which adds to the management challenge.
Buried in the overflow of alerts lie core business truths. Unfortunately, businesses operating in the cloud are overwhelmed by those alerts and fail to refine this critical information into informed business decisions. Instead, companies should distill the alerts until only the most important ones reach a top-10 dashboard.
All business and monitoring needs should be consolidated into one screen. Otherwise, even the most important information becomes overwhelming and unusable.
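As a rough illustration of what distilling alerts into a top-10 view could look like, here is a minimal Python sketch. It is not taken from the presentation: the `Alert` fields, the 1-to-5 severity scale, and the scoring rule (worst severity multiplied by occurrence count) are all assumptions chosen for the example, and a real system would likely weigh business impact as well.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # component that raised the alert (hypothetical field)
    name: str      # alert type, e.g. "disk_full" (hypothetical field)
    severity: int  # assumed scale: 1 (informational) to 5 (critical)


def top_alerts(alerts, limit=10):
    """Group duplicate alerts and return the `limit` highest-scoring groups.

    Each row is (score, source, name, count).
    """
    # Count how often each (source, name) pair fired.
    counts = Counter((a.source, a.name) for a in alerts)

    # Track the worst severity seen for each pair.
    worst = {}
    for a in alerts:
        key = (a.source, a.name)
        worst[key] = max(worst.get(key, 0), a.severity)

    # Assumed heuristic: score = worst severity * number of occurrences.
    scored = [(worst[k] * counts[k], k[0], k[1], counts[k]) for k in counts]

    # Highest score first; ties broken alphabetically for a stable view.
    scored.sort(key=lambda row: (-row[0], row[1], row[2]))
    return scored[:limit]


if __name__ == "__main__":
    sample = (
        [Alert("db", "disk_full", 5)] * 3
        + [Alert("web", "slow_response", 2)] * 4
        + [Alert("cache", "miss_rate", 1)]
    )
    for score, source, name, count in top_alerts(sample):
        print(f"{score:>4}  {source}/{name}  x{count}")
```

Grouping repeated alerts before ranking is what keeps the list short: a component that fires the same alert hundreds of times occupies one row on the dashboard instead of hundreds.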
Ariel’s presentation can be viewed and downloaded here