StackState Blog

The Monitoring Maturity Model explained

Posted by Mark Bakker on Aug 11, 2015 8:59:00 AM

The pace of change is increasing. Component sizes are shrinking. All the while, monitoring solutions are bombarding us with log data, metrics, status reports and alerts. It all scales, but we don’t. How do we keep from drowning in run-time data?

Many companies face the same problem. They have a huge amount of data but cannot get a unified overview. When problems occur in their IT stack, they don’t know where those problems originate. Was it a change, an overload, an attack or something else? Based on our experience, we created the Monitoring Maturity Model. At which level is your company now?


Level 1 - Health of your components

At level one you have different components, but monitoring solutions at this level only report whether each component is up or down. If something happens in your IT stack, you will see a lot of red dots and you will probably get a lot of e-mails saying that something is broken. So at level one you only see the states and alert notifications per single component.

Level 2 - In-depth monitoring on different levels

Most of the companies we’ve seen are at level two of the Monitoring Maturity Model. At this level you are monitoring on different levels and from different angles and sources. Tools like Splunk or Kibana are used for log file analysis. AppDynamics or New Relic are used for Application Performance Monitoring. Finally, tools like Opsview are used to see the component states of different services. And that’s a good thing, because you need all of this data. The more data you have, the more insight you have into the different components. So at this level you are able to get more in-depth insight into the systems your own team is using.


But what if something fails somewhere deep down in your IT stack, which affects your team? Any change or minor failure in your IT landscape can create a domino effect and eventually stop the delivery of core business functions. Your team only sees their part of the total stack. For this problem, we introduce level three of the Monitoring Maturity Model.


Figure: Monitoring Maturity Model

Level 3 - Create a total overview

At level three we look not only at all the states, events and metrics, but also at the dependencies and changes. For that you need an overview of your whole IT stack, created from the existing data in the tools you already have. To create this overview you will need data from tools like:

  • Monitoring tools (AppDynamics, New Relic, Splunk, Graylog2)
  • IT Management tools (Puppet, Jenkins, ServiceNow, XL-Deploy)
  • Incident Management tools (Jira, PagerDuty, TOPdesk)


Re-use this existing data from different tools to create the total overview of your whole IT stack. At level three you are able to upgrade your entire organization. Each team can now view its own stack as part of the whole IT stack, which makes it much easier to find the cause of a failure. Teams are also able to find each other when this is needed the most. This level also gives the company a unified overview while letting teams decide which tools they want or need to use.
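
To make the value of dependency data concrete, here is a minimal Python sketch of how a team could trace a failure through the stack. It is an illustration only, not how any particular product works: the component names, the `DEPENDS_ON` topology, the `UNHEALTHY` set and the `probable_causes` function are all hypothetical. The idea is to follow the dependencies of a failing component and report the unhealthy components that have no unhealthy dependencies of their own.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "webshop": ["checkout-service", "catalog-service"],
    "checkout-service": ["payment-db"],
    "catalog-service": ["catalog-db"],
    "payment-db": ["storage-cluster"],
    "catalog-db": ["storage-cluster"],
    "storage-cluster": [],
}

# Hypothetical health states, as they might be collected from existing tools.
UNHEALTHY = {"webshop", "checkout-service", "payment-db"}


def probable_causes(start, depends_on, unhealthy):
    """Walk the dependencies of `start` and return the unhealthy components
    that have no unhealthy dependencies of their own."""
    causes = []
    seen = {start}
    queue = deque([start])
    while queue:
        component = queue.popleft()
        if component not in unhealthy:
            continue  # a healthy component cannot explain the failure
        broken_deps = [d for d in depends_on.get(component, []) if d in unhealthy]
        if not broken_deps:
            # Unhealthy, but everything it depends on is fine: a likely cause.
            causes.append(component)
        for dep in broken_deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return causes


print(probable_causes("webshop", DEPENDS_ON, UNHEALTHY))
```

With only level one or two data, the webshop team would see three separate red components. With the dependencies added, the search narrows to the one component at the bottom of the unhealthy chain.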

Level 4 - Automated operations

Level four is part of our bigger vision. At this level we will be able to:

  • Send alerts before there is a failure
  • Self-heal, for example by scaling up or rerouting services before a service is overloaded
  • Detect anomalies
  • Apply advanced signal processing
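
As a rough illustration of the first point, alerting before a failure, the Python sketch below fits a straight line through recent metric samples and estimates when a threshold will be crossed. This is a deliberately simple linear extrapolation chosen for the example, not the approach described in this post. The function name `minutes_until_threshold`, the disk usage samples, the 90% threshold and the two-hour lead time are all assumptions.

```python
def minutes_until_threshold(samples, threshold):
    """Fit a least-squares line through (minute, value) samples and estimate
    how many minutes after the last sample the line reaches `threshold`.
    Returns None when the metric is flat or falling."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no trend can be fitted
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None  # not growing, so the threshold is never reached
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, crossing_t - last_t)


# Hypothetical disk usage in percent, sampled every 10 minutes.
disk_usage = [(0, 70.0), (10, 72.0), (20, 74.0), (30, 76.0)]

remaining = minutes_until_threshold(disk_usage, threshold=90.0)
LEAD_TIME_MINUTES = 120  # assumed: warn when trouble is less than two hours away

if remaining is not None and remaining < LEAD_TIME_MINUTES:
    print(f"Warning: disk expected to reach 90% in about {remaining:.0f} minutes")
```

The same estimate could also trigger a self-healing action, such as scaling up, instead of a warning.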

 

We will implement level four by using our IT Operational Memory (ITOM), which enables time travelling and all sorts of complex graph operations.

Your next step
Learn more about our Monitoring Maturity Model and how you can improve your current IT operations. Join one of our live webinars and discover the impact of a single real-time view of the total IT stack. 

Topics: Dev/Ops, ITSM
