
     Tags: monitoring

    These days, I see that a lot of IT professionals consider ‘monitoring’ to be a bit boring and outdated. I’ve heard IT professionals say: "You just have to keep the systems and applications up and running. How hard can it be? We've been doing this for more than 40 years already." Although this is not far from the truth, monitoring is definitely not boring. I'll explain why I think monitoring is cool, crucial and still involves cutting-edge technologies, IF you do it right, that is. I'll show you the crucial steps to do it right, but first let's take a step back to see how monitoring started.

    How monitoring started

    One of the pioneers of monitoring is Aubrey Chernick. He started his own company, Candle Corporation, to build software that would manage the new IBM operating system. His first product was Omegamon, a system that monitors the internal operation of MVS. This all took place back in 1976. Most readers of this blog weren’t even born by then! Monitoring has been around for many years and, unlike cool new technologies such as Kubernetes and AWS, it’s the kind of stuff your (grand)parents used to be excited about. Once you realize that, it’s not that weird to think that monitoring is boring and outdated. After all, that’s what you usually think of the stuff your (grand)parents get excited about, right?

    Monitoring challenges of today

    But how boring is monitoring really? A lot has changed since 1976. IT infrastructures have become very dynamic and fast-changing, and traditional monitoring tools and procedures can’t keep up with the pace of change in today's dynamic infrastructures. Imagine the following scenario: there’s an incident at the business level, caused by processes running in a container that lived for just a couple of seconds. Will you be able to determine the root cause of this incident by looking at traditional metrics like CPU usage or response times? Especially when it was caused by something that isn't there anymore? I guess you’ll have a hard time finding the root cause quickly!

    One of the biggest challenges in doing monitoring right

    Dynamic and fast-changing scenarios like the one above, which can occur in today's IT landscape, are the reason monitoring is more relevant than ever. But how can you deal with them? To tackle this challenge and find the root cause quickly, it’s important to have a holistic understanding of your IT environment. However, getting that holistic understanding is one of the biggest challenges in monitoring today.


    How to get a holistic understanding

    In order to get a holistic understanding of your IT environment it’s important to see the context of your whole IT landscape. One way to understand everything in context is to ask yourself the following questions:

    • From which components is your business application built? Components here include hardware, such as computers, servers, routers, switches and other equipment; software, including productivity applications, enterprise resource planning (ERP) and customer relationship management (CRM); and networks, comprising Internet connectivity, network enablement, firewalls, security and much more!
    • What's the "when", "where" and "why" of what caused your business to be impacted?
    • How can you easily correlate all the changes that took place in your environment?
    • How can you determine the importance and impact of the incident?
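    To make the change-correlation question concrete, here is a minimal sketch, assuming every change is recorded with the component it touched and a timestamp. The `Change` record, the `changes_before` helper and the 30-minute default window are hypothetical illustrations, not StackState's API: the idea is simply to list the changes that happened shortly before an incident, newest first, as the first suspects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A hypothetical change record: which component changed, what, and when."""
    component: str
    description: str
    at: datetime


def changes_before(incident_at, changes, window=timedelta(minutes=30)):
    """Return the changes made within `window` before the incident, newest first."""
    recent = [c for c in changes if incident_at - window <= c.at <= incident_at]
    # The most recent change is usually the most likely suspect, so list it first.
    return sorted(recent, key=lambda c: c.at, reverse=True)
```

    For an incident at 12:00, a deployment at 11:50 and a config update at 11:40 would be returned in that order, while an index rebuild at 09:00 falls outside the window. This only works if every change is actually registered, which is why tracking every change matters.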

    Topology

    Answering the questions above is not enough to understand everything in context. An important part is visualizing how all the components relate to each other. StackState delivers this visualization through one unified topology overview: see all dependencies and get a shared understanding across teams and tools to get to the root cause of incidents faster.
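    Why a topology helps can be sketched with a plain dependency graph. The component names and the `DEPENDS_ON` map below are hypothetical, and this is an illustration of the general idea rather than how StackState builds its topology: walking the edges from an impacted component gives the root-cause candidates it depends on, and walking the reversed edges from a failing component gives everything it can impact.

```python
from collections import deque

# Hypothetical topology: each component maps to the components it depends on.
DEPENDS_ON = {
    "webshop": ["checkout-service", "catalog-service"],
    "checkout-service": ["payments-db", "k8s-node-1"],
    "catalog-service": ["catalog-db", "k8s-node-1"],
    "payments-db": [],
    "catalog-db": [],
    "k8s-node-1": [],
}


def dependencies_of(component, graph):
    """All components `component` depends on, directly or transitively (root-cause candidates)."""
    seen, queue = set(), deque(graph.get(component, []))
    while queue:  # breadth-first traversal along the dependency edges
        dep = queue.popleft()
        if dep not in seen:
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen


def dependents_of(component, graph):
    """All components that depend on `component`, directly or transitively (impact)."""
    # Invert the edges, then reuse the same traversal.
    reverse = {}
    for src, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(src)
    return dependencies_of(component, reverse)
```

    Here `dependents_of("k8s-node-1", DEPENDS_ON)` yields both services and the webshop: one failing node explains problems in several places at once, which is exactly the kind of context that isolated metrics don't show.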


    Artificial Intelligence

    However, even with the right questions and visualization, you're not there yet. Answering all the questions above involves an enormous amount of data that no human could digest. Machines can process this amount of data if they are equipped with the right Artificial Intelligence. For example, StackState consolidates all your IT landscape data, from all kinds of sources and with all dependencies in place, into its powerful AIOps platform. This context is the ultimate basis for applying Artificial Intelligence: the more context you have, the more value you'll get out of it, and the better your holistic understanding of your IT environment.
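    As a very small taste of letting a machine watch the numbers, here is a deliberately simple statistical stand-in: a z-score check that flags a new value lying far outside its recent baseline. This is not StackState's AI; the `is_anomalous` helper and the threshold of 3 standard deviations are assumptions chosen only for illustration.

```python
import statistics


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data points to build a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all stands out.
        return latest != mean
    return abs(latest - mean) / stdev > threshold
```

    A check like this works per metric and knows nothing about dependencies, which is why it flags symptoms rather than causes. Combining such signals with the topology context is what turns them into something you can act on.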

    Crucial steps for the right monitoring

    So by asking the right questions, visualizing your IT environment in a topology and applying Artificial Intelligence you've got the right mindset and tools to get a holistic understanding to do monitoring right. The only thing left to do is to follow the right steps that are crucial to monitor the dynamic infrastructure of today correctly:

    1. Measure everything: not just infrastructure and application (performance) metrics, but any metric relevant to your business, such as Google Analytics data or even the weather forecast. If this scares you (“OMG, measure everything, really? How? It will take forever!”), start by measuring the four golden signals described in Google’s SRE book:
      • Latency
      • Errors
      • Traffic
      • Saturation
    2. Gather all the information about how components are interconnected and contribute to your business. No single component holds the complete truth, so it’s important to combine the data from all components.
    3. Track every single change: make sure all changes are registered, whether through tagging or by recording them in, for example, a deployment tool. Don't allow manual changes.
    4. Combine all the data, preferably in an AIOps platform like StackState that can visualize the topology. Without an understanding of the context, it is nearly impossible to correlate events and determine their possible impact.
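    If you start with the four golden signals, a first version can be computed from request records you probably already have. The sketch below is an illustration under stated assumptions, not a prescribed method: the `Request` record and `golden_signals` helper are hypothetical, latency is taken as the 95th percentile (nearest-rank), errors are counted as HTTP 5xx responses, and saturation is approximated as the share of busy workers.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical request record: how long it took and its HTTP status code."""
    duration_ms: float
    status: int


def golden_signals(requests, window_seconds, busy_workers, total_workers):
    """Compute latency, traffic, errors and saturation for one time window."""
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Latency: 95th percentile using the nearest-rank method.
    p95 = durations[max(0, math.ceil(0.95 * n) - 1)] if n else 0.0
    # Errors: assume server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": n / window_seconds,  # Traffic: requests per second
        "error_rate": errors / n if n else 0.0,
        "saturation": busy_workers / total_workers,  # assumed proxy for "how full" the service is
    }
```

    In practice you would split latency for successful and failed requests, and pick a saturation measure (CPU, memory, queue depth) that matches your service's real bottleneck.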

    Don't wait until your application is already deployed to production before you start following these steps. Apply them at every stage of your CI/CD pipeline.

    Done right, monitoring is exciting stuff!

    So, after reading this, do you still think monitoring is boring? Even after more than 40 years, monitoring done right involves cutting-edge technologies and can make or break your business performance. That's why monitoring is cool and crucial. It just has to be done correctly and efficiently. Ask the right questions, visualize your IT environment in a topology and apply AI. Start from the beginning by following the crucial steps to monitor your environment right.

    You can start with StackState GO right away for FREE!

    Want to learn a bit more first, or do you have a question? Book a guided tour with one of our StackState experts to get your questions answered and explore your needs.

    Mark Arts is a Senior Sales Engineer at StackState and has over 20 years’ experience working in IT.
