StackState Blog

Capture your entire IT stack in one data model

Posted by Lodewijk Bogaards on May 24, 2017 6:16:40 PM
Find me on:
Group 20.png

On October 16th of 1843 a mathematician called William Rowan Hamilton crossed a bridge in Dublin when all of a sudden he was struck with the solution to a problem he had been working on for many years.
He became so enthusiastic that he cut his famous equation for a new type of number he called a quaternion into the stone of the bridge: "i^2=j^2=k^2=ijk=-1." Quaternions, among other things, make rotating objects through arbitrary orientations in a 3D space possible. We have Hamilton to thank for making it possible to safely land on the moon. In fact we have Hamilton to thank for making our WebGL powered 3D engine work at StackState.
At times, the ability to get an insight of such complexity seems to me to require a superhuman amount of genius. Yet I recognize that Hamilton, that day on the bridge, had simply arrived at a critical point after a long and arduous journey. The end of one stage of the journey and the beginning of another. Once given enough background information on what Hamilton was exploring mentally, the equation that marks the completion of that stage of the journey, though still highly impressive, is not beyond human capability.

Sometimes the end of a journey is not marked by a flash of genius. Rather sometimes an idea can be the start of a journey and the realization of how strong a concept really is slowly builds over time. So it is with what we, at StackState, call the 3T data model, which stands for: Telemetry, Topology and Time. 
Like Hamiltons’ three letters i, j and k; the three T's are dimensions too, each one perpendicular to the other. The three T's are a way to model the configuration of what is going on in any IT system; large or small, based on microservices, containers, web services, monoliths, serverless, cloud or on-premise. It does not matter. In fact, the word system may be used in a larger context to mean a system that could include: embedded devices, IoT, batch, real-time, offline or temporary components. 
At StackState we have invested almost all of our time in getting this model right and have never gotten into a situation yet that could not be captured through this model. Let me walk you through the 3 simple ingredients:
1. Telemetry
"The process of recording and transmitting the readings of an instrument." Typically each live component, and indeed also each live (network) connection, in a IT system is continually emitting a wealth of information. For example,  a microservice will log events, send or collect its own metrics, open and receive network connections and emit heartbeats. The operating system the microservice runs on will gather information on memory usage, cpu utilization,  page usage, etc.
2. Topology
"the way in which constituent parts are interrelated or arranged." Topology is a word used in networking to describe a  logical or physical network layout/graph. We use it in the broadest sense, meaning the description of the inter-dependencies between the IT components, be it hardware or software, that form a system. 
The data model for a topology is best suited to a directed graph. Storing a topology in a proper graph database is important if one hopes to achieve some type of performance when launching graph algorithms. Graph algorithms are extremely useful to find failure trees, indirect dependencies, weak spots, disconnections, etc. There are a ton of interesting facts one can find out about large IT systems when one has a complete picture of the actual physical up-to-date topology. 
Fun fact, Hamilton also contributed to the wonderful field of graph theory and is among other things well known for what are called Hamiltonian graphs.
Monitoring on the basis of telemetry alone, which is still mostly the norm nowadays, provides little insight into the big picture of what is happening because of the large volumes of unconnected data. Relationship between telemetry streams must either be known or discovered through some clever algorithms. We can simply skip over that part if we have a proper topology in place.
Now that we have set the stage, here is the interesting bit. What ties topology and telemetry together? Time. telemetry and topology in one model
3. Time 
Like in quantum mechanics a waveform collapses to a particle when observed, telemetry collapses to the topology when observed. The topology is the state of the stack in the current time. One might visualize this as telemetry passing through the plane of topology. As soon as telemetry hits the topology plane it can be observed and change the topology. (1).gifSave each change in the topology as a snapshot of the entire graph in time
Not all streaming telemetry information is worth observing, but some parts are, and those should be captured to update the state of the topology. Take for example a telemetry stream of the total amount of free memory of some server. Typically when everything is running smooth, the stream can remain a waveform, but as soon as the amount is running dangerously low it is worth capturing this information on the topology. And like a particle that interacts with other particles, topological components that will be impacted by the fact that this server is running low on memory may pro-actively be notified by a traversal of the topology's dependency graph.
It is thus the act of observation in real-time that ties telemetry and topology together. In StackState we have a number of user-programmable stream processors which automatically handle the act of observation and result in changes of the topology.  This system can react to any kind of metric streams, event streams and even changes to the topology that are generated based on these streams, thus producing a chain of effects. Real-time side-effects can be pushed to other systems as well. One might for example send events out to chat app, an incident management tool or orchestration tool.
At StackState we've build a special time traveling graph database to accommodate this model. This gives us the ability to save each change in the topology as a snapshot of the entire graph in time. From a DevOps engineers' perspective these snapshots are incredibly valuable information. One can replay, visualize and programmatically analyze all changes, which are worth observing by some self-defined standard, of the entire stack at any time. I urge you to imagine what you could do with that kind of super power.
VIDEO: the 3T data model explained by Lodewijk Bogaards, CTO at StackState
The 3T data model is extremely well suited to describe what happens in a stack, but of course is not well suited for each and every use-case. Generally it describes what is happening at what time and how each piece influences the next, but it does not necessarily describe why those pieces are there, why they connect with each other in the first place and how exactly they achieve their goals. 
William Rowan Hamilton spent the remaining part of his live studying applications of quaternions.  His son William Edwin wrote that his father "spoke of anticipated applications of Quaternions to Electricity, and to all questions in which the idea of Polarity is involved applications which he never in his own lifetime expected to be able fully to develop, bows to be reserved for the hands of another Ulysses." Hamilton has been at the top of my mind, because I identify with his poetic enthusiasm for something as abstract as a model. The applications of which are way beyond my understanding, but the benefits of which inspire me to work on StackState every day and enable our customers to run their business in an unprecedented way.
Want to learn more about the 3T data model and how it can help your organization move forward? Request a personal demo and get in touch with one of the StackState experts.
Get In Touch

Topics: Product, Algorithmic IT Operations

Our Mission

To simplify the lives of IT managers, Operators and Developers

To accomplish this, we created an Algorithmic IT Operations platform. It aggregates information from a multitude of sources and existing Dev/Ops tools to provide a unique insight into the health of the entire IT stack and to find root causes of problems across tools, teams and departments. Validate the effects of changes before applying them. Not only for one type of system, but for the full stack regardless of its size.

Join over 5,000 people from companies like eBay, American Express, Cisco, Tesco, ING and more who get our best new posts delivered via email. Subscribe below if you'd like to get it too:

Subscribe to Email Updates