I often get asked: Artem, should we develop AI solutions in-house, or should we expect a vendor to solve our problems? In this article, I try to answer that question. Of course, the answer is not a simple "build" or "buy".
Each business is unique
Each business is unique. Each IT infrastructure is unique. Each team of people responsible for the business and IT is also unique. At StackState, we recognize this, and we recognize that each client has AI-specific needs and wants that differ from those of other clients. However, we also recognize that, while unique, our customers have a lot in common. These two observations are the basis of StackState's modeling strategy: deliver the most value for the least investment without limiting the client's ability to roll out custom solutions. These two observations are also why we look at StackState as both a product and a platform.
Flexibility versus investment
Figure 1 illustrates the modeling approach of StackState as a pyramid with four levels organized in two layers: the product layer and the platform layer. Moving up the pyramid, the client gets more flexibility at the cost of additional investment. In the next sections, I discuss each level in detail: which problems it solves and what investment it requires from the client. First, however, I explain what I mean by investments.
What are investments?
A machine learning model requires compute power, software infrastructure to train and serve models, data science knowledge to know which model to use for which task, and domain knowledge to define the tasks the models should solve. These are the investments that must be made to get a model that provides business value. StackState can make some of these investments; others are left to the client.
Our approach to delivering maximum flexibility
Figure 1: the modeling approach of StackState
1: Pre-trained Machine Learning Models
At the lowest level, we have pre-trained machine learning models. StackState has invested the compute power to train them, developed all the infrastructure to train and serve them, solved the data science problems, and defined the tasks using domain knowledge. The client simply needs to use them. What can these models do? They can solve generic IT problems that are common to all clients. I like to call them 1-T models – they typically focus on single-component failures. A good example is a Java memory leak – it does not depend on the client's topology and can be detected with a pre-trained model. At this level, StackState works as an out-of-the-box product.
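To illustrate why a single-component problem such as a memory leak needs no knowledge of the client's topology, here is a minimal sketch. It is not a StackState model and not a trained model at all: a fixed least-squares slope check stands in for one. The function names, the `min_slope_mb` threshold, and the sample data are all hypothetical.

```python
from statistics import mean


def heap_growth_slope(heap_after_gc_mb):
    """Least-squares slope (MB per sample) of equally spaced heap measurements."""
    n = len(heap_after_gc_mb)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(heap_after_gc_mb)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, heap_after_gc_mb))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def looks_like_memory_leak(heap_after_gc_mb, min_slope_mb=1.0):
    """Flag a sustained upward trend in heap usage measured after garbage collection.

    min_slope_mb is a hypothetical threshold chosen for this example only.
    """
    return heap_growth_slope(heap_after_gc_mb) >= min_slope_mb


# Hypothetical telemetry from one component; no topology information is used.
leaking = [100, 104, 109, 115, 118, 124, 131, 135]
healthy = [100, 103, 99, 101, 102, 98, 100, 101]

print(looks_like_memory_leak(leaking))  # True
print(looks_like_memory_leak(healthy))  # False
```

The only input is one component's own telemetry, which is why the same check can ship unchanged to every client.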
2: Retraining Machine Learning Models
At the second level, we have generic problems with specific behaviors, such as disks running full or a memory leak in one component affecting the latency of another. These problems may depend on the properties of a particular component, which can differ from one component to the next, or on the relationships between components, that is, the topology. Because of that, the pre-trained models cannot solve these tasks, and the client must invest computational resources in retraining them. However, StackState still invests in all the infrastructure, data science, and domain knowledge. IT problems live on these first two levels. At this level, StackState works as a customizable product.
3: Auto Machine Learning Model Selection
On the third level, we have problems that were not anticipated by StackState – these are problems specific to the client's business. Each business is unique, and that is why StackState allows users to specify new tasks for StackState AI. For example, a webshop might want to detect anomalies in the number of clients visiting the website or in the conversion rate. To solve this task, StackState AI needs domain knowledge: what to predict and what to base the predictions on. For instance, to detect anomalies in the conversion rate, the user would specify the stream inside StackState that holds the conversion rate, and potentially related streams such as weather, holiday information, and information about promotions and discounts. StackState would then automatically select the right algorithm, train it, and deploy it. This way, the client invests domain knowledge and compute, while StackState invests data science and infrastructure. StackState works as an AI platform at this level.
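The webshop example can be sketched as follows. This is an illustration under stated assumptions, not StackState's API: `AnomalyTask` is a hypothetical way to write down the domain knowledge the user supplies, and a trailing-window z-score stands in for whatever algorithm the platform would select automatically. The stream names and the data are made up.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class AnomalyTask:
    """Hypothetical task description: what to predict and what to base it on."""
    target_stream: str
    related_streams: list = field(default_factory=list)


def zscore_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: a z-score is undefined, so skip
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# The domain knowledge the client invests: which stream to watch and which
# streams might explain its behavior.
task = AnomalyTask(
    target_stream="conversion_rate",
    related_streams=["weather", "holidays", "promotions"],
)

# Hypothetical daily conversion rate in percent, with a drop on day 8.
conversion_rate = [2.1, 2.0, 2.2, 2.1, 1.9, 2.0, 2.1, 2.0, 0.4, 2.1]

print(task.target_stream, zscore_anomalies(conversion_rate))  # conversion_rate [8]
```

This simple detector ignores the related streams. A model that used them could, for example, avoid flagging a dip that a public holiday explains, which is the kind of data science choice the platform makes on the client's behalf at this level.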
4: Custom Machine Learning Models
At the top level, the client wants to do something completely new and solve a highly specific problem. For example, the client might want to predict the capacity requirements for a campaign or plan a move to the cloud. Here the client has endless options. The client can use custom models, training routines, and data pipelines, all while taking advantage of the StackState AI infrastructure. Alternatively, if even that is not enough, the client can use StackState purely as a data platform or as a visualization and alerting platform: the client pulls data from the 4T data model, performs the data science, and optionally serves the predictions to be shown in StackState's user interface. This way, the client has ultimate flexibility while enjoying all the convenience of the StackState platform.
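The division of investments across the four levels can be summarized in a small lookup table. This is a sketch of my reading of the levels described above, not a StackState artifact; the names are hypothetical. For level 4 it assumes the first option, where the client still uses the StackState AI infrastructure.

```python
# The four kinds of investment a machine learning model requires.
INVESTMENTS = ("compute", "infrastructure", "data science", "domain knowledge")

# What the client invests at each level; StackState covers the rest.
CLIENT_INVESTS = {
    1: set(),                                            # pre-trained models
    2: {"compute"},                                      # retraining
    3: {"compute", "domain knowledge"},                  # auto model selection
    4: {"compute", "data science", "domain knowledge"},  # custom models
}


def split_investments(level):
    """Map each investment to the party that makes it at the given level."""
    client = CLIENT_INVESTS[level]
    return {
        investment: ("client" if investment in client else "StackState")
        for investment in INVESTMENTS
    }


for level in sorted(CLIENT_INVESTS):
    print(level, split_investments(level))
```

Read from level 1 to level 4, the client's column grows by one or two entries at each step, which is the flexibility-versus-investment trade-off the pyramid depicts.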
Build or buy?
At StackState, we strive to be both a product and a platform because we recognize the need to provide out-of-the-box solutions without restricting the AI capabilities of the client. StackState aims to provide effortless solutions to IT problems while also providing convenient tools for solving problems at the intersection of business and IT.
My final advice is the following: it is preferable to buy off-the-shelf solutions when they exist; however, there are essential use cases that these off-the-shelf solutions do not cover. Therefore, it is vital to invest in an open AIOps platform that lets you roll out custom solutions while leveraging as many of the platform's capabilities as possible for data access, machine learning life cycle management, and so on. This way, the organization maximizes its ROI and has a short path to value.