WO2023208136A1

WO2023208136A1 - KPI anomaly detection method and apparatus, device and medium

Info

Publication number: WO2023208136A1
Application number: PCT/CN2023/091310
Authority: WO
Inventors: 苏海明
Original assignee: 郑州云海信息技术有限公司
Priority date: 2022-04-28
Filing date: 2023-04-27
Publication date: 2023-11-02
Also published as: CN114781529A

Abstract

The present application discloses a KPI anomaly detection method and apparatus, a device, and a medium, which are applied to the technical field of KPI anomalies. The method comprises: acquiring single-dimensional KPI time series data of a target interval, the length of the target interval being a first preset time length, and the end time point of the target interval being a specified time point; extracting a first data feature of the single-dimensional KPI time series data; inputting the first data feature into a base classifier, and outputting a preliminary anomaly detection result of the single-dimensional KPI time series data using the base classifier; extracting a second data feature of a target time point, and inputting the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, the target time point being any time point within a second preset time length after the specified time point; and determining a final anomaly detection result of the single-dimensional KPI time series data on the basis of the classification result. In this way, the accuracy of KPI anomaly detection may be improved.

Description

A KPI anomaly detection method, device, equipment and medium

Cross-references to related applications

This application claims priority to the Chinese patent application filed with the China Patent Office on April 28, 2022, with application number 202210460951.1 and the title "A KPI anomaly detection method, device, equipment and medium", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the technical field of KPI anomaly detection, and in particular to a KPI anomaly detection method, device, equipment and medium.

Background technique

With the rapid development of cloud computing, bare metal offerings that combine physical machine performance with cloud elasticity are quietly emerging. To optimize the performance of physical machines and cloud hosts in cloud computing, the analysis of monitoring data provides guidance for machine performance tuning.

Currently, server monitoring data mainly includes performance data for the CPU (Central Processing Unit), memory, storage, and network. These data include time series performance data such as CPU usage, memory usage, and network throughput. A large part of this data is single-dimensional KPI (Key Performance Indicator) data. Anomaly detection on single-dimensional time series data faces several challenges: there is no definable pattern for how anomalies occur; the data may contain noise; and the data is usually unstable and changes dynamically. These factors make anomaly detection on single-dimensional time series data difficult. How to improve the accuracy of KPI anomaly detection is a problem under continuous study in the field of KPI anomaly detection technology.

Contents of the invention

In view of this, the purpose of this application is to provide a KPI anomaly detection method, device, equipment and medium that can improve the accuracy of KPI anomaly detection. The solution can be as follows:

In the first aspect, this application discloses a KPI anomaly detection method in some embodiments, including:

Obtain the single-dimensional KPI time series data of the target interval; wherein the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point;

Extract the first data feature of single-dimensional KPI time series data;

Input the first data feature into the base classifier, and use the base classifier to output preliminary anomaly detection results of single-dimensional KPI time series data;

Extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result; where the target time point is any time point within the second preset time length after the specified time point;

The final anomaly detection results of single-dimensional KPI time series data are determined based on the classification results.

In some embodiments, extracting the first data feature of single-dimensional KPI time series data includes:

Perform normalization processing on single-dimensional KPI time series data to obtain normalized data;

Determine the normalized data and at least one of the statistical features, prediction features, and frequency domain features of the normalized data as the first data feature of the single-dimensional KPI time series data.

In some embodiments, it also includes:

Construct a candidate root cause set based on multiple single-dimensional KPI time series data for which the final anomaly detection result is abnormal; wherein the candidate root cause set includes one or more single-dimensional KPI time series data;

Taking candidate root cause sets as nodes, and constructing a multi-layer root cause tree based on the data dimensions of the candidate root cause sets;

The multi-layer root cause tree is pruned layer by layer based on the preset pruning strategy, and the abnormal root cause set is determined based on the ripple effect.

In some embodiments, the data dimensions of nodes at the same level in the multi-level root cause tree are the same, and the data dimensions of nodes at each level decrease from top to bottom;

Correspondingly, pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy includes: pruning the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.

In some embodiments, the multi-layer root cause tree is pruned layer by layer based on a preset pruning strategy, including:

For any layer, determine the influence value of each node based on the preset influence value calculation rules, and determine whether the influence value is less than the preset influence threshold. If so, delete the node and the child nodes of the node in the layers below it.

In some embodiments, the multi-layer root cause tree is pruned layer by layer based on a preset pruning strategy, and the abnormal root cause set is determined based on the ripple effect, including:

For any layer, after eliminating nodes whose influence value is less than the preset influence threshold, the potential score corresponding to each remaining node is determined based on the ripple effect; where the potential score represents the degree of influence of the elements of the node on the elements of the child nodes of the node;

The potential scores of the remaining nodes in each layer are sorted, and the abnormal root cause set is determined based on the sorting results.

In some embodiments, the influence value of each node is determined based on preset influence value calculation rules, including:

The predicted value of each element in each node is calculated based on the preset prediction algorithm, and the influence value of each node is determined based on the predicted value of each element and the actual value of each element.

In the second aspect, this application discloses a KPI anomaly detection device in some embodiments, including:

The KPI data acquisition module is used to obtain the single-dimensional KPI time series data of the target interval; where the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point;

Data feature extraction module, used to extract the first data feature of single-dimensional KPI time series data;

The detection result output module is used to input the first data feature into the base classifier, and use the base classifier to output preliminary anomaly detection results of single-dimensional KPI time series data;

The classification result acquisition module is used to extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result; where the target time point is any time point within the second preset time length after the specified time point;

The detection result determination module is used to determine the final anomaly detection result of single-dimensional KPI time series data based on the classification results.

In a third aspect, this application discloses an electronic device in some embodiments, including a processor and a memory; wherein,

Memory, used to hold computer programs;

The processor is used to execute the computer program to implement the aforementioned KPI anomaly detection method.

In the fourth aspect, the present application discloses in some embodiments a non-volatile computer-readable storage medium for storing a computer program, wherein when the computer program is executed by a processor, the aforementioned KPI anomaly detection method is implemented.

It can be seen that in some embodiments of this application, the single-dimensional KPI time series data of the target interval can be obtained first, where the length of the target interval is the first preset time length and the end time point of the target interval is the specified time point. The first data feature of the single-dimensional KPI time series data is then extracted and input into the base classifier, and the base classifier outputs the preliminary anomaly detection result of the single-dimensional KPI time series data. Next, the second data feature of the target time point is extracted, and the second data feature and the preliminary anomaly detection result are input into the label classifier to obtain the classification result, where the target time point is any time point within the second preset time length after the specified time point. The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result. That is to say, this application uses a two-layer classifier: the base classifier first detects the single-dimensional KPI time series data in the target interval to obtain a preliminary anomaly detection result, and the label classifier then judges the preliminary anomaly detection result to obtain the final anomaly detection result, which can improve the accuracy of KPI anomaly detection.

Description of the drawings

In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application. Those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

Figure 1 is a flow chart of a KPI anomaly detection method disclosed in some embodiments of this application;

Figure 2 is a flow chart of a KPI anomaly detection method disclosed in some embodiments of this application;

Figure 3 is a schematic diagram of a multi-layer root cause tree disclosed in some embodiments of the present application;

Figure 4 is a schematic diagram of KPI anomaly detection and root cause location disclosed in some embodiments of this application;

Figure 5 is a schematic structural diagram of a KPI anomaly detection device disclosed in some embodiments of this application;

Figure 6 is a structural diagram of an electronic device disclosed in some embodiments of the present application;

Figure 7 is a structural diagram of a non-volatile computer-readable storage medium disclosed in some embodiments of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

Currently, server monitoring data mainly includes performance data for the CPU, memory, storage, and network. These data include time series performance data such as CPU usage, memory usage, and network throughput. A large part of these data are single-dimensional KPIs. Anomaly detection on single-dimensional time series data faces several challenges: there is no definable pattern for how anomalies occur; the data may contain noise; and the data is usually unstable and changes dynamically. These factors make anomaly detection on single-dimensional time series data difficult.

Anomaly detection methods for these single-dimensional KPIs include time-series-based methods and machine-learning-based methods. Methods based on time series characteristics mainly include ARIMA (Autoregressive Integrated Moving Average model), exponential smoothing models, and a series of other linear models. Currently, the main approach is to use machine learning for anomaly detection, which mainly includes supervised, semi-supervised, and unsupervised anomaly detection methods. The supervised method trains a two-class discriminator from labeled normal and abnormal data instances, for example an SVM (Support Vector Machine). However, this supervised approach has problems: in such data the proportion of abnormal to normal samples is severely imbalanced, and the trained model easily overfits. Therefore, this type of method is less popular than semi-supervised or unsupervised methods.

In the semi-supervised anomaly detection method, a small amount of labeled data is used to train the classification model, and unlabeled data is used to optimize the structural information implicit in the sample.

The most commonly used approach is to train deep autoencoders in a semi-supervised manner on normal samples, so that the autoencoder has a lower reconstruction error for normal data. Mainstream autoencoders currently include the VAE (Variational Auto-Encoder) and the AAE (Adversarial Autoencoder).

The third type, unsupervised anomaly detection, detects outliers based only on the intrinsic properties of data instances, and its basic theory also comes from autoencoders. Methods of this type can usually be used for automatic annotation of data samples. Commonly used unsupervised algorithms include restricted Boltzmann machines and deep belief networks.

How to improve the accuracy of KPI anomaly detection is a problem that is constantly being studied in the field of KPI anomaly detection technology. To this end, this application provides a KPI anomaly detection solution in some embodiments, which can improve the accuracy of KPI anomaly detection.

As shown in Figure 1, this application discloses a KPI anomaly detection method in some embodiments, including:

Step S11: Obtain the single-dimensional KPI time series data of the target interval; wherein the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point.

It should be pointed out that in KPI anomaly detection scenarios, KPI anomalies usually exist in the form of continuous intervals. Once an abnormality occurs, it will last for a period of time, not a single point in time. When a KPI is abnormal at time t, the abnormality will continue until time t+T. Therefore, the anomaly detection method is divided into two steps. The base classifier detects anomalies in KPI time series data, and the label classifier further detects them.

For example, at time t, the data $X_t = (x_{t-W+1}, x_{t-W+2}, \ldots, x_t)$ within the time period $(t-W+1, t)$ is extracted.

Here, t is the specified time point and W is the first preset time length. $X_t$ represents the single-dimensional KPI time series data, which can be CPU data, memory data, network data, etc.
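The window extraction in step S11 can be sketched as follows. This is a minimal illustration, assuming the series is held in a Python list and that t is a 0-based index; the function name `extract_window` and the sample values are hypothetical and not part of the application.

```python
from typing import List, Sequence


def extract_window(series: Sequence[float], t: int, w: int) -> List[float]:
    """Return X_t = (x_{t-W+1}, ..., x_t) for a 0-based end index t."""
    if w <= 0:
        raise ValueError("window length W must be positive")
    start = t - w + 1
    if start < 0 or t >= len(series):
        raise ValueError("window [t-W+1, t] falls outside the series")
    return list(series[start:t + 1])


# Made-up CPU usage samples, one value per time point.
cpu_usage = [0.21, 0.22, 0.20, 0.25, 0.91, 0.93]
window = extract_window(cpu_usage, t=4, w=3)
print(window)
```

The same helper could be reused in step S14, where a window ending at the target time point is needed.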

Step S12: Extract the first data feature of the single-dimensional KPI time series data.

In some embodiments, the single-dimensional KPI time series data can be normalized to obtain normalized data; the normalized data, together with at least one of the statistical features, prediction features, and frequency domain features of the normalized data, is determined as the first data feature of the single-dimensional KPI time series data.

In some embodiments, the original data can be normalized using the min-max method, whose expression is:

$$X_{t\_nom} = \frac{X_t - \min(X_t)}{\max(X_t) - \min(X_t)}$$

where $X_{t\_nom}$ denotes the normalized data. The statistical features can include at least one of the mean, variance, extreme values, quantiles, differences, and so on. The prediction features can be obtained by predicting the normalized data with the EWMA (Exponentially Weighted Moving Average) prediction algorithm. The frequency domain features can be wavelet features obtained by db2 wavelet decomposition. Among these features, the normalized data and the statistical features represent the short-term characteristics of the KPI time series data, the prediction features indicate to a certain extent how likely the KPI time series data is to be abnormal, and the wavelet decomposition features represent the frequency domain characteristics of the KPI data.
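The feature extraction of step S12 can be sketched as below, using only the Python standard library. The sketch covers min-max normalization, a few statistical features, and an EWMA-based prediction feature; the db2 wavelet features are omitted. The smoothing factor `alpha`, the handling of a constant window, the choice of the residual between the last point and its EWMA forecast as the prediction feature, and all function names are assumptions made for illustration.

```python
import statistics
from typing import Dict, List, Sequence


def minmax_normalize(window: Sequence[float]) -> List[float]:
    """Scale the window to [0, 1] with the min-max method."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # Assumption: a constant window is mapped to all zeros.
        return [0.0 for _ in window]
    return [(x - lo) / (hi - lo) for x in window]


def ewma_forecast(values: Sequence[float], alpha: float = 0.3) -> float:
    """One-step-ahead EWMA forecast: s_i = alpha * x_i + (1 - alpha) * s_{i-1}."""
    s = values[0]
    for x in values[1:]:
        s = alpha * x + (1 - alpha) * s
    return s


def first_data_feature(window: Sequence[float]) -> Dict[str, object]:
    """Build a feature record from a window with at least two points."""
    norm = minmax_normalize(window)
    predicted = ewma_forecast(norm[:-1])  # forecast the last point from its history
    return {
        "normalized": norm,
        "mean": statistics.fmean(norm),
        "variance": statistics.pvariance(norm),
        "max": max(norm),
        "min": min(norm),
        "median": statistics.median(norm),
        "last_diff": norm[-1] - norm[-2],
        "ewma_residual": norm[-1] - predicted,
    }


features = first_data_feature([10.0, 12.0, 11.0, 30.0])  # made-up values
print(features["ewma_residual"])
```

A large residual between the last point and its EWMA forecast suggests that the newest value departs from recent behavior, which is the kind of signal the prediction features are meant to carry.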

Step S13: Input the first data feature into the base classifier, and use the base classifier to output preliminary anomaly detection results of the single-dimensional KPI time series data.

In some embodiments, the XGBoost (eXtreme Gradient Boosting) model can be used as the base classifier for anomaly detection, with the first data feature as the input of the base classifier and normal or abnormal as the output. KPI anomaly detection is thereby converted into a binary classification problem, with XGBoost as the binary classifier. The XGBoost model can be expressed as:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

where $f_k(x)$ represents the k-th weak learner, K is the total number of weak learners in the XGBoost model, $x_i$ is the i-th sample, and $\hat{y}_i$ is the predicted value of sample $x_i$. For these K weak learners to form a strong classifier, the following function needs to be minimized:

$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where $l(\cdot)$ is the loss function, $\Omega(\cdot)$ is the regularization function, and $y_i$ is the true value of sample $x_i$. In the regularization term, T is the number of leaf nodes of the tree, $w_j$ is the weight of the j-th leaf node, and $\gamma$ and $\lambda$ are hyperparameters. In each iteration, only the objective function of the t-th regression tree is optimized:

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

where $\hat{y}_i^{(t-1)}$ is the output of the first t-1 trees for sample $x_i$, and $f_t(x_i)$ is the output of the current tree. Performing a Taylor expansion of the objective function and retaining the linear and quadratic terms gives the approximate objective:

$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2\right] + \gamma T$$

where

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$

are the first-order and second-order derivatives of the loss function for each sample, $i \in I_j$ denotes the samples mapped to the j-th leaf node, and n is the number of samples. Setting the derivative with respect to $w_j$ equal to 0 gives the optimal solution of $w_j$:

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

Substituting $w_j^*$ into the objective function gives:

$$Obj^* = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

T is the number of leaf nodes. Through the above iteration, the optimal splitting variables and split values of the tree can be found. $Obj^*$ is used to find the tree with the best structure, which is added to the model; a greedy algorithm is used to find the optimal tree structure.
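The leaf-weight and structure-score step of the derivation above can be made concrete with a small numeric sketch. It assumes the logistic loss for the binary normal/abnormal labels, for which the first-order derivative is p - y and the second-order derivative is p(1 - p); the function names and the four sample values are hypothetical, and this is not the XGBoost library itself.

```python
import math
from typing import List, Sequence, Tuple


def logistic_grad_hess(y_true: Sequence[int],
                       raw_pred: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First- and second-order derivatives of the logistic loss per sample."""
    grads, hess = [], []
    for y, z in zip(y_true, raw_pred):
        p = 1.0 / (1.0 + math.exp(-z))
        grads.append(p - y)
        hess.append(p * (1.0 - p))
    return grads, hess


def leaf_weight(g_sum: float, h_sum: float, lam: float) -> float:
    """Optimal leaf weight w_j* = -G_j / (H_j + lambda)."""
    return -g_sum / (h_sum + lam)


def structure_score(leaves: Sequence[Tuple[float, float]],
                    lam: float, gamma: float) -> float:
    """Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    return -0.5 * sum(g * g / (h + lam) for g, h in leaves) + gamma * len(leaves)


def split_gain(left: Tuple[float, float], right: Tuple[float, float],
               lam: float, gamma: float) -> float:
    """Reduction of Obj* obtained by splitting one leaf into two."""
    parent = (left[0] + right[0], left[1] + right[1])
    return structure_score([parent], lam, gamma) - structure_score([left, right], lam, gamma)


# Four samples (0 = normal, 1 = abnormal), all starting from a raw score of 0.
g, h = logistic_grad_hess([0, 0, 1, 1], [0.0, 0.0, 0.0, 0.0])
left = (sum(g[:2]), sum(h[:2]))    # samples 0 and 1 go to the left leaf
right = (sum(g[2:]), sum(h[2:]))   # samples 2 and 3 go to the right leaf
w_left = leaf_weight(*left, lam=1.0)
w_right = leaf_weight(*right, lam=1.0)
gain = split_gain(left, right, lam=1.0, gamma=0.0)
print(w_left, w_right, gain)
```

A greedy tree builder would evaluate `split_gain` for each candidate split and keep the one with the largest positive gain.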

In this way, the above detection determines whether the single-dimensional KPI time series data is abnormal, and it targets point anomalies. In a typical system, however, events matter more than individual points. Some embodiments of this application focus on event anomalies, which appear in the KPI as a continuous interval. It is therefore necessary to filter the results detected above, that is, to use the label classifier to judge the preliminary anomaly detection results.

Step S14: Extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result; where the target time point is any time point within the second preset time length after the specified time point.

The second data feature can be extracted in the same way as the first data feature described above: the KPI time series data at the target time point and within the first preset time length before the target time point is extracted, and feature extraction is performed on it to obtain the second data feature of the target time point.

Step S15: Determine the final anomaly detection result of the single-dimensional KPI time series data based on the classification results.

It should be pointed out that the time point at which an anomaly occurs is defined as t. If an anomaly is still detected within the time (t, t+T), the anomaly found is a true anomaly; otherwise it is a false anomaly. The classification results can therefore be divided into four categories: true positive, false positive, true negative, and false negative examples. If an anomaly was detected before and an anomaly is still detected within time T, the result is a true positive example, denoted TN. If an anomaly was detected before but the detection within time T is normal, the result is a false positive example, denoted FN. If the earlier detection was normal and the detection within time T is still normal, the result is a true negative example, denoted TP. If the earlier detection was normal but the detection within time T is abnormal, the result is a false negative example, denoted FP. For the label classifier, an anomaly detected within time T after the start of a continuous abnormal interval means that the entire interval is an abnormal interval; the detected TN and FP are real anomalies, and the other cases can be ignored.

The label classifier in some embodiments of this application also uses the XGBoost model. The input of the label classifier is the combined feature $f_l$, formed from the feature extracted in the aforementioned manner at any time point $t_i$ within time T after the abnormal time point t and the preliminary anomaly detection result $l_t$ corresponding to time t, where $t_i \in (t, t+T)$. The label classifier further determines whether the interval is an abnormal interval: when the feature $f_l$ at time $t_i$ is input to the label classifier, the result $y_i$ is obtained. If $y_i$ is TN or FP, the continuous interval is determined to be an abnormal interval, and the final detection result is abnormal.
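The decision logic of the second stage can be sketched as follows. In the application the label classifier is a trained XGBoost model; this sketch swaps in a plain rule that maps the pair (preliminary result, later result) to the four classes, purely to illustrate how the final result follows from the class. The class names follow the labeling used in the text (TN and FP are the real anomalies), and the function names and the "any later time point" aggregation are assumptions.

```python
from typing import Iterable

# Classes treated as real anomalies, using the labels as they appear in the text.
REAL_ANOMALY_CLASSES = {"TN", "FP"}


def classify_pair(prelim_abnormal: bool, later_abnormal: bool) -> str:
    """Map (result at time t, result at a time t_i in (t, t+T)) to a class label."""
    if prelim_abnormal and later_abnormal:
        return "TN"  # abnormal before, still abnormal within T
    if prelim_abnormal and not later_abnormal:
        return "FN"  # abnormal before, normal within T
    if not prelim_abnormal and not later_abnormal:
        return "TP"  # normal before, still normal within T
    return "FP"      # normal before, abnormal within T


def final_result(prelim_abnormal: bool, later_flags: Iterable[bool]) -> bool:
    """Final result is abnormal if any later time point yields TN or FP."""
    return any(
        classify_pair(prelim_abnormal, flag) in REAL_ANOMALY_CLASSES
        for flag in later_flags
    )


print(final_result(True, [False, True]))   # anomaly persists within T
print(final_result(True, [False, False]))  # isolated point, filtered out
```

With a trained classifier, `classify_pair` would be replaced by a model prediction on the combined feature, while `final_result` would stay the same.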

It can be seen that in some embodiments of this application, the single-dimensional KPI time series data of the target interval can be obtained first, where the length of the target interval is the first preset time length and the end time point of the target interval is the specified time point. The first data feature of the single-dimensional KPI time series data is then extracted and input into the base classifier, and the base classifier outputs the preliminary anomaly detection result of the single-dimensional KPI time series data. Next, the second data feature of the target time point is extracted, and the second data feature and the preliminary anomaly detection result are input into the label classifier to obtain the classification result, where the target time point is any time point within the second preset time length after the specified time point. The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result. That is, a two-layer classifier is used: the base classifier first detects the single-dimensional KPI time series data in the target interval to obtain a preliminary anomaly detection result, and the label classifier then judges the preliminary anomaly detection result to obtain the final anomaly detection result, which can improve the accuracy of KPI anomaly detection.

Referring to Figure 2, this application discloses a root cause locating method in some embodiments, including:

Step S21: Construct a candidate root cause set based on multiple single-dimensional KPI time series data whose final anomaly detection results are abnormal; wherein the candidate root cause set includes one or more single-dimensional KPI time series data.

That is, within the same target interval, multiple single-dimensional KPI time series may be abnormal, and the candidate root cause set is constructed from these single-dimensional KPI time series data. Each single-dimensional KPI time series is one element of the candidate root cause set. For anomaly detection of single-dimensional KPI time series data, reference may be made to the content disclosed in the foregoing embodiments, which is not repeated here.

Step S22: Use the candidate root cause sets as nodes and construct a multi-layer root cause tree according to the data dimensions of the candidate root cause sets.

Among them, the data dimensions of nodes at the same level in the multi-level root cause tree are the same, and the data dimensions of nodes at each level decrease from top to bottom.

For example, as shown in Figure 3, this application discloses a schematic diagram of a multi-layer root cause tree in some embodiments. Here, K1, K2, K3, and K4 represent four abnormal single-dimensional KPI time series. The root cause set corresponding to each node in the first layer includes only one single-dimensional KPI time series. The root cause set corresponding to each node in the second layer includes 2-dimensional KPI time series data, that is, two single-dimensional KPI time series. The root cause set corresponding to each node in the third layer includes 3-dimensional KPI time series data. The root cause set corresponding to each node in the fourth layer includes 4-dimensional KPI time series data.
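The layered structure described for Figure 3 can be sketched by enumerating combinations of the abnormal KPIs, with layer d holding every candidate root cause set of d KPIs. The representation of a node as a tuple of KPI names, the parent-child rule (a child contains all KPIs of its parent plus one more), and the function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Node = Tuple[str, ...]


def build_root_cause_tree(abnormal_kpis: List[str]) -> Dict[int, List[Node]]:
    """Layer d holds every candidate root cause set made of d KPIs."""
    return {
        d: list(combinations(abnormal_kpis, d))
        for d in range(1, len(abnormal_kpis) + 1)
    }


def children(node: Node, tree: Dict[int, List[Node]]) -> List[Node]:
    """Nodes in the next layer that contain every KPI of the given node."""
    next_layer = tree.get(len(node) + 1, [])
    return [c for c in next_layer if set(node) <= set(c)]


tree = build_root_cause_tree(["K1", "K2", "K3", "K4"])
print({layer: len(nodes) for layer, nodes in tree.items()})
print(children(("K1",), tree))
```

For four abnormal KPIs this yields 4, 6, 4, and 1 nodes in layers one to four, which is why pruning is needed as the number of KPIs grows.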

It should be pointed out that for a set of abnormal KPI time series data, the root cause of this set of abnormal KPI time series data can be obtained based on the correlation between exceptions.

In some embodiments of this application, root causes can be searched based on Figure 3. For example, a layer-by-layer search can be performed according to the root cause dimensions. Each node is a combination of root causes, and the final root cause falls on a certain leaf node.

Step S23: Prune the multi-layer root cause tree layer by layer based on the preset pruning strategy, and determine the abnormal root cause set based on the ripple effect.

In some embodiments of the present application, the multi-layer root cause tree can be pruned layer by layer from top to bottom based on a preset pruning strategy. During the layer-by-layer pruning process, for any layer, the influence value of each node is determined based on the preset influence value calculation rules, and it is judged whether the influence value is less than the preset influence threshold. If so, the node is eliminated, together with its child nodes in each layer below this layer. Further, for any layer, after eliminating nodes whose influence value is less than the preset influence threshold, the potential score corresponding to each remaining node is determined based on the ripple effect, where the potential score represents the degree of influence of the elements of the node on the elements of the child nodes of the node. The potential scores of the remaining nodes in each layer are sorted, and the abnormal root cause set is determined based on the sorting results.

In some embodiments of this application, the predicted value of each element in each node can be calculated based on a preset prediction algorithm, and the influence value of each node can be determined based on the predicted value of each element and the actual value of each element.

It should be pointed out that according to the ripple effect, if a set of KPI data can affect the KPI values of a large number of other elements, then this set of KPI data is a root cause set, and the other KPI data are leaf nodes of this set of KPI data. Assuming that a candidate root cause set is S, the KPI values of its descendant leaf nodes can be derived based on the ripple effect, and the KPI data in the candidate set are compared with the derived KPI values. Therefore, in some embodiments of this application, an evaluation standard is designed that expresses the comparison between the candidate set KPI and the derived KPI. The closer the two values are, the greater the possibility that S is the root cause set. However, if there are many candidate sets, then according to the principle of Occam's razor, sets with fewer dimensions should be given priority.

In some embodiments, the anomaly root cause locating method used can be based on the HotSpot algorithm and add a layer-by-layer pruning strategy to improve efficiency. The process is as follows:

When an anomaly is detected in a KPI, let t be the time at which the anomaly occurs, that is, the time at which the aforementioned final detection result is abnormal, and take the actual values of the KPI leaf elements within the window of length W before t. The actual values of one single-dimensional KPI time series are $v = \{v_{t-W+1}, v_{t-W+2}, \ldots, v_t\}$. $V_t$ denotes the actual KPI values of all leaf elements in any node at time t, $V_t = \{v(e_1, t), v(e_2, t), \ldots, v(e_n, t)\}$, where e represents an element and n represents the number of elements in the node. The predicted KPI values of the node at time t are then $F_t = \{f(e_1, t), f(e_2, t), \ldots, f(e_n, t)\}$, where the prediction algorithm is the EWMA algorithm.

According to the ripple effect, if an anomaly occurs in an element x, assume that the change value of x is h(x), where $h(x) = f(x) - v(x)$. The derived value of each element x' is calculated according to the formula:

$$a(x') = f(x') - h(x)\,\frac{f(x')}{f(x)}, \qquad f(x) \neq 0$$

where x' is an element different from x in a child node of the node to which element x belongs, f(x) represents the predicted value of x (the prediction algorithm is the EWMA algorithm), and v(x) represents the actual value of x.

Furthermore, the degree of influence of element x on other leaf node elements can be expressed by the potential score ps:

$$ps = \max\left(1 - \frac{d(v, a)}{d(v, f)},\ 0\right)$$

where $d(v, a)$ is the Euclidean distance between the variables v and a:

$$d(v, a) = \sqrt{\sum_{i}\left(v_i - a_i\right)^2}$$

and $d(v, f)$ is defined in the same way. Here, i represents time point i in the single-dimensional KPI time series data. Therefore, if a leaf node element is the root cause, the closer the values of a and v are, the closer $d(v, a)$ is to 0 and the closer the ps value is to 1.

It is understandable that the ps value of the node is determined based on the ps value of the element. The root cause set S can be determined according to the ps value.

For example, referring to Figure 3, suppose the ps value of node K1 in the first layer is to be calculated. Then K1 is the aforementioned x. Assuming that K1,2 and K1,3 are both remaining nodes, K2 in K1,2 and K3 in K1,3 are both x'. The ps values of K1 with respect to K2 in K1,2 and K3 in K1,3 are calculated separately and then summed to obtain the potential score of node K1. For the potential score of node K1,2, K1 and K2 together are the aforementioned x, with f(x)=f(K1)+f(K2) and v(x)=v(K1)+v(K2); assuming that K1,2,3 is a remaining node, K3 in K1,2,3 is x'.
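The potential score calculation can be sketched numerically as below. The sketch assumes the HotSpot forms of the two quantities: the derived value a(x') = f(x') - h(x) * f(x') / f(x) with h(x) = f(x) - v(x), and ps = max(1 - d(v, a) / d(v, f), 0) with d the Euclidean distance. The function names, the return value of 0 when actual and predicted values coincide, and the sample numbers are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def derived_values(f_x: float, v_x: float, f_child: Sequence[float]) -> List[float]:
    """a(x') = f(x') - h(x) * f(x') / f(x), with h(x) = f(x) - v(x)."""
    if f_x == 0:
        raise ValueError("f(x) must be non-zero")
    h_x = f_x - v_x
    return [f - h_x * f / f_x for f in f_child]


def euclidean(u: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, w)))


def potential_score(v: Sequence[float], f: Sequence[float], a: Sequence[float]) -> float:
    """ps = max(1 - d(v, a) / d(v, f), 0)."""
    denom = euclidean(v, f)
    if denom == 0:
        # Assumption: no deviation from the forecast means no evidence of a root cause.
        return 0.0
    return max(1.0 - euclidean(v, a) / denom, 0.0)


# x was predicted at 100 but measured at 50, so h(x) = 50.
f_child = [40.0, 60.0]                       # predicted values of the elements x'
a_child = derived_values(100.0, 50.0, f_child)
ps_match = potential_score([20.0, 30.0], f_child, a_child)   # actual values equal derived values
ps_miss = potential_score([30.0, 45.0], f_child, a_child)    # actual values halfway between
print(a_child, ps_match, ps_miss)
```

When the measured values of the child elements coincide with the derived values, the score reaches 1, which matches the statement that ps approaches 1 as a and v get closer.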

It should be pointed out that, following the aforementioned process, the potential score of any set can be determined. However, searching for the abnormal root cause means finding the candidate root cause set with the largest potential score among sets formed from arbitrary combinations of dimensions, and this search space is very large. When the root cause is one-dimensional, the search space is n³; but the root cause is not necessarily one-dimensional, and the search space then grows exponentially. Therefore, to deal with this search space explosion, a layer-by-layer pruning strategy is used to search for the root cause set, with pruning performed based on the influence of the root cause set. In some embodiments, the influence of the root cause set S is defined as:

Here, h(S) is the sum of the absolute values of the change values of the elements in the root cause set S, and e represents any KPI time series data whose final detection result is abnormal. In some embodiments, the influence of the root cause set can also be calculated by the formula E(S) = h(S).

It should be pointed out that the influence represents the possibility that the candidate root cause set S is a root cause. In addition, a threshold T_E needs to be determined: when a node is traversed and E(S) < T_E, the set is pruned, that is, the node and the child nodes below it are deleted. Through the pruning strategy and the potential score calculation, nodes without influence are eliminated and several candidate root cause sets are obtained in each layer. These are sorted in descending order of potential score, and the root cause set with the highest potential score is the root cause with the highest probability.
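The pruning and ranking described above can be sketched as a breadth-first traversal of the root cause tree. This is a minimal sketch under stated assumptions: the tree is passed in as an explicit `children` mapping, the influence uses the simpler form E(S) = h(S) mentioned in the text, and the change values, potential scores and threshold are hypothetical numbers rather than values from this application.

```python
from collections import deque


def prune_and_rank(roots, children, influence, score, threshold):
    """Layer-by-layer search over a root cause tree.

    roots     -- nodes of the first layer
    children  -- dict mapping a node to its child nodes in the next layer
    influence -- callable returning E(S) for a node
    score     -- callable returning the potential score ps for a node
    threshold -- the influence threshold T_E
    """
    kept = []
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if influence(node) < threshold:
            continue                   # pruned: its children are never visited
        kept.append(node)
        queue.extend(children.get(node, []))
    # Candidates sorted from the largest to the smallest potential score.
    return sorted(kept, key=score, reverse=True)


# Hypothetical change values h(e) of three abnormal KPIs.
change = {"K1": 6.0, "K2": 0.5, "K3": 4.0}
roots = [("K1",), ("K2",), ("K3",)]
children = {
    ("K1",): [("K1", "K2"), ("K1", "K3")],
    ("K2",): [("K2", "K3")],
    ("K1", "K2"): [("K1", "K2", "K3")],
}
# Hypothetical potential scores; pruned nodes are never scored.
scores = {
    ("K1",): 0.4, ("K3",): 0.3, ("K1", "K2"): 0.5,
    ("K1", "K3"): 0.9, ("K1", "K2", "K3"): 0.7,
}

ranked = prune_and_rank(
    roots,
    children,
    influence=lambda node: sum(abs(change[k]) for k in node),  # E(S) = h(S)
    score=lambda node: scores[node],
    threshold=1.0,
)
```

Because a pruned node is skipped before its children are queued, the whole subtree below it is removed without computing any potential score for it, which is where the reduction in search effort comes from.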

For example, Figure 4 is a schematic diagram of KPI anomaly detection and root cause location disclosed in some embodiments of this application. In the KPI anomaly detection process, multi-dimensional KPIs are split into single-dimensional KPIs that are detected separately. The anomaly detection algorithm is designed for continuous-segment anomalies in KPIs and uses multi-layer classifiers for detection: the base classifier detects abnormal points in a KPI, and the label classifier examines the subsequent time segment to identify whether a detected abnormal point is a true anomaly, so that all abnormal KPIs occurring at the same time are detected. A candidate root cause set is formed from the abnormal KPIs that are found. Based on the ripple effect, the potential score from HotSpot is used to quantify the influence of the KPIs, and the root cause set is found by computing the root cause tree layer by layer. In this way, KPI point anomalies are converted into KPI continuous-interval anomalies through the multi-layer classifier, so that the anomaly detector pays more attention to abnormal events. In the root cause analysis stage, the ripple effect of KPI anomalies is quantified by defining potential scores and influence, and the search is accelerated by layer-by-layer pruning to locate the root cause of the anomaly.
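The four classification results named in the claims (true positive, false positive, true negative, false negative) can be illustrated with a simple rule. This is a sketch only: the function name `label_detection` is hypothetical, and treating the segment (t, t+T) as abnormal when at least one of its points is flagged is an assumption, since the text does not state how the segment is summarized. The label classifier of the application is a classifier over features, not this fixed rule.

```python
def label_detection(first_abnormal, later_flags):
    """Assign one of the four outcome labels.

    first_abnormal -- preliminary result of the base classifier at time t
    later_flags    -- detection results for the time points in (t, t+T)

    Assumption: the segment counts as abnormal if any point is flagged.
    """
    later_abnormal = any(later_flags)
    if first_abnormal:
        return "true positive" if later_abnormal else "false positive"
    return "false negative" if later_abnormal else "true negative"


# Hypothetical detections for a segment of three time points after t.
example = label_detection(True, [False, True, False])
```

Under this rule an isolated point anomaly that is not confirmed anywhere in the following segment is labeled a false positive, which reflects the stated goal of turning point anomalies into continuous-interval anomalies.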

As shown in Figure 5, this application discloses a KPI anomaly detection device in some embodiments, including:

The KPI data acquisition module 11 is used to obtain single-dimensional KPI time series data of the target interval; wherein the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point;

Data feature extraction module 12, used to extract the first data feature of single-dimensional KPI time series data;

The detection result output module 13 is used to input the first data feature into the base classifier, and use the base classifier to output preliminary anomaly detection results of single-dimensional KPI time series data;

The classification result acquisition module 14 is used to extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result; wherein the target time point is any time point within the second preset time length after the specified time point;

The detection result determination module 15 is used to determine the final anomaly detection result of the single-dimensional KPI time series data based on the classification results.

Among them, the data feature extraction module 12 includes:

The normalization processing sub-module is used to normalize single-dimensional KPI time series data and obtain normalized data;

The data feature extraction submodule is used to determine the normalized data and at least one of the statistical features, prediction features, and frequency domain features of the normalized data as the first data feature of the single-dimensional KPI time series data.
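A minimal sketch of the two submodules follows, using only the Python standard library. Min-max normalization, the statistical features (mean, variance, extreme values, quantile, difference) and the EWMA-based prediction feature are named in the claims; the function names, the choice of the median as the quantile, the smoothing factor `alpha` and the feature order are assumptions. The frequency domain (wavelet) features are omitted here because they would require a third-party library.

```python
import statistics


def min_max_normalize(values):
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def first_data_feature(values, alpha=0.3):
    """Concatenate the normalized window with a few derived features.

    `alpha` is a hypothetical EWMA smoothing factor.
    """
    norm = min_max_normalize(values)
    # EWMA level built from every point except the last one, so that it
    # serves as the forecast for the last point of the window.
    level = norm[0]
    for v in norm[1:-1]:
        level = alpha * v + (1 - alpha) * level
    derived = {
        "mean": statistics.fmean(norm),
        "variance": statistics.pvariance(norm),
        "max": max(norm),
        "min": min(norm),
        "median": statistics.median(norm),
        "last_diff": norm[-1] - norm[-2] if len(norm) > 1 else 0.0,
        "ewma_residual": norm[-1] - level,
    }
    return norm + list(derived.values())


# Hypothetical window of five KPI samples.
features = first_data_feature([2.0, 4.0, 6.0, 8.0, 10.0])
```

The resulting flat list is the kind of input a tree-based base classifier can consume directly, with one row per target interval.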

Further, the device also includes a root cause positioning module, which includes:

The candidate root cause set construction submodule is used to construct a candidate root cause set based on multiple single-dimensional KPI time series data whose final anomaly detection results are abnormal; wherein the candidate root cause set includes one or more single-dimensional KPI time series data;

The root cause tree construction submodule is used to take the candidate root cause sets as nodes and build a multi-layer root cause tree according to the data dimensions of the candidate root cause sets;

The abnormal root cause set determination submodule is used to prune the multi-layer root cause tree layer by layer based on the preset pruning strategy, and determine the abnormal root cause set based on the ripple effect.

Among them, the data dimensions of nodes at the same level in the multi-layer root cause tree are the same, and the data dimensions of nodes at each level decrease from top to bottom;

Correspondingly, the abnormal root cause set determination submodule is used to prune the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.

In some embodiments, the abnormal root cause set determination submodule is used to:

For any layer, determine the influence value of each node based on the preset influence value calculation rules, and determine whether the influence value is less than the preset influence threshold; if so, delete the node and the child nodes of the node in the layers below it;

Further, the abnormal root cause set determination submodule is used to: calculate the predicted value of each element in each node based on the preset prediction algorithm, and determine the influence value of each node based on the predicted value of each element and the actual value of each element.

Referring to Figure 6, this application discloses an electronic device 20 in some embodiments, including a processor 21 and a memory 22; wherein the memory 22 is used to save a computer program, and the processor 21 is used to execute the computer program to implement the KPI anomaly detection method disclosed in some of the foregoing embodiments.

Regarding the process of the above KPI anomaly detection method, reference may be made to the corresponding content disclosed in some of the foregoing embodiments, which will not be described again here.

Moreover, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc., and the storage method may be short-term storage or permanent storage.

In addition, the electronic device 20 also includes a power supply 23, a communication interface 24, an input and output interface 25 and a communication bus 26. The power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20. The communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows is any communication protocol applicable to the technical solution of this application, which is not limited here. The input and output interface 25 is used to obtain input data from the outside or to output data to the outside, and its interface type can be selected according to application needs, which is not limited here.

Referring to FIG. 7, the present application discloses in some embodiments a non-volatile computer-readable storage medium 70 for storing a computer program 710, wherein the computer program 710, when executed by a processor, implements the KPI anomaly detection method disclosed in some of the foregoing embodiments.

Each embodiment in this specification is described in a progressive manner. Each embodiment focuses on its differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other. As for the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple. For relevant details, please refer to the description in the method section.

The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein may be implemented directly in hardware, in software modules executed by a processor, or in a combination of both. Software modules may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

The above is a detailed introduction to the KPI anomaly detection method, device, equipment and medium provided by this application. Specific examples are used in this document to illustrate the principles and implementation of this application, and the description of the above embodiments is only intended to help in understanding the method and core ideas of this application. At the same time, for those of ordinary skill in the field, there will be changes in the specific implementation and the scope of application based on the ideas of this application. In summary, the contents of this specification should not be understood as a limitation on this application.

Claims

A KPI anomaly detection method, characterized by including:

Obtain the single-dimensional key performance indicator KPI time series data of the target interval; wherein the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point;

Extract the first data feature of the single-dimensional KPI time series data;

Input the first data feature into a base classifier, and use the base classifier to output preliminary anomaly detection results of the single-dimensional KPI time series data;

Extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result; wherein the target time point is any time point within the second preset time length after the specified time point;

The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result.
The KPI anomaly detection method according to claim 1, wherein the extracting the first data feature of the single-dimensional KPI time series data includes:

Perform normalization processing on the single-dimensional KPI time series data to obtain normalized data;

Determine the normalized data and at least one of the statistical features, prediction features, and frequency domain features of the normalized data as the first data feature of the single-dimensional KPI time series data.
The KPI anomaly detection method according to claim 1, further comprising:

Construct a candidate root cause set based on the multiple single-dimensional KPI time series data whose final anomaly detection result is abnormal; wherein the candidate root cause set includes one or more of the single-dimensional KPI time series data;

Using the candidate root cause set as a node, and constructing a multi-layer root cause tree according to the data dimensions of the candidate root cause set;

The multi-layer root cause tree is pruned layer by layer based on a preset pruning strategy, and an abnormal root cause set is determined based on the ripple effect.
The KPI anomaly detection method according to claim 3, characterized in that the data dimensions of nodes at the same level in the multi-layer root cause tree are the same, and the data dimensions of nodes at each level decrease from top to bottom;

Correspondingly, pruning the multi-layer root cause tree layer by layer based on a preset pruning strategy includes: pruning the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.
The KPI anomaly detection method according to claim 4, characterized in that pruning the multi-layer root cause tree layer by layer based on a preset pruning strategy includes:

For any layer, determine the influence value of each node based on the preset influence value calculation rules, and determine whether the influence value is less than the preset influence threshold; if so, delete the node and the child nodes of the node in the layers below it.
The KPI anomaly detection method according to claim 5, wherein the multi-layer root cause tree is pruned layer by layer based on a preset pruning strategy, and the abnormal root cause set is determined based on the ripple effect, including:

For any layer, after eliminating nodes whose influence value is less than the preset influence threshold, the potential score corresponding to each remaining node is determined based on the ripple effect; wherein the potential score represents the degree of influence of the elements of the node on the elements of the node's child nodes;

The potential scores of the remaining nodes in each layer are sorted, and an abnormal root cause set is determined based on the sorting results.
The KPI anomaly detection method according to claim 5, wherein the determining the influence value of each node based on the preset influence value calculation rules includes:

The predicted value of each element in each node is calculated based on the preset prediction algorithm, and the influence value of each node is determined based on the predicted value of each element and the actual value of each element.
The KPI anomaly detection method according to claim 1, characterized in that the single-dimensional KPI time series data is central processing unit CPU data, memory data, and network data.
The KPI anomaly detection method according to claim 2, characterized in that the normalization method includes the Minmax method.
The KPI anomaly detection method according to claim 2, wherein the statistical characteristics include at least one of mean, variance, extreme value, quantile, and difference.
The KPI anomaly detection method according to claim 10, characterized in that the normalized data and statistical features are used to represent the short-term characteristics of the KPI time series data.
The KPI anomaly detection method according to claim 2, characterized in that the prediction feature is obtained by predicting the normalized data with an exponentially weighted moving average (EWMA) prediction algorithm.
The KPI anomaly detection method according to claim 12, characterized in that the prediction features are used to represent the possibility of KPI time series data anomalies.
The KPI anomaly detection method according to claim 2, characterized in that the frequency domain features are wavelet features, and DB2 wavelet decomposition is used.
The KPI anomaly detection method according to claim 14, characterized in that the wavelet decomposition features are used to represent the characteristics of KPI data in the frequency domain.
The KPI anomaly detection method according to claim 1, characterized in that the base classifier for anomaly detection is an extreme gradient boosting tree XGBoost model.
The KPI anomaly detection method according to claim 1, characterized in that the classification result is one of a true positive example, a false positive example, a true negative example and a false negative example;

wherein the time point at which the anomaly occurs is t, and for the time (t, t+T):

If an anomaly is detected first and an anomaly is detected within time T, it is a true positive example;

If an anomaly is detected first and normal is detected within time T, it is a false positive example;

If normal is detected first and normal is detected within time T, it is a true negative example;

If normal is detected first and an anomaly is detected within time T, it is a false negative example.
A KPI anomaly detection device, characterized by including:

The KPI data acquisition module is used to obtain the single-dimensional KPI time series data of the target interval; wherein the length of the target interval is the first preset time length, and the end time point of the target interval is the specified time point;

A data feature extraction module, used to extract the first data feature of the single-dimensional KPI time series data;

A detection result output module, configured to input the first data feature into a base classifier, and use the base classifier to output preliminary anomaly detection results of the single-dimensional KPI time series data;

The classification result acquisition module is used to extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result; wherein the target time point is any time point within the second preset time length after the specified time point;

A detection result determination module, configured to determine the final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
An electronic device, characterized by including a processor and a memory; wherein,

The memory is used to store computer programs;

The processor is configured to execute the computer program to implement the KPI anomaly detection method according to any one of claims 1 to 17.
A non-volatile computer-readable storage medium, characterized in that it is used to store a computer program, wherein when the computer program is executed by a processor, the KPI anomaly detection method as described in any one of claims 1 to 17 is implemented.