WO2023208136A1 - KPI anomaly detection method and apparatus, device and medium - Google Patents

Info

Publication number
WO2023208136A1
Authority
WO
WIPO (PCT)
Prior art keywords
kpi
anomaly detection
data
layer
time series
Application number
PCT/CN2023/091310
Other languages
French (fr)
Chinese (zh)
Inventor
苏海明
Original Assignee
郑州云海信息技术有限公司
Application filed by 郑州云海信息技术有限公司
Publication of WO2023208136A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The present application discloses a KPI anomaly detection method and apparatus, a device, and a medium, which are applied to the technical field of KPI anomalies. The method comprises: acquiring single-dimensional KPI time series data of a target interval, the length of the target interval being a first preset time length, and the end time point of the target interval being a specified time point; extracting a first data feature of the single-dimensional KPI time series data; inputting the first data feature into a base classifier, and outputting a preliminary anomaly detection result of the single-dimensional KPI time series data using the base classifier; extracting a second data feature of a target time point, and inputting the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, the target time point being any time point within a second preset time length after the specified time point; and determining a final anomaly detection result of the single-dimensional KPI time series data on the basis of the classification result. In this way, the accuracy of KPI anomaly detection may be improved.

Description

A KPI anomaly detection method, apparatus, device and medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210460951.1, filed with the China Patent Office on April 28, 2022 and entitled "A KPI anomaly detection method, apparatus, device and medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of KPI anomaly detection, and in particular to a KPI anomaly detection method, apparatus, device and medium.
Background
With the rapid development of cloud computing, bare-metal deployments that combine physical-machine performance with cloud elasticity are quietly emerging. To optimize the performance of physical machines and cloud hosts in cloud computing, the analysis of monitoring data provides guidance for machine performance tuning.
Currently, server monitoring data mainly comprises performance data for the CPU (Central Processing Unit), memory, storage and network, including time series performance data such as CPU usage, memory usage and network throughput. A large part of this data is single-dimensional KPI (Key Performance Indicator) data. Anomaly detection on single-dimensional time series data faces several challenges: there is no definable pattern of anomaly occurrence; the data may contain noise; and the data is usually non-stationary and changes dynamically. How to improve the accuracy of KPI anomaly detection is therefore a problem under continuous study in this field.
Summary
In view of this, the purpose of this application is to provide a KPI anomaly detection method, apparatus, device and medium that can improve the accuracy of KPI anomaly detection. The solution may be as follows:
In a first aspect, this application discloses, in some embodiments, a KPI anomaly detection method, comprising:
acquiring single-dimensional KPI time series data of a target interval, wherein the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point;
extracting a first data feature of the single-dimensional KPI time series data;
inputting the first data feature into a base classifier, and using the base classifier to output a preliminary anomaly detection result of the single-dimensional KPI time series data;
extracting a second data feature of a target time point, and inputting the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, wherein the target time point is any time point within a second preset time length after the specified time point;
determining a final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
In some embodiments, extracting the first data feature of the single-dimensional KPI time series data comprises:
normalizing the single-dimensional KPI time series data to obtain normalized data;
determining the normalized data, together with at least one of the statistical features, prediction features and frequency-domain features of the normalized data, as the first data feature of the single-dimensional KPI time series data.
In some embodiments, the method further comprises:
constructing candidate root cause sets from multiple single-dimensional KPI time series data whose final anomaly detection result is abnormal, wherein each candidate root cause set includes one or more single-dimensional KPI time series data;
taking the candidate root cause sets as nodes, and constructing a multi-layer root cause tree according to the data dimension of each candidate root cause set;
pruning the multi-layer root cause tree layer by layer based on a preset pruning strategy, and determining an abnormal root cause set based on the ripple effect.
In some embodiments, nodes in the same layer of the multi-layer root cause tree have the same data dimension, and the data dimension of the nodes in each layer decreases from top to bottom;
correspondingly, pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy comprises: pruning the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.
In some embodiments, pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy comprises:
for any layer, determining the influence value of each node based on a preset influence value calculation rule, and judging whether the influence value is less than a preset influence threshold; if so, removing the node and, in the layers below that layer, the child nodes of the removed node.
In some embodiments, pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy and determining the abnormal root cause set based on the ripple effect comprises:
for any layer, after removing the nodes whose influence value is less than the preset influence threshold, determining a potential score for each remaining node based on the ripple effect, wherein the potential score characterizes the degree to which the elements of a node influence the elements of its child nodes;
sorting the potential scores of the remaining nodes in each layer, and determining the abnormal root cause set based on the sorting result.
In some embodiments, determining the influence value of each node based on the preset influence value calculation rule comprises:
calculating the predicted value of each element in each node with a preset prediction algorithm, and determining the influence value of each node from the predicted value and the actual value of each element.
In a second aspect, this application discloses, in some embodiments, a KPI anomaly detection apparatus, comprising:
a KPI data acquisition module, configured to acquire single-dimensional KPI time series data of a target interval, wherein the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point;
a data feature extraction module, configured to extract a first data feature of the single-dimensional KPI time series data;
a detection result output module, configured to input the first data feature into a base classifier and use the base classifier to output a preliminary anomaly detection result of the single-dimensional KPI time series data;
a classification result acquisition module, configured to extract a second data feature of a target time point and input the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, wherein the target time point is any time point within a second preset time length after the specified time point;
a detection result determination module, configured to determine a final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
In a third aspect, this application discloses, in some embodiments, an electronic device comprising a processor and a memory, wherein:
the memory is configured to store a computer program;
the processor is configured to execute the computer program to implement the aforementioned KPI anomaly detection method.
In a fourth aspect, this application discloses, in some embodiments, a non-volatile computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned KPI anomaly detection method.
It can be seen that, in some embodiments of this application, single-dimensional KPI time series data of a target interval is first acquired, where the length of the target interval is a first preset time length and its end time point is a specified time point. A first data feature of the single-dimensional KPI time series data is then extracted and input into a base classifier, which outputs a preliminary anomaly detection result. Next, a second data feature of a target time point is extracted, and the second data feature and the preliminary anomaly detection result are input into a label classifier to obtain a classification result, where the target time point is any time point within a second preset time length after the specified time point. The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result. That is, this application uses two layers of classifiers: the base classifier first detects the single-dimensional KPI time series data of the target interval to obtain a preliminary anomaly detection result, and the label classifier then judges that preliminary result to produce the final anomaly detection result. In this way, the accuracy of KPI anomaly detection can be improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of a KPI anomaly detection method disclosed in some embodiments of this application;
Figure 2 is a flow chart of a KPI anomaly detection method disclosed in some embodiments of this application;
Figure 3 is a schematic diagram of a multi-layer root cause tree disclosed in some embodiments of this application;
Figure 4 is a schematic diagram of KPI anomaly detection and root cause localization disclosed in some embodiments of this application;
Figure 5 is a schematic structural diagram of a KPI anomaly detection apparatus disclosed in some embodiments of this application;
Figure 6 is a structural diagram of an electronic device disclosed in some embodiments of this application;
Figure 7 is a structural diagram of a non-volatile computer-readable storage medium disclosed in some embodiments of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
Currently, server monitoring data mainly comprises performance data for the CPU, memory, storage and network, including time series performance data such as CPU usage, memory usage and network throughput. A large part of this data is single-dimensional KPI data. Anomaly detection on single-dimensional time series data faces several challenges: there is no definable pattern of anomaly occurrence; the data may contain noise; and the data is usually non-stationary and changes dynamically.
Anomaly detection methods for these single-dimensional KPIs include time-series-based approaches and machine-learning-based approaches. Time-series-based approaches mainly comprise a family of linear models such as ARIMA (Autoregressive Integrated Moving Average) and exponential smoothing. At present, anomaly detection is mainly performed with machine learning, which covers supervised, semi-supervised and unsupervised anomaly detection. Supervised anomaly detection trains a binary classifier, for example an SVM (Support Vector Machine), on data instances labeled normal or abnormal. This approach has drawbacks: the ratio of abnormal to normal samples in such data is severely imbalanced, so the trained model overfits easily, and such methods are therefore less popular than semi-supervised or unsupervised ones.
In semi-supervised anomaly detection, a small amount of labeled data is used to train the classification model, and unlabeled data is used to refine the structural information implicit in the samples.
The most common approach is to train a deep autoencoder on normal samples in a semi-supervised manner, so that the autoencoder of the normal class has a low reconstruction error on normal data. Mainstream autoencoders include the VAE (Variational Auto-Encoder) and the AAE (Adversarial Autoencoder).
The third category, unsupervised anomaly detection, detects outliers solely from the intrinsic properties of the data instances, and its underlying theory also derives from autoencoders. Such methods can typically be used for automatic labeling of data samples; common unsupervised algorithms include restricted Boltzmann machines and deep belief networks.
How to improve the accuracy of KPI anomaly detection is a problem under continuous study in this field. To this end, this application provides, in some embodiments, a KPI anomaly detection solution that can improve the accuracy of KPI anomaly detection.
Referring to Figure 1, this application discloses, in some embodiments, a KPI anomaly detection method, comprising:
Step S11: acquire single-dimensional KPI time series data of a target interval, wherein the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point.
It should be noted that, in KPI anomaly detection scenarios, KPI anomalies usually take the form of continuous intervals. Once an anomaly appears, it lasts for a period of time rather than occurring at a single time point: when a KPI becomes abnormal at time $t$, the anomaly persists until time $t+T$. The anomaly detection method is therefore divided into two steps: the base classifier detects anomalies in the KPI time series data, and the label classifier performs a further check.
For example, at time $t$, the data within the interval $(t-W+1, t)$ is extracted: $X_t = (x_{t-W+1}, x_{t-W+2}, \ldots, x_t)$.
Here $t$ is the specified time point and $W$ is the first preset time length. $X_t$ denotes the single-dimensional KPI time series data, which may be CPU data, memory data, network data and so on.
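As a minimal illustrative sketch of step S11, the window $X_t$ can be sliced out of a stored series. The function name `extract_window`, the 0-based integer time index and the sample values are assumptions made for illustration; they are not part of the disclosed embodiments.

```python
from typing import List, Sequence


def extract_window(series: Sequence[float], t: int, window: int) -> List[float]:
    """Return X_t = (x_{t-W+1}, ..., x_t) for a 0-based index t and window length W."""
    if window <= 0:
        raise ValueError("window must be positive")
    start = t - window + 1
    if start < 0 or t >= len(series):
        raise IndexError("window [t-W+1, t] falls outside the series")
    return list(series[start:t + 1])


# Made-up CPU usage samples, one value per sampling period.
cpu_usage = [0.21, 0.25, 0.22, 0.80, 0.85, 0.83]
x_t = extract_window(cpu_usage, t=4, window=3)
print(x_t)
```

The window always ends at the specified time point, so consecutive calls with increasing `t` yield overlapping windows of equal length.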
Step S12: extract the first data feature of the single-dimensional KPI time series data.
In some embodiments, the single-dimensional KPI time series data may be normalized to obtain normalized data; the normalized data, together with at least one of the statistical features, prediction features and frequency-domain features of the normalized data, is determined as the first data feature of the single-dimensional KPI time series data.
In some embodiments, the raw data may be normalized with the min-max method, whose expression is:
$$X_{t\_nom} = \frac{X_t - \min(X_t)}{\max(X_t) - \min(X_t)}$$
Here $X_{t\_nom}$ denotes the normalized data. The statistical features may include at least one of the mean, variance, extreme values, quantiles, differences and the like. The prediction features may be obtained by applying the EWMA (Exponentially Weighted Moving Average) prediction algorithm to the normalized data. The frequency-domain features may be wavelet features obtained by db2 wavelet decomposition. Among these features, the normalized data and the statistical features represent the short-term characteristics of the KPI time series data, the prediction features indicate to some extent the likelihood that the KPI time series data is abnormal, and the wavelet decomposition features represent the characteristics of the KPI data in the frequency domain.
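The feature extraction of step S12 can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the smoothing factor `alpha`, the use of the EWMA forecast residual as the prediction feature, the particular statistics chosen, and the handling of a constant window are all hypothetical choices, and the db2 wavelet features are omitted because they would need a wavelet library.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def minmax_normalize(window: Sequence[float]) -> List[float]:
    """Min-max scale a window to the range [0, 1]."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # Assumption: a constant window is mapped to all zeros to avoid 0/0.
        return [0.0] * len(window)
    return [(x - lo) / (hi - lo) for x in window]


def ewma_forecast(values: Sequence[float], alpha: float = 0.3) -> float:
    """One-step-ahead EWMA forecast: s_i = alpha * x_i + (1 - alpha) * s_{i-1}."""
    s = values[0]
    for x in values[1:]:
        s = alpha * x + (1 - alpha) * s
    return s


def first_data_feature(
    window: Sequence[float], alpha: float = 0.3
) -> Tuple[List[float], Dict[str, float]]:
    """Return the normalized window plus a few statistical and prediction features."""
    norm = minmax_normalize(window)
    history = norm[:-1]
    # Assumption: the prediction feature is the gap between the newest point
    # and the EWMA forecast built from the points before it.
    forecast = ewma_forecast(history, alpha) if history else norm[-1]
    feats = {
        "mean": mean(norm),
        "variance": pvariance(norm),
        "min": min(norm),
        "max": max(norm),
        "last_diff": norm[-1] - norm[-2] if len(norm) > 1 else 0.0,
        "ewma_residual": norm[-1] - forecast,
    }
    return norm, feats


norm, feats = first_data_feature([2.0, 4.0, 6.0])
print(norm, feats)
```

A large `ewma_residual` means the newest point departs from the smoothed recent trend, which is one way a prediction feature can hint at an anomaly.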
Step S13: input the first data feature into the base classifier, and use the base classifier to output the preliminary anomaly detection result of the single-dimensional KPI time series data.
In some embodiments, an XGBoost (eXtreme Gradient Boosting) model may be used as the base classifier for anomaly detection, with the first data feature as its input and normal or abnormal as its output. KPI anomaly detection thus treats data anomalies as a binary classification problem, with XGBoost as the binary classifier. The XGBoost model can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
Here $f_k(x)$ denotes the $k$-th weak learner, and the XGBoost model contains $K$ weak learners in total. $x_i$ is the $i$-th sample and $\hat{y}_i$ is the predicted value for sample $x_i$. To combine these $K$ weak classifiers into a strong classifier, the following function must be minimized:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$$
Here $l(\cdot)$ is the loss function and $\Omega(\cdot)$ is the regularization function. $y_i$ is the true value of sample $x_i$. In the regularization term, $T$ is the number of leaf nodes of the tree, $w$ is the vector of leaf node weights, and $\gamma$ and $\lambda$ are hyperparameters. In each iteration, only the objective function of the $t$-th regression tree is optimized:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
Here $\hat{y}_i^{(t-1)}$ is the output of the first $t-1$ trees for sample $x_i$, and $f_t(x_i)$ is the output of the current tree. Taking the Taylor expansion of the objective function and retaining the first-order and second-order terms gives the approximation:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T$$
where
$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$
are the first and second derivatives of the loss function for each sample, $i \in I_j$ ranges over the samples mapped to the $j$-th leaf node, and $n$ is the number of samples. Setting the derivative with respect to $w_j$ to zero gives the optimal solution for $w_j$:
$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
Substituting $w_j^{*}$ into the original objective function gives:
$$Obj^{*} = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
$T$ is the number of leaf nodes. Through the above iterations, the optimal split variable and split value of the tree can be found. $Obj^{*}$ is used to find the tree with the best structure, which is then added to the model; a greedy algorithm is used to search for the optimal tree structure.
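The closed-form leaf weight and the greedy split criterion can be checked numerically. The sketch below uses the standard XGBoost expressions $w_j^{*} = -G_j/(H_j+\lambda)$ and a per-leaf score of $-\tfrac{1}{2}G_j^2/(H_j+\lambda)$, where $G_j$ and $H_j$ are the sums of first and second derivatives over the samples in leaf $j$. The function names, the logistic-loss gradients and the four-sample example are assumptions for illustration only.

```python
from typing import Iterable, Tuple


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def leaf_score(G: float, H: float, lam: float) -> float:
    """Contribution of one leaf to the minimized objective: -G^2 / (2 (H + lambda))."""
    return -0.5 * G * G / (H + lam)


def structure_objective(
    leaves: Iterable[Tuple[float, float]], lam: float, gamma: float
) -> float:
    """Obj* for a tree given the (G, H) sums of each leaf, plus gamma per leaf."""
    leaves = list(leaves)
    return sum(leaf_score(G, H, lam) for G, H in leaves) + gamma * len(leaves)


def split_gain(GL: float, HL: float, GR: float, HR: float,
               lam: float, gamma: float) -> float:
    """Reduction in Obj* when one leaf is split into a left and a right leaf."""
    before = structure_objective([(GL + GR, HL + HR)], lam, gamma)
    after = structure_objective([(GL, HL), (GR, HR)], lam, gamma)
    return before - after


# Four samples with labels (1, 1, 0, 0) and an initial probability of 0.5 each.
# For logistic loss, g = p - y and h = p * (1 - p).
g = [-0.5, -0.5, 0.5, 0.5]
h = [0.25, 0.25, 0.25, 0.25]
gain = split_gain(sum(g[:2]), sum(h[:2]), sum(g[2:]), sum(h[2:]), lam=1.0, gamma=0.0)
print(gain, leaf_weight(sum(g[:2]), sum(h[:2]), lam=1.0))
```

A greedy tree builder would evaluate `split_gain` for every candidate split variable and split value and keep the split with the largest positive gain.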
In this way, the above detection determines whether the single-dimensional KPI time series data is abnormal, but it targets point anomalies. In a normal system, however, events matter more than individual points. In some embodiments, this application is primarily concerned with event anomalies, which appear in a KPI as a continuous interval. The results detected above therefore need to be screened, that is, the label classifier is used to judge the preliminary anomaly detection result.
Step S14: extract the second data feature of the target time point, and input the second data feature and the preliminary anomaly detection result into the label classifier to obtain the classification result, wherein the target time point is any time point within the second preset time length after the specified time point.
The second data feature may be extracted in the same way as the first data feature: the KPI time series data at the target time point and within the first preset time length before it is taken, and feature extraction is performed on it to obtain the second data feature of the target time point.
Step S15: determine the final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
It should be noted that the time point at which an anomaly appears is defined as $t$. If anomalies still appear within the period $(t, t+T)$, the detected anomaly is a true anomaly; otherwise it is a false anomaly. The classification results can therefore be divided into four categories: true positive, false positive, true negative and false negative. If an anomaly was detected previously and an anomaly is still detected within time $T$, the result is a true positive, denoted TN. If an anomaly was detected previously but normal is detected within time $T$, the result is a false positive, denoted FN. If normal was detected previously and normal is still detected within time $T$, the result is a true negative, denoted TP. If normal was detected previously but an anomaly is detected within time $T$, the result is a false negative, denoted FP. For the label classifier, an anomaly detected within time $T$ after the start of a continuous abnormal interval indicates that the whole interval is abnormal, so detected TN and FP results are real anomalies and the other cases can be ignored.
In some embodiments, the label classifier also uses an XGBoost model. Its input is a combined feature $f_l$, formed from the feature extracted in the aforementioned manner at any time point $t_i$ within time $T$ after the abnormal time point and the preliminary anomaly detection result $l_t$ corresponding to time $t$, where $t_i \in (t, t+T)$. Further, an abnormal interval is determined as follows: when the feature $f_l$ at time $t_i$ is input into the label classifier, a result $y_i$ is obtained; if $y_i \in \{\mathrm{TN}, \mathrm{FP}\}$, the continuous interval is determined to be an abnormal interval, and the final detection result is abnormal.
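The four categories can be illustrated with a small rule-based stand-in, for instance as a way to derive labels from detection flags. This is not the XGBoost label classifier itself. The category names follow the notation of this application (TN, FN, TP, FP as defined in the text), and treating "an anomaly is detected within time T" as "at least one later flag is abnormal" is an assumption.

```python
from typing import Sequence


def interval_category(l_t: bool, later_flags: Sequence[bool]) -> str:
    """Map the preliminary result at t and the flags within (t, t+T) to a category.

    l_t is True when the preliminary result at time t is abnormal; later_flags
    holds one abnormal/normal flag per time point inside (t, t+T).
    """
    later_abnormal = any(later_flags)  # assumption: one abnormal point suffices
    if l_t and later_abnormal:
        return "TN"  # abnormal before, still abnormal within T
    if l_t:
        return "FN"  # abnormal before, normal within T
    if later_abnormal:
        return "FP"  # normal before, abnormal within T
    return "TP"      # normal before, still normal within T


def is_abnormal_interval(category: str) -> bool:
    """Only TN and FP are treated as real anomalies; other cases are ignored."""
    return category in {"TN", "FP"}


for l_t, flags in [(True, [False, True]), (True, [False]),
                   (False, [True]), (False, [False])]:
    cat = interval_category(l_t, flags)
    print(l_t, flags, cat, is_abnormal_interval(cat))
```

Under this rule, an isolated abnormal point that is not followed by further anomalies within the period is discarded, which matches the emphasis on interval anomalies over point anomalies.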
It can be seen that, in some embodiments of this application, single-dimensional KPI time series data of a target interval is first acquired, where the length of the target interval is a first preset time length and its end time point is a specified time point. A first data feature of the single-dimensional KPI time series data is then extracted and input into a base classifier, which outputs a preliminary anomaly detection result. Next, a second data feature of a target time point is extracted, and the second data feature and the preliminary anomaly detection result are input into a label classifier to obtain a classification result, where the target time point is any time point within a second preset time length after the specified time point. The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result. That is, two layers of classifiers are used: the base classifier first detects the single-dimensional KPI time series data of the target interval to obtain a preliminary anomaly detection result, and the label classifier then judges that preliminary result to produce the final anomaly detection result. In this way, the accuracy of KPI anomaly detection can be improved.
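The data flow of steps S11 to S15 can be summarized in a short skeleton. The sketch assumes a 0-based integer time index, treats the target time points as $t+1, \ldots, t+T$, and takes the feature extractor and both classifiers as caller-supplied callables. The name `two_stage_detect` and the toy stand-ins below are hypothetical and only serve to make the example run; in the embodiments both classifiers are XGBoost models.

```python
from typing import Callable, List, Sequence

Features = List[float]


def two_stage_detect(
    series: Sequence[float],
    t: int,
    W: int,
    T: int,
    extract: Callable[[Sequence[float]], Features],
    base_clf: Callable[[Features], int],
    label_clf: Callable[[Features], str],
) -> bool:
    """Return True when the interval starting at t is judged abnormal."""
    if t - W + 1 < 0:
        raise IndexError("not enough history for the first window")
    first = extract(series[t - W + 1:t + 1])      # S11 + S12
    l_t = base_clf(first)                         # S13: 1 = abnormal, 0 = normal
    last = min(t + T, len(series) - 1)
    for t_i in range(t + 1, last + 1):            # S14: target time points
        second = extract(series[t_i - W + 1:t_i + 1])
        y_i = label_clf(second + [float(l_t)])    # combined feature
        if y_i in ("TN", "FP"):                   # S15
            return True
    return False


# Toy stand-ins (assumptions): one feature, simple thresholds.
def toy_extract(window: Sequence[float]) -> Features:
    return [max(window)]


def toy_base(f: Features) -> int:
    return int(f[0] > 0.5)


def toy_label(f: Features) -> str:
    abnormal_now, abnormal_before = f[0] > 0.5, f[1] > 0.5
    if abnormal_now:
        return "TN" if abnormal_before else "FP"
    return "FN" if abnormal_before else "TP"


print(two_stage_detect([0.1, 0.1, 0.9, 0.9, 0.9, 0.1], 2, 2, 2,
                       toy_extract, toy_base, toy_label))
```

Passing the classifiers in as callables keeps the skeleton independent of any particular model library.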
Referring to Figure 2, some embodiments of the present application disclose a root cause locating method, including:
Step S21: Construct candidate root cause sets based on multiple pieces of single-dimensional KPI time series data whose final anomaly detection result is abnormal, where each candidate root cause set includes one or more pieces of single-dimensional KPI time series data.
That is, within the same target interval, several pieces of single-dimensional KPI time series data may be abnormal, and the candidate root cause sets are constructed from them. Each piece of single-dimensional KPI time series data is one element of a candidate root cause set. For anomaly detection on single-dimensional KPI time series data, refer to the foregoing embodiments; details are not repeated here.
Step S22: Use the candidate root cause sets as nodes, and construct a multi-layer root cause tree according to the data dimensions of the candidate root cause sets.
In the multi-layer root cause tree, nodes in the same layer have the same number of data dimensions, and the number of data dimensions of the nodes in each layer decreases from top to bottom.
For example, referring to Figure 3, some embodiments of the present application disclose a schematic diagram of a multi-layer root cause tree, in which K1, K2, K3 and K4 denote four kinds of abnormal single-dimensional KPI time series data. The root cause set of each node in the first layer includes only single-dimensional KPI time series data. The root cause set of each node in the second layer includes 2-dimensional KPI time series data, that is, two kinds of single-dimensional KPI time series data. The root cause set of each node in the third layer includes 3-dimensional KPI time series data, and the root cause set of each node in the fourth layer includes 4-dimensional KPI time series data.
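The layered structure of Figure 3 can be enumerated with a short sketch. The function name `build_root_cause_layers` is hypothetical, and the sketch only assumes what the example states: layer k holds every combination of k abnormal KPIs.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def build_root_cause_layers(
    abnormal_kpis: Sequence[str],
) -> List[List[Tuple[str, ...]]]:
    """Layer k (1-based) holds every k-element combination of abnormal KPIs."""
    # Drop duplicate names while keeping the caller's order.
    names = list(dict.fromkeys(abnormal_kpis))
    return [list(combinations(names, size)) for size in range(1, len(names) + 1)]
```

For the four KPIs K1 to K4 this yields layers of 4, 6, 4 and 1 nodes, which is why the search needs pruning as the number of abnormal KPIs grows.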
It should be noted that, for a group of abnormal KPI time series data, the root cause of the group can be obtained from the correlation between the anomalies.
In some embodiments of the present application, the root cause can be searched for on the basis of Figure 3. For example, the search can proceed layer by layer according to the root cause dimension; each node is one combination of root causes, and the final root cause falls on a leaf node.
Step S23: Prune the multi-layer root cause tree layer by layer based on a preset pruning strategy, and determine the abnormal root cause set based on the ripple effect.
In some embodiments of the present application, the multi-layer root cause tree can be pruned layer by layer from top to bottom based on the preset pruning strategy. During this pruning, for any layer, the influence value of each node is determined based on a preset influence value calculation rule, and it is judged whether the influence value is less than a preset influence threshold; if so, the node is removed, together with the child nodes of removed nodes in the layers below. Further, for any layer, after the nodes whose influence value is less than the preset influence threshold have been removed, the potential score of each remaining node is determined based on the ripple effect, where the potential score represents the degree to which the elements of a node influence the elements of its child nodes. The potential scores of the remaining nodes in all layers are sorted, and the abnormal root cause set is determined from the sorting result.
In some embodiments of the present application, the predicted value of each element in each node can be calculated with a preset prediction algorithm, and the influence value of each node is determined from the predicted value and the actual value of each element.
It should be noted that, according to the ripple effect, if a group of KPI data can influence the KPI values of a large number of other elements, that group is a root cause set and the other KPI data are its leaf nodes. Suppose S is a candidate for the root cause set. The KPI values of its descendant leaf nodes can be derived according to the ripple effect, and the KPI data in the candidate set can be compared with the derived KPI values. Therefore, in some embodiments of the present application, an evaluation criterion is designed to express this comparison between the candidate set KPI and the derived KPI: the closer the two values are, the more likely S is the root cause set. If there are several candidate sets, the set with fewer dimensions is preferred, in line with Occam's razor.
In some embodiments, the anomaly root cause locating method can be based on the HotSpot algorithm, with a layer-by-layer pruning strategy added to improve efficiency. The process is as follows:
When a KPI anomaly is detected, let $t$ be the time at which the anomaly occurs. The values of a KPI leaf element within the window of length $W$ preceding the anomaly, that is, the actual values of one piece of single-dimensional KPI time series data whose final detection result is abnormal, are $v = \{v_{t-W+1}, v_{t-W+2}, \ldots, v_t\}$. $V_t$ denotes the actual KPI values of all leaf elements in any node at time $t$: $V_t = \{v(e_1, t), v(e_2, t), \ldots, v(e_n, t)\}$, where $e$ denotes an element and $n$ is the number of elements in the node. The predicted KPI values of any node at time $t$ are then $F_t = \{f(e_1, t), f(e_2, t), \ldots, f(e_n, t)\}$, and the prediction algorithm is EWMA.
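As an illustration of how a predicted value $f(e, t)$ might be obtained, the following is a minimal sketch of an exponentially weighted moving average. The function name `ewma_forecast`, the default smoothing factor `alpha=0.5`, and the choice to seed the average with the first observation are assumptions; the text does not specify them.

```python
from typing import Sequence


def ewma_forecast(history: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead EWMA forecast from the values observed before time t."""
    if not history:
        raise ValueError("history must contain at least one value")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = float(history[0])  # assumed initialisation: first observation
    for value in history[1:]:
        # Newer observations get weight alpha, the running average 1 - alpha.
        smoothed = alpha * value + (1.0 - alpha) * smoothed
    return smoothed
```

A larger `alpha` makes the forecast follow recent values more closely, so an abrupt change at time $t$ stands out more or less depending on this choice.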
According to the ripple effect, if an element $x$ becomes abnormal, all of its child elements change accordingly, so the change in $x$ must be distributed proportionally among all descendant elements in order to compute the derived value of every element. Let the change value of $x$ be $h(x)$, where $h(x) = f(x) - v(x)$; the derived value $a(x')$ of each element $x'$ is then calculated by formula. Here, $x'$ is an element, different from $x$, in a child node of the node to which $x$ belongs; $f(x)$ is the predicted value of $x$, obtained with the EWMA algorithm; and $v(x)$ is the actual value of $x$.
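The proportional allocation can be sketched as follows. The formula of the application is not reproduced in this text, so the sketch assumes the form used by the HotSpot algorithm on which this embodiment is based, $a(x') = f(x') - h(x)\,\frac{f(x')}{f(x)}$, with $f(x)$ taken as the total forecast of the affected elements. The function name `derive_child_values` and its parameters are hypothetical.

```python
from typing import List, Sequence


def derive_child_values(
    child_forecasts: Sequence[float],
    parent_forecast: float,
    parent_actual: float,
) -> List[float]:
    """Spread the parent's change h(x) = f(x) - v(x) over its children
    in proportion to each child's forecast (assumed HotSpot form)."""
    if parent_forecast == 0:
        raise ValueError("proportional allocation needs a non-zero f(x)")
    change = parent_forecast - parent_actual  # h(x)
    return [f - change * f / parent_forecast for f in child_forecasts]
```

With child forecasts 10 and 30, a parent forecast of 40 and a parent actual value of 20, the derived values are 5 and 15: each child loses half of its forecast, and the derived values sum to the parent's actual value.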
Further, the degree to which element $x$ influences the elements of other leaf nodes can be expressed by a potential score ps. The score is defined by formula in terms of the Euclidean distance between variables, each distance being computed in the same way. Here, $i$ denotes time point $i$ in the single-dimensional KPI time series data. Therefore, if a leaf node element is the root cause, the closer the values of $a$ and $v$ are, the closer their distance is to 0 and the closer the ps value is to 1.
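A potential score with the behaviour described above can be sketched as follows. The exact formula is not reproduced in this text; the sketch assumes the HotSpot definition $ps = \max\left(1 - \frac{d(v, a)}{d(v, f)},\ 0\right)$, where $d$ is the Euclidean distance, $v$ the actual values, $f$ the predicted values and $a$ the derived values. The function names are hypothetical, and returning 0 when the forecast already equals the actual values is an assumption.

```python
import math
from typing import Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two equally long value vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def potential_score(
    actual: Sequence[float],
    forecast: Sequence[float],
    derived: Sequence[float],
) -> float:
    """Assumed HotSpot form: max(1 - d(v, a) / d(v, f), 0)."""
    d_vf = euclidean(actual, forecast)
    if d_vf == 0:
        return 0.0  # nothing deviated from the forecast, so nothing to explain
    return max(1.0 - euclidean(actual, derived) / d_vf, 0.0)
```

When the derived values reproduce the actual values exactly, the score is 1; when they are no closer to the actual values than the plain forecast, the score is 0.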
It can be understood that the ps value of a node is determined from the ps values of its elements. The nodes can then be sorted by ps value to determine the root cause set S.
For example, referring to Figure 3, to calculate the ps value of node K1 in the first layer, K1 is the aforementioned $x$. Assuming that K1,2 and K1,3 are both remaining nodes, K2 in K1,2 and K3 in K1,3 are each $x'$. The ps values of K1 with respect to K2 in K1,2 and to K3 in K1,3 are calculated separately and then summed to give the potential score of node K1. For the potential score of K1,2, K1 and K2 together are the aforementioned $x$, with $f(x) = f(K1) + f(K2)$ and $v(x) = v(K1) + v(K2)$; assuming that K1,2,3 is a remaining node, K3 in K1,2,3 is $x'$.
It should be noted that the potential score of any set can be determined by the foregoing process. However, searching for the anomaly root cause means finding, among sets formed from arbitrary combinations of dimensions, the candidate root cause set with the largest potential score, and this search space is enormous. With one-dimensional root causes the search space is n3, but the root cause is not necessarily single-dimensional, so the search space grows exponentially. To cope with this explosion of the search space, a layer-by-layer pruning strategy is used to search the root cause sets, pruning according to the influence of each root cause set. In some embodiments, the influence $E(S)$ of a root cause set $S$ is defined by formula, where $h(S)$ is the sum of the absolute values of the change values of the elements in $S$, and $e$ denotes any KPI time series data whose final detection result is abnormal. In some embodiments, the influence of the root cause set can also be calculated by the formula $E(S) = h(S)$.
It should be noted that the influence represents the likelihood that the candidate root cause set $S$ is the root cause. In addition, a threshold $T_E$ needs to be determined: when a node is traversed and $E(S) < T_E$, the combination in that node is unlikely to be the root cause and is not kept as a candidate root cause set. Through the pruning strategy and the potential score calculation, nodes without influence are removed and several candidate root cause sets are obtained in each layer. These are sorted by potential score in descending order, and the root cause set with the highest potential score is the most probable root cause.
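The combination of pruning and ranking described above can be sketched as follows. The function name `prune_and_rank` and the callables `influence` and `potential` are hypothetical, and treating every superset of a removed node as its child is an assumption about how "child nodes of removed nodes" is meant.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Node = Tuple[str, ...]


def prune_and_rank(
    layers: Sequence[Sequence[Node]],
    influence: Callable[[Node], float],
    potential: Callable[[Node], float],
    threshold: float,
) -> List[Node]:
    """Prune top-down by influence, then rank the survivors by potential score."""
    removed: List[FrozenSet[str]] = []
    kept: List[Node] = []
    for layer in layers:  # layers are visited from the top of the tree down
        for node in layer:
            members = frozenset(node)
            # Skip children of removed nodes (assumed: supersets of them).
            if any(gone <= members for gone in removed):
                continue
            if influence(node) < threshold:  # E(S) < T_E
                removed.append(members)
                continue
            kept.append(node)
    # Highest potential score first: the most probable root cause set.
    return sorted(kept, key=potential, reverse=True)
```

Because a removed node also eliminates everything built on top of it, one weak single-dimensional KPI removes a whole branch of higher-dimensional combinations without their scores ever being computed.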
For example, referring to Figure 4, some embodiments of the present application disclose a schematic diagram of KPI anomaly detection and root cause locating. During KPI anomaly detection, multi-dimensional KPIs are split into single-dimensional KPIs that are detected separately. The anomaly detection algorithm is designed for anomalies over continuous segments of a KPI and uses multi-layer classifiers: the base classifier detects abnormal points in the KPI, and a detected abnormal point is accepted as a true anomaly only when the label classifier also detects an anomaly within the time segment. In this way, all abnormal KPIs occurring at the same moment are detected. The discovered abnormal KPIs form the root cause candidate set; based on the ripple effect, the potential score from HotSpot is used to quantify the influence of the KPIs, and the root cause set is found by layer-by-layer calculation over the root cause tree. The multi-layer classifier thus converts KPI point anomalies into KPI continuous-interval anomalies, so that the anomaly detector focuses more on abnormal events. In the root cause analysis stage, the ripple effect of KPI anomalies is quantified through the defined potential score and influence, and layer-by-layer pruning speeds up the search to locate the anomaly root cause.
Referring to Figure 5, some embodiments of the present application disclose a KPI anomaly detection apparatus, including:
a KPI data acquisition module 11, configured to obtain single-dimensional KPI time series data of a target interval, where the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point;
a data feature extraction module 12, configured to extract a first data feature of the single-dimensional KPI time series data;
a detection result output module 13, configured to input the first data feature into a base classifier and use the base classifier to output a preliminary anomaly detection result of the single-dimensional KPI time series data;
a classification result acquisition module 14, configured to extract a second data feature of a target time point and input the second data feature together with the preliminary anomaly detection result into a label classifier to obtain a classification result, where the target time point is any time point within a second preset time length after the specified time point; and
a detection result determination module 15, configured to determine the final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
It can be seen that, in some embodiments of the present application, single-dimensional KPI time series data of a target interval is obtained first, where the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point. A first data feature of the single-dimensional KPI time series data is then extracted and input into the base classifier, which outputs a preliminary anomaly detection result for the single-dimensional KPI time series data. Next, a second data feature of a target time point is extracted, and the second data feature together with the preliminary anomaly detection result is input into the label classifier to obtain a classification result, where the target time point is any time point within a second preset time length after the specified time point. The final anomaly detection result of the single-dimensional KPI time series data is determined based on the classification result. In other words, two layers of classifiers are used: the base classifier first examines the single-dimensional KPI time series data of the target interval to produce a preliminary anomaly detection result, and the label classifier then judges that preliminary result to produce the final anomaly detection result, which can improve the accuracy of KPI anomaly detection.
The data feature extraction module 12 includes:
a normalization processing submodule, configured to normalize the single-dimensional KPI time series data to obtain normalized data; and
a data feature extraction submodule, configured to determine the normalized data, together with at least one of the statistical features, prediction features and frequency domain features of the normalized data, as the first data feature of the single-dimensional KPI time series data.
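The two submodules can be illustrated with a minimal sketch that combines min-max normalization with a few statistical features. The function names, the choice of mean, population variance and extreme values, and the handling of a constant window are assumptions; prediction and frequency domain features are left out for brevity.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def minmax_normalize(values: Sequence[float]) -> List[float]:
    """Scale a window of KPI values into [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)  # assumed convention for a constant window
    return [(v - low) / (high - low) for v in values]


def first_data_feature(values: Sequence[float]) -> List[float]:
    """Normalized data followed by its mean, variance, minimum and maximum."""
    normalized = minmax_normalize(values)
    statistics_part = [
        fmean(normalized),
        pvariance(normalized),
        min(normalized),
        max(normalized),
    ]
    return normalized + statistics_part
```

Normalizing before computing the statistics keeps the feature vector comparable across KPIs with very different value ranges, such as CPU utilisation and network throughput.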
Further, the apparatus also includes a root cause locating module, which includes:
a candidate root cause set construction submodule, configured to construct candidate root cause sets based on multiple pieces of single-dimensional KPI time series data whose final anomaly detection result is abnormal, where each candidate root cause set includes one or more pieces of single-dimensional KPI time series data;
a root cause tree construction submodule, configured to use the candidate root cause sets as nodes and construct a multi-layer root cause tree according to the data dimensions of the candidate root cause sets; and
an abnormal root cause set determination submodule, configured to prune the multi-layer root cause tree layer by layer based on a preset pruning strategy and determine the abnormal root cause set based on the ripple effect.
In the multi-layer root cause tree, nodes in the same layer have the same number of data dimensions, and the number of data dimensions of the nodes in each layer decreases from top to bottom.
Correspondingly, the abnormal root cause set determination submodule is configured to prune the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.
In some embodiments, the abnormal root cause set determination submodule is configured to:
for any layer, determine the influence value of each node based on a preset influence value calculation rule, and judge whether the influence value is less than a preset influence threshold; if so, remove the node together with the child nodes of removed nodes in the layers below;
for any layer, after removing the nodes whose influence value is less than the preset influence threshold, determine the potential score of each remaining node based on the ripple effect, where the potential score represents the degree to which the elements of a node influence the elements of its child nodes; and
sort the potential scores of the remaining nodes in all layers, and determine the abnormal root cause set from the sorting result.
Further, the abnormal root cause set determination submodule is configured to calculate the predicted value of each element in each node with a preset prediction algorithm, and to determine the influence value of each node from the predicted value and the actual value of each element.
Referring to Figure 6, some embodiments of the present application disclose an electronic device 20, including a processor 21 and a memory 22, where the memory 22 is configured to store a computer program, and the processor 21 is configured to execute the computer program to implement the KPI anomaly detection method disclosed in the foregoing embodiments.
For the process of the KPI anomaly detection method, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
The memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like, and the storage may be transient or permanent.
In addition, the electronic device 20 further includes a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26. The power supply 23 provides the operating voltage for each hardware component of the electronic device 20. The communication interface 24 creates a data transmission channel between the electronic device 20 and external devices; the communication protocol it follows may be any protocol applicable to the technical solution of the present application and is not limited here. The input/output interface 25 obtains input data from the outside or outputs data to the outside; its interface type can be selected according to application needs and is not limited here.
Referring to Figure 7, some embodiments of the present application disclose a non-volatile computer-readable storage medium 70 for storing a computer program 710, where the computer program 710, when executed by a processor, implements the KPI anomaly detection method disclosed in the foregoing embodiments.
For the process of the KPI anomaly detection method, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The KPI anomaly detection method, apparatus, device and medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. A person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A KPI anomaly detection method, characterized by comprising:
    obtaining single-dimensional key performance indicator (KPI) time series data of a target interval, wherein the length of the target interval is a first preset time length and the end time point of the target interval is a specified time point;
    extracting a first data feature of the single-dimensional KPI time series data;
    inputting the first data feature into a base classifier, and using the base classifier to output a preliminary anomaly detection result of the single-dimensional KPI time series data;
    extracting a second data feature of a target time point, and inputting the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, wherein the target time point is any time point within a second preset time length after the specified time point; and
    determining a final anomaly detection result of the single-dimensional KPI time series data based on the classification result.
  2. The KPI anomaly detection method according to claim 1, characterized in that extracting the first data feature of the single-dimensional KPI time series data comprises:
    normalizing the single-dimensional KPI time series data to obtain normalized data; and
    determining the normalized data, together with at least one of the statistical features, prediction features and frequency domain features of the normalized data, as the first data feature of the single-dimensional KPI time series data.
  3. The KPI anomaly detection method according to claim 1, characterized by further comprising:
    constructing candidate root cause sets based on multiple pieces of single-dimensional KPI time series data whose final anomaly detection result is abnormal, wherein each candidate root cause set includes one or more pieces of the single-dimensional KPI time series data;
    using the candidate root cause sets as nodes, and constructing a multi-layer root cause tree according to the data dimensions of the candidate root cause sets; and
    pruning the multi-layer root cause tree layer by layer based on a preset pruning strategy, and determining an abnormal root cause set based on the ripple effect.
  4. The KPI anomaly detection method according to claim 3, characterized in that nodes in the same layer of the multi-layer root cause tree have the same number of data dimensions, and the number of data dimensions of the nodes in each layer decreases from top to bottom;
    correspondingly, pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy comprises: pruning the multi-layer root cause tree layer by layer from top to bottom based on the preset pruning strategy.
  5. The KPI anomaly detection method according to claim 4, characterized in that pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy comprises:
    for any layer, determining the influence value of each node based on a preset influence value calculation rule, and judging whether the influence value is less than a preset influence threshold; if so, removing the node together with the child nodes of removed nodes in the layers below.
  6. The KPI anomaly detection method according to claim 5, characterized in that pruning the multi-layer root cause tree layer by layer based on the preset pruning strategy and determining the abnormal root cause set based on the ripple effect comprises:
    for any layer, after removing the nodes whose influence value is less than the preset influence threshold, determining the potential score of each remaining node based on the ripple effect, wherein the potential score represents the degree to which the elements of a node influence the elements of its child nodes; and
    sorting the potential scores of the remaining nodes in all layers, and determining the abnormal root cause set from the sorting result.
  7. The KPI anomaly detection method according to claim 5, characterized in that determining the influence value of each node based on the preset influence value calculation rule comprises:
    calculating the predicted value of each element in each node with a preset prediction algorithm, and determining the influence value of each node from the predicted value and the actual value of each element.
  8. 根据权利要求1所述的KPI异常检测方法,其特征在于,所述单维KPI时序数据为中央处理器CPU数据、内存数据、网络数据。The KPI anomaly detection method according to claim 1, characterized in that the single-dimensional KPI time series data is central processing unit CPU data, memory data, and network data.
  9. 根据权利要求2所述的KPI异常检测方法,其特征在于,归一化处理的方法包括Minmax方法。The KPI anomaly detection method according to claim 2, characterized in that the normalization method includes the Minmax method.
  10. 根据权利要求2所述的KPI异常检测方法,其特征在于,所述统计特征包括均值、方差、极值、分位数、差分中的至少一项。The KPI anomaly detection method according to claim 2, wherein the statistical characteristics include at least one of mean, variance, extreme value, quantile, and difference.
  11. 根据权利要求10所述的KPI异常检测方法,其特征在于,归一化后数据与统计特征用来表示KPI时序数据的短期特征。The KPI anomaly detection method according to claim 10, characterized in that the normalized data and statistical features are used to represent the short-term characteristics of the KPI time series data.
  12. 根据权利要求2所述的KPI异常检测方法,其特征在于,所述预测特征使用指数加权移动平均EWMA预测算法对归一化后数据预测得到。The KPI anomaly detection method according to claim 2, characterized in that the prediction feature is predicted by using an exponentially weighted moving average (EWMA) prediction algorithm to predict normalized data.
  13. 根据权利要求12所述的KPI异常检测方法,其特征在于,所述预测特征用于表示KPI时序数据异常的可能性。The KPI anomaly detection method according to claim 12, characterized in that the prediction features are used to represent the possibility of KPI time series data anomalies.
  14. The KPI anomaly detection method according to claim 2, wherein the frequency domain feature is a wavelet feature obtained by DB2 wavelet decomposition.
  15. The KPI anomaly detection method according to claim 14, wherein the wavelet decomposition feature is used to represent characteristics of the KPI data in the frequency domain.
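As a rough illustration of claims 14 and 15, the sketch below performs one level of DB2 (Daubechies, four-tap) decomposition in plain Python. The periodic boundary handling, the single decomposition level, and the function name are assumptions; a wavelet library may use a different boundary mode and filter orientation, so its coefficients can differ from these near the window edges.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# DB2 scaling (low-pass) filter coefficients.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
       (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching wavelet (high-pass) filter: reversed low-pass with alternating signs.
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]


def db2_decompose(values):
    """One-level DB2 decomposition with periodic extension (assumption).

    Returns (approximation, detail); each has half the input length.
    """
    n = len(values)
    if n < 4 or n % 2:
        raise ValueError("expected an even-length window with at least 4 points")
    approx, detail = [], []
    for k in range(0, n, 2):
        # Four consecutive samples, wrapping around at the window end.
        taps = [values[(k + i) % n] for i in range(4)]
        approx.append(sum(h * x for h, x in zip(LOW, taps)))
        detail.append(sum(g * x for g, x in zip(HIGH, taps)))
    return approx, detail


# A flat window has no high-frequency content; a spike does.
flat_approx, flat_detail = db2_decompose([0.5] * 8)
spike = [0.2, 0.2, 0.2, 0.9, 0.2, 0.2, 0.2, 0.2]
spike_approx, spike_detail = db2_decompose(spike)
```

The detail coefficients are what make an abrupt change visible: they are close to zero for the flat window and clearly non-zero around the spike.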
  16. The KPI anomaly detection method according to claim 1, wherein the base classifier for anomaly detection is an extreme gradient boosting tree (XGBoost) model.
  17. The KPI anomaly detection method according to claim 1, wherein the classification result is one of a true positive, a false positive, a true negative, and a false negative;
    where the time point at which an anomaly occurs is t, and for the period (t, t+T):
    if an anomaly was detected earlier and an anomaly is detected within time T, the result is a true positive;
    if an anomaly was detected earlier and normal is detected within time T, the result is a false positive;
    if normal was detected earlier and normal is detected within time T, the result is a true negative;
    if normal was detected earlier and an anomaly is detected within time T, the result is a false negative.
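The four cases of claim 17 reduce to a two-by-two lookup. The sketch below encodes that table; the enum, the function name, and the boolean parameter names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "TP"
    FALSE_POSITIVE = "FP"
    TRUE_NEGATIVE = "TN"
    FALSE_NEGATIVE = "FN"


def label_outcome(flagged_earlier: bool, anomalous_within_window: bool) -> Outcome:
    """Map the earlier detection and the detection within (t, t+T) to a label.

    flagged_earlier: the earlier detection reported an anomaly.
    anomalous_within_window: an anomaly is detected within time T.
    """
    if flagged_earlier:
        # The earlier alarm is either confirmed or contradicted within T.
        return (Outcome.TRUE_POSITIVE if anomalous_within_window
                else Outcome.FALSE_POSITIVE)
    # The earlier "normal" verdict is either confirmed or contradicted within T.
    return (Outcome.FALSE_NEGATIVE if anomalous_within_window
            else Outcome.TRUE_NEGATIVE)
```

For example, `label_outcome(True, False)` corresponds to an alarm that is not confirmed within time T, which claim 17 calls a false positive.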
  18. A KPI anomaly detection apparatus, comprising:
    a KPI data acquisition module, configured to acquire single-dimensional KPI time series data of a target interval, wherein the length of the target interval is a first preset time length and the end time point of the target interval is a designated time point;
    a data feature extraction module, configured to extract a first data feature of the single-dimensional KPI time series data;
    a detection result output module, configured to input the first data feature into a base classifier and use the base classifier to output a preliminary anomaly detection result for the single-dimensional KPI time series data;
    a classification result acquisition module, configured to extract a second data feature of a target time point and input the second data feature and the preliminary anomaly detection result into a label classifier to obtain a classification result, wherein the target time point is any time point within a second preset time length after the designated time point;
    a detection result determination module, configured to determine a final anomaly detection result for the single-dimensional KPI time series data based on the classification result.
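The five modules of claim 18 can be read as a two-stage pipeline, sketched below with toy stand-ins. Everything concrete here is an assumption: the index-based windowing, the choice of the last point of the look-ahead period as the target time point, the threshold classifiers, and in particular the final rule that treats "TP" and "FN" as anomalous, which the claim does not spell out.

```python
def detect(series, end, window_len, lookahead,
           extract_features, base_classifier, label_classifier):
    """Two-stage detection for the window ending at index `end` (inclusive)."""
    # Data acquisition: fixed-length target interval ending at the designated point.
    window = series[end - window_len + 1:end + 1]
    # Feature extraction, then the base classifier gives a preliminary result.
    preliminary = base_classifier(extract_features(window))
    # Target time point: here the last point of the look-ahead period (assumption).
    target = min(end + lookahead, len(series) - 1)
    second = extract_features(series[target - window_len + 1:target + 1])
    # The label classifier sees the second features and the preliminary result.
    label = label_classifier(second, preliminary)
    # Assumed final rule: TP and FN both mean a real anomaly is present.
    return label in {"TP", "FN"}


# Toy stand-ins; in the claims the base classifier may be an XGBoost model.
def toy_features(window):
    return {"last": window[-1], "mean": sum(window) / len(window)}


def toy_base(features):
    return features["last"] > 0.8


def toy_label(features, preliminary):
    still_anomalous = features["last"] > 0.8
    if preliminary:
        return "TP" if still_anomalous else "FP"
    return "FN" if still_anomalous else "TN"


transient = [0.3, 0.3, 0.3, 0.95, 0.3, 0.3]
sustained = [0.3, 0.3, 0.3, 0.95, 0.96, 0.97]
transient_result = detect(transient, 3, 3, 2, toy_features, toy_base, toy_label)
sustained_result = detect(sustained, 3, 3, 2, toy_features, toy_base, toy_label)
```

Under these assumptions the second stage acts as a confirmation step: a spike that disappears within the look-ahead period is relabelled as a false positive and suppressed, while a sustained rise is confirmed.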
  19. An electronic device, comprising a processor and a memory, wherein:
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program to implement the KPI anomaly detection method according to any one of claims 1 to 17.
  20. A non-volatile computer-readable storage medium, configured to store a computer program, wherein the computer program, when executed by a processor, implements the KPI anomaly detection method according to any one of claims 1 to 17.
PCT/CN2023/091310 2022-04-28 2023-04-27 Kpi anomaly detection method and apparatus, device and medium WO2023208136A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210460951.1A CN114781529A (en) 2022-04-28 2022-04-28 KPI (Key performance indicator) abnormity detection method, device, equipment and medium
CN202210460951.1 2022-04-28

Publications (1)

Publication Number Publication Date
WO2023208136A1 (en) 2023-11-02

Family

ID=82434848

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091310 WO2023208136A1 (en) 2022-04-28 2023-04-27 Kpi anomaly detection method and apparatus, device and medium

Country Status (2)

Country Link
CN (1) CN114781529A (en)
WO (1) WO2023208136A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032829A (en) * 2018-07-23 2018-12-18 腾讯科技(深圳)有限公司 Data exception detection method, device, computer equipment and storage medium
CN109145948A (en) * 2018-07-18 2019-01-04 宁波沙塔信息技术有限公司 A kind of injection molding machine putty method for detecting abnormality based on integrated study
CN111858231A (en) * 2020-05-11 2020-10-30 北京必示科技有限公司 Single index abnormality detection method based on operation and maintenance monitoring
US20200382536A1 (en) * 2019-05-31 2020-12-03 Gurucul Solutions, Llc Anomaly detection in cybersecurity and fraud applications

Also Published As

Publication number Publication date
CN114781529A (en) 2022-07-22

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23795568

Country of ref document: EP

Kind code of ref document: A1