CN112686330B - KPI abnormal data detection method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN112686330B (CN202110017874.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Testing And Monitoring For Control Systems (AREA)
Abstract
The application provides a KPI abnormal data detection method and device, a storage medium and electronic equipment, wherein a to-be-detected time point, a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point are respectively input into a feature model to obtain a context feature corresponding to each KPI vector sequence; calculating the data anomaly probability of the time point to be detected based on the context characteristics; if the data anomaly probability is larger than the anomaly threshold, determining that the KPI data corresponding to the time point to be detected has abnormal KPI data, otherwise, determining that the KPI data has no abnormal KPI data. Therefore, according to the technical scheme, whether the KPI data of the abnormality exist at the time point to be detected or not is determined based on the time point to be detected, the previous time point of the time point to be detected and the context characteristics corresponding to the next time point of the time point to be detected, so that the accuracy of KPI abnormal data detection is improved.
Description
Technical Field
The present application relates to the field of operation and maintenance monitoring technologies, and in particular, to a KPI abnormal data detection method and apparatus, a storage medium, and an electronic device.
Background
With the development of the science, technology and information industries, more and more enterprises are becoming intelligent, and their daily operations depend on the support of automation equipment. Each device must maintain a stable and healthy running state to ensure the healthy development of the whole enterprise. Therefore, performing key performance indicator (KPI) data anomaly detection on the operation and maintenance data generated by each device in the system platform at each time point, so that abnormal KPI data is found in time and the abnormal devices are diagnosed and repaired in time, is of great significance to the healthy development of enterprises.
In the prior art, KPI abnormal data detection proceeds as follows: each KPI data corresponding to the time point to be detected is compared with a preset threshold value to determine whether abnormal KPI data exists at the time point to be detected. The accuracy of this existing KPI abnormal data detection is low.
Disclosure of Invention
In the course of research, the inventor found that whether the KPI data corresponding to the time point to be detected is abnormal often depends on all KPI data corresponding to the previous time point of the time point to be detected and all KPI data corresponding to the next time point.
In order to achieve the above object, the present application provides the following technical solutions:
A KPI abnormal data detection method comprises the following steps:
acquiring a data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
Inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence;
Calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence;
if the data abnormality probability is larger than a preset abnormality threshold, determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected;
If the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected.
In the above method, optionally, the calculating the data anomaly probability of the to-be-detected time point based on the context feature corresponding to each KPI vector sequence includes:
calculating a first data similarity between the time point to be detected and the time point before the time point to be detected based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point before the time point to be detected;
calculating a second data similarity between the time point to be detected and the time point after the time point to be detected based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point after the time point to be detected;
and calculating the data anomaly probability of the time point to be detected based on the first data similarity and the second data similarity.
In the above method, optionally, the calculating the data anomaly probability of the to-be-detected time point based on the first data similarity and the second data similarity includes:
determining the data similarity with the larger value of the first data similarity and the second data similarity as the maximum similarity;
determining the data similarity with the smaller value of the first data similarity and the second data similarity as the minimum similarity;
and calculating the ratio of the minimum similarity to the maximum similarity to obtain the data anomaly probability of the time point to be detected.
In the above method, optionally, the feature model includes a target encoder and a feature calculation model, and the inputting each KPI vector sequence included in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence includes:
Coding each KPI vector sequence contained in the data sequence to be detected by using a target coder to obtain a feature vector corresponding to each KPI vector sequence;
And respectively inputting each feature vector into a pre-constructed feature calculation model to obtain a context feature corresponding to each feature vector.
In the above method, optionally, the construction process of the feature model includes:
Collecting a historical time sequence data sequence; the historical time sequence data sequence comprises a plurality of data subsets corresponding to time points; each of the data subsets comprising a plurality of KPI data;
Carrying out vectorization processing on each data subset to obtain KPI vectors corresponding to each data subset;
determining a plurality of sample data based on each KPI vector; each sample data comprises a preset number of KPI vectors whose time points are consecutive;
Randomly arranging the sample data to obtain a sample data sequence;
selecting N pieces of sample data with continuous positions from the sample data sequence according to a preset sequence; wherein, N is a positive integer;
taking each selected sample data as first sample data;
Encoding each piece of current first sample data by using an encoder to obtain a first feature vector corresponding to each piece of first sample data;
inputting each first feature vector into a current neural network model to obtain a context feature corresponding to each first feature vector;
Constructing a first comprehensive loss function based on each first feature vector and the context feature corresponding to each first feature vector;
Solving a minimum value of the first comprehensive loss function to obtain a new encoder and a new neural network model;
selecting N sample data with consecutive positions according to the preset sequence from all the remaining sample data in the sample data sequence;
taking each sample data currently selected as first sample data;
returning, based on each first sample data, the new encoder and the new neural network model, to the step of encoding each current first sample data by using the current encoder to obtain a first feature vector corresponding to each first sample data, until the number of remaining sample data in the sample data sequence is smaller than N;
the current encoder is taken as a target encoder, and the current neural network model is taken as a characteristic calculation model.
In the above method, optionally, the constructing a first comprehensive loss function based on each first feature vector and the context feature corresponding to each first feature vector includes:
For each first feature vector, constructing an initial parameter matrix corresponding to the first feature vector based on the first feature vector and the context feature corresponding to the first feature vector, and constructing a first loss function of the first feature vector based on the first feature vector, the context feature of the first feature vector and the initial parameter matrix corresponding to the first feature vector;
And constructing a first comprehensive loss function based on each first loss function.
A KPI anomaly data detection apparatus, comprising:
The acquisition unit is used for acquiring the data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
The first input unit is used for respectively inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence;
the first calculation unit is used for calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence;
The first determining unit is used for determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected if the data abnormality probability is larger than a preset abnormality threshold; if the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected.
In the above apparatus, optionally, the first calculation unit includes:
A first calculating subunit, configured to calculate, based on a context feature corresponding to a KPI vector sequence of the to-be-detected time point and a context feature corresponding to a KPI vector sequence of a time point before the to-be-detected time point, a first data similarity between the to-be-detected time point and the time point before the to-be-detected time point;
A second calculating subunit, configured to calculate a second data similarity between the to-be-detected time point and a time point after the to-be-detected time point based on a context feature corresponding to the KPI vector sequence of the to-be-detected time point and a context feature corresponding to the KPI vector sequence of the time point after the to-be-detected time point;
and a third calculation subunit, configured to calculate, based on the first data similarity and the second data similarity, a probability of data anomaly at the time point to be detected.
A storage medium comprising stored instructions, wherein, when the instructions are run, a device in which the storage medium is located is controlled to execute the KPI abnormal data detection method described above.
An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the KPI abnormal data detection method described above.
Compared with the prior art, the application has the following advantages:
The application provides a KPI abnormal data detection method and device, a storage medium and electronic equipment, wherein the method comprises the following steps: acquiring a data sequence to be detected; the data sequence to be detected comprises a KPI vector sequence corresponding to a time point to be detected, a KPI vector sequence corresponding to a previous time point of the time point to be detected and a KPI vector sequence corresponding to a next time point of the time point to be detected, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorizing according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs; respectively inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain the corresponding context feature of each KPI vector sequence; calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence; if the data anomaly probability is larger than a preset anomaly threshold, determining that the KPI data corresponding to the time point to be detected has anomaly KPI data, and if the data anomaly probability is not larger than the anomaly threshold, determining that the KPI data corresponding to the time point to be detected has no anomaly KPI data. 
Therefore, according to the technical scheme provided by the application, the data anomaly probability is calculated based on the context feature corresponding to the time point to be detected, the context feature corresponding to the previous time point of the time point to be detected and the context feature of the next time point of the time point to be detected, so that whether the KPI data of the anomaly exists at the time point to be detected or not is determined based on the comparison result of the data anomaly probability and the anomaly threshold value, and the accuracy of KPI anomaly data detection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for detecting KPI abnormal data provided by the application;
FIG. 2 is a flowchart of a method for detecting KPI abnormal data according to the present application;
FIG. 3 is another method flowchart of the KPI abnormal data detection method provided by the present application;
FIG. 4 is a schematic structural diagram of a KPI abnormal data detection device provided by the application;
FIG. 5 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application is operational with numerous general purpose or special purpose computing device environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor devices, distributed computing environments that include any of the above devices, and the like.
The embodiment of the application provides a KPI abnormal data detection method, which can be applied to various system platforms, wherein an execution main body of the KPI abnormal data detection method can be operated on an event analyzer of a computer terminal or various mobile equipment, and a flow chart of the KPI abnormal data detection method is shown in figure 1, and specifically comprises the following steps:
S101, acquiring a data sequence to be detected.
A data sequence to be detected is acquired. It comprises a KPI vector sequence corresponding to the time point to be detected, a KPI vector sequence corresponding to the previous time point of the time point to be detected, and a KPI vector sequence corresponding to the next time point of the time point to be detected. Each KPI vector sequence comprises a plurality of KPI vectors. Each KPI vector corresponds to one reference time point, and the KPI vectors within one sequence correspond to different reference time points. The reference time point of each KPI vector is determined by the position of the KPI vector in the KPI vector sequence to which it belongs: it equals the time point corresponding to the KPI vector sequence, minus m, plus the position of the KPI vector in the sequence, with positions counted from 0. For example, in the KPI vector sequence corresponding to the time point t, the reference time point of the first KPI vector is t-m, the reference time point of the second KPI vector is t-m+1, and so on, up to the last KPI vector, whose reference time point is t+m, where m is a positive integer.
For each KPI vector sequence, the reference time point of the KPI vector at the middle position of the sequence is the same as the time point corresponding to the sequence. For example, if the KPI vector sequence corresponding to the time point t is $A = [P_{t-m}, P_{t-m+1}, \ldots, P_t, \ldots, P_{t+m-1}, P_{t+m}]$, then the reference time point t of the KPI vector $P_t$ at the middle position of $A$ is the same as the time point t corresponding to $A$.
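The indexing described above can be sketched as follows. This is a minimal illustration, assuming that time points are consecutive integers and that the already vectorized KPI data is held in a dictionary keyed by time point; the function names and the toy data are hypothetical and are not taken from the application.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def reference_time_points(t: int, m: int) -> List[int]:
    """Reference time points t-m, ..., t, ..., t+m of the sequence for time point t."""
    return [t - m + position for position in range(2 * m + 1)]


def build_kpi_vector_sequence(kpi_vectors: Dict[int, Vector], t: int, m: int) -> List[Vector]:
    """Collect the 2m+1 KPI vectors whose reference time points surround t."""
    return [kpi_vectors[r] for r in reference_time_points(t, m)]


def build_sequence_to_detect(kpi_vectors: Dict[int, Vector], t: int, m: int) -> Dict[str, List[Vector]]:
    """Data sequence to be detected: sequences for t-1, t and t+1."""
    return {
        "previous": build_kpi_vector_sequence(kpi_vectors, t - 1, m),
        "current": build_kpi_vector_sequence(kpi_vectors, t, m),
        "next": build_kpi_vector_sequence(kpi_vectors, t + 1, m),
    }


# Toy data (assumption): two KPI values per time point.
kpi_vectors = {tp: [float(tp), 2.0 * tp] for tp in range(20)}
to_detect = build_sequence_to_detect(kpi_vectors, t=10, m=2)
```

With m = 2, each of the three sequences holds five KPI vectors, and the vector at the middle position of the "current" sequence is the one whose reference time point is t.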
Each KPI vector in each KPI vector sequence is obtained by vectorizing the plurality of KPI data corresponding to its own reference time point. Each KPI data is denoted p after vectorization, so a KPI vector is composed of a plurality of p. For example, in the KPI vector sequence corresponding to the time point t, the first KPI vector $P_{t-m}$ is composed of the p values obtained by vectorizing the KPI data at the time point t-m.
S102, respectively inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain the context feature corresponding to each KPI vector sequence.
In the method provided by the embodiment of the application, a feature model is pre-constructed, a KPI vector sequence is input into the feature model, the context feature corresponding to the KPI vector sequence is output through feature model processing, and the context feature corresponding to the KPI vector sequence is used for representing the relationship between the KPI vector sequence and the KPI vector of the previous time point of the self time point and the KPI vector of the next time point of the self time point.
In the method provided by the embodiment of the application, the characteristic model comprises a target encoder and a characteristic calculation model, and the process of inputting each KPI vector sequence into the characteristic model to obtain the context characteristic corresponding to each KPI vector sequence specifically comprises the following steps: coding each KPI vector sequence contained in the data sequence to be detected by using a target coder to obtain a feature vector corresponding to each KPI vector sequence; and respectively inputting each feature vector into a pre-constructed feature calculation model to obtain the context feature corresponding to each feature vector.
In the method provided by the embodiment of the application, the target encoder encodes each KPI vector sequence, and the feature vector obtained by encoding is then input into the feature calculation model to obtain the context feature corresponding to each KPI vector sequence.
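The two-stage structure of the feature model (encode first, then compute the context feature) can be sketched as a simple composition. The application does not specify the internals of the target encoder or the feature calculation model, so `mean_encoder` and `scale_model` below are hypothetical stand-ins chosen only to make the sketch runnable.

```python
from typing import Callable, List, Sequence

Vector = List[float]
KpiVectorSequence = Sequence[Sequence[float]]


def make_feature_model(
    encoder: Callable[[KpiVectorSequence], Vector],
    feature_calc: Callable[[Vector], Vector],
) -> Callable[[KpiVectorSequence], Vector]:
    """Compose an encoder and a feature calculation model into one feature model."""
    def feature_model(sequence: KpiVectorSequence) -> Vector:
        feature_vector = encoder(sequence)    # KPI vector sequence -> feature vector
        return feature_calc(feature_vector)   # feature vector -> context feature
    return feature_model


# Hypothetical stand-in encoder: average each KPI dimension over the sequence.
def mean_encoder(sequence: KpiVectorSequence) -> Vector:
    count = len(sequence)
    return [sum(column) / count for column in zip(*sequence)]


# Hypothetical stand-in feature calculation model: a fixed scaling.
def scale_model(feature_vector: Vector) -> Vector:
    return [2.0 * value for value in feature_vector]


feature_model = make_feature_model(mean_encoder, scale_model)
context = feature_model([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In practice the same `feature_model` would be applied separately to the sequences of the previous, current and next time points.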
In the method provided by the embodiment of the application, referring to fig. 2, the construction of the feature model specifically includes:
S201, collecting a historical time sequence data sequence.
A historical time sequence data sequence is acquired. It comprises a plurality of data subsets, each data subset corresponds to one time point, the time points corresponding to the data subsets form a continuous time sequence, and each data subset comprises a plurality of KPI data.
S202, carrying out vectorization processing on each data subset to obtain KPI vectors corresponding to each data subset.
And carrying out vectorization processing on each KPI data contained in each data subset aiming at each data subset, and forming the KPI data subjected to vectorization processing into KPI vectors corresponding to the data subsets.
S203, determining a plurality of sample data based on the KPI vectors.
Based on the time point corresponding to each KPI vector, the KPI vectors are arranged into a vector sequence in ascending order of time point. A preset sliding window is slid along the vector sequence, and at each position of the window the KPI vectors inside it form one sample data, so that a plurality of sample data are obtained. The window length of the preset sliding window is a preset value, which may optionally be 2m+1. The time point corresponding to each sample data is the time point corresponding to the KPI vector at the middle position of the sample data.
The time points corresponding to the different sample data are different.
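The sliding-window construction of step S203 can be sketched as follows, assuming a window length of 2m+1 and a stride of one position; the stride, the function name `make_samples` and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def make_samples(timed_vectors: List[Tuple[int, Vector]], m: int) -> List[Tuple[int, List[Vector]]]:
    """Slide a window of length 2m+1 over the time-ordered KPI vectors.

    Returns (time_point, window) pairs, where time_point is the time point
    of the KPI vector at the middle position of the window.
    """
    ordered = sorted(timed_vectors, key=lambda item: item[0])  # ascending time points
    window_length = 2 * m + 1
    samples = []
    for start in range(len(ordered) - window_length + 1):
        window = ordered[start:start + window_length]
        center_time_point = window[m][0]
        samples.append((center_time_point, [vector for _, vector in window]))
    return samples


# Toy data (assumption): seven time points with one KPI value each.
timed_vectors = [(tp, [float(tp)]) for tp in range(7)]
samples = make_samples(timed_vectors, m=1)
```

Because the window advances one position at a time, every sample data has a different middle time point, which matches the statement that different sample data correspond to different time points.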
S204, randomly arranging the sample data to obtain a sample data sequence.
And randomly arranging each sample data, and forming a sample data sequence by each sample data after random arrangement.
S205, selecting N pieces of sample data with continuous positions from the sample data sequence according to a preset sequence.
N pieces of sample data with continuous positions are selected from the sample data sequence according to a preset sequence, wherein N is a positive integer, the preset sequence can be a sequence from left to right, N pieces of sample data are sequentially selected from the sample data sequence according to a sequence from left to right, and the positions of the N pieces of sample data in the sample data sequence are continuous.
S206, taking each piece of selected sample data as first sample data.
Each sample data selected from the sample data sequence is taken as first sample data. It should be noted that, the N sample data selected from the sample data sequence for the first time is the first sample data of the first batch.
S207, encoding each piece of current first sample data by using an encoder to obtain a first feature vector corresponding to each piece of first sample data.
And carrying out coding processing on each piece of current first sample data by using an encoder to obtain a first feature vector corresponding to each piece of first sample data, wherein the encoder used for carrying out coding processing on the first sample data of the first batch is a preset encoder. For a first sample data of a non-first batch, the encoder that encodes the first sample data is an encoder trained with sample data of a previous batch of the first sample data.
S208, respectively inputting each first feature vector into the current neural network model to obtain the context feature corresponding to each first feature vector.
Each current first feature vector is input into the current neural network model to obtain the context feature corresponding to each first feature vector. The neural network model used differs between batches: the first feature vectors of the first batch are input into a preset neural network model, whereas the first feature vectors of any later batch are input into the neural network model trained with the first feature vectors of the previous batch.
S209, constructing a first comprehensive loss function based on each first feature vector and the context feature corresponding to each first feature vector.
A first comprehensive loss function is constructed based on each first feature vector and the context feature corresponding to each first feature vector.
Specifically, for each first feature vector, an initial parameter matrix corresponding to the first feature vector is constructed based on the first feature vector and the context feature corresponding to the first feature vector, and a first loss function of the first feature vector is constructed based on the first feature vector, the context feature corresponding to the first feature vector, and the initial parameter matrix corresponding to the first feature vector.
For example, for a first feature vector $L_i$ of dimension $d_1$ whose corresponding context feature $C_i$ has dimension $d_2$, the parameter matrix corresponding to the first feature vector is $W_i$. The constructed loss function is $f_i(u_i, s_i)$, wherein $W_i^T$ represents the transpose of the parameter matrix $W_i$, $L_i^T$ represents the transpose of $L_i$, and $f_i(u_i, s_i)$ is the first loss function corresponding to the first feature vector.
And constructing a first comprehensive loss function based on the first loss function corresponding to each first feature vector.
For example, the first comprehensive loss function $K$ is constructed from the first loss functions $f_i(u_i, s_i)$, $i = 1, \ldots, N$, where $K$ represents the first comprehensive loss function and $N$ represents the number of first feature vectors.
S210, solving the minimum value of the first comprehensive loss function to obtain a new encoder and a new neural network model.
And solving the minimum value of the first comprehensive loss function, thereby obtaining all unknowns in the first comprehensive loss function, and further obtaining a new encoder and a new neural network model.
S211, judging whether the number of the residual sample data in the sample data sequence is smaller than N, if not, executing step S212, and if yes, executing step S213.
S212, selecting N pieces of sample data with consecutive positions according to the preset sequence from the remaining sample data in the sample data sequence, taking each piece of currently selected sample data as first sample data, and returning to execute step S207.
N pieces of sample data with consecutive positions are selected, according to the preset sequence, from all the remaining sample data in the sample data sequence; these N pieces of sample data form another batch of sample data for training the encoder and the neural network model.
Each sample data currently selected is taken as first sample data, and based on the respective first sample data, the new encoder and the new neural network model, step S207 is executed back.
S213, taking the current encoder as a target encoder, and taking the current neural network model as a characteristic calculation model.
If the number of the residual sample data in the sample data sequence is smaller than N, the training of the encoder and the neural network model is completed, the current encoder is taken as a target encoder, and the current neural network model is taken as a characteristic calculation model.
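The control flow of steps S204 to S213 (shuffle, take N consecutive sample data per batch, update, stop when fewer than N remain) can be sketched as below. The encoding, context-feature computation and loss minimization of steps S207 to S210 are abstracted into a caller-supplied `train_step`, which is a hypothetical placeholder; the sketch also assumes that no update is performed when fewer than N sample data are available at the start.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def train_feature_model(
    samples: Sequence[Any],
    n: int,
    encoder: Any,
    model: Any,
    train_step: Callable[[Any, Any, List[Any]], Tuple[Any, Any]],
    seed: int = 0,
) -> Tuple[Any, Any]:
    """Batch-wise training loop; returns the target encoder and feature calculation model."""
    sequence = list(samples)
    random.Random(seed).shuffle(sequence)      # S204: random arrangement
    position = 0
    while len(sequence) - position >= n:       # S211: stop when fewer than N remain
        batch = sequence[position:position + n]             # S205 / S212: N consecutive items
        encoder, model = train_step(encoder, model, batch)  # S207-S210 (placeholder)
        position += n
    return encoder, model                      # S213


# Hypothetical train_step that only counts updates and processed samples.
def counting_step(encoder: int, model: int, batch: List[Any]) -> Tuple[int, int]:
    return encoder + 1, model + len(batch)


updates, processed = train_feature_model(range(10), n=3, encoder=0, model=0, train_step=counting_step)
```

With ten sample data and N = 3, three batches are processed and the last remaining sample data is left unused, since fewer than N remain after the third batch.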
In the method provided by the embodiment of the application, the target encoder and the feature calculation model are constructed by self-supervised contrastive learning, which achieves higher accuracy than the traditional approach of learning from manually defined features.
S103, calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence.
And calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence.
Specifically, referring to fig. 3, the process of calculating the data anomaly probability of the time point to be detected based on the context feature corresponding to each KPI vector sequence specifically includes:
S301, calculating a first data similarity between the time point to be detected and the time point before the time point to be detected, based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point before the time point to be detected.
The first data similarity between the time point to be detected and the time point before the time point to be detected is calculated by a preset similarity calculation formula, based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point before the time point to be detected.
S302, calculating a second data similarity between the time point to be detected and the time point after the time point to be detected, based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point after the time point to be detected.
The second data similarity between the time point to be detected and the time point after the time point to be detected is calculated by the similarity calculation formula, based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point after the time point to be detected.
S303, calculating the data anomaly probability of the time point to be detected based on the first data similarity and the second data similarity.
The process of calculating the data anomaly probability of the time point to be detected based on the first data similarity and the second data similarity specifically includes:
determining the larger of the first data similarity and the second data similarity as the maximum similarity;
determining the smaller of the first data similarity and the second data similarity as the minimum similarity;
and calculating the ratio of the minimum similarity to the maximum similarity to obtain the data anomaly probability of the time point to be detected.
In the method provided by the embodiment of the application, the first data similarity and the second data similarity are compared; the one with the larger value is taken as the maximum similarity and the one with the smaller value is taken as the minimum similarity. The ratio of the minimum similarity to the maximum similarity, that is, the minimum similarity divided by the maximum similarity, is then calculated to obtain the data anomaly probability of the time point to be detected.
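Steps S301 to S303 can be illustrated with the short sketch below. The text only refers to "a preset similarity calculation formula", so the use of cosine similarity here is an assumption made purely for illustration, as are the function names and the restriction to non-negative similarities (so that the ratio is well defined).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the preset similarity formula."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("context features must be non-zero vectors")
    return dot / (norm_a * norm_b)


def anomaly_probability(
    prev_ctx: Sequence[float],
    cur_ctx: Sequence[float],
    next_ctx: Sequence[float],
) -> float:
    """Ratio of the minimum similarity to the maximum similarity (S301-S303)."""
    first = cosine_similarity(cur_ctx, prev_ctx)   # S301: current vs. previous
    second = cosine_similarity(cur_ctx, next_ctx)  # S302: current vs. next
    largest = max(first, second)
    smallest = min(first, second)
    if smallest < 0 or largest <= 0:
        # Assumption of this sketch: the ratio is only used for positive similarities.
        raise ValueError("similarities must be non-negative, with a positive maximum")
    return smallest / largest                      # S303
```

The three arguments are the context features of the previous, current and next time points, in that order.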
S104, judging whether the data abnormality probability is larger than a preset abnormality threshold, if so, executing step S105, and if not, executing step S106.
The data abnormality probability of the time point to be detected is compared with a preset abnormality threshold to judge whether the data abnormality probability is greater than the abnormality threshold. The abnormality threshold is set manually, can be adjusted as required, and may take a value in the range of 0.5 to 1.
S105, determining that abnormal KPI data exists in KPI data corresponding to a time point to be detected.
If the data abnormality probability of the time point to be detected is larger than the abnormality threshold, determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected.
S106, determining that no abnormal KPI data exists in KPI data corresponding to the time point to be detected.
If the data abnormality probability of the time point to be detected is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected.
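The decision in steps S104 to S106 reduces to a single strict comparison, sketched below. The function name and the default threshold of 0.8 are hypothetical; the text only states that the threshold is set manually and may lie between 0.5 and 1.

```python
def has_abnormal_kpi_data(anomaly_probability: float, threshold: float = 0.8) -> bool:
    """Return True when the probability exceeds the preset abnormality threshold.

    The default of 0.8 is an illustrative value, not one given in the text.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold is expected to lie in the range 0.5 to 1")
    # S105 when strictly greater than the threshold, S106 otherwise.
    return anomaly_probability > threshold
```

A probability exactly equal to the threshold falls under step S106, because the text requires the probability to be greater than the threshold for step S105.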
The KPI abnormal data detection method provided by the embodiment of the application acquires a data sequence to be detected; inputs each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain the context feature corresponding to each KPI vector sequence; calculates the data anomaly probability of the time point to be detected based on the context feature corresponding to each KPI vector sequence; determines that abnormal KPI data exists in the KPI data corresponding to the time point to be detected if the data anomaly probability is greater than a preset anomaly threshold; and determines that no abnormal KPI data exists in the KPI data corresponding to the time point to be detected if the data anomaly probability is not greater than the preset anomaly threshold. In this method, the data anomaly probability of the time point to be detected is calculated based on the context feature corresponding to the time point to be detected, the context feature corresponding to the previous time point and the context feature corresponding to the next time point, and whether abnormal KPI data exists at the time point to be detected is determined by comparing the data anomaly probability with the anomaly threshold. Compared with the existing approach of comparing the KPI data corresponding to the time point to be detected directly against a preset threshold, this determines whether abnormal KPI data exists more accurately and more efficiently.
Corresponding to the method shown in fig. 1, the embodiment of the present application further provides a KPI abnormal data detection apparatus, configured to implement the method shown in fig. 1, where a schematic structural diagram of the KPI abnormal data detection apparatus is shown in fig. 4, and the KPI abnormal data detection apparatus specifically includes:
An acquiring unit 401, configured to acquire a data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
a first input unit 402, configured to input each KPI vector sequence included in the data sequence to be detected into a pre-constructed feature model, to obtain a context feature corresponding to each KPI vector sequence;
A first calculating unit 403, configured to calculate, based on the context feature corresponding to each KPI vector sequence, a data anomaly probability of the time point to be detected;
A first determining unit 404, configured to determine that abnormal KPI data exists in KPI data corresponding to the time point to be detected if the data abnormality probability is greater than a preset abnormality threshold; if the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected.
According to the KPI abnormal data detection device provided by the embodiment of the application, based on the context feature corresponding to the time point to be detected, the context feature corresponding to the previous time point of the time point to be detected and the context feature of the next time point of the time point to be detected, the data abnormal probability of the time point to be detected is calculated, and based on the comparison result of the data abnormal probability and the abnormal threshold value, whether the KPI data of the time point to be detected is abnormal or not is determined.
In one embodiment of the present application, based on the foregoing scheme, the first calculation unit 403 includes:
A first calculating subunit, configured to calculate, based on a context feature corresponding to a KPI vector sequence of the to-be-detected time point and a context feature corresponding to a KPI vector sequence of a time point before the to-be-detected time point, a first data similarity between the to-be-detected time point and the time point before the to-be-detected time point;
A second calculating subunit, configured to calculate a second data similarity between the to-be-detected time point and a time point after the to-be-detected time point based on a context feature corresponding to the KPI vector sequence of the to-be-detected time point and a context feature corresponding to the KPI vector sequence of the time point after the to-be-detected time point;
and a third calculation subunit, configured to calculate, based on the first data similarity and the second data similarity, a probability of data anomaly at the time point to be detected.
In one embodiment of the present application, based on the foregoing scheme, when calculating the data anomaly probability of the time point to be detected based on the first data similarity and the second data similarity, the third calculation subunit is configured for:
determining the larger of the first data similarity and the second data similarity as the maximum similarity;
determining the smaller of the first data similarity and the second data similarity as the minimum similarity;
and calculating the ratio of the minimum similarity to the maximum similarity to obtain the data anomaly probability of the time point to be detected.
In one embodiment of the present application, based on the foregoing scheme, the feature model includes a target encoder and a feature calculation model, and when inputting each KPI vector sequence contained in the data sequence to be detected into the pre-constructed feature model to obtain the context feature corresponding to each KPI vector sequence, the first input unit 402 is configured for:
encoding each KPI vector sequence contained in the data sequence to be detected by using the target encoder to obtain a feature vector corresponding to each KPI vector sequence;
and inputting each feature vector into the pre-constructed feature calculation model to obtain a context feature corresponding to each feature vector.
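The two-stage feature model (target encoder followed by feature calculation model) amounts to composing two functions per KPI vector sequence. The sketch below shows only that composition; `mean_pool_encoder` and `scale_model` are toy stand-ins invented for illustration and do not represent the trained encoder or neural network model.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
KpiVectorSequence = Sequence[Sequence[float]]


def context_features(
    sequences: Dict[str, KpiVectorSequence],
    target_encoder: Callable[[KpiVectorSequence], Vector],
    feature_calculation_model: Callable[[Vector], Vector],
) -> Dict[str, Vector]:
    """Encode each KPI vector sequence, then map each feature vector to a context feature."""
    return {
        name: feature_calculation_model(target_encoder(seq))
        for name, seq in sequences.items()
    }


def mean_pool_encoder(seq: KpiVectorSequence) -> Vector:
    """Toy encoder (assumption): column-wise mean of the KPI vectors."""
    return [sum(col) / len(col) for col in zip(*seq)]


def scale_model(feature: Vector) -> Vector:
    """Toy feature calculation model (assumption): doubles every component."""
    return [2 * x for x in feature]
```

In use, `sequences` would hold three entries, one each for the previous, current and next time points, and the resulting dictionary would feed the similarity calculation.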
In one embodiment of the present application, based on the foregoing scheme, the apparatus may further include:
A collecting unit, configured to collect a historical time sequence data sequence; the historical time sequence data sequence comprises data subsets corresponding to a plurality of time points; each of the data subsets comprises a plurality of KPI data;
A vectorization processing unit, configured to perform vectorization processing on each data subset to obtain a KPI vector corresponding to each data subset;
A second determining unit, configured to determine a plurality of sample data based on each of the KPI vectors; each sample data comprises a preset number of KPI vectors with consecutive time points;
An arrangement unit, configured to randomly arrange the sample data to obtain a sample data sequence;
A first selecting unit, configured to select N sample data at consecutive positions from the sample data sequence according to a preset order; wherein N is a positive integer;
A second selecting unit, configured to take each of the selected sample data as first sample data;
An encoding unit, configured to encode each current first sample data by using the current encoder to obtain a first feature vector corresponding to each first sample data;
A second input unit, configured to input each first feature vector into the current neural network model to obtain a context feature corresponding to each first feature vector;
A construction unit, configured to construct a first comprehensive loss function based on each first feature vector and the context feature corresponding to each first feature vector;
A second calculation unit, configured to solve for the minimum value of the first comprehensive loss function to obtain a new encoder and a new neural network model;
A second selecting unit, configured to select, according to the preset order, N sample data at consecutive positions from all the remaining sample data in the sample data sequence;
A third determining unit, configured to take each currently selected sample data as first sample data;
A return unit, configured to return, based on each first sample data, the new encoder and the new neural network model, to the step of encoding each current first sample data by using the current encoder to obtain a first feature vector corresponding to each first sample data, until the number of remaining sample data in the sample data sequence is less than N;
And a fourth determining unit, configured to take the current encoder as the target encoder and take the current neural network model as the feature calculation model.
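The sample-preparation units listed above (determining samples from consecutive KPI vectors, then randomly arranging them) can be pictured with the following sketch. The sliding window with a stride of one time point and the optional `seed` parameter are assumptions of this example; the text only states that each sample holds a preset number of KPI vectors at consecutive time points.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def build_sample_sequence(
    kpi_vectors: Sequence[Vector],
    window: int,
    seed: Optional[int] = None,
) -> List[List[Vector]]:
    """Cut time-ordered KPI vectors into windows, then shuffle the windows.

    Assumption: windows overlap and advance by one time point at a time.
    """
    if window <= 0 or window > len(kpi_vectors):
        raise ValueError("window must be between 1 and the number of KPI vectors")
    samples = [
        list(kpi_vectors[i:i + window])  # `window` vectors at consecutive time points
        for i in range(len(kpi_vectors) - window + 1)
    ]
    random.Random(seed).shuffle(samples)  # random arrangement into a sample data sequence
    return samples
```

Shuffling whole windows rather than individual vectors keeps the time order inside each sample intact while randomizing the order in which samples are drawn for training.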
In one embodiment of the present application, based on the foregoing scheme, when constructing the first comprehensive loss function based on each of the first feature vectors and the context feature corresponding to each of the first feature vectors, the construction unit is configured for:
For each first feature vector, constructing an initial parameter matrix corresponding to the first feature vector based on the first feature vector and the context feature corresponding to the first feature vector, and constructing a first loss function of the first feature vector based on the first feature vector, the context feature of the first feature vector and the initial parameter matrix corresponding to the first feature vector;
And constructing a first comprehensive loss function based on each first loss function.
The embodiment of the application also provides a storage medium, which comprises stored instructions, wherein the equipment where the storage medium is located is controlled to execute the KPI abnormal data detection method when the instructions run.
The embodiment of the present application further provides an electronic device, a schematic structural diagram of which is shown in fig. 5. The electronic device specifically includes a memory 501 and one or more instructions 502, where the one or more instructions 502 are stored in the memory 501 and configured to be executed by one or more processors 503 to perform the following operations:
acquiring a data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
Inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence;
Calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence;
if the data abnormality probability is larger than a preset abnormality threshold, determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected;
If the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant points, refer to the description of the method embodiments.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above device is described with its functions divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The KPI abnormal data detection method and device, storage medium and electronic equipment provided by the application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the application. Meanwhile, those skilled in the art may, in accordance with the idea of the application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the application.
Claims (7)
1. The KPI abnormal data detection method is characterized by comprising the following steps:
acquiring a data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
Inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence;
Calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence;
if the data abnormality probability is larger than a preset abnormality threshold, determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected;
If the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected;
The calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence comprises the following steps:
calculating a first data similarity between the time point to be detected and a time point before the time point to be detected based on a context feature corresponding to the KPI vector sequence of the time point to be detected and a context feature corresponding to the KPI vector sequence of the time point before the time point to be detected;
calculating a second data similarity between the time point to be detected and a time point after the time point to be detected based on the context feature corresponding to the KPI vector sequence of the time point to be detected and the context feature corresponding to the KPI vector sequence of the time point after the time point to be detected;
calculating the data anomaly probability of the time point to be detected based on the first data similarity and the second data similarity;
The calculating the data anomaly probability of the to-be-detected time point based on the first data similarity and the second data similarity includes:
determining the maximum similarity from the data similarity with the maximum value in the first data similarity and the second data similarity;
determining the minimum similarity from the data similarity with the minimum value in the first data similarity and the second data similarity;
and calculating the ratio of the minimum similarity to the maximum similarity to obtain the data anomaly probability of the time point to be detected.
2. The method according to claim 1, wherein the feature model includes a target encoder and a feature calculation model, the inputting each KPI vector sequence included in the data sequence to be detected into a pre-constructed feature model, respectively, to obtain a context feature corresponding to each KPI vector sequence, including:
Coding each KPI vector sequence contained in the data sequence to be detected by using a target coder to obtain a feature vector corresponding to each KPI vector sequence;
And respectively inputting each feature vector into a pre-constructed feature calculation model to obtain a context feature corresponding to each feature vector.
3. The method according to claim 2, wherein the feature model construction process comprises:
Collecting a historical time sequence data sequence; the historical time sequence data sequence comprises a plurality of data subsets corresponding to time points; each of the data subsets comprising a plurality of KPI data;
Carrying out vectorization processing on each data subset to obtain KPI vectors corresponding to each data subset;
determining a plurality of sample data based on each KPI vector; each sample data comprises a preset number of KPI vectors with consecutive time points;
Randomly arranging the sample data to obtain a sample data sequence;
selecting N pieces of sample data with continuous positions from the sample data sequence according to a preset sequence; wherein, N is a positive integer;
taking each selected sample data as first sample data;
Encoding each piece of current first sample data by using an encoder to obtain a first feature vector corresponding to each piece of first sample data;
inputting each first feature vector into a current neural network model to obtain a context feature corresponding to each first feature vector;
Constructing a first comprehensive loss function based on each first feature vector and the context feature corresponding to each first feature vector;
Solving a minimum value of the first comprehensive loss function to obtain a new encoder and a new neural network model;
Selecting, according to the preset sequence, N sample data with continuous positions from all the remaining sample data in the sample data sequence;
taking each sample data currently selected as first sample data;
returning to execute the step of performing encoding processing on each current first sample data by using the current encoder based on each first sample data, the new encoder and the new neural network model to obtain a first feature vector corresponding to each first sample data until the number of residual sample data in the sample data sequence is smaller than N;
the current encoder is taken as a target encoder, and the current neural network model is taken as a characteristic calculation model.
4. A method according to claim 3, wherein said constructing a first comprehensive loss function based on each of said first feature vectors and the corresponding contextual features of each of said first feature vectors comprises:
For each first feature vector, constructing an initial parameter matrix corresponding to the first feature vector based on the first feature vector and the context feature corresponding to the first feature vector, and constructing a first loss function of the first feature vector based on the first feature vector, the context feature of the first feature vector and the initial parameter matrix corresponding to the first feature vector;
And constructing a first comprehensive loss function based on each first loss function.
5. A KPI anomaly data detection apparatus, comprising:
The acquisition unit is used for acquiring the data sequence to be detected; the to-be-detected data sequence comprises a KPI vector sequence corresponding to a to-be-detected time point, a KPI vector sequence corresponding to a previous time point of the to-be-detected time point and a KPI vector sequence corresponding to a next time point of the to-be-detected time point, wherein each KPI vector sequence comprises a plurality of KPI vectors, each KPI vector is obtained by vectorization according to KPI data corresponding to a reference time point of the KPI vector, and the reference time point of each KPI vector is determined according to the position of the KPI vector in the KPI vector sequence to which the KPI vector belongs;
The first input unit is used for respectively inputting each KPI vector sequence contained in the data sequence to be detected into a pre-constructed feature model to obtain a context feature corresponding to each KPI vector sequence;
the first calculation unit is used for calculating the data anomaly probability of the time point to be detected based on the context characteristics corresponding to each KPI vector sequence;
the first determining unit is used for determining that abnormal KPI data exists in KPI data corresponding to the time point to be detected if the data abnormality probability is larger than a preset abnormality threshold; if the data abnormality probability is not greater than the abnormality threshold, determining that abnormal KPI data does not exist in KPI data corresponding to the time point to be detected;
the first calculation unit includes:
A first calculating subunit, configured to calculate, based on a context feature corresponding to a KPI vector sequence of the to-be-detected time point and a context feature corresponding to a KPI vector sequence of a time point before the to-be-detected time point, a first data similarity between the to-be-detected time point and the time point before the to-be-detected time point;
A second calculating subunit, configured to calculate a second data similarity between the to-be-detected time point and a time point after the to-be-detected time point based on a context feature corresponding to the KPI vector sequence of the to-be-detected time point and a context feature corresponding to the KPI vector sequence of the time point after the to-be-detected time point;
A third calculation subunit, configured to calculate, based on the first data similarity and the second data similarity, a probability of data anomaly at the time point to be detected;
The calculating the data anomaly probability of the to-be-detected time point based on the first data similarity and the second data similarity includes:
determining the maximum similarity from the data similarity with the maximum value in the first data similarity and the second data similarity;
determining the minimum similarity from the data similarity with the minimum value in the first data similarity and the second data similarity;
and calculating the ratio of the minimum similarity to the maximum similarity to obtain the data anomaly probability of the time point to be detected.
6. A storage medium comprising stored instructions, wherein the instructions, when executed, control a device in which the storage medium is located to perform the KPI anomaly data detection method according to any one of claims 1 to 4.
7. An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the KPI anomaly data detection method of any of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110017874.8A CN112686330B (en) | 2021-01-07 | 2021-01-07 | KPI abnormal data detection method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110017874.8A CN112686330B (en) | 2021-01-07 | 2021-01-07 | KPI abnormal data detection method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112686330A (en) | 2021-04-20 |
CN112686330B (en) | 2024-05-31 |
Family
ID=75456279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110017874.8A Active CN112686330B (en) | 2021-01-07 | 2021-01-07 | KPI abnormal data detection method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112686330B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113204590B (en) * | 2021-05-31 | 2021-11-23 | 中国人民解放军国防科技大学 | Unsupervised KPI (Key performance indicator) anomaly detection method based on serialization self-encoder |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106506556A (en) * | 2016-12-29 | 2017-03-15 | 北京神州绿盟信息安全科技股份有限公司 | A kind of network flow abnormal detecting method and device |
CN112131272A (en) * | 2020-09-22 | 2020-12-25 | 平安科技(深圳)有限公司 | Detection method, device, equipment and storage medium for multi-element KPI time sequence |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG10201700756PA (en) * | 2017-01-31 | 2018-08-30 | Arris Int Ip Ltd | Anomaly detection based on performance indicators |
Also Published As
Publication number | Publication date |
---|---|
CN112686330A (en) | 2021-04-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
TW202004658A (en) | Self-tuning incremental model compression method in deep neural network | |
CN111784061B (en) | Training method, device and equipment for power grid engineering cost prediction model | |
CN116307215A (en) | Load prediction method, device, equipment and storage medium of power system | |
CN116050674B (en) | Hydraulic engineering operation trend prediction method and device | |
CN115514614B (en) | Cloud network anomaly detection model training method based on reinforcement learning and storage medium | |
CN115361318B (en) | LSTM edge calculation flow prediction method for dynamic load balancing in complex environment | |
CN114860542A (en) | Trend prediction model optimization method, trend prediction model optimization device, electronic device, and medium | |
CN112686330B (en) | KPI abnormal data detection method and device, storage medium and electronic equipment | |
CN116522594A (en) | Time self-adaptive transient stability prediction method and device based on convolutional neural network | |
CN115017819A (en) | Engine remaining service life prediction method and device based on hybrid model | |
CN114564345A (en) | Server abnormity detection method, device, equipment and storage medium | |
CN114580548A (en) | Training method of target detection model, target detection method and device | |
CN112989829A (en) | Named entity identification method, device, equipment and storage medium | |
CN116628444A (en) | Water quality early warning method based on improved meta-learning | |
CN117097541A (en) | API service attack detection method, device, equipment and storage medium | |
CN116910559A (en) | Index anomaly detection method for intelligent operation and maintenance application of power grid supercomputer center | |
CN111475548A (en) | Power utilization abnormity analysis decision system based on big data mining technology | |
Shu et al. | A general kpi anomaly detection using attention models | |
CN116258167A (en) | Data detection method, device, electronic equipment and medium | |
CN115952928A (en) | Short-term power load prediction method, device, equipment and storage medium | |
CN116168403A (en) | Medical data classification model training method, classification method, device and related medium | |
CN114239750A (en) | Alarm data processing method, device, storage medium and equipment | |
CN111178630A (en) | Load prediction method and device | |
CN112465259B (en) | Switch fault prediction method based on deep neural network | |
CN118094248B (en) | Dam operation condition similarity matching method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||