CN113268372B - One-dimensional time series anomaly detection method and device and computer equipment - Google Patents

Info

Publication number
CN113268372B
Authority
CN
China
Prior art keywords
predicted
sample
dimensional
context information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110821949.8A
Other languages
Chinese (zh)
Other versions
CN113268372A (en)
Inventor
蔡志平
王承禹
周桐庆
余广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110821949.8A
Publication of CN113268372A
Application granted
Publication of CN113268372B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a one-dimensional time series anomaly detection method and apparatus, and a computer device. The method comprises the following steps: extracting a point to be predicted from a one-dimensional time series, and extracting context information to be predicted of the point to be predicted through a sliding window; reducing the dimension of the context information to be predicted with an encoder to obtain low-dimensional embedded data to be predicted; querying a plurality of neighbor data of the low-dimensional embedded data to be predicted in a detection set; obtaining, from the sample performance vectors of the neighbor data, the probability that each base detector in a base detector sequence correctly detects the one-dimensional time series; obtaining the base detector with the highest detection performance according to those probabilities; and performing anomaly detection on the one-dimensional time series with that base detector. The method can improve time series anomaly detection performance.

Description

One-dimensional time series anomaly detection method and device and computer equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for detecting one-dimensional time series anomalies, and a computer device.
Background
Currently, large internet companies need to closely monitor the real-time performance of their systems, since even a short service interruption or quality degradation may cause a huge business loss. These real-time performance data (e.g., search response time, CPU usage) are typically collected and stored as time series. To keep services running smoothly, these companies often develop anomaly detection systems that accurately detect time series anomalies so that the anomalies can be investigated promptly.
However, a large company has a very large number of time series whose data characteristics differ greatly. Existing time series anomaly detection algorithms generalize poorly, each being suited to particular types of data, so detecting anomalies in time series data with a single model yields low accuracy.
Disclosure of Invention
In view of the above, it is desirable to provide a one-dimensional time series abnormality detection method, a one-dimensional time series abnormality detection apparatus, and a computer device, which can improve the time series abnormality detection performance.
A one-dimensional time series anomaly detection method, the method comprising:
extracting sample points from a one-dimensional time series, wherein the one-dimensional time series comprises sampling values at a plurality of time points, and each sample point corresponds to the sampling value and the label value of one time point;
for each sample point, extracting sample context information of the sample point through a sliding window;
reducing the dimension of the sample context information with an encoder to obtain sample low-dimensional embedded data;
inputting the sample context information and the label value of the sample point into a base detector sequence, and obtaining a sample performance vector from the output of the base detector sequence, wherein the base detector sequence is composed of a plurality of base detectors;
establishing a detection set according to the one-to-one correspondence between the sample low-dimensional embedded data and the sample performance vectors;
extracting a point to be predicted from the one-dimensional time series, and extracting context information to be predicted of the point to be predicted through a sliding window;
reducing the dimension of the context information to be predicted with the encoder to obtain low-dimensional embedded data to be predicted;
querying a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set;
obtaining, from the sample performance vectors of the neighbor data, the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series;
obtaining the base detector with the highest detection performance according to those probabilities;
and performing anomaly detection on the one-dimensional time series with the base detector having the highest detection performance.
In one embodiment, a decoder reconstructs the low-dimensional embedded data to be predicted to obtain reconstructed data; a reconstruction error is obtained from the context information to be predicted and the reconstructed data; and if the reconstruction error is greater than a threshold, a real label corresponding to the context information to be predicted is received through a human-computer interaction interface, and a preset standard base detector is used to perform prediction on the low-dimensional embedded data to be predicted.
In one embodiment, the real labels and the low-dimensional embedded data to be predicted are placed in one-to-one correspondence and then stored in the detection set.
In one embodiment, obtaining a reconstruction error according to context information to be predicted and reconstruction data includes: according to the context information to be predicted and the root mean square error of the reconstruction data, the reconstruction error is obtained as follows:
$$\mathrm{RMSE}(x, \hat{x}) = \sqrt{\tfrac{1}{n}\,\lVert x - \hat{x} \rVert_2^{2}}$$
where $\mathrm{RMSE}$ denotes the root mean square error, $x$ denotes the context information to be predicted, $\hat{x}$ denotes the reconstructed data, $\lVert \cdot \rVert_2$ denotes a norm (here the Euclidean norm), and $n$ is the number of sampling values in $x$.
In one embodiment, establishing a detection set according to a one-to-one correspondence relationship between sample low-dimensional embedded data and sample performance vectors includes:
and storing the sample low-dimensional embedded data and the sample performance vectors by adopting a KD-Tree algorithm according to the one-to-one correspondence of the sample low-dimensional embedded data and the sample performance vectors to obtain a detection set.
In one embodiment, the probability that the base detector sequence correctly detects the one-dimensional time sequence is calculated from the sample performance vector of the neighbor data using the following conditional probability formula:
$$P(c_d \mid x) = \frac{1}{K} \sum_{j=1}^{K} p_j^{(d)}$$
where $d$ is the index of a base detector in the base detector sequence, $x$ denotes the context information to be predicted, $P(c_d \mid x)$ denotes the probability that the $d$-th base detector $c_d$ correctly detects the point to be predicted, $p_j^{(d)}$ denotes the value of the $d$-th bit of the performance vector $p_j$, $K$ denotes the number of neighbor data, and $j$ is the index of the performance vector.
In one embodiment, according to the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series, the base detector with the highest probability is selected using the following formula:
$$c^{*} = \underset{c_d \in C,\; 1 \le d \le M}{\arg\max}\; P(c_d \mid x)$$
where $c^{*}$ denotes the base detector with the highest probability of correct detection, $P(c_d \mid x)$ denotes the probability that the $d$-th base detector $c_d$ correctly detects the point to be predicted given the context information $x$, $C$ denotes the base detector sequence, and $M$ denotes the total number of base detectors.
In one embodiment, the encoder and the decoder are implemented by a variational autoencoder.
A one-dimensional time series anomaly detection apparatus, said apparatus comprising:
the sample low-dimensional embedded data acquisition module is used for extracting sample points from the one-dimensional time sequence; the one-dimensional time sequence comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point; for each sample point, extracting sample context information of the sample point through a sliding window; reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data;
the detection set acquisition module is used for inputting the sample context information and the label value of the sample point into the base detector sequence and obtaining a sample performance vector according to the output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors; establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors;
the encoder dimension reduction module is used for extracting a point to be predicted from the one-dimensional time sequence and extracting context information to be predicted of the point to be predicted through a sliding window; reducing the dimension of the context information to be predicted by adopting an encoder to obtain low-dimensional embedded data to be predicted;
the anomaly detection module is used for inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; according to the sample performance vector of the neighbor data, the probability that the base detector sequence correctly detects the one-dimensional time sequence is obtained; obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence; and carrying out anomaly detection on the one-dimensional time series according to the base detector with the highest detection performance.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
extracting sample points from the one-dimensional time series; the one-dimensional time sequence comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point; for each sample point, extracting sample context information of the sample point through a sliding window; reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data; inputting the sample context information and the label value of the sample point into a base detector sequence, and obtaining a sample performance vector according to the output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors; establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors; extracting a point to be predicted from the one-dimensional time sequence, and extracting context information to be predicted of the point to be predicted through a sliding window; reducing the dimension of the context information to be predicted by adopting an encoder to obtain low-dimensional embedded data to be predicted; inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; according to the sample performance vector of the neighbor data, the probability that the base detector sequence correctly detects the one-dimensional time sequence is obtained; obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence; and carrying out anomaly detection on the one-dimensional time series according to the base detector with the highest detection performance.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
extracting sample points from the one-dimensional time series; the one-dimensional time sequence comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point; for each sample point, extracting sample context information of the sample point through a sliding window; reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data; inputting the sample context information and the label value of the sample point into a base detector sequence, and obtaining a sample performance vector according to the output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors; establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors; extracting a point to be predicted from the one-dimensional time sequence, and extracting context information to be predicted of the point to be predicted through a sliding window; reducing the dimension of the context information to be predicted by adopting an encoder to obtain low-dimensional embedded data to be predicted; inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; according to the sample performance vector of the neighbor data, the probability that the base detector sequence correctly detects the one-dimensional time sequence is obtained; obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence; and carrying out anomaly detection on the one-dimensional time series according to the base detector with the highest detection performance.
In the above one-dimensional time series anomaly detection method, apparatus, and computer device, sample points are first extracted from the one-dimensional time series, which comprises sampling values at a plurality of time points; each sample point corresponds to the sampling value and the label value of one time point. For each sample point, sample context information is extracted through a sliding window, so that the sample point is represented by its context and the method is suitable for real-time detection. An encoder reduces the dimension of the sample context information to obtain sample low-dimensional embedded data. The sample context information and the label value of the sample point are input into a base detector sequence composed of a plurality of base detectors, and a sample performance vector is obtained from the output of the base detector sequence. A detection set is established according to the one-to-one correspondence between the sample low-dimensional embedded data and the sample performance vectors. Then, a point to be predicted is extracted from the one-dimensional time series, and its context information to be predicted is extracted through a sliding window; the encoder reduces the dimension of this context information to obtain low-dimensional embedded data to be predicted. A plurality of neighbor data of the low-dimensional embedded data to be predicted are queried in the detection set, and the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series is obtained from the sample performance vectors of the neighbor data, so that the base detector with the highest detection performance can be selected in real time during detection. Anomaly detection is then performed on the one-dimensional time series with that base detector. Therefore, the embodiments of the invention can perform accurate anomaly detection on different types of one-dimensional time series through a single model.
Drawings
FIG. 1 is a schematic flow chart of a one-dimensional time series anomaly detection method in one embodiment;
FIG. 2 is a diagram illustrating an overall structure of a one-dimensional time-series anomaly detection method according to an embodiment;
FIG. 3 is a diagram illustrating a process of encoding sample points in a one-dimensional time series according to an embodiment;
FIG. 4 is a diagram illustrating a process of decoding sample points in a one-dimensional time series according to another embodiment;
FIG. 5 is a block diagram showing the structure of a one-dimensional time-series abnormality detection apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, there is provided a one-dimensional time series anomaly detection method, including the steps of:
Step 102, extracting sample points from a one-dimensional time series; for each sample point, extracting sample context information of the sample point through a sliding window; and reducing the dimension of the sample context information with an encoder to obtain sample low-dimensional embedded data.
The one-dimensional time series is collected from a business system. The business system may be an internet system, in which case the time series is real-time performance data such as search response time or CPU utilization; it may also be a plant status monitoring system, in which case the time series is status data collected by sensors. The one-dimensional time series comprises sampling values at a plurality of time points, and each sample point extracted from it corresponds to the sampling value and the label value of one time point. The context information of a sample point is the plurality of consecutive sampling values preceding that point, which is essentially a contiguous subsequence. The encoder is trained with the evidence lower bound (ELBO) as its objective function, which is maximized during training; the trained encoder encodes the sample context information into a low-dimensional embedding space, where it is stored as low-dimensional embedded data.
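The sliding-window extraction described above can be sketched as follows. This is a minimal illustration only: the function names `extract_context` and `sliding_contexts`, the parameter `window`, and the choice to exclude the point itself from its context are assumptions made for the example and are not specified in the text.

```python
from typing import Iterator, List, Sequence, Tuple


def extract_context(values: Sequence[float], t: int, window: int) -> List[float]:
    """Return the `window` sampling values immediately preceding index t.

    Assumption: the context excludes the point at index t itself.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if t < window or t > len(values):
        raise ValueError("index t does not have a full window of history")
    return list(values[t - window:t])


def sliding_contexts(values: Sequence[float], window: int) -> Iterator[Tuple[int, List[float]]]:
    """Yield (t, context) for every point that has a full window of history."""
    for t in range(window, len(values)):
        yield t, extract_context(values, t, window)


# Toy series, purely illustrative.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
contexts = list(sliding_contexts(series, window=3))
```

Each context is a contiguous subsequence of fixed length, so it can be passed directly to an encoder that expects fixed-size input.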
Step 104, inputting the sample context information and the label value of the sample point into a base detector sequence, and obtaining a sample performance vector according to the output result of the base detector sequence; and establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors.
The base detector sequence is an ordered set comprising a plurality of base detectors. The sample context information and the label value of the sample point are input into the base detector sequence, and each base detector outputs a detection result. The results are represented as a vector, called a performance vector, in which each element records whether the corresponding base detector detected the sample point correctly (0 represents an error, and 1 represents a correct detection). The sample low-dimensional embedded data and the sample performance vectors are stored according to their one-to-one correspondence to obtain the detection set.
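A performance vector of this kind can be sketched as below. The two toy detectors (`threshold_detector`, `deviation_detector`), their parameters, and the detector signature taking both the context and the current value are hypothetical stand-ins chosen for illustration; the text does not name the base detectors it uses.

```python
from typing import Callable, List, Sequence

# Hypothetical detector signature: (context, value) -> 1 if anomalous, else 0.
Detector = Callable[[Sequence[float], float], int]


def threshold_detector(limit: float) -> Detector:
    """Flag a value as anomalous when it exceeds a fixed limit."""
    def detect(context: Sequence[float], value: float) -> int:
        return int(value > limit)
    return detect


def deviation_detector(k: float) -> Detector:
    """Flag a value that lies more than k context ranges from the context mean."""
    def detect(context: Sequence[float], value: float) -> int:
        mean = sum(context) / len(context)
        spread = max(context) - min(context)
        return int(abs(value - mean) > k * spread)
    return detect


def performance_vector(detectors: Sequence[Detector], context: Sequence[float],
                       value: float, label: int) -> List[int]:
    """Element d is 1 if detector d agrees with the label, 0 otherwise."""
    return [int(det(context, value) == label) for det in detectors]


detectors = [threshold_detector(10.0), deviation_detector(2.0)]
vec = performance_vector(detectors, [1.0, 2.0, 3.0], 9.0, label=1)
```

In this toy case the fixed threshold misses the anomaly while the deviation detector catches it, which is the kind of complementary behavior the detection set is meant to record.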
Step 106, extracting a point to be predicted from the one-dimensional time series, and extracting context information to be predicted of the point to be predicted through a sliding window; and reducing the dimension of the context information to be predicted with the encoder to obtain low-dimensional embedded data to be predicted.
The context information to be predicted of the point to be predicted refers to the plurality of consecutive sampling values preceding the point to be predicted. The encoder in the invention is implemented with a variational autoencoder.
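For reference, the evidence lower bound that a variational autoencoder maximizes during training is conventionally written as shown here. The symbols $q_\phi$ (encoder), $p_\theta$ (decoder), $z$ (low-dimensional embedding), and the prior $p(z)$ follow standard variational autoencoder notation and are assumptions of this note, not notation taken from the text.

$$\mathrm{ELBO}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

The first term rewards accurate reconstruction of the context information $x$ from its embedding, and the second keeps the encoder's distribution over embeddings close to the prior.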
Step 108, querying a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; obtaining, from the sample performance vectors of the neighbor data, the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series; obtaining the base detector with the highest detection performance according to those probabilities; and performing anomaly detection on the one-dimensional time series with the base detector having the highest detection performance.
The neighbor data are the entries in the detection set that are nearest to the low-dimensional embedded data to be predicted. The sample performance vectors of the neighbor data are retrieved through the one-to-one correspondence between low-dimensional embedded data and sample performance vectors, and are used to compute the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series. The base detector with the highest probability is selected as the base detector for the point to be predicted and performs anomaly detection on the one-dimensional time series, which yields the anomaly detection result.
In the above one-dimensional time series anomaly detection method, apparatus, and computer device, sample points are first extracted from the one-dimensional time series, which comprises sampling values at a plurality of time points; each sample point corresponds to the sampling value and the label value of one time point. For each sample point, sample context information is extracted through a sliding window, so that the sample point is represented by its context and the method is suitable for real-time detection. An encoder reduces the dimension of the sample context information to obtain sample low-dimensional embedded data. The sample context information and the label value of the sample point are input into a base detector sequence composed of a plurality of base detectors, and a sample performance vector is obtained from the output of the base detector sequence. A detection set is established according to the one-to-one correspondence between the sample low-dimensional embedded data and the sample performance vectors. Then, a point to be predicted is extracted from the one-dimensional time series, and its context information to be predicted is extracted through a sliding window; the encoder reduces the dimension of this context information to obtain low-dimensional embedded data to be predicted. A plurality of neighbor data of the low-dimensional embedded data to be predicted are queried in the detection set, and the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series is obtained from the sample performance vectors of the neighbor data, so that the base detector with the highest detection performance can be selected in real time during detection. Anomaly detection is then performed on the one-dimensional time series with that base detector. Therefore, the embodiments of the invention can perform accurate anomaly detection on different types of one-dimensional time series through a single model.
In one embodiment, a decoder is adopted to reconstruct low-dimensional embedded data to be predicted to obtain reconstructed data; obtaining a reconstruction error according to the context information to be predicted and the reconstruction data; and if the reconstruction error is larger than the threshold value, receiving a real label corresponding to the context information to be predicted through a human-computer interaction interface, and predicting the low-dimensional embedded data to be predicted of the context information to be predicted by adopting a preset standard base detector.
During detection, the decoder also reconstructs the low-dimensional embedded data to be predicted, and the reconstruction error between each piece of reconstructed data and the corresponding context information to be predicted is calculated. When the reconstruction error is greater than a preset threshold, the human-computer interaction interface generates an instruction to label the context information to be predicted, and the real label obtained from this labeling is received through the human-computer interaction interface. Because only context information with a large reconstruction error is sent for labeling, the labor cost of labeling data is reduced. Meanwhile, a preset standard base detector is used to perform prediction on the low-dimensional embedded data to be predicted, which guarantees a lower bound on one-dimensional time series anomaly detection performance when context information with a large reconstruction error is encountered.
In one embodiment, the real labels and the low-dimensional embedded data to be predicted are placed in one-to-one correspondence and then stored in the detection set. The detection set is therefore continuously updated throughout the detection process, so the accuracy of the anomaly detection performed by this embodiment increases over time.
The real label is the true label corresponding to the context information to be predicted. Placing the real labels and the low-dimensional embedded data to be predicted in one-to-one correspondence yields labeled low-dimensional embedded data, which is stored in the detection set to increase the amount of labeled data in the detection set, thereby reducing the labor cost of labeling data.
In one embodiment, obtaining a reconstruction error according to context information to be predicted and reconstruction data includes: according to the context information to be predicted and the root mean square error of the reconstruction data, the reconstruction error is obtained as follows:
$$\mathrm{RMSE}(x, \hat{x}) = \sqrt{\tfrac{1}{n}\,\lVert x - \hat{x} \rVert_2^{2}}$$
where $\mathrm{RMSE}$ denotes the root mean square error, $x$ denotes the context information to be predicted, $\hat{x}$ denotes the reconstructed data, $\lVert \cdot \rVert_2$ denotes a norm (here the Euclidean norm), and $n$ is the number of sampling values in $x$.
The root mean square error between the context information to be predicted and the reconstructed data is taken as the reconstruction error and compared with a preset threshold. When the reconstruction error is greater than the preset threshold, the human-computer interaction interface generates an instruction to label the context information to be predicted.
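The reconstruction-error check can be illustrated with a short sketch. It uses the standard definition of root mean square error; the function `route`, its return values `"fallback"` and `"select"`, and the threshold used in the example are hypothetical names and values introduced for illustration.

```python
import math
from typing import Sequence


def rmse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Root mean square error between a context window and its reconstruction."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


def route(x: Sequence[float], x_hat: Sequence[float], threshold: float) -> str:
    """Decide how to handle a point to be predicted.

    'fallback': reconstruction is poor, so request a real label and use the
                preset standard base detector.
    'select':   reconstruction is acceptable, so choose a base detector from
                the neighbors in the detection set.
    """
    return "fallback" if rmse(x, x_hat) > threshold else "select"


decision = route([0.0, 0.0], [3.0, 4.0], threshold=1.0)
```

A large reconstruction error suggests the context is unlike anything the encoder was trained on, which is why such points are routed to labeling rather than trusted to the neighbor-based selection.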
In one embodiment, establishing a detection set according to a one-to-one correspondence relationship between sample low-dimensional embedded data and sample performance vectors includes:
and storing the sample low-dimensional embedded data and the sample performance vectors by adopting a KD-Tree algorithm according to the one-to-one correspondence of the sample low-dimensional embedded data and the sample performance vectors to obtain a detection set.
The detection set is used for inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted.
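To make the KD-tree storage and neighbor query concrete, here is a small self-contained sketch in pure Python. It is an illustrative implementation, not the one used in the invention; the names `Node`, `build`, `knn`, and `sq_dist`, the use of squared Euclidean distance, and the toy embeddings are all assumptions.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Item = Tuple[Sequence[float], List[int]]  # (embedding, performance vector)


class Node:
    def __init__(self, point, payload, axis, left, right):
        self.point, self.payload, self.axis = point, payload, axis
        self.left, self.right = left, right


def build(items: List[Item], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting on the median along a cycling axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, payload = items[mid]
    return Node(point, payload, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn(root: Optional[Node], query: Sequence[float], k: int):
    """Return up to k (squared distance, embedding, performance vector), nearest first."""
    heap = []      # max-heap on distance via negation; holds at most k entries
    counter = 0    # tie-breaker so Node objects are never compared

    def visit(node: Optional[Node]) -> None:
        nonlocal counter
        if node is None:
            return
        d = sq_dist(node.point, query)
        entry = (-d, counter, node)
        counter += 1
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, entry)
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [(-nd, n.point, n.payload) for nd, _, n in sorted(heap, key=lambda e: -e[0])]


# Toy detection set: 2-D embeddings paired with performance vectors.
items = [((0.0, 0.0), [1, 0]), ((1.0, 1.0), [0, 1]), ((5.0, 5.0), [1, 1]),
         ((0.2, 0.1), [1, 0]), ((9.0, 0.0), [0, 0])]
tree = build(items)
neighbors = knn(tree, (0.0, 0.1), k=2)
```

This tree is built once from a fixed list; if labeled embeddings are added to the detection set over time, the index would need to be rebuilt or replaced with a structure that supports insertion.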
In one embodiment, the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series is calculated from the sample performance vectors of the neighbor data using the following conditional probability formula:
$$P(d \mid x) = \frac{1}{K}\sum_{j=1}^{K} p_j^{(d)}$$
where $d$ denotes the index of a base detector in the base detector sequence, $x$ denotes the context information to be predicted, $P(d \mid x)$ denotes the probability that the $d$-th base detector correctly detects the point to be predicted, $p_j^{(d)}$ denotes the value of the $d$-th bit of the performance vector $p_j$, $K$ denotes the number of neighbor data, and $j$ is the index of the performance vector.
And each base detector detects the one-dimensional time sequence, calculates the probability of correctly detecting the one-dimensional time sequence by each base detector, and selects the base detector with the highest probability of correctly detecting the one-dimensional time sequence according to the probability.
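A minimal sketch of this probability estimate follows, assuming that each performance vector holds one 0/1 entry per base detector (1 meaning the detector was correct on that sample). The function name `detector_probabilities` and the example vectors are assumptions made for illustration.

```python
def detector_probabilities(neighbor_vectors):
    """For each base detector d, return the fraction of the K neighbor
    performance vectors whose d-th bit is 1."""
    if not neighbor_vectors:
        raise ValueError("at least one neighbor is required")
    k = len(neighbor_vectors)
    n_detectors = len(neighbor_vectors[0])
    return [sum(vec[d] for vec in neighbor_vectors) / k
            for d in range(n_detectors)]

# Hypothetical performance vectors of K = 4 neighbors for 3 base detectors.
neighbor_vectors = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
probabilities = detector_probabilities(neighbor_vectors)
```

In this example the first detector was correct on three of the four neighbors, so it receives the highest estimate, 0.75.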
In one embodiment, the base detector with the highest probability of correctly detecting the one-dimensional time series is selected from the base detector sequence using the following formula:
$$d^{*} = \underset{d \in D}{\arg\max}\; P(d \mid x), \qquad D = \{1, 2, \ldots, M\}$$
where $d^{*}$ denotes the base detector with the highest probability of correct detection, $P(d \mid x)$ denotes the probability that the $d$-th base detector correctly detects the point to be predicted given the context information $x$ to be predicted, $D$ denotes the base detector sequence, and $M$ denotes the total number of base detectors.
The base detector with the best performance is used for carrying out abnormity detection on the one-dimensional time sequence, and the accuracy rate of correctly detecting the one-dimensional time sequence can be greatly improved.
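The selection step can be sketched as below. The two base detectors, their thresholds, and the probability values are hypothetical stand-ins chosen for the example; the original text does not specify which base detectors are used.

```python
def select_best_detector(probabilities):
    """Index of the detector with the highest estimated probability
    of correct detection (the first one wins on ties)."""
    return max(range(len(probabilities)), key=lambda d: probabilities[d])

# Hypothetical base detectors: each returns True when `value` looks anomalous
# relative to the preceding context window.
def jump_detector(window, value):
    return abs(value - window[-1]) > 3.0

def mean_detector(window, value):
    mean = sum(window) / len(window)
    return abs(value - mean) > 2.0

detectors = [jump_detector, mean_detector]
probabilities = [0.25, 0.75]          # assumed output of the probability step
best = select_best_detector(probabilities)
is_anomaly = detectors[best]([1.0, 1.2, 0.9, 1.1], 4.0)
```

Because only the selected detector is run on the point to be predicted, the per-point cost at detection time is one neighbor query plus one detector call.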
In one embodiment, the encoder and the decoder are implemented by a variational autoencoder (VAE). By integrating detectors whose capabilities complement one another, the performance of anomaly detection can be improved.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a one-dimensional time-series abnormality detection apparatus including: a sample low-dimensional embedded data acquisition module 502, a detection set acquisition module 504, an encoder dimension reduction module 506, and an anomaly detection module 508, wherein:
a sample low-dimensional embedded data acquisition module 502 for extracting sample points from the one-dimensional time series; the one-dimensional time sequence comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point; for each sample point, extracting sample context information of the sample point through a sliding window; reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data;
a detection set obtaining module 504, configured to input sample context information and a tag value of a sample point into a base detector sequence, and obtain a sample performance vector according to an output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors; establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors;
an encoder dimension reduction module 506, configured to extract a point to be predicted from the one-dimensional time sequence, and extract context information to be predicted of the point to be predicted through a sliding window; reducing the dimension of the context information to be predicted by adopting an encoder to obtain low-dimensional embedded data to be predicted;
an anomaly detection module 508, configured to query a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; according to the sample performance vector of the neighbor data, the probability that the base detector sequence correctly detects the one-dimensional time sequence is obtained; obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence; and carrying out anomaly detection on the one-dimensional time series according to the base detector with the highest detection performance.
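As a rough sketch of the data preparation performed by the first two modules, the following example extracts context windows with a sliding window and builds one performance vector per sample point. It assumes that a performance vector holds one bit per base detector, set to 1 when the detector's verdict matches the label; the encoder step is omitted, and the detectors, series, labels, and window length are hypothetical.

```python
def extract_context(series, index, window):
    """Return the `window` sampling values immediately before position `index`."""
    if index < window:
        raise ValueError("not enough history before this point")
    return series[index - window:index]

def performance_vector(context, value, label, detectors):
    """One bit per base detector: 1 if its verdict matches the label, else 0."""
    return [int(detect(context, value) == label) for detect in detectors]

# Hypothetical base detectors, for illustration only.
def jump_detector(context, value):
    return abs(value - context[-1]) > 2.0

def range_detector(context, value):
    return not (min(context) - 1.0 <= value <= max(context) + 1.0)

series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1]
labels = [False, False, False, False, True, False, False]  # True = anomalous
detectors = [jump_detector, range_detector]
window = 3

samples = []
for i in range(window, len(series)):
    context = extract_context(series, i, window)
    vector = performance_vector(context, series[i], labels[i], detectors)
    samples.append((context, vector))

vectors = [vector for _, vector in samples]
```

In this toy series the jump detector raises a false alarm on the normal value that follows the spike, which is the kind of context-dependent weakness the performance vectors are meant to record.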
In one embodiment, the detection set obtaining module 504 is further configured to store the sample low-dimensional embedded data and the sample performance vector by using a KD-Tree algorithm according to a one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vector, so as to obtain a detection set.
In one embodiment, the anomaly detection module 508 is further configured to calculate, from the sample performance vectors of the neighbor data, the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series, using the following conditional probability formula:
$$P(d \mid x) = \frac{1}{K}\sum_{j=1}^{K} p_j^{(d)}$$
where $d$ denotes the index of a base detector in the base detector sequence, $x$ denotes the context information to be predicted, $P(d \mid x)$ denotes the probability that the $d$-th base detector correctly detects the point to be predicted, $p_j^{(d)}$ denotes the value of the $d$-th bit of the performance vector $p_j$, $K$ denotes the number of neighbor data, and $j$ is the index of the performance vector.
In one embodiment, the anomaly detection module 508 is further configured to select, from the base detector sequence, the base detector with the highest probability of correctly detecting the one-dimensional time series, using the following formula:
$$d^{*} = \underset{d \in D}{\arg\max}\; P(d \mid x), \qquad D = \{1, 2, \ldots, M\}$$
where $d^{*}$ denotes the base detector with the highest probability of correct detection, $P(d \mid x)$ denotes the probability that the $d$-th base detector correctly detects the point to be predicted given the context information $x$ to be predicted, $D$ denotes the base detector sequence, and $M$ denotes the total number of base detectors.
In one embodiment, the apparatus for detecting one-dimensional time series abnormality further includes a decoding module, configured to reconstruct low-dimensional embedded data to be predicted by using a decoder, so as to obtain reconstructed data; obtaining a reconstruction error according to the context information to be predicted and the reconstruction data; and if the reconstruction error is larger than the threshold value, receiving a real label corresponding to the context information to be predicted through a human-computer interaction interface, and predicting the low-dimensional embedded data to be predicted of the context information to be predicted by adopting a preset standard base detector.
In one embodiment, the decoding module is further configured to store the real tag and the low-dimensional embedded data to be predicted in the detection set after the real tag and the low-dimensional embedded data to be predicted correspond to each other one by one.
In one embodiment, the decoding module is further configured to take the root mean square error between the context information to be predicted and the reconstructed data as the reconstruction error:
$$\mathrm{RMSE}(x, \hat{x}) = \sqrt{\frac{1}{n}\,\lVert x - \hat{x} \rVert^{2}}$$
where $\mathrm{RMSE}$ denotes the root mean square error, $x$ denotes the context information to be predicted, $\hat{x}$ denotes the reconstructed data, $\lVert \cdot \rVert$ denotes a norm, and $n$ is the number of sampling values in $x$.
For the specific limitations of the one-dimensional time series abnormality detection apparatus, reference may be made to the limitations of the one-dimensional time series abnormality detection method above; they are not repeated here. The modules in the one-dimensional time series abnormality detection apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a one-dimensional time series anomaly detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A one-dimensional time series abnormality detection method is characterized by comprising the following steps:
extracting sample points from the one-dimensional time series; the one-dimensional time series comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point;
for each sample point, extracting sample context information of the sample point through a sliding window; the sample context information is a plurality of sampling values in succession before the sample point;
reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data;
inputting the sample context information and the label value of the sample point into a base detector sequence, and obtaining a sample performance vector according to an output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors;
establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors;
extracting a point to be predicted from the one-dimensional time sequence, and extracting context information to be predicted of the point to be predicted through a sliding window; the context information to be predicted of the point to be predicted is a plurality of continuous sampling values in front of the point to be predicted;
adopting an encoder to perform dimension reduction on the context information to be predicted to obtain low-dimensional embedded data to be predicted;
inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; the neighbor data is the data which is most adjacent to the low-dimensional embedded data to be predicted;
obtaining the probability that the base detector sequence correctly detects the one-dimensional time sequence according to the sample performance vector of the neighbor data;
obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence;
and carrying out anomaly detection on the one-dimensional time sequence according to the base detector with the highest detection performance.
2. The method of claim 1, further comprising:
reconstructing the low-dimensional embedded data to be predicted by adopting a decoder to obtain reconstructed data;
obtaining a reconstruction error according to the context information to be predicted and the reconstruction data;
and if the reconstruction error is larger than the threshold value, receiving a real label corresponding to the context information to be predicted through a human-computer interaction interface, and predicting the low-dimensional embedded data to be predicted of the context information to be predicted by adopting a preset standard base detector.
3. The method of claim 2, further comprising:
and after the real label and the low-dimensional embedded data to be predicted are in one-to-one correspondence, storing the real label and the low-dimensional embedded data to be predicted into the detection set.
4. The method according to claim 2, wherein obtaining a reconstruction error according to the context information to be predicted and the reconstructed data comprises:
taking the root mean square error between the context information to be predicted and the reconstructed data as the reconstruction error:
$$\mathrm{RMSE}(x, \hat{x}) = \sqrt{\frac{1}{n}\,\lVert x - \hat{x} \rVert^{2}}$$
where $\mathrm{RMSE}$ denotes the root mean square error, $x$ denotes the context information to be predicted, $\hat{x}$ denotes the reconstructed data, $\lVert \cdot \rVert$ denotes a norm, and $n$ is the number of sampling values in $x$.
5. The method of claim 1, wherein establishing a detection set according to the one-to-one correspondence between the sample low-dimensional embedded data and the sample performance vector comprises:
and storing the sample low-dimensional embedded data and the sample performance vector by adopting a KD-Tree algorithm according to the one-to-one correspondence of the sample low-dimensional embedded data and the sample performance vector to obtain a detection set.
6. The method of claim 1, wherein obtaining the probability that the base detector sequence correctly detects the one-dimensional time series according to the sample performance vector of the neighbor data comprises:
calculating, from the sample performance vectors of the neighbor data, the probability that each base detector in the base detector sequence correctly detects the one-dimensional time series, using the following conditional probability formula:
$$P(d \mid x) = \frac{1}{K}\sum_{j=1}^{K} p_j^{(d)}$$
where $d$ denotes the index of a base detector in the base detector sequence, $x$ denotes the context information to be predicted, $P(d \mid x)$ denotes the probability that the $d$-th base detector correctly detects the point to be predicted, $p_j^{(d)}$ denotes the value of the $d$-th bit of the performance vector $p_j$, $K$ denotes the number of the neighbor data, and $j$ is the index of the performance vector.
7. The method of claim 6, wherein obtaining the base detector with the highest detection performance according to the probability that the base detector sequence correctly detects the one-dimensional time series comprises:
selecting, from the base detector sequence, the base detector with the highest probability of correctly detecting the one-dimensional time series, using the following formula:
$$d^{*} = \underset{d \in D}{\arg\max}\; P(d \mid x), \qquad D = \{1, 2, \ldots, M\}$$
where $d^{*}$ denotes the base detector with the highest probability of correct detection, $D$ denotes the base detector sequence, $M$ denotes the total number of said base detectors, and $d$ denotes the index of a base detector.
8. The method according to any of claims 2 to 4, wherein the encoder and the decoder are implemented by a variational autoencoder.
9. A one-dimensional time-series abnormality detection apparatus, characterized in that the apparatus comprises:
the sample low-dimensional embedded data acquisition module is used for extracting sample points from the one-dimensional time sequence; the one-dimensional time series comprises sampling values of a plurality of time points; the sample point corresponds to a sampling value and a label value of a time point; for each sample point, extracting sample context information of the sample point through a sliding window; reducing the dimension of the sample context information by adopting an encoder to obtain sample low-dimensional embedded data; the sample context information is a plurality of sampling values in succession before the sample point;
the detection set acquisition module is used for inputting the sample context information and the label value of the sample point into a base detector sequence and obtaining a sample performance vector according to an output result of the base detector sequence; the base detector sequence is composed of a plurality of base detectors; establishing a detection set according to the one-to-one correspondence relationship between the sample low-dimensional embedded data and the sample performance vectors;
the encoder dimension reduction module is used for extracting a point to be predicted from the one-dimensional time sequence and extracting context information to be predicted of the point to be predicted through a sliding window; adopting an encoder to perform dimension reduction on the context information to be predicted to obtain low-dimensional embedded data to be predicted; the context information to be predicted of the point to be predicted is a plurality of continuous sampling values in front of the point to be predicted;
the anomaly detection module is used for inquiring a plurality of neighbor data of the low-dimensional embedded data to be predicted in the detection set; the neighbor data is the data which is most adjacent to the low-dimensional embedded data to be predicted; obtaining the probability that the base detector sequence correctly detects the one-dimensional time sequence according to the sample performance vector of the neighbor data; obtaining a base detector with the highest detection performance according to the probability of correctly detecting the one-dimensional time sequence by the base detector sequence; and carrying out anomaly detection on the one-dimensional time sequence according to the base detector with the highest detection performance.
10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN202110821949.8A 2021-07-21 2021-07-21 One-dimensional time series anomaly detection method and device and computer equipment Active CN113268372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110821949.8A CN113268372B (en) 2021-07-21 2021-07-21 One-dimensional time series anomaly detection method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110821949.8A CN113268372B (en) 2021-07-21 2021-07-21 One-dimensional time series anomaly detection method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN113268372A CN113268372A (en) 2021-08-17
CN113268372B true CN113268372B (en) 2021-09-24

Family

ID=77236896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110821949.8A Active CN113268372B (en) 2021-07-21 2021-07-21 One-dimensional time series anomaly detection method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN113268372B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106527756A (en) * 2016-10-26 2017-03-22 长沙军鸽软件有限公司 Method and device for intelligently correcting input information
CN106997474A (en) * 2016-12-29 2017-08-01 南京邮电大学 A kind of node of graph multi-tag sorting technique based on deep learning
CN107133343A (en) * 2017-05-19 2017-09-05 哈工大大数据产业有限公司 Big data abnormal state detection method and device based on time series approximate match
EP3401789A1 (en) * 2017-05-09 2018-11-14 Skyline Communications NV Anomaly detection in time series
CN109032829A (en) * 2018-07-23 2018-12-18 腾讯科技(深圳)有限公司 Data exception detection method, device, computer equipment and storage medium
CN111177224A (en) * 2019-12-30 2020-05-19 浙江大学 Time sequence unsupervised anomaly detection method based on conditional regularized flow model
CN111931868A (en) * 2020-09-24 2020-11-13 常州微亿智造科技有限公司 Time series data abnormity detection method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373070B2 (en) * 2015-10-14 2019-08-06 International Business Machines Corporation Anomaly detection model selection and validity for time series data
US10628252B2 (en) * 2017-11-17 2020-04-21 Google Llc Real-time anomaly detection and correlation of time-series data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
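"Research on Time Series Anomaly Detection Algorithm and Application"; Zhiyang Zhao; 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC 2019); 20191222; entire document *
"A Survey of Intelligent Video Abnormal Event Detection Methods" (智能视频异常事件检测方法综述); Wang Siqi (王思齐); Computer Engineering & Science (计算机工程与科学); 20191231; entire document *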

Also Published As

Publication number Publication date
CN113268372A (en) 2021-08-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant