CN109902703B - Time series abnormity detection method and device - Google Patents

Publication number
CN109902703B
Authority
CN
China
Prior art keywords
cluster
subsequence
feature vector
time series
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811019950.3A
Other languages
Chinese (zh)
Other versions
CN109902703A (en
Inventor
杨思晓
秦臻
田光见
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201811019950.3A priority Critical patent/CN109902703B/en
Publication of CN109902703A publication Critical patent/CN109902703A/en
Application granted granted Critical
Publication of CN109902703B publication Critical patent/CN109902703B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The embodiments of the present application disclose a time series anomaly detection method and device, relating to the technical field of data analysis, which can improve the accuracy of anomaly detection on time series with a multi-state pattern. The method may include: receiving a set of signals in time series form; dividing the signal into at least two subsequences according to a preset rule; extracting a feature vector for each subsequence to obtain a feature vector sample set, the feature vector sample set consisting of the feature vectors corresponding to the subsequences; applying a clustering algorithm to the feature vector sample set to obtain a clustering result, where the clustering result includes at least one cluster and the membership relation between each feature vector sample and the clusters, and each cluster is a subset of the feature vector sample set; applying a machine learning algorithm to establish an abnormal behavior model in each cluster; and performing anomaly detection in each cluster according to the corresponding abnormal behavior model to obtain anomaly points.

Description

Time series abnormity detection method and device
Technical Field
The present application relates to the field of data analysis technologies, and in particular, to a method and an apparatus for detecting time series anomalies.
Background
Time series anomaly detection has applications in many areas. For example, abnormal traffic in a computer network may mean that a hacked computer is sending sensitive data to an unauthorized target; abnormal points in electrocardiogram data correspond to abnormal conditions in the patient's heart; and abnormal points in network monitoring data indicate that some device in the network is about to fail or has failed.
Accurately detecting anomalies in time series data is therefore of great significance in practical applications. However, existing time series anomaly detection algorithms are usually based on a stationarity assumption and a periodicity assumption about the data. They work well on stationary or periodic time series, but when applied to a time series with a multi-state pattern they suffer from problems such as a high false positive rate and a low detection rate, so their accuracy needs to be improved.
Disclosure of Invention
The embodiment of the application provides a time series abnormality detection method and device, which are used for respectively establishing an abnormality model for different states of a multi-state mode time series, and can improve the accuracy of the multi-state mode time series abnormality detection.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect, the present application provides a method for time series anomaly detection. The method may include: receiving a set of signals in time series form; dividing the signal into at least two subsequences according to a preset rule; extracting a feature vector for each subsequence to obtain a feature vector sample set, the feature vector sample set consisting of the feature vectors corresponding to the subsequences; applying a clustering algorithm to the feature vector sample set to obtain a clustering result, where the clustering result includes at least one cluster and the membership relation between each feature vector sample and the clusters, and each cluster is a subset of the feature vector sample set; applying a machine learning algorithm to establish an abnormal behavior model in each cluster; and performing anomaly detection in each cluster according to the corresponding abnormal behavior model to obtain anomaly points. In this method, after the signal is received, the feature vectors of the subsequences are clustered to obtain at least one cluster, so that each cluster corresponds to one state of the time series. A separate abnormal behavior model can therefore be established for each state, which can improve the accuracy of anomaly detection on multi-state time series.
In one possible design, the feature vector includes: basic statistics and frequency characteristics of the subsequences; wherein the basic statistics comprise a mean value, a standard deviation, a maximum value and a minimum value; the frequency characteristic is the amplitude of the first W components of the resultant vector of the fast fourier transform of the subsequence, W being a positive integer.
In one possible design, the preset rules include: a data-based partitioning approach or a rule-based partitioning approach.
In one possible design, the data-based partitioning may include: obtaining a change point of a signal by adopting a change point analysis algorithm; the signal is divided into at least two subsequences according to the position of the change point. Generally, the behavior pattern of the time series is obviously changed before and after the change point, so different states of the time series can be effectively divided by adopting the method.
In one possible design, the rule-based partitioning approach includes: the signal is divided into at least two subsequences according to a preset period.
In one possible design, applying a machine learning algorithm to establish an abnormal behavior model in each cluster includes: merging the subsequences whose feature vectors belong to the same cluster into a subsequence set, where each subsequence set corresponds to one cluster; calculating the mean and the standard deviation of all sample values in each subsequence set; calculating the upper normal value bound and the lower normal value bound of each subsequence set according to the mean and the standard deviation of that subsequence set; and obtaining the abnormal behavior model of each subsequence set according to its upper normal value bound and lower normal value bound.
In a second aspect, the present application also provides a time-series abnormality detection apparatus, which may include: the device comprises a receiving module, a dividing module, a feature vector extracting module, a clustering module, a model establishing module and a detecting module. The receiving module is used for receiving a group of signals in a time series form; the dividing module is used for dividing the signal into at least two subsequences according to a preset rule; the characteristic vector extraction module is used for respectively extracting characteristic vectors for each subsequence to obtain a characteristic vector sample set, and the characteristic vector sample set is formed by the characteristic vectors corresponding to each subsequence; the clustering module is used for obtaining clustering results in the feature vector sample set by adopting a clustering algorithm, wherein the clustering results comprise at least one cluster and the membership of each feature vector sample and the cluster, and each cluster is a subset of the feature vector sample set; the model establishing module is used for applying a machine learning algorithm to establish an abnormal behavior model in each cluster; the detection module is used for carrying out anomaly detection in each cluster according to the corresponding anomaly behavior model to obtain anomaly points.
In one possible design, the feature vector includes: basic statistics and frequency characteristics of the subsequences; wherein the basic statistics comprise a mean value, a standard deviation, a maximum value and a minimum value; the frequency characteristic is the amplitude of the first W components of the resultant vector of the fast fourier transform of the subsequence, W being a positive integer.
In one possible design, the preset rules include: a data-based partitioning approach or a rule-based partitioning approach.
In one possible design, the dividing module is specifically configured to, in a case where the preset rule is a data-based dividing manner, obtain a change point of the signal by using a change point analysis algorithm; the signal is divided into at least two subsequences according to the position of the change point.
In a possible design, in a case that the preset rule is a rule-based division manner, the dividing module is specifically configured to divide the signal into at least two subsequences according to a preset period.
In one possible design, the model building module is specifically configured to: respectively merging the subsequences of the same cluster corresponding to the feature vectors into a subsequence set, wherein each subsequence set corresponds to one cluster; respectively calculating the mean value and the standard deviation of all sample values in each subsequence set; respectively calculating the upper bound and the lower bound of the normal value of the subsequence set according to the mean value and the standard deviation corresponding to each subsequence set; and obtaining the abnormal behavior model of each subsequence set according to the upper normal value boundary and the lower normal value boundary.
In a third aspect, the present application further provides a time-series anomaly detection device, including: a processor, a memory, and a communication interface. The communication interface is used for supporting communication between the device and other devices, and the communication interface can be a transceiver or a transceiver circuit, and the communication interface is used for receiving a group of signals in a time sequence form. A memory is coupled to the processor, the memory including a non-volatile storage medium for storing computer program code, the computer program code including computer instructions. When the processor executes the computer instructions, the processor is used for dividing the signal into at least two subsequences according to a preset rule; respectively extracting a feature vector for each subsequence to obtain a feature vector sample set, wherein the feature vector sample set is formed by feature vectors corresponding to each subsequence; obtaining a clustering result in the feature vector sample set by adopting a clustering algorithm, wherein the clustering result comprises at least one cluster and a membership relation between each feature vector sample and the cluster, and each cluster is a subset of the feature vector sample set; applying a machine learning algorithm to establish an abnormal behavior model in each cluster; and carrying out anomaly detection in each cluster according to the corresponding anomaly behavior model to obtain anomaly points.
For the explanation of the feature vector and the preset rule, reference may be made to the detailed description in the first aspect, which is not described herein again.
In one possible design, the processor is specifically configured to merge subsequences of the feature vector corresponding to the same cluster into a subsequence set, where each subsequence set corresponds to a cluster; respectively calculating the mean value and the standard deviation of all sample values in each subsequence set; respectively calculating the upper bound and the lower bound of the normal value of the subsequence set according to the mean value and the standard deviation corresponding to each subsequence set; and obtaining the abnormal behavior model of each subsequence set according to the upper normal value boundary and the lower normal value boundary.
The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the method of any of the above aspects.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above aspects.
The present application further provides a chip system, which includes a processor, a memory, and a transceiver circuit, and is configured to implement the method according to any one of the above aspects.
Any one of the above-mentioned apparatuses, computer storage media, computer program products, or chip systems is configured to execute the above-mentioned corresponding methods, so that the beneficial effects achieved by the apparatuses, the computer storage media, the computer program products, or the chip systems can refer to the beneficial effects of the corresponding schemes in the above-mentioned corresponding methods, and are not described herein again.
Drawings
FIG. 1-1 is a first schematic diagram of a network architecture to which the embodiments of the present application are applicable;
FIG. 1-2 is a second schematic diagram of a network architecture to which the embodiments of the present application are applicable;
FIG. 2 is a first schematic diagram of a time series;
FIG. 3 is a second schematic diagram of a time series;
FIG. 4 is a schematic diagram of a time series anomaly detection method according to an embodiment of the present application;
FIG. 5-1 is a first schematic diagram of a time series applicable to the embodiments of the present application;
FIG. 5-2 is a second schematic diagram of a time series applicable to the embodiments of the present application;
FIG. 5-3 is a third schematic diagram of a time series applicable to the embodiments of the present application;
FIG. 6 is a schematic structural diagram of a time series anomaly detection apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a time series anomaly detection apparatus according to an embodiment of the present application.
Detailed Description
First, some terms referred to in the present application are explained:
1. Time series
A time series consists of changing timestamps and the values at the corresponding timestamps. When the timestamp information is unknown, the relative position (e.g., index or subscript) within the time series may be used as the timestamp.
2. Expected value
The value of the time series after noise is removed.
3. Anomaly
Anomalies are distinctive data in a data set that are not random deviations but result from a mechanism completely different from the one that produces normal values. If the value at a certain timestamp is anomalous, the timestamp together with the value is an anomaly point of the time series.
4. The term "plurality" herein means two or more. The terms "first" and "second" herein are used to distinguish between different objects, and are not used to describe a particular order of objects. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The time series abnormality detection method and device provided by the embodiment of the present application are described in detail below with reference to the accompanying drawings.
The technical solution provided by the present application can be applied to anomaly detection on time series with a multi-state pattern. The time series may be a multi-state time series from any domain. For example, it may be the comparable growth rate of Gross Domestic Product (GDP) over the economic cycle; it may be an indicator such as electrocardiogram data or heart rate when the human body is in different states (healthy, sub-healthy, or diseased); or it may be a device performance indicator, such as network traffic or network monitoring data.
The technical solution provided by the present application can be applied to the system architecture shown in fig. 1-1 or fig. 1-2, which includes the apparatus 100 and the time series abnormality detection apparatus 200. The apparatus 100 can be used to generate signals in time series, for example, the apparatus 100 can be any device that can generate signals in time series, such as communication equipment, monitoring equipment, network equipment, domestic appliances, industrial instruments, and the like. The time-series abnormality detection device 200 is used to detect an abnormality in a time-series signal generated by the apparatus 100, and the time-series abnormality detection device 200 may be any device having a detection function. For example, the time-series abnormality detection apparatus 200 may be a device independent from the apparatus 100 as shown in fig. 1-1, or may be a unit that is provided in the apparatus 100 as shown in fig. 1-2 and that implements a time-series abnormality detection function. The time series abnormality detection apparatus 200 may be connected to the apparatus 100 by a wired connection, and acquire a signal in a time series form; the signals in the form of time series can also be acquired by wireless means, such as mobile communication networks, short-range communication, etc.
A time series anomaly detection algorithm learns the behavior pattern of the time series to calculate its expected values, and considers a time point anomalous when the actual value at that time point differs greatly from the expected value. Different pattern extraction algorithms are usually adopted for different behavior patterns. For example, a denoising algorithm is applied to a stationary time series to obtain its expected values, and STL (Seasonal and Trend decomposition using Loess) is applied to a time series with periodic and trend behavior to obtain its periodic component, trend component, and noise.
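The expected-value approach for a stationary time series can be sketched as follows. This is only an illustration under stated assumptions: the source does not name a denoising algorithm, so a centered rolling median is used here, and the function names `expected_values` and `flag_anomalies`, the `window` size, and the `threshold` are all hypothetical.

```python
import numpy as np

def expected_values(x, window=5):
    """Estimate expected values with a centered rolling median (an assumed denoiser)."""
    x = np.asarray(x, dtype=float)
    half = window // 2  # window is assumed to be odd
    # Repeat the edge values so that every point has a full window.
    padded = np.pad(x, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(x))])

def flag_anomalies(x, window=5, threshold=3.0):
    """Return the indices whose actual value deviates from the expected value by more than threshold."""
    x = np.asarray(x, dtype=float)
    residual = np.abs(x - expected_values(x, window))
    return np.flatnonzero(residual > threshold).tolist()

series = [10.0] * 10
series[5] = 50.0  # a single injected spike
print(flag_anomalies(series))  # [5]
```

A fixed threshold like this only makes sense when the series has a single state, which is the limitation discussed next.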
However, in reality, time series usually show a multi-state pattern in addition to a periodic pattern and a trend pattern, for example: the website traffic is significantly different in busy and idle times of the day; indexes such as electrocardiogram data, heartbeat frequency and the like can be obviously changed when the human body is in different states; the economic indicators show different behaviors under different economic periods, and the like. The state duration of a time sequence with a multi-state pattern is typically not periodic and the state transitions are random.
Illustratively, FIG. 2 is a schematic diagram of the division of each economic cycle stage from 2008 to 2012. As shown in FIG. 2, the abscissa represents time (e.g., 2005Q1 represents the first quarter of 2005 and 2006Q4 represents the fourth quarter of 2006), and the ordinate represents the comparable GDP growth rate. It can be seen that the comparable GDP growth rate has four states (recovery, expansion, stagnation, and contraction) whose durations are not periodic.
Illustratively, FIG. 3 is a graph of performance indicator data for a network device. As shown in FIG. 3, the abscissa represents time in units of 15 minutes, and the ordinate represents performance indicator data. It can be seen that the performance indicator data is neither stationary nor periodic.
Methods that detect anomalies based on the stationarity and periodicity assumptions about the data work well on stationary or periodic time series, but the following problems generally arise when they are applied to a time series with a multi-state pattern:
1. The false positive rate is high. A method based on the stationarity and periodicity assumptions may detect normal behavior in some states as anomalous. For example, if anomaly detection based on these assumptions is performed on the time series shown in FIG. 3, the busy-time data is considered to match the expected values, while the idle-time data may be detected as anomalous, resulting in a high false positive rate.
2. The detection rate is low. Models based on stationary or periodic assumptions are generally weak in generalization ability, and when processing time series of multi-state patterns, the detection rate of outliers is low due to under-fitting.
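The low detection rate can be illustrated with a small synthetic example. The data below is invented for illustration (two states around 5 and 15, followed by a short tail containing the suspicious value 0.8), and the mean ± 3 standard deviations rule and the helper name `three_sigma_flags` are assumptions rather than part of the source. A single model fitted to all samples has bounds wide enough to cover both states, so the value 0.8 is not flagged, whereas a model fitted only to the samples of the low state flags it.

```python
import numpy as np

def three_sigma_flags(values):
    """Boolean mask of samples outside mean +/- 3 standard deviations (population std)."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    return (values > mu + 3 * sigma) | (values < mu - 3 * sigma)

state_1 = [4.0, 6.0] * 25    # 50 samples around 5
state_2 = [14.0, 16.0] * 25  # 50 samples around 15
tail = [5.0, 0.8, 5.0]       # back in the first state, with one suspicious value

# One model for the whole series: the bounds span both states.
global_flags = three_sigma_flags(state_1 + state_2 + tail)
# One model for the samples of the first state only (membership assumed known here).
state_1_flags = three_sigma_flags(state_1 + tail)

print(int(global_flags.sum()))                  # 0
print(np.flatnonzero(state_1_flags).tolist())   # [51]
```

Here the state membership is given by hand; the method described below obtains it by dividing the signal and clustering the subsequences.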
The embodiment of the application provides a time series abnormality detection method, which is used for respectively establishing an abnormality model for different states of a multi-state mode time series, and can improve the accuracy of the multi-state mode time series abnormality detection. The method may be performed by the time-series anomaly detection apparatus 200 of fig. 1-1 or fig. 1-2, and as shown in fig. 4, the method may include S101-S106:
S101. Receive a signal in time series form.
Signals in time series form are received from one or more devices as a set of signals in time series form. The device may be any device capable of outputting a signal in the form of a time series, such as a network device, a monitoring device, a testing device, etc.
Illustratively, a signal in time series form as shown in FIG. 5-1 is received from a network device. The abscissa represents time, and the time series has length 105; the ordinate represents the sample values. The 1st to 50th points are generated from the normal distribution N(5, 1), the 51st to 100th points are generated from the normal distribution N(15, 1), and the 101st to 105th points are (4.8, 1.2, 5.1, 0.8, 5.2). The sample values of the time series are as follows:
3.644146,5.364861,4.995019,5.119684,5.267784,4.042125,5.064017,4.955253,4.160305,4.418243,4.021981,5.310384,6.874973,5.119141,3.262541,3.870503,6.386098,4.925261,5.898857,3.178964,6.027084,3.734730,5.486550,4.414970,3.193849,4.442141,3.984647,5.575834,4.447070,5.233719,4.771871,3.918034,5.265214,5.437755,5.231816,5.016403,4.438038,5.767812,3.705144,3.570856,4.329993,4.436391,4.145236,2.441678,4.605813,5.591893,5.196055,5.366105,4.774712,6.478371,15.424729,14.064224,15.676915,14.931677,14.666771,15.708086,14.332691,16.520617,15.537886,15.679568,16.359227,15.978498,14.576491,15.478133,15.197947,13.433239,14.513958,14.647816,16.245138,13.589308,15.396134,13.482729,14.074725,14.783087,14.747101,16.570960,15.600240,15.500117,16.632776,17.752100,14.314861,15.402561,15.709759,14.706143,13.938962,14.128660,15.776622,15.835101,14.559545,14.501439,16.240766,13.744677,14.028888,13.776851,15.047210,14.079417,14.392171,14.970326,15.509391,14.272458,4.8,1.2,5.1,0.8,5.2。
S102. Divide the signal into subsequences according to a preset rule.
The received signal is divided into at least two subsequences according to a preset rule. The received signal may be divided into a plurality of subsequences according to any preset rule. For example, the preset rule may be a data-based division manner or a rule-based division manner.
In one implementation, a data-based division manner may include:
First, a change point analysis algorithm is used to obtain the change points of the signal.
A change point is also called a mutation point or breakpoint; the behavior pattern of the time series usually changes significantly before and after a change point. Obtaining the change points of the signal with a change point analysis algorithm therefore separates the different states of the time series. For example, the change point analysis algorithm may be Binary Segmentation, PELT (Pruned Exact Linear Time), or the like.
Illustratively, the time series shown in FIG. 5-1 is analyzed by the Binary Segmentation method to obtain two change points: the 50th point and the 100th point.
Second, the signal is divided into a plurality of subsequences according to the positions of the change points.
The signal is divided into a plurality of subsequences with each change point as a boundary.
Illustratively, the time series shown in FIG. 5-1 is divided into 3 subsequences according to the positions of the change points: [1,50], [51,100], and [101,105]. As shown in FIG. 5-2, this yields subsequence 1, subsequence 2, and subsequence 3.
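The data-based division can be sketched as follows. This is a simplified illustration, not the implementation used in the embodiment: it assumes that states differ in their mean level, uses the reduction in the sum of squared errors as the splitting criterion, and stops when the reduction falls below a hypothetical `min_gain` parameter. The function names and the noise-free test signal are also assumptions.

```python
import numpy as np

def _sse(segment):
    """Sum of squared deviations from the segment mean."""
    return float(np.sum((segment - segment.mean()) ** 2)) if len(segment) else 0.0

def binary_segmentation(signal, min_gain, min_size=2):
    """Recursively split the signal where the split reduces the squared error the most."""
    x = np.asarray(signal, dtype=float)
    change_points = []

    def recurse(lo, hi):
        if hi - lo < 2 * min_size:
            return
        base = _sse(x[lo:hi])
        best_gain, best_k = 0.0, None
        for k in range(lo + min_size, hi - min_size + 1):
            gain = base - _sse(x[lo:k]) - _sse(x[k:hi])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is not None and best_gain > min_gain:
            change_points.append(best_k)
            recurse(lo, best_k)
            recurse(best_k, hi)

    recurse(0, len(x))
    return sorted(change_points)

def split_at_change_points(signal, change_points):
    """Divide the signal into subsequences, using each change point as a boundary."""
    return np.split(np.asarray(signal, dtype=float), change_points)

# Noise-free stand-in for the signal of FIG. 5-1: 50 low, 50 high, 5 low samples.
signal = [5.0] * 50 + [15.0] * 50 + [5.0] * 5
change_points = binary_segmentation(signal, min_gain=1.0)
print(change_points)  # [50, 100]
print([len(s) for s in split_at_change_points(signal, change_points)])  # [50, 50, 5]
```

The returned values are 0-based split indices, so 50 and 100 correspond to boundaries after the 50th and the 100th point, giving the subsequences [1,50], [51,100], and [101,105] in the 1-based numbering of the text.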
In another implementation, a rule-based division manner may include:
First, the division period T of the signal is determined. For example, if the division period is 24 hours, the length of each subsequence is fixed at 24 hours.
Second, the division starting point s of the signal is determined, for example, by taking the current point or randomly selecting a potential abnormal point as the division starting point. A potential abnormal point is a time point at which the user needs to perform anomaly detection; for example, all time points of the time series can be taken as potential abnormal points.
Third, the signal is divided into a plurality of subsequences according to s and T: (s-T, s], (s-2T, s-T], (s-3T, s-2T], …
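The rule-based division can be sketched as follows, assuming that the signal is evenly sampled so that s and T can be expressed as sample counts, with s given as a 0-based index. The function name `split_by_period` is hypothetical, and discarding an incomplete window at the beginning of the signal is an assumption that the source does not address.

```python
def split_by_period(signal, s, T):
    """Divide the signal into windows (s-T, s], (s-2T, s-T], ... of T samples each.

    s is the 0-based index of the division starting point (the inclusive right end
    of the first window). A leading window with fewer than T samples is dropped.
    """
    values = list(signal)
    windows = []
    hi = s
    # The left-open bound hi - T must be at least -1 for the window to hold T samples.
    while hi - T >= -1:
        windows.append(values[hi - T + 1: hi + 1])
        hi -= T
    return windows

print(split_by_period(range(10), s=9, T=3))  # [[7, 8, 9], [4, 5, 6], [1, 2, 3]]
```

The windows are returned starting from the one that ends at s, in the same order as the intervals listed above.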
S103. Extract a feature vector for each subsequence to obtain a feature vector sample set.
And respectively extracting a feature vector for each subsequence, and combining the feature vectors corresponding to all the subsequences to form a feature vector sample set. A feature vector is a set of data that can embody common features of each sub-sequence.
In one implementation, the feature vector may include: basic statistics and frequency characteristics of the subsequences; wherein, the basic statistics may include a mean, a standard deviation, a maximum value and a minimum value; the frequency characteristic is the amplitude of the first W components of the result vector of the fast Fourier transform of the subsequence, W is a positive integer and can be determined according to actual needs.
The fast Fourier transform is computed as follows:
$$y(j) = \sum_{k=1}^{n} x(k)\, e^{-2\pi i (j-1)(k-1)/n}, \qquad j = 1, 2, \ldots, n$$
where n is the subsequence length, x(k) is the k-th sample value, and i is the imaginary unit, i.e. $i = \sqrt{-1}$.
The first W components are y(1), y(2), …, y(W).
In summary, the mean, standard deviation, maximum, and minimum of the subsequence, together with the magnitudes of y(1), y(2), …, y(W), constitute the feature vector of the subsequence. The feature vectors of all subsequences constitute the feature vector sample set.
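A minimal sketch of this feature extraction is given below, assuming NumPy is available. The function name `extract_features` is hypothetical, the population standard deviation (NumPy's default) is an assumption because the source does not say which estimator is used, and W is assumed not to exceed the subsequence length.

```python
import numpy as np

def extract_features(subsequence, W):
    """Return [mean, std, max, min, |y(1)|, ..., |y(W)|] for one subsequence."""
    x = np.asarray(subsequence, dtype=float)
    if W > len(x):
        raise ValueError("W must not exceed the subsequence length")
    basic = [x.mean(), x.std(), x.max(), x.min()]
    # np.fft.fft returns the full transform; element 0 corresponds to y(1) in the text.
    magnitudes = np.abs(np.fft.fft(x))[:W]
    return np.concatenate([basic, magnitudes])

features = extract_features([1.0, 2.0, 3.0, 4.0], W=2)
print(features.shape)  # (6,)
```

Each feature vector has 4 + W entries, so all subsequences yield vectors of the same length regardless of how long the subsequences are.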
Further, in one implementation, the feature vector sample set may be normalized so that the value of each feature lies within [0, 1].
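The source does not specify the normalization method. One common choice is min-max scaling applied to each feature separately, sketched below; the function name `min_max_normalize` and the handling of constant features (mapped to 0) are assumptions.

```python
import numpy as np

def min_max_normalize(samples):
    """Scale each feature (column) of the sample set to the range [0, 1]."""
    F = np.asarray(samples, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    # Avoid division by zero for features that take a single value.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (F - lo) / span

print(min_max_normalize([[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]]).tolist())
# [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

Scaling the features to a common range keeps large-magnitude features, such as the first Fourier component, from dominating the distance computation in the clustering step.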
Illustratively, the feature vectors (W = 5) of subsequence 1, subsequence 2, and subsequence 3 in FIG. 5-2 are calculated, and the resulting feature vector sample set is shown in Table 1.
TABLE 1
[Table 1: feature vector sample set of subsequence 1, subsequence 2, and subsequence 3; rendered as an image in the source.]
S104. Apply a clustering algorithm to the feature vector sample set to obtain a clustering result.
A clustering algorithm is a machine learning algorithm. For a sample set X = {x1, x2, …, xm} containing m samples, the clustering algorithm produces q clusters C1, C2, …, Cq such that every sample belongs to one of the clusters and the samples in each cluster have similar features. The value of q can be preset or determined heuristically.
Any clustering algorithm may be applied to the feature vector sample set to obtain a clustering result, where the clustering result includes at least one cluster and the membership relation between each feature vector sample and the clusters, each cluster is a subset of the feature vector sample set, and each cluster corresponds to one state of the time series. For example, the clustering algorithm may be K-means, hierarchical clustering, or the like.
Illustratively, the feature vector sample set shown in Table 1 is clustered using K-means with the number of clusters q = 2, and the clustering result is shown in Table 2.
TABLE 2
[Table 2: clustering result for the feature vector sample set of Table 1; rendered as an image in the source.]
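The clustering step can be sketched with a small K-means (Lloyd's algorithm) written in NumPy. This is an illustrative sketch rather than the implementation used in the embodiment: the function name `kmeans`, the deterministic farthest-point initialization, and the three two-dimensional sample vectors are assumptions chosen so that the example is reproducible; they are not the values of Table 1.

```python
import numpy as np

def kmeans(samples, q, n_iter=100):
    """Cluster the samples into q clusters; return (labels, centers)."""
    X = np.asarray(samples, dtype=float)
    # Deterministic farthest-point initialization (an assumption, for reproducibility).
    centers = [X[0]]
    while len(centers) < q:
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move every center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(q)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical normalized feature vectors of three subsequences.
feature_vectors = [[0.0, 0.1], [1.0, 0.9], [0.05, 0.0]]
labels, _ = kmeans(feature_vectors, q=2)
print(labels.tolist())  # [0, 1, 0]
```

In this toy input the first and third vectors share a cluster and the second is alone, which is the same membership pattern as the one described for feature vectors 1, 2, and 3 in the example of S105.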
S105. Apply a machine learning algorithm to establish an abnormal behavior model in each cluster.
And respectively applying any machine learning algorithm to each cluster to establish an abnormal behavior model, wherein the abnormal behavior model is used for distinguishing abnormal points in the time sequence. The embodiment of the application does not limit the machine learning algorithm adopted for establishing the abnormal behavior model.
In one implementation, the abnormal behavior model may be built as follows.
First, the subsequences whose feature vectors belong to the same cluster are merged into a subsequence set, where each subsequence set corresponds to one cluster.
Illustratively, cluster 1 in table 2 includes feature vector 1 and feature vector 3, and cluster 2 includes feature vector 2; the subsequences corresponding to cluster 1 are subsequence 1 and subsequence 3, and the subsequence corresponding to cluster 2 is subsequence 2.
Subsequence set 1, obtained by merging the elements of subsequence 1 and subsequence 3 (that is, the set of sample values corresponding to cluster 1), contains 55 values:
3.644146,5.364861,4.995019,5.119684,5.267784,4.042125,5.064017,4.955253,4.160305,4.418243,4.021981,5.310384,6.874973,5.119141,3.262541,3.870503,6.386098,4.925261,5.898857,3.178964,6.027084,3.734730,5.486550,4.414970,3.193849,4.442141,3.984647,5.575834,4.447070,5.233719,4.771871,3.918034,5.265214,5.437755,5.231816,5.016403,4.438038,5.767812,3.705144,3.570856,4.329993,4.436391,4.145236,2.441678,4.605813,5.591893,5.196055,5.366105,4.774712,6.478371,4.8,1.2,5.1,0.8,5.2。
Subsequence set 2, formed by the elements of subsequence 2 (that is, the set of sample values corresponding to cluster 2), contains 50 values:
15.424729,14.064224,15.676915,14.931677,14.666771,15.708086,14.332691,16.520617,15.537886,15.679568,16.359227,15.978498,14.576491,15.478133,15.197947,13.433239,14.513958,14.647816,16.245138,13.589308,15.396134,13.482729,14.074725,14.783087,14.747101,16.570960,15.600240,15.500117,16.632776,17.752100,14.314861,15.402561,15.709759,14.706143,13.938962,14.128660,15.776622,15.835101,14.559545,14.501439,16.240766,13.744677,14.028888,13.776851,15.047210,14.079417,14.392171,14.970326,15.509391,14.272458。
Second, the mean and standard deviation of all sample values in each subsequence set are calculated.
Illustratively, the mean and standard deviation of the above-described subsequence set 1 and subsequence set 2 are calculated, respectively. For example, the mean of subsequence set 1 is 4.62 and the standard deviation is 1.13.
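As a check on the figures quoted for subsequence set 1, the calculation can be sketched with Python's standard `statistics` module. The text does not say whether the sample or the population standard deviation is meant; the sketch assumes the sample form (n − 1 denominator), and the variable names are illustrative.

```python
import statistics

# Sample values of subsequence set 1, copied from the listing above.
subsequence_set_1 = [
    3.644146, 5.364861, 4.995019, 5.119684, 5.267784, 4.042125,
    5.064017, 4.955253, 4.160305, 4.418243, 4.021981, 5.310384,
    6.874973, 5.119141, 3.262541, 3.870503, 6.386098, 4.925261,
    5.898857, 3.178964, 6.027084, 3.734730, 5.486550, 4.414970,
    3.193849, 4.442141, 3.984647, 5.575834, 4.447070, 5.233719,
    4.771871, 3.918034, 5.265214, 5.437755, 5.231816, 5.016403,
    4.438038, 5.767812, 3.705144, 3.570856, 4.329993, 4.436391,
    4.145236, 2.441678, 4.605813, 5.591893, 5.196055, 5.366105,
    4.774712, 6.478371, 4.8, 1.2, 5.1, 0.8, 5.2,
]

mean_1 = statistics.mean(subsequence_set_1)
# Assumption: sample standard deviation (n - 1 denominator).
std_1 = statistics.stdev(subsequence_set_1)

print(f"mean = {mean_1:.2f}, standard deviation = {std_1:.2f}")
```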
Third, the upper and lower normal value bounds of each subsequence set are calculated from the mean and standard deviation corresponding to that subsequence set.
In practical applications, the upper and lower normal value bounds of the subsequence set can be determined as needed, for example, according to the accuracy requirement. The embodiments of the present application are not limited to the specific method for determining the upper and lower normal value bounds.
In one implementation, an upper normal value bound and a lower normal value bound for a set of subsequences can be calculated from the mean and standard deviation of the set of subsequences.
Illustratively, the upper normal value bound is the mean + a × standard deviation, and the lower normal value bound is the mean − b × standard deviation, where a and b are coefficients.
For example, if the upper normal value bound is the mean + 3 × standard deviation and the lower normal value bound is the mean − 3 × standard deviation, then the upper normal value bound of subsequence set 1 is 4.62 + 3 × 1.13 ≈ 8, and the lower normal value bound is 4.62 − 3 × 1.13 ≈ 1.233 (the third decimal place comes from the unrounded mean and standard deviation).
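The bound calculation is simple enough to express as a small helper. The function name `normal_value_bounds` and its default coefficients are assumptions made for illustration; the embodiment leaves the choice of a and b open.

```python
def normal_value_bounds(mean, std, a=3.0, b=3.0):
    """Return (lower, upper) = (mean - b * std, mean + a * std)."""
    if std < 0 or a < 0 or b < 0:
        raise ValueError("std, a and b must be non-negative")
    return mean - b * std, mean + a * std


# Rounded mean and standard deviation of subsequence set 1 from the text.
lower_1, upper_1 = normal_value_bounds(4.62, 1.13)
print(f"lower = {lower_1:.2f}, upper = {upper_1:.2f}")
```

Keeping a and b as separate parameters allows asymmetric bounds, for example a tighter lower bound when low readings are the main concern.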
Fourth, the abnormal behavior model of each subsequence set is obtained from the upper normal value bound and the lower normal value bound.
The model for each subsequence set is defined by its bounds: when a sample value is greater than the upper normal value bound or less than the lower normal value bound, the sample value is determined to conform to the abnormal behavior model, and the sample value together with its timestamp is taken as an abnormal point.
Illustratively, for subsequence set 1, the abnormal behavior model is that any time point at which the sample value is greater than 8 or less than 1.233 is an abnormal point.
Of course, the same method may be adopted to obtain the abnormal behavior model of the subsequence set 2, and details are not described here.
S106. Perform anomaly detection in each cluster according to the corresponding abnormal behavior model to obtain abnormal points.
For the subsequence set corresponding to each cluster, each sample value in the set is checked against the corresponding abnormal behavior model to obtain the abnormal points.
Illustratively, for cluster 1, each sample value in subsequence set 1 is checked against the abnormal behavior model, and any time point at which the sample value is greater than 8 or less than 1.233 is determined to be an abnormal point. Subsequence set 1 contains two abnormal points, namely the 102nd and 104th points. Anomaly detection is performed on cluster 2 in the same way, according to the abnormal behavior model corresponding to cluster 2, and no abnormal point is found.
As shown in fig. 5-3, the abnormal points obtained for the time-series signal shown in fig. 5-1 are the 102nd and 104th points.
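The detection step for cluster 1 can be sketched as a threshold check over (timestamp, value) pairs. The function name `detect_abnormal_points` is illustrative. The sketch also assumes that the last five values of subsequence set 1 are points 101 to 105 of the signal, which is consistent with the abnormal points reported in the example but is not stated in this passage.

```python
def detect_abnormal_points(samples, lower, upper):
    """Return the (timestamp, value) pairs that fall outside [lower, upper]."""
    return [(t, v) for t, v in samples if v > upper or v < lower]


# Bounds of subsequence set 1 as stated in the example.
LOWER_1, UPPER_1 = 1.233, 8.0

# Assumption: these five values are points 101 to 105 of the signal.
tail_of_set_1 = list(zip(range(101, 106), [4.8, 1.2, 5.1, 0.8, 5.2]))

abnormal_points = detect_abnormal_points(tail_of_set_1, LOWER_1, UPPER_1)
print(abnormal_points)
```

Carrying the timestamp alongside each value matters because the merged subsequence set no longer preserves the original positions in the signal.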
The time series anomaly detection method provided by the embodiments of the present application distinguishes the different states of a time series and establishes a separate abnormal behavior model for each state. Compared with prior-art methods that apply a single uniform abnormal behavior model to the whole time series, it can effectively avoid the problems of a high false positive rate and a low detection rate, and improves the accuracy of time series anomaly detection.
The above describes the solution provided by the embodiments of the present application mainly from the perspective of the time series abnormality detection apparatus. It is to be understood that, to realize the above functions, the time series abnormality detection apparatus includes a hardware structure and/or a software module corresponding to each function. Those skilled in the art will readily appreciate that the illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
In the embodiment of the present application, the time-series abnormality detection apparatus may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation. The following description will be given taking the example of dividing each functional module corresponding to each function.
Fig. 6 is a schematic logical structure diagram of an apparatus 600 according to an embodiment of the present application. The apparatus 600 may be a hardware structure, a software module, or a hardware structure plus a software module, and the apparatus 600 may be implemented by a chip system. As shown in fig. 6, the apparatus 600 includes a receiving module 601, a partitioning module 602, a feature vector extraction module 603, a clustering module 604, a model building module 605, and a detection module 606. The receiving module 601 may be configured to perform S101 in fig. 4 and/or perform other steps described in this application. The partitioning module 602 may be used to perform S102 in fig. 4, and/or perform other steps described herein. The feature vector extraction module 603 may be configured to perform S103 in fig. 4, and/or perform other steps described herein. Clustering module 604 may be used to perform S104 in fig. 4, and/or perform other steps described herein. The model building module 605 may be used to perform S105 in fig. 4, and/or perform other steps described herein. The detection module 606 may be used to perform S106 in fig. 4, and/or perform other steps described herein.
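One way to picture the module division in software is a container that chains six callables in the order S101 to S106. This is a hypothetical sketch only: the class name, the field names, and in particular the arguments passed between stages are assumptions, since the description does not fix the interfaces between modules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AnomalyDetectionApparatus:
    """Chains six callables in the order of modules 601 to 606."""

    receive: Callable[[Any], Any]            # receiving module 601 (S101)
    partition: Callable[[Any], Any]          # partitioning module 602 (S102)
    extract_features: Callable[[Any], Any]   # feature vector extraction module 603 (S103)
    cluster: Callable[[Any], Any]            # clustering module 604 (S104)
    build_models: Callable[[Any, Any], Any]  # model building module 605 (S105)
    detect: Callable[[Any, Any, Any], Any]   # detection module 606 (S106)

    def run(self, source):
        """Run the six stages in order and return the detection result."""
        signal = self.receive(source)
        subsequences = self.partition(signal)
        features = self.extract_features(subsequences)
        labels = self.cluster(features)
        models = self.build_models(subsequences, labels)
        return self.detect(subsequences, labels, models)
```

Because each stage is an injected callable, two or more functions can be supplied by one object, which matches the remark that several functions may be integrated into one processing module.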
All relevant content of the steps in the above method embodiment may be incorporated by reference into the functional description of the corresponding functional module, and is not repeated here.
In this embodiment, the apparatus 600 may be presented with its functional modules divided in an integrated manner. A "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other components that can provide the described functionality.
In a simple embodiment, the apparatus 600 may take the form shown in FIG. 7.
As shown in fig. 7, the apparatus 700 may include: memory 701, processor 702, and communication interface 703. The memory 701 is used for storing instructions, and when the apparatus 700 runs, the processor 702 executes the instructions stored in the memory 701, so that the apparatus 700 executes the time series anomaly detection method provided by the embodiment of the present application. The memory 701, processor 702, and communication interface 703 are communicatively coupled via a bus 704. For a specific time series abnormality detection method, reference may be made to the above description and the related description in the drawings, and details are not repeated here. It should be noted that, in a specific implementation, the apparatus 700 may also include other hardware devices, which are not listed here. In one possible implementation, the memory 701 may also be included in the processor 702.
In one example of the present application, the receiving module 601 in fig. 6 may be implemented by a communication interface 703. The partitioning module 602, the feature vector extraction module 603, the clustering module 604, the model building module 605, and the detection module 606 in fig. 6 may be implemented by the processor 702.
The communication interface 703 may be a circuit, a device, an interface, a bus, a software module, a transceiver, or any other device capable of implementing communication. The processor 702 may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip. The memory 701 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory may also include a combination of the above kinds of memory; the memory may also include any other means having a storage function, such as a circuit, a device, or a software module.
Since the apparatus provided in the embodiment of the present application can be used to execute the time series abnormality detection method, the technical effect obtained by the apparatus can refer to the method embodiment, and will not be described herein again.
Those skilled in the art will appreciate that all or part of the steps of the above method may be performed by hardware under the control of program instructions, and the program may be stored in a computer-readable storage medium such as a ROM, a RAM, or an optical disc.
The embodiment of the present application also provides a storage medium, which may include the memory 701.
For the explanation and beneficial effects of the related content in any one of the above-mentioned apparatuses, reference may be made to the corresponding method embodiments provided above, and details are not repeated here.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network appliance, a user device, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (5)

1. A method for detecting anomalies in a device performance indicator time series, characterized by comprising the following steps:
receiving a set of signals in the form of a time series from one or more devices, the time series being a device performance indicator time series;
dividing the signal into at least two subsequences according to a preset rule;
respectively extracting a feature vector for each subsequence to obtain a feature vector sample set, wherein the feature vector sample set is formed by feature vectors corresponding to each subsequence;
obtaining a clustering result by adopting a clustering algorithm in the feature vector sample set, wherein the clustering result comprises at least one cluster and a membership relation between each feature vector sample and the cluster, and each cluster is a subset of the feature vector sample set;
applying a machine learning algorithm to establish an abnormal behavior model in each cluster;
performing anomaly detection in each cluster according to the corresponding abnormal behavior model to obtain abnormal points in the device performance indicator time series;
the preset rules include: a data-based partitioning manner or a rule-based partitioning manner;
wherein the data-based partitioning manner comprises: obtaining a change point of the signal by using a change point analysis algorithm; and dividing the signal into at least two subsequences according to the position of the change point;
the rule-based partitioning manner comprises: dividing the signal into at least two subsequences according to a preset period;
wherein, the applying a machine learning algorithm to establish an abnormal behavior model in each cluster comprises: respectively merging the subsequences of the same cluster corresponding to the feature vectors into a subsequence set, wherein each subsequence set corresponds to one cluster;
respectively calculating the mean value and the standard deviation of all sample values in each subsequence set;
respectively calculating the upper bound and the lower bound of the normal value of the subsequence set according to the mean value and the standard deviation corresponding to each subsequence set;
and obtaining the abnormal behavior model of each subsequence set according to the upper normal value boundary and the lower normal value boundary.
2. The method of claim 1, wherein the feature vector comprises:
basic statistics and frequency characteristics of the subsequences;
wherein the basic statistics comprise a mean value, a standard deviation, a maximum value and a minimum value;
the frequency characteristic is the amplitude of the first W components of the resultant vector of the fast fourier transform of the subsequence, W being a positive integer.
3. An apparatus for detecting anomalies in a device performance indicator time series, comprising:
a receiving module, configured to receive a set of signals in the form of a time series, the time series being a device performance indicator time series;
a partitioning module, configured to divide the signal into at least two subsequences according to a preset rule;
a feature vector extraction module, configured to extract a feature vector for each subsequence to obtain a feature vector sample set, the feature vector sample set being formed by the feature vectors corresponding to the subsequences;
a clustering module, configured to obtain a clustering result in the feature vector sample set by using a clustering algorithm, wherein the clustering result comprises at least one cluster and a membership relation between each feature vector sample and the cluster, and each cluster is a subset of the feature vector sample set;
a model building module, configured to apply a machine learning algorithm to establish an abnormal behavior model in each cluster;
a detection module, configured to perform anomaly detection in each cluster according to the corresponding abnormal behavior model to obtain abnormal points in the device performance indicator time series;
wherein the preset rule comprises: a data-based partitioning manner or a rule-based partitioning manner;
where the preset rule is the data-based partitioning manner, the partitioning module is specifically configured to obtain a change point of the signal by using a change point analysis algorithm, and divide the signal into at least two subsequences according to the position of the change point;
where the preset rule is the rule-based partitioning manner, the partitioning module is specifically configured to divide the signal into at least two subsequences according to a preset period;
wherein the model building module is specifically configured to: respectively merging the subsequences of the same cluster corresponding to the feature vectors into a subsequence set, wherein each subsequence set corresponds to one cluster;
respectively calculating the mean value and the standard deviation of all sample values in each subsequence set;
respectively calculating the upper bound and the lower bound of the normal value of the subsequence set according to the mean value and the standard deviation corresponding to each subsequence set;
and obtaining the abnormal behavior model of each subsequence set according to the upper normal value boundary and the lower normal value boundary.
4. The apparatus of claim 3, wherein the feature vector comprises:
basic statistics and frequency characteristics of the subsequences;
wherein the basic statistics comprise a mean value, a standard deviation, a maximum value and a minimum value;
the frequency characteristic is the amplitude of the first W components of the resultant vector of the fast fourier transform of the subsequence, W being a positive integer.
5. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-2.
CN201811019950.3A 2018-09-03 2018-09-03 Time series abnormity detection method and device Active CN109902703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811019950.3A CN109902703B (en) 2018-09-03 2018-09-03 Time series abnormity detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811019950.3A CN109902703B (en) 2018-09-03 2018-09-03 Time series abnormity detection method and device

Publications (2)

Publication Number Publication Date
CN109902703A CN109902703A (en) 2019-06-18
CN109902703B true CN109902703B (en) 2021-09-21

Family

ID=66943286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811019950.3A Active CN109902703B (en) 2018-09-03 2018-09-03 Time series abnormity detection method and device

Country Status (1)

Country Link
CN (1) CN109902703B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant