CN113553232A - Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix portrait - Google Patents
- Publication number
- CN113553232A
- Authority
- CN
- China
- Prior art keywords
- subsequence
- nearest neighbor
- time
- distance
- online matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Hardware Design (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for unsupervised anomaly detection on operation and maintenance data through an online matrix portrait. First, a sliding window of size $m$ divides the time series $X$ into a plurality of subsequences $X_{i,m}$. Then an online matrix portrait $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$ is constructed from the subsequences $X_{i,m}$. Finally, the nearest neighbor subsequence is found by the online matrix portrait algorithm and used to calculate the distance significance $r_t$ of $x_t$; if $r_t$ is above a predefined threshold $\tau$, $x_t$ is considered abnormal, otherwise it is considered normal. The invention performs unsupervised anomaly detection on univariate time series without any model training, and finds anomalies efficiently and accurately.
Description
Technical Field
The invention relates to anomaly detection technology for computer systems, and is applicable in particular to fields such as intelligent operation and maintenance monitoring at Internet companies.
Background
A time series is a sequence of observation points organized in time order, and the interval between consecutive points is generally assumed to be constant. Given a univariate time series $X = \{x_1, \dots, x_t, \dots, x_n\}$, anomaly detection on $x_t$ determines whether the state at time $t$ deviates significantly from the normal state. In the field of intelligent operation and maintenance, time series anomaly detection is of great importance for monitoring key performance indicators (KPIs).
Time series anomaly detection faces many challenges. First, because anomalies are rare and labeling them is difficult and expensive, it is impractical to collect a large amount of labeled anomaly data to train a model. Second, concept drift occurs in a time series as the environment changes; once it occurs, the model must be updated, which costs considerable time and labor. Finally, time series patterns are not fixed: a series may be seasonal, stable, or unstable. Practical applications therefore require an unsupervised anomaly detection method that is insensitive to variations in the time series pattern.
However, current unsupervised anomaly detection techniques either require substantial resources for training or perform poorly, and cannot balance efficiency against detection accuracy. Prediction-based methods predict the state at the current time from historical data and detect an anomaly by judging whether the observed value deviates greatly from the predicted value. They rely heavily on prediction quality: if the prediction is poor, detection is poor. Distribution-based methods learn the distribution of the normal state and detect an anomaly by determining whether the current state deviates from it. However, anomalies in the data often interfere with learning the normal-state distribution, which degrades detection performance. Moreover, when concept drift occurs, the model must be retrained to learn the new distribution, and when hundreds of time series are monitored simultaneously, the cost of training and maintaining these models is very high. Distance-based methods detect anomalies by examining the relationship between the current state and its $k$ nearest neighbors. These methods are sensitive to the choice of $k$ and have relatively high time complexity.
Disclosure of Invention
In view of the above, the present invention provides a method for unsupervised anomaly detection on operation and maintenance data through an online matrix portrait, so as to improve detection efficiency and detection accuracy.
To achieve this purpose, the invention adopts the following technical scheme:
An unsupervised anomaly detection method for operation and maintenance data through an online matrix portrait comprises the following steps:
Step 1, acquire operation and maintenance data $X$ represented as a time series, and preprocess the data: slice the time series $X$ into a plurality of subsequences $X_{i,m}$ using a sliding window of size $m$;
Step 2, for each subsequence $X_{i,m}$ in the time series, calculate the distance between $X_{i,m}$ and every subsequence $X_{j,m}$ that occurs before it; take the minimum distance $p_i$ as the online matrix portrait value of $X_{i,m}$, and take the subscript $idx_i$ corresponding to the minimum distance as its nearest neighbor subscript; the online matrix portrait of the time series $X$ is then $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest neighbor subscripts $I = \{idx_1, \dots, idx_t, \dots, idx_{n-m+1}\}$;
Step 3, the state $x_t$ at time $t$ forms, together with the preceding $m-1$ states, the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$; calculate, by the online matrix portrait algorithm, the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest neighbor subsequence, together with the nearest neighbor subscript $idx_{t-m+1}$; use the nearest neighbor subsequence to calculate the distance significance $r_t$ of $x_t$; if $r_t$ is greater than a predefined threshold $\tau$, $x_t$ is considered abnormal, otherwise it is considered normal; $\tau$ is a constant.
In Step 1, before the time series $X$ is segmented with the sliding window, the following processing is performed:
Detect, according to the timestamps, whether the time series $X$ has missing values; if so, fill them by first-order linear interpolation of adjacent states.
The method further sets a buffer of size $c$, which stores the state values of the $c$ most recent moments; the subsequence at the current time computes the matrix portrait only against subsequences in the buffer.
With this scheme, the standard matrix portrait is first adapted into an online matrix portrait, and mean alignment preserves amplitude changes that may indicate anomalies. The distance significance is then calculated from each subsequence and its nearest neighbor subsequence. Because the distance significance extracts anomalies from a ratio of distances rather than from the distances themselves, it is insensitive to amplitude changes and remains applicable to time series with change points. The invention can therefore perform unsupervised time series anomaly detection without any model training and find anomalous points efficiently and accurately.
In addition, the invention provides a caching strategy, which greatly improves running efficiency and saves storage space.
Detailed Description
The invention discloses an unsupervised anomaly detection method for operation and maintenance data through an online matrix portrait, which firstly improves a standard matrix portrait into an online matrix portrait and then extracts anomalies from the online matrix portrait by utilizing distance significance. The method specifically comprises the following steps:
Step 1, data preprocessing: the original time series $X$ is cleaned.
Step 1.1, for a given original time series $X = \{x_1, \dots, x_t, \dots, x_n\}$, detect whether missing values exist according to the timestamps; if so, fill them by first-order linear interpolation of adjacent states.
Specifically, for a missing segment with a missing length less than or equal to M, first-order linear interpolation filling is directly performed by using states before and after the missing segment, and for a missing segment with a missing length greater than M, first-order linear interpolation filling is performed by using state values of the same time period in adjacent cycles of the missing segment.
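The two filling rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `fill_missing` and the parameters `step` (sampling interval), `max_gap` (the threshold M) and `period` (cycle length in samples) are hypothetical, timestamps are assumed to be sorted integers on a regular grid, and "interpolation between adjacent cycles" is interpreted here as the average of the values one period before and one period after. If either of those values is unavailable, the sketch falls back to neighbor interpolation.

```python
def fill_missing(timestamps, values, step, max_gap, period):
    """Return a gap-free list of states sampled every `step` time units."""
    start, end = timestamps[0], timestamps[-1]
    n = (end - start) // step + 1
    # Lay the observations on a regular grid; None marks a missing state.
    grid = [None] * n
    for ts, v in zip(timestamps, values):
        grid[(ts - start) // step] = v
    filled = list(grid)
    i = 0
    while i < n:
        if grid[i] is not None:
            i += 1
            continue
        j = i
        while j < n and grid[j] is None:
            j += 1
        # grid[i:j] is one missing segment; grid[i-1] and grid[j] are observed
        # because the first and last grid points always hold observations.
        gap = j - i
        left, right = grid[i - 1], grid[j]
        for k in range(i, j):
            before = grid[k - period] if k - period >= 0 else None
            after = grid[k + period] if k + period < n else None
            if gap > max_gap and before is not None and after is not None:
                # Long segment: interpolate between the adjacent cycles.
                filled[k] = (before + after) / 2
            else:
                # Short segment: first-order interpolation between neighbors.
                filled[k] = left + (right - left) * (k - i + 1) / (gap + 1)
        i = j
    return filled
```

For example, `fill_missing([0, 10, 40], [1.0, 2.0, 5.0], step=10, max_gap=2, period=100)` returns `[1.0, 2.0, 3.0, 4.0, 5.0]`.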
Step 1.2, divide the sequence into a plurality of subsequences using a sliding window with window size $m$ and step length 1. Each subsequence is denoted $X_{i,m} = \{x_i, x_{i+1}, \dots, x_{i+m-1}\}$, and the set of all subsequences is denoted $S = \{X_{1,m}, \dots, X_{t,m}, \dots, X_{n-m+1,m}\}$. For hour-scale data, $m$ is 48; for minute-scale data, $m$ is 2880.
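As a small illustration of the sliding window, the sketch below slices a list into its $n-m+1$ subsequences. The function name `subsequences` is hypothetical, and Python indices are 0-based, so `subsequences(x, m)[i]` corresponds to $X_{i+1,m}$ in the notation above.

```python
def subsequences(x, m):
    """Slice x into all length-m windows with step 1."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]


# A series of 5 states with m = 3 yields 5 - 3 + 1 = 3 subsequences.
windows = subsequences([1, 2, 3, 4, 5], 3)
```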
Step 2, for each subsequence $X_{i,m}$ in the time series, calculate the distance between $X_{i,m}$ and every subsequence $X_{j,m}$ that occurs before it; take the minimum distance $p_i$ as the online matrix portrait value of $X_{i,m}$, and take the subscript $idx_i$ corresponding to the minimum distance as its nearest neighbor subscript. The online matrix portrait of the time series $X$ is then $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest neighbor subscripts $I = \{idx_1, \dots, idx_t, \dots, idx_{n-m+1}\}$.
The standard matrix portrait (known in the literature as the matrix profile) divides a time series into subsequences $X_{i,m}$ of fixed length $m$ using a sliding window, and computes the Euclidean distance between each z-score-normalized subsequence and its nearest neighbor subsequence in the time series. In an online scenario, the states after the current time are unknown, so the online matrix portrait compares each subsequence only with subsequences that occur before it. Z-score normalization also removes the fluctuation of the subsequence itself, yet that fluctuation may indicate an anomaly. To avoid overlooking such anomalies, the online matrix portrait aligns only the means when computing the Euclidean distance between subsequences. Given a subsequence $X_{i,m}$ with mean $\mu_i$ and variance $\sigma_i$, the distance between $X_{i,m}$ and a subsequence $X_{j,m}$ that precedes it is calculated as follows:
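The formula itself does not appear in this text. One form consistent with the mean-only alignment described above is given below. It is offered as an assumption rather than as the patent's exact expression, and it reads $\sigma_i$ as the standard deviation of $X_{i,m}$ (computed with divisor $m$). Expanding the squared distance leaves the inner product as the only pairwise term:

```latex
\begin{aligned}
d_{i,j}^2 &= \sum_{k=0}^{m-1} \bigl[(x_{i+k} - \mu_i) - (x_{j+k} - \mu_j)\bigr]^2 \\
          &= m\sigma_i^2 + m\sigma_j^2
             - 2\bigl(\langle X_{i,m}, X_{j,m} \rangle - m\,\mu_i\,\mu_j\bigr)
\end{aligned}
```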
where $\langle X_{i,m}, X_{j,m} \rangle$ can be obtained from the result $\langle X_{i-1,m}, X_{j-1,m} \rangle$ of the previous moment, which speeds up the calculation. The formula for $\langle X_{i,m}, X_{j,m} \rangle$ is:
$\langle X_{i,m}, X_{j,m} \rangle = \langle X_{i-1,m}, X_{j-1,m} \rangle - x_{i-1} x_{j-1} + x_{i+m-1} x_{j+m-1}$
To further improve computational efficiency and save storage space, the invention sets a buffer of size $c$. The buffer stores only the state values of the $c$ most recent moments. The subsequence at the current time computes the matrix portrait only against subsequences in the buffer.
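The following sketch puts Step 2, the sliding inner-product update and the buffer together. Several points are assumptions rather than statements of the patent: the function name `online_matrix_portrait` is hypothetical; the distance uses the expansion $d^2 = \sum(x_{i+k}-\mu_i)^2 + \sum(x_{j+k}-\mu_j)^2 - 2(\langle X_{i,m}, X_{j,m}\rangle - m\mu_i\mu_j)$, which follows from mean alignment but is not quoted from the text; the buffer is interpreted as admitting only earlier subsequences that lie entirely within the last $c$ states; and the first subsequence, which has no predecessor, is assigned distance infinity and index -1. Indices are 0-based.

```python
import math


def online_matrix_portrait(x, m, c=None):
    """Return (P, I): nearest earlier-neighbor distance and index per subsequence."""
    n_sub = len(x) - m + 1
    # Mean and sum of squared deviations of every subsequence.
    mu = [sum(x[i:i + m]) / m for i in range(n_sub)]
    ss = [sum((v - mu[i]) ** 2 for v in x[i:i + m]) for i in range(n_sub)]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    prev = {}  # inner products <X_{i-1}, X_j>, keyed by j
    for i in range(1, n_sub):
        # With a buffer, candidates must start inside the last c states.
        lo = 0 if c is None else max(0, i + m - c)
        cur = {}
        for j in range(lo, i):
            if j > 0 and (j - 1) in prev:
                # Sliding update from the previous moment's inner product.
                qt = prev[j - 1] - x[i - 1] * x[j - 1] + x[i + m - 1] * x[j + m - 1]
            else:
                qt = sum(x[i + k] * x[j + k] for k in range(m))
            cur[j] = qt
            d2 = ss[i] + ss[j] - 2 * (qt - m * mu[i] * mu[j])
            d = math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors
            if d < P[i]:
                P[i], I[i] = d, j
        prev = cur
    return P, I
```

Because only the previous row of inner products is kept, memory grows with the number of candidates per step (at most $c - m$ when a buffer is set) rather than with the length of the whole history.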
Step 3, anomaly detection: the state $x_t$ at time $t$ forms, together with the preceding $m-1$ states, the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$. The online matrix portrait algorithm calculates the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest neighbor subsequence, together with the nearest neighbor subscript $idx_{t-m+1}$. The nearest neighbor subsequence is then used to calculate the distance significance $r_t$ of $x_t$. If $r_t$ is above a predefined threshold $\tau$, $x_t$ is considered abnormal; otherwise it is considered normal. Here $\tau$ is a constant.
In the definition of $r_t$, $l$ is a parameter; in general, $l = m$. When $m$ is so large that the window may contain multiple anomalies, take $l < m$.
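The control flow of Step 3 can be sketched as below. The defining formula for $r_t$ does not appear in this text, so the sketch takes the significance function as a caller-supplied argument instead of guessing it. The names `nearest_earlier`, `detect` and `toy_significance` are hypothetical, and `toy_significance` is a purely illustrative stand-in, not the patent's distance significance. Indices are 0-based, and the neighbor search is a brute-force mean-aligned comparison.

```python
import math


def nearest_earlier(x, start, m, c=None):
    """Index and mean-aligned distance of the closest subsequence before `start`."""
    cur = x[start:start + m]
    mu = sum(cur) / m
    lo = 0 if c is None else max(0, start + m - c)
    best_j, best_d = -1, math.inf
    for j in range(lo, start):
        cand = x[j:j + m]
        mu_c = sum(cand) / m
        d = math.sqrt(sum(((a - mu) - (b - mu_c)) ** 2 for a, b in zip(cur, cand)))
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def detect(x, m, tau, significance, l=None, c=None):
    """Return the time indices t whose significance exceeds tau."""
    l = m if l is None else l
    flagged = []
    for t in range(m, len(x)):  # earlier t have no preceding subsequence
        start = t - m + 1
        j, _ = nearest_earlier(x, start, m, c)
        if j < 0:
            continue  # buffer too small to hold any earlier subsequence
        r_t = significance(x[start:start + m], x[j:j + m], l)
        if r_t > tau:
            flagged.append(t)
    return flagged


def toy_significance(cur, nn, l):
    """HYPOTHETICAL stand-in score, NOT the patent's formula."""
    mu_c, mu_n = sum(cur) / len(cur), sum(nn) / len(nn)
    diffs = [abs((a - mu_c) - (b - mu_n)) for a, b in zip(cur, nn)][-l:]
    baseline = sum(diffs[:-1]) / max(len(diffs) - 1, 1)
    return diffs[-1] / (baseline + 1e-12)


series = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 9]
anomalies = detect(series, m=4, tau=2.0, significance=toy_significance)
```

Passing the score as a function keeps the thresholding logic separate from the scoring rule, so the actual distance significance can be substituted without touching the loop.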
The invention uses the online matrix portrait to calculate the distance significance of each subsequence and detects anomalous points against a predefined threshold, thereby achieving anomaly detection with an optimal balance between efficiency and detection accuracy.
To demonstrate the effectiveness of the present invention, it was compared with the existing methods SPOT, DSPOT, SR-CNN, DONUT, VAE and PAD on the KPI and Yahoo test datasets. A comparison of $F_1$, accuracy, recall and CPU run time is shown in Table 1. SR-CNN, DONUT, VAE and PAD are network-based methods whose models need to be trained; the training times are shown in Table 2.
TABLE 1
TABLE 2
Method | KPI (seconds) | Yahoo (seconds) |
---|---|---|
SR-CNN | 37390.82 | 1415.75 |
DONUT | 37412.59 | 1432.68 |
VAE | 37412.59 | 1432.68 |
PAD | 387691.11 | 14535.43 |
As can be seen from Table 1, the $F_1$ of the algorithm of the present invention on both data sets is higher than that of all methods that require no training. The algorithm achieves the best performance on the Yahoo dataset and is slightly inferior to VAE on the KPI dataset. Although the network-based methods DONUT, VAE and PAD perform better on the KPI dataset, they take a lot of time and resources to train the model and must be retrained when the data distribution changes, which is impractical in real deployments. The algorithm of the invention is the only method that needs no training and still performs well on both data sets. In general, the algorithm of the present invention achieves an optimal balance between accuracy and real-time performance.
The above description is only exemplary of the present invention and is not intended to limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above exemplary embodiments according to the technical spirit of the present invention are within the technical scope of the present invention.
Claims (3)
1. An unsupervised anomaly detection method for operation and maintenance data through an online matrix portrait, the method comprising the steps of:
Step 1, acquiring operation and maintenance data $X$ represented as a time series, and preprocessing the data: slicing the time series $X$ into a plurality of subsequences $X_{i,m}$ using a sliding window of size $m$;
Step 2, for each subsequence $X_{i,m}$ in the time series, calculating the distance between $X_{i,m}$ and every subsequence $X_{j,m}$ that occurs before it; taking the minimum distance $p_i$ as the online matrix portrait value of $X_{i,m}$, and taking the subscript $idx_i$ corresponding to the minimum distance as its nearest neighbor subscript; the online matrix portrait of the time series $X$ then being $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest neighbor subscripts $I = \{idx_1, \dots, idx_t, \dots, idx_{n-m+1}\}$;
Step 3, for the state $x_t$ at time $t$, which forms together with the preceding $m-1$ states the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$, calculating by the online matrix portrait algorithm the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest neighbor subsequence, together with the nearest neighbor subscript $idx_{t-m+1}$; calculating the distance significance $r_t$ of $x_t$ using the nearest neighbor subsequence; if $r_t$ is above a predefined threshold $\tau$, $x_t$ is considered abnormal, otherwise it is considered normal; wherein $\tau$ is a constant.
2. The method for unsupervised anomaly detection on operation and maintenance data through an online matrix portrait according to claim 1, wherein in Step 1, before the time series $X$ is segmented with the sliding window, the following processing is performed: detecting, according to the timestamps, whether the time series $X$ has missing values, and if so, filling them by first-order linear interpolation of adjacent states.
3. The method for unsupervised anomaly detection on operation and maintenance data through an online matrix portrait according to claim 1, wherein a buffer of size $c$ is set, the buffer stores the state values of the $c$ most recent moments, and the subsequence at the current time computes the matrix portrait only against subsequences in the buffer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110783748.3A CN113553232B (en) | 2021-07-12 | 2021-07-12 | Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110783748.3A CN113553232B (en) | 2021-07-12 | 2021-07-12 | Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113553232A true CN113553232A (en) | 2021-10-26 |
CN113553232B CN113553232B (en) | 2023-12-05 |
Family
ID=78131585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110783748.3A Active CN113553232B (en) | 2021-07-12 | 2021-07-12 | Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113553232B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018101317A (en) * | 2016-12-21 | 2018-06-28 | ホーチキ株式会社 | Abnormality monitoring system |
CN108616545A (en) * | 2018-06-26 | 2018-10-02 | 中国科学院信息工程研究所 | A kind of detection method, system and electronic equipment that network internal threatens |
CN110071913A (en) * | 2019-03-26 | 2019-07-30 | 同济大学 | A kind of time series method for detecting abnormality based on unsupervised learning |
CN111913849A (en) * | 2020-07-29 | 2020-11-10 | 厦门大学 | Unsupervised anomaly detection and robust trend prediction method for operation and maintenance data |
CN112966017A (en) * | 2021-03-01 | 2021-06-15 | 北京青萌数海科技有限公司 | Abnormal subsequence detection method with indefinite length in time sequence |
Non-Patent Citations (2)
Title |
---|
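王宪;柳絮青;宋书林;沈源: "An unsupervised learning method for abnormal behavior detection", 光电工程 (Opto-Electronic Engineering), vol. 41, no. 3, pages 43 - 48 * |
蔡剑平;雷蕴奇;陈明明;王宁;张双越;: "An efficient solving algorithm for SVD recommendation models with implicit feedback", 中国科学:信息科学 (Scientia Sinica Informationis), no. 10, pages 122 - 136 * |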
Also Published As
Publication number | Publication date |
---|---|
CN113553232B (en) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108038044B (en) | Anomaly detection method for continuous monitored object | |
EP2905665B1 (en) | Information processing apparatus, diagnosis method, and program | |
CN109508818B (en) | Online NOx prediction method based on LSSVM | |
WO2017139046A1 (en) | System and method for unsupervised root cause analysis of machine failures | |
Shi et al. | Improving power grid monitoring data quality: An efficient machine learning framework for missing data prediction | |
CN105607631B (en) | The weak fault model control limit method for building up of batch process and weak fault monitoring method | |
CN110852509A (en) | Fault prediction method and device of IGBT module and storage medium | |
CN116340796B (en) | Time sequence data analysis method, device, equipment and storage medium | |
Fu et al. | MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction | |
CN114691753A (en) | Matrix filling-based rapid multivariate time sequence anomaly detection method | |
WO2017127260A1 (en) | System and method for allocating machine behavioral models | |
CN114048546A (en) | Graph convolution network and unsupervised domain self-adaptive prediction method for residual service life of aircraft engine | |
CN114357037A (en) | Time sequence data analysis method and device, electronic equipment and storage medium | |
CN114819260A (en) | Dynamic generation method of hydrologic time series prediction model | |
CN113553232B (en) | Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image | |
CN117056842A (en) | Method, device, equipment, medium and product for constructing equipment abnormality monitoring model | |
Deng et al. | An intelligent hybrid deep learning model for rolling bearing remaining useful life prediction | |
CN116079498A (en) | Method for identifying abnormal signals of cutter | |
KR102486462B1 (en) | Method and Apparatus for Fault Detection Using Pattern Learning According to Degradation | |
CN115935285A (en) | Multi-element time series anomaly detection method and system based on mask map neural network model | |
KR102486463B1 (en) | Method and Apparatus for Real Time Fault Detection Using Time series data According to Degradation | |
CN111861798A (en) | Residential electricity data missing value interpolation method based on neighbor algorithm | |
Loyola | A method for real-time error detection in low-cost environmental sensors data | |
Qu et al. | Anomaly detection of massive bridge monitoring data through multiple transfer learning with adaptively setting hyperparameters | |
CN117235651B (en) | Enterprise information data optimization management system based on Internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||