CN113553232A - Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix portrait

Info

Publication number
CN113553232A
Authority
CN
China
Prior art keywords
subsequence
nearest neighbor
time
distance
online matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110783748.3A
Other languages
Chinese (zh)
Other versions
CN113553232B (en)
Inventor
赵万磊
兰诗莹
陈润青
雷蕴奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202110783748.3A
Publication of CN113553232A
Application granted
Publication of CN113553232B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile (rendered "online matrix portrait" in the title). First, a sliding window of window size m divides a time series X into a plurality of subsequences $X_{i,m}$. Then, based on these subsequences, an online matrix profile $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$ is constructed. Finally, the nearest-neighbor subsequence is computed by the online matrix profile algorithm and used to compute the distance significance $r_t$ of $x_t$; if $r_t$ is greater than a predefined threshold τ, $x_t$ is considered abnormal, otherwise it is considered normal. The invention performs unsupervised univariate time-series anomaly detection without any model training and finds anomalies efficiently and accurately.

Description

Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix portrait
Technical Field
The invention relates to anomaly detection technology for computer systems, and is applicable in particular to fields such as intelligent operation and maintenance monitoring at Internet companies.
Background
A time series is a sequence of observation points organized in time order, and the time interval between every two adjacent points is generally assumed to be the same. Given a univariate time series $X = \{x_1, \dots, x_t, \dots, x_n\}$, where $x_t$ is the state at time t, anomaly detection judges whether the state at time t deviates significantly from the normal state. In the field of intelligent operation and maintenance, time-series anomaly detection is of great importance for monitoring key performance indicators (KPIs).
Time-series anomaly detection faces many challenges. First, because anomalies are rare and labeling them is difficult and expensive, it is impractical to collect a large amount of labeled anomaly data to train a model. Second, concept drift occurs in a time series as the environment changes. Once concept drift occurs, the model needs to be updated, which costs considerable time and labor. Finally, the pattern of a time series is not fixed: it may be seasonal, stable, or unstable. Therefore, practical applications require an unsupervised anomaly detection method that is insensitive to variations in the time-series pattern.
However, current unsupervised anomaly detection techniques either require substantial resources for training or perform poorly, and cannot balance efficiency against detection accuracy. Prediction-based unsupervised methods predict the state at the current time from historical data and detect an anomaly by judging whether the observed value deviates greatly from the predicted value. Such methods depend heavily on prediction performance: if the prediction is poor, the detection is poor as well. Distribution-based methods learn the distribution of the normal state and judge whether the state at the current time deviates from it. However, anomalies in the data often interfere with learning the normal-state distribution, which degrades detection performance. Moreover, when concept drift occurs in the time series, the model must be retrained to learn the new distribution. When hundreds of time series are monitored simultaneously, the cost of training and maintaining these models is very high. Distance-based unsupervised methods detect anomalies by examining the relationship between the current state and its k nearest neighbors. These methods are sensitive to the setting of the parameter k and have relatively high time complexity.
Disclosure of Invention
In view of the above, the present invention provides a method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile, so as to improve detection efficiency and detection accuracy.
To achieve this purpose, the invention adopts the following technical scheme:
A method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile comprises the following steps:
Step 1, acquiring operation and maintenance data X represented as a time series, and performing data preprocessing: slicing the time series X into a plurality of subsequences $X_{i,m}$ using a sliding window of window size m;
Step 2, for each subsequence $X_{i,m}$ in the time series, computing the distances between $X_{i,m}$ and all subsequences $X_{j,m}$ that occur before it; taking the minimum distance $p_i$ as the value of the online matrix profile at $X_{i,m}$, and the subscript $\mathrm{idx}_i$ corresponding to the minimum distance as the nearest-neighbor subscript; the online matrix profile of the time series X is then $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest-neighbor subscripts $I = \{\mathrm{idx}_1, \dots, \mathrm{idx}_t, \dots, \mathrm{idx}_{n-m+1}\}$;
Step 3, for the state $x_t$ at time t, which forms the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$ together with the preceding m-1 states, computing, by the online matrix profile algorithm, the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest-neighbor subsequence and the nearest-neighbor subscript $\mathrm{idx}_{t-m+1}$; computing the distance significance $r_t$ of $x_t$ using the nearest-neighbor subsequence; if $r_t$ is greater than a predefined threshold τ, $x_t$ is considered abnormal, otherwise it is considered normal; wherein τ is a constant.
In step 1, before the time series X is segmented with the sliding window, the following processing is performed:
detecting, according to the timestamps, whether the time series X has missing values, and if so, filling them by first-order linear interpolation of adjacent states.
The method further sets a buffer of size c, which stores the state values of the c most recent moments; the subsequence at the current time computes the matrix profile only against the subsequences in the buffer.
With the above scheme, the standard matrix profile is first adapted into an online matrix profile, in which aligning only the means preserves amplitude variations that may indicate anomalies. The distance significance is then computed from a subsequence and its nearest-neighbor subsequence. Because the distance significance extracts anomalies from a ratio of distances rather than from the distances themselves, it is insensitive to amplitude changes and also applies to time series with change points. The invention can therefore perform unsupervised time-series anomaly detection without any model training and find anomalous points efficiently and accurately.
In addition, the invention provides a caching strategy that greatly improves running efficiency and saves storage space.
Detailed Description
The invention discloses a method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile. The method first adapts the standard matrix profile into an online matrix profile, and then extracts anomalies from the online matrix profile using distance significance. It comprises the following steps:
Step 1, data preprocessing: the original time series X is cleaned.
Step 1.1, for a given original time series $X = \{x_1, \dots, x_t, \dots, x_n\}$, detecting, according to the timestamps, whether missing values exist, and if so, filling them by first-order linear interpolation of adjacent states.
Specifically, for a missing segment with a missing length less than or equal to M, first-order linear interpolation filling is directly performed by using states before and after the missing segment, and for a missing segment with a missing length greater than M, first-order linear interpolation filling is performed by using state values of the same time period in adjacent cycles of the missing segment.
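The two filling rules of step 1.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: missing states are assumed to be marked as NaN in a regularly sampled array, the cycle length `period` is assumed to be known, and a long gap is filled with the midpoint of the values one cycle earlier and one cycle later (the linear interpolation between those two points). The names `fill_missing`, `max_gap` (the M of the text), and `period` are hypothetical.

```python
import numpy as np

def fill_missing(x, max_gap, period):
    """Fill NaN runs in a regularly sampled series.

    Runs of length <= max_gap: linear interpolation between the states
    just before and just after the run.
    Longer runs: midpoint of the observed values one period earlier and
    one period later; if only one of them exists, that value is used,
    and if neither exists the point stays NaN.
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    missing = np.isnan(x)
    known = np.flatnonzero(~missing)      # assumes at least one observed state
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < n and missing[j]:
            j += 1                        # [i, j) is one missing run
        if j - i <= max_gap:
            idx = np.arange(i, j)
            x[idx] = np.interp(idx, known, x[known])
        else:
            for k in range(i, j):
                neighbours = [x[q] for q in (k - period, k + period)
                              if 0 <= q < n and not missing[q]]
                if neighbours:
                    x[k] = float(np.mean(neighbours))
        i = j
    return x
```

Only originally observed states are used as interpolation sources, so a filled value never feeds into another filled value.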
Step 1.2, dividing the sequence into a plurality of subsequences using a sliding window with window size m and step length 1. Each subsequence is denoted $X_{i,m} = \{x_i, x_{i+1}, \dots, x_{i+m-1}\}$, and the set of all subsequences is denoted $S = \{X_{1,m}, \dots, X_{t,m}, \dots, X_{n-m+1,m}\}$. For hour-scale data, m takes the value 48; for minute-scale data, m takes the value 2880.
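The window segmentation of step 1.2 can be illustrated with NumPy's `sliding_window_view`, which returns all n-m+1 windows of step length 1 without copying the data. The helper name `subsequences` is hypothetical, and rows are 0-based, so row i corresponds to $X_{i+1,m}$ in the notation of the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def subsequences(x, m):
    """Return an (n - m + 1, m) read-only view; row i holds x[i], ..., x[i+m-1]."""
    x = np.asarray(x, dtype=float)
    if m > len(x):
        raise ValueError("window size m exceeds series length")
    return sliding_window_view(x, m)

S = subsequences([1, 2, 3, 4, 5], m=3)
print(S.shape)  # (3, 3)
```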
Step 2, for each subsequence $X_{i,m}$ in the time series, computing the distances between $X_{i,m}$ and all subsequences $X_{j,m}$ that occur before it; taking the minimum distance $p_i$ as the value of the online matrix profile at $X_{i,m}$, and the subscript $\mathrm{idx}_i$ corresponding to the minimum distance as the nearest-neighbor subscript. The online matrix profile of the time series X is then $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest-neighbor subscripts $I = \{\mathrm{idx}_1, \dots, \mathrm{idx}_t, \dots, \mathrm{idx}_{n-m+1}\}$.
The standard matrix profile divides a time series into subsequences $X_{i,m}$ of fixed length m using a sliding window, and computes the Euclidean distance between each z-score-normalized subsequence and its nearest-neighbor subsequence in the time series. In an online scenario, the states after the current time are unknown, so the online matrix profile compares each subsequence only with subsequences that occur before it. Z-score normalization also eliminates the fluctuation of the subsequence itself, yet that fluctuation may indicate an anomaly. To avoid missing such anomalies, the online matrix profile aligns only the means when computing the Euclidean distance between subsequences. Given that the mean and variance of subsequence $X_{i,m}$ are $\mu_i$ and $\sigma_i$ respectively, the distance between $X_{i,m}$ and a subsequence $X_{j,m}$ that precedes it is computed as follows:
[Equation image BDA0003158271760000031 not reproduced: distance between $X_{i,m}$ and $X_{j,m}$]
where $\langle X_{i,m}, X_{j,m} \rangle$ can be obtained from the result $\langle X_{i-1,m}, X_{j-1,m} \rangle$ computed at the previous moment, which speeds up the computation. The formula for $\langle X_{i,m}, X_{j,m} \rangle$ is:
$\langle X_{i,m}, X_{j,m} \rangle = \langle X_{i-1,m}, X_{j-1,m} \rangle - x_{i-1} x_{j-1} + x_{i+m-1} x_{j+m-1}$
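The incremental inner product can be checked numerically with the sketch below. Because the distance formula itself is an equation image that is not reproduced in this text, the function `mean_aligned_dist` is an assumption: it uses the algebraic expansion of the mean-aligned squared distance, $\lVert (X_i - \mu_i) - (X_j - \mu_j) \rVert^2 = (\sum x_i^2 - m\mu_i^2) + (\sum x_j^2 - m\mu_j^2) - 2(\langle X_i, X_j \rangle - m\mu_i\mu_j)$, which may differ in form from the patent's formula. Function names and 0-based indices are hypothetical.

```python
import numpy as np

def next_dot(x, i, j, m, prev_dot):
    """<X_i, X_j> from <X_{i-1}, X_{j-1}> using the update in the text."""
    return prev_dot - x[i - 1] * x[j - 1] + x[i + m - 1] * x[j + m - 1]

def mean_aligned_dist(x, i, j, m, dot=None):
    """Mean-aligned Euclidean distance between x[i:i+m] and x[j:j+m].

    Assumed expansion (not taken from the patent's equation image).
    """
    a, b = x[i:i + m], x[j:j + m]
    if dot is None:
        dot = float(a @ b)
    mu_a, mu_b = a.mean(), b.mean()
    sq = ((a @ a - m * mu_a**2) + (b @ b - m * mu_b**2)
          - 2.0 * (dot - m * mu_a * mu_b))
    return float(np.sqrt(max(sq, 0.0)))   # clamp tiny negative rounding error

# Self-check against the direct definition on random data.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
m, i, j = 8, 20, 5
q_prev = float(x[i - 1:i - 1 + m] @ x[j - 1:j - 1 + m])
q = next_dot(x, i, j, m, q_prev)
direct = np.linalg.norm((x[i:i + m] - x[i:i + m].mean())
                        - (x[j:j + m] - x[j:j + m].mean()))
assert np.isclose(q, x[i:i + m] @ x[j:j + m])
assert np.isclose(mean_aligned_dist(x, i, j, m, dot=q), direct)
```

The update costs a constant number of operations per pair instead of m multiplications, which is where the speed-up described above comes from.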
To further improve computational efficiency and save storage space, the invention sets a buffer of size c. The buffer stores only the state values of the c most recent moments. The subsequence at the current time computes the matrix profile only against the subsequences in the buffer.
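Step 2 together with the buffer can be sketched as a brute-force reference implementation. It is a sketch under assumptions rather than the patented algorithm: it recomputes every distance directly instead of using the incremental inner product, it interprets the buffer as admitting only candidate subsequences that lie entirely within the c most recent states, and it applies no exclusion zone around the current subsequence because the text mentions none. Indices are 0-based and the function name is hypothetical.

```python
import numpy as np

def online_matrix_profile(x, m, c=None):
    """Brute-force online matrix profile with mean-only alignment.

    P[i] : distance from x[i:i+m] to its nearest earlier subsequence
    I[i] : 0-based start index of that neighbour (-1 if none exists)
    c    : optional buffer size; candidates must lie entirely within the
           c most recent states, i.e. start at index >= i + m - c
    """
    x = np.asarray(x, dtype=float)
    n_sub = len(x) - m + 1
    P = np.full(n_sub, np.inf)
    I = np.full(n_sub, -1, dtype=int)
    for i in range(1, n_sub):
        lo = 0 if c is None else max(0, i + m - c)
        cur = x[i:i + m] - x[i:i + m].mean()
        for j in range(lo, i):
            cand = x[j:j + m] - x[j:j + m].mean()
            d = np.linalg.norm(cur - cand)
            if d < P[i]:                  # ties keep the earliest neighbour
                P[i], I[i] = d, j
    return P, I
```

With a buffer, the inner loop visits at most c - m candidates per subsequence, so the cost per new state no longer grows with the length of the history.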
Step 3, anomaly detection: for the state $x_t$ at time t, which forms the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$ together with the preceding m-1 states, the online matrix profile algorithm computes the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest-neighbor subsequence, together with the nearest-neighbor subscript $\mathrm{idx}_{t-m+1}$. The nearest-neighbor subsequence is then used to compute the distance significance $r_t$ of $x_t$. If $r_t$ is greater than a predefined threshold τ, $x_t$ is considered abnormal; otherwise it is considered normal. Here τ is a constant, and $r_t$ is defined as:
[Equation image BDA0003158271760000041 not reproduced: definition of the distance significance $r_t$]
where l is a parameter. In general, l = m is taken; when m is so large that the window may contain multiple anomalies, l < m is taken.
The invention uses the online matrix profile to compute the distance significance of subsequences and detects anomalous points with a predefined threshold, thereby achieving anomaly detection with the best balance between efficiency and detection accuracy.
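The control flow of step 3 in a streaming setting might look like the sketch below. The formula for the distance significance $r_t$ is an equation image that is not reproduced here, so the sketch deliberately does not implement it: `significance` is a hypothetical caller-supplied function taking the current subsequence, its nearest-neighbor subsequence, and the parameter l. Only the buffering, the mean-aligned nearest-neighbor search, and the threshold comparison follow the text; all names are assumptions.

```python
from collections import deque
import numpy as np

def detect_stream(stream, m, c, tau, significance, l=None):
    """Yield (t, flag) per incoming state; flag is True when r_t > tau.

    significance(cur, nn, l) -> float stands in for the distance
    significance r_t, whose formula is not reproduced in this text.
    """
    l = m if l is None else l
    buf = deque(maxlen=c)                    # the c most recent states
    for t, x_t in enumerate(stream):
        buf.append(float(x_t))
        x = np.asarray(buf)
        if len(x) <= m:                      # no earlier subsequence yet
            yield t, False
            continue
        cur = x[-m:]
        cur_c = cur - cur.mean()             # mean-only alignment
        best_d, nn = np.inf, None
        for j in range(len(x) - m):          # earlier subsequences in buffer
            cand = x[j:j + m]
            d = np.linalg.norm(cur_c - (cand - cand.mean()))
            if d < best_d:
                best_d, nn = d, cand
        yield t, bool(significance(cur, nn, l) > tau)
```

Passing the score as a function keeps the loop independent of the exact definition of $r_t$, so the patent's formula can be plugged in once it is available.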
To demonstrate the effectiveness of the invention, it was compared with the existing detection methods SPOT, DSPOT, SR-CNN, DONUT, VAE, and PAD on the KPI and Yahoo test datasets. The comparison of F1, accuracy, recall, and CPU run time is shown in Table 1. SR-CNN, DONUT, VAE, and PAD are network-based methods whose models need to be trained; their training times are shown in Table 2.
TABLE 1
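[Table image BDA0003158271760000042 not reproduced: F1, accuracy, recall, and CPU run time of each method on the KPI and Yahoo datasets]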
TABLE 2
Method    KPI (seconds)    Yahoo (seconds)
SR-CNN    37390.82         1415.75
DONUT     37412.59         1432.68
VAE       37412.59         1432.68
PAD       387691.11        14535.43
As can be seen from Table 1, on both datasets the F1 of the algorithm of the invention is higher than that of all methods that require no training. The algorithm achieves the best performance on the Yahoo dataset and is slightly inferior to VAE on the KPI dataset. Although the network-based methods DONUT, VAE, and PAD perform better on the KPI dataset, they require considerable time and resources to train the model and must be retrained when the data distribution changes, which is impractical in real scenarios. The algorithm of the invention is the only method that requires no training and still performs well on both datasets. Overall, it achieves the best balance between accuracy and real-time performance.
The above description is only exemplary of the present invention and is not intended to limit the technical scope of the present invention, so that any minor modifications, equivalent changes and modifications made to the above exemplary embodiments according to the technical spirit of the present invention are within the technical scope of the present invention.

Claims (3)

1. A method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile, the method comprising the steps of:
Step 1, acquiring operation and maintenance data X represented as a time series, and performing data preprocessing: slicing the time series X into a plurality of subsequences $X_{i,m}$ using a sliding window of window size m;
Step 2, for each subsequence $X_{i,m}$ in the time series, computing the distances between $X_{i,m}$ and all subsequences $X_{j,m}$ that occur before it; taking the minimum distance $p_i$ as the value of the online matrix profile at $X_{i,m}$, and the subscript $\mathrm{idx}_i$ corresponding to the minimum distance as the nearest-neighbor subscript; the online matrix profile of the time series X is then $P = \{p_1, \dots, p_t, \dots, p_{n-m+1}\}$, with corresponding nearest-neighbor subscripts $I = \{\mathrm{idx}_1, \dots, \mathrm{idx}_t, \dots, \mathrm{idx}_{n-m+1}\}$;
Step 3, for the state $x_t$ at time t, which forms the subsequence $X_{t-m+1,m} = \{x_{t-m+1}, \dots, x_t\}$ together with the preceding m-1 states, computing, by the online matrix profile algorithm, the distance $p_{t-m+1}$ between $X_{t-m+1,m}$ and its nearest-neighbor subsequence and the nearest-neighbor subscript $\mathrm{idx}_{t-m+1}$; computing the distance significance $r_t$ of $x_t$ using the nearest-neighbor subsequence; if $r_t$ is greater than a predefined threshold τ, $x_t$ is considered abnormal, otherwise it is considered normal; wherein τ is a constant.
2. The method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile according to claim 1, wherein in step 1, before the time series X is segmented with the sliding window, the following processing is performed: detecting, according to the timestamps, whether the time series X has missing values, and if so, filling them by first-order linear interpolation of adjacent states.
3. The method for unsupervised anomaly detection on operation and maintenance data through an online matrix profile according to claim 1, wherein a buffer of size c is set, the buffer stores the state values of the c most recent moments, and the subsequence at the current time computes the matrix profile only against the subsequences in the buffer.
CN202110783748.3A 2021-07-12 2021-07-12 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image Active CN113553232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110783748.3A CN113553232B (en) 2021-07-12 2021-07-12 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110783748.3A CN113553232B (en) 2021-07-12 2021-07-12 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image

Publications (2)

Publication Number Publication Date
CN113553232A true CN113553232A (en) 2021-10-26
CN113553232B CN113553232B (en) 2023-12-05

Family

ID=78131585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110783748.3A Active CN113553232B (en) 2021-07-12 2021-07-12 Technology for carrying out unsupervised anomaly detection on operation and maintenance data through online matrix image

Country Status (1)

Country Link
CN (1) CN113553232B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018101317A (en) * 2016-12-21 2018-06-28 ホーチキ株式会社 Abnormality monitoring system
CN108616545A (en) * 2018-06-26 2018-10-02 中国科学院信息工程研究所 A kind of detection method, system and electronic equipment that network internal threatens
CN110071913A (en) * 2019-03-26 2019-07-30 同济大学 A kind of time series method for detecting abnormality based on unsupervised learning
CN111913849A (en) * 2020-07-29 2020-11-10 厦门大学 Unsupervised anomaly detection and robust trend prediction method for operation and maintenance data
CN112966017A (en) * 2021-03-01 2021-06-15 北京青萌数海科技有限公司 Abnormal subsequence detection method with indefinite length in time sequence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
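王宪;柳絮青;宋书林;沈源: "An unsupervised learning method for abnormal behavior detection", Opto-Electronic Engineering (光电工程), vol. 41, no. 3, pages 43 - 48 *
蔡剑平;雷蕴奇;陈明明;王宁;张双越;: "An efficient solving algorithm for the SVD recommendation model with implicit feedback", Scientia Sinica Informationis (中国科学:信息科学), no. 10, pages 122 - 136 *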

Also Published As

Publication number Publication date
CN113553232B (en) 2023-12-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant