CN111694879B - Multielement time sequence abnormal mode prediction method and data acquisition monitoring device - Google Patents

Multielement time sequence abnormal mode prediction method and data acquisition monitoring device

Info

Publication number
CN111694879B
Authority
CN
China
Prior art keywords
abnormal
subsequence
time sequence
algorithm
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010439838.6A
Other languages
Chinese (zh)
Other versions
CN111694879A (en
Inventor
王玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202010439838.6A priority Critical patent/CN111694879B/en
Publication of CN111694879A publication Critical patent/CN111694879A/en
Application granted granted Critical
Publication of CN111694879B publication Critical patent/CN111694879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Abstract

The invention provides a multivariate time series abnormal pattern prediction method and a data acquisition monitoring device. The method comprises the following steps: according to the natural neighbor principle, obtaining an optimal k value for the MMOD algorithm from historical data; extending the MMOD algorithm to operate online, so that abnormal patterns of the multivariate time series are identified online; and converting multivariate time series subsequences into an observation sequence by means of an incremental fuzzy adaptive clustering algorithm, constructing a hidden Markov model from all observation sequences using the Baum-Welch algorithm, and predicting abnormal patterns of the multivariate time series online with the constructed hidden Markov model. Through the multivariate time series data acquisition system of the cloud platform, the data to be mined are obtained more readily, and abnormal patterns of the multivariate time series can be predicted in real time using the online density-difference anomaly detection algorithm together with the Markov prediction model. A monitoring system APP is also constructed to facilitate real-time monitoring.

Description

Multielement time sequence abnormal mode prediction method and data acquisition monitoring device
Technical Field
The invention relates to the technical fields of abnormal mode identification and prediction systems, cloud platforms and monitoring systems, in particular to a multi-element time sequence abnormal mode prediction method and a data acquisition and monitoring device.
Background
Data mining is a technology that emerged with the development of artificial intelligence and database technology. It aims to extract, from large volumes of fuzzy, random practical application data, information and knowledge that is hidden in the data, previously unknown, and potentially useful.
Anomaly detection is an important topic in data mining, is widely applied in many fields, and has long been a focus of research. The multivariate time series is a type of complex data commonly encountered in data mining, and research on it mainly covers discretization, similarity measurement, and anomaly detection. A multivariate time series often contains special subsequences whose behavior deviates from that of most other subsequences; these rarely occurring subsequences are called multivariate time series abnormal patterns. Although they account for only a small proportion of the data, abnormal patterns tend to carry more useful information than normal patterns and are of greater research value. For example, in the medical field, electrocardiographic data are analyzed to determine quickly whether a patient's heart rate is abnormal and thus whether disease is present. In network intrusion detection, the communication traffic of network nodes is monitored; when the traffic in some time period is abnormal, it can be reported promptly, and the abnormal segment is then examined for signs of intrusion. In the industrial field, the performance of industrial components such as engines, turbines, oil flowing in pipes, or other mechanical parts is monitored to detect damage caused by continuous use and normal wear.
Abnormal patterns often carry a large amount of implicit information, including the causes of an abnormality and the regularities of its occurrence. By studying historical abnormal patterns, upcoming abnormalities can be predicted and prevented in advance. In recent years, with the deployment of large numbers of data collection tools, chiefly sensors, the volume of data from fields such as meteorology, hydrology, medicine, and industrial sites has grown rapidly, and how to identify and predict abnormal patterns in such continuously changing data is a problem that urgently needs to be solved.
Anomaly detection is one of the basic research problems in data mining. To date there is no generally accepted definition of an anomaly; the definition given by Hawkins is commonly adopted: an anomaly is data that deviates so much from most of the data in the dataset as to arouse suspicion that it was generated by a different mechanism. The present invention adopts Hawkins' definition.
Early anomaly detection methods mainly target unordered data sets and can be broadly classified into statistical, distance-based, density-based, and clustering-based methods. Statistical methods first assume a distribution model for the whole data set and then identify low-probability events as anomalies; distance-based methods first calculate the distance between each data object and its nearest neighbors and then flag the objects that lie far from their nearest neighbors; density-based methods first estimate the density of each data object and then flag objects with low density; clustering-based methods first cluster the data and then flag data far from the cluster centers, or clusters containing only a few data objects. The multivariate time series subsequences obtained after segmentation can be regarded as such an unordered data set.
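As an illustration of the distance-based family described above, the following is a minimal sketch, not taken from the patent, that scores each object by the mean distance to its k nearest neighbors. The function name `knn_distance_scores` and the sample points are assumptions made for the example.

```python
import math

def knn_distance_scores(points, k):
    """Mean distance from each point to its k nearest neighbours (self excluded)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Four points on a unit square plus one point far away (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
# The object with the largest score is the most isolated one.
most_isolated = max(range(len(points)), key=scores.__getitem__)
```

Objects far from their neighbors receive large scores; a threshold on the score then separates normal from abnormal objects.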
Multivariate time series are an important class of data in a wide variety of practical applications and have attracted tremendous attention over recent decades. Many researchers have extended anomaly detection methods for ordinary data to multivariate time series. One approach uses principal component analysis to reduce the dimensionality of the multivariate time series and obtain a pattern representation, and then applies the local outlier factor (LOF) detection algorithm to identify the M subsequences with the largest local outlier factors as abnormal patterns. Although this method can detect both global and local anomalies, its computation is complex, and it is not suitable for online detection of multivariate time series abnormal patterns. Another study clusters the segmented multivariate time series, calculates an anomaly score for each segment from the clustering result using the CBLOF (Cluster-Based Local Outlier Factor) algorithm, and marks segments whose anomaly score exceeds a set threshold as abnormal patterns. The effectiveness of this approach depends heavily on the clustering method, so selecting a suitable clustering method remains an important issue. A further method uses the in-cluster points with the best clustering quality as period point markers to segment the multivariate time series, extracts feature values from each periodic subsequence, and recognizes abnormal periodic subsequences with a normal/abnormal classifier built from the features of labeled periodic subsequences. This is a supervised algorithm that can identify only periodic anomalies, detects non-periodic anomalies poorly, and is not suitable for online detection of multivariate time series abnormal patterns.
An adaptive multivariate time series subsequence detection algorithm has also been proposed. It takes the mean of the k-nearest-neighbor distances of a subsequence as its anomaly score and then decides, based on fuzzy membership, whether subsequences with different anomaly scores are abnormal, thereby determining the threshold indirectly and automatically. Although the threshold is determined automatically, the detection result is sensitive to the choice of k. Another work proposes a semi-supervised anomaly detection framework that first reduces the dimensionality of the multivariate time series, sparsely codes the reduced data, constructs a sparse-coding feature matrix, and calculates anomaly scores; a score greater than a threshold is considered an anomaly. This algorithm can identify only point anomalies of a multivariate time series, not pattern anomalies. Yet another work builds the multivariate time series into a graph whose nodes represent subsequences or data points, and obtains the similarity of nodes from the weights of the associated edges: high similarity indicates normal and low similarity indicates abnormal. Although this method can identify point anomalies and subsequence anomalies, estimating the edge weights takes considerable time, so it is inefficient.
In summary, how to predict the abnormal pattern of the multivariate time series on line becomes a difficult problem to be solved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a multi-element time sequence abnormal mode prediction method and a data acquisition monitoring device so as to solve the problem of on-line abnormal mode prediction of a multi-element time sequence.
In order to solve the technical problems, the invention provides the following technical scheme:
a method of multivariate time series anomaly mode prediction, the method comprising:
according to the principle of natural neighbor, acquiring an optimal k value of an outlier detection algorithm MMOD estimated by density data based on historical data of a multivariate time sequence, so as to configure the optimal k value for the MMOD algorithm;
performing online expansion on the MMOD algorithm, and detecting abnormal modes of the multi-element time sequence according to the configured optimal k value, so as to realize online identification of the abnormal modes of the multi-element time sequence based on the MMOD algorithm;
and converting the subsequence of the multi-element time sequence into the observation sequence according to the increment fuzzy self-adaptive clustering algorithm, constructing a hidden Markov model of the multi-element time sequence based on the Baum-Welch algorithm and all the observation sequences, and realizing online prediction of the abnormal mode of the multi-element time sequence based on the constructed hidden Markov model.
Further, the on-line identification of the multivariate time series exception mode based on the MMOD algorithm comprises the following steps:
detecting whether a new multi-element time sequence subsequence is generated in real time;
if a new multi-element time sequence subsequence is generated, calculating an abnormality score of the current multi-element time sequence subsequence on line, and comparing the magnitude relation between the current abnormality score and a preset threshold in real time;
if the current abnormality score is greater than the preset threshold, the current multivariate time sequence subsequence is in an abnormal mode.
Further, the method for realizing the online prediction of the multivariate time series anomaly mode based on the constructed hidden Markov model comprises the following steps:
judging the observation state of the current multielement subsequence in real time based on an increment fuzzy self-adaptive clustering algorithm;
based on the observation state sequence of the current multi-element subsequence, predicting the hidden state of the next multi-element subsequence through a hidden Markov model.
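The observation-state step can be sketched as follows. The incremental fuzzy adaptive clustering algorithm itself is not specified in this passage, so the sketch assumes that the distances from each variable's subsequence to the current cluster centers are already available and uses the standard fuzzy c-means membership formula as a stand-in. The names `memberships`, `symbolize`, and `ObservationAlphabet` are hypothetical.

```python
def memberships(distances, m=2.0):
    """Fuzzy c-means style memberships from distances to each cluster centre.

    ASSUMPTION: stand-in for the incremental fuzzy adaptive clustering step.
    """
    if any(d == 0 for d in distances):
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_l) ** power for d_l in distances)
            for d_j in distances]

def symbolize(memberships_per_variable):
    """Maximum-membership principle: one cluster label per variable."""
    return tuple(max(range(len(u)), key=u.__getitem__)
                 for u in memberships_per_variable)

class ObservationAlphabet:
    """Maps each distinct symbol tuple to an integer observation state."""

    def __init__(self):
        self.index = {}

    def observe(self, symbol):
        # A previously unseen symbol tuple becomes a new observation state.
        return self.index.setdefault(symbol, len(self.index))

alphabet = ObservationAlphabet()
u = [memberships([1.0, 3.0]), memberships([2.0, 0.5])]  # two variables
state = alphabet.observe(symbolize(u))
```

Each distinct tuple of per-variable labels is treated as one observation of the multivariate series, so the size of the alphabet grows only when a new combination appears.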
Further, the method for obtaining the optimal k value of the outlier detection algorithm MMOD estimated by the density data based on the historical data of the multivariate time sequence according to the principle of natural neighbor comprises the following steps:
S1, initializing $sup_k = 1$ and $nb_i = 0$;
S2, searching the $sup_k$ nearest-neighbor subsequences of each subsequence, using $nb_i$ to record the number of natural neighbor subsequences of the i-th subsequence, and storing the inverse neighbor subsequences of the i-th subsequence;
S2.1, counting the subsequences whose natural neighbor set is empty, i.e. those with $nb_i = 0$;
S2.2, if this count has remained unchanged over the last three searches, going to S3; otherwise setting $sup_k = sup_k + 1$ and returning to S2;
S3, determining k as the maximum inverse neighbor number under the $(sup_k - 2)$-neighbor search, i.e. $k = \max_i nb_i$ evaluated at $sup_k - 2$.
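Steps S1 to S3 can be sketched as below. This is one reading of the procedure, under these assumptions: subsequences are represented as fixed-length feature vectors, Euclidean distance stands in for the pattern distance, $nb_i$ counts how many other subsequences have subsequence i among their $sup_k$ nearest neighbors, and the fallback for very small data sets is an addition of the sketch. The function name `adaptive_k` is hypothetical.

```python
import math

def adaptive_k(subsequences, dist=math.dist):
    """Natural-neighbour search for k (sketch of steps S1 to S3)."""
    n = len(subsequences)
    if n < 2:
        raise ValueError("need at least two subsequences")
    # Neighbour order of every subsequence, nearest first, self excluded.
    order = [sorted((j for j in range(n) if j != i),
                    key=lambda j: dist(subsequences[i], subsequences[j]))
             for i in range(n)]
    nb = [0] * n                 # S1: inverse-neighbour counts start at 0
    history = []                 # (isolated count, max nb) for each sup_k
    for sup_k in range(1, n):    # S2: widen the neighbourhood step by step
        for i in range(n):
            nb[order[i][sup_k - 1]] += 1
        isolated = sum(1 for c in nb if c == 0)      # S2.1
        history.append((isolated, max(nb)))
        # S2.2: isolated count unchanged over three consecutive searches
        if len(history) >= 3 and \
                history[-1][0] == history[-2][0] == history[-3][0]:
            return history[-3][1]                    # S3: value at sup_k - 2
    # ASSUMPTION: fallback when the count never stabilises (tiny data sets).
    return history[-1][1]

data = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
k = adaptive_k(data)
```

The search cost grows with both the number of subsequences and the final $sup_k$, which is acceptable here because the search runs once on historical data rather than online.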
Further, the method for realizing online identification of the multivariate time series abnormal mode based on the MMOD algorithm further comprises the following steps:
for a newly arrived multivariate time series subsequence $x_t$, when $x_t$ falls within the k-nearest-neighbor patterns of a historical multivariate time series subsequence $x_i$, i.e. $d(x_t, x_i) < \delta_k(x_i)$, updating the abnormality score of the historical subsequence $x_i$; and storing the k-neighbor distances of each multivariate time series subsequence;
wherein $d(x_t, x_i)$ is the distance between the subsequence $x_t$ and the historical subsequence $x_i$, and $\delta_k(x_i)$ is the distance between the subsequence $x_i$ and its k-th neighbor pattern.
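The update rule above means that only historical subsequences whose k-neighbor set actually changes need a new abnormality score. A minimal sketch of that bookkeeping follows; the class name `OnlineKnnStore`, the use of feature vectors, and Euclidean distance are assumptions, and the score computation itself is left to the caller.

```python
import bisect
import math

class OnlineKnnStore:
    """Keeps the sorted k-neighbour distances of every stored subsequence."""

    def __init__(self, k, dist=math.dist):
        self.k = k
        self.dist = dist
        self.items = []   # stored subsequences (feature vectors in this sketch)
        self.knn = []     # knn[i] = sorted distances to the k nearest neighbours

    def add(self, x_t):
        """Insert x_t and return indices of items whose k-NN set changed."""
        changed, new_dists = [], []
        for i, x_i in enumerate(self.items):
            d = self.dist(x_t, x_i)
            new_dists.append(d)
            neigh = self.knn[i]
            # neigh[-1] plays the role of delta_k(x_i) once the list is full.
            if len(neigh) < self.k or d < neigh[-1]:
                bisect.insort(neigh, d)
                del neigh[self.k:]
                changed.append(i)
        self.items.append(x_t)
        self.knn.append(sorted(new_dists)[:self.k])
        return changed

store = OnlineKnnStore(k=1)
for x in [(0.0,), (10.0,), (1.0,)]:
    to_rescore = store.add(x)   # rescore only these historical subsequences
```

Storing the sorted distances makes the test $d(x_t, x_i) < \delta_k(x_i)$ a single comparison per historical subsequence.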
Further, the conversion from the multi-element time sequence subsequence to the observation sequence is realized according to the incremental fuzzy self-adaptive clustering algorithm, and a hidden Markov model of the multi-element time sequence is constructed based on the Baum-Welch algorithm and all the observation sequences, and the method comprises the following steps:
Clustering the subsequences of each variable of the multi-element time sequence according to an increment fuzzy self-adaptive clustering algorithm, and classifying and symbolizing the subsequences according to a maximum membership principle; each different symbol is considered as an observation of the multivariate time series; thereby realizing the conversion from the multi-element time sequence subsequence to the observation sequence through an increment fuzzy self-adaptive clustering algorithm;
calculating a density estimation abnormal value of each historical multi-element time sequence subsequence, judging the size of the density estimation abnormal value and a preset threshold value, and if the density estimation abnormal value is smaller than the preset threshold value, judging the density estimation abnormal value to be in a normal mode, otherwise, judging the density estimation abnormal value to be in an abnormal mode;
if a new observation state is generated, initializing the hidden Markov model using the current observation sequence and the abnormal pattern detection results, wherein the matrix Π is obtained by the following formula, BB is randomly assigned, and AA takes the average value;
wherein $N_{normal}$ and $N_{abnormal}$ are the numbers of normal patterns and abnormal patterns, respectively;
a hidden markov model constructed based on the Baum-Welch algorithm and all observed sequences.
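The initialization step can be sketched as below. The formula for Π is not reproduced in this text, so the sketch assumes it is the relative frequency of normal and abnormal patterns; "taking the average value" for AA is read as a uniform transition matrix. Both readings, and the function name `init_hmm`, are assumptions.

```python
import random

def init_hmm(n_normal, n_abnormal, n_obs, seed=0):
    """Initial (AA, BB, pi) for two hidden states: 0 = normal, 1 = abnormal."""
    total = n_normal + n_abnormal
    # ASSUMPTION: pi as relative frequencies of the detected pattern labels.
    pi = [n_normal / total, n_abnormal / total]
    # ASSUMPTION: "average value" read as equal transition probabilities.
    AA = [[0.5, 0.5], [0.5, 0.5]]
    rng = random.Random(seed)
    BB = []
    for _ in range(2):
        row = [rng.random() + 1e-6 for _ in range(n_obs)]  # strictly positive
        s = sum(row)
        BB.append([v / s for v in row])                    # rows sum to 1
    return AA, BB, pi

AA, BB, pi = init_hmm(n_normal=90, n_abnormal=10, n_obs=4)
```

Keeping every entry of BB strictly positive avoids zero probabilities that later re-estimation could never recover from.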
Further, the expression of the hidden Markov model constructed based on the Baum-Welch algorithm and all observed sequences is as follows:
$\lambda = (AA, BB, \Pi)$
$\Pi = [\Pi_1(\text{normal}), \Pi_2(\text{abnormal})]$
wherein $AA = [a_{ij}]$ $(1 \le i, j \le N)$ and $BB = [b_{ik}]$ $(1 \le i \le N,\ 1 \le k \le M)$ are the state transition matrix and the emission matrix, respectively; $a_{ij} = P[i_{t+1} = q_j \mid i_t = q_i]$ is the probability that the (t+1)-th multivariate time series segment is in hidden state $q_j$ given that the t-th segment is in hidden state $q_i$; $b_{ik} = P[o_t = v_k \mid i_t = q_i]$ is the probability that the observed state is $v_k$ given that the t-th segment is in hidden state $q_i$; $\Pi = [\Pi_i]$ $(1 \le i \le N)$ is the initial hidden state matrix, where $\Pi_i = P[i_1 = q_i]$ is the probability that the hidden state of the first segment is $q_i$.
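For reference, the Baum-Welch re-estimation of $\lambda = (AA, BB, \Pi)$ can be sketched for a single sequence of observation indices as follows. This is the textbook scaled forward-backward form, not code from the patent; it applies no smoothing, so it assumes every hidden state keeps non-zero posterior mass. The function name `baum_welch` and the sample values are assumptions.

```python
def baum_welch(obs, AA, BB, pi, n_iter=10):
    """Re-estimate (AA, BB, pi) from one sequence of observation indices."""
    N, M, T = len(AA), len(BB[0]), len(obs)
    for _ in range(n_iter):
        # Scaled forward pass.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                row = [pi[i] * BB[i][obs[0]] for i in range(N)]
            else:
                row = [sum(alpha[t - 1][i] * AA[i][j] for i in range(N))
                       * BB[j][obs[t]] for j in range(N)]
            c = sum(row)
            scale.append(c)
            alpha.append([v / c for v in row])
        # Scaled backward pass.
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for i in range(N):
                beta[t][i] = sum(AA[i][j] * BB[j][obs[t + 1]] * beta[t + 1][j]
                                 for j in range(N)) / scale[t + 1]
        # Posterior state and transition probabilities.
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        xi = [[sum(alpha[t][i] * AA[i][j] * BB[j][obs[t + 1]] * beta[t + 1][j]
                   / scale[t + 1] for t in range(T - 1))
               for j in range(N)] for i in range(N)]
        # Re-estimation.
        pi = gamma[0][:]
        AA = [[xi[i][j] / sum(xi[i]) for j in range(N)] for i in range(N)]
        BB = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
               / sum(gamma[t][i] for t in range(T)) for k in range(M)]
              for i in range(N)]
    return AA, BB, pi

obs = [0, 0, 1, 0, 0, 1, 1, 0]
AA, BB, pi = baum_welch(obs, [[0.7, 0.3], [0.4, 0.6]],
                        [[0.8, 0.2], [0.3, 0.7]], [0.6, 0.4], n_iter=5)
```

Scaling each forward row by its sum keeps the recursion numerically stable on long observation sequences, at the cost of storing one scale factor per time step.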
Further, the method for realizing the online prediction of the multivariate time series anomaly mode based on the constructed hidden Markov model comprises the following steps:
using the Viterbi algorithm, finding the hidden state $i$ of the t-th subsequence that has the maximum probability over all possible paths $\{i_1, i_2, \ldots, i_{t-1}, i\}$ for the current observation state sequence of multivariate time series subsequences, and predicting, based on the state transition matrix AA, whether the next subsequence is abnormal using the following equation, wherein t is the length of the observation sequence:
when a new multi-element time sequence subsequence is generated, estimating a cluster to which the newly generated multi-element time sequence fragment belongs, and converting the newly generated multi-element time sequence subsequence into a corresponding observation state; calculating the abnormal score of the newly generated multi-element time sequence subsequence, updating the abnormal score of the subsequence with the changed k neighbor set, and judging whether the current multi-element time sequence subsequence is abnormal or not; and initializing an initial state of the hidden Markov model by using the current observation sequence and the abnormal mode detection result when generating a new observation state.
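The prediction step can be sketched as below. The prediction equation is not reproduced in this text, so the sketch assumes the simplest rule consistent with the description: take the most probable hidden state of the t-th subsequence from the Viterbi recursion and choose the next state with the largest transition probability from it. Hidden state 0 is taken as normal and 1 as abnormal; the function names and model values are assumptions.

```python
def viterbi_last_state(obs, AA, BB, pi):
    """Final hidden state of the most probable path for the observations."""
    N = len(AA)
    delta = [pi[i] * BB[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        delta = [max(delta[i] * AA[i][j] for i in range(N)) * BB[j][o]
                 for j in range(N)]
    return max(range(N), key=delta.__getitem__)

def predict_next_state(obs, AA, BB, pi):
    """ASSUMED rule: next state = most probable transition from the last state."""
    i = viterbi_last_state(obs, AA, BB, pi)
    return max(range(len(AA)), key=AA[i].__getitem__)

# Illustrative model: state 0 = normal, state 1 = abnormal.
AA = [[0.9, 0.1], [0.3, 0.7]]
BB = [[0.8, 0.2], [0.1, 0.9]]
pi = [0.9, 0.1]
next_state = predict_next_state([1, 1, 1], AA, BB, pi)
```

For long observation sequences the products in `delta` shrink quickly, so a practical version would work with log-probabilities instead.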
Correspondingly, in order to solve the technical problems, the invention also provides the following technical scheme:
the data acquisition monitoring device comprises a data acquisition module and a cloud platform centralized monitoring module; wherein,
the data acquisition module is used for acquiring monitoring data of the abnormal state of the local multi-element time sequence in real time, and realizing local data monitoring, historical data sampling and storage and uploading of preset real-time data to the cloud platform centralized monitoring system; the monitoring data are obtained by predicting the local multi-element time sequence abnormal state through the multi-element time sequence abnormal mode prediction method;
the cloud platform centralized monitoring module is used for acquiring real-time monitoring data so as to monitor the data condition.
Further, the data acquisition monitoring device also comprises a data service center;
the data service center uses the real-time history database to store real-time data of the production process and provide retrieval service, and uses the business SQL database to store static data of the business process and provide retrieval service.
The technical scheme of the invention has the following beneficial effects:
To address the problems that the MMOD anomaly detection algorithm, which is based on density-peak data density estimation, cannot process multivariate time series and requires its parameters to be set manually, the invention provides an online density-difference anomaly detection algorithm. A hidden Markov model is introduced: a Markov prediction model is constructed from the observation sequences and the hidden state sequences of historical patterns, so that abnormal patterns of the multivariate time series are predicted in real time.
In addition, the invention also constructs a monitoring system APP, which is convenient for real-time monitoring and is simple and applicable. The three parts of the whole invention form a complete multi-element time sequence abnormality analysis monitoring system and device, and provide a beneficial reference for the research and development of the related fields.
Drawings
FIG. 1 is a flowchart of online anomaly mode prediction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data acquisition system according to an embodiment of the present invention;
FIG. 3 is a diagram of a cloud platform data link scheme and an access network structure provided by an embodiment of the present invention;
FIG. 4a is a diagnostic chart of an adaptive MMOD abnormal pattern provided by an embodiment of the present invention;
FIG. 4b is a diagram of an abnormal pattern tag according to an embodiment of the present invention;
FIG. 4c is a diagnostic chart of an abnormal pattern of LOF provided by an embodiment of the present invention;
FIG. 4d is a graph of the algorithm F provided by the embodiment of the present invention;
FIG. 4e shows the accuracy of abnormal pattern recognition at different values of k of MMOD according to the embodiment of the present invention;
FIG. 4f is a diagram showing the number of online LOF and online adaptive MMOD mode anomaly score updates provided by an embodiment of the present invention;
FIG. 4g is a graph of HMM-based prediction results provided by an embodiment of the present invention;
FIG. 4h is a diagnostic chart of an actual abnormal pattern provided by an embodiment of the present invention;
FIG. 4i is a diagram of LSTM-based prediction results provided by an embodiment of the present invention;
FIG. 4j is a graph showing the online response time of the combination algorithm for online anomaly pattern recognition and prediction according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
Multivariate time series anomalies include point anomalies, pattern anomalies, and sequence anomalies: a point that deviates from most points in the sequence is a point anomaly; a subsequence that deviates from most subsequences in the sequence is a pattern anomaly; and a sequence that deviates from most sequences in a sequence set is a sequence anomaly. The invention mainly studies pattern anomalies of multivariate time series. It improves the MMOD algorithm by adaptively obtaining the neighbor number k from historical data and extending the algorithm to operate online, yielding an online adaptive MMOD abnormal pattern detection algorithm, and it realizes online prediction of multivariate time series abnormal patterns based on a hidden Markov model. The method detects in real time whether a new multivariate time series subsequence has been generated; if so, it calculates the anomaly score of the current subsequence online and compares it with a threshold in real time, and if the score is greater than the threshold, the current subsequence is considered an abnormal subsequence, that is, an abnormal pattern. The observation state of the current multivariate subsequence is determined in real time by fuzzy clustering, and the next hidden state is predicted from the current observation state sequence.
To address the sensitivity of the traditional MMOD algorithm's detection results to the k-neighbor value, an adaptive MMOD density detection is provided that automatically obtains the optimal k value from historical data. To address the inability of traditional multivariate time series anomaly detection algorithms to identify abnormal patterns online, the MMOD algorithm is extended to operate online. A hidden Markov model is introduced: a Markov prediction model is constructed from the observation sequences and the hidden state sequences of historical patterns, so that abnormal patterns of the multivariate time series are predicted in real time.
The data acquisition system can better acquire relevant data to be mined, and the data acquisition system is applied to a multivariate time sequence anomaly analysis system, so that the anomaly state can be evaluated and further diagnosis can be made. The proposed algorithm enables real-time prediction of abnormal patterns of a multivariate time series. Finally, a monitoring system APP is constructed, so that real-time monitoring is facilitated. The data acquisition system, the abnormal mode prediction algorithm and the monitoring APP are mutually connected to form a complete system. The following describes the invention in more detail by means of specific examples.
First embodiment
Referring to fig. 1 to fig. 4j, the present embodiment provides a method for predicting a multivariate time series abnormal mode, which is used for a system and a device for performing multivariate time series abnormal state analysis, and provides real-time, accurate and unified state information through a cloud platform system. Aiming at the problems that an MMOD anomaly detection algorithm based on data density estimation of density peaks cannot handle a multi-element time sequence and parameters are required to be set manually, an online density difference anomaly detection algorithm is provided. And the embodiment verifies the effectiveness of the online multivariate time series abnormal mode prediction algorithm on the multivariate time series abnormal state data set. Finally, a new multi-element time sequence abnormal mode prediction method is provided in the whole process, and more concrete reference comments are provided for analysis and decision-making of related multi-element time sequence abnormal states. The monitoring system APP is mainly used for inquiring and displaying related information, and is updated and modified online, so that the monitoring by management personnel in real time is facilitated. The following is a specific description:
Method for identifying and predicting abnormal modes of multi-element time sequence
The multi-element time sequence abnormality comprises point abnormality, mode abnormality and sequence abnormality, and if the points in the sequence deviate from most points, the points are abnormal; if the subsequence in the sequence deviates from most subsequences, the pattern is abnormal; if the sequence deviates from most of the sequences in the sequence set, the sequence is abnormal.
The present embodiment selects a local anomaly factor recognition algorithm to detect an anomaly pattern of a multivariate time series. The local anomaly factor recognition algorithm recognizes an anomaly pattern by calculating an anomaly score for each pattern. The outlier score for each pattern is related to the k neighbor pattern distances for each pattern. Compared with the widely applied LOF local anomaly detection algorithm, the outlier detection algorithm MMOD for density data estimation has less influence on historical data when in online expansion, and the anomaly detection accuracy is higher than that of LOF. However, the detection accuracy is also affected by the neighbor number k, so that the embodiment improves the MMOD according to the problem, adaptively obtains the neighbor number k based on historical data, online expands the neighbor number k, provides an online adaptive MMOD abnormal mode detection algorithm, and realizes online prediction of a multi-element time sequence abnormal mode based on a hidden Markov model.
1.1 MMOD anomaly identification algorithm
The MMOD abnormality detection algorithm is a new abnormality detection algorithm based on local abnormality factors. Aiming at the problem of high computational complexity of the traditional local anomaly detection algorithm, the algorithm provides a local anomaly detection algorithm based on a density peak value. Unlike conventional density-based anomaly detection algorithms, it estimates the local density of each mode from the kernel accumulated value of the k-nearest neighbor set of each mode, as shown in equation 1:
wherein $\delta_k(p)$ is the distance between pattern $p$ and its k-th neighbor pattern, and $d(p, x_i)$ is the distance between pattern $p$ and its i-th neighbor pattern $x_i$. The distance between patterns is calculated with Li Zhengxin's multivariate time series pattern distance method (the subsequences are first compressed, and the distance between the compressed subsequences is then calculated based on DTW). The abnormality score of pattern $p$ is calculated by the following formula:
MMOD(p)=1-M(p) (2)
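The scoring rule MMOD(p) = 1 - M(p) can be illustrated with a minimal sketch. Equation (1) is not legible in this text, so the kernel below is an assumption: M(p) is taken as the mean Gaussian kernel value over the k nearest neighbors of p, with a shared bandwidth equal to the median k-th neighbor distance. The function name `mmod_scores` and that bandwidth choice are hypothetical and may differ from the patented formula.

```python
import numpy as np

def mmod_scores(dist, k):
    """Return (scores, delta_k) for a symmetric (n, n) pattern distance matrix.

    ASSUMPTION: M(p) is the mean Gaussian kernel value over the k nearest
    neighbors of p, with one shared bandwidth h (median k-th neighbor
    distance). The source's equation (1) may use a different kernel.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Drop the diagonal so that a pattern is never its own neighbor.
    off_diag = dist[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    knn = np.sort(off_diag, axis=1)[:, :k]     # distances to the k nearest neighbors
    delta_k = knn[:, -1]                       # distance to the k-th neighbor
    h = np.median(delta_k)
    if h == 0:                                 # all patterns coincide; avoid dividing by zero
        h = 1.0
    density = np.exp(-(knn / h) ** 2).mean(axis=1)   # M(p), in (0, 1]
    return 1.0 - density, delta_k              # MMOD(p) = 1 - M(p)

# Four clustered 1-D patterns and one isolated pattern.
pts = np.array([0.0, 0.1, 0.2, 0.3, 5.0])
dist = np.abs(pts[:, None] - pts[None, :])
scores, delta_k = mmod_scores(dist, k=2)
print(int(np.argmax(scores)))                  # index of the most anomalous pattern
```

In practice `dist` would hold the compressed-subsequence DTW distances described above rather than absolute differences of scalars.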
1.2 MMOD-based on-line abnormal pattern recognition algorithm
MMOD is a local anomaly factor detection algorithm, and its data density estimate is strongly affected by the value of k. When the number of nearest neighbors k is set improperly, a normal pattern may be identified as abnormal, or an abnormal pattern as normal. Without prior knowledge a suitable k is difficult to set, so the invention provides an adaptive MMOD algorithm based on natural neighbors. Because a multivariate time series is unbounded, the MMOD algorithm is also extended online to realize online identification of abnormal patterns of the multivariate time series.
1.2.1 automatic acquisition of k-value based on historical data
In 2016, Jinlong H. introduced the concept of natural neighbors and proposed an algorithm that automatically obtains a good k value from the natural neighbors, which avoids the inaccuracy of local outlier factor detection caused by a poorly chosen, manually set k. This embodiment adopts the idea of natural neighbors and obtains the optimal k value of MMOD from historical data, so that the MMOD algorithm is configured with that k value.
The natural neighbor concept holds that an isolated pattern should have the fewest natural neighbors, whereas a non-isolated pattern in a denser region should have more. An isolated pattern is defined as a pattern whose number of natural neighbors is 0 under the k-neighbor condition. If the number of isolated patterns remains unchanged across the k-neighbor, (k+1)-neighbor and (k+2)-neighbor searches, the maximum inverse neighbor number in the k-neighbor search result is taken as the adaptive k value for local anomaly detection. The specific algorithm comprises the following steps:
Input: multivariate time series subsequences
Output: adaptive k value
Step1: Initialize sup_k = 1 and nb_i = 0.
Step2: Search for the sup_k-th neighbor subsequence (pattern) of each subsequence, use nb_i to record the natural neighbors of the i-th subsequence, and store the inverse neighbor subsequences of the i-th subsequence.
Step2.1: Count the subsequences whose natural neighbor set is empty.
Step2.2: If this count has remained unchanged over the consecutive searches described above, go to Step3; otherwise set sup_k = sup_k + 1 and go to Step2.
Step3: k is the maximum inverse neighbor number under the (sup_k − 2)-neighbor search.
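The steps above can be sketched as follows. This is one reading of the garbled procedure, not a definitive implementation: the search stops when the number of isolated patterns (patterns with no inverse neighbor) is the same in three consecutive rounds, and the returned k is the largest inverse neighbor count recorded at round sup_k − 2. The function name `adaptive_k` and the use of a precomputed distance matrix are assumptions.

```python
import numpy as np

def adaptive_k(dist):
    """Adaptive k from a natural-neighbor search over an (n, n) distance matrix."""
    d = np.asarray(dist, dtype=float).copy()
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)                # a pattern is not its own neighbor
    order = np.argsort(d, axis=1)              # neighbors of each pattern, nearest first
    nb = np.zeros(n, dtype=int)                # inverse neighbor count of each pattern
    isolated, max_nb = [], []                  # history of both quantities per round
    for sup_k in range(1, n):
        # Round sup_k: every pattern registers with its sup_k-th nearest neighbor.
        np.add.at(nb, order[:, sup_k - 1], 1)
        isolated.append(int(np.sum(nb == 0)))  # patterns with no inverse neighbor yet
        max_nb.append(int(nb.max()))
        # ASSUMED stopping rule: isolated count unchanged over three rounds.
        if sup_k >= 3 and isolated[-1] == isolated[-2] == isolated[-3]:
            return max_nb[-3]                  # result of the (sup_k - 2)-neighbor search
    return max_nb[-1]                          # fallback when the rule never triggers

rng = np.random.default_rng(0)
points = rng.normal(size=(30, 2))              # stand-in for 30 historical patterns
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
k = adaptive_k(dist)
print(k)
```

The random points only stand in for historical patterns; with real data, `dist` would come from the DTW-based pattern distance.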
1.2.2 on-line updating of anomaly scores
Assume that a pattern, i.e., a multivariate time series subsequence, newly arrives at time t. For convenience of description the new pattern is denoted x_t, and the history pattern set is {x_i | i = 1, …, count}, where count is the number of patterns currently stored.
Calculate the distance between pattern x_t and each element of the history pattern set {x_i | i = 1, …, count}, and compare d(x_t, x_i) with δ_k(x_i). If d(x_t, x_i) < δ_k(x_i), the k-nearest-neighbor set of pattern x_i has changed, and the anomaly score of pattern x_i must be updated to remain accurate. If pattern x_t is a k-nearest neighbor of pattern x_i, then δ_k(x_i) = δ_{k−1}(x_i). Let the historical k-nearest-neighbor set of x_i be {x_ik′ | k′ = 1, …, k}; the updated k-nearest-neighbor set is then {x_ik′ | k′ = 1, …, k−1} ∪ {x_t}, and the density M(x_i) of pattern x_i should be updated accordingly. The derivation is as follows:
where M_old(x_i) is the density of x_i before x_t arrives, M_new(x_i) is the density of x_i after x_t arrives, kNN_old(x_i) is the k-nearest-neighbor pattern set of x_i before x_t arrives, and kNN_new(x_i) is the k-nearest-neighbor pattern set of x_i after x_t arrives. M_new(x_i) and M_old(x_i) then satisfy the following relationship:
As can be seen from formula (5), there is no fixed ordering between M_new(x_i) and M_old(x_i). When the new pattern x_t is a k-nearest neighbor of a history pattern x_i, the k-nearest-neighbor set of x_i changes and so does its anomaly characteristic M_new(x_i); the degree of anomaly of x_i must therefore be determined again and the anomaly detection result of the history pattern corrected. Thus, when x_t is a k-nearest neighbor of history pattern x_i, the anomaly scores of the history patterns x_i with d(x_t, x_i) < δ_k(x_i) are updated. To avoid searching the neighbors of every history pattern x_i again, the k-nearest-neighbor distance of each pattern is stored, so that when a new pattern x_t arrives only the k nearest neighbors of x_t need to be found.
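A minimal sketch of this bookkeeping is given below. It assumes each history pattern keeps a sorted list of its k nearest-neighbor distances; the name `online_update` and that storage layout are hypothetical. The sketch recomputes the k-th neighbor distance from the updated list rather than relying on a closed form, and it only reports which history patterns need a new anomaly score.

```python
import numpy as np

def online_update(knn_dists, d_new, k):
    """Update stored neighbor distances when a new pattern x_t arrives.

    knn_dists: (count, k) array; row i holds the sorted distances from
               history pattern x_i to its k nearest neighbors.
    d_new:     (count,) array of distances d(x_t, x_i).
    Returns the (count + 1, k) updated array and the indices of history
    patterns whose k-nearest-neighbor set changed.
    """
    knn_dists = np.asarray(knn_dists, dtype=float)
    d_new = np.asarray(d_new, dtype=float)
    delta_k = knn_dists[:, -1]                     # stored k-th neighbor distances
    changed = np.flatnonzero(d_new < delta_k)      # d(x_t, x_i) < delta_k(x_i)
    updated = knn_dists.copy()
    for i in changed:
        # x_t replaces the former k-th neighbor of x_i.
        updated[i] = np.sort(np.append(updated[i, :-1], d_new[i]))
    new_row = np.sort(d_new)[:k]                   # k nearest neighbors of x_t itself
    return np.vstack([updated, new_row]), changed

stored = np.array([[1.0, 2.0], [1.0, 3.0], [0.5, 0.6]])
updated, changed = online_update(stored, d_new=[1.5, 5.0, 0.55], k=2)
print(changed)        # history patterns whose scores must be recomputed
```

Only the patterns listed in `changed`, plus the new pattern, need their scores recomputed, which is the O(N) scan the text describes.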
1.2.3 hidden Markov anomaly detection framework construction
A hidden markov model is a statistical model that considers that the current hidden state affects the next hidden state and that the probability of transition between hidden states does not change over time. If a pattern sequence of a multivariate time series exists, a hidden Markov model can be constructed, and prediction of an abnormal pattern of the multivariate time series can be realized.
With the incremental fuzzy adaptive clustering algorithm, the subsequences of each variable of the multivariate time series can be clustered, and each subsequence is classified and symbolized according to the maximum membership principle. It is assumed that subsequences with the same observation state fall into one cluster and subsequences with different observation states fall into different clusters. Fuzzy clustering thus converts the multivariate time series into an observation sequence. Since the subsequences of each variable can be clustered and symbolized, and the observation state of each variable's subsequence corresponds to its clustering result, the observation states of the individual variable subsequences jointly characterize the observation state of the multivariate time series subsequence.
Take the EEG eye state multivariate time series subsequences as an example. The EEG eye state data set contains three variables, AF3, F7 and FC5. AF3 is clustered into 3 classes, symbolized by the cluster labels A_1, A_2, A_3, so the time series subsequences of the AF3 attribute have three observation states, denoted A_1, A_2, A_3. F7 is clustered into 4 classes, symbolized by B_1, B_2, B_3, B_4, so the subsequences of the F7 attribute have four observation states, denoted B_1, B_2, B_3, B_4. FC5 is clustered into 3 classes, symbolized by C_1, C_2, C_3, so the subsequences of the FC5 attribute have three observation states, denoted C_1, C_2, C_3. The observation results are shown in table 1.
TABLE 1 EEG eye state observed states
The observation state of each variable subsequence of the multivariate time series can be obtained by the incremental fuzzy clustering algorithm, whereas the observation state of a multivariate time series subsequence is characterized jointly by the subsequences of the three variables. The EEG eye state data is again used as an example. Since AF3, F7 and FC5 contain 3, 4 and 3 observation states, respectively, the multivariate time series has at most 3×4×3 = 36 observation states. Table 2 gives the symbolized class label combinations of the variables for the first 20 subsequences of the EEG eye state data set, and table 3 gives the corresponding observation states of the first 20 patterns of the multivariate time series.
TABLE 2 EEG eye state classification results
TABLE 3 EEG eye state observed states
For the multivariate time series abnormal pattern prediction problem, assume that T multivariate time series segments Z_1, Z_2, …, Z_T are given, with corresponding observation sequence O = {o_1, o_2, …, o_T}. The observation sequence of the first 5 multivariate time series subsequences is {A_2B_3C_2, A_3B_2C_3, A_1B_1C_1, A_1B_1C_1, A_1B_1C_1}, so the observation state set of the first 5 observations is V = {v_1, v_2, v_3}, with v_1 = A_2B_3C_2, v_2 = A_3B_2C_3 and v_3 = A_1B_1C_1. When a new sequence segment is generated, its observation state is obtained through fuzzy clustering. That state may or may not already exist in the historical observation state set V; if it does not, it is added to V. For example, the observation state of the 6th multivariate time series subsequence is A_1B_4C_1. Since it is not in V = {v_1, v_2, v_3}, it is added, which gives the new observation state set V = {v_1, v_2, v_3, v_4} with v_4 = A_1B_4C_1, and the observation state sequence becomes {A_2B_3C_2, A_3B_2C_3, A_1B_1C_1, A_1B_1C_1, A_1B_1C_1, A_1B_4C_1}.
Suppose the observation state set of the given T multivariate time series segments (patterns) Z_1, Z_2, …, Z_T is V = {v_1, v_2, …, v_M}. Because each observation can only be normal or abnormal, there are only two hidden states, and the hidden state set corresponding to the multivariate time series patterns is Q = {q_1, …, q_N} (N = 2), with q_1 = normal and q_2 = abnormal (as shown in equation 7). The HMM parametric model can then be constructed as follows:
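The bookkeeping for the observation state set V can be sketched as follows, using the six label combinations from the example above. The function name `encode_observations` and the plain-string encoding of the symbols are illustrative assumptions.

```python
def encode_observations(label_rows, states=None):
    """Map per-variable cluster labels to indices into the observation state set V.

    label_rows: iterable of tuples such as ("A2", "B3", "C2").
    states:     existing list V of known observation states; new states are
                appended to it in order of first appearance.
    """
    states = [] if states is None else states
    index = {s: i for i, s in enumerate(states)}
    sequence = []
    for row in label_rows:
        symbol = "".join(row)              # e.g. "A2B3C2"
        if symbol not in index:            # unseen observation state: extend V
            index[symbol] = len(states)
            states.append(symbol)
        sequence.append(index[symbol])
    return sequence, states

first_five = [("A2", "B3", "C2"), ("A3", "B2", "C3"),
              ("A1", "B1", "C1"), ("A1", "B1", "C1"), ("A1", "B1", "C1")]
seq, V = encode_observations(first_five)
print(seq, V)      # [0, 1, 2, 2, 2] ['A2B3C2', 'A3B2C3', 'A1B1C1']

# The 6th subsequence introduces a new observation state v_4.
seq6, V = encode_observations([("A1", "B4", "C1")], V)
print(seq6, V)     # [3] ['A2B3C2', 'A3B2C3', 'A1B1C1', 'A1B4C1']
```

The integer sequence is the form in which the observation states would be passed to an HMM with M = len(V) symbols.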
λ=(AA,BB,Π) (6)
Π=[Π 1 (normal),Π 2 (abnormal)] (7)
where AA = [a_ij] (1 ≤ i, j ≤ N) and BB = [b_ik] (1 ≤ i ≤ N, 1 ≤ k ≤ M) are the state transition matrix and the emission matrix, respectively. a_ij = P[i_{t+1} = q_j | i_t = q_i] is the probability that the (t+1)-th multivariate time series segment is in hidden state q_j given that the t-th segment is in hidden state q_i; b_ik = P[o_t = v_k | i_t = q_i] is the probability that the observation state is v_k given that the t-th segment is in hidden state q_i; and Π = [Π_i] (1 ≤ i ≤ N) is the initial hidden state matrix, where Π_i = P[i_1 = q_i] is the probability that the hidden state of the 1st multivariate time series segment is q_i.
HMMs can solve many problems, including mainly the following:
1) Evaluation problem: given both the model and the observation sequence, find the probability P(O|λ) of the observation sequence.
2) Decoding problem: given both the model and the observation sequence, find the most likely hidden state sequence.
3) Learning problem: given the observation sequence, find the parameter λ that maximizes the output probability P(O|λ) of the observation sequence.
In order to realize the prediction of the multivariate time series abnormal mode, an HMM model needs to be established first, and the construction method of the HMM model comprises supervised learning and unsupervised learning. Since only the observation sequence is known, an unsupervised learning algorithm is used herein to estimate the parameters of the HMM model. Two learning methods are described below.
When the sample is labeled data, supervised learning can be performed, and the state transition matrix and the emission matrix are calculated as follows:
where |q_i| is the number of samples whose initial state is q_i, |q_ij| is the number of times the hidden state is q_i at time t and q_j at time t+1, and |v_ik| is the number of times the observation state is v_k while the hidden state is q_i.
When the samples are unlabeled, the Baum-Welch algorithm is usually adopted to estimate the state transition matrix, the emission matrix and the initial state matrix. Baum-Welch is an algorithm for unsupervised estimation of HMM parameters.
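The counting estimates for labeled data can be sketched as below. The function name `supervised_hmm` is hypothetical, hidden states and observation states are assumed to be encoded as integers, and rows that receive no counts fall back to a uniform distribution, which is a choice made for this sketch rather than something stated in the source.

```python
import numpy as np

def supervised_hmm(hidden, observed, n_states, n_symbols):
    """Estimate AA and BB by counting over one labeled sequence."""
    hidden = np.asarray(hidden)
    observed = np.asarray(observed)
    trans = np.zeros((n_states, n_states))
    emit = np.zeros((n_states, n_symbols))
    np.add.at(trans, (hidden[:-1], hidden[1:]), 1)   # |q_ij|: q_i at t, q_j at t+1
    np.add.at(emit, (hidden, observed), 1)           # |v_ik|: state q_i emits v_k

    def normalize(counts):
        totals = counts.sum(axis=1, keepdims=True)
        safe = np.where(totals > 0, totals, 1.0)
        # ASSUMPTION: rows without any counts become uniform.
        return np.where(totals > 0, counts / safe, 1.0 / counts.shape[1])

    return normalize(trans), normalize(emit)

# 0 = normal, 1 = abnormal; observations are indices into V.
AA, BB = supervised_hmm(hidden=[0, 0, 1, 0], observed=[0, 1, 2, 0],
                        n_states=2, n_symbols=3)
print(AA)
print(BB)
```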
Given the observation state sequence O = {o_1, o_2, …, o_T}, let I = {i_1, i_2, …, i_T} be the hidden state sequence. Then P(O, I | λ) is the probability of the complete data (O, I) = {o_1, o_2, …, o_T, i_1, i_2, …, i_T}, and its log-likelihood function is log P(O, I | λ), where λ̄ denotes the current estimate of the HMM parameters.
E-step: based on the EM algorithm, the Q function Q(λ, λ̄) is computed, which gives the following formula:
M-step: the EM algorithm maximizes Q(λ, λ̄) to iteratively estimate AA, BB and Π of the HMM:
Assuming that the current HMM parameter value λ is known, the probabilities of the following events can be estimated.
1) The probability α_t(i) that the observation states of patterns 1, …, t are o_1, o_2, …, o_t and the hidden state of the t-th pattern is q_i.
2) The probability β_t(i) that, given that the hidden state of the t-th pattern is q_i, the observation states of patterns t+1, …, T are o_{t+1}, o_{t+2}, …, o_T.
The calculation formula of the two is as follows:
Let ξ_t(i, j) be the probability that the hidden states of the t-th and (t+1)-th patterns are q_i and q_j, respectively, and let γ_t(i) be the probability that the hidden state of the t-th pattern is q_i. Then ξ_t(i, j) and γ_t(i) can be calculated from the following formulas:
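The formulas for these four quantities are not legible in this text. The standard forward-backward forms consistent with the definitions above are given here as a reference sketch; b_i(o_t) is shorthand (an assumption of notation) for the entry b_ik with v_k = o_t, and no equation numbers are assigned because the original numbering cannot be recovered.

```latex
\begin{aligned}
\alpha_1(i) &= \Pi_i \, b_i(o_1), &
\alpha_{t+1}(j) &= \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1}),\\
\beta_T(i) &= 1, &
\beta_t(i) &= \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j),\\
\gamma_t(i) &= \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}, &
\xi_t(i,j) &= \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}
{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}.
\end{aligned}
```

Both denominators equal P(O|λ), which is also obtained as the sum of α_T(i) over i.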
Because each row of AA and of BB, as well as the entries of Π, must sum to 1, AA, BB and Π can be re-estimated according to equations 15 and 16 using the Lagrangian multiplier method, as follows:
Thus, a set of HMM parameters is first given at random and taken as the current estimate. One pass of the EM algorithm then yields a new set of parameter values λ, from which the probability P(O|λ) of the observation sequence can be estimated. Iterating on λ eventually yields a stable P(O|λ), at which point model training is considered complete; in practice, training is usually stopped once a convergence criterion on P(O|λ) is satisfied.
The HMM assumes that the current hidden state depends only on the previous hidden state and is independent of earlier hidden states and observation states, and that the current observation state is independent of past observation states. An HMM with known parameters can therefore predict the hidden state of the next pattern.
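The iteration described above can be sketched compactly. This is the textbook Baum-Welch procedure for a single observation sequence, without the scaling needed for long sequences; the names `forward_backward` and `baum_welch`, the tolerance `tol` and the iteration cap `n_iter` are assumptions, not values from the source. The usage lines initialize AA with average values and BB with random values, as the embodiment describes, and use a made-up Π.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Unscaled forward (alpha) and backward (beta) probabilities."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch(obs, A, B, pi, n_iter=50, tol=1e-8):
    """Re-estimate (AA, BB, Pi) with EM; returns them and the last evaluated P(O|lambda)."""
    obs = np.asarray(obs)
    A, B, pi = (np.array(m, dtype=float) for m in (A, B, pi))
    prev_loglik = -np.inf
    for _ in range(n_iter):
        alpha, beta = forward_backward(A, B, pi, obs)
        likelihood = alpha[-1].sum()                       # P(O | lambda)
        if np.log(likelihood) - prev_loglik < tol:         # assumed stopping rule
            break
        prev_loglik = np.log(likelihood)
        gamma = alpha * beta / likelihood                  # gamma[t, i]
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood  # xi[t, i, j]
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0)          # expected emissions of v_k
        B /= gamma.sum(axis=0)[:, None]
    return A, B, pi, likelihood

obs = np.array([0, 1, 2, 2, 2, 3, 2, 2, 0, 2])            # indices into V (made up)
A0 = np.full((2, 2), 0.5)                                  # AA: average values
rng = np.random.default_rng(0)
B0 = rng.random((2, 4))
B0 /= B0.sum(axis=1, keepdims=True)                        # BB: random values
pi0 = np.array([0.8, 0.2])                                 # made-up initial Pi
A, B, pi, lik = baum_welch(obs, A0, B0, pi0)
print(A.round(3), lik)
```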
Predicting the hidden state of the next pattern means computing, given O = {o_1, o_2, …, o_t}, the most likely hidden state sequence I = {i_1, i_2, …, i_t, i_{t+1}}, where i_{t+1} is the hidden state of the (t+1)-th pattern. By the principle of dynamic programming, the forward solution must satisfy the following: if the optimal path from the 1st pattern to the (t+1)-th pattern passes through a given hidden state at the t-th pattern, then its partial path from the 1st pattern to the t-th pattern must itself be the optimal path to that state; otherwise the full path would not be optimal. The Viterbi algorithm finds the optimal hidden state sequence for an observation state sequence based on this dynamic programming idea, so this embodiment predicts multivariate time series abnormal patterns based on Viterbi and the state transition matrix.
The Viterbi algorithm solves for the most likely hidden state sequence I = {i_1, i_2, …, i_T} given O = {o_1, o_2, …, o_T}. Define δ_t(i) as the maximum probability over all single paths {i_1, i_2, …, i_t} whose hidden state at the t-th pattern is i, and ψ_t(i) as the (t−1)-th node of the maximum-probability path among all single paths {i_1, i_2, …, i_{t−1}, i} whose hidden state at the t-th pattern is i. Mathematically:
From the definition of δ_t(i), δ_{t+1}(i) can be calculated by the following formula:
Thus, if λ = (AA, BB, Π) and O = {o_1, o_2, …, o_T} are known, the maximum probability of the partial paths whose hidden state at the t-th pattern is i can be computed recursively, step by step, starting from t = 1; when t = T, the maximum probability over all possible paths {i_1, i_2, …, i_{T−1}, i} is obtained for each hidden state i.
The Viterbi algorithm yields the optimal path for the observation sequence o_1, o_2, …, o_t together with the maximum probability δ_t(i), i = 1, …, N, t ≥ 1, of each hidden state i over all possible paths {i_1, i_2, …, i_{t−1}, i}. The maximum probability that the (t+1)-th pattern is in hidden state i over all possible paths {i_1, i_2, …, i_t, i_{t+1}} is then:
The most likely hidden state of the (t+1)-th pattern over all possible paths {i_1, i_2, …, i_t, i_{t+1}} is then:
This gives the most likely hidden state of the next pattern, from which it is estimated whether the next pattern is normal or abnormal.
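A minimal sketch of this decoding and one-step prediction follows. It uses the standard Viterbi recursion and then extends the best partial paths by one transition through AA, without an observation for pattern t+1, which is one reading of the prediction formulas that are not legible here. The function name `viterbi_predict_next` and the parameter values in the usage lines are made up for illustration.

```python
import numpy as np

def viterbi_predict_next(A, B, pi, obs):
    """Return the most likely hidden path for obs and the predicted next hidden state."""
    obs = np.asarray(obs)
    T, N = len(obs), A.shape[0]
    delta = np.zeros((T, N))                 # delta[t, i]: best path probability ending in i
    psi = np.zeros((T, N), dtype=int)        # psi[t, i]: best predecessor of state i at t
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        cand = delta[t - 1][:, None] * A     # cand[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):           # backtrack along the stored predecessors
        path[t] = psi[t + 1, path[t + 1]]
    # One more transition, no observation yet: max_j delta_T(j) * a_ji for each i.
    next_scores = (delta[-1][:, None] * A).max(axis=0)
    return path, int(next_scores.argmax())

A = np.array([[0.7, 0.3], [0.2, 0.8]])       # 0 = normal, 1 = abnormal (made-up values)
B = np.array([[0.9, 0.1], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
path, nxt = viterbi_predict_next(A, B, pi, obs=[0, 0, 1, 1])
print(path, nxt)      # [0 0 1 1] 1
```

For long observation sequences the products underflow, so a practical version would work with log probabilities.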
Because only the observation sequence is known, the exact model parameters AA, BB and Π of the HMM cannot be obtained directly, so the Baum-Welch algorithm is used to find the parameter λ that maximizes the output probability P(O|λ) for the given observation sequence. For the history patterns, Baum-Welch is initialized as follows: the state transition matrix AA takes average values, the observation state probability matrix BB takes random values, and the initial state matrix Π is initialized from the detection results of the online MMOD. During online updating of the HMM prediction model, if a new observation state is generated, all patterns currently held are regarded as history patterns and AA, BB and Π are initialized by the method used for history patterns; if no new observation state is generated, all patterns currently held are regarded as history patterns and the current HMM prediction model is initialized with the parameters AA, BB and Π of the model before the update. The formula for initializing Π from the detection results of the online MMOD is as follows:
Where normal and abnormal are the number of normal and abnormal modes. The specific process of the HMM online prediction model is shown in the flow chart 1.
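The initialization formula itself is not legible in this text. Given that normal and abnormal are pattern counts and Π must be a probability distribution over the two hidden states, the natural reading, offered here as an assumption, is the relative frequency of each detection result:

```latex
\Pi = \big[\Pi_1(\text{normal}),\ \Pi_2(\text{abnormal})\big]
    = \left[\frac{\text{normal}}{\text{normal} + \text{abnormal}},\
            \frac{\text{abnormal}}{\text{normal} + \text{abnormal}}\right]
```

For example, if the online MMOD labels 54 of 60 history patterns as normal (made-up numbers), this reading gives Π = [0.9, 0.1].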
1.2.4 Algorithm implementation steps
The multivariate time series segments are symbolized according to the result of the incremental fuzzy adaptive clustering, and each distinct symbol is regarded as an observation state of the multivariate time series; in this way the incremental fuzzy adaptive algorithm converts the multivariate time series subsequences into observation states. Adaptive MMOD anomaly detection on the initial history patterns yields information about the abnormal patterns. The detected historical abnormal patterns are combined with the observation state sequence to estimate the initial state matrix Π of the HMM, the HMM is constructed with the Baum-Welch algorithm, and whether the next pattern is abnormal is predicted from the current observation state sequence and the state transition matrix AA. The specific steps are as follows:
Step1: Compress the history patterns, and calculate the distance between any two of the compressed historical multivariate time series subsequences based on the DTW algorithm.
Step2: Take the natural-neighbor k value of the history patterns as the optimal number of nearest neighbors k of the online MMOD.
Step3: Calculate the density-estimation anomaly score of each history pattern and compare it with the set threshold; if the score is smaller than the threshold, the pattern is judged normal, otherwise abnormal.
Step4: If a new observation state has been generated, initialize the HMM with the current observation sequence and the abnormal pattern detection results: the initial state matrix Π is obtained by formula (26), BB is assigned randomly, and AA takes average values; otherwise, AA, Π and BB are inherited.
Step4.1: Construct the hidden Markov model from all observation sequences using the Baum-Welch algorithm, then go to Step5.
Step5: Use Viterbi to find, for the current observation state sequence of the multivariate time series patterns (of length t), the maximum probability over all possible paths {i_1, i_2, …, i_{t−1}, i} that the hidden state of the t-th pattern is i, and predict whether the next pattern is abnormal using equation (25) based on the state transition matrix AA.
Step6: Judge whether a new multivariate time series subsequence has been generated; if so, estimate the cluster to which the newly generated segment belongs based on the cluster centers of the last data block, and convert the subsequence into the corresponding observation state.
Step7: Calculate the online MMOD anomaly score of the newly generated multivariate time series subsequence, update the anomaly scores of the patterns whose k-nearest-neighbor sets have changed based on formula 6, judge whether the current subsequence is abnormal based on the threshold, and go to Step4.
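Step1 relies on a DTW distance between subsequences. A minimal sketch of plain multivariate DTW is shown below; the compression step of Li Zhengxin's method is not described in this text and is therefore omitted, and the function name `dtw_distance` and the Euclidean per-sample cost are assumptions.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two sequences of shape (length, n_variables)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)   # cost[i, j]: best alignment of x[:i], y[:j]
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(x[i - 1] - y[j - 1])   # Euclidean cost of one pairing
            cost[i, j] = step + min(cost[i - 1, j],      # advance in x only
                                    cost[i, j - 1],      # advance in y only
                                    cost[i - 1, j - 1])  # advance in both
    return float(cost[n, m])

a = [[0.0], [1.0], [2.0]]
b = [[0.0], [1.0], [1.0], [2.0]]             # same shape, one sample repeated
print(dtw_distance(a, b))                    # 0.0: warping absorbs the repeat
print(dtw_distance([[0.0], [0.0]], [[1.0], [1.0]]))   # 2.0
```

Evaluating this for every pair of history patterns gives the distance matrix used by the neighbor searches in Step2 and Step3.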
(II) Experimental verification and result analysis of the multivariate time series online abnormal pattern recognition and prediction algorithm
The multivariate time series online abnormal pattern prediction algorithm is verified on a multivariate time series anomaly data set. The abnormal states of the multivariate time series can be used for evaluation and further diagnosis. To verify the effectiveness of the proposed algorithm, it is compared with multivariate time series local anomaly detection algorithms based on LOF and SKLOF.
2.1 description of the experiment
In this embodiment, three variables (AF3, F7 and FC5) of the EEG eye state data set are selected for the study of multivariate time series online abnormal pattern recognition and prediction. Abnormal patterns are identified from the pattern density of the multivariate time series, and a hidden Markov prediction model is then constructed from the observation states of the history patterns, their detection states, and Baum-Welch. To illustrate the effectiveness of the algorithm, its detection and prediction results on the EEG eye state data set are each compared with those of other algorithms. The multivariate time series subsequences are obtained with the FOSMTS segmentation algorithm, and the observation states are obtained by conversion with the IFACA fuzzy clustering algorithm.
2.2 offline anomaly mode mining
To identify the historical abnormal patterns, offline abnormal pattern detection is performed using data density estimation with the adaptive k value, which gives the abnormal pattern diagnostic map of fig. 4a. To illustrate the effectiveness of the adaptive density estimation algorithm, the anomaly labels of the EEG eye state data set are shown in fig. 4b, where the regions marked with red ovals are the abnormal pattern regions.
As can be seen from fig. 4a and fig. 4b, the anomaly score of the adaptive MMOD algorithm is relatively high when the multivariate time series is in an abnormal pattern and relatively low when it is in a normal pattern. The method therefore separates normal patterns from abnormal patterns well and is suitable for identifying abnormal patterns of a multivariate time series.
To further illustrate the effectiveness of the proposed algorithm, the detection results of the adaptive MMOD algorithm are compared with those of the LOF and SKLOF algorithms, whose anomaly diagnosis is shown in fig. 4c. As can be seen from fig. 4a and fig. 4c, the anomaly score trends of the adaptive MMOD and LOF are similar for the same patterns, but in the last abnormal pattern the anomaly score of LOF is lower while that of the adaptive MMOD is higher. Table 4 shows the number of abnormal patterns correctly detected by the three algorithms, and table 5 shows their detection accuracy indexes. The adaptive MMOD clearly has the highest abnormal pattern detection accuracy and LOF the lowest; SKLOF is more accurate than LOF but still less accurate than the adaptive MMOD.
The reason is that LOF and SKLOF do not account for patterns on cluster boundaries, whose k nearest neighbors often lie in two different clusters. Neighbors in the low-density region lower the average k-nearest-neighbor density of a boundary pattern, which raises its anomaly score, possibly above that of an actual abnormal pattern, and so reduces detection accuracy. The adaptive MMOD instead computes the anomaly score from an accumulated kernel function value: patterns far from a boundary pattern contribute little to its density, so the density of the boundary pattern is not noticeably reduced, the probability of identifying a boundary pattern as abnormal is effectively lowered, and the accuracy of the algorithm improves. For a more intuitive view, fig. 4d shows the weighted evaluation index F curves of the algorithms; the larger the F value, the higher the anomaly identification accuracy. The F curve of the adaptive MMOD lies at the top, so it has the best weighted evaluation quality among the compared algorithms.
TABLE 4 multivariate time series pattern anomaly detection results
Table 5 each algorithm detects the index
Algorithm accuracy is commonly evaluated with a weighted evaluation index F that balances precision PP and recall RR:
where TP is the number of abnormal patterns correctly identified as abnormal, FP is the number of non-abnormal patterns identified as abnormal, and FN is the number of abnormal patterns that are not identified.
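The three indexes can be computed as in the sketch below. The exact weighting used for F is not legible in this text, so the general F-beta form with beta = 1 (the harmonic mean of PP and RR) is assumed; the function name `detection_metrics` and the counts in the usage line are hypothetical.

```python
def detection_metrics(tp, fp, fn, beta=1.0):
    """Return (PP, RR, F) from counts of true positives, false positives, false negatives.

    ASSUMPTION: F is the F-beta score; beta = 1 weights PP and RR equally.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f

# Hypothetical counts: 2 anomalies found correctly, 1 false alarm, 1 anomaly missed.
pp, rr, f = detection_metrics(tp=2, fp=1, fn=1)
print(round(pp, 3), round(rr, 3), round(f, 3))
```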
To verify the validity of the adaptive k values, the first 40, first 50 and first 60 patterns of the history data are taken to form three data sets, labeled data set 1, data set 2 and data set 3. Data set 1 contains 4 abnormal patterns, data set 2 contains 5 and data set 3 contains 7. Fig. 4e shows the recognition accuracy PP curves of the data sets at different k values. The three data sets show that the k value has a great influence on the result of MMOD abnormal pattern recognition, so finding a suitable k value is important. The optimal k values obtained from natural neighbors for data sets 1, 2 and 3 are 32, 29 and 44, respectively. The true optimal k values of data sets 1 and 2 include 32 and 29, respectively, whereas the optimal k values of data set 3 are 55, 56 and 58, with which 5 abnormal patterns can be identified. The three data sets show that the adaptive k value obtained is optimal in most cases and gives a recognition accuracy better than that of most other k values.
2.3 on-line Pattern recognition and prediction
An abnormal pattern threshold for the multivariate time series is set manually, online abnormal pattern identification is carried out on the subsequent data of the EEG eye state data set based on this threshold, and the result is compared with online LOF to illustrate the effectiveness of the proposed algorithm. The online LOF and online MMOD algorithms are compared in two respects: anomaly detection performance and the number of multivariate time series anomaly scores that must be updated.
When a new pattern arrives, the neighbor sets of history patterns may change, which in turn affects their anomaly scores, so the anomaly scores of one or more history patterns may need to be updated. Table 6 gives the number of history pattern anomaly scores that the two algorithms must update when the 61st to 72nd patterns arrive. Evidently, when a new multivariate time series segment arrives, the online LOF needs to update more history pattern anomaly scores than the online adaptive MMOD (hereinafter referred to as online MMOD).
The complexity of finding the patterns whose anomaly scores must be updated is briefly analyzed as follows. Assume the pattern set D contains N patterns; finding the patterns whose k-nearest-neighbor neighborhoods change has complexity O(N). For LOF, if m objects have a changed k-nearest-neighbor distance, finding the patterns whose local reachability density changes has complexity O(mNk), and if p patterns have a changed local reachability density, finding the other patterns whose LOF changes has complexity O(pNk). The MMOD algorithm, by contrast, only needs to find the patterns whose k nearest neighbors change, with complexity O(N). Thus the online MMOD algorithm can compute anomaly scores more quickly than the online LOF algorithm, as shown in fig. 4f.
TABLE 6 Multi-element time series New pattern anomaly update Table
Since this embodiment focuses on abnormal patterns, the HMM is used to estimate whether the next pattern is abnormal. To illustrate the validity of the constructed model, table 7 shows the prediction results of the long short-term memory network (LSTM) and the HMM, with correctly predicted anomalies displayed in bold. Fig. 4g shows the prediction result of the HMM, fig. 4h shows the actual abnormal patterns, and fig. 4i shows the prediction result of the LSTM; in the three figures the red parts mark the abnormal parts. The precision PP, recall RR and weighted evaluation index F of the two models are given in table 8. As can be seen from table 7, the LSTM predicts three anomalies, of which 1 is correct and 2 are incorrect, whereas the HMM predicts three anomalies, of which 2 are correct and 1 is incorrect, so the HMM predicts clearly better than the LSTM. As can be seen from table 8, the precision PP, recall RR and weighted evaluation index F of the HMM are all higher than those of the LSTM.
TABLE 7 multivariate time series pattern anomaly prediction results
Table 8 two algorithm prediction accuracy detection index
To illustrate the real-time performance of the proposed algorithm, this embodiment compares the online response times of three combined models, as shown in fig. 4j: the online MMOD+HMM prediction model, the online LOF+HMM prediction model, and the online MMOD+LSTM prediction model. The online MMOD+HMM prediction model clearly has the shortest online response time, the online LOF+HMM model comes next, and the online MMOD+LSTM model has the longest, because introducing the LSTM prediction model increases the response time of online abnormal pattern identification and prediction and lowers the response speed. In summary, the online MMOD+HMM prediction model offers the best real-time performance for online abnormal pattern recognition and prediction.
Second embodiment
The embodiment provides a data acquisition and monitoring system based on a cloud platform, as shown in fig. 2; wherein,
the data acquisition system is a support system for acquiring multi-element time sequence abnormal state data, C++ is used as a development language, and various data communication protocols such as 101, 102, 103, 104, modbus, CDT, DISA and the like of various IEC 60870-5 are embedded; modeling accords with the requirements of an interface reference model, a Common Information Model (CIM) and a Component Interface Specification (CIS) in IEC 61970, accords with international standards, and can be used as middleware to be seamlessly integrated with each system; and the access of system data such as a monitoring system, a comprehensive energy management and control system, metering, fault analysis, alarm pushing and the like is realized. The system supports the access of various devices and has the resolving capability of various protocols.
The monitoring system adopts a two-level architecture consisting of individual data acquisition systems and a cloud platform centralized monitoring system. The data acquisition system collects monitoring data on the abnormal state of the local multivariate time series in real time, performs local data monitoring and historical data sampling and storage, and uploads key real-time data to the cloud platform centralized monitoring system. The cloud platform centralized monitoring system obtains real-time monitoring data from the data acquisition system and uses it to monitor the data condition. Communication between the two levels may use the power-industry standard IEC 104 protocol or other protocols; depending on protocol requirements, real-time data can be acquired at second-level intervals, and uploading on change, cyclic uploading, and uploading on call are supported.
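The three upload modes named above (uploading on change, cyclic uploading, and uploading on call) can be pictured as a small decision rule on the data acquisition side. The following is a minimal sketch that does not implement IEC 104 itself; the class name, the deadband, and the cycle period are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadPolicy:
    """Decides when an acquisition node pushes a point value to the cloud tier."""
    deadband: float = 0.5        # hypothetical: minimum change that triggers an upload
    cycle_seconds: float = 10.0  # hypothetical: period of the cyclic upload
    last_value: Optional[float] = None
    last_upload: Optional[float] = None

    def reason_to_upload(self, now, value, interrogated=False):
        """Return 'interrogation', 'change', 'cyclic', or None if nothing is sent."""
        if interrogated:
            reason = "interrogation"          # the cloud side called for the data
        elif self.last_value is None or abs(value - self.last_value) >= self.deadband:
            reason = "change"                 # first sample, or value moved enough
        elif now - self.last_upload >= self.cycle_seconds:
            reason = "cyclic"                 # periodic refresh even without change
        else:
            return None
        self.last_value, self.last_upload = value, now
        return reason
```

A node would call `reason_to_upload` for each new sample and transmit only when the result is not `None`, which keeps second-level sampling from flooding the link.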
The main equipment of the data acquisition system includes a data acquisition front end, a serial port server, an industrial personal computer, a display, and data acquisition software. The system is controlled through a touch screen and provides two operation modes, automatic and manual: in automatic mode, the system starts or stops automatically on receiving a trigger signal; in manual mode, the operator presses an operating button to start or stop the system. In this way all relevant devices at the system's work site can be controlled. The main components of the electrical control loop come from international brands such as Schneider, which ensures the stability and reliability of the electrical control of the whole system.
1. On-site data acquisition
1) Multivariate time series abnormal data monitoring interface: communication interface MB485;
2) Touch screen;
3) Access to data from other systems;
4) Management server.
2. The cloud platform data link scheme and the access network structure are shown in Fig. 3.
3. Data storage, retrieval and analysis
The data service center uses a real-time historical database to store real-time data of the production process and provide retrieval services, and uses a business SQL database (Oracle or MySQL) to store static data of the business process and provide retrieval services. The database is designed to organize and manage data in an object-oriented way, which matches the natural way people think; this supports equipment-centered monitoring, facilitates equipment maintenance and fault diagnosis, and improves data retrieval efficiency.
The real-time database system is a new type of database management system software built on a high-speed database engine developed for 64-bit systems and an advanced distributed cluster architecture. It is suited to collecting, storing, retrieving, and publishing massive real-time and historical data, offers good horizontal scalability and high availability, and can handle dynamic data that changes rapidly over time.
The technical indexes are as follows:
1) Scale: supports more than 1,000,000 tags.
2) Speed: high-speed retrieval of real-time and historical data. Real-time data is returned with millisecond-level response; retrieval of historical data spanning a month responds within seconds.
3) Storage types: supports flexible and diverse data types:
○ Boolean
○ Integer (8-bit/16-bit/32-bit/64-bit)
○ Floating point (32-bit/64-bit)
○ Date (timestamp)
○ Other (OWA)
4) Efficient data compression:
○ Supports multiple lossless and lossy compression modes, which greatly improves storage efficiency and speeds up analysis and retrieval over massive historical data.
○ Supports two-stage compression, which improves network resource utilization and reduces hardware requirements; multi-level buffering is provided to improve system availability.
○ Supports configuration at point granularity, so the compression algorithm can be selected flexibly according to the characteristics of different data.
○ Compression ratios reach up to several tens of times.
5) Advanced distributed architecture:
○ The cluster-based hot backup mechanism gives the system high data availability.
○ The distributed redundant storage architecture gives the system highly elastic scalability.
○ The disaster recovery mechanism safeguards the security of production data.
6) Flexible data access interface:
○ Provides C/C++/Java/JSON interfaces for third parties to read and write data.
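The compression capabilities listed above (a lossy and a lossless mode, applied in two stages and selected per point) can be illustrated with a minimal sketch. The text does not name the algorithms used, so this example assumes a deadband filter as the lossy stage and DEFLATE (Python's standard `zlib`) as the lossless stage; the tag names and tolerances are hypothetical.

```python
import struct
import zlib


def deadband_filter(samples, tolerance):
    """Lossy stage: keep a (timestamp, value) sample only if the value moved
    by more than `tolerance` since the last kept sample."""
    kept = []
    for ts, value in samples:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    return kept


def pack_and_deflate(samples):
    """Lossless stage: pack each sample as little-endian (int64, float64), then deflate."""
    raw = b"".join(struct.pack("<qd", ts, value) for ts, value in samples)
    return zlib.compress(raw)


def inflate_and_unpack(blob):
    """Inverse of pack_and_deflate: returns a list of (timestamp, value) tuples."""
    raw = zlib.decompress(blob)
    return [struct.unpack_from("<qd", raw, off) for off in range(0, len(raw), 16)]


# Hypothetical per-point configuration: a tolerance of 0 means lossless only.
POINT_CONFIG = {"temperature": 0.1, "breaker_state": 0.0}


def compress_point(tag, samples):
    """Apply the lossy stage if configured for this tag, then the lossless stage."""
    tolerance = POINT_CONFIG.get(tag, 0.0)
    if tolerance > 0:
        samples = deadband_filter(samples, tolerance)
    return pack_and_deflate(samples)
```

Slowly varying analog points benefit most from the lossy stage, while state points are better kept lossless so that no transition is dropped.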
Third embodiment
This embodiment provides an APP for the multivariate time series abnormal data acquisition and monitoring system. The APP mainly supports querying and displaying related information and updating and modifying it online, which makes real-time monitoring convenient for managers. It mainly comprises a user registration and login module, an online query module, an area display module, a modification module, and a logout function. Operation is simple and the interface is clean. Registered users can log in to the system from the mobile APP wherever they are. The system provides automatic query and display functions and user registration information management, and runs stably and securely over long periods.
The APP works together with the multivariate time series abnormal data acquisition and monitoring system of the invention to form an integrated system. It is developed with HBuilderX as the development tool, in HTML5, CSS, and JavaScript, and built on the MUI front-end framework. The APP implements registration and login, queries the distribution of regional data monitoring points, and, using the GPS positioning function of the Android phone, loads the distribution of each region on the map online and obtains the inflection-point coordinates of the region.
1. APP platform components
1) Client
The client is designed and developed with the MUI front-end framework, using HTML5, CSS, and JavaScript for front-end development.
2) Server
The server is developed with the ThinkJS server framework and, together with a MySQL database, implements registration, login verification, and data transmission, addition, modification, and deletion.
3) System back-end management
The system management back end is developed with HTML5, CSS, and JavaScript and is used to manage the database.
2. Development tools
1) MUI front-end framework (based on HTML5, CSS, JavaScript)
Used to design and develop the Android client; HTML5, CSS, and JavaScript are also used to develop the system management back end.
2) ThinkJS server framework (based on Node.js)
Provides the logic interfaces that serve the client and the system management back end and implement the corresponding functions.
3) MySQL database
Used to store the ecological red line related data and user information.
Furthermore, it should be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media having computer-usable program code embodied therein.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the above describes preferred embodiments of the present invention. Once the basic inventive concept is known, those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Claims (3)

1. A multivariate time series abnormal pattern prediction method applied to the medical field for detecting abnormal patterns of electrocardiographic data, the method comprising:
acquiring, according to the natural neighbor principle and based on historical data of a multivariate time series, an optimal k value for the density-estimation-based outlier detection algorithm MMOD, so as to configure the MMOD algorithm with the optimal k value; wherein the multivariate time series refers to a time series consisting of electrocardiographic data of the medical field;
extending the MMOD algorithm to online operation, and detecting abnormal patterns of the multivariate time series according to the configured optimal k value, so as to realize online identification of abnormal patterns of the multivariate time series based on the MMOD algorithm;
converting the subsequences of the multivariate time series into an observation sequence according to an incremental fuzzy adaptive clustering algorithm, constructing a hidden Markov model of the multivariate time series based on the Baum-Welch algorithm and all observation sequences, and realizing online prediction of abnormal patterns of the multivariate time series based on the constructed hidden Markov model; wherein an abnormal pattern refers to a subsequence that deviates from most subsequences in the sequence;
wherein realizing online identification of abnormal patterns of the multivariate time series based on the MMOD algorithm comprises:
detecting in real time whether a new multivariate time series subsequence is generated;
if a new multivariate time series subsequence is generated, calculating the anomaly score of the current multivariate time series subsequence online, and comparing the current anomaly score with a preset threshold in real time;
if the current anomaly score is greater than the preset threshold, the current multivariate time series subsequence is an abnormal pattern;
wherein realizing online prediction of abnormal patterns of the multivariate time series based on the constructed hidden Markov model comprises:
determining in real time the observation state of the current multivariate subsequence based on the incremental fuzzy adaptive clustering algorithm;
predicting, based on the observation state sequence of the current multivariate subsequence, the hidden state of the next multivariate subsequence through the hidden Markov model;
wherein acquiring, according to the natural neighbor principle and based on the historical data of the multivariate time series, the optimal k value of the density-estimation-based outlier detection algorithm MMOD comprises:
S1, initializing $sup_k = 1$ and $nb_i = 0$;
S2, searching for the $sup_k$ neighbor subsequences of each subsequence, using $nb_i$ to denote the natural neighbor subsequences of the $i$-th subsequence, and using [symbol omitted in source] to store the reverse neighbor subsequences of the $i$-th subsequence;
S2.1, calculating the number of subsequences whose natural neighbors are null, and recording it as [symbol omitted in source];
S2.2, if [condition omitted in source], going to S3; otherwise setting $sup_k = sup_k + 1$ and going to S2;
S3, determining $k$ as the maximum number of reverse neighbors under the $sup_k - 2$ neighbors, i.e. [formula omitted in source];
wherein realizing online identification of abnormal patterns of the multivariate time series based on the MMOD algorithm further comprises:
for a newly arrived multivariate time series subsequence $x_t$, when $x_t$ falls among the k nearest neighbor patterns of a historical multivariate time series subsequence $x_i$, that is, when $d(x_t, x_i) < \delta_k(x_i)$, updating the anomaly score of the historical multivariate time series subsequence $x_i$; and storing the k-nearest-neighbor distances of each multivariate time series subsequence;
wherein $d(x_t, x_i)$ is the distance between the multivariate time series subsequence $x_t$ and its $i$-th neighbor multivariate time series subsequence $x_i$, and $\delta_k(x_i)$ is the distance between the multivariate time series subsequence $x_i$ and its $k$-th nearest neighbor pattern;
wherein converting the multivariate time series subsequences into an observation sequence according to the incremental fuzzy adaptive clustering algorithm, and constructing the hidden Markov model of the multivariate time series based on the Baum-Welch algorithm and all observation sequences, comprises:
clustering the subsequences of each variable of the multivariate time series according to the incremental fuzzy adaptive clustering algorithm, and classifying and symbolizing the subsequences according to the maximum membership principle; each distinct symbol is regarded as an observation of the multivariate time series, thereby converting the multivariate time series subsequences into an observation sequence through the incremental fuzzy adaptive clustering algorithm;
calculating the density-estimation anomaly value of each historical multivariate time series subsequence and comparing it with a preset threshold; if the density-estimation anomaly value is smaller than the preset threshold, the subsequence is judged to be a normal pattern, and otherwise an abnormal pattern;
if a new observation state is generated, initializing the initial state of the hidden Markov model using the current observation sequence and the abnormal pattern detection result, wherein the matrix $\Pi$ is obtained by the following formula, $BB$ is randomly assigned, and $AA$ is assigned average values: [formula omitted in source];
wherein $N_{normal}$ and $N_{abnormal}$ are the numbers of normal patterns and abnormal patterns, respectively;
constructing the hidden Markov model based on the Baum-Welch algorithm and all observation sequences;
the hidden Markov model constructed based on the Baum-Welch algorithm and all observation sequences is expressed as:
$\lambda = (AA, BB, \Pi)$
$\Pi = [\Pi_1(\text{normal}), \Pi_2(\text{abnormal})]$
wherein $AA = [a_{ij}]$ $(1 \le i, j \le N)$ and $BB = [b_{ik}]$ $(1 \le i \le N,\ 1 \le k \le M)$ are the state transition matrix and the emission matrix, respectively; $a_{ij} = P[i_{t+1} = q_j \mid i_t = q_i]$ is the probability that the $(t+1)$-th multivariate time series segment is in hidden state $q_j$ given that the $t$-th multivariate time series segment is in hidden state $q_i$; $b_{ik} = P[o_t = v_k \mid i_t = q_i]$ is the probability that the observed state of the $t$-th multivariate time series segment is $v_k$ given that its hidden state is $q_i$; $\Pi = [\Pi_i]$ $(1 \le i \le N)$ is the initial hidden state matrix, and $\Pi_i = P[i_1 = q_i]$ is the probability that the hidden state of the 1st multivariate time series segment is $q_i$;
wherein realizing online prediction of abnormal patterns of the multivariate time series based on the constructed hidden Markov model comprises:
using the Viterbi algorithm to find, for the observation state sequence of the current multivariate time series subsequences, the maximum probability over all possible paths $\{i_1, i_2, \ldots, i_{t-1}, i\}$ in which the hidden state of the $t$-th subsequence is $i$, and predicting whether the next subsequence is abnormal using the following equation based on the state transition matrix $AA$, wherein the length of the observation sequence is $t$: [formula omitted in source];
when a new multivariate time series subsequence is generated, estimating the cluster to which the newly generated multivariate time series segment belongs, and converting the newly generated multivariate time series subsequence into the corresponding observation state; calculating the anomaly score of the newly generated multivariate time series subsequence, updating the anomaly scores of the subsequences whose k-nearest-neighbor sets have changed, and judging whether the current multivariate time series subsequence is abnormal; and, when a new observation state is generated, initializing the initial state of the hidden Markov model using the current observation sequence and the abnormal pattern detection result.
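The Viterbi decoding and one-step prediction recited in claim 1 can be illustrated with a minimal sketch. The function names are illustrative and any matrix values supplied by a caller are hypothetical. Hidden states 0 and 1 stand for normal and abnormal, following $\Pi = [\Pi_1(\text{normal}), \Pi_2(\text{abnormal})]$. Because the prediction equation itself is not reproduced in the text, the sketch assumes that the next hidden state is taken as the most probable transition from the last decoded state in the state transition matrix AA.

```python
def viterbi(observations, AA, BB, PI):
    """Return (best_path, best_prob): the most probable hidden-state path
    for a sequence of observation symbol indices, and its probability."""
    n_states = len(PI)
    # delta[i]: maximum probability of any path that ends in state i at the current step
    delta = [PI[i] * BB[i][observations[0]] for i in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        step, new_delta = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] * AA[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * AA[best_i][j] * BB[j][obs])
        backpointers.append(step)
        delta = new_delta
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backpointers):   # trace the best path backwards
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


def predict_next_state(observations, AA, BB, PI):
    """ASSUMED rule: next state = argmax of the AA row of the last decoded state."""
    path, _ = viterbi(observations, AA, BB, PI)
    row = AA[path[-1]]
    return max(range(len(row)), key=lambda j: row[j])
```

Under this assumed rule, a return value of 1 from `predict_next_state` would flag the next subsequence as a predicted abnormal pattern.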
2. A data acquisition monitoring device, characterized by comprising a data acquisition module and a cloud platform centralized monitoring module, wherein:
the data acquisition module is configured to collect monitoring data on the abnormal state of the local multivariate time series in real time, to perform local data monitoring and historical data sampling and storage, and to upload preset real-time data to the cloud platform centralized monitoring module; wherein the monitoring data is obtained by predicting the abnormal state of the local multivariate time series with the multivariate time series abnormal pattern prediction method according to claim 1;
the cloud platform centralized monitoring module is configured to obtain the real-time monitoring data so as to monitor the data condition.
3. The data acquisition monitoring device of claim 2, wherein the data acquisition monitoring device further comprises a data service center;
the data service center uses the real-time history database to store real-time data of the production process and provide retrieval service, and uses the business SQL database to store static data of the business process and provide retrieval service.
CN202010439838.6A 2020-05-22 2020-05-22 Multielement time sequence abnormal mode prediction method and data acquisition monitoring device Active CN111694879B (en)

Publications (2)

Publication Number Publication Date
CN111694879A CN111694879A (en) 2020-09-22
CN111694879B true CN111694879B (en) 2023-10-31

Family

ID=72476740

Country Status (1)

Country Link
CN (1) CN111694879B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103400040A (en) * 2013-07-31 2013-11-20 中国人民解放军国防科学技术大学 Fault diagnosis and prediction method utilizing multistep time domain difference value learning
WO2016150395A1 (en) * 2015-03-24 2016-09-29 Huawei Technologies Co., Ltd. Adaptive, anomaly detection based predictor for network time series data
CN108491970A (en) * 2018-03-19 2018-09-04 东北大学 A kind of Predict Model of Air Pollutant Density based on RBF neural
CN108681923A (en) * 2018-05-16 2018-10-19 浙江大学城市学院 A kind of consumer spending behavior prediction method based on modified hidden Markov model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant