CN116578436A

CN116578436A - Real-time online detection method based on asynchronous multielement time sequence data

Info

Publication number: CN116578436A
Application number: CN202310528808.6A
Authority: CN
Inventors: 刘若辰; 张锦伟; 杨博通; 李卫斌
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2023-05-11
Filing date: 2023-05-11
Publication date: 2023-08-11

Abstract

The invention discloses a real-time online detection method based on asynchronous multivariate time-series data. It mainly addresses problems of the prior art when handling asynchronous time series: inaccurate solution of the graph adjacency matrix, insufficient feature extraction, poor real-time anomaly detection, and low detection efficiency. The scheme is as follows: collect data, normalize it, divide it into training-set time-series blocks, and calculate the adjacency matrix of the training set; construct an asynchronous feature extraction sub-network; construct an upper-bound prediction sub-network and a lower-bound prediction sub-network, connect each in series with the asynchronous feature extraction sub-network to form an anomaly detection network, and train that network on the training set; input the time-series blocks of the test set and the adjacency matrix into the trained anomaly detection network and output the detection result. The invention solves the adjacency matrix by dynamic time warping and extracts features fully by means of the asynchronous feature extraction sub-network, which improves the anomaly detection accuracy and detection efficiency of the network. The method can be used for real-time detection of the operating state of industrial equipment.

Description

Real-time online detection method based on asynchronous multielement time sequence data

Technical Field

The invention belongs to the technical field of physics, and more particularly relates to a real-time online detection method which can be used for real-time detection of states in the operation of industrial equipment.

Background

In real-time monitoring of the operating state of industrial equipment, multiple multivariate time series describing the operating state of individual units are often generated, and abnormal events in the system can be detected by experts observing these data around the clock. However, as the data volume grows, this conventional approach consumes a great deal of manpower and material resources, and developing multivariate time-series anomaly detection algorithms based on deep learning has become a trend. Multivariate time series are asynchronous: when the state of one unit changes, it takes a period of time before the state of another unit changes in response. Because of this asynchrony, many methods cannot extract features accurately, which degrades detection performance. In addition, existing methods can only output an anomaly score; they cannot give an anomaly threshold in real time without relying on prior information, so their practical value is low.

In the patent application with application number CN 202210824024.3, Ocean University of China discloses a method and system for time-series anomaly detection based on GCN and attention VAE. The implementation steps are as follows: first, data are collected and preprocessed, and the industrial state data read in real time are normalized; second, the interrelationship between the time series of different units is determined by calculating the Pearson correlation coefficient between them, which yields the adjacency matrix of the graph; then, graph convolution is used to aggregate information from neighbor nodes, and a long short-term memory network with an attention mechanism is used to reconstruct the data; finally, the reconstruction error is used to decide whether an anomaly has occurred. Because it detects anomalies on multivariate time-series data, the method has two defects. First, the asynchrony of the multivariate time series affects the accuracy of the similarity calculation, so the model obtains incorrect graph adjacency matrix information. Second, the asynchrony impairs the information extraction capability of ordinary graph convolution, so the model cannot fully extract features.

Julien Audibert et al., in the paper "USAD: UnSupervised Anomaly Detection on Multivariate Time Series" (KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2020), disclose a multivariate time-series anomaly detection method based on adversarially trained autoencoders. The method is as follows: first, the time-series data are divided into a plurality of time windows; second, two autoencoder reconstruction networks that share the same encoder are established; finally, a two-stage training strategy is adopted, in which the first stage strengthens the accurate detection capability of the autoencoders through gradient descent and the second stage reduces missed detections through adversarial training. Because the method ignores the interaction between the different time series in the multivariate data, its feature extraction capability in the measurement domain is reduced; moreover, it cannot give an anomaly threshold in real time and can only provide a fixed threshold, derived from prior information about the test set, after the anomaly scores have been obtained, so its detection efficiency is low.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a real-time online detection method based on asynchronous multivariate time sequence data, so as to detect anomalies in real time, accurately acquire graph adjacency matrix information, fully extract model characteristics and improve detection accuracy and efficiency.

The technical idea for realizing the invention is as follows: by measuring the similarity between different time series with dynamic time warping, the similarity calculation error caused by the asynchrony of the multivariate time series is eliminated and the graph adjacency matrix is obtained more accurately; by using an asynchronous sequence feature extraction sub-network based on one-dimensional convolution and asynchronous graph convolution, the various dependencies of the multivariate time series are extracted more comprehensively, which solves the problem of insufficient feature extraction for asynchronous time series; by optimizing the model with an orthogonal quantile loss function, the upper bound and lower bound of the normal variation of the data are given without relying on any prior information, and anomalies are detected in real time by judging whether the actual observed value falls within that interval.

Based on the above idea, the implementation steps of the invention include the following:

(1) Generating a training set time sequence block:

collecting historical data of M kinds of indexes including CPU utilization rate of a server, memory utilization rate and network bandwidth, solving the mean value and variance of each index, respectively normalizing each index by using the mean value and the variance, and taking each index after normalization as a training set;

dividing the training set, with a fixed step length s, into a plurality of time sequence blocks of length w;

(2) Carrying out graph structure solving by utilizing dynamic time warping:

(2a) Constructing an M×M adjacency matrix A and initializing every element to 0, where M is the number of indexes of each unit of the industrial equipment;

(2b) Constructing a cost matrix C of size w×w, and calculating the value of its element $c_{i,j}$ in the i-th row and j-th column:

wherein $x_p^{(t-w+i)}$ represents the i-th observation of the p-th index in the time sequence block $W^{(t)}$ of length w at time t, and $x_q^{(t-w+j)}$ represents the j-th observation of the q-th index in $W^{(t)}$; i and j take integer values from 1 to w; p and q are the row index and column index of the adjacency matrix A, respectively, and take integer values from 1 to M;

(2c) For the observations of the p-th index and the q-th index in the time sequence block $W^{(t)}$ at time t, executing step (2b) in a loop, and taking the resulting value $c_{w,w}$ as the element in the p-th row and q-th column of the adjacency matrix A;

(2d) Repeating step (2c) for every pair of the M indexes in the time sequence block $W^{(t)}$ at time t until all elements of the adjacency matrix A have been calculated, which gives the updated adjacency matrix A;

(3) Constructing an asynchronous feature extraction sub-network:

selecting an existing Sigmoid function, a hyperbolic tangent function, an asynchronous graph convolution layer and two one-dimensional convolution layers; setting the convolution kernel sizes of the first and second one-dimensional convolution layers to 3, their input channels to 1 and their output channels to 4; setting the input channel of the asynchronous graph convolution layer to 4 and its output channel to 16;

the first one-dimensional convolution layer is connected with the Sigmoid function in series, the second one-dimensional convolution layer is connected with the hyperbolic tangent function in series, and the two are connected in parallel and then connected with the asynchronous graph convolution layer in series to form an asynchronous feature extraction sub-network;

(4) An upper bound prediction sub-network and a lower bound prediction sub-network which are formed by a plurality of one-dimensional convolution layers are utilized, and an asynchronous feature extraction sub-network is respectively connected with the upper bound prediction sub-network and the lower bound prediction sub-network in series to form an anomaly detection network;

(5) Training the anomaly detection network using an orthogonal quantile loss function:

(5a) Initializing network weights, and initializing all weights in the network to random values meeting normal distribution;

(5b) Inputting the time sequence blocks of the training set and the calculated adjacency matrix A into the anomaly detection network, and calculating the loss value of the anomaly detection network using the orthogonal quantile loss function; back-propagating the loss value of the network using the gradient descent method, and iteratively updating the network parameters until the loss function converges, so as to obtain the trained anomaly detection network;

(6) Detecting the state of equipment:

(6a) Collecting in real time, over a time range of w seconds, a time sequence block of the M kinds of indexes of a server, including memory utilization rate and network bandwidth, and normalizing it by the same method as in step (1) to obtain a test time sequence block;

(6b) Inputting the test time sequence block and the calculated adjacency matrix A into the trained anomaly detection network, outputting an upper-bound predicted value and a lower-bound predicted value, and comparing them with the actual observed value at the next moment, i.e. at second w+1:

if the actual observed value at the next moment is larger than the upper-bound predicted value or smaller than the lower-bound predicted value, the abnormality is considered to occur,

otherwise, no anomaly occurs.

Compared with the prior art, the invention has the following advantages:

First, the invention adopts a graph structure solving strategy based on dynamic time warping, which uses dynamic time warping to measure the similarity between different indexes. This overcomes the problem in the prior art that an incorrect graph adjacency matrix is learned because the time series are asynchronous, so the graph structure is represented correctly and the accuracy with which the model detects anomalies of industrial equipment is improved.

Second, the invention uses one-dimensional convolution and asynchronous graph convolution jointly: the asynchronous sequence feature extraction sub-network extracts features of the multivariate time series in both the time domain and the measurement domain and makes effective use of the dependencies along its different dimensions. This overcomes the loss of time-series features in the prior art, allows features at different levels of the time series to be extracted comprehensively, and improves the anomaly detection performance of the model for industrial equipment.

Third, because the model is trained with the orthogonal quantile loss function, it can output the anomaly detection threshold of the data in real time without any prior information. This overcomes the incorrect and untimely threshold selection of the prior art, allows anomalies to be detected in real time, and improves the anomaly detection efficiency of the model for industrial equipment.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention.

Detailed Description

Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, the implementation steps of the present embodiment are as follows:

Step 1, generating a training set time sequence block.

1.1) With a sampling interval of $T_s$ seconds, collecting historical data of a server over a duration of T seconds. The data cover 16 kinds of indexes in total: CPU utilization, memory utilization, bandwidth utilization, interface mode, TCP connection number, routing throughput, disk space, device temperature, fan state, network connection state, interface mode, SNMP data collection state, packet loss rate, interface error rate, power supply state, and number of new connections. This example sets, without limitation, the sampling interval $T_s$ to 1 and the collection duration T to 21600;

1.2) Calculating the mean and variance of each index of the historical data, and using the mean and variance to normalize each index according to the following formula:

$$x_i = \frac{\tilde{x}_i - \mu_i}{\sigma_i}$$

wherein $\tilde{x}_i$ represents the i-th index of the historical data before normalization, whose observation at time t is $\tilde{x}_i^{(t)}$; $\mu_i$ represents the mean of the i-th index of the historical data; $\sigma_i$ represents the variance of the i-th index of the historical data; $x_i$ represents the normalized data of the i-th index of the historical data; and i takes integer values from 1 to 16;

1.3) Dividing the normalized historical data of each index, with a fixed step length s, into a plurality of training-set time sequence blocks of time range length w. The time sequence block $W^{(t)}$ at time t is expressed as follows:

$$W^{(t)} = \{x^{(t-11)}, x^{(t-10)}, \ldots, x^{(t-j)}, \ldots, x^{(t)}\}$$

wherein $x^{(t-j)}$ represents the normalized observations of the 16 indexes at time t-j, $x_i^{(t-j)}$ represents the normalized observation of the i-th index at time t-j, and j takes integer values from 0 to w-1. This example sets, without limitation, the time range length w to 12 and the step length s to 1.
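Steps 1.2) and 1.3) can be illustrated by the following minimal Python sketch. It is only a sketch under stated assumptions: the divisor is taken to be the population standard deviation (the usual z-score, whereas the text calls $\sigma_i$ the variance), the two toy series stand in for the 16 server indexes, and the function names `normalize` and `sliding_blocks` are hypothetical.

```python
from statistics import fmean, pstdev

def normalize(series):
    """Z-score one index: subtract its mean, divide by its spread."""
    mu = fmean(series)
    sigma = pstdev(series)  # assumption: standard deviation as the divisor
    if sigma == 0:
        return [0.0 for _ in series]  # constant index carries no variation
    return [(v - mu) / sigma for v in series]

def sliding_blocks(data, w=12, s=1):
    """Split time-major data into blocks of length w taken every s steps.

    Each block holds the observation vectors x^(t-w+1), ..., x^(t).
    """
    return [data[end - w:end] for end in range(w, len(data) + 1, s)]

# Toy history: 2 indexes observed at 15 time steps (hypothetical values).
cpu = [float(v) for v in range(15)]
mem = [float(v % 3) for v in range(15)]
normalized = [normalize(cpu), normalize(mem)]

# Transpose so that each element is the vector of all indexes at one time.
time_major = [list(step) for step in zip(*normalized)]
blocks = sliding_blocks(time_major, w=12, s=1)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))
```

With w = 12 and s = 1, a history of n time steps yields n - 12 + 1 blocks, so the 21600 samples of this example would give 21589 blocks.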

Step 2, carrying out graph structure solving by utilizing dynamic time warping.

2.1) A 16×16 adjacency matrix A is constructed:

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,16} \\ \vdots & \ddots & \vdots \\ a_{16,1} & \cdots & a_{16,16} \end{bmatrix}$$

wherein $a_{p,q}$ is the value of the element in the p-th row and q-th column of the adjacency matrix A, p and q take integer values from 1 to 16, each element is initialized to 0, and 16 is the number of indexes of each unit of the industrial equipment.

2.2) A 12×12 cost matrix C is constructed:

$$C = \begin{bmatrix} c_{1,1} & \cdots & c_{1,12} \\ \vdots & \ddots & \vdots \\ c_{12,1} & \cdots & c_{12,12} \end{bmatrix}$$

wherein $c_{i,j}$ is the value of the element in the i-th row and j-th column of the cost matrix C, and i and j take integer values from 1 to 12. The value of each element $c_{i,j}$ of the cost matrix C is calculated according to the following formula:

wherein $x_p^{(t-w+i)}$ represents the i-th observation of the p-th index in the time sequence block $W^{(t)}$ of length 12 at time t, and $x_q^{(t-w+j)}$ represents the j-th observation of the q-th index in $W^{(t)}$;

2.3) For the observations of the p-th index and the q-th index in the time sequence block $W^{(t)}$ at time t, step 2.2) is executed in a loop, and the resulting value $c_{12,12}$ is taken as the element in the p-th row and q-th column of the adjacency matrix A;

2.4) Step 2.3) is repeated for every pair of the 16 indexes in the time sequence block $W^{(t)}$ at time t until all elements of the adjacency matrix A have been calculated, which gives the updated adjacency matrix A.
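The graph structure solving of step 2 can be illustrated by the following minimal Python sketch. Because the cost formula of step 2.2) is not reproduced in the text, the sketch assumes the standard dynamic time warping recurrence, $c_{i,j} = |x_p^{(t-w+i)} - x_q^{(t-w+j)}| + \min(c_{i-1,j},\, c_{i,j-1},\, c_{i-1,j-1})$, with the absolute difference as the local cost. The function names `dtw_distance` and `dtw_adjacency` are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost c[w][w] between two equal-length sequences."""
    w = len(a)
    inf = float("inf")
    # c[i][j]: cheapest cost of aligning the first i values of a
    # with the first j values of b (row 0 and column 0 are borders).
    c = [[inf] * (w + 1) for _ in range(w + 1)]
    c[0][0] = 0.0
    for i in range(1, w + 1):
        for j in range(1, w + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            c[i][j] = local + min(c[i - 1][j], c[i][j - 1], c[i - 1][j - 1])
    return c[w][w]

def dtw_adjacency(block):
    """block[p] is the length-w series of index p; returns the M x M matrix A."""
    m = len(block)
    return [[dtw_distance(block[p], block[q]) for q in range(m)]
            for p in range(m)]

# A ramp and the same ramp delayed by one step (an asynchronous pair).
leader = [0.0, 1.0, 2.0, 3.0]
follower = [0.0, 0.0, 1.0, 2.0]
pointwise = sum(abs(x - y) for x, y in zip(leader, follower))
print(pointwise, dtw_distance(leader, follower))  # 3.0 1.0
A = dtw_adjacency([leader, follower])
```

In this toy case the point-by-point difference is 3.0, while the warped alignment absorbs the one-step delay and leaves a cost of 1.0, which is the effect the step relies on for asynchronous indexes.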

Step 3, constructing an asynchronous feature extraction sub-network.

Two one-dimensional convolution layers and one asynchronous graph convolution layer are selected; the convolution kernel sizes of the first and second one-dimensional convolution layers are set to 3, their input channels are set to 1 and their output channels are set to 4; the input channel of the asynchronous graph convolution layer is set to 4 and its output channel is set to 16;

selecting an existing Sigmoid function and a hyperbolic tangent function;

and connecting the first one-dimensional convolution layer with the Sigmoid function in series, connecting the second one-dimensional convolution layer with the hyperbolic tangent function in series, and connecting the two in parallel and then connecting the two with the asynchronous graph convolution layer in series to form an asynchronous feature extraction sub-network.
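The two parallel convolution branches of step 3 can be illustrated by the following minimal Python sketch. The text does not state how the parallel branches are merged, so the sketch assumes an element-wise product of the hyperbolic tangent branch and the Sigmoid branch, as in common gated temporal convolutions. It uses a single channel instead of the 1-to-4 channel layers, omits the asynchronous graph convolution layer (whose computation is not specified in the text), and uses hypothetical names and kernel values.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Valid 1-D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [sum(x[t + i] * kernel[i] for i in range(k)) + bias
            for t in range(len(x) - k + 1)]

def gated_temporal_conv(x, filter_kernel, gate_kernel):
    """tanh branch multiplied element-wise by sigmoid branch (assumed merge)."""
    filt = [math.tanh(v) for v in conv1d(x, filter_kernel)]
    gate = [1.0 / (1.0 + math.exp(-v)) for v in conv1d(x, gate_kernel)]
    return [f * g for f, g in zip(filt, gate)]

# One index of a length-12 time sequence block, kernel size 3 as in the text.
x = [0.1 * t for t in range(12)]
out = gated_temporal_conv(x, [0.5, 0.0, -0.5], [1.0, 1.0, 1.0])
print(len(out))  # 12 - 3 + 1 positions without padding
```

In a PyTorch implementation the same structure would typically be built from two `torch.nn.Conv1d` layers followed by `torch.tanh` and `torch.sigmoid`.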

Step 4, constructing an anomaly detection network.

4.1) Establishing an upper-bound prediction sub-network: two one-dimensional convolution layers are selected, the input channel of the 1st one-dimensional convolution layer is 16 and its output channel is 4, the input channel of the 2nd one-dimensional convolution layer is 4 and its output channel is 1, and the two one-dimensional convolution layers are connected in series to form the upper-bound prediction sub-network;

4.2) Establishing a lower-bound prediction sub-network: two one-dimensional convolution layers, namely a 3rd one-dimensional convolution layer and a 4th one-dimensional convolution layer, are selected, the input channel of the 3rd one-dimensional convolution layer is 16 and its output channel is 4, the input channel of the 4th one-dimensional convolution layer is 4 and its output channel is 1, and the 3rd and 4th one-dimensional convolution layers are connected in series to form the lower-bound prediction sub-network;

4.3) The asynchronous feature extraction sub-network is connected in series with the upper-bound prediction sub-network and with the lower-bound prediction sub-network, respectively, to form the anomaly detection network, and the existing orthogonal quantile loss function is used as the loss function Q of the network.

Step 5, training the anomaly detection network.

5.1 Initializing network weights, and initializing all weights in the network to random values meeting normal distribution, wherein the average value of the normal distribution is 0, and the standard deviation is 0.02;

5.2) Inputting the time sequence blocks of the training set and the adjacency matrix A into the anomaly detection network, and calculating the loss value of the anomaly detection network using the orthogonal quantile loss function:

5.2.1) Calculating the upper-bound loss value of the anomaly detection network according to the following formula:

5.2.2) Calculating the lower-bound loss value of the anomaly detection network according to the following formula:

5.2.3) Calculating the actual coverage V according to the following formula:

5.2.4) Calculating the predicted interval length I according to the following formula:

5.2.5) Calculating the orthogonal regularization term loss value R(V, I) of the anomaly detection network according to the following formula:

5.2.6) Calculating the loss value L of the anomaly detection network from the results of 5.2.1), 5.2.2) and 5.2.5):

wherein x is the actual observed value, $\hat{x}_{up}$ is the upper-bound predicted value, $\hat{x}_{lo}$ is the lower-bound predicted value, $\alpha_{up}$ is the upper-bound quantile, $\alpha_{lo}$ is the lower-bound quantile, ρ(·) represents the pinball loss function, ‖·‖ represents the L1 norm, Cov(·) represents the covariance function, and Var(·) represents the variance function.

5.3) Back-propagating the loss value L of the network using the gradient descent method, and iteratively updating the network parameters until the loss function Q converges, so as to obtain the trained anomaly detection network.
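The loss calculation of step 5.2) can be illustrated by the following minimal Python sketch. The formulas themselves are not reproduced in the text, so every one of them here is an assumption built from the symbols the text does name: the pinball loss is taken in its usual form $\rho_\alpha(u) = \max(\alpha u, (\alpha - 1)u)$, the coverage V as the indicator that x lies between the two bounds, the interval length I as the upper bound minus the lower bound, the regularization term as $|\mathrm{Cov}(V, I)| / \sqrt{\mathrm{Var}(V)\,\mathrm{Var}(I)}$, and the total loss as the sum of the three terms with a hypothetical `weight`. The quantile levels 0.95 and 0.05 are likewise hypothetical.

```python
from statistics import fmean

def pinball(x, pred, alpha):
    """Pinball (quantile) loss of one observation at quantile level alpha."""
    diff = x - pred
    return max(alpha * diff, (alpha - 1.0) * diff)

def orthogonal_quantile_loss(xs, uppers, lowers,
                             alpha_up=0.95, alpha_lo=0.05, weight=1.0):
    """Upper loss + lower loss + weight * |corr(coverage, interval length)|."""
    loss_up = fmean([pinball(x, u, alpha_up) for x, u in zip(xs, uppers)])
    loss_lo = fmean([pinball(x, l, alpha_lo) for x, l in zip(xs, lowers)])
    # Coverage indicator V and interval length I per sample (assumed forms).
    v = [1.0 if l <= x <= u else 0.0 for x, u, l in zip(xs, uppers, lowers)]
    length = [u - l for u, l in zip(uppers, lowers)]
    mean_v, mean_i = fmean(v), fmean(length)
    cov = fmean([(a - mean_v) * (b - mean_i) for a, b in zip(v, length)])
    var_v = fmean([(a - mean_v) ** 2 for a in v])
    var_i = fmean([(b - mean_i) ** 2 for b in length])
    denom = (var_v * var_i) ** 0.5
    reg = abs(cov / denom) if denom > 0 else 0.0  # no penalty if degenerate
    return loss_up + loss_lo + weight * reg

# Two observations, both inside a constant interval [-1, 1].
loss = orthogonal_quantile_loss([0.0, 0.5], [1.0, 1.0], [-1.0, -1.0])
print(round(loss, 6))
```

The regularization term penalizes any correlation between whether a sample is covered and how wide its interval is, which is what makes the interval usable as a threshold without a separately tuned anomaly score cut-off.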

Step 6, detecting the state of the equipment.

6.1) Collecting in real time, over the set time range of w seconds and at a sampling interval of $T_s$ seconds, a historical data time sequence block of a server, which covers 16 kinds of indexes: CPU utilization, memory utilization, bandwidth utilization, interface mode, TCP connection number, routing throughput, disk space, device temperature, fan state, network connection state, interface mode, SNMP data collection state, packet loss rate, interface error rate, power state, and number of new connections. This embodiment sets the time range length w to 12 and the sampling interval $T_s$ to 1;

6.2) Normalizing the historical data time sequence block of each kind of index by the same method as in step 1.2) to obtain a test time sequence block;

6.3) Inputting the test time sequence block and the calculated adjacency matrix A into the trained anomaly detection network, and outputting an upper-bound predicted value $\hat{x}_{up}$ and a lower-bound predicted value $\hat{x}_{lo}$;

6.4) Comparing the actual observed value $x_{test}$ at the next moment, i.e. at second w+1, with the upper-bound predicted value and the lower-bound predicted value to determine whether an anomaly occurs:

if $x_{test} > \hat{x}_{up}$ or $x_{test} < \hat{x}_{lo}$, an anomaly is considered to have occurred;

otherwise, no anomaly occurs.
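The decision rule of step 6.4) can be illustrated by the following minimal Python sketch. The function names `is_anomalous` and `detect` and the sample values are hypothetical; the rule itself is the comparison stated above.

```python
def is_anomalous(x_test, upper, lower):
    """True when the observation leaves the predicted [lower, upper] interval."""
    return x_test > upper or x_test < lower

def detect(observed, uppers, lowers):
    """Apply the rule to each index observed at the next moment."""
    return [is_anomalous(x, u, l) for x, u, l in zip(observed, uppers, lowers)]

# Three indexes at second w+1 (hypothetical normalized values).
flags = detect([0.2, 1.7, -2.4], [1.0, 1.5, 1.0], [-1.0, -1.5, -1.0])
print(flags)  # [False, True, True]
```

An observation exactly equal to either bound is treated as normal, since the rule uses strict inequalities.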

The effects of the present invention are further described below in conjunction with simulation experiments:

1. Simulation conditions.

The simulation experiment of the invention was carried out on a computer with an Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz processor and an Nvidia(R) RTX 2080 Ti graphics card, using the deep learning framework PyTorch.

The training set and the test set of the simulation experiment are taken from the public benchmark dataset machine-1-7. The dataset includes a training set with a duration of 23697 minutes and a test set with a duration of 23697 minutes.

Both the training set and the test set contain 38 industrial state indexes, and the sampling interval is 1 minute. The training set contains no abnormal conditions of the industrial equipment; 10.12% of the test set corresponds to abnormal equipment conditions, and the rest is normal.

2. Simulation content and result analysis.

The anomaly detection network of the invention and that of the existing method are each trained with the training set of the machine-1-7 dataset to obtain the respective trained anomaly detection networks;

the test set of the machine-1-7 dataset is input into each trained anomaly detection network to obtain the anomaly detection result for each observed value in the test set.

According to the anomaly detection result of each observed value in the test set, the precision and recall of each anomaly detection network on the test set are calculated:

The results obtained by the two methods are shown in Table 1.

TABLE 1 Comparison of the precision and recall of the invention and the existing method

The existing method in Table 1 is the unsupervised multivariate time series anomaly detection algorithm proposed by Julien Audibert et al. in the paper "USAD: UnSupervised Anomaly Detection on Multivariate Time Series" (KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2020).

As can be seen from Table 1, the precision of the invention is 0.99 and its recall is 0.93; both indexes are higher than those of the existing method, which shows that the invention achieves higher accuracy in detecting the state of industrial equipment.
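The two indexes can be illustrated by the following minimal Python sketch, which assumes the conventional definitions: precision is the fraction of flagged observations that are truly abnormal, TP / (TP + FP), and recall is the fraction of truly abnormal observations that are flagged, TP / (TP + FN). The function name `precision_recall` and the toy labels are hypothetical and are not the data behind Table 1.

```python
def precision_recall(predicted, actual):
    """Precision and recall of binary anomaly flags against ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and abnormal
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but normal
    fn = sum(1 for p, a in pairs if not p and a)    # abnormal but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Five observations: 1 marks an anomaly, 0 marks normal operation.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(precision, 3), round(recall, 3))
```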

The simulation experiment shows that the graph structure solving strategy based on dynamic time warping effectively learns an accurate graph adjacency matrix, and that the constructed asynchronous sequence feature extraction sub-network can model the features of the time-series data along its different dimensions, which effectively solves the incomplete and inaccurate feature extraction of the prior art. The model optimized with the orthogonal quantile loss function can give the anomaly threshold in real time, which effectively improves the detection efficiency of the model.

Claims

1. The real-time online detection method based on the asynchronous multi-element time sequence data is characterized by comprising the following steps of:

(1) Generating a training set time sequence block:

(2) Carrying out graph structure solving by utilizing dynamic time warping:

wherein $x_p^{(t-w+i)}$ represents the i-th observation of the p-th index in the time sequence block $W^{(t)}$ of length w at time t, and $x_q^{(t-w+j)}$ represents the j-th observation of the q-th index in $W^{(t)}$; i and j take integer values from 1 to w; p and q are the row index and column index of the adjacency matrix A, respectively, and take integer values from 1 to M;

(3) Constructing an asynchronous feature extraction sub-network:

(6) Detecting the state of equipment:

otherwise, no anomaly occurs.

2. The method of claim 1 wherein the step (1) of collecting historical data for M types of metrics includes CPU utilization, memory utilization, bandwidth utilization, interface mode, TCP connection count, routing throughput, disk space, device temperature, fan status, network connection status, interface mode, SNMP data collection status, packet loss rate, interface error rate, power status, and number of new connections.

3. The method of claim 1, wherein step (1) of collecting historical data of M kinds of indexes is performed under the condition that a sampling interval is 1 second and a collection time period is 3600 minutes.

4. The method of claim 1, wherein the respective indexes are normalized using the mean and the variance in step (1) as follows:

$$x_i = \frac{\tilde{x}_i - \mu_i}{\sigma_i}$$

wherein $\tilde{x}_i$ represents the i-th index before normalization, $\mu_i$ represents the mean of the i-th index, $\sigma_i$ represents the variance of the i-th index, and $x_i$ represents the normalized data of the i-th index.

5. The method according to claim 1, wherein the M×M adjacency matrix A constructed in step (2a) is represented as follows:

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,M} \\ \vdots & \ddots & \vdots \\ a_{M,1} & \cdots & a_{M,M} \end{bmatrix}$$

wherein $a_{p,q}$ is the value of the element in the p-th row and q-th column of the adjacency matrix A, and p and q take integer values from 1 to M.

6. The method according to claim 1, wherein the cost matrix C of size w×w constructed in step (2b) is expressed as follows:

$$C = \begin{bmatrix} c_{1,1} & \cdots & c_{1,w} \\ \vdots & \ddots & \vdots \\ c_{w,1} & \cdots & c_{w,w} \end{bmatrix}$$

wherein $c_{i,j}$ is the value of the element in the i-th row and j-th column of the cost matrix C, and i and j take integer values from 1 to w.

7. The method of claim 1, wherein the step (4) is implemented using upper and lower prediction sub-networks of a plurality of one-dimensional convolutional layers, as follows:

4 one-dimensional convolution layers are selected, the input channel of the 1 st one-dimensional convolution layer is set to be 16, and the output channel is set to be 4; setting the input channel of the 2 nd one-dimensional convolution layer as 4, and setting the output channels as 1; setting the input channel of the 3 rd one-dimensional convolution layer as 16 and the output channel as 4; setting the 4 th one-dimensional convolution layer input channel as 4 and the output channel as 1;

the 1 st one-dimensional convolution layer and the 2 nd one-dimensional convolution layer are connected in series to form an upper bound prediction sub-network;

and connecting the 3 rd one-dimensional convolution layer and the 4 th one-dimensional convolution layer in series to form a lower-bound prediction sub-network.

8. The method of claim 1, wherein the calculating of the loss value of the anomaly detection network using the orthogonal quantile loss function in step (5 b) is accomplished by:

(5b1) Calculating an upper bound loss value for an anomaly detection network

(5b2) Calculating a lower bound loss value for an anomaly detection network

(5b3) Calculating the actual coverage ratio V:

(5b4) Calculating a predicted interval length I:

(5b5) Calculating an orthogonal regularization term loss value R (V, I) of the anomaly detection network:

(5b6) Calculating a loss value L of the anomaly detection network based on the parameters of (5 b 1), (5 b 2) and (5 b 5):