CN114511767A

CN114511767A - Quick state prediction method for timing diagram data

Info

Publication number: CN114511767A
Application number: CN202210129022.2A
Authority: CN
Inventors: 王勇; 王晓虎; 王范川; 秦瑞; 张应福; 石锟
Original assignee: Creative Information Technology Co ltd; University of Electronic Science and Technology of China
Current assignee: Creative Information Technology Co ltd; University of Electronic Science and Technology of China
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-05-17
Anticipated expiration: 2042-02-11
Also published as: CN114511767B

Abstract

The invention discloses a quick state prediction method facing to timing diagram data, which comprises the following steps: step 1, data preprocessing: converting the data format; step 2, feature transformation: mapping data information, data state information and time information observed by each sensor to a high-dimensional space; and 3, feature learning of the encoder end: fusing all the transformed features in the step 2 to generate features of a time dimension and a space dimension, fusing the features of the time dimension and the space dimension together, and transmitting the fused features into a decoder part; and 4, feature transformation at the decoder end: performing feature transformation by adopting a generating mode to generate finally learned features; and 5, predicting data and states. The invention considers the state characteristics of the data, and switches in from the angle of data maintenance time to reveal the change of the data state, thereby reducing the redundancy degree of the data to a certain extent and being beneficial to improving the prediction accuracy of the data.

Description

Quick state prediction method for timing diagram data

Technical Field

The invention relates to the technical field of time sequence prediction, in particular to a quick state prediction method for time sequence diagram data.

Background

Timing data is ubiquitous at present, and is exemplified by traffic timing data, and prediction thereof is an important component for constructing intelligent traffic. Only if the traffic time sequence data is predicted more reliably, reasonable planning can be made in advance according to factors such as future traffic flow conditions and the like. In recent years, more and more research has focused on the study of such timing diagram data: lai et al solved the prediction problem of multivariate Time series using LSTNet (Long and Short Time-series Network), which integrates a CNN model and an RNN model, where CNN is used to extract Short-term dependencies of data, and RNN is used to discover Long-term patterns and trends of data, improving the accuracy of data prediction compared to a single model, but it ignores the spatial relationship existing between data. Zheng et al incorporates knowledge of Graph structures into temporal prediction, and proposes a GMAN (Graph Multi-Attention Network) model for long-term traffic information prediction. The STE (spatial temporal embedding) respectively uses GCN to obtain spatial features, uses an attention mechanism to obtain time features, and then fuses the spatial features and the time features, so that the extraction of space-time features is realized; however, when extracting the spatial features, the used adjacency matrix is fixed, and obviously, in the traffic map, the state between data points is changed. Also having the same problem are the STGCN (Spatial-Temporal graphic relational Networks) model proposed by Yu et al, the ASTGCN (extension-Based Spatial-Temporal graphic relational Networks) model proposed by Guo et al, and the like. The Graph WaveNet model proposed by Wu et al solves the above limitations, and uses a dynamically generated approach to construct the adjacency matrix of the Graph, so that the generated spatial relationship can be optimized by the model itself.

With the success of the transform model in the field of natural language processing, there is now some research on applying it to the field of time series prediction, such as the ASTGNN (Attention based Spatial-Temporal Graph Neural Network) proposed by Guo et al, which uses the transform model and uses it for extracting the time features of data, and incorporates the GCN part therein for extracting the Spatial features. However, in the transform model, a large number of matrix operations are involved, and the requirements on hardware devices such as a memory are high. Zhou et al propose an Informer model for optimizing the number of matrix calculations and reducing the calculation complexity, aiming at the problems of large calculation amount and large memory consumption of a Transformer model, but do not consider the spatial relationship between data when applied to the prediction of time series data.

In the above-mentioned methods, they focus on predicting the data value at a future time, which is often reliable when predicting a shorter time period in the future. However, the prediction range is expanded, that is, the accuracy of the prediction result is reduced as the prediction time increases in the long-time series prediction. Moreover, this prediction method only focuses on the characteristics of the data at a certain time, but ignores the status characteristics of the data, such as how long the data will last.

Disclosure of Invention

The present invention is directed to a method for predicting a state of a timing chart data, which is capable of predicting the timing chart data and predicting a corresponding holding time to reveal a corresponding state.

The purpose of the invention is realized by the following technical scheme:

a quick state prediction method facing to timing diagram data comprises the following steps:

step 1, data preprocessing: converting a data format, namely converting the data reported by the sensor from a time point + data value format into a data value + maintenance time format according to a time sequence;

step 2, feature transformation: for data information observed by each sensor, mapping the data information into a high-dimensional space by using a TCN module; mapping the data state information of each sensor to a high-dimensional space in a TCN convolution mode; for the time information of each sensor, carrying out feature coding according to date, and mapping to a high-dimensional space in a linear transformation mode;

and 3, feature learning of the encoder end: fusing all the transformed features in the step 2, transmitting the fused features into an encoder module based on a multi-head attention mechanism, and learning data features and data state features on a time dimension; transmitting the original data before feature transformation to a dynamic graph neural network module, learning data features on a space dimension, fusing the features of a time dimension and the space dimension together, and transmitting the fused data to a decoder part;

and 4, feature transformation at the decoder end: performing characteristic transformation in a generating mode; firstly, generating characteristics with time dimension and space dimension; then, carrying out multi-head attention calculation together with the features generated in the step 3 to generate finally learned features;

said step 4 comprises in particular the sub-steps of,

step 401: obtaining the fusion characteristics Z of the time characteristics, the data characteristics and the data state characteristics through the transformation of the step 2_{Fuse_dec}＝Z_{Time_dec}+Z_{Data_dec}+Z_{Dur_dec}；

Step 402: fusing the features Z_{Fuse_dec}As input, the characteristic representation Z in the time dimension is obtained by passing the decoder part based on the multi-head Attention mechanism through a MaskProbSparse Self-Attention module_{Out_dec_temporal}；

Step 403: using Dynamic GCN to perform feature learning on an original data sequence to obtain data features on spatial dimensions, and then using 1D convolution to perform dimension transformation to obtain Z_{Out_dec_spatio}；

Step 404: merging the temporal and spatial features of the decoder portion to obtain Z_{Out_dec}＝Z_{Out_dec_temporal}+Z_{Out_dec_spatio}. Then, after FullAttention, Z is added_{Out_dec}As Q, Z_{Out_enc}As K, V, to obtain a final characteristic output Z_Out；

And 5, predicting data and states: and respectively transmitting the final generated characteristics of the decoder end into two different full-connection layers to predict data and states.

Further, the data information, the data state information and the time information are all in a data form of a data sequence.

Further, the step 2 comprises the following sub-steps:

step 201: transmitting data information into TCN module, extracting data characteristic Z_{Data_enc}；

Step 202: transmitting the maintaining time sequence into a TCN module, and extracting the characteristic Z of the data state_{Dur_enc}；

Step 203: coding the time sequence information to obtain the characteristic Z of the time_{Time_enc}。

Furthermore, the data state information is a maintenance time sequence, and the state information reflects the rule of data change to a certain extent.

Further, the step 3 specifically includes the following sub-steps:

step 301: combining the characteristics in the step 2 to obtain a fusion characteristic Z containing time information, data characteristics and data state characteristics_{Fuse_enc}＝Z_{Time_enc}+Z_{Data_enc}+Z_{Dur_enc}。

Step 302: fusing the features Z_{Fuse_enc}As input, pass into the encoder portion based on the multi-headed attention mechanism; reducing the dimension of the characteristic sequence to 1/2 of the original dimension for the purpose of distillation; by reducing the dimension, on one hand, the requirements on hardware devices such as a memory are reduced, and on the other hand, the calculation efficiency is further accelerated.

Step 303: through L₁A sub ProbSparse Self-orientation module and a MaxPool distillation module to obtain the output Z of the encoder_{Out_enc_temporal}；

Step 304: using dynamic GCN to perform feature learning on an original data sequence to obtain data features on a space dimension, and then using 1D convolution and MaxPool to perform dimension transformation to obtain Z_{Out_enc_spatio}；

Step 305: fusing the features of the time dimension and the space dimension together to obtain Z_{Out_enc}＝Z_{Out_enc_temporal}+Z_{Out_enc_spatio}And passed into the decoder section.

Further, the processing procedure of the encoder part based on the multi-head attention mechanism is as follows: firstly, multi-head self-Attention calculation is carried out through a ProbSpareSelf-Attention module, and the time complexity is calculated from O (n) by using an optimized matrix calculation mode²) Reduction to O (nlog (n)); and then, reducing the dimension of the characteristic sequence to 1/2 of the original dimension through MaxPool convolution operation for the purpose of distillation.

Further, the step 5 specifically includes the following sub-steps:

step 501: the final characteristic vector Z obtained in the step 4 is processed_OutInput FC1, output predicted timing graph data

Step 502: the final characteristic vector Z obtained in the step 4 is processed_OutTransmitting the predicted retention time corresponding to the timing diagram data to FC2

The invention has the beneficial effects that: the invention considers the state characteristics of the data, switches in from the angle of data maintenance time to reveal the change of the data state, and fuses the change as information into the characteristics, so that the data characteristics are richer on one hand; on the other hand, the redundancy degree of the data is reduced to a certain degree, the prediction accuracy of the data is improved, and the method can also be applied to downstream tasks such as data completion, abnormal detection and the like.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of a TCN module.

Fig. 3 is a forward process of the model.

FIG. 4 is a diagram of a model framework of the present invention.

FIG. 5 is a schematic diagram of a detailed list of each layer of the model.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Embodiment 1, as shown in fig. 1, a method for fast state prediction oriented to timing diagram data includes the following steps:

step 1: data preprocessing: the data format is converted, and the data reported by the sensor is converted from the format of (time point, data value) to the format of (data value, maintenance time) according to the time sequence.

For a particular sensor XⁱIn other words, the data form is originally

We merge data segments with the same data to form a new format of

On one hand, the conversion mode can reduce the redundancy of data and save the memory resource, and under the worst condition, the data length is equal to the length of the original data sequence; on the other hand, using a sustained time sequence of data

The change of the data state is revealed, and the duration time sequence of the data is used as an additional input for enriching the content of the data characteristics, which is beneficial to improving the prediction accuracy of the data.

Step 2: and (3) feature transformation: for data information observed by each sensor, mapping the data information into a high-dimensional space by using a TCN module; mapping the data state information of each sensor to a high-dimensional space in a TCN convolution mode; and (3) carrying out characteristic coding on the time information of each sensor according to the date, and mapping to a high-dimensional space in a linear transformation mode.

Step 201: transmitting the data value sequence into a TCN module, and extracting the characteristic Z of the data sequence_{Data_enc}；

Because the data dimension of the sensors is small, the TCN module is used here to map the features of each sensor into a high-dimensional space, facilitating feature learning by the encoder module.

Step 202: transmitting the maintaining time sequence into a TCN module, and extracting the state characteristic Z of the data_{Dur_enc}(ii) a TCN is a feature extraction module composed of a hole convolution and a residual structure, as shown in FIG. 2.

The hole convolution kernel is described using the formula:

wherein x ∈ RⁿRepresents a sequence of 1D, s is the sequence for performing the convolution operation, D is the expansion factor, k is the number of filters, and f represents the filter.

Compared with the common convolution operation, the method does not leak future information and cause the condition of state information leakage when extracting the features.

Step 203: coding the time sequence information to obtain a characteristic vector Z of time_{Time_enc}。

For time-series data, there may be a periodic regularity between data, and the sequentiality between data is certain. However, in the multi-head attention mechanism, the data is insensitive to the sequence information of the data, the position information between the data is not enough to be embodied in a relative position coding mode, and the characteristic of the periodicity of the data cannot be well extracted. Therefore, time information of the data is encoded to reveal order information between the data from a global perspective.

The pseudo code of the whole model is shown in FIG. 3, and the details of the parameters of each layer of the model are shown in FIG. 5.

And step 3: feature learning at the encoder side: fusing the features in the step 2, transmitting the fused features into an encoder module based on a multi-head attention mechanism, and learning data features and data state features on a time dimension; and transmitting the original data before feature transformation to a dynamic graph neural network module, learning data features on a spatial dimension, fusing the features of the temporal dimension and the spatial dimension, and transmitting the fused features to a decoder part.

Step 302: fusing the features Z_{Fuse_enc}As input, an encoder section based on a multi-head attention mechanism is introduced. Firstly, multi-head Self-Attention calculation is carried out through a ProbSparse Self-Attention module, the part uses an optimized matrix calculation mode, and the time complexity is reducedFrom O (n)²) The calculation efficiency of the model is accelerated by reducing the calculation rate to O (nlog (n)); and then, the aim of distillation is achieved by MaxPool convolution operation, and the dimension of the characteristic sequence is reduced to 1/2 of the original dimension. By reducing the dimension, on one hand, the requirements on hardware devices such as a memory are reduced, and on the other hand, the calculation efficiency is further accelerated.

When calculating the Attention of a plurality of heads, the method of ProbSparse Self-Attention is adopted, and the formula is expressed as follows:

wherein the content of the first and second substances,

and d represents the dimension corresponding to Q and K.

The reason that the sparse form of self-attention can be used is that after calculating the self-attention score using Q and K, it is found that the effect on V follows a long tail distribution, i.e. only a small fraction of the attention score plays a more important role. Therefore, the similarity between Q and K (measured here using KL divergence) is calculated, and the vector of top K (where K is a parameter set to K ═ ln L _ Q, and L _ Q is the number of vectors in Q) is taken as the top K (where K is a parameter) vector

This approach can reduce computational complexity without affecting attention results, from O (n)²) The calculation efficiency of the model is accelerated by reducing the calculation rate to O (nlog (n));

then, the dimension of the data is reduced to half of the input by using the operation of distillation (here, MaxPool convolution) to reduce the requirement on the memory. Meanwhile, the data dimension of the lower network layer is reduced, and the operation speed is accelerated.

Step 303: through L₁A sub ProbSparse Self-orientation module and a MaxPool distillation module to obtain the output Z of the encoder_{Out_enc_temporal}。

Step 304: using dynamic GCN to perform feature learning on an original data sequence to obtain data features on a space dimension, and then using 1D convolution and MaxPool to perform dimension transformation to obtain Z_{Out_enc_spatio}。

Since the spatial relationship between the data should be dynamically changing in time series. Specifically, at different time points, the adjacency matrix formed between data should be different, so when initializing the adjacency matrix, we use the full-connection matrix to represent the relationship between data. And the relation between the data at each time is changed and is represented by using a dynamically generated correlation strength matrix S.

At this time, we mark the hidden representation of data at different time as E_i(i∈[1,t]) So the correlation strength matrix S is represented as:

s_jkthe (0 ≦ j, k ≦ N) matrix represents the strength of association between the j point and the k point.

The spectral domain based GCN is then used to update the data representation E at each time_i。

E_i＝GCN(E_i)＝σ(A⊙S_iE_iW)

Wherein: e_iFor a hidden layer representation of the incoming data,

S_iis the association strength matrix of each node in the learned graph. A denotes the original adjacency matrix. A, S_i∈R^N*N. An indication of a corresponding position multiplication. W is a matrix of weights that is,

finally, obtaining the spatial characteristics at each time, and forming Z by transforming dimensions_{Out_enc_spatio}And (6) outputting.

And 4, step 4: feature transformation at decoder side: and performing feature transformation by adopting a generating mode. Firstly, generating features with time dimension and space dimension similar to step 3; then, the multi-head attention calculation is performed together with the features generated in step 3, and finally learned features are generated.

Step 401: fusion characteristic Z of information, data characteristic and data state characteristic between acquisitions_{Fuse_dec}＝Z_{Time_dec}+Z_{Data_dec}+Z_{Dur_dec}。

Since the predictive value is obtained by using the generating equation in the decoder, the decoder should have a part of the input information, i.e. the fusion characteristic Z similar to step 301, as in the encoder_{Fuse_dec}＝Z_{Time_dec}+Z_{Data_dec}+Z_{Dur_dec}。

Step 402: fusing the features Z_{Fuse_dec}As input, the characteristic representation Z in the time dimension is acquired by passing the decoder part based on the multi-head Attention mechanism through a Mask ProbSparse Self-Attention module_{Out_dec_temporal}。

Different from the encoder part, the encoder part is training data, and the input time series data can be used for feature extraction. However, in the encoder part, data in the prediction time period needs to be generated, so that the time sequence is strictly observed, and future information cannot be leaked. Therefore, after the Mask ProbSparse Self-Attention module is used to calculate the Attention score of the head, the upper triangular part of the matrix is set to negative infinity, and then after the calculation of the softmax layer, the influence caused by future information can be ignored.

Step 403: using Dynamic GCN to perform feature learning on an original data sequence to obtain data features on spatial dimensions, and then performing dimension transformation through 1D convolution to obtain Z_{Out_dec_spatio}。

Characteristic information Z of decoder transmitted by encoder_{Out_enc}Including features in both the spatial and temporal dimensions, and thus in Z_{Out_enc}Before performing the multi-head attention calculation, it should be ensured that the features of the decoder portion also contain information in the spatial dimension and the time dimension, and therefore, the input data is subjected to the extraction of the spatial features here.

Step 404: merging the temporal and spatial features of the decoder portion to obtain Z_{Out_dec}＝Z_{Out_dec_temporal}+Z_{Out_dec_spatio}. Then, Z is subjected to Full extension_{Out_dec}As Q, Z_{Out_enc}As K, V, to obtain a final characteristic output Z_Out。

And 5: the output of the decoder is connected to two full connection layers (FC), and the timing chart data and the corresponding holding time are respectively predicted. An output part of Decoder as in fig. 4;

MSE was used as an evaluation index:

wherein | X | represents the total number of data in the segment, | T | represents the total number of data retention time in the segment,

representing predicted data, x, t representing originalThe data of (1).

It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present invention, and it is therefore to be understood that the invention is not limited by the scope of the appended claims.

Claims

1. A method for fast state prediction oriented to timing diagram data is characterized by comprising the following steps:

said step 4 comprises in particular the sub-steps of,

Step 402: fusing the features Z_{Fuse_dec}As input, the characteristic representation Z in the time dimension is acquired by passing the decoder part based on the multi-head Attention mechanism through a Mask ProbSparse Self-Attention module_{Out_dec_temporal}；

Step 403: using Dynamic GCN to perform feature learning on an original data sequence to obtain data features on spatial dimensions, and then obtaining Z through dimension transformation of 1D convolution_{Out_dec_spatio}；

Step 404: merging the temporal and spatial features of the decoder portion to obtain Z_{Out_dec}＝Z_{Out_dec_temporal}+Z_{Out_dec_spatio}(ii) a Then, Z is subjected to Full extension_{Out_dec}As Q, Z_{Out_enc}As K, V, to obtain a final characteristic output Z_Out；

2. The method as claimed in claim 1, wherein the data information, the data status information and the time information are in the form of data of a data sequence.

3. The method for fast predicting the status of timing diagram data according to claim 1, wherein the step 2 comprises the following sub-steps:

4. The method as claimed in claim 1, wherein the data state information is a time sequence.

5. The method as claimed in claim 1, wherein the step 3 comprises the following sub-steps:

step 301: combining the characteristics in the step 2 to obtain a fusion characteristic Z containing time information, data characteristics and data state characteristics_{Fuse_enc}＝Z_{Time_enc}+Z_{Data_enc}+Z_{Dur_enc}；

Step 302: fusing the features Z_{Fuse_enc}As input, the characteristic sequence is transmitted to an encoder part based on a multi-head attention mechanism for processing so as to achieve the purpose of distillation, and the dimensionality of the characteristic sequence is reduced to 1/2 of the original dimensionality;

step 303: through L₁Second ProbSparse Self-Attention module and MaxPhol distillation module to obtain output Z of encoder_{Out_enc_temporal}；

Step 304: using Dynamic GCN to perform feature learning on an original data sequence to obtain data features on a space dimension, and then using 1D convolution and MaxPool transformation dimension to obtain Z_{Out_enc_spatio}；

6. The method of claim 1, wherein the processing procedure of the encoder portion based on the multi-head attention mechanism is as follows: firstly, multi-head Self-Attention calculation is carried out through a ProbSparse Self-Attention module, and the time complexity is calculated from O (n) by using an optimized matrix calculation mode²) Reduction to O (nlog (n)); and then, the dimension of the characteristic sequence is reduced to 1/2 of the original dimension through MaxPool convolution operation for achieving the purpose of distillation.

7. The method as claimed in claim 1, wherein the step 5 comprises the following sub-steps: