CN115827335B - Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method - Google Patents
- Publication number: CN115827335B (application CN202310063746.6A)
- Authority: CN (China)
- Legal status: Active (an assumption by Google, not a legal conclusion)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a time-series data missing-value interpolation system based on a cross-modal method, which consists of four modules: a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module), and a prediction module. The method handles data that is missing at random, data that is missing not at random, and the case where the data of some nodes is missing entirely. The system uses multi-modal data fusion: image data and sound data are converted into vectors through representation learning, graph data is converted into an adjacency matrix and flattened into a vector, and these are fused with the incomplete time-series data to form prior information. Dense connections between the spatio-temporal blocks, each consisting of a spatial block and a temporal block, build an end-to-end encoding channel that accurately recovers the missing data. The invention achieves high accuracy when recovering coupled-information-flow time-series data.
Description
Technical Field
The invention relates to a missing-value interpolation system for coupled-information-flow time-series data, based on a dense spatio-temporal encoding network and able to handle the complete loss of some nodes' data; it belongs to the technical field of missing-data recovery.
Background
With the spread of information sensor technology, the explosive growth of data volume has ushered in the era of big data. Data-driven methods, including deep learning, reinforcement learning and incremental machine learning, have been applied successfully to the optimal operation, planning decisions and demand-side response of power and transportation systems. Taking transportation as an example, in an intelligent transportation system data are sampled from a smart sensor network comprising smart cards, GPS receivers, speed sensors, video detectors and other components. A data-driven intelligent transportation system has strong traffic-state sensing capability and can achieve accurate traffic-flow prediction, optimal scheduling and real-time target control. These data-driven models usually require a large amount of training data before deployment to achieve good performance. However, owing to engineering problems such as sensor faults and irregular sampling, the time-series data collected from a sensor network are often incomplete, which causes the common missing-data problem. Depending on how the data are missing, the problem falls broadly into two categories: missing at random and missing not at random.
Recovering missing data has long been a research focus of scholars at home and abroad, and the search for methods that improve recovery performance has never stopped. Existing data-driven methods fall into three major categories: statistical models, tensor methods based on a low-rank hypothesis, and deep-learning models. In the 1990s, statistical models such as ARIMA and KNN (K-Nearest Neighbor) were proposed. The core idea of these methods is weighted averaging, so the result depends heavily on how the weights are defined; because weighted averaging is linear, the regression results are often too rough to be practical. The current mainstream approach is tensor-based, e.g. the recently proposed Bayesian Gaussian CANDECOMP/PARAFAC (BGCP) model, which rests on a low-rank hypothesis. By processing global tensor information and reconstructing the sparse tensor through Bayesian inference, it achieves good performance. However, most tensor-based methods exploit spatial information only within the tensor structure itself, not the spatial locations of the nodes, so their performance can degrade as the missing rate increases.
Further, deep learning is a newer approach to estimating missing data. Models such as the long short-term memory network (LSTM), the gated recurrent unit network (GRU) and the convolutional neural network (CNN) can effectively mine the spatio-temporal correlations in time-series data and interpolate randomly missing data well when the missing rate is low. However, these methods demand large training samples, generalize poorly on small samples, are prone to overfitting, and cannot extrapolate into data regions that carry no time-series information at all; these shortcomings limit the performance of deep-learning interpolation.
Based on the above, the invention addresses the cases of random missing data, non-random missing data with a high missing rate, and the complete loss of some nodes' data.
Disclosure of Invention
In order to remedy the defects of the prior art, the invention provides a new deep-learning model system: a method based on a dense spatio-temporal Transformer network (DSTTN, Dense Spatial-Temporal Transformer Nets), which solves data recovery under high missing rates on the cross-modal principle. Instead of considering time-series information alone, the model fully exploits spatial and temporal pattern information through a spatial Transformer block (hereinafter, spatial block) and a temporal Transformer block (hereinafter, temporal block). When data are missing at random or not at random, the model fits the distribution of a node's data from the features of the non-missing data and draws random samples from the fitted distribution as interpolation values. When the data of some nodes are missing entirely, the model fits those nodes' distributions from the data of similar nodes and again samples randomly to complete the interpolation. Compared with the bidirectional long short-term memory network, the best-performing reference model, the invention achieves a lower mean absolute percentage error.
The technical scheme adopted by the invention is as follows:
A time-series data missing-value interpolation system based on a cross-modal method comprises a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module) and a prediction module. The main features of the invention are that the multi-modal data fusion module uses multi-modal data for missing-data interpolation, and that the stackable spatio-temporal module handles the complete loss of some nodes' data as well as the current best models.
Data characterization module: based on graph-neural-network representation learning, node representations are first computed, then graph pooling over the node representations yields the representation of the whole graph, converting the image data into vectors. Meanwhile, a neural network learns a representation that captures the high-level semantic content of the signal while remaining undisturbed by its low-level details (e.g. pitch contour or background noise), converting the sound data into vectors. The results are fed into the multi-modal data fusion module.
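As an illustration of the characterization step, the following minimal numpy sketch computes node representations with one round of normalized neighbor averaging and then mean-pools them into a graph-level vector. This is an assumption for illustration, not the patent's exact network; `graph_embedding` and its weight matrix are hypothetical names.

```python
import numpy as np

def graph_embedding(adj, feats, weight):
    """One message-passing layer followed by mean graph pooling.

    adj    : (N, N) adjacency matrix with self-loops
    feats  : (N, d_in) node features
    weight : (d_in, d_out) learnable projection
    Returns a single (d_out,) vector characterizing the whole graph.
    """
    deg = adj.sum(axis=1, keepdims=True)  # node degrees
    h = (adj / deg) @ feats @ weight      # normalized neighbor averaging
    h = np.maximum(h, 0.0)                # ReLU nonlinearity
    return h.mean(axis=0)                 # graph-level mean pooling
```

Any differentiable pooling (sum, max, attention pooling) could replace the mean; the point is that per-node representations collapse into one vector representing the graph.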
Multi-modal data fusion module: the graph data are converted into an adjacency matrix and flattened into a vector, which is then combined with the output of the data characterization module and the time series containing missing values into one tensor, giving the multi-modal input. At the same time, the graph data initialize the first spatial block, so that the time-series data acquire spatial features after passing through it. The multi-modal data are lifted in dimension by a 1×1 convolutional neural network before entering the stackable spatio-temporal module; the lifting enlarges the feature-extraction dimension for a more comprehensive analysis of the data.
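The fusion of the modality vectors with the incomplete series can be pictured as simple tensor stacking. The sketch below is a hypothetical illustration: the name `fuse_modalities`, the NaN encoding of missing entries, and the resizing of the flattened adjacency to the series length are all assumptions made to keep the shapes aligned, not details fixed by the patent.

```python
import numpy as np

def fuse_modalities(image_vec, sound_vec, adj, series):
    """Stack per-modality vectors and the incomplete time series into one
    tensor of prior information.

    image_vec, sound_vec : (T,) characterization vectors
    adj    : (N, N) adjacency matrix, flattened into one row
    series : (N, T) time series with NaN marking missing entries
    """
    T = series.shape[1]
    adj_vec = np.resize(adj.reshape(-1), T)      # align flattened adjacency to T
    prior = np.stack([image_vec, sound_vec, adj_vec])   # (3, T) prior rows
    return np.concatenate([prior, np.nan_to_num(series)], axis=0)  # (3 + N, T)
```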
Stackable spatio-temporal module: this module consists of several spatio-temporal blocks, a dense-connection sub-module and a convolutional-neural-network sub-module, as shown in fig. 3.
Further, each spatio-temporal block contains a temporal block and a spatial block. The two share the same structural framework, whose main body is built from Transformers. The dense-connection sub-module connects the spatio-temporal blocks in parallel, realizing dense interaction of the feature information flows. The output time-series data enter the convolutional neural network for dimension reduction, yielding the interpolated data, which are passed to the prediction module.
Prediction module: implemented as a convolutional neural network; from the interpolated time-series input, it predicts the system's time series for a period into the future.
After modularization, the method can be packaged as a system.
A time-series data missing-value interpolation method based on a cross-modal approach comprises the following steps:
Step 1: manually select a sliding window to determine the input dimension, i.e. select N nodes and take the coupled-information-flow data of T time steps as the model input.
Step 2: lift the data dimension with a 1×1 convolutional neural network to improve the feature-extraction capability.
Step 3: use the DSTTN to interpolate the missing values of the preprocessed coupled-information-flow time-series data.
Step 4: construct a loss function for the model.
Step 5: train the model.
Step 6: reduce the dimension of the interpolated time series through a 1×1 convolutional neural network, obtaining interpolated time-series data of the same dimension as the actual data.
Step 7: from the time series with missing data interpolated, predict the coupled-information-flow data of future time steps with a 1×1 convolutional neural network.
Further, in step 1, a sliding window is selected manually to determine the input dimension: N nodes are selected, and the coupled-information-flow data of T time steps serve as the model input. Limiting the input dimension prevents an over-long time-series input from inflating the network dimension and slowing it down. The selected node time series may be missing at random, missing not at random, or have some nodes' data missing entirely.
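The sliding-window selection of step 1 can be sketched as follows. `sliding_windows` is a hypothetical helper, assuming the raw record is an N-node by L-step array; the patent only states that N nodes and T time steps are chosen manually.

```python
import numpy as np

def sliding_windows(data, T, stride=1):
    """Cut an (N, L) multi-node series into model inputs of T time steps.

    Limiting each input to T steps keeps the network dimension bounded,
    which is the stated purpose of the sliding window in step 1.
    """
    N, L = data.shape
    return np.stack([data[:, s:s + T] for s in range(0, L - T + 1, stride)])
```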
Further, in step 2, the dimension of the data is lifted with a convolutional neural network as follows: the input time series has dimension 1×N×T, i.e. a single feature dimension; processing it with C convolution kernels changes the dimension to C×N×T. This deepens the network without changing the receptive field and introduces more nonlinearity, increasing the expressive capacity of the network; in other words, the number of data features grows, strengthening the ability to describe the data.
Here N is the number of nodes, T the number of time steps, and C the number of data features, which can be any natural number.
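A 1×1 convolution over a single input channel is just a per-channel scaling summed over input channels, so the 1×N×T to C×N×T lifting of step 2 reduces to a channel-mixing product. A minimal numpy sketch (the function name and kernel layout are illustrative assumptions):

```python
import numpy as np

def lift_channels(x, kernels):
    """1x1 convolution expressed as a channel-mixing matrix product.

    x       : (1, N, T) single-feature time-series tensor
    kernels : (C, 1) weights of C separate 1x1 convolution kernels
    Returns a (C, N, T) tensor: same receptive field, C feature channels.
    """
    # out[c, n, t] = sum_i kernels[c, i] * x[i, n, t] -- exactly a 1x1 conv
    return np.einsum('ci,int->cnt', kernels, x)
```

The same einsum with a (1, C) kernel performs the inverse reduction of step 6.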
Further, in step 3, the missing values of the time-series data are interpolated with the DSTTN as follows:
Step 3.1: each input time-series vector x^i containing missing data is multiplied by a weight matrix W to obtain the embedding a^i = W·x^i, which enters the self-attention layer; the multi-modal data are thus represented by the low-dimensional vectors a^i.
Step 3.2: a^i first enters the first spatial block of the first spatio-temporal block, where the self-attention mechanism is applied: an attention distribution over all inputs is computed with inner products, and a weighted average of the input information is taken according to that distribution.
Further, step 3.2 proceeds as follows:
Step 3.2.1: each vector a^i passes through the first linear layer to obtain the corresponding query vector q^i = W^Q·a^i, which guides the direction of feature extraction and is related to the task. In parallel, a^i passes through the second and third linear layers to obtain the key vector k^i = W^K·a^i and the value vector v^i = W^V·a^i.
Step 3.2.2: each q^i is matched against every k^j by an inner product measuring the proximity of the two vectors. First, for the first vector, q^1 and k^1 are combined in a scaled inner product, alpha_{1,1} = q^1·k^1 / sqrt(d); next, q^1 and k^2 give alpha_{1,2}; and so on.
Because the inner product grows as the dimension d increases, it is divided by sqrt(d), which has a normalizing effect.
A softmax (an exponential function with base e) then makes the outputs probabilistic, mapping the multi-class values into the range [0, 1] with sum 1: hat_alpha_{1,i} = exp(alpha_{1,i}) / Σ_j exp(alpha_{1,j}).
This yields the weight hat_alpha_{1,i} that the first output places on the i-th input. Taking the inner product with the outputs v^i of the third linear layer gives b^1 = Σ_i hat_alpha_{1,i}·v^i, the first output vector of the spatial block. Clearly b^1 uses the information of the whole sequence, and only the corresponding weights hat_alpha_{1,i} need to be learned; likewise, any other output that considers global information only needs its corresponding weights hat_alpha_{j,i}.
Step 3.2.3: as in step 3.2.2, attention is applied to the remaining components q^j, k^j, v^j to obtain all outputs b^j; these are fed into a feed-forward network to produce the complete output of the spatial block, and the two networks are joined by a residual connection to avoid the vanishing-gradient problem.
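The computations of steps 3.2.1 to 3.2.3 amount to standard scaled dot-product self-attention. A minimal numpy sketch, in which the matrices `wq`, `wk`, `wv` stand in for the three linear layers (names and shapes are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence.

    x          : (L, d) input vectors a^1..a^L
    wq, wk, wv : (d, d_k) the three linear layers producing q, k, v
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])  # alpha_{i,j} = q^i . k^j / sqrt(d_k)
    weights = softmax(scores, axis=1)       # each row sums to 1
    return weights @ v                      # every output mixes the whole sequence
```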
Step 3.3 combining the output of the space block and the input of the space block as the input of the time block and converting the combined output into timeThe same procedure as in step 3.2 is carried out, here expressed in matrix, by +.>The dimensions of the data characteristic are such that,
step 3.4 willParallel inputs to three linear layers to obtain the query matrix respectively>Key matrix->Value matrix->Will->Is +.>And->Is +.>Do the attention mechanism, will->Conversion to a line vector->Then->Record->There is a attention matrix->Representing the mechanism of attention between every two positions, for +.>Get +.>Operation, obtain output matrix for measuring different link weights
Taking outThe operation of (1) realizes the probabilistic expression of the output, and converts the multi-classified output value into the range +.>And sum to a probability distribution of 1, which is then calculatedMultiplying the value matrix to obtain an output->The final output of the first space-time block is obtained by a forward propagation network, the two neural networks are connected in parallel through residual errors to avoid the problem of gradient disappearance,
Step 3.5: throughout the process, the spatial blocks are connected in parallel via layer-normalization skip connections. On the one hand this normalizes the time-series data, keeping them out of the saturation region of the activation function and preventing vanishing or exploding gradients; on the other hand it realizes internal residual connections, further mitigating gradient vanishing.
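The layer normalization and residual connections described above can be sketched as follows; `residual_block` and the pre-norm placement of the normalization are illustrative assumptions, since the patent does not fix where the normalization sits relative to the sublayer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance, keeping the data
    out of the activation function's saturation region."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Skip connection around a normalized sublayer; the identity path
    lets gradients flow and mitigates gradient vanishing."""
    return x + sublayer(layer_norm(x))
```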
Step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5. In addition, a dense connection exists between every pair of spatio-temporal blocks, which also helps prevent vanishing gradients.
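The dense connections of step 3.6 can be pictured as each block receiving the aggregate of all earlier block outputs, DenseNet-style. In this toy sketch, summation as the merge operation is an assumption (the patent does not specify how the densely connected streams are combined), and `dense_forward` is a hypothetical name.

```python
import numpy as np

def dense_forward(x, blocks):
    """Run k space-time blocks with dense (every-pair) skip connections:
    each block sees the sum of the input and all earlier block outputs."""
    outputs = [x]
    for block in blocks:
        outputs.append(block(sum(outputs)))
    return outputs[-1]
```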
Further, step 4 constructs a loss function for the model.
Further, step 5 trains the model.
Further, in step 6, the dimension of the data is reduced with a convolutional neural network as follows: the interpolated time series has dimension C×N×T, i.e. C feature dimensions; processing it with a convolution layer that merges the C channels changes the dimension to 1×N×T, yielding the interpolated coupled-information-flow data of N nodes over T time steps.
Further, in step 7 the interpolated time series are input as samples into a 1×1 convolution kernel to predict the data of future time steps.
The beneficial effects of the invention are:
1. The invention proposes a cross-modal method for high missing rates that overcomes the insufficient information of single-modal data. By extracting the features of the non-missing data, it fits each node's data distribution and interpolates by random sampling from it; data recovery is accurate and general, and prediction accuracy and operating efficiency improve markedly.
2. The invention proposes the DSTTN structure, which links multiple conventional Transformer blocks through dense residual connections and overcomes the weak long-term memory of gated recurrent networks such as the LSTM.
3. The proposed DSTTN model fuses spatio-temporal features. Because the spatial topology of the nodes is considered, the spatio-temporal dependence in predicting coupled-information-flow data with missing values, that is, the complex spatial dependence and temporal dynamics of the actual network, is captured more fully, giving the model better accuracy and stronger generalization.
4. The invention learns temporal and spatial information through the spatio-temporal attention module; combining the multi-head attention mechanism with the encoder-decoder allows the computation to be parallelized, saving time.
Drawings
FIG. 1 is a flow chart of cross-modal data input and interpolation data output in the present method;
FIG. 2 is a schematic diagram of the internal structure of DSTTN in the present method;
FIG. 3 is a schematic diagram of the internal structure of each spatial block in the method.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1: a time-series data missing-value interpolation system based on a cross-modal method comprises a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module) and a prediction module. The main features of the invention are that the multi-modal data fusion module uses multi-modal data for missing-data interpolation, and that the stackable spatio-temporal module handles the complete loss of some nodes' data as well as the current best models.
Data characterization module: based on graph-neural-network representation learning, node representations are first computed, then graph pooling over the node representations yields the representation of the whole graph, converting the image data into vectors. Meanwhile, a neural network learns a representation that captures the high-level semantic content of the signal while remaining undisturbed by its low-level details (e.g. pitch contour or background noise), converting the sound data into vectors. The results are fed into the multi-modal data fusion module.
Multi-modal data fusion module: the graph data are converted into an adjacency matrix and flattened into a vector, which is then combined with the output of the data characterization module and the time series containing missing values into one tensor, giving the multi-modal input. At the same time, the graph data initialize the first spatial block, so that the time-series data acquire spatial features after passing through it. The multi-modal data are lifted in dimension by a 1×1 convolutional neural network before entering the stackable spatio-temporal module; the lifting enlarges the feature-extraction dimension for a more comprehensive analysis of the data.
Stackable spatio-temporal module: this module consists of several spatio-temporal blocks, a dense-connection sub-module and a convolutional-neural-network sub-module, as shown in fig. 3.
Further, each spatio-temporal block contains a temporal block and a spatial block. The two share the same structural framework, whose main body is built from Transformers. The dense-connection sub-module connects the spatio-temporal blocks in parallel, realizing dense interaction of the feature information flows. The output time-series data enter the convolutional neural network for dimension reduction, yielding the interpolated data, which are passed to the prediction module.
Prediction module: implemented as a convolutional neural network; from the interpolated time-series input, it predicts the time series 0.1T time steps ahead.
After modularization, the method can be packaged as a system.
Example 2: a time-series data missing-value interpolation method based on a cross-modal approach comprises the following steps:
Step 1: manually select a sliding window to determine the input dimension, i.e. select N nodes and take the coupled-information-flow data of T time steps as the model input.
Step 2: lift the data dimension with a 1×1 convolutional neural network to improve the feature-extraction capability.
Step 3: use the DSTTN to interpolate the missing values of the preprocessed coupled-information-flow time-series data.
Step 4: reduce the dimension of the interpolated time series through a 1×1 convolutional neural network, obtaining interpolated time-series data of the same dimension as the actual data.
Step 5: from the time series with missing data interpolated, predict the coupled-information-flow data of future time steps with a 1×1 convolutional neural network.
In step 1, a sliding window is selected manually to determine the input dimension: N nodes are selected, and the coupled-information-flow data of T time steps serve as the model input. Limiting the input dimension prevents an over-long time-series input from inflating the network dimension and slowing it down. The selected node time series may be missing at random, missing not at random, or have some nodes' data missing entirely.
In step 2, the dimension of the data is lifted with a convolutional neural network as follows: the input time series has dimension 1×N×T, i.e. a single feature dimension; processing it with C convolution kernels changes the dimension to C×N×T. This deepens the network without changing the receptive field and introduces more nonlinearity, increasing the expressive capacity of the network; in other words, the number of data features grows, strengthening the ability to describe the data.
Here N is the number of nodes, T the number of time steps, and C the number of data features, which can be any natural number.
In step 3, the missing values of the time-series data are interpolated with the DSTTN as follows:
Step 3.1: each input time-series vector x^i containing missing data is multiplied by a weight matrix W to obtain the embedding a^i = W·x^i, which enters the self-attention layer; the multi-modal data are thus represented by the low-dimensional vectors a^i.
Step 3.2: a^i first enters the first spatial block of the first spatio-temporal block, where the self-attention mechanism is applied: an attention distribution over all inputs is computed with inner products, and a weighted average of the input information is taken according to that distribution.
further, step 3.2 is specifically as follows:
step 3.2.1 each vectorObtaining the corresponding +.>,/>The direction of extraction of the guide features is a query vector related to the task,
parallel, vectorThe corresponding key vector is obtained through the second linear layer and the third linear layer respectively>Value vector->,
Step 3.2.2 with eachFor each->Doing inner product to match the proximity of the two vectors, first for vector +.>Is> and />The term is done, i.e. 2 vectors are done scaled inner product,next, will +.> and />Do atttion, get ∈>And so on,
Because ofThe value of (2) increases with increasing dimensions, so it is divided by +.>Equivalent to the effect of normalization,
Represents an exponential function based on a natural constant e, taking +.>The operation of (1) realizes the expression of probability of output, and converts the multi-classified output value into the range of [0, 1 ]]And the sum is a probability distribution of 1,
get the 1 st output focused on the 1 stWeight of individual inputs->Then, it is compared with the output value of the third linear layerDoing the inner volume gives +.>The first vector representing the output of the first spatio-temporal block, which obviously uses the information of the whole sequence, only needs to learn the corresponding +.>The preparation method is finished; similarly, when global information is considered, only the corresponding +.>,
Step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem.
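The per-vector computation of steps 3.2.1 through 3.2.3 can be sketched as follows. This is a minimal NumPy illustration of standard scaled dot-product self-attention, not the patent's trained model: the three weight matrices stand in for the learned linear layers, and all sizes are illustrative assumptions.

```python
import numpy as np

def self_attention(A, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the columns a^i of A.

    A:  (d_in, n) matrix whose columns are the embeddings a^i
    Wq, Wk, Wv: (d, d_in) weights of the three linear layers
    Returns B: (d, n) matrix whose columns are the outputs b^i
    """
    Q, K, V = Wq @ A, Wk @ A, Wv @ A              # q^i, k^i, v^i for every i
    d = Q.shape[0]
    scores = (K.T @ Q) / np.sqrt(d)               # alpha_{j,i} = k^j . q^i / sqrt(d)
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True) # softmax: columns sum to 1
    return V @ weights                            # b^i = sum_j alpha_hat_{j,i} v^j

rng = np.random.default_rng(0)
d_in, d, n = 8, 4, 6                              # illustrative sizes
A = rng.normal(size=(d_in, n))
Wq, Wk, Wv = (rng.normal(size=(d, d_in)) for _ in range(3))
B = self_attention(A, Wq, Wk, Wv)
print(B.shape)                                    # one output vector per input
```

Each output column mixes all value vectors, which is why every $b^i$ can draw on the information of the whole sequence.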
Step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and carry out the same procedure as in step 3.2, here expressed in matrix form, with $n$ being the dimension of the data features.
Step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights.
Applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
Step 3.5: throughout the whole process, the spatial blocks are connected in parallel through layer-normalization skip connections, which on the one hand normalize the time-series data, preventing them from falling into the saturation region of the activation function and avoiding gradient vanishing or explosion, and on the other hand form internal residual connections that mitigate gradient vanishing.
Step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5; in parallel, a connection exists between every two spatio-temporal blocks, which also helps avoid gradient vanishing.
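The layer-normalization skip connection of step 3.5 can be sketched as follows. This is a generic NumPy illustration of the standard layer-norm-plus-residual pattern; the toy sub-layer and tensor sizes are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position to zero mean and unit variance over its features,
    keeping activations out of the saturation region of the activation function."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def block_with_skip(x, sublayer):
    """Residual (skip) connection around a sub-layer, followed by layer norm.
    The identity path lets gradients flow directly, mitigating gradient vanishing."""
    return layer_norm(x + sublayer(x))

x = np.random.default_rng(1).normal(size=(6, 16))   # 6 positions, 16 features
y = block_with_skip(x, np.tanh)                     # toy sub-layer
print(y.mean(), y.std())                            # approximately 0 and 1
```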
Step 3.7 training the model.
The specific process of using a convolutional neural network to reduce the data dimension in step 4 is as follows: the input dimension of the interpolated time-series data is C×N×T with C feature dimensions; the data are processed by a convolution layer whose convolution kernel is C×C, so that the data dimension becomes 1×N×T, yielding the interpolated coupling information-flow data of N nodes over T time steps.
The interpolated time-series data in step 5 are input as samples into a 1×1 convolution kernel to predict the data of future time steps.
Example 3: in the era of networked data, information-flow data are ubiquitous. The specific operation of the present invention is described below taking traffic-flow data as an example. To verify the method of the present invention for interpolating time-series data with completely missing data based on a spatio-temporal coding network, a neighboring area of District 5 collected from the Caltrans Performance Measurement System in 2019 (hereinafter the D5 dataset) is taken as an example: traffic-speed samples aggregated from 53 sensor stations in this area during July are selected, the time interval is set to 5 minutes, and each flow value denotes the average traffic speed (km/h) in the corresponding 5-minute interval. In addition, the locations of these sensor stations are all referenced by longitude and latitude. The raw data is a two-dimensional matrix whose row index represents the time step and whose column index represents the number of the selected node.
If some traffic stations fail for a long period, or even lack sensors, partial data become completely missing. In general, it is impossible to infer the traffic-flow evolution law of unobservable nodes without any information. DSTTN, however, uses a cross-modal view, namely that the traffic-flow data of sensor nodes separated by shorter distances are similar in a given mode, and therefore handles this problem well.
Step 1: obtain measured traffic-flow data and represent the detector network of the urban road network as a weighted directed graph $G = (V, E, A)$, where $V$ is the set of detector nodes, $E$ is the edge set, and $A$ is a weighted adjacency matrix describing the degree of similarity between different nodes; preprocess the measured traffic-flow data to form traffic-flow time-series data numbered by sampling point. The weighted adjacency matrix $A$ describing node similarity is generated from the physical distances between nodes: its weights are based on the straight-line physical distance between two nodes, computed by substituting the node longitudes and latitudes into the haversine tool (Python); the intuitive understanding is that the closer two nodes are physically, the closer their flows. The weighted adjacency matrix serves as the initialization matrix of the spatial blocks so that the time-series data carry spatial characteristics. The present invention constructs the adjacency matrix in the following manner,
where $d_{ij}$ represents the physical distance between node $i$ and node $j$; clearly, the closer two nodes are, the higher their degree of similarity.
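The distance-based adjacency construction can be sketched as follows. The haversine great-circle formula is implemented directly; the Gaussian-kernel weighting that turns distances into similarities is an illustrative assumption, since the patent's exact weighting formula is not reproduced in this text, and the station coordinates are hypothetical.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    R = 6371.0  # mean Earth radius
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp = np.radians(lat2 - lat1)
    dl = np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

def adjacency(coords, sigma=10.0):
    """Weighted adjacency: a_ij is large when nodes i and j are physically close.
    The Gaussian kernel exp(-(d/sigma)^2) is an illustrative choice."""
    n = len(coords)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = haversine_km(*coords[i], *coords[j])
            A[i, j] = np.exp(-(d / sigma) ** 2)
    return A

coords = [(38.57, -121.49), (38.58, -121.48), (38.90, -121.10)]  # hypothetical stations
A = adjacency(coords)
print(A.round(3))   # diagonal is 1; the nearby pair gets a weight close to 1
```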
Step 2: up-dimension the data using a 1×1 convolutional neural network to improve the data's feature-extraction capability.
Step 3: perform missing-value interpolation on the preprocessed traffic-flow time-series data using the DSTTN.
Step 4: reduce the dimension of the interpolated time-series data through a 1×1 convolutional neural network to obtain interpolated time-series data with the same dimension as the actual data.
Step 5: using a 1×1 convolutional neural network, predict the traffic-flow data of future time steps from the obtained time-series data in which the missing data have been interpolated.
Further, in step 1, 53 nodes are selected by manually choosing a sliding window, and traffic-flow data of 96 time steps are taken as the model input. Limiting the input dimension prevents overly long time-series inputs from making the neural network too large and slow to run. The time-series data of the selected nodes may be subject to random and non-random missingness, or the data of some nodes may be completely missing.
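The sliding-window input construction can be sketched as follows, using the 53-node, 96-step, 5-minute dimensions given in the text; the non-overlapping stride is an illustrative assumption.

```python
import numpy as np

def sliding_windows(series, window=96, stride=96):
    """Cut a (total_steps, n_nodes) series into (window, n_nodes) model inputs.
    Limiting the window keeps the network's input dimension bounded."""
    return np.stack([series[s:s + window]
                     for s in range(0, len(series) - window + 1, stride)])

# July at 5-minute resolution: 31 days * 288 samples/day = 8928 steps, 53 nodes
series = np.zeros((8928, 53))
batches = sliding_windows(series)
print(batches.shape)   # 93 non-overlapping 96-step windows of 53 nodes
```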
Further, in step 2, the specific process of up-dimensioning the data with the convolutional neural network is as follows: the time-series data have only one feature dimension, and the data are processed with 32 convolution kernels. Deepening the network without changing the receptive field introduces more nonlinearity, which increases the expressive power of the network; in other words, increasing the number of data features enhances the data's characterization ability.
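Because a 1×1 convolution acts independently at every node/time position, channel lifting reduces to a linear map on the channel axis. Below is a NumPy sketch with the 32 output channels mentioned in the text; the random weights stand in for learned kernels.

```python
import numpy as np

def conv1x1(x, W):
    """1x1 convolution: x has shape (C_in, N, T); W has shape (C_out, C_in).
    Every (node, time) position is mapped through the same linear layer, so the
    receptive field is unchanged while the channel count changes."""
    return np.einsum('oc,cnt->ont', W, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 53, 96))        # one feature dimension
W_up = rng.normal(size=(32, 1))         # 32 convolution kernels (step 2)
y = conv1x1(x, W_up)
print(y.shape)                          # 32 feature dimensions
W_down = rng.normal(size=(1, 32))       # reduce back to one dimension (step 4)
print(conv1x1(y, W_down).shape)
```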
Further, in the step 3, the method for performing missing interpolation on the missing value of the time series data by using the DSTTN comprises the following steps:
step 3.1: multiply a row of time-series data vectors $x$ containing missing data by a weight matrix $W$ to obtain a column of embeddings $a = Wx$, which then enter the self-attention layer. Embedding means using the low-dimensional vector $a$ to represent the multimodal data.
In step 3.2, $a$ first enters the first spatial block of the first spatio-temporal module and the attention mechanism is performed. That is, the attention distribution is computed over all input information using inner products, and the weighted average of the input information is computed from the attention distribution.
Further, step 3.2.1: each vector $a^i$ is passed through the first linear layer to obtain the corresponding query vector $q^i = W^q a^i$, a task-related vector that guides the direction of feature extraction.
In parallel, the vector $a^i$ is passed through the second and third linear layers to obtain the corresponding key vector $k^i = W^k a^i$ and value vector $v^i = W^v a^i$.
Step 3.2.2: take the inner product of each $q^i$ with each $k^j$ to match the proximity of the two vectors. First, attention is computed between $q^1$ and the key $k^1$ of vector $a^1$, i.e. the scaled inner product of the two vectors, $\alpha_{1,1} = q^1 \cdot k^1 / \sqrt{d}$; next, attention is computed between $q^1$ and $k^2$ to obtain $\alpha_{1,2}$, and so on.
Because the value of $q \cdot k$ grows as the dimension $d$ increases, it is divided by $\sqrt{d}$, which is equivalent to normalization.
$\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base. Applying the softmax operation $\hat\alpha_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$ expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution.
This yields the weight $\hat\alpha_{1,i}$ with which the first output attends to the $i$-th input; taking the inner product with the outputs $v^i$ of the third linear layer gives $b^1 = \sum_i \hat\alpha_{1,i} v^i$, the first vector of the output of the first spatio-temporal block. Clearly, it uses the information of the whole sequence. If local information is to be considered, it suffices to learn the corresponding local weights $\hat\alpha_{1,i}$; similarly, when global information is considered, it suffices to learn the corresponding global weights.
Step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block. In parallel, the gradient-vanishing problem is avoided by connecting the two neural networks through residual connections.
Step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and perform the same procedure as in step 3.2, here represented by a matrix; $n$ is the dimension of the data features.
Step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights.
Applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
step 3.5: throughout the whole process, layer-normalization skip connections are made in parallel between the spatial blocks, which on the one hand normalize the time-series data, preventing them from falling into the saturation region of the activation function and avoiding gradient vanishing or explosion, and on the other hand form internal residual connections that mitigate gradient vanishing,
step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5; in parallel, a connection exists between every two spatio-temporal blocks, which also helps avoid gradient vanishing,
step 3.7 training the model.
Further, in step 4, the specific process of reducing the data dimension with the convolutional neural network is as follows: the interpolated time-series data have 32 feature dimensions; after the data are processed by a convolution layer, the interpolated traffic-flow data of the 53 nodes and 96 time steps are obtained.
Further, the traffic flow data interpolated in step 5 is input as a sample into a 1×1 convolution kernel, and the data of the future time step is predicted.
In an example, the loss function is constructed as follows:
$$\mathcal{L} = \sum_{i,t} m_i^t \left(x_i^t - \hat x_i^t\right)^2 + \lambda \operatorname{tr}\!\left(\hat X^\top (D - A)\, \hat X\right)$$
where $m_i^t = 0$ if $x_i^t$ is missing data, and $m_i^t = 1$ otherwise; $A$ is the adjacency matrix describing the physical topology between nodes, with $a_{ij}$ depicting the degree of similarity between node $i$ and node $j$; $D$ is the degree matrix of $A$; $x_i^t$ is the value of node $i$ at time $t$, and $\hat x_i^t$ is the model's fit of the data at time $t$.
The loss function is the sum of two terms: the first ensures sufficiently high fitting accuracy on the non-missing data, and the second uses the time-series similarity of nearby nodes to handle completely missing data. A penalty factor $\lambda$ is set to adjust the weight ratio of the two terms.
The data loss rate is set as $\eta = N_{\text{miss}} / (N \times T)$, where $N_{\text{miss}}$ is the number of missing data points, $N = 53$ is the number of nodes, and $T = 96$ is the number of time steps.
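The two-term loss and the loss rate can be sketched as follows in NumPy. The masked fitting term follows the description above; the graph-smoothness term built from the degree matrix D and adjacency A is a reconstruction consistent with that description, since the patent's exact formula is not reproduced in this text.

```python
import numpy as np

def interpolation_loss(X, X_hat, M, A, lam=0.1):
    """Loss = masked fitting error + lam * graph-smoothness penalty.

    X, X_hat: (N, T) true and fitted series;  M: (N, T) mask, 0 where missing
    A: (N, N) weighted adjacency;  D - A is the graph Laplacian, so the trace
    term is small when similar (nearby) nodes have similar fitted series.
    """
    fit = np.sum(M * (X - X_hat) ** 2)            # only non-missing entries count
    L = np.diag(A.sum(axis=1)) - A                # Laplacian from degree matrix D
    smooth = np.trace(X_hat.T @ L @ X_hat)
    return fit + lam * smooth

rng = np.random.default_rng(0)
N, T = 5, 12
X = rng.normal(size=(N, T))
M = (rng.random((N, T)) > 0.3).astype(float)      # roughly 30% data loss rate
A = np.ones((N, N)) - np.eye(N)                   # toy all-to-all similarity
print(interpolation_loss(X, X, M, A))             # perfect fit: only smoothness remains
loss_rate = 1.0 - M.mean()                        # eta = N_miss / (N * T)
```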
By continuously comparing true values with interpolated values, the whole neural network back-propagates the training error and adjusts its weight parameters. The loss function is minimized by batch gradient descent, so the intermediate weight parameters are updated once for each batch of training data, and training stops when the required number of iterations is reached or the error, accuracy, and similar measures meet a given threshold.
To verify the missing-data interpolation effect proposed by the present invention, the D5 dataset described above was used and several reference models were chosen for comparison, including Bayesian Gaussian tensor decomposition (Bayesian Gaussian CANDECOMP/PARAFAC, BGCP), the deep stacked autoencoder (Deep Stacked Auto-Encoder, DSAE), the generative-adversarial deep stacked autoencoder (GAN-DSAE), and the bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BD-LSTM). The prediction-effect comparison is shown in Table 1:
table 1:
As can be seen from Table 1, the present invention outperforms the other reference models under all metrics at data loss rates of 30%-70%, and thus performs better than the reference models at missing-data interpolation under high loss rates.
In order to verify the effect of interpolating completely missing data at partial nodes, the D5 dataset is again adopted and the following reference models are selected for comparison: BGCP, DSAE, GAN-DSAE, and BD-LSTM. The prediction-effect comparison is shown in Table 2:
table 2:
As can be seen from Table 2, the present invention outperforms the other reference models under all metrics at loss rates between 5% and 15%, and thus performs better than the reference models at interpolating completely missing data for partial nodes.
The above analysis shows that the method provided by the present invention addresses two problems that previously lacked good solutions, namely low recovery accuracy under high data-loss rates and complete loss of local data; it realizes missing-data interpolation for intelligent-transportation-system flow and can fully capture the spatio-temporal characteristics of traffic flow.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.
Claims (10)
1. A time-series data missing-interpolation system based on the modal-crossing method, characterized by comprising a characterization data module, a multi-modal data fusion module, a stackable spatio-temporal transformer module, and a prediction module;
wherein the characterization data module: first computes node representations through graph-neural-network graph-representation learning, then performs graph pooling over the representations of the nodes on the graph to obtain the representation of the graph, converting image data into vectors; at the same time, a neural network is used to learn representations that capture high-level semantic content from signals without being disturbed by low-level details in the signals, converting sound data into vectors; the obtained results are input into the multi-modal data fusion module;
the multi-modal data fusion module: converts the graph data into adjacency matrices to form a vector, and then combines this vector with the output of the characterization data module and the time series containing missing data into one tensor to obtain the multi-modal data input; at the same time, the graph data serve as the initialization input of the first spatial block, so that the spatial characteristics of the time-series data can be extracted after passing through the spatial block; the multi-modal data are up-dimensioned by a 1×1 convolutional neural network and then input into the stackable spatio-temporal module, the up-dimensioning increasing the dimensionality of feature extraction so that the data can be analyzed more comprehensively;
the stackable spatio-temporal module: consists of a plurality of spatio-temporal modules, a dense-connection sub-module, and a convolutional-neural-network sub-module;
each spatio-temporal sub-module comprises a temporal block and a spatial block with the same structural framework, the main body of which is built from Transformers; the dense-connection sub-module connects the plurality of spatio-temporal modules in parallel to realize dense interaction of the feature information flow; the output time-series data enter the convolutional neural network for dimension reduction to obtain the interpolated data, which are input into the prediction module;
and the prediction module: realized by a convolutional neural network; from the input interpolated time-series data, the convolutional neural network predicts time-series data amounting to 10% of the original time steps.
2. A time series data missing interpolation method based on a modal crossing method, characterized in that the interpolation system of claim 1 is adopted, the method comprises the following steps:
step 1: the characterization data module processes the sound data and image data, the image data are converted into vectors, and the vectors and the missing data are fused into a multi-modal data input to a 1×1 convolutional neural network;
step 2: up-dimension the data using the 1×1 convolutional neural network to improve the data's feature-extraction capability,
step 3: perform missing-value interpolation on the missing values of the preprocessed coupling information-flow time-series data using the dense spatio-temporal coding network,
step 4: construct a loss function for the model,
step 5: train the model,
step 6: reduce the dimension of the interpolated time-series data through the 1×1 convolutional neural network to obtain interpolated time-series data with the same dimension as the actual data,
step 7: using the 1×1 convolutional neural network, predict the coupling information-flow data of future time steps from the obtained time-series data in which the missing data have been interpolated.
3. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in step 1, a sliding window is manually selected to determine the input dimension, i.e. N nodes are selected and the coupling information-flow data of T time steps are taken as the model input, limiting the input dimension; the time-series data of the selected nodes may exhibit random missingness, non-random missingness, or completely missing data at partial nodes.
4. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
the specific process of using the convolutional neural network to carry out dimension lifting on the data in the step 2 is as follows: the input dimension of the time sequence data is 1 xN x T, only one characteristic dimension is provided, the data is processed by using C convolution check data, the dimension of the data is changed into C x N x T, the network is deepened under the condition of not changing a receptive field, more nonlinearity is introduced, N is the number of nodes, T is the time step, C is the characteristic number of the data, and the characteristic number is any natural number.
5. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
the method for performing missing interpolation on the missing value of the time sequence data by using the dense space-time coding network in the step 3 comprises the following steps:
step 3.1: multiply a row of time-series data vectors $x$ containing missing data by a weight matrix $W$ to obtain a column of embeddings $a = Wx$, which then enter the self-attention layer, i.e. the low-dimensional vector $a$ is used to represent the multimodal data,
in step 3.2, $a$ first enters the first spatial block of the first spatio-temporal module, and the attention mechanism is performed, i.e. an attention distribution is computed over all input information using inner products, and a weighted average of the input information is computed from the attention distribution,
step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and carry out the same procedure as in step 3.2, here expressed in matrix form, with $n$ being the dimension of the data features,
step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights,
applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
step 3.5: throughout the whole process, layer-normalization skip connections are made in parallel between the spatial and temporal blocks.
6. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 5,
step 3.2 is specifically as follows:
step 3.2.1: each vector $a^i$ is passed through the first linear layer to obtain the corresponding query vector $q^i = W^q a^i$, a task-related vector that guides the direction of feature extraction,
in parallel, the vector $a^i$ is passed through the second and third linear layers to obtain the corresponding key vector $k^i = W^k a^i$ and value vector $v^i = W^v a^i$,
step 3.2.2: take the inner product of each $q^i$ with each $k^j$ to match the proximity of the two vectors; first, attention is computed between $q^1$ and the key $k^1$ of vector $a^1$, i.e. the scaled inner product of the two vectors, $\alpha_{1,1} = q^1 \cdot k^1 / \sqrt{d}$; next, attention is computed between $q^1$ and $k^2$ to obtain $\alpha_{1,2}$, and so on,
because the value of $q \cdot k$ grows as the dimension $d$ increases, it is divided by $\sqrt{d}$, which is equivalent to normalization,
$\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base; applying the softmax operation $\hat\alpha_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$ expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution,
this yields the weight $\hat\alpha_{1,i}$ with which the first output attends to the $i$-th input; taking the inner product with the output values $v^i$ of the third linear layer gives $b^1 = \sum_i \hat\alpha_{1,i} v^i$, the first vector of the output of the first spatio-temporal block, which clearly uses the information of the whole sequence; if local information is to be considered, it suffices to learn the corresponding local weights $\hat\alpha_{1,i}$; similarly, when global information is considered, it suffices to learn the corresponding global weights,
step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block, the two neural networks being connected in parallel through residual connections to avoid the gradient-vanishing problem.
7. The temporal data loss interpolation method based on the modal crossover method according to claim 2, wherein
8. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in step 5, the model is trained using the loss function of step 4.
9. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in the step 6, the specific process of using the convolutional neural network to reduce the data dimension is as follows: the input dimension of the time sequence data after interpolation is C multiplied by N multiplied by T, the time sequence data has C characteristic dimensions, the data is processed by a convolution layer with a convolution kernel of C multiplied by C, the dimension of the data is changed into 1 multiplied by N multiplied by T, and the interpolated coupling information flow data of N nodes T in time steps is obtained.
10. The method according to claim 2, wherein the time series data after interpolation in step 7 is input as samples to a convolution kernel of 1×1, and the data of the future time step is predicted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310063746.6A CN115827335B (en) | 2023-02-06 | 2023-02-06 | Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115827335A CN115827335A (en) | 2023-03-21 |
CN115827335B true CN115827335B (en) | 2023-05-09 |