CN115827335B - Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method - Google Patents
- Publication number: CN115827335B (application CN202310063746.6A)
- Authority: CN (China)
- Legal status: Active (an assumption by Google, not a legal conclusion)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention provides a time-series data missing-value interpolation system based on a cross-modal method, which consists of four modules: a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module), and a prediction module. The method handles data that is missing at random, data that is missing not at random, and the case where the data of some nodes is missing entirely. The system uses multi-modal data fusion: image data and sound data are converted into vectors through representation learning, graph data is converted into an adjacency matrix and flattened into a vector, and these are fused with the incomplete time-series data to form prior information. Dense connections between the spatio-temporal blocks, each consisting of a spatial block and a temporal block, build an end-to-end encoding channel that accurately recovers the missing data. The invention achieves high accuracy when recovering coupled-information-flow time-series data.
Description
Technical Field
The invention relates to a missing-value interpolation system for coupled-information-flow time-series data, based on a dense spatio-temporal encoding network and able to handle the complete loss of some nodes' data; it belongs to the technical field of missing-data recovery.
Background
With the spread of information sensor technology, the explosive growth of data volume has ushered in the era of big data. Data-driven methods, including deep learning, reinforcement learning and incremental machine learning, have been applied successfully to the optimal operation, planning decisions and demand-side response of power and transportation systems. Taking transportation as an example, in an intelligent transportation system data are sampled from a smart sensor network comprising smart cards, GPS receivers, speed sensors, video detectors and other components. A data-driven intelligent transportation system has strong traffic-state sensing capability and can achieve accurate traffic-flow prediction, optimal scheduling and real-time target control. These data-driven models usually require a large amount of training data before deployment to achieve good performance. However, owing to engineering problems such as sensor faults and irregular sampling, the time-series data collected from a sensor network are often incomplete, which causes the common missing-data problem. Depending on how the data are missing, the problem falls broadly into two categories: missing at random and missing not at random.
Recovering missing data has long been a research focus of scholars at home and abroad, and the search for methods that improve recovery performance has never stopped. Existing data-driven methods fall into three major categories: statistical models, tensor methods based on a low-rank hypothesis, and deep-learning models. In the 1990s, statistical models such as ARIMA and KNN (K-Nearest Neighbor) were proposed. The core idea of these methods is weighted averaging, so the result depends heavily on how the weights are defined; because weighted averaging is linear, the regression results are often too rough to be practical. The current mainstream approach is tensor-based, e.g. the recently proposed Bayesian Gaussian CANDECOMP/PARAFAC (BGCP) model, which rests on a low-rank hypothesis. By processing global tensor information and reconstructing the sparse tensor through Bayesian inference, it achieves good performance. However, most tensor-based methods exploit spatial information only within the tensor structure itself, not the spatial locations of the nodes, so their performance can degrade as the missing rate increases.
Further, deep learning is a newer approach to estimating missing data. Models such as the long short-term memory network (LSTM), the gated recurrent unit network (GRU) and the convolutional neural network (CNN) can effectively mine the spatio-temporal correlations in time-series data and interpolate randomly missing data well when the missing rate is low. However, these methods demand large training samples, generalize poorly on small samples, are prone to overfitting, and cannot extrapolate into data regions that carry no time-series information at all; these shortcomings limit the performance of deep-learning interpolation.
Based on the above, the invention addresses the cases of random missing data, non-random missing data with a high missing rate, and the complete loss of some nodes' data.
Disclosure of Invention
In order to remedy the defects of the prior art, the invention provides a new deep-learning model system: a method based on a dense spatio-temporal Transformer network (DSTTN, Dense Spatial-Temporal Transformer Nets), which solves data recovery under high missing rates on the cross-modal principle. Instead of considering time-series information alone, the model fully exploits spatial and temporal pattern information through a spatial Transformer block (hereinafter, spatial block) and a temporal Transformer block (hereinafter, temporal block). When data are missing at random or not at random, the model fits the distribution of a node's data from the features of the non-missing data and draws random samples from the fitted distribution as interpolation values. When the data of some nodes are missing entirely, the model fits those nodes' distributions from the data of similar nodes and again samples randomly to complete the interpolation. Compared with the bidirectional long short-term memory network, the best-performing reference model, the invention achieves a lower mean absolute percentage error.
The technical scheme adopted by the invention is as follows:
A time-series data missing-value interpolation system based on a cross-modal method comprises a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module) and a prediction module. The main features of the invention are that the multi-modal data fusion module uses multi-modal data for missing-data interpolation, and that the stackable spatio-temporal module handles the complete loss of some nodes' data as well as the current best models.
Data characterization module: based on graph-neural-network representation learning, node representations are first computed, then graph pooling over the node representations yields the representation of the whole graph, converting the image data into vectors. Meanwhile, a neural network learns a representation that captures the high-level semantic content of the signal while remaining undisturbed by its low-level details (e.g. pitch contour or background noise), converting the sound data into vectors. The results are fed into the multi-modal data fusion module.
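As an illustration of the characterization step, the following minimal numpy sketch computes node representations with one round of normalized neighbor averaging and then mean-pools them into a graph-level vector. This is an assumption for illustration, not the patent's exact network; `graph_embedding` and its weight matrix are hypothetical names.

```python
import numpy as np

def graph_embedding(adj, feats, weight):
    """One message-passing layer followed by mean graph pooling.

    adj    : (N, N) adjacency matrix with self-loops
    feats  : (N, d_in) node features
    weight : (d_in, d_out) learnable projection
    Returns a single (d_out,) vector characterizing the whole graph.
    """
    deg = adj.sum(axis=1, keepdims=True)  # node degrees
    h = (adj / deg) @ feats @ weight      # normalized neighbor averaging
    h = np.maximum(h, 0.0)                # ReLU nonlinearity
    return h.mean(axis=0)                 # graph-level mean pooling
```

Any differentiable pooling (sum, max, attention pooling) could replace the mean; the point is that per-node representations collapse into one vector representing the graph.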
Multi-modal data fusion module: the graph data are converted into an adjacency matrix and flattened into a vector, which is then combined with the output of the data characterization module and the time series containing missing values into one tensor, giving the multi-modal input. At the same time, the graph data initialize the first spatial block, so that the time-series data acquire spatial features after passing through it. The multi-modal data are lifted in dimension by a 1×1 convolutional neural network before entering the stackable spatio-temporal module; the lifting enlarges the feature-extraction dimension for a more comprehensive analysis of the data.
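The fusion of the modality vectors with the incomplete series can be pictured as simple tensor stacking. The sketch below is a hypothetical illustration: the name `fuse_modalities`, the NaN encoding of missing entries, and the resizing of the flattened adjacency to the series length are all assumptions made to keep the shapes aligned, not details fixed by the patent.

```python
import numpy as np

def fuse_modalities(image_vec, sound_vec, adj, series):
    """Stack per-modality vectors and the incomplete time series into one
    tensor of prior information.

    image_vec, sound_vec : (T,) characterization vectors
    adj    : (N, N) adjacency matrix, flattened into one row
    series : (N, T) time series with NaN marking missing entries
    """
    T = series.shape[1]
    adj_vec = np.resize(adj.reshape(-1), T)      # align flattened adjacency to T
    prior = np.stack([image_vec, sound_vec, adj_vec])   # (3, T) prior rows
    return np.concatenate([prior, np.nan_to_num(series)], axis=0)  # (3 + N, T)
```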
Stackable spatio-temporal module: this module consists of several spatio-temporal blocks, a dense-connection sub-module and a convolutional-neural-network sub-module, as shown in fig. 3.
Further, each spatio-temporal block contains a temporal block and a spatial block. The two share the same structural framework, whose main body is built from Transformers. The dense-connection sub-module connects the spatio-temporal blocks in parallel, realizing dense interaction of the feature information flows. The output time-series data enter the convolutional neural network for dimension reduction, yielding the interpolated data, which are passed to the prediction module.
Prediction module: implemented as a convolutional neural network; from the interpolated time-series input, it predicts the system's time series for a period into the future.
After modularization, the method can be packaged as a system.
A time-series data missing-value interpolation method based on a cross-modal approach comprises the following steps:
Step 1: manually select a sliding window to determine the input dimension, i.e. select N nodes and take the coupled-information-flow data of T time steps as the model input.
Step 2: lift the data dimension with a 1×1 convolutional neural network to improve the feature-extraction capability.
Step 3: use the DSTTN to interpolate the missing values of the preprocessed coupled-information-flow time-series data.
Step 4: construct a loss function for the model.
Step 5: train the model.
Step 6: reduce the dimension of the interpolated time series through a 1×1 convolutional neural network, obtaining interpolated time-series data of the same dimension as the actual data.
Step 7: from the time series with missing data interpolated, predict the coupled-information-flow data of future time steps with a 1×1 convolutional neural network.
Further, in step 1, a sliding window is selected manually to determine the input dimension: N nodes are selected, and the coupled-information-flow data of T time steps serve as the model input. Limiting the input dimension prevents an over-long time-series input from inflating the network dimension and slowing it down. The selected node time series may be missing at random, missing not at random, or have some nodes' data missing entirely.
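The sliding-window selection of step 1 can be sketched as follows. `sliding_windows` is a hypothetical helper, assuming the raw record is an N-node by L-step array; the patent only states that N nodes and T time steps are chosen manually.

```python
import numpy as np

def sliding_windows(data, T, stride=1):
    """Cut an (N, L) multi-node series into model inputs of T time steps.

    Limiting each input to T steps keeps the network dimension bounded,
    which is the stated purpose of the sliding window in step 1.
    """
    N, L = data.shape
    return np.stack([data[:, s:s + T] for s in range(0, L - T + 1, stride)])
```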
Further, in step 2, the dimension of the data is lifted with a convolutional neural network as follows: the input time series has dimension 1×N×T, i.e. a single feature dimension; processing it with C convolution kernels changes the dimension to C×N×T. This deepens the network without changing the receptive field and introduces more nonlinearity, increasing the expressive capacity of the network; in other words, the number of data features grows, strengthening the ability to describe the data.
Here N is the number of nodes, T the number of time steps, and C the number of data features, which can be any natural number.
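A 1×1 convolution over a single input channel is just a per-channel scaling summed over input channels, so the 1×N×T to C×N×T lifting of step 2 reduces to a channel-mixing product. A minimal numpy sketch (the function name and kernel layout are illustrative assumptions):

```python
import numpy as np

def lift_channels(x, kernels):
    """1x1 convolution expressed as a channel-mixing matrix product.

    x       : (1, N, T) single-feature time-series tensor
    kernels : (C, 1) weights of C separate 1x1 convolution kernels
    Returns a (C, N, T) tensor: same receptive field, C feature channels.
    """
    # out[c, n, t] = sum_i kernels[c, i] * x[i, n, t] -- exactly a 1x1 conv
    return np.einsum('ci,int->cnt', kernels, x)
```

The same einsum with a (1, C) kernel performs the inverse reduction of step 6.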
Further, in step 3, the missing values of the time-series data are interpolated with the DSTTN as follows:
Step 3.1: each input time-series vector x^i containing missing data is multiplied by a weight matrix W to obtain the embedding a^i = W·x^i, which enters the self-attention layer; the multi-modal data are thus represented by the low-dimensional vectors a^i.
Step 3.2: a^i first enters the first spatial block of the first spatio-temporal block, where the self-attention mechanism is applied: an attention distribution over all inputs is computed with inner products, and a weighted average of the input information is taken according to that distribution.
Further, step 3.2 proceeds as follows:
Step 3.2.1: each vector a^i passes through the first linear layer to obtain the corresponding query vector q^i = W^Q·a^i, which guides the direction of feature extraction and is related to the task. In parallel, a^i passes through the second and third linear layers to obtain the key vector k^i = W^K·a^i and the value vector v^i = W^V·a^i.
Step 3.2.2: each q^i is matched against every k^j by an inner product measuring the proximity of the two vectors. First, for the first vector, q^1 and k^1 are combined in a scaled inner product, alpha_{1,1} = q^1·k^1 / sqrt(d); next, q^1 and k^2 give alpha_{1,2}; and so on.
Because the inner product grows as the dimension d increases, it is divided by sqrt(d), which has a normalizing effect.
A softmax (an exponential function with base e) then makes the outputs probabilistic, mapping the multi-class values into the range [0, 1] with sum 1: hat_alpha_{1,i} = exp(alpha_{1,i}) / Σ_j exp(alpha_{1,j}).
This yields the weight hat_alpha_{1,i} that the first output places on the i-th input. Taking the inner product with the outputs v^i of the third linear layer gives b^1 = Σ_i hat_alpha_{1,i}·v^i, the first output vector of the spatial block. Clearly b^1 uses the information of the whole sequence, and only the corresponding weights hat_alpha_{1,i} need to be learned; likewise, any other output that considers global information only needs its corresponding weights hat_alpha_{j,i}.
Step 3.2.3: as in step 3.2.2, attention is applied to the remaining components q^j, k^j, v^j to obtain all outputs b^j; these are fed into a feed-forward network to produce the complete output of the spatial block, and the two networks are joined by a residual connection to avoid the vanishing-gradient problem.
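The computations of steps 3.2.1 to 3.2.3 amount to standard scaled dot-product self-attention. A minimal numpy sketch, in which the matrices `wq`, `wk`, `wv` stand in for the three linear layers (names and shapes are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence.

    x          : (L, d) input vectors a^1..a^L
    wq, wk, wv : (d, d_k) the three linear layers producing q, k, v
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])  # alpha_{i,j} = q^i . k^j / sqrt(d_k)
    weights = softmax(scores, axis=1)       # each row sums to 1
    return weights @ v                      # every output mixes the whole sequence
```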
Step 3.3 combining the output of the space block and the input of the space block as the input of the time block and converting the combined output into timeThe same procedure as in step 3.2 is carried out, here expressed in matrix, by +.>The dimensions of the data characteristic are such that,
step 3.4 willParallel inputs to three linear layers to obtain the query matrix respectively>Key matrix->Value matrix->Will->Is +.>And->Is +.>Do the attention mechanism, will->Conversion to a line vector->Then->Record->There is a attention matrix->Representing the mechanism of attention between every two positions, for +.>Get +.>Operation, obtain output matrix for measuring different link weights
Taking outThe operation of (1) realizes the probabilistic expression of the output, and converts the multi-classified output value into the range +.>And sum to a probability distribution of 1, which is then calculatedMultiplying the value matrix to obtain an output->The final output of the first space-time block is obtained by a forward propagation network, the two neural networks are connected in parallel through residual errors to avoid the problem of gradient disappearance,
Step 3.5: throughout the process, the spatial blocks are connected in parallel via layer-normalization skip connections. On the one hand this normalizes the time-series data, keeping them out of the saturation region of the activation function and preventing vanishing or exploding gradients; on the other hand it realizes internal residual connections, further mitigating gradient vanishing.
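The layer normalization and residual connections described above can be sketched as follows; `residual_block` and the pre-norm placement of the normalization are illustrative assumptions, since the patent does not fix where the normalization sits relative to the sublayer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance, keeping the data
    out of the activation function's saturation region."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Skip connection around a normalized sublayer; the identity path
    lets gradients flow and mitigates gradient vanishing."""
    return x + sublayer(layer_norm(x))
```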
Step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5. In addition, a dense connection exists between every pair of spatio-temporal blocks, which also helps prevent vanishing gradients.
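The dense connections of step 3.6 can be pictured as each block receiving the aggregate of all earlier block outputs, DenseNet-style. In this toy sketch, summation as the merge operation is an assumption (the patent does not specify how the densely connected streams are combined), and `dense_forward` is a hypothetical name.

```python
import numpy as np

def dense_forward(x, blocks):
    """Run k space-time blocks with dense (every-pair) skip connections:
    each block sees the sum of the input and all earlier block outputs."""
    outputs = [x]
    for block in blocks:
        outputs.append(block(sum(outputs)))
    return outputs[-1]
```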
Further, step 4 constructs a loss function for the model.
Further, step 5 trains the model.
Further, in step 6, the dimension of the data is reduced with a convolutional neural network as follows: the interpolated time series has dimension C×N×T, i.e. C feature dimensions; processing it with a convolution layer that merges the C channels changes the dimension to 1×N×T, yielding the interpolated coupled-information-flow data of N nodes over T time steps.
Further, in step 7 the interpolated time series are input as samples into a 1×1 convolution kernel to predict the data of future time steps.
The beneficial effects of the invention are:
1. The invention proposes a cross-modal method for high missing rates that overcomes the insufficient information of single-modal data. By extracting the features of the non-missing data, it fits each node's data distribution and interpolates by random sampling from it; data recovery is accurate and general, and prediction accuracy and operating efficiency improve markedly.
2. The invention proposes the DSTTN structure, which links multiple conventional Transformer blocks through dense residual connections and overcomes the weak long-term memory of gated recurrent networks such as the LSTM.
3. The proposed DSTTN model fuses spatio-temporal features. Because the spatial topology of the nodes is considered, the spatio-temporal dependence in predicting coupled-information-flow data with missing values, that is, the complex spatial dependence and temporal dynamics of the actual network, is captured more fully, giving the model better accuracy and stronger generalization.
4. The invention learns temporal and spatial information through the spatio-temporal attention module; combining the multi-head attention mechanism with the encoder-decoder allows the computation to be parallelized, saving time.
Drawings
FIG. 1 is a flow chart of cross-modal data input and interpolation data output in the present method;
FIG. 2 is a schematic diagram of the internal structure of DSTTN in the present method;
FIG. 3 is a schematic diagram of the internal structure of each spatial block in the method.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1: a time-series data missing-value interpolation system based on a cross-modal method comprises a data characterization module, a multi-modal data fusion module, a stackable spatio-temporal Transformer module (hereinafter, spatio-temporal module) and a prediction module. The main features of the invention are that the multi-modal data fusion module uses multi-modal data for missing-data interpolation, and that the stackable spatio-temporal module handles the complete loss of some nodes' data as well as the current best models.
Data characterization module: based on graph-neural-network representation learning, node representations are first computed, then graph pooling over the node representations yields the representation of the whole graph, converting the image data into vectors. Meanwhile, a neural network learns a representation that captures the high-level semantic content of the signal while remaining undisturbed by its low-level details (e.g. pitch contour or background noise), converting the sound data into vectors. The results are fed into the multi-modal data fusion module.
Multi-modal data fusion module: the graph data are converted into an adjacency matrix and flattened into a vector, which is then combined with the output of the data characterization module and the time series containing missing values into one tensor, giving the multi-modal input. At the same time, the graph data initialize the first spatial block, so that the time-series data acquire spatial features after passing through it. The multi-modal data are lifted in dimension by a 1×1 convolutional neural network before entering the stackable spatio-temporal module; the lifting enlarges the feature-extraction dimension for a more comprehensive analysis of the data.
Stackable spatio-temporal module: this module consists of several spatio-temporal blocks, a dense-connection sub-module and a convolutional-neural-network sub-module, as shown in fig. 3.
Further, each spatio-temporal block contains a temporal block and a spatial block. The two share the same structural framework, whose main body is built from Transformers. The dense-connection sub-module connects the spatio-temporal blocks in parallel, realizing dense interaction of the feature information flows. The output time-series data enter the convolutional neural network for dimension reduction, yielding the interpolated data, which are passed to the prediction module.
Prediction module: implemented as a convolutional neural network; from the interpolated time-series input, it predicts the time series 0.1T time steps ahead.
After modularization, the method can be packaged as a system.
Example 2: a time-series data missing-value interpolation method based on a cross-modal approach comprises the following steps:
Step 1: manually select a sliding window to determine the input dimension, i.e. select N nodes and take the coupled-information-flow data of T time steps as the model input.
Step 2: lift the data dimension with a 1×1 convolutional neural network to improve the feature-extraction capability.
Step 3: use the DSTTN to interpolate the missing values of the preprocessed coupled-information-flow time-series data.
Step 4: reduce the dimension of the interpolated time series through a 1×1 convolutional neural network, obtaining interpolated time-series data of the same dimension as the actual data.
Step 5: from the time series with missing data interpolated, predict the coupled-information-flow data of future time steps with a 1×1 convolutional neural network.
In step 1, a sliding window is selected manually to determine the input dimension: N nodes are selected, and the coupled-information-flow data of T time steps serve as the model input. Limiting the input dimension prevents an over-long time-series input from inflating the network dimension and slowing it down. The selected node time series may be missing at random, missing not at random, or have some nodes' data missing entirely.
In step 2, the dimension of the data is lifted with a convolutional neural network as follows: the input time series has dimension 1×N×T, i.e. a single feature dimension; processing it with C convolution kernels changes the dimension to C×N×T. This deepens the network without changing the receptive field and introduces more nonlinearity, increasing the expressive capacity of the network; in other words, the number of data features grows, strengthening the ability to describe the data.
Here N is the number of nodes, T the number of time steps, and C the number of data features, which can be any natural number.
In step 3, the missing values of the time-series data are interpolated with the DSTTN as follows:
Step 3.1: each input time-series vector x^i containing missing data is multiplied by a weight matrix W to obtain the embedding a^i = W·x^i, which enters the self-attention layer; the multi-modal data are thus represented by the low-dimensional vectors a^i.
Step 3.2: a^i first enters the first spatial block of the first spatio-temporal block, where the self-attention mechanism is applied: an attention distribution over all inputs is computed with inner products, and a weighted average of the input information is taken according to that distribution.
further, step 3.2 is specifically as follows:
step 3.2.1 each vectorObtaining the corresponding +.>,/>The direction of extraction of the guide features is a query vector related to the task,
parallel, vectorThe corresponding key vector is obtained through the second linear layer and the third linear layer respectively>Value vector->,
Step 3.2.2 with eachFor each->Doing inner product to match the proximity of the two vectors, first for vector +.>Is> and />The term is done, i.e. 2 vectors are done scaled inner product,next, will +.> and />Do atttion, get ∈>And so on,
Because ofThe value of (2) increases with increasing dimensions, so it is divided by +.>Equivalent to the effect of normalization,
Represents an exponential function based on a natural constant e, taking +.>The operation of (1) realizes the expression of probability of output, and converts the multi-classified output value into the range of [0, 1 ]]And the sum is a probability distribution of 1,
get the 1 st output focused on the 1 stWeight of individual inputs->Then, it is compared with the output value of the third linear layerDoing the inner volume gives +.>The first vector representing the output of the first spatio-temporal block, which obviously uses the information of the whole sequence, only needs to learn the corresponding +.>The preparation method is finished; similarly, when global information is considered, only the corresponding +.>,
Step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem.
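The per-vector computation of steps 3.2.1 through 3.2.3 can be sketched as follows. This is a minimal NumPy illustration of standard scaled dot-product self-attention, not the patent's trained model: the three weight matrices stand in for the learned linear layers, and all sizes are illustrative assumptions.

```python
import numpy as np

def self_attention(A, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the columns a^i of A.

    A:  (d_in, n) matrix whose columns are the embeddings a^i
    Wq, Wk, Wv: (d, d_in) weights of the three linear layers
    Returns B: (d, n) matrix whose columns are the outputs b^i
    """
    Q, K, V = Wq @ A, Wk @ A, Wv @ A              # q^i, k^i, v^i for every i
    d = Q.shape[0]
    scores = (K.T @ Q) / np.sqrt(d)               # alpha_{j,i} = k^j . q^i / sqrt(d)
    weights = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True) # softmax: columns sum to 1
    return V @ weights                            # b^i = sum_j alpha_hat_{j,i} v^j

rng = np.random.default_rng(0)
d_in, d, n = 8, 4, 6                              # illustrative sizes
A = rng.normal(size=(d_in, n))
Wq, Wk, Wv = (rng.normal(size=(d, d_in)) for _ in range(3))
B = self_attention(A, Wq, Wk, Wv)
print(B.shape)                                    # one output vector per input
```

Each output column mixes all value vectors, which is why every $b^i$ can draw on the information of the whole sequence.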
Step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and carry out the same procedure as in step 3.2, here expressed in matrix form, with $n$ being the dimension of the data features.
Step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights.
Applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
Step 3.5: throughout the whole process, the spatial blocks are connected in parallel through layer-normalization skip connections, which on the one hand normalize the time-series data, preventing them from falling into the saturation region of the activation function and avoiding gradient vanishing or explosion, and on the other hand form internal residual connections that mitigate gradient vanishing.
Step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5; in parallel, a connection exists between every two spatio-temporal blocks, which also helps avoid gradient vanishing.
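The layer-normalization skip connection of step 3.5 can be sketched as follows. This is a generic NumPy illustration of the standard layer-norm-plus-residual pattern; the toy sub-layer and tensor sizes are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position to zero mean and unit variance over its features,
    keeping activations out of the saturation region of the activation function."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def block_with_skip(x, sublayer):
    """Residual (skip) connection around a sub-layer, followed by layer norm.
    The identity path lets gradients flow directly, mitigating gradient vanishing."""
    return layer_norm(x + sublayer(x))

x = np.random.default_rng(1).normal(size=(6, 16))   # 6 positions, 16 features
y = block_with_skip(x, np.tanh)                     # toy sub-layer
print(y.mean(), y.std())                            # approximately 0 and 1
```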
Step 3.7 training the model.
The specific process of using a convolutional neural network to reduce the data dimension in step 4 is as follows: the input dimension of the interpolated time-series data is C×N×T with C feature dimensions; the data are processed by a convolution layer whose convolution kernel is C×C, so that the data dimension becomes 1×N×T, yielding the interpolated coupling information-flow data of N nodes over T time steps.
The interpolated time-series data in step 5 are input as samples into a 1×1 convolution kernel to predict the data of future time steps.
Example 3: in the era of networked data, information-flow data are ubiquitous. The specific operation of the present invention is described below taking traffic-flow data as an example. To verify the method of the present invention for interpolating time-series data with completely missing data based on a spatio-temporal coding network, a neighboring area of District 5 collected from the Caltrans Performance Measurement System in 2019 (hereinafter the D5 dataset) is taken as an example: traffic-speed samples aggregated from 53 sensor stations in this area during July are selected, the time interval is set to 5 minutes, and each flow value denotes the average traffic speed (km/h) in the corresponding 5-minute interval. In addition, the locations of these sensor stations are all referenced by longitude and latitude. The raw data is a two-dimensional matrix whose row index represents the time step and whose column index represents the number of the selected node.
If some traffic stations fail for a long period, or even lack sensors, partial data become completely missing. In general, it is impossible to infer the traffic-flow evolution law of unobservable nodes without any information. DSTTN, however, uses a cross-modal view, namely that the traffic-flow data of sensor nodes separated by shorter distances are similar in a given mode, and therefore handles this problem well.
Step 1: obtain measured traffic-flow data and represent the detector network of the urban road network as a weighted directed graph $G = (V, E, A)$, where $V$ is the set of detector nodes, $E$ is the edge set, and $A$ is a weighted adjacency matrix describing the degree of similarity between different nodes; preprocess the measured traffic-flow data to form traffic-flow time-series data numbered by sampling point. The weighted adjacency matrix $A$ describing node similarity is generated from the physical distances between nodes: its weights are based on the straight-line physical distance between two nodes, computed by substituting the node longitudes and latitudes into the haversine tool (Python); the intuitive understanding is that the closer two nodes are physically, the closer their flows. The weighted adjacency matrix serves as the initialization matrix of the spatial blocks so that the time-series data carry spatial characteristics. The present invention constructs the adjacency matrix in the following manner,
where $d_{ij}$ represents the physical distance between node $i$ and node $j$; clearly, the closer two nodes are, the higher their degree of similarity.
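The distance-based adjacency construction can be sketched as follows. The haversine great-circle formula is implemented directly; the Gaussian-kernel weighting that turns distances into similarities is an illustrative assumption, since the patent's exact weighting formula is not reproduced in this text, and the station coordinates are hypothetical.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    R = 6371.0  # mean Earth radius
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp = np.radians(lat2 - lat1)
    dl = np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

def adjacency(coords, sigma=10.0):
    """Weighted adjacency: a_ij is large when nodes i and j are physically close.
    The Gaussian kernel exp(-(d/sigma)^2) is an illustrative choice."""
    n = len(coords)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = haversine_km(*coords[i], *coords[j])
            A[i, j] = np.exp(-(d / sigma) ** 2)
    return A

coords = [(38.57, -121.49), (38.58, -121.48), (38.90, -121.10)]  # hypothetical stations
A = adjacency(coords)
print(A.round(3))   # diagonal is 1; the nearby pair gets a weight close to 1
```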
Step 2: up-dimension the data using a 1×1 convolutional neural network to improve the data's feature-extraction capability.
Step 3: perform missing-value interpolation on the preprocessed traffic-flow time-series data using the DSTTN.
Step 4: reduce the dimension of the interpolated time-series data through a 1×1 convolutional neural network to obtain interpolated time-series data with the same dimension as the actual data.
Step 5: using a 1×1 convolutional neural network, predict the traffic-flow data of future time steps from the obtained time-series data in which the missing data have been interpolated.
Further, in step 1, 53 nodes are selected by manually choosing a sliding window, and traffic-flow data of 96 time steps are taken as the model input. Limiting the input dimension prevents overly long time-series inputs from making the neural network too large and slow to run. The time-series data of the selected nodes may be subject to random and non-random missingness, or the data of some nodes may be completely missing.
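The sliding-window input construction can be sketched as follows, using the 53-node, 96-step, 5-minute dimensions given in the text; the non-overlapping stride is an illustrative assumption.

```python
import numpy as np

def sliding_windows(series, window=96, stride=96):
    """Cut a (total_steps, n_nodes) series into (window, n_nodes) model inputs.
    Limiting the window keeps the network's input dimension bounded."""
    return np.stack([series[s:s + window]
                     for s in range(0, len(series) - window + 1, stride)])

# July at 5-minute resolution: 31 days * 288 samples/day = 8928 steps, 53 nodes
series = np.zeros((8928, 53))
batches = sliding_windows(series)
print(batches.shape)   # 93 non-overlapping 96-step windows of 53 nodes
```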
Further, in step 2, the specific process of up-dimensioning the data with the convolutional neural network is as follows: the time-series data have only one feature dimension, and the data are processed with 32 convolution kernels. Deepening the network without changing the receptive field introduces more nonlinearity, which increases the expressive power of the network; in other words, increasing the number of data features enhances the data's characterization ability.
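Because a 1×1 convolution acts independently at every node/time position, channel lifting reduces to a linear map on the channel axis. Below is a NumPy sketch with the 32 output channels mentioned in the text; the random weights stand in for learned kernels.

```python
import numpy as np

def conv1x1(x, W):
    """1x1 convolution: x has shape (C_in, N, T); W has shape (C_out, C_in).
    Every (node, time) position is mapped through the same linear layer, so the
    receptive field is unchanged while the channel count changes."""
    return np.einsum('oc,cnt->ont', W, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 53, 96))        # one feature dimension
W_up = rng.normal(size=(32, 1))         # 32 convolution kernels (step 2)
y = conv1x1(x, W_up)
print(y.shape)                          # 32 feature dimensions
W_down = rng.normal(size=(1, 32))       # reduce back to one dimension (step 4)
print(conv1x1(y, W_down).shape)
```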
Further, in the step 3, the method for performing missing interpolation on the missing value of the time series data by using the DSTTN comprises the following steps:
step 3.1: multiply a row of time-series data vectors $x$ containing missing data by a weight matrix $W$ to obtain a column of embeddings $a = Wx$, which then enter the self-attention layer. Embedding means using the low-dimensional vector $a$ to represent the multimodal data.
In step 3.2, $a$ first enters the first spatial block of the first spatio-temporal module and the attention mechanism is performed. That is, the attention distribution is computed over all input information using inner products, and the weighted average of the input information is computed from the attention distribution.
Further, step 3.2.1: each vector $a^i$ is passed through the first linear layer to obtain the corresponding query vector $q^i = W^q a^i$, a task-related vector that guides the direction of feature extraction.
In parallel, the vector $a^i$ is passed through the second and third linear layers to obtain the corresponding key vector $k^i = W^k a^i$ and value vector $v^i = W^v a^i$.
Step 3.2.2: take the inner product of each $q^i$ with each $k^j$ to match the proximity of the two vectors. First, attention is computed between $q^1$ and the key $k^1$ of vector $a^1$, i.e. the scaled inner product of the two vectors, $\alpha_{1,1} = q^1 \cdot k^1 / \sqrt{d}$; next, attention is computed between $q^1$ and $k^2$ to obtain $\alpha_{1,2}$, and so on.
Because the value of $q \cdot k$ grows as the dimension $d$ increases, it is divided by $\sqrt{d}$, which is equivalent to normalization.
$\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base. Applying the softmax operation $\hat\alpha_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$ expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution.
This yields the weight $\hat\alpha_{1,i}$ with which the first output attends to the $i$-th input; taking the inner product with the outputs $v^i$ of the third linear layer gives $b^1 = \sum_i \hat\alpha_{1,i} v^i$, the first vector of the output of the first spatio-temporal block. Clearly, it uses the information of the whole sequence. If local information is to be considered, it suffices to learn the corresponding local weights $\hat\alpha_{1,i}$; similarly, when global information is considered, it suffices to learn the corresponding global weights.
Step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block. In parallel, the gradient-vanishing problem is avoided by connecting the two neural networks through residual connections.
Step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and perform the same procedure as in step 3.2, here represented by a matrix; $n$ is the dimension of the data features.
Step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights.
Applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
step 3.5: throughout the whole process, layer-normalization skip connections are made in parallel between the spatial blocks, which on the one hand normalize the time-series data, preventing them from falling into the saturation region of the activation function and avoiding gradient vanishing or explosion, and on the other hand form internal residual connections that mitigate gradient vanishing,
step 3.6: the second through k-th spatio-temporal blocks perform the same operations as steps 3.2 to 3.5; in parallel, a connection exists between every two spatio-temporal blocks, which also helps avoid gradient vanishing,
step 3.7 training the model.
Further, in step 4, the specific process of reducing the data dimension with the convolutional neural network is as follows: the interpolated time-series data have 32 feature dimensions; after the data are processed by a convolution layer, the interpolated traffic-flow data of the 53 nodes and 96 time steps are obtained.
Further, the traffic flow data interpolated in step 5 is input as a sample into a 1×1 convolution kernel, and the data of the future time step is predicted.
In an example, the loss function is constructed as follows:
$$\mathcal{L} = \sum_{i,t} m_i^t \left(x_i^t - \hat x_i^t\right)^2 + \lambda \operatorname{tr}\!\left(\hat X^\top (D - A)\, \hat X\right)$$
where $m_i^t = 0$ if $x_i^t$ is missing data, and $m_i^t = 1$ otherwise; $A$ is the adjacency matrix describing the physical topology between nodes, with $a_{ij}$ depicting the degree of similarity between node $i$ and node $j$; $D$ is the degree matrix of $A$; $x_i^t$ is the value of node $i$ at time $t$, and $\hat x_i^t$ is the model's fit of the data at time $t$.
The loss function is the sum of two terms: the first ensures sufficiently high fitting accuracy on the non-missing data, and the second uses the time-series similarity of nearby nodes to handle completely missing data. A penalty factor $\lambda$ is set to adjust the weight ratio of the two terms.
The data loss rate is set as $\eta = N_{\text{miss}} / (N \times T)$, where $N_{\text{miss}}$ is the number of missing data points, $N = 53$ is the number of nodes, and $T = 96$ is the number of time steps.
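The two-term loss and the loss rate can be sketched as follows in NumPy. The masked fitting term follows the description above; the graph-smoothness term built from the degree matrix D and adjacency A is a reconstruction consistent with that description, since the patent's exact formula is not reproduced in this text.

```python
import numpy as np

def interpolation_loss(X, X_hat, M, A, lam=0.1):
    """Loss = masked fitting error + lam * graph-smoothness penalty.

    X, X_hat: (N, T) true and fitted series;  M: (N, T) mask, 0 where missing
    A: (N, N) weighted adjacency;  D - A is the graph Laplacian, so the trace
    term is small when similar (nearby) nodes have similar fitted series.
    """
    fit = np.sum(M * (X - X_hat) ** 2)            # only non-missing entries count
    L = np.diag(A.sum(axis=1)) - A                # Laplacian from degree matrix D
    smooth = np.trace(X_hat.T @ L @ X_hat)
    return fit + lam * smooth

rng = np.random.default_rng(0)
N, T = 5, 12
X = rng.normal(size=(N, T))
M = (rng.random((N, T)) > 0.3).astype(float)      # roughly 30% data loss rate
A = np.ones((N, N)) - np.eye(N)                   # toy all-to-all similarity
print(interpolation_loss(X, X, M, A))             # perfect fit: only smoothness remains
loss_rate = 1.0 - M.mean()                        # eta = N_miss / (N * T)
```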
By continuously comparing true values with interpolated values, the whole neural network back-propagates the training error and adjusts its weight parameters. The loss function is minimized by batch gradient descent, so the intermediate weight parameters are updated once for each batch of training data, and training stops when the required number of iterations is reached or the error, accuracy, and similar measures meet a given threshold.
To verify the missing-data interpolation effect proposed by the present invention, the D5 dataset described above was used and several reference models were chosen for comparison, including Bayesian Gaussian tensor decomposition (Bayesian Gaussian CANDECOMP/PARAFAC, BGCP), the deep stacked autoencoder (Deep Stacked Auto-Encoder, DSAE), the generative-adversarial deep stacked autoencoder (GAN-DSAE), and the bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BD-LSTM). The prediction-effect comparison is shown in Table 1:
table 1:
As can be seen from Table 1, the present invention outperforms the other reference models under all metrics at data loss rates of 30%-70%, and thus performs better than the reference models at missing-data interpolation under high loss rates.
In order to verify the effect of interpolating completely missing data at partial nodes, the D5 dataset is again adopted and the following reference models are selected for comparison: BGCP, DSAE, GAN-DSAE, and BD-LSTM. The prediction-effect comparison is shown in Table 2:
table 2:
As can be seen from Table 2, the present invention outperforms the other reference models under all metrics at loss rates between 5% and 15%, and thus performs better than the reference models at interpolating completely missing data for partial nodes.
The above analysis shows that the method provided by the present invention addresses two problems that previously lacked good solutions, namely low recovery accuracy under high data-loss rates and complete loss of local data; it realizes missing-data interpolation for intelligent-transportation-system flow and can fully capture the spatio-temporal characteristics of traffic flow.
The above embodiments are merely for illustrating the design concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications according to the principles and design ideas of the present invention are within the scope of the present invention.
Claims (10)
1. A time-series data missing-interpolation system based on the modal-crossing method, characterized by comprising a characterization data module, a multi-modal data fusion module, a stackable spatio-temporal transformer module, and a prediction module;
wherein the characterization data module: first computes node representations through graph-neural-network graph-representation learning, then performs graph pooling over the representations of the nodes on the graph to obtain the representation of the graph, converting image data into vectors; at the same time, a neural network is used to learn representations that capture high-level semantic content from signals without being disturbed by low-level details in the signals, converting sound data into vectors; the obtained results are input into the multi-modal data fusion module;
the multi-modal data fusion module: converts the graph data into adjacency matrices to form a vector, and then combines this vector with the output of the characterization data module and the time series containing missing data into one tensor to obtain the multi-modal data input; at the same time, the graph data serve as the initialization input of the first spatial block, so that the spatial characteristics of the time-series data can be extracted after passing through the spatial block; the multi-modal data are up-dimensioned by a 1×1 convolutional neural network and then input into the stackable spatio-temporal module, the up-dimensioning increasing the dimensionality of feature extraction so that the data can be analyzed more comprehensively;
the stackable spatio-temporal module: consists of a plurality of spatio-temporal modules, a dense-connection sub-module, and a convolutional-neural-network sub-module;
each spatio-temporal sub-module comprises a temporal block and a spatial block with the same structural framework, the main body of which is built from Transformers; the dense-connection sub-module connects the plurality of spatio-temporal modules in parallel to realize dense interaction of the feature information flow; the output time-series data enter the convolutional neural network for dimension reduction to obtain the interpolated data, which are input into the prediction module;
and the prediction module: realized by a convolutional neural network; from the input interpolated time-series data, the convolutional neural network predicts time-series data amounting to 10% of the original time steps.
2. A time series data missing interpolation method based on a modal crossing method, characterized in that the interpolation system of claim 1 is adopted, the method comprises the following steps:
step 1: the characterization data module processes the sound data and image data, the image data are converted into vectors, and the vectors and the missing data are fused into a multi-modal data input to a 1×1 convolutional neural network;
step 2: up-dimension the data using the 1×1 convolutional neural network to improve the data's feature-extraction capability,
step 3: perform missing-value interpolation on the missing values of the preprocessed coupling information-flow time-series data using the dense spatio-temporal coding network,
step 4: construct a loss function for the model,
step 5: train the model,
step 6: reduce the dimension of the interpolated time-series data through the 1×1 convolutional neural network to obtain interpolated time-series data with the same dimension as the actual data,
step 7: using the 1×1 convolutional neural network, predict the coupling information-flow data of future time steps from the obtained time-series data in which the missing data have been interpolated.
3. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in step 1, a sliding window is manually selected to determine the input dimension, i.e. N nodes are selected and the coupling information-flow data of T time steps are taken as the model input, limiting the input dimension; the time-series data of the selected nodes may exhibit random missingness, non-random missingness, or completely missing data at partial nodes.
4. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
the specific process of using the convolutional neural network to carry out dimension lifting on the data in the step 2 is as follows: the input dimension of the time sequence data is 1 xN x T, only one characteristic dimension is provided, the data is processed by using C convolution check data, the dimension of the data is changed into C x N x T, the network is deepened under the condition of not changing a receptive field, more nonlinearity is introduced, N is the number of nodes, T is the time step, C is the characteristic number of the data, and the characteristic number is any natural number.
5. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
the method for performing missing interpolation on the missing value of the time sequence data by using the dense space-time coding network in the step 3 comprises the following steps:
step 3.1: multiply a row of time-series data vectors $x$ containing missing data by a weight matrix $W$ to obtain a column of embeddings $a = Wx$, which then enter the self-attention layer, i.e. the low-dimensional vector $a$ is used to represent the multimodal data,
in step 3.2, $a$ first enters the first spatial block of the first spatio-temporal module, and the attention mechanism is performed, i.e. an attention distribution is computed over all input information using inner products, and a weighted average of the input information is computed from the attention distribution,
step 3.3: combine the output of the spatial block with its input as the input of the temporal block, transpose it into the temporal dimension, and carry out the same procedure as in step 3.2, here expressed in matrix form, with $n$ being the dimension of the data features,
step 3.4: input $X$ in parallel to three linear layers to obtain the query matrix $Q = W^Q X$, the key matrix $K = W^K X$, and the value matrix $V = W^V X$; the attention mechanism is performed between each column $q^i$ of $Q$ and each column $k^j$ of $K$, each $k^j$ being transposed into a row vector and multiplied with $Q$, giving the attention matrix $A = K^\top Q$, which represents the attention between every pair of positions; applying the softmax operation to $A$ gives the output matrix $\hat A = \mathrm{softmax}(A)$ that measures the different link weights,
applying softmax expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution; this is multiplied by the value matrix to obtain the output $B = V \hat A$, and the final output of the first spatio-temporal block is obtained through a forward-propagation network; in parallel, the two neural networks are connected through residual connections to avoid the gradient-vanishing problem,
step 3.5: throughout the whole process, layer-normalization skip connections are made in parallel between the spatial and temporal blocks.
6. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 5,
step 3.2 is specifically as follows:
step 3.2.1: each vector $a^i$ is passed through the first linear layer to obtain the corresponding query vector $q^i = W^q a^i$, a task-related vector that guides the direction of feature extraction,
in parallel, the vector $a^i$ is passed through the second and third linear layers to obtain the corresponding key vector $k^i = W^k a^i$ and value vector $v^i = W^v a^i$,
step 3.2.2: take the inner product of each $q^i$ with each $k^j$ to match the proximity of the two vectors; first, attention is computed between $q^1$ and the key $k^1$ of vector $a^1$, i.e. the scaled inner product of the two vectors, $\alpha_{1,1} = q^1 \cdot k^1 / \sqrt{d}$; next, attention is computed between $q^1$ and $k^2$ to obtain $\alpha_{1,2}$, and so on,
because the value of $q \cdot k$ grows as the dimension $d$ increases, it is divided by $\sqrt{d}$, which is equivalent to normalization,
$\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as its base; applying the softmax operation $\hat\alpha_{1,i} = \exp(\alpha_{1,i}) / \sum_j \exp(\alpha_{1,j})$ expresses the output as probabilities, mapping the multi-class output values into the range $[0, 1]$ with sum 1, i.e. a probability distribution,
this yields the weight $\hat\alpha_{1,i}$ with which the first output attends to the $i$-th input; taking the inner product with the output values $v^i$ of the third linear layer gives $b^1 = \sum_i \hat\alpha_{1,i} v^i$, the first vector of the output of the first spatio-temporal block, which clearly uses the information of the whole sequence; if local information is to be considered, it suffices to learn the corresponding local weights $\hat\alpha_{1,i}$; similarly, when global information is considered, it suffices to learn the corresponding global weights,
step 3.2.3: similarly to step 3.2.1, attention is performed on the other components $a^2, \dots, a^n$ of the vector to obtain all outputs $b^2, \dots, b^n$, which are input into a forward-propagation network to obtain the complete output of the spatial block, the two neural networks being connected in parallel through residual connections to avoid the gradient-vanishing problem.
7. The temporal data loss interpolation method based on the modal crossover method according to claim 2, wherein
8. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in step 5, the model is trained using the loss function of step 4.
9. The method for temporal data loss interpolation based on the modal crossover method as claimed in claim 2,
in the step 6, the specific process of using the convolutional neural network to reduce the data dimension is as follows: the input dimension of the time sequence data after interpolation is C multiplied by N multiplied by T, the time sequence data has C characteristic dimensions, the data is processed by a convolution layer with a convolution kernel of C multiplied by C, the dimension of the data is changed into 1 multiplied by N multiplied by T, and the interpolated coupling information flow data of N nodes T in time steps is obtained.
10. The method according to claim 2, wherein the time series data after interpolation in step 7 is input as samples to a convolution kernel of 1×1, and the data of the future time step is predicted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310063746.6A CN115827335B (en) | 2023-02-06 | 2023-02-06 | Time sequence data missing interpolation system and time sequence data missing interpolation method based on modal crossing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115827335A CN115827335A (en) | 2023-03-21 |
CN115827335B true CN115827335B (en) | 2023-05-09 |