CN117743852A - Traffic missing data filling method based on diffusion neural network - Google Patents


Publication number
CN117743852A
Authority
CN
China
Prior art keywords
data
traffic
model
missing
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311780639.1A
Other languages
Chinese (zh)
Inventor
邱忠权
梁瑶
廖壮志
袁红霞
周垲轶
吴兵
唐晓波
李昀泓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Chengnan Expressway Co ltd
Southwest Jiaotong University
Sichuan Communication Surveying and Design Institute Co Ltd
Original Assignee
Sichuan Chengnan Expressway Co ltd
Southwest Jiaotong University
Sichuan Communication Surveying and Design Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-12-22
Filing date
2023-12-22
Publication date
2024-03-22
Application filed by Sichuan Chengnan Expressway Co ltd, Southwest Jiaotong University, Sichuan Communication Surveying and Design Institute Co Ltd
Priority to CN202311780639.1A
Publication of CN117743852A
Legal status: Pending

Landscapes

  • Traffic Control Systems (AREA)

Abstract

The invention relates to the technical field of traffic and discloses a traffic missing data filling method based on a diffusion neural network, which specifically comprises the following steps: S1, data acquisition and preprocessing: original traffic flow and speed data recorded by road network sensors and containing no missing values are acquired and preprocessed, and a two-dimensional matrix $X$ is constructed, in which rows represent different points in time and columns represent different sensors; S2, data set division and processing: the traffic data are divided along the time dimension into a training set $X_{train}$, a validation set $X_{val}$ and a test set $X_{test}$, and a certain proportion of values is randomly deleted to simulate missing data, forming data sets with a given missing rate. The invention adopts a network training approach that can directly process traffic data containing missing values; the filled values are closer to the true values, which helps improve the accuracy and reliability of data filling.

Description

Traffic missing data filling method based on diffusion neural network
Technical Field
The invention relates to the technical field of traffic, in particular to a traffic missing data filling method based on a diffusion neural network.
Background
With the rapid development of intelligent transportation systems, data-driven methods based on deep neural networks are increasingly explored and applied to traffic-related problems such as traffic flow prediction and congestion recognition. However, these methods rely heavily on high-quality historical spatio-temporal traffic data, and data from real-world sensor devices are often incomplete or corrupted owing to natural or human factors (e.g., sensor failures, network communication errors, or storage anomalies). Complete traffic data are critical for downstream tasks, so suitable algorithms are needed to extract patterns from multi-dimensional spatio-temporal traffic features and estimate the missing values. Over the past several decades, many efforts have been made to address the missing data problem. Classical statistical time series models, such as the autoregressive integrated moving average (ARIMA) model, have been used to estimate missing values, but these models typically impose linearity or stationarity assumptions on the data and degrade when those assumptions do not hold. Machine learning methods such as K-nearest neighbors have attracted attention for their ability to handle complex traffic problems (e.g., missing data prediction and event detection) and generally capture nonlinear relationships better. Meanwhile, matrix factorization and tensor-based dimension reduction techniques have been proposed for traffic data filling and perform well in practice; however, these methods require the data matrix to be low-rank and are very sensitive to dimension information. Recently, deep learning methods, a branch of machine learning, have shown considerable promise for missing value estimation. Many studies have applied deep learning models, such as long short-term memory (LSTM) networks and bidirectional recurrent neural networks, to fill in missing traffic data and have achieved promising performance.
Because the traffic network has a typical graph structure and the data recorded by sensors on the road network are strongly spatially correlated, existing graph-neural-network-based methods for filling missing data do not effectively take the correlation of the graph data into account, so the filling results they produce are inaccurate.
Therefore, we propose a traffic missing data filling method based on a diffusion neural network.
Disclosure of Invention
The invention mainly solves the technical problems existing in the prior art and provides a traffic missing data filling method based on a diffusion neural network.
To achieve the above purpose, the invention adopts the following technical solution: a traffic missing data filling method based on a diffusion neural network, specifically comprising the following steps:
S1, acquiring and preprocessing a traffic data set: first, original data such as traffic flow and speed, recorded by road network sensors and containing no missing values, are acquired and preprocessed, and a two-dimensional matrix $X$ is constructed, in which rows represent different points in time and columns represent different sensors;
S2, dividing and processing the data set: the traffic data set is divided along the time dimension into a training set $X_{train}$, a validation set $X_{val}$ and a test set $X_{test}$, and a certain proportion of values is randomly deleted to simulate missing data, forming data sets with a given missing rate;
S3, processing the road network topology graph: an adjacency matrix $W$ is calculated to represent the relationships between nodes in the traffic network $G$; the reachability between nodes is calculated from the distance information, and the values in the adjacency matrix are calculated from the distances by a formula in which $d_{ij}$ is the distance from node $v_i$ to node $v_j$, $\sigma$ is the standard deviation of the distances, and $\varepsilon$ is a hyperparameter controlling the sparsity of the weight matrix;
S4, calculating the forward and reverse transfer matrices: a forward transfer matrix and a reverse transfer matrix are calculated. These transfer matrices describe the direction of information transfer between nodes: the forward transfer matrix describes forward information transfer and the reverse transfer matrix describes reverse information transfer. The forward transfer matrix of the random walk is calculated from the original adjacency matrix $W$ and the degree matrix $D$, where $W$ represents the connection relationships between nodes and $D$ is a diagonal matrix whose diagonal elements are the degrees of the nodes (the sum of the weights of the edges connected to a node, which for an unweighted graph is the number of connected edges), calculated as:
$d_i = \sum_j W_{ij}$
where $d_i$ is the degree of node $i$. The reverse transfer matrix is then calculated by means of the transpose operation;
S5, constructing the basic module of the diffusion graph convolutional network model: the diffusion graph convolutional network updates the feature representation of the nodes through multiple iterative diffusion steps, capturing both local and global information in the graph structure. It uses the forward and reverse transfer matrices to update the node features iteratively, where $W$ is the adjacency matrix and $K$ is the order of the diffusion convolution. The convolution process in the diffusion graph convolutional network is approximated using Chebyshev polynomials, which are defined recursively as:
$T_k(X) = 2X\,T_{k-1}(X) - T_{k-2}(X)$,
with $T_0(X) = I$ and $T_1(X) = X$. The learnable parameters of the $l$-th layer control how each node transforms the information it receives, and $H^{l+1}$ is the output of the $l$-th layer: if the $l$-th layer is the first layer, $H^{l}$ is the input $X$ containing missing data; if the $l$-th layer is the last layer, $H^{l+1}$ is the final output of the whole basic module, and this output is the filled data;
S6, model training and data filling: the training set $X_{train}$ containing missing data is input to the model for training, the data containing missing values are filled to form complete data, and the mean square error between the missing data $x$ and the filled complete data is calculated to evaluate the filling effect;
S7, model optimization and stopping strategy: the model is continuously optimized by back propagation, and an early stopping strategy is introduced: training stops when the mean square error on the validation set $X_{val}$ no longer decreases, yielding the trained data filling model;
S8, model evaluation: the test set $X_{test}$ is input to the trained model to obtain the filled complete data, and error indicators between the filled data and the real data, such as the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), are calculated to evaluate the performance of the model;
S9, verifying the experimental results: the experimental results of the invention are compared with those of traditional data filling methods to verify the effectiveness and superiority of the model.
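Steps S3 and S4 can be illustrated with a short Python sketch. The text defines the distance, the distance standard deviation and the sparsity hyperparameter but does not reproduce the adjacency formula, so the thresholded Gaussian kernel used here is an assumption, as are the function names, the choice to keep self-loops, and the definition of the reverse transfer matrix as the row-normalised transpose of the adjacency matrix.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, eps=0.1):
    """Thresholded Gaussian kernel on pairwise distances (assumed form)."""
    dist = np.asarray(dist, dtype=float)
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    sigma = dist[off_diag].std()           # standard deviation of the distances
    w = np.exp(-(dist ** 2) / (sigma ** 2))
    w[w < eps] = 0.0                       # eps controls how sparse W becomes
    return w                               # diagonal stays 1 (self-loops kept)

def transfer_matrices(w):
    """Random-walk transfer matrices: D^-1 W and the row-normalised W^T."""
    def row_normalise(a):
        deg = a.sum(axis=1)                # d_i = sum_j a_ij
        inv = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
        return inv[:, None] * a
    return row_normalise(w), row_normalise(w.T)

# Toy example: three sensors with made-up pairwise distances.
dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 3.0],
                 [4.0, 3.0, 0.0]])
W = gaussian_kernel_adjacency(dist, eps=0.1)
P_fwd, P_bwd = transfer_matrices(W)
```

In this toy case the two distant sensor pairs fall below the threshold and are disconnected, and every row of each transfer matrix sums to 1, so each row can be read as the transition probabilities of a random walk leaving that node.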
Preferably, during model training in S6, a model optimization technique is adopted and an early stopping strategy is introduced, so that training is monitored based on the error on the validation set.
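The early stopping strategy can be sketched as a small training loop in Python. The text only states that training stops when the validation error no longer decreases; the `patience` window, the function name and the two callbacks are hypothetical choices made for illustration.

```python
def train_with_early_stopping(train_step, validate, max_epochs=200, patience=10):
    """Run train_step each epoch; stop once the validation MSE has not
    improved for `patience` consecutive epochs (patience is an assumption)."""
    best_val = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_step()                       # one pass of back-propagation updates
        val_mse = validate()               # mean square error on the validation set
        if val_mse < best_val:
            best_val, best_epoch = val_mse, epoch
        elif epoch - best_epoch >= patience:
            break                          # validation error stopped decreasing
    return best_epoch, best_val

# Stand-in validation curve: improves for three epochs, then drifts upward.
val_curve = iter([0.9, 0.5, 0.4, 0.41, 0.42, 0.43, 0.44])
best_epoch, best_val = train_with_early_stopping(
    train_step=lambda: None,
    validate=lambda: next(val_curve),
    max_epochs=7,
    patience=3,
)
```

In practice the model parameters from the best epoch would also be saved and restored after the loop; that bookkeeping is omitted here to keep the sketch short.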
Advantageous effects
The invention provides a traffic missing data filling method based on a diffusion neural network. The beneficial effects are as follows:
(1) The traffic missing data filling method based on the diffusion neural network adopts a network training approach that can directly process traffic data containing missing values; the filled values are closer to the true values, which helps improve the accuracy and reliability of data filling.
(2) In the traffic missing data filling method based on the diffusion neural network, the diffusion neural network takes the correlation among data from different sensors into account, so that the graph structure of the road network is captured more fully and the data are filled more effectively; the topological characteristics of the traffic network are analyzed in finer detail, providing a more accurate modeling basis for data filling.
Drawings
FIG. 1 shows the model structure for processing missing traffic data according to the present invention;
FIG. 2 shows an example of traffic data to be processed that contain missing values, according to the present invention.
Detailed Description
Embodiment: the traffic missing data filling method based on the diffusion neural network, as shown in FIGS. 1 and 2, specifically comprises the following steps:
S1, acquiring and preprocessing a traffic data set: first, original data such as traffic flow and speed, recorded by road network sensors and containing no missing values, are acquired and preprocessed, and a two-dimensional matrix $X$ is constructed, in which rows represent different points in time and columns represent different sensors;
S2, dividing and processing the data set: the traffic data set is divided along the time dimension into a training set $X_{train}$, a validation set $X_{val}$ and a test set $X_{test}$, and a certain proportion of values is randomly deleted to simulate missing data, forming data sets with a given missing rate;
S3, processing the road network topology graph: an adjacency matrix $W$ is calculated to represent the relationships between nodes in the traffic network $G$; the reachability between nodes is calculated from the distance information, and the values in the adjacency matrix are calculated from the distances by a formula in which $d_{ij}$ is the distance from node $v_i$ to node $v_j$, $\sigma$ is the standard deviation of the distances, and $\varepsilon$ is a hyperparameter controlling the sparsity of the weight matrix;
S4, calculating the forward and reverse transfer matrices: a forward transfer matrix and a reverse transfer matrix are calculated. These transfer matrices describe the direction of information transfer between nodes: the forward transfer matrix describes forward information transfer and the reverse transfer matrix describes reverse information transfer. The forward transfer matrix of the random walk is calculated from the original adjacency matrix $W$ and the degree matrix $D$, where $W$ represents the connection relationships between nodes and $D$ is a diagonal matrix whose diagonal elements are the degrees of the nodes (the sum of the weights of the edges connected to a node, which for an unweighted graph is the number of connected edges), calculated as:
$d_i = \sum_j W_{ij}$
where $d_i$ is the degree of node $i$. The reverse transfer matrix is then calculated by means of the transpose operation;
S5, constructing the basic module of the diffusion graph convolutional network model: the diffusion graph convolutional network updates the feature representation of the nodes through multiple iterative diffusion steps, capturing both local and global information in the graph structure. It uses the forward and reverse transfer matrices to update the node features iteratively, where $W$ is the adjacency matrix and $K$ is the order of the diffusion convolution. The convolution process in the diffusion graph convolutional network is approximated using Chebyshev polynomials, which are defined recursively as:
$T_k(X) = 2X\,T_{k-1}(X) - T_{k-2}(X)$,
with $T_0(X) = I$ and $T_1(X) = X$. The learnable parameters of the $l$-th layer control how each node transforms the information it receives, and $H^{l+1}$ is the output of the $l$-th layer: if the $l$-th layer is the first layer, $H^{l}$ is the input $X$ containing missing data; if the $l$-th layer is the last layer, $H^{l+1}$ is the final output of the whole basic module, and this output is the filled data;
S6, model training and data filling: the training set $X_{train}$ containing missing data is input to the model for training, the data containing missing values are filled to form complete data, and the mean square error between the missing data $x$ and the filled complete data is calculated to evaluate the filling effect;
S7, model optimization and stopping strategy: the model is continuously optimized by back propagation, and an early stopping strategy is introduced: training stops when the error on the validation set $X_{val}$ no longer decreases, yielding the trained data filling model;
S8, model evaluation: the test set $X_{test}$ is input to the trained model to obtain the filled complete data, and error indicators between the filled data and the real data, such as the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), are calculated to evaluate the performance of the model;
S9, verifying the experimental results: the experimental results of the invention are compared with those of traditional data filling methods to verify the effectiveness and superiority of the model.
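The basic module of S5 can be sketched in Python with NumPy as follows. The layer formula itself is not reproduced in the text, so the form used here, a sum over orders $k = 0, \dots, K$ of Chebyshev terms of the forward and reverse transfer matrices applied to the node features and multiplied by per-order weight matrices, is an assumption modelled on the usual bidirectional diffusion convolution. The names `chebyshev_terms`, `diffusion_conv`, `theta_fwd` and `theta_bwd` are hypothetical, and no activation function is applied because none is described.

```python
import numpy as np

def chebyshev_terms(p, h, order):
    """Return [T_0(p) h, ..., T_order(p) h] via T_k = 2 p T_{k-1} - T_{k-2}."""
    terms = [h]                            # T_0(p) h = h
    if order >= 1:
        terms.append(p @ h)                # T_1(p) h = p h
    for _ in range(2, order + 1):
        terms.append(2.0 * (p @ terms[-1]) - terms[-2])
    return terms

def diffusion_conv(h, p_fwd, p_bwd, theta_fwd, theta_bwd):
    """One layer (assumed form): sum over k of
    T_k(p_fwd) h theta_fwd[k] + T_k(p_bwd) h theta_bwd[k]."""
    order = theta_fwd.shape[0] - 1         # K, the order of the diffusion convolution
    out = np.zeros((h.shape[0], theta_fwd.shape[2]))
    fwd = chebyshev_terms(p_fwd, h, order)
    bwd = chebyshev_terms(p_bwd, h, order)
    for k in range(order + 1):
        out += fwd[k] @ theta_fwd[k] + bwd[k] @ theta_bwd[k]
    return out

# Toy usage: 4 sensors (rows), a window of 6 time steps as node features.
rng = np.random.default_rng(0)
n_nodes, f_in, f_out, K = 4, 6, 6, 2
P = np.full((n_nodes, n_nodes), 1.0 / n_nodes)    # toy transfer matrix, rows sum to 1
H = rng.normal(size=(n_nodes, f_in))              # stand-in for input with missing values
theta_f = rng.normal(size=(K + 1, f_in, f_out)) * 0.1
theta_b = rng.normal(size=(K + 1, f_in, f_out)) * 0.1
H_next = diffusion_conv(H, P, P.T, theta_f, theta_b)
```

Note that nodes are rows here, whereas the data matrix of S1 stores sensors as columns, so a time window taken from that matrix would be transposed before being passed in. Stacking several such layers, with the last output taken as the filled data, mirrors the description of $H^{l}$ and $H^{l+1}$.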
During model training in S6, a model optimization technique is adopted and an early stopping strategy is introduced, so that training is monitored based on the error on the validation set.
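The random deletion of S2 and the error indicators of S6 and S8 can be illustrated with the Python sketch below. Several details are assumptions rather than statements from the text: deleted entries are set to 0, errors are computed only on the deleted entries, MAPE is reported in percent, and a per-sensor mean stands in for the trained model. The data values are synthetic.

```python
import numpy as np

def random_mask(shape, missing_rate, rng):
    """Boolean mask: True where a value is kept, False where it is deleted."""
    return rng.random(shape) >= missing_rate

def imputation_errors(x_true, x_filled, observed):
    """MSE, MAE, RMSE and MAPE (percent) on the deleted entries only."""
    hidden = ~observed
    truth = x_true[hidden]
    err = x_filled[hidden] - truth
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    nonzero = truth != 0                   # avoid division by zero in MAPE
    mape = float(np.mean(np.abs(err[nonzero] / truth[nonzero])) * 100.0)
    return {"MSE": mse, "MAE": mae, "RMSE": mse ** 0.5, "MAPE": mape}

rng = np.random.default_rng(0)
x_true = rng.uniform(20.0, 80.0, size=(288, 5))   # synthetic: 288 time points x 5 sensors
observed = random_mask(x_true.shape, missing_rate=0.3, rng=rng)
x_missing = np.where(observed, x_true, 0.0)       # deleted entries set to 0 (assumption)

# Stand-in for the model output: fill each gap with that sensor's observed mean.
col_mean = x_missing.sum(axis=0) / observed.sum(axis=0)
x_filled = np.where(observed, x_true, col_mean)
scores = imputation_errors(x_true, x_filled, observed)
```

Keeping the mask alongside the data makes it possible to compare the filled values with the ground truth that was deliberately removed, which is what allows the filling effect to be measured at all.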
The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (3)

1. A traffic missing data filling method based on a diffusion neural network, characterized by comprising the following steps:
S1, data acquisition and preprocessing: original traffic flow and speed data recorded by road network sensors and containing no missing values are acquired and preprocessed, and a two-dimensional matrix $X$ is constructed, in which rows represent different points in time and columns represent different sensors;
S2, data set division and processing: the traffic data are divided along the time dimension into a training set $X_{train}$, a validation set $X_{val}$ and a test set $X_{test}$, and a certain proportion of values is randomly deleted to simulate missing data, forming data sets with a given missing rate;
S3, road network topology graph processing: an adjacency matrix $W$ between nodes in the traffic network $G$ is calculated to reflect the node relationships; node reachability is calculated from the distance information, and the values in the adjacency matrix are calculated in combination with the parameters;
S4, calculating the transfer matrices: a forward transfer matrix and a reverse transfer matrix are calculated, wherein the forward transfer matrix describes forward information transfer and the reverse transfer matrix describes reverse information transfer;
S5, constructing the diffusion graph convolutional network model: node features are iteratively updated based on the forward and reverse transfer matrices, forming the iterative scheme of the diffusion graph convolutional network model;
S6, model training and data filling: the training set $X_{train}$ containing missing data is input to the model for training, traffic data containing missing values are filled to form a complete data set, and the filling effect is evaluated by the mean square error;
S7, model optimization and stopping strategy: the model is optimized by back propagation, and an early stopping strategy based on the validation set $X_{val}$ is adopted to prevent overfitting;
S8, model evaluation: the test set $X_{test}$ is input to the trained model, and error indicators such as MAE, RMSE and MAPE are calculated to evaluate the performance of the model;
S9, verifying the experimental results: the results of the invention are compared with those of traditional data filling methods to verify the effectiveness and superiority of the model.
2. The traffic missing data filling method based on a diffusion neural network according to claim 1, characterized in that: the iterative updating of the diffusion graph convolutional network model in S5 adopts the diffusion convolution formula, and the convolution process is approximated using Chebyshev polynomials.
3. The traffic missing data filling method based on a diffusion neural network according to claim 1, characterized in that: during model training in S6, a model optimization technique is adopted and an early stopping strategy is introduced, so that training is monitored based on the error on the validation set.
CN202311780639.1A 2023-12-22 2023-12-22 Traffic missing data filling method based on diffusion neural network Pending CN117743852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311780639.1A CN117743852A (en) 2023-12-22 2023-12-22 Traffic missing data filling method based on diffusion neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311780639.1A CN117743852A (en) 2023-12-22 2023-12-22 Traffic missing data filling method based on diffusion neural network

Publications (1)

Publication Number Publication Date
CN117743852A true CN117743852A (en) 2024-03-22

Family

ID=90260541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311780639.1A Pending CN117743852A (en) 2023-12-22 2023-12-22 Traffic missing data filling method based on diffusion neural network

Country Status (1)

Country Link
CN (1) CN117743852A (en)

Similar Documents

Publication Publication Date Title
CN113313947B (en) Road condition evaluation method of short-term traffic prediction graph convolution network
CN112071065A (en) Traffic flow prediction method based on global diffusion convolution residual error network
CN110047291B (en) Short-term traffic flow prediction method considering diffusion process
CN114299723B (en) Traffic flow prediction method
CN111612243A (en) Traffic speed prediction method, system and storage medium
CN111047078B (en) Traffic characteristic prediction method, system and storage medium
CN110570035B (en) People flow prediction system for simultaneously modeling space-time dependency and daily flow dependency
CN113570859B (en) Traffic flow prediction method based on asynchronous space-time expansion graph convolution network
CN115660135A (en) Traffic flow prediction method and system based on Bayes method and graph convolution
CN115482656B (en) Traffic flow prediction method by using space dynamic graph convolutional network
CN112910711A (en) Wireless service flow prediction method, device and medium based on self-attention convolutional network
CN116596151B (en) Traffic flow prediction method and computing device based on time-space diagram attention
CN115862324A (en) Space-time synchronization graph convolution neural network for intelligent traffic and traffic prediction method
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN114860715A (en) Lanczos space-time network method for predicting flow in real time
Chitra et al. Time-series analysis and flood prediction using a deep learning approach
CN114358246A (en) Graph convolution neural network module of attention mechanism of three-dimensional point cloud scene
CN109993282B (en) Typhoon wave and range prediction method
CN116777068A (en) Causal transducer-based networked data prediction method
CN117743852A (en) Traffic missing data filling method based on diffusion neural network
CN115830865A (en) Vehicle flow prediction method and device based on adaptive hypergraph convolution neural network
CN115953902A (en) Traffic flow prediction method based on multi-view space-time diagram convolution network
CN115544307A (en) Directed graph data feature extraction and expression method and system based on incidence matrix
CN114566048A (en) Traffic control method based on multi-view self-adaptive space-time diagram network
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination