CN115510174A - Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method - Google Patents
- Publication number
- CN115510174A (application CN202211197830.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- road
- road network
- missing
- traffic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06T5/77—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Remote Sensing (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Quality & Reliability (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention belongs to the field of intelligent transportation and relates in particular to a road network pixelation-based Wasserstein generative adversarial traffic flow data imputation method. Because traffic data and image data share a certain structural similarity, a Trajectory2Matrix representation of traffic data is proposed, which gives the road network and the trajectory data a pixelated representation. A road network flow generative adversarial network model is constructed: taking external factors and implicit spatial features into account, a reconstructed road network topology and a multi-source heterogeneous fusion module are introduced to optimize the generator of the Wasserstein generative adversarial network, and a new loss function is proposed so that the missing parts of the generated traffic flow feature map are repaired effectively. The repaired data is input into the discriminator of the model to judge its authenticity; if the authenticity is greater than a set threshold the repair is finished, otherwise the data is input into the model again for repair. The method mines the multidimensional compensation features of missing road network flow data more fully, compensating for missing data along three dimensions (the historical data of the missing data, the neighbor nodes of the missing road, and external factors), which improves the robustness and adaptability of data repair. It also fuses dynamic/static external attributes with temporal multi-modal features, further improving the accuracy of data repair.
Description
Technical Field
The invention belongs to the field of intelligent transportation and relates in particular to a road network pixelation-based Wasserstein generative adversarial traffic flow data imputation method.
Background
Researchers currently study the repair of missing traffic flow data from several angles.
On the effective representation of traffic flow trajectory data, researchers have proposed a variety of trajectory representation methods, for example methods based on points of interest, trajectory direction, hand-crafted features, trajectory segments, and bag-of-words models. These methods alleviate the sparse distribution of trajectory data, but most of them ignore the spatio-temporal information that the trajectories contain.
On the spatio-temporal correlation of traffic data, the document "ST-LBAGAN: Spatio-temporal learnable bidirectional attention generative adversarial networks for missing traffic data imputation" builds, on the basis of U-Net, a spatio-temporal learnable bidirectional attention generative adversarial network that learns the spatio-temporal features of traffic flow. The document "Deep spatial-temporal bi-directional residual optimisation based on tensor decomposition for traffic data imputation on urban road network" proposes a model that combines tensor completion with residual optimization to capture the spatio-temporal dependency of traffic data. The document "A Multi-Attention Tensor Completion Network for Spatiotemporal Traffic Data Imputation" proposes a spatial signal propagation module and a temporal self-attention module as the basic stacked blocks of a deep network, which aggregate and extract dynamic dependencies in the spatio-temporal dimensions. All of these documents study the complex spatio-temporal correlation of traffic flow data. However, the spatial features hidden in the road network still need to be mined further to improve imputation accuracy.
On the temporal multi-modality of traffic data and its dynamic/static external factors, the model in the document "ASTGCN: Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting" consists of three independent components that respectively capture the recent, daily-periodic, and weekly-periodic dependencies of traffic flow, and the outputs of the three components are fused by weighting. The document "APTN: Spatial-Temporal Attention Approach for Traffic Prediction" uses an encoder attention mechanism to model the periodic dependence of the data. Most of these models take the temporal multi-modal characteristics and external characteristics of traffic data into account, but module design for the multiple types of data features remains to be studied further.
Disclosure of Invention
To address the effective representation of vehicle trajectory data, the complexity of traffic data missing patterns, and the uncertainty of the missing rate, the invention draws on the success of generative adversarial networks in the field of image inpainting. Because traffic data and image data share a certain structural similarity, traffic data repair is treated as an image inpainting problem, which achieves accurate repair of missing traffic data. The method comprises the following steps:
for the representation of traffic flow trajectory data, converting the trajectory data into a two-dimensional "time-space" feature map representation that takes the spatio-temporal characteristics of the trajectories into account, to obtain a traffic flow feature map;
constructing a road network flow data generative adversarial network model based on the Wasserstein generative adversarial network, reconstructing the generator of the Wasserstein generative adversarial network and optimizing it by training, and inputting the traffic flow feature map into the trained generator to repair the missing parts of the traffic flow feature map;
inputting the repaired data into the discriminator of the road network flow data generative adversarial network model to judge the authenticity of the repaired data; if the authenticity is greater than a set threshold, the repair is finished; otherwise, the data is input into the road network flow data generative adversarial network model again for repair.
Further, the Trajectory2Matrix representation of traffic data comprises the following steps:
densifying the acquired trajectory data, projecting the coordinates of the densified trajectory points and the coordinates of the road network data into a unified coordinate system by Gaussian projection, and matching the trajectory coordinates to the road network data with a geometry-based matching algorithm;
stacking the flow time series of adjacent road sections to obtain the N×T matrix of T-dimensional historical road section flow data measured by the detectors at time t, where N is the number of road sections and T is the time dimension;
creating a first mask matrix that represents which data is missing, where an element equal to 1 indicates that the data is not missing and an element equal to 0 indicates that the data is missing;
multiplying the first mask matrix element-wise by the T-dimensional historical road section flow data measured by the detectors at time t to obtain the traffic flow feature map.
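Further, the projected coordinates (x, y) are generated from the coordinates by Gaussian projection as follows:
$$x = X + \frac{N}{2}\,t\cos^{2}(\mathrm{lat})\,\mathrm{lon}''^{2} + \frac{N}{24}\,t\,(5 - t^{2} + 9\eta^{2} + 4\eta^{4})\cos^{4}(\mathrm{lat})\,\mathrm{lon}''^{4} + \frac{N}{720}\,t\,(61 - 58t^{2} + t^{4})\cos^{6}(\mathrm{lat})\,\mathrm{lon}''^{6}$$
$$y = N\cos(\mathrm{lat})\,\mathrm{lon}'' + \frac{N}{6}\,(1 - t^{2} + \eta^{2})\cos^{3}(\mathrm{lat})\,\mathrm{lon}''^{3} + \frac{N}{120}\,(5 - 18t^{2} + t^{4} + 14\eta^{2} - 58\eta^{2}t^{2})\cos^{5}(\mathrm{lat})\,\mathrm{lon}''^{5}$$
where $N$ is the radius of curvature in the prime vertical; lon and lat are the longitude and latitude of the coordinates before projection; $\mathrm{lon}'' = \mathrm{lon} - \mathrm{lon}_0$, where $\mathrm{lon}_0$ is the longitude of the central meridian; $t = \tan(\mathrm{lat})$; $\eta^{2} = e'^{2}\cos^{2}(\mathrm{lat})$, where $e'$ is the second eccentricity of the ellipsoid; and $X$ is the meridian arc length.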
Further, when the trajectory coordinates are matched to the road network data with the geometry-based matching algorithm, each trajectory point is matched to the road section in the road network data with the smallest perpendicular projection distance, which is expressed as:
$$D = \frac{\left|(y_1 - y_0)\,x - (x_1 - x_0)\,y + x_1 y_0 - y_1 x_0\right|}{\sqrt{(y_1 - y_0)^{2} + (x_1 - x_0)^{2}}}$$
where $D$ is the perpendicular projection distance; $(x, y)$ are the coordinates of the trajectory point; and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the road section.
Further, the T-dimensional historical road section flow data measured by the detectors at time t is expressed as:
$$Y_t = \left[\,y_{i(t-j\Delta t)}\,\right] \in \mathbb{R}^{N\times T}, \quad i = 1, \dots, N, \quad j = 0, 1, \dots, T-1$$
where $Y_t$ is the T-dimensional historical road section flow data measured by the detectors at time t; $y_{i(t-j\Delta t)}$ is the flow data of road $e_i$ at time $(t - j\Delta t)$, $j = 0, 1, \dots, T-1$; $\Delta t$ is the flow data sampling interval; and $T$ is the length of the historical time series of the road sections.
Further, the process by which the road network flow data generative adversarial network model repairs the missing road network flow data comprises:
constructing multi-modal input data to capture the periodic and long-term dependencies of traffic data: in the spatio-temporal data generation component, the traffic flow data to be repaired is concatenated with the adjacent traffic flow data, the daily-period traffic flow data, and the weekly-period traffic flow data, and a convolution operation extracts the multi-modal input data; an attention mechanism computes the correlation coefficients of the multi-modal input data, and the output of the attention mechanism is taken as the historical compensation feature of the missing road network flow data;
reconstructing the topology of the urban road network to capture the explicit and implicit spatial correlations of traffic data: the road network topology is reconstructed from the Pearson correlation coefficients of the historical time data of the road nodes; a graph attention network computes the attention coefficients of the neighbor nodes under both the inherent adjacency of the missing node and the reconstructed road network topology, capturing information from the neighbor nodes in different dimensions for missing data repair, which serves as the neighbor compensation feature of the missing data;
providing a multi-source heterogeneous fusion module for the irregularity and uncertainty of traffic data, which fuses the external features of the missing data, such as weather and time, with the compensation features and improves the robustness of the repair: sequences of time-series factors, time feature factors, and weather factors are constructed, and a convolution operation brings the dimensions of the three sequences to those of the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data; the historical data compensation feature, the neighbor node compensation feature, and the three factor sequences are concatenated, and the channel features are propagated and extracted along the channel dimension to yield the repaired data values.
Further, obtaining the historical compensation feature of the missing flow data comprises:
First, the multi-modal input data is constructed. The traffic flow data $X_t$ currently to be repaired is concatenated with the adjacent traffic flow data $X_t^{r}$, the daily-period traffic flow $X_t^{d}$, and the weekly-period traffic flow $X_t^{w}$. The resulting input data matrix represents the traffic flow data of N nodes at 4T' time points, where N is the total number of road sections and T' is the length of the historical time series of the road sections. The concatenated matrix is fused by a convolutional neural network to obtain the multi-modal input data.
Second, the multi-modal input data is used to compute the vector correlation between the missing road network flow data and the historical data, which is taken as the historical data compensation feature of the missing data:
where $F_t$ is the historical data compensation feature of the missing road network flow data; MultiHead(·) is multi-head self-attention; the input is the multi-modal input data, with N the total number of road sections, C the number of channels, and T the length of the historical time series of the road sections; the weight matrices are those of the query subspace, the key subspace, and the value subspace, respectively; $d_q$, $d_k$, and $d_v$ are the dimensions of the query subspace, the key subspace, and the value subspace, respectively; and $i = 1, 2, \dots, h$, where $h$ is the number of heads of the multi-head self-attention.
Further, obtaining the neighbor node compensation feature of the missing traffic data comprises:
First, the urban road network topology is reconstructed from the Pearson correlation coefficients of the historical time data of the road nodes. Specifically, a Pearson correlation matrix $A_p \in \mathbb{R}^{N\times N}$ is established by computing the Pearson coefficient of one week of historical data between every two road nodes, where N is the number of road section nodes.
Then, the missing road network traffic data currently to be repaired, the adjacency matrix, and the Pearson correlation matrix are input into the graph attention network, and the output is taken as the neighbor node compensation feature of the missing data:
where $\sigma(\cdot)$ is the sigmoid function; $K$ is the number of graph attention heads; $N_i$ is the set of neighbor nodes of missing node $i$ according to the original road node adjacency matrix; $W_k$ is the dimension expansion parameter of graph attention head $k$; $\alpha^{k}_{ij}$ is the normalized attention coefficient of graph attention head $k$ under the original road network structure; the input is the traffic flow data to be repaired after time-parallel processing, with T the time dimension, N the total number of road sections, and C the number of channels; $N_i'$ is the set of neighbor nodes of missing node $i$ according to the reconstructed road network topology; $W'_k$ is the dimension expansion parameter of graph attention head $k$ under the reconstructed topology; and $\alpha'^{k}_{ij}$ is the normalized attention coefficient of graph attention head $k$ under the reconstructed road network topology.
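As an illustration of the topology reconstruction step, the following minimal Python sketch computes a Pearson correlation matrix from per-node flow histories and derives a reconstructed neighbor set from it. The function names and the cut-off `threshold` are assumptions made for this sketch; the text above does not state how the correlation matrix is turned into a neighbor set.

```python
import math
from typing import List

def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_a * var_b)

def pearson_matrix(series: List[List[float]]) -> List[List[float]]:
    """N x N matrix A_p of pairwise Pearson coefficients between road nodes."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]

def reconstructed_neighbors(a_p: List[List[float]], i: int,
                            threshold: float = 0.8) -> List[int]:
    """Nodes whose correlation with node i reaches `threshold`.

    The threshold rule is a hypothetical choice for this sketch.
    """
    return [j for j, r in enumerate(a_p[i]) if j != i and r >= threshold]

# Toy histories for three road nodes: node 1 follows node 0, node 2 opposes it.
history = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
A_p = pearson_matrix(history)
print(reconstructed_neighbors(A_p, 0))
```

A graph attention layer could then attend over both the original neighbor set and the set returned here, so that strongly correlated but physically distant road sections also contribute to the repair.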
Further, obtaining the repaired traffic flow data in the multi-source heterogeneous fusion module comprises:
First, external factors such as the time-series factor, the time feature factor, and the weather factor are selected, and a convolution operation is applied to each of the 3 external factors to convert it to the same dimension as the feature matrices extracted by the two sub-modules, that is, $\xi_i \in \mathbb{R}^{C\times N\times T'}$, where $i$ is the category of external factor.
Then, the 3 types of external factors are concatenated with the temporal and spatial features of the traffic flow along dimension 0 to obtain the input of the multi-source heterogeneous fusion module, $F_0 \in \mathbb{R}^{5C\times N\times T'}$, $F_0 = \mathrm{concat}(F_{temporal}, F_{spatial}, \xi_1, \xi_2, \xi_3)$.
Next, an attention mechanism propagates the extracted channel features along the channel dimension to obtain the output $Y'$ of the module, so that important features receive more attention and unnecessary features are suppressed.
Finally, the output of the module is recombined with the observed real data to repair the locally missing data and obtain the repaired traffic flow:
$$\hat{Y} = M \odot X + (1 - M) \odot Y'$$
where $\hat{Y}$ is the traffic flow data obtained by the repair; $M$ is the mask matrix; $X$ is the observed data; and $Y'$ is the complete traffic data generated by the multi-source heterogeneous fusion module from the time-series factor, the time feature factor, the weather factor, the historical data compensation feature, and the neighbor node compensation feature.
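The recombination step can be sketched as follows in plain Python. The sketch assumes that observed entries are kept unchanged and that only missing entries (mask value 0) are taken from the generated matrix; the function name `recombine` and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def recombine(observed: Matrix, generated: Matrix, mask: Matrix) -> Matrix:
    """Keep observed values where mask == 1, use generated values where mask == 0."""
    return [
        [m * x + (1 - m) * g for x, g, m in zip(row_x, row_g, row_m)]
        for row_x, row_g, row_m in zip(observed, generated, mask)
    ]

# Two road sections, two time steps; zeros in `observed` mark missing readings.
observed = [[10.0, 0.0], [0.0, 8.0]]
mask = [[1, 0], [0, 1]]
generated = [[9.5, 12.0], [7.0, 8.4]]
repaired = recombine(observed, generated, mask)
print(repaired)
```

Keeping the observed entries untouched means the generator is only trusted where no measurement exists.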
Further, the pixelated adversarial network model comprises a discriminator and a traffic flow generator. The discriminator judges the authenticity of the repaired data generated by the traffic flow generator. During training, the discriminator and the traffic flow generator are each optimized with the RMSProp optimization algorithm, and the loss function used to optimize the discriminator is as follows:
A new optimization objective is provided for the traffic flow generator: a reconstruction loss function $Loss_{re}$ is introduced and combined with the adversarial loss to increase the similarity between the generated data and the real data. The loss function of the optimization training is expressed as:
where $M$ is the mask matrix; $\hat{Y}$ is the traffic flow data obtained by the repair; $D(\cdot)$ is the discriminator; $\alpha$ is the coefficient of the mask reconstruction loss function; $n$ is the number of missing positions; $\hat{Y}_{ij}$ is the element of $\hat{Y}$ that represents the value of the i-th road section in the j-th time dimension; and $M_{ij}$ is the element of $M$ that represents whether the traffic flow data of the i-th road section in the j-th time dimension is missing, where a value of 1 indicates that the data is not missing and a value of 0 indicates that the data is missing.
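Because the loss formulas themselves are not reproduced in the text above, the following Python sketch only illustrates one plausible form and should be read as an assumption, not as the claimed loss. It uses the standard Wasserstein critic objective for the discriminator, and for the generator it adds a reconstruction term weighted by α: a mean squared error over the n missing positions, which presumes that training masks are synthetic so that reference values exist at those positions. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def critic_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Standard Wasserstein critic loss: mean score on generated minus mean score on real."""
    return sum(d_fake) / len(d_fake) - sum(d_real) / len(d_real)

def reconstruction_loss(y_hat: Matrix, y_ref: Matrix, mask: Matrix) -> float:
    """Assumed form: mean squared error over the n positions where mask == 0."""
    errors = [
        (h - r) ** 2
        for row_h, row_r, row_m in zip(y_hat, y_ref, mask)
        for h, r, m in zip(row_h, row_r, row_m)
        if m == 0
    ]
    return sum(errors) / len(errors) if errors else 0.0

def generator_loss(d_fake: List[float], loss_re: float, alpha: float) -> float:
    """Adversarial term plus alpha times the reconstruction term (assumed combination)."""
    return -sum(d_fake) / len(d_fake) + alpha * loss_re

y_hat = [[1.0, 2.0], [3.0, 4.0]]
y_ref = [[1.0, 0.0], [3.0, 1.0]]
mask = [[1, 0], [1, 0]]
loss_re = reconstruction_loss(y_hat, y_ref, mask)
print(loss_re, generator_loss([0.5, 1.5], loss_re, alpha=2.0))
```

In a real training loop both losses would be minimized alternately with an optimizer such as RMSProp, as the text describes.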
The method compensates for missing data along two dimensions, historical data and neighbor nodes, which improves the robustness and adaptability of data repair. For the external attributes and temporal multi-modal characteristics of traffic flow data, a multi-channel attention mechanism is introduced to assign weights to the key attributes that affect repair performance, and a multi-source heterogeneous fusion module is constructed. This module fuses dynamic/static external attributes with temporal multi-modal features, further improving the accuracy of data repair.
Drawings
FIG. 1 is a flowchart of the road network pixelation-based Wasserstein generative adversarial traffic flow data imputation method of the present invention;
FIG. 2 is a graph comparing the true values and the repaired values according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a road network pixelation-based Wasserstein generative adversarial traffic flow data imputation method, which comprises the following steps:
for the representation of traffic flow trajectory data, the trajectory data is converted into a two-dimensional "time-space" feature representation that takes the spatio-temporal characteristics of the trajectories into account, and a Trajectory2Matrix representation of traffic data is proposed; this method converts spatio-temporal traffic dynamics into a traffic flow spatio-temporal relation matrix and enlarges the repair scale and the network range of the traffic data;
a road network flow data generative adversarial network model is constructed, in which a spatio-temporal data generation component and a new optimization objective are used to reconstruct the generator of the Wasserstein generative adversarial network and to repair the missing parts of the generated traffic flow feature map accurately; the model addresses unstable repair performance when the missing rate of traffic data is high and the missing patterns are complex;
the repaired data is input into the discriminator of the road network flow data generative adversarial network model to judge the authenticity of the repaired data; if the authenticity is greater than a set threshold, the repair is finished; otherwise, the data is input into the road network flow data generative adversarial network model again for repair.
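The accept-or-retry logic of the last step can be sketched as a small control loop. This is a minimal sketch: `generator` and `discriminator` stand for the trained model components and are hypothetical callables, the latest candidate is fed back into the generator on each retry (one possible reading of the text), and the `max_rounds` cap is an assumption added so the loop always terminates.

```python
from typing import Callable, List

Matrix = List[List[float]]

def repair_until_accepted(
    x: Matrix,
    generator: Callable[[Matrix], Matrix],
    discriminator: Callable[[Matrix], float],
    threshold: float,
    max_rounds: int = 10,
) -> Matrix:
    """Repair `x`, retrying while the discriminator score stays at or below `threshold`."""
    candidate = generator(x)
    for _ in range(max_rounds - 1):
        if discriminator(candidate) > threshold:
            break  # judged authentic enough: the repair is finished
        candidate = generator(candidate)  # otherwise run the model again
    return candidate

# Toy stand-ins for the trained networks, for demonstration only.
def toy_generator(m: Matrix) -> Matrix:
    return [[v + 1.0 for v in row] for row in m]

def toy_discriminator(m: Matrix) -> float:
    values = [v for row in m for v in row]
    return sum(values) / len(values) / 10.0

print(repair_until_accepted([[0.0]], toy_generator, toy_discriminator, threshold=0.25))
```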
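In this embodiment, the raw trajectory data is densified because the sampling frequencies of the raw trajectories are not uniform; after densification, all trajectories have a sampling interval of 15 s. Urban road network data is downloaded from OpenStreetMap (OSM), the coordinates of the original road network data and of the trajectory points are projected into a unified coordinate system by Gaussian projection, and the map matching problem is thereby converted into a pattern matching problem on sequences of planar line segments. The Gaussian projection has a forward formula and an inverse formula; the forward formula is:
$$x = X + \frac{N}{2}\,t\cos^{2}(\mathrm{lat})\,\mathrm{lon}''^{2} + \frac{N}{24}\,t\,(5 - t^{2} + 9\eta^{2} + 4\eta^{4})\cos^{4}(\mathrm{lat})\,\mathrm{lon}''^{4} + \frac{N}{720}\,t\,(61 - 58t^{2} + t^{4})\cos^{6}(\mathrm{lat})\,\mathrm{lon}''^{6}$$
$$y = N\cos(\mathrm{lat})\,\mathrm{lon}'' + \frac{N}{6}\,(1 - t^{2} + \eta^{2})\cos^{3}(\mathrm{lat})\,\mathrm{lon}''^{3} + \frac{N}{120}\,(5 - 18t^{2} + t^{4} + 14\eta^{2} - 58\eta^{2}t^{2})\cos^{5}(\mathrm{lat})\,\mathrm{lon}''^{5}$$
where $x$ and $y$ are the projected coordinates; $\mathrm{lon}'' = \mathrm{lon} - \mathrm{lon}_0$, where lon is the longitude of the point and $\mathrm{lon}_0$ is the longitude of the central meridian; $N = a / \sqrt{1 - e^{2}\sin^{2}(\mathrm{lat})}$ is the radius of curvature in the prime vertical, where $a$ is the semi-major axis of the ellipsoid and $e$ is its first eccentricity; $t = \tan(\mathrm{lat})$; $\eta^{2} = e'^{2}\cos^{2}(\mathrm{lat})$, where $e'$ is the second eccentricity of the ellipsoid; and $X$ is the meridian arc length.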
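For concreteness, the forward projection can be sketched in Python using the standard Gauss-Krüger series with the symbols defined above. Several choices here are assumptions not stated in the source: the WGS84 ellipsoid constants, the series used for the meridian arc length X, the absence of a false easting, and the function names.

```python
import math
from typing import Tuple

# WGS84 ellipsoid (an assumption; the source does not name the ellipsoid).
A = 6378137.0                 # semi-major axis a, in metres
F = 1 / 298.257223563         # flattening
E2 = F * (2 - F)              # first eccentricity squared, e^2
EP2 = E2 / (1 - E2)           # second eccentricity squared, e'^2

def meridian_arc_length(lat: float) -> float:
    """Meridian arc length X from the equator to `lat` (radians), series in e^2."""
    e4, e6 = E2 ** 2, E2 ** 3
    return A * (
        (1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * lat
        - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * lat)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * lat)
        - (35 * e6 / 3072) * math.sin(6 * lat)
    )

def gauss_forward(lon_deg: float, lat_deg: float, lon0_deg: float) -> Tuple[float, float]:
    """Project (lon, lat) in degrees to plane coordinates (x northing, y easting) in metres."""
    lat = math.radians(lat_deg)
    l = math.radians(lon_deg - lon0_deg)            # lon'' = lon - lon0
    c = math.cos(lat)
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime-vertical radius N
    t = math.tan(lat)
    eta2 = EP2 * c ** 2
    x = (meridian_arc_length(lat)
         + n * t * c ** 2 * l ** 2 / 2
         + n * t * (5 - t ** 2 + 9 * eta2 + 4 * eta2 ** 2) * c ** 4 * l ** 4 / 24
         + n * t * (61 - 58 * t ** 2 + t ** 4) * c ** 6 * l ** 6 / 720)
    y = (n * c * l
         + n * (1 - t ** 2 + eta2) * c ** 3 * l ** 3 / 6
         + n * (5 - 18 * t ** 2 + t ** 4 + 14 * eta2 - 58 * eta2 * t ** 2) * c ** 5 * l ** 5 / 120)
    return x, y

print(gauss_forward(117.0, 45.0, 117.0))
```

Projecting both the trajectory points and the road network vertices with the same central meridian puts them in one planar frame, so Euclidean distances can be used in the matching step.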
A geometry-based matching algorithm is adopted, which combines the geometric information of the GPS points and the roads so as to snap the trajectory data onto road sections. The algorithm searches the candidate road sections for the one with the smallest perpendicular projection distance from the point to be matched and takes it as the matched road section. The perpendicular projection distance is:
$$D = \frac{\left|(y_1 - y_0)\,x - (x_1 - x_0)\,y + x_1 y_0 - y_1 x_0\right|}{\sqrt{(y_1 - y_0)^{2} + (x_1 - x_0)^{2}}}$$
where $(x, y)$ are the projected coordinates of the point to be matched, and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the candidate road section.
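The matching rule can be sketched in a few lines of Python. The sketch computes the distance from a point to the infinite line through the two road section points, as the description suggests, and does not clamp the projection to the segment ends; the function names and the handling of degenerate segments are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def perpendicular_distance(p: Point, seg: Segment) -> float:
    """Distance from p to the line through the two points of `seg`."""
    (x, y), ((x0, y0), (x1, y1)) = p, seg
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x0, y - y0)  # degenerate segment: distance to the point
    return abs(dy * (x - x0) - dx * (y - y0)) / length

def match_point(p: Point, segments: List[Segment]) -> int:
    """Index of the candidate road section with the smallest perpendicular distance."""
    return min(range(len(segments)), key=lambda i: perpendicular_distance(p, segments[i]))

roads = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]
print(match_point((3.0, 1.0), roads))
```

In practice the candidate list passed to `match_point` would be restricted to nearby road sections, which is where a spatial index such as a KD-tree helps.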
Because the road network data set is large and the shortest distance must be computed between it and every trajectory point, this embodiment introduces a KD-tree nearest-neighbor matching algorithm, which improves the matching speed. After matching, the algorithm outputs the road section vector sequence $RV_i$ of the trajectory data.
To use the spatio-temporal information of the traffic data, the traffic time series of the road sections are stacked according to their adjacency to obtain a two-dimensional "time-space" traffic flow feature map $Y_t \in \mathbb{R}^{N\times T}$. This specifically comprises the following steps:
densifying the acquired trajectory data, projecting the coordinates of the densified trajectory points and the coordinates of the road network data into a unified coordinate system by Gaussian projection, and matching the trajectory coordinates to the road network data with a geometry-based matching algorithm; the projected coordinates (x, y) are generated by Gaussian projection as follows:
$$x = X + \frac{N}{2}\,t\cos^{2}(\mathrm{lat})\,\mathrm{lon}''^{2} + \frac{N}{24}\,t\,(5 - t^{2} + 9\eta^{2} + 4\eta^{4})\cos^{4}(\mathrm{lat})\,\mathrm{lon}''^{4} + \frac{N}{720}\,t\,(61 - 58t^{2} + t^{4})\cos^{6}(\mathrm{lat})\,\mathrm{lon}''^{6}$$
$$y = N\cos(\mathrm{lat})\,\mathrm{lon}'' + \frac{N}{6}\,(1 - t^{2} + \eta^{2})\cos^{3}(\mathrm{lat})\,\mathrm{lon}''^{3} + \frac{N}{120}\,(5 - 18t^{2} + t^{4} + 14\eta^{2} - 58\eta^{2}t^{2})\cos^{5}(\mathrm{lat})\,\mathrm{lon}''^{5}$$
where $N$ is the radius of curvature in the prime vertical; lon and lat are the longitude and latitude of the coordinates before projection; $\mathrm{lon}'' = \mathrm{lon} - \mathrm{lon}_0$, where $\mathrm{lon}_0$ is the longitude of the central meridian; $t = \tan(\mathrm{lat})$; $\eta^{2} = e'^{2}\cos^{2}(\mathrm{lat})$, where $e'$ is the second eccentricity of the ellipsoid; and $X$ is the meridian arc length;
when the trajectory coordinates are matched to the road network data with the geometry-based matching algorithm, each trajectory point is matched to the road section in the road network data with the smallest perpendicular projection distance, which is expressed as:
$$D = \frac{\left|(y_1 - y_0)\,x - (x_1 - x_0)\,y + x_1 y_0 - y_1 x_0\right|}{\sqrt{(y_1 - y_0)^{2} + (x_1 - x_0)^{2}}}$$
where $D$ is the perpendicular projection distance; $(x, y)$ are the coordinates of the trajectory point; and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the road section;
stacking the flow time series of adjacent road sections to obtain the N×T matrix of T-dimensional historical road section flow data measured by the detectors at time t, where N is the number of road sections and T is the time dimension;
creating a first mask matrix that represents which data is missing, where an element equal to 1 indicates that the data is not missing and an element equal to 0 indicates that the data is missing;
multiplying the first mask matrix element-wise by the T-dimensional historical road section flow data measured by the detectors at time t to obtain the traffic flow feature map.
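The T-dimensional historical road section flow data $Y_t$ measured by the detectors at time t is expressed as:
$$Y_t = \left[\,y_{i(t-j\Delta t)}\,\right] \in \mathbb{R}^{N\times T}, \quad i = 1, \dots, N, \quad j = 0, 1, \dots, T-1$$
where $N$ is the total number of road sections and $T$ is the selected data time dimension; $y_{i(t-j\Delta t)}$ is the flow data of road $e_i$ at time $(t - j\Delta t)$, $j = 0, 1, \dots, T-1$; and $\Delta t$ is the flow data sampling interval.
To characterize which road network flow data is missing, a mask matrix $M = \left[\,m_{i(t-j\Delta t)}\,\right] \in \{0, 1\}^{N\times T}$ is created, where $m_{i(t-j\Delta t)}$ is the missing state of the flow data value of road $e_i$ at time $(t - j\Delta t)$, expressed as:
$$m_{i(t-j\Delta t)} = \begin{cases} 1, & \text{if } y_{i(t-j\Delta t)} \text{ is observed} \\ 0, & \text{if } y_{i(t-j\Delta t)} \text{ is missing} \end{cases}$$
Therefore, the road network data actually acquired is the Hadamard product of the historical road section flow data $Y$ and the mask matrix $M$:
$$X = Y \odot M$$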
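The construction of the flow matrix, the mask, and their Hadamard product can be illustrated with a short Python sketch. The sketch assumes that missing readings are represented as `None`, that a mask value of 1 marks an observed reading, and that columns are ordered from oldest to newest; these conventions and the function names are assumptions made for illustration.

```python
from typing import List, Optional

def flow_window(series: List[List[float]], t_index: int, T: int) -> List[List[float]]:
    """N x T matrix: for each road section, the T samples ending at index t_index."""
    return [row[t_index - T + 1 : t_index + 1] for row in series]

def build_mask(readings: List[List[Optional[float]]]) -> List[List[int]]:
    """1 where a reading is observed, 0 where it is missing (None)."""
    return [[0 if v is None else 1 for v in row] for row in readings]

def hadamard(y: List[List[float]], m: List[List[int]]) -> List[List[float]]:
    """Element-wise product X = Y * M, which zeroes out the missing positions."""
    return [[a * b for a, b in zip(row_y, row_m)] for row_y, row_m in zip(y, m)]

# Two road sections, three time steps.
Y = [[10.0, 12.0, 11.0], [7.0, 8.0, 9.0]]
readings = [[10.0, None, 11.0], [7.0, 8.0, None]]
M = build_mask(readings)
X = hadamard(Y, M)
print(M, X)
```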
To integrate several kinds of input data and quantify the correlation of the time series, the traffic flow data $X_t$ currently to be repaired is concatenated with the adjacent traffic flow data $X_t^{r}$, the daily-period traffic flow $X_t^{d}$, and the weekly-period traffic flow $X_t^{w}$. The resulting input data matrix represents the traffic flow data of N nodes at 4T' time points, where N is the total number of road sections and T' is the length of the historical time series of the road sections. The concatenated matrix is fused by a convolutional neural network to obtain the multi-modal input data.
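The concatenation of the four inputs can be sketched as window slicing over a long per-section series. The lag parameters `day` and `week` (numbers of samples per day and per week) and the choice of the immediately preceding window as the "adjacent" data are assumptions of this sketch, since the text does not fix them.

```python
from typing import List

def window(row: List[float], end: int, length: int) -> List[float]:
    """`length` consecutive samples of `row` ending at index `end` (inclusive)."""
    start = end - length + 1
    if start < 0:
        raise ValueError("not enough history for the requested window")
    return row[start : end + 1]

def multimodal_input(series: List[List[float]], t: int, T: int,
                     day: int, week: int) -> List[List[float]]:
    """N x 4T matrix: current, adjacent, daily-period, and weekly-period windows."""
    out = []
    for row in series:
        current = window(row, t, T)
        adjacent = window(row, t - T, T)     # the window just before the current one
        daily = window(row, t - day, T)      # same time of day, one day earlier
        weekly = window(row, t - week, T)    # same time of day, one week earlier
        out.append(current + adjacent + daily + weekly)
    return out

# One road section whose flow value equals its time index, for readability.
demo = multimodal_input([[float(i) for i in range(100)]], t=50, T=2, day=10, week=30)
print(demo)
```

The resulting matrix would then be passed through convolution layers to produce the multi-modal input data.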
The vector correlation of the multi-modal input data is calculated, and weights are distributed between the missing data and the data before the missing time, so that a better repair result is output. Through the attention mechanism, the correlation among the various input data is calculated and the potential dependence of all road flow data in the time dimension is learned, without being affected by long temporal distances. In the self-attention mechanism, the multi-modal input data is embedded into a high-dimensional space, wherein $C$ is the number of channels mapped to the high-dimensional space, $N$ is the total number of road segments, and $T'$ is the length of the historical time series of a road segment. Each input vector has 3 subspaces: the "query" subspace, the "key" subspace and the "value" subspace. In the single-head attention mechanism, the "query" subspace $Q$, the "key" subspace $K$ and the "value" subspace $V$ are respectively expressed as:
wherein the three weight matrices are those of $Q$, $K$ and $V$, respectively; and $d_q$, $d_k$ and $d_v$ are the dimensions of the "query" subspace, the "key" subspace and the "value" subspace, respectively.
In order to process the road nodes of the input in parallel, a dimension transformation operation is applied to the embedded high-dimensional data.
Taking the dot product of $Q$ and $K$ gives the attention score, score $= Q \cdot K$, whose magnitude represents the dynamic dependency between road nodes over different time periods. To keep the back-propagated gradients stable, the score is divided by a scaling factor; softmax processing and weighting are then applied to the scaled scores, and the final result is output, expressed as:
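The score, scaling, softmax and weighting steps can be sketched in a few lines of plain Python. This is a minimal illustration of standard scaled dot-product attention, assuming the scaling factor is $\sqrt{d_k}$ as in the usual Transformer formulation (the factor itself is not legible in this text); it is not the patent's implementation, and the matrices are toy values.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors.

    Returns (output, weights). Assumes the scale sqrt(d_k), d_k = len(K[0]).
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    scores = [[sum(q * k for q, k in zip(qr, kr)) / scale for kr in K] for qr in Q]
    weights = [softmax(r) for r in scores]
    out = [
        [sum(w * V[j][c] for j, w in enumerate(wr)) for c in range(len(V[0]))]
        for wr in weights
    ]
    return out, weights


# One query attending over two identical keys: the weights are uniform,
# so the output is the mean of the two value vectors.
out, weights = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]])
```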
Because the dimensionality of the input data is high, a single self-attention model can hardly capture the diversity of information in the road node time series. Therefore, multiple self-attention models are used to extract the historical data compensation features of the missing road network flow data in parallel; the extracted features are spliced, and the final output is obtained through a linear transformation, so that the temporal dependence is captured from multiple angles. The processing of the multi-modal input data by the multi-head self-attention network can be described as a linear mapping, and the historical data compensation feature of the missing road network flow data is expressed as:
The historical data compensation feature of the missing road network flow data may also be expressed as:
wherein $Q_i$ represents the "query" subspace of the $i$-th head in the multi-head attention mechanism, $K_i$ represents the "key" subspace of the $i$-th head, and $V_i$ represents the "value" subspace of the $i$-th head; and the corresponding weight matrices are those of $Q_i$, $K_i$ and $V_i$, respectively.
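For orientation, the standard multi-head self-attention formulation from the Transformer literature is written out below. It is offered as a reference form only: the patent's own expressions are not reproduced in this text, and the output projection symbol $W^{O}$ is an assumed name that does not appear in the source.

```latex
\begin{aligned}
\mathrm{Attention}(Q_i, K_i, V_i) &= \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i, \\
\mathrm{head}_i &= \mathrm{Attention}(Q_i, K_i, V_i), \qquad i = 1, 2, \dots, h, \\
\mathrm{MultiHead} &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}.
\end{aligned}
```

Here $h$ is the number of heads and $d_k$ is the dimension of the "key" subspace, matching the symbols defined in the text.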
The linear correlation between different nodes is obtained by calculating the Pearson correlation coefficient of one week of historical data between the nodes. The Pearson correlation coefficient between road nodes $i$ and $j$ is calculated as follows:
wherein $T$ is the length of the historical time series of the selected road segment nodes; $x_{it}$ represents the traffic data of road node $i$ at time $t$; and $x_{jt}$ represents the traffic data of road node $j$ at time $t$;
After the Pearson correlation coefficients among all road nodes are calculated, a Pearson correlation matrix is established and used as the reconstructed road network topology.
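The construction of the correlation matrix can be sketched as follows, using the textbook definition of the Pearson coefficient (covariance divided by the product of standard deviations). The function names are hypothetical, and returning 0.0 for a constant series is an assumption made here to avoid division by zero, not something the source specifies.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(series):
    """N x N matrix of pairwise Pearson coefficients between road nodes."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


# Toy history for three road nodes over T = 4 time steps (made-up values).
history = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
R = correlation_matrix(history)
```

In practice the matrix would usually be thresholded or sparsified before being used as a graph topology; the text above does not say whether the patent does so.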
The missing road network traffic data currently to be repaired, the adjacency matrix and the Pearson correlation matrix are input into the graph attention network, and the resulting output is taken as the neighbor node compensation feature of the missing data:
wherein $\sigma(\cdot)$ is the sigmoid function; $K$ is the number of graph attention heads; $N_i$ is the set of nodes adjacent to missing node $i$ according to the original road node adjacency matrix; $W_k$ is the dimension expansion parameter of the $k$-th graph attention head; $\alpha_{ij}^{k}$ is the normalized attention coefficient of the $k$-th graph attention head according to the original road network structure; the input is the traffic flow data to be repaired after time-parallel processing, wherein $T$ is the time dimension, $N$ is the total number of road segments and $C$ is the number of channels; $N_i'$ is the set of nodes adjacent to missing node $i$ according to the reconstructed road network topology; $W'_k$ is the dimension expansion parameter of the $k$-th graph attention head; and $\alpha'^{k}_{ij}$ is the normalized attention coefficient of the $k$-th graph attention head according to the reconstructed road network topology.
External factors such as time series factors, time characteristic factors and weather factors are selected. The time series factors are divided into time-sequence factors and sub-sequence factors. The time-sequence factor refers to the hour of the day, and the sub-sequence factor refers to the time period within an hour. In this embodiment, 1 hour is divided into 12 time periods at 5-minute intervals as the sub-sequence factor, and the division granularity of the time-sequence factor is 24, so that each of the 24 hours is taken as one time sequence. A person skilled in the art can choose other granularities according to actual needs: a day can be divided into time sequences of $N$ granularities, each of size greater than 1 hour and less than 24 hours, and the time within one time-sequence granularity can be divided into sub-sequence factors, each of size greater than 1 minute and less than 60 minutes. The time characteristic factor indicates whether the day is a working day or a holiday. The weather factor can be selected according to seasonal change or by dividing the temperature into several intervals; in this embodiment, rainfall is selected as the weather factor to represent the weather condition, and is divided by rainfall amount into 6 levels, including no rain, light rain, medium rain, heavy rain and extra heavy rain.
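The granularities used in this embodiment (24 hourly time sequences, 12 five-minute sub-sequences per hour, a working-day flag and a 6-level rainfall index) can be illustrated with a minimal encoding sketch. The function name, the dictionary keys and the choice of passing the working-day flag and rainfall level in as arguments are all assumptions made for illustration; the source does not describe how these values are obtained or encoded.

```python
from datetime import datetime

SLOT_MINUTES = 5   # sub-sequence granularity used in the embodiment
RAIN_LEVELS = 6    # number of rainfall levels used in the embodiment


def external_factors(timestamp, is_working_day, rain_level):
    """Encode one timestamp as the discrete external factors.

    `is_working_day` and `rain_level` (0 .. RAIN_LEVELS - 1) are supplied by
    the caller, e.g. from a holiday calendar and a weather feed.
    """
    if not 0 <= rain_level < RAIN_LEVELS:
        raise ValueError("rain_level must be in 0..5")
    return {
        "hour_of_day": timestamp.hour,                       # 0..23
        "slot_in_hour": timestamp.minute // SLOT_MINUTES,   # 0..11
        "working_day": int(is_working_day),                  # 1 or 0
        "rain_level": rain_level,
    }


factors = external_factors(datetime(2022, 9, 29, 8, 17), True, 1)
```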
A convolution operation is performed on each of the 3 external factors to convert it to the same dimension as the compensation feature matrices extracted by the spatio-temporal data generation component, namely $\xi_i \in \mathbb{R}^{C \times N \times T'}$, wherein $i$ is the category of the external factor. The 3 types of external factors, the historical data compensation feature and the neighbor node compensation feature are spliced along the 0th dimension to obtain the input of the multi-source heterogeneous fusion module, $F_0 \in \mathbb{R}^{5C \times N \times T'}$, $F_0 = \mathrm{concat}(F_t, F_s, \xi_1, \xi_2, \xi_3)$, wherein $\xi_1$, $\xi_2$ and $\xi_3$ are the time series factor, the time characteristic factor and the weather factor, respectively. The extracted channel features are propagated along the channel dimension using an attention mechanism to obtain the output $Y'$ of the module, so that important features are emphasized and unnecessary features are suppressed. The output of the module is recombined with the observed real data to repair the locally missing data and obtain the repaired traffic flow:
wherein $\odot$ denotes the Hadamard product and $M$ is the mask matrix.
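The recombination step can be sketched as below. Because the recombination formula itself is not legible in this text, the sketch assumes the form commonly used in GAN-based imputation, $M \odot X + (1 - M) \odot Y'$, with a mask value of 1 marking an observed entry; the function name and numbers are illustrative only.

```python
def recombine(observed, generated, mask):
    """Keep observed entries (mask 1) and fill missing ones (mask 0)
    with the generator output, element by element."""
    return [
        [m * x + (1 - m) * g for x, g, m in zip(x_row, g_row, m_row)]
        for x_row, g_row, m_row in zip(observed, generated, mask)
    ]


X = [[12, 0, 9]]                # observed data, missing entry stored as 0
Y_prime = [[11.5, 14.2, 9.3]]   # generator output (made-up values)
M = [[1, 0, 1]]
repaired = recombine(X, Y_prime, M)
```

With this form the generator output only affects the missing positions, so observed measurements are never overwritten.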
A discriminator is introduced into the pixelation generative adversarial network and trained alternately with the generator in continuous competition, so that the generator produces better data. So that the model focuses on generating data at the missing positions, the discriminator judges the truth of each individual data element in the matrix, which achieves accurate local repair. The input of the discriminator is the repaired complete data. Spatial position codes between roads are embedded, and two convolution layers are used to map the data to a high-dimensional space; an attention mechanism then extracts the spatio-temporal information of the input data to evaluate the generated data; finally, two fully connected layers and a sigmoid activation function give the probability that each data element of the repaired complete data is real. The discriminator can be viewed as a function $D: \chi \to [0, 1]$, whose $(i, j)$-th output component, after the complete data is input into the discriminator $D$, is the probability that the $(i, j)$-th data element of the traffic flow repair result is real.
The content of the traffic flow data restoration algorithm is as follows:
Fig. 2 shows the comparison between the true values and the repaired values at a missing rate of 50%; it can be seen from the figure that the repair result of the model is close to the true traffic flow. Table 1 compares the MAE (mean absolute error), MAPE (mean absolute percentage error) and RMSE (root mean square error) of the ST-DIGAN model with those of the recent BGCP model, BATF model and GAIN at different missing rates. As can be seen from Table 1, the MAE of the present model is relatively small at both low and high missing rates.
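The three metrics can be computed as in the sketch below. It assumes that errors are evaluated only at the missing positions (mask value 0) and that zero true values are skipped when computing MAPE; neither choice is stated in the source, and the function name and numbers are illustrative.

```python
import math


def imputation_errors(truth, repaired, mask):
    """Return (MAE, MAPE in percent, RMSE) over the missing positions."""
    pairs = [
        (t, r)
        for t_row, r_row, m_row in zip(truth, repaired, mask)
        for t, r, m in zip(t_row, r_row, m_row)
        if m == 0
    ]
    if not pairs:
        raise ValueError("mask contains no missing positions")
    n = len(pairs)
    mae = sum(abs(t - r) for t, r in pairs) / n
    rmse = math.sqrt(sum((t - r) ** 2 for t, r in pairs) / n)
    nonzero = [(t, r) for t, r in pairs if t != 0]  # assumption: skip zero truth
    mape = (
        100.0 * sum(abs((t - r) / t) for t, r in nonzero) / len(nonzero)
        if nonzero
        else float("nan")
    )
    return mae, mape, rmse


truth = [[10, 20], [30, 40]]
repaired = [[10, 18], [33, 40]]
mask = [[1, 0], [0, 1]]
mae, mape, rmse = imputation_errors(truth, repaired, mask)
```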
TABLE 1
Aiming at the problem of effectively representing vehicle motion trajectory data and at challenges such as complex traffic data missing types and uncertain missing rates, the invention provides a generative adversarial network traffic data repair method based on road network pixelation. In the present embodiment, first, a pixelated representation algorithm for the traffic road network and the trajectory data is designed in consideration of the irregularity of the trajectory data, and the trajectory data is represented as a time-space 2-dimensional map. Then, in consideration of the complex missing types and uncertain missing rate of the traffic data, the generator of the generative adversarial network is optimized through a spatio-temporal data generation component so as to better repair the missing road network traffic data from three dimensions: historical data, neighbor nodes and external weather. Finally, the model is evaluated and tested on the adult taxi track data set. Experiments show that, at high data missing rates, the model repairs missing traffic flow data well compared with the other baseline methods, which demonstrates its robustness and high precision.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method, characterized by comprising the following steps:
aiming at the problem of representing traffic flow trajectory data, converting the trajectory data into a time-space 2-dimensional feature map representation in consideration of the spatio-temporal characteristics of the traffic flow trajectories, to obtain a traffic flow feature map;
constructing a road network flow data generative adversarial network model based on the Wasserstein generative adversarial network, reconstructing the generator of the Wasserstein generative adversarial network and performing optimization training on the generator, and inputting the traffic flow feature map into the trained generator to repair the missing part of the traffic flow feature map;
inputting the repaired data into the discriminator of the road network flow data generative adversarial network model to judge the truth of the repaired data, finishing the repair if the truth is greater than a set threshold, and otherwise inputting the data into the road network flow data generative adversarial network model again for data repair.
2. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method as claimed in claim 1, wherein converting the trajectory data into a time-space 2-dimensional feature map representation comprises the following steps:
performing densification processing on the acquired trajectory data, projecting the coordinates of the densified track points and the coordinates of the road network data into a unified coordinate system through Gaussian projection, and matching the track coordinates with the road network data by a geometry-based matching algorithm;
stacking the flow time series of adjacent road segments together to obtain the $N \times T$-dimensional road segment historical flow data measured by the detectors at the $t$-th time, wherein $N$ is the number of road segments and $T$ is the time dimension;
creating a first mask matrix representing the data missing condition, wherein an element value of 1 indicates that the data is not missing and an element value of 0 indicates that the data is missing;
and multiplying the first mask matrix element-wise with the road segment historical flow data measured by the detectors at the $t$-th time to obtain the traffic flow feature map.
3. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method as claimed in claim 2, wherein the process of generating the projected coordinates $(x, y)$ by performing Gaussian projection on the coordinates comprises:
wherein $N$ is the radius of curvature; lon and lat are respectively the longitude and latitude of the coordinates before projection; $lon'' = lon - lon_0$; $lon_0$ is the central meridian longitude; $t = \tan lat$; $\eta^2 = e'^2 \cos^2 lat$, wherein $e'$ is the second eccentricity of the ellipsoid; and $X$ is the meridian arc length.
4. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method as claimed in claim 2, wherein, when the track coordinates are matched with the road network data by the geometry-based matching algorithm, each track point is matched to the road segment in the road network data having the minimum perpendicular projection distance to it, and the perpendicular projection distance is expressed as:
wherein $D$ is the perpendicular projection distance; $(x, y)$ are the coordinates of the track point; and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the road segment.
5. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method as claimed in claim 2, wherein the $T$-dimensional road segment historical flow data measured by the detectors at the $t$-th time is expressed as:
wherein $Y_t$ represents the $T$-dimensional road segment historical flow data measured by the detectors at the $t$-th time; $y_{i(t-j\Delta t)}$ is the flow data of road $e_i$ at time $(t - j\Delta t)$, $j = 0, 1, \dots, T-1$; $\Delta t$ is the flow data sampling interval; and $T$ is the length of the historical time series of the road segments.
6. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method as claimed in claim 1, wherein the process of repairing missing data by the road network flow data generative adversarial network model comprises:
splicing the traffic flow data to be repaired together with its adjacent traffic flow data, its daily-cycle traffic flow data and its weekly-cycle traffic flow data, extracting multi-modal input data through a convolution operation, calculating the vector correlation of the multi-modal input data using a multi-head attention mechanism, and taking the output of the multi-head attention mechanism as the historical data compensation feature of the traffic flow data;
reconstructing the road network topology using the Pearson correlation coefficients of the historical time data of the road nodes, calculating, through a graph attention network, the attention coefficients of the neighbor nodes according to both the inherent road node adjacency relation of the missing node and the reconstructed road network topology, and capturing, in different dimensions, the neighbor node information used for repairing the missing data as the neighbor node compensation feature of the missing data;
constructing sequences of time series factors, time characteristic factors and weather factors, and making the dimensions of the three sequences the same as those of the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data through a convolution operation;
and splicing the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data with the sequences of time series factors, time characteristic factors and weather factors, and propagating the extracted channel features along the channel dimension to serve as the repaired data values.
7. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method according to claim 6, wherein the historical data compensation feature of the missing road network flow data is expressed as:
wherein $F_t$ is the historical data compensation feature of the missing road network flow data; MultiHead(·) is the multi-head self-attention; the input is the multi-modal input data, wherein $N$ is the total number of road segments, $C$ is the number of channels and $T$ is the length of the historical time series of the road segments; the weight matrices are those of the query subspace, the key subspace and the value subspace, respectively; $d_q$, $d_k$ and $d_v$ are the dimensions of the query subspace, the key subspace and the value subspace, respectively; and $i = 1, 2, \dots, h$, wherein $h$ is the number of heads of the multi-head self-attention.
8. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method according to claim 6, wherein the neighbor node compensation feature of the missing road network flow data is expressed as:
wherein $\sigma(\cdot)$ is the sigmoid function; $K$ is the number of graph attention heads; $N_i$ is the set of nodes adjacent to missing node $i$ according to the original road node adjacency matrix; $W_k$ is the dimension expansion parameter of the $k$-th graph attention head; $\alpha_{ij}^{k}$ is the normalized attention coefficient of the $k$-th graph attention head according to the original road network structure; the input is the traffic flow data to be repaired after time-parallel processing, wherein $T$ is the time dimension, $N$ is the total number of road segments and $C$ is the number of channels; $N'_i$ is the set of nodes adjacent to missing node $i$ according to the reconstructed road network topology; $W'_k$ is the dimension expansion parameter of the $k$-th graph attention head; and $\alpha'^{k}_{ij}$ is the normalized attention coefficient of the $k$-th graph attention head according to the reconstructed road network topology.
9. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method according to claim 6, wherein the time series factors include time-sequence factors and sub-sequence factors, the time-sequence factor indicating which hour of the day the data to be repaired falls in, and the sub-sequence factor indicating which time period of that hour the data to be repaired falls in; the time characteristic factor indicates whether the time sequence to be repaired falls on a working day; and the weather factor refers to the weather condition at the time of the data to be repaired.
10. The road network pixelation-based Wasserstein generative adversarial traffic flow data interpolation method according to claim 1, wherein the pixelation generative adversarial network model comprises a discriminator and a traffic flow generator, the discriminator being used for judging the truth of the repair data generated by the generator; during training, the discriminator and the traffic flow generator are each optimized using the RMSProp optimization algorithm, and the loss function used when optimizing the discriminator is:
when the traffic flow generator is optimized, a reconstruction loss function $Loss_{re}$ is introduced and combined with the penalty term, and the loss function of the optimization training is expressed as:
wherein $M$ represents the mask matrix; the repaired traffic flow data is the output obtained by repair; $D(\cdot)$ is the discriminator; $\alpha$ is the coefficient of the mask reconstruction loss function; $n$ is the number of missing positions; the $(i, j)$-th element of the repaired traffic flow data represents the value of the $j$-th time dimension of the $i$-th road segment; and $M_{ij}$ is an element of $M$ representing the missing condition of the traffic flow data of the $i$-th road segment in the $j$-th time dimension, wherein a value of 1 indicates that the data is not missing and a value of 0 indicates that the data is missing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211197830.9A CN115510174A (en) | 2022-09-29 | 2022-09-29 | Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211197830.9A CN115510174A (en) | 2022-09-29 | 2022-09-29 | Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115510174A (en) | 2022-12-23 |
Family
ID=84507223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211197830.9A Pending CN115510174A (en) | 2022-09-29 | 2022-09-29 | Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115510174A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115827335A (en) * | 2023-02-06 | 2023-03-21 | 东南大学 | Time sequence data missing interpolation system and method based on modal crossing method |
CN116206443A (en) * | 2023-02-03 | 2023-06-02 | 重庆邮电大学 | Traffic flow data interpolation method based on time-space road network pixelized representation |
CN116628435A (en) * | 2023-07-21 | 2023-08-22 | 山东高速股份有限公司 | Road network traffic flow data restoration method, device, equipment and medium |
CN116913445A (en) * | 2023-06-05 | 2023-10-20 | 重庆邮电大学 | Medical missing data interpolation method based on form learning |
CN117576918A (en) * | 2024-01-17 | 2024-02-20 | 四川国蓝中天环境科技集团有限公司 | Urban road flow universe prediction method based on multi-source data |
-
2022
- 2022-09-29 CN CN202211197830.9A patent/CN115510174A/en active Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116206443A (en) * | 2023-02-03 | 2023-06-02 | 重庆邮电大学 | Traffic flow data interpolation method based on time-space road network pixelized representation |
CN116206443B (en) * | 2023-02-03 | 2023-12-15 | 重庆邮电大学 | Traffic flow data interpolation method based on time-space road network pixelized representation |
CN115827335A (en) * | 2023-02-06 | 2023-03-21 | 东南大学 | Time sequence data missing interpolation system and method based on modal crossing method |
CN116913445A (en) * | 2023-06-05 | 2023-10-20 | 重庆邮电大学 | Medical missing data interpolation method based on form learning |
CN116913445B (en) * | 2023-06-05 | 2024-05-07 | 重庆邮电大学 | Medical missing data interpolation method based on form learning |
CN116628435A (en) * | 2023-07-21 | 2023-08-22 | 山东高速股份有限公司 | Road network traffic flow data restoration method, device, equipment and medium |
CN116628435B (en) * | 2023-07-21 | 2023-09-29 | 山东高速股份有限公司 | Road network traffic flow data restoration method, device, equipment and medium |
CN117576918A (en) * | 2024-01-17 | 2024-02-20 | 四川国蓝中天环境科技集团有限公司 | Urban road flow universe prediction method based on multi-source data |
CN117576918B (en) * | 2024-01-17 | 2024-04-02 | 四川国蓝中天环境科技集团有限公司 | Urban road flow universe prediction method based on multi-source data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||