CN115510174A - Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method - Google Patents

Publication number
CN115510174A
Authority
CN
China
Prior art keywords
data
road
road network
missing
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211197830.9A
Other languages
Chinese (zh)
Inventor
王蓉
李淼妃
赵健宽
蒋建春
王�华
郭清旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202211197830.9A
Publication of CN115510174A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T5/77
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention belongs to the field of intelligent transportation and relates to a road network pixelation-based Wasserstein generative adversarial network (WGAN) method for traffic flow data imputation. Because traffic data and image data share a certain structural similarity, a Trajectory2Matrix representation is proposed that pixelates the road network and the trajectory data. A road network flow generative adversarial network model is then constructed: to account for external factors and implicit spatial features, a reconstructed road network topology and a multi-source heterogeneous fusion module are introduced to optimize the generator of the Wasserstein generative adversarial network, and a new loss function is proposed so that the missing parts of the traffic flow feature map are repaired effectively. The repaired data is input into the discriminator of the model to judge its authenticity; if the authenticity exceeds a set threshold the repair is finished, otherwise the data is input into the model again for further repair. The method better mines multidimensional compensation features of the missing road network flow data, compensating from three dimensions (the historical data of the missing data, the neighbor nodes of the missing road, and external factors), which improves the robustness and adaptability of data repair. It also effectively fuses dynamic/static external attributes with temporal multi-modal features, further improving repair accuracy.

Description

Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method
Technical Field
The invention belongs to the field of intelligent transportation, and particularly relates to a road network pixelation-based Wasserstein generative adversarial network (WGAN) method for traffic flow data imputation.
Background
Researchers currently study the repair of missing traffic flow data from several angles.
On the efficient representation of traffic flow trajectory data, researchers have proposed various representation methods, including those based on points of interest, trajectory direction, hand-crafted features, trajectory segments, and bag-of-words models. These methods effectively alleviate the sparse distribution of trajectory data, but most of them ignore the spatio-temporal information that the trajectory data contains.
On the spatio-temporal correlation of traffic data, the paper "ST-LBAGAN: Spatio-temporal learnable bidirectional attention generative adversarial networks for missing traffic data imputation" builds, on the basis of U-Net, a spatio-temporal learnable bidirectional attention generative adversarial network that learns the spatio-temporal characteristics of traffic flow. The paper "Deep spatial-temporal bi-directional residual optimisation based on tensor decomposition for traffic data imputation on urban road network" proposes a model that combines tensor completion with residual optimization and fully captures the spatio-temporal dependence of traffic data. The paper "A Multi-Attention Tensor Completion Network for Spatiotemporal Traffic Data Imputation" proposes a spatial signal propagation module and a temporal self-attention module as the basic stacked blocks of a deep network, which aggregate and extract dynamic dependencies in the spatio-temporal dimension. All of these works study the complex spatio-temporal correlation of traffic flow data through data mining. However, the spatial features hidden in the road network still need to be mined further to improve imputation accuracy.
On the temporal multi-modality of traffic data and its dynamic/static external factors, the ASTGCN model of the paper "Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting" consists of three independent components that respectively capture the recent, daily-periodic, and weekly-periodic dependencies of traffic flow, and fuses the outputs of the three components by weighting. The paper "APTN: A Spatial-Temporal Attention Approach for Traffic Prediction" uses an encoder attention mechanism to model the periodic dependence of the data. Most of these models fully consider the temporal multi-modal characteristics and external characteristics of traffic data, but module designs tailored to the multiple feature types of the data still need further study.
Disclosure of Invention
To solve the problems of effectively representing vehicle trajectory data, of complex traffic data missing patterns, and of uncertain missing rates, the method is inspired by the success of generative adversarial networks in image inpainting. Considering the structural similarity between traffic data and image data, it treats traffic data repair as an image inpainting problem and thereby achieves accurate repair of missing traffic data. The method comprises the following steps:
for the representation of traffic flow trajectory data, converting the trajectory data into a two-dimensional time-space feature map that reflects the spatio-temporal characteristics of the trajectories, yielding a traffic flow feature map;
constructing a road network flow data generative adversarial network model based on the Wasserstein generative adversarial network, reconstructing the generator of the Wasserstein generative adversarial network and training it, and inputting the traffic flow feature map into the trained generator to repair the missing parts of the feature map;
inputting the repaired data into the discriminator of the road network flow data generative adversarial network model to judge its authenticity; if the authenticity is greater than a set threshold, the repair is finished; otherwise, the data is input into the model again for further repair.
Further, the Trajectory2Matrix representation of traffic data includes the following steps:
densifying the acquired trajectory data, projecting the coordinates of the densified trajectory points and of the road network data into a unified coordinate system by Gaussian projection, and matching the trajectory coordinates to the road network data with a geometry-based matching algorithm;
stacking the flow time series of adjacent road segments to obtain the N×T matrix of T-dimensional road segment historical flow data measured by the detectors at time t, where N is the number of road segments and T is the time dimension;
creating a first mask matrix representing the missing-data pattern, in which an element value of 1 indicates that the corresponding data is not missing and a value of 0 indicates that it is missing;
and multiplying the first mask matrix element-wise by the T-dimensional road segment historical flow data measured by the detectors at time t to obtain the traffic flow feature map.
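The densification step in the list above could, for example, be realized by resampling each trajectory onto a uniform time grid. The sketch below is only an illustration under assumptions: the text does not say how densified points are computed, so linear interpolation is assumed, the function name `densify` is hypothetical, and the 15-second default step is borrowed from the embodiment described later.

```python
def densify(times, xs, ys, step=15.0):
    """Resample a trajectory to a uniform time step by linear interpolation.

    times: strictly increasing timestamps in seconds (at least two points);
    xs, ys: coordinates at those timestamps.
    Returns a list of (t, x, y) tuples spaced `step` seconds apart.
    """
    out = []
    k = 0  # index of the interval [times[k], times[k + 1]] that contains t
    t = times[0]
    while t <= times[-1]:
        while k < len(times) - 2 and t > times[k + 1]:
            k += 1
        w = (t - times[k]) / (times[k + 1] - times[k])
        out.append((t,
                    xs[k] + w * (xs[k + 1] - xs[k]),
                    ys[k] + w * (ys[k + 1] - ys[k])))
        t += step
    return out


# Three raw points sampled at irregular intervals (0 s, 30 s, 90 s).
track = densify([0.0, 30.0, 90.0], [0.0, 30.0, 90.0], [0.0, 0.0, 60.0])
```

Linear interpolation assumes roughly constant speed between raw samples; a real pipeline might instead interpolate along the matched road geometry.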
Further, the projected coordinates (x, y) are generated by Gaussian projection of the coordinates as follows:
Figure BDA0003871151290000031
Figure BDA0003871151290000032
where N is the radius of curvature in the prime vertical:
Figure BDA0003871151290000033
lon and lat are the longitude and latitude of the point before projection; $lon'' = lon - lon_0$, where $lon_0$ is the longitude of the central meridian; $t = \tan(lat)$; $\eta^2 = e'^2 \cos^2(lat)$, where e' is the second eccentricity of the ellipsoid; and X is the meridian arc length.
Further, when the trajectory coordinates are matched to the road network data with the geometry-based matching algorithm, each trajectory point is matched to the road segment in the road network data that has the smallest perpendicular projection distance, which is expressed as:
Figure BDA0003871151290000034
where D is the perpendicular projection distance; (x, y) are the coordinates of the trajectory point; and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the road segment.
Further, the T-dimensional road segment historical flow data measured by the detectors at time t is expressed as:
Figure BDA0003871151290000035
where $Y_t$ denotes the T-dimensional road segment historical flow data measured by the detectors at time t; $y_{i(t-j\Delta t)}$ is the traffic data of road $e_i$ at time $(t - j\Delta t)$, $j = 0, 1, \ldots, T-1$; $\Delta t$ is the flow data sampling interval; and T is the length of the historical time series of the road segments.
Further, the process by which the road network flow data generative adversarial network model repairs the missing road network flow data includes:
constructing multi-modal input data to effectively capture the periodicity and long-term dependence of traffic data: in the spatio-temporal data generation component, the traffic flow data to be repaired is concatenated with the adjacent (most recent) traffic flow data, the daily-periodic traffic flow data, and the weekly-periodic traffic flow data, and a convolution operation extracts the multi-modal input data; an attention mechanism then computes the correlation coefficients of the multi-modal input data, and its output is taken as the historical compensation feature of the missing road network flow data;
reconstructing the topology of the urban road network to capture both explicit and implicit spatial correlations of traffic data: the road network topology is reconstructed from the Pearson correlation coefficients of the historical time data of the road nodes; a graph attention network then computes attention coefficients of the neighbor nodes according to both the inherent adjacency of the missing node and the reconstructed topology, capturing information from different dimensions of the neighbor nodes as the neighbor compensation feature of the missing data;
providing a multi-source heterogeneous fusion module that, in view of the irregularity and uncertainty of traffic data, effectively fuses external features of the missing data (such as weather and time) with the compensation features, improving the robustness of the repair: sequences of time-sequence factors, time-feature factors, and weather factors are constructed, and convolution operations give the three sequences the same dimensions as the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data; these two compensation features are then concatenated with the three factor sequences, and the channel features are propagated and extracted along the channel dimension to give the values of the repaired data.
Further, obtaining the historical compensation feature of the missing flow data includes:
First, multi-modal input data is constructed by concatenating the traffic flow data $X_t$ that currently needs repair, the adjacent (most recent) traffic flow data $X_t^r$, the daily-periodic traffic flow $X_t^d$, and the weekly-periodic traffic flow $X_t^w$. The input data matrix is
Figure BDA0003871151290000041
which represents the traffic flow data of N nodes at 4T' time points, where N is the total number of road segments and T' is the length of the historical time series of the road segments. The concatenated matrix is fused by a convolutional neural network to obtain the multi-modal input data
Figure BDA0003871151290000051
Figure BDA0003871151290000052
Second, the vector correlation between the missing road network flow data and the historical data is computed from the multi-modal input data and taken as the historical data compensation feature of the missing data:
Figure BDA0003871151290000053
where $F_t$ is the historical data compensation feature of the missing road network flow data; Multihead(·) is multi-head self-attention;
Figure BDA0003871151290000054
is the multi-modal input data, where N is the total number of road segments, C is the number of channels, and T' is the length of the historical time series of the road segments;
Figure BDA0003871151290000055
are the weight matrices of the query, key, and value subspaces, respectively; $d_q$, $d_k$, $d_v$ are the dimensions of the query, key, and value subspaces, respectively; and $i = 1, 2, \ldots, h$, where h is the number of heads of the multi-head self-attention.
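To make the multi-head self-attention step concrete, the following NumPy sketch computes scaled dot-product attention per head, concatenates the heads, and applies a final linear map. It is an illustration under assumptions rather than the patent's implementation: the exact equation appears only as a figure, a single road node's sequence of shape (T', C) is processed, all heads share one subspace dimension d (so $d_q = d_k = d_v$), and the output projection `wo` and all function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo):
    """x: (T, C) sequence of one road node; wq/wk/wv: (h, C, d); wo: (h*d, C)."""
    heads = []
    for q_w, k_w, v_w in zip(wq, wk, wv):
        q, k, v = x @ q_w, x @ k_w, x @ v_w        # each (T, d)
        scores = q @ k.T / np.sqrt(k.shape[-1])     # scaled dot product, (T, T)
        heads.append(softmax(scores) @ v)           # weighted values, (T, d)
    return np.concatenate(heads, axis=-1) @ wo      # concatenate heads, project to (T, C)


rng = np.random.default_rng(0)
T, C, h, d = 6, 8, 2, 4
x = rng.normal(size=(T, C))
wq, wk, wv = (rng.normal(size=(h, C, d)) for _ in range(3))
wo = rng.normal(size=(h * d, C))
f_t = multi_head_self_attention(x, wq, wk, wv, wo)
```

In a trained model the weight matrices would be learned; random values are used here only to show the shapes.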
Further, obtaining the neighbor node compensation feature of the missing flow data includes:
First, the topology of the urban road network is reconstructed from the Pearson correlation coefficients of the historical time data of the road nodes. Specifically, a Pearson correlation matrix $A_p \in R^{N \times N}$ is built by computing the Pearson coefficient of one week of historical data between every pair of road nodes, where N is the number of road segment nodes.
Then, the missing road network flow data to be repaired, the adjacency matrix, and the Pearson correlation matrix are input into the graph attention network, and the output is taken as the neighbor node compensation feature of the missing data:
Figure BDA0003871151290000056
where σ(·) is the sigmoid function; K is the number of graph attention heads; $N_i$ is the set of neighbor nodes of missing node i according to the original adjacency matrix of the road nodes; $W_k$ is the dimension expansion parameter of graph attention head k; $\alpha_{ij}^{k}$ is the normalized attention coefficient of graph attention head k under the original road network structure;
Figure BDA0003871151290000057
is the traffic flow data to be repaired after time-parallel processing, where T is the time dimension, N is the total number of road segments, and C is the number of channels; $N_i'$ is the set of neighbor nodes of missing node i according to the reconstructed road network topology; $W'_k$ is the dimension expansion parameter of graph attention head k under the reconstructed topology; and $\alpha'^{k}_{ij}$ is the normalized attention coefficient of graph attention head k under the reconstructed road network topology.
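The two steps above (rebuilding the topology from Pearson correlations, then attending over both neighbor sets) can be sketched as follows. This is a rough illustration with several assumptions that are not in the text: the Pearson matrix is turned into neighbor sets by a hypothetical threshold of 0.8, each head uses the common graph-attention scoring $e_{ij} = \mathrm{LeakyReLU}(a^T [W x_i \,\|\, W x_j])$, and the contributions of the K heads over the two topologies are averaged before the sigmoid. The actual combination is given only in the omitted equation figure.

```python
import numpy as np


def pearson_topology(history, threshold=0.8):
    """history: (N, L) flow series per node. Returns (A_p, thresholded adjacency)."""
    a_p = np.corrcoef(history)                    # (N, N) Pearson correlation matrix
    adj_p = (a_p >= threshold).astype(int)        # assumed rule for neighbor sets
    np.fill_diagonal(adj_p, 0)                    # no self-loops
    return a_p, adj_p


def gat_head(x, adj, w, a):
    """One attention head. x: (N, C_in), adj: (N, N) 0/1, w: (C_in, C_out), a: (2*C_out,)."""
    h = x @ w
    c = h.shape[1]
    e = (h @ a[:c])[:, None] + (h @ a[c:])[None, :]   # e_ij = a^T [h_i || h_j]
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                    # restrict to neighbors
    e = e - e.max(axis=1, keepdims=True)
    alpha = np.exp(e) * (adj > 0)
    alpha = alpha / np.maximum(alpha.sum(axis=1, keepdims=True), 1e-12)
    return alpha @ h                                  # sum_j alpha_ij W x_j


def neighbour_compensation(x, adj, adj_p, params, params_p):
    """Average K heads over the original and reconstructed topologies, then sigmoid."""
    total = sum(gat_head(x, adj, w, a) for w, a in params)
    total = total + sum(gat_head(x, adj_p, w, a) for w, a in params_p)
    return 1.0 / (1.0 + np.exp(-total / len(params)))


rng = np.random.default_rng(0)
N, C_in, C_out, K = 4, 3, 2, 2
history = rng.normal(size=(N, 20))
history[1] = history[0] + 0.01 * rng.normal(size=20)   # nodes 0 and 1 strongly correlated
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
a_p, adj_p = pearson_topology(history)
x = rng.normal(size=(N, C_in))


def make_heads():
    return [(rng.normal(size=(C_in, C_out)), rng.normal(size=2 * C_out)) for _ in range(K)]


f_spatial = neighbour_compensation(x, adj, adj_p, make_heads(), make_heads())
```

The reconstructed topology lets a node attend to strongly correlated nodes that are not physically adjacent, which is the "implicit" spatial correlation the text refers to.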
Further, obtaining the repaired traffic flow data in the multi-source heterogeneous fusion module includes:
First, three external factors are selected (the time-sequence factor, the time-feature factor, and the weather factor), and each is passed through a convolution operation so that it has the same dimensions as the feature matrices extracted by the two sub-modules, i.e. $\xi_i \in R^{C \times N \times T'}$, where i indexes the category of external factor.
Then, the three external factors are concatenated with the temporal and spatial features of the traffic flow along dimension 0 to obtain the input of the multi-source heterogeneous fusion module, $F_0 \in R^{5C \times N \times T'}$, $F_0 = \mathrm{concat}(F_{temporal}, F_{spatial}, \xi_1, \xi_2, \xi_3)$.
Next, an attention mechanism propagates the extracted channel features along the channel dimension to obtain the output Y' of the module, so that important features receive more attention and unnecessary features are suppressed.
Finally, the output of the module is recombined with the observed real data to repair the locally missing data and obtain the repaired traffic flow
Figure BDA0003871151290000061
Figure BDA0003871151290000062
where Y denotes the repaired traffic flow data; M denotes the mask matrix; and Y' is the complete traffic data that the multi-source heterogeneous fusion module generates from the time-sequence factor, the time-feature factor, the weather factor, the historical data compensation feature, and the neighbor node compensation feature.
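The fusion and recombination steps can be illustrated with a short NumPy sketch. The assumptions are significant and worth stating: the channel attention is written in a squeeze-and-excitation style (the text only says "an attention mechanism"), the weights `w1` and `w2` and all function names are hypothetical, the projection from the fused $5C \times N \times T'$ tensor to the generated data Y' is left out, and the recombination assumes the usual form "keep observed entries, fill missing ones", with 1 in the mask meaning observed as stated for $M_{ij}$ elsewhere in this document.

```python
import numpy as np


def fuse(f_temporal, f_spatial, externals, w1, w2):
    """Concatenate on axis 0 and reweight channels.

    f_temporal, f_spatial and each external factor: (C, N, T');
    w1: (5C, r) and w2: (r, 5C) are hypothetical learned weights.
    """
    f0 = np.concatenate([f_temporal, f_spatial, *externals], axis=0)   # (5C, N, T')
    squeeze = f0.mean(axis=(1, 2))                                     # one value per channel
    hidden = np.maximum(squeeze @ w1, 0.0)                             # ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))                     # channel weights in (0, 1)
    return f0 * weights[:, None, None]


def recombine(x_observed, y_generated, mask):
    """Keep observed entries (mask == 1) and fill missing ones (mask == 0)."""
    return mask * x_observed + (1 - mask) * y_generated


rng = np.random.default_rng(0)
C, N, T = 2, 3, 4
feats = [rng.normal(size=(C, N, T)) for _ in range(5)]
fused = fuse(feats[0], feats[1], feats[2:],
             rng.normal(size=(5 * C, 4)), rng.normal(size=(4, 5 * C)))

mask = np.array([[1, 0, 1], [0, 1, 1]])
x_obs = np.array([[5.0, 0.0, 7.0], [0.0, 3.0, 4.0]])
y_gen = np.array([[5.2, 6.1, 6.8], [2.9, 3.3, 4.1]])
repaired = recombine(x_obs, y_gen, mask)
```

Recombining with the mask guarantees that values the detectors actually observed are never overwritten by generated values.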
Further, the pixelated generative adversarial network model comprises a discriminator and a traffic flow generator. The discriminator judges the authenticity of the repaired data produced by the traffic flow generator. During training, the discriminator and the traffic flow generator are each optimized with the RMSProp optimization algorithm, and the loss function used to optimize the discriminator is:
Figure BDA0003871151290000063
When the traffic flow generator is optimized, a new optimization objective is proposed: a reconstruction loss function $Loss_{re}$ is introduced and combined with the adversarial loss to increase the similarity between the generated data and the real data. The loss function for this training is expressed as:
Figure BDA0003871151290000064
Figure BDA0003871151290000065
where M denotes the mask matrix;
Figure BDA0003871151290000066
denotes the repaired traffic flow data; D(·) is the discriminator; α is the coefficient of the mask reconstruction loss function; n is the number of missing positions;
Figure BDA0003871151290000067
is the element of
Figure BDA0003871151290000068
that represents the value of the j-th time dimension of the i-th road segment; and $M_{ij}$ is the element of M describing whether the traffic flow data of the j-th time dimension of the i-th road segment is missing, with a value of 1 indicating that the data is not missing and 0 indicating that it is missing.
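Because the loss equations survive only as figure placeholders, the sketch below shows one plausible reading rather than the patent's exact formulas. It assumes the standard Wasserstein critic objective, a generator loss equal to the negated critic score plus α times a reconstruction term, and a reconstruction term that is the mean squared error over the n missing positions (mask value 0). The squared-error form, the default α, and the function names are assumptions.

```python
import numpy as np


def critic_loss(d_real, d_fake):
    """Wasserstein critic loss: minimise E[D(fake)] - E[D(real)]."""
    return float(np.mean(d_fake) - np.mean(d_real))


def reconstruction_loss(y_true, y_hat, mask):
    """Mean squared error over the n missing positions (mask == 0)."""
    missing = 1 - mask
    n = max(int(missing.sum()), 1)
    return float(np.sum(missing * (y_hat - y_true) ** 2) / n)


def generator_loss(d_fake, y_true, y_hat, mask, alpha=10.0):
    """Adversarial term plus alpha times the masked reconstruction term."""
    return float(-np.mean(d_fake)) + alpha * reconstruction_loss(y_true, y_hat, mask)


mask = np.array([[1, 0], [0, 1]])
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_hat = np.array([[1.0, 3.0], [5.0, 4.0]])
loss_re = reconstruction_loss(y_true, y_hat, mask)
loss_g = generator_loss(np.array([0.0, 1.0]), y_true, y_hat, mask, alpha=2.0)
loss_d = critic_loss(np.array([1.0, 3.0]), np.array([0.0, 1.0]))
```

During training the true values at masked positions are available because the missing pattern is applied artificially to complete data, which is what makes a reconstruction term over the missing positions computable.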
The method compensates for missing data from two dimensions, historical data and neighbor nodes, which improves the robustness and adaptability of data repair. For the external attributes and temporal multi-modal characteristics of traffic flow data, a multi-channel attention mechanism is introduced to assign weights to the key attributes that affect repair performance, and a multi-source heterogeneous fusion module is constructed that effectively fuses dynamic/static external attributes with temporal multi-modal features, further improving the accuracy of data repair.
Drawings
FIG. 1 is a flowchart of the road network pixelation-based Wasserstein generative adversarial traffic flow data imputation method of the present invention;
FIG. 2 is a graph comparing the true value and the repaired value according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a road network pixelation-based Wasserstein generative adversarial network method for traffic flow data imputation, which comprises the following steps:
for the representation of traffic flow trajectory data, converting the trajectory data into a two-dimensional time-space feature representation that reflects the spatio-temporal characteristics of the trajectories; a Trajectory2Matrix representation of traffic data is proposed, which converts dynamic spatio-temporal traffic data into a traffic flow spatio-temporal relation matrix and enlarges the scale and network range over which traffic data can be repaired;
constructing a road network flow data generative adversarial network model, in which the generator of the Wasserstein generative adversarial network is reconstructed with a spatio-temporal data generation component and a new optimization objective so that the missing parts of the traffic flow feature map are repaired accurately; the model addresses the unstable repair performance that arises under high missing rates and complex missing patterns;
inputting the repaired data into the discriminator of the road network flow data generative adversarial network model to judge its authenticity; if the authenticity is greater than a set threshold, the repair is finished; otherwise, the data is input into the model again for further repair.
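The accept-or-repeat logic of the last step can be written as a small loop. This is a sketch under assumptions: `generator` and `discriminator` are hypothetical callables standing in for the trained networks, the latest repaired data is what gets fed back (feeding the original input again would be another reading of the text), and `max_rounds` is added only so the loop always terminates.

```python
def repair_until_accepted(x, generator, discriminator, threshold, max_rounds=10):
    """Repeat the repair until the discriminator score exceeds `threshold`."""
    repaired = generator(x)
    for _ in range(max_rounds - 1):
        if discriminator(repaired) > threshold:
            break  # repair accepted as sufficiently realistic
        repaired = generator(repaired)
    return repaired


# Toy stand-ins: each pass adds 1, and the score grows with the value.
result = repair_until_accepted(0, lambda v: v + 1, lambda v: v / 10, threshold=0.25)
capped = repair_until_accepted(0, lambda v: v + 1, lambda v: v / 10,
                               threshold=0.25, max_rounds=2)
```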
In this embodiment, the raw trajectories have non-uniform sampling frequencies, and after densification every trajectory has a uniform sampling interval of 15 s. The urban road network data is downloaded from OpenStreetMap (OSM). The coordinates of the original road network data and of the trajectory points are projected into a unified coordinate system by Gaussian projection, which converts the map matching problem into a pattern matching problem on sequences of planar line segments. The Gaussian projection has a forward and an inverse formula; the forward formula is:
Figure BDA0003871151290000081
Figure BDA0003871151290000082
where x and y are the projected coordinates; $lon'' = lon - lon_0$, where lon is the longitude of the point and $lon_0$ is the longitude of the central meridian; N is the radius of curvature in the prime vertical,
Figure BDA0003871151290000083
where a is the semi-major axis of the ellipsoid and e is its first eccentricity; $t = \tan(lat)$; $\eta^2 = e'^2 \cos^2(lat)$, where e' is the second eccentricity of the ellipsoid;
Figure BDA0003871151290000084
and X is the meridian arc length.
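The forward projection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a transcription of the patent's equations (which appear only as figures): it uses the standard textbook Gauss-Krüger series, assumes WGS84 ellipsoid constants, obtains the meridian arc length X by numerical integration instead of a closed-form series, and adds no false easting. The function names and the example central meridian of 106.5° are hypothetical.

```python
import math

A = 6378137.0                 # semi-major axis a in metres (WGS84, assumed)
F = 1 / 298.257223563         # flattening (WGS84, assumed)
E2 = F * (2 - F)              # first eccentricity squared, e^2
EP2 = E2 / (1 - E2)           # second eccentricity squared, e'^2


def meridian_arc(lat, steps=2000):
    """Meridian arc length X from the equator to `lat` (radians), Simpson's rule."""
    def m(phi):
        return A * (1 - E2) / (1 - E2 * math.sin(phi) ** 2) ** 1.5
    h = lat / steps
    total = m(0.0) + m(lat)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * m(k * h)
    return total * h / 3


def gauss_forward(lon_deg, lat_deg, lon0_deg):
    """Return (x, y): x is the northing, y the easting relative to the central meridian."""
    lat = math.radians(lat_deg)
    l = math.radians(lon_deg - lon0_deg)            # lon'' = lon - lon_0
    t = math.tan(lat)
    c = math.cos(lat)
    eta2 = EP2 * c ** 2
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)  # prime-vertical radius N
    x = (meridian_arc(lat)
         + n / 2 * t * c ** 2 * l ** 2
         + n / 24 * t * (5 - t ** 2 + 9 * eta2 + 4 * eta2 ** 2) * c ** 4 * l ** 4
         + n / 720 * t * (61 - 58 * t ** 2 + t ** 4) * c ** 6 * l ** 6)
    y = (n * c * l
         + n / 6 * (1 - t ** 2 + eta2) * c ** 3 * l ** 3
         + n / 120 * (5 - 18 * t ** 2 + t ** 4 + 14 * eta2 - 58 * eta2 * t ** 2)
         * c ** 5 * l ** 5)
    return x, y


x, y = gauss_forward(106.55, 29.56, 106.5)
```

Projecting both the trajectory points and the OSM road geometry with the same central meridian puts them in one planar coordinate system, so that distances between points and segments can be computed with plane geometry.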
A geometry-based matching algorithm combines the geometric information of the GPS points and the roads to snap the trajectory data onto road segments. The algorithm searches the candidate road segments for the one with the smallest perpendicular projection distance from the point to be matched and takes it as the matched segment. The perpendicular projection distance is:
Figure BDA0003871151290000085
where (x, y) are the projected coordinates of the point to be matched, and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the candidate road segment.
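The matching rule can be illustrated with the standard point-to-line distance formula, which is assumed here to be what the omitted figure expresses. The function names are hypothetical, and each segment is represented simply as a pair of end points in projected coordinates.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (all (x, y) tuples)."""
    (x, y), (x0, y0), (x1, y1) = p, a, b
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:                      # degenerate segment: fall back to point distance
        return math.hypot(x - x0, y - y0)
    return abs(dy * (x - x0) - dx * (y - y0)) / length


def match_segment(p, segments):
    """Index of the candidate segment with the smallest perpendicular distance."""
    return min(range(len(segments)),
               key=lambda i: perpendicular_distance(p, *segments[i]))


segments = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 1.0), (1.0, 1.0))]
best = match_segment((0.5, 0.2), segments)
```

Note that this measures the distance to the infinite line through the two points; a production matcher would usually also check that the foot of the perpendicular falls within the segment.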
Because the road network data set is very large and the shortest distance must be computed between every trajectory data point and the road network data, this embodiment introduces a KD-tree nearest-neighbor matching algorithm, which effectively improves the matching speed. After matching, the algorithm outputs the road segment vector sequence $RV_i$ of the trajectory data.
To exploit the spatio-temporal information of the traffic data, the flow time series of the road segments are stacked according to adjacent links to obtain a two-dimensional "time-space" traffic flow feature map $Y_t \in R^{N \times T}$, specifically as follows:
densifying the acquired trajectory data, projecting the coordinates of the densified trajectory points and of the road network data into a unified coordinate system by Gaussian projection, and matching the trajectory coordinates to the road network data with a geometry-based matching algorithm; the projected coordinates (x, y) are generated by Gaussian projection of the coordinates as follows:
Figure BDA0003871151290000091
Figure BDA0003871151290000092
where N is the radius of curvature in the prime vertical;
Figure BDA0003871151290000093
lon and lat are the longitude and latitude of the point before projection; $lon'' = lon - lon_0$, where $lon_0$ is the longitude of the central meridian; $t = \tan(lat)$; $\eta^2 = e'^2 \cos^2(lat)$, where e' is the second eccentricity of the ellipsoid; X is the meridian arc length;
when the trajectory coordinates are matched to the road network data with the geometry-based matching algorithm, each trajectory point is matched to the road segment in the road network data that has the smallest perpendicular projection distance, which is expressed as:
Figure BDA0003871151290000094
where D is the perpendicular projection distance; (x, y) are the coordinates of the trajectory point; and $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of two points on the road segment;
stacking the flow time series of adjacent road segments to obtain the N×T matrix of T-dimensional road segment historical flow data measured by the detectors at time t, where N is the number of road segments and T is the time dimension;
creating a first mask matrix representing the missing-data pattern, in which an element value of 1 indicates that the corresponding data is not missing and a value of 0 indicates that it is missing;
multiplying the first mask matrix element-wise by the T-dimensional road segment historical flow data measured by the detectors at time t to obtain the traffic flow feature map;
the T-dimensional road segment historical flow data $Y_t$ measured by the detectors at time t is expressed as:
Figure BDA0003871151290000101
where N is the total number of road segments and T is the selected time dimension of the data; $y_{i(t-j\Delta t)}$ is the traffic data of road $e_i$ at time $(t - j\Delta t)$, $j = 0, 1, \ldots, T-1$; and $\Delta t$ is the flow data sampling interval.
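As a small illustration of this definition, the window $Y_t$ can be cut out of a longer flow record with array slicing. The layout is an assumption, since the matrix itself is shown only as a figure: row i holds road $e_i$, and column j holds the value at time $t - j\Delta t$, so the most recent sample comes first. The function name and the toy data are hypothetical.

```python
import numpy as np


def history_window(series, t, T):
    """series: (N, L) flows sampled every Δt; returns Y_t with Y_t[i, j] = y_i(t - jΔt)."""
    if t - T + 1 < 0:
        raise ValueError("not enough history before index t")
    return series[:, t - T + 1:t + 1][:, ::-1]


series = np.arange(12).reshape(2, 6)   # two road segments, six samples each
y_t = history_window(series, t=4, T=3)
```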
To characterize the missing pattern of the road network flow data, a mask matrix is created
Figure BDA0003871151290000102
where $m_{i(t-j\Delta t)}$ is the missing state of the flow data value of road $e_i$ at time $(t - j\Delta t)$, expressed as:
Figure BDA0003871151290000103
Therefore, the road network data actually acquired is the Hadamard product of the road segment historical flow data Y and the mask matrix M:
$X = Y \odot M$
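The Hadamard product above is an element-wise multiplication, as the following sketch shows. The random missing pattern, the `missing_rate` parameter, and the function name are assumptions for illustration only; the mask follows the convention stated for $M_{ij}$ in this document (1 = observed, 0 = missing).

```python
import numpy as np


def apply_mask(y, missing_rate, seed=0):
    """Return (X, M): M is a random 0/1 mask and X = Y ⊙ M."""
    rng = np.random.default_rng(seed)
    m = (rng.random(y.shape) >= missing_rate).astype(int)   # 1 = observed, 0 = missing
    return y * m, m


y = np.ones((4, 5))
x, m = apply_mask(y, missing_rate=0.3)
```

Real missing patterns are often structured (for example a detector failing for a whole period), so blocks of zeros along rows or columns may be a more realistic test case than independent random entries.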
To integrate several kinds of input data and quantify the correlation of the time series, the traffic flow data $X_t$ that currently needs repair is concatenated with the adjacent (most recent) traffic flow data $X_t^r$, the daily-periodic traffic flow $X_t^d$, and the weekly-periodic traffic flow $X_t^w$. The input data matrix is
Figure BDA0003871151290000104
which represents the traffic flow data of N nodes at 4T' time points, where N is the total number of road segments and T' is the length of the historical time series of the road segments. The concatenated matrix is fused by a convolutional neural network to obtain the multi-modal input data
Figure BDA0003871151290000105
Figure BDA0003871151290000106
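The concatenation that produces the N × 4T' input matrix can be sketched as below. The window offsets are assumptions, because the text does not define them: the "adjacent" window is taken to be the T' samples immediately before the current window, and the daily and weekly windows are shifted by `day` and `week` samples. The convolutional fusion that follows is not shown, and the function name and toy numbers are hypothetical.

```python
import numpy as np


def multimodal_input(series, t, T, day, week):
    """Concatenate current, recent, daily and weekly windows along time: (N, 4T)."""
    def window(end):
        return series[:, end - T + 1:end + 1]
    parts = [window(t), window(t - T), window(t - day), window(t - week)]
    return np.concatenate(parts, axis=1)


series = np.arange(80).reshape(2, 40)          # two road segments, 40 samples each
x_in = multimodal_input(series, t=30, T=3, day=8, week=24)
```

With 15-minute sampling, for instance, `day` would be 96 and `week` would be 672; the small values above only keep the example readable.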
And calculating the vector correlation of the multi-input data, distributing the weight between the missing data and the data before the missing moment, and better outputting the repairing result. Through the attention mechanism, the correlation among various input data is calculated, the potential dependence of all road flow data on the time dimension is learned, and the influence of long distance is avoided. In the self-attention mechanism, embedding multi-module input data into a high-dimensional space is obtained
Figure BDA0003871151290000111
where C is the number of channels mapped to the high-dimensional space, N is the total number of road segments, and T' is the length of the historical time series of the road segments. Each input vector has 3 subspaces: the "query" subspace
Figure BDA0003871151290000112
the "key" subspace
Figure BDA0003871151290000113
and the "value" subspace
Figure BDA0003871151290000114
Then, in the single-head attention mechanism, the "query" subspace Q, the "key" subspace K, and the "value" subspace V are respectively expressed as:
Figure BDA0003871151290000115
Figure BDA0003871151290000116
Figure BDA0003871151290000117
wherein
Figure BDA0003871151290000118
are the weight matrices of Q, K, and V, respectively; d_q, d_k, and d_v are the dimensions of the "query" subspace, the "key" subspace, and the "value" subspace, respectively.
In order to process the road nodes of the input in parallel, the embedded high-dimensional data
Figure BDA0003871151290000119
undergoes a dimension transformation operation to obtain
Figure BDA00038711512900001110
The dot product of Q and K gives the attention score, score = Q · K, whose magnitude represents the dynamic dependency between road nodes over different time periods. To keep the back-propagated gradient stable, the score is divided by
Figure BDA00038711512900001111
for scaling; softmax processing and weighting are then applied to the score, and the final result is output, which is expressed as:
Figure BDA00038711512900001112
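The single-head computation described above can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the scaling factor is taken to be the square root of d_k, as in standard scaled dot-product attention, the input is assumed to be laid out as (N, T', C) after the dimension transformation, and K is transposed so that the shapes align. The function names and the random weights are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention applied to every road node in parallel.

    x: (N, T', C) embedded data; w_q, w_k, w_v: (C, d_q), (C, d_k), (C, d_v).
    Returns the attended values (N, T', d_v) and the attention weights (N, T', T').
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    # Assumed scaling by sqrt(d_k) to keep gradients stable.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 8))        # N=4 road nodes, T'=6 steps, C=8 channels
w_q = rng.normal(size=(8, 5))
w_k = rng.normal(size=(8, 5))
w_v = rng.normal(size=(8, 3))
out, weights = single_head_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, so every output time step is a weighted average of the value vectors of the same road node.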
Because the dimensionality of the input data is high, a single self-attention model has difficulty capturing the diversity of information in the road node time series. Therefore, multiple self-attention models are used in parallel to extract the historical data compensation features of the missing road network flow data; the extracted features are spliced, and the final output is obtained through a linear transformation, so that the time dependence is captured from multiple angles. The multi-modal input data of the multi-head self-attention network can be described as a linear mapping, and the historical data compensation feature of the missing road network flow data is represented as:
Figure BDA0003871151290000121
the historical data compensation feature of the missing road network traffic data may also be expressed as:
Figure BDA0003871151290000122
wherein Q_i represents the "query" subspace of the i-th head in the multi-head attention mechanism, K_i represents the "key" subspace of the i-th head, and V_i represents the "value" subspace of the i-th head;
Figure BDA0003871151290000123
are the weight matrices corresponding to Q_i, K_i, and V_i, respectively.
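A minimal sketch of the multi-head variant is given below. It assumes the usual construction (each head applies scaled dot-product attention with its own weight matrices, the head outputs are concatenated, and a final linear map `w_o` is applied); the input layout (N, T', C), the function names and the random weights are assumptions made only for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, heads, w_o):
    """Run several attention heads in parallel and merge them linearly.

    x: (N, T', C); heads: list of (w_q, w_k, w_v) tuples, one per head;
    w_o: (h * d_v, C_out) output projection applied to the concatenated heads.
    """
    outputs = []
    for w_q, w_k, w_v in heads:
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(k.shape[-1])
        outputs.append(softmax(scores, axis=-1) @ v)
    # Splice the per-head features along the channel axis, then project.
    return np.concatenate(outputs, axis=-1) @ w_o

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 6, 8))                    # N=4, T'=6, C=8
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
w_o = rng.normal(size=(2 * 4, 8))
f_t = multi_head_attention(x, heads, w_o)
```

With two heads of width 4, the concatenated feature has 8 channels, and `w_o` maps it back to the embedding width.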
The linear correlation between different nodes is obtained by calculating the Pearson correlation coefficient of the historical data of one week between different nodes, and the Pearson correlation coefficient between the road nodes i and j is calculated as follows:
Figure BDA0003871151290000124
wherein T is the length of the historical time series of the selected road segment nodes; x_{it} represents the traffic data of road node i at time t; x_{jt} represents the traffic data of road node j at time t;
After the Pearson correlation coefficients among all road nodes are calculated, a Pearson correlation matrix is established
Figure BDA0003871151290000125
and used as the reconstructed road network topology.
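The Pearson correlation matrix can be sketched in a few lines of NumPy. The sketch assumes that the history is stored as an (N, T) array and that no node has a constant series (which would give a zero denominator); the function name `pearson_matrix` is hypothetical.

```python
import numpy as np

def pearson_matrix(x):
    """Pearson correlation between every pair of road nodes.

    x: (N, T) array, one row of historical flow per road node.
    Returns an (N, N) symmetric matrix with ones on the diagonal.
    """
    centered = x - x.mean(axis=1, keepdims=True)
    cov = centered @ centered.T            # unnormalised covariances
    norm = np.sqrt(np.diag(cov))           # per-node standard-deviation term
    return cov / np.outer(norm, norm)

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [4.0, 3.0, 2.0, 1.0]])
r = pearson_matrix(x)
```

In this toy case node 1 is a scaled copy of node 0 (coefficient 1) and node 2 runs in the opposite direction (coefficient -1); `np.corrcoef(x)` returns the same matrix.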
The missing road network traffic data currently to be repaired, the adjacency matrix, and the Pearson correlation matrix are input into the graph attention network, and the output is taken as the neighbor node compensation feature of the missing data:
Figure BDA0003871151290000126
wherein σ(·) is the sigmoid function; K is the number of graph attention heads; N_i is the set of neighbor nodes of missing node i according to the original adjacency matrix of the road nodes; W^k is the dimension expansion parameter of graph attention head k; α_ij^k is the attention coefficient of graph attention head k, normalized according to the original road network structure;
Figure BDA0003871151290000127
represents the traffic flow data to be repaired after time-parallel processing, where T is the time dimension, N is the total number of road segments, and C is the number of channels; N'_i is the set of neighbor nodes of missing node i according to the reconstructed road network topology; W'^k is the dimension expansion parameter of graph attention head k; α'_ij^k is the attention coefficient of graph attention head k, normalized according to the reconstructed road network topology.
External factors such as time sequence factors, time characteristic factors and weather factors are selected. The time sequence factors are divided into time sequence factors and subsequence factors. The time sequence factor refers to the hour of the day, and the subsequence factor refers to the time period within the hour. In this embodiment, 1 hour is divided into 12 time periods at 5-minute intervals as the subsequence factor, and a day is divided into 24 time sequences, one per hour. A person skilled in the art can choose other granularities according to actual needs: a day can be divided into time sequences of N granularities, each greater than 1 hour and less than 24 hours, and the time within one time sequence granularity can be divided into subsequence factors, each greater than 1 minute and less than 60 minutes. The time characteristic factor indicates whether the day is a working day or a holiday. The weather factor can be selected according to seasonal change or by dividing the temperature into several intervals; in this embodiment, rainfall is selected as the weather factor to represent the weather conditions, and it is divided into 6 levels according to the rainfall level, including no rain, light rain, medium rain, heavy rain and extra heavy rain.
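The granularity used in this embodiment (24 hourly time sequences, each split into 12 five-minute periods) can be made concrete with a small sketch. The working-day flag and the rainfall level are assumed to be supplied by the caller as a boolean and an integer index, because the text does not specify a holiday calendar or numeric rainfall thresholds; all function and key names are hypothetical.

```python
from datetime import datetime

def time_factors(ts, slot_minutes=5):
    """Return (hour of day, index of the slot within that hour)."""
    return ts.hour, ts.minute // slot_minutes

def external_factors(ts, is_workday, rain_level):
    """Collect the external factors for one timestamp.

    is_workday: supplied by the caller (holiday calendar is out of scope here).
    rain_level: integer index of the rainfall level, assumed to start at 0.
    """
    hour, slot = time_factors(ts)
    return {
        "hour": hour,               # time sequence factor, 0..23
        "slot": slot,               # subsequence factor, 0..11 for 5-minute slots
        "workday": int(is_workday), # time characteristic factor
        "rain_level": rain_level,   # weather factor
    }

factors = external_factors(datetime(2022, 9, 29, 8, 47), True, 2)
```

For 08:47 the hour factor is 8 and the five-minute slot index is 9 (minutes 45 to 49).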
A convolution operation is performed on each of the 3 external factors to convert them to the same dimension as the compensation feature matrix extracted by the space-time data generation component, namely ξ_i ∈ R^{C×N×T′}, where i is the category of the external factor. After the 3 types of external factors, the historical data compensation feature and the neighbor node compensation feature are spliced along the 0th dimension, the input of the multi-source heterogeneous fusion module is obtained: F_0 ∈ R^{5C×N×T′}, F_0 = concat(F_t, F_s, ξ_1, ξ_2, ξ_3), where ξ_1, ξ_2 and ξ_3 are the time sequence factors, time characteristic factors and weather factors, respectively. The extracted channel features are propagated along the channel dimension using an attention mechanism to obtain the output result Y' of the module, so that important features are emphasized and unnecessary features are suppressed. The output result of the module is recombined with the observed real data to repair the locally missing data and obtain the repaired traffic flow
Figure BDA0003871151290000131
Figure BDA0003871151290000132
where ⊙ denotes the Hadamard product, and m is the mask matrix.
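The recombination step can be illustrated with the form commonly used in mask-based imputation: observed entries are kept, and only missing entries are taken from the generator output. The exact expression in the patent figure is not reproduced in the text, so the formula below is an assumption, as are the function name `recombine` and the 1 = observed mask convention.

```python
import numpy as np

def recombine(x_obs, y_gen, mask):
    """Keep observed values and fill missing positions from the generator.

    x_obs: observed data with 0 at missing positions;
    y_gen: generator output Y' of the same shape;
    mask:  1 where data was observed, 0 where it was missing.
    """
    return mask * x_obs + (1.0 - mask) * y_gen

x_obs = np.array([[12.0, 0.0, 9.0],
                  [0.0, 7.0, 5.0]])
mask = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])
y_gen = np.array([[11.0, 10.0, 8.0],
                  [6.0, 7.5, 5.5]])
x_hat = recombine(x_obs, y_gen, mask)
```

Only the two missing cells change: they take the generated values 10 and 6, while every observed reading is passed through unchanged.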
A discriminator is introduced into the pixelated generative adversarial network and is trained alternately with the generator, so that the two compete continuously and the generator produces better data. In order to make the model focus on generating data at the missing positions, the discriminator judges the truth of each individual datum in the matrix, so that local parts are repaired accurately. The input of the discriminator is the repaired complete data
Figure BDA0003871151290000141
Two convolution layers are used to embed the spatial position codes between roads and to map the data to a high-dimensional space, respectively; an attention mechanism then extracts the space-time information of the input data to evaluate the generated data; finally, two fully connected layers and a sigmoid activation function give, for the repaired complete data
Figure BDA0003871151290000142
the probability that each individual datum is real. The discriminator can be viewed as the function D: χ → [0,1],
Figure BDA0003871151290000143
represents the (i, j)-th component of the output after the complete data
Figure BDA0003871151290000144
is input into the discriminator D, which corresponds to the probability that the (i, j)-th datum of the traffic flow repair result is real.
The content of the traffic flow data restoration algorithm is as follows:
Figure BDA0003871151290000145
Figure BDA0003871151290000151
Fig. 2 shows the comparison between the true values and the repaired values when the missing rate is 50%; it can be seen from the figure that the repair result of the model is close to the true traffic flow. Table 1 compares the MAE (mean absolute error), MAPE (mean absolute percentage error) and RMSE (root mean square error) at different missing rates for the ST-DIGAN model and the recent BGCP model, BATF model and GAIN. As can be seen from Table 1, the MAE of this model is relatively small at both low and high missing rates.
TABLE 1
Figure BDA0003871151290000152
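For reference, the three error measures named above can be computed as in the following sketch. It assumes that errors are evaluated only at the positions that were masked out (mask value 0) and that the true flows at those positions are non-zero, so that MAPE is defined; the evaluation protocol actually used for Table 1 is not described in the text, and the function name is hypothetical.

```python
import numpy as np

def imputation_errors(y_true, y_pred, mask):
    """MAE, MAPE (in percent) and RMSE over the missing positions only."""
    missing = mask == 0
    err = y_pred[missing] - y_true[missing]
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true[missing])) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse

y_true = np.array([[10.0, 20.0], [40.0, 50.0]])
y_pred = np.array([[10.0, 22.0], [36.0, 50.0]])
mask = np.array([[1, 0], [0, 1]])
mae, mape, rmse = imputation_errors(y_true, y_pred, mask)
```

In this toy case the two missing cells have errors of 2 and -4, giving an MAE of 3, a MAPE of 10 percent and an RMSE equal to the square root of 10.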
The invention provides a road network pixelation-based generative adversarial network method for repairing traffic flow data, addressing the problem of effectively representing vehicle trajectory data and challenges such as complex traffic data missing types and uncertain missing rates. In the present embodiment, first, a pixelated representation algorithm for the traffic network and the trajectory data is designed in consideration of the irregularity of the trajectory data, and the trajectory data is expressed as a time-space 2-dimensional map. Then, considering the complex missing types and uncertain missing rates of traffic data, the generator of the generative adversarial network is optimized through a space-time data generation component so as to better repair the missing road network traffic data from the three dimensions of historical data, neighbor nodes and external weather. Finally, the model is evaluated and tested on the Chengdu taxi trajectory data set. Experiments show that, at high data missing rates, the model repairs missing traffic flow data better than the other baseline methods, which demonstrates its robustness and high precision.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A road network pixelation-based Wasserstein generation countermeasure flow data interpolation method, characterized by comprising the following steps:
aiming at the problem of representing traffic flow trajectory data, converting the trajectory data into a time-space 2-dimensional feature map representation in consideration of the space-time characteristics of the traffic flow trajectory, to obtain a traffic flow feature map;
constructing a road network flow data generative adversarial network model based on the Wasserstein generative adversarial network, reconstructing the generator of the Wasserstein generative adversarial network and carrying out optimization training on the generator, and inputting the traffic flow feature map into the trained generator to repair the missing part of the traffic flow feature map;
inputting the repaired data into the discriminator of the road network flow data generative adversarial network model to judge the truth of the repaired data, finishing the repair if the truth is larger than a set threshold value, and otherwise inputting the data into the road network flow data generative adversarial network model again for data repair.
2. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method as claimed in claim 1, wherein the step of converting trajectory data into a time-space 2-dimensional feature map representation comprises the steps of:
carrying out densification processing on the acquired track data, projecting the coordinates of the densified track points and the coordinates of the road network data to a unified coordinate system by Gaussian projection, and then matching the track coordinates with the road network data by adopting a geometry-based matching algorithm;
stacking the flow time series of adjacent road sections together to obtain the N×T-dimensional road section historical flow data measured by the detector at the t-th moment, wherein N is the number of road sections and T is the time dimension;
creating a first mask matrix for representing the data missing condition, wherein a value of 1 in the first mask matrix indicates that the data is not missing, and a value of 0 indicates that the data is missing;
and multiplying the first mask matrix with the T-dimensional road section historical flow data measured by the detector at the t-th moment to obtain a traffic flow feature map.
3. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method as claimed in claim 2, wherein the process of generating projected coordinates (x, y) by performing Gaussian projection on the coordinates comprises:
Figure FDA0003871151280000021
wherein N is the radius of curvature;
Figure FDA0003871151280000022
lon and lat are the longitude and latitude of the coordinates before projection, respectively; lon'' = lon - lon_0, where lon_0 is the longitude of the central meridian; t = tan(lat); η² = e'² cos²(lat), where e' is the second eccentricity of the ellipsoid; and X is the meridian arc length.
4. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method as claimed in claim 2, wherein when matching the track coordinates with the road network data by adopting a geometry-based matching algorithm, each track point is matched to the road segment in the road network data with the minimum perpendicular projection distance, and the perpendicular projection distance is expressed as:
Figure FDA0003871151280000023
wherein D is the perpendicular projection distance; (x, y) represents the coordinates of the track point; (x_0, y_0) and (x_1, y_1) are the coordinates of two points on a road segment.
5. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method as claimed in claim 2, wherein the T-dimensional road section historical flow data measured by the detector at the t-th moment is represented as:
Figure FDA0003871151280000024
wherein Y_t represents the T-dimensional road section historical flow data measured by the detector at the t-th moment; y_{i(t-jΔt)} is the traffic data of road e_i at time (t - jΔt), j = 0, 1, …, T-1; Δt is the flow data sampling interval; T is the length of the historical time series of the road segments.
6. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method as claimed in claim 1, wherein the process of repairing missing data by the road network flow data generative adversarial network model comprises:
splicing the traffic flow data to be repaired together with its adjacent traffic flow data, its daily-cycle traffic flow data and its weekly-cycle traffic flow data, extracting multi-modal input data through a convolution operation, calculating the vector correlation of the multi-modal input data by using a multi-head attention mechanism, and taking the output of the multi-head attention mechanism as the historical data compensation feature of the traffic flow data;
reconstructing a road network topological structure by using the Pearson correlation coefficients of the historical time data of the road nodes, calculating the attention coefficients of neighbor nodes through a graph attention network according to both the inherent road node adjacency relation of the missing node and the reconstructed road network topological structure, and capturing the neighbor node information for repairing the missing data in different dimensions as the neighbor node compensation feature of the missing data;
constructing sequences of time sequence factors, time characteristic factors and weather factors, and making the dimensionality of the three sequences the same as that of the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data through a convolution operation;
and splicing the historical data compensation feature and the neighbor node compensation feature of the missing road network flow data with the sequences of time sequence factors, time characteristic factors and weather factors, and propagating and extracting the channel features along the channel dimension to serve as the value of the repaired real data.
7. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method according to claim 6, wherein the historical data compensation feature of the missing road network flow data is represented as:
Figure FDA0003871151280000031
wherein F_t is the historical data compensation feature of the missing road network traffic data; Multihead(·) is multi-head self-attention;
Figure FDA0003871151280000032
is the multi-modal input data, where N is the total number of road sections, C is the number of channels, and T is the length of the historical time series of the road sections;
Figure FDA0003871151280000033
Figure FDA0003871151280000034
are the weight matrices of the query subspace, the key subspace and the value subspace, respectively; d_q, d_k and d_v are the dimensions of the query subspace, the key subspace and the value subspace, respectively; i = 1, 2, ..., h, where h is the number of heads of the multi-head self-attention.
8. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method according to claim 6, wherein the neighbor node compensation feature of the missing road network traffic data is represented as:
Figure FDA0003871151280000041
wherein σ(·) is the sigmoid function; K is the number of graph attention heads; N_i is the set of neighbor nodes of missing node i according to the original adjacency matrix of the road nodes; W^k is the dimension expansion parameter of graph attention head k; α_ij^k is the attention coefficient of graph attention head k, normalized according to the original road network structure;
Figure FDA0003871151280000042
represents the traffic flow data to be repaired after time-parallel processing, where T is the time dimension, N is the total number of road sections, and C is the number of channels; N'_i is the set of neighbor nodes of missing node i according to the reconstructed road network topological structure; W'^k is the dimension expansion parameter of graph attention head k;
Figure FDA0003871151280000043
is the attention coefficient of graph attention head k, normalized according to the reconstructed road network topological structure.
9. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method according to claim 6, wherein the time sequence factors include time sequence factors and subsequence factors; the time sequence factor refers to which hour of the day the data to be repaired falls in, and the subsequence factor refers to which time period of that hour the data to be repaired falls in; the time characteristic factor represents whether the time sequence to be repaired falls on a working day or not; the weather factor refers to the weather condition of the data to be repaired.
10. The road network pixelation-based Wasserstein generation countermeasure flow data interpolation method according to claim 1, wherein the pixelated generative adversarial network model comprises a discriminator and a traffic flow generator, the discriminator is used for discriminating the truth of the repair data generated by the generator, the discriminator and the traffic flow generator are each optimized using the RMSProp optimization algorithm during training, and the loss function adopted when the discriminator is optimized is:
Figure FDA0003871151280000044
when the traffic flow generator is optimized, a reconstruction loss function Loss_re is introduced and combined with the adversarial loss, and the loss function for the optimization training is expressed as:
Figure FDA0003871151280000051
Figure FDA0003871151280000052
wherein M represents a mask matrix;
Figure FDA0003871151280000053
represents the repaired traffic flow data; D(·) is the discriminator; α is the coefficient of the mask reconstruction loss function; n is the number of missing positions;
Figure FDA0003871151280000054
is an element of
Figure FDA0003871151280000055
and represents the value of the i-th road segment in the j-th time dimension; M_ij is an element of M that represents the missing status of the traffic flow data of the i-th road segment in the j-th time dimension, where a value of 1 indicates that the data is not missing and 0 indicates that the data is missing.
CN202211197830.9A 2022-09-29 2022-09-29 Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method Pending CN115510174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211197830.9A CN115510174A (en) 2022-09-29 2022-09-29 Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211197830.9A CN115510174A (en) 2022-09-29 2022-09-29 Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method

Publications (1)

Publication Number Publication Date
CN115510174A true CN115510174A (en) 2022-12-23

Family

ID=84507223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211197830.9A Pending CN115510174A (en) 2022-09-29 2022-09-29 Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method

Country Status (1)

Country Link
CN (1) CN115510174A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115827335A (en) * 2023-02-06 2023-03-21 东南大学 Time sequence data missing interpolation system and method based on modal crossing method
CN116206443A (en) * 2023-02-03 2023-06-02 重庆邮电大学 Traffic flow data interpolation method based on time-space road network pixelized representation
CN116628435A (en) * 2023-07-21 2023-08-22 山东高速股份有限公司 Road network traffic flow data restoration method, device, equipment and medium
CN116913445A (en) * 2023-06-05 2023-10-20 重庆邮电大学 Medical missing data interpolation method based on form learning
CN117576918A (en) * 2024-01-17 2024-02-20 四川国蓝中天环境科技集团有限公司 Urban road flow universe prediction method based on multi-source data

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206443A (en) * 2023-02-03 2023-06-02 重庆邮电大学 Traffic flow data interpolation method based on time-space road network pixelized representation
CN116206443B (en) * 2023-02-03 2023-12-15 重庆邮电大学 Traffic flow data interpolation method based on time-space road network pixelized representation
CN115827335A (en) * 2023-02-06 2023-03-21 东南大学 Time sequence data missing interpolation system and method based on modal crossing method
CN116913445A (en) * 2023-06-05 2023-10-20 重庆邮电大学 Medical missing data interpolation method based on form learning
CN116913445B (en) * 2023-06-05 2024-05-07 重庆邮电大学 Medical missing data interpolation method based on form learning
CN116628435A (en) * 2023-07-21 2023-08-22 山东高速股份有限公司 Road network traffic flow data restoration method, device, equipment and medium
CN116628435B (en) * 2023-07-21 2023-09-29 山东高速股份有限公司 Road network traffic flow data restoration method, device, equipment and medium
CN117576918A (en) * 2024-01-17 2024-02-20 四川国蓝中天环境科技集团有限公司 Urban road flow universe prediction method based on multi-source data
CN117576918B (en) * 2024-01-17 2024-04-02 四川国蓝中天环境科技集团有限公司 Urban road flow universe prediction method based on multi-source data

Similar Documents

Publication Publication Date Title
CN111400620B (en) User trajectory position prediction method based on space-time embedded Self-orientation
CN115510174A (en) Road network pixelation-based Wasserstein generation countermeasure flow data interpolation method
CN109409499B (en) Track recovery method based on deep learning and Kalman filtering correction
CN111612243B (en) Traffic speed prediction method, system and storage medium
Chen et al. Vehicle trajectory prediction based on intention-aware non-autoregressive transformer with multi-attention learning for Internet of Vehicles
Ren et al. Mtrajrec: Map-constrained trajectory recovery via seq2seq multi-task learning
Wang et al. A multi-view bidirectional spatiotemporal graph network for urban traffic flow imputation
CN113762338B (en) Traffic flow prediction method, equipment and medium based on multiple graph attention mechanism
CN114925836B (en) Urban traffic flow reasoning method based on dynamic multi-view graph neural network
CN115829171A (en) Pedestrian trajectory prediction method combining space information and social interaction characteristics
Liu et al. Pristi: A conditional diffusion framework for spatiotemporal imputation
CN114202120A (en) Urban traffic travel time prediction method aiming at multi-source heterogeneous data
CN115206092A (en) Traffic prediction method of BiLSTM and LightGBM model based on attention mechanism
Tsiligkaridis et al. Personalized destination prediction using transformers in a contextless data setting
Wang et al. Reconstruction of missing trajectory data: a deep learning approach
Zeng et al. Multistage relation network with dual-metric for few-shot hyperspectral image classification
CN113837148A (en) Pedestrian trajectory prediction method based on self-adjusting sparse graph transform
CN115359437A (en) Accompanying vehicle identification method based on semantic track
CN115630211A (en) Traffic data tensor completion method based on space-time constraint
CN115082896A (en) Pedestrian trajectory prediction method based on topological graph structure and depth self-attention network
CN115457081A (en) Hierarchical fusion prediction method based on graph neural network
Lin et al. Pre-Training General Trajectory Embeddings With Maximum Multi-View Entropy Coding
Sun et al. Visual perception based situation analysis of traffic scenes for autonomous driving applications
Xu et al. Vehicle trajectory prediction considering multi-feature independent encoding
CN114328791B (en) Map matching algorithm based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination