CN113744525A

CN113744525A - Traffic distribution prediction method based on feature extraction and deep learning

Info

Publication number: CN113744525A
Application number: CN202110941891.0A
Authority: CN
Inventors: 王炜; 于维杰; 秦韶阳; 华雪东; 赵德; 陈思远
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2021-08-17
Filing date: 2021-08-17
Publication date: 2021-12-03

Abstract

The invention discloses a traffic distribution prediction method based on feature extraction and deep learning. Features of a departure traffic cell and an arrival traffic cell are input into a trained deep learning prediction model to obtain the predicted traffic distribution, namely the traffic volume between the departure traffic cell and the arrival traffic cell. The method predicts the traffic distribution among traffic cells with high precision, provides a basis for traffic planning and traffic control, and has high value for popularization and application.

Description

Traffic distribution prediction method based on feature extraction and deep learning

Technical Field

The invention belongs to the field of urban traffic, and particularly relates to a traffic distribution prediction method.

Background

In recent years, with China's accelerating urbanization, the contradiction between a growing population and limited land resources has caused a series of urban traffic problems, such as traffic congestion and exhaust pollution. With respect to traffic distribution, the imbalance of traffic demand is increasingly prominent and urban congestion is becoming more serious; traffic management schemes and road network expansion alone cannot fundamentally resolve this imbalance. Traffic distribution therefore needs to be predicted from multiple urban characteristics using a deep learning algorithm, so that traffic planning and traffic control can be carried out from the perspective of balancing traffic demand, thereby relieving current urban traffic congestion.

At present, with the rapid development of sensing technology, the comprehensive coverage of mobile communication networks, the popularization of smartphones and the wide use of electronic maps, point of interest (POI) data in cities provide data support for research on residents' travel characteristics and analysis of travel demand, and lay a foundation for predicting the distribution of urban traffic demand. Continued research on deep learning algorithms has greatly improved the accuracy and efficiency of traffic distribution prediction, which provides the conditions for predicting the distribution of urban traffic demand.

Although much research has addressed urban traffic demand prediction and the analysis of its spatiotemporal distribution, limitations remain. Existing research mostly focuses on the influence of a single land use type on traffic distribution; few scholars have studied and predicted traffic distribution based jointly on population structure characteristics, traffic demand characteristics, land use characteristics and travel distance characteristics.

Disclosure of Invention

In order to solve the technical problems mentioned in the background art, the invention provides a traffic distribution prediction method based on feature extraction and deep learning.

In order to achieve the technical purpose, the technical scheme of the invention is as follows:

A traffic distribution prediction method based on feature extraction and deep learning comprises inputting features of a departure traffic cell and an arrival traffic cell into a trained deep learning prediction model to obtain the predicted traffic distribution, namely the traffic volume between the departure traffic cell and the arrival traffic cell;

the deep learning prediction model is constructed by the following steps:

(1) data acquisition: collecting resident historical travel data, traffic cell division data, city population data and city point of interest data;

(2) data arrangement: comprising two steps, spatial data matching and traffic distribution extraction;

the spatial data matching is to match the departure position and the arrival position of each piece of historical travel data with the traffic cell boundaries to determine the departure traffic cell and the arrival traffic cell;

the traffic distribution extraction is to count the departure traffic cells and arrival traffic cells of all historical travel data to obtain the traffic volume between every two traffic cells;

(3) feature extraction: extracting the features that influence the traffic volume between traffic cells, comprising population structure feature extraction, traffic demand feature extraction, land use feature extraction and travel distance feature extraction;

(4) feature screening;

(5) model construction: constructing a deep learning prediction model whose input is the features screened in step (4) and whose output is the traffic distribution.
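As an illustration of step (2), the following is a minimal sketch of spatial data matching followed by traffic distribution extraction. It assumes each traffic cell boundary is a simple polygon given as a list of (longitude, latitude) vertices and uses a ray-casting point-in-polygon test; the function names, the planar treatment of coordinates and the sample data are all hypothetical and are not taken from the invention.

```python
from collections import Counter

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    Assumption: longitude/latitude are treated as planar coordinates,
    which is reasonable at city scale.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def match_cell(lon, lat, cells):
    """Return the number of the first cell containing the point, else None."""
    for cell_id, boundary in cells.items():
        if point_in_polygon(lon, lat, boundary):
            return cell_id
    return None

def extract_od_volumes(trips, cells):
    """Count trips for each (departure cell, arrival cell) pair."""
    volumes = Counter()
    for dep_lon, dep_lat, arr_lon, arr_lat in trips:
        origin = match_cell(dep_lon, dep_lat, cells)
        destination = match_cell(arr_lon, arr_lat, cells)
        # Trips that fall outside every cell are skipped (assumption).
        if origin is not None and destination is not None:
            volumes[(origin, destination)] += 1
    return volumes

# Hypothetical data: two adjacent square cells and three trips.
cells = {
    1: [(118.70, 32.00), (118.75, 32.00), (118.75, 32.05), (118.70, 32.05)],
    2: [(118.75, 32.00), (118.80, 32.00), (118.80, 32.05), (118.75, 32.05)],
}
trips = [
    (118.72, 32.02, 118.78, 32.03),  # cell 1 -> cell 2
    (118.71, 32.01, 118.77, 32.04),  # cell 1 -> cell 2
    (118.76, 32.01, 118.73, 32.02),  # cell 2 -> cell 1
]
od_volumes = extract_od_volumes(trips, cells)
```

With several thousand cells, a spatial index would normally replace the linear scan in `match_cell`; the scan is kept here only for brevity.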

Further, in the step (1), the resident historical travel data includes a travel date, a departure time, a departure position, an arrival time and an arrival position;

the traffic cell division data includes a traffic cell number, a traffic cell boundary, and a traffic cell area;

the city population data comprises the number, age and gender of the permanent population;

the city point of interest data comprises point of interest types and point of interest positions.

Further, the types of points of interest include catering services, shopping services, science and education culture services, public facilities, corporate enterprises, transportation facility services, financial insurance services, business residences, living services, healthcare services, government agencies and social groups, and lodging services.

Further, in step (1), in the collected resident historical travel data and city point of interest data, the departure position, the arrival position and the point of interest position are all represented by longitude and latitude, and the traffic cell boundaries are the existing street division boundaries.

Further, in step (3), the population structure feature extraction: extracting the number, density, gender structure and age structure of the permanent population of each traffic cell;

the traffic demand feature extraction: extracting the daily departure traffic volume, arrival traffic volume, per-capita number of departures and per-capita number of arrivals of each traffic cell;

the land use feature extraction: extracting the point of interest density of each traffic cell and the proportion of each point of interest type;

the travel distance feature extraction: extracting the straight-line distance between every two traffic cells.
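The travel distance feature can be sketched as follows. The text does not say which point represents a cell or which distance formula is used, so this example assumes the mean of the boundary vertices as the representative point and the haversine great-circle formula as the straight-line distance; both choices, the function names and the sample cells are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def vertex_centroid(boundary):
    """Mean of the boundary vertices, a simple stand-in for the area centroid."""
    lons = [p[0] for p in boundary]
    lats = [p[1] for p in boundary]
    return sum(lons) / len(lons), sum(lats) / len(lats)

def distance_matrix(cells):
    """Distance in km for every ordered pair of distinct cells."""
    centers = {cell_id: vertex_centroid(b) for cell_id, b in cells.items()}
    return {
        (a, b): haversine_km(*centers[a], *centers[b])
        for a in centers for b in centers if a != b
    }

# Hypothetical cells: two adjacent squares, each 0.05 degrees on a side.
example_cells = {
    1: [(118.70, 32.00), (118.75, 32.00), (118.75, 32.05), (118.70, 32.05)],
    2: [(118.75, 32.00), (118.80, 32.00), (118.80, 32.05), (118.75, 32.05)],
}
distances = distance_matrix(example_cells)
```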

Further, the per-capita number of departures and the per-capita number of arrivals of a traffic cell are obtained by dividing the departure traffic volume and the arrival traffic volume of the traffic cell, respectively, by the permanent population of the traffic cell.
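A small worked example of the per-capita quantities, following the definition given in claim 6 (traffic volume divided by the permanent population of the cell). The function names and the numbers are hypothetical; the density helper assumes the cell area is given in square kilometres.

```python
def per_capita_trip_rates(departure_volume, arrival_volume, population):
    """Return (per-capita departures, per-capita arrivals) for one cell."""
    if population <= 0:
        raise ValueError("permanent population must be positive")
    return departure_volume / population, arrival_volume / population

def population_density(population, area_km2):
    """Permanent population per square kilometre (unit is an assumption)."""
    if area_km2 <= 0:
        raise ValueError("cell area must be positive")
    return population / area_km2

# Hypothetical cell: 1200 residents, 3000 daily departures, 2400 daily arrivals.
departure_rate, arrival_rate = per_capita_trip_rates(3000, 2400, 1200)
density = population_density(1200, 0.5)
```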

Further, in step (4), the feature screening comprises two steps, correlation analysis and significance analysis;

the correlation analysis: for the features extracted in step (3), calculating the correlation coefficient between every two features; if the correlation coefficient of two features is greater than 0.3, calculating the correlation coefficient between each of the two features and the traffic distribution extracted in step (2), and deleting the feature whose correlation coefficient with the traffic distribution is smaller;

the significance analysis: performing regression analysis of the features retained after the correlation analysis against the traffic distribution extracted in step (2), and removing the features whose significance (p-value) is greater than 0.05.
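The correlation analysis step can be sketched as below. The sketch assumes the Pearson coefficient and compares absolute values against the 0.3 threshold; the text specifies neither, so both are assumptions, as are the function names, the feature names and the made-up numbers. The significance analysis is not covered here, since it requires regression p-values from a statistics package.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def screen_by_correlation(features, target, threshold=0.3):
    """Drop, from each strongly correlated feature pair, the feature that is
    less correlated with the target. Returns the retained feature names."""
    kept = list(features)
    relevance = {name: abs(pearson(values, target))
                 for name, values in features.items()}
    changed = True
    while changed:
        changed = False
        for i, first in enumerate(kept):
            for second in kept[i + 1:]:
                if abs(pearson(features[first], features[second])) > threshold:
                    weaker = first if relevance[first] < relevance[second] else second
                    kept.remove(weaker)
                    changed = True
                    break
            if changed:
                break  # restart the scan on the reduced feature list
    return kept

# Made-up feature columns, one value per sample.
target = [2, 4, 6, 8, 10, 12]
features = {
    "poi_density": [1, 2, 3, 4, 5, 6],
    "shopping_share": [1, 2, 3, 4, 6, 5],   # strongly correlated with poi_density
    "distance": [1, -1, 0, 0, -1, 1],       # weakly correlated with the others
}
retained = screen_by_correlation(features, target)
```

In this toy data `shopping_share` is removed because it is strongly correlated with `poi_density` but slightly less correlated with the target.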

Further, in step (5), the mean square error is calculated during model training; when the mean square error is less than 0.03, training is stopped and model construction is complete.

The above technical scheme brings the following beneficial effects:

First, spatial data matching and traffic distribution extraction are carried out based on resident historical travel data, traffic cell division data, city population data and city point of interest data, and a deep learning prediction model is established through the extraction and screening of population structure features, traffic demand features, land use features and travel distance features. Second, traffic distribution prediction based on feature extraction and deep learning predicts the traffic distribution among all traffic cells with high precision, providing a basis for traffic planning and traffic control.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention;

FIG. 2 is a schematic diagram of the traffic cell division data in the embodiment;

FIG. 3 is the correlation matrix diagram in the embodiment.

Detailed Description

The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.

This embodiment uses resident historical travel data, traffic cell division data, population data and point of interest data of Nanjing in 2019, and predicts the traffic distribution of Nanjing based on feature extraction and deep learning according to the data processing steps of the technical scheme. The method flow is shown in FIG. 1 and comprises the following 6 steps:

(1) Data acquisition: this example takes Nanjing as the study area. Resident historical travel data, traffic cell division data, population data and point of interest data of Nanjing are collected. The resident historical travel data comprise five fields: travel date, departure time, departure position, arrival time and arrival position. The population data comprise the number, age and gender of the permanent population. Taking the existing street division boundaries as the traffic cell boundaries, the administrative districts of Nanjing are divided into 3324 traffic cells, as shown in FIG. 2. The point of interest data comprise point of interest types and positions, as shown in Table 1, where the types are divided into 12 categories: catering services, shopping services, science and education culture services, public facilities, corporate enterprises, transportation facility services, financial insurance services, business residences, living services, healthcare services, government agencies and social groups, and lodging services.

TABLE 1 Point of interest data of Nanjing

Type	Longitude	Latitude
Catering services	118.713855	32.215343
Catering services	118.713702	32.214973
Living services	118.738974	32.067836
Living services	118.73895	32.06745
……	……	……
Business residences	118.865525	31.777959
Business residences	118.869361	31.77501

(2) Data arrangement: comprising two steps, spatial data matching and traffic distribution extraction. Spatial data matching: the departure position and arrival position of each piece of historical travel data are matched with the traffic cell boundaries to determine the numbers of the traffic cells in which they are located, namely the departure traffic cell and the arrival traffic cell. Traffic distribution extraction: the departure traffic cells and arrival traffic cells of all historical travel data are counted to obtain the traffic volume between every two traffic cells.

(3) Feature extraction: comprising four steps, population structure feature extraction, traffic demand feature extraction, land use feature extraction and travel distance feature extraction. Population structure feature extraction: the number, density, gender structure and age structure of the permanent population of each traffic cell are extracted. The number, gender structure and age structure of the permanent population in this example are shown in Table 2. Traffic demand feature extraction: the daily departure traffic volume, arrival traffic volume, per-capita number of departures and per-capita number of arrivals of each traffic cell are extracted. Land use feature extraction: the point of interest density of each traffic cell and the proportion of each point of interest type are extracted. The proportions of the different point of interest types in this example are shown in Table 3. Travel distance feature extraction: the straight-line distance between every two traffic cells is extracted.

TABLE 2 Population structure feature extraction results

TABLE 3 Statistics of point of interest type proportions

Category	Count	Proportion
Catering services	49276	16.93%
Public facilities	4576	1.57%
Corporate enterprises	44181	15.18%
Shopping services	71271	24.49%
Transportation facility services	23620	8.12%
Science and education culture services	15134	5.20%
Financial insurance services	6566	2.26%
Living services	45251	15.55%
Healthcare services	5734	1.97%
Government agencies and social groups	9291	3.19%
Lodging services	5907	2.03%
Business residences	10236	3.52%
Total	291043	100%
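The proportions in Table 3 follow directly from the counts, as the short sketch below shows. The category names follow the 12 point of interest types listed in step (1), and the counts are those of Table 3. The sketch works on city-wide totals; applying the same calculation to the points of interest of a single traffic cell would give that cell's land use feature. The `poi_density` helper and its area unit are assumptions.

```python
# Point of interest counts from Table 3.
poi_counts = {
    "Catering services": 49276,
    "Public facilities": 4576,
    "Corporate enterprises": 44181,
    "Shopping services": 71271,
    "Transportation facility services": 23620,
    "Science and education culture services": 15134,
    "Financial insurance services": 6566,
    "Living services": 45251,
    "Healthcare services": 5734,
    "Government agencies and social groups": 9291,
    "Lodging services": 5907,
    "Business residences": 10236,
}

total_pois = sum(poi_counts.values())

# Proportion of each type in percent, rounded to two decimals as in Table 3.
poi_proportions = {
    category: round(100 * count / total_pois, 2)
    for category, count in poi_counts.items()
}

def poi_density(poi_count, area_km2):
    """Points of interest per square kilometre (unit is an assumption)."""
    if area_km2 <= 0:
        raise ValueError("cell area must be positive")
    return poi_count / area_km2
```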

(4) Feature screening: comprising two steps, correlation analysis and significance analysis. Correlation analysis: correlation analysis is carried out on the features extracted in step (3); the correlation matrix is shown in FIG. 3. As the correlation matrix shows, land use intensity, per-capita attraction, the shopping POI proportion and the company POI proportion each have a correlation coefficient greater than 0.3 with at least one other feature, and these four features are deleted in the subsequent analysis. Significance analysis: regression analysis is performed between the features remaining after the correlation analysis and the traffic distribution, and the features whose significance (p-value) is greater than 0.05 are removed.

(5) Model construction: a deep neural network model is constructed whose input is the features screened in step (4) and whose output is the traffic volume between every two traffic cells, namely the traffic distribution. The model parameters take the following values: the maximum number of training iterations is 100, the initial learning rate of the network is 0.1, the expected learning error of the network is 0.0004, the input layer has 13 neurons, the hidden layer has 3 neurons and the output layer has 1 neuron. The mean square error is calculated during training; when the mean square error is less than 0.03, training is stopped and model construction is complete.
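To make the training procedure concrete, the following is a minimal sketch of a 13-3-1 network trained with a learning rate of 0.1 for at most 100 passes, stopping once the mean square error falls below 0.03. The embodiment does not state the activation functions, the optimizer, the weight initialization or the data scaling, so the sigmoid hidden layer, linear output, per-sample gradient descent, uniform initialization and synthetic data scaled to [0, 1] used here are all assumptions, and the function names are hypothetical.

```python
import math
import random

def forward(params, x):
    """Return (hidden activations, output) for one feature vector."""
    W1, b1, W2, b2 = params
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
              for row, b in zip(W1, b1)]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return hidden, output

def train_mlp(X, y, n_hidden=3, lr=0.1, max_epochs=100, mse_stop=0.03, seed=0):
    """Train a one-hidden-layer network; stop early once MSE < mse_stop."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    params = [W1, b1, W2, 0.0]  # the last entry is the output bias
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, target in zip(X, y):
            hidden, output = forward(params, x)
            err = output - target
            squared_error += err * err
            for j in range(n_hidden):
                # Gradient through the sigmoid, using W2[j] before its update.
                grad_hidden = err * W2[j] * hidden[j] * (1.0 - hidden[j])
                W2[j] -= lr * err * hidden[j]
                b1[j] -= lr * grad_hidden
                for i in range(n_in):
                    W1[j][i] -= lr * grad_hidden * x[i]
            params[3] -= lr * err
        # MSE accumulated over this pass, while the weights were being updated.
        history.append(squared_error / len(X))
        if history[-1] < mse_stop:
            break
    return params, history

# Synthetic stand-in data: 50 samples, 13 features in [0, 1], target in [0, 1].
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(13)] for _ in range(50)]
y = [sum(row) / len(row) for row in X]

params, history = train_mlp(X, y)
_, prediction = forward(params, X[0])
```

`history` holds one mean square error value per training pass, so its length shows how many passes ran before the stopping rule or the iteration limit ended training.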

(6) Distribution prediction: the constructed deep neural network model is used to predict the traffic distribution. The mean square error (MSE) between the predicted and actual values is 1.625 × 10⁻³ and the mean absolute percentage error (MAPE) is 2.543%, which shows that the method achieves high-precision traffic distribution prediction based on feature extraction and deep learning.

The embodiments only illustrate the technical idea of the present invention and do not limit its scope of protection; any modification made on the basis of the technical scheme in accordance with the technical idea of the present invention falls within the scope of protection of the present invention.
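The two error measures used in step (6) can be computed as in the sketch below. The numbers are made up and do not reproduce the reported results. Skipping pairs whose actual value is zero in the MAPE is an assumption, made because the percentage error is undefined there; the text does not say how such pairs are handled.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent over the pairs whose actual value is non-zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100 * sum(ratios) / len(ratios)

# Made-up traffic volumes for three cell pairs.
actual_volumes = [100, 200, 400]
predicted_volumes = [110, 190, 400]

mse = mean_squared_error(actual_volumes, predicted_volumes)
mape = mean_absolute_percentage_error(actual_volumes, predicted_volumes)  # about 5 percent
```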

Claims

1. A traffic distribution prediction method based on feature extraction and deep learning is characterized by comprising the following steps: inputting the characteristics of the departure traffic cell and the arrival traffic cell into the trained deep learning prediction model to obtain predicted traffic distribution, namely the traffic volume between the departure traffic cell and the arrival traffic cell;

the deep learning prediction model is constructed by the following steps:

(4) feature screening;

2. The traffic distribution prediction method based on feature extraction and deep learning of claim 1, wherein: in the step (1), the resident historical travel data comprises a travel date, a departure time, a departure position, an arrival time and an arrival position;

3. The traffic distribution prediction method based on feature extraction and deep learning of claim 2, characterized in that: the types of points of interest include catering services, shopping services, science and education culture services, public facilities, corporate enterprises, transportation facility services, financial insurance services, business residences, living services, healthcare services, government agencies and social groups and lodging services.

4. The traffic distribution prediction method based on feature extraction and deep learning of claim 1, wherein: in the step (1), in the collected historical travel data of residents and the urban interest point data, the departure position, the arrival position and the interest point position are all represented by longitude and latitude in a unified manner, and the boundary of the traffic cell is the existing street dividing boundary.

5. The traffic distribution prediction method based on feature extraction and deep learning of claim 1, wherein: in step (3), the population structure feature extraction: extracting the number, density, gender structure and age structure of the permanent population of each traffic cell;

6. The traffic distribution prediction method based on feature extraction and deep learning of claim 5, wherein: the per-capita number of departures and the per-capita number of arrivals of a traffic cell are obtained by dividing the departure traffic volume and the arrival traffic volume of the traffic cell, respectively, by the permanent population of the cell.

7. The traffic distribution prediction method based on feature extraction and deep learning of claim 1, wherein: in the step (4), the feature screening comprises two steps of correlation analysis and significance analysis;

8. The traffic distribution prediction method based on feature extraction and deep learning of claim 1, wherein: in the step (5), the mean square error is calculated in the model training process, and when the mean square error is less than 0.03, the training is stopped, and the model construction is completed.