CN111860699B

CN111860699B - Commuting trip mode identification method based on fluctuation rate

Info

Publication number: CN111860699B
Application number: CN202010872239.3A
Authority: CN
Inventors: 安奎霖; 杨梦宁; 曹景南; 王明宸; 王壮壮
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2021-04-13
Anticipated expiration: 2040-08-26
Also published as: CN111860699A

Abstract

The invention relates to a commuting travel mode identification method based on fluctuation rate. First, the urban area is divided. Stations are clustered with a K-means algorithm: cluster centers are selected randomly, the influence distance from each station in the clustering dataset to each cluster center point is calculated, each station is assigned to the class of the cluster center with the smallest influence distance, and the cluster centers and all stations in each class are output. Then commuting travel modes are identified. A passenger flow fluctuation rate is introduced and the number q of fluctuation rates greater than a threshold is counted. If q = 4 and the four fluctuation rate peaks correspond respectively to the start and end times of the morning and evening peaks, the pair of study areas is identified as a commuting travel mode. The method identifies commuting travel modes accurately, which improves the accuracy of station passenger flow prediction and allows early warnings of congestion or anomalies to be issued effectively.

Description

Commuting trip mode identification method based on fluctuation rate

Technical Field

The invention relates to a data preprocessing method for predicting origin-destination (OD) area passenger flow with an LSTM network, and in particular to a commuting travel mode identification method based on fluctuation rate.

Background

With the modernization of cities worldwide and the gradual rise of business districts within them, urban economies keep flourishing. At the same time, the urban population grows rapidly and the number of motor vehicles on the roads increases day by day, putting great pressure on urban road traffic. Because the road network cannot keep up with residents' travel demand, congestion becomes increasingly serious. Road congestion severely restricts urban economic development and has become an important obstacle to urban modernization. In recent years, cultural exchange in cities has become frequent, including many large-scale events, and the number of people traveling within cities rises during holidays; both can easily cause sudden short-term surges in passenger flow. As residents' quality of life improves, their expectations for comfortable and convenient travel also rise. Thanks to its convenience, speed, punctuality, and large capacity, urban rail transit is one of the most important means of relieving urban road congestion.

At present, rail transit is an important travel mode for residents of Chongqing. It has become the main artery of the city's transport system and an important means of relieving traffic congestion. In Chongqing, more than 2 million passenger trips enter and leave the urban rail transit system every day. As the complexity of the urban rail network keeps growing, analysis of future traffic trends receives more and more attention. Based on regional OD passenger flow predictions, operation plans can be drawn up and early warnings of congestion or anomalies issued, improving the operating efficiency and service quality of rail transit; this makes such prediction one of the key technologies of Intelligent Transportation Systems (ITS).

Regional OD passenger flow prediction is studied here with historical passenger flow as the starting point: urban rail transit stations are divided into regions and the regional passenger travel modes are identified, so that early warnings of congestion or anomalies can be issued effectively.

Disclosure of Invention

In view of the problems in the prior art, the technical problem to be solved by the invention is to provide a method for effectively identifying commuting travel modes.

To solve this technical problem, the invention adopts the following technical solution: a commuting travel mode identification method based on fluctuation rate, comprising the following steps:

S10: dividing the urban area;

S11: determining the number of clusters and the clustering range from the city's existing administrative and functional zoning plan. All n study stations x are taken as the clustering dataset $\Omega = \{x_1, x_2, x_3, \ldots, x_n\}$;

all study stations are assigned to k station area sets $\Theta_i = \{x_{i,1}, x_{i,2}, x_{i,3}, \ldots\}$, $i \in \{1, 2, 3, 4, \ldots, k\}$;

S12: clustering with a K-means algorithm. Cluster centers are selected randomly; for each station in the clustering dataset Ω, the influence distance from the station to each cluster center point is calculated; the cluster center to which station $x_i$ has the smallest influence distance is determined, $x_i$ is assigned to that cluster center's class, and the cluster centers and all stations in each class are output;

S20: identifying commuting travel modes;

S21: each cluster center and all stations of its class form a study area, and two study areas form a study area pair;

for a study area pair, the hourly passenger flow statistics $a_i$ of several randomly sampled working days are extracted, and the 24 hourly values form the dataset $\Psi = \{a_1, a_2, a_3, \ldots, a_{24}\}$;

S22: calculating the $24-n$ passenger flow fluctuation rates $s_i$, searching them, and counting the number q of fluctuation rates greater than a threshold:

if q = 4 and the four fluctuation rate peaks correspond respectively to the start and end times of the morning peak and the evening peak, the study area pair is identified as a commuting travel mode.

As an improvement, the cluster centers in S12 are randomly selected as follows:

1) Randomly select a station from the clustering dataset Ω as the initial cluster center $C_1$, and calculate the Euclidean distance between station $x_i$ and cluster center point $C_j$ by formula (1);

calculate, by formula (2), the probability $P(x_i)$ that station $x_i$ is selected as the next cluster center point;

where k is the dimension of the coordinate parameters, and $x_{i,k}$ and $c_{j,k}$ denote the k-th-dimension data of station $x_i$ and cluster center point $C_j$, respectively;

2) Determine each station's area on the roulette wheel from its $P(x_i)$ and select the next cluster center by the roulette-wheel method. After each selection, delete from the wheel every station $x_i$ that belongs to the same set $\Theta_i$ as the selected center. Select k cluster centers in turn as the cluster center point set $\Phi = \{c_1, c_2, c_3, \ldots, c_t, \ldots, c_k\}$.
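Steps 1) and 2) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: formula (2) is not reproduced in this text, so the selection probability $P(x_i)$ is assumed to be proportional to the squared Euclidean distance to the nearest already-chosen center (k-means++ style). The names `euclidean`, `roulette_seed`, and the `regions` list of preset-region labels are hypothetical, and Euclidean distance on raw longitude/latitude only approximates geographic distance.

```python
import math
import random


def euclidean(x, c):
    """Formula (1) in its standard form: sqrt(sum over k of (x_k - c_k)^2)."""
    return math.sqrt(sum((xk - ck) ** 2 for xk, ck in zip(x, c)))


def roulette_seed(stations, regions, k, rng=None):
    """Return indices of k initial cluster centers, one per preset region.

    stations: list of coordinate tuples, e.g. (longitude, latitude)
    regions:  preset-region label (the Theta_i set) of each station
    Assumption: P(x_i) is proportional to the squared distance from x_i to the
    nearest chosen center, since formula (2) is not reproduced in the text.
    """
    rng = rng or random.Random()
    first = rng.randrange(len(stations))  # step 1): random initial center C_1
    centers = [first]
    # Delete every station in the same preset region as C_1 from the wheel.
    wheel = [i for i in range(len(stations)) if regions[i] != regions[first]]
    while len(centers) < k:
        if not wheel:
            raise ValueError("k exceeds the number of preset regions")
        weights = [min(euclidean(stations[i], stations[c]) for c in centers) ** 2
                   for i in wheel]
        if sum(weights) > 0:
            # A weighted draw is a roulette wheel: each sector is proportional to its weight.
            pick = rng.choices(wheel, weights=weights, k=1)[0]
        else:
            pick = rng.choice(wheel)  # every remaining station coincides with a center
        centers.append(pick)
        # Remove the winner's whole preset region so centers land in distinct regions.
        wheel = [i for i in wheel if regions[i] != regions[pick]]
    return centers
```

With five stations in three preset regions and `k = 3`, every returned center comes from a different region, because each draw removes the winner's region from the wheel.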

As an improvement, the influence distance from each station to each cluster center point in step S12 is calculated by the following steps:

I) Using formula (3), calculate the mean Euclidean distance $S_i$ from each non-center station to the cluster center as the Euclidean distance characteristic value;

using formula (4), calculate the region-membership characteristic value $R_i$ of each non-center station with respect to the preset region of the cluster center under study;

$i \in \{1, 2, 3, 4, \ldots, n-k\}$ (4);

II) Normalize $S_i$ and $R_i$ with formulas (5) and (6) to obtain $S'_i$ and $R'_i$;

III) Calculate the entropy values $e_S$ and $e_R$ of S and R for the current cluster center with formulas (7) and (8), respectively;

where $S''_i$ and $R''_i$ are intermediate values with no practical meaning, calculated by formulas (9) and (10), respectively;

IV) Calculate the information entropy redundancies $d_S$ and $d_R$ of S and R for the current cluster center with formulas (11) and (12), respectively;

$d_S = 1 - e_S$ (11);

$d_R = 1 - e_R$ (12);

V) Calculate the information entropy weights $w_S$ and $w_R$ of S and R for the current cluster center with formulas (13) and (14), respectively;

VI) Repeat steps I) to V) to obtain the information entropy weights $w_{S,i}$ and $w_{R,i}$ of all k cluster centers;

VII) Perform clustering with the K-means algorithm, calculating the influence distance from each station to each cluster center point by formula (15).

As an improvement, S12 outputs the cluster centers and all stations in each class as follows:

a) Calculate the influence distance from each station to each cluster center point, determine the cluster center to which station $x_i$ has the smallest influence distance, and assign $x_i$ to that cluster center's class;

b) For each class i repartitioned in a), calculate its new cluster center $c_i$ with formula (16);

c) Repeat the cluster center selection and the influence distance calculation until the cluster center of every class no longer changes position; the division of the regional stations is then complete, and the cluster centers and all stations in each class are output.
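Steps a) to c) amount to a Lloyd-style K-means loop in which the usual distance is replaced by the influence distance. The Python sketch below rests on hypothetical choices that the text does not fix: the Euclidean term is the plain distance to the current center, the region term is 0 when a station lies in the center's preset region and 1 otherwise, formula (16) is taken to be the arithmetic mean of the class members, and the two terms are used without rescaling. The function name `influence_kmeans` is illustrative.

```python
import math


def influence_kmeans(stations, regions, seeds, w_S, w_R, max_iter=100):
    """K-means iteration with an entropy-weighted influence distance.

    stations: list of (lon, lat) tuples; regions: preset-region label per station
    seeds: indices of the initial cluster centers
    w_S, w_R: per-center entropy weights
    Hypothetical R term: 0 if the station lies in the center's preset region, else 1.
    """
    centers = [tuple(stations[s]) for s in seeds]
    center_regions = [regions[s] for s in seeds]  # each class keeps its seed's region
    labels = []
    for _ in range(max_iter):
        # a) assign each station to the center with the smallest influence distance
        labels = []
        for x, reg in zip(stations, regions):
            dists = [w_S[j] * math.dist(x, c)
                     + w_R[j] * (0 if reg == center_regions[j] else 1)
                     for j, c in enumerate(centers)]
            labels.append(dists.index(min(dists)))
        # b) recompute each center as the mean of its members (assumed form of (16))
        new_centers = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(stations, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # an empty class keeps its previous center
        # c) stop once no center changes position
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

With two well-separated pairs of stations seeded at one station each, the loop settles after the second pass, with each center at the midpoint of its pair.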

As an improvement, the passenger flow fluctuation rate $s_i$ in S22 is calculated as follows:

S221: calculate the 23 hourly logarithmic parameters $b_i$ of the passenger flow with formula (17);

S222: calculate the $24-n$ passenger flow fluctuation rates $s_i$ with formula (18), where n is the fluctuation observation window and $s_1$ is the fluctuation rate at hour $1 + (n-1)/2$;

wherein the calculation is performed using the formula (19)
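Steps S221 and S222 have the shape of a historical-volatility computation: logarithmic changes between consecutive hours, then a standard deviation over a sliding window of n of them. The counts agree with the text (24 hourly values give 23 logarithmic parameters, and a window of n over 23 values gives $24-n$ rates), but formulas (17) to (19) are not reproduced here, so the Python sketch below assumes $b_i = \ln(a_{i+1}/a_i)$ and takes $s_i$ as the sample standard deviation of $b_i, \ldots, b_{i+n-1}$. The function name `fluctuation_rates` is hypothetical.

```python
import math
import statistics


def fluctuation_rates(hourly, n):
    """Hourly log parameters b_i and windowed fluctuation rates s_i.

    Assumed forms (formulas (17)-(19) are not reproduced in the text):
      b_i = ln(a_{i+1} / a_i)                    -> 23 values from 24 hourly counts
      s_i = sample std. dev. of b_i .. b_{i+n-1}  -> 24 - n values
    With 1-based indices, s_i is centered at hour i + (n - 1) / 2.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly passenger counts")
    if not 2 <= n <= 23:
        raise ValueError("window n must lie between 2 and 23")
    # Counts must be positive for the logarithm; zero hours need smoothing first.
    b = [math.log(hourly[i + 1] / hourly[i]) for i in range(23)]
    rates = [statistics.stdev(b[i:i + n]) for i in range(24 - n)]
    return b, rates
```

A spike at a single hour produces a pair of opposite-signed log changes, so the windows that contain both of them show the largest fluctuation rate.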

As an improvement, the number q of fluctuation rates greater than the threshold in S22 is counted as follows:

calculate the mean and the standard deviation $d_s$ of the $24-n$ passenger flow fluctuation rates $s_i$ with formula (20);

Wherein the calculation is performed using the formula (21)

search the $24-n$ passenger flow fluctuation rates $s_i$ and count the number q of fluctuation rates greater than the threshold.

Compared with the prior art, the invention has at least the following advantages:

The method combines two factors, the city's administrative and functional zoning and the stations' geographic positions, and uses entropy weighting to divide urban rail transit stations into areas. Once the area division is complete, the fluctuation rate is introduced to identify the morning and evening peaks of regional passenger flow and thereby identify commuting travel modes. Finally, with this preprocessing in place, area pairs identified as commuting travel modes use working-day data, with holidays excluded, as the historical same-period data sequence for calculation and prediction, which yields more accurate regional OD passenger flow prediction.
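The counting step can be sketched as follows. The text computes the mean with formula (20) and the standard deviation $d_s$ with formula (21) but does not state how the threshold is built from them, so the Python sketch assumes a hypothetical threshold of the mean plus `k_sigma` times $d_s$, using the population standard deviation. The names `count_exceedances` and `k_sigma` are illustrative.

```python
import statistics


def count_exceedances(rates, k_sigma=1.0):
    """Count fluctuation rates above a threshold built from their mean and std. dev.

    Hypothetical threshold: mean + k_sigma * d_s. Population vs. sample standard
    deviation is also an assumption. Returns q and the 0-based indices of the hits.
    """
    mean = statistics.mean(rates)
    d_s = statistics.pstdev(rates)
    threshold = mean + k_sigma * d_s
    hits = [i for i, s in enumerate(rates) if s > threshold]
    return len(hits), hits
```

For 21 rates that are all 0.1 except four values of 1.0, the threshold is about 0.62 and exactly those four rates are counted, giving q = 4.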

Drawings

FIG. 1 shows the K-means clustering result of the present invention.

FIG. 2 is a region-divided graph obtained by the method of the present invention.

Detailed Description

The present invention is described in further detail below.

Dividing the urban area according to the urban rail transit network is the basis for extracting the travel modes of urban rail transit passengers. To divide the urban area, a K-means clustering algorithm is applied on the basis of an analysis of the rail transit network structure and of the city's administrative and functional zoning.

A commuting travel mode identification method based on fluctuation rate comprises the following steps:

S10: dividing the urban area;

S11: determining the number of clusters and the clustering range from the city's current administrative and functional zoning plan. All n study stations x are taken as the clustering dataset $\Omega = \{x_1, x_2, x_3, \ldots, x_n\}$;

S20: identifying commuting travel modes;

for a study area pair, the hourly passenger flow statistics $a_i$ of several randomly sampled working days are extracted, and the 24 hourly values form the dataset $\Psi = \{a_1, a_2, a_3, \ldots, a_{24}\}$;

if q = 4 and the four fluctuation rate peaks correspond respectively to the start and end times of the morning and evening peaks (in this example 7 am and 9 am, and 6 pm and 8 pm), the study area pair is identified as a commuting travel mode.

1) Randomly select a station from the clustering dataset Ω as the initial cluster center $C_1$, and calculate the Euclidean distance between station $x_i$ and cluster center point $C_j$ by formula (1);

2) Determine each station's area on the roulette wheel from its $P(x_i)$ and select the next cluster center by the roulette-wheel method. After each selection, delete from the wheel every station $x_i$ that belongs to the same set $\Theta_i$ as the selected center, so that the k cluster centers finally lie in different preset sets $\Theta_i$. Select k cluster centers in turn as the cluster center point set $\Phi = \{c_1, c_2, c_3, \ldots, c_t, \ldots, c_k\}$.

So that the final region division reflects the city's administrative and functional zoning, entropy weights are introduced to weight the Euclidean distance and the preset-region membership condition; the influence distance is the sum of each of these values multiplied by its weight.

First, because the influence distance is the sum of the Euclidean distance characteristic value and the preset-region membership characteristic value, each multiplied by its weight, the information entropy weights of each cluster center for these two characteristic values must be obtained. The process of finding the information entropy weights of one cluster center is as follows (the entropy weights must be calculated for every cluster center):
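Written out with the symbols of steps I) to VI), the weighted sum described above takes the following form for station $x_i$ and cluster center $c_j$. This is a reading of the prose; the exact notation of formula (15) is not reproduced in this text.

$$D(x_i, c_j) = w_{S,j}\,S_i + w_{R,j}\,R_i$$

Here $S_i$ is the Euclidean distance characteristic value and $R_i$ the preset-region membership characteristic value of the station with respect to $c_j$, and $w_{S,j}$, $w_{R,j}$ are that center's entropy weights.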

The method comprises the following steps:

$i \in \{1, 2, 3, 4, \ldots, n-k\}$ (4);

$d_S = 1 - e_S$ (11);

$d_R = 1 - e_R$ (12);

Determine the cluster center to which station $x_i$ has the smallest influence distance, and assign $x_i$ to that cluster center's class;

c) Repeat the cluster center selection and the calculation of the influence distance from each station to each cluster center point.

After the area division is complete, passenger travel modes, chiefly the commuting travel mode, are identified so that historical same-period data can be extracted to improve the prediction.

Because commuting travel modes are the focus here, attention centers on the morning and evening peaks; accordingly, a commuting travel mode identification method based on fluctuation rate is proposed.

By definition, a commuting travel mode can only be identified on working days, and it shows two passenger flow peaks: a morning peak, roughly 7 am to 9 am according to the data statistics, and an evening peak, roughly 6 pm to 8 pm. An area OD passenger flow with this pattern is called a commuting travel mode.
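The decision rule of S22, combined with the peak times stated above, can be sketched as a small check. The function `is_commuting`, its `tolerance` parameter, and the convention that `peak_hours` already holds clock hours are hypothetical. Since the text places $s_1$ at hour $1 + (n-1)/2$, rate $s_i$ sits at hour $i + (n-1)/2$.

```python
def is_commuting(q, peak_hours, expected=(7, 9, 18, 20), tolerance=0):
    """Decide whether a study area pair shows a commuting travel mode.

    q: number of fluctuation rates above the threshold
    peak_hours: clock hours (0-23) at which those rates occur
    expected: start/end hours of the morning and evening peaks (7-9 am, 6-8 pm)
    tolerance: allowed offset in hours (hypothetical parameter; 0 = exact match)
    """
    if q != 4 or len(peak_hours) != 4:
        return False
    # Sort so the hits line up with morning start, morning end, evening start, evening end.
    return all(abs(h - e) <= tolerance
               for h, e in zip(sorted(peak_hours), expected))
```

For example, hits at hours 20, 7, 18, and 9 are accepted, while a pattern with a midday hit at 12 is rejected.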

S221: calculate the 23 hourly logarithmic parameters $b_i$ of the passenger flow with formula (17), where $b_1$ is the parameter corresponding to 1 o'clock, and so on;

wherein the calculation is performed using the formula (19)

The number q of fluctuation rates greater than the threshold in step S22 is counted as follows:

calculate the mean and the standard deviation $d_s$ of the $24-n$ passenger flow fluctuation rates with formula (20);

Wherein the calculation is performed using the formula (21)

(3) Experimental verification:

In the experiment, Chongqing is taken as an example, and rail transit data from Chongqing's urban area serve as the raw experimental dataset.

The experimental results show that the optimized clustering algorithm adapts better to the environment and divides areas well, avoiding the misclassification of stations that are geographically close but far apart along the rail lines.

Spatial clustering of the rail stations uses the stations' GPS positioning data; the attributes in Table 1 are, in order: id, station number, station name, longitude, and latitude.

Table 1 GPS positioning data

id	ostation	StationName	oLongitude	oLatitude
1	101	Chaotianmen	106.5844	29.55976
2	102	Xiaoshizi	106.5791	29.56167
3	103	Jiaochangkou	106.5686	29.5564
4	104	Qixinggang	106.5596	29.55797
5	105	Lianglukou	106.5457	29.55557
6	106	Eling	106.5302	29.5508
7	107	Daping	106.5149	29.54346
8	108	Shiyoulu	106.5063	29.54199
9	109	Xietaizi	106.4928	29.5379
10	110	Shiqiaopu	106.4813	29.53553
11	111	Gaomiaocun	106.465	29.53917
12	112	Majiayan	106.4648	29.548
13	113	Xiaolongkan	106.4643	29.55621
…	…	…	…	…

Applying the entropy-weight-based station area division method to Chongqing divides the stations into the following 10 cluster areas:

Area 0 is a commercial and tourist area represented by the Hongqihegou transfer station.

Area 1 is the southern region represented by Yudong, with numerous scenic spots, ancient towns, and courtyards.

Area 2 is the northwestern region, a cultural exchange center centered on gumbo.

Area 3 is the campus district centered on University Town.

Area 4 is the Yuzhong and western communication industry district, represented by a plateau and Yuanjiagang.

Area 5 is the Zhongliangshan region.

Area 6 is a gong area.

Area 7 is the Chongqing North Station-Jiangbei Airport district, extending toward Jiangbei Airport and containing both a railway station and an airport.

Area 8 is a science, education, and culture district of the apron dam area, centered on the apron dam.

Area 9 is the confluence area of the Yangling and Yangtze rivers, represented by Lianglukou and Nanping.

FIG. 1 shows the experimental results of the K-means clustering algorithm based on the spatio-temporal influence distance. Two points are evident. First, geographic factors still clearly influence the clustering: stations in the same cluster are geographically close, and no cluster groups geographically distant stations merely to satisfy the time dimension. Second, cluster membership does not depend entirely on straight-line geographic distance; judging from the station distribution, the stations in each cluster lie at nearby positions along the rail transit lines.

The urban area division step and the commuting travel mode identification step complement each other. A comparative example is given below, and the comparison results are shown in Table 2:

Table 2 Comparison of regional OD commuting travel mode passenger flow prediction accuracy before and after preprocessing

Network model	Method of the invention	Comparative example
LSTM	95.6%	89.2%

The only difference between the comparative example and the method of the present invention is that the method preprocesses the acquired station data of the study objects by urban area division, whereas the comparative example does not.

The table shows that dividing the urban area with the entropy-weight method raises the accuracy of commuting travel mode passenger flow prediction from 89.2% to 95.6%.

Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the invention without departing from their spirit and scope, and all such modifications shall be covered by the claims of the present invention.

Claims

1. A commuting travel mode identification method based on fluctuation rate is characterized by comprising the following steps:

S10: dividing the urban area;

S11: determining the number of clusters and the clustering range from the city's existing administrative and functional zoning: all n study stations x are taken as the clustering dataset $\Omega = \{x_1, x_2, x_3, \ldots, x_n\}$;

the cluster centers in S12 are randomly selected as follows:

calculating, by formula (2), the probability $P(x_i)$ that station $x_i$ is selected as the next cluster center point;

2) determining each station's area on the roulette wheel from its $P(x_i)$ and selecting the next cluster center by the roulette-wheel method; after each selection, deleting from the wheel every station $x_i$ belonging to the same $\Theta_i$ as the selected center, and selecting k cluster centers in turn as the cluster center point set $\Phi = \{c_1, c_2, c_3, \ldots, c_t, \ldots, c_k\}$;

in step S12, the influence distance from each station to each cluster center point is calculated by the following steps:

I) calculating, with formula (3), the mean Euclidean distance $S_i$ from each non-center station to the cluster center as the Euclidean distance characteristic value;

III) calculating, with formulas (7) and (8), the entropy values $e_S$ and $e_R$ of S and R for the current cluster center;

$d_S = 1 - e_S$ (11);

$d_R = 1 - e_R$ (12);

VI) repeating steps I) to V) to obtain the information entropy weights $w_{S,i}$ and $w_{R,i}$ of the k cluster centers;

VII) performing clustering with the K-means algorithm, calculating the influence distance from each station to each cluster center point by formula (15);

S20: identifying commuting travel modes.

2. The commuting travel mode identification method based on fluctuation rate according to claim 1, wherein S12 outputs the cluster centers and all stations in each class as follows:

determining the cluster center to which station $x_i$ has the smallest influence distance, and assigning $x_i$ to that cluster center's class;

3. The commuting travel mode identification method based on fluctuation rate according to claim 2, wherein the passenger flow fluctuation rate $s_i$ in S22 is calculated as follows:

wherein the calculation is performed using the formula (19)

4. The commuting travel mode identification method based on fluctuation rate according to claim 3, wherein the number q of fluctuation rates greater than the threshold in S22 is counted as follows:

and the standard deviation $d_s$;

Wherein the calculation is performed using the formula (21)