CN111860699B - Commuting trip mode identification method based on fluctuation rate - Google Patents


Publication number
CN111860699B
Authority
CN
China
Prior art keywords
clustering
formula
site
calculating
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010872239.3A
Other languages
Chinese (zh)
Other versions
CN111860699A (en)
Inventor
安奎霖
杨梦宁
曹景南
王明宸
王壮壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202010872239.3A
Publication of CN111860699A
Application granted
Publication of CN111860699B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Mathematical Analysis (AREA)
  • Marketing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a commuting travel mode identification method based on fluctuation rate. The urban area is first divided: sites are clustered with a K-means algorithm in which the cluster centers are selected randomly, the influence distance from every site in the clustering data set to every cluster center point is calculated, each site is assigned to the class of the cluster center to which its influence distance is smallest, and the cluster centers and all the sites in each class are output. The commuting travel mode is then identified: a passenger flow fluctuation rate is introduced, the number q of fluctuation rates greater than a threshold is counted, and if q = 4 and the four fluctuation rate peaks correspond to the start and end time points of the morning peak and the evening peak, the pair of study areas is identified as a commuting travel mode. The method can accurately identify the commuting travel mode, which improves the accuracy of station passenger flow prediction and in turn allows effective early warning of congestion or anomalies.

Description

Commuting trip mode identification method based on fluctuation rate
Technical Field
The invention relates to a data preprocessing method for LSTM networks used to predict regional origin-destination (OD) passenger flow, and in particular to a commuting travel mode identification method based on fluctuation rate.
Background
With the modernization of cities around the world and the rise of commercial districts within them, urban economies continue to flourish. At the same time, urban populations are growing rapidly and the number of motor vehicles on the road increases day by day, putting great pressure on urban road traffic. Because road capacity cannot meet the travel demand of urban residents, road congestion is becoming more serious. Road congestion severely restricts urban economic development and has become a major obstacle to urban modernization. In recent years, cultural exchange between cities has become frequent, including the hosting of various large-scale events, and the number of people in a city rises during holidays, so passenger flow can surge within a short time. As the quality of life of urban residents improves, so do their expectations for comfortable and convenient travel. With its convenience, speed, punctuality and large passenger capacity, urban rail transit is one of the important means of relieving urban road congestion.
At present, rail transit is an important travel mode for residents of Chongqing; it has become the main artery of the city's traffic and an important means of relieving congestion. In Chongqing, more than 2 million passengers enter and leave the urban rail transit system every day. As the urban rail transit network grows more complex, analysis of future traffic trends receives increasing attention. Based on regional OD passenger flow prediction, operators can draw up traffic operation plans and issue early warnings of congestion or anomalies, improving the operating efficiency and service quality of rail transit; such prediction has therefore become one of the key technologies of Intelligent Transportation Systems (ITS).
Regional OD passenger flow prediction is studied here with historical passenger flow as the starting point: identifying the station area division and the regional passenger flow travel modes of urban rail transit makes it possible to give effective early warning of congestion or anomalies.
Disclosure of Invention
In view of the problems in the prior art, the technical problem to be solved by the invention is to provide a method that effectively identifies the commuting travel mode.
To solve this technical problem, the invention adopts the following technical scheme: a commuting travel mode identification method based on fluctuation rate, comprising the following steps:
S10, dividing the urban area;
S11, determining the number of clusters and the clustering range from the city's existing administrative and functional area divisions: all n study sites x are taken as the clustering data set Ω, Ω = {x_1, x_2, x_3, …, x_n};
all the study sites are assigned to k site area sets θ_i, θ_i = {x_{i,1}, x_{i,2}, x_{i,3}, …}, i ∈ {1, 2, 3, 4, …, k};
S12, clustering by using a K-means algorithm: randomly selecting the cluster centers and, for each site in the clustering data set Ω, calculating the influence distance from the site to each cluster center point (symbol shown in Figure GDA0002868941430000021); determining the cluster center to which site x_i has the smallest influence distance, assigning x_i to the class of that cluster center, and outputting the cluster centers and all the sites in each class;
S20, identifying the commuting travel mode;
S21, each cluster center and all the sites belonging to it form a study area, and two study areas form a group of study areas;
for a group of study areas, randomly extracting the hourly passenger flow statistics a_i of a number of working days, the 24 hours of passenger flow data forming a data set Ψ, Ψ = {a_1, a_2, a_3, …, a_24};
S22, calculating the 24-n passenger flow fluctuation rates s_i, searching the 24-n passenger flow fluctuation rates s_i, and counting the number q of fluctuation rates greater than a threshold:
if q = 4 and the four fluctuation rate peaks correspond to the start and end time points of the morning peak and the evening peak, respectively, this group of study areas is identified as a commuting travel mode.
As an improvement, the method for randomly selecting the cluster centers in S12 is as follows:
1) randomly selecting a site from the clustering data set Ω as the initial cluster center C_1, and calculating by formula (1) the Euclidean distance between site x_i and cluster center point C_j (symbol shown in Figure GDA0002868941430000022);
calculating by formula (2) the probability that site x_i is selected as the next cluster center point (symbol shown in Figure GDA0002868941430000023);
[Formula (1): Figure GDA0002868941430000024]
[Formula (2): Figure GDA0002868941430000025]
where k is the coordinate parameter dimension, and x_{i,k} and c_{j,k} denote the k-th dimension data of site x_i and cluster center point C_j, respectively;
2) determining the roulette wheel area of each site x_i according to its selection probability (symbol shown in Figure GDA0002868941430000026), and selecting the next cluster center by the roulette wheel method; each time a cluster center is selected, the sites x_i belonging to the same θ_i as that center are deleted from the roulette wheel; k cluster centers are selected in turn to form the cluster center point set Φ, Φ = {c_1, c_2, c_3, …, c_t, …, c_k}.
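The selection procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: formulas (1) and (2) are available here only as images, so the sketch assumes the plain Euclidean distance and a selection weight equal to the squared distance to the nearest center already chosen (the conventional k-means++ choice). The names select_centers, sites and region_of are hypothetical.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def select_centers(sites, region_of, k, rng):
    """Pick up to k centers, one per preset area, by roulette-wheel sampling.

    sites:     dict site_id -> coordinate tuple
    region_of: dict site_id -> preset area label (the sets theta_i)
    """
    first = rng.choice(sorted(sites))
    centers = [first]
    # Sites in the same preset area as a chosen center leave the wheel.
    candidates = {s for s in sites if region_of[s] != region_of[first]}
    while len(centers) < k and candidates:
        order = sorted(candidates)
        # ASSUMPTION: wheel area = squared distance to the nearest chosen center.
        weights = [
            min(euclidean(sites[s], sites[c]) for c in centers) ** 2
            for s in order
        ]
        chosen = rng.choices(order, weights=weights, k=1)[0]
        centers.append(chosen)
        candidates = {s for s in candidates if region_of[s] != region_of[chosen]}
    return centers

# Hypothetical toy data: five sites spread over three preset areas.
sites = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0),
         "d": (5.1, 5.0), "e": (10.0, 0.0)}
region_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
centers = select_centers(sites, region_of, k=3, rng=random.Random(0))
print(centers)
```

Because every pick removes its whole preset area from the wheel, the selected centers always fall in different preset areas, whichever sites the random draw lands on.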
As an improvement, the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000027) is calculated in S12 as follows:
I) calculating by formula (3) the mean Euclidean distance S_i from each non-cluster-center site to the cluster center, as the Euclidean distance characteristic value;
[Formula (3): Figure GDA0002868941430000031]
calculating by formula (4) the affiliation characteristic value R_i of each non-cluster-center site with respect to the preset area of the cluster center under study;
[Formula (4): Figure GDA0002868941430000032], i ∈ {1, 2, 3, 4, …, n-k} (4);
II) normalizing S_i and R_i by formula (5) and formula (6) to obtain S'_i and R'_i;
[Formula (5): Figure GDA0002868941430000033]
[Formula (6): Figure GDA0002868941430000034]
III) calculating by formula (7) and formula (8), respectively, the entropy values e_S and e_R of S and R for the cluster center under study;
[Formula (7): Figure GDA0002868941430000035]
[Formula (8): Figure GDA0002868941430000036]
where S''_i and R''_i are two intermediate values with no practical meaning, calculated by formula (9) and formula (10), respectively;
[Formula (9): Figure GDA0002868941430000037]
[Formula (10): Figure GDA0002868941430000038]
IV) calculating by formula (11) and formula (12), respectively, the information entropy redundancies d_S and d_R of S and R for the cluster center under study;
d_S = 1 - e_S (11);
d_R = 1 - e_R (12);
V) calculating by formula (13) and formula (14), respectively, the information entropy weights w_S and w_R of S and R for the cluster center under study;
[Formula (13): Figure GDA0002868941430000039]
[Formula (14): Figure GDA00028689414300000310]
VI) repeating the calculations of I) to V) to obtain the information entropy weights w_{S,i} and w_{R,i} of the k cluster centers;
VII) performing the clustering operation with the K-means clustering algorithm, and calculating by formula (15) the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000041);
[Formula (15): Figure GDA0002868941430000042]
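The weighting steps I) to V) resemble the conventional entropy weight method, and a sketch along those lines may help. Only formulas (11) and (12) survive as text here; the min-max normalization, the share-based entropy and the final ratio are the textbook forms and are assumptions, as is treating R_i as a 0/1 indicator of whether a site lies in the center's preset area. All function names are hypothetical.

```python
import math

def min_max(values):
    """Min-max normalization to [0, 1] (ASSUMED form of formulas (5)/(6))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def entropy(values):
    """Normalized information entropy (ASSUMED form of formulas (7)-(10))."""
    norm = min_max(values)
    total = sum(norm)
    if total == 0 or len(values) < 2:
        return 1.0  # no spread: maximal entropy, zero redundancy
    shares = [v / total for v in norm]
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(values))

def entropy_weights(s_values, r_values):
    """Return (w_S, w_R) for one cluster center."""
    d_s = 1.0 - entropy(s_values)  # formula (11)
    d_r = 1.0 - entropy(r_values)  # formula (12)
    total = d_s + d_r
    if total == 0:
        return 0.5, 0.5  # hypothetical fallback when neither feature varies
    # ASSUMED form of formulas (13)/(14): redundancy share.
    return d_s / total, d_r / total

# Hypothetical values for four non-center sites and one cluster center.
s_values = [1.0, 2.0, 3.0, 4.0]   # Euclidean distance characteristic values
r_values = [0.0, 0.0, 1.0, 1.0]   # preset-area affiliation (0/1, assumed)
w_s, w_r = entropy_weights(s_values, r_values)
print(w_s, w_r)
```

The two weights always sum to 1, and the feature whose values are more unevenly distributed (lower entropy) receives the larger weight.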
As an improvement, S12 outputs the cluster centers and all the sites in each class as follows:
a) calculating the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000043), determining the cluster center to which site x_i has the smallest influence distance, and assigning x_i to the class of that cluster center;
b) for each class i obtained from the repartition in a), calculating the new cluster center c_i of the class by formula (16);
[Formula (16): Figure GDA0002868941430000044]
c) repeating the selection of cluster centers and the calculation of the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000045) until the position of the cluster center of every class no longer changes, at which point the division of the regional sites is complete and the cluster centers and all the sites in each class are output.
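Steps a) to c) follow the usual K-means assign-and-update loop, with the influence distance in place of the plain Euclidean distance. The sketch below is an illustration only. It assumes that formula (15) is a weighted sum of a scaled Euclidean term and a 0/1 preset-area mismatch term, and that formula (16) is the coordinate mean of the class members; both formulas are images in this text. The dictionary layout and function names are hypothetical.

```python
import math

def influence_distance(site, center, weights, scale):
    """ASSUMED form of formula (15): w_S * S' + w_R * R'."""
    w_s, w_r = weights
    s = math.dist(site["xy"], center["xy"]) / scale          # scaled Euclidean term
    r = 0.0 if site["region"] == center["region"] else 1.0   # preset-area mismatch
    return w_s * s + w_r * r

def cluster(sites, centers, weights, max_iter=100):
    """Assign sites by smallest influence distance until the centers stop moving."""
    scale = max(math.dist(a["xy"], b["xy"]) for a in sites for b in sites) or 1.0
    groups = []
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for site in sites:
            best = min(
                range(len(centers)),
                key=lambda j: influence_distance(site, centers[j], weights[j], scale),
            )
            groups[best].append(site)
        updated = []
        for center, members in zip(centers, groups):
            if members:
                # ASSUMED form of formula (16): mean of the member coordinates.
                xy = tuple(sum(m["xy"][d] for m in members) / len(members)
                           for d in range(2))
            else:
                xy = center["xy"]  # an empty class keeps its center
            updated.append({"xy": xy, "region": center["region"]})
        if all(u["xy"] == c["xy"] for u, c in zip(updated, centers)):
            break
        centers = updated
    return centers, groups

# Hypothetical toy data: two preset areas, two sites each.
sites = [
    {"name": "a", "xy": (0.0, 0.0), "region": 0},
    {"name": "b", "xy": (0.0, 1.0), "region": 0},
    {"name": "c", "xy": (10.0, 10.0), "region": 1},
    {"name": "d", "xy": (10.0, 11.0), "region": 1},
]
centers = [{"xy": (0.0, 0.0), "region": 0}, {"xy": (10.0, 10.0), "region": 1}]
weights = [(0.5, 0.5), (0.5, 0.5)]  # one (w_S, w_R) pair per cluster center
final_centers, groups = cluster(sites, centers, weights)
print([[m["name"] for m in g] for g in groups])
```

The max_iter cap is a safeguard added for the sketch; the text itself stops only when no cluster center changes position.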
As an improvement, the passenger flow fluctuation rate s_i in S22 is calculated as follows:
S221: calculating by formula (17) the 23 logarithmic parameters b_i of the passenger flow over time;
[Formula (17): Figure GDA0002868941430000046]
S222: calculating by formula (18) the 24-n passenger flow fluctuation rates s_i, where n is the fluctuation observation range and s_1 is the fluctuation rate at time point 1 + (n-1)/2;
[Formula (18): Figure GDA0002868941430000047]
where the quantity shown in Figure GDA0002868941430000048 is calculated by formula (19);
[Formula (19): Figure GDA0002868941430000049]
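The counts in S221 and S222 (23 logarithmic parameters, then 24-n fluctuation rates) match a rolling volatility over log ratios of consecutive hourly counts, which the sketch below illustrates. Formulas (17) to (19) are images in this text, so the specific forms used here are assumptions: b_i = ln(a_{i+1} / a_i), and s_i taken as the sample standard deviation (divisor n-1) of the n values b_i … b_{i+n-1}. The sketch also assumes every hourly count is positive and n ≥ 2.

```python
import math

def log_parameters(hourly):
    """ASSUMED formula (17): b_i = ln(a_{i+1} / a_i); 24 counts give 23 values."""
    return [math.log(hourly[i + 1] / hourly[i]) for i in range(len(hourly) - 1)]

def fluctuation_rates(hourly, n):
    """ASSUMED formulas (18)/(19): rolling sample standard deviation over n values."""
    b = log_parameters(hourly)
    rates = []
    for start in range(len(b) - n + 1):
        window = b[start:start + n]
        mean = sum(window) / n                      # window mean
        var = sum((x - mean) ** 2 for x in window) / (n - 1)
        rates.append(math.sqrt(var))
    return rates

# Hypothetical hourly counts with a morning and an evening surge.
hourly = [20, 15, 10, 10, 15, 40, 300, 900, 950, 400, 300, 280,
          300, 290, 280, 300, 350, 800, 1000, 950, 400, 200, 80, 40]
rates = fluctuation_rates(hourly, n=3)
print(len(rates))
```

With 24 hourly counts the function returns 24-n values, in line with the text; a perfectly flat day yields fluctuation rates of zero throughout.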
As an improvement, the number q of fluctuation rates greater than the threshold is counted in S22 as follows:
calculating by formula (20) the mean (symbol shown in Figure GDA00028689414300000410) and the standard deviation d_s of the 24-n passenger flow fluctuation rates s_i;
[Formula (20): Figure GDA00028689414300000411]
where the quantity shown in Figure GDA00028689414300000412 is calculated by formula (21);
[Formula (21): Figure GDA00028689414300000413]
searching the 24-n passenger flow fluctuation rates s_i and counting the number q of fluctuation rates greater than the threshold:
[Threshold condition: Figure GDA0002868941430000052]
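The counting step can be illustrated with a short sketch. The threshold condition itself is an image in this text; the sketch assumes a threshold of the mean plus one standard deviation of the fluctuation rates, with a hypothetical multiplier parameter so that other choices can be tried, and it uses the population standard deviation as a further assumption.

```python
import statistics

def count_peaks(rates, multiplier=1.0):
    """Count fluctuation rates above mean + multiplier * std (ASSUMED threshold)."""
    mean = statistics.fmean(rates)
    std = statistics.pstdev(rates)          # population standard deviation (assumed)
    threshold = mean + multiplier * std
    peaks = [i for i, s in enumerate(rates) if s > threshold]
    return len(peaks), peaks

# Hypothetical fluctuation rates: a flat baseline with four spikes.
rates = [0.1] * 20
for i in (3, 5, 14, 16):
    rates[i] = 1.0
q, peak_positions = count_peaks(rates)
print(q, peak_positions)
```

The function returns both q and the zero-based positions of the peaks, since the later decision also needs to know when each peak occurs.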
Compared with the prior art, the invention has at least the following advantages:
The method divides urban rail transit stations into areas by combining two factors, the city's administrative and functional area divisions and the geographic positions of the stations, using the entropy weight method. Once the area division is complete, the fluctuation rate is introduced to identify the morning and evening peaks of regional passenger flow and thereby identify the commuting travel mode. Finally, as a result of these preprocessing operations, area pairs identified as commuting travel modes can use working-day data, with holidays excluded, as the historical same-period data sequence for calculation and prediction, achieving a more accurate regional OD passenger flow prediction.
Drawings
FIG. 1 shows the K-means clustering result of the present invention.
FIG. 2 is a region-divided graph obtained by the method of the present invention.
Detailed Description
The present invention is described in further detail below.
Dividing the urban area according to urban rail transit is the basis for extracting the travel modes of urban rail transit passengers. To divide the urban area, a K-means clustering algorithm is applied on the basis of an analysis of the rail transit network structure and of the city's administrative and functional area divisions.
A commuting travel mode identification method based on fluctuation rate comprises the following steps:
S10, dividing the urban area;
S11, determining the number of clusters and the clustering range from the city's existing administrative and functional area divisions: all n study sites x are taken as the clustering data set Ω, Ω = {x_1, x_2, x_3, …, x_n};
all the study sites are assigned to k site area sets θ_i, θ_i = {x_{i,1}, x_{i,2}, x_{i,3}, …}, i ∈ {1, 2, 3, 4, …, k};
S12, clustering by using a K-means algorithm: randomly selecting the cluster centers and, for each site in the clustering data set Ω, calculating the influence distance from the site to each cluster center point (symbol shown in Figure GDA0002868941430000051); determining the cluster center to which site x_i has the smallest influence distance, assigning x_i to the class of that cluster center, and outputting the cluster centers and all the sites in each class;
S20, identifying the commuting travel mode;
S21, each cluster center and all the sites belonging to it form a study area, and two study areas form a group of study areas;
for a group of study areas, randomly extracting the hourly passenger flow statistics a_i of a number of working days, the 24 hours of passenger flow data forming a data set Ψ, Ψ = {a_1, a_2, a_3, …, a_24};
S22, calculating the 24-n passenger flow fluctuation rates s_i, searching the 24-n passenger flow fluctuation rates s_i, and counting the number q of fluctuation rates greater than a threshold:
if q = 4 and the four fluctuation rate peaks correspond to the start and end time points of the morning and evening peaks, respectively (in this example 7 am and 9 am, and 6 pm and 8 pm), this group of study areas is identified as a commuting travel mode.
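The final decision in S22 can be sketched as a small check. The sketch assumes that the i-th fluctuation rate corresponds to time point i + (n-1)/2, extrapolating from the statement that s_1 corresponds to 1 + (n-1)/2, and it allows a hypothetical tolerance, in hours, when matching peaks to the expected times of 7, 9, 18 and 20 o'clock. The function name and parameters are illustrative only.

```python
def is_commuting(peak_indices, n, expected_hours=(7, 9, 18, 20), tolerance=1.0):
    """Decide whether four fluctuation-rate peaks match the commuting pattern.

    peak_indices: 1-based indices i of the rates s_i above the threshold
    n:            fluctuation observation range
    """
    if len(peak_indices) != 4:          # the text requires q = 4
        return False
    # ASSUMPTION: s_i is centered on time point i + (n - 1) / 2.
    hours = sorted(i + (n - 1) / 2 for i in peak_indices)
    return all(abs(h - e) <= tolerance
               for h, e in zip(hours, sorted(expected_hours)))

# With n = 3, rates s_6, s_8, s_17 and s_19 sit at 7, 9, 18 and 20 o'clock.
print(is_commuting([6, 8, 17, 19], n=3))
print(is_commuting([6, 8, 17], n=3))
```

Setting the tolerance to 0 requires an exact match with the example times; a looser value accommodates peaks that start slightly earlier or later.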
As an improvement, the method for randomly selecting the cluster centers in S12 is as follows:
1) randomly selecting a site from the clustering data set Ω as the initial cluster center C_1, and calculating by formula (1) the Euclidean distance between site x_i and cluster center point C_j (symbol shown in Figure GDA0002868941430000061);
calculating by formula (2) the probability that site x_i is selected as the next cluster center point (symbol shown in Figure GDA0002868941430000062);
[Formula (1): Figure GDA0002868941430000063]
[Formula (2): Figure GDA0002868941430000064]
where k is the coordinate parameter dimension, and x_{i,k} and c_{j,k} denote the k-th dimension data of site x_i and cluster center point C_j, respectively;
2) determining the roulette wheel area of each site x_i according to its selection probability (symbol shown in Figure GDA0002868941430000065), and selecting the next cluster center by the roulette wheel method; each time a cluster center is selected, the sites x_i belonging to the same θ_i as that center are deleted from the roulette wheel, which ensures that the final k cluster centers lie in different preset θ_i; k cluster centers are selected in turn to form the cluster center point set Φ, Φ = {c_1, c_2, c_3, …, c_t, …, c_k}.
So that the final area division reflects the city's administrative and functional areas, entropy weights are introduced to determine the weights of the Euclidean distance and of the preset-area condition; the influence distance is the sum of the products of these two characteristic values and their weights.
Because the influence distance is the sum of the Euclidean distance characteristic value and the preset-area affiliation characteristic value, each multiplied by its weight, the information entropy weights of each cluster center for the two characteristic values must be obtained first. The process for one cluster center is as follows (the entropy weights must be calculated for every cluster center):
As an improvement, the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000066) is calculated in S12 as follows:
I) calculating by formula (3) the mean Euclidean distance S_i from each non-cluster-center site to the cluster center, as the Euclidean distance characteristic value;
[Formula (3): Figure GDA0002868941430000067]
calculating by formula (4) the affiliation characteristic value R_i of each non-cluster-center site with respect to the preset area of the cluster center under study;
[Formula (4): Figure GDA0002868941430000071], i ∈ {1, 2, 3, 4, …, n-k} (4);
II) normalizing S_i and R_i by formula (5) and formula (6) to obtain S'_i and R'_i;
[Formula (5): Figure GDA0002868941430000072]
[Formula (6): Figure GDA0002868941430000073]
III) calculating by formula (7) and formula (8), respectively, the entropy values e_S and e_R of S and R for the cluster center under study;
[Formula (7): Figure GDA0002868941430000074]
[Formula (8): Figure GDA0002868941430000075]
where S''_i and R''_i are two intermediate values with no practical meaning, calculated by formula (9) and formula (10), respectively;
[Formula (9): Figure GDA0002868941430000076]
[Formula (10): Figure GDA0002868941430000077]
IV) calculating by formula (11) and formula (12), respectively, the information entropy redundancies d_S and d_R of S and R for the cluster center under study;
d_S = 1 - e_S (11);
d_R = 1 - e_R (12);
V) calculating by formula (13) and formula (14), respectively, the information entropy weights w_S and w_R of S and R for the cluster center under study;
[Formula (13): Figure GDA0002868941430000078]
[Formula (14): Figure GDA0002868941430000079]
VI) repeating the calculations of I) to V) to obtain the information entropy weights w_{S,i} and w_{R,i} of the k cluster centers;
VII) performing the clustering operation with the K-means clustering algorithm, and calculating by formula (15) the influence distance from each site to each cluster center point (symbol shown in Figure GDA00028689414300000710);
[Formula (15): Figure GDA00028689414300000711]
As an improvement, S12 outputs the cluster centers and all the sites in each class as follows:
a) calculating the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000081), determining the cluster center to which site x_i has the smallest influence distance, and assigning x_i to the class of that cluster center;
b) for each class i obtained from the repartition in a), calculating the new cluster center c_i of the class by formula (16);
[Formula (16): Figure GDA0002868941430000082]
c) repeating the selection of cluster centers and the calculation of the influence distance from each site to each cluster center point (symbol shown in Figure GDA0002868941430000083) until the position of the cluster center of every class no longer changes, at which point the division of the regional sites is complete and the cluster centers and all the sites in each class are output.
After the area division is complete, the passenger flow travel modes are identified, chiefly the commuting travel mode, so that prediction can be improved by extracting historical same-period data.
Because the commuting travel mode is the subject here, attention must focus on the morning and evening peaks; accordingly, a commuting travel mode identification based on the fluctuation rate is proposed.
By definition, a commuting travel mode can only be identified on working days, and it has two passenger flow peaks: according to data statistics, the morning peak is roughly 7 am to 9 am and the evening peak roughly 6 pm to 8 pm. A regional OD passenger flow with these characteristics is called a commuting travel mode.
As an improvement, the passenger flow fluctuation rate s_i in S22 is calculated as follows:
S221: calculating by formula (17) the 23 logarithmic parameters b_i of the passenger flow over time, where b_1 is the parameter corresponding to time point 1, and so on;
[Formula (17): Figure GDA0002868941430000084]
S222: calculating by formula (18) the 24-n passenger flow fluctuation rates s_i, where n is the fluctuation observation range and s_1 is the fluctuation rate at time point 1 + (n-1)/2;
[Formula (18): Figure GDA0002868941430000085]
where the quantity shown in Figure GDA0002868941430000086 is calculated by formula (19);
[Formula (19): Figure GDA0002868941430000087]
The number q of fluctuation rates greater than the threshold is counted in S22 as follows:
calculating by formula (20) the mean (symbol shown in Figure GDA0002868941430000089) and the standard deviation d_s of the 24-n passenger flow fluctuation rates (symbol shown in Figure GDA0002868941430000088);
[Formula (20): Figure GDA0002868941430000091]
where the quantity shown in Figure GDA0002868941430000092 is calculated by formula (21);
[Formula (21): Figure GDA0002868941430000093]
searching the 24-n passenger flow fluctuation rates s_i and counting the number q of fluctuation rates greater than the threshold:
[Threshold condition: Figure GDA0002868941430000094]
Experimental verification:
The experiment takes Chongqing as an example and uses rail transit data from the Chongqing urban area as the original data set.
The experimental results show that the optimized clustering algorithm adapts better to its environment and divides the areas well, avoiding the misclassification of stations that are geographically close but far apart by rail.
The invention uses the GPS positioning data of the stations for the spatial clustering of rail stations. The attributes in Table 1 are, in order: id, station number, station name, longitude, latitude.
Table 1 GPS positioning data
id   ostation   StationName    oLongitude   oLatitude
1    101        Chaotianmen    106.5844     29.55976
2    102        Xiaoshizi      106.5791     29.56167
3    103        Jiaochangkou   106.5686     29.5564
4    104        Qixinggang     106.5596     29.55797
5    105        Lianglukou     106.5457     29.55557
6    106        Eling          106.5302     29.5508
7    107        Daping         106.5149     29.54346
8    108        Shiyoulu       106.5063     29.54199
9    109        Xietaizi       106.4928     29.5379
10   110        Shiqiaopu      106.4813     29.53553
11   111        Gaomiaocun     106.465      29.53917
12   112        Majiayan       106.4648     29.548
13   113        Xiaolongkan    106.4643     29.55621
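As a small worked example on the data in Table 1, the Euclidean distance between two stations can be computed directly from their longitude and latitude. Treating the raw coordinates as a two-dimensional point, without any map projection, is an assumption made for this sketch; the variable names are hypothetical.

```python
import math

# Longitude and latitude of stations 101 and 102, copied from Table 1.
station_101 = (106.5844, 29.55976)
station_102 = (106.5791, 29.56167)

# Euclidean distance in coordinate space (two dimensions), in degrees.
distance_deg = math.dist(station_101, station_102)
print(round(distance_deg, 6))
```

The result is expressed in degrees rather than meters, so it is suitable for comparing stations with one another but not as a physical distance.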
Applying the entropy-weight-based station area division method to Chongqing yields the following 10 cluster areas:
Area 0 is a commercial and tourist area represented by the Hongqihegou transfer station.
Area 1 is the southern area represented by Yudong, with numerous scenic spots and ancient town courtyards.
Area 2 is the northwest region, cultural exchange center, centered on gumbo.
Area 3 is the university campus area centered on University Town.
Area 4 is the western Yuzhong transport and industrial area represented by Daping and Yuanjiagang.
Area 5 is the Zhongliangshan area.
Area 6 is a gong area.
Area 7 is the Chongqing North Station and Jiangbei Airport area in the direction of Jiangbei Airport, which includes a railway station and an airport.
Area 8 is the science, education and culture area of Shapingba, centered on Shapingba.
Area 9 is the area around the confluence of the Jialing and Yangtze rivers, represented by Lianglukou and Nanping.
FIG. 1 shows the experimental result of the K-means clustering algorithm based on the spatio-temporal influence distance, from which two points are clear. First, geographic factors still strongly shape the clusters: the stations in each cluster are geographically close to one another, and no cluster groups geographically distant stations merely to satisfy the time dimension. Second, the clusters do not depend on straight-line geographic distance alone: judging by the distribution of the stations, the stations of each cluster occupy nearby positions along the rail transit lines.
The urban area division step and the commuting travel mode identification step complement each other. A comparative example is given below, and the comparison results are shown in Table 2:
Table 2 Accuracy of regional OD commuting travel mode passenger flow prediction before and after preprocessing
Network model   Method of the invention   Comparative example
LSTM            95.6%                     89.2%
The only difference between the comparative example and the method of the invention is that the method of the invention preprocesses the acquired study site data by urban area division, whereas the comparative example does not.
The table shows that dividing the urban area with the entropy weight method raises the accuracy of commuting travel mode passenger flow prediction from 89.2% to 95.6%.
Finally, the above embodiments only illustrate the technical solutions of the invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solutions may be modified or replaced by equivalents without departing from their spirit and scope, and all such modifications and substitutions are covered by the claims of the invention.

Claims (4)

1. A commuting travel mode identification method based on fluctuation rate is characterized by comprising the following steps:
s10, dividing the city area;
s11, dividing the clustering number and the clustering range through the existing administrative and functional areas of the city: all n study site x are taken as a clustering dataset Ω, Ω ═ x1,x2,x3……xn};
All the research object sites are respectively classified into k site area sets thetaiIn (c) (-)i={xi,1,xi,2,xi,3……},i∈{1,2,3,4…,k};
S12, clustering with the K-means algorithm: randomly selecting cluster centers and, for each site in the clustering data set $\Omega$, calculating the influence distance from that site to each cluster center point
Figure FDA0002940358080000011
determining the cluster center to which site $x_i$ has the smallest influence distance, assigning site $x_i$ to the class of that cluster center, and outputting the cluster centers and all the sites in each class;
the method for randomly selecting the cluster centers in S12 comprises the following steps:
1) randomly selecting a site from the clustering data set $\Omega$ as the initial cluster center $C_1$, and calculating by formula (1) the Euclidean distance between site $x_i$ and cluster center point $C_j$
Figure FDA0002940358080000012
calculating by formula (2) the probability $P(x_i)$ that site $x_i$ is selected as the next cluster center point
Figure FDA0002940358080000013
Figure FDA0002940358080000014
where k is the coordinate parameter dimension, and $x_{i,k}$ and $c_{j,k}$ denote the k-th dimension data of site $x_i$ and cluster center point $C_j$, respectively;
2) determining the roulette-wheel area of each site $x_i$ from its $P(x_i)$ and selecting the next cluster center by the roulette-wheel method; each time a cluster center is selected, deleting from the roulette wheel all sites $x_i$ that belong to the same $\theta_i$ as that center; and selecting k cluster centers in sequence to form the cluster center point set $\Phi$, $\Phi = \{c_1, c_2, c_3, \dots, c_t, \dots, c_k\}$;
in step S12, the influence distance from each site to each cluster center point
Figure FDA0002940358080000015
is calculated by the following steps:
I) calculating by formula (3) the mean Euclidean distance $S_i$ from each non-center site to the cluster center, as the characteristic Euclidean distance value;
Figure FDA0002940358080000016
calculating by formula (4) the membership characteristic value $R_i$ of each non-center site with respect to the preset region of the cluster center under study
Figure FDA0002940358080000017
II) normalizing $S_i$ and $R_i$ by formula (5) and formula (6) to obtain $S'_i$ and $R'_i$
Figure FDA0002940358080000021
Figure FDA0002940358080000022
III) calculating the entropies $e_S$ and $e_R$ of S and R for the cluster center under study by formula (7) and formula (8), respectively
Figure FDA0002940358080000023
Figure FDA0002940358080000024
where $S''_i$ and $R''_i$ are two intermediate values without practical meaning, calculated by formula (9) and formula (10), respectively;
Figure FDA0002940358080000025
Figure FDA0002940358080000026
IV) calculating the information entropy redundancies $d_S$ and $d_R$ of S and R for the cluster center under study by formula (11) and formula (12), respectively
$d_S = 1 - e_S$ (11);
$d_R = 1 - e_R$ (12);
V) calculating the information entropy weights $w_S$ and $w_R$ of S and R for the cluster center under study by formula (13) and formula (14), respectively
Figure FDA0002940358080000027
Figure FDA0002940358080000028
VI) repeating calculation steps I) to V) to obtain the information entropy weights $w_{S,i}$ and $w_{R,i}$ of the k cluster centers;
VII) performing the clustering operation with the K-means clustering algorithm, and calculating by formula (15) the influence distance from each site to each cluster center point
Figure FDA0002940358080000029
Figure FDA00029403580800000210
S20, identifying a commuting travel mode;
S21, each cluster center together with all the sites of its class forms a study-object area, and every two study-object areas form a group of study-object areas;
for a group of study-object areas, randomly extracting the hourly passenger flow statistics $a_i$ of a plurality of working days, the 24 hourly passenger flow values forming a data set $\Psi$, $\Psi = \{a_1, a_2, a_3, \dots, a_{24}\}$;
S22, calculating the 24-n passenger flow fluctuation rates $s_i$, searching the 24-n passenger flow fluctuation rates $s_i$, and counting the number q of fluctuation rates larger than a threshold:
if q = 4 and the four fluctuation rate peaks correspond to the start and end time points of the morning peak and the evening peak, respectively, the group of study-object areas is identified as a commuting travel mode.
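As an illustration of steps II) to V) of claim 1, the following is a minimal sketch of an entropy weight calculation. Only formulas (11) and (12) ($d = 1 - e$) are legible in this text; the min-max normalization, the Shannon entropy scaled by $\ln m$, and the weight $w_S = d_S / (d_S + d_R)$ are assumptions taken from the standard entropy weight method, not the patent's confirmed formulas (5) to (10), (13) and (14). The function names are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale values to [0, 1] (assumed form of formulas (5)/(6))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant feature carries no contrast
    return [(v - lo) / (hi - lo) for v in values]


def entropy(normalized):
    """Shannon entropy of the proportions, scaled to [0, 1] by ln(m).

    Assumed form of formulas (7) to (10); the proportion p plays the role
    of the intermediate values S''_i and R''_i.
    """
    m = len(normalized)
    total = sum(normalized)
    if m < 2 or total == 0:
        return 1.0  # nothing distinguishes the sites: maximal entropy
    e = 0.0
    for v in normalized:
        p = v / total
        if p > 0:  # treat 0 * ln(0) as 0
            e -= p * math.log(p)
    return e / math.log(m)


def entropy_weights(s_values, r_values):
    """Return (w_S, w_R) for one cluster center."""
    e_s = entropy(min_max_normalize(s_values))
    e_r = entropy(min_max_normalize(r_values))
    d_s, d_r = 1.0 - e_s, 1.0 - e_r  # formulas (11) and (12)
    total = d_s + d_r
    if total == 0:
        return 0.5, 0.5  # assumption: split evenly when neither feature informs
    return d_s / total, d_r / total  # assumed form of formulas (13)/(14)
```

Under these assumptions, a feature whose values vary more across the sites of a cluster receives a larger weight, and the two weights always sum to 1.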
2. The commuting travel mode identification method based on fluctuation rate according to claim 1, characterized in that the process of outputting the cluster centers and all the sites in each class in S12 is as follows:
a) calculating the influence distance from each site to each cluster center point
Figure FDA0002940358080000031
determining the cluster center to which site $x_i$ has the smallest influence distance, and assigning site $x_i$ to the class of that cluster center;
b) for each class i obtained by the repartition in a), calculating the new cluster center $c_i$ of that class by formula (16)
Figure FDA0002940358080000032
c) repeating the steps of randomly selecting cluster centers and calculating the influence distance from each site to each cluster center point
Figure FDA0002940358080000033
until the position of the cluster center of each class no longer changes, thereby completing the division of the regional sites, and outputting the cluster centers and all the sites in each class.
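The assign/recompute/repeat loop of steps a) to c) can be sketched as follows. This is a simplified illustration under stated assumptions: the influence distance of formula (15) is not legible here, so the distance is passed in as a function (plain Euclidean distance, `math.dist`, is used as a stand-in in the usage note), and formula (16) is assumed to be the coordinate-wise mean of the class members. The sketch iterates from a given set of initial centers and does not re-draw them. All function names are hypothetical.

```python
import math


def assign(sites, centers, distance):
    """Step a): index of the closest center for every site."""
    return [
        min(range(len(centers)), key=lambda j: distance(x, centers[j]))
        for x in sites
    ]


def recompute(sites, labels, centers):
    """Step b): new center of each class (assumed: coordinate-wise mean)."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, lab in zip(sites, labels) if lab == j]
        if not members:
            new_centers.append(old)  # keep the center of an empty class
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members)
                  for d in range(len(old)))
        )
    return new_centers


def cluster(sites, centers, distance, max_iter=100):
    """Step c): iterate until no center moves (or max_iter is reached)."""
    centers = [tuple(c) for c in centers]
    labels = assign(sites, centers, distance)
    for _ in range(max_iter):
        new_centers = recompute(sites, labels, centers)
        if new_centers == centers:
            break  # centers are stable: the division is complete
        centers = new_centers
        labels = assign(sites, centers, distance)
    return centers, labels
```

For example, `cluster(sites, initial_centers, math.dist)` runs the loop with Euclidean distance; substituting a weighted influence distance only requires passing a different `distance` function.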
3. The commuting travel mode identification method based on fluctuation rate according to claim 2, characterized in that the passenger flow fluctuation rate $s_i$ in S22 is calculated as follows:
S221: calculating the 23 hourly logarithmic parameters $b_i$ of the passenger flow by formula (17)
Figure FDA0002940358080000034
S222: calculating the 24-n passenger flow fluctuation rates $s_i$ by formula (18), where n is the fluctuation observation range and $s_1$ is the fluctuation rate at time point 1+(n-1)/2;
Figure FDA0002940358080000035
where formula (19) is used to calculate
Figure FDA0002940358080000036
Figure FDA0002940358080000037
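The counts in claim 3 (23 logarithmic parameters, 24-n fluctuation rates) are consistent with a rolling volatility over log ratios, which the following sketch illustrates. It assumes $b_i = \ln(a_{i+1}/a_i)$ for formula (17) and, for formulas (18) and (19), the sample standard deviation of $b$ over a sliding window of n values around its window mean. These forms are assumptions, since the formula images are not reproduced in this text; the function names are hypothetical, all flows must be positive, and n must be at least 2.

```python
import math


def log_parameters(flows):
    """b_i = ln(a_{i+1} / a_i); 24 hourly flows give 23 values (assumed formula (17))."""
    return [math.log(flows[i + 1] / flows[i]) for i in range(len(flows) - 1)]


def fluctuation_rates(flows, n):
    """Rolling standard deviation of b over windows of n values.

    With 24 flows there are 23 log parameters and therefore
    23 - n + 1 = 24 - n fluctuation rates (assumed formulas (18)/(19)).
    """
    b = log_parameters(flows)
    rates = []
    for start in range(len(b) - n + 1):
        window = b[start:start + n]
        mean = sum(window) / n                      # window mean
        var = sum((v - mean) ** 2 for v in window) / (n - 1)
        rates.append(math.sqrt(var))
    return rates
```

Under these assumptions a constant passenger flow yields fluctuation rates of zero, while the sharp rises and falls at the start and end of a peak period yield large values.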
4. The commuting travel mode identification method based on fluctuation rate according to claim 3, characterized in that the number q of fluctuation rates greater than the threshold in S22 is counted as follows:
calculating by formula (20) the mean value of the 24-n passenger flow fluctuation rates $s_i$
Figure DEST_PATH_FDA0002868941420000041
and the standard deviation $d_s$
Figure FDA0002940358080000041
where formula (21) is used to calculate
Figure FDA0002940358080000042
Figure FDA0002940358080000043
searching the 24-n passenger flow fluctuation rates $s_i$ and counting the number q of fluctuation rates larger than the threshold:
Figure FDA0002940358080000044
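The counting step of claim 4 can be sketched as below. The threshold expression itself is not legible in this text, so the sketch assumes a threshold of the form mean + c × standard deviation, with `c` a hypothetical tuning parameter and the population standard deviation used for $d_s$. The function name and the default `c = 1.0` are illustrative assumptions, not values from the patent.

```python
import math


def count_above_threshold(rates, c=1.0):
    """Return (q, indices) of fluctuation rates above mean + c * std.

    The threshold form and the parameter c are assumptions.
    """
    m = len(rates)
    mean = sum(rates) / m
    std = math.sqrt(sum((s - mean) ** 2 for s in rates) / m)  # population std
    threshold = mean + c * std
    indices = [i for i, s in enumerate(rates) if s > threshold]
    return len(indices), indices
```

With the returned count q, the rule of claim 1 would then be checked as `q == 4`, together with a check that the four indices fall at the start and end of the morning and evening peaks.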
CN202010872239.3A 2020-08-26 2020-08-26 Commuting trip mode identification method based on fluctuation rate Active CN111860699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010872239.3A CN111860699B (en) 2020-08-26 2020-08-26 Commuting trip mode identification method based on fluctuation rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010872239.3A CN111860699B (en) 2020-08-26 2020-08-26 Commuting trip mode identification method based on fluctuation rate

Publications (2)

Publication Number Publication Date
CN111860699A CN111860699A (en) 2020-10-30
CN111860699B true CN111860699B (en) 2021-04-13

Family

ID=72967959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010872239.3A Active CN111860699B (en) 2020-08-26 2020-08-26 Commuting trip mode identification method based on fluctuation rate

Country Status (1)

Country Link
CN (1) CN111860699B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112791997B (en) * 2020-12-16 2022-11-18 North China University of Technology Method for cascade utilization and screening of retired battery

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832779A (en) * 2017-12-11 2018-03-23 North China University of Technology Track station classification system
CN110232398A (en) * 2019-04-24 2019-09-13 Guangdong Communication Polytechnic A kind of road network sub-area division and its appraisal procedure based on Canopy+Kmeans cluster
WO2020018679A1 (en) * 2018-07-17 2020-01-23 Nvidia Corporation Regression-based line detection for autonomous driving machines
CN111125184A (en) * 2019-11-23 2020-05-08 Tongji University Bus passenger flow dynamic monitoring method based on time sequence structural variable point identification

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178044A1 (en) * 2015-12-21 2017-06-22 Sap Se Data analysis using traceable identification data for forecasting transportation information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Using NARX Neural Network for Prediction of Urban Rail Transit Passenger Flow; Xiaochao Zhao et al.; 2018 IEEE 9th International Conference on Software Engineering and Service Science; 2018-11-25; 117-121 *
Uncertainty prediction model for short-term passenger flow of urban rail transit; Guo Kuang et al.; Urban Mass Transit (《城市轨道交通研究》); 2020-01-31; 22-26 *

Also Published As

Publication number Publication date
CN111860699A (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant