CN107563540A - Short-term bus boarding passenger flow prediction method based on random forest - Google Patents

Info

Publication number
CN107563540A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710609933.4A
Other languages
Chinese (zh)
Other versions
CN107563540B (en
Inventor
王璞
凌溪蔓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201710609933.4A
Publication of CN107563540A
Application granted
Publication of CN107563540B
Legal status: Active
Anticipated expiration

Landscapes

  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a short-term bus boarding passenger flow prediction method based on random forest, comprising: obtaining passenger ride information and bus position information within the study area; inferring each passenger's boarding stop from the obtained passenger ride information and bus position information; dividing the area into regional bus stops and the day into time windows; training a random forest to establish a regression prediction model; and constructing a prediction sample and inputting it into the regression prediction model to obtain the predicted boarding passenger flow of the target regional bus stop in the target time window. By introducing the concept of a regional bus stop and applying the random forest algorithm, the invention obtains high-accuracy predictions and has practical guiding significance.

Description

Short-term bus boarding passenger flow prediction method based on random forest
Technical field
The invention relates to the technical field of transportation, and in particular to a short-term bus boarding passenger flow prediction method based on random forest.
Background
Public transport plays a leading role in a city's overall transport capacity, but most cities in China currently lack sufficient public transport capacity, particularly bus capacity during peak periods, so research on short-term prediction of the boarding passenger flow at each bus stop is especially important. Short-term boarding flow predictions for each bus stop can give the bus management system more reliable predicted boarding flows, allowing bus capacity to be adjusted in time and relieving passenger crowding. However, current research mainly focuses on optimizing the planning and design of urban public transport networks and on optimizing bus management, and has the following problems:
1. Because little data is available to support these aspects, the work often relies on qualitative analysis.
2. Short-term prediction of traffic flow has been studied widely, but few studies predict short-term bus boarding passenger flow.
3. Existing prediction is based on single bus stops; because the boarding passenger flow at a single stop fluctuates strongly, prediction performance is poor.
Summary of the invention
The invention provides a short-term bus boarding passenger flow prediction method based on random forest that can solve the problems of the prior art described above.
The method comprises:
Step S1: Obtain passenger ride information and bus position information within the study area;
Step S2: Infer each passenger's boarding stop from the passenger ride information and bus position information obtained in step S1;
Step S3: Divide the area into regional bus stops and the day into time windows;
The study area is divided into grid cells of equal size and the grid cells are numbered; the bus stops contained in the same grid cell are aggregated to obtain a regional bus stop; the studied period of each day is divided into time windows of equal size; and the boarding passenger flow of each regional bus stop in each time window is counted;
Step S4: Train a random forest and establish a regression prediction model;
A target regional bus stop and a target time window are determined; the boarding passenger flows of the target regional bus stop in (d+1) time windows on each of the n days before the prediction day containing the target time window are used as training samples; the training samples are input to the random forest for training, and a regression prediction model is established;
Here, the boarding passenger flows of the (d+1) time windows on each day form one sample, and n and d are integers;
Step S5: Construct a prediction sample and input it into the regression prediction model to obtain the predicted boarding passenger flow of the target regional bus stop in the target time window;
The boarding passenger flows of the target regional bus stop in the d time windows that fall on the same day as the target time window and precede it are selected as the prediction sample; the prediction sample is input to the regression prediction model to obtain the predicted boarding passenger flow of the target regional bus stop in the target time window, where d is an integer.
In the prior art, the boarding passenger flow at a single bus stop fluctuates strongly, so predictions based on a single stop perform poorly and have no practical guiding value. The invention proposes the concept of a "regional bus stop" (step S3): the bus stops within a given area are treated as one set, and the total boarding passenger flow of all stops in that area is counted and predicted as a whole, which better reflects residents' travel in the area and is more predictable. The grid cell size of a regional bus stop can be set flexibly according to the actual size of the whole study area and the positions and number of the bus stops it contains. In addition, because conventional bus passenger flow is sparser than subway passenger flow, the day is divided into multiple equal time windows to obtain better statistics and predictions; counting and predicting the boarding passenger flow within each time window, instead of at a single point in time, gives results of greater practical guiding value.
Further, step S1 specifically comprises:
Step S1.1: Obtain the bus IC card swipe records of passengers in the study area from the on-board POS terminals of the buses; each swipe record includes the passenger's identification number, the boarding time, and the license plate number of the bus boarded;
Step S1.2: Obtain the trajectory position information of each bus during its operating period from the bus's on-board positioning device; the trajectory position information includes the bus license plate number and, for each trajectory point, its time, longitude, and latitude.
Further, step S2 specifically comprises:
Step S2.1: Compare the bus position information obtained in step S1 with the actual bus route data, and search the bus route data for the location points that match the bus position information; the time associated with each matched location point is the time at which the bus arrived at the corresponding bus stop;
Here, the bus route data include the route number, stop name, stop sequence number, stop longitude, and stop latitude;
Step S2.2: Compare the passenger ride information obtained in step S1 with the inferred times at which the bus arrived at each bus stop, and infer each passenger's boarding stop.
Specifically, the route number is determined from the bus license plate number; the longitude and latitude of each trajectory point are then compared with the stop longitudes and latitudes, and where the two are close or identical, the trajectory point's time is associated one-to-one with the stop name. This gives the time at which each bus arrived at each stop on its route.
The bus is identified from the license plate number in the passenger ride information; the passenger's boarding time is then compared with the times, obtained above, at which that bus arrived at each stop on its route, which yields the passenger's boarding stop and boarding time.
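The two-stage matching described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the source does not specify how "close" coordinates must be or how a swipe time is compared with arrival times, so the tolerance `tol`, the rule "latest arrival not later than the swipe", the function names, and all sample coordinates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

def stop_arrivals(track, stops, tol=0.0005):
    """Match one bus's GPS track to the stops of its route.

    track: list of (time, lon, lat); stops: list of (name, lon, lat).
    A stop is "reached" at the first track point within `tol` degrees
    (assumed tolerance; the source only says "close or identical").
    """
    arrivals = []
    for name, s_lon, s_lat in stops:
        for t, lon, lat in track:
            if abs(lon - s_lon) <= tol and abs(lat - s_lat) <= tol:
                arrivals.append((t, name))
                break
    return sorted(arrivals)

def infer_boarding_stop(swipe_time, arrivals):
    """Assumed rule: the boarding stop is the latest stop reached at or
    before the swipe time. Returns None if no stop was reached yet."""
    times = [t for t, _ in arrivals]
    idx = bisect_right(times, swipe_time) - 1
    return arrivals[idx][1] if idx >= 0 else None

# Hypothetical data for a single bus.
stops = [("Stop A", 114.0500, 22.5400), ("Stop B", 114.0600, 22.5450)]
track = [
    (datetime(2014, 10, 30, 18, 0), 114.0501, 22.5401),
    (datetime(2014, 10, 30, 18, 3), 114.0550, 22.5425),
    (datetime(2014, 10, 30, 18, 6), 114.0599, 22.5449),
]
arrivals = stop_arrivals(track, stops)
boarding = infer_boarding_stop(datetime(2014, 10, 30, 18, 7), arrivals)
```

In practice the track and swipe records would first be grouped by bus license plate number, as the text describes, so that each swipe is compared only with the arrivals of the bus it was made on.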
Further, step S4 specifically comprises:
Step S4.1: Determine the target regional bus stop and the target time window, and obtain the boarding passenger flows of the target regional bus stop in (d+1) time windows on each of the n days before the prediction day containing the target time window, i.e. the total boarding passenger flow of all bus stops in the grid cell corresponding to the target regional bus stop in those time windows; n and d are integers;
Step S4.2: The boarding passenger flows of the target regional bus stop in the d time windows preceding the time slot of the target time window on each of the n days are taken as the first input parameter x of the training samples, and the boarding passenger flow of the target regional bus stop in the same time slot as the target time window on each of the n days is taken as the second input parameter y, to construct the training sample set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_i,y_i),\ldots,(x_n,y_n)\}$, $x_i\in\mathbb{R}^d$, $y_i\in\mathbb{R}$. Each sample $(x_i,y_i)$ contains both input parameters and thus holds the boarding passenger flows of (d+1) time windows within one day. Here $x_i$ has feature dimension d, i.e. the $x_i$ of each sample has d values, representing the boarding passenger flows on one day in the d time windows preceding the time slot of the target time window; the $x_i$ form the matrix $X=(x_1,x_2,\ldots,x_i,\ldots,x_n)^T$ with n rows and d columns, representing those boarding passenger flows over the n days. $y_i$ has feature dimension 1, i.e. the $y_i$ of each sample has a single value, representing the boarding passenger flow on one day in the same time slot as the target time window; the $y_i$ form the column matrix $Y=(y_1,y_2,\ldots,y_n)^T$ with n rows and 1 column, representing the known boarding passenger flows in that time slot over the n days. Each $y_i$ corresponds to one $x_i$;
Step S4.3: Train the random forest with the training sample set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_i,y_i),\ldots,(x_n,y_n)\}$, $x_i\in\mathbb{R}^d$, $y_i\in\mathbb{R}$, and establish the regression prediction model.
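The construction of X and Y in step S4.2 can be sketched as below. This is a minimal illustration, assuming the per-day boarding counts of the target regional bus stop are already available as one list per day, indexed by time window; the function name and all counts are hypothetical.

```python
def build_training_set(daily_counts, target_window, d):
    """Build (X, Y) as in step S4.2.

    daily_counts: n lists, one per day, of boarding counts per time window.
    target_window: index of the target time window within each day's list.
    d: number of preceding time windows used as features.
    """
    if target_window < d:
        raise ValueError("need d time windows before the target window")
    X, Y = [], []
    for day in daily_counts:
        X.append(day[target_window - d:target_window])  # x_i: d values
        Y.append(day[target_window])                    # y_i: one value
    return X, Y

# Hypothetical counts for n = 3 days, 5 time windows per day.
daily_counts = [
    [120, 150, 180, 160, 90],
    [110, 140, 170, 150, 80],
    [130, 160, 190, 170, 100],
]
X, Y = build_training_set(daily_counts, target_window=3, d=3)
```

Here X has n = 3 rows and d = 3 columns, and Y has 3 entries, matching the n-by-d matrix X and the n-by-1 column matrix Y defined in the text.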
Further, step S4.3 is specifically:
The training sample set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_i,y_i),\ldots,(x_n,y_n)\}$ is used as the input of the random forest algorithm; the number t of CART (classification and regression tree) trees and the depth deep of each tree are set, each node uses f features, and the model is trained;
Here, t, deep, and f are integers, and f takes the value d, the square root of d, or the base-2 logarithm of d;
Step S4.3.1: Sample with replacement from the training sample set D to form the bootstrap sample set of the j-th of the t CART trees; for each tree, starting from the root node, the nodes of the tree successively split the bootstrap sample set corresponding to that tree;
Here, j ranges from 1 to t;
Step S4.3.2: At each node of each tree, randomly choose f features without replacement from the d features of the training sample set, find among the f features the k-th feature, which gives the best split, take the k-th feature as the splitting feature and its corresponding feature value as the threshold, and split the current node if it does not meet the termination condition;
Here, the samples in the current node whose k-th feature is less than the threshold are assigned to the left child node, the remaining samples in the current node are assigned to the right child node, and k ranges from 1 to f;
Step S4.3.3: A current node that meets the termination condition becomes a leaf node, whose prediction output is the average of the sample values contained in the current node;
Here, the termination condition is that the number of samples contained in the current node is at its minimum and the information gain is at its minimum, at which point the current node stops splitting;
Step S4.3.4: Repeat steps S4.3.1 to S4.3.3 until every node has been trained or marked as a leaf node;
Step S4.3.5: Repeat steps S4.3.1 to S4.3.4 until all CART trees have been trained.
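The two sources of randomness in steps S4.3.1 and S4.3.2 (bootstrap sampling with replacement, and drawing f of the d features without replacement) can be sketched as follows. This is an illustration only: the source does not say how the square root or logarithm of d is rounded, so rounding down with a minimum of 1 is an assumption, and the function names and sample values are hypothetical.

```python
import math
import random

def feature_subset_size(d, rule="all"):
    """f = d, sqrt(d) or log2(d); rounding down (min 1) is an assumption."""
    if rule == "all":
        return d
    if rule == "sqrt":
        return max(1, math.isqrt(d))
    if rule == "log2":
        return max(1, int(math.log2(d)))
    raise ValueError(f"unknown rule: {rule}")

def bootstrap_sample(samples, rng):
    """Draw len(samples) samples with replacement (step S4.3.1)."""
    return [rng.choice(samples) for _ in samples]

def choose_features(d, f, rng):
    """Draw f distinct feature indices out of d (step S4.3.2)."""
    return rng.sample(range(d), f)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
D = [([120, 150, 180], 160), ([110, 140, 170], 150), ([130, 160, 190], 170)]
boot = bootstrap_sample(D, rng)
feats = choose_features(3, feature_subset_size(3, "all"), rng)
```

With f = d, as in the embodiment, every node considers all d features, so the trees differ only through their bootstrap samples.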
Further, the prediction process of the regression prediction model in step S5 is specifically:
Select the j-th CART tree and pass the prediction sample down that tree starting from its root node: according to the splitting feature and threshold of each node, a prediction sample whose feature value is less than the threshold goes to the left child node and otherwise goes to the right child node, until a leaf node of the current tree is reached and its predicted value is output; here, j ranges from 1 to t;
The above operation is repeated until all CART trees have output a predicted value; the average of the predicted values output by all CART trees is the output of the regression prediction model.
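The prediction process can be sketched as below. The nested-dictionary tree representation and the two hand-built trees are assumptions made for illustration, not a structure given in the source; only the "less than threshold goes left" rule and the averaging over trees come from the text.

```python
def predict_tree(node, x):
    """Walk one CART tree from the root until a leaf is reached.

    Internal nodes are dicts with "feature", "threshold", "left", "right";
    leaves are dicts with "value" (hypothetical representation).
    """
    while "value" not in node:
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]

def predict_forest(trees, x):
    """Average the predicted values of all trees."""
    outputs = [predict_tree(tree, x) for tree in trees]
    return sum(outputs) / len(outputs)

# Two hand-built trees with hypothetical thresholds and leaf values.
tree_1 = {"feature": 2, "threshold": 175,
          "left": {"value": 150.0}, "right": {"value": 170.0}}
tree_2 = {"feature": 0, "threshold": 115,
          "left": {"value": 150.0}, "right": {"value": 160.0}}

x_star = [125, 155, 185]  # boarding flows of the d = 3 preceding windows
prediction = predict_forest([tree_1, tree_2], x_star)
```

For this sample the first tree returns 170.0 and the second 160.0, so the forest output is their average.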
Beneficial effects
The invention addresses the problem that, during peak periods, overcrowded buses leave waiting passengers unable to board, i.e. insufficient urban bus capacity during peak periods. It proposes a short-term bus boarding passenger flow prediction method based on random forest: by introducing the concept of a regional bus stop and using a machine learning algorithm to mine and learn from historical data, it predicts the boarding passenger flow at regional bus stops with high prediction accuracy and practical guiding significance, provides the bus management system with more reliable predicted boarding flows, allows bus capacity to be adjusted in time, and improves the service level of public transport.
Brief description of the drawings
Fig. 1 is a flow chart of the short-term bus boarding passenger flow prediction method based on random forest provided by an embodiment of the invention;
Fig. 2 and Fig. 3 show the results of using the method to predict short-term bus boarding passenger flow in Shenzhen: Fig. 2 shows the fit between the observed values for 18:00-19:00 and the observed values for 19:00-20:00 on October 30, 2014, and Fig. 3 shows the fit between the predicted values and the observed values for all regions for 19:00-20:00 on October 30, 2014.
Embodiment
To make the short-term bus boarding passenger flow prediction method based on random forest provided by the invention easier to understand, it is described in detail below with reference to a specific embodiment.
An embodiment of the invention provides a short-term bus boarding passenger flow prediction method based on random forest; its implementation steps are shown in Fig. 1. The embodiment uses Shenzhen bus IC card swipe data and bus on-board GPS data from October 11, 2014 to October 31, 2014. The embodiment comprises the following steps:
Step S1: Obtain the Shenzhen bus IC card swipe data from October 11, 2014 to October 31, 2014; each swipe record contains information such as "passenger ID", "boarding time", and "bus ID". Obtain the Shenzhen bus on-board GPS data from October 11, 2014 to October 31, 2014; the data contain information such as "bus ID", "GPS point time", "GPS point longitude", and "GPS point latitude".
Step S2: Infer each passenger's boarding location and boarding time from the above information, i.e. the bus stop at which each of the passenger's swipe records occurred. To do so, the GPS data are first fused with the bus route and stop data to infer the time at which each bus arrived at each bus stop; the fused data are then further fused with the passenger IC card swipe data to infer each passenger's boarding stop and boarding time. The bus route data contain "route number", "stop name", "stop sequence number", "stop longitude", and "stop latitude". The data are fused by retaining all fields of the different data sets. The result contains the boarding stop information of Shenzhen passengers for the 21 days from October 11 to October 31, 2014, comprising 14109 trip records of 293 passengers in total after screening.
Step S3: The whole study area, Shenzhen (53914 bus stops in total), is divided into grid cells of equal size (each grid cell is 1 km × 1 km) and the grid cells are numbered; the bus stops contained in each grid cell are then aggregated to form a regional bus stop. The embodiment uses the swipe data generated between 6:00 and 22:00 each day and counts the bus boarding passenger flow in each grid cell for each time window of each day; the time window size is 1 h, giving 16 time windows per day.
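The grid and time-window aggregation of step S3 can be sketched as follows, using the embodiment's 1 km cells and 1 h windows between 6:00 and 22:00. Several details are assumptions, since the source does not give them: the flat-earth conversion from degrees to kilometres, the grid origin (`lon0`, `lat0`), the labelling of each window by its starting hour (inferred from the embodiment calling 16:00-20:00 "TW16-TW19"), the function names, and the sample coordinates.

```python
import math
from collections import Counter

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def grid_cell(lon, lat, lon0, lat0, cell_km=1.0):
    """Return the (row, col) index of the grid cell containing a point."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat0))
    row = math.floor((lat - lat0) * KM_PER_DEG_LAT / cell_km)
    col = math.floor((lon - lon0) * km_per_deg_lon / cell_km)
    return row, col

def time_window(hour, start_hour=6, end_hour=22):
    """Label a 1 h window by its starting hour; None outside 6:00-22:00."""
    return hour if start_hour <= hour < end_hour else None

def count_boardings(boardings, lon0, lat0):
    """Count boardings per (grid cell, time window)."""
    counts = Counter()
    for lon, lat, hour in boardings:
        tw = time_window(hour)
        if tw is not None:
            counts[(grid_cell(lon, lat, lon0, lat0), tw)] += 1
    return counts

# Hypothetical boardings: (stop longitude, stop latitude, boarding hour).
boardings = [
    (114.0500, 22.540, 18),
    (114.0505, 22.541, 18),
    (114.0500, 22.540, 19),
    (114.0500, 22.540, 23),  # outside the studied period, ignored
]
counts = count_boardings(boardings, lon0=113.75, lat0=22.45)
```

Aggregating by cell rather than by stop is what turns several nearby stops into one regional bus stop: the first two boardings above occur at different coordinates but fall in the same cell and window.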
Step S4: Determine the target regional bus stop (i.e. the grid cell to be predicted) and the target time window (i.e. the time window to be predicted), and compile the boarding passenger flows of the target regional bus stop in (d+1) time windows on each of the n days before the prediction day containing the target time window.
The boarding passenger flows of the target regional bus stop in the d time windows preceding the time slot of the target time window on each of the n days are taken as the first input parameter x of the training samples, and the boarding passenger flow of the target regional bus stop in the same time slot as the target time window on each of the n days is taken as the second input parameter y, to construct the training sample set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_i,y_i),\ldots,(x_n,y_n)\}$, $x_i\in\mathbb{R}^d$, $y_i\in\mathbb{R}$. Each sample $(x_i,y_i)$ contains both input parameters and thus holds the boarding passenger flows of (d+1) time windows within one day. Here $x_i$ has feature dimension d, i.e. the $x_i$ of each sample has d values, representing the boarding passenger flows on one day in the d time windows preceding the time slot of the target time window; the $x_i$ form the matrix $X=(x_1,x_2,\ldots,x_i,\ldots,x_n)^T$ with n rows and d columns, representing those boarding passenger flows over the n days. $y_i$ has feature dimension 1, i.e. the $y_i$ of each sample has a single value, representing the boarding passenger flow on one day in the same time slot as the target time window; the $y_i$ form the column matrix $Y=(y_1,y_2,\ldots,y_n)^T$ with n rows and 1 column, representing the known boarding passenger flows in that time slot over the n days. Each $y_i$ corresponds to one $x_i$.
The random forest is trained with the training sample set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_i,y_i),\ldots,(x_n,y_n)\}$, $x_i\in\mathbb{R}^d$, $y_i\in\mathbb{R}$, and the regression prediction model is established.
Step S5: Obtain the boarding passenger flows of the target regional bus stop in the d time windows that fall on the same day as the target time window and precede it, and use them to construct the prediction sample $x^*$, $x^*\in\mathbb{R}^d$. The prediction sample $x^*$ has the same feature dimension d as each sample of the first input parameter x, i.e. the prediction sample and the first input parameter contain the boarding passenger flows of the same number of time windows. Input the prediction sample $x^*$ into the regression prediction model to obtain the predicted boarding passenger flow of the target regional bus stop in the target time window.
Specifically, in this embodiment, the boarding passenger flows of the target grid cell in the 16:00-20:00 period (i.e. time windows TW16-TW19) from October 27 to October 30, 2014 are analyzed, with the data of the 27th to the 29th as the training set and the data of the 30th as the prediction set. In the corresponding training feature set D, d=3, covering the 3 time windows TW16-TW18, and n=3, covering the three days from the 27th to the 29th; the prediction set $x^*$ represents the boarding passenger flows of the target regional bus stop for 16:00-19:00 (TW16-TW18) on the 30th. The machine learning algorithm used in this embodiment is random forest; its implementation comprises a training process and a prediction process, as follows:
Training process:
1. The training set is the training feature set $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}$, $x_i\in\mathbb{R}^d$, $y_i\in\mathbb{R}$. The test set is the prediction set $x^*\in\mathbb{R}^d$; the training feature set and the prediction set both have d features. A total of t CART trees are generated, each of depth deep, with f features used at each node; when the number of samples contained in a node is at its minimum and the information gain is at its minimum, the node stops splitting. In this embodiment, t is 10, deep keeps its initial value, which is empty, and f=d.
2. Sample with replacement from the training feature set D to form Train(j), the training set of the j-th CART tree, where j=1,2,3,...,10; training starts from the root node.
3. If the current node meets the termination condition, it becomes a leaf node whose prediction output is the average of the sample values in the sample set contained in the current node, and training continues with the other nodes. If the current node does not meet the termination condition, f features are randomly selected without replacement, in a certain proportion, from the d features (f is generally d, sqrt(d), or log2(d); in this embodiment f=d). The feature giving the best split is then found, i.e. the one for which the variance Var of the current node's sample set, minus the variance VarLeft of the left child node, minus the variance VarRight of the right child node, is largest. This feature is denoted the k-th feature (1≤k≤f) and its corresponding feature value is the threshold Threshold. The samples in the current node whose k-th feature is less than Threshold are assigned to the left child node and the remaining samples to the right child node, and training continues with the other nodes.
4. Repeat steps 2 and 3 until every node has been trained or marked as a leaf node.
5. Repeat steps 2, 3, and 4 until all CART trees have been trained.
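The split search in step 3 of the training process can be sketched as below. The criterion is implemented literally as the text states it (Var minus VarLeft minus VarRight, unweighted); note that common CART implementations weight the child variances by sample count, which this sketch does not do. Using the observed feature values as candidate thresholds, skipping splits that leave a child empty, and the sample data are all assumptions.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(X, y, features):
    """Return (gain, k, threshold) maximising Var - VarLeft - VarRight.

    Samples whose k-th feature is below the threshold go left, the rest
    go right. Returns None when no candidate yields two non-empty children.
    """
    parent_var = variance(y)
    best = None
    for k in features:
        for threshold in sorted({row[k] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[k] < threshold]
            right = [yi for row, yi in zip(X, y) if row[k] >= threshold]
            if not left or not right:
                continue
            gain = parent_var - variance(left) - variance(right)
            if best is None or gain > best[0]:
                best = (gain, k, threshold)
    return best

# Hypothetical node: feature 0 separates low and high flows, feature 1 is constant.
X = [[1, 10], [2, 10], [8, 10], [9, 10]]
y = [10, 12, 30, 32]
split = best_split(X, y, features=[0, 1])
```

On this data the chosen split is on feature 0 at threshold 8, which separates the two low targets from the two high ones; the constant feature 1 never produces two non-empty children and is skipped.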
Prediction process:
1. For the j-th CART tree, start from the root node of the tree; according to the splitting feature k and threshold Threshold of the current node, a sample below Threshold goes to the left child node and otherwise goes to the right child node, until a leaf node is reached and its predicted value is output.
2. Repeat the previous step until all t CART trees have output a predicted value; the predicted value of the random forest model is the average of the outputs of all CART trees.
Specifically, in this embodiment, the boarding passenger flow of all grid cells for 19:00-20:00 on October 30 is predicted and the predicted values are compared with the observed values; the results are shown in Fig. 2 and Fig. 3. Fig. 2 shows the fit between the observed values for 18:00-19:00 and the observed values for 19:00-20:00 on October 30, 2014; Fig. 3 shows the fit between the predicted values and the observed values for all regions for 19:00-20:00 on October 30, 2014. In the figures, R² is the coefficient of determination (goodness of fit); the larger it is, the more densely the points cluster around the regression line and the better the prediction. RMSE is the root-mean-square error; the smaller its value, the better the prediction. As the figures show, the predictions of the method fit the observed values well.
In summary, the short-term bus boarding passenger flow prediction method based on random forest provided by the invention addresses the problem that, during peak periods, overcrowded buses leave waiting passengers unable to board, i.e. insufficient urban bus capacity during peak periods. By introducing the concept of a regional bus stop and using a machine learning algorithm to mine and learn from historical data, it predicts the boarding passenger flow at regional bus stops, provides the bus management system with more reliable predicted boarding flows, allows bus capacity to be adjusted in time, and improves the service level of public transport.
The above are only embodiments of the invention and do not limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.
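The two evaluation measures named above can be computed as in the following sketch. The source does not state the formulas it used; the definitions here (RMSE as the square root of the mean squared error, R² as one minus the ratio of residual to total sum of squares) are the standard ones and are assumed, and the observed and predicted values are hypothetical, not results from the patent.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical boarding flows of four grid cells in one time window.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

A perfect prediction gives an RMSE of 0 and an R² of 1, which is why a smaller RMSE and a larger R² both indicate a better fit.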

Claims (6)

  1. The Forecasting Methodology of the volume of the flow of passengers 1. a kind of public transport in short-term based on random forest is got on the bus, it is characterised in that including:
    Step S1:Obtain passenger's riding information and bus positional information in survey region;
    Step S2:By the step S1 passenger's riding informations obtained and bus positional information, the website of getting on the bus of passenger is extrapolated;
    Step S3:Zoning bus station and time window;
    The survey region is divided into an equal amount of grid spaces, and the grid spaces are numbered, by same grid Comprising bus station polymerize, obtain region bus station, whole day search time be divided into an equal amount of time Window, count the volume of the flow of passengers of getting on the bus of regional bus station in each time window;
    Step S4:Random forest grader is trained, establishes regressive prediction model;
    Target area bus station and object time window are determined, with the target area bus station in the object time window institute The volume of the flow of passengers of getting on the bus of (d+1) individual time window before day is predicted in n days is inputted random as training sample using training sample Forest classified device is trained, and establishes regressive prediction model;
    Wherein, every day, n and d were integer in the volume of the flow of passengers of getting on the bus of (d+1) individual time window as a sample data;
    Step S5:Forecast sample is built, the forecast sample is inputted into regressive prediction model, target area bus station is obtained and exists The prediction of object time window is got on the bus the volume of the flow of passengers;
    Choose d time window on the day of the target area bus station is located at object time window and being located at before object time window The volume of the flow of passengers of getting on the bus as forecast sample, the forecast sample is inputted into the regressive prediction model, obtains target area public transport Website is got on the bus the volume of the flow of passengers in the prediction of object time window, and d is integer.
  2. 2. Forecasting Methodology according to claim 1, it is characterised in that the step S1 is specifically included:
    Step S1.1:The bus IC card card using information of the passenger in survey region is obtained by the vehicle-mounted POS of bus, it is described Card using information includes identification number, pick-up time and the bus of the seating license plate number of passenger;
    Step S1.2:Wheelpath positional information in the bus running period is obtained by bus vehicle positioning equipment, The wheelpath positional information include bus license plate number, trajectory location points correspond to the time, trajectory location points correspond to longitude and Trajectory location points corresponding latitude.
  3. The prediction method according to claim 2, characterized in that step S2 specifically comprises:
    Step S2.1: compare the bus position information obtained in step S1 with the actual bus route data, and search the bus trajectory data for the location points that match the bus position information; the time associated with each matched location point is the time at which the bus arrived at the corresponding bus stop;
    wherein the bus route data include the route number, stop name, stop sequence number, stop longitude, and stop latitude;
    Step S2.2: compare the passenger riding information obtained in step S1 with the inferred times at which the bus arrived at each bus stop, and infer the boarding stop of each passenger.
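One way to realize steps S2.1 and S2.2 is sketched below. The claim does not state how points are matched, so several choices here are assumptions: the nearest trajectory point within a distance tolerance (`max_dist_m`, hypothetical) is taken as the arrival at a stop, times are plain numbers (seconds), the track covers a single trip, and a passenger is assigned to the most recent stop reached at or before the swipe time.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Tuple


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two longitude/latitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stop_arrival_times(track: List[Tuple[float, float, float]],
                       stops: List[Tuple[str, float, float]],
                       max_dist_m: float = 50.0) -> List[Tuple[float, str]]:
    """Step S2.1 sketch: track holds (time, lon, lat), stops holds (name, lon, lat).

    Returns (arrival_time, stop_name) pairs sorted by time. A stop is skipped
    when no trajectory point lies within max_dist_m of it.
    """
    arrivals = []
    if not track:
        return arrivals
    for name, slon, slat in stops:
        best = min(track, key=lambda p: haversine_m(p[1], p[2], slon, slat))
        if haversine_m(best[1], best[2], slon, slat) <= max_dist_m:
            arrivals.append((best[0], name))
    return sorted(arrivals)


def boarding_stop(swipe_time: float, arrivals: List[Tuple[float, str]]) -> Optional[str]:
    """Step S2.2 sketch: the last stop reached at or before the swipe time."""
    times = [t for t, _ in arrivals]
    i = bisect_right(times, swipe_time) - 1
    return arrivals[i][1] if i >= 0 else None


track = [(0, 112.9300, 28.1700), (60, 112.9350, 28.1700), (120, 112.9400, 28.1700)]
stops = [("A", 112.9300, 28.1700), ("B", 112.9400, 28.1700)]
arrivals = stop_arrival_times(track, stops)
```

Counting the inferred boarding stops per time window then yields the boarding passenger flow series used in the later steps.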
  4. The prediction method according to claim 3, characterized in that step S4 specifically comprises:
    Step S4.1: determine the target-area bus stop and the target time window, and obtain the boarding passenger flows of the target-area bus stop in (d+1) time windows on each of the n days preceding the prediction day on which the target time window falls, where n and d are integers;
    Step S4.2: take the boarding passenger flows of the target-area bus stop, on each of the n days, in the d time windows preceding the period that corresponds to the target time window as the first input parameter x of a training sample, and take the boarding passenger flow of the target-area bus stop, on each of the n days, in the period that corresponds to the target time window as the second input parameter y of the training sample, thereby constructing the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$; each sample $(x_i, y_i)$ contains both input parameters, wherein $x_i$ has feature dimension d, and d is an integer;
    Step S4.3: train a random forest with the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, to establish the regression prediction model.
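The construction of the training set D in step S4.2 can be sketched as below, assuming the historical boarding counts are stored as one list per day, indexed by time window. The function name `build_training_set` and this layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_training_set(daily_counts: Sequence[Sequence[int]],
                       target_window: int,
                       d: int) -> List[Tuple[List[int], int]]:
    """Build D = [(x_1, y_1), ..., (x_n, y_n)] from n days of per-window counts.

    For each day, x is the boarding flow in the d windows before the target
    window and y is the boarding flow in the target window itself.
    """
    if not 0 < d <= target_window:
        raise ValueError("need d >= 1 windows before the target window")
    samples = []
    for day in daily_counts:
        x = list(day[target_window - d:target_window])
        y = day[target_window]
        samples.append((x, y))
    return samples


# Two historical days, four windows each; target window index 3, d = 2.
D = build_training_set([[3, 5, 9, 14], [4, 6, 10, 15]], target_window=3, d=2)
```

Each day contributes exactly one sample built from (d+1) windows, so n days give n samples. The resulting pairs could be handed to any random forest regression implementation, for example scikit-learn's `RandomForestRegressor`, which is an option chosen here for illustration and is not named in the source.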
  5. The prediction method according to claim 4, characterized in that step S4.3 is specifically:
    take the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$ as the input of the random forest algorithm, set the number t of CART (classification and regression) trees and the depth deep of each tree, use f feature dimensions at each node, and train the model;
    wherein t, deep, and f are integers, and f takes the value d, the square root of d, or the base-2 logarithm of d;
    Step S4.3.1: sample from the training sample set D with replacement to form the bootstrap sample set of the j-th of the t CART trees; for each tree, starting from the root node, the nodes of the tree successively partition the bootstrap sample set corresponding to that tree;
    wherein j ranges from 1 to t;
    Step S4.3.2: at each node of each tree, randomly select f feature dimensions without replacement from the d feature dimensions of the training sample set, and find among the f dimensions the k-th feature dimension with the best classification effect; using the k-th feature dimension as the splitting feature and its corresponding feature value as the threshold, split the current node if it does not satisfy the termination condition;
    wherein the samples in the current node whose k-th feature is less than the threshold are assigned to the left child node, the remaining samples in the current node are assigned to the right child node, and k ranges from 1 to f;
    Step S4.3.3: a current node that satisfies the termination condition becomes a leaf node, whose prediction output is the mean of the sample values contained in the current node;
    wherein the termination condition is that the number of samples contained in the current node reaches the minimum and the information gain reaches the minimum, whereupon the current node stops splitting;
    Step S4.3.4: repeat steps S4.3.1 to S4.3.3 until all nodes have been trained or marked as leaf nodes;
    Step S4.3.5: repeat steps S4.3.1 to S4.3.4 until all CART trees have been trained.
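The training loop of steps S4.3.1 to S4.3.5 can be sketched in plain Python as below. Several points are assumptions rather than the patent's stated method: the "best" split is scored by reduction in the sum of squared errors (a common regression criterion standing in for the claim's "information gain"), a node stops when it hits the depth limit, has too few samples, or gains too little (the claim combines the last two), and the names `train_forest`, `grow_tree`, `min_samples`, and `min_gain` are hypothetical.

```python
import random
from statistics import fmean


def sse(ys):
    """Sum of squared errors of ys around their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(samples, features):
    """Return (gain, feature, threshold) with the largest SSE reduction, or None."""
    parent = sse([y for _, y in samples])
    best = None
    for k in features:
        for threshold in sorted({x[k] for x, _ in samples}):
            left = [y for x, y in samples if x[k] < threshold]
            right = [y for x, y in samples if x[k] >= threshold]
            if not left or not right:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, k, threshold)
    return best


def grow_tree(samples, f, depth, rng, min_samples=2, min_gain=1e-9):
    """Grow one regression tree as nested dicts (steps S4.3.2 to S4.3.4)."""
    leaf = {"value": fmean(y for _, y in samples)}  # leaf output: mean of sample values
    if depth == 0 or len(samples) < min_samples:
        return leaf
    d = len(samples[0][0])
    candidates = rng.sample(range(d), min(f, d))  # f features, without replacement
    split = best_split(samples, candidates)
    if split is None or split[0] < min_gain:
        return leaf
    _, k, threshold = split
    left = [s for s in samples if s[0][k] < threshold]
    right = [s for s in samples if s[0][k] >= threshold]
    return {"feature": k, "threshold": threshold,
            "left": grow_tree(left, f, depth - 1, rng, min_samples, min_gain),
            "right": grow_tree(right, f, depth - 1, rng, min_samples, min_gain)}


def train_forest(samples, t, deep, f, seed=0):
    """Train t trees, each on a bootstrap sample drawn with replacement (S4.3.1, S4.3.5)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(t):
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(grow_tree(bootstrap, f, deep, rng))
    return forest


samples = [([1, 2], 10), ([2, 3], 20), ([3, 4], 30), ([4, 5], 40)]
forest = train_forest(samples, t=5, deep=3, f=1)
```

Here f = 1 roughly corresponds to the base-2 logarithm of d = 2; the other two values allowed by the claim, d and the square root of d, would be passed the same way.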
  6. The prediction method according to claim 5, characterized in that the prediction process of the regression prediction model in step S5 is specifically:
    select the j-th CART tree and route the prediction sample through it starting from the root node of the current tree; according to the splitting feature and threshold of each node, a prediction sample whose feature value is less than the threshold goes to the left child node and any other sample goes to the right child node, until a leaf node of the current tree is reached and its predicted value is output; wherein j ranges from 1 to t;
    repeat the above operation until all CART trees have output a predicted value; the mean of the predicted values output by all CART trees is the output of the regression prediction model.
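The prediction procedure of claim 6 amounts to walking each tree from root to leaf and averaging the leaf values. The sketch below assumes each tree is a nested dict with `feature`, `threshold`, `left`, and `right` keys for internal nodes and a `value` key for leaves; this representation and the function names are hypothetical.

```python
from statistics import fmean


def predict_tree(tree, x):
    """Route sample x from the root to a leaf and return the leaf's value."""
    node = tree
    while "value" not in node:
        # Below the threshold goes left, everything else goes right.
        if x[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


def predict_forest(forest, x):
    """Average the predictions of all trees in the forest."""
    return fmean([predict_tree(tree, x) for tree in forest])


# Two hand-built trees for illustration only.
tree_a = {"feature": 0, "threshold": 10,
          "left": {"value": 12.0}, "right": {"value": 30.0}}
tree_b = {"value": 20.0}

low = predict_forest([tree_a, tree_b], [8, 15])    # tree_a goes left: (12 + 20) / 2
high = predict_forest([tree_a, tree_b], [11, 15])  # tree_a goes right: (30 + 20) / 2
```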
CN201710609933.4A 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest Active CN107563540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710609933.4A CN107563540B (en) 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest

Publications (2)

Publication Number Publication Date
CN107563540A true CN107563540A (en) 2018-01-09
CN107563540B CN107563540B (en) 2021-03-30

Family

ID=60974256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710609933.4A Active CN107563540B (en) 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest

Country Status (1)

Country Link
CN (1) CN107563540B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169606A (en) * 2010-02-26 2011-08-31 同济大学 Method for predicting influence of heavy passenger flow of urban rail transit network
CN102436603A (en) * 2011-08-29 2012-05-02 北京航空航天大学 Rail transit full-road-network passenger flow prediction method based on probability tree destination (D) prediction
CN105095993A (en) * 2015-07-22 2015-11-25 济南市市政工程设计研究院(集团)有限责任公司 System and method for predicting passenger flow volume of railway stations
US20150356458A1 (en) * 2014-06-10 2015-12-10 Jose Oriol Lopez Berengueres Method And System For Forecasting Future Events

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108415885A (en) * 2018-02-08 2018-08-17 武汉蓝泰源信息技术有限公司 The real-time bus passenger flow prediction technique returned based on neighbour
CN108877223A (en) * 2018-07-13 2018-11-23 南京理工大学 A kind of Short-time Traffic Flow Forecasting Methods based on temporal correlation
CN109035770B (en) * 2018-07-31 2022-01-04 上海世脉信息科技有限公司 Real-time analysis and prediction method for bus passenger capacity in big data environment
CN109035770A (en) * 2018-07-31 2018-12-18 上海世脉信息科技有限公司 The real-time analyzing and predicting method of public transport passenger capacity under a kind of big data environment
CN109711428A (en) * 2018-11-20 2019-05-03 佛山科学技术学院 A kind of saturated gas pipeline internal corrosion speed predicting method and device
CN109741597B (en) * 2018-12-11 2020-09-29 大连理工大学 Bus section operation time prediction method based on improved deep forest
CN109741597A (en) * 2018-12-11 2019-05-10 大连理工大学 A kind of bus section runing time prediction technique based on improvement depth forest
CN111105070A (en) * 2019-11-20 2020-05-05 深圳市北斗智能科技有限公司 Passenger flow early warning method and system
CN111105070B (en) * 2019-11-20 2024-04-16 深圳市北斗智能科技有限公司 Passenger flow early warning method and system
CN113570099A (en) * 2020-04-28 2021-10-29 百度在线网络技术(北京)有限公司 Departure interval prediction method, prediction model training method, device and equipment
CN112235362A (en) * 2020-09-28 2021-01-15 北京百度网讯科技有限公司 Position determination method, device, equipment and storage medium
CN112235362B (en) * 2020-09-28 2022-08-30 北京百度网讯科技有限公司 Position determination method, device, equipment and storage medium
WO2021189950A1 (en) * 2020-10-29 2021-09-30 平安科技(深圳)有限公司 Short-time bus station passenger flow prediction method and apparatus, and computer device and storage medium
CN112949939A (en) * 2021-03-30 2021-06-11 福州市电子信息集团有限公司 Taxi passenger carrying hotspot prediction method based on random forest model
CN113392880B (en) * 2021-05-27 2021-11-23 扬州大学 Traffic flow short-time prediction method based on deviation correction random forest
CN113392880A (en) * 2021-05-27 2021-09-14 扬州大学 Traffic flow short-time prediction method based on deviation correction random forest

Also Published As

Publication number Publication date
CN107563540B (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN107563540A (en) Method for predicting short-time bus boarding passenger flow based on random forest
CN108133302B (en) Public bicycle potential demand prediction method based on big data
CN105788260B (en) Bus passenger OD estimation method based on intelligent public transportation system data
CN103177575B (en) System and method for dynamically optimizing online dispatching of urban taxies
CN110264709A (en) Road traffic flow prediction method based on graph convolutional network
CN103984994B (en) Method for predicting urban rail transit passenger flow peak duration
CN110836675B (en) Decision tree-based automatic driving search decision method
CN105206048A (en) Urban resident traffic transfer mode discovery system and method based on urban traffic OD data
CN103310287A (en) Rail transit passenger flow predicting method for predicting passenger travel probability and based on support vector machine (SVM)
CN102324128A (en) Method for predicting OD (Origin-Destination) passenger flow among bus stations on basis of IC (Integrated Circuit)-card record and device
CN109544690A (en) Shared bicycle trip influence factor recognition methods, system and storage medium
CN106898142B (en) Route travel time reliability calculation method considering road section correlation
CN107919014A (en) Taxi towards more carrying kilometres takes in efficiency optimization method
CN109840272B (en) Method for predicting user demand of shared electric automobile station
CN106327867B (en) Bus punctuation prediction method based on GPS data
CN106777169A (en) User travel preference analysis method based on Internet of Vehicles data
JP6307376B2 (en) Traffic analysis system, traffic analysis program, and traffic analysis method
CN109800903A (en) A kind of profit route planning method based on taxi track data
CN113642757A (en) Internet of things charging pile construction planning method and system based on artificial intelligence
CN111008730B (en) Crowd concentration prediction model construction method and device based on urban space structure
Tan et al. Statistical analysis and prediction of regional bus passenger flows
CN115994787A (en) Car pooling demand prediction matching method based on neural network
CN116090785A (en) Custom bus planning method for two stages of large-scale movable loose scene
CN109886746A (en) Trip purpose recognition method based on passenger alighting time and place
CN114239929A (en) Taxi traffic demand characteristic prediction method based on random forest

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant