CN107563540B - Method for predicting short-time bus boarding passenger flow based on random forest - Google Patents


Publication number
CN107563540B
Authority
CN
China
Prior art keywords
bus
prediction
passenger flow
dividing
time window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710609933.4A
Other languages
Chinese (zh)
Other versions
CN107563540A (en
Inventor
王璞
凌溪蔓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Traffic Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a random-forest-based method for predicting short-time bus boarding passenger flow, comprising the following steps: acquiring passenger riding information and bus position information in a research area; calculating each passenger's boarding stop from the passenger riding information and the bus position information; dividing the area into regional bus stops and the day into time windows; training a random forest classifier and establishing a regression prediction model; and constructing a prediction sample, inputting it into the regression prediction model, and obtaining the predicted boarding passenger flow of the target area bus stop in the target time window. By introducing the concept of a regional bus stop and adopting the random forest algorithm, the invention obtains high-precision prediction results of practical guiding significance.

Description

Method for predicting short-time bus boarding passenger flow based on random forest
Technical Field
The invention relates to the technical field of traffic, in particular to a method for predicting the short-time bus boarding passenger flow based on a random forest.
Background
Public transport plays a leading role in a city's overall transport capacity, but at present the public transport capacity of most cities in China is insufficient, especially during peak hours, which makes predicting and studying the short-term passenger flow at each bus stop very important. Predicting the short-time passenger flow of each bus stop provides more reliable passenger flow forecasts for bus operation management systems, allows bus capacity to be adjusted in time, and relieves crowding for bus passengers. However, current research mainly focuses on optimizing the planning and design of urban public transport networks and on optimizing public transport management, and has the following problems:
1. Because little data is available to support these areas, studies often rely on qualitative analysis.
2. Short-term prediction of traffic flow has been widely studied, but few studies address short-term prediction of bus passenger flow.
3. Existing prediction is based on a single bus stop, and because the passenger flow of a single bus stop fluctuates strongly, the prediction effect is poor.
Disclosure of Invention
The invention provides a method for predicting the short-time bus getting-on passenger flow based on a random forest, which can solve the problems in the prior art.
The invention provides a method for predicting the short-time bus boarding passenger flow based on a random forest, which comprises the following steps:
step S1: acquiring passenger riding information and bus position information in a research area;
step S2: calculating the getting-on station of the passenger according to the passenger taking information and the bus position information obtained in the step S1;
step S3: dividing a regional bus stop and a time window;
dividing the research area into square areas with the same size, numbering the square areas, aggregating bus stops contained in the same square to obtain regional bus stops, dividing the whole-day research time into time windows with the same size, and counting the passenger flow of each regional bus stop in each time window;
step S4: training a random forest classifier, and establishing a regression prediction model;
determining a target area bus stop and a target time window, taking the passenger flow volume of the target area bus stop in (d +1) time windows in n days before the prediction date of the target time window as a training sample, inputting the training sample into a random forest classifier for training, and establishing a regression prediction model;
wherein, the passenger flow volume of getting on the bus in (d +1) time windows every day is taken as a sample data, and n and d are integers;
step S5: constructing a prediction sample, inputting the prediction sample into a regression prediction model, and obtaining the predicted getting-on passenger flow of the bus stop in the target area in the target time window;
and selecting the boarding passenger flow of the target area bus stop in d time windows which are positioned on the same day and before the target time window as a prediction sample, inputting the prediction sample into the regression prediction model, and obtaining the predicted boarding passenger flow of the target area bus stop in the target time window, wherein d is an integer.
In the prior art, the passenger flow fluctuation of a single bus stop is large, so the passenger flow prediction effect based on the single bus stop is poor, and the practical guiding significance is not provided. The invention creatively provides a concept of 'regional bus stops' (namely step S3), and by taking the bus stops in a certain region as a set, the total passenger flow of all bus stops in the region is integrally counted and predicted, the travel information of residents in the region can be better reflected, and the predictability is better. The size of the grids of the regional bus stop can be flexibly determined according to the actual size of the whole research region and the positions and the number of the included bus stop positions. Meanwhile, as the passenger flow of the ground public transport is sparse compared with that of the subway, in order to achieve better statistical and prediction effects, the whole day time is divided into a plurality of equal time windows, and the passenger flow of the bus in each time window is counted and predicted so as to replace the passenger flow statistics and prediction of a certain time point, so that the ground public transport has better practical guiding significance.
Further, the step S1 specifically includes:
step S1.1: acquiring bus IC card swiping information of passengers in a research area through a bus-mounted card swiping machine, wherein the card swiping information comprises the identity numbers of the passengers, the boarding time and the number of the taken bus;
step S1.2: acquiring driving track position information during the bus operating period through bus-mounted positioning equipment, wherein the driving track position information comprises the bus license plate number and, for each track position point, the corresponding time, longitude and latitude.
Further, the step S2 specifically includes:
step S2.1: comparing the bus position information obtained in the step S1 with actual bus route data, and searching a position point matched with the bus position information from the bus route data, wherein the time information corresponding to the position point is the specific time when the bus arrives at each bus stop;
the bus line data comprises a line number, a station name, a station serial number, a station longitude and a station latitude;
step S2.2: comparing the passenger riding information obtained in the step S1 with the calculated specific time of the bus arriving at each bus stop, and calculating the boarding stop of the passenger.
Specifically, the corresponding line number is determined through the bus number, then the corresponding longitude and the corresponding latitude of the track position point are compared with the longitude and the latitude of the stop, and the one-to-one correspondence relationship is established between the corresponding time of the track position point and the name of the stop when the longitude and the latitude of the track position point are close or the same, so that the specific time of each bus reaching each bus stop on the corresponding line is obtained.
And determining the corresponding bus according to the number plate of the bus taken in the passenger taking information, and then comparing the getting-on time of the passenger with the obtained specific time of the bus reaching each bus stop on the corresponding line, thereby obtaining the getting-on stop and the getting-on time of the passenger.
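The matching described in step S2 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the haversine distance, the 50 m proximity threshold, and the rule that a card swipe is assigned to the most recent stop arrival are all assumptions introduced here.

```python
import bisect
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stop_arrivals(gps_points, stops, max_dist_m=50.0):
    """Step S2.1 (sketch): derive (time, stop_name) arrivals for one bus.

    gps_points: list of (time, lat, lon), sorted by time.
    stops: list of (stop_name, lat, lon) on the bus's line.
    max_dist_m is a hypothetical threshold for "close or the same" position.
    """
    arrivals = []
    for t, lat, lon in gps_points:
        name, dist = min(
            ((s_name, haversine_m(lat, lon, s_lat, s_lon)) for s_name, s_lat, s_lon in stops),
            key=lambda pair: pair[1],
        )
        # Record only the first track point seen near each successive stop.
        if dist <= max_dist_m and (not arrivals or arrivals[-1][1] != name):
            arrivals.append((t, name))
    return arrivals


def boarding_stop(swipe_time, arrivals):
    """Step S2.2 (sketch): the latest stop reached at or before the swipe time."""
    times = [t for t, _ in arrivals]
    i = bisect.bisect_right(times, swipe_time) - 1
    return arrivals[i][1] if i >= 0 else None
```

A swipe recorded before the first matched arrival returns `None`; how such records should be handled is not specified in the text.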
Further, the step S4 specifically includes:
step S4.1: determining a target area bus stop and a target time window, and acquiring the getting-on passenger flow of (d +1) time windows of the target area bus stop in n days before the prediction date of the target time window, namely the total getting-on passenger flow of (d +1) time windows of all bus stops in a grid area corresponding to the target area bus stop in n days before the prediction date of the target time window, wherein n and d are integers;
step S4.2: taking the boarding passenger flow of the target area bus stop in the d time windows before the target time window on each of the n days as a first input parameter x of a training sample, taking the boarding passenger flow of the target area bus stop in the target time window on each of the n days as a second input parameter y of the training sample, and constructing a training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$; each sample $(x_i, y_i)$ contains both input parameters, that is, the boarding passenger flow of (d+1) time windows of one day; $x_i$ has feature dimension d, i.e. $x_i$ in each sample holds d values representing the boarding passenger flow of the d time windows preceding the period of that day that coincides with the target time window; the $x_i$ form the matrix $X = (x_1, x_2, \dots, x_i, \dots, x_n)^T$, which has n rows and d columns and represents the boarding passenger flow of those d time windows over the n days; $y_i$ has feature dimension 1, i.e. $y_i$ in each sample is a single value representing the boarding passenger flow in the period of that day that coincides with the target time window; the $y_i$ form the column matrix $Y = (y_1, y_2, \dots, y_n)^T$, which has n rows and 1 column and represents the known boarding passenger flow in that period over the n days; $y_i$ and $x_i$ correspond one-to-one;
step S4.3: training a random forest classifier with the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, and establishing a regression prediction model.
Further, step S4.3 specifically includes:
taking the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, the number t of CART classification regression trees, and the depth deep of each tree as input parameters of the random forest algorithm, with each node using f-dimensional features for model training;
wherein t, deep and f are integers, and f takes the value d, the square root of d, or the base-2 logarithm of d;
step S4.3.1: sampling from the training sample set D with replacement to form the bootstrap sample set of the j-th of the t CART classification regression trees, and, starting from the root node, successively partitioning each tree's bootstrap sample set using the nodes of that tree;
wherein j has a value ranging from 1 to t;
step S4.3.2: at each node of each tree, randomly selecting f-dimensional features from the d-dimensional features of the training sample set without replacement, finding among them the k-th dimension feature with the best classification effect, and splitting each current node that does not meet the termination condition, using the k-th dimension feature as the splitting feature and its corresponding feature value as the threshold;
dividing the samples in the current node whose k-th dimension feature is smaller than the threshold into the left node and the remaining samples into the right node, wherein k ranges from 1 to f;
step S4.3.3: making each current node that meets the termination condition a leaf node, whose predicted output is the average of all sample values contained in that node;
wherein the termination condition is that the current node stops splitting when the number of samples it contains is minimal and the information gain is minimal;
step S4.3.4: repeating steps S4.3.1 through S4.3.3 until all nodes are trained or marked as leaf nodes;
step S4.3.5: steps S4.3.1 through S4.3.4 are repeated until all CART classification regression trees are trained.
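The two sources of randomness in steps S4.3.1 and S4.3.2 can be sketched as follows. The helper names are hypothetical, and drawing a bootstrap set of the same size as D is an assumption (it is the common convention, but the text does not state the sample size).

```python
import random


def bootstrap_sample(samples, rng):
    """Step S4.3.1 (sketch): draw len(samples) items from D with replacement."""
    return [samples[rng.randrange(len(samples))] for _ in samples]


def candidate_features(d, f, rng):
    """Step S4.3.2 (sketch): pick f distinct feature indices out of d, without replacement."""
    return rng.sample(range(d), f)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when comparing trees across runs.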
Further, the prediction process of the regression prediction model in step S5 specifically includes:
selecting a jth CART classification regression tree, dividing prediction samples from a root node of a current tree through the CART classification regression tree, dividing the prediction samples smaller than a threshold value to a left node according to the division characteristics and the threshold value of the nodes, dividing the rest samples to a right node until the leaf nodes of the current tree are reached, and outputting a predicted value; wherein j has a value ranging from 1 to t;
and repeating the operation until all the CART classification regression trees output predicted values, wherein the average value of the output predicted values of all the CART classification regression trees is the output value of the regression prediction model.
Advantageous effects
The invention provides a method for predicting the passenger flow of a short-time bus on the basis of a random forest aiming at the problem that passengers waiting for the bus cannot get on the bus due to excessive crowding in the bus at a peak time, namely the urban bus capacity at the peak time is insufficient.
Drawings
FIG. 1 is a flow chart of a method for predicting the passenger flow volume of a short-time bus based on a random forest according to an embodiment of the present invention;
Figs. 2 and 3 present the results of applying the method to predict short-time bus boarding passenger flow in Shenzhen, wherein Fig. 2 shows the fit between the observed values for 18:00-19:00 and the observed values for 19:00-20:00 on October 30, 2014, and Fig. 3 shows the fit between the predicted values and the observed values for 19:00-20:00 on October 30, 2014 across all regions.
Detailed Description
In order to better understand the method for predicting the short-time bus boarding passenger flow based on the random forest, which is provided by the invention, the following detailed explanation is carried out by combining a specific embodiment.
The embodiment of the invention provides a method for predicting the short-time bus boarding passenger flow based on a random forest, whose specific steps are shown in Fig. 1. The embodiment uses Shenzhen bus IC card swiping data and bus GPS data from October 11, 2014 to October 31, 2014. The specific implementation comprises the following steps:
step S1: acquiring bus IC card swiping data for Shenzhen from October 11, 2014 to October 31, 2014, wherein each swipe record contains information such as 'passenger ID', 'boarding time' and 'bus ID'; and acquiring bus-mounted GPS data for Shenzhen over the same period, wherein the data contains information such as 'bus ID', 'GPS point time', 'GPS point longitude' and 'GPS point latitude'.
Step S2: calculating the boarding place and boarding time of each passenger from the above information, namely the bus stop at which each card swiping record occurred. To this end, the GPS data is first fused with the bus route and station data, and the specific time at which each bus arrives at each bus stop is calculated from the fused data; the fused data is then fused with the passenger IC card swiping data to calculate each passenger's boarding stop and boarding time. The bus route data includes "route number", "station name", "station serial number", "station longitude", and "station latitude". The data fusion keeps all fields of the different data sets. The result contains the boarding stop information of Shenzhen passengers from October 11, 2014 to October 31, 2014 (21 days), comprising 14109 travel records of 293 passengers after screening.
Step S3: dividing the whole research region, namely Shenzhen (53914 bus stops in total), into squares of equal size (1 km × 1 km each), numbering the squares, and aggregating the bus stops contained in each square to form the regional bus stops; meanwhile, this embodiment uses card swiping data generated between 6:00 and 22:00 each day to count the boarding passenger flow of each square in each time window of each day, wherein each time window is 1 h long, giving 16 time windows per day.
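The aggregation in step S3 can be sketched as follows, using the 1 km squares and 1 h windows of this embodiment. Several details are assumptions introduced here: the south-west origin `(lat0, lon0)`, the flat-earth conversion from degrees to kilometres, the function names, and the 0-based window index (the embodiment's own labels, such as TW16, appear to follow the starting hour instead).

```python
import math
from collections import Counter

CELL_KM = 1.0      # side length of each square in the embodiment
DAY_START_H = 6    # study period starts at 6:00
DAY_END_H = 22     # study period ends at 22:00
WINDOW_H = 1       # 1 h windows, so 16 windows per day


def grid_cell(lat, lon, lat0, lon0, cell_km=CELL_KM):
    """Map coordinates to a (row, col) square relative to a hypothetical origin."""
    km_per_deg_lat = 111.32  # approximate
    km_per_deg_lon = 111.32 * math.cos(math.radians(lat0))
    row = math.floor((lat - lat0) * km_per_deg_lat / cell_km)
    col = math.floor((lon - lon0) * km_per_deg_lon / cell_km)
    return row, col


def time_window(hour):
    """0-based index of the window containing `hour`, or None outside 6:00-22:00."""
    if not (DAY_START_H <= hour < DAY_END_H):
        return None
    return (hour - DAY_START_H) // WINDOW_H


def count_boardings(boardings, lat0, lon0):
    """Count boardings per (square, window).

    boardings: iterable of (stop_lat, stop_lon, boarding_hour) for one day.
    """
    counts = Counter()
    for lat, lon, hour in boardings:
        w = time_window(hour)
        if w is not None:
            counts[(grid_cell(lat, lon, lat0, lon0), w)] += 1
    return counts
```

Because stops are keyed by square rather than by stop name, all stops inside one square contribute to the same count, which is the regional bus stop aggregation described above.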
Step S4: determining a target area bus stop (namely a square grid to be predicted) and a target time window (namely a time window to be predicted), and counting to obtain the passenger flow of the target area bus stop in (d +1) time windows in n days before the prediction date of the target time window.
The boarding passenger flow of the target area bus stop in the d time windows before the target time window on each of the n days is taken as the first input parameter x of a training sample, and the boarding passenger flow of the target area bus stop in the target time window on each of the n days is taken as the second input parameter y of the training sample, to construct a training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$. Each sample $(x_i, y_i)$ contains both input parameters, that is, the boarding passenger flow of (d+1) time windows of one day. $x_i$ has feature dimension d, i.e. $x_i$ in each sample holds d values representing the boarding passenger flow of the d time windows preceding the period of that day that coincides with the target time window. The $x_i$ form the matrix $X = (x_1, x_2, \dots, x_i, \dots, x_n)^T$, which has n rows and d columns and represents the boarding passenger flow of those d time windows over the n days. $y_i$ has feature dimension 1, i.e. $y_i$ in each sample is a single value representing the boarding passenger flow in the period of that day that coincides with the target time window. The $y_i$ form the column matrix $Y = (y_1, y_2, \dots, y_n)^T$, which has n rows and 1 column and represents the known boarding passenger flow in that period over the n days; $y_i$ and $x_i$ correspond one-to-one.
A random forest classifier is trained with the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, and a regression prediction model is established.
Step S5: acquiring the boarding passenger flow of the target area bus stop in the d time windows before the target time window on the same day; constructing a prediction sample $x^*$, $x^* \in \mathbb{R}^d$, from this boarding passenger flow, where the feature dimension d of $x^*$ is the same as that of each sample in the first input parameter x, i.e. the prediction sample and the first input parameter contain the boarding passenger flow of the same number of time windows; and inputting the prediction sample $x^*$ into the regression prediction model to obtain the predicted boarding passenger flow of the target area bus stop in the target time window.
Specifically, in this embodiment, the boarding passenger flow of the target grid during 16:00-20:00 (time windows TW16 to TW19) from October 27 to October 30, 2014 is selected for analysis; the data from the 27th to the 29th serve as the training set and the data from the 30th as the prediction set. In the corresponding training feature set D, d = 3, covering the 3 time windows TW16 to TW18, and n = 3, covering the three days from the 27th to the 29th; the prediction set $x^*$ represents the boarding passenger flow of the target area bus stop during 16:00-19:00 (TW16 to TW18) on the 30th. The machine learning algorithm used in this embodiment is a random forest, and its implementation comprises a training process and a prediction process, as follows:
training process:
1. The training set is $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, and the test set is the prediction set $x^* \in \mathbb{R}^d$; the training feature set and the prediction set both have d-dimensional features. t CART classification regression trees are built, each of depth deep, with each node using f-dimensional features; a node stops splitting when the number of samples it contains is minimal and the information gain is minimal. In this embodiment, t is 10, deep takes its initial value, and f is d.
2. train(j) is formed by sampling the training feature set D with replacement, where train(j) denotes the training set of the j-th CART classification regression tree, and training starts from the root node; j = 1, 2, 3, …, 10.
3. If the current node meets the termination condition, it is made a leaf node whose prediction output is the average of all sample values in the sample set it contains, and training continues with the other nodes. If the current node does not meet the termination condition, f-dimensional features are randomly selected from the d-dimensional features without replacement according to a certain proportion (f is generally d, $\sqrt{d}$, or $\log_2 d$; this embodiment takes f = d). Among them, the one-dimensional feature with the best classification effect is found, i.e. the one that maximizes $Var - VarLeft - VarRight$, where Var is the variance of the current node's sample set and VarLeft and VarRight are the variances of the left and right child nodes. It is recorded as the k-th dimension feature ($1 \le k \le f$), and its corresponding feature value is the threshold Threshold. Samples of the current node whose k-th dimension feature is smaller than Threshold are divided into the left node and the remaining samples into the right node, and training then continues with the other nodes.
4. Steps 2 and 3 are repeated until all the nodes are trained or marked as leaf nodes.
5. Steps 2, 3 and 4 are repeated until all the CART classification regression trees are trained.
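The split search in step 3 can be sketched as below. It follows the criterion as worded in the text, Var − VarLeft − VarRight with unweighted child variances; note that many CART implementations weight the child variances by sample count, which this sketch deliberately does not do. The function names and the choice of candidate thresholds (every distinct feature value in the node) are assumptions.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y, features):
    """Find the split maximizing Var - VarLeft - VarRight.

    X: list of feature rows; y: list of targets; features: candidate indices.
    Returns (gain, k, threshold), or None if no split separates the samples.
    Samples with row[k] < threshold go left, the rest go right.
    """
    parent_var = variance(y)
    best = None
    for k in features:
        for threshold in sorted({row[k] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[k] < threshold]
            right = [yi for row, yi in zip(X, y) if row[k] >= threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            gain = parent_var - variance(left) - variance(right)
            if best is None or gain > best[0]:
                best = (gain, k, threshold)
    return best
```

Returning `None` when no feature separates the samples gives the caller a natural point at which to mark the node as a leaf.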
And (3) prediction process:
1. for the jth CART tree, starting from the root node of the current tree, dividing samples smaller than a Threshold value Threshold to a left node according to a division characteristic k and the Threshold value Threshold of the current node, dividing the remaining samples to a right node until reaching a certain leaf node, and outputting a predicted value.
2. And repeating the previous step until the t CART trees output predicted values, wherein the predicted values of the random forest model are the average values of the outputs of all the CART trees.
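The prediction process can be sketched as follows. The nested-dictionary tree representation is a hypothetical choice made for this example; the text does not specify how trained trees are stored.

```python
def predict_tree(node, x):
    """Walk one CART tree from the root to a leaf and return the leaf value.

    A leaf is {"value": v}; an internal node is
    {"k": feature_index, "threshold": t, "left": node, "right": node}.
    """
    while "value" not in node:
        if x[node["k"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


def predict_forest(trees, x):
    """Average the outputs of all trees for prediction sample x."""
    outputs = [predict_tree(tree, x) for tree in trees]
    return sum(outputs) / len(outputs)
```

Each leaf value is the mean of the training targets that reached that leaf, so the forest output is an average of averages over the t trees.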
Specifically, in this embodiment, the boarding passenger flow of all squares during 19:00-20:00 on October 30, 2014 is predicted, and the predicted values are compared with the observed values; the results are shown in Fig. 2 and Fig. 3. Fig. 2 shows the fit between the observed values for 18:00-19:00 and the observed values for 19:00-20:00 on October 30, 2014; Fig. 3 shows the fit between the predicted values and the observed values for 19:00-20:00 on October 30, 2014 across all regions. $R^2$ in the figures is the coefficient of determination (goodness of fit): the larger it is, the more densely the points cluster near the regression line and the better the prediction effect; RMSE is the root mean square error, and a smaller value indicates a better prediction effect. The figures indicate that the method predicts well.
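The two evaluation measures can be computed as sketched below. The text does not give the formulas used for the figures, so the definitions here are assumptions: RMSE as the root of the mean squared residual, and $R^2$ as $1 - SS_{res}/SS_{tot}$ computed directly between observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

In this setting, `observed` and `predicted` would each hold one value per regional bus stop for the target time window.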
In summary, the method for predicting the passenger flow volume of the short-time bus based on the random forest, provided by the invention, aims at the problem that passengers waiting for the bus cannot get on the bus due to excessive congestion in the bus at the peak time, namely the problem that the urban bus transport capacity is insufficient at the peak time, predicts the passenger flow volume of the regional bus stop by providing the concept of the regional bus stop and mining and learning training historical data by means of a machine learning algorithm, provides a more reliable predicted passenger flow volume for a bus operation management system, plays a role in timely adjusting the bus transport capacity, and improves the service level of the public transport.
The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A method for predicting the short-time bus boarding passenger flow based on a random forest is characterized by comprising the following steps:
step S1: acquiring passenger riding information and bus position information in a research area;
step S2: calculating the getting-on station of the passenger according to the passenger taking information and the bus position information obtained in the step S1;
step S3: dividing a regional bus stop and a time window;
dividing the research area into square areas with the same size, numbering the square areas, aggregating bus stops contained in the same square to obtain regional bus stops, dividing the whole-day research time into time windows with the same size, and counting the passenger flow of each regional bus stop in each time window;
step S4: training a random forest classifier, and establishing a regression prediction model;
determining a target area bus stop and a target time window, taking the passenger flow volume of the target area bus stop in (d +1) time windows in n days before the prediction date of the target time window as a training sample, inputting the training sample into a random forest classifier for training, and establishing a regression prediction model;
wherein, the passenger flow volume of getting on the bus in (d +1) time windows every day is taken as a sample data, and n and d are integers;
step S5: constructing a prediction sample, inputting the prediction sample into a regression prediction model, and obtaining the predicted getting-on passenger flow of the bus stop in the target area in the target time window;
selecting the boarding passenger flow of the target area bus stop in d time windows which are positioned on the same day and before the target time window as a prediction sample, inputting the prediction sample into the regression prediction model to obtain the predicted boarding passenger flow of the target area bus stop in the target time window, wherein d is an integer;
the step S4 specifically includes:
step S4.1: determining a target area bus stop and a target time window, and acquiring the passenger flow volume of the target area bus stop in (d +1) time windows in n days before the prediction date of the target time window, wherein n and d are integers;
step S4.2: taking the boarding passenger flow of the target area bus stop in the d time windows before the target time window on each of the n days as a first input parameter x of a training sample, taking the boarding passenger flow of the target area bus stop in the target time window on each of the n days as a second input parameter y of the training sample, and constructing a training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, wherein each sample $(x_i, y_i)$ contains both input parameters, $x_i$ has feature dimension d, and d is an integer;
step S4.3: training a random forest classifier with the training sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, and establishing a regression prediction model.
2. The prediction method according to claim 1, wherein the step S1 specifically includes:
step S1.1: acquiring bus IC card swiping information of passengers in a research area through a bus-mounted card swiping machine, wherein the card swiping information comprises the identity numbers of the passengers, the boarding time and the number of the taken bus;
step S1.2: acquiring driving track position information during the bus operating period through bus-mounted positioning equipment, wherein the driving track position information comprises the bus license plate number and, for each track position point, the corresponding time, longitude and latitude.
3. The prediction method according to claim 2, wherein the step S2 specifically includes:
step S2.1: comparing the bus position information obtained in the step S1 with actual bus route data, and searching a position point matched with the bus position information from the bus route data, wherein the time information corresponding to the position point is the specific time when the bus arrives at each bus stop;
the bus line data comprises a line number, a station name, a station serial number, a station longitude and a station latitude;
step S2.2: comparing the passenger riding information obtained in the step S1 with the calculated specific time of the bus arriving at each bus stop, and calculating the boarding stop of the passenger.
4. The prediction method according to claim 3, characterized in that said step S4.3 is specifically:
using the training sample set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n)\}$, and setting the number t of CART classification regression trees and the depth deep of each tree as input parameters of the random forest algorithm, wherein each node uses f-dimensional features for model training;
wherein t, deep and f are integers, and f takes the value of the square root of d or the base-2 logarithm of d;
step S4.3.1: sampling with replacement from the training sample set D to form the bootstrap sample set of the j-th of the t CART classification regression trees, and, starting from the root node, using the nodes of each tree in turn to partition the bootstrap sample set corresponding to that tree;
wherein j has a value ranging from 1 to t;
step S4.3.2: at each node of each tree, randomly selecting f features without replacement from the d features of the training sample set, finding among the f features the k-th feature with the best classification effect, and dividing each current node that does not meet the termination condition, using the k-th feature as the dividing feature and a value of that feature as the threshold;
assigning the samples in the current node whose k-th feature is smaller than the threshold to the left node and the remaining samples in the current node to the right node, wherein k ranges from 1 to f;
step S4.3.3: marking each current node that meets the termination condition as a leaf node, wherein the predicted output of the leaf node is the average of the values of all samples contained in the current node;
wherein the termination condition is that the current node stops splitting when the number of samples it contains reaches its minimum and the information gain reaches its minimum;
step S4.3.4: repeating steps S4.3.1 through S4.3.3 until all nodes are trained or marked as leaf nodes;
step S4.3.5: steps S4.3.1 through S4.3.4 are repeated until all CART classification regression trees are trained.
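The training loop of steps S4.3.1 to S4.3.5 can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the split criterion used here is the reduction in squared error (a common choice for regression trees, whereas the claim speaks of information gain), and `min_samples`, `min_gain`, `seed` and the dictionary layout of a tree are hypothetical choices. The parameters `t`, `deep` and `f` follow the claim.

```python
import math
import random

def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_split(X, y, feat_ids):
    """Return (feature, threshold, sse_after_split) with the lowest squared error, or None."""
    best = None
    for k in feat_ids:
        # Skip the smallest value so the left side is never empty
        for thr in sorted({row[k] for row in X})[1:]:
            left = [t for row, t in zip(X, y) if row[k] < thr]
            right = [t for row, t in zip(X, y) if row[k] >= thr]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (k, thr, score)
    return best

def build_tree(X, y, deep, f, rng, min_samples=2, min_gain=1e-9):
    """Steps S4.3.2 to S4.3.4 (sketch): recursively split until a termination condition holds."""
    mean = sum(y) / len(y)
    if deep == 0 or len(y) < min_samples:
        return {"value": mean}  # leaf: average of the contained sample values
    feat_ids = rng.sample(range(len(X[0])), f)  # f features, without replacement
    split = best_split(X, y, feat_ids)
    if split is None or sse(y) - split[2] < min_gain:
        return {"value": mean}
    k, thr, _ = split
    left = [i for i, row in enumerate(X) if row[k] < thr]
    right = [i for i, row in enumerate(X) if row[k] >= thr]
    return {
        "feature": k,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           deep - 1, f, rng, min_samples, min_gain),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            deep - 1, f, rng, min_samples, min_gain),
    }

def train_forest(X, y, t=10, deep=5, f=None, seed=0):
    """Steps S4.3.1 and S4.3.5 (sketch): one bootstrap sample and one tree per iteration."""
    rng = random.Random(seed)
    if f is None:
        f = max(1, int(math.sqrt(len(X[0]))))  # square root of d, rounded down
    forest = []
    for _ in range(t):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sampling with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], deep, f, rng))
    return forest
```

Each tree is a nested dictionary: an internal node stores `feature`, `threshold`, `left` and `right`, and a leaf stores only `value`. In practice a library implementation such as a random forest regressor from an established machine learning package would usually be preferred over hand-written trees.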
5. The prediction method according to claim 4, wherein the prediction process of the regression prediction model in step S5 is specifically:
selecting the j-th CART classification regression tree and passing the prediction sample down from the root node of the current tree, wherein at each node, according to the dividing feature and the threshold of that node, the prediction sample is assigned to the left node if its feature value is smaller than the threshold and to the right node otherwise, until a leaf node of the current tree is reached and a predicted value is output; wherein j ranges from 1 to t;
and repeating the operation until all the CART classification regression trees have output predicted values, wherein the average of the predicted values output by all the CART classification regression trees is the output value of the regression prediction model.
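The prediction process of claim 5 can be sketched as a tree walk followed by an average. The nested-dictionary tree layout, the two hand-built trees, and the meaning given to the two features are hypothetical and serve only to make the example checkable by hand.

```python
def predict_tree(node, x):
    """Walk one tree: go left when the feature value is below the threshold, else right."""
    while "value" not in node:
        branch = "left" if x[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["value"]

def predict_forest(forest, x):
    """Average the predicted values output by all trees."""
    outputs = [predict_tree(tree, x) for tree in forest]
    return sum(outputs) / len(outputs)

# Two hand-built trees (hypothetical). Feature 0: hour of day;
# feature 1: boardings in the previous time window.
tree_a = {"feature": 0, "threshold": 9.0,
          "left": {"value": 120.0}, "right": {"value": 40.0}}
tree_b = {"feature": 1, "threshold": 100.0,
          "left": {"value": 50.0}, "right": {"value": 110.0}}

sample = [8.0, 130.0]
# tree_a: 8.0 < 9.0, so left, 120.0; tree_b: 130.0 >= 100.0, so right, 110.0
print(predict_forest([tree_a, tree_b], sample))  # (120.0 + 110.0) / 2 = 115.0
```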
CN201710609933.4A 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest Expired - Fee Related CN107563540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710609933.4A CN107563540B (en) 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710609933.4A CN107563540B (en) 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest

Publications (2)

Publication Number Publication Date
CN107563540A CN107563540A (en) 2018-01-09
CN107563540B true CN107563540B (en) 2021-03-30

Family

ID=60974256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710609933.4A Expired - Fee Related CN107563540B (en) 2017-07-25 2017-07-25 Method for predicting short-time bus boarding passenger flow based on random forest

Country Status (1)

Country Link
CN (1) CN107563540B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108415885A (en) * 2018-02-08 2018-08-17 武汉蓝泰源信息技术有限公司 The real-time bus passenger flow prediction technique returned based on neighbour
CN108877223A (en) * 2018-07-13 2018-11-23 南京理工大学 A kind of Short-time Traffic Flow Forecasting Methods based on temporal correlation
CN109035770B (en) * 2018-07-31 2022-01-04 上海世脉信息科技有限公司 Real-time analysis and prediction method for bus passenger capacity in big data environment
CN109711428A (en) * 2018-11-20 2019-05-03 佛山科学技术学院 A kind of saturated gas pipeline internal corrosion speed predicting method and device
CN109741597B (en) * 2018-12-11 2020-09-29 大连理工大学 Bus section operation time prediction method based on improved deep forest
CN111105070B (en) * 2019-11-20 2024-04-16 深圳市北斗智能科技有限公司 Passenger flow early warning method and system
CN113570099A (en) * 2020-04-28 2021-10-29 百度在线网络技术(北京)有限公司 Departure interval prediction method, prediction model training method, device and equipment
CN112235362B (en) * 2020-09-28 2022-08-30 北京百度网讯科技有限公司 Position determination method, device, equipment and storage medium
CN112288162B (en) * 2020-10-29 2024-05-10 平安科技(深圳)有限公司 Short-time bus station passenger flow prediction method and device, computer equipment and storage medium
CN112949939B (en) * 2021-03-30 2022-12-06 福州市电子信息集团有限公司 Taxi passenger carrying hotspot prediction method based on random forest model
CN113392880B (en) * 2021-05-27 2021-11-23 扬州大学 Traffic flow short-time prediction method based on deviation correction random forest

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169606A (en) * 2010-02-26 2011-08-31 同济大学 Method for predicting influence of heavy passenger flow of urban rail transit network
CN102436603A (en) * 2011-08-29 2012-05-02 北京航空航天大学 Rail transit full-road-network passenger flow prediction method based on probability tree destination (D) prediction
CN105095993A (en) * 2015-07-22 2015-11-25 济南市市政工程设计研究院(集团)有限责任公司 System and method for predicting passenger flow volume of railway stations

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9639807B2 (en) * 2014-06-10 2017-05-02 Jose Oriol Lopez Berengueres Method and system for forecasting future events

Also Published As

Publication number Publication date
CN107563540A (en) 2018-01-09

Similar Documents

Publication Publication Date Title
CN107563540B (en) Method for predicting short-time bus boarding passenger flow based on random forest
Gurumurthy et al. Analyzing the dynamic ride-sharing potential for shared autonomous vehicle fleets using cellphone data from Orlando, Florida
US9726502B2 (en) Route planner for transportation systems
CN108133302B (en) Public bicycle potential demand prediction method based on big data
CN103984994B (en) Method for predicting urban rail transit passenger flow peak duration
CN111932925A (en) Method, device and system for determining travel passenger flow of public transport station
Loulizi et al. Steady‐State Car‐Following Time Gaps: An Empirical Study Using Naturalistic Driving Data
CN110836675B (en) Decision tree-based automatic driving search decision method
Woensel et al. Empirical validation of a queueing approach to uninterrupted traffic flows
CN110021161B (en) Traffic flow direction prediction method and system
CN114501336B (en) Road traffic volume measuring and calculating method and device, electronic equipment and storage medium
JP6307376B2 (en) Traffic analysis system, traffic analysis program, and traffic analysis method
Karoń Travel demand and transportation supply modelling for agglomeration without transportation model
Ni et al. DEPART: Dynamic route planning in stochastic time-dependent public transit networks
Ghasedi et al. Robust optimization of bus stop placement based on dynamic demand using meta heuristic approaches: a case study in a developing country
CN108681741A (en) Based on the subway of IC card and resident's survey data commuting crowd's information fusion method
CN116090785B (en) Custom bus planning method for two stages of large-scale movable loose scene
CN112926809B (en) Flight flow prediction method and system based on clustering and improved xgboost
CN114239929B (en) Taxi traffic demand feature prediction method based on random forest
Roxas Jr et al. Cost overruns and the proposed Panay-Guimaras-Negros inter-island bridge project, Philippines
Zenina et al. Transport simulation model calibration with two-step cluster analysis procedure
Jonker Modelling the trip length distribution of shopping trips from GPS data
Liu et al. A Short‐Turn Dispatching Strategy to Improve the Reliability of Bus Operation
CN112733891B (en) Method for identifying bus IC card passengers to get off station points during travel chain breakage
CN110570650B (en) Travel path and node flow prediction method based on RFID data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210330