CN111784084B - Travel generation prediction method, system and device based on gradient boosting decision tree - Google Patents

Info

Publication number
CN111784084B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010823717.1A
Other languages
Chinese (zh)
Other versions
CN111784084A (en)
Inventor
杜立群
刘斌
郑猛
张宇
吴丹婷
吕宜生
李志帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Municipal Institute Of City Planning & Design
Institute of Automation of Chinese Academy of Science
Original Assignee
Beijing Municipal Institute Of City Planning & Design
Institute of Automation of Chinese Academy of Science
Application filed by Beijing Municipal Institute Of City Planning & Design and Institute of Automation of Chinese Academy of Science
Priority to CN202010823717.1A
Publication of CN111784084A
Application granted
Publication of CN111784084B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of population travel generation prediction, and particularly relates to a travel generation prediction method, system and device based on a gradient boosting decision tree. It aims to solve the problems that existing travel generation methods cannot truly reflect the nonlinear relationship between input values and predictions, that model testing requires a large amount of computation, and that the results are not intuitive. The invention comprises the following steps: extracting the independent variables of the current travel generation data of each traffic cell of the area to be predicted and normalizing them; obtaining the current predicted value of each traffic cell of the area to be predicted through the trained travel generation prediction model; and inversely normalizing the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted. The invention can accurately reflect the nonlinear relationship between the original inputs and outputs, uses the minimum squared-error criterion to find the optimal splitting feature and split point, automatically ignores redundant variables, dispenses with manual variable screening, and achieves higher accuracy and robustness.

Description

Travel generation prediction method, system and device based on gradient boosting decision tree
Technical Field
The invention belongs to the field of population travel generation prediction, and particularly relates to a travel generation prediction method, system and device based on a gradient boosting decision tree.
Background
The interaction between urban traffic and urban land use means that different land-use layouts and intensities generate social activities of different types and intensities, which in turn determine the volume and distribution of traffic in different areas. Conversely, the functional efficiency of the transportation system directly affects the prices, rents and popularity of the surrounding land and how well that land can fulfil its functions. The interrelationship between urban land use and traffic therefore needs to be studied in depth in transportation planning, and the trip rate is one of the key indicators that reflect this interrelationship most directly.
Urban travel demand forecasting is one of the core components of urban transportation planning and an important basis for determining the scale of the transportation network, the configuration of road segments, the scale of transport hubs and so on. The four-step method is based on resident travel surveys and comprises four stages: trip generation (trip generation/attraction), trip distribution, mode split, and traffic assignment.
In the trip generation model, the trip production of a traffic cell (traffic analysis zone) per unit time equals the number of home-based trips whose home end lies in the zone plus the number of non-home-based trips and freight trips that originate in the zone. Each trip has two ends: the production end and the attraction end. The main factors affecting production are population size and its composition, such as age structure, occupation, income level and vehicle ownership.
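The production rule above can be made concrete with a small counting sketch. It assumes a hypothetical trip record (`Trip`, with origin zone, destination zone, home zone and a freight flag) that the patent does not define; the only logic taken from the text is that home-based trips count toward the zone containing the home end, while non-home-based and freight trips count toward their origin zone.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Trip:
    # Hypothetical record layout; the patent does not define one.
    origin_zone: str
    destination_zone: str
    home_based: bool                 # origin or destination is the traveller's home
    home_zone: Optional[str] = None  # zone holding the home end (home-based trips)
    freight: bool = False            # cargo trip


def trip_production(trips: Iterable[Trip]) -> Counter:
    """Trip production per traffic cell for one unit of time."""
    production = Counter()
    for trip in trips:
        if trip.home_based and not trip.freight:
            # Home-based trips are produced at the home end.
            production[trip.home_zone] += 1
        else:
            # Non-home-based and freight trips are produced at their origin.
            production[trip.origin_zone] += 1
    return production


trips = [
    Trip("A", "B", home_based=True, home_zone="A"),   # home -> work
    Trip("B", "A", home_based=True, home_zone="A"),   # work -> home
    Trip("B", "C", home_based=False),                 # work -> shop
    Trip("C", "A", home_based=False, freight=True),   # delivery
]
print(dict(trip_production(trips)))  # {'A': 2, 'B': 1, 'C': 1}
```

Both the outbound and the return commute count toward zone A, because production is tied to the home end rather than to the direction of travel.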
Traditional trip generation forecasting methods include the category analysis method, the regression analysis method and the growth rate method. The production predicted by category analysis does not include two components, home trips and freight trips, so the predicted data are incomplete, while the growth rate method gives only rough results. Multiple regression analysis is therefore the method most widely used in engineering practice. However, it assumes a linear relationship between the inputs and the prediction and cannot truly capture the nonlinear effects and coupling relationships between them; moreover, the model must undergo statistical tests (significance and correlation), which require a large amount of computation and yield results that are not sufficiently intuitive.
Disclosure of Invention
To solve the above problems of the prior art, namely that existing trip generation methods cannot truly reflect the nonlinear relationship between input values and predictions, require a large amount of computation for model testing, and give results that are not intuitive, the invention provides a trip generation prediction method based on a gradient boosting decision tree, comprising the following steps:
step S10, extracting independent variables of current travel generation data of each traffic cell of the area to be predicted, and performing normalization processing on the independent variables to obtain preprocessed data;
step S20, inputting the preprocessed data into the trained trip generation prediction model to obtain the current predicted value of each traffic cell of the area to be predicted;
step S30, performing inverse normalization on the predicted values to obtain current predicted travel generation data of each traffic cell of the area to be predicted;
the trip generation prediction model has a gradient boosting decision tree structure in which decision trees serve as base learners, the sum of the outputs of all decision trees is the model output, and the squared error is used as the loss function L between the model's predicted value and the true value; the model is trained as follows:
step B10, extracting independent variables and dependent variables of historical travel generation data of each traffic cell of the area to be predicted, carrying out normalization processing, and dividing the normalized data into a training set and a test set according to a preset proportion;
step B20, performing N rounds of training of the trip generation prediction model on the training data of the training set, wherein in the n-th round the model contains n decision trees, and the negative gradient $r_{(n+1)i}$ of the error of the n-th-round model output is calculated based on the loss function L; $1 \le n \le N$ is the current training round;
step B30, adding the (n+1)-th decision tree to the model and training it with the n-th-round negative error gradients $r_{(n+1)i}$ as labels, until all N decision trees have been trained;
and step B40, testing the performance of the trained trip generation prediction model on the test data of the test set; if the test result does not meet the set threshold, increasing the number of training rounds or adjusting the structure of the base-learner decision trees and retraining the model on the original training set until the test result meets the set threshold, thereby obtaining the trained trip generation prediction model.
In some preferred embodiments, the historical travel generation data of each traffic cell of the area to be predicted includes an independent variable and a dependent variable;
the independent variables comprise, for each traffic cell, the numbers of households with and without vehicles and their populations; the numbers of workers, students and other persons in households with and without vehicles; and the total number of people employed in each employment sector; the employment sectors comprise industry; water conservancy, environment and public facilities; transportation, warehousing and postal services; public administration; education; residential services; finance; information technology services; and agriculture, forestry, animal husbandry and fishery;
the dependent variables comprise the home-based and non-home-based trip productions of households with and without vehicles in each traffic cell.
In some preferred embodiments, step S10, "normalization processing of variables" is performed by:
$$x_i^{(j)} = \frac{X_i^{(j)}}{X_{\max}^{(j)}}, \quad j = 1, 2, \ldots, k$$
$$y_i^{(j)} = \frac{Y_i^{(j)}}{Y_{\max}^{(j)}}, \quad j = 1, 2, \ldots, D$$
where $X_{\max}^{(j)}$ and $Y_{\max}^{(j)}$ are the maximum values of the j-th dimension of the independent variables $X_i$ and the dependent variables $Y_i$ in the historical data before normalization, $x_i$ and $y_i$ are the normalized independent and dependent variables, k is the dimension of $x_i$, and D is the dimension of $y_i$.
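The formulas above divide each dimension by its maximum in the historical data. A minimal sketch in plain Python (lists of lists rather than any particular array library; the function names are illustrative, not from the patent). The zero-maximum guard is an addition not found in the text. Because the maxima are defined on the historical data, the sketch reuses them when normalizing current data, which is one reading of step S10; current values above the historical maximum then map to values above 1.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def column_maxima(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension maximum over all samples (X_max or Y_max)."""
    return [max(column) for column in zip(*rows)]


def normalize_by_max(rows, maxima: Optional[List[float]] = None) -> Tuple[Matrix, List[float]]:
    """Divide each dimension by its (historical) maximum."""
    if maxima is None:
        maxima = column_maxima(rows)
    # Guard against an all-zero column, which would otherwise divide by zero.
    scaled = [[v / m if m else 0.0 for v, m in zip(row, maxima)] for row in rows]
    return scaled, maxima


def denormalize(rows, maxima: List[float]) -> Matrix:
    """Inverse step: multiply each dimension by its historical maximum."""
    return [[v * m for v, m in zip(row, maxima)] for row in rows]


# Historical independent variables: two traffic cells, k = 2 dimensions.
X_hist = [[200.0, 50.0], [400.0, 25.0]]
x_hist, x_max = normalize_by_max(X_hist)
print(x_hist, x_max)  # [[0.5, 1.0], [1.0, 0.5]] [400.0, 50.0]

# Current data reuse the historical maxima rather than their own.
x_now, _ = normalize_by_max([[300.0, 10.0]], maxima=x_max)
print(x_now)  # [[0.75, 0.2]]
```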
In some preferred embodiments, the loss value for the i-th training sample $(x_i, y_i)$ in the training set is calculated as:
$$L\left(y_i, f(x_i)\right) = \frac{1}{2}\sum_{j=1}^{D}\left(y_i^{(j)} - f^{(j)}(x_i)\right)^2$$
where $f(x_i)$ is the predicted value output by the trip generation prediction model, $y_i$ is the true value corresponding to the training input $x_i$, and D is the dimension of $f(x_i)$ and $y_i$.
In some preferred embodiments, the negative gradient $r_{(n+1)i}$ of the error of the n-th-round model output in step B20 is calculated based on the loss function L as follows:
$$r_{(n+1)i} = -\left[\frac{\partial L\left(y_i, f(x_i)\right)}{\partial f(x_i)}\right]_{f(x_i) = f_n(x_i)}, \quad i = 1, 2, \ldots, m$$
where $L(y_i, f_n(x_i))$ is the loss between the predicted value $f_n(x_i)$ output by the trip generation prediction model in the n-th round of training and the corresponding true value $y_i$, m is the number of training samples in the training set, and the bracketed term is the partial derivative of the loss with respect to the predicted value $f_n(x_i)$;
$$f_n(x_i) = f_{n-1}(x_i) + T(x_i, \Theta_n) = \sum_{t=1}^{n} T(x_i, \Theta_t), \qquad f_0(x_i) = 0$$
where $T(x_i, \Theta_n)$ is the predicted value output by the n-th decision tree of the model and $\Theta_n$ are the parameters of the n-th decision tree, trained in round n.
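The following sketch evaluates the gradient and summation formulas for a single sample. It assumes the 1/2-scaled squared error, the factor that makes the negative gradient exactly the residual $y_i - f_n(x_i)$; with an unscaled squared error the gradient would simply double. A central-difference check confirms the closed form. Function names are illustrative.

```python
import math
from typing import List, Sequence


def squared_error(y: Sequence[float], f: Sequence[float]) -> float:
    """L(y, f) = 1/2 * sum over the D dimensions of (y_j - f_j)^2 (assumed 1/2 factor)."""
    return 0.5 * sum((yj - fj) ** 2 for yj, fj in zip(y, f))


def negative_gradient(y: Sequence[float], f: Sequence[float]) -> List[float]:
    """r = -dL/df at f; for the 1/2-scaled squared error this is the residual y - f."""
    return [yj - fj for yj, fj in zip(y, f)]


def ensemble_output(tree_outputs: Sequence[Sequence[float]]) -> List[float]:
    """f_n(x) = T(x, Theta_1) + ... + T(x, Theta_n), summed per output dimension."""
    return [sum(column) for column in zip(*tree_outputs)]


# Central-difference check that the residual really is the negative gradient.
y, f, h = [1.0, 2.0], [0.5, 2.5], 1e-6
for j in range(len(f)):
    up = f.copy()
    up[j] += h
    down = f.copy()
    down[j] -= h
    numeric = -(squared_error(y, up) - squared_error(y, down)) / (2 * h)
    assert math.isclose(numeric, negative_gradient(y, f)[j], abs_tol=1e-6)

print(negative_gradient(y, f))                       # [0.5, -0.5]
print(ensemble_output([[0.25, 0.5], [0.25, -0.5]]))  # [0.5, 0.0]
```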
In some preferred embodiments, in step B30 the (n+1)-th decision tree is added to the model and trained with the n-th-round negative error gradients $r_{(n+1)i}$ as labels, as follows:
$$\Theta_{n+1} = \arg\min_{\Theta} \sum_{i=1}^{m} L_b\left(r_{(n+1)i}, T(x_i, \Theta)\right)$$
where $\Theta_{n+1}$ are the parameters of the (n+1)-th decision tree in round n+1 of model training, $r_{(n+1)i}$ is the negative error gradient of the n-th-round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ is the loss between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree and the corresponding negative gradient $r_{(n+1)i}$, which serves as the true value, and m is the number of training samples in the training set;
$$L_b\left(r_{(n+1)i}, T(x_i, \Theta_{n+1})\right) = \frac{1}{2}\sum_{j=1}^{D}\left(r_{(n+1)i}^{(j)} - T^{(j)}(x_i, \Theta_{n+1})\right)^2$$
where $L_b$ is the loss function of the base learner and D is the dimension of the negative gradient $r_{(n+1)i}$ and of the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree in round n+1 of training.
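The argmin above is where the method chooses splitting features and split points by minimum squared error. The sketch below solves it exactly for the simplest possible tree, a single split (a "stump"), by exhaustive search over every feature and every midpoint between consecutive distinct values. The patent's embodiment uses deeper CART trees, so treat this as an illustration of one split step; all names are illustrative. Each leaf predicts the mean of its labels, which is the constant that minimizes squared error, and the 1/2 factor in $L_b$ does not change the minimizer.

```python
from typing import List, Sequence, Tuple


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(rows) for column in zip(*rows)]


def sse(rows: Sequence[Sequence[float]]) -> float:
    """Squared error of predicting every row by the mean row, summed over dimensions."""
    if not rows:
        return 0.0
    m = mean_vector(rows)
    return sum((v - mj) ** 2 for row in rows for v, mj in zip(row, m))


def fit_stump(X, R) -> Tuple[int, float, List[float], List[float]]:
    """Fit a one-split regression tree to the labels R (negative gradients)."""
    best = None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(X, R) if row[feature] <= threshold]
            right = [r for row, r in zip(X, R) if row[feature] > threshold]
            cost = sse(left) + sse(right)  # total squared error after the split
            if best is None or cost < best[0]:
                best = (cost, feature, threshold, mean_vector(left), mean_vector(right))
    if best is None:  # no feature varies: a single leaf predicting the mean
        m = mean_vector(R)
        return 0, float("inf"), m, m
    return best[1:]


def predict_stump(stump, x: Sequence[float]) -> List[float]:
    feature, threshold, left_value, right_value = stump
    return left_value if x[feature] <= threshold else right_value


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # feature 1 is constant
R = [[1.0], [1.0], [3.0], [3.0]]                     # negative gradients, D = 1
stump = fit_stump(X, R)
print(stump)                             # (0, 1.5, [1.0], [3.0])
print(predict_stump(stump, [2.2, 5.0]))  # [3.0]
```

The constant feature never yields a candidate split, which is a small instance of how redundant variables are ignored automatically.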
In some preferred embodiments, in step B40, "performance test of the trip generation prediction model after training based on each test data of the test set", the method includes:
step C10, inputting the independent variables in each test data of the test set into the trained trip generation prediction model, and obtaining the prediction value output by the trip generation prediction model;
step C20, calculating the coefficient of determination $R^2$, the root mean square error (RMSE) and the mean absolute error (MAE) between the predicted values and the dependent variables corresponding to the independent variables;
step C30, if the $R^2$ value is close to 1 and both the root mean square error and the mean absolute error are smaller than the set thresholds, the performance of the trip generation prediction model meets the requirement; otherwise, increasing the number of training rounds or adjusting the structure of the base-learner decision trees and retraining the model on the original training set.
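Steps C20 and C30 can be sketched as follows. This is a minimal version under two assumptions not stated in the text: all samples and output dimensions are pooled into one set of metrics (per-dimension metrics would be an equally valid choice), and "close to 1" and "smaller than a set threshold" are represented by the hypothetical parameters `r2_min`, `rmse_max` and `mae_max`.

```python
import math
from typing import Sequence, Tuple


def evaluate(y_true: Sequence[Sequence[float]],
             y_pred: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """R^2, RMSE and MAE with all samples and output dimensions pooled together."""
    t = [v for row in y_true for v in row]
    p = [v for row in y_pred for v in row]
    n = len(t)
    mean_t = sum(t) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(t, p))
    ss_tot = sum((a - mean_t) ** 2 for a in t)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(a - b) for a, b in zip(t, p)) / n
    return r2, rmse, mae


def meets_requirement(y_true, y_pred, r2_min=0.9, rmse_max=0.05, mae_max=0.05) -> bool:
    """Step C30 check; the three thresholds are hypothetical placeholders."""
    r2, rmse, mae = evaluate(y_true, y_pred)
    return r2 >= r2_min and rmse < rmse_max and mae < mae_max


y_true = [[1.0], [2.0], [3.0], [4.0]]
y_pred = [[1.0], [2.0], [3.0], [5.0]]
r2, rmse, mae = evaluate(y_true, y_pred)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")  # R2=0.800 RMSE=0.500 MAE=0.250
print(meets_requirement(y_true, y_pred))             # False (R2 below 0.9)
```

If the check fails, step C30 sends training back to step B20 with more rounds or a different tree structure.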
In a second aspect, the invention provides a travel generation prediction system based on a gradient boosting decision tree, which implements the above travel generation prediction method and comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;
the input module is configured to acquire and input current travel generation data of each traffic cell of an area to be predicted;
the preprocessing module is configured to extract independent variables of current travel generation data of each traffic cell of the area to be predicted, and normalize the independent variables to obtain preprocessed data;
the prediction module is configured to input the preprocessed data into the trained trip generation prediction model and obtain the current predicted value of each traffic cell of the area to be predicted;
the inverse normalization module is configured to inversely normalize the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted;
the output module is configured to output the obtained current predicted travel generation data of each traffic cell of the area to be predicted.
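The five modules form a straight pipeline. A minimal sketch, assuming the trained model is available as a plain Python callable that maps one normalized input vector to one normalized output vector; the class and method names are hypothetical, the toy model merely stands in for a trained GBDT, and the output module is reduced to returning the result.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class TripGenerationPredictor:
    """Input -> preprocessing -> prediction -> inverse normalization -> output."""

    def __init__(self, model: Callable[[Vector], Vector], x_max: Vector, y_max: Vector):
        self.model = model  # trained model on normalized data (hypothetical callable)
        self.x_max = x_max  # historical per-dimension maxima of the independent variables
        self.y_max = y_max  # historical per-dimension maxima of the dependent variables

    def preprocess(self, X: Sequence[Vector]) -> List[Vector]:
        return [[v / m if m else 0.0 for v, m in zip(row, self.x_max)] for row in X]

    def predict(self, x_norm: Sequence[Vector]) -> List[Vector]:
        return [self.model(x) for x in x_norm]

    def inverse_normalize(self, preds: Sequence[Vector]) -> List[Vector]:
        return [[v * m for v, m in zip(row, self.y_max)] for row in preds]

    def run(self, X_current: Sequence[Vector]) -> List[Vector]:
        # The output module is reduced to returning the de-normalized predictions.
        return self.inverse_normalize(self.predict(self.preprocess(X_current)))


def toy_model(x: Vector) -> Vector:
    return [0.5 * x[0]]  # placeholder for a trained GBDT


predictor = TripGenerationPredictor(toy_model, x_max=[100.0], y_max=[40.0])
print(predictor.run([[50.0], [100.0]]))  # [[10.0], [20.0]]
```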
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored; the programs are adapted to be loaded and executed by a processor to implement the above travel generation prediction method based on a gradient boosting decision tree.
In a fourth aspect of the present invention, a processing apparatus is provided, comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above travel generation prediction method based on a gradient boosting decision tree.
The invention has the beneficial effects that:
the travel generation prediction method based on a gradient boosting decision tree trains a gradient boosting decision tree structure on preprocessed resident survey data to obtain a model for predicting trip generation. It can accurately reflect the nonlinear relationship between the original inputs and outputs, uses the minimum squared-error criterion to find the optimal splitting feature and split point, automatically ignores redundant variables, dispenses with manual variable screening, and achieves higher accuracy and robustness than the conventional multiple linear regression method. In addition, the invention uses a test set to provide model performance evaluation indices under which different models can be compared, making the checking of model parameters simpler and more intuitive.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a travel generation prediction method based on a gradient boosting decision tree according to the present invention;
FIG. 2 is a schematic diagram of the overall structure of the travel generation prediction method based on a gradient boosting decision tree according to the present invention;
FIG. 3 is a schematic structural diagram of a decision tree with depth d = 3 and J = 4 leaf nodes, used in an embodiment of the trip generation prediction method based on a gradient boosting decision tree;
FIG. 4 is a schematic structural diagram of a gradient boosting decision tree comprising N decision trees, used in an embodiment of the travel generation prediction method based on a gradient boosting decision tree.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a travel generation prediction method based on a gradient boosting decision tree. The method revisits the big data of resident travel surveys to study the trip generation prediction problem with a tree-structured model and, targeting the practical requirements of trip generation prediction, proposes a prediction method built on the Gradient Boosting Decision Tree (GBDT). The method is data-driven: it finds the optimal splitting features and split points by minimizing the squared error, eliminates the screening of independent variables, and effectively extracts the characteristics of the input patterns. In addition, a test set is used to evaluate the performance of the model, so that prediction and parameter checking are simple and intuitive.
In order to more clearly describe the travel generation prediction method based on the gradient boosting decision tree of the present invention, details of each step in the embodiment of the present invention are described below with reference to fig. 1 and 2.
The trip generation prediction method based on the gradient boosting decision tree according to the embodiment of the invention comprises the following steps:
Step S10, extracting the independent variables of the current travel generation data of each traffic cell of the area to be predicted, and normalizing them to obtain preprocessed data.
In one embodiment of the invention, the travel generation data of each traffic cell of the area to be predicted are obtained through a questionnaire survey. When the model is applied, the current travel generation data of the area to be predicted, comprising the independent variables X, are used; for model training and testing, the historical travel generation data of the area to be predicted, comprising the independent variables X and the dependent variables Y, are used.
The independent variables X comprise, for each traffic cell, the numbers of households with and without vehicles and their populations; the numbers of workers, students and other persons in households with and without vehicles; and the total number of people employed in each employment sector. The employment sectors comprise industry; water conservancy, environment and public facilities; transportation, warehousing and postal services; public administration; education; residential services; finance; information technology services; agriculture, forestry, animal husbandry and fishery; and so on.
The dependent variables Y comprise the home-based and non-home-based trip productions of households with and without vehicles in each traffic cell. A trip is home-based if its origin or destination is the home; otherwise it is non-home-based.
Besides questionnaire surveys, the travel generation data of each traffic cell of the area to be predicted can be obtained in other ways, which are not described in detail here.
The acquired data are coarsely screened to remove fields that are obviously irrelevant to the prediction result, such as the traffic cell number and the number of the street in which the traffic cell is located.
Data normalization scales the value of each variable dimension to the range 0 to 1, which reduces fluctuations in the data and makes the prediction result more stable. Suppose that, after coarse screening of the questionnaire data, each independent variable has dimension k×1 and each dependent variable has dimension D×1, and that the two together form one sample. Taking the i-th sample $(X_i, Y_i)$ as an example, $X_i \in \mathbb{R}^{k \times 1}$ and $Y_i \in \mathbb{R}^{D \times 1}$, where $\mathbb{R}$ denotes the real number domain. The samples of all traffic cells constitute the data set $\{(X_i, Y_i)\}$.
The variables are normalized as shown in formulas (1) and (2):
$$x_i^{(j)} = \frac{X_i^{(j)}}{X_{\max}^{(j)}}, \quad j = 1, 2, \ldots, k \tag{1}$$
$$y_i^{(j)} = \frac{Y_i^{(j)}}{Y_{\max}^{(j)}}, \quad j = 1, 2, \ldots, D \tag{2}$$
where $X_{\max}^{(j)}$ and $Y_{\max}^{(j)}$ are the maximum values of the j-th dimension of the independent variables $X_i$ and the dependent variables $Y_i$ in the historical data before normalization, $x_i$ and $y_i$ are the normalized independent and dependent variables, k is the dimension of $x_i$, and D is the dimension of $y_i$.
Step S20, inputting the preprocessed data into the trained trip generation prediction model to obtain the current predicted value of each traffic cell of the area to be predicted.
The preprocessed data are input into the trained trip generation prediction model with fixed parameters, and the model outputs the current predicted values of all traffic cells of the area to be predicted.
Step S30, inversely normalizing the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted.
The inverse normalization of the predicted value is shown in formula (3):
$$F^{(j)}(x_i) = f^{(j)}(x_i) \cdot Y_{\max}^{(j)}, \quad j = 1, 2, \ldots, D \tag{3}$$
where $x_i$ denotes the i-th input sample, $f(x_i)$ is the predicted value of the model, $F(x_i)$ is the inverse-normalized predicted value of the model, i.e., the final predicted trip generation, D is the dimension of the predicted value, and $Y_{\max}^{(j)}$ is the maximum value of the j-th dimension of the dependent variables $Y_i$ in the historical data before normalization.
The trip generation prediction model has a gradient boosting decision tree structure in which decision trees serve as base learners, the sum of the outputs of all decision trees is the model output, and the squared error is used as the loss function L between the model's predicted value f(x) and the true value y. Taking the i-th sample $(x_i, y_i)$ as an example, the calculation is shown in formula (4):
$$L\left(y_i, f(x_i)\right) = \frac{1}{2}\sum_{j=1}^{D}\left(y_i^{(j)} - f^{(j)}(x_i)\right)^2 \tag{4}$$
where $f(x_i)$ is the predicted value output by the trip generation prediction model, $y_i$ is the true value corresponding to the input $x_i$, and D is the dimension of $f(x_i)$ and $y_i$.
In one embodiment of the present invention, Classification and Regression Trees (CART) are chosen as the base learners of the GBDT; CART always produces binary trees. N CART regression trees are combined into the GBDT prediction model, i.e., the model is trained for N rounds, and all CART regression trees share the same structural settings, including the number of leaf nodes J and the depth of each tree.
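A CART regression tree whose structure is fixed by a leaf count J and a depth d can be grown best-first: repeatedly split whichever leaf gives the largest reduction in squared error until J leaves exist or no leaf shallower than d can be split. The sketch below is one such growth strategy, written in plain Python for illustration; the patent does not say how its trees are grown, and here depth counts the splits along a path from the root, which may differ from the convention of FIG. 3. All names are hypothetical.

```python
from typing import Dict


def _mean(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]


def _sse(rows):
    m = _mean(rows)
    return sum((v - mj) ** 2 for row in rows for v, mj in zip(row, m))


def _best_split(X, R, idx):
    """Largest squared-error reduction (gain, feature, threshold, left, right) for idx, or None."""
    parent = _sse([R[i] for i in idx])
    best = None
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            left = [i for i in idx if X[i][f] <= thr]
            right = [i for i in idx if X[i][f] > thr]
            gain = parent - _sse([R[i] for i in left]) - _sse([R[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, thr, left, right)
    return best


def fit_cart(X, R, max_depth=3, max_leaves=4) -> Dict:
    """Grow a binary regression tree best-first until it has max_leaves leaves
    or no leaf shallower than max_depth can be split."""
    root = {"idx": list(range(len(X))), "depth": 0}
    leaves = [root]
    while len(leaves) < max_leaves:
        candidates = []
        for node in leaves:
            if node["depth"] < max_depth:
                split = _best_split(X, R, node["idx"])
                if split is not None:
                    candidates.append((split, node))
        if not candidates:
            break
        (_, f, thr, left, right), node = max(candidates, key=lambda c: c[0][0])
        node.update(feature=f, threshold=thr,
                    left={"idx": left, "depth": node["depth"] + 1},
                    right={"idx": right, "depth": node["depth"] + 1})
        leaves = [leaf for leaf in leaves if leaf is not node]
        leaves += [node["left"], node["right"]]
    for leaf in leaves:
        leaf["value"] = _mean([R[i] for i in leaf["idx"]])  # leaf output = label mean
    return root


def predict_cart(node, x):
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


def count_leaves(node):
    if "feature" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


X = [[float(i)] for i in range(8)]
R = [[0.0], [0.0], [1.0], [1.0], [5.0], [5.0], [9.0], [9.0]]
tree = fit_cart(X, R, max_depth=3, max_leaves=4)  # d = 3, J = 4 as in FIG. 3
print(count_leaves(tree), [predict_cart(tree, [v])[0] for v in (0.0, 2.0, 5.0, 7.0)])
# 4 [0.0, 1.0, 5.0, 9.0]
```

Fixing J and d for every tree, as the embodiment does, keeps each base learner weak; the boosting rounds then supply the model capacity.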
The trip generation prediction model is trained by the following steps:
Step B10, extracting the independent and dependent variables of the historical travel generation data of each traffic cell of the area to be predicted, normalizing them, and dividing the normalized data into a training set and a test set at a preset ratio.
Because the amount of traffic cell trip generation data obtained from questionnaires is limited, the preprocessed data set is divided into a training set and a test set at a ratio of 7:3, and the order of the samples in the training set is randomly shuffled. As the data set grows, the ratio of the training set to the test set can be adjusted to 9:1.
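The split described above can be sketched as follows, reading the text literally: divide at the preset ratio, then shuffle only the order of the training samples. That reading is an assumption; if the samples are stored in a meaningful order (for example by traffic cell number), shuffling the whole data set before splitting is a common alternative that the text does not specify. The function name and seed are illustrative.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def split_train_test(samples: Sequence[S], train_ratio: float = 0.7,
                     seed: Optional[int] = None) -> Tuple[List[S], List[S]]:
    """Split at the preset ratio, then shuffle the order of the training samples."""
    n_train = round(len(samples) * train_ratio)
    train, test = list(samples[:n_train]), list(samples[n_train:])
    random.Random(seed).shuffle(train)  # only the training order is randomised
    return train, test


cells = list(range(10))  # stand-ins for the (x_i, y_i) samples
train, test = split_train_test(cells, 0.7, seed=1)
print(len(train), len(test), test)  # 7 3 [7, 8, 9]
train, test = split_train_test(cells, 0.9, seed=1)
print(len(train), len(test))        # 9 1
```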
Step B20, performing N rounds of training of the trip generation prediction model on the training data of the training set, wherein in the n-th round the model contains n decision trees, and the negative gradient $r_{(n+1)i}$ of the error of the n-th-round model output is calculated based on the loss function L; $1 \le n \le N$ is the current training round.
During construction, the squared error is adopted as the loss function $L_b$ between the predicted value $T(x_i, \Theta_n)$ of each base learner (i.e., each tree) and the negative gradient value r of the model. Note that $L_b$, the loss function of the base learner, is different from the loss function L between the model's predicted value and the true value. Again taking the i-th sample as an example, $L_b$ is given by equation (5):
Figure BDA0002635393420000121
where D is the dimension of the negative gradient $r_{(n+1)i}$ of the model in the (n+1)-th round of training and of the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree.
The m samples of the training set, $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, are all input into the first decision tree of the constructed trip generation prediction model to train the parameters of that tree, as shown in formula (6):
$$\Theta_1 = \arg\min_{\Theta_1} \sum_{i=1}^{m} L\left(y_i,\, T(x_i, \Theta_1)\right) \qquad (6)$$
The predicted value of the first-round model, i.e., the output of the first decision tree, is then obtained as shown in formula (7):
$$f_1(x_i) = T(x_i, \Theta_1),\quad i = 1, 2, \dots, m \qquad (7)$$
Using the constructed model loss function L, the negative gradient $r_{2i}$ (i = 1, 2, ..., m) of the error between the model output $f_1(x_i)$ and the true value $y_i$ is calculated as shown in formula (8):
$$r_{2i} = -\left[\frac{\partial L(y_i, f_1(x_i))}{\partial f_1(x_i)}\right],\quad i = 1, 2, \dots, m \qquad (8)$$
The input data $x_i$ and the model error negative gradient $r_{2i}$ after the first round of training are combined into new data pairs $\{(x_1, r_{21}), (x_2, r_{22}), \dots, (x_m, r_{2m})\}$, which are used to train the second decision tree; the sum of its output $T(x_i, \Theta_2)$ and the previous round's model output $f_1(x_i)$ is taken as the predicted value, as shown in formula (9):
$$f_2(x_i) = T(x_i, \Theta_2) + f_1(x_i),\quad i = 1, 2, \dots, m \qquad (9)$$
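A small hand-computed example of formulas (7) to (9), on hypothetical data that are not from the patent: one feature with x = (1, 2, 3, 4), targets y = (1, 2, 5, 6), depth-1 trees (stumps) chosen by the squared-error split criterion, and the loss assumed to be $\tfrac{1}{2}(y - f)^2$ so that the negative gradient equals the residual. In the second round the splits x ≤ 1 and x ≤ 3 give the same squared error; x ≤ 1 is taken.

$$
\begin{aligned}
f_1(x_i) &= T(x_i, \Theta_1) = (1.5,\ 1.5,\ 5.5,\ 5.5) && \text{first stump, split } x \le 2\\
r_{2i} &= y_i - f_1(x_i) = (-0.5,\ 0.5,\ -0.5,\ 0.5) && \text{negative gradient of } \tfrac{1}{2}(y - f)^2\\
T(x_i, \Theta_2) &= (-0.5,\ \tfrac{1}{6},\ \tfrac{1}{6},\ \tfrac{1}{6}) && \text{second stump, split } x \le 1\\
f_2(x_i) &= T(x_i, \Theta_2) + f_1(x_i) = (1,\ \tfrac{5}{3},\ \tfrac{17}{3},\ \tfrac{17}{3}) && \text{formula (9)}\\
\textstyle\sum_i \left(y_i - f_1(x_i)\right)^2 &= 1, \qquad \textstyle\sum_i \left(y_i - f_2(x_i)\right)^2 = \tfrac{2}{3}
\end{aligned}
$$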
By analogy, the negative error gradient $r_{(n+1)i}$ of the model in the n-th round (i.e., when the model contains n decision trees) is obtained as shown in formula (10):
$$r_{(n+1)i} = -\left[\frac{\partial L(y_i, f_n(x_i))}{\partial f_n(x_i)}\right],\quad i = 1, 2, \dots, m \qquad (10)$$
where $L(y_i, f_n(x_i))$ denotes the loss value between the predicted value $f_n(x_i)$ output by the trip generation prediction model in the n-th round of training and the corresponding true value $y_i$, m is the number of training data in the training set, and $\partial L(y_i, f_n(x_i)) / \partial f_n(x_i)$ denotes the partial derivative of the loss value $L(y_i, f_n(x_i))$ with respect to the predicted value $f_n(x_i)$.
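For the squared-error loss, formula (10) has a simple closed form. The sketch below assumes the per-sample loss is $L = \tfrac{1}{2}\sum_{d=1}^{D}(y_i^{(d)} - f(x_i)^{(d)})^2$; the 1/2 factor is an assumption (the exact scaling of the patent's loss formula is not recoverable from this text), chosen so that the negative gradient is exactly the residual $y_i - f_n(x_i)$. Without it the gradient is $2(y_i - f_n(x_i))$, which only rescales the labels each tree is fitted to. The function names are hypothetical; a finite-difference check confirms the closed form.

```python
def squared_error(y, f):
    """Per-sample loss L(y, f) = 1/2 * sum over D dimensions of (y_d - f_d)^2 (1/2 factor assumed)."""
    return 0.5 * sum((yd - fd) ** 2 for yd, fd in zip(y, f))

def negative_gradient(y, f):
    """Closed form of formula (10) for the loss above: -dL/df = y - f, per dimension."""
    return [yd - fd for yd, fd in zip(y, f)]

def numeric_negative_gradient(y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df, used only to check the closed form."""
    grad = []
    for d in range(len(f)):
        up, down = list(f), list(f)
        up[d] += eps
        down[d] -= eps
        grad.append(-(squared_error(y, up) - squared_error(y, down)) / (2 * eps))
    return grad
```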
Step B30, adding the (n+1)-th decision tree to the model and training it with the n-th round negative error gradient $r_{(n+1)i}$ as the label, until all N decision trees have been trained.
The values $r_{(n+1)i}$ and the corresponding input data form new sample pairs $\{(x_1, r_{(n+1)1}), (x_2, r_{(n+1)2}), \dots, (x_m, r_{(n+1)m})\}$, which are used to train the (n+1)-th decision tree of the trip generation prediction model and obtain its parameters, as shown in formula (11):
$$\Theta_{n+1} = \arg\min_{\Theta_{n+1}} \sum_{i=1}^{m} L_b\left(r_{(n+1)i},\, T(x_i, \Theta_{n+1})\right) \qquad (11)$$
where $\Theta_{n+1}$ denotes the parameters of the (n+1)-th decision tree in the (n+1)-th round of model training, $r_{(n+1)i}$ is the negative error gradient of the n-th round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ denotes the loss value between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree in the (n+1)-th round of training and the corresponding negative error gradient $r_{(n+1)i}$, and m is the number of training data in the training set.
The predicted value of the model at this time is shown in equation (12):
$$f_{n+1}(x_i) = f_n(x_i) + T(x_i, \Theta_{n+1}),\quad i = 1, 2, \dots, m \qquad (12)$$
By analogy, once all N decision trees have been trained, the trip generation prediction model $f_N(x_i)$, i = 1, 2, ..., m, is obtained.
As these operations show, the boosting tree is an additive model of decision trees, so the final predicted value of the model is given by formula (13):
$$f_N(x_i) = \sum_{n=1}^{N} T(x_i, \Theta_n),\quad i = 1, 2, \dots, m \qquad (13)$$
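The training loop of steps B20 and B30 can be sketched in plain Python. To stay short, this sketch makes simplifying assumptions that are not from the patent: each base learner is a depth-1 CART (a stump) rather than a tree with J leaves and depth d, the target is scalar (D = 1), the loss is $\tfrac{1}{2}(y - f)^2$ so the negative gradient is the residual, and no learning rate or shrinkage is applied (the text mentions none). All names are hypothetical.

```python
def fit_stump(X, r):
    """Depth-1 CART regression tree fitted to targets r (simplified base learner).

    Exhaustively searches every feature j and observed value s; returns (j, s, c1, c2).
    """
    best = None
    for j in range(len(X[0])):
        for s in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= s]
            right = [ri for x, ri in zip(X, r) if x[j] > s]
            if not left or not right:
                continue
            c1, c2 = sum(left) / len(left), sum(right) / len(right)
            loss = sum((v - c1) ** 2 for v in left) + sum((v - c2) ** 2 for v in right)
            if best is None or loss < best[0]:
                best = (loss, j, s, c1, c2)
    if best is None:                      # no valid split: fall back to a single leaf
        c = sum(r) / len(r)
        return (0, float("inf"), c, c)
    return best[1:]

def predict_stump(stump, x):
    j, s, c1, c2 = stump
    return c1 if x[j] <= s else c2

def fit_gbdt(X, y, n_trees):
    """Train n_trees stumps; tree n+1 is fitted to the negative gradient r_(n+1)i."""
    trees, f = [], [0.0] * len(y)         # f_0 = 0, so the first tree is fitted to y itself
    for _ in range(n_trees):
        r = [yi - fi for yi, fi in zip(y, f)]                      # formula (10) for L = 1/2 (y - f)^2
        tree = fit_stump(X, r)                                     # formula (11)
        trees.append(tree)
        f = [fi + predict_stump(tree, x) for fi, x in zip(f, X)]   # formula (12)
    return trees

def predict_gbdt(trees, x):
    """Formula (13): the model output is the sum of all tree outputs."""
    return sum(predict_stump(t, x) for t in trees)
```

For example, on x = (1, 2, 3, 4) and y = (1, 2, 5, 6), one tree predicts (1.5, 1.5, 5.5, 5.5) and two trees predict approximately (1, 1.667, 5.667, 5.667), lowering the training squared error from 1 to about 0.667.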
Training a decision tree consists of finding its optimal split nodes until the structure information of the tree reaches the set values. Specifically, every possible value of every feature is traversed and the squared error is computed for each; the split feature j and split point s that minimize the squared error are taken as the optimal split node (j, s).
Fig. 3 is a schematic diagram of a decision tree with depth d = 3 and J = 4 leaf nodes in an embodiment of the trip generation prediction method based on a gradient boosting decision tree. Assume that the input data x has dimension k = 3, i.e., there are 3 independent-variable split features, and that the output data $y_i$ has dimension D = 1, i.e., there is 1 dependent variable; the maximum depth of the decision tree is set to d = 3 and the number of leaf nodes to J = 4. The training process is then as follows:
First, guided by the base learner loss function $L_b$, the decision tree being trained partitions the input space by finding the optimal split feature j of the input data x and the optimal split point s for that feature. Let
$$R_1(j,s) = \{x \mid x^{(j)} \le s\},\qquad R_2(j,s) = \{x \mid x^{(j)} > s\}$$
denote the two regions into which the j-th feature $x^{(j)}$ is divided by s. The optimization traverses each feature dimension j and each value s of that feature in turn and computes the loss for every split point (j, s); the split point with the minimum loss is the optimal split point, as shown in formula (14):
$$\min_{j,s}\left[\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right] \qquad (14)$$
where $c_1$ and $c_2$ are the means of all samples in the respective regions and $N_t$ is the number of samples in region $R_t(j,s)$, as shown in formula (15):
$$c_t = \frac{1}{N_t} \sum_{x_i \in R_t(j,s)} y_i,\quad t = 1, 2 \qquad (15)$$
Second, the above steps are applied to the data in each of the two sub-regions in turn, until the number of leaf nodes of the decision tree equals the set value.
Third, since the number of leaf nodes is 4, the input x is divided into 4 sub-regions $R_1, R_2, R_3, R_4$, and the sample means of the regions are $c_1, c_2, c_3, c_4$, respectively.
The final CART learner is shown in equation (16):
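$$T(x) = \sum_{t=1}^{4} c_t\, I(x \in R_t) \qquad (16)$$
where $I(\cdot)$ is the indicator function, equal to 1 if $x \in R_t$ and 0 otherwise.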
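The three steps above (exhaustive (j, s) search, repeated splitting until J leaves, piecewise-constant output) can be sketched in plain Python. This is an illustration under stated assumptions, not the patent's implementation: targets are scalar (D = 1), regions are split best-first by largest squared-error reduction (the text does not fix the order in which sub-regions are split), and the maximum depth d is not enforced. All function names are hypothetical.

```python
def sse(values):
    """Squared error of a region around its mean, i.e. one term of formula (14)."""
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)

def best_split(X, y):
    """Exhaustive search over every feature j and observed value s (formula (14)).

    Returns (loss, j, s, left, right) with left/right as index lists, or None.
    """
    best = None
    for j in range(len(X[0])):
        for s in sorted({x[j] for x in X}):
            left = [i for i, x in enumerate(X) if x[j] <= s]    # R1(j, s)
            right = [i for i, x in enumerate(X) if x[j] > s]    # R2(j, s)
            if not left or not right:
                continue
            loss = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or loss < best[0]:
                best = (loss, j, s, left, right)
    return best

def grow_tree(X, y, n_leaves):
    """Split regions until the tree has n_leaves (J) leaves; return [(conditions, c_t)]."""
    leaves = [([], list(range(len(X))))]          # (path conditions, sample indices)
    while len(leaves) < n_leaves:
        best = None
        for k, (_, idx) in enumerate(leaves):
            split = best_split([X[i] for i in idx], [y[i] for i in idx])
            if split is None:
                continue
            gain = sse([y[i] for i in idx]) - split[0]          # squared-error reduction
            if best is None or gain > best[0]:
                best = (gain, k, split)
        if best is None:                           # no region can be split further
            break
        _, k, (_, j, s, left, right) = best
        conds, idx = leaves.pop(k)
        leaves.append((conds + [(j, s, True)], [idx[i] for i in left]))
        leaves.append((conds + [(j, s, False)], [idx[i] for i in right]))
    # Formula (16): each region R_t predicts the mean c_t of its samples
    return [(conds, sum(y[i] for i in idx) / len(idx)) for conds, idx in leaves]

def predict_tree(tree, x):
    """Evaluate T(x) = sum_t c_t * I(x in R_t); exactly one indicator is 1."""
    for conds, c in tree:
        if all((x[j] <= s) == is_left for j, s, is_left in conds):
            return c
    raise ValueError("x is not covered by any region")
```

For instance, with x = 1, ..., 8 and y = (1, 1, 3, 3, 5, 5, 7, 7), `grow_tree(X, y, n_leaves=4)` first splits at x ≤ 4 and then at x ≤ 2 and x ≤ 6, giving four regions whose means are 1, 3, 5 and 7.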
Fig. 4 is a schematic diagram of the gradient boosting decision tree structure containing N decision trees adopted in an embodiment of the travel generation prediction method based on a gradient boosting decision tree according to the present invention; each decision tree is trained on the negative gradient values r.
Step B40, testing the performance of the trained trip generation prediction model on the test data of the test set; if the test result does not meet the set threshold, increasing the number of training rounds or adjusting the structure of the base-learner decision trees and retraining the model on the original training set until the test result meets the set threshold, thereby obtaining the trained trip generation prediction model.
The model performance test method comprises the following steps:
Step C10, inputting the independent variables in the test data of the test set into the trained trip generation prediction model to obtain the predicted values output by the model.
Step C20, calculating the $R^2$ value, the root mean square error (RMSE) and the mean absolute error (MAE) between the predicted values and the dependent variables corresponding to the independent variables, as shown in formulas (17), (18) and (19), respectively:
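$$R^2 = 1 - \frac{\sum_{i=1}^{M} \left(y_i - f(x_i)\right)^2}{\sum_{i=1}^{M} \left(y_i - \bar{y}\right)^2} \qquad (17)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(y_i - f(x_i)\right)^2} \qquad (18)$$
$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left|y_i - f(x_i)\right| \qquad (19)$$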
where $y_i$ (i = 1, 2, ..., M) is the true value of a sample, $\bar{y}$ is the mean of the test-set samples, $f(x_i)$ (i = 1, 2, ..., M) is the model's predicted value, and M is the number of test-set samples.
Step C30, if the $R^2$ value is close to 1 and both the root mean square error and the mean absolute error are smaller than the set thresholds, the performance of the trip generation prediction model meets the requirement; otherwise, the number of training rounds is increased or the structure of the base-learner decision trees is adjusted, and the model is retrained on the original training set.
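Formulas (17) to (19) and the acceptance test of step C30 can be sketched as follows. The threshold values are hypothetical placeholders, since the text only asks for $R^2$ "close to 1" and errors below "set thresholds"; the sketch assumes scalar targets and at least two distinct true values (otherwise the $R^2$ denominator is zero). Function names are illustrative.

```python
import math

def evaluate(y_true, y_pred):
    """Return (R^2, RMSE, MAE) over M test samples, following formulas (17)-(19)."""
    M = len(y_true)
    mean_y = sum(y_true) / M                                        # mean of the test-set true values
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))     # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)                 # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / M)
    mae = sum(abs(y - f) for y, f in zip(y_true, y_pred)) / M
    return r2, rmse, mae

def meets_requirement(y_true, y_pred, r2_min=0.9, rmse_max=0.05, mae_max=0.05):
    """Step C30 acceptance test; the default thresholds are hypothetical placeholders."""
    r2, rmse, mae = evaluate(y_true, y_pred)
    return r2 >= r2_min and rmse < rmse_max and mae < mae_max
```

For true values (1, 2, 3, 4) and predictions (1, 2, 3, 5), `evaluate` gives R² = 0.8, RMSE = 0.5 and MAE = 0.25, so under these placeholder thresholds the model would be retrained.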
The travel generation prediction system based on a gradient boosting decision tree according to the second embodiment of the present invention is based on the above travel generation prediction method based on a gradient boosting decision tree and comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;
the input module is configured to acquire and input current travel generation data of each traffic cell of an area to be predicted;
the preprocessing module is configured to extract the independent variables of the current travel generation data of each traffic cell of the area to be predicted and normalize them to obtain preprocessed data;
the prediction module is configured to obtain, from the preprocessed data and via the trained trip generation prediction model, the current predicted value of each traffic cell of the area to be predicted;
the inverse normalization module is configured to inversely normalize the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted;
the output module is configured to output the obtained current predicted travel generation data of each traffic cell of the area to be predicted.
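The module chain can be sketched as below. It assumes, as claim 2 suggests by referring only to per-dimension maximum values, that normalization divides each variable by its historical maximum; that form, the function names, and the stand-in `model` callable are assumptions for illustration. Any trained predictor, such as a GBDT, could be passed as `model`.

```python
def fit_max_scaler(rows):
    """Per-dimension maximum of historical data (assumed positive), used for scaling."""
    return [max(col) for col in zip(*rows)]

def normalize(rows, maxima):
    """Divide each dimension by its historical maximum (assumed normalization form)."""
    return [[v / m for v, m in zip(row, maxima)] for row in rows]

def denormalize(rows, maxima):
    """Inverse of normalize: multiply each dimension by its historical maximum."""
    return [[v * m for v, m in zip(row, maxima)] for row in rows]

def predict_trip_generation(current_x, x_max, y_max, model):
    """Preprocessing -> prediction -> inverse normalization for each traffic cell.

    `model` is any callable mapping one normalized feature vector to a list of D outputs.
    """
    x_norm = normalize(current_x, x_max)           # preprocessing module
    y_norm = [model(x) for x in x_norm]            # prediction module
    return denormalize(y_norm, y_max)              # inverse normalization module
```

For example, with historical maxima (200, 50) for the inputs and 80 for the output, an input row (150, 10) is scaled to (0.75, 0.2); if the model returns 0.75, inverse normalization yields 60.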
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the travel generation prediction system based on a gradient boosting decision tree provided in the above embodiment is illustrated only by the above division into functional modules. In practical applications, these functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be merged into one module or further split into multiple sub-modules to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish them and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are suitable for being loaded and executed by a processor to implement the above-mentioned travel generation prediction method based on a gradient boosting decision tree.
A processing apparatus according to a fourth embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above travel generation prediction method based on a gradient boosting decision tree.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those skilled in the art should appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that programs corresponding to the software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (7)

1. A travel generation prediction method based on a gradient boosting decision tree, characterized by comprising the following steps:
step S10, extracting the independent variables of the current travel generation data of each traffic cell of the area to be predicted and normalizing them to obtain preprocessed data; the historical travel generation data of each traffic cell of the area to be predicted comprise independent variables and dependent variables; the independent variables comprise, for each traffic cell, the numbers of households with and without vehicles and their populations, the numbers of workers, students and other persons with and without vehicles, and the total number of persons in each employment sector; the employment sectors comprise industry; water conservancy, environment and public facilities management; transportation, warehousing and postal services; public administration; education; resident services; finance; information technology services; and agriculture, forestry, animal husbandry and fishery; the dependent variables comprise the home-based and non-home-based trip productions of households with vehicles and households without vehicles in each traffic cell;
step S20, obtaining, from the preprocessed data and via the trained travel generation prediction model, the current predicted value of each traffic cell of the area to be predicted;
step S30, performing inverse normalization on the predicted values to obtain current predicted travel generation data of each traffic cell of the area to be predicted;
the travel generation prediction model has a gradient boosting decision tree structure: decision trees serve as base learners, the sum of the outputs of all decision trees in the model is the output of the model, and the squared error is used as the loss function L between the predicted value and the true value of the model; the model is trained as follows:
step B10, extracting independent variables and dependent variables of historical travel generation data of each traffic cell of the area to be predicted, carrying out normalization processing, and dividing the normalized data into a training set and a test set according to a preset proportion;
step B20, performing N rounds of training of the travel generation prediction model on the training data of the training set, where the model contains n decision trees in the n-th round, and calculating, based on the loss function L, the negative error gradient $r_{(n+1)i}$ of the n-th round model output; $1 \le n \le N$ is the current training round;
step B30, adding the (n+1)-th decision tree to the model and training it with the n-th round negative error gradient $r_{(n+1)i}$ as the label:
$$\Theta_{n+1} = \arg\min_{\Theta_{n+1}} \sum_{i=1}^{m} L_b\left(r_{(n+1)i},\, T(x_i, \Theta_{n+1})\right)$$
where $\Theta_{n+1}$ denotes the parameters of the (n+1)-th decision tree in the (n+1)-th round of model training, $r_{(n+1)i}$ is the negative error gradient of the n-th round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ denotes the loss value between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree in the (n+1)-th round of training and the corresponding negative error gradient $r_{(n+1)i}$, and m is the number of training data in the training set;
Figure FDA0003347061500000022
where $L_b$ is the loss function of the base learner and D is the dimension of the negative gradient $r_{(n+1)i}$ of the model in the (n+1)-th round of training and of the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree;
after training of the (n+1)-th decision tree is finished, training continues with the (n+2)-th decision tree until all N decision trees have been trained;
step B40, inputting the independent variables of each test datum in the test set into the trained travel generation prediction model to obtain the predicted values output by the model, and calculating the $R^2$ value, the root mean square error and the mean absolute error between the predicted values and the dependent variables corresponding to the independent variables; if the $R^2$ value is close to 1 and both the root mean square error and the mean absolute error are smaller than the set thresholds, the performance of the travel generation prediction model meets the requirement; otherwise, the number of training rounds is increased or the structure of the base-learner decision trees is adjusted and the model is retrained on the original training set until the test result meets the set thresholds, yielding the trained travel generation prediction model;
in the training process of each decision tree, every possible value of every feature is traversed, the squared error is computed for each, and the split feature j and corresponding split point s that minimize the squared error are found, giving the optimal split node (j, s):
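$$\min_{j,s}\left[\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$
$$R_1(j,s) = \{x \mid x^{(j)} \le s\},\qquad R_2(j,s) = \{x \mid x^{(j)} > s\}$$
where $R_1(j,s)$ and $R_2(j,s)$ are the two regions into which the j-th feature $x^{(j)}$ is divided by s, $c_1$ and $c_2$ are the means of all samples in the two regions, and $N_t$ is the number of samples in region $R_t(j,s)$:
$$c_t = \frac{1}{N_t} \sum_{x_i \in R_t(j,s)} y_i,\quad t = 1, 2$$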
the above steps are applied to the data in each of the two sub-regions in turn, until the number of leaf nodes of the decision tree equals the set value;
if the number of leaf nodes is 4, the input x is divided into 4 sub-regions $R_1, R_2, R_3, R_4$, the sample means of the regions are $c_1, c_2, c_3, c_4$, and the final CART learner is:
$$T(x) = \sum_{t=1}^{4} c_t\, I(x \in R_t)$$
2. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein the "normalization processing of the variables" in step S10 is performed as:
$$x_i = \frac{X_i}{X^{\max}}$$
$$y_i = \frac{Y_i}{Y^{\max}}$$
where $X^{\max}$ and $Y^{\max}$ are the maximum values of each dimension of the historical independent variables $X_i$ and dependent variables $Y_i$ before normalization (the division is applied dimension by dimension), $x_i$ and $y_i$ are the normalized independent and dependent variables, k is the dimension of $x_i$, and D is the dimension of $y_i$.
3. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein the loss value of the i-th training data $(x_i, y_i)$ in the training set is calculated as:
Figure FDA0003347061500000039
where $f(x_i)$ and $y_i$ are, respectively, the predicted value output by the travel generation prediction model and the true value corresponding to training data $x_i$, and D is the dimension of $f(x_i)$ and $y_i$.
4. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein "calculating the negative error gradient $r_{(n+1)i}$ of the n-th round model output based on the loss function L" in step B20 is performed as follows:
$$r_{(n+1)i} = -\left[\frac{\partial L(y_i, f_n(x_i))}{\partial f_n(x_i)}\right],\quad i = 1, 2, \dots, m$$
where $L(y_i, f_n(x_i))$ denotes the loss value between the predicted value $f_n(x_i)$ output by the travel generation prediction model in the n-th round of training and the corresponding true value $y_i$, m is the number of training data in the training set, and $\partial L(y_i, f_n(x_i)) / \partial f_n(x_i)$ denotes the partial derivative of the loss value $L(y_i, f_n(x_i))$ with respect to the predicted value $f_n(x_i)$;
$$f_n(x_i) = f_{n-1}(x_i) + T(x_i, \Theta_n)$$
where $T(x_i, \Theta_n)$ denotes the predicted value output by the n-th decision tree of the model and $\Theta_n$ denotes the parameters of the n-th decision tree in the n-th round of model training.
5. A travel generation prediction system based on a gradient boosting decision tree, characterized in that it is based on the travel generation prediction method based on a gradient boosting decision tree according to any one of claims 1 to 4 and comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;
the input module is configured to acquire and input current travel generation data of each traffic cell of an area to be predicted;
the preprocessing module is configured to extract independent variables of current travel generation data of each traffic cell of the area to be predicted, and normalize the independent variables to obtain preprocessed data;
the prediction module is configured to obtain, from the preprocessed data and via the trained travel generation prediction model, the current predicted value of each traffic cell of the area to be predicted;
the inverse normalization module is configured to inversely normalize the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted;
the output module is configured to output the obtained current predicted travel generation data of each traffic cell of the area to be predicted.
6. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the gradient boosting decision tree based travel generation prediction method according to any one of claims 1 to 4.
7. A processing apparatus, comprising:
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
the gradient boosting decision tree based travel generation prediction method of any one of claims 1-4.
CN202010823717.1A 2020-08-17 2020-08-17 Travel generation prediction method, system and device based on gradient lifting decision tree Active CN111784084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010823717.1A CN111784084B (en) 2020-08-17 2020-08-17 Travel generation prediction method, system and device based on gradient lifting decision tree

Publications (2)

Publication Number Publication Date
CN111784084A CN111784084A (en) 2020-10-16
CN111784084B true CN111784084B (en) 2021-12-28

Family

ID=72762181

Country Status (1)

Country Link
CN (1) CN111784084B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant