CN111784084B

CN111784084B - Travel generation prediction method, system and device based on gradient boosting decision tree

Info

Publication number: CN111784084B
Application number: CN202010823717.1A
Authority: CN
Inventors: 杜立群; 刘斌; 郑猛; 张宇; 吴丹婷; 吕宜生; 李志帅
Original assignee: Beijing Municipal Institute Of City Planning & Design; Institute of Automation of Chinese Academy of Science
Current assignee: Beijing Municipal Institute Of City Planning & Design; Institute of Automation of Chinese Academy of Science
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2021-12-28
Anticipated expiration: 2040-08-17
Also published as: CN111784084A

Abstract

The invention belongs to the field of population travel generation prediction, and particularly relates to a travel generation prediction method, system and device based on a gradient boosting decision tree. It aims to solve the problems that existing travel generation methods cannot truly reflect the nonlinear relationship between the input values and the prediction, that model checking requires a large amount of calculation, and that the results are not intuitive. The invention comprises the following steps: extracting the independent variables of the current travel generation data of each traffic cell of an area to be predicted and normalizing them; obtaining the predicted value of each traffic cell of the area to be predicted through a trained travel generation prediction model; and performing inverse normalization on the predicted values to obtain the predicted travel generation data of each traffic cell of the area to be predicted. The invention can accurately reflect the nonlinear relationship between the original input and the original output. It uses the minimum squared error criterion to find the optimal split feature and split point, automatically ignores redundant variables, omits the manual variable screening process, and has higher precision and robustness.

Description

Travel generation prediction method, system and device based on gradient boosting decision tree

Technical Field

The invention belongs to the field of population travel generation prediction, and particularly relates to a travel generation prediction method, system and device based on a gradient boosting decision tree.

Background

The interaction between urban traffic and urban land use means that different land-use layouts and intensities generate social activities of different types and intensities, which in turn determine the traffic volume and its distribution in different areas. Correspondingly, the functional efficiency of the traffic system directly influences the land price, rents and popularity of the surrounding land, and thus the realization of its functions. Therefore, traffic planning requires in-depth study of the interrelationship between urban land use and traffic, and the trip rate is one of the important indexes that intuitively reflect this interrelationship.

Urban traffic demand prediction is one of the core contents of urban traffic planning and an important basis for determining the scale of a city's traffic network, the structure of road sections, the scale of hubs and the like. The four-step traffic method is based on resident trip surveys and comprises four steps: trip generation (trip generation/attraction), trip distribution, traffic mode division (modal split) and traffic assignment.

In the travel generation model, the travel production of a traffic cell per unit time equals the number of home-based trips whose home end lies in the cell plus the number of non-home-based trips and cargo trips that start in the cell. A trip has two ends: one is the production end and the other is the attraction end. The main factors affecting production are population size and related classifications, such as age structure, occupation, income level and vehicle ownership.

The traditional travel generation prediction methods comprise the category analysis method, the regression analysis method and the growth rate method. The production predicted by the category analysis method does not include two parts, namely home trips and cargo trips, so the prediction data are incomplete; the results of the growth rate method are rough. Therefore, multiple regression analysis is at present the method most widely used in engineering practice. However, it assumes by default a linear relationship between the input values and the prediction, so it cannot truly reflect the nonlinear influence and the coupling relationship between the input values and the prediction. It also requires statistical tests (significance and correlation) on the prediction model, which involve a large amount of calculation and give results that are not intuitive enough.

Disclosure of Invention

In order to solve the above problems in the prior art, namely that existing travel generation methods cannot truly reflect the nonlinear relationship between the input values and the prediction, that model checking requires a large amount of calculation and that the results are not intuitive, the invention provides a travel generation prediction method based on a gradient boosting decision tree, which comprises the following steps:

step S10, extracting independent variables of current travel generation data of each traffic cell of the area to be predicted, and performing normalization processing on the independent variables to obtain preprocessed data;

step S20, based on the preprocessed data, obtaining the current predicted value of each traffic cell of the area to be predicted through the trained travel generation prediction model;

step S30, performing inverse normalization on the predicted values to obtain current predicted travel generation data of each traffic cell of the area to be predicted;

the travel generation prediction model has a gradient boosting decision tree structure, in which decision trees serve as base learners, the sum of the outputs of all decision trees in the model is the output of the model, and the squared error between the predicted value and the true value of the model serves as the loss function L; the model is trained as follows:

step B10, extracting independent variables and dependent variables of historical travel generation data of each traffic cell of the area to be predicted, carrying out normalization processing, and dividing the normalized data into a training set and a test set according to a preset proportion;

step B20, performing N rounds of travel generation prediction model training based on the training data of the training set, wherein in the n-th round of training the model contains n decision trees, and the negative gradient value r_(n+1)i of the error of the n-th round model output is calculated based on the loss function L; n, with 1 ≤ n ≤ N, is the current round of model training;

step B30, adding the (n+1)-th decision tree to the model and, taking the negative gradient values r_(n+1)i of the n-th round error as labels, training the (n+1)-th decision tree, until the training of the N decision trees is completed;

and step B40, performing performance test of the trained trip generation prediction model based on each test data of the test set, if the test result does not meet the set threshold, increasing the training round or adjusting the structure of the decision tree of the base learner and performing model training again by using the original training set until the test result meets the set threshold, and obtaining the trained trip generation prediction model.

In some preferred embodiments, the historical travel generation data of each traffic cell of the area to be predicted includes an independent variable and a dependent variable;

the independent variables comprise the number of families with or without vehicles and the population number, the number of workers with or without vehicles, students and other types of personnel and the total number of people in each employment post in each traffic district; the employment posts comprise industry, water conservancy environment and public facilities, transportation and postal storage, public management, education, resident service industry, financial industry, information technology service industry, agriculture, forestry, animal husbandry and fishery;

the dependent variables comprise the home-based and non-home-based travel production of households with vehicles and of households without vehicles in each traffic cell.

In some preferred embodiments, the normalization of the variables in step S10 is performed by:

$$x_i = \frac{X_i}{X_{\max}}$$

$$y_i = \frac{Y_i}{Y_{\max}}$$

wherein the division is element-wise, $X_{\max} \in \mathbb{R}^{k}$ and $Y_{\max} \in \mathbb{R}^{D}$ are respectively the maximum values of each dimension of the independent variables $X_i$ and the dependent variables $Y_i$ of the historical data before normalization, $x_i$ and $y_i$ are respectively the normalized independent and dependent variables, $k$ is the dimension of $x_i$, and $D$ is the dimension of $y_i$.

In some preferred embodiments, for the ith training data (x) in the training set_i，y_i) The method for calculating the loss value comprises the following steps:

wherein f(x_i) and y_i are, respectively, the predicted value output by the travel generation prediction model and the true value corresponding to the training data x_i, and D is the dimension of f(x_i) and y_i.

In some preferred embodiments, in step B20, the negative gradient value r_(n+1)i of the error of the n-th round model output is calculated based on the loss function L as follows:

$$r_{(n+1)i} = -\frac{\partial L\left(y_i, f_n(x_i)\right)}{\partial f_n(x_i)}, \quad i = 1, 2, \ldots, m$$

wherein $L(y_i, f_n(x_i))$ represents the loss value between the predicted value $f_n(x_i)$ output by the travel generation prediction model in the n-th round of training and the corresponding true value $y_i$, $m$ is the number of training data in the training set, and $\partial L(y_i, f_n(x_i)) / \partial f_n(x_i)$ represents the partial derivative of the loss value $L(y_i, f_n(x_i))$ with respect to the predicted value $f_n(x_i)$;

$$f_n(x_i) = \sum_{j=1}^{n} T(x_i, \Theta_j)$$

wherein $T(x_i, \Theta_n)$ is the predicted value output by the n-th decision tree of the model, and $\Theta_n$ are the parameters of the n-th decision tree in the n-th round of model training.

In some preferred embodiments, in step B30, the (n+1)-th decision tree is added to the model and trained with the negative gradient values r_(n+1)i of the n-th round error as labels, as follows:

$$\Theta_{n+1} = \arg\min_{\Theta} \sum_{i=1}^{m} L_b\left(r_{(n+1)i}, T(x_i, \Theta)\right)$$

wherein $\Theta_{n+1}$ are the parameters of the (n+1)-th decision tree in the (n+1)-th round of model training, $r_{(n+1)i}$ is the negative gradient value of the error of the n-th round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ represents the loss value between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree and the corresponding negative gradient value $r_{(n+1)i}$, which serves as the true value, and $m$ is the number of training data in the training set;

wherein $L_b$ is the loss function of the base learner, and $D$ is the dimension of the negative gradient $r_{(n+1)i}$ of the model in the (n+1)-th round of training and of the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree.

In some preferred embodiments, in step B40, "performance test of the trip generation prediction model after training based on each test data of the test set", the method includes:

step C10, inputting the independent variables in each test data of the test set into the trained trip generation prediction model, and obtaining the prediction value output by the trip generation prediction model;

step C20, calculating the R² value, the root mean square error and the mean absolute error between the predicted values and the dependent variables corresponding to the independent variables;

step C30, if the R² value is close to 1 and the root mean square error and the mean absolute error are smaller than the set thresholds, the performance of the travel generation prediction model meets the requirement; otherwise, increasing the number of training rounds or adjusting the structure of the base learner decision trees and training the model again with the original training set.
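
The retraining rule of steps B40 and C30 can be sketched as a simple loop. This is only an illustration: `train_fn`, `evaluate_fn`, the starting round count, the step size and the upper limit are hypothetical names and values that do not come from the invention, and only the option of increasing the number of training rounds is shown.

```python
def train_until_acceptable(train_fn, evaluate_fn, n_rounds=10, step=10, max_rounds=200):
    """Retrain with more boosting rounds until the test-set check passes.

    train_fn(n_rounds) -> model and evaluate_fn(model) -> bool are supplied by
    the caller; every default value here is hypothetical.
    """
    while n_rounds <= max_rounds:
        model = train_fn(n_rounds)   # always retrain on the original training set
        if evaluate_fn(model):       # e.g. R^2 close to 1, RMSE and MAE below thresholds
            return model, n_rounds
        n_rounds += step             # otherwise increase the number of training rounds
    raise RuntimeError("no acceptable model found within max_rounds")


# Stand-in functions so the sketch runs on its own: the "model" only records its round count.
model, rounds_used = train_until_acceptable(
    train_fn=lambda n: {"n_trees": n},
    evaluate_fn=lambda m: m["n_trees"] >= 30,
)
```

Adjusting the structure of the base learner (for example its depth or number of leaf nodes) would follow the same pattern with a different parameter being varied.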

In a second aspect, the invention provides a travel generation prediction system based on a gradient boosting decision tree which, based on the above travel generation prediction method based on a gradient boosting decision tree, comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;

the input module is configured to acquire and input current travel generation data of each traffic cell of an area to be predicted;

the preprocessing module is configured to extract independent variables of current travel generation data of each traffic cell of the area to be predicted, and normalize the independent variables to obtain preprocessed data;

the prediction module is configured to obtain, based on the preprocessed data, the current predicted value of each traffic cell of the area to be predicted through the trained travel generation prediction model;

the inverse normalization module is configured to inverse-normalize the predicted values to obtain current predicted travel generation data of each traffic cell of the area to be predicted;

the output module is configured to output the obtained current predicted travel generation data of each traffic cell of the area to be predicted.

In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being suitable for being loaded and executed by a processor to implement the above-mentioned travel generation prediction method based on a gradient boosting decision tree.

In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the travel generation prediction method based on the gradient boost decision tree.

The invention has the beneficial effects that:

The travel generation prediction method based on a gradient boosting decision tree obtains a prediction model by training a gradient boosting decision tree structure on preprocessed resident survey data, and uses the model to predict travel generation. It can accurately reflect the nonlinear relationship between the original input and output, uses the minimum squared error criterion to find the optimal split feature and split point, automatically ignores redundant variables, omits the manual variable screening process, and has higher precision and robustness than the conventional multiple linear regression method. Meanwhile, the invention evaluates model performance on a test set, and different models can be compared under the same indexes, so that the model parameter checking process is simpler and more intuitive.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is a schematic flow chart of a travel generation prediction method based on a gradient boosting decision tree according to the present invention;

FIG. 2 is a schematic overall structure diagram of a travel generation prediction method based on a gradient boosting decision tree according to the present invention;

FIG. 3 is a schematic structural diagram of a decision tree with depth d = 3 and J = 4 leaf nodes according to an embodiment of the travel generation prediction method based on a gradient boosting decision tree;

FIG. 4 is a schematic structural diagram of a gradient boosting decision tree including N decision trees, which is adopted in an embodiment of the travel generation prediction method based on a gradient boosting decision tree.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

The invention provides a travel generation prediction method based on a Gradient Boosting Decision Tree (GBDT). It revisits the big data of resident surveys to study the traffic generation prediction problem with a tree-structured model and, addressing the practical requirements of traffic generation prediction, provides a prediction method whose decision tree structure is based on GBDT. The method is data-driven: the optimal split features and split points are found by minimizing the squared error, which removes the need to screen the independent variables and effectively extracts the features of the input pattern. Meanwhile, a test set is used to evaluate the performance of the model, which makes prediction and parameter checking simple and intuitive.

In order to more clearly describe the travel generation prediction method based on the gradient boosting decision tree of the present invention, details of each step in the embodiment of the present invention are described below with reference to fig. 1 and 2.

The travel generation prediction method based on a gradient boosting decision tree of the embodiment of the invention comprises the following steps:

Step S10, extracting the independent variables of the current travel generation data of each traffic cell of the area to be predicted, and normalizing the independent variables to obtain preprocessed data.

In one embodiment of the invention, travel generation data of each traffic cell of an area to be predicted is obtained in a questionnaire survey mode, and in the model application process, current travel generation data of the area to be predicted, including an independent variable X, is used; in the model training and testing, historical trip generated data of the area to be predicted are used, and the historical trip generated data comprise independent variable X and dependent variable Y.

The independent variable X comprises the number of families with vehicles and without vehicles and the number of population in each traffic district, the number of workers with vehicles and without vehicles, students and other types of personnel and the total number of people in each employment post; the employment posts comprise industry, water conservancy environment and public facilities, transportation and postal storage, public management, education, residential service industry, financial industry, information technology service industry, agriculture, forestry, animal husbandry and fishery industry and the like.

The dependent variable Y comprises the home-based and non-home-based travel production of households with vehicles and of households without vehicles in each traffic cell. Travel is home-based if the origin or the destination of the trip is the home; otherwise it is non-home-based.

Besides questionnaire survey, the travel generation data of each traffic cell of the area to be predicted can be obtained in other manners, and the invention is not described in detail herein.

And performing coarse screening on the acquired data to remove data which is intuitively useless for a prediction result, such as the number of a traffic cell, the number of a street where the traffic cell is located and the like.
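
Coarse screening of this kind amounts to dropping identifier fields before normalization. A minimal sketch, assuming each traffic cell is held as a dictionary; the field names `cell_id`, `street_id`, `households_with_car` and `population` are hypothetical.

```python
def coarse_screen(records, drop_fields=("cell_id", "street_id")):
    """Return copies of the records without fields that carry no predictive information."""
    return [{k: v for k, v in rec.items() if k not in drop_fields} for rec in records]


# Hypothetical survey records for two traffic cells.
raw = [
    {"cell_id": 101, "street_id": 7, "households_with_car": 120, "population": 600},
    {"cell_id": 102, "street_id": 7, "households_with_car": 60, "population": 300},
]
screened = coarse_screen(raw)
```

The original records are left untouched, so the identifiers remain available for mapping predictions back to traffic cells.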

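Data normalization scales the value of each dimension of the variables to the range 0 to 1, so as to reduce the fluctuation of the data and make the prediction result more stable. Assume that, after the questionnaire data are coarsely screened, each independent variable has dimension $k \times 1$ and each dependent variable has dimension $D \times 1$; together they form a sample. Take the i-th sample $(X_i, Y_i)$ as an example, wherein $X_i \in \mathbb{R}^{k}$, $Y_i \in \mathbb{R}^{D}$ and $\mathbb{R}$ is the real number domain; the samples of all traffic cells constitute the data set.

The normalization of the variables is shown in formula (1) and formula (2):

$$x_i = \frac{X_i}{X_{\max}} \quad (1)$$

$$y_i = \frac{Y_i}{Y_{\max}} \quad (2)$$

wherein the division is element-wise, and $X_{\max} \in \mathbb{R}^{k}$ and $Y_{\max} \in \mathbb{R}^{D}$ are respectively the maximum values of each dimension of the independent variables $X_i$ and the dependent variables $Y_i$ in the historical data before normalization.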
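
The normalization step can be illustrated with a short sketch that divides every component by the maximum of its dimension. The function names, the sample values and the handling of an all-zero dimension are assumptions made for illustration only.

```python
def column_max(rows):
    """Per-dimension maximum over a list of equal-length numeric vectors."""
    return [max(col) for col in zip(*rows)]


def normalize(rows, maxima):
    """Divide each component by the maximum of its dimension.

    A zero maximum is replaced by 1 to avoid division by zero (an assumption,
    not something the source specifies).
    """
    return [[v / (m if m != 0 else 1.0) for v, m in zip(row, maxima)] for row in rows]


# Hypothetical cells: [households with car, households without car, population]
X = [[120.0, 80.0, 600.0], [60.0, 40.0, 300.0], [30.0, 160.0, 450.0]]
X_max = column_max(X)      # [120.0, 160.0, 600.0]
x = normalize(X, X_max)    # every value now lies between 0 and 1
```

The maxima must be kept after training, because the same values are needed to normalize new inputs and to inverse-normalize predictions.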

Step S20, based on the preprocessed data, obtaining the current predicted value of each traffic cell of the area to be predicted through the trained travel generation prediction model.

The preprocessed data are input into the trained travel generation prediction model with fixed parameters, and the current predicted values of all traffic cells of the area to be predicted, which are output by the model, are obtained.

Step S30, performing inverse normalization on the predicted values to obtain the current predicted travel generation data of each traffic cell of the area to be predicted.

The inverse normalization of the predicted value is shown in formula (3):

$$F(x_i) = f(x_i) \odot Y_{\max} \quad (3)$$

wherein $x_i$ denotes the i-th input sample, $f(x_i)$ is the predicted value of the model, $F(x_i)$ is the inverse-normalized predicted value of the model, namely the final predicted travel generation amount, $D$ is the dimension of the predicted value, $\odot$ denotes element-wise multiplication, and $Y_{\max} \in \mathbb{R}^{D}$ is the maximum value of each dimension of the dependent variable $Y_i$ in the historical data before normalization.
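
Steps S10 to S30 chain together as normalize, predict, inverse-normalize. The sketch below shows that flow under stated assumptions: `model` is any callable mapping a normalized input vector to a normalized output vector (here a stand-in lambda, not a trained GBDT), all maxima are assumed to be positive, and all names and numbers are hypothetical.

```python
def denormalize(pred_rows, y_max):
    """Inverse normalization: multiply each predicted component by its dimension's maximum."""
    return [[p * m for p, m in zip(row, y_max)] for row in pred_rows]


def predict_travel_production(X_rows, x_max, y_max, model):
    # S10: normalize the independent variables with the historical maxima
    x_rows = [[v / m for v, m in zip(row, x_max)] for row in X_rows]
    # S20: obtain normalized predictions from the trained model
    preds = [model(x) for x in x_rows]
    # S30: map the predictions back to the original scale
    return denormalize(preds, y_max)


# Stand-in for a trained model, used only so the sketch is runnable.
stand_in_model = lambda x: [0.5 * x[0] + 0.5 * x[1]]

predicted = predict_travel_production(
    X_rows=[[100.0, 50.0]], x_max=[200.0, 100.0], y_max=[400.0], model=stand_in_model
)
```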

The travel generation prediction model has a gradient boosting decision tree structure: decision trees serve as base learners, the output of the model is the sum of the outputs of all decision trees in the model, and the squared error between the model prediction f(x) and the true value y serves as the loss function L. Taking the i-th sample (x_i, y_i) as an example, the calculation is shown in equation (4):

In one embodiment of the present invention, Classification and Regression Trees (CART) are selected as the base learners of the GBDT; a CART can only form a binary tree. N CART regression trees are combined into a GBDT prediction model, that is, the model is trained for N rounds, and every CART regression tree has the same structure information, which includes the number J of leaf nodes of the CART regression tree, the depth of each tree, and the like.

The trip generation prediction model is trained by the following steps:

and step B10, extracting independent variables and dependent variables of historical travel generation data of each traffic cell of the area to be predicted, carrying out normalization processing, and dividing the normalized data into a training set and a test set according to a preset proportion.

Because the amount of traffic cell travel generation data obtained from questionnaires is limited, the preprocessed data set can be divided into a training set and a test set at a ratio of 7:3. Meanwhile, the order of the samples in the training set needs to be randomly shuffled. As the amount of data in the data set increases, the ratio of the training set to the test set can be adjusted to 9:1.
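
A minimal sketch of the split described above. It shuffles the whole data set before cutting it, which also randomizes the order of the training samples; the function name, the fixed seed and the sample data are assumptions for illustration.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # seeded only to make the sketch reproducible
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical (x_i, y_i) pairs, already normalized.
samples = [([i / 10.0], [i / 20.0]) for i in range(10)]
train_set, test_set = split_dataset(samples, train_ratio=0.7)
```

Passing `train_ratio=0.9` gives the 9:1 split suggested for larger data sets.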

Step B20, performing N rounds of travel generation prediction model training based on the training data of the training set, wherein in the n-th round of training the model contains n decision trees, and the negative gradient value r_(n+1)i of the error of the n-th round model output is calculated based on the loss function L; n, with 1 ≤ n ≤ N, is the current round of model training.

During the construction of each base learner (i.e., each tree), the squared error between its predicted value T(x_i, Θ_n) and the negative gradient value r of the model is used as the loss function L_b. It should be noted that L_b differs from the loss function L between the predicted and true values of the model: L_b is the loss function of the base learner. Still taking the i-th sample as an example, it is shown in equation (5):

wherein D is the dimension of the negative gradient r_(n+1)i of the model in the (n+1)-th round of training and of the predicted value T(x_i, Θ_n+1) output by the (n+1)-th decision tree.

The m sample data pairs of the training set {(x₁, y₁), (x₂, y₂), ..., (x_m, y_m)} are all input into the first decision tree of the constructed travel generation prediction model, and the parameters of the tree are trained, as shown in formula (6):

obtaining a predicted value of the first decision tree model, which is an output of the first decision tree, as shown in formula (7):

f₁(x_i)＝T(x_i，Θ₁)，i＝1，2，...m (7)

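The constructed model loss function L is used to calculate the negative gradient r_2i, i = 1, 2, ..., m, of the error between the model output f₁(x_i) and the true value y_i, as shown in formula (8):

$$r_{2i} = -\frac{\partial L\left(y_i, f_1(x_i)\right)}{\partial f_1(x_i)}, \quad i = 1, 2, \ldots, m \quad (8)$$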

The input data x_i and the negative gradients r_2i of the model error after the first round of training are combined into new data pairs {(x₁, r₂₁), (x₂, r₂₂), ..., (x_m, r_2m)}, which are used to train the second decision tree. The result T(x_i, Θ₂) is added to the output f₁(x_i) of the previous round, and the sum is taken as the predicted value, as shown in equation (9):

f₂(x_i)＝T(x_i，Θ₂)+f₁(x_i)，i＝1，2，...m (9)

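By analogy, the negative gradient r_(n+1)i of the error of the model in the n-th round (namely, when the model has n decision trees) is obtained as shown in formula (10):

$$r_{(n+1)i} = -\frac{\partial L\left(y_i, f_n(x_i)\right)}{\partial f_n(x_i)}, \quad i = 1, 2, \ldots, m \quad (10)$$

wherein $\partial L(y_i, f_n(x_i)) / \partial f_n(x_i)$ represents the partial derivative of the loss value $L(y_i, f_n(x_i))$ with respect to the predicted value $f_n(x_i)$.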
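
The negative gradient can be made concrete with a small sketch. It assumes the squared-error loss carries a factor of 1/2, i.e. L(y, f) = 1/2 · Σ_d (y_d − f_d)², so that the negative gradient is exactly the residual y − f; this choice is an assumption, made because it is consistent with adding each tree's output directly to the previous prediction. A finite-difference check is included to confirm the analytic form.

```python
def squared_error_loss(y, f):
    """L(y, f) = 1/2 * sum_d (y_d - f_d)^2; the 1/2 factor is an assumption."""
    return 0.5 * sum((yd - fd) ** 2 for yd, fd in zip(y, f))


def negative_gradient(y, f):
    """r = -dL/df, which for the loss above is simply the residual y - f."""
    return [yd - fd for yd, fd in zip(y, f)]


def numeric_negative_gradient(y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df, used only as a sanity check."""
    grads = []
    for d in range(len(f)):
        up, down = list(f), list(f)
        up[d] += eps
        down[d] -= eps
        grads.append(-(squared_error_loss(y, up) - squared_error_loss(y, down)) / (2 * eps))
    return grads


y_i = [1.0, 0.5]      # hypothetical normalized true value
f_n = [0.75, 0.25]    # hypothetical prediction after round n
r_next = negative_gradient(y_i, f_n)
```

With a different constant in front of the loss, the negative gradient would be the same residual multiplied by a constant.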

Step B30, adding the (n+1)-th decision tree to the model and, taking the negative gradient values r_(n+1)i of the n-th round error as labels, training the (n+1)-th decision tree, until the training of the N decision trees is completed.

The values r_(n+1)i and the corresponding input data form new sample pairs {(x₁, r_(n+1)1), (x₂, r_(n+1)2), ..., (x_m, r_(n+1)m)}, which are used to train the (n+1)-th decision tree of the travel generation prediction model and obtain its parameters, as shown in formula (11):

$$\Theta_{n+1} = \arg\min_{\Theta} \sum_{i=1}^{m} L_b\left(r_{(n+1)i}, T(x_i, \Theta)\right) \quad (11)$$

wherein $\Theta_{n+1}$ are the parameters of the (n+1)-th decision tree in the (n+1)-th round of model training, $r_{(n+1)i}$ is the negative gradient value of the error of the n-th round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ represents the loss value between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree and the corresponding negative gradient value $r_{(n+1)i}$, which serves as the true value, and $m$ is the number of training data in the training set.

The predicted value of the model at this time is shown in equation (12):

$$f_{n+1}(x_i) = f_n(x_i) + T(x_i, \Theta_{n+1}), \quad i = 1, 2, \ldots, m \quad (12)$$

By analogy, training continues until all N decision trees have been trained, which yields the travel generation prediction model f_N(x_i), i = 1, 2, ..., m.

From the above operations, the boosting tree is an additive model of decision trees, so the final predicted value of the model is obtained as shown in formula (13):

$$f_N(x) = \sum_{n=1}^{N} T(x, \Theta_n) \quad (13)$$
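
The round-by-round procedure can be sketched end to end in a few lines. This is a simplified illustration, not the embodiment itself: the base learner is a one-split regression stump on a single scalar input instead of a CART with J leaf nodes, the loss is assumed to be 1/2 · (y − f)² so that the negative gradient equals the residual, and all names and data are hypothetical. Starting from a zero prediction means the first tree is fitted to y itself, as in formula (7).

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs; returns (s, c1, c2)."""
    best = None
    for s in sorted(set(xs))[:-1]:                       # candidate split points
        left = [t for x, t in zip(xs, targets) if x <= s]
        right = [t for x, t in zip(xs, targets) if x > s]
        c1, c2 = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, s, c1, c2)
    if best is None:                                     # all inputs identical: constant tree
        mean = sum(targets) / len(targets)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    s, c1, c2 = stump
    return c1 if x <= s else c2


def train_gbdt(xs, ys, n_rounds):
    """Fit n_rounds stumps, each on the negative gradient of the current model."""
    trees, f = [], [0.0] * len(xs)                       # zero start: first tree is fitted to y
    for _ in range(n_rounds):
        r = [y - fi for y, fi in zip(ys, f)]             # negative gradient (residual)
        tree = fit_stump(xs, r)                          # labels are the negative gradients
        trees.append(tree)
        f = [fi + predict_stump(tree, x) for fi, x in zip(f, xs)]
    return trees


def predict_gbdt(trees, x):
    """Additive model: the prediction is the sum of all tree outputs."""
    return sum(predict_stump(t, x) for t in trees)


def mse(trees, xs, ys):
    return sum((y - predict_gbdt(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]      # hypothetical normalized inputs
ys = [0.1, 0.15, 0.3, 0.5, 0.55, 0.9]    # hypothetical normalized targets
trees_1 = train_gbdt(xs, ys, n_rounds=1)
trees_20 = train_gbdt(xs, ys, n_rounds=20)
```

On this toy data the training error with 20 trees is lower than with a single tree, which is the behaviour the additive construction is meant to produce.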

the training process of the decision tree is to find the optimal division node of the decision tree until the structural information of the tree meets a set value. The specific process is as follows: and traversing each possible value of each feature, respectively calculating the square error, and finding the partition feature j and the corresponding partition node s which enable the square error to be minimum, namely determining the partition feature j and the corresponding partition node s as the optimal partition node (j, s).

FIG. 3 is a schematic diagram of the structure of a decision tree with depth d = 3 and J = 4 leaf nodes in an embodiment of the travel generation prediction method based on a gradient boosting decision tree. Assume that the dimension k of the input data x is 3, that is, the independent variables provide 3 split features, and that the dimension of the output data y_i is 1, that is, there is 1 dependent variable; the maximum depth of the decision tree is set to d = 3 and the number of leaf nodes to J = 4. The training process is then as follows:

First, the decision tree is trained with the loss function L_b of the base learner to partition the input space: the optimal split feature j of the input data x and the optimal split point s of that feature are found. Let

$$R_1(j, s) = \{x \mid x^{(j)} \le s\}, \quad R_2(j, s) = \{x \mid x^{(j)} > s\}$$

be the two regions into which s divides the j-th dimension feature $x^{(j)}$. The optimization traverses each feature dimension j and each value s of that feature in turn and calculates the loss function of each split (j, s); the split with the minimum loss is the optimal split, as shown in formula (14):

$$\min_{j, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right] \quad (14)$$

wherein $c_1$ and $c_2$ are the mean values of all samples in each region, as shown in equation (15):

$$c_1 = \operatorname{ave}\left(y_i \mid x_i \in R_1(j, s)\right), \quad c_2 = \operatorname{ave}\left(y_i \mid x_i \in R_2(j, s)\right) \quad (15)$$

Second, the data in the two sub-regions continue to be divided by the above step until the number of leaf nodes of the decision tree equals the set value.

Third, because the number of leaf nodes is 4, the input x is divided into 4 sub-regions $R_1, R_2, R_3, R_4$, and the sample means of the regions are $c_1, c_2, c_3, c_4$. The final CART learner is shown in equation (16):

$$T(x) = \sum_{l=1}^{4} c_l \, I(x \in R_l) \quad (16)$$

wherein $I(\cdot)$ is the indicator function.
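
The search for the optimal split (j, s) can be illustrated with an exhaustive loop over features and candidate values. This is a sketch of a single split only; growing a full tree would repeat it on the resulting sub-regions until J leaf nodes exist. The function name, the 0-based feature index and the sample data are assumptions.

```python
def best_split(X, y):
    """Exhaustive search for the split (j, s) that minimizes the summed squared error.

    X is a list of feature vectors, y a list of scalar targets. The feature
    index j is 0-based here.
    """
    best = None
    for j in range(len(X[0])):                           # every feature dimension
        for s in sorted({row[j] for row in X})[:-1]:     # every candidate split value
            left = [t for row, t in zip(X, y) if row[j] <= s]
            right = [t for row, t in zip(X, y) if row[j] > s]
            c1 = sum(left) / len(left)                   # region means
            c2 = sum(right) / len(right)
            err = sum((t - c1) ** 2 for t in left) + sum((t - c2) ** 2 for t in right)
            if best is None or err < best["err"]:
                best = {"j": j, "s": s, "c1": c1, "c2": c2, "err": err}
    return best


# Hypothetical data with k = 3 features; the target depends only on the second feature.
X = [[0.2, 0.1, 0.9], [0.8, 0.2, 0.1], [0.5, 0.7, 0.4], [0.1, 0.9, 0.6]]
y = [0.0, 0.0, 1.0, 1.0]
split = best_split(X, y)
```

Because features that never yield the best split are simply never selected, redundant variables drop out of the tree without manual screening.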

as shown in fig. 4, which is a schematic diagram of a gradient boosting decision tree structure including N decision trees adopted in an embodiment of the travel generation prediction method based on a gradient boosting decision tree according to the present invention, a negative gradient value r is used to train each decision tree.

The model performance test method comprises the following steps:

and step C10, inputting the independent variables in the test data of the test set into the trained trip generation prediction model, and obtaining the predicted value output by the trip generation prediction model.

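Step C20, calculating the R² value, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) between the predicted values and the dependent variables corresponding to the independent variables, as shown in equations (17), (18) and (19), respectively:

$$R^2 = 1 - \frac{\sum_{i=1}^{M} \left(y_i - f(x_i)\right)^2}{\sum_{i=1}^{M} \left(y_i - \bar{y}\right)^2} \quad (17)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(y_i - f(x_i)\right)^2} \quad (18)$$

$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| y_i - f(x_i) \right| \quad (19)$$

wherein $y_i$, i = 1, 2, ..., M, is the true value of the sample, $\bar{y}$ is the average of the test set samples, $f(x_i)$, i = 1, 2, ..., M, is the model prediction value, and M is the number of test set samples.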
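
The three evaluation indexes are straightforward to compute. The sketch below uses their standard definitions for scalar targets; the function names and the sample values are chosen for illustration only.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test-set targets and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
scores = (r2_score(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

An R² close to 1 together with small RMSE and MAE indicates that the model meets the requirement; the thresholds themselves are set by the user.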

The travel generation prediction system based on a gradient boosting decision tree in the second embodiment of the invention is based on the above travel generation prediction method and comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;

the preprocessing module is configured to extract the independent variables of the current travel generation data of each traffic cell of the area to be predicted, and to normalize the independent variables to obtain preprocessed data;

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.

It should be noted that the travel generation prediction system based on a gradient boosting decision tree provided in the foregoing embodiment is illustrated only by the above division of functional modules. In practical applications, the functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined. For example, the modules in the foregoing embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.

A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are suitable for being loaded and executed by a processor to implement the above-mentioned travel generation prediction method based on a gradient boosting decision tree.

A processing apparatus according to a fourth embodiment of the present invention includes a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; and the programs are adapted to be loaded and executed by the processor to implement the above-mentioned travel generation prediction method based on a gradient boosting decision tree.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Those of skill in the art would appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the software modules and method steps may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A travel generation prediction method based on a gradient boosting decision tree, characterized by comprising the following steps:

step S10, extracting the independent variables of the current travel generation data of each traffic cell of the area to be predicted, and normalizing the independent variables to obtain preprocessed data; the historical travel generation data of each traffic cell of the area to be predicted comprise independent variables and dependent variables; the independent variables comprise, for each traffic cell, the number of households and the population with and without vehicles, the numbers of workers, students and other types of persons with and without vehicles, and the total number of people in each employment category; the employment categories comprise: industry; water conservancy, environment and public facilities; transportation, storage and postal services; public administration; education; resident services; finance; information technology services; and agriculture, forestry, animal husbandry and fishery; the dependent variables comprise the home-based and non-home-based travel production of households with vehicles and of households without vehicles in each traffic cell;

step B30, adding the (n+1)-th decision tree to the model and, taking the error negative gradient values $r_{(n+1)i}$ of the n-th round as labels, training the (n+1)-th decision tree:

wherein $\Theta_{n+1}$ denotes the parameters of the (n+1)-th decision tree in the (n+1)-th round of model training, $r_{(n+1)i}$ is the error negative gradient value of the n-th round model output, $L_b(r_{(n+1)i}, T(x_i, \Theta_{n+1}))$ represents the loss value between the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree in the (n+1)-th round of model training and the corresponding error negative gradient value $r_{(n+1)i}$, and m is the number of training data in the training set;

$L_b$ is the loss function of the base learner, and d is the dimension of the negative gradient $r_{(n+1)i}$ of the model in the (n+1)-th round of training and of the predicted value $T(x_i, \Theta_{n+1})$ output by the (n+1)-th decision tree;

after the training of the (n+1)-th decision tree is finished, continuing to train the (n+2)-th decision tree, until all N decision trees are trained;

step B40, inputting the independent variables of each test datum in the test set into the trained travel generation prediction model to obtain the predicted values output by the travel generation prediction model, and calculating the R² value, the root mean square error and the mean absolute error between the predicted values and the dependent variables corresponding to the independent variables; if the R² value is close to 1, and the root mean square error and the mean absolute error are smaller than set thresholds, the performance of the travel generation prediction model meets the requirement; otherwise, increasing the number of training rounds or adjusting the decision tree structure of the base learner, and retraining the model on the original training set until the test results satisfy the set thresholds, to obtain the trained travel generation prediction model;

in the training process of the decision tree, traversing each possible value of each feature, calculating the squared error for each, and finding the partition feature j and the corresponding partition point s that minimize the squared error, to obtain the optimal partition (j, s):

wherein the two regions are obtained by dividing the j-th dimension feature $x^{(j)}$ at s, $c_1$ and $c_2$ are the sample means of the two regions, and $N_t$ is the number of samples in each region:

repeating the above partitioning on the data in each of the two sub-regions, until the number of leaf nodes of the decision tree equals a set value;

if the number of leaf nodes is 4, the input x is divided into 4 sub-regions

the sample mean of each region is

and the final CART learner is:
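The partition search and the growth of the tree to a set number of leaf nodes can be sketched as follows. This is a sketch under assumptions: the text does not say in which order the sub-regions are split, so the code splits the leaf with the largest error reduction first, and the tree layout (nested dictionaries) and all function names are hypothetical.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations of the values from their mean."""
    c = fmean(values)
    return sum((v - c) ** 2 for v in values)

def best_split(rows, targets):
    """Traverse every feature j and value s; return (error, j, s) or None."""
    best = None
    for j in range(len(rows[0])):
        for s in sorted({r[j] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[j] <= s]
            right = [t for r, t in zip(rows, targets) if r[j] > s]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, s)
    return best

def grow_tree(rows, targets, max_leaves=4):
    """Split leaves until the tree has max_leaves leaf nodes."""
    def make_leaf(idx):
        return {"idx": idx, "value": fmean([targets[i] for i in idx])}

    root = make_leaf(list(range(len(rows))))
    leaves = [root]
    while len(leaves) < max_leaves:
        candidates = []
        for leaf in leaves:
            idx = leaf["idx"]
            sub_t = [targets[i] for i in idx]
            split = best_split([rows[i] for i in idx], sub_t)
            if split is not None:
                err, j, s = split
                candidates.append((sse(sub_t) - err, leaf, j, s))
        if not candidates:
            break  # no leaf can be split further
        _, leaf, j, s = max(candidates, key=lambda c: c[0])
        idx = leaf.pop("idx")
        leaf.pop("value")
        left = make_leaf([i for i in idx if rows[i][j] <= s])
        right = make_leaf([i for i in idx if rows[i][j] > s])
        leaf.update(j=j, s=s, left=left, right=right)  # leaf becomes an inner node
        leaves = [l for l in leaves if l is not leaf] + [left, right]
    return root

def predict(node, x):
    """Return the sample mean of the region that contains x."""
    while "value" not in node:
        node = node["left"] if x[node["j"]] <= node["s"] else node["right"]
    return node["value"]

# Toy data with two features, invented for illustration only.
rows = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0],
        [5.0, 0.0], [6.0, 1.0], [7.0, 0.0], [8.0, 1.0]]
targets = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0, 13.0, 13.0]
tree = grow_tree(rows, targets, max_leaves=4)
```

On this toy data every split is chosen on the first feature, because the second feature never gives a lower squared error; this is how a redundant variable drops out without manual screening.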

2. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein the normalization of the independent variables in step S10 is performed as follows:

wherein,

and

are, respectively, the maximum values of the data of each dimension of the historical independent variables $X_i$ and dependent variables $Y_i$ before normalization, $x_i$ and $y_i$ are, respectively, the normalized independent variables and dependent variables, k is the dimension of $x_i$, and d is the dimension of $y_i$.
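The normalization equation itself is not reproduced in this text. As a hedged sketch, the code below assumes that each dimension is divided by its maximum over the historical data, which is consistent with the maxima defined above but is not confirmed by them; inverse normalization then multiplies by the same maxima. All names and numbers are hypothetical.

```python
def column_maxima(rows):
    """Per-dimension maximum over the historical data."""
    return [max(col) for col in zip(*rows)]

def normalize(rows, maxima):
    """Assumed scheme: divide every dimension by its historical maximum."""
    return [[v / m for v, m in zip(row, maxima)] for row in rows]

def inverse_normalize(rows, maxima):
    """Undo the assumed scheme by multiplying by the same maxima."""
    return [[v * m for v, m in zip(row, maxima)] for row in rows]

# Hypothetical historical independent variables for two traffic cells.
X = [[10.0, 200.0], [5.0, 100.0]]
x_max = column_maxima(X)
x_norm = normalize(X, x_max)
X_back = inverse_normalize(x_norm, x_max)
```

The sketch requires every maximum to be non-zero; a dimension that is zero for all traffic cells would need separate handling.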

3. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein the loss value of the i-th training data $(x_i, y_i)$ in the training set is calculated as follows:

4. The travel generation prediction method based on a gradient boosting decision tree according to claim 1, wherein in step B20 the error negative gradient value $r_{(n+1)i}$ of the n-th round model output is calculated based on the loss function L as follows:

wherein $L(y_i, f_n(x_i))$ represents the loss value between the predicted value $f_n(x_i)$ output by the travel generation prediction model in the n-th round of training and the corresponding true value $y_i$, m is the number of training data in the training set,

$\partial L(y_i, f_n(x_i)) / \partial f_n(x_i)$ represents the partial derivative of the loss value $L(y_i, f_n(x_i))$ with respect to the predicted value $f_n(x_i)$;

wherein $T(x_i, \Theta_n)$ is the predicted value output by the n-th decision tree of the model, and $\Theta_n$ denotes the parameters of the n-th decision tree in the n-th round of model training.
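The negative gradient can be checked numerically. The sketch below assumes a scalar squared-error loss $L = \tfrac{1}{2}(y - f)^2$, which is an assumption because the loss equation is not reproduced in this text; under it, the negative partial derivative with respect to the prediction equals the residual $y - f$. The function names and numbers are hypothetical.

```python
def squared_loss(y, f):
    """Assumed loss: half the squared difference between truth and prediction."""
    return 0.5 * (y - f) ** 2

def negative_gradient(loss, y, f, h=1e-5):
    """Central-difference estimate of -dL/df at the current prediction f."""
    return -(loss(y, f + h) - loss(y, f - h)) / (2.0 * h)

y_true, f_n = 10.0, 7.0                      # hypothetical truth and prediction
r_next = negative_gradient(squared_loss, y_true, f_n)

# A tree that reproduced r_next exactly would close the gap when its
# output is added to the current model output.
f_next = f_n + r_next
```

For other differentiable losses the same finite-difference check applies, but the negative gradient is then no longer the plain residual.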

5. A travel generation prediction system based on a gradient boosting decision tree, characterized in that the system is based on the travel generation prediction method based on a gradient boosting decision tree of any one of claims 1 to 4, and comprises an input module, a preprocessing module, a prediction module, an inverse normalization module and an output module;

6. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the gradient boosting decision tree based travel generation prediction method according to any one of claims 1 to 4.

7. A processing apparatus, comprising:

a processor adapted to execute various programs; and

a storage device adapted to store a plurality of programs;

wherein the programs are adapted to be loaded and executed by the processor to perform:

the travel generation prediction method based on a gradient boosting decision tree of any one of claims 1 to 4.