CN109376331A

CN109376331A - A method for estimating urban bus emission rates based on gradient boosting regression trees

Info

Publication number: CN109376331A
Application number: CN201810958885.4A
Authority: CN
Inventors: 陈淑燕; 潘应久
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2018-08-22
Filing date: 2018-08-22
Publication date: 2019-02-22

Abstract

The invention discloses a method for estimating urban bus emission rates based on gradient boosting regression trees. First, measured bus emission data are standardized by Lagrange interpolation to obtain second-by-second emission data. Second, the current operating condition of the bus is characterized by vehicle specific power (VSP), the influence of the previous driving state on emissions is also taken into account, and a quantitative model of the emission rate is established. Finally, a gradient boosting regression tree is trained on the data and its parameters are tuned to obtain the bus emission rate estimation model. The invention considers the joint effect of the current operating condition and the previous driving state on the current emission rate, and overcomes the difficulty that existing emission rate estimation methods have in describing the complex nonlinear relationship between the bus emission rate and its influencing factors. By using the nonparametric gradient boosting regression tree model, it improves the estimation accuracy of bus emission rates, which is of practical significance for controlling traffic emissions and improving the road environment.

Description

A method for estimating urban bus emission rates based on gradient boosting regression trees

Technical field

The invention belongs to the fields of intelligent transportation technology and traffic environment, and in particular relates to a method for estimating urban bus emission rates based on gradient boosting regression trees.

Background art

Pollution caused by urban traffic has drawn attention worldwide. Buses, as heavy-duty vehicles, shuttle through cities every day, so estimating and assessing their emission characteristics is of practical significance for managing and controlling urban traffic pollution. Several vehicle emission models exist internationally, such as the MOVES model of the U.S. EPA, the COPERT model developed by the European Commission, and the CMEM model developed by the University of California, Riverside. Most of these models were developed from traffic emission data collected abroad and are not fully applicable to China's complex road traffic environment. In addition, some fuel types are unavailable in certain models; for example, MOVES does not currently support emission estimation for liquefied natural gas (LNG) buses.

For urban bus emission rate estimation, a vehicle is affected by a complex road traffic environment during actual driving, so the relationship between the bus emission rate and the road traffic parameters is complex and nonlinear. A simple linear regression method therefore cannot accurately quantify the relationship between the explanatory variables and the emission rate. Likewise, a single regression tree model cannot accurately extract the information in the explanatory variables and easily causes the emission estimation model to overfit. If instead the data are partitioned according to a characteristic of the explanatory variables and a regression tree is built on that partition, the residual is computed from the loss function by gradient boosting and fitted with a further regression tree, and the resulting regression trees are finally added together, the information in the explanatory variables can be mined more deeply, which improves the estimation accuracy of the model and strengthens its generalization ability.

Summary of the invention

Purpose of the invention: In view of the above problems in the prior art, the invention proposes a method that estimates bus emission rates by iterating multiple regression trees with gradient boosting. The method fully analyzes the degree to which the current operating condition and the previous operating state of the bus affect its current emission characteristics, thereby improving the accuracy of bus emission rate estimation.

Technical solution: To achieve the purpose of the invention, the technical solution adopted is a method for estimating urban bus emission rates based on gradient boosting regression trees, comprising the following steps:

(1) Standardize the acquired bus emission and driving data to obtain second-by-second emission rates and driving-state characteristic parameters;

(2) Compute the real-time vehicle specific power from the bus driving-state characteristic parameters, characterize the driving state of the previous second by its speed and acceleration, and determine the training set from the emission data obtained in step (1) as the model input;

(3) Using the input obtained in step (2), determine the loss function L of the model, set the number of regression trees M, initialize the weak learner, and construct a new regression tree by taking the value of the negative gradient of the loss function at the current model as an approximation of the residual;

(4) Update the learner function with the regression tree determined in each iteration of step (3) until M iterations are complete, i.e. M regression trees have been built, to obtain the final strong learner;

(5) Use the established model to estimate emission rates on the test set.

In step (1), the measured data are standardized as follows:

For n+1 point pairs (x₀,y₀), (x₁,y₁), ..., (x_n,y_n), a polynomial is sought that takes the corresponding value y_i at each x_i. l_i(x) is the Lagrange basis polynomial, i.e. the interpolation basis function, with the expression:

$$l_i(x) = \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

where n+1 is the number of point pairs in the data set; x_n is the time corresponding to the (n+1)-th point pair; y_n is the value of the emission or driving-state characteristic variable at the (n+1)-th point pair.

Assuming the x_i are pairwise distinct, the Lagrange polynomial is obtained:

$$L(x) = \sum_{i=0}^{n} y_i\, l_i(x)$$

In step (1), the second-by-second bus emission rates comprise the second-by-second emission rates of CO, CO₂, HC and NO_X emitted by the bus during actual driving; the real-time driving-state characteristic parameters comprise speed, acceleration, road grade and dynamic passenger load.

In step (2), the real-time vehicle specific power is computed from the bus driving-state characteristic parameters, and the driving state of the previous second is characterized by speed and acceleration, as follows:

(2.1) Using the bus driving parameter data obtained in step (1), compute the vehicle specific power of the bus. The calculation is as follows:

In the formula, VSP is the vehicle specific power of the bus; F_t is the tractive force (N); v is the travel speed (m/s); m is the total mass of the bus, including the net vehicle mass and the passenger load (kg); F_f, F_w, F_i and F_j are the rolling resistance, air resistance, grade resistance and acceleration resistance respectively (N); a is the bus acceleration (m/s²); g is the gravitational acceleration (9.8 m/s²); f is the rolling resistance coefficient, a dimensionless parameter; ε_i is the mass factor; α is the road grade; ρ_a is the air density; C_D is the drag coefficient; A is the frontal area of the bus;

(2.2) From the second-by-second driving-state data of the bus, obtain the speed and acceleration of the previous driving state, i.e. the speed and acceleration of the previous second.

In step (2), the training set is determined from the emission data obtained in step (1) and used as the model input. The training set is determined as follows:

D = {(x₁,y₁), (x₂,y₂), ..., (x_i,y_i), ..., (x_N,y_N)}, i = 1, 2, ..., N

where D is the training set forming the input layer of the emission estimation model; (x_i, y_i) is the pair of independent and dependent variables of the i-th group of data in the training set; x_i is the set of independent variables, i.e. the variables that influence emissions, comprising three variables VSP_t, v_{t-1} and a_{t-1}, where VSP_t is the instantaneous vehicle specific power at time t, i.e. the current time, and v_{t-1} and a_{t-1} are the instantaneous speed and acceleration at time t-1; y_i is the bus emission rate, covering the four emissions CO, CO₂, HC and NO_X; N is the number of samples in the input data.

In step (3), the number of regression trees M is preset, and the loss function is the negative binomial log-likelihood function, with the expression:

L(y, f(x)) = log(1 + exp(−2y f(x)));

where y is the dependent variable, i.e. the measured emission rate, and f(x) is the estimated emission rate;

The weak learner is initialized as:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

where N is the number of samples in the input data; c is the initial leaf-node output parameter; L(y_i, c) is the loss evaluated on the i-th sample.

In step (3), the value of the negative gradient of the loss function at the current model is taken as an approximation of the residual that the newly constructed regression tree is to fit. For each sample (x_i, y_i), the residual is computed by the gradient descent method:

$$r_{m,i} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$

where r_{m,i} is the residual of the i-th sample for the m-th regression tree; f_{m-1}(x_i) is the learner obtained after training the (m-1)-th regression tree, i.e. the emission rate estimate obtained with the first m-1 regression trees when the independent variable is x_i. When computing the residual of the i-th sample for the m-th regression tree, f_{m-1}(x_i) is substituted for f(x_i);

Since the loss function is the negative binomial log-likelihood function, the residual can be written further as:

$$r_{m,i} = \frac{2 y_i}{1 + \exp\left(2 y_i f_{m-1}(x_i)\right)}$$

In step (3), a regression tree is fitted as follows: using the computed residuals r_{m,i} of the samples for the m-th regression tree, form the set {(x_i, r_{m,i})}, i = 1, 2, ..., N, and train the m-th regression tree T_m on it. The regions into which its leaf nodes partition the feature space are denoted R_{m,j}, j = 1, 2, ..., J.

In step (3), the leaf-node output values are solved as follows. For each leaf node of the regression tree T_m:

$$c_{m,j} = \arg\min_{c} \sum_{x_i \in R_{m,j}} L\left(y_i, f_{m-1}(x_i) + c\right)$$

where c_{m,j} is the leaf-node output value in the j-th feature unit of the m-th regression tree, i.e. the emission rate estimate.

In step (4), the learner is updated as follows. After all leaf-node output values of the regression tree T_m have been obtained, the learner is updated:

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where the regression tree divides the feature space into J units {R₁, R₂, ..., R_J}. The feature space refers to the way the leaf nodes of each regression tree are generated, i.e. the way leaf nodes are divided according to the number and value ranges of the independent variables; R_{m,j} is the j-th feature unit of the m-th regression tree, and each feature unit represents one partition class; I(x ∈ R_{m,j}) is the indicator function, and x is the independent variable, i.e. the emission-influencing variables VSP_t, v_{t-1} and a_{t-1}. When the regression tree T_m determines that x ∈ R_{m,j}, i.e. the independent variable belongs to the unit R_{m,j}, I takes the value 1, and otherwise 0; c_{m,j} is the output value of the leaf node that the m-th regression tree obtains in the j-th feature unit, i.e. the emission rate value in that feature unit;

The gradient boosting method introduces a shrinkage parameter v, and the learner update becomes:

$$f_m(x) = f_{m-1}(x) + v \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where the shrinkage parameter v is called the learning rate;

The process of training the m-th regression tree is iterated until, after M iterations, the final gradient boosting regression tree model is obtained as the superposition of the M regression trees:

$$\hat{f}(x) = f_M(x) = f_0(x) + v \sum_{m=1}^{M} \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where \hat{f}(x) denotes the gradient boosting regression tree.

Beneficial effects: Compared with the prior art, the technical solution of the invention has the following beneficial effects:

(1) It considers not only the influence of the current operating condition on the emission characteristics but also the influence of the previous operating state, which improves the accuracy of emission estimation;

(2) It uses vehicle specific power to characterize the current operating condition of the bus as one of the model inputs. Vehicle specific power includes the operating features of the bus, such as speed and acceleration, as well as road characteristics, such as road grade, and also accounts for the influence of the dynamically changing passenger load on emissions;

(3) Compared with a single classification and regression tree, the gradient boosting regression tree learns the implicit features of the data better and overcomes the difficulty that existing emission rate estimation methods have in describing the complex nonlinear relationship between the bus emission rate and its influencing factors;

(4) Multiple regression trees decide jointly through iteration, and all regression trees are added together to obtain the final emission estimation model, which mines the information in the explanatory variables more effectively so that the whole model reaches a higher estimation accuracy.

Description of the drawings

Fig. 1 is the overall flow chart of the invention;

Fig. 2 shows the emission rate estimation results obtained in the invention with measured data from liquefied natural gas buses.

Detailed description of embodiments

The technical solution of the invention is described further below with reference to the drawings and an embodiment.

As shown in Fig. 1, the invention proposes a method for estimating urban bus emission rates based on gradient boosting regression trees, comprising the following steps:

(1) Measure the emissions and driving-state data of buses on urban roads with a PEMS (portable emission measurement system), and standardize the data by Lagrange interpolation.

The test and training data sets for the bus emission estimation algorithm based on gradient boosting regression trees all come from measured data of buses on Route 1, Route 51 and Route 206. A PEMS device collected the real-time emission rates of the buses in operation, covering the four pollutants CO, CO₂, HC and NO_X. A handheld GPS device recorded the vehicle trajectory, from which the vehicle driving-state data (speed and acceleration) and the road grade data describing the road segment were obtained. In addition, investigators recorded the number of passengers boarding and alighting at each bus stop to obtain dynamic passenger load data. The sampling interval of the PEMS and GPS devices is 1-2 seconds. To obtain second-by-second emission and driving-state data, the data set is standardized by Lagrange interpolation. Taking the speed data as an example, the speed data for the first 10 seconds are shown in Table 1 below.

Table 1: Speed data for the first 10 seconds

Time (s)	1	2	3	4	5	6	7	8	9	10
Speed (km/h)	11.0	13.5	missing	16.5	missing	19.5	18.0	missing	25.0	26.0

Each missing value is modeled from the two data points on each side of it, using only those that are not themselves missing. The method is implemented as follows:

Let x denote the time (s) and y = f(x) the speed (km/h) at time x. In Table 1, f(3), f(5) and f(8) need to be interpolated. Taking f(3) as an example, the two points on each side of the missing value are f(1), f(2), f(4) and f(5). Since f(5) is itself missing, only f(1), f(2) and f(4) are used to interpolate f(3). The interpolation basis functions l_i(x) are:

$$l_0(x) = \frac{(x-2)(x-4)}{(1-2)(1-4)}, \quad l_1(x) = \frac{(x-1)(x-4)}{(2-1)(2-4)}, \quad l_2(x) = \frac{(x-1)(x-2)}{(4-1)(4-2)}$$

From the interpolation basis functions l_i(x), the Lagrange polynomial is obtained:

$$L(x) = 11.0\, l_0(x) + 13.5\, l_1(x) + 16.5\, l_2(x)$$

From this Lagrange polynomial, the value of f(3) is computed:

$$f(3) = L(3) = 11.0 \cdot \left(-\tfrac{1}{3}\right) + 13.5 \cdot 1 + 16.5 \cdot \tfrac{1}{3} \approx 15.33$$

f(5) and f(8) can be interpolated by the same calculation.
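The interpolation procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' implementation: the function names `lagrange_interpolate` and `fill_missing` are hypothetical, and the window rule (up to two time steps on each side, dropping steps that are themselves missing) follows the description of the f(3) example.

```python
def lagrange_interpolate(points, x):
    """Evaluate the Lagrange polynomial through `points` [(x_i, y_i), ...] at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        basis = 1.0  # l_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def fill_missing(series, k=2):
    """Fill None entries from up to k time steps on each side.

    Only originally measured values are used as support points,
    so earlier interpolated values never feed later ones.
    """
    times = sorted(series)
    filled = dict(series)
    for idx, t in enumerate(times):
        if series[t] is not None:
            continue
        window = times[max(0, idx - k):idx] + times[idx + 1:idx + 1 + k]
        support = [(s, series[s]) for s in window if series[s] is not None]
        filled[t] = lagrange_interpolate(support, t)
    return filled


# Speed data from Table 1 (km/h); None marks a missing sample.
speed = {1: 11.0, 2: 13.5, 3: None, 4: 16.5, 5: None,
         6: 19.5, 7: 18.0, 8: None, 9: 25.0, 10: 26.0}
filled = fill_missing(speed)
print(round(filled[3], 2))
```

For the Table 1 data, the support points for f(3) are f(1), f(2) and f(4), and the polynomial evaluates to about 15.33 km/h.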

Using the above Lagrange interpolation, the emission data (the CO, CO₂, NO_X and HC emission rates), the driving-state data (vehicle specific power VSP, instantaneous speed and acceleration), the road characteristic data (road grade) and the dynamic passenger load data are interpolated to obtain second-by-second data. With the recording time as the reference, the second-by-second emission data, driving-state data, road characteristic data and dynamic passenger load data are merged to obtain the final data set. The data set contains 12 attributes in total: acquisition time, dynamic passenger load, instantaneous speed, acceleration, road grade, distance traveled, vehicle specific power, longitude and latitude, CO emission rate, CO₂ emission rate, NO_X emission rate and HC emission rate.

(2) Extract the characteristic factors that influence the emission characteristics and determine the input layer of the emission rate estimation model.

At any given moment, the pollutant emission rate of a bus is influenced by many factors. Road grade, vehicle driving state, passenger load and so on all affect the emission rate of the vehicle to some extent. To estimate the vehicle emission rate accurately, the operating condition of the bus is quantified with the widely adopted vehicle specific power (VSP), calculated as follows:

In the formula, Power is the total vehicle power (kW); Mass is the total mass of the bus (kg), the sum of the net vehicle mass and the passenger load; F_t is the tractive force (N); v is the travel speed (m/s); m is the total mass of the bus, including the net vehicle mass and the passenger load (kg); F_f, F_w, F_i and F_j are the rolling resistance, air resistance, grade resistance and acceleration resistance respectively (N); a is the bus acceleration (m/s²); g is the gravitational acceleration (9.8 m/s²); f is the rolling resistance coefficient, a dimensionless parameter; ε_i is the mass factor; α is the road grade; ρ_a is the air density, taken as 1.207 kg/m³ at 20 °C; C_D is the drag coefficient; A is the frontal area of the bus (m²).
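As a rough illustration of how VSP could be computed from the quantities defined above, the sketch below assumes the standard road-load form VSP = (F_f + F_w + F_i + F_j)·v / m with F_f = m g f cos α, F_w = ½ ρ_a C_D A v², F_i = m g sin α and F_j = (1 + ε_i) m a. That form is an assumption, since the formula itself is not reproduced in this text. The function name and all default coefficient values (`f_roll`, `eps`, `c_d`, `area`) are hypothetical placeholders, not values from the source.

```python
import math

RHO_AIR = 1.207  # air density in kg/m^3 at 20 °C, as stated in the text
G = 9.8          # gravitational acceleration in m/s^2, as stated in the text


def vehicle_specific_power(v, a, grade_angle, mass,
                           f_roll=0.01, eps=0.1, c_d=0.6, area=7.0):
    """Return VSP in W/kg (numerically equal to kW per tonne).

    v: speed (m/s); a: acceleration (m/s^2); grade_angle: road grade as an
    angle in radians (assumed interpretation of alpha); mass: total mass (kg).
    The four keyword defaults are illustrative placeholders only.
    """
    f_f = mass * G * f_roll * math.cos(grade_angle)  # rolling resistance
    f_w = 0.5 * RHO_AIR * c_d * area * v ** 2        # air resistance
    f_i = mass * G * math.sin(grade_angle)           # grade resistance
    f_j = (1.0 + eps) * mass * a                     # acceleration resistance
    return (f_f + f_w + f_i + f_j) * v / mass


# A 15 t bus cruising at 10 m/s, on the flat and on a gentle climb.
flat = vehicle_specific_power(v=10.0, a=0.0, grade_angle=0.0, mass=15000.0)
uphill = vehicle_specific_power(v=10.0, a=0.0, grade_angle=0.02, mass=15000.0)
print(flat, uphill)
```

Because the passenger load enters through `mass`, the same speed and acceleration give a different VSP as passengers board and alight, which is how the dynamic load reaches the model.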

Besides the current operating condition, the previous driving state also affects, to some extent, the emission characteristics of the bus at the current time. The invention therefore characterizes the previous driving state of the bus by two parameters, the instantaneous speed and the acceleration of the previous second, and uses them together with the vehicle specific power at the current time as the main inputs of the model.

The model input layer is expressed as:

D = {(x₁,y₁), (x₂,y₂), ..., (x_i,y_i), ..., (x_N,y_N)}, i = 1, 2, ..., N

where D is the training set forming the input layer of the emission estimation model; (x_i, y_i) is the pair of independent and dependent variables of the i-th group of data in the training set; x_i is the set of independent variables, i.e. the variables that influence emissions. According to the analysis above, it comprises three variables VSP_t, v_{t-1} and a_{t-1}, where VSP_t is the instantaneous vehicle specific power at time t, i.e. the current time, and v_{t-1} and a_{t-1} are the instantaneous speed and acceleration at time t-1; y_i is the bus emission rate, covering the four emissions CO, CO₂, HC and NO_X; N is the number of samples in the input data.
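The construction of the training set D from second-by-second records can be sketched as follows. The record layout (a list of dictionaries with keys `vsp`, `v`, `a` and `emission`) and the function name are assumptions made for illustration; the sketch also assumes the records are consecutive one-second samples with no gaps.

```python
def build_training_set(records):
    """Pair each second t with the driving state of second t-1.

    records: list of dicts ordered by time, one per second, with keys
    'vsp', 'v', 'a' and 'emission' (hypothetical layout).
    Returns (features, targets) where features[i] = (VSP_t, v_{t-1}, a_{t-1}).
    """
    features, targets = [], []
    for prev, cur in zip(records, records[1:]):
        features.append((cur["vsp"], prev["v"], prev["a"]))
        targets.append(cur["emission"])
    return features, targets


records = [
    {"vsp": 1.2, "v": 3.0, "a": 0.5, "emission": 0.010},
    {"vsp": 2.5, "v": 3.5, "a": 0.4, "emission": 0.014},
    {"vsp": 1.8, "v": 3.9, "a": -0.1, "emission": 0.012},
]
X, y = build_training_set(records)
print(X, y)
```

The first record has no predecessor, so N records yield N-1 training samples. One model would be trained per pollutant by swapping the `emission` column.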

(3) Determine the type of loss function and the method of fitting the residual, and establish the gradient boosting regression tree model.

To improve the estimation accuracy of the emission rate and the generalization ability of the model, the gradient boosting regression tree model obtains the final emission estimate by iterating multiple regression trees. The total number of regression trees is denoted M.

The invention uses the negative binomial log-likelihood function as the loss function, with the expression:

L(y, f(x)) = log(1 + exp(−2y f(x)));

where y is the dependent variable, i.e. the measured emission rate, and f(x) is the estimated emission rate. Following the gradient descent method, the value of the negative gradient of the loss function at the current model is taken as an approximation of the residual that the newly constructed regression tree is to fit, and the loss function is optimized in this way.

The construction of the model is illustrated below with the training of the m-th regression tree as an example.

The weak learner is initialized as:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

where N is the number of samples in the input data; c is the initial leaf-node output parameter, a variable that can be defined as needed; L(y_i, c) is the loss evaluated on the i-th sample.

For each sample (x_i, y_i), the residual is determined by the gradient descent method:

$$r_{m,i} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$

where r_{m,i} is the residual of the i-th sample for the m-th regression tree; f_{m-1}(x_i) is the learner obtained after training the (m-1)-th regression tree, i.e. the emission rate estimate obtained with the first m-1 regression trees when the independent variable is x_i. When computing the residual of the i-th sample for the m-th regression tree, f_{m-1}(x_i) is substituted for f(x_i). Computing the residual by gradient descent moves the model along the negative gradient direction at each iteration so that the loss function becomes smaller and smaller, yielding an increasingly accurate model.
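For the loss function stated above, L(y, f) = log(1 + exp(−2yf)), differentiating with respect to f gives the negative gradient 2y / (1 + exp(2yf)). The short sketch below evaluates that closed form and checks it against a central finite difference; the function names are illustrative only.

```python
import math


def loss(y, f):
    """Negative binomial log-likelihood loss as written in the text."""
    return math.log(1.0 + math.exp(-2.0 * y * f))


def pseudo_residual(y, f_prev):
    """Negative gradient of the loss with respect to f, evaluated at f_prev."""
    return 2.0 * y / (1.0 + math.exp(2.0 * y * f_prev))


def numeric_negative_gradient(y, f, h=1e-6):
    """Central finite-difference estimate of -dL/df, used only as a check."""
    return -(loss(y, f + h) - loss(y, f - h)) / (2.0 * h)


y_i, f_prev = 0.8, 0.3  # arbitrary illustrative values
print(pseudo_residual(y_i, f_prev), numeric_negative_gradient(y_i, f_prev))
```

The two printed numbers agree to several decimal places, which confirms that the closed form is the negative gradient of the stated loss.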

After the residuals are determined, the m-th regression tree T_m is trained on the set {(x_i, r_{m,i})}, i = 1, 2, ..., N, and the regions into which its leaf nodes partition the feature space are denoted R_{m,j}, j = 1, 2, ..., J.

For each leaf node of the regression tree T_m, the output value is computed as:

$$c_{m,j} = \arg\min_{c} \sum_{x_i \in R_{m,j}} L\left(y_i, f_{m-1}(x_i) + c\right)$$

After all leaf-node output values of the regression tree T_m have been obtained, the learner is updated:

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where the regression tree divides the feature space into J units {R₁, R₂, ..., R_J}. The feature space refers to the way the leaf nodes of each regression tree are generated; specifically, the way leaf nodes are divided is determined by the number and value ranges of the independent variables. In the invention this means that when the independent variables VSP_t, v_{t-1} and a_{t-1} fall within certain ranges, for example 2 ≤ VSP_t ≤ 5, 10 ≤ v_{t-1} ≤ 20 and −2.5 ≤ a_{t-1} ≤ 0, the samples are assigned to one leaf node, and the region 2 ≤ VSP_t ≤ 5, 10 ≤ v_{t-1} ≤ 20, −2.5 ≤ a_{t-1} ≤ 0 is one division unit. These ranges can be set according to actual needs. R_{m,j} is the j-th unit of the m-th regression tree, and each unit represents one partition class; I(x ∈ R_{m,j}) is the indicator function, and x is the independent variable, i.e. the emission-influencing variables VSP_t, v_{t-1} and a_{t-1}. When the regression tree T_m determines that x ∈ R_{m,j}, i.e. the independent variable belongs to the unit R_{m,j}, I takes the value 1, and otherwise 0; c_{m,j} is the output value of the leaf node that the m-th regression tree obtains in the j-th feature unit, i.e. the emission rate value in that feature unit.

To avoid overfitting and improve the generalization ability of the model, the gradient boosting algorithm introduces a shrinkage parameter v, and the learner update becomes:

$$f_m(x) = f_{m-1}(x) + v \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

The shrinkage parameter v is called the learning rate, and v = 1 means no shrinkage. A smaller learning rate effectively improves the generalization ability of the model and helps avoid overfitting, but it also requires more regression trees and therefore more computation, so the choice of learning rate should balance model performance against computation time.

After M iterations, the final gradient boosting regression tree model is obtained as the superposition of the M regression trees:

$$\hat{f}(x) = f_M(x) = f_0(x) + v \sum_{m=1}^{M} \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where \hat{f}(x) denotes the gradient boosting regression tree.

The gradient boosting regression tree performs M iterations on the training data, and each iteration generates one regression tree. By the gradient descent method, each iteration moves along the negative gradient direction of the loss function so that the loss becomes smaller and smaller, yielding an increasingly accurate model. In each iteration, the leaf-node output values of all feature units in the feature space are computed, and they are finally accumulated to obtain the set of all leaf-node output values over the feature spaces of the M regression trees. When emission rates are estimated on the test set, the independent variables of the test set are used as the model input. The trained regression tree model assigns each independent variable to one of the previously divided feature units, and the emission rate of the test sample is then predicted by the mean predicted value of all training samples in that feature unit.
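The iteration described above (initialize, compute residuals, fit a tree, scale by the learning rate, accumulate) can be sketched with a deliberately simplified booster. Two substitutions are made and should be read as assumptions: the loss is swapped for squared error, so that the negative gradient is simply y − f and each leaf output is the mean residual, and each tree is a depth-1 stump with J = 2 leaf regions. It illustrates the loop structure only, not the patented model.

```python
def fit_stump(X, r):
    """Fit a depth-1 regression tree (two leaf regions) to residuals r."""
    n = len(X)
    mean_all = sum(r) / n
    best = (float("inf"), None, None, mean_all, mean_all)
    for k in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so that both leaf regions are non-empty.
        for t in sorted({row[k] for row in X})[:-1]:
            left = [r[i] for i in range(n) if X[i][k] <= t]
            right = [r[i] for i in range(n) if X[i][k] > t]
            c_left, c_right = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - c_left) ** 2 for v in left)
                   + sum((v - c_right) ** 2 for v in right))
            if sse < best[0]:
                best = (sse, k, t, c_left, c_right)
    return best[1:]  # (feature index, threshold, left output, right output)


def stump_predict(stump, x):
    """Sum over leaves of c_j * I(x in R_j) for a two-leaf tree."""
    k, t, c_left, c_right = stump
    if k is None:  # no valid split was found
        return c_left
    return c_left if x[k] <= t else c_right


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    f0 = sum(y) / len(y)  # argmin_c of squared loss is the sample mean
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x)
                for p, x in zip(pred, X)]
    return f0, stumps, learning_rate


def predict_gbrt(model, x):
    f0, stumps, lr = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)


# Synthetic two-feature data, purely illustrative.
X = [(i / 10.0, float(i % 4)) for i in range(40)]
y = [x0 ** 2 + 0.5 * x1 for x0, x1 in X]
model = fit_gbrt(X, y, n_trees=30, learning_rate=0.3)
print(predict_gbrt(model, X[5]), y[5])
```

Raising `n_trees` or `learning_rate` drives the training error down faster, which is the trade-off between accuracy, overfitting and computation time that the shrinkage parameter controls.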

(4) Tune the parameters of the gradient boosting regression tree model.

The parameters of a gradient boosting regression tree fall into two classes. The first class regulates the gradient boosting method, and the second class controls the structure of the regression trees.

There are two important gradient boosting parameters, the learning rate and the number of regression trees, denoted learning_rate and n_estimators in Python. They control the shrinkage parameter v and the total number of regression trees M in step (3). The default value of learning_rate is 0.1.

There are four important regression tree structure parameters: (1) the maximum depth of the tree; (2) the minimum number of samples a node needs in order to be split further; (3) the minimum number of samples required to form a leaf node; (4) the number of features of the feature space, i.e. the value of J in step (3). In Python, max_depth denotes the depth of the tree, min_samples_split the minimum number of samples a node needs in order to be split further, min_samples_leaf the minimum number of samples required to form a leaf node, and max_features the number of features of the feature space.

First, tune the gradient boosting parameters. To determine them, initial values of the regression tree structure parameters are set first. Based on the total number of samples and the number of variables in the training set D, the tree depth max_depth is generally chosen between 5 and 8; to keep the initial model from overfitting, the smaller value 5 is chosen here. The minimum number of samples a node needs in order to be split further, min_samples_split, generally lies between 0.5% and 1% of the total number of samples in D; this experiment has about 30,000 samples, so the initial value of min_samples_split can be set to 150. To avoid overfitting, the minimum number of samples required to form a leaf node, min_samples_leaf, is initialized to 20. The number of features of the feature space, max_features, is initialized to 5. With these initial values set and learning_rate at its default value of 0.1, the number of regression trees n_estimators is tuned by grid search, increasing from 20 to 80 in steps of 10, and the optimal n_estimators is determined from the mean cross-validation score (cross_val_score). If the optimal value is too large or too small, learning_rate must be readjusted and the grid search for the optimal n_estimators repeated.

Second, determine the regression tree structure. When tuning the structure parameters, the parameters with the greater influence on the result are tuned first. The tree depth max_depth and the minimum number of samples for a further split, min_samples_split, directly determine the structure of the tree, so they are tuned first. max_depth is searched from 5 to 10 and min_samples_split from 150 to 300 in steps of 10, and the optimal values are chosen by grid search. Once the optimal max_depth and min_samples_split are determined, min_samples_leaf is tuned over the range 10 to 100 in steps of 10 and its optimal value is chosen. Finally, the maximum number of features of the feature space, max_features, is tuned over the range 2 to 10 and its optimal value is chosen. When parameter tuning is complete, the final emission estimation model is obtained.
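The parameter names in this step match those of scikit-learn's `GradientBoostingRegressor`, so the staged search can be sketched with `GridSearchCV`. This is an assumption about tooling, since the text only says "Python". The data below are synthetic stand-ins for (VSP_t, v_{t-1}, a_{t-1}), and the structure parameters are scaled down to suit 600 samples rather than the roughly 30,000 of the experiment; `max_features` is left out because the synthetic data have only three features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Synthetic stand-in data, not real emission measurements.
X = rng.uniform(-1.0, 1.0, size=(600, 3))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=600)

# Stage 1: fix the tree structure, search the number of trees (20 to 80, step 10).
stage1 = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.1, max_depth=5,
                              min_samples_split=10, min_samples_leaf=5,
                              random_state=0),
    param_grid={"n_estimators": list(range(20, 81, 10))},
    cv=3,
)
stage1.fit(X, y)
best_n = stage1.best_params_["n_estimators"]

# Stage 2: with n_estimators fixed, search the parameters that shape the tree.
stage2 = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.1, n_estimators=best_n,
                              min_samples_leaf=5, random_state=0),
    param_grid={"max_depth": [5, 6, 7], "min_samples_split": [10, 20, 30]},
    cv=3,
)
stage2.fit(X, y)
print(best_n, stage2.best_params_, round(stage2.best_score_, 3))
```

Searching one group of parameters at a time keeps the number of fits small compared with a full joint grid, at the cost of possibly missing interactions between the groups. The remaining parameters (`min_samples_leaf`, then `max_features`) would be tuned in further stages of the same form.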

The method for estimating urban bus emission rates based on gradient boosting regression trees proposed by the invention is demonstrated here with measured city bus emission data.

(1) Data description

From April 10 to April 20, 2016, emission data were collected with PEMS and GPS devices from buses on Route 1, Route 51 and Route 206. The attributes collected are shown in the table below:

Table 2: Attributes of the measured emission data

1- Time	2- Passenger count	3- Instantaneous speed (m/s)	4- Acceleration (m/s²)
5- Altitude (m)	6- Longitude and latitude	7- Time interval (s)	8- Distance traveled (m)
9- CO emission rate (g/s)	10- CO₂ emission rate (g/s)	11- HC emission rate (g/s)	12- NO_x emission rate (g/s)

From the recorded passenger counts, the dynamically changing passenger load at each moment can be estimated. From the altitude data measured by GPS, the road grade can be estimated. From the time interval data, the sampling intervals of the PEMS and GPS devices are obtained, so that the data can be standardized by Lagrange interpolation and the emission data measured by PEMS merged with the driving-state and geographic data measured by GPS, giving more than 30,000 second-by-second emission records. From the speed, acceleration, road grade and passenger load, the vehicle specific power (VSP) can be computed to measure the current operating condition of the bus. In addition, because the driving state at time t-1 affects the emission characteristics at time t to some extent, the speed and acceleration at time t-1, denoted v_{t-1} and a_{t-1}, are added to the data at time t. The resulting emission data attributes are shown in the table below:

Table 3: Attributes of the emission data after standardization

1- Time	2- Dynamic passenger load	3- Instantaneous speed (m/s)	4- Acceleration (m/s²)
5- Road grade	6- VSP	7- v_{t-1}	8- a_{t-1}
9- CO emission rate (g/s)	10- CO₂ emission rate (g/s)	11- HC emission rate (g/s)	12- NO_x emission rate (g/s)
13- Distance traveled (m)
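Two of the derived attributes in Table 3 can be illustrated with a short sketch. The source does not state how the passenger load or the road grade was actually computed, so both functions below are assumptions: the onboard count is taken as the running sum of boardings minus alightings at each stop, and the grade as rise over run between consecutive samples, with the distance column treated as cumulative.

```python
def onboard_counts(boarding, alighting, initial=0):
    """Passengers on board after each stop (running sum of on minus off)."""
    counts, onboard = [], initial
    for on, off in zip(boarding, alighting):
        onboard += on - off
        counts.append(onboard)
    return counts


def road_grade(altitudes, distances):
    """Grade between consecutive samples as altitude change over distance change.

    Returns 0.0 where the bus has not moved, to avoid dividing by zero.
    """
    grades = []
    for i in range(1, len(altitudes)):
        run = distances[i] - distances[i - 1]
        rise = altitudes[i] - altitudes[i - 1]
        grades.append(rise / run if run > 0 else 0.0)
    return grades


print(onboard_counts([5, 3, 0], [0, 2, 4]))
print(road_grade([10.0, 10.5, 10.5], [0.0, 10.0, 10.0]))
```

Raw GPS altitude is noisy, so a real pipeline would likely smooth the altitude trace before differencing; that step is omitted here.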

To evaluate the performance of the proposed model, the measured emission data are randomly split into a training set and a test set at a ratio of 7:3. The training set is used for supervised training, and the emission rates in the test set are estimated with the trained model to assess its performance.
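A random 7:3 split can be sketched with the standard library alone. The function name and the fixed seed are illustrative choices, not details from the source.

```python
import random


def split_7_3(samples, seed=0):
    """Shuffle a copy of the samples and cut it at 70% for train, 30% for test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train, test = split_7_3(range(100))
print(len(train), len(test))
```

Because consecutive seconds are strongly correlated, a fully random split places near-identical neighbours in both sets; splitting by trip or by day would be a stricter alternative, though the source specifies only a random split.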

(2) Model establishment

Because the emission characteristics of buses with different fuel types differ significantly, the data used for model training and testing in the present invention are measured emission data from liquefied natural gas (LNG) buses. The emission estimation model for LNG buses is established using the steps in the specification and its parameters are tuned; the results are shown in Table 4:

Table 4 Parameter values

learning_rate	n_estimators	max_depth	min_samples_leaf	min_samples_split	max_features
0.05	50	7	28	175	6
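The parameter names in Table 4 coincide with constructor arguments of scikit-learn's `GradientBoostingRegressor`. The sketch below assumes that library, which the text does not name, and assumes one independent model per pollutant, since that regressor predicts a single target. Note that its default loss is squared error, which is not the loss stated in the claims, and that `max_features=6` implies at least six input columns.

```python
# Values copied from Table 4; the mapping to scikit-learn is an assumption.
GBRT_PARAMS = {
    "learning_rate": 0.05,
    "n_estimators": 50,
    "max_depth": 7,
    "min_samples_leaf": 28,
    "min_samples_split": 175,
    "max_features": 6,
}

# Hypothetical target names, one model per emission rate.
POLLUTANTS = ("CO", "CO2", "HC", "NOx")


def build_models():
    """Create one untrained regressor per pollutant (requires scikit-learn)."""
    from sklearn.ensemble import GradientBoostingRegressor
    return {name: GradientBoostingRegressor(**GBRT_PARAMS) for name in POLLUTANTS}
```

Each returned model would then be fitted with `model.fit(X_train, y_train)` on the corresponding emission rate column.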

Fig. 2 compares the estimated and measured values of the four emission rates (CO, CO₂, HC and NO_x) on the test set. The p-values between the measured and estimated values are all below 0.01, indicating that the estimates are significantly correlated with the measurements. Meanwhile, the coefficients of determination R² are all greater than 0.6, indicating that the model estimates all four emissions well.
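The coefficient of determination can be computed as below. This sketch assumes the common definition R² = 1 − SS_res / SS_tot; the text does not say which definition was used, and the function name is hypothetical.

```python
def r_squared(measured, estimated):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, estimated))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot
```

Under this definition a perfect estimate gives 1, and always predicting the mean of the measurements gives 0.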

(3) Performance analysis

To further verify the estimation performance of the model, its results are compared with the U.S. EPA's MOtor Vehicle Emission Simulator (MOVES) model, and three common validation indices, mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE), are used to evaluate the proposed model. In addition, since the study region is not available in MOVES and MOVES does not include the LNG bus type, Indiana, USA, whose terrain is similar to that of Jiangsu Province, was selected when estimating emissions with MOVES, and compressed natural gas buses were used to approximate LNG buses. Table 5 compares the emission rate estimation performance of the two models; the results show that the proposed model performs well in emission rate estimation.
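The three validation indices can be written out directly. This is a minimal sketch with hypothetical function names; skipping samples whose measured value is zero in MAPE is an assumption, since the text does not say how such samples were handled.

```python
import math


def mae(measured, estimated):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(measured, estimated)) / len(measured)


def mape(measured, estimated):
    """Mean absolute percentage error in percent, skipping zero measurements."""
    terms = [abs((y - p) / y) for y, p in zip(measured, estimated) if y != 0]
    return 100.0 * sum(terms) / len(terms)


def rmse(measured, estimated):
    """Root mean square error."""
    squared = sum((y - p) ** 2 for y, p in zip(measured, estimated))
    return math.sqrt(squared / len(measured))
```

All three are computed per pollutant on the test set, once for the proposed model and once for the MOVES estimates.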

Table 5 Performance comparison of the models

Claims

1. A city bus emission rate estimation method based on a gradient boosting regression tree, characterized by comprising the following steps:

(1) standardizing the collected bus emission and driving data to obtain second-by-second emission rates and driving-state characteristic parameter data;

(2) calculating the real-time vehicle specific power from the bus driving-state characteristic parameters, characterizing the driving state of the previous second by speed and acceleration, and determining the training set from the emission data obtained in step (1) as the model input parameters;

(3) determining the loss function L of the model according to the input parameters obtained in step (2), setting the number of regression trees M, initializing the weak learner, and constructing a new regression tree using the value of the negative gradient of the loss function at the current regression tree model as an approximation of the residual;

(4) updating the learner function with the regression tree determined in each iteration according to step (3) until the M iterations end, i.e. M regression trees have been built, to obtain the final strong learner model.

2. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 1, characterized in that, in step (1), the collected data are standardized according to the following formula:

For n+1 point pairs (x₀,y₀),(x₁,y₁),...,(x_n,y_n), an interpolating function is sought that takes the corresponding value y_i at each x_i; l_i(x) is the Lagrange basis polynomial, i.e. the interpolation basis function, with the expression:

$$l_i(x) = \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$$

where n+1 is the number of point pairs in the data set; x_n is the time corresponding to the (n+1)-th point pair; and y_n is the value of the emission or driving-state characteristic variable of the (n+1)-th point pair;
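The basis polynomials l_i(x) combine into the interpolant as the sum of y_i · l_i(x) over all point pairs. The sketch below evaluates that sum and uses it to resample irregularly timed samples onto whole seconds. Using only a few neighbouring samples per query (parameter `order`) is an assumption made to avoid a single high-degree polynomial; the function names are hypothetical.

```python
import bisect
import math


def lagrange_interpolate(xs, ys, x):
    """Evaluate sum_i y_i * l_i(x) for the point pairs (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def resample_per_second(times, values, order=2):
    """Evaluate a local Lagrange polynomial at each whole second.

    Assumption: `times` is strictly increasing; order + 1 neighbouring
    samples are used for each query time.
    """
    k = order + 1
    start, stop = math.ceil(times[0]), math.floor(times[-1])
    out = []
    for t in range(start, stop + 1):
        # Index of the first sample at or after t; centre the window near it.
        hi = bisect.bisect_left(times, t)
        lo = max(0, min(hi - k // 2, len(times) - k))
        out.append((t, lagrange_interpolate(times[lo:lo + k], values[lo:lo + k], t)))
    return out
```

Applied to each PEMS and GPS channel separately, this puts all channels on a common one-second grid so that they can be merged by time.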

3. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 1, characterized in that, in step (1), the second-by-second bus emission rates comprise the second-by-second emission rates of CO, CO₂, HC and NO_x emitted by the bus during actual driving; and the real-time driving-state characteristic parameters comprise speed, acceleration, road grade and dynamic passenger load.

4. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 1, characterized in that, in step (2), the method of calculating the real-time vehicle specific power from the bus driving-state characteristic parameters and characterizing the driving state of the previous second by speed and acceleration is as follows:

(2.1) using the bus driving parameter data obtained in step (1), the vehicle specific power of the bus is calculated as follows:

where VSP is the vehicle specific power of the bus; F_t is the tractive force (N); v is the travel speed (m/s); m is the gross mass of the bus, including the curb mass and the load (kg); F_f, F_w, F_i and F_j are the rolling resistance, air resistance, grade resistance and acceleration resistance, respectively (N); a is the bus acceleration (m/s²); g is the gravitational acceleration (9.8 m/s²); f is the rolling resistance coefficient, a dimensionless parameter; ε_i is the mass factor; α is the road grade; ρ_a is the air density; C_D is the drag coefficient; and A is the frontal area of the bus;

(2.2) the speed and acceleration of the previous driving state, i.e. the speed and acceleration of the previous second, are obtained from the second-by-second bus driving-state parameter data.

5. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 4, characterized in that, in step (2), the training set is determined from the emission data obtained in step (1) and used as the model input parameters, the training set being determined as follows:

D = {(x₁,y₁),(x₂,y₂),...,(x_i,y_i),...,(x_N,y_N)}, i = 1,2,...,N

where D denotes the training set serving as the input layer of the emission estimation model; (x_i, y_i) denotes the pair of independent and dependent variables of the i-th group of data in the training set; x_i denotes the independent variables, i.e. the emission-influencing variables, three in total: VSP_t, v_{t-1} and a_{t-1}, where VSP_t is the instantaneous vehicle specific power at time t, i.e. the current time, and v_{t-1} and a_{t-1} are the instantaneous speed and acceleration at time t-1, respectively; y_i denotes the bus emission rates, comprising the emission rates of the four emissions CO, CO₂, HC and NO_x; and N is the number of samples in the input data.

6. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 1, characterized in that, in step (3), the number of regression trees M is preset, and the loss function is the negative binomial log-likelihood function, expressed as:

L(y, f(x)) = log(1 + exp(−2yf(x)));

The weak learner is initialized as:

$$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$

where N is the number of samples in the input data; c is the initial leaf-node output parameter; and L(y_i, c) is the loss function obtained by training with the i-th sample.

7. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 6, characterized in that, in step (3), the value of the negative gradient of the loss function at the current model is used as the approximation of the residual to be fitted by the newly constructed regression tree; for each sample (x_i, y_i), the residual is calculated by the gradient descent method:

$$r_{m,i} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$

where r_{m,i} is the residual of the i-th sample for the m-th regression tree; f_{m-1}(x_i) is the learner obtained after training the (m-1)-th regression tree, i.e. the emission rate estimate obtained with the (m-1)-th regression tree when the independent variable is x_i; when calculating the residual of the i-th sample for the m-th regression tree, f_{m-1}(x_i) is used in place of f(x_i);
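For the loss stated in claim 6, L(y, f(x)) = log(1 + exp(−2yf(x))), differentiating with respect to f(x) and changing the sign gives the closed form 2y / (1 + exp(2y·f_{m-1}(x))) for the negative gradient. The sketch below implements both expressions so the closed form can be checked numerically; the function names are hypothetical, and how y is scaled for continuous emission rates is not stated in the text.

```python
import math


def loss(y, f):
    """Loss from claim 6: log(1 + exp(-2 * y * f))."""
    return math.log(1.0 + math.exp(-2.0 * y * f))


def pseudo_residual(y, f_prev):
    """Negative gradient of the loss with respect to f, evaluated at f_prev."""
    return 2.0 * y / (1.0 + math.exp(2.0 * y * f_prev))
```

A central finite difference of `loss` around any `f_prev` should agree with `pseudo_residual` up to rounding, which is a quick way to validate the derivative.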

8. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 7, characterized in that the method of fitting a regression tree in step (3) is: using the calculated residuals r_{m,i} of the i-th sample for the m-th regression tree, the set {(x_i, r_{m,i})}, i = 1,2,...,N, is obtained and used to train the m-th regression tree T_m, whose leaf-node regions are denoted R_{m,j}, j = 1,2,...,J.

9. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 8, characterized in that, in step (3), the leaf-node output values are solved as follows, for each leaf node of the regression tree T_m:

$$c_{m,j} = \arg\min_{c} \sum_{x_i \in R_{m,j}} L(y_i, f_{m-1}(x_i) + c)$$

where c_{m,j} is the leaf-node output value in the j-th feature unit of the m-th regression tree, i.e. the estimated emission rate.

10. The city bus emission rate estimation method based on a gradient boosting regression tree according to claim 9, characterized in that, in step (4), the learner is updated as follows: after all leaf-node output values of the regression tree T_m have been obtained, the learner is updated:

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where the regression tree divides the feature space into J units {R₁,R₂,...,R_J}; the feature space refers to the way the leaf nodes of each regression tree are generated, i.e. the method of splitting leaf nodes is determined by the number and value ranges of the independent variables; R_{m,j} is the j-th feature unit of the m-th regression tree, each feature unit representing one partition class; I(x∈R_{m,j}) is the indicator function, and x is the independent variable, i.e. the emission-influencing variables VSP_t, v_{t-1} and a_{t-1}; when the regression tree T_m determines that x∈R_{m,j}, i.e. the independent variable belongs to unit R_{m,j}, I takes the value 1, and otherwise 0; c_{m,j} is the output value of the leaf node obtained by the m-th regression tree in the j-th feature unit, i.e. the emission rate value in that feature unit;

$$f_m(x) = f_{m-1}(x) + \nu \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where the shrinkage parameter ν is called the learning rate;

The process of training the m-th regression tree is iterated until, after M iterations, the final gradient boosting regression tree model formed by superposing the M regression trees is obtained, expressed as:

$$\hat{f}(x) = f_M(x) = f_0(x) + \nu \sum_{m=1}^{M} \sum_{j=1}^{J} c_{m,j}\, I(x \in R_{m,j})$$

where \hat{f}(x) denotes the gradient boosting regression tree.
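The iteration described in steps (3) and (4) can be sketched in a few lines of plain Python. For brevity the sketch swaps in squared-error loss, so the negative gradient is the plain residual y − f and the optimal leaf value is the mean residual in the leaf; it also uses a single input feature and depth-1 trees. These simplifications and all function names are assumptions for illustration, not the configuration claimed above.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split) by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        c_left, c_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - c_left) ** 2 for r in left)
               + sum((r - c_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, c_left, c_right)
    if best is None:  # all x identical: a single constant leaf
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, c_left, c_right = best
    return lambda x: c_left if x <= threshold else c_right


def fit_gbrt(xs, ys, n_trees=50, learning_rate=0.05):
    """Return a predictor f_M built from n_trees boosted stumps."""
    f0 = sum(ys) / len(ys)  # initial constant: the mean minimises squared loss
    trees = []
    preds = [f0] * len(xs)
    for _ in range(n_trees):
        # Negative gradient of 0.5 * (y - f)^2 at the current model.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrunken update: f_m = f_{m-1} + learning_rate * tree output.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(tree(x) for tree in trees)
```

With a smaller learning rate each tree contributes less, so more trees are needed to reach the same fit; this is the trade-off between the learning_rate and n_estimators parameters.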