CN116796141A

CN116796141A - GBDT regression model-based office building energy consumption prediction method

Info

Publication number: CN116796141A
Application number: CN202210246202.9A
Authority: CN
Inventors: 郑清涛; 李进; 王栋伟; 张玲; 骆丽仪; 孙金礼; 熊湜; 吴咏昆; 陈丝绸
Original assignee: Shuifa Xingye Energy Zhuhai Co ltd; Zhuhai China Construction Xingye Green Building Design Institute Co ltd; Zhuhai Xingye Energy Saving Science And Technology Co ltd; Zhuhai Singyes Green Building Technology Co Ltd; Shuifa Energy Group Co Ltd
Current assignee: Shuifa Xingye Energy Zhuhai Co ltd; Zhuhai China Construction Xingye Green Building Design Institute Co ltd; Zhuhai Xingye Energy Saving Science And Technology Co ltd; Zhuhai Singyes Green Building Technology Co Ltd; Shuifa Energy Group Co Ltd
Priority date: 2022-03-14
Filing date: 2022-03-14
Publication date: 2023-09-22

Abstract

The invention belongs to the technical field of building energy consumption and discloses an office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model. The prediction method comprises: obtaining historical energy consumption data of a building and its influence characteristic data, sorting the data and dividing it by season; performing feature selection according to the Pearson correlation coefficient and the ETC feature influence degree; normalizing the data of each season; setting a square loss function, an absolute loss function and a regression tree learning rate, and training a GBDT model; inputting the data to be predicted for the corresponding season into the GBDT model, and obtaining the building energy consumption predicted value for the day to be predicted after inversely normalizing the predicted output; and finally, verifying the GBDT model with the root mean square error (RMSE) and the mean absolute error (MAE). The GBDT prediction flow and method eliminate the influence of differing dimensions (units) on the prediction result, reduce the dimensionality of the input vector, improve the training speed of the model, and reduce the influence of outliers in the samples on training and prediction; the method has higher prediction accuracy and practical significance for energy-saving building construction.

Description

GBDT regression model-based office building energy consumption prediction method

Technical Field

The invention relates to the technical field of building energy consumption, in particular to an office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model.

Background

With social and economic development, urbanization is accelerating and building energy consumption accounts for a growing share of national energy consumption, so society as a whole must pay attention to energy-saving retrofits of buildings. Accurate energy consumption prediction is a key concern for energy saving in the building field. Meanwhile, the country is vigorously promoting the construction of nearly-zero-energy and ultra-low-energy buildings. High-precision energy consumption prediction can provide a decision basis for balancing a building's energy consumption and energy supply and for the operation scheme of its energy storage system, thereby improving the operating benefit of the building and promoting the healthy and lasting development of nearly-zero-energy and ultra-low-energy buildings in China.

Building energy consumption is influenced by many factors, such as ambient temperature and meteorological conditions, which reduces modeling accuracy. In addition, owing to various special conditions, energy consumption data inevitably contain outliers, which increases errors. These problems make it difficult to establish a high-precision prediction model.

Disclosure of Invention

The invention aims to provide an office building energy consumption prediction method based on a GBDT regression model, which has the following effects: the method can eliminate the influence of different dimensions on the prediction result, reduce the dimension of the input vector, improve the training speed of the model, reduce the influence of the abnormal value in the sample on the training and the prediction result, and has higher prediction accuracy.

An office building energy consumption prediction method based on a GBDT regression model, as shown in FIG. 1, comprises the following steps. Historical energy consumption data of the building and its influence characteristic data are acquired. The influence characteristic data comprise not only the collected climate and environment data of the building and the building's own envelope structure, but also, in particular, the occupancy of the building and the performance of the energy-using equipment in the building. The historical energy consumption data of the building are the energy consumption data of the various equipment in the building, the energy consumption data of the refrigerating unit, and the like.

The acquired data are sorted and divided by season. The correlation between each influence factor and the building energy consumption is analyzed through the Pearson correlation coefficient, and main features are selected according to the degree of correlation. At the same time, the data of each season are input into an ETC model, which outputs a feature importance index. Finally, the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the several highest-scoring features are selected as the main influence features.
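By way of illustration only, the weighted combination of the two measures might be sketched as follows. The equal weighting (`weight=0.5`), the use of the absolute value of the correlation, the feature names, and the importance scores (which stand in for the output of an ETC model) are all assumptions made for this sketch; the description does not specify them.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, energy, importances, weight=0.5, top_n=2):
    """Score each feature as weight*|r| + (1-weight)*importance; keep the top_n."""
    scores = {
        name: weight * abs(pearson(values, energy)) + (1 - weight) * importances[name]
        for name, values in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores

# Hypothetical data for one season.
columns = {
    "outdoor_temp": [20.0, 25.0, 30.0, 35.0],
    "wind_speed": [3.0, 1.0, 4.0, 2.0],
    "attendance": [100.0, 120.0, 150.0, 160.0],
}
energy = [200.0, 250.0, 300.0, 350.0]
# Hypothetical importance scores standing in for the ETC model output.
importances = {"outdoor_temp": 0.6, "wind_speed": 0.1, "attendance": 0.3}

selected, scores = select_features(columns, energy, importances)
print(selected)
```

In this made-up example the wind speed is uncorrelated with the energy use and has low importance, so it is the feature that gets dropped.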

The main feature data and the corresponding historical building energy consumption data are normalized to obtain training sample data. The main implementation steps are as follows:

Input vectors are generated from the divided main feature data, and the corresponding historical building energy consumption data are used as output vectors.

The divided input and output vectors are normalized: the Z-score standardization method transforms each feature dimension of the original data to have a mean of 0 and a standard deviation of 1, generating dimensionless training sample sets for the different seasons and thereby eliminating large differences in prediction results caused by inconsistent dimensions (units). The Z-score standardization method is expressed as:

$$x' = \frac{x - \mu}{\delta}$$

where x is an input vector, μ is the mean of the column in which x is located, δ is the standard deviation of that column, and x' is the Z-score-standardized value of x.
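As a minimal sketch of this standardization and of the inverse step used later to recover predictions in physical units, the following assumes the population standard deviation and a non-constant column (δ > 0); the function names and sample values are hypothetical.

```python
import math

def zscore_fit(values):
    """Return (mean, population standard deviation) of one column."""
    n = len(values)
    mu = sum(values) / n
    delta = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu, delta

def zscore_apply(values, mu, delta):
    """x' = (x - mu) / delta for every value in the column."""
    return [(v - mu) / delta for v in values]

def zscore_invert(values, mu, delta):
    """Inverse normalization: x = x' * delta + mu."""
    return [v * delta + mu for v in values]

energy_kwh = [100.0, 120.0, 140.0, 160.0]  # hypothetical daily readings
mu, delta = zscore_fit(energy_kwh)
scaled = zscore_apply(energy_kwh, mu, delta)
restored = zscore_invert(scaled, mu, delta)
```

The statistics fitted on the training data of a season would be reused unchanged for the day to be predicted in that season.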

The normalized training sample sets of the different seasons are input separately to construct the GBDT regression prediction models. The sample set is T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x is an input vector and y is an output vector. Regression prediction is carried out with a GBDT algorithm that uses the square loss, the absolute loss, and the regression tree learning rate. As shown in FIG. 2, the GBDT model iterates over multiple rounds; each round produces a weak learner (a regression tree) that is trained on the negative gradient of the previous round's learner, and the result output after the maximum number of iterations is the final predicted value of the building energy consumption.

Preferably, the square loss function is set as:

$$L(y, f(x)) = \tfrac{1}{2}\,(y - f(x))^2 \qquad (1)$$

Preferably, the absolute loss function is set as:

$$L(y, f(x)) = |y - f(x)| \qquad (2)$$

In formula (1) and formula (2), y is the actual building energy consumption value and f(x) is the predicted building energy consumption value.

Further, the training sample set T is input and the GBDT regression prediction model is initialized. The regression trees are indexed k = 1, 2, …, K and the samples i = 1, 2, …, N. The initial value f_0(x), i.e. the output of the (k-1)-th tree for k = 1, is calculated. When f_0(x_i) is taken as the mean f_a0(x_i) of the actual building energy consumption values in sample T, the residual A_i at the i-th sample is calculated; when f_0(x_i) is taken as the median f_b0(x_i) of the actual building energy consumption values in sample T, the residual B_i at the i-th sample is calculated.

Further, the square loss function is very sensitive to outliers, which leads to excessive prediction errors at outliers, whereas the absolute loss function is more robust to outliers; the negative gradient calculation method is therefore optimized.
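The difference in outlier sensitivity can be illustrated with a small numeric sketch. The readings below are made up, and the squared loss is written with the conventional factor of one half (an assumption; the shares computed here do not depend on it).

```python
def squared_loss(y, pred):
    """Squared loss with the conventional 1/2 factor (assumed)."""
    return 0.5 * (y - pred) ** 2

def absolute_loss(y, pred):
    """Absolute loss."""
    return abs(y - pred)

actual = [100.0, 110.0, 90.0, 150.0]  # hypothetical values; the last one is an outlier
pred = 100.0                          # a constant prediction, for illustration

sq = [squared_loss(y, pred) for y in actual]
ab = [absolute_loss(y, pred) for y in actual]

# Share of the total loss contributed by the outlier under each loss.
sq_share = sq[-1] / sum(sq)
ab_share = ab[-1] / sum(ab)
print(round(sq_share, 3), round(ab_share, 3))
```

Under the squared loss the single outlier accounts for a larger share of the total loss than under the absolute loss, which is why a model fitted with the squared loss alone is pulled more strongly towards it.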

Preferably, the negative gradient r_ki of the loss function L is calculated as follows:

1. When A_i < B_i, r_ki is calculated as:

$$r_{ki} = y_i - f_{k-1}(x_i) \qquad (3)$$

2. When A_i > B_i, r_ki is calculated as:

$$r_{ki} = \operatorname{sign}\bigl(y_i - f_{k-1}(x_i)\bigr) \qquad (4)$$

In formula (3) and formula (4), y_i is the actual building energy consumption value of the i-th sample in T, f_{k-1}(x_i) is the building energy consumption value predicted for the i-th sample by the (k-1)-th tree, and sign is the sign function.
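The per-sample switch between the two branches might be sketched as below. This is an illustrative reading rather than the definitive rule: it assumes the first branch yields the plain residual (the negative gradient of the squared loss) and the second the sign of the residual (the negative gradient of the absolute loss). The case A_i = B_i is not covered by the text and is sent to the second branch here as an assumption. The values of A and B are made-up inputs.

```python
def sign(value):
    """Sign function: -1, 0 or 1."""
    return (value > 0) - (value < 0)

def negative_gradient(y, prev_pred, a, b):
    """Pseudo-residual for one sample, chosen by comparing residuals A_i and B_i."""
    residual = y - prev_pred
    if a < b:
        return residual              # squared-loss branch
    return float(sign(residual))     # absolute-loss branch (also used when a == b, assumed)

y = [10.0, 13.5, 50.0]           # hypothetical actual values
prev_pred = [11.0, 11.0, 11.0]   # hypothetical predictions of the previous tree
A = [0.5, 0.8, 30.0]             # hypothetical residuals A_i
B = [1.0, 1.0, 9.0]              # hypothetical residuals B_i

gradients = [negative_gradient(*args) for args in zip(y, prev_pred, A, B)]
print(gradients)
```

For the third sample the large residual of 39 is replaced by its sign, so that sample contributes a bounded target to the next regression tree.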

Each feature in the sample set T is traversed. Taking a given feature value at the i-th sample as a candidate split node, the total loss of the actual building energy consumption values is calculated according to formula (1) and formula (2) for every possible split, and the feature value of the first split that attains the minimum total loss is taken as the split node.
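An exhaustive split search over a single feature might look like the following sketch. Candidate thresholds at midpoints between sorted feature values, the mean as the constant of each side, and the squared loss used in the example are assumptions; the function name and data are hypothetical.

```python
def best_split(feature, target, loss):
    """Try every midpoint threshold; return (threshold, total_loss) of the first minimum."""
    values = sorted(set(feature))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for f, t in zip(feature, target) if f <= thr]
        right = [t for f, t in zip(feature, target) if f > thr]
        total = 0.0
        for side in (left, right):
            c = sum(side) / len(side)  # constant fitted to this side (mean, assumed)
            total += sum(loss(t, c) for t in side)
        # Strict comparison keeps the first split that reaches the minimum.
        if best is None or total < best[1]:
            best = (thr, total)
    return best

outdoor_temp = [20.0, 22.0, 30.0, 32.0]  # hypothetical feature column
target = [1.0, 1.2, 3.0, 3.2]            # hypothetical normalized targets

threshold, total_loss = best_split(outdoor_temp, target, lambda t, c: (t - c) ** 2)
print(threshold, round(total_loss, 4))
```

Repeating the search over every selected feature and keeping the overall minimum gives the split node of the tree.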

Preferably, a regression tree is fitted to the new training set {(x_i, r_ki)}, i = 1, 2, …, N, calculated from the negative gradient, giving the leaf node regions R_kj of the k-th tree, j = 1, 2, …, J. For each region, a line search estimates the constant value C_kj that minimizes the loss function L. Because the negative gradient calculation combines the square loss and the absolute loss, C_kj is calculated as follows to avoid an inaccurate value:

formula (5)

In formula (5), γ_kj is the mean of the pseudo-residuals in the j-th leaf node of the k-th tree, N is the number of samples in the j-th leaf node of the k-th tree, and min() denotes taking the minimum value.

Further, the regression tree is updated to obtain the building energy consumption predicted values f_k(x) output by the k-th tree:

$$f_k(x) = f_{k-1}(x) + lr \cdot \sum_{j=1}^{J} C_{kj}\, I(x \in R_{kj}) \qquad (6)$$

In formula (6), f_{k-1}(x) is the building energy consumption predicted value of the (k-1)-th tree, J is the number of leaf nodes of the regression tree, C_kj is the constant value that minimizes the loss function in the j-th leaf node at iteration k, I(x ∈ R_kj) is the indicator function, and lr is the learning rate.

Preferably, the original learning rate is 1, which usually causes the trained GBDT model to overfit and greatly reduces its prediction accuracy. Therefore, following the idea of shrinkage, the present invention sets the learning rate lr = 0.08.
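The effect of shrinkage on the additive update can be sketched as follows. The one-split trees (stumps), their thresholds and leaf constants, and the initial value are hypothetical; only lr = 0.08 is taken from the description.

```python
def stump(threshold, left_value, right_value):
    """Hypothetical one-split regression tree on a single feature."""
    return lambda x: left_value if x <= threshold else right_value

def boosted_predict(x, f0, trees, lr=0.08):
    """Start from f0 and add lr times the leaf constant of each tree in turn."""
    pred = f0
    for tree in trees:
        pred += lr * tree(x)
    return pred

trees = [stump(26.0, -1.0, 1.0), stump(30.0, -0.5, 0.5)]
pred = boosted_predict(28.0, f0=2.0, trees=trees)
print(round(pred, 4))
```

With lr = 0.08 each tree moves the prediction by only 8% of its leaf constant, so many trees are needed and no single tree dominates the fit.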

Preferably, after regression trees have been fitted for the maximum number of iterations K, the final GBDT prediction model F(x) is output as follows:

$$F(x) = f_K(x) = f_0(x) + lr \cdot \sum_{k=1}^{K} \sum_{j=1}^{J} C_{kj}\, I(x \in R_{kj}) \qquad (7)$$

Preferably, the input vector of the day to be predicted in the corresponding season is normalized by the normalization method described above and input into the corresponding final GBDT model; the model outputs prediction data, and the building energy consumption predicted value for the day to be predicted is obtained after inverse normalization.

Preferably, the evaluation indices comprise the mean absolute error (MAE) and the root mean square error (RMSE). The RMSE and MAE are calculated from the actual and predicted building energy consumption values of the day to be predicted to evaluate the prediction accuracy of the model: the smaller the RMSE and MAE, the better the prediction effect of the model, and vice versa.
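Both indices follow their standard definitions and can be computed as in the sketch below; the actual and predicted values are made up for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100.0, 110.0, 120.0]     # hypothetical measured consumption
predicted = [98.0, 113.0, 120.0]   # hypothetical model output

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print(round(mae_value, 3), round(rmse_value, 3))
```

Because the RMSE squares each error before averaging, it penalizes a few large errors more heavily than the MAE does, which is why the two are reported together.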

According to the feature selection method provided by the invention, the main features are extracted by combining the Pearson correlation coefficient measurement and the ETC feature influence degree measurement, so that the dimension of an input vector is reduced, the training efficiency can be improved, and the overfitting is reduced; the normalization processing method can eliminate the influence of different dimensions on the prediction result; most importantly, the office building energy consumption prediction method based on the GBDT regression model provided by the invention combines the square loss algorithm with the absolute loss algorithm, reduces the influence of abnormal values in samples on training and prediction results, improves the prediction accuracy, and simultaneously uses the learning rate to perform model training, so that the model has higher prediction accuracy, more accurate energy consumption prediction values can be obtained, and the energy saving construction quality of the building is effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art. In the accompanying drawings:

FIG. 1 is a schematic diagram of a building energy consumption prediction process according to the present invention;

FIG. 2 is a flow chart of GBDT model construction.

Detailed Description

The present invention will be described in detail with reference to specific embodiments, and the flow chart is shown in fig. 1.

Data collection. Lighting equipment, the air conditioning of the refrigeration machine room, and office equipment account for the highest energy consumption in office buildings, so this embodiment takes the prediction of lighting equipment energy consumption as an example. The building energy consumption influence factor data and the historical building energy consumption data were collected from an office building. The original input vector x comprises the outdoor temperature, sunshine level, solar radiation, building area, orientation, attendance, number of lighting devices, running time of the lighting devices, and the like; the output vector y comprises the electricity consumption of the lighting devices in the building.

Data partitioning and feature selection. The factors influencing building energy consumption are complex and mainly include local solar radiation, precipitation, wind speed and sunshine duration; the position and orientation of the building and the thermal performance of its envelope; and the functional type of the building, its users, the types of energy-using equipment, and the like. Too many features greatly slow down model training. Therefore, to improve training efficiency, the main features need to be selected.

The acquired data are divided by season. The correlation between each influence factor and the building energy consumption is analyzed through the Pearson correlation coefficient, and main features are selected according to the degree of correlation. At the same time, the data of each season are input into an ETC model, which outputs a feature importance index; the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the several highest-scoring features are selected as the main influence features. The features finally determined are outdoor temperature, solar radiation, attendance, number of lighting devices, and lighting device running time.
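The seasonal partition might be sketched as follows. The description does not state where the season boundaries lie, so the month-to-season mapping below (meteorological seasons) is an assumption, as are the record layout and values.

```python
from collections import defaultdict
from datetime import date

# Assumed mapping: meteorological seasons of the northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def split_by_season(records):
    """Group daily records (dicts with a 'date' key) into per-season lists."""
    groups = defaultdict(list)
    for rec in records:
        groups[SEASON_BY_MONTH[rec["date"].month]].append(rec)
    return dict(groups)

# Hypothetical daily records.
records = [
    {"date": date(2021, 1, 15), "outdoor_temp": 14.0, "lighting_kwh": 320.0},
    {"date": date(2021, 7, 1), "outdoor_temp": 31.0, "lighting_kwh": 280.0},
    {"date": date(2021, 7, 2), "outdoor_temp": 32.5, "lighting_kwh": 275.0},
]

by_season = split_by_season(records)
print({season: len(rows) for season, rows in by_season.items()})
```

Each per-season list would then be normalized and used to train its own GBDT model.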

The input vectors of the main feature data and the corresponding output vectors are normalized to a common scale to facilitate unified modeling and calculation. Specifically, the Z-score standardization method transforms each feature dimension of the original data to have a mean of 0 and a standard deviation of 1 and generates a dimensionless training sample set, thereby eliminating large differences in prediction results caused by inconsistent dimensions (units). The Z-score standardization method is expressed as:

$$x' = \frac{x - \mu}{\delta} \qquad (1)$$

In formula (1), x is an input vector, μ is the mean of the column in which x is located, δ is the standard deviation of that column, and x' is the Z-score-standardized value of x. The training sample data are thereby obtained.

The GBDT model is constructed from the obtained training sample set. The sample set is T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x is an input vector comprising the collected outdoor temperature, solar radiation, attendance, number of lighting devices and lighting device running time, and y is an output vector comprising the collected electricity consumption of the lighting devices in the building. Regression prediction is performed with a GBDT algorithm that uses the square loss and the absolute loss. The GBDT model iterates over multiple rounds; each round produces a weak learner (a regression tree) that is trained on the negative gradient of the previous round's learner, and the result output after the maximum number of iterations is the final building energy consumption predicted value.

First, the square loss function is set as:

$$L(y, f(x)) = \tfrac{1}{2}\,(y - f(x))^2 \qquad (2)$$

The absolute loss function is set as:

$$L(y, f(x)) = |y - f(x)| \qquad (3)$$

In formula (2) and formula (3), y is the actual building energy consumption value and f(x) is the predicted building energy consumption value.

The training sample set T is input and the GBDT model is initialized. The regression trees are indexed k = 1, 2, …, K and the samples i = 1, 2, …, N. Because the first tree is preceded by a 0-th tree, computing f(x) for the first tree requires an initial value f_0(x) chosen according to the loss function used; thereafter f(x) is the predicted value of the (k-1)-th tree. When f_0(x_i) is taken as the mean f_a0(x_i) of the actual building energy consumption values in sample T, the residual A_i at the i-th sample is calculated; when f_0(x_i) is taken as the median f_b0(x_i) of the actual building energy consumption values in sample T, the residual B_i at the i-th sample is calculated.
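The two candidate initial values and their residuals might be computed as in this sketch. It takes the residual to mean y_i minus f_0(x_i), which the text does not spell out and is therefore an assumption; the sample values are made up, with the last one acting as an outlier.

```python
import statistics

y = [10.0, 12.0, 11.0, 50.0]  # hypothetical actual values; 50.0 is an outlier

mean_init = statistics.mean(y)      # f_a0: initial value under the squared loss
median_init = statistics.median(y)  # f_b0: initial value under the absolute loss

# Residuals of each sample against the two initial values (assumed definition).
A = [v - mean_init for v in y]
B = [v - median_init for v in y]

print(mean_init, median_init)
print(A)
print(B)
```

The mean is pulled towards the outlier while the median stays with the bulk of the data, which is the usual reason the mean is paired with the squared loss and the median with the absolute loss.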

The negative gradient of the loss function L is calculated. The square loss function is very sensitive to outliers, which leads to excessive prediction errors at outliers, whereas the absolute loss function is more robust to outliers. Therefore, the negative gradient r_ki of the loss function L is calculated as follows:

1. When A_i < B_i, r_ki is calculated as:

$$r_{ki} = y_i - f_{k-1}(x_i) \qquad (4)$$

2. When A_i > B_i, r_ki is calculated as:

$$r_{ki} = \operatorname{sign}\bigl(y_i - f_{k-1}(x_i)\bigr) \qquad (5)$$

In formula (4) and formula (5), y_i is the actual building energy consumption value of the i-th sample in T, f_{k-1}(x_i) is the building energy consumption value predicted for the i-th sample by the (k-1)-th tree, and sign is the sign function.

Each feature in the sample set T is traversed. Taking a given feature value at the i-th sample as a candidate split node, the total loss of the actual building energy consumption values is calculated according to formula (2) and formula (3) for every possible split, and the feature value of the first split that attains the minimum total loss is taken as the split node.

A regression tree is fitted to the new training set {(x_i, r_ki)}, i = 1, 2, …, N, calculated from the negative gradient, giving the leaf node regions R_kj of the k-th tree, j = 1, 2, …, J. For each region, a line search estimates the constant value C_kj that minimizes the loss function L. Because the negative gradient calculation combines the square loss and the absolute loss, C_kj is calculated as follows to avoid an inaccurate value:

formula (6)

In formula (6), γ_kj is the mean of the pseudo-residuals in the j-th leaf node of the k-th tree, N is the number of samples in the j-th leaf node of the k-th tree, and min() denotes taking the minimum value.

The regression tree is updated to obtain the building energy consumption predicted values f_k(x) output by the k-th tree:

$$f_k(x) = f_{k-1}(x) + lr \cdot \sum_{j=1}^{J} C_{kj}\, I(x \in R_{kj}) \qquad (7)$$

In formula (7), f_{k-1}(x) is the building energy consumption predicted value of the (k-1)-th tree, J is the number of leaf nodes of the regression tree, C_kj is the constant value that minimizes the loss function in the j-th leaf node at iteration k, I(x ∈ R_kj) is the indicator function, and lr is the learning rate.

The original learning rate is 1, which usually causes the trained GBDT model to overfit and greatly reduces its prediction accuracy. Therefore, following the idea of shrinkage, the present invention sets the learning rate lr = 0.08.

After regression trees have been fitted for the maximum number of iterations K, the final GBDT prediction model F(x) is output as follows:

$$F(x) = f_K(x) = f_0(x) + lr \cdot \sum_{k=1}^{K} \sum_{j=1}^{J} C_{kj}\, I(x \in R_{kj}) \qquad (8)$$

The input vector of the day to be predicted in the corresponding season, comprising the outdoor temperature, solar radiation, attendance, number of lighting devices and lighting device running time corresponding to the training set of that season, is normalized by the normalization method described above and input into the corresponding final GBDT model. The model outputs prediction data, and the predicted building energy consumption for the day to be predicted is obtained after inverse normalization.
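The prediction step for one day might be sketched as follows. The training statistics, the two input features, and the `model` function (a stand-in for the trained seasonal GBDT model) are all hypothetical.

```python
def predict_energy(raw_features, feature_stats, target_stats, model):
    """Normalize inputs with training statistics, predict, then undo the target scaling."""
    scaled = [(v - mu) / sd for v, (mu, sd) in zip(raw_features, feature_stats)]
    y_scaled = model(scaled)
    mu_y, sd_y = target_stats
    return y_scaled * sd_y + mu_y  # inverse Z-score normalization

# Hypothetical (mean, standard deviation) pairs from the training set of the season.
feature_stats = [(25.0, 5.0), (120.0, 20.0)]  # outdoor temperature, attendance
target_stats = (300.0, 50.0)                  # lighting electricity use in kWh

# Hypothetical stand-in for the trained GBDT model of that season.
model = lambda z: 0.5 * z[0] + 0.25 * z[1]

predicted_kwh = predict_energy([30.0, 140.0], feature_stats, target_stats, model)
print(predicted_kwh)
```

The statistics must be those fitted on the training data of the same season; recomputing them from the single day to be predicted would make the scaling meaningless.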

Finally, the evaluation indices comprise the mean absolute error (MAE) and the root mean square error (RMSE). The RMSE and MAE are calculated from the actual and predicted building energy consumption values of the day to be predicted to evaluate the prediction accuracy of the model: the smaller the RMSE and MAE, the better the prediction effect of the model, and vice versa. To further verify the effect of the model, the same data set was also predicted with a multiple regression model, and the actual and predicted electricity consumption of the lighting devices in the building on June 30 was compared with the prediction error of the multiple regression model, giving Table 1:

Comparing the lighting energy consumption prediction accuracy of the two models shows that the error of the GBDT energy consumption prediction model is smaller than that of the multiple regression energy consumption prediction model; that is, the GBDT model predicts the daily energy consumption of the lighting equipment with a smaller error.

The embodiment of the invention predicts office building energy consumption based on the GBDT regression model; the prediction results are highly accurate and can effectively help improve the quality of energy-saving building construction. Specifically:

First, the feature selection method combines the Pearson correlation coefficient measure and the ETC feature influence degree measure to extract the main features, which reduces the dimensionality of the input vector, improves training efficiency and reduces overfitting;

Second, the Z-score normalization method eliminates the influence of differing dimensions (units) on the prediction result;

Third, the method combines the square loss and the absolute loss when training the model, which reduces the influence of outliers in the samples on training and prediction and improves prediction accuracy;

Fourth, training with a suitable learning rate allows the model to better learn the pattern of office building energy consumption and yields a more accurate energy consumption predicted value.

It will be appreciated that the office building energy consumption prediction method based on the GBDT regression model according to the embodiment of the invention can also be applied to energy consumption prediction for other buildings.

Embodiments of the present invention may be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims

1. An office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model, as shown in FIG. 1, characterized by comprising the following steps:

Step S1, acquiring historical energy consumption data of a building and the influence factor data of the historical energy consumption data;

Step S2, sorting the acquired data and dividing the data by season, and determining the main features by measuring the influence factors with the Pearson correlation coefficient and the ETC feature influence degree;

Step S3, normalizing the main feature data and the corresponding historical building energy consumption data to obtain training sample data;

Step S4, setting a square loss function, an absolute loss function and a regression tree learning rate, and constructing GBDT regression prediction models for the different seasons from the obtained training sample data;

Step S5, obtaining the data to be predicted for the season corresponding to the training samples, including the building energy consumption influence characteristic data of the day to be predicted, and obtaining a to-be-predicted set after normalization;

Step S6, inputting the to-be-predicted set into the GBDT model of the corresponding season, and processing the predicted output data by inverse normalization to obtain the building energy consumption predicted value of the day to be predicted;

Step S7, taking the root mean square error (RMSE) and the mean absolute error (MAE) as evaluation indices of the GBDT model and verifying the model effect, wherein smaller RMSE and MAE values represent a better prediction effect, and vice versa.

2. The method for predicting energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S1, the influence characteristic data of building energy consumption comprise not only the collected climate and environment data of the building and the building's own envelope structure, but also the occupancy of the building and the performance of the energy-using equipment in the building; and the historical energy consumption data of the building are the energy consumption data of the various equipment in the building, the energy consumption data of the refrigerating unit, and the like.

3. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S2, the raw data are divided by season; the correlation between each influence factor and the building energy consumption is analyzed through the Pearson correlation coefficient, and the main features are selected according to the degree of correlation; at the same time, the data of each season are input into an ETC model, which outputs a feature importance index; and finally the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the highest-scoring features are selected as the main influence features.

4. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 3, wherein in step S3, the preprocessing comprises dividing the data into input vectors and output vectors and normalizing them; input vectors are generated from the divided main feature data, and the corresponding historical building energy consumption data are used as output vectors; and the divided input and output vectors are normalized, namely the Z-score standardization method transforms each feature dimension of the original data to have a mean of 0 and a standard deviation of 1, generating dimensionless training sample sets for the different seasons and thereby eliminating large differences in prediction results caused by inconsistent dimensions (units).

5. The method for predicting energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S4, the training method comprises inputting the normalized training sample sets of the different seasons separately to construct the GBDT models, the sample set being T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x is a feature vector and y is a target vector; and carrying out regression prediction with a GBDT algorithm that uses the square loss, the absolute loss, and the regression tree learning rate.

6. The method for predicting energy consumption of an office building based on the GBDT regression model according to claim 5, wherein the square loss function and the absolute loss function are set as follows:

$$L(y, f(x)) = \tfrac{1}{2}\,(y - f(x))^2 \qquad (1)$$

$$L(y, f(x)) = |y - f(x)| \qquad (2)$$

7. The GBDT regression model based office building energy consumption prediction method of claim 5, wherein the original learning rate is 1, which usually causes the trained GBDT model to overfit; therefore, following the idea of shrinkage, the learning rate is set to lr = 0.08.

8. The GBDT regression model based office building energy consumption prediction method according to claim 1, wherein gradient boosting decision tree regression prediction is performed with the square loss function of formula (1), the absolute loss function of formula (2), and the learning rate of claim 7; the GBDT model generates weak learners (regression trees) over multiple iterations, each learner is trained on the negative gradient of the previous learner, and the final building energy consumption predicted value is the result output after the maximum number of training iterations, the specific steps being as follows:

(1) Initialize the GBDT model: set the regression tree index k = 1, 2, …, K and, for the sample positions i = 1, 2, …, N of the sample set T, compute the initial value $f_0(x)$, i.e. the output of the (k−1)-th tree for k = 1. When $f_0(x_i)$ is the mean $f_{a0}(x_i)$ of the actual building energy consumption values in the sample set T, compute the residual $A_i$; when $f_0(x_i)$ is the median $f_{b0}(x_i)$ of the actual building energy consumption values in the sample set T, compute the residual $B_i$;

(2) Compute the negative gradient $r_{ki}$ of the loss function L. Because the square loss function is very sensitive to outliers, which leads to excessive prediction error at outliers, whereas the absolute loss function is more robust to outliers, $r_{ki}$ is calculated as follows:

(a) When $A_i < B_i$, $r_{ki}$ is calculated by:

Formula (3)

(b) When $A_i > B_i$, $r_{ki}$ is calculated by:

Formula (4)

In formulas (3) and (4), $y_i$ is the actual building energy consumption value of the i-th sample in the sample set T, $f_{k-1}(x_i)$ is the building energy consumption value predicted by the (k−1)-th tree for the i-th sample, and sign is the sign function;

(3) Traverse each feature in the sample set T; using formula (1) and formula (2), take the value of a given feature at the i-th sample position as a candidate split node, calculate the total loss of the actual building energy consumption values under every possible split, and take the feature value corresponding to the split with the smallest total loss as the split node;

(4) Using the negative gradients $r_{ki}$ obtained in step (2) together with step (3), fit a regression tree to obtain the leaf node regions $R_{kj}$, j = 1, 2, …, J, of the k-th tree. A constant value $C_{kj}$ that minimizes the loss function L is estimated by line search; since the calculation of the negative gradient combines the square loss and the absolute loss, the following formula is used to avoid an inaccurate value of $C_{kj}$:

Formula (5)

In formula (5), $\gamma_{kj}$ is the mean of the pseudo-residuals in the j-th leaf node of the k-th tree, N is the number of samples in the j-th leaf node of the k-th tree, and min() denotes taking the minimum value over those samples;

(5) Update the regression tree to obtain the building energy consumption predicted values $f_k(x)$ output by the k-th tree for the i samples:

Formula (6)

In formula (6), $f_{k-1}(x)$ is the building energy consumption predicted value of the (k−1)-th tree, J is the number of leaf nodes of the regression tree, $C_{kj}$ is the constant value that minimizes the loss function in the j-th leaf node at iteration k, $I(x \in R_{kj})$ is the indicator function, and lr is the learning rate. Because the original learning rate of 1 usually causes the trained GBDT model to yield identical values and thus to overfit, which greatly reduces the prediction accuracy of the model, the learning rate is set to lr = 0.08 following the idea of shrinkage;

(6) After K regression trees have been fitted, K being the maximum number of iterations, the final GBDT prediction model F(x) is output as

Formula (7).
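Steps (1) to (6) follow the general shape of gradient boosting with regression trees, which can be sketched as below. This is a simplified illustration under stated assumptions, not the claimed procedure: it uses depth-1 trees (stumps), a single loss chosen per model rather than the per-sample selection between formulas (3) and (4), and the textbook leaf values (mean of residuals for the square loss, median of residuals for the absolute loss) in place of formula (5). The function names, `n_trees`, and the synthetic data are hypothetical; only lr = 0.08 is taken from the claims.

```python
import numpy as np

def fit_stump(X, r):
    """Find the (feature, threshold) split that minimizes squared error on targets r."""
    best = (np.inf, 0, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # drop the max so both sides are non-empty
            left = X[:, j] <= t
            sse = ((r[left] - r[left].mean()) ** 2).sum()
            sse += ((r[~left] - r[~left].mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t)
    return best[1], best[2]

def leaf_value(y, f, mask, loss):
    """Constant that minimizes the chosen loss over the samples in one leaf."""
    if not mask.any():
        return 0.0
    res = y[mask] - f[mask]
    return float(res.mean()) if loss == "square" else float(np.median(res))

def fit_gbdt(X, y, n_trees=150, lr=0.08, loss="square"):
    """Boost stumps on the negative gradient of the chosen loss."""
    f0 = float(y.mean()) if loss == "square" else float(np.median(y))
    f = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # Negative gradient: residual for square loss, its sign for absolute loss.
        r = y - f if loss == "square" else np.sign(y - f)
        j, t = fit_stump(X, r)
        left = X[:, j] <= t
        c_left = leaf_value(y, f, left, loss)
        c_right = leaf_value(y, f, ~left, loss)
        f = f + lr * np.where(left, c_left, c_right)  # shrunken update
        trees.append((j, t, c_left, c_right))
    return f0, lr, trees

def predict_gbdt(model, X):
    """Sum the initial value and the shrunken contributions of all trees."""
    f0, lr, trees = model
    f = np.full(X.shape[0], f0)
    for j, t, c_left, c_right in trees:
        f = f + lr * np.where(X[:, j] <= t, c_left, c_right)
    return f

# Synthetic normalized data (assumed): a step in feature 0 plus a slope in feature 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(120, 2))
y = np.where(X[:, 0] > 0, 2.0, -1.0) + 0.5 * X[:, 1]

model = fit_gbdt(X, y, n_trees=150, lr=0.08, loss="square")
pred = predict_gbdt(model, X)
baseline_rmse = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
model_rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print(baseline_rmse, model_rmse)
```

With a small learning rate each tree contributes only a fraction of its fitted correction, so more trees are needed, but no single tree can dominate the ensemble; this is the usual motivation for shrinkage.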

9. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S5 the input vector of the day to be predicted in the corresponding season is normalized by said normalization method and input into the corresponding GBDT regression prediction model, and the output prediction data are inverse-normalized to obtain the predicted building energy consumption value for the day to be predicted.
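The prediction step of this claim can be sketched as a normalize, predict, and inverse-normalize sequence. The sketch assumes the seasonal means and standard deviations were stored at training time; `toy_model` is a hypothetical stand-in for a trained seasonal GBDT model, and all numeric values are illustrative.

```python
import numpy as np

def predict_energy(model, x_new, x_mean, x_std, y_mean, y_std):
    """Normalize inputs with training statistics, predict, then undo the target scaling."""
    x_norm = (np.asarray(x_new, dtype=float) - x_mean) / x_std
    y_norm = model(x_norm)          # prediction in normalized units
    return y_norm * y_std + y_mean  # inverse Z-score: back to energy units

def toy_model(x_norm):
    """Hypothetical stand-in for a trained seasonal GBDT model."""
    return 0.8 * x_norm[:, 0] + 0.2 * x_norm[:, 1]

# Training-time statistics for one season (assumed values).
x_mean = np.array([31.0, 0.65])
x_std = np.array([2.0, 0.10])
y_mean, y_std = 1300.0, 150.0

x_day = np.array([[33.0, 0.75]])  # features of the day to be predicted
pred = predict_energy(toy_model, x_day, x_mean, x_std, y_mean, y_std)
print(pred)
```

The inverse transform must use the target statistics of the same season as the model, otherwise the predicted value is returned on the wrong scale.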

10. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 7, wherein in step S7 the evaluation indices are the mean absolute error (MAE) and the root mean square error (RMSE); the RMSE and the MAE are calculated from the actual building energy consumption values and the predicted building energy consumption values of the days to be predicted in order to evaluate the prediction accuracy of the model, wherein the smaller the RMSE and the MAE, the better the prediction performance of the model, and the larger they are, the worse.
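The two evaluation indices can be computed as in the following sketch. The sample values are illustrative assumptions, not measured data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

# Hypothetical actual and predicted daily energy consumption values.
y_true = [1200.0, 1350.0, 1100.0, 1500.0]
y_pred = [1180.0, 1400.0, 1120.0, 1450.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```

Because the RMSE squares each error before averaging, it penalizes large individual misses more heavily than the MAE, so reporting both gives a fuller picture of the prediction error.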