CN116796141A - GBDT regression model-based office building energy consumption prediction method - Google Patents

GBDT regression model-based office building energy consumption prediction method

Info

Publication number
CN116796141A
Authority
CN
China
Prior art keywords
energy consumption
data
building
gbdt
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210246202.9A
Other languages
Chinese (zh)
Inventor
郑清涛
李进
王栋伟
张玲
骆丽仪
孙金礼
熊湜
吴咏昆
陈丝绸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shuifa Xingye Energy Zhuhai Co ltd
Zhuhai China Construction Xingye Green Building Design Institute Co ltd
Zhuhai Xingye Energy Saving Science And Technology Co ltd
Zhuhai Singyes Green Building Technology Co Ltd
Shuifa Energy Group Co Ltd
Original Assignee
Shuifa Xingye Energy Zhuhai Co ltd
Zhuhai China Construction Xingye Green Building Design Institute Co ltd
Zhuhai Xingye Energy Saving Science And Technology Co ltd
Zhuhai Singyes Green Building Technology Co Ltd
Shuifa Energy Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shuifa Xingye Energy Zhuhai Co ltd, Zhuhai China Construction Xingye Green Building Design Institute Co ltd, Zhuhai Xingye Energy Saving Science And Technology Co ltd, Zhuhai Singyes Green Building Technology Co Ltd, Shuifa Energy Group Co Ltd
Priority to CN202210246202.9A
Publication of CN116796141A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction

Abstract

The invention belongs to the technical field of building energy consumption and discloses an office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model. The method comprises: obtaining historical energy consumption data of a building and the corresponding influence feature data, sorting the data, and dividing it by season; selecting features according to the Pearson correlation coefficient and the ETC feature importance; normalizing the data for each season; setting a square loss function, an absolute loss function and a regression tree learning rate, and training a GBDT model; inputting the data to be predicted into the GBDT model of the corresponding season and inverse-normalizing the predicted output to obtain the predicted building energy consumption for the day to be predicted; and finally verifying the GBDT model with the root mean square error (RMSE) and the mean absolute error (MAE). The GBDT prediction flow and method eliminate the influence of differing units of measurement on the prediction result, reduce the dimensionality of the input vector, speed up model training, reduce the influence of outliers in the samples on training and prediction, achieve higher prediction accuracy, and have practical significance for energy-saving building construction.

Description

GBDT regression model-based office building energy consumption prediction method
Technical Field
The invention relates to the technical field of building energy consumption, and in particular to an office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model.
Background
With social and economic development, urbanization is accelerating and building energy consumption accounts for a growing share of national energy consumption, so society as a whole has to pay attention to energy-saving retrofits of buildings. Accurate energy consumption prediction is a key concern for energy saving in the building field. Meanwhile, the country is vigorously promoting the construction of near-zero-energy and ultra-low-energy buildings. High-precision energy consumption prediction can provide a decision basis for balancing a building's energy consumption and energy supply and for the operating scheme of its energy storage system, thereby improving the building's operating efficiency and promoting the healthy, sustained development of near-zero-energy and ultra-low-energy buildings in China.
Building energy consumption prediction is affected by many factors, such as ambient temperature and meteorological conditions, which reduce modeling accuracy. In addition, because of various special circumstances, the energy consumption data inevitably contain abnormal values, which increase the error. These problems make it difficult to build a high-precision prediction model.
Disclosure of Invention
The invention aims to provide an office building energy consumption prediction method based on a GBDT regression model with the following effects: it eliminates the influence of differing units of measurement on the prediction result, reduces the dimensionality of the input vector, speeds up model training, reduces the influence of outliers in the samples on training and prediction, and achieves higher prediction accuracy.
An office building energy consumption prediction method based on a GBDT regression model, as shown in FIG. 1, comprises the following steps. Historical energy consumption data of the building and the corresponding influence feature data are acquired. The influence feature data include not only the climate and environment data of the building and the building's own envelope structure, but also, in particular, the occupancy of the building and the performance of the energy-using equipment in it. The historical energy consumption data are the energy consumption data of the building's various equipment, the energy consumption data of the refrigerating unit, and so on.
The acquired data are sorted and divided by season. The correlation between each influence factor and the building energy consumption is analyzed with the Pearson correlation coefficient, and main features are selected according to the degree of correlation. At the same time, the data for each season are input into an ETC model, which outputs a feature importance index. Finally, the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the features with the highest combined scores are selected as the main influence features.
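The weighted-average selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `pearson` and `select_features`, the weight of 0.5, the use of the absolute value of the Pearson coefficient, and the toy data are all hypothetical, and the ETC importance scores are treated as an input that an already fitted tree-based model would supply.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def select_features(columns, target, importance, weight=0.5, top_k=2):
    """Rank features by a weighted average of |Pearson r| and importance.

    columns:    dict mapping feature name -> list of values
    importance: dict mapping feature name -> importance score in [0, 1]
                (assumed to come from an already fitted ETC model)
    weight:     hypothetical weight on the Pearson term (not in the source)
    """
    scores = {
        name: weight * abs(pearson(vals, target))
        + (1 - weight) * importance[name]
        for name, vals in columns.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Toy data, invented for illustration only.
columns = {
    "outdoor_temp": [20, 24, 28, 31, 35],
    "attendance": [50, 52, 49, 51, 50],
    "run_hours": [8, 9, 10, 11, 12],
}
energy = [100, 118, 141, 160, 179]
importance = {"outdoor_temp": 0.5, "attendance": 0.1, "run_hours": 0.4}

selected, scores = select_features(columns, energy, importance)
print(selected)
```

With this toy data the weakly correlated `attendance` column is dropped. In practice the weight and the number of retained features would have to be tuned per season.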
The main feature data and the corresponding historical building energy consumption data are normalized to obtain the training sample data. The main implementation steps are as follows:
Input vectors are generated from the divided main feature data, and the corresponding historical building energy consumption data are used as output vectors.
The divided input vectors and output vectors are normalized with the Z-score standardization method, which rescales each feature of the original data to a mean of 0 and a standard deviation of 1. This produces dimensionless training sample sets for the different seasons and removes the large differences in prediction results caused by inconsistent units. The Z-score standardization is expressed as:
$$x' = \frac{x - \mu}{\delta}$$
where $x$ is an input vector, $\mu$ is the mean of the column in which $x$ lies, $\delta$ is the standard deviation of that column, and $x'$ is the Z-score-normalized value of $x$.
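A minimal sketch of the Z-score step and its inverse, using only the Python standard library. The helper names are illustrative, and the use of the population standard deviation is an assumption, since the source does not say whether the population or the sample form is meant.

```python
import statistics

def zscore_fit(values):
    """Return (mu, delta) for one feature column."""
    mu = statistics.fmean(values)
    delta = statistics.pstdev(values)  # population form: an assumption
    if delta == 0:
        raise ValueError("constant column cannot be Z-score normalized")
    return mu, delta

def zscore_apply(values, mu, delta):
    """x' = (x - mu) / delta for every value in the column."""
    return [(v - mu) / delta for v in values]

def zscore_invert(scaled, mu, delta):
    """Inverse normalization: x = x' * delta + mu."""
    return [s * delta + mu for s in scaled]

# Toy outdoor temperatures, invented for illustration.
temps = [22.0, 25.0, 28.0, 31.0, 34.0]
mu, delta = zscore_fit(temps)
scaled = zscore_apply(temps, mu, delta)
restored = zscore_invert(scaled, mu, delta)
```

The same `mu` and `delta` fitted on the training data would be reused for the day to be predicted, so that training and prediction inputs share one scale.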
The normalized training sample sets of the different seasons are each used to construct a GBDT regression prediction model. The sample set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x$ is an input vector and $y$ is an output vector. Regression prediction is performed with a GBDT algorithm that uses the square loss and the absolute loss together with a regression tree learning rate. As shown in FIG. 2, the GBDT model is built over multiple rounds of iteration. Each round produces a weak learner (a regression tree) that is trained on the negative gradient of the previous round's learner, and after the maximum number of iterations the model outputs the final predicted building energy consumption.
Preferably, the square loss function is set as:
formula (1)
Preferably, the absolute loss function is set as:
formula (2)
In the formula (1) and the formula (2), y is an actual building energy consumption value, and f (x) is a predicted building energy consumption value.
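The formula images for (1) and (2) did not survive extraction. For orientation only, the conventional forms of the two losses and their negative gradients with respect to the prediction are given below. The factor of 1/2 in the square loss is an assumption (without it the negative gradient is scaled by 2), and the subscripted names $L_{\text{sq}}$ and $L_{\text{abs}}$ are introduced here for illustration.

```latex
L_{\text{sq}}\bigl(y, f(x)\bigr) = \tfrac{1}{2}\,\bigl(y - f(x)\bigr)^{2},
\qquad
-\frac{\partial L_{\text{sq}}}{\partial f(x)} = y - f(x)

L_{\text{abs}}\bigl(y, f(x)\bigr) = \bigl|\,y - f(x)\,\bigr|,
\qquad
-\frac{\partial L_{\text{abs}}}{\partial f(x)} = \operatorname{sign}\bigl(y - f(x)\bigr)
```

The square loss grows quadratically with the residual, while the absolute loss grows only linearly.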
Further, the training sample set $T$ is input and the GBDT regression prediction model is initialized. Let the regression trees be indexed by $k = 1, 2, \ldots, K$ and the samples by $i = 1, 2, \ldots, N$. The initial value $f_0(x)$, that is, the prediction of the $(k-1)$-th tree for $k = 1$, is calculated. When $f_0(x_i)$ is taken as the mean $f_{a0}(x_i)$ of the actual building energy consumption values in sample set $T$, the residual $A_i$ of the $i$-th sample is calculated; when $f_0(x_i)$ is taken as the median $f_{b0}(x_i)$ of the actual building energy consumption values in sample set $T$, the residual $B_i$ of the $i$-th sample is calculated.
Further, the square loss function is very sensitive to outliers, which leads to excessive prediction errors at outliers, whereas the absolute loss function is more robust to outliers. The negative gradient calculation is therefore optimized as follows.
Preferably, the negative gradient $r_{ki}$ of the loss function $L$ is calculated as follows:
1. When $A_i < B_i$, $r_{ki}$ is calculated as:
formula (3)
2. When $A_i > B_i$, $r_{ki}$ is calculated as:
formula (4)
In formula (3) and formula (4), $y_i$ is the actual building energy consumption value of the $i$-th sample in $T$, $f_{k-1}(x_i)$ is the building energy consumption value predicted by the $(k-1)$-th tree for the $i$-th sample, and sign is the sign function.
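The case distinction above can be sketched in code. Because the images for formulas (3) and (4) are missing, this sketch rests on several labeled assumptions: that $A_i$ and $B_i$ are absolute residuals against the mean and the median, that formula (3) is the square-loss gradient $y_i - f_{k-1}(x_i)$, that formula (4) is the absolute-loss gradient $\operatorname{sign}(y_i - f_{k-1}(x_i))$, and that ties ($A_i = B_i$, which the source does not cover) fall to the absolute-loss branch. All function names are hypothetical.

```python
import statistics

def sign(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)

def initial_residuals(y):
    """Absolute residuals against the mean (A_i) and the median (B_i)."""
    mean0 = statistics.fmean(y)
    median0 = statistics.median(y)
    a = [abs(v - mean0) for v in y]
    b = [abs(v - median0) for v in y]
    return a, b

def negative_gradient(y, pred, a, b):
    """Per-sample pseudo-residual r_ki chosen by comparing A_i and B_i."""
    r = []
    for yi, fi, ai, bi in zip(y, pred, a, b):
        diff = yi - fi
        if ai < bi:
            r.append(diff)        # assumed formula (3): square-loss branch
        else:
            r.append(sign(diff))  # assumed formula (4); ties land here too
    return r

# Toy daily energy values, invented for illustration.
y = [10.0, 12.0, 11.0, 13.0, 60.0]
a, b = initial_residuals(y)
pred = [12.0] * len(y)  # previous-round prediction, here the median
r = negative_gradient(y, pred, a, b)
print(r)
```

Whether $A_i$ and $B_i$ are computed once from the initial values or refreshed every round is not stated in the source; the sketch computes them once.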
Each feature in the sample set $T$ is traversed. Using formula (1) and formula (2), each value of a feature at the $i$-th sample is tried as a split node, the total loss of the actual building energy consumption values is calculated for every possible split, and the feature value of the first split that attains the minimum total loss is taken as the split node.
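An exhaustive split search of this kind can be sketched as below. For brevity the sketch scores splits with the square loss only, and it places thresholds at midpoints between adjacent distinct feature values; both choices are assumptions, as are the names `region_loss` and `best_split`.

```python
def region_loss(targets):
    """Total square loss when a region predicts the mean of its targets."""
    if not targets:
        return 0.0
    m = sum(targets) / len(targets)
    return sum((t - m) ** 2 for t in targets)

def best_split(rows, targets):
    """Try every feature and threshold; return (feature, threshold, loss).

    rows:    list of feature lists, one per sample
    targets: list of target values, one per sample
    The strict '<' keeps the first split that attains the minimum.
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint threshold: an assumption
            left = [t for row, t in zip(rows, targets) if row[j] <= thr]
            right = [t for row, t in zip(rows, targets) if row[j] > thr]
            loss = region_loss(left) + region_loss(right)
            if best is None or loss < best[2]:
                best = (j, thr, loss)
    return best

# Toy samples with two features, invented for illustration.
rows = [[1, 5], [2, 3], [3, 8], [4, 1]]
targets = [1.0, 1.2, 3.0, 3.1]
feature, threshold, loss = best_split(rows, targets)
print(feature, threshold)
```

Swapping `region_loss` for an absolute-loss variant (deviation from the region median) would give the corresponding search under formula (2).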
Preferably, a regression tree is fitted to the new training set $\{(x_i, r_{ki})\}_{i=1}^{N}$ calculated from the negative gradient, which gives the leaf node regions $R_{kj}$ of the $k$-th tree. For $j = 1, 2, \ldots, J$, a line search is used to estimate the constant $C_{kj}$ that minimizes the loss function $L$. Because the negative gradient combines the square loss and the absolute loss, $C_{kj}$ is calculated as follows to keep its value from being inaccurate:
formula (5)
In formula (5), $\gamma_{kj}$ is the mean of the pseudo-residuals in the $j$-th leaf node of the $k$-th tree, $N$ is the number of samples in the $j$-th leaf node of the $k$-th tree, and min() takes the minimum of its arguments.
Further, the regression tree is updated to obtain the building energy consumption predictions $f_k(x)$ output by the $k$-th tree:
formula (6)
In formula (6), $f_{k-1}(x)$ is the building energy consumption value predicted by the $(k-1)$-th tree, $J$ is the number of leaf nodes of the regression tree, $C_{kj}$ is the constant in the $j$-th leaf node at iteration $k$ that minimizes the loss function, $I(x \in R_{kj})$ is the indicator function, and lr is the learning rate.
Preferably, with the original learning rate of 1 the trained GBDT model usually overfits, which greatly reduces its prediction accuracy. Therefore, following the idea of shrinkage, the invention sets the learning rate lr = 0.08.
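To show where the learning rate enters, here is a minimal boosting loop on a single feature with depth-1 regression trees. It is a simplified sketch, not the patented model: it uses plain square-loss residuals rather than the hybrid negative gradient, and the names `fit_stump`, `fit_gbdt`, `predict` and `mse` as well as the toy data are hypothetical. Only the value lr = 0.08 is taken from the text.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 tree; returns (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(xs))  # needs at least two distinct values
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        loss = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or loss < best[0]:
            best = (loss, thr, lv, rv)
    return best[1:]

def fit_gbdt(xs, ys, n_trees, lr=0.08):
    """Boost stumps on the residuals, shrinking each tree's output by lr."""
    f0 = sum(ys) / len(ys)  # initial value: mean of the targets
    preds = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        trees.append((thr, lv, rv))
        preds = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return {"f0": f0, "lr": lr, "trees": trees}

def predict(model, x):
    """f_0 plus the shrunken contribution of every tree."""
    total = sum(lv if x <= thr else rv for thr, lv, rv in model["trees"])
    return model["f0"] + model["lr"] * total

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Toy data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 2.1, 2.2, 4.0, 4.1, 4.2, 6.0, 6.1]
baseline = mse(fit_gbdt(xs, ys, n_trees=0), xs, ys)
mse_few = mse(fit_gbdt(xs, ys, n_trees=10), xs, ys)
mse_many = mse(fit_gbdt(xs, ys, n_trees=200), xs, ys)
```

A small learning rate means each tree corrects only a fraction of the remaining error, so more trees are needed but no single tree dominates the fit.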
Preferably, after the regression tree is fitted for the maximum iteration number K, the final GBDT prediction model F (x) is output as follows:
formula (7)
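The images for the update rule and the final model did not survive extraction. Based on the symbol definitions given in the text ($f_{k-1}$, $J$, $C_{kj}$, $I(x \in R_{kj})$, lr), a plausible reconstruction of the usual shrinkage form is shown below; the exact layout in the original formulas is an assumption.

```latex
f_k(x) = f_{k-1}(x) + lr \sum_{j=1}^{J} C_{kj}\, I\bigl(x \in R_{kj}\bigr)

F(x) = f_K(x) = f_0(x) + lr \sum_{k=1}^{K} \sum_{j=1}^{J} C_{kj}\, I\bigl(x \in R_{kj}\bigr)
```

Unrolling the first line from $k = 1$ to $K$ gives the second, so the final model is the initial value plus the shrunken outputs of all $K$ trees.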
Preferably, the input vector of the day to be predicted in the corresponding season is normalized with the normalization method above and input into the corresponding final GBDT model, which outputs the prediction data; the predicted building energy consumption for the day to be predicted is then obtained by inverse normalization.
Preferably, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE). The RMSE and the MAE are calculated from the actual and predicted building energy consumption values of the day to be predicted and are used to evaluate the prediction accuracy of the model: the smaller the RMSE and the MAE, the better the model predicts, and the larger they are, the worse.
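Both evaluation indexes follow their standard definitions and can be computed as in this short sketch. The sample values are invented for illustration.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Toy actual and predicted energy values, invented for illustration.
actual = [100.0, 120.0, 140.0]
predicted = [110.0, 120.0, 130.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few errors are much larger than the rest, which makes the pair a useful check for outlier sensitivity.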
The feature selection method provided by the invention extracts the main features by combining the Pearson correlation coefficient with the ETC feature importance, which reduces the dimensionality of the input vector, improves training efficiency and reduces overfitting. The normalization method eliminates the influence of differing units of measurement on the prediction result. Most importantly, the office building energy consumption prediction method based on the GBDT regression model combines the square loss with the absolute loss, which reduces the influence of outliers in the samples on training and prediction and improves prediction accuracy. Training with a learning rate further improves the accuracy of the model, so that more accurate energy consumption predictions are obtained and the quality of energy-saving building construction is effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art. In the accompanying drawings:
FIG. 1 is a schematic diagram of a building energy consumption prediction process according to the present invention;
FIG. 2 is a flow chart of GBDT model construction.
Detailed Description
The present invention is described in detail below with reference to a specific embodiment; the flow chart is shown in FIG. 1.
Data collection. Lighting equipment, the air conditioning of the refrigeration machine room, and office equipment account for the largest shares of office building energy consumption, so this embodiment takes the prediction of lighting equipment energy consumption as an example. The influence factor data and the historical building energy consumption data were collected from an office building. The original input vector x includes outdoor temperature, sunlight level, solar radiation, building area, orientation, attendance, the number of lighting devices, the running time of the lighting devices, and so on; the output vector y is the electricity consumption of the lighting devices in the building.
Data partitioning and feature selection. The factors that influence building energy consumption are complex. They mainly include local solar radiation, precipitation, wind speed and sunshine duration; the position and orientation of the building and the thermal performance of its envelope; and the building's functional type, users, and types of energy-using equipment. During model training, too many features greatly slow down training, so the main features need to be selected to improve training efficiency.
The acquired data are divided by season. The correlation between each influence factor and the building energy consumption is analyzed with the Pearson correlation coefficient, and main features are selected according to the degree of correlation. At the same time, the data for each season are input into an ETC model, which outputs a feature importance index; the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the features with the highest combined scores are selected as the main influence features. The features finally determined are outdoor temperature, solar radiation, attendance, number of lighting devices, and lighting device running time.
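Dividing the records by season can be sketched as below. The source does not define the season boundaries, so the month-to-season mapping used here (northern-hemisphere meteorological seasons) is an assumption, as are the record layout and the name `split_by_season`.

```python
from collections import defaultdict
from datetime import date

# Assumed mapping: meteorological seasons, northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def split_by_season(records):
    """Group (day, features, energy) records into one list per season."""
    groups = defaultdict(list)
    for day, features, energy in records:
        groups[SEASON_BY_MONTH[day.month]].append((features, energy))
    return dict(groups)

# Toy records, invented for illustration: (date, [outdoor_temp], energy).
records = [
    (date(2021, 1, 15), [5.0], 120.0),
    (date(2021, 7, 1), [33.0], 210.0),
    (date(2021, 7, 2), [34.0], 215.0),
]
groups = split_by_season(records)
print(sorted(groups))
```

Each seasonal group would then go through feature selection, normalization and model training separately, giving one model per season.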
The input vectors of the main feature data and the corresponding output vectors are normalized so that all quantities are on a common scale, which facilitates unified modeling and calculation. Specifically, the Z-score standardization method rescales each feature of the original data to a mean of 0 and a standard deviation of 1 and produces a dimensionless training sample set, which removes the large differences in prediction results caused by inconsistent units. The Z-score standardization is expressed as
$$x' = \frac{x - \mu}{\delta}$$
Formula (1)
In formula (1), $x$ is an input vector, $\mu$ is the mean of the column in which $x$ lies, $\delta$ is the standard deviation of that column, and $x'$ is the Z-score-normalized value of $x$. This yields the training sample data.
The GBDT model is constructed from the resulting training sample set. The sample set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x$ is an input vector comprising the collected outdoor temperature, solar radiation, attendance, number of lighting devices and lighting device running time, and $y$ is an output vector comprising the collected electricity consumption of the lighting devices in the building. Regression prediction is performed with a GBDT algorithm that uses the square loss and the absolute loss. The GBDT model is built over multiple rounds of iteration. Each round produces a weak learner (a regression tree) that is trained on the negative gradient of the previous round's learner, and after the maximum number of iterations the model outputs the final predicted building energy consumption.
First, the square loss function is set as:
formula (2)
Setting the absolute loss function as
Formula (3)
In the formula (2) and the formula (3), y is an actual building energy consumption value, and f (x) is a predicted building energy consumption value.
The training sample set $T$ is input and the GBDT model is initialized. Let the regression trees be indexed by $k = 1, 2, \ldots, K$ and the samples by $i = 1, 2, \ldots, N$. Because the first tree is preceded by a 0-th tree, computing $f(x)$ for the first tree requires an initial value $f_0(x)$ chosen according to the loss function used; in general, $f_{k-1}(x)$ is the prediction of the $(k-1)$-th tree. When $f_0(x_i)$ is taken as the mean $f_{a0}(x_i)$ of the actual building energy consumption values in sample set $T$, the residual $A_i$ of the $i$-th sample is calculated; when $f_0(x_i)$ is taken as the median $f_{b0}(x_i)$ of the actual building energy consumption values in sample set $T$, the residual $B_i$ of the $i$-th sample is calculated.
The negative gradient of the loss function $L$ is calculated. The square loss function is very sensitive to outliers, which leads to excessive prediction errors at outliers, whereas the absolute loss function is more robust to outliers. The negative gradient $r_{ki}$ of the loss function $L$ is therefore calculated as follows:
1. When $A_i < B_i$, $r_{ki}$ is calculated as:
formula (4)
2. When $A_i > B_i$, $r_{ki}$ is calculated as:
formula (5)
In formula (4) and formula (5), $y_i$ is the actual building energy consumption value of the $i$-th sample in $T$, $f_{k-1}(x_i)$ is the building energy consumption value predicted by the $(k-1)$-th tree for the $i$-th sample, and sign is the sign function.
Each feature in the sample set $T$ is traversed. Using formula (2) and formula (3), each value of a feature at the $i$-th sample is tried as a split node, the total loss of the actual building energy consumption values is calculated for every possible split, and the feature value of the first split that attains the minimum total loss is taken as the split node.
A regression tree is fitted to the new training set $\{(x_i, r_{ki})\}_{i=1}^{N}$ calculated from the negative gradient, which gives the leaf node regions $R_{kj}$ of the $k$-th tree. For $j = 1, 2, \ldots, J$, a line search is used to estimate the constant $C_{kj}$ that minimizes the loss function $L$. Because the negative gradient combines the square loss and the absolute loss, $C_{kj}$ is calculated as follows to keep its value from being inaccurate:
formula (6)
In formula (6), $\gamma_{kj}$ is the mean of the pseudo-residuals in the $j$-th leaf node of the $k$-th tree, $N$ is the number of samples in the $j$-th leaf node of the $k$-th tree, and min() takes the minimum of its arguments.
The regression tree is updated to obtain the building energy consumption predictions $f_k(x)$ output by the $k$-th tree:
Formula (7)
In formula (7), $f_{k-1}(x)$ is the building energy consumption value predicted by the $(k-1)$-th tree, $J$ is the number of leaf nodes of the regression tree, $C_{kj}$ is the constant in the $j$-th leaf node at iteration $k$ that minimizes the loss function, $I(x \in R_{kj})$ is the indicator function, and lr is the learning rate.
With the original learning rate of 1 the trained GBDT model usually overfits, which greatly reduces its prediction accuracy. Therefore, following the idea of shrinkage, the invention sets the learning rate lr = 0.08.
After the regression tree is fitted for K times with the maximum iteration number, a final GBDT prediction model F (x) is output as follows:
formula (8)
The input vector of the day to be predicted in the corresponding season is normalized with the normalization method above; it comprises the outdoor temperature, solar radiation, attendance, number of lighting devices and lighting device running time data corresponding to the training set of that season. It is then input into the corresponding final GBDT model, which outputs the prediction data, and the predicted building energy consumption for the day to be predicted is obtained by inverse normalization.
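The normalize, predict, inverse-normalize sequence can be sketched as follows. The trained model is represented by an arbitrary callable; the stand-in lambda, the statistics and the function name `predict_energy` are all hypothetical and serve only to make the data flow concrete.

```python
def predict_energy(model, features, x_stats, y_stats):
    """Normalize the inputs, run the model, and undo the target scaling.

    model:    callable taking a list of normalized features and returning
              a normalized prediction (stand-in for a trained GBDT model)
    x_stats:  list of (mu, delta) pairs fitted on the training features
    y_stats:  (mu, delta) pair fitted on the training targets
    """
    scaled = [(v - mu) / delta for v, (mu, delta) in zip(features, x_stats)]
    y_scaled = model(scaled)
    y_mu, y_delta = y_stats
    return y_scaled * y_delta + y_mu  # inverse Z-score normalization

# Hypothetical stand-in for the seasonal GBDT model.
toy_model = lambda z: 0.5 * z[0] + 0.5 * z[1]

# Hypothetical training statistics for two features and the target.
x_stats = [(28.0, 4.0), (10.0, 2.0)]
y_stats = (150.0, 30.0)

energy = predict_energy(toy_model, [32.0, 12.0], x_stats, y_stats)
print(energy)
```

The statistics must be the ones fitted on the training set of the same season; refitting them on the day to be predicted would put the inputs on a different scale from the one the model was trained on.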
Finally, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE). The RMSE and the MAE are calculated from the actual and predicted building energy consumption values of the day to be predicted and are used to evaluate the prediction accuracy of the model: the smaller the RMSE and the MAE, the better the model predicts, and the larger they are, the worse. To further verify the model, the same dataset was also predicted with a multiple regression model. The actual and predicted electricity consumption of the lighting devices in the building on June 30 were collected and the prediction errors of the two models were compared, giving Table 1:
Comparing the prediction accuracy of the two models for lighting equipment energy consumption shows that the error of the GBDT energy consumption prediction model is smaller than that of the multiple regression energy consumption prediction model, so the GBDT model predicts the daily energy consumption of the lighting equipment with smaller error.
The embodiment of the invention predicts office building energy consumption with a GBDT regression model. The prediction results are more accurate and can effectively help improve the quality of energy-saving building construction. Specifically:
First, the feature selection method combines the Pearson correlation coefficient with the ETC feature importance to extract the main features, which reduces the dimensionality of the input vector, improves training efficiency and reduces overfitting;
Second, the Z-score normalization method eliminates the influence of differing units of measurement on the prediction result;
Third, the office building energy consumption prediction method based on the GBDT regression model combines the square loss with the absolute loss during training, which reduces the influence of outliers in the samples on training and prediction and improves prediction accuracy;
Fourth, training the model with a suitable learning rate lets it learn the patterns of office building energy consumption better and yields more accurate energy consumption predictions.
It can be appreciated that the office building energy consumption prediction method based on the GBDT regression model according to the embodiment of the invention can also be applied to energy consumption prediction for other buildings.
Embodiments of the present invention may be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (10)

1. An office building energy consumption prediction method based on a gradient boosting decision tree (GBDT) regression model, as shown in FIG. 1, characterized by comprising the following steps:
step S1, acquiring historical energy consumption data of a building and the influence factor data of that energy consumption;
step S2, sorting the acquired data and dividing it by season, and determining the main features by measuring the influence factors with the Pearson correlation coefficient and the ETC feature importance;
step S3, normalizing the main feature data and the corresponding historical building energy consumption data to obtain training sample data;
step S4, setting a square loss function, an absolute loss function and a regression tree learning rate, and constructing GBDT regression prediction models for the different seasons from the obtained training sample data;
step S5, obtaining the data to be predicted for the season corresponding to the training samples, including the building energy consumption influence feature data of the day to be predicted, and normalizing it to obtain the set to be predicted;
step S6, inputting the set to be predicted into the GBDT model of the corresponding season, and processing the predicted output data by inverse normalization to obtain the predicted building energy consumption for the day to be predicted;
step S7, taking the root mean square error (RMSE) and the mean absolute error (MAE) as evaluation indexes of the GBDT model and verifying the model; wherein smaller RMSE and MAE values indicate a better prediction, and larger values a worse one.
2. The method for predicting energy consumption of office building based on GBDT regression model according to claim 1, wherein in step S1, the influence characteristic data of building energy consumption includes not only the collected weather environment data of the building, the building' S own enclosure structure, but also the personnel condition in the building, the performance of energy-consuming devices in the building, and the historical energy consumption data of the building are the energy consumption data of various devices in the building, the energy consumption data of the refrigerating unit, etc.
3. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S2 the raw data are divided by season; the correlation between each building energy consumption influence factor and the building energy consumption is analyzed with the Pearson correlation coefficient, and the main features are selected according to the strength of the correlation; the data are also input into an ETC model, which outputs a feature importance index for each feature; finally, the Pearson correlation coefficient and the feature importance index of each feature are combined by weighted averaging, and the features with high correlation are selected as the main influence features.
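A minimal sketch of this selection rule is given below. The Pearson coefficient is computed directly; the ETC importance values are hypothetical placeholders standing in for the ETC model's output, and the equal weighting, the use of the absolute correlation and the `top_k` cut-off are assumptions that the source does not specify.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / (sx * sy)


def select_features(columns, target, importance, weight=0.5, top_k=2):
    """Rank features by a weighted average of |Pearson r| and importance."""
    scores = {}
    for name, values in columns.items():
        r = abs(pearson(values, target))
        scores[name] = weight * r + (1 - weight) * importance[name]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical summer data for four days.
columns = {
    "temperature": [30, 32, 34, 36],
    "humidity": [60, 55, 65, 58],
    "occupancy": [100, 110, 120, 135],
}
target = [2000, 2100, 2210, 2300]  # daily energy consumption
# Placeholder importances; in the method these come from the ETC model.
importance = {"temperature": 0.5, "humidity": 0.1, "occupancy": 0.4}

selected, scores = select_features(columns, target, importance)
```

With these placeholder numbers, humidity scores far below the other two features and is dropped.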
4. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 3, wherein in step S3 the preprocessing comprises dividing the data into input vectors and output data and normalizing them; the divided main feature data form the input vectors, and the corresponding historical building energy consumption data form the output vectors; the input and output vectors are then normalized with the Z-score method, which rescales each feature dimension of the raw data to a mean of 0 and a standard deviation of 1 (the parameters of the standard Gaussian distribution), generating dimensionless training sample sets for the different seasons and thereby eliminating large differences in prediction results caused by inconsistent dimensions.
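The Z-score normalization, and the inverse normalization later applied to the model output, can be sketched as follows. The function names are illustrative, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def fit_zscore(values):
    """Return the mean and (population) standard deviation of one column."""
    mu = fmean(values)
    sigma = pstdev(values)
    # Guard against a constant column, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def normalize(values, mu, sigma):
    """Map raw values to zero mean and unit standard deviation."""
    return [(v - mu) / sigma for v in values]


def denormalize(values, mu, sigma):
    """Map normalized values back to the original units."""
    return [v * sigma + mu for v in values]


# Hypothetical daily energy consumption values.
energy = [900.0, 1000.0, 1100.0]
mu, sigma = fit_zscore(energy)
z = normalize(energy, mu, sigma)
restored = denormalize(z, mu, sigma)
```

The same `mu` and `sigma` fitted on the training samples would be reused for the day to be predicted, so that training and prediction share one scale.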
5. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S4 the training method comprises inputting the normalized training sample sets of the different seasons respectively to construct the GBDT models; the sample set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, wherein x is a feature vector and y is a target vector; and regression prediction is performed using the GBDT algorithm with the square loss and the absolute loss and the learning rate of the regression tree.
6. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 5, wherein the square loss function and the absolute loss function are set as follows:
Formula (1)
Formula (2)
In formulas (1) and (2), y is the actual building energy consumption value and f(x) is the predicted building energy consumption value.
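For reference, the usual textbook forms of these two losses and of their negative gradients are given below in the symbols y and f(x) defined above. These forms are an assumption: the patent's own formulas may differ, for example by the constant factor 1/2 that is included here in the square loss so that its negative gradient is exactly the residual.

```latex
L_{\mathrm{sq}}\bigl(y, f(x)\bigr) = \tfrac{1}{2}\bigl(y - f(x)\bigr)^{2},
\qquad
-\frac{\partial L_{\mathrm{sq}}}{\partial f(x)} = y - f(x)

L_{\mathrm{abs}}\bigl(y, f(x)\bigr) = \bigl|\,y - f(x)\,\bigr|,
\qquad
-\frac{\partial L_{\mathrm{abs}}}{\partial f(x)} = \operatorname{sign}\bigl(y - f(x)\bigr)
```

The square loss penalizes a large residual quadratically, while the absolute loss gradient has constant magnitude, which is why the latter is less affected by outliers.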
7. The GBDT regression model-based office building energy consumption prediction method according to claim 5, wherein the original learning rate is 1, which typically causes the trained GBDT model to produce identical values, so that the prediction overfits; therefore, following the idea of shrinkage, the learning rate is set to lr = 0.08.
8. The GBDT regression model-based office building energy consumption prediction method according to claim 1, wherein gradient boosting decision tree regression prediction is performed with the square loss function of formula (1), the absolute loss function of formula (2) and the learning rate of claim 7; the GBDT model generates a weak learner in each of multiple iterations, each learner being trained on the negative gradient of the previous one, and the result output after the maximum number of training iterations is the final predicted building energy consumption; the specific steps are as follows:
(1) Initialize the GBDT model: let the regression trees be indexed by k = 1, 2, …, K and the samples of the sample set T by i = 1, 2, …, N, and compute the initial value $f_0(x)$ of the (k-1)-th tree. When $f_0(x_i)$ is taken as the mean $f_{a0}(x_i)$ of the actual building energy consumption values in T, compute the residual $A_i$; when $f_0(x_i)$ is taken as the median $f_{b0}(x_i)$ of the actual building energy consumption values in T, compute the residual $B_i$;
(2) Compute the negative gradient of the loss function L. Because the square loss function is very sensitive to outliers, which causes excessive prediction error at outliers, whereas the absolute loss function is more robust to outliers, the negative gradient $r_{ki}$ of the loss function L is calculated as follows:
(a) When $A_i < B_i$, $r_{ki}$ is calculated as:
Formula (3)
(b) When $A_i > B_i$, $r_{ki}$ is calculated as:
Formula (4)
In formulas (3) and (4), $y_i$ is the actual building energy consumption value of the i-th sample of the (k-1)-th tree in the sample set T, $f_{k-1}(x_i)$ is the predicted building energy consumption value of the i-th sample of the (k-1)-th tree, and sign is the sign function;
(3) Traverse each feature in the sample set T: taking each value of a feature at sample position i as a candidate split node, compute, according to formulas (1) and (2), the total loss of the actual building energy consumption values under every possible split, and take the feature value of the combination with the smallest total loss as the split node;
(4) Using the result of step (2) together with step (3), fit a regression tree to obtain the leaf node regions $R_{kj}$ of the k-th tree, j = 1, 2, …, J. For each region, a constant value $C_{kj}$ that minimizes the loss function L is estimated by line search. Because the negative gradient combines the square loss and the absolute loss, $C_{kj}$ is calculated as follows to keep its value from being inaccurate:
Formula (5)
In formula (5), $\gamma_{kj}$ is the mean of the pseudo-residuals in the j-th leaf node of the k-th tree, N is the number of samples in the j-th leaf node of the k-th tree, and min(·) denotes taking the minimum value;
(5) Update the regression tree to obtain the predicted building energy consumption values $f_k(x)$ output by the k-th tree for the samples i:
Formula (6)
In formula (6), $f_{k-1}(x)$ is the predicted building energy consumption of the (k-1)-th tree, J is the number of leaf nodes of the regression tree, $C_{kj}$ is the constant that minimizes the loss function in the j-th leaf node at iteration k, $I(x \in R_{kj})$ is the indicator function, and lr is the learning rate; because the original learning rate of 1 typically causes the trained GBDT model to produce identical values, so that the prediction overfits and the model's prediction accuracy drops sharply, the learning rate is set to lr = 0.08 following the idea of shrinkage;
(6) After the regression trees have been fitted for the maximum number of iterations K, the final GBDT prediction model F(x) is output as
Formula (7).
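The overall loop of steps (1) to (6) can be illustrated with the minimal sketch below. It is a simplified stand-in, not the claimed procedure: it uses the square loss only, initializes with the mean, uses depth-one trees (stumps), and does not reproduce the per-sample switch between formulas (3) and (4) or the leaf value of formula (5). The update follows the usual shrinkage form, in which each tree's leaf constant is scaled by lr before being added to the previous prediction. All function names and data are hypothetical.

```python
from statistics import fmean


def fit_stump(X, residuals):
    """Step (3): pick the (feature, threshold) with the least squared loss."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            # Step (4): under the square loss, the loss-minimizing leaf
            # constant is the mean residual of the leaf.
            c_left, c_right = fmean(left), fmean(right)
            loss = (sum((r - c_left) ** 2 for r in left)
                    + sum((r - c_right) ** 2 for r in right))
            if best is None or loss < best[0]:
                best = (loss, f, thr, c_left, c_right)
    if best is None:  # all rows identical: fall back to one constant leaf
        c = fmean(residuals)
        return (0, float("inf"), c, c)
    return best[1:]


def stump_predict(stump, row):
    f, thr, c_left, c_right = stump
    return c_left if row[f] <= thr else c_right


def fit_gbdt(X, y, n_trees=200, lr=0.08):
    f0 = fmean(y)                       # step (1): initial value (mean only)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):            # k = 1, ..., K
        # Step (2): negative gradient of the square loss is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Step (5): shrunken update of the running prediction.
        pred = [p + lr * stump_predict(stump, row) for p, row in zip(pred, X)]
    return f0, stumps, lr


def predict(model, row):
    """Step (6): final model is the initial value plus all shrunken trees."""
    f0, stumps, lr = model
    return f0 + lr * sum(stump_predict(s, row) for s in stumps)


def mse(actual, predicted):
    return fmean((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical samples: [outdoor temperature, occupancy] -> daily kWh.
X = [[30.0, 100], [32.0, 110], [34.0, 120], [36.0, 135], [28.0, 90]]
y = [2000.0, 2100.0, 2210.0, 2300.0, 1900.0]

model = fit_gbdt(X, y)
mse_base = mse(y, [fmean(y)] * len(y))
mse_model = mse(y, [predict(model, row) for row in X])
```

A small lr means each tree corrects only a fraction of the remaining residual, so more trees are needed but no single tree dominates the fit.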
9. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 1, wherein in step S5 the input vector of the day to be predicted in the corresponding season is normalized with the same normalization method and input into the corresponding GBDT regression prediction model, which outputs the prediction data; the predicted building energy consumption of the day to be predicted is obtained after inverse normalization.
10. The method for predicting the energy consumption of an office building based on the GBDT regression model according to claim 7, wherein in step S7 the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE); the RMSE and MAE are calculated from the actual and predicted building energy consumption values of the day to be predicted to evaluate the prediction accuracy of the model, wherein smaller RMSE and MAE values indicate better prediction performance, and larger values indicate worse performance.
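The two evaluation indexes can be computed as in the short sketch below, using their standard definitions; the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical actual and predicted daily energy consumption (kWh).
actual = [2000.0, 2100.0, 2200.0]
predicted = [1990.0, 2120.0, 2200.0]

print(f"MAE  = {mae(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and reacts more strongly to a few large errors.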
CN202210246202.9A 2022-03-14 2022-03-14 GBDT regression model-based office building energy consumption prediction method Pending CN116796141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210246202.9A CN116796141A (en) 2022-03-14 2022-03-14 GBDT regression model-based office building energy consumption prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210246202.9A CN116796141A (en) 2022-03-14 2022-03-14 GBDT regression model-based office building energy consumption prediction method

Publications (1)

Publication Number Publication Date
CN116796141A true CN116796141A (en) 2023-09-22

Family

ID=88040605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210246202.9A Pending CN116796141A (en) 2022-03-14 2022-03-14 GBDT regression model-based office building energy consumption prediction method

Country Status (1)

Country Link
CN (1) CN116796141A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235679A (en) * 2023-11-15 2023-12-15 长沙金码测控科技股份有限公司 LUCC-based tensile load and compressive load evaluation method and system for foundation pit monitoring
CN117236522A (en) * 2023-11-10 2023-12-15 四川智源能诚售电有限公司 Power energy consumption management method, system, electronic equipment and medium
CN117251814A (en) * 2023-09-28 2023-12-19 广东省交通开发有限公司 Method for analyzing electric quantity loss abnormality of highway charging pile

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251814A (en) * 2023-09-28 2023-12-19 广东省交通开发有限公司 Method for analyzing electric quantity loss abnormality of highway charging pile
CN117236522A (en) * 2023-11-10 2023-12-15 四川智源能诚售电有限公司 Power energy consumption management method, system, electronic equipment and medium
CN117236522B (en) * 2023-11-10 2024-02-13 四川智源能诚售电有限公司 Power energy consumption management method, system, electronic equipment and medium
CN117235679A (en) * 2023-11-15 2023-12-15 长沙金码测控科技股份有限公司 LUCC-based tensile load and compressive load evaluation method and system for foundation pit monitoring

Similar Documents

Publication Publication Date Title
CN116796141A (en) GBDT regression model-based office building energy consumption prediction method
CN106842914A (en) A kind of temperature control energy-saving processing method, apparatus and system
CN104463381A (en) Building energy consumption predication method based on KPCA and WLSSVM
CN113554466A (en) Short-term power consumption prediction model construction method, prediction method and device
CN115130741A (en) Multi-model fusion based multi-factor power demand medium and short term prediction method
CN105787259A (en) Method for analyzing influence correlation of multiple meteorological factors and load changes
Elhariri et al. H-ahead multivariate microclimate forecasting system based on deep learning
CN114282730A (en) Data completeness inspection and feature learning method for building load prediction
CN113591368A (en) Comprehensive energy system multi-energy load prediction method and system
CN113762387A (en) Data center station multi-load prediction method based on hybrid model prediction
CN112183826A (en) Building energy consumption prediction method based on deep cascade generation countermeasure network and related product
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN115759389A (en) Day-ahead photovoltaic power prediction method based on weather type similar day combination strategy
CN116757057A (en) Air quality prediction method based on PSO-GA-LSTM model
CN116822672A (en) Air conditioner cold load prediction optimization method and system
CN110598947A (en) Load prediction method based on improved cuckoo-neural network algorithm
CN116245259B (en) Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment
CN110276478B (en) Short-term wind power prediction method based on segmented ant colony algorithm optimization SVM
CN117113086A (en) Energy storage unit load prediction method, system, electronic equipment and medium
CN114862032B (en) XGBoost-LSTM-based power grid load prediction method and device
CN116595895A (en) Training method of short-time electric quantity prediction model and short-time electric quantity prediction method
CN113449466B (en) Solar radiation prediction method and system for optimizing RELM based on PCA and chaos GWO
CN115796327A (en) Wind power interval prediction method based on VMD (vertical vector decomposition) and IWOA-F-GRU (empirical mode decomposition) -based models
Tan et al. Long-term load forecasting based on feature fusion and lightgbm
CN114372631A (en) Data-lacking area runoff prediction method based on small sample learning and LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination