CN116362376A - Comprehensive energy station construction carbon emission prediction method based on machine learning - Google Patents


Info

Publication number
CN116362376A
Authority
CN
China
Prior art keywords
carbon emission
energy station
comprehensive energy
prediction
construction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310130789.1A
Other languages
Chinese (zh)
Inventor
凌建
孙雷
马天
陈松涛
方磊
徐超
沈文韬
刘骁繁
林冬阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310130789.1A
Publication of CN116362376A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/22: Social work
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/80: Management or planning
    • Y02P90/84: Greenhouse gas [GHG] management systems

Abstract

The invention discloses a machine-learning-based method for predicting the carbon emissions of comprehensive energy station construction. The method comprises: establishing a carbon emission prediction index system for comprehensive energy station construction and identifying the carbon emission sources; preprocessing the prediction index data and the carbon emission source data of comprehensive energy station projects; and constructing a carbon emission prediction model based on a machine learning algorithm so as to predict the carbon emissions of comprehensive energy station construction. The method is accurate, fast and intelligent, simplifies the cumbersome traditional carbon emission calculation process, and provides effective and highly credible data support for low-carbon design and construction.

Description

Comprehensive energy station construction carbon emission prediction method based on machine learning
Technical Field
The invention belongs to the technical field of carbon emission prediction, and particularly relates to a comprehensive energy station construction carbon emission prediction method based on machine learning.
Background
Machine learning has broad application prospects in prediction research for comprehensive energy stations and for the building sector, but research on the carbon emissions of facilities such as comprehensive energy stations is still lacking. Existing studies of comprehensive energy station carbon emissions pay little attention to analysing and predicting the emissions generated during construction. In practical assessments, current calculation methods focus on accounting during or after construction, rely on large amounts of design and construction data, and involve a cumbersome calculation process. Because many construction parties are involved, the data are incomplete and hard to collect, and the technical complexity is high, calculating carbon emissions at the construction stage is difficult, gives poor results in practice, is hard to apply generally, and offers little guidance for actual design and construction. An intelligent prediction method that combines data preprocessing with machine learning is therefore needed.
Among machine learning algorithms, XGBoost is a Boosting-based ensemble learning algorithm. Compared with traditional machine learning algorithms, it offers fast execution, strong generalization, high prediction accuracy and good robustness. Its models are also highly interpretable and can be applied to small samples. It is currently widely used in fields such as runoff prediction, credit card transactions, project investment prediction and fault monitoring, but it has not yet been applied in depth to carbon emission prediction.
Disclosure of Invention
In view of the above problems, the invention provides a machine-learning-based method for predicting the carbon emissions of comprehensive energy station construction. First, a carbon emission prediction index system for comprehensive energy station construction is established from the perspectives of project composition, construction process and carbon footprint, and the carbon emission sources are identified. Second, data on the prediction indexes and the carbon emission sources are collected and processed. Third, a carbon emission prediction model is constructed based on a machine learning algorithm, and the model is trained, tested and evaluated. Finally, in practical application, prediction index data are input into the model to obtain the predicted carbon emission of comprehensive energy station construction, which completes the prediction for a newly built comprehensive energy station project.
The technical scheme is as follows: the invention provides a machine-learning-based method for predicting the carbon emissions of comprehensive energy station construction, comprising the following steps:
step 1, establishing a carbon emission prediction index system for comprehensive energy station construction and identifying the carbon emission sources;
step 2, preprocessing the comprehensive energy station construction carbon emission prediction index data and the carbon emission source data;
step 3, constructing a comprehensive energy station construction carbon emission prediction model based on a machine learning algorithm, and training the model with the data from step 2;
step 4, inputting prediction index data into the comprehensive energy station construction carbon emission prediction model, which outputs the construction carbon emission value, thereby completing the prediction of the construction carbon emissions of a newly built comprehensive energy station.
Further, in step 1, the comprehensive energy station construction carbon emission prediction index system comprises the number of above-ground floors, the foundation burial depth, the total building area, the foundation earthwork volume, the designed concrete volume and the designed reinforcing steel weight; the carbon emission sources comprise the steel, concrete, electricity consumption and water consumption required for the construction of the comprehensive energy station.
Further, in step 2, the carbon emission prediction index data and the carbon emission source data of the comprehensive energy station are preprocessed as follows:
step 2-1, converting the carbon emission source data consumed by the construction of the comprehensive energy station into carbon emission source data per unit building area;
step 2-2, analysing the carbon emission source data per unit building area with a box plot, and marking abnormal data as data to be repaired;
step 2-3, replacing the missing or abnormal values of the carbon emission source data per unit building area by using the K-nearest neighbor algorithm;
step 2-4, calculating the construction carbon emission per unit building area of the comprehensive energy station from the repaired carbon emission source data per unit building area and the carbon emission factor of each carbon emission source, and taking it as the output variable of the comprehensive energy station construction carbon emission prediction model;
step 2-5, processing each type of comprehensive energy station construction carbon emission prediction index with the Min-Max normalization method, and taking the normalized indexes as the input variables of the comprehensive energy station construction carbon emission prediction model.
Further, in step 2-1, the comprehensive energy station construction carbon emission source data are converted into carbon emission source data per unit building area through formula (1):
$$Y_i = \frac{M_i}{S} \tag{1}$$
wherein:
$Y_i$ is the consumption of the i-th carbon emission source per unit building area, and i is the type of the carbon emission source;
$M_i$ is the total amount of the i-th carbon emission source;
$S$ is the total building area of the comprehensive energy station.
Further, the method in step 2-2 is as follows: the 25% quantile $Q_1$ and the 75% quantile $Q_2$ of all samples are taken, and their difference $IQR = Q_2 - Q_1$ is the length of the box. A sample is abnormal when its carbon emission source value per unit building area is smaller than $Q_1 - 1.5\,IQR$ or greater than $Q_2 + 1.5\,IQR$.
Further, the method in step 2-3 is as follows: the K-nearest neighbor algorithm is used to calculate the Euclidean distance between the prediction indexes of each normal project and those of the project to be repaired, and the mean of the per-unit-building-area carbon emission source data of the K normal projects with the smallest Euclidean distance replaces the missing or abnormal value in the carbon emission source data per unit building area.
Further, the method in step 2-4 is as follows: the construction carbon emission per unit building area of each comprehensive energy station is calculated as the sum of the products of the carbon emission source data per unit building area and the corresponding carbon emission factors, and is used as the output variable of the comprehensive energy station construction carbon emission prediction model. The calculation model is shown in formula (2):
$$C = \sum_{i=1}^{N} Y_i F_i \tag{2}$$
wherein:
$C$ is the carbon emission per unit building area in the construction process of the comprehensive energy station;
$Y_i$ is the consumption of the i-th carbon emission source per unit building area;
$F_i$ is the carbon emission factor of the i-th carbon emission source, i is the carbon emission source type, and there are N types in total;
the output variable data set $L = \{C_i\}$ is obtained by this calculation.
Further, the method of step 2-5 is as follows: the carbon emission prediction indexes of each type are used as input variables of the comprehensive energy station construction carbon emission prediction model and form the input variable set $F = \{X'_1, X'_2, X'_3, X'_4, X'_5, X'_6\}$, wherein $X'_j = \{x'_{1j}, x'_{2j}, x'_{3j}, x'_{4j}, x'_{5j}, \ldots, x'_{mj}\}$ is the normalized data set of the j-th prediction index, consisting of the normalized values of the j-th prediction index for m projects:
$$x'_{ij} = \frac{x_{ij} - x_{j,\min}}{x_{j,\max} - x_{j,\min}} \tag{3}$$
wherein:
$x'_{ij}$ is the normalized value of the j-th prediction index for the i-th project;
$x_{j,\max}$ and $x_{j,\min}$ are respectively the maximum and minimum values of the j-th prediction index;
$x_{ij}$ is the original value of the j-th prediction index for the i-th project;
j denotes the category of the prediction index, i denotes the project, and m denotes the number of projects.
Further, in step 3, the comprehensive energy station construction carbon emission prediction model is constructed and trained based on a machine learning algorithm as follows:
step 3-1, dividing the processed prediction index data and the processed construction carbon emission data per unit building area into a training set and a test set;
step 3-2, on the basis of the training set obtained in step 3-1, dividing the training set into K subsets of equal size, wherein K-1 subsets are used for model training and the remaining subset forms the validation subset;
step 3-3, constructing the comprehensive energy station construction carbon emission prediction model based on the XGBoost algorithm, training the tree ensemble model used in the XGBoost algorithm in an additive manner, and, based on K-fold cross-validation, adjusting the model parameters with the validation subsets to obtain a plurality of comprehensive energy station construction carbon emission prediction models;
step 3-4, inputting the test set into the prediction models, comparing the predicted results with the actual results, and determining the optimal comprehensive energy station construction carbon emission prediction model.
Further, the method of step 3-3 is as follows:
(1) taking CART (classification and regression tree) as the base learner, setting the loss function, the objective function and the XGBoost prediction model as shown in formulas (4) to (7) respectively:
setting the loss function:
$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) \tag{4}$$
where $y_i$ and $\hat{y}_i$ are respectively the actual value and the predicted value of the comprehensive energy station construction carbon emission, n is the number of samples, and $l(\cdot)$ is the loss function;
setting the objective function:
$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{t} \Omega\left(f_k\right) \tag{5}$$
where $f_t(x_i)$ is the prediction model of the t-th tree, and $\sum_{k=1}^{t} \Omega(f_k)$ sums the complexity of all t trees and is added to the objective function as a regularization term to prevent the model from overfitting;
setting the XGBoost prediction model:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t\left(x_i\right) \tag{6}$$
where $\hat{y}_i^{(t)}$ is the predicted value for training sample $x_i$ after t iterations, $\hat{y}_i^{(t-1)}$ is the predicted value given by the first t-1 trees, and $f_t(x_i)$ is the prediction model of the t-th tree;
the objective function $Obj^{(t)}$ is written as shown in formula (7):
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right) + C \tag{7}$$
where C is a constant term, $\Omega(f_t)$ is the regularization term, t is the number of trees generated, and $Obj^{(t)}$ is the objective function after the t-th tree is generated;
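(2) according to the Taylor formula, the loss function is expanded to second order as in formula (8):
$$l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) \approx l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right) \tag{8}$$
where $g_i$ is the first derivative and $h_i$ is the second derivative of the loss function with respect to $\hat{y}_i^{(t-1)}$;
(3) according to formulas (7) and (8), the objective function is written as shown in formula (9):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right) \right] + \Omega\left(f_t\right) + C \tag{9}$$
where $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is a constant; dropping the constant terms, the objective function is further simplified as in formula (10):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right) \right] + \Omega\left(f_t\right) \tag{10}$$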
(4) based on the objective function of formula (10), when the training set is input, the first derivative $g_i$ and the second derivative $h_i$ of the loss function are calculated at each step, and the objective function is then optimized to obtain $f_t(x_i)$ for each step; the overall model, namely the comprehensive energy station construction carbon emission prediction model, is obtained according to formula (6);
(5) adjusting the model parameters to obtain different comprehensive energy station construction carbon emission prediction models.
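Formulas (8) to (10) leave the loss $l(\cdot)$ unspecified. As an illustration only, assuming a squared-error loss (an assumption, not stated in the source), the derivatives $g_i$ and $h_i$ take a simple closed form:

```latex
\begin{aligned}
l\left(y_i, \hat{y}_i\right) &= \left(y_i - \hat{y}_i\right)^2 \\
g_i &= \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}} = 2\left(\hat{y}_i^{(t-1)} - y_i\right) \\
h_i &= \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} = 2 \\
Obj^{(t)} &\approx \sum_{i=1}^{n} \left[ 2\left(\hat{y}_i^{(t-1)} - y_i\right) f_t\left(x_i\right) + f_t^2\left(x_i\right) \right] + \Omega\left(f_t\right)
\end{aligned}
```

Under this assumption the second-order expansion is exact rather than approximate, and, apart from the regularization term, minimizing the objective amounts to fitting the t-th tree to the residuals $y_i - \hat{y}_i^{(t-1)}$.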
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) By analysing the project composition, construction process and carbon emission sources of comprehensive energy stations, the method establishes a carbon emission prediction index system for comprehensive energy station construction, identifies the construction carbon emission sources, and determines the data processing method and the carbon emission calculation model. It constructs a comprehensive energy station construction carbon emission prediction model based on a machine learning algorithm, obtains the optimal model through K-fold cross-validation training, and predicts and evaluates the construction carbon emissions on a test set, thereby completing the prediction of the construction carbon emissions of newly built comprehensive energy station projects.
(2) The invention provides an approach and a method for predicting carbon emissions during comprehensive energy station construction and a reference for how to process the data, which is of particular value when the number of projects is limited and the data have quality problems. In addition, the machine-learning-based prediction model constructed by the invention fits well, has small error, and predicts the comprehensive energy station construction carbon emissions well. It avoids collecting large amounts of data and performing cumbersome calculations, requires little manual intervention, has a simple prediction process, allows the construction carbon emissions to be predicted in advance, and is broadly applicable.
Drawings
FIG. 1 is a flow chart of the implementation of the method of the present invention.
FIG. 2 is a schematic diagram of the method of the present invention for outlier identification using a box plot.
FIG. 3 is a schematic representation of a method of the present invention employing 5-fold cross-validation to construct and train a model.
Detailed Description
In order to make the purposes, technical solutions and advantages of the invention clearer, the invention is described in detail below with reference to the accompanying drawings and a specific embodiment, which provides a process for predicting the carbon emissions of comprehensive energy station construction.
Examples
To solve the problems in the prior art, the invention provides a machine-learning-based method for predicting the carbon emissions of comprehensive energy station construction, which predicts the construction carbon emissions from the factors that influence them. FIG. 1 is a flow chart of an implementation of the invention. The method specifically comprises the following steps:
Step 1, establishing a carbon emission prediction index system for comprehensive energy station construction and identifying the carbon emission sources.
The prediction indexes need to reflect characteristics such as building structure type, building material consumption and engineering quantity. The prediction indexes are therefore determined to be the number of above-ground floors, the foundation burial depth, the total building area, the foundation earthwork volume, the designed concrete volume and the designed reinforcing steel weight, which form a prediction index system of six prediction indexes. The carbon emission sources comprise the steel, concrete, electricity consumption and water consumption required for the construction of the comprehensive energy station.
Step 2, collecting and processing the comprehensive energy station construction carbon emission prediction index data and the carbon emission source data. The ways in which the prediction index data and the carbon emission source data are acquired are shown in Table 1 below.
Table 1 Data acquisition table
[Table 1 is provided as an image in the original publication.]
The comprehensive energy station construction carbon emission source data are converted into carbon emission source data per unit building area for data analysis and processing, as shown in formula (1):
$$Y_i = \frac{M_i}{S} \tag{1}$$
wherein:
$Y_i$ is the consumption of the i-th carbon emission source per unit building area, and i is the type of the carbon emission source.
$M_i$ is the total amount of the i-th carbon emission source.
$S$ is the total building area.
Taking a certain project as an example, the building area is $S = 1182.6\ \text{m}^2$ and the steel consumption is $M_1 = 1200\ \text{t}$, so the steel consumption per unit building area of the project is
$$Y_1 = \frac{M_1}{S} = \frac{1200\ \text{t}}{1182.6\ \text{m}^2} \approx 1.015\ \text{t/m}^2$$
The other types of carbon emission sources are handled in the same way.
Further, the carbon emission source data per unit building area are analysed with a box plot, and abnormal data are marked as data to be repaired. The 25% quantile $Q_1$ and the 75% quantile $Q_2$ of all samples are taken, and their difference $IQR = Q_2 - Q_1$ is the length of the box. A sample is abnormal when its carbon emission source value per unit building area is smaller than $Q_1 - 1.5\,IQR$ or greater than $Q_2 + 1.5\,IQR$. The box plot drawn for the invention is shown in FIG. 2; points outside the box are abnormal values, that is, they mark the projects whose value for a given carbon emission source is abnormal. Taking concrete consumption per unit building area as an example, the values for projects 5, 6, 33 and 46 are abnormal and are taken as the objects to be repaired. The other types of carbon emission source data per unit building area are handled in the same way.
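The box-plot rule can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the consumption values are made up, and the linear-interpolation quantile is one common convention that the source does not specify.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def find_outliers(values):
    """Return indices whose value lies outside [Q1 - 1.5*IQR, Q2 + 1.5*IQR]."""
    s = sorted(values)
    q1, q2 = quantile(s, 0.25), quantile(s, 0.75)  # Q2 is the 75% quantile, as in the text
    iqr = q2 - q1
    lower, upper = q1 - 1.5 * iqr, q2 + 1.5 * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical concrete consumption per unit building area (m^3/m^2), one value per project
concrete = [0.42, 0.45, 0.40, 0.47, 1.35, 0.44, 0.43, 0.05, 0.46, 0.41]
outlier_idx = find_outliers(concrete)
print(outlier_idx)  # indices of the projects to be repaired
```

The returned indices play the role of the "projects to be repaired" that are passed on to the K-nearest neighbor step.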
Further, the K-nearest neighbor algorithm is used to calculate the Euclidean distance between the prediction indexes of each normal project and those of the project to be repaired, and the mean of the per-unit-building-area carbon emission source values of the K normal projects with the smallest Euclidean distance replaces the missing or abnormal value in the carbon emission source data per unit building area.
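A minimal sketch of this repair step follows. The function name `knn_repair`, the two-index feature vectors and all values are hypothetical, and the sketch assumes the prediction indexes are already on comparable scales; the source does not state whether distances are computed before or after normalization.

```python
import math


def knn_repair(indexes, source_values, to_repair, k=3):
    """Replace source_values[r] for each r in to_repair with the mean value of the
    k normal projects whose prediction-index vectors are closest (Euclidean)."""
    normal = [i for i in range(len(indexes)) if i not in to_repair]
    repaired = list(source_values)
    for r in to_repair:
        nearest = sorted(normal, key=lambda i: math.dist(indexes[i], indexes[r]))[:k]
        repaired[r] = sum(source_values[i] for i in nearest) / len(nearest)
    return repaired


# Hypothetical prediction indexes (two per project for brevity; the patent uses six)
indexes = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.12, 0.22], [0.85, 0.90]]
# Hypothetical concrete consumption per unit building area; project 3 is missing
concrete = [0.40, 0.44, 0.60, None, 0.64]

repaired = knn_repair(indexes, concrete, to_repair={3}, k=2)
print(repaired[3])  # mean of the two most similar normal projects (0 and 1)
```

Only normal projects are used as neighbors, so an abnormal value can never be used to repair another one.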
Further, after the data repair is completed, the construction carbon emission per unit building area of each comprehensive energy station is calculated as the sum of the products of the carbon emission source data per unit building area and the corresponding carbon emission factors, and is used as the output variable of the comprehensive energy station construction carbon emission prediction model. The calculation model is shown in formula (2):
$$C = \sum_{i=1}^{N} Y_i F_i \tag{2}$$
wherein:
$C$ is the carbon emission per unit building area in the construction process of the comprehensive energy station (unit: kgCO2e/m²).
$Y_i$ is the consumption of the i-th carbon emission source per unit building area.
$F_i$ is the carbon emission factor of the i-th carbon emission source, i is the carbon emission source type, and there are N = 4 types.
The output variable data set $L = \{C_i\}$ is obtained with this calculation model.
The per-unit-building-area values of the carbon emission sources of the i-th comprehensive energy station project are shown in Table 2 below, and the carbon emission factors of the carbon emission sources are shown in Table 3 below; the construction carbon emission per unit building area of the i-th project is calculated by combining them with formula (2).
TABLE 2 Consumption of carbon emission sources per unit building area
[Table 2 is provided as an image in the original publication.]
TABLE 3 Carbon emission factors of each carbon emission source
Category | Value | Unit
Electric power | 0.581 | kgCO2e/kWh
Steel | 2.35 | kgCO2e/kg
Concrete | 295 | kgCO2e/m³
Water | 0.168 | kgCO2e/t
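Formulas (1) and (2) can be combined in a short sketch. The emission factors are those of Table 3; the project totals, the building area and the function names are hypothetical, and each total is assumed to be expressed in the unit that matches its factor (for example steel in kg, not t).

```python
# Emission factors from Table 3 (kgCO2e per unit of each carbon emission source)
FACTORS = {
    "electricity": 0.581,  # per kWh
    "steel": 2.35,         # per kg
    "concrete": 295.0,     # per m^3
    "water": 0.168,        # per t
}


def per_unit_area(totals, area):
    """Formula (1): Y_i = M_i / S for every carbon emission source."""
    return {name: total / area for name, total in totals.items()}


def unit_area_emission(consumption):
    """Formula (2): C = sum_i Y_i * F_i, in kgCO2e per m^2 of building area."""
    return sum(consumption[name] * FACTORS[name] for name in FACTORS)


# Hypothetical project totals M_i, in units matching the factors above
totals = {"electricity": 20000.0, "steel": 100000.0, "concrete": 500.0, "water": 1000.0}
area = 1000.0  # total building area S in m^2 (hypothetical)

y = per_unit_area(totals, area)
c = unit_area_emission(y)
print(round(c, 3))  # construction carbon emission per unit building area
```

Keeping the factors in one dictionary makes the unit of each source explicit, which is where mistakes such as mixing tonnes and kilograms of steel are most likely to occur.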
Further, each type of comprehensive energy station construction carbon emission prediction index is processed with the Min-Max normalization method shown in formula (3) and used as an input variable of the prediction model, forming the input variable set $F = \{X'_1, X'_2, X'_3, X'_4, X'_5, X'_6\}$, wherein $X'_j = \{x'_{1j}, x'_{2j}, x'_{3j}, x'_{4j}, x'_{5j}, \ldots, x'_{mj}\}$.
$$x'_{ij} = \frac{x_{ij} - x_{j,\min}}{x_{j,\max} - x_{j,\min}} \tag{3}$$
Wherein:
$x'_{ij}$ is the normalized value of the j-th prediction index for the i-th project.
$x_{j,\max}$ and $x_{j,\min}$ are respectively the maximum and minimum values of the j-th prediction index.
$x_{ij}$ is the original value of the j-th prediction index for the i-th project.
j denotes the category of the prediction index, i denotes the project, and m denotes the number of projects.
The normalization results are shown in Table 4.
TABLE 4 Min-Max normalization results
X_1 | X_2 | ... | X_4 | X_5 | X_6
0.24 | 0.11 | ... | 0.07 | 0.37 | 0.07
0.09 | 0.12 | ... | 0.01 | 0.12 | 0.04
0.12 | 0.08 | ... | 0.03 | 0.07 | 0.03
x′_i1 | x′_i2 | ... | x′_i4 | x′_i5 | x′_i6
0.12 | 0.15 | ... | 0.04 | 0.09 | 0.06
0.12 | 0.15 | ... | 0.04 | 0.09 | 0.03
0.05 | 0.02 | ... | 0.09 | 0.34 | 0.00
0.14 | 0.08 | ... | 0.04 | 0.09 | 0.04
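Formula (3) is applied column by column, one prediction index at a time. A minimal sketch follows, with made-up floor counts; mapping a constant column to 0.0 is an assumption added here to avoid division by zero and is not discussed in the source.

```python
def min_max_normalize(column):
    """Formula (3) applied to one prediction index across all m projects."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical number of above-ground floors for four projects
floors = [1, 2, 3, 5]
normalized = min_max_normalize(floors)
print(normalized)  # every value now lies in [0, 1]
```

The minimum and maximum would normally be kept so that the prediction indexes of a new project can be scaled with the same bounds before prediction.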
Step 3, constructing a comprehensive energy station construction carbon emission prediction model based on a machine learning algorithm.
Step 3-1, dividing the processed prediction index data and the processed construction carbon emission data per unit building area into a training set and a test set;
Before the machine-learning-based prediction model is constructed, the input data set $F = \{X'_1, X'_2, X'_3, X'_4, X'_5, X'_6\}$ and the output data set $L = \{C_i\}$ obtained in step 2 are divided into a training set and a test set using the sklearn package in Python, with the ratio set to 7:3.
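The 7:3 split can be sketched without third-party dependencies as below. The source performs this step with sklearn (whose `sklearn.model_selection.train_test_split` accepts `test_size=0.3`); the helper `split_train_test`, the seed and the data here are hypothetical stand-ins.

```python
import random


def split_train_test(features, targets, test_ratio=0.3, seed=42):
    """Shuffle project indices reproducibly and split them train:test = 7:3 by default."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(data, ids):
        return [data[i] for i in ids]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(targets, train_idx), pick(targets, test_idx))


# Hypothetical normalized prediction indexes and per-unit-area emissions for 10 projects
X = [[i / 10, (10 - i) / 10] for i in range(10)]
y = [100.0 + i for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Fixing the seed keeps the split reproducible, so that differences between models are not caused by a different partition of the projects.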
Step 3-2, on the basis of the training set obtained in step 3-1, dividing the training set into K subsets of equal size, wherein K-1 subsets are used for model training and the remaining subset forms the validation subset;
To eliminate the influence of the partitioning and of the random ordering of the samples on the prediction result, the training set obtained after splitting the data set is further divided into K subsets of equal size; K-1 subsets are used for model construction and the remaining subset is used for model validation. The mean of the evaluation index over the K results is taken as the estimate of the model accuracy. In practice K is typically chosen in the range of 5 to 10; the invention takes K = 5, as shown in FIG. 3.
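The K-fold procedure can be sketched as follows. The helpers `k_fold_indices` and `k_fold_mean_score`, the mean-value baseline model and the mean absolute error are all illustrative assumptions; in practice the caller would pass a function that trains the prediction model and one that computes the chosen evaluation index. The sketch also assumes the training samples have already been shuffled.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; fold sizes differ by at most one sample."""
    idx = list(range(n_samples))
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def k_fold_mean_score(fit, score, X, y, k=5):
    """Train on K-1 folds, score on the held-out fold, and average the K scores."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
    return sum(scores) / len(scores)


# Placeholder "model": always predicts the mean of the training targets
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


# Placeholder evaluation index: mean absolute error on the validation fold
def mae(model, X_val, y_val):
    return sum(abs(model - t) for t in y_val) / len(y_val)


X = [[float(i)] for i in range(10)]   # hypothetical inputs
y = [float(i + 1) for i in range(10)]  # hypothetical targets
mean_mae = k_fold_mean_score(fit_mean, mae, X, y, k=5)
print(mean_mae)
```

Every sample is used for validation exactly once, which is what makes the averaged score less sensitive to any single partition.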
Step 3-3, constructing the comprehensive energy station construction carbon emission prediction model based on the XGBoost algorithm, training the tree ensemble model used in the XGBoost algorithm in an additive manner, and, based on K-fold cross-validation, adjusting the model parameters with the validation subsets to obtain a plurality of comprehensive energy station construction carbon emission prediction models;
Step 3-4, inputting the test set into the prediction models, comparing the predicted results with the actual results, and determining the optimal comprehensive energy station construction carbon emission prediction model.
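Comparing predictions with actual values on the test set requires an evaluation index, which the source does not name. The sketch below assumes three common regression metrics (MAE, RMSE and R²) and uses made-up emission values.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE and R^2 for paired actual and predicted emission values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sse / n),
        "r2": 1.0 - sse / ss_tot,
    }


# Hypothetical per-unit-area emissions (kgCO2e/m^2) for four test projects
actual = [400.0, 420.0, 380.0, 450.0]
predicted = [410.0, 415.0, 385.0, 440.0]

metrics = evaluate(actual, predicted)
print(metrics)
```

Computing the same metrics for every candidate model on the same test set gives a like-for-like basis for choosing the optimal one.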
Specifically, the comprehensive energy station construction carbon emission prediction model is constructed based on the XGBoost algorithm: the tree ensemble model used in the XGBoost algorithm is trained in an additive manner, splitting stops when the tree depth threshold is reached, and the prediction model is saved. The steps are as follows:
(1) Taking CART (classification and regression tree) as the base learner, setting the loss function, the objective function and the XGBoost prediction model as shown in formulas (4) to (7) respectively:
setting the loss function:
$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) \tag{4}$$
where $y_i$ and $\hat{y}_i$ are respectively the actual value and the predicted value of the comprehensive energy station construction carbon emission, n is the number of samples, and $l(\cdot)$ is the loss function;
setting the objective function:
$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{t} \Omega\left(f_k\right) \tag{5}$$
where $f_t(x_i)$ is the prediction model of the t-th tree, and $\sum_{k=1}^{t} \Omega(f_k)$ sums the complexity of all t trees and is added to the objective function as a regularization term to prevent the model from overfitting;
setting the XGBoost prediction model:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t\left(x_i\right) \tag{6}$$
where $\hat{y}_i^{(t)}$ is the predicted value for training-set input sample $x_i$ after $t$ iterations, $\hat{y}_i^{(t-1)}$ is the predicted value given by the previous $t-1$ trees, and $f_t(x_i)$ is the prediction model of the t-th tree;
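the objective function $Obj^{(t)}$ is written as shown in formula (7):
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right) + C \tag{7}$$
where $C$ is a constant term, $\Omega(f_t)$ is the regularization term, $t$ is the number of trees generated, and $Obj^{(t)}$ is the objective function after the t-th tree is generated.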
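(2) According to the Taylor formula, the loss function is expanded as in formula (8):
$$l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) \approx l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \tag{8}$$
where $g_i$ is the first derivative and $h_i$ is the second derivative of the loss function with respect to $\hat{y}_i^{(t-1)}$;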
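(3) According to formulas (7) and (8), the objective function is written as shown in formula (9):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \right] + \Omega\left(f_t\right) + C \tag{9}$$
where $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is a constant, so the objective function is further simplified as in formula (10):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \right] + \Omega\left(f_t\right) \tag{10}$$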
(4) Based on the objective function of formula (10), when the training set is input, only the first derivative $g_i$ and the second derivative $h_i$ of the loss function need to be calculated at each step; the objective function is then optimized to obtain $f_t(x_i)$ for each step, and the overall model, namely the comprehensive energy station construction carbon emission prediction model, is obtained according to formula (6);
(5) The model parameters are adjusted to obtain different comprehensive energy station construction carbon emission prediction models.
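The role of $g_i$ and $h_i$ in step (4) can be illustrated with a short sketch. The patent does not name the loss function, so the squared-error loss $l = (y - \hat{y})^2$ is an assumption here, as are the function names and the per-leaf penalty $\tfrac{1}{2}\lambda w^2$ (the usual XGBoost L2 form). For a quadratic loss the second-order expansion is exact, which the example checks numerically.

```python
def grad_hess_squared_error(y_true, y_prev):
    """g_i and h_i of the assumed loss l = (y - y_hat)^2, taken at y_prev."""
    g = [2.0 * (p - y) for y, p in zip(y_true, y_prev)]
    h = [2.0 for _ in y_true]
    return g, h


def approx_objective(g, h, f_t, omega=0.0):
    """Second-order objective: sum(g_i*f_t(x_i) + 0.5*h_i*f_t(x_i)^2) + Omega."""
    return sum(gi * fi + 0.5 * hi * fi * fi for gi, hi, fi in zip(g, h, f_t)) + omega


def optimal_leaf_weight(g, h, reg_lambda=1.0):
    """Weight w minimizing sum(g_i*w + 0.5*h_i*w^2) + 0.5*reg_lambda*w^2
    for the samples falling in one leaf (assumed L2 penalty form)."""
    return -sum(g) / (sum(h) + reg_lambda)


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0]
    y_prev = [0.0, 0.0, 0.0]   # prediction of the previous t-1 trees
    f_t = [0.5, 0.5, 0.5]      # candidate output of the t-th tree
    g, h = grad_hess_squared_error(y_true, y_prev)

    loss_before = sum((y - p) ** 2 for y, p in zip(y_true, y_prev))
    loss_after = sum((y - (p + f)) ** 2 for y, p, f in zip(y_true, y_prev, f_t))
    # The change in loss equals the second-order objective (omega = 0).
    print(loss_after - loss_before, approx_objective(g, h, f_t))
    print(optimal_leaf_weight(g, h, reg_lambda=0.0))
```

With no regularization the optimal single-leaf weight reduces to the mean residual, which is why increasing the L2 parameter shrinks each tree's contribution.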
The parameters of the machine learning model are adjusted, and the best-performing model is saved. The invention builds the model and adjusts its parameters using XGBRegressor from a machine learning package in Python; the specific parameters include:
(1) booster selects the model used in each iteration; there are generally two options: gbtree and gblinear.
(2) learning_rate is the learning rate; the smaller the parameter, the slower the computation; the larger the parameter, the more likely training fails to converge.
(3) max_depth is the maximum depth of each tree, with range [0, +∞); the larger the parameter, the more specific and localized the samples the model learns, and the more likely overfitting becomes.
(4) n_estimators is the number of trees; the larger the number, the better the model performs, but beyond a certain point the improvement is limited and the speed of the algorithm suffers.
(5) colsample_bytree is the column sampling rate, i.e., the feature sampling rate, with range (0, 1]; as in a random forest, the features used to build each tree are sampled by column.
(6) min_child_weight is the minimum sum of weights in each leaf, with range [0, +∞); the larger the parameter, the more conservative the algorithm and the less likely overfitting becomes.
(7) lambda is the L2 regularization parameter, which controls the regularization part of the model, with range [0, +∞); the larger the parameter, the less likely overfitting becomes.
(8) gamma is the loss threshold, which controls the number of leaves by specifying the minimum loss reduction required to split a node, with range [0, +∞); the larger the parameter, the more conservative the algorithm and the less likely overfitting becomes.
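The ranges listed above can be checked before training with a small helper. This is a sketch only: `check_params`, `RANGES`, and `BOOSTERS` are hypothetical names, the example values are illustrative, and the booster set is limited to the two options named in the text.

```python
import math

# (low, high, low_inclusive, high_inclusive) for the ranges stated in the text.
RANGES = {
    "max_depth": (0, math.inf, True, False),
    "colsample_bytree": (0, 1, False, True),
    "min_child_weight": (0, math.inf, True, False),
    "lambda": (0, math.inf, True, False),
    "gamma": (0, math.inf, True, False),
}
# Only the two options named in the text; other boosters are not covered here.
BOOSTERS = {"gbtree", "gblinear"}


def check_params(params):
    """Return a list of problems; an empty list means every stated range holds."""
    problems = []
    if "booster" in params and params["booster"] not in BOOSTERS:
        problems.append(f"booster={params['booster']!r} is not gbtree or gblinear")
    for name, (low, high, low_inc, high_inc) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        ok_low = value >= low if low_inc else value > low
        ok_high = value <= high if high_inc else value < high
        if not (ok_low and ok_high):
            problems.append(f"{name}={value} is outside the stated range")
    return problems


if __name__ == "__main__":
    example = {"booster": "gbtree", "max_depth": 6, "colsample_bytree": 0.8,
               "min_child_weight": 1, "lambda": 1.0, "gamma": 0.0}
    print(check_params(example))                                   # no problems
    print(check_params({"colsample_bytree": 0.0, "gamma": -1.0}))  # two problems
```

When such a dictionary is passed to the scikit-learn style `XGBRegressor` constructor, the L2 term is named `reg_lambda` rather than `lambda`, since `lambda` is a reserved word in Python.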
The model training results are then evaluated: the test set is input into the comprehensive energy station construction carbon emission prediction model for verification, the prediction results are compared with the actual results, and, after a model passes the evaluation, the optimal comprehensive energy station construction carbon emission prediction model is determined according to the evaluation results. Model evaluation follows formulas (11)-(14):
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2} \tag{11}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{12}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}} \tag{13}$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{14}$$
where $y_i$ is the actual value of the i-th sample, $\hat{y}_i$ is the predicted value of the i-th sample, $\bar{y}$ is the mean of the actual sample values, and $n$ is the number of samples. In the present invention, when $R^2 \geq 0.8$ and $MAPE \leq 20\%$, the accuracy of the model is considered acceptable and the model is saved. Among all saved models, the optimal model is selected by comparing the MSE, MAE, $R^2$, and MAPE metrics: subject to $R^2 \geq 0.9$ and $MAPE \leq 10\%$, the model with the minimum MAE and MSE is taken as the optimal model. The invention saves the optimal prediction model based on these four evaluation metrics; its parameters are shown in Table 5 below.
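The four metrics and the two-stage selection rule can be sketched in plain Python as follows. The function names are hypothetical, MAPE is expressed in percent, and ranking candidates by MAE first and MSE second is an assumption, since the text does not say how the two minima are combined.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R2 and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return {
        "MSE": ss_res / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 / n * sum(abs(e / yt) for e, yt in zip(errors, y_true)),
    }


def is_acceptable(m):
    """Save threshold from the text: R2 >= 0.8 and MAPE <= 20%."""
    return m["R2"] >= 0.8 and m["MAPE"] <= 20.0


def select_optimal(saved):
    """Pick the best model name among those with R2 >= 0.9 and MAPE <= 10%.

    `saved` maps a model name to its metrics dict. Ordering by (MAE, MSE)
    is an assumed tie-breaking rule. Returns None if no model qualifies.
    """
    candidates = {k: v for k, v in saved.items()
                  if v["R2"] >= 0.9 and v["MAPE"] <= 10.0}
    if not candidates:
        return None
    return min(candidates, key=lambda k: (candidates[k]["MAE"], candidates[k]["MSE"]))


if __name__ == "__main__":
    m = regression_metrics([1.0, 2.0, 4.0], [1.1, 1.9, 4.2])
    print(m, is_acceptable(m))
```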
Table 5 Parameter selection for the optimal machine learning model
| Parameter name | Optimal parameter value | Meaning |
| --- | --- | --- |
| booster | gbtree | Model used in each iteration |
| learning_rate | 0.001 | Learning rate |
| max_depth | 8 | Maximum depth of each tree |
| n_estimators | 5000 | Number of trees |
| colsample_bytree | 0.6 | Column sampling rate |
| min_child_weight | 0 | Minimum sum of weights in each leaf |
| lambda | 1 | L2 regularization parameter |
| gamma | 0.0001 | Loss threshold |
The evaluation metrics of the optimal machine learning model are shown in Table 6:
Table 6 Performance of the optimal machine learning model
| Data set | MSE | MAE | R² | MAPE (%) |
| --- | --- | --- | --- | --- |
| Training | 0.003363 | 0.041492 | 0.993361 | 1.82042 |
| Testing | 0.00992 | 0.086771 | 0.973252 | 5.80986 |
Step 4, in actual application, the prediction index data are input into the prediction model, and the model outputs the comprehensive energy station construction carbon emission value; the prediction results are shown in Table 7. The machine-learning-based comprehensive energy station construction carbon emission prediction method proposed here achieves high accuracy: the maximum error is within 10% and the mean absolute percentage error is 5.8%, so the method has high practical and predictive value.
TABLE 7 comprehensive energy station construction carbon emission prediction Effect
Figure BDA0004083977830000131
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention.

Claims (10)

1. The comprehensive energy station construction carbon emission prediction method based on machine learning is characterized by comprising the following steps of:
step 1, collecting a comprehensive energy station construction carbon emission prediction index system and identifying carbon emission sources;
step 2, preprocessing the comprehensive energy station construction carbon emission prediction index data and the carbon emission source data;
step 3, constructing a comprehensive energy station construction carbon emission prediction model based on a machine learning algorithm, and training the model using the data from step 2;
step 4, inputting prediction index data into the comprehensive energy station construction carbon emission prediction model, the model outputting a comprehensive energy station construction carbon emission value, thereby predicting the construction carbon emission of a newly built comprehensive energy station.
2. The machine learning-based comprehensive energy station construction carbon emission prediction method according to claim 1, characterized in that in step 1, the collected comprehensive energy station construction carbon emission prediction index system includes the number of floors of the comprehensive energy station, the foundation burial depth, the total building area, the foundation earthwork volume, the concrete design volume, and the steel bar design weight; the carbon emission sources include the steel consumption, concrete consumption, electric power consumption, and water consumption required for the construction of the comprehensive energy station.
3. The machine learning-based comprehensive energy station construction carbon emission prediction method according to claim 1, wherein in step 2, the specific method for preprocessing comprehensive energy station construction carbon emission prediction index data and carbon emission source data is as follows:
step 2-1, converting carbon emission source data consumed by the construction of the comprehensive energy station into carbon emission source data of a unit building area;
step 2-2, analyzing carbon emission source data of unit building area by using a box line graph, and identifying missing or abnormal data as data to be repaired;
step 2-3, replacing the missing value or the abnormal value of the carbon emission source data of the unit building area by using a K-nearest neighbor algorithm;
step 2-4, calculating the construction carbon emission value of the integrated energy station of the unit building area by using the replaced carbon emission source data of the unit building area and combining the carbon emission factors of each carbon emission source, and taking the construction carbon emission value as an output variable of a construction carbon emission prediction model of the integrated energy station;
step 2-5, processing each type of comprehensive energy station construction carbon emission prediction index separately using the Min-Max normalization method, and taking the normalized indices as input variables of the comprehensive energy station construction carbon emission prediction model.
4. The machine learning-based comprehensive energy station construction carbon emission prediction method according to claim 3, wherein in step 2-1, the comprehensive energy station construction carbon emission source data are converted into carbon emission source data per unit building area through formula (1):
$$Y_i = \frac{M_i}{S} \tag{1}$$
wherein:
$Y_i$ is the consumption per unit building area of the i-th carbon emission source, i being the type of carbon emission source;
$M_i$ is the total amount of the i-th carbon emission source;
$S$ is the total building area of the comprehensive energy station.
5. A machine learning based comprehensive energy station construction carbon emission prediction method according to claim 3, wherein the method in step 2-2 is as follows: the 25% quantile $Q_1$ and the 75% quantile $Q_2$ of all samples are taken, and the difference between $Q_2$ and $Q_1$ is the box length IQR; when the value of a carbon emission source per unit building area is smaller than $Q_1 - 1.5\,IQR$ or greater than $Q_2 + 1.5\,IQR$, the sample is identified as abnormal.
6. The machine learning based comprehensive energy station construction carbon emission prediction method according to claim 3 or 5, wherein the method in step 2-3 is as follows: calculating Euclidean distance between the predicted index of the normal project and the predicted index of the project to be repaired by using a K-nearest neighbor algorithm, and selecting the average value of the carbon emission source data of the unit building area of the normal project with the K nearest Euclidean distances to replace the missing value or the abnormal value in the carbon emission source data of the unit building area.
7. A machine learning based comprehensive energy station construction carbon emission prediction method according to claim 3, wherein the method in step 2-4 is as follows: the comprehensive energy station construction carbon emission per unit building area is calculated as the cumulative sum of the products of the carbon emission source data per unit building area and the corresponding carbon emission factors, and is used as the output variable of the comprehensive energy station construction carbon emission prediction model; the calculation model is shown in the formula:
$$C = \sum_{i=1}^{4} Y_i F_i$$
wherein:
$C$ is the carbon emission per unit building area in the construction process of the comprehensive energy station;
$Y_i$ is the consumption per unit building area of the i-th carbon emission source;
$F_i$ is the carbon emission factor of the i-th carbon emission source, i being the carbon emission source type, of which there are 4 in total;
the output variable data set $L = \{C_i\}$ is obtained by calculation.
8. A machine learning based comprehensive energy station construction carbon emission prediction method according to claim 3, wherein the method of step 2-5 is as follows: the normalized carbon emission prediction indices of each type of comprehensive energy station construction are used as input variables of the comprehensive energy station construction carbon emission prediction model to form an input variable set $F = \{X'_1, X'_2, X'_3, X'_4, X'_5, X'_6\}$, wherein $X'_j = \{x'_{1j}, x'_{2j}, x'_{3j}, x'_{4j}, x'_{5j}, \ldots, x'_{mj}\}$ represents the j-th prediction index data set after normalization and consists of the normalized values of the j-th prediction index of the m items;
$$x'_{ij} = \frac{x_{ij} - x_{j,\min}}{x_{j,\max} - x_{j,\min}}$$
wherein:
$x'_{ij}$ represents the value of the i-th item on the j-th prediction index after normalization;
$x_{j,\max}$ and $x_{j,\min}$ respectively represent the maximum value and the minimum value of the j-th prediction index;
$x_{ij}$ represents the original value of the i-th item on the j-th prediction index;
j represents the category of the prediction index, i represents different items, and n represents the number of items.
9. The machine learning-based comprehensive energy station construction carbon emission prediction method according to claim 1, wherein in step 3, a machine learning algorithm-based comprehensive energy station construction carbon emission prediction model is constructed, and the model training method is as follows:
step 3-1, dividing the comprehensive energy station construction carbon emission prediction index data and the unit building area comprehensive energy station construction carbon emission data obtained through processing into a training set and a testing set;
step 3-2, dividing the training set into K subsets with equal size on the basis of the training set obtained in the step 3-1, wherein the K-1 subsets are used for model training, and the rest 1 subsets form verification subsets;
step 3-3, constructing a comprehensive energy station construction carbon emission prediction model based on the XGBoost algorithm, training the tree ensemble model used in the XGBoost algorithm in an additive manner, and tuning the model parameters on the validation subsets by K-fold cross-validation to obtain a plurality of comprehensive energy station construction carbon emission prediction models;
step 3-4, inputting the test set into the comprehensive energy station construction carbon emission prediction models for prediction, comparing the prediction results of each model with the actual results, and determining the optimal comprehensive energy station construction carbon emission prediction model.
10. The machine learning based comprehensive energy station construction carbon emission prediction method according to claim 9, wherein the method of step 3-3 is as follows:
(1) taking the CART classification tree as the base learner, the loss function, the objective function, and the XGBoost prediction model are set as shown in formulas (4)-(7), respectively:
setting the loss function:
$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) \tag{4}$$
wherein $y_i$ and $\hat{y}_i$ respectively represent the actual value and the predicted value of the comprehensive energy station construction carbon emission, $n$ is the number of samples, and $l(\cdot)$ is the loss function;
setting the objective function:
$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{t} \Omega\left(f_k\right) \tag{5}$$
wherein $f_t(x_i)$ is the prediction model of the t-th tree, and $\sum_{k=1}^{t} \Omega(f_k)$ sums the complexity of all $t$ trees and is added to the objective function as a regularization term to prevent the model from overfitting;
setting the XGBoost prediction model:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t\left(x_i\right) \tag{6}$$
wherein $\hat{y}_i^{(t)}$ is the predicted value for training-set input sample $x_i$ after $t$ iterations, $\hat{y}_i^{(t-1)}$ is the predicted value given by the previous $t-1$ trees, and $f_t(x_i)$ is the prediction model of the t-th tree;
the objective function $Obj^{(t)}$ is written as shown in formula (7):
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right) + C \tag{7}$$
wherein $C$ is a constant term, $\Omega(f_t)$ is the regularization term, $t$ is the number of trees generated, and $Obj^{(t)}$ is the objective function after the t-th tree is generated;
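(2) according to the Taylor formula, the loss function is expanded as in formula (8):
$$l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) \approx l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \tag{8}$$
wherein $g_i$ is the first derivative and $h_i$ is the second derivative of the loss function with respect to $\hat{y}_i^{(t-1)}$;
(3) according to formulas (7) and (8), the objective function is written as shown in formula (9):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \right] + \Omega\left(f_t\right) + C \tag{9}$$
wherein $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is a constant, so the objective function is further simplified as in formula (10):
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right) \right] + \Omega\left(f_t\right) \tag{10}$$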
(4) based on the objective function of the formula (10), when the training set is input, a first order derivative g of each step of loss function is calculated i Second order derivative h i Then optimizing the objective function to obtain f for each step t (x i ) Obtaining an overall model according to a formula (6), namely building a carbon emission prediction model for the comprehensive energy station;
(5) and adjusting model parameters to obtain different comprehensive energy station construction carbon emission prediction models.
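The preprocessing recited in claims 4 to 8 (conversion to unit building area, box-plot outlier bounds, K-nearest-neighbor replacement, emission calculation, and Min-Max normalization) can be sketched with the Python standard library. All function names are hypothetical, the inclusive quantile method is an assumption because the claims do not specify how quantiles are interpolated, and the numbers in the example are made up for illustration.

```python
import math
import statistics


def per_unit_area(total, area):
    """Consumption per unit building area: Y_i = M_i / S."""
    return total / area


def iqr_bounds(values):
    """Lower and upper box-plot limits: Q1 - 1.5*IQR and Q2 + 1.5*IQR,
    where Q1 and Q2 are the 25% and 75% quantiles (inclusive method assumed)."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - 1.5 * iqr, q75 + 1.5 * iqr


def knn_replace(target_features, normal_features, normal_values, k):
    """Mean value of the k normal projects whose prediction indices are
    closest (Euclidean distance) to those of the project to be repaired."""
    ranked = sorted(zip(normal_features, normal_values),
                    key=lambda fv: math.dist(target_features, fv[0]))
    nearest = [value for _, value in ranked[:k]]
    return sum(nearest) / len(nearest)


def carbon_emission(consumptions, factors):
    """Emission per unit building area: C = sum(Y_i * F_i)."""
    return sum(y * f for y, f in zip(consumptions, factors))


def min_max(column):
    """Min-Max normalization of one prediction index over all projects."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - low) / (high - low) for x in column]


if __name__ == "__main__":
    low, high = iqr_bounds([1, 2, 3, 4, 100])
    print(low, high)                 # values outside these limits are abnormal
    print(knn_replace([0, 0], [[0, 1], [0, 2], [5, 5]], [10, 20, 100], k=2))
    print(carbon_emission([1, 2], [3, 4]))
    print(min_max([2, 4, 6]))
```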
CN202310130789.1A 2023-02-17 2023-02-17 Comprehensive energy station construction carbon emission prediction method based on machine learning Pending CN116362376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310130789.1A CN116362376A (en) 2023-02-17 2023-02-17 Comprehensive energy station construction carbon emission prediction method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310130789.1A CN116362376A (en) 2023-02-17 2023-02-17 Comprehensive energy station construction carbon emission prediction method based on machine learning

Publications (1)

Publication Number Publication Date
CN116362376A true CN116362376A (en) 2023-06-30

Family

ID=86931255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310130789.1A Pending CN116362376A (en) 2023-02-17 2023-02-17 Comprehensive energy station construction carbon emission prediction method based on machine learning

Country Status (1)

Country Link
CN (1) CN116362376A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251816A (en) * 2023-10-26 2023-12-19 南方电网能源发展研究院有限责任公司 Verification method and device for carbon emission data, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
Cao et al. Hybrid ensemble deep learning for deterministic and probabilistic low-voltage load forecasting
US20230196076A1 (en) Method for optimally selecting flood-control operation scheme based on temporal convolutional network
CN108921339B (en) Quantile regression-based photovoltaic power interval prediction method for genetic support vector machine
CN111915092B (en) Ultra-short-term wind power prediction method based on long-short-term memory neural network
CN111260117B (en) CA-NARX water quality prediction method based on meteorological factors
CN112990500B (en) Transformer area line loss analysis method and system based on improved weighted gray correlation analysis
CN109978253B (en) Electric power system short-term load prediction method based on incremental learning
CN112381673B (en) Park electricity utilization information analysis method and device based on digital twin
CN107909221A (en) Power-system short-term load forecasting method based on combination neural net
CN112884012A (en) Building energy consumption prediction method based on support vector machine principle
CN116362376A (en) Comprehensive energy station construction carbon emission prediction method based on machine learning
CN115438833A (en) Short-term power load hybrid prediction method
CN114021483A (en) Ultra-short-term wind power prediction method based on time domain characteristics and XGboost
CN114971090A (en) Electric heating load prediction method, system, equipment and medium
CN115358437A (en) Power supply load prediction method based on convolutional neural network
CN113449919B (en) Power consumption prediction method and system based on feature and trend perception
CN109408896B (en) Multi-element intelligent real-time monitoring method for anaerobic sewage treatment gas production
CN112232570A (en) Forward active total electric quantity prediction method and device and readable storage medium
CN117113086A (en) Energy storage unit load prediction method, system, electronic equipment and medium
CN112149896A (en) Attention mechanism-based mechanical equipment multi-working-condition fault prediction method
CN112014757A (en) Battery SOH estimation method integrating capacity increment analysis and genetic wavelet neural network
CN113762591B (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM countermeasure learning
CN115860212A (en) Risk prediction method and terminal for power distribution network
CN112581311B (en) Method and system for predicting long-term output fluctuation characteristics of aggregated multiple wind power plants
CN114861555A (en) Regional comprehensive energy system short-term load prediction method based on Copula theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination