CN115860173A

CN115860173A - Construction and prediction method and medium of carbon emission prediction model based on Stacking algorithm

Info

Publication number: CN115860173A
Application number: CN202211294294.4A
Authority: CN
Inventors: 朱亮亮; 徐骁; 徐辰冠; 夏凡; 赖国书; 郑佩祥; 蔡雨晴; 陈吴晓; 胡泽延; 宋微浪
Original assignee: Wuhan Energy Efficiency Evaluation Co Ltd Of State Grid Electric Power Research Institute; State Grid Corp of China SGCC; State Grid Fujian Electric Power Co Ltd; State Grid Electric Power Research Institute
Current assignee: Wuhan Energy Efficiency Evaluation Co Ltd Of State Grid Electric Power Research Institute; State Grid Corp of China SGCC; State Grid Fujian Electric Power Co Ltd; State Grid Electric Power Research Institute
Priority date: 2022-10-21
Filing date: 2022-10-21
Publication date: 2023-03-28

Abstract

The invention relates to a method and a medium for constructing a carbon emission prediction model based on a Stacking algorithm and for predicting with it. The method comprises the following steps: acquiring power data and the corresponding carbon emission data samples to form a data set; preprocessing the data set and dividing it into a training set and a test set; analyzing the features that influence carbon emission with the XGBoost algorithm to obtain target features; constructing a carbon emission prediction model comprising a meta-model and a plurality of base models; fusing the meta-model and the base models with the Stacking algorithm and training the model on the training set and the target features; and adjusting the weights with which each base model's prediction is fed into the meta-model according to the error proportion of each base model's carbon emission predictions, so as to obtain an optimized carbon emission prediction model. In the invention, the algorithm models that differ most from one another are selected as the base models of the Stacking ensemble, and weights are assigned according to their prediction error ratios, so that the strengths of the different algorithms are fully used and prediction accuracy is improved.

Description

Construction and prediction method and medium of carbon emission prediction model based on Stacking algorithm

Technical Field

The invention belongs to the field of artificial intelligence applications, specifically the technical field of carbon emission prediction, and more particularly relates to a method and a medium for constructing a carbon emission prediction model based on a Stacking algorithm and for predicting with it.

Background

Climate warming caused by carbon emissions is a common challenge for countries around the world, and major goals have been set, such as building a green, low-carbon and sustainable development system, controlling total energy consumption, improving energy utilization efficiency and raising the share of renewable energy consumption. It is therefore necessary to study the energy consumption characteristics of key energy-using industries, track the carbon emissions and their evolution in key industries and enterprises, and carry out targeted work such as identifying carbon reduction potential and providing carbon reduction services, thereby supporting energy saving, carbon reduction and more efficient resource use in key industries and enterprises. At present, this work mainly faces the following problems:

1) Enterprise energy consumption and carbon emission data are difficult to obtain accurately, comprehensively and in real time. Only a few key energy-using enterprises have online energy consumption monitoring platforms; these are costly to operate and maintain, lack carbon emission management functions, and cannot predict or give early warning of the enterprise's carbon emissions. Moreover, many emission-controlled enterprises above designated size have no energy consumption monitoring system at all. As a result, government regulators and enterprises find it difficult to obtain accurate, comprehensive, real-time energy consumption and carbon emission data, and the available data come mainly from annual carbon verification reports, statistical yearbooks and similar sources.

2) Key energy-using enterprises lack carbon asset management functions and cannot fully exploit their carbon reduction potential. Their understanding of and expertise in carbon emission accounting, carbon asset management and carbon market trading remain limited, so their carbon reduction potential cannot be effectively tapped and emission reduction requirements cannot be met in the most economical and effective way.

Therefore, building a scientific carbon emission model that can predict future carbon emissions is of great significance for energy conservation, emission reduction and sustainable development in China.

With the development of artificial intelligence, prediction models can be built by analyzing the factors that influence carbon emission, improving prediction accuracy. For example, Tong Xin et al. used the grey model GM(1,1) to predict China's carbon emissions from 2012 to 2020, and the results indicated that China faces great emission reduction pressure. Another study used an improved BP neural network model to predict China's carbon emission peak under 8 development scenarios and found that China can reach its 2030 carbon peak target under the economic slowdown scenario, the energy-saving scenario and others. Wanke et al. used a whale optimization algorithm to predict China's carbon emission amount and carbon emission intensity over 2019-2040, and the results accurately reflected China's future carbon emission trend. Du Qiang et al. built a Logistic model of carbon emission growth and, by comparing it with the original carbon dioxide emission data of different provinces from 2011 to 2020, verified the accuracy and high reliability of the Logistic model for CO₂ emission prediction. However, these common methods also have problems: although they can predict carbon emissions, a single model clearly lacks a comprehensive treatment of multiple models and mechanisms.

In recent years, combined carbon emission prediction methods have also become popular in China. For example, Zhang Feng et al. combined a traditional grey prediction model, a system cloud grey prediction model and a Verhulst model; the combined grey model, which merges the advantages of the single models, was used to predict the CO₂ emissions of the construction industry and other industries in Shandong province between 2013 and 2017, and the results showed that the combined model is more accurate than any single model. Gena et al. proposed a Prophet-LSTM time-series fusion prediction model in 2019 and verified through comparative experiments that model fusion achieves higher accuracy in time-series prediction. These combined methods predict better than single models, but there is still room to improve their accuracy.

Disclosure of Invention

The invention aims to provide a construction and prediction method of a carbon emission prediction model based on a Stacking algorithm.

The technical solution is as follows: a method for constructing a carbon emission prediction model based on a Stacking algorithm comprises the following steps:

S1, acquiring power data and the carbon emission data samples corresponding to the power data to form a data set;

S2, preprocessing the data set, and dividing the preprocessed data set into a training set and a test set according to a preset proportion; the training set and test set preset ratio may be 2;

S3, analyzing the features influencing carbon emission with the XGBoost algorithm, performing feature selection, and removing redundant features to obtain target features;

S4, constructing a carbon emission prediction model, wherein the constructed model comprises a meta-model and a plurality of base models;

S5, fusing the meta-model and the plurality of base models with the Stacking algorithm, training the constructed carbon emission prediction model based on the training set and the target features, and optimizing the hyper-parameters of the meta-model and of each base model by grid search;

S6, based on the test set, adjusting the weights with which each base model's predictions are input into the meta-model according to the error proportion of each base model's carbon emission predictions, to obtain the trained carbon emission prediction model.

Preferably, the above method further comprises S7:

S7, evaluating the accuracy of the carbon emission prediction model: inputting the test set for prediction, comparing the predictions with the corresponding carbon emission data samples, and analyzing the prediction results.

Preferably, in S2, the preprocessing of the data set includes:

removing abnormal values from the data set and filling missing values in the data set; and normalizing the data set after the missing values are filled.

Because the power data are obtained through a related model, they may contain missing values and abnormal values in special situations or as a result of human error. Therefore, before carbon emission prediction, abnormal values are removed from the power data and missing values are filled. The invention mainly uses two statistics to fill missing values: the mean and the most frequent value (mode).

If the values of one feature are several orders of magnitude larger than those of the other features, that feature will dominate the objective function of the machine learning model and prevent it from learning from the other features. Normalization scales every feature of the data set to the range [0, 1], so that all features have the same order of magnitude and the model is not biased toward any of them during training. Without normalization, training would be biased toward the features with larger magnitudes.

The formula for the normalization process is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x is the initial data, x' is the processed data, x_min denotes the minimum value in the initial data, and x_max denotes the maximum value in the initial data.
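A minimal Python sketch of the min-max formula, applied column by column to a feature matrix held as a list of rows. The function name `min_max_normalize` and the sample values are illustrative only, and mapping a constant column to 0.0 (to avoid division by zero) is an assumption the text does not address.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every feature (column) to [0, 1] with x' = (x - x_min) / (x_max - x_min)."""
    n_cols = len(rows[0])
    mins = [min(row[j] for row in rows) for j in range(n_cols)]
    maxs = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        new_row = []
        for j, x in enumerate(row):
            span = maxs[j] - mins[j]
            # Assumption: a constant column carries no information, so map it to 0.0.
            new_row.append((x - mins[j]) / span if span else 0.0)
        scaled.append(new_row)
    return scaled


# Illustrative values: electricity use in kWh next to a 0/1 flag on a very different scale.
data = [[1200.0, 0.0], [3400.0, 1.0], [2300.0, 1.0]]
print(min_max_normalize(data))  # -> [[0.0, 0.0], [1.0, 1.0], [0.5, 1.0]]
```

In practice x_min and x_max would usually be taken from the training set and reused on the test set so that no test information leaks into training; the text does not state which convention it follows.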

Preferably, S3 includes the following steps:

S31, inputting the features influencing carbon emission into the XGBoost algorithm to obtain the gains of the boosted decision trees during training;

S32, scoring the input features by the XGBoost algorithm according to the gains of the boosted decision trees, and selecting the highest-scoring features as the target features.

During training, the XGBoost algorithm can score the input features according to the split gains of its trees, so that redundant features of low importance are removed and prediction becomes more accurate and efficient.
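The selection step can be sketched in plain Python once per-feature gain scores are available (with the `xgboost` package they can be read from a trained booster via `Booster.get_score(importance_type="gain")`). The cut-off rule used here, keeping the smallest top-ranked set that covers 90% of the total gain, is an assumption; the text only says that high-scoring features are kept. The gain values below are made up for illustration, and `ambient_humidity` and `shift_id` are hypothetical feature names.

```python
from typing import Dict, List


def select_by_gain(gain: Dict[str, float], keep_share: float = 0.9) -> List[str]:
    """Rank features by gain and keep the smallest top set covering `keep_share` of the total gain."""
    total = sum(gain.values())
    ranked = sorted(gain.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    covered = 0.0
    for name, score in ranked:
        if covered >= keep_share * total:
            break  # the remaining low-gain features are treated as redundant
        selected.append(name)
        covered += score
    return selected


# Hypothetical gain scores for illustration only.
gain_scores = {
    "kiln_type": 41.0,
    "firing_temperature": 27.0,
    "firing_time": 15.0,
    "fuel_type": 9.0,
    "ambient_humidity": 5.0,
    "shift_id": 3.0,
}
print(select_by_gain(gain_scores))
# -> ['kiln_type', 'firing_temperature', 'firing_time', 'fuel_type']
```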

Preferably, the base models are determined by feeding power data into a plurality of candidate algorithm models to predict carbon emission, comparing the errors of their prediction results with a difference metric method, and selecting the three algorithm models that differ most from one another; the meta-model includes XGBoost.

Preferably, the candidate algorithm models are GBDT, RF, SVM, KNN and LSTM: power data are input into each of them to predict carbon emission, their prediction errors are compared with a difference metric method, and the three models that differ most are selected as the base models.

Because different algorithms observe the data from different angles and have different principles and structures, selecting algorithms that differ greatly lets each contribute its own strengths and can considerably improve prediction accuracy.
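One way to express the difference metric, assuming (as Example 1 suggests) that it is the Pearson correlation between two models' prediction errors, is sketched below. The helper names and the toy predictions are hypothetical; a coefficient near 1 means two models err in the same places, and a low coefficient means they are diverse.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def error_correlation(
    y_true: Sequence[float], pred_a: Sequence[float], pred_b: Sequence[float]
) -> float:
    """Correlate the residuals of two models: a low value means they err in different places."""
    err_a = [p - t for p, t in zip(pred_a, y_true)]
    err_b = [p - t for p, t in zip(pred_b, y_true)]
    return pearson(err_a, err_b)


y_true = [10.0, 20.0, 30.0, 40.0]
model_a = [11.0, 19.0, 31.0, 39.0]  # residuals +1, -1, +1, -1
model_b = [12.0, 18.0, 32.0, 38.0]  # same error pattern, twice as large
model_c = [11.0, 21.0, 29.0, 39.0]  # residuals +1, +1, -1, -1
print(error_correlation(y_true, model_a, model_b))  # 1.0: no diversity
print(error_correlation(y_true, model_a, model_c))  # 0.0: errors unrelated
```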

Preferably, S6 includes the following steps:

S61, inputting the prediction results of the base models into the meta-model;

S62, calculating the errors of the base models' outputs through cross-validation, and weighting the prediction results that each base model inputs into the meta-model according to the error proportion, to obtain the constructed carbon emission prediction model.

The weighted output increases the influence of high-precision predictions on the final result and reduces the influence of low-precision predictions, which improves both the accuracy and the stability of the prediction model.
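A minimal sketch of S62, under two assumptions the text does not specify: the cross-validation error of each base model is its mean absolute error on out-of-fold predictions, and weights are proportional to the inverse of that error. All names and numbers here are hypothetical.

```python
from typing import Dict, List


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def inverse_error_weights(errors: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: weight each base model by 1/error, normalised so the weights sum to 1."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_meta_features(
    oof_preds: Dict[str, List[float]], weights: Dict[str, float]
) -> List[List[float]]:
    """Scale each base model's out-of-fold predictions by its weight; one row per sample."""
    names = list(oof_preds)
    n_samples = len(oof_preds[names[0]])
    return [[weights[m] * oof_preds[m][i] for m in names] for i in range(n_samples)]


y_true = [100.0, 200.0]
oof_preds = {"SVM": [110.0, 190.0], "RF": [80.0, 220.0]}  # hypothetical out-of-fold predictions
errors = {m: mean_absolute_error(y_true, p) for m, p in oof_preds.items()}  # SVM 10.0, RF 20.0
weights = inverse_error_weights(errors)  # SVM approx 0.667, RF approx 0.333
meta_rows = weighted_meta_features(oof_preds, weights)  # input rows for the meta-model
```

The inverse-error rule is one simple choice that gives more accurate base models a larger say; other monotone rules would satisfy the text's description equally well.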

Preferably, in S7, the mean absolute percentage error (MAPE) is used as the accuracy criterion for evaluating the model.

The construction device of the carbon emission prediction model based on the Stacking algorithm comprises the following components:

the first acquisition module is used for acquiring the power data and carbon emission data samples corresponding to the power data to form a data set;

the preprocessing module is used for preprocessing the data set and dividing the preprocessed data set into a training set and a test set according to a preset proportion;

the characteristic selection module is used for analyzing characteristics influencing carbon emission through an XGboost algorithm, performing characteristic selection and removing redundant characteristics to obtain target characteristics;

the carbon emission prediction model construction module is used for constructing a carbon emission prediction model, and the constructed carbon emission prediction model comprises a meta model and a plurality of base models;

the carbon emission prediction model training module is used for fusing the meta-model and the plurality of base models through a Stacking algorithm, training the constructed carbon emission prediction model based on a training set and target characteristics, and performing parameter optimization on the meta-model and the hyper-parameters of each base model in a grid search mode;

and the carbon emission prediction model optimization module is used for adjusting the weight distribution of the prediction results input into the meta-model by each base model according to the error proportion of the carbon emission prediction results output by each base model based on the test set to obtain the trained carbon emission prediction model.

A method of carbon emission prediction, the method comprising:

acquiring power data to be predicted;

inputting the power data to be predicted into a carbon emission prediction model trained in advance, and predicting carbon emission based on the carbon emission prediction model to obtain a carbon emission prediction result;

the carbon emission prediction model is constructed by the construction method of the carbon emission prediction model based on the Stacking algorithm.

A carbon emissions prediction apparatus, the apparatus comprising:

the second acquisition module is used for acquiring the power data to be predicted;

and the prediction module is used for inputting the power data to be predicted into a carbon emission prediction model trained in advance, and predicting carbon emission based on the carbon emission prediction model to obtain a carbon emission prediction result.

Another object of the present invention is to provide a computer-readable storage medium, which stores a computer program, which when executed by a processor, implements the method for constructing a carbon emission prediction model based on a Stacking algorithm as described above.

An electronic device that can read the computer-readable storage medium as described above.

The invention has the beneficial effects that:

1. The XGBoost algorithm is used to analyze feature importance during feature engineering, so redundant features are effectively eliminated.

2. When the base models of the Stacking model are selected, the differences between them are evaluated with a difference metric method. On the one hand, a Stacking model built on base models with strong learning ability achieves better predictions. On the other hand, algorithms that differ more make full use of the strengths of different algorithms, so that each compensates for the weaknesses of the others. Compared with selecting base models randomly or from manual experience, the invention selects base models that differ more and thus obtains better predictions.

The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings. The detailed description of the present invention is given in detail by the following examples and the accompanying drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:

Fig. 1 is a flowchart of a method for constructing a carbon emission prediction model based on a Stacking algorithm in embodiment 1 of the present invention;

Fig. 2 is a flowchart of selecting the base models of the Stacking algorithm in embodiment 1 of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

The working principle of the invention is as follows: the invention mainly utilizes an improved Stacking integration algorithm, wherein the Stacking integration algorithm takes the prediction result of a base model as the input of a meta-model, and the output of the meta-model is integrated as the final prediction result through a combination strategy.
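The two-layer flow can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the base learners are toy stand-ins (a mean predictor and a from-scratch k-nearest-neighbour regressor) for the RF, SVM and KNN models, the interleaved fold assignment is an assumption, and fitting the XGBoost meta-model on the resulting rows is not shown. The key point is that each sample's meta-features come from base models that never saw that sample, which limits the overfitting the next paragraph mentions.

```python
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], float]
Learner = Callable[[List[Row], List[float]], Predictor]


def mean_learner(X: List[Row], y: List[float]) -> Predictor:
    """Toy base model: always predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda row: mean


def knn_learner(k: int) -> Learner:
    """Toy base model: k-nearest-neighbour regression with squared Euclidean distance."""
    def fit(X: List[Row], y: List[float]) -> Predictor:
        def predict(row: Row) -> float:
            ranked = sorted(
                (sum((a - b) ** 2 for a, b in zip(row, x)), target) for x, target in zip(X, y)
            )
            nearest = ranked[:k]
            return sum(target for _, target in nearest) / len(nearest)
        return predict
    return fit


def out_of_fold_meta_features(
    learners: Sequence[Learner], X: List[Row], y: List[float], n_folds: int = 5
) -> List[Row]:
    """First Stacking layer: column j holds base model j's predictions for samples it never saw."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]  # interleaved folds
        train_idx = [i for i in range(n) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for j, learner in enumerate(learners):
            predict = learner(X_train, y_train)
            for i in test_idx:
                meta[i][j] = predict(X[i])
    return meta


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
meta = out_of_fold_meta_features([mean_learner, knn_learner(1)], X, y, n_folds=3)
print(meta[0])  # -> [40.0, 20.0]
```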

The Stacking algorithm generally predicts better than a single model, but it has certain defects: the original Stacking algorithm cannot guarantee that suitable base models are selected, and it is prone to overfitting. If the models are poorly chosen, or because of factors such as the data type, the meta-model may even perform worse than the base models. The invention makes the following improvements to the Stacking algorithm: instead of choosing base models from manual experience as in the original algorithm, the differences between base models are evaluated with a difference metric method and base models that differ more are combined to improve prediction precision; and the base model outputs fed into the meta-model are weighted according to their cross-validation errors to reduce prediction error. Although this increases training time to some extent, it improves the generalization performance of the model. The improved Stacking algorithm overcomes the defects of the original algorithm and achieves better carbon emission predictions.

Example 1

As shown in Fig. 1, the ceramic industry is selected as a pilot, and a carbon emission prediction model is established from 25 carbon emission data samples obtained from ceramic enterprises in Fujian and Hubei provinces; the data were provided by the enterprises. The specific steps are as follows:

S1, obtaining power data of ceramic enterprises and the corresponding carbon emission data samples to form a data set.

S2, preprocessing the data set, specifically comprising the following steps:

S21, screening and handling abnormal values in the data set with the pandas tool;

S22, filling missing values in the data set with a univariate feature filling method that mainly uses two statistics, the mean and the most frequent value: first count how often each value of the feature occurs; if some value occurs more often than the others, fill the feature's missing values with that value, otherwise fill them with the mean;

S23, after the missing values are filled, normalizing the data set with the min-max formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

S24, dividing the processed power data set into an 80% training set and a 20% test set.
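Steps S21, S22 and S24 can be sketched with the Python standard library (the text itself uses pandas). Several choices here are assumptions, not stated in the text: outliers are detected with Tukey's 1.5 x IQR fences and turned into gaps so that S22 refills them, "a value that occurs more often" is read as "the most frequent value occurs more than once", and the 80/20 split uses a fixed-seed shuffle. All function names are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import List, Optional, Tuple


def mask_outliers(column: List[Optional[float]], factor: float = 1.5) -> List[Optional[float]]:
    """S21 (assumed rule): values outside the Tukey fences become None (treated as missing)."""
    present = [v for v in column if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v if v is None or low <= v <= high else None for v in column]


def fill_missing(column: List[Optional[float]]) -> List[float]:
    """S22: fill gaps with the most frequent value if it repeats, otherwise with the mean."""
    present = [v for v in column if v is not None]
    value, count = Counter(present).most_common(1)[0]
    fill = value if count > 1 else statistics.fmean(present)
    return [fill if v is None else v for v in column]


def train_test_split(rows: List[list], test_share: float = 0.2, seed: int = 0) -> Tuple[List[list], List[list]]:
    """S24: shuffle once with a fixed seed, then cut into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


column = [10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, None, 500.0]
print(fill_missing(mask_outliers(column)))
# -> [10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 11.0, 11.0, 11.0]
```

With the 25 samples of this embodiment, an 80/20 split gives 20 training rows and 5 test rows.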

S3, analyzing the features influencing carbon emission with the XGBoost algorithm and selecting features, specifically as follows:

The main energy sources in ceramic production are electric power, coal, natural gas and diesel. The firing process consumes the most energy, accounting for more than 60% of the comprehensive energy consumption per unit of product, and the raw material preparation and kiln firing processes together account for more than 85% of an enterprise's total electricity consumption. After feature selection, the factors with the greatest influence on energy consumption in this embodiment are the type of kiln used for firing, the firing temperature and the firing time; the type and quality of the fuel are also important factors.

S4, constructing the carbon emission prediction model, specifically comprising the following steps:

S41, inputting power data into the GBDT, RF, SVM, KNN and LSTM algorithm models to predict carbon emission;

S42, calculating the Pearson correlation coefficients between the prediction errors of the models to analyze how much the candidate base models differ; the selection process is shown in Fig. 2, and the difference analysis of the algorithm models is shown in Table 1. As Table 1 shows, the pairwise correlation coefficients among SVM, KNN and RF are the lowest, so these models differ the most; RF, KNN and SVM are therefore selected as the base models of the Stacking ensemble, and XGBoost is selected as the meta-model.

TABLE 1

         GBDT    RF      SVM     KNN     LSTM
GBDT     1       0.923   0.664   0.597   0.688
RF       0.923   1       0.512   0.557   0.587
SVM      0.664   0.512   1       0.537   0.684
KNN      0.597   0.557   0.537   1       0.712
LSTM     0.688   0.587   0.684   0.712   1
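The choice of RF, SVM and KNN can be reproduced from the coefficients in Table 1. The criterion below, choosing the three-model subset with the smallest sum of pairwise coefficients, is an assumption; the text only says the three models with the largest difference are selected. Function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Pairwise Pearson coefficients copied from Table 1 (each unordered pair once).
CORR: Dict[FrozenSet[str], float] = {
    frozenset(pair): value
    for pair, value in [
        (("GBDT", "RF"), 0.923), (("GBDT", "SVM"), 0.664), (("GBDT", "KNN"), 0.597),
        (("GBDT", "LSTM"), 0.688), (("RF", "SVM"), 0.512), (("RF", "KNN"), 0.557),
        (("RF", "LSTM"), 0.587), (("SVM", "KNN"), 0.537), (("SVM", "LSTM"), 0.684),
        (("KNN", "LSTM"), 0.712),
    ]
}
MODELS: List[str] = ["GBDT", "RF", "SVM", "KNN", "LSTM"]


def total_correlation(subset: Tuple[str, ...], corr: Dict[FrozenSet[str], float]) -> float:
    """Sum of pairwise correlations inside a candidate subset (lower means more diverse)."""
    return sum(corr[frozenset(pair)] for pair in combinations(subset, 2))


def most_diverse(
    models: List[str], corr: Dict[FrozenSet[str], float], size: int = 3
) -> Tuple[str, ...]:
    """Pick the base-model subset with the smallest total pairwise correlation."""
    return min(combinations(models, size), key=lambda subset: total_correlation(subset, corr))


print(most_diverse(MODELS, CORR))  # -> ('RF', 'SVM', 'KNN')
```

The winning subset sums to 1.606; the next best (RF, SVM, LSTM) sums to 1.783, so the choice is not sensitive to small changes in the coefficients.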

S5, fusing the base models and the meta-model with the Stacking algorithm, screening the effective data of the training set according to the target features, and training the ensemble model on the screened training set; the hyper-parameter sets of each base model and of the meta-model are optimized by grid search, and the optimal parameters of each algorithm are shown in Table 2.
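Grid search simply evaluates every combination in a parameter grid and keeps the one with the lowest validation error; scikit-learn's `GridSearchCV` does this with built-in cross-validation. The plain-Python sketch below shows the idea only. `toy_knn_error` is a hypothetical stand-in for a real cross-validated error, and the grid values are not the parameters of Table 2.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional


def grid_search(
    param_grid: Dict[str, List[Any]], cv_error: Callable[..., float]
) -> Optional[Dict[str, Any]]:
    """Evaluate every parameter combination and keep the one with the lowest error."""
    names = list(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_error = float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = cv_error(**params)
        if error < best_error:  # strict '<' keeps the first of equally good settings
            best_params, best_error = params, error
    return best_params


# Hypothetical stand-in for the cross-validated error of a KNN base model.
def toy_knn_error(n_neighbors: int, p: int) -> float:
    return (n_neighbors - 5) ** 2 + 0.5 * (p - 2) ** 2


print(grid_search({"n_neighbors": [3, 5, 7], "p": [1, 2]}, toy_knn_error))
# -> {'n_neighbors': 5, 'p': 2}
```

The cost grows as the product of the grid sizes, which is manageable for the small sample of this embodiment but is why grids are usually kept coarse.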

TABLE 2

S6, optimizing the model, specifically comprising the following steps:

S61, inputting the prediction results of the base models into the meta-model;

S62, calculating the errors of the base models' outputs through cross-validation, and weighting the prediction results that each base model inputs into the meta-model according to the error proportion, as follows:

Let the results of a certain first-layer base model on the test set be (y_1, y_2, …, y_n) and the calculated errors be (x_1, x_2, …, x_n); its weights (e_1, e_2, …, e_n) are then calculated by the following weight formula:

[formula image not available]

The output is (e_1 y_1, e_2 y_2, …, e_n y_n). The cross-validation errors of each algorithm are shown in Table 3.

According to the cross-validation performance of each model, an SVM weight of 0.5, an RF weight of 0.3 and a KNN weight of 0.2 are set.

TABLE 3

Model   Cross-validation error
RF      0.1269
SVM     0.0824
KNN     0.1916
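The text does not reproduce its weight formula. As a worked check under one assumption only, normalizing the inverse cross-validation errors ε_j of Table 3 gives values that round to the weights stated above:

```latex
\begin{aligned}
w_j &= \frac{1/\varepsilon_j}{\sum_k 1/\varepsilon_k}, \qquad
\sum_k \frac{1}{\varepsilon_k} = \frac{1}{0.1269} + \frac{1}{0.0824} + \frac{1}{0.1916}
  \approx 7.880 + 12.136 + 5.219 = 25.235 \\
w_{\mathrm{SVM}} &\approx \frac{12.136}{25.235} \approx 0.481, \qquad
w_{\mathrm{RF}} \approx \frac{7.880}{25.235} \approx 0.312, \qquad
w_{\mathrm{KNN}} \approx \frac{5.219}{25.235} \approx 0.207
\end{aligned}
```

Rounded to one decimal place these are 0.5, 0.3 and 0.2, matching the SVM, RF and KNN weights set in the embodiment.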

S7, evaluating the accuracy of the model: the test set is input for prediction, the predictions are compared with the carbon emission data samples, and the mean absolute percentage error (MAPE) is chosen as the evaluation index. MAPE is defined as follows:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{x(i) - y(i)}{x(i)} \right|$$

where x(i) and y(i) represent the actual and predicted values of carbon emissions, respectively, and n is the number of sample points;

Verified on the test set, the average MAPE of each model under 5-fold cross-validation is shown in Table 4; the fusion model reaches 1.59%, showing that the carbon emission prediction performs well.

TABLE 4

Model          MAPE (%)
RF             2.187
SVM            1.972
KNN            2.784
Fusion model   1.59

Example 2

A method of carbon emission prediction, the method comprising:

acquiring power data to be predicted;

the carbon emission prediction model is constructed by the construction method of the carbon emission prediction model based on the Stacking algorithm in the embodiment 1.

Example 3

An apparatus for constructing a carbon emission prediction model based on a Stacking algorithm, the apparatus comprising:

the first acquisition module is used for acquiring the electric power data and carbon emission data samples corresponding to the electric power data to form a data set;

Example 4

A carbon emissions prediction apparatus, the apparatus comprising:

Example 5

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of constructing a carbon emission prediction model based on a Stacking algorithm as in embodiment 1.

Example 6

An electronic device that can read a computer-readable storage medium as in embodiment 5.

The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any manner; the present invention may be readily implemented by those of ordinary skill in the art as illustrated in the accompanying drawings and described above; however, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims; meanwhile, any changes, modifications, and evolutions of the equivalent changes of the above embodiments according to the actual techniques of the present invention are still within the protection scope of the technical solution of the present invention.

Claims

1. A construction method of a carbon emission prediction model based on a Stacking algorithm is characterized by comprising the following steps:

s2, preprocessing the data set, and dividing the preprocessed data set into a training set and a test set according to a preset proportion;

2. The method for constructing the carbon emission prediction model based on the Stacking algorithm according to claim 1, further comprising S7:

3. The method for constructing the carbon emission prediction model based on the Stacking algorithm according to claim 1, wherein in the step S2, the processing of the data set comprises:

4. The method for constructing a carbon emission prediction model based on the Stacking algorithm according to claim 1, wherein S3 comprises the following steps:

5. The method for constructing the carbon emission prediction model based on the Stacking algorithm according to claim 1, wherein the base models are determined by inputting power data into a plurality of algorithm models to predict carbon emission, comparing the prediction errors of the algorithm models with a difference metric method, and selecting the three algorithm models that differ the most; the meta-model includes XGBoost.

6. The method for constructing the carbon emission prediction model based on the Stacking algorithm as claimed in claim 5, wherein the base model is determined by inputting power data into GBDT, RF, SVM, KNN and LSTM algorithm models to predict carbon emission, comparing prediction result errors of the algorithm models according to a difference metric method, and selecting three algorithm models with the largest difference.

7. The method for constructing the carbon emission prediction model based on the Stacking algorithm according to claim 1, wherein the S6 comprises the following steps:

s61, inputting the prediction result of the base model into the meta model;

s62, calculating the error of the output result of the base model through cross validation, and performing weight distribution on the prediction result input into the meta-model by the base model according to the error ratio to obtain the constructed carbon emission prediction model.

8. The method for constructing the carbon emission prediction model based on the Stacking algorithm as claimed in claim 2, wherein in the step S7, the mean absolute percentage error is adopted as the accuracy criterion for evaluating the model.

9. A method of predicting carbon emissions, the method comprising:

acquiring power data to be predicted;

the carbon emission prediction model is constructed by the construction method of the carbon emission prediction model based on the Stacking algorithm according to any one of claims 1 to 8.

10. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out a method of constructing a Stacking algorithm based carbon emissions prediction model according to any of the claims 1-8.