CN117709524A

CN117709524A - Bayesian optimization-based carbon emission prediction method and system for steel industry

Info

Publication number: CN117709524A
Application number: CN202311697596.0A
Authority: CN
Inventors: 何海; 赵郁婷; 祝湘博; 陈昱达; 陶义; 孟令卿; 陈卉雯; 潘媛; 金佳星; 商梦洋; 李广地; 王迎春; 周博文; 谷鹏; 李子文
Original assignee: Anshan Power Supply Co Of State Grid Liaoning Electric Power Co; State Grid Corp of China SGCC
Current assignee: Anshan Power Supply Co Of State Grid Liaoning Electric Power Co; State Grid Corp of China SGCC
Priority date: 2023-12-12
Filing date: 2023-12-12
Publication date: 2024-03-15

Abstract

The invention relates to a Bayesian optimization-based method and system for predicting carbon emissions in the steel industry, belonging to the technical field of electric power energy prediction. The prediction method analyzes the relationships among characteristic variables of the steel industry using the Pearson correlation coefficient, takes into account the influencing factors that are strongly correlated with carbon emissions, applies an improved Stacking ensemble learning model to carbon emission prediction, and uses an error compensation model to correct the error of the Stacking ensemble learning model, thereby greatly improving prediction accuracy.

Description

Bayesian optimization-based carbon emission prediction method and system for steel industry

Technical Field

The invention belongs to the technical field of electric power energy prediction, and particularly relates to a Bayesian optimization-based method and system for predicting carbon emissions in the steel industry.

Background

With the advance of industrialization and urbanization, carbon emissions have become a critical environmental challenge. The steel industry is energy-intensive and emits large amounts of carbon, so its environmental impact is particularly significant. China's iron and steel enterprises are major energy consumers and carbon emitters: they account for 11% of national energy consumption and 15% of total carbon emissions. Predicting carbon emissions in the steel industry is therefore of great significance. However, iron and steel enterprises have many production procedures and complex processes, and each link contributes a different share of carbon emissions, which makes carbon emission prediction in the steel industry complex and challenging.

Emission reduction in energy-intensive enterprises has long been a focus of researchers. Some studies establish a mathematical model based on the production process of iron and steel enterprises to account for carbon emissions and evaluate the CO₂ emission reduction potential. Others calculate the carbon emissions of iron and steel enterprises in China using the IPCC method and study the relationship between carbon emissions and related variables. These two traditional carbon emission calculation methods are generally not real-time and cannot promptly reflect the carbon emission dynamics of iron and steel enterprises. In recent years, with the development of machine learning, various artificial intelligence algorithms have been applied to carbon emission prediction. Researchers have used grey prediction models, BP neural networks, and logistic regression to predict carbon emissions, with some success but limited accuracy. With the continued development of ensemble learning, ensemble learning models have further improved the accuracy of carbon emission prediction. Because ensemble learning models have many hyperparameters, tuning is often done by trial and error, so accuracy still has room for improvement. Grid search has been used to optimize the hyperparameters of ensemble learning models, but its computational cost is high, and as the number of hyperparameters increases, the search requires a large amount of computing resources and time. Particle swarm optimization has also been used to optimize the hyperparameters of ensemble learning models, which reduces the parameter search time. At present, there is little research on hyperparameter optimization for ensemble learning, and room for development remains.

Therefore, a new technical solution is needed in the prior art to solve the above-mentioned problems.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a Bayesian optimization-based method and system for predicting carbon emissions in the steel industry that overcome the limitations of existing methods. The method uses Stacking ensemble learning based on Bayesian optimization: relevant data are collected and analyzed, the input feature variables are determined using the Pearson correlation coefficient, a prediction model that comprehensively considers multiple factors is established, and the prediction results are presented in an intuitive form, providing a scientific basis for environmental protection and carbon emission reduction decisions. Applying the method can help enterprises and governments better manage carbon emission data and information and make more efficient strategic and operational decisions.

A Bayesian optimization-based method for predicting carbon emissions in the steel industry comprises the following steps:

acquiring the relevant data to be predicted, and establishing an original sample set;

correcting the abnormal data in the original sample set to obtain a corrected original sample set, computing the degree of correlation between features on the corrected original sample set using the Pearson correlation coefficient, and determining the input feature variables;

standardizing the data so that they are unified into the same scale range, and dividing the standardized sample data into k parts for cross-validation to evaluate the performance of the model;

constructing a two-layer Stacking ensemble learning model, wherein the first layer uses base learners trained on the features and target values of the sample data;

combining the new data set produced by the trained base learners with the original training set and inputting the combined data set into the meta-learner, so that the meta-learner learns the implicit relationship between the original training set and the new training set, to obtain a trained Stacking model, and substituting the test set into the trained Stacking model to obtain a predicted carbon emission value;

substituting the parameters obtained by Bayesian optimization into the Stacking ensemble learning model, training an error compensation model with the error between the predicted values and the true values of the training set as the target value, performing error compensation on the prediction result, and adding the compensation value to the predicted carbon emission value to obtain the final predicted carbon emission value.
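The error compensation step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, a plain linear regression stands in for the trained Stacking model, and a gradient-boosting regressor stands in for the error compensation model, whose type the text does not specify. Only the structure follows the description: the compensation model is trained on the training-set error, and its output is added to the prediction.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): two features and a nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = 100.0 + 60.0 * X[:, 0] + 30.0 * np.sin(6.0 * X[:, 1]) + rng.normal(0.0, 2.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Main predictor: a plain linear model standing in for the trained Stacking model.
main_model = LinearRegression().fit(X_train, y_train)

# Target of the compensation model: training-set error (true value minus prediction).
train_error = y_train - main_model.predict(X_train)
compensator = GradientBoostingRegressor(random_state=0).fit(X_train, train_error)

# Final prediction = main prediction + predicted compensation value.
base_pred = main_model.predict(X_test)
final_pred = base_pred + compensator.predict(X_test)


def rmse(y_true, y_pred):
    """Root mean square error between two arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rmse_base = rmse(y_test, base_pred)
rmse_final = rmse(y_test, final_pred)
print(f"RMSE without compensation: {rmse_base:.2f}, with compensation: {rmse_final:.2f}")
```

With a flexible main model, in-sample errors can be optimistically small; computing the training error from out-of-fold predictions is one possible variation, though the text does not state which is used.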

The relevant data to be predicted comprise the electricity consumption and the energy consumption structure of the production process, specifically historical electricity consumption data, historical fossil energy consumption data of the production process, and historical carbon emissions.

The abnormal value correction fills missing values by interpolation to ensure the completeness of the data set; the interpolation methods include linear interpolation, polynomial interpolation, piecewise interpolation, and exponential interpolation;

for noisy data, a regression model is fitted to the original data to ensure data accuracy.
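As an illustration of filling missing values, the following minimal sketch applies linear interpolation, one of the listed methods, with NumPy. The helper name `fill_missing_linear` and the sample readings are hypothetical and not taken from the source.

```python
import numpy as np


def fill_missing_linear(values):
    """Fill NaN entries by linear interpolation over the sample index."""
    y = np.asarray(values, dtype=float)
    idx = np.arange(y.size)
    known = ~np.isnan(y)
    # np.interp evaluates the piecewise-linear curve through the known points;
    # gaps at either end are filled with the nearest known value.
    return np.interp(idx, idx[known], y[known])


# Hypothetical monthly electricity consumption with two missing readings.
consumption = [120.0, np.nan, 130.0, 134.0, np.nan, 150.0]
filled = fill_missing_linear(consumption)
print(filled)
```

Each gap is replaced by the value on the straight line joining its two known neighbours; polynomial or exponential interpolation would follow the same pattern with a different fitted curve.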

In the standardization and cross-validation step, k = 5 is selected as the number of folds for K-fold cross-validation, and the sample data are divided into 10 parts used for training and validating the model, respectively.
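A minimal sketch of K-fold cross-validation with k = 5 follows, using scikit-learn. The synthetic data and the linear model are placeholders chosen for illustration only; the per-fold score is the mean absolute percentage error, the index the method uses elsewhere.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (hypothetical): 100 samples, 3 features, near-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))
y = 50.0 + X @ np.array([30.0, 20.0, 10.0]) + rng.normal(0.0, 1.0, size=100)

# Five folds: each fold serves once as the validation set, the other four train the model.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, cv=kfold,
    scoring="neg_mean_absolute_percentage_error",
)
mape_per_fold = -scores  # scikit-learn reports the negated error
print(mape_per_fold, mape_per_fold.mean())
```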

The first layer of the two-layer Stacking ensemble learning model is built with the XGBoost, LightGBM, and SVR prediction algorithms.

When the meta-learner learns the implicit relationship between the original training set and the new training set, the mean absolute percentage error is used as the index function, and the trained model is obtained after iterative training is completed.

A Bayesian optimization-based system for predicting carbon emissions in the steel industry comprises:

a data acquisition module, configured to acquire the relevant data to be predicted, including electricity consumption, production processes, and the energy consumption structure, and to collect historical electricity consumption data, historical production-related quantity data, and historical carbon emissions of the production process flow as sample data;

a data processing module, configured to correct the abnormal data in the original sample set of historical data to obtain a corrected original sample set, to obtain the degree of correlation between features using the Pearson correlation coefficient and determine the input feature variables, and to standardize the data in the prediction sample set;

a cross-validation module, configured to divide the processed sample data into a plurality of subsets, one of which serves as the validation set while the remaining subsets are used to train the model, and to repeatedly train and validate the model by cross-validation in order to evaluate the generalization ability of the model and select the optimal model parameters;

a Stacking model building module, configured to construct the Stacking ensemble learning model and to initialize the number and types of base learners and the type of meta-learner in the Stacking ensemble learning model;

a Bayesian optimization module, configured to obtain parameters through Bayesian optimization, substitute them into the Stacking ensemble learning model, and train an error compensation model with the error between the predicted values and the true values of the training set as the target value.

With the above design, the invention has the following beneficial effects:

1. The invention improves the Stacking ensemble learning model by using the original data set as part of the meta-learner's training data: the meta-learner's training set is obtained by combining the original training set with the new data set produced by the base learners, so that the meta-learner learns the implicit relationship between the original training set and the new training set, which improves the prediction performance of the model.

2. The invention is based on a Bayesian optimization algorithm, which guides parameter selection using prior information and observed data and can therefore find the optimal hyperparameters within a limited number of iterations. The ensemble learning method based on Bayesian optimization has higher computational efficiency and resource utilization, and the prediction accuracy is improved.

3. The invention can acquire data such as electricity consumption and fossil energy consumption of the production process in real time and can monitor the carbon emissions of steel enterprises in real time, which helps enterprises and governments better manage carbon emission data and information and make more efficient strategic and operational decisions.

Drawings

The invention is further described with reference to the drawings and detailed description which follow:

FIG. 1 is a flowchart of an error compensation Stacking carbon emission prediction model based on Bayesian optimization.

Fig. 2 is a modified Stacking ensemble learning model in an example of the present invention.

FIG. 3 is a flow chart of Bayesian parameter optimization in an example of the present invention.

FIG. 4 is a graph of single model predictions in an example of the present invention.

FIG. 5 is a graph of the prediction results of the ensemble learning model in an example of the present invention.

FIG. 6 is an error distribution diagram of an example of the present invention.

Detailed Description

The principles and features of the present invention are described below with reference to the drawings; the examples are given only to explain the invention and are not to be construed as limiting its scope.

The flowchart of the Bayesian optimization-based error compensation Stacking carbon emission prediction model is shown in FIG. 1. The carbon emission prediction method comprises the following steps:

S1: analyzing the production process flow of the steel industry: compiling the relevant data in the carbon emission verification reports of steel enterprises, determining the carbon emission sources, and collecting the historical electricity consumption data, historical fossil energy consumption of the production process, and historical carbon emissions of 50 steel enterprises in the Liaoning region as sample data;

S2: preprocessing the sample data, specifically comprising the following steps:

S21: during data collection, problems such as missing data, format or content errors, and logic errors may occur, so the data need to be cleaned. The specific steps are: locating the missing data; selecting a suitable interpolation method according to the data type and the pattern of missing values; calculating the interpolation coefficients; calculating the fill values for the missing data from the interpolation coefficients and the known data points; and inserting the calculated fill values into the original data to complete the filling of the missing data;

S22: obtaining the degree of correlation between each feature and the carbon emissions using the Pearson correlation coefficient, and selecting the final feature variables accordingly;

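The Pearson correlation coefficient is

$$r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

where $r_{XY}$ is the correlation coefficient, $n$ is the number of samples, $X_i$ and $Y_i$ are the observed values of the current feature attribute and of the quantity to be predicted, respectively, and $\bar{X}$ and $\bar{Y}$ are the mean of the current feature attribute and the mean of the quantity to be predicted, respectively.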
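The feature selection in S22 can be sketched as follows. The function names, the sample values, and the selection threshold of 0.8 are hypothetical; the text does not state which threshold is used to decide that a feature is strongly correlated with carbon emissions.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def select_features(features, target, threshold=0.8):
    """Keep the features whose |r| with the target reaches the (assumed) threshold."""
    return [name for name, col in features.items()
            if abs(pearson_r(col, target)) >= threshold]


# Hypothetical observations for five periods.
emission = [210.0, 250.0, 290.0, 330.0, 370.0]
features = {
    "electricity": [100.0, 120.0, 140.0, 160.0, 180.0],
    "coal": [50.0, 55.0, 70.0, 72.0, 90.0],
    "unrelated": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = select_features(features, emission)
print(selected)
```

In this toy data the electricity and coal series track the emissions closely and are kept, while the unrelated series is dropped.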

S23: standardizing the data:

$$x^* = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x^*$ is the normalized feature, $x$ is the observed feature, $x_{\min}$ is the minimum value of the feature in the data set, and $x_{\max}$ is the maximum value of the feature in the data set.
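The min-max scaling described in S23 can be written in a few lines. The function name `min_max_scale` and the handling of a constant column are illustrative choices, not part of the source.

```python
import numpy as np


def min_max_scale(column):
    """Scale a feature to [0, 1] using its minimum and maximum in the data set."""
    x = np.asarray(column, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant feature carries no scale information; map it to zeros (assumed choice).
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)


scaled = min_max_scale([200.0, 250.0, 400.0])
print(scaled)
```

When the data are later split into training and test sets, a common practice is to compute the minimum and maximum on the training set only and reuse them for the test set.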

S3: constructing a two-layer Stacking ensemble learning model, setting the base learners and the meta-learner of the Stacking ensemble learning model, and initializing the network parameters of the base learners and the meta-learner;

S4: optimizing the Stacking ensemble learning model, comprising the following steps:

S41: dividing the data samples into a training set and a test set;

S42: performing k-fold cross-validation on the training set, combining the outputs of the base learners with the corresponding original data, standardizing the combined data, and inputting them into the meta-learner for training, so that the meta-learner learns the implicit relationship between the original training set and the new training set, which improves the prediction performance of the model. The improved Stacking ensemble learning model is shown in FIG. 2.
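The idea in S42, feeding the meta-learner both the base learners' cross-validated outputs and the original features, corresponds to the `passthrough=True` option of scikit-learn's `StackingRegressor`. The sketch below is only an approximation under stated assumptions: the data are synthetic, `GradientBoostingRegressor` and `RandomForestRegressor` stand in for XGBoost and LightGBM so that the example needs only scikit-learn, a ridge regression is assumed as the meta-learner because the text does not name one, and the standardization of the combined data is omitted.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic data (hypothetical): three features, mildly nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 100.0 + 80.0 * X[:, 0] + 40.0 * X[:, 1] ** 2 + rng.normal(0.0, 2.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_learners = [
    ("gbr", GradientBoostingRegressor(random_state=0)),               # stand-in for XGBoost
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),  # stand-in for LightGBM
    ("svr", make_pipeline(MinMaxScaler(), SVR(C=100.0))),
]

# cv=5: base-learner outputs for the meta-learner come from 5-fold cross-validation.
# passthrough=True: the original features are appended to those outputs.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(alpha=1.0),  # assumed meta-learner
    cv=5,
    passthrough=True,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

mape = float(np.mean(np.abs((y_test - y_pred) / y_test)))
print(f"test MAPE: {mape:.4f}")
```

Here the meta-learner sees six inputs per sample: three base-learner predictions plus the three original features.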

S43: to evaluate the prediction performance of the models, three evaluation indexes are used, namely the mean absolute percentage error, the root mean square error, and the maximum error, and the error indexes of each carbon emission prediction model are obtained through simulation:

$$e_{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$

$$e_{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$

$$e_{MAX} = \max_{1 \le i \le n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is the observed (actual) value, $\hat{y}_i$ is the predicted value, $e_{MAPE}$, $e_{RMSE}$, and $e_{MAX}$ are the mean absolute percentage error, the root mean square error, and the maximum error, respectively, and $n$ is the number of samples.
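The three evaluation indexes of S43 can be computed as in the following sketch. It assumes the standard definitions, with the maximum error taken as the largest absolute deviation; the function name and the sample values are illustrative.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """Return MAPE (in percent), RMSE and maximum absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    return {
        "mape": float(np.mean(abs_err / np.abs(y_true)) * 100.0),
        "rmse": float(np.sqrt(np.mean(abs_err ** 2))),
        "max": float(abs_err.max()),
    }


# Hypothetical actual and predicted carbon emissions for three samples.
metrics = error_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(metrics)
```

For these values the relative errors are 10%, 5% and 0%, giving a MAPE of 5%, while the largest absolute deviation is 10.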

To improve the prediction accuracy of the model, the $e_{MAPE}$ of the model output is taken as the optimization target, as shown in Table 1. After iterative training is completed, the trained model is obtained, and the test set is substituted into the model to obtain the predicted carbon emission value.

TABLE 1 Single model prediction results

S5: performing error compensation on the prediction result. Parameters are obtained through Bayesian optimization and substituted into the Stacking ensemble learning model, and an error compensation model is trained with the error between the predicted values and the true values of the training set as the target value. The test set is then substituted into the error compensation model, and its output is added to the predicted values of the test set to obtain the final predicted carbon emission value. The objective of the Bayesian optimization is:

$$x_{\min} = \arg\min_{x \in X} f(x)$$

where $x_{\min}$ is the value of $x$ that minimizes $f(x)$, $f(x)$ is the objective function to be optimized, and $X$ is the search range of $x$.
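To make the search for the minimizer of f(x) concrete, the following is a minimal one-dimensional Bayesian optimization loop. It rests on assumptions not stated in the source: a Gaussian-process surrogate with a Matern kernel, the expected-improvement acquisition function, and a toy objective standing in for the validation error as a function of a single hyperparameter. In practice f(x) would be the cross-validated error of the model trained with hyperparameters x.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Hypothetical stand-in for the validation error at hyperparameter value x."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(15.0 * x)


def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement over the best observed value (minimization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    improvement = best_y - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_minimize(f, low, high, n_init=4, n_iter=20, seed=0):
    """Minimize f on [low, high] with a GP surrogate and expected improvement."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n_init, 1))      # initial random evaluations
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(low, high, 501).reshape(-1, 1)  # candidate points
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(X, y)                                   # update the surrogate
        ei = expected_improvement(grid, gp, y.min())
        x_next = grid[np.argmax(ei)]                   # most promising candidate
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))
    best = int(np.argmin(y))
    return float(X[best, 0]), float(y[best])


x_min, f_min = bayes_minimize(objective, 0.0, 1.0)
print(x_min, f_min)
```

Unlike grid search, each new evaluation point is chosen from the surrogate's current belief about f(x), so the evaluation budget concentrates on promising regions of the search range.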

To verify the prediction performance of the Bayesian optimization-based Stacking ensemble learning model, the prediction results of the single models were first examined, as shown in FIG. 4; their errors are listed in Table 2.

TABLE 2 prediction model error index for carbon emission model

The results show that the single models have large errors because SVR, LightGBM, and XGBoost each have limited ability to learn from the data. As can be seen from Table 2, XGBoost has the best overall prediction performance: its $e_{MAPE}$ is 29.12% lower and its $e_{RMSE}$ is 17.63% lower than those of SVR, but its maximum error is still large and its predictions deviate considerably from the actual carbon emission curve. The carbon emission results of ensemble learning are shown in FIG. 5. The Stacking ensemble learning model can fully combine the advantages of the individual models, compensate for the shortcomings of a single model, and improve prediction accuracy, greatly reducing all three error indexes. However, because its parameters are tuned by grid search, the parameter search space is limited and the optimal parameters cannot be found.

To further improve the prediction accuracy of the model, the invention improves the Stacking ensemble learning model on the basis of the traditional Stacking prediction model and introduces an error compensation model. This greatly improves prediction accuracy and markedly reduces the prediction error, particularly at carbon emission peaks. As shown in the error distribution in FIG. 6, $e_{MAX}$ is reduced by 27.22% compared with the traditional Stacking ensemble learning model, which effectively improves the robustness of the prediction.

The above description is only of the preferred embodiments of the present invention, and is not intended to limit the present invention in any way; those skilled in the art will readily appreciate that the present invention may be implemented as shown in the drawings and described above; however, those skilled in the art will appreciate that many modifications, adaptations, and variations of the present invention are possible in light of the above teachings without departing from the scope of the invention; meanwhile, any equivalent changes, modifications and evolution of the above embodiments according to the essential technology of the present invention still fall within the scope of the present invention.

Claims

1. A Bayesian optimization-based method for predicting carbon emissions in the steel industry, characterized by comprising the following steps:

standardizing the measurement data of iron and steel enterprises so that they are unified into the same scale range, dividing the standardized sample data into k parts, and performing cross-validation to evaluate the performance of the model;

using a Bayesian optimization algorithm to find the optimal parameters of the neural network, including the number of iterations, the learning rate, the activation function, and the number of hidden-layer neurons of the neural network; substituting the parameters obtained by Bayesian optimization into the Stacking ensemble learning model, training an error compensation model with the error between the predicted values and the true values of the training set as the target value, performing error compensation on the prediction result, and adding the compensation value to the predicted carbon emission value to obtain the final predicted carbon emission value.

2. The Bayesian optimization-based method for predicting carbon emissions in the steel industry according to claim 1, characterized in that: the relevant data to be predicted comprise the electricity consumption and the energy consumption structure of the production process, specifically historical electricity consumption data, historical fossil energy consumption data of the production process, and historical carbon emissions.

3. The Bayesian optimization-based method for predicting carbon emissions in the steel industry according to claim 1, characterized in that: the abnormal value correction fills missing values by interpolation to ensure the completeness of the data set; the interpolation methods include linear interpolation, polynomial interpolation, piecewise interpolation, and exponential interpolation.

4. The Bayesian optimization-based method for predicting carbon emissions in the steel industry according to claim 1, characterized in that: in the standardization and cross-validation step, k = 5 is selected as the number of folds for K-fold cross-validation, and the sample data are divided into 10 parts used for training and validating the model, respectively.

5. The Bayesian optimization-based method for predicting carbon emissions in the steel industry according to claim 1, characterized in that: the first layer of the two-layer Stacking ensemble learning model is built with the XGBoost, LightGBM, and SVR prediction algorithms.

6. The Bayesian optimization-based method for predicting carbon emissions in the steel industry according to claim 1, characterized in that: when the meta-learner learns the implicit relationship between the original training set and the new training set, the mean absolute percentage error is used as the index function, and the trained model is obtained after iterative training is completed.

7. A Bayesian optimization-based system for predicting carbon emissions in the steel industry, characterized by comprising a data acquisition module, configured to acquire the relevant data to be predicted, including electricity consumption, production processes, and the energy consumption structure, and to collect historical electricity consumption data, historical production-related quantity data, and historical carbon emissions of the production process flow as sample data;