CN116108963A - Electric power carbon emission prediction method and equipment based on integrated learning module - Google Patents

Electric power carbon emission prediction method and equipment based on integrated learning module

Info

Publication number
CN116108963A
Authority
CN
China
Prior art keywords
electric power
carbon emission
data
learning module
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211574631.5A
Other languages
Chinese (zh)
Inventor
曾振松
林伟伟
沈豫
刘林
涂夏哲
杨丝雨
阙定飞
林文彬
林可尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fujian Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd, Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd filed Critical State Grid Fujian Electric Power Co Ltd
Priority to CN202211574631.5A priority Critical patent/CN116108963A/en
Publication of CN116108963A publication Critical patent/CN116108963A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/80 Management or planning
    • Y02P90/84 Greenhouse gas [GHG] management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention discloses an electric power carbon emission prediction method and equipment based on an integrated learning (ensemble learning) module. After historical electric power carbon emission data are acquired, correlation analysis is performed on them, so that features irrelevant to the electric power data are removed and the number of features in the data set is reduced. The data are then standardized to eliminate the influence of differing dimensions or value ranges among the data features. Finally, several groups of predicted values are obtained from different base learners, and a meta learner produces the final output. This addresses problems such as inaccurate models in electric power carbon emission prediction, reduces the workload of the staff involved, and makes the method readily extensible.

Description

Electric power carbon emission prediction method and equipment based on integrated learning module
Technical Field
The invention relates to the technical field of machine learning, in particular to an electric power carbon emission prediction method and equipment based on an integrated learning module.
Background
Climate change, characterized mainly by global warming, has received increasing attention from the international community since the 1990s. Human socioeconomic activities, such as fossil fuel use and land-use change, have raised the concentrations of CO2 and other greenhouse gases and are a major driver of global warming.
The power industry is one of the largest sources of carbon emissions worldwide; its carbon emissions account for about 41 percent of total global carbon emissions from fossil energy. In electric energy production, the required raw materials generate greenhouse gases during extraction, transport and power generation. China's electricity production has long been of the extensive type and dominated by coal-fired power, and greenhouse gas emissions from the power industry have always been a major part of greenhouse gas emissions in China and the world. Referring to fig. 1 and 2, data from the National Bureau of Statistics and the China carbon accounting database show that carbon emissions from China's power industry followed an overall increasing trend in 2002-2019. Therefore, while the power industry continues to support national economic development, studying a method for accurately predicting electric power carbon emissions, so as to adjust the power supply structure reasonably, understand the carbon emissions of the power system comprehensively, and find ways to reduce them, is of great practical significance for achieving the carbon emission peak target and developing a low-carbon economy.
Accurately predicting and controlling the carbon emission peak is an important means of reaching that peak, and prediction technology sets the emission reduction targets for carbon reduction work. At present, econometric models such as the stochastic 3E comprehensive model (stochastic energy-economy-environment comprehensive model) and the China TIMES model establish an empirical relationship between carbon emissions and factors such as economy, population and technology, and derive values under a given scenario from a regression equation. With the development of computer technology, neural network models, which imitate the behavior of animal neural networks and process information in a distributed, parallel manner, have developed rapidly; they mainly include BP, NARX and other neural network models and their improved variants, and they perform well in carbon emission prediction. System dynamics, which studies social, economic, environmental and other problems from both macroscopic and microscopic angles, has been used to analyze the carbon emissions of future power systems under a baseline scenario, a technical-progress scenario and an optimized power-structure scenario, with the conclusion that the power system has great emission reduction potential in the future. Most existing studies use prediction technology to predict the peak value and the peak time; they do not propose new schemes that optimize the prediction technology itself for the current situation, nor do they provide an emission reduction path based on control technology.
Research on the carbon emission characteristics of the power sector falls into two main categories. The first analyzes the factors influencing power-industry emissions: all influencing factors are decomposed, the main factors are identified, and the influence of factors such as economy, population, carbon price and electricity price on carbon emissions is studied quantitatively; such research rarely yields a specific control path. The second discusses the emission reduction path of the power industry from the bottom up, analyzing the influence of the power supply structure, energy saving, consumption reduction and the like on the power industry; such research usually assumes a traditional economic growth mode, and although it can quantify well the emission reduction potential of a single factor, it cannot analyze the influences on power-industry carbon emissions from multiple angles.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an electric power carbon emission prediction method and device based on an integrated learning module that analyze the influences on power-industry carbon emissions from multiple angles, so as to improve the accuracy of electric power carbon emission prediction.
In order to solve the technical problems, the invention adopts the following technical scheme:
an electric power carbon emission prediction method based on an integrated learning module comprises the following steps:
acquiring historical electric power carbon emission data;
performing feature extraction and data correlation analysis on the historical electric power carbon emission data to obtain electric power carbon emission data features;
carrying out standardization processing on the electric power carbon emission data characteristics to obtain standardized data characteristics;
respectively inputting the standardized data characteristics into different base learners for training to obtain a plurality of groups of predicted values;
and inputting a plurality of groups of predicted values into a meta learner to obtain a predicted result.
In order to solve the technical problems, the invention adopts another technical scheme that:
an integrated learning module-based electric carbon emission prediction device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed implements the steps of an integrated learning module-based electric carbon emission prediction method as described above.
The invention has the following beneficial effects: after the historical electric power carbon emission data are obtained, correlation analysis is performed on them, so that features irrelevant to the electric power data are removed and the number of features in the data set is reduced; the data are standardized to eliminate the influence of differing dimensions or value ranges among the data features; finally, several groups of predicted values are obtained from different base learners, and a meta learner produces the final output. This addresses problems such as inaccurate models in electric power carbon emission prediction, reduces the workload of the staff involved, and makes the method readily extensible.
Drawings
FIG. 1 is a graph of total carbon emissions from the Chinese electric power industry in 2002-2019;
FIG. 2 is a carbon emission histogram of the Chinese electric power industry in 2019;
FIG. 3 is a flowchart showing steps of an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the Stacking integrated model in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 5 is a diagram of the feature importance evaluation process of the random forest algorithm in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 6 is a graph of the feature importance scores of the variables and their ranking in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 7 is a flowchart of the XGBoost algorithm in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 8 is a flowchart of the AdaBoost algorithm in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 9 is a flowchart of the KNN algorithm in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 10 is a flowchart of the DT algorithm in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 11 is a flowchart showing further steps of an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 12 is a graph of the prediction results of the various models in an embodiment of the present invention;
FIG. 13 is a graph of the results predicted for Chinese electric power carbon emissions in 2016-2019 by the integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 14 is a correlation heat map of the electric power carbon emission data in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 15 is pseudo code of the Stacking integrated model in an integrated learning module-based electric power carbon emission prediction method according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of an electric power carbon emission prediction device based on an integrated learning module according to an embodiment of the present invention.
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
Referring to fig. 3, an electric power carbon emission prediction method based on an integrated learning module includes the steps of:
acquiring historical electric power carbon emission data;
performing feature extraction and data correlation analysis on the historical electric power carbon emission data to obtain electric power carbon emission data features;
carrying out standardization processing on the electric power carbon emission data characteristics to obtain standardized data characteristics;
respectively inputting the standardized data characteristics into different base learners for training to obtain a plurality of groups of predicted values;
and inputting a plurality of groups of predicted values into a meta learner to obtain a predicted result.
From the above description, the beneficial effects of the invention are as follows: after the historical electric power carbon emission data are obtained, correlation analysis is performed on them, so that features irrelevant to the electric power data are removed and the number of features in the data set is reduced; the data are standardized to eliminate the influence of differing dimensions or value ranges among the data features; finally, several groups of predicted values are obtained from different base learners, and a meta learner produces the final output. This addresses problems such as inaccurate models in electric power carbon emission prediction, reduces the workload of the staff involved, and makes the method readily extensible.
Further, the performing feature extraction and data correlation analysis on the historical electric power carbon emission data to obtain electric power carbon emission data features includes:
determining an electric power carbon emission influence factor index;
extracting a power feature set from the historical power carbon emission data;
and carrying out correlation analysis on the electric power characteristic set according to the electric power carbon emission influence factor index to obtain the electric power carbon emission data characteristic.
From the above description, it is known that by determining the influence factor index of the electric power carbon emission and removing the data which is not related to the electric power carbon emission influence factor index from the history electric power carbon emission data, effective data characteristics can be obtained, thereby improving the prediction accuracy of the base learner and the meta learner.
Further, the base learner includes XGBoost;
the training by respectively inputting the standardized data features into different base learners comprises the following steps:
the importance ranking is carried out on the standardized data features through a random forest algorithm;
and inputting the standardized data features into the XGBoost for training according to the importance ranking order.
As can be seen from the above description, the XGBoost algorithm generalizes well, effectively prevents overfitting, supports parallel optimization, trains efficiently, builds models quickly, and handles high-dimensional data quickly; combining the XGBoost algorithm with the RF algorithm can therefore effectively improve training and prediction.
Further, the base learner includes KNN;
the step of respectively inputting the standardized data features into different base learners for training, and the step of obtaining a plurality of groups of predicted values comprises the following steps:
obtaining a sample to be predicted;
calculating the distance between the sample to be predicted and the standardized data feature to obtain a standardized data feature distance ordering;
selecting a preset number of standardized data features according to the distance sorting of the standardized data features, and calculating a distance weight factor corresponding to the sample to be predicted according to the preset number of standardized data features;
and classifying the samples to be predicted according to the distance weight factors, and outputting predicted values.
As can be seen from the above description, when relevant features of the electric power carbon emission data are extracted, noise or missing values in the data often disturb the features used for predictive analysis. Moreover, when several data features are similar, the traditional KNN algorithm can perform well after parameter tuning, but other data features in the training set that also fall within the range of the K nearest neighbors lead to wrong classification results. Combining distance weight factors with KNN effectively solves these problems.
Further, selecting a preset number of standardized data features according to the distance ordering of the standardized data features, and calculating the distance weight factor corresponding to the sample to be predicted according to the selected standardized data features, comprises:
[Equation BDA0003988789680000061: definition of the distance weight factor $W_i$ in terms of $d_i$, $d_k$ and $d_1$]
classifying the sample to be predicted according to the distance weight factor comprises:
$$c_x = \arg\max_{v \in L} \sum_{x_i \in N_k(X)} W_i \cdot I\big(v = class(c_{xi})\big)$$
wherein $W_i$ represents the distance weight factor, $d_i$ the distance between the sample to be predicted and its i-th neighbor, $d_k$ the farthest distance among the k nearest neighbors, and $d_1$ the nearest distance among the k nearest neighbors; $c_x$ represents the final classification result, $L$ the set of all sample classes, $N_k(X)$ the neighborhood of $X$ containing its k nearest points, and $class(c_{xi})$ the class of training sample $x_i$; $I(v = class(c_{xi}))$ is an indicator function that returns 1 when its argument is true and 0 when it is false.
From the above description, the distance between the sample to be tested and the i-th neighbor, the farthest distance in the k-th neighbor and the nearest distance in the k-th neighbor are calculated, so that accurate classification is effectively realized through the distance weight factor and the distance between the sample to be tested, and a predicted value is output.
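The distance-weighted vote described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight $W_i = (d_k - d_i)/(d_k - d_1)$ (with $W_i = 1$ when $d_k = d_1$) is one common weighting that uses exactly the symbols defined above, and it is assumed here rather than taken from the patent's equation; the function name `distance_weighted_knn`, the Euclidean distance and the toy data are likewise hypothetical.

```python
import math
from collections import defaultdict


def distance_weighted_knn(train_x, train_y, query, k):
    """Classify `query` by a distance-weighted vote among its k nearest neighbours."""
    # Distance from the query to every training sample, nearest first.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    d_1 = neighbours[0][0]   # nearest distance among the k neighbours
    d_k = neighbours[-1][0]  # farthest distance among the k neighbours

    votes = defaultdict(float)
    for d_i, label in neighbours:
        # ASSUMED weight: 1 for the nearest neighbour, 0 for the k-th one.
        w_i = 1.0 if d_k == d_1 else (d_k - d_i) / (d_k - d_1)
        votes[label] += w_i
    # Class with the largest total weight (the arg max over v in L).
    return max(votes, key=votes.get)


# Hypothetical toy data: two nearby "low" samples, three distant "high" samples.
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (1.2, 1.0)]
train_y = ["low", "low", "high", "high", "high"]
label = distance_weighted_knn(train_x, train_y, query=(0.3, 0.2), k=5)
print(label)
```

With k = 5, an unweighted majority vote would pick "high" (three neighbours against two), whereas the weighted vote favours the two much closer "low" samples; this is the kind of misclassification the distance weight factor is meant to avoid.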
Further, the base learner further includes AdaBoost and DT;
the step of respectively inputting the standardized data features into different base learners for training to obtain a plurality of groups of predicted values comprises:
training the AdaBoost through the standardized data features and outputting a group of predicted values;
and training the DT through the standardized data features and outputting a group of predicted values.
As can be seen from the above description, the AdaBoost algorithm and the DT algorithm are adopted as base learners; the AdaBoost algorithm can effectively measure the weight of each learner, so its prediction accuracy is high, and the DT algorithm can derive a corresponding expression from the generated decision tree, which improves prediction accuracy.
Further, the data correlation analysis of the historical electrical carbon emission data includes:
and carrying out correlation analysis on the historical electric power carbon emission data through a Pearson correlation coefficient, and reserving the characteristic that the Pearson correlation coefficient value is larger than a preset value to obtain the electric power carbon emission data characteristic.
From the above description, the analysis of the data features by the Pearson correlation coefficient can effectively remove the features irrelevant to the learning task in the historical electric power carbon emission data, improve the learning precision of the learner, and further output more accurate predicted values.
Further, the normalizing the electrical carbon emission data characteristic includes:
the electrical carbon emission data characteristics are data normalized by Z-score normalization.
From the above description, it is known that standardizing the electric power carbon emission data features with the Z-score effectively eliminates the inconsistency of dimension and magnitude among the data features.
Further, the electric power carbon emission influence factor index includes:
energy subsystem index, population subsystem index, economic subsystem index, industrial electricity consumption subsystem index, and power production subsystem index.
As can be seen from the above description, the energy subsystem index, population subsystem index, economic subsystem index, industrial electricity consumption subsystem index and power production subsystem index are used as learning indexes, so that the influence of related industries on power-industry carbon emissions can be analyzed from multiple angles, which improves the prediction of power-industry carbon emissions.
Another embodiment of the present invention provides an integrated learning module-based electric carbon emission prediction apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements each step in an integrated learning module-based electric carbon emission prediction method as described above when the computer program is executed by the processor.
The electric power carbon emission prediction method and device based on the integrated learning module analyze the influences on power-industry carbon emissions from multiple angles so as to improve the accuracy of electric power carbon emission prediction; they are described below through specific embodiments:
Example one
Referring to fig. 3, an electric power carbon emission prediction method based on an integrated learning module includes the steps of:
S1, acquiring historical electric power carbon emission data; carbon emission data of China's power industry for 2002-2019 are collected from the National Bureau of Statistics and the China carbon accounting database as the historical electric power carbon emission data and serve as the data required for model training;
S2, performing feature extraction and data correlation analysis on the historical electric power carbon emission data to obtain electric power carbon emission data features; feature extraction is one of the most common techniques in machine learning and discards features in the data set that are unrelated to the learning task, thereby reducing the number of features; in the prior art, the main features are often selected empirically, which is somewhat subjective; moreover, the features that affect electric power carbon emission prediction usually differ across times and places, so a quantitative method is needed to find the main features affecting electric power carbon emissions; in this embodiment, the Pearson correlation coefficient is used for the analysis, specifically:
S21, determining an electric power carbon emission influence factor index; in an alternative embodiment, the energy subsystem index, population subsystem index, economic subsystem index, industrial electricity consumption subsystem index and power production subsystem index are used as the electric power carbon emission influence factor indexes;
S22, extracting an electric power feature set from the historical electric power carbon emission data;
S23, carrying out correlation analysis on the electric power feature set according to the electric power carbon emission influence factor index to obtain the electric power carbon emission data features; specifically, correlation analysis is carried out on the historical electric power carbon emission data through the Pearson correlation coefficient, and the features whose Pearson correlation coefficient value is larger than a preset value are retained as the electric power carbon emission data features:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$
wherein $n$ represents the number of observations, $x_i$ the i-th observation of feature $x$, $y_i$ the i-th observation of feature $y$, $\bar{x}$ and $\bar{y}$ the feature means, and $r_{xy}$ the correlation coefficient of feature $x$ and feature $y$; if $y$ is the energy subsystem index, population subsystem index, economic subsystem index, industrial electricity consumption subsystem index or power production subsystem index, and $x$ is the feature to be extracted, the features related to these 5 indexes are extracted; referring to fig. 14, the extracted electric power carbon emission data features include: 1. thermal power generation (PTP); 2. average daily energy consumption (EC1); 3. total industrial electricity consumption (ECI); 4. electricity output (OE); 5. average daily electricity consumption (EC2); 6. population (POP); 7. average daily crude oil consumption (COC); 8. average daily coke consumption (CC2); 9. average daily coal consumption (CC1); 10. gross domestic product (GDP); 11. hydropower generation (PHP); 12. average daily natural gas consumption (NGC); 13. electricity exports (EE); 14. average daily gasoline consumption (GC); 15. average daily diesel consumption (DC); 16. average daily kerosene consumption (KC); 17. wind power generation (PWP); 18. nuclear power generation (PNP); 19. electricity imports (IE);
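As a rough illustration of step S23, the following pure-Python sketch computes the Pearson correlation coefficient and keeps only the features whose coefficient exceeds a preset value. The function names `pearson` and `select_features`, the threshold of 0.8 and the toy series are assumptions made for this example; the patent correlates candidate features with five subsystem indexes, whereas the sketch uses a single reference series for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r_xy of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_features(features, reference, threshold):
    """Keep the features whose correlation with `reference` exceeds `threshold`."""
    return {
        name: values
        for name, values in features.items()
        if pearson(values, reference) > threshold
    }


# Hypothetical toy series (not data from the patent).
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
candidates = {
    "thermal_generation": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with the reference
    "unrelated": [3.0, 1.0, 3.0, 1.0, 3.0],           # no linear relation
}
selected = select_features(candidates, reference, threshold=0.8)
print(sorted(selected))
```

Comparing the signed coefficient with the threshold follows the wording of the text; filtering on the absolute value instead would also retain strongly negatively correlated features, which is a design choice the source does not address.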
S3, carrying out standardized processing on the electric power carbon emission data features to obtain standardized data features; the 19 features obtained in S2 have different units and differ by orders of magnitude, e.g., feature 10 (GDP) is measured in a monetary unit, whereas feature 5 (average daily electricity consumption) is measured in joules (or the kilowatt-hour equivalent); if the raw feature data were used directly in calculations involving distances, gradients and the like, indexes of large magnitude would dominate while indexes of small magnitude would be weakened, greatly affecting the prediction result; the feature data therefore need to be standardized to eliminate this influence, which speeds up the convergence of model training and improves the accuracy of the prediction result; different standardization methods suit different fields, and in this embodiment Z-score standardization is applied to the electric power carbon emission data features:
$$Z_{\alpha} = \frac{X_{\alpha} - \mu_{\alpha}}{S_{\alpha}}$$
wherein $Z_{\alpha}$ represents the standard score, $\mu_{\alpha}$ the mean of the data set, $X_{\alpha}$ the original data set, and $S_{\alpha}$ the standard deviation of the data set; the electric power carbon emission data features are standardized as data sets by the above formula;
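A minimal Python sketch of the Z-score standardization in step S3 follows. The function name `z_score` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form is intended.

```python
import statistics


def z_score(values):
    """Standardize a feature column to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    # ASSUMPTION: population standard deviation; the source does not specify.
    s = statistics.pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # a constant column has no spread to scale
    return [(v - mu) / s for v in values]


# Hypothetical columns on very different scales.
gdp = [100.0, 200.0, 300.0, 400.0, 500.0]
daily_consumption = [0.001, 0.002, 0.003, 0.004, 0.005]

scaled_gdp = z_score(gdp)
scaled_consumption = z_score(daily_consumption)
print(scaled_gdp)
print(scaled_consumption)
```

After standardization both columns have zero mean and unit spread, so neither dominates distance-based or gradient-based calculations simply because of its unit.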
S4, respectively inputting the standardized data features into different base learners for training to obtain a plurality of groups of predicted values; base learners using four different algorithms are trained on the standardized data and, according to the training results and the data to be predicted, output four different groups of predicted values;
S5, inputting the plurality of groups of predicted values into a meta learner to obtain a prediction result; after receiving the four groups of predicted values, the meta learner outputs the final prediction result.
Example two
The present embodiment differs from the first embodiment in that the base learner and the meta learner adopted are specifically defined;
referring to fig. 4 and 15, the base learners include RF-XGBoost (Random Forest-Extreme Gradient Boosting), Adaboost (Adaptive Boosting), improved KNN (K-Nearest Neighbor) and DT (Decision Tree);
1) A base learner RF-XGBoost;
referring to fig. 5, S1, ranking the importance of the standardized data features by a random forest algorithm; the importance of the 19 electric power carbon emission data features screened in the first embodiment is evaluated and quantified by the random forest algorithm; the importance score of each feature is denoted VIM, and a higher score indicates that the feature has a greater influence on electric power carbon emission; the 19 features are denoted X1, X2, …, X19, and the average contribution of the n-th feature over all decision trees of the random forest is the GI score of the variable Xn, denoted by
Figure BDA0003988789680000101
; in this embodiment, feature importance is measured by the Gini importance score VIMj, which is obtained from the change in the Gini index before and after branching in the decision trees of the random forest; the Gini index is calculated as follows:
Figure BDA0003988789680000102
wherein q represents the number of categories of the feature samples, and p_q represents the proportion of category q in the node; the importance of feature X_j at node w, i.e. the change in the Gini index before and after branching at node w, is:
Figure BDA0003988789680000103
wherein GI_l and GI_r represent the Gini indexes of the two new nodes after branching;
the importance of feature X_j on the i-th tree is:
Figure BDA0003988789680000111
wherein W represents the total number of nodes;
assuming a total of t trees, the importance of feature X_j summed over all trees is:
Figure BDA0003988789680000112
finally, a normalization operation is carried out, and the Gini importance score VIM is obtained as follows:
Figure BDA0003988789680000113
then, the 19 feature importance scores related to electric power carbon emission are normalized and sorted in descending order; referring to fig. 6, among the 19 selected features, thermal power generation has the highest score, indicating that it has the greatest influence on electric power carbon emission, while electricity imports have the lowest score, indicating that they have the smallest influence;
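The importance ranking in S1 can be illustrated with a small sketch. The Gini index and node-level importance formulas appear only as figures in the source, so the forms used here (GI = 1 - Σ p_q², and importance at a node = GI_w - GI_l - GI_r) are the common textbook versions and should be read as an assumption; all function names and the raw scores are hypothetical.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Importance of the split feature at one node: GI_w - GI_l - GI_r.

    Assumption: unweighted difference, as the surrounding text suggests.
    """
    return gini_index(parent) - gini_index(left) - gini_index(right)

def normalize_scores(raw):
    """Scale the summed importances so that they add up to 1."""
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()}

def rank_features(raw):
    """Feature names sorted by normalized importance, highest first."""
    scores = normalize_scores(raw)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical summed importances for three of the 19 features.
print(rank_features({"PTP": 6.0, "GDP": 3.0, "IE": 1.0}))
```

The resulting order is what S2 uses when feeding the features to XGBoost.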
S2, inputting the standardized data features into XGBoost for training in the order of the importance ranking; referring to FIG. 7, the algorithm flow of XGBoost is as follows:
S21, generating the basic unit decision tree: the decision tree is generated by splitting on the input features, and each split keeps fitting the residual of the previous split;
S22, after the model has trained n trees, calculating the scores of all leaf nodes;
S23, summing the leaf scores of all trees to obtain the predicted value of XGBoost;
the objective function of XGBoost is:
$$Obj^{(t)} = \sum_{i=1}^{m} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + constant$$
wherein y_i represents the actual value, $\hat{y}_i^{(t-1)}$ represents the predicted value at time t-1, f_t(x_i) represents the difference between the predicted value at time t and the predicted value at time t-1, l represents the loss function, and constant represents a constant term; Ω(f_t) represents the regularization term, including L1 regularization and L2 regularization, which penalizes the complexity of the basic unit decision tree: the larger its value, the higher the complexity of the decision tree; the regularization term is expressed as:
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$
wherein T represents the number of leaf nodes, ω represents the scores of the leaf nodes, and γ and λ represent weight coefficients; meanwhile, in the XGBoost algorithm, a second-order Taylor expansion is applied to the loss function, and the optimized objective function is:
$$Obj^{(t)} \approx \sum_{i=1}^{m}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$
$$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$
$$h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$
wherein m represents the total number of samples, g_i represents the first-order gradient, and h_i represents the second-order gradient;
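To make the second-order approximation concrete, the sketch below evaluates it for a squared loss. The choice of loss l = ½(y - ŷ)² is an assumption made only for illustration (the source does not name the loss), and all function names and numbers are hypothetical. For this loss the gradients are g = ŷ_prev - y and h = 1, and the approximation reproduces the exact change in loss.

```python
def squared_loss(y, y_hat):
    """Assumed loss for this sketch: half the squared error."""
    return 0.5 * (y - y_hat) ** 2

def gradients(y, y_prev):
    """First- and second-order gradients of the squared loss at y_prev."""
    g = y_prev - y   # d l / d y_hat
    h = 1.0          # d^2 l / d y_hat^2
    return g, h

def regularization(leaf_scores, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum of squared leaf scores."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)

def approx_objective(ys, y_prevs, f_vals, leaf_scores, gamma, lam):
    """Second-order Taylor objective: sum(g*f + 0.5*h*f^2) + Omega(f)."""
    total = 0.0
    for y, y_prev, f in zip(ys, y_prevs, f_vals):
        g, h = gradients(y, y_prev)
        total += g * f + 0.5 * h * f * f
    return total + regularization(leaf_scores, gamma, lam)

# Hypothetical two-sample example: the new tree outputs f_vals on top of y_prevs.
ys, y_prevs, f_vals = [3.0, 1.0], [2.0, 2.0], [0.5, -0.5]
print(approx_objective(ys, y_prevs, f_vals, leaf_scores=[0.5, -0.5], gamma=0.1, lam=1.0))
```

Because only g_i and h_i depend on the loss, the same tree-building code can be reused with any twice-differentiable loss, which is the practical benefit of the expansion.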
2) Adaboost base learner
Referring to fig. 8, the Adaboost is trained through the standardized data features, and a set of predicted values are output, and the calculation flow of the Adaboost algorithm is as follows:
s1, initializing sample weights, wherein the formula is as follows:
$$D(k) = (\omega_{k1}, \omega_{k2}, \ldots, \omega_{km}); \quad \omega_{1i} = 1/m; \quad i = 1, 2, \ldots, m$$
wherein m represents the total number of samples, and ω_1i represents the i-th sample weight of the 1st weak learner;
s2, initializing a weak learner;
s3, calculating the weighted error rate and the weight coefficient of the kth weak learner:
Figure BDA0003988789680000127
Figure BDA0003988789680000128
wherein ω_ki represents the i-th sample weight of the k-th weak learner, G_k(x_i) represents the prediction result of the weak learner, y_i represents the true value, I(G_k(x_i) ≠ y_i) represents the indicator function, and a_k represents the weight coefficient of the weak learner;
s4, updating weights in the training data set, and starting the next iteration:
Figure BDA0003988789680000129
wherein Z_k is the normalization factor, expressed as follows:
Figure BDA0003988789680000131
s5, generating a final strong learner:
Figure BDA0003988789680000132
wherein K represents the number of weak learners, and sign(x) represents the sign function, which returns 1 when its argument is greater than 0, -1 when its argument is less than 0, and 0 when its argument is equal to 0.
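Steps S1 to S5 can be sketched for a single boosting round as follows. The exact formulas are shown only as figures in the source, so this sketch uses the standard binary AdaBoost expressions (labels in {-1, +1}, a_k = ½ ln((1 - e_k)/e_k), exponential weight update) as an assumption; the function names and toy data are hypothetical.

```python
import math

def adaboost_round(weights, predictions, labels):
    """One AdaBoost iteration; returns (error_rate, a_k, updated_weights).

    Assumes labels and predictions are -1 or +1 and 0 < error_rate < 1.
    """
    # S3: weighted error rate and weight coefficient of the weak learner.
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # S4: raise the weights of misclassified samples, lower the others.
    unnormalized = [w * math.exp(-alpha * y * p)
                    for w, p, y in zip(weights, predictions, labels)]
    z = sum(unnormalized)  # normalization factor Z_k
    return err, alpha, [u / z for u in unnormalized]

def strong_predict(alphas, weak_outputs):
    """S5: sign of the weighted vote of the weak learners for one sample."""
    s = sum(a * g for a, g in zip(alphas, weak_outputs))
    return (s > 0) - (s < 0)  # 1, -1 or 0, matching the sign function

# S1: four samples with equal initial weights 1/m (hypothetical data).
m = 4
weights = [1.0 / m] * m
err, alpha, weights = adaboost_round(weights, [1, 1, -1, 1], [1, 1, -1, -1])
print(err, alpha, weights)
```

After the round, the one misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.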
3) Base learner KNN; briefly, the KNN algorithm proceeds as follows: first, the distance between each point in the data set of known categories and the current point is calculated; second, the points are sorted by increasing distance; then, the k points closest to the current point are selected, and the frequency of each category among these k points is determined; finally, the category with the highest frequency among the k points is returned as the predicted category of the current point;
please refer to fig. 9, specifically:
s1, calculating the distance between the sample to be predicted and the standardized data feature to obtain a standardized data feature distance sequence;
s2, selecting a preset number of standardized data features according to the distance sorting of the standardized data features, and calculating a distance weight factor corresponding to the sample to be predicted according to the preset number of standardized data features;
Figure BDA0003988789680000133
wherein W_i represents the distance weight factor, d_i represents the distance between the sample to be predicted and the i-th neighbor, d_k represents the farthest distance among the k nearest neighbors, and d_1 represents the nearest distance among the k nearest neighbors;
s3, classifying the samples to be predicted according to the distance weight factors, and outputting predicted values;
Figure BDA0003988789680000134
wherein c_x represents the final classification result, L represents the set of all sample categories, N_k(X) represents the neighborhood of X containing the k nearest points, class(c_xi) represents the category of training sample x_i, and I(v = class(c_xi)) represents the indicator function, which returns 1 when its condition is true and 0 when it is false;
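A distance-weighted KNN of this kind can be sketched as below. The weight formula itself is only a figure in the source; the sketch assumes the common form W_i = (d_k - d_i) / (d_k - d_1), with W_i = 1 when d_k = d_1, which is consistent with the symbols defined above but is not confirmed by the text. The function name and toy data are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_x, train_y, query, k):
    """Predict the category of `query` by a distance-weighted vote of k neighbors."""
    # S1: distances to every training sample, sorted in increasing order.
    ranked = sorted(((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
                    key=lambda pair: pair[0])
    # S2: keep the k nearest samples and compute their weight factors.
    neighbours = ranked[:k]
    d1, dk = neighbours[0][0], neighbours[-1][0]
    votes = defaultdict(float)
    for d, label in neighbours:
        # Assumed weight: 1 for the nearest neighbor, 0 for the farthest.
        w = 1.0 if dk == d1 else (dk - d) / (dk - d1)
        votes[label] += w
    # S3: the category with the largest weighted vote is the prediction.
    return max(votes, key=votes.get)

# Hypothetical two-dimensional samples with two categories.
train_x = [(0, 0), (1, 0), (5, 5), (6, 5), (5, 6)]
train_y = ["low", "low", "high", "high", "high"]
print(weighted_knn_predict(train_x, train_y, (0.5, 0.2), k=3))
```

Compared with a plain majority vote, distant neighbors contribute little, so a far-away third neighbor cannot outvote two close ones.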
4) Base learner DT; the DT algorithm resembles a white-box model: the corresponding expression can be derived from the generated decision tree, and the parameters can be adjusted effectively during experiments to obtain a satisfactory result; the specific process of the DT algorithm is as follows:
referring to fig. 10, the key coefficient Gain_σ is used in the decision tree as the index for evaluating splitting attributes; during decision tree splitting, the coefficient Gain_σ is calculated, and the smaller the coefficient Gain_σ, the better the feature attribute and the better the model construction;
if the target variable of the sample set S is continuous data, the total variance is:
Figure BDA0003988789680000141
wherein μ represents the mean of the prediction results in the sample set S, and y_k represents the k-th sample prediction result; a sample set S containing N samples is divided into two parts according to the i-th attribute value of attribute A, and after the division the key coefficient Gain_σ is calculated as follows:
$$Gain\_\sigma_{A,i}(S) = \sigma(S_1) + \sigma(S_2)$$
when the tree evaluates attribute A, the division with the smallest coefficient Gain_σ among all divisions of attribute A is selected, which gives the optimal division scheme of attribute A:
Figure BDA0003988789680000142
for the sample set S, the best among the optimal division schemes of all the attributes is selected as the optimal scheme of the sample set S:
Figure BDA0003988789680000143
the obtained attribute A and the i-th attribute value thereof are the optimal splitting attribute and the optimal splitting attribute value of the sample set S;
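The split search described above can be sketched as an exhaustive scan over attributes and candidate values. The exact form of σ(S) is a figure in the source; the sketch assumes it is the sum of squared deviations from the mean, and splits on "value ≤ threshold". Function names and data are hypothetical.

```python
def total_variance(targets):
    """sigma(S): sum of squared deviations from the mean (assumed form)."""
    if not targets:
        return 0.0
    mu = sum(targets) / len(targets)
    return sum((y - mu) ** 2 for y in targets)

def best_split(rows, targets):
    """Return (attribute_index, split_value, gain_sigma) with the smallest Gain_sigma.

    rows is a list of equal-length feature tuples; a sample goes to S1 when
    its attribute value is <= split_value and to S2 otherwise.
    """
    best = None
    for a in range(len(rows[0])):
        for value in sorted({row[a] for row in rows}):
            s1 = [t for row, t in zip(rows, targets) if row[a] <= value]
            s2 = [t for row, t in zip(rows, targets) if row[a] > value]
            if not s1 or not s2:
                continue  # a split must leave samples on both sides
            gain = total_variance(s1) + total_variance(s2)
            if best is None or gain < best[2]:
                best = (a, value, gain)
    return best

# Hypothetical data: attribute 0 separates the targets cleanly, attribute 1 does not.
rows = [(1, 10), (2, 20), (3, 10), (4, 20)]
targets = [1.0, 1.0, 5.0, 5.0]
print(best_split(rows, targets))
```

Applying the same search recursively to S1 and S2 grows the full regression tree.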
s5, inputting a plurality of groups of predicted values into a meta learner to obtain a predicted result; after receiving the four groups of predicted values, the meta learner outputs a final predicted result, specifically:
the data to be predicted is input into the base learners, which output four corresponding groups of predicted values; the four groups of predicted values are processed by a meta learner adopting the GBDT (Gradient Boosting Decision Tree) algorithm, which outputs the final predicted result; the GBDT algorithm is briefly described as follows: S51, initializing a weak learner; S52, calculating the negative gradient of the model; S53, taking the residual from S52 as the new target value of the data, and using the data samples together with the residual values as training data for the next decision tree; S54, training a new weak learner G_k(X) on the new training data generated in the previous step; S55, repeating steps S52 to S54, and generating the final strong learner when the minimum error requirement is met.
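Steps S51 to S55 can be sketched with one-split regression stumps as weak learners. This is a deliberately reduced illustration: it assumes a squared loss (so the negative gradient equals the residual) and a single input feature, whereas the meta learner in this embodiment receives four predicted values per sample. The names `fit_stump`, `fit_gbdt`, `n_rounds`, `learning_rate` and `tol` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: the one-split stump with the smallest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct xs
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.1, tol=1e-6):
    """Gradient boosting on residuals; returns a prediction function."""
    base = sum(ys) / len(ys)                 # S51: initial weak learner
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):                # S55: repeat S52 to S54
        # S52: negative gradient of the squared loss is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        if max(abs(r) for r in residuals) < tol:
            break                            # minimum error requirement met
        stump = fit_stump(xs, residuals)     # S53 and S54: fit the residuals
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical one-dimensional data.
model = fit_gbdt([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0], n_rounds=100)
print(model(1), model(4))
```

In the stacking setting, `xs` would be replaced by the vectors of base-learner predictions and the stump by a full regression tree.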
Example III
The embodiment provides a specific parameter setting example of each base learner;
referring to fig. 11, since different hyperparameters affect the performance of a machine learning model, this embodiment uses Latin hypercube sampling to analyze the hyperparameter selection for machine learning; Latin hypercube sampling mainly comprises three steps: stratification, sampling and shuffling; stratification is carried out within the initial hyperparameter value ranges defined in this embodiment: the value range of each parameter is divided into 5 layers, random sampling is carried out within each layer, and the sampled values are then substituted into the model for accuracy verification; compared with plain Monte Carlo random sampling, the stratification step of Latin hypercube sampling lets the samples cover a wider part of the parameter value range; through example verification, the parameters of each algorithm in this embodiment are set as follows:
base learner 1): the learning rate of the RF-XGBoost algorithm is set to 0.01, the number of estimators is set to 1000, and the maximum depth of the tree is set to 4;
base learner 2): the number of estimators of the Adaboost algorithm is set to 50, the learning rate is set to 0.05, and the loss function type is linear;
base learner 3): the weight type of the improved KNN algorithm is uniform weight, and the leaf size is 30; for the RF algorithm, the number of decision trees is set to 100, the maximum depth of the tree is set to 10, the minimum number of samples per leaf node is set to 5, and the minimum number of samples required to split a node is set to 5;
base learner 4): the minimum number of samples per leaf node of the DT algorithm is set to 4, and the minimum number of samples required to split a node is set to 4; the SVM uses a Gaussian kernel function, with depth set to 3;
meta learner: the learning rate of the GBDT algorithm is set to 0.01, the number of estimators is set to 1000, the minimum number of samples per leaf node is set to 2, and the minimum number of samples required to split a node is set to 1;
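The Latin hypercube sampling used to arrive at such settings can be sketched as follows. This is a minimal sketch of the three steps named above (stratification, sampling, shuffling); the parameter names and value ranges are hypothetical, since the source does not list the initial ranges.

```python
import random

def latin_hypercube(bounds, n_layers=5, seed=0):
    """Draw n_layers hyperparameter combinations by Latin hypercube sampling.

    bounds maps a parameter name to its (low, high) value range.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        width = (high - low) / n_layers
        # Stratification and sampling: one random point inside each layer.
        points = [low + (i + rng.random()) * width for i in range(n_layers)]
        # Shuffling: decouple the layer order of different parameters.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in bounds} for i in range(n_layers)]

# Hypothetical initial ranges for two hyperparameters.
for sample in latin_hypercube({"learning_rate": (0.001, 0.1), "max_depth": (2.0, 12.0)}):
    print(sample)
```

Each candidate combination would then be substituted into the model and scored, and integer-valued parameters such as the tree depth would be rounded before use.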
after the original data set is analyzed and processed, similar feature data appear in consecutive adjacent positions in the data set; to avoid unbalanced data sampling when dividing the training set and the test set, and to avoid the overfitting caused by consecutive similar features in batch training, this embodiment provides a dynamic balance sampling algorithm for time-series data sets; first, the data set is divided into B sets according to attribute groups, and the data samples of each set are segmented according to the training time step T; then, according to the number of samples S to be drawn and the number of sets K used in a single sampling, S/K samples are drawn from each of the K sets by systematic sampling; finally, the drawn samples are combined to form the batch training set and the test set; sampling is carried out without replacement, and sample boundaries must be handled with care in use; the formula is as follows:
Figure BDA0003988789680000161
wherein v represents the number of dynamic balance samplings performed; h represents the number of the extracted set; S_ik represents the i-th sample taken from set h; D represents the sample set currently being extracted;
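One possible reading of the dynamic balance sampling procedure is sketched below. The governing formula is only a figure in the source, so this is an interpretation under stated assumptions: the groups are already segmented by time step, the first K groups are the ones used in a draw, systematic sampling means taking items at a fixed stride, and drawn items are removed so that later draws cannot repeat them. All names are hypothetical.

```python
def systematic_indices(size, count):
    """Indices of `count` items taken at a fixed stride from `size` items."""
    count = min(count, size)  # guard the sample boundary
    if count <= 0:
        return []
    step = size / count
    return [int(i * step) for i in range(count)]

def dynamic_balance_sample(groups, n_samples, n_sets):
    """Draw n_samples // n_sets items from each of the first n_sets groups.

    Drawn items are deleted from their group, so repeated calls never
    return the same item twice (sampling without replacement).
    """
    per_set = n_samples // n_sets
    batch = []
    for group in groups[:n_sets]:
        idx = systematic_indices(len(group), per_set)
        batch.extend(group[i] for i in idx)
        for i in reversed(idx):  # delete from the back to keep indices valid
            del group[i]
    return batch

# Hypothetical data: two attribute groups of ten time-step segments each.
groups = [list(range(10)), list(range(100, 110))]
print(dynamic_balance_sample(groups, n_samples=4, n_sets=2))
print(dynamic_balance_sample(groups, n_samples=4, n_sets=2))
```

Because every batch takes the same number of samples from each group at spread-out positions, neighboring near-duplicate samples are unlikely to end up in the same batch.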
for the models constructed above, three common predictive performance metrics are selected to evaluate each prediction model, including: the root mean square error (RMSE), expressed as follows:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(M_i - P_i\right)^2}$$
the mean absolute error (MAE), expressed as follows:
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|M_i - P_i\right|$$
and the coefficient of determination (R2), expressed as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(M_i - P_i\right)^2}{\sum_{i=1}^{N}\left(M_i - \bar{M}\right)^2}$$
wherein M_i represents the target value of the data set, P_i represents the predicted value, N represents the data set size, and M̄ represents the mean of the data set target values; to verify the performance of the Stacking ensemble learning model of this embodiment, it is compared with other traditional regression models; referring to FIG. 12, the results show that the Stacking ensemble learning model is superior to the other traditional regression models on all three regression metrics, RMSE, MAE and R2; in this embodiment: the RMSE of the Stacking ensemble learning model is 0.2823, while the RMSE values of GBDT, RF-XGBoost, Adaboost, RF, improved KNN, DT and SVM are 0.3160, 0.3357, 0.3434, 0.3852, 0.5321, 0.3544 and 0.4160, respectively; the MAE of the Stacking ensemble learning model is 0.1969, while the MAE values of GBDT, RF-XGBoost, Adaboost, RF, improved KNN, DT and SVM are 0.2646, 0.2929, 0.3140, 0.3296, 0.5295, 0.3220 and 0.3081, respectively; the R2 of the Stacking ensemble learning model is 0.8426, while the R2 values of GBDT, RF-XGBoost, Adaboost, RF, improved KNN, DT and SVM are 0.8029, 0.7775, 0.7671, 0.7070, 0.6508, 0.7520 and 0.6583, respectively; in summary, the Stacking ensemble learning model provided by this embodiment improves on every evaluation metric;
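The three evaluation metrics can be computed as in the short sketch below, which follows the symbol definitions above (targets M_i, predictions P_i, data set size N). The function names and the sample values are hypothetical and are not the experimental data of this embodiment.

```python
import math

def rmse(targets, preds):
    """Root mean square error."""
    n = len(targets)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(targets, preds)) / n)

def mae(targets, preds):
    """Mean absolute error."""
    n = len(targets)
    return sum(abs(m - p) for m, p in zip(targets, preds)) / n

def r2(targets, preds):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(targets) / len(targets)
    ss_res = sum((m - p) ** 2 for m, p in zip(targets, preds))
    ss_tot = sum((m - mean) ** 2 for m in targets)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions.
targets, preds = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(rmse(targets, preds), mae(targets, preds), r2(targets, preds))
```

Lower RMSE and MAE and a higher R2 indicate a better model, which is the sense in which the Stacking results above are compared.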
further, to verify the effectiveness of the algorithm, the base learners and the meta learner of the Stacking ensemble learning model are swapped in turn for comparison; please refer to fig. 13, wherein: A represents the model with DT, RF-XGBoost, Adaboost and improved KNN as base learners and GBDT as the meta learner; B represents the model with GBDT, RF-XGBoost, Adaboost and improved KNN as base learners and DT as the meta learner; C represents the model with DT, GBDT, Adaboost and improved KNN as base learners and RF-XGBoost as the meta learner; D represents the model with DT, RF-XGBoost, GBDT and improved KNN as base learners and Adaboost as the meta learner; E represents the model with DT, RF-XGBoost, Adaboost and GBDT as base learners and improved KNN as the meta learner; according to the example results, the Stacking ensemble learning model with GBDT as the meta learner and DT, RF-XGBoost, Adaboost and improved KNN as base learners performs best, improving on all three regression metrics, RMSE, MAE and R2. The experimental results of this embodiment also show that a Stacking ensemble learning model is not always superior to a single model: the Stacking ensemble learning model with improved KNN as the meta learner and GBDT, RF-XGBoost, Adaboost and DT as base learners declines on the three regression evaluation metrics compared with the improved KNN model.
Example IV
Referring to fig. 16, an integrated learning module-based electric carbon emission prediction apparatus includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps in an integrated learning module-based electric carbon emission prediction method according to any one of the first, second, or third embodiments when executing the computer program.
In summary, the invention discloses an electric power carbon emission prediction method and device based on an integrated learning module: after the historical electric power carbon emission data is obtained, correlation analysis is performed on it to remove features irrelevant to the electric power data and reduce the number of features in the data set; the data is standardized to eliminate the influence of differing dimensions or value ranges among the data features; finally, a plurality of groups of predicted values are obtained through different base learners and the final output is produced by a meta learner; this effectively addresses problems such as model inaccuracy in electric power carbon emission prediction, reduces the workload of the relevant staff, and offers strong extensibility.
It should be noted that the present invention is not limited to the exemplary examples shown above. Moreover, the algorithm proposed by the present invention can be implemented by other methods or in other forms without departing from the essential features of the present invention. Accordingly, the above examples should be regarded as illustrative rather than limiting. The scope of the invention is indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein; modifications and improvements made without departing from the principles of the invention are likewise regarded as within the scope of the invention.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (10)

1. The electric power carbon emission prediction method based on the integrated learning module is characterized by comprising the following steps of:
acquiring historical electric power carbon emission data;
performing feature extraction and data correlation analysis on the historical electric power carbon emission data to obtain electric power carbon emission data features;
carrying out standardization processing on the electric power carbon emission data characteristics to obtain standardized data characteristics;
respectively inputting the standardized data characteristics into different base learners for training to obtain a plurality of groups of predicted values;
and inputting a plurality of groups of predicted values into a meta learner to obtain a predicted result.
2. The method for predicting the carbon emission of electric power based on an integrated learning module according to claim 1, wherein the performing feature extraction and data correlation analysis on the historical carbon emission data of electric power to obtain the feature of the carbon emission data of electric power comprises:
determining an electric power carbon emission influence factor index;
extracting a power feature set from the historical power carbon emission data;
and carrying out correlation analysis on the electric power characteristic set according to the electric power carbon emission influence factor index to obtain the electric power carbon emission data characteristic.
3. The method for predicting the carbon emissions of electric power based on an ensemble learning module of claim 1, wherein said base learner comprises XGBoost;
the training by respectively inputting the standardized data features into different base learners comprises the following steps:
the importance ranking is carried out on the standardized data features through a random forest algorithm;
and inputting the standardized data features into the XGBoost for training according to the importance ranking order.
4. The method for predicting electric carbon emissions based on an ensemble learning module of claim 1, wherein said base learner comprises KNN;
the respectively inputting the standardized data features into different base learners for training to obtain a plurality of groups of predicted values comprises:
obtaining a sample to be predicted;
calculating the distance between the sample to be predicted and the standardized data feature to obtain a standardized data feature distance ordering;
selecting a preset number of standardized data features according to the distance sorting of the standardized data features, and calculating a distance weight factor corresponding to the sample to be predicted according to the preset number of standardized data features;
and classifying the samples to be predicted according to the distance weight factors, and outputting predicted values.
5. The method for predicting the electric carbon emission based on the integrated learning module according to claim 4, wherein selecting a preset number of the standardized data features according to the distance ordering of the standardized data features, and calculating the distance weight factor corresponding to the sample to be predicted according to the selected number of the standardized data features comprises:
Figure FDA0003988789670000021
classifying the sample to be predicted according to the distance weight factor comprises:
Figure FDA0003988789670000022
wherein W_i represents the distance weight factor, d_i represents the distance between the sample to be predicted and the i-th neighbor, d_k represents the farthest distance among the k nearest neighbors, and d_1 represents the nearest distance among the k nearest neighbors; c_x represents the final classification result, L represents the set of all sample categories, N_k(X) represents the neighborhood of X containing the k nearest points, class(c_xi) represents the category of training sample x_i, and I(v = class(c_xi)) represents the indicator function, which returns 1 when its condition is true and 0 when it is false.
6. The method for predicting the carbon emissions of electric power based on an integrated learning module of claim 1, wherein the base learner further comprises adaboost and DT;
the respectively inputting the standardized data features into different base learners for training to obtain a plurality of groups of predicted values comprises:
training the adaboost through the standardized data features and outputting a group of predicted values;
and training the DT through the standardized data characteristics and outputting a group of predicted values.
7. The method for predicting the carbon emissions of electric power based on an ensemble learning module of claim 1, wherein said data correlation analysis of said historical carbon emissions of electric power comprises:
and carrying out correlation analysis on the historical electric power carbon emission data through a Pearson correlation coefficient, and reserving the characteristic that the Pearson correlation coefficient value is larger than a preset value to obtain the electric power carbon emission data characteristic.
8. The method for predicting the carbon emissions of electric power based on an ensemble learning module as set forth in claim 1, wherein said normalizing the carbon emissions data features of electric power includes:
the electrical carbon emission data characteristics are data normalized by Z-score normalization.
9. The method for predicting the carbon emission of electric power based on the integrated learning module according to claim 1, wherein the index of the carbon emission influencing factor of electric power comprises:
energy subsystem index, population subsystem index, economic subsystem index, industrial electricity consumption subsystem index, and power production subsystem index.
10. An integrated learning module based electric carbon emission prediction device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of an integrated learning module based electric carbon emission prediction method as claimed in any one of claims 1-9.
CN202211574631.5A 2022-12-08 2022-12-08 Electric power carbon emission prediction method and equipment based on integrated learning module Pending CN116108963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211574631.5A CN116108963A (en) 2022-12-08 2022-12-08 Electric power carbon emission prediction method and equipment based on integrated learning module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211574631.5A CN116108963A (en) 2022-12-08 2022-12-08 Electric power carbon emission prediction method and equipment based on integrated learning module

Publications (1)

Publication Number Publication Date
CN116108963A true CN116108963A (en) 2023-05-12

Family

ID=86266567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211574631.5A Pending CN116108963A (en) 2022-12-08 2022-12-08 Electric power carbon emission prediction method and equipment based on integrated learning module

Country Status (1)

Country Link
CN (1) CN116108963A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739867A (en) * 2023-06-27 2023-09-12 南方电网能源发展研究院有限责任公司 Method and device for measuring carbon emission of electric power system and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination