CN114896860A - Soft measurement method for carbon content in fly ash based on LightGBM and XGboost combined model - Google Patents
- Publication number
- CN114896860A (application CN202210318954.1A)
- Authority
- CN
- China
- Prior art keywords
- lightgbm
- model
- xgboost
- fly ash
- carbon content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention discloses a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, comprising the following steps: 1) removing obvious erroneous values from the DCS data and extracting steady-state data using data mining techniques; 2) using a correlation matrix and a wrapper method from feature engineering to eliminate redundant features in the soft measurement of fly ash carbon content; 3) combining the processed data set with the LightGBM, XGBoost and Bayesian optimization (BO) algorithms to build fly ash carbon content prediction models, selecting the optimal hyperparameters to improve prediction accuracy; 4) combining the BO-XGBoost and BO-LightGBM models using a sequential least squares programming algorithm. Compared with common soft measurement models for fly ash carbon content, the method provides more detailed and reasonable feature processing that eliminates redundant features and benefits the subsequent prediction modeling. Combining the LightGBM and XGBoost models with the sequential least squares programming algorithm, a sequential quadratic programming method, gives the model stronger generalization ability and higher prediction accuracy, and the results obtained in the fly ash carbon content soft measurement task are better than those of traditional methods.
Description
Technical Field
The invention belongs to the technical field of measuring the carbon content of boiler fly ash, and particularly relates to a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model.
Background
The carbon content of boiler fly ash is one of the important indexes for evaluating the combustion state of a coal-fired boiler. Monitoring it in real time helps keep it within a reasonable range, which reduces power generation cost and improves the economy of the unit. The heat loss due to fly ash is the second largest heat loss of the boiler, next to the flue gas heat loss. In actual operation it is difficult to adjust the boiler to its optimal working condition, and carbon measuring instruments are expensive. It is therefore necessary to obtain the fly ash carbon content accurately and reliably by an economical and effective method, so as to improve combustion efficiency and guide the operation of the thermal power generating unit.
Existing methods for obtaining the fly ash carbon content of a coal-fired boiler fall into three types: manual sampling with laboratory assay, physical measurement, and soft measurement. Manual sampling requires dedicated personnel to take and prepare samples regularly, consumes manpower and material resources, and suffers from data lag and a tendency to errors and omissions. Commonly used physical methods include the loss-on-ignition method, spectral analysis and the microwave method; for technical or cost reasons, these are difficult to deploy widely. The soft measurement method incorporates knowledge of the production process through mechanism analysis, can quickly and accurately reflect the fly ash carbon content under different working conditions, and is more economical.
Some prior art has studied soft measurement of fly ash carbon content. However, the boiler combustion process is a multivariable, nonlinear and strongly coupled thermal process. For example, the DCS records parameters such as air volume, air pressure and air temperature at each coal mill outlet. When these parameters are used as boiler combustion modeling variables, they are highly correlated with one another, which produces variable redundancy, degrades the estimation accuracy of the model and increases the computational complexity. A more detailed feature engineering method is therefore needed to reduce the influence of redundant variables. In addition, most studies use limited test data and working conditions that cannot effectively represent the full operating range of the boiler. Traditional regression methods include linear regression, support vector machines and time series analysis. These are relatively simple models and generally do not predict well on complex, high-dimensional, noisy data. Ensemble learning methods fuse the predictions of several learners through various voting mechanisms and obtain a more accurate result. It is therefore desirable to build a more accurate model using an ensemble model combined with feature engineering and hyperparameter tuning, and to apply it to an actual combustion system.
Disclosure of Invention
The present invention has been made to solve the above problems, and its object is to provide a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, which obtains a more accurate fly ash carbon content.
In order to achieve this object, the invention adopts the following scheme:
As shown in FIG. 1, the present invention provides a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, comprising:
step 1, acquiring DCS (Distributed Control System) data of the boiler and performing data mining on the acquired DCS data, specifically including removal of obvious abnormal values and data resampling;
step 2, acquiring historical data of the measured parameter values at the boiler operating-condition measuring points, including the relative and reference operating-condition measuring points, over a certain period; in view of the multivariable, nonlinear and strongly coupled character of the boiler combustion process, first using a correlation matrix to find the variables strongly coupled with the fly ash carbon content and to remove the variables with low correlation to it, and then using a wrapper method to further extract the important variables as inputs to the subsequent models;
step 3, dividing the data of the important variables finally extracted in step 2 into a training set, a validation set and a test set, and adopting the XGBoost and LightGBM models as prediction models;
step 4, performing hyperparameter optimization using a Bayesian optimization algorithm, with 5-fold cross-validation used to evaluate the prediction model, RMSE as the evaluation metric and the number of iterations set to N; after the optimal hyperparameters are selected, establishing the BO-LightGBM and BO-XGBoost models for fly ash carbon content prediction;
step 5, combining the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models using a sequential least squares programming algorithm to obtain the final predicted value.
Further, in step 2, the correlation matrix is built from correlation coefficients. The correlation coefficient is given by equation (1), and its sign indicates whether a variable is positively or negatively correlated with the target variable:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $x_i$ is the $i$-th value of the variable $x$, $y_i$ is the corresponding $i$-th value of the variable $y$, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of the variables $x$ and $y$, respectively.
Further, the historical data variables in step 2 include the coal feeding amount of each coal mill, the primary air pressure, the air temperature, the air volume, the separator outlet temperature and current of the coal mill, the secondary air door opening of each layer, the temperature, the pressure, the air volume and the oxygen content of the primary air and the secondary air related to the air preheater, the air supply temperature, the pressure and the air volume of the air feeder, the oxygen content and the exhaust gas temperature of the tail flue, the power generation power, the total primary air volume, the total secondary air volume, the furnace pressure and the furnace temperature.
Further, when the BO-XGBoost model and the BO-LightGBM model are combined using the sequential least squares programming algorithm in step 5, the combined-model problem is expressed by equation (6),
wherein the objective function Obj is a mean square error function, and Y is the average of the true values over all samples;
the ratio of the mean square errors between the predicted values and the true values of the two models is selected as the initial value of the weights, as shown in equation (7);
where $n$ is the total number of sample data, $i$ denotes the $i$-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the $i$-th sample from the BO-XGBoost model, $y_{2i}$ is the predicted value of the $i$-th sample from the BO-LightGBM model, and $y_i$ is the true value of the $i$-th sample; the predicted value of the combined model is given by equation (8);
where the two averaged terms are the mean predicted values over all samples of the BO-XGBoost model and the BO-LightGBM model, respectively.
Further, for a given training set, the predicted value $\hat{y}_i$ of the LightGBM model can be expressed by equation (2):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$
where $\hat{y}_i$ is the predicted value of the LightGBM model, $K$ is the number of decision trees, $f_k$ is the prediction of the $k$-th decision tree, $x_i$ is the $i$-th input sample, and $F$ is the set of all decision trees. The objective function $L^{(t)}$ of LightGBM is given by equation (3):
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$
In equation (3), $n$ is the total number of samples and $i$ is the index of the current sample; $l$ is the loss function, which measures the difference between the target value $y_i$ and the predicted value $\hat{y}_i$ and, for the regression problem, is the mean square error loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the prediction accumulated over the previous $t-1$ rounds at the $t$-th iteration, $f_t(x_i)$ is the prediction of the $t$-th round, and $\Omega(f_t)$ is the model complexity, given by equation (4):
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
In equation (4), $\gamma$ and $\lambda$ are regularization coefficients that prevent the decision tree from becoming too complex, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the $j$-th leaf node.
Further, for a given training set, the predicted value of the XGBoost model can be expressed by the following formula:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $\hat{y}_i$ is the predicted value of the XGBoost model, $K$ is the number of decision trees, $f_k$ is the prediction of the $k$-th decision tree, $x_i$ is the $i$-th input sample, and $F$ is the set of all decision trees. The objective function of the XGBoost model is given by equation (5):
$$Obj^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
where $n$ is the total number of samples, $i$ is the index of the current sample, $g_i$ is the first derivative of the loss function at sample $x_i$, $h_i$ is the second derivative of the loss function at sample $x_i$, the loss function is the mean square error loss, $f_t(x_i)$ is the prediction of the $t$-th round, $\gamma$ and $\lambda$ are the regularization coefficients, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the $j$-th leaf node.
Compared with the prior art, the scheme of the invention has the following beneficial effects:
The method is based on actual operating data from a coal-fired power plant boiler and, for the first time, integrates a data-driven approach combining several machine learning algorithms with data mining techniques to analyze the relationship between the fly ash carbon content and the operating parameters of the boiler. Redundant features are removed and important features are extracted in two steps, using a correlation matrix followed by a wrapper method. The data are then fed to the LightGBM and XGBoost models for training, prediction and validation, and the models are combined by a sequential least squares programming algorithm to obtain the fly ash carbon content closest to that of the actual combustion system. This improves soft measurement accuracy and supports reliable and accurate soft measurement of the fly ash carbon content of the power plant.
Drawings
FIG. 1 is a flow chart of the soft measurement method for the carbon content of fly ash according to the present invention.
FIG. 2 is a flow chart of Bayesian optimization for hyperparameter tuning of the LightGBM model according to the present invention.
FIG. 3 is a flow chart of Bayesian optimization for hyperparameter tuning of the XGBoost model according to the present invention.
FIG. 4 is a flow chart of combining the models with the sequential least squares programming algorithm according to the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
Step 1, acquiring DCS (Distributed Control System) data of the boiler and performing data mining on the acquired DCS data, specifically including removal of obvious abnormal values and data resampling. Some of the acquired DCS data contain abnormal values caused by system restarts or other reasons, so data outside the reasonable range of each measuring point are removed. For example, the actual load recorded by the DCS contains some invalid data at the start of the record because the power plant was shut down. Because the power output of a thermal power unit must follow the grid load during operation, the load changes sharply and the unit continually cycles through steady-state, transitional and steady-state conditions, which can reduce the correlation between data. This effect can be minimized by merging the data into larger time intervals, that is, by resampling the data over an appropriate period.
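The cleaning and resampling described in step 1 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the function name `clean_and_resample`, the valid range of 0 to 700 MW and the sample readings are hypothetical, and the 5-minute window is the interval mentioned in Example 1.

```python
from datetime import datetime, timedelta
from statistics import mean

def clean_and_resample(records, valid_range, interval_minutes=5):
    """Drop out-of-range readings, then average the rest per time window.

    records: list of (timestamp, value) pairs for one measuring point.
    valid_range: (low, high) plausible range for that point (an assumption).
    interval_minutes must divide 60 for the flooring below to be correct.
    """
    low, high = valid_range
    buckets = {}
    for ts, value in records:
        # Remove obvious abnormal values, e.g. glitches after a restart.
        if not (low <= value <= high):
            continue
        # Floor the timestamp to the start of its averaging window.
        window_start = ts - timedelta(
            minutes=ts.minute % interval_minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets.setdefault(window_start, []).append(value)
    # One averaged sample per window, in time order.
    return [(start_ts, mean(vals)) for start_ts, vals in sorted(buckets.items())]

start = datetime(2022, 1, 1, 0, 0)
# Hypothetical one-minute load readings in MW; -999.0 mimics an invalid value.
values = [300.0, 302.0, -999.0, 304.0, 306.0, 310.0, 312.0]
raw = [(start + timedelta(minutes=m), v) for m, v in enumerate(values)]
resampled = clean_and_resample(raw, valid_range=(0.0, 700.0))
```

Here the seven raw readings collapse to two 5-minute averages, and the invalid reading does not distort the first window.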
Step 2, the boiler combustion process is a multivariable, nonlinear and strongly coupled thermal process. For example, the DCS records parameters such as air volume, air pressure and air temperature at each coal mill outlet. When these parameters are used as boiler combustion modeling variables, they are highly correlated with one another, which produces variable redundancy, degrades the estimation accuracy of the model and increases the computational complexity. A feature engineering method is therefore needed to reduce the influence of the redundant variables. First, a correlation matrix is used to find the strongly coupled variables and to remove the variables with low correlation to the fly ash carbon content; a wrapper method is then used to further extract the important variables.
First, a correlation matrix is constructed to quantify the dependence between variables. The correlation matrix is a table showing how each variable is correlated with the predicted quantity, and its entries are correlation coefficients, given by equation (1). The correlation coefficient can be negative or positive, indicating a negative or positive correlation with the target variable.
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $x_i$ is the $i$-th value of the variable $x$, $y_i$ is the corresponding $i$-th value of the variable $y$, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of the variables $x$ and $y$, respectively.
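As an illustration of the correlation-based screening, the sketch below computes the correlation coefficient of equation (1) between each candidate variable and the target, and keeps the variables whose absolute correlation reaches a threshold. The threshold of 0.3, the helper names and the sample data are assumptions made for the example; the source does not state a cut-off value.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient of equation (1); undefined for constant columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def select_by_correlation(features, target, threshold=0.3):
    """Keep the feature names whose |r| with the target is at least threshold."""
    return [name for name, column in features.items()
            if abs(pearson_r(column, target)) >= threshold]

# Hypothetical data: carbon content falls as the oxygen reading rises,
# while the pressure reading is uncorrelated with it.
carbon = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "oxygen_content": [5.0, 4.0, 3.0, 2.0, 1.0],
    "furnace_pressure": [2.0, 1.0, 3.0, 1.0, 2.0],
}
selected = select_by_correlation(features, carbon)
```

A negative coefficient is kept as readily as a positive one, because the screening uses the absolute value: the sign only tells the direction of the relationship.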
The wrapper method selects variables according to a specific prediction model; here recursive feature elimination (RFE) is adopted. RFE is a greedy optimization algorithm that selects an optimal set of variables through repeated iteration.
Step 3, dividing the data processed in steps 1 and 2 into a training set, a validation set and a test set, and adopting the LightGBM and XGBoost models as prediction models:
LightGBM is an ensemble machine learning algorithm developed by Microsoft in 2017. It is a distributed, advanced implementation of the gradient boosting decision tree (GBDT) framework that adds the GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) algorithms on top of GBDT, supports parallel learning and processes large-scale data quickly, so that it is more efficient while preserving accuracy and interpretability. The GBDT algorithm is the core of LightGBM: a strong learner is built by iteratively adding weak learners fitted to the negative gradient of the loss function. GOSS computes the information gain mainly from the data instances with larger gradients, so that a relatively accurate estimate of the information gain is obtained from less data. EFB bundles mutually exclusive features together, which reduces the number of features. Both methods reduce computation time and memory use so that training completes faster.
For a given training set, the predicted value of the LightGBM model is given by equation (2):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$
where $\hat{y}_i$ is the predicted value of the model, $K$ is the number of decision trees, $f_k$ is the prediction of the $k$-th decision tree, $x_i$ is the $i$-th input sample, and $F$ is the set of all decision trees. The objective function $L^{(t)}$ of LightGBM is given by equation (3):
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$
In equation (3), $n$ is the number of samples and $i$ is the index of the current sample; $l$ is the loss function, which measures the difference between the target value $y_i$ and the predicted value $\hat{y}_i$ and, for the regression problem, is often the mean square error loss $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the prediction accumulated over the previous $t-1$ rounds at the $t$-th iteration, $f_t(x_i)$ is the prediction of the $t$-th round, and $\Omega(f_t)$ is the model complexity, usually expressed by equation (4):
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
In equation (4), $\gamma$ and $\lambda$ are regularization coefficients that prevent the decision tree from becoming too complex, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the $j$-th leaf node.
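The GOSS idea mentioned above (keep the instances with large gradients, subsample the rest) can be sketched in a few lines. This follows the scheme published for LightGBM, in which the retained small-gradient instances are re-weighted by (1 - a) / b so that the gain estimate stays approximately unbiased; the function name, the rates a = 0.2 and b = 0.1, and the gradient values are illustrative assumptions, and this is not LightGBM's own code.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample index: weight} for one GOSS draw.

    The top_rate fraction with the largest |gradient| is always kept
    with weight 1. From the remainder, other_rate * n instances are drawn
    at random and amplified by (1 - top_rate) / other_rate.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    kept_large = order[:n_top]
    rng = random.Random(seed)
    kept_small = rng.sample(order[n_top:], n_other)
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in kept_large}
    weights.update({i: amplify for i in kept_small})
    return weights

# Hypothetical gradients for 20 samples; magnitude grows with the index.
grads = [(-1) ** i * i / 10 for i in range(20)]
sample_weights = goss_sample(grads)
large = sorted(i for i, w in sample_weights.items() if w == 1.0)
```

With 20 samples, only 6 are used to evaluate splits in this draw: the 4 with the largest gradients plus 2 randomly chosen others, which is where the saving in computation comes from.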
The XGBoost (extreme gradient boosting) algorithm, proposed by Tianqi Chen, is one of the machine learning algorithms most widely used by data scientists and has achieved good results in numerous machine learning competitions. XGBoost is also an improvement of the GBDT algorithm. It differs from LightGBM in that it traverses the data more exhaustively, loads the data entirely into memory during calculation, and speeds up computation through parallel calculation. The predicted value of XGBoost is calculated in the same way as that of LightGBM, and its objective function is given by equation (5):
$$Obj^{(t)} = \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
where $n$ is the number of samples, $i$ is the index of the current sample, $g_i$ is the first derivative of the loss function at sample $x_i$, $h_i$ is the second derivative of the loss function at sample $x_i$, $f_t(x_i)$ is the prediction of the $t$-th round, $\gamma$ and $\lambda$ are the regularization coefficients, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the $j$-th leaf node.
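To make the roles of $g_i$, $h_i$ and $\lambda$ concrete, the sketch below computes the weight that minimizes a second-order objective of this form for a single leaf, which is $w^* = -G / (H + \lambda)$ with $G$ and $H$ the sums of $g_i$ and $h_i$ over the samples in the leaf. It assumes the squared-error loss $l = (y_i - \hat{y}_i)^2$, so that $g_i = 2(\hat{y}_i - y_i)$ and $h_i = 2$; the function name and the numbers are hypothetical.

```python
def leaf_weight(y_true, y_prev, reg_lambda=1.0):
    """Optimal weight of one leaf under a second-order boosting objective.

    y_true: targets of the samples that fall in the leaf.
    y_prev: their predictions after the previous t-1 rounds.
    Assumes squared-error loss, so g = 2 * (prediction - target) and h = 2.
    """
    g = [2.0 * (p - y) for y, p in zip(y_true, y_prev)]
    h = [2.0 for _ in y_true]
    return -sum(g) / (sum(h) + reg_lambda)

# Two samples in the leaf, both currently under-predicted.
w_regularized = leaf_weight([3.0, 5.0], [2.0, 2.0], reg_lambda=1.0)
w_plain = leaf_weight([3.0, 5.0], [2.0, 2.0], reg_lambda=0.0)
```

Without regularization the leaf weight equals the mean residual of the leaf (2.0 here); a positive $\lambda$ shrinks it toward zero (1.6 here), which is how the regularization term limits the contribution of each tree.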
Step 4, performing hyperparameter tuning using a Bayesian optimization (BO) algorithm, with 5-fold cross-validation used to evaluate the model, RMSE as the evaluation metric, and 100 sequential iterations in the optimization process. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are established for predicting the fly ash carbon content.
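The quantity that the optimizer minimizes in step 4 is the 5-fold cross-validated RMSE of a model trained with a given set of hyperparameters. The sketch below shows only that evaluation function. A toy model with one hypothetical hyperparameter, `shrink`, stands in for LightGBM or XGBoost, and a fixed candidate list stands in for the Bayesian optimizer, which would instead propose each new candidate from a surrogate model of the scores seen so far. The fold assignment is interleaved for brevity, which is a simplification.

```python
from math import sqrt

def kfold_rmse(xs, ys, fit, n_splits=5):
    """Mean RMSE over n_splits folds; fit(train_x, train_y) returns a predictor."""
    n = len(xs)
    fold_scores = []
    for k in range(n_splits):
        held_out = set(range(k, n, n_splits))  # interleaved fold k
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        predict = fit(train_x, train_y)
        sq_err = [(predict(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_scores.append(sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_scores) / n_splits

def make_fit(shrink):
    """Toy stand-in model: always predicts shrink * mean(train_y)."""
    def fit(train_x, train_y):
        level = shrink * sum(train_y) / len(train_y)
        return lambda x: level
    return fit

xs = list(range(10))
ys = [2.0] * 10
candidates = [0.5, 0.75, 1.0]  # stand-in for optimizer proposals
scores = {s: kfold_rmse(xs, ys, make_fit(s)) for s in candidates}
best_shrink = min(scores, key=scores.get)
```

Averaging the score over folds rather than using a single validation split makes the comparison between hyperparameter candidates less sensitive to how the data happen to be partitioned.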
Step 5, combining the XGBoost and LightGBM models using a sequential least squares programming algorithm, in order to improve prediction accuracy and to overcome the limited robustness of a single model. The SQP (sequential quadratic programming) family of algorithms is widely applied in many fields, such as least squares problems, nonlinear optimization, economics and system analysis. The combined-model problem can be expressed by equation (6):
wherein the objective function Obj is a mean square error function, and Y is the average of the true values over all samples;
since equation (6) is a nonlinear quadratic function and the constraints are linear, it is a quadratic programming problem that can be solved with the sequential least squares programming algorithm.
The initial values of the weights are chosen as the ratio of the mean square errors between the predicted values and the true values of the two models, as shown in equation (7), which speeds up the solution and helps avoid being trapped in a local optimum.
where $n$ is the total number of sample data, $i$ denotes the $i$-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the $i$-th sample from the BO-XGBoost model, $y_{2i}$ is the predicted value of the $i$-th sample from the BO-LightGBM model, and $y_i$ is the true value of the $i$-th sample; the predicted value of the combined model is given by equation (8);
where the two averaged terms are the mean predicted values over all samples of the BO-XGBoost model and the BO-LightGBM model, respectively.
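The weighting idea of step 5 can be illustrated under explicit assumptions, since the exact form of equations (6) to (8) is not recoverable here. The sketch assumes the combined prediction is $w_1 y_{1i} + w_2 y_{2i}$, that the objective is the mean square error against $y_i$, and that the linear constraints are $w_1 + w_2 = 1$ with both weights non-negative. Under those assumptions the two-model problem is one-dimensional and has a closed-form minimizer, which is used below as a stand-in for the sequential least squares programming solver; all names and numbers are hypothetical.

```python
def combine_weights(pred_1, pred_2, y_true):
    """Weights (w1, w2), w1 + w2 = 1 and both >= 0, minimizing the MSE
    of w1 * pred_1 + w2 * pred_2 against y_true (assumed formulation)."""
    diff = [a - b for a, b in zip(pred_1, pred_2)]
    resid = [y - b for y, b in zip(y_true, pred_2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5, 0.5  # identical predictions: every split is optimal
    w1 = sum(d * r for d, r in zip(diff, resid)) / denom
    w1 = min(1.0, max(0.0, w1))  # clip to the feasible interval
    return w1, 1.0 - w1

def mse(pred, y_true):
    return sum((p - y) ** 2 for p, y in zip(pred, y_true)) / len(y_true)

# Hypothetical predictions: one model reads 0.5 high, the other 0.5 low.
y = [2.0, 4.0, 6.0]
pred_xgb = [2.5, 4.5, 6.5]
pred_lgb = [1.5, 3.5, 5.5]
w1, w2 = combine_weights(pred_xgb, pred_lgb, y)
combined = [w1 * a + w2 * b for a, b in zip(pred_xgb, pred_lgb)]
```

In this toy case the opposite biases cancel and the combined error falls to zero, while each single model has an MSE of 0.25. With more than two models, or with other constraints, a general solver such as `scipy.optimize.minimize` with `method="SLSQP"` would be the usual route.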
Example 1
Step 1, acquiring all historical operating condition data of a power plant within a period of time (for example, 50 days), wherein the acquired operating condition measurement points comprise about 70 total operating condition measurement points, including coal feeding amount of each coal mill, primary air pressure, air temperature, air volume, separator outlet temperature, current and the like of the coal mills, secondary air door opening of each layer, temperature, pressure, air volume and oxygen content of primary air and secondary air related to an air preheater, air supply temperature, pressure and air volume of an air feeder, oxygen content and smoke exhaust temperature of a tail flue, and other general parameters such as power generation power, total primary air volume, total secondary air volume, furnace pressure, furnace temperature and the like;
and step two, removing the obvious abnormal values in the data, and resampling the data by taking 5 minutes as an average value interval.
And step three, performing feature dimension reduction by using a feature dimension reduction method of machine learning. Aiming at the characteristics of multivariable, nonlinearity and strong coupling in the boiler combustion process, firstly, the variables with strong coupling are found out through a correlation matrix, the variables with low correlation with the carbon content of fly ash are removed, and the important variables are further extracted through a packaging method.
In this step, the correlation matrix is built from the correlation coefficient given in equation (1), whose sign indicates a direct or inverse relation with the target variable:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$

where r is the correlation coefficient, $x_i$ is the i-th value of variable x, $y_i$ is the corresponding i-th value of variable y, $i \in [1, n]$, n is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively.
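As an illustration of the correlation screening, the sketch below assumes that equation (1) is the usual Pearson correlation coefficient, which the symbol definitions suggest. The helper names, the toy variable names and the threshold of 0.3 are hypothetical; the text does not give a cut-off value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def filter_by_correlation(features, target, threshold=0.3):
    """Keep the variables whose |r| with the target reaches the threshold.

    features: dict mapping variable name -> list of values.
    threshold: assumed cut-off, not taken from the source.
    """
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]


carbon = [2.0, 4.0, 6.0, 8.0]            # toy target values
candidates = {
    "coal_feed": [1.0, 2.0, 3.0, 4.0],   # direct relation
    "oxygen": [4.0, 3.0, 2.0, 1.0],      # inverse relation
    "noise": [1.0, -1.0, -1.0, 1.0],     # uncorrelated
}
kept = filter_by_correlation(candidates, carbon)
```

Using the absolute value keeps both directly and inversely related variables, matching the statement that the sign of r only indicates the direction of the relation.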
The wrapper method selects variables according to a specific prediction model; here Recursive Feature Elimination (RFE) is adopted. RFE is a greedy optimization algorithm that selects an optimal variable subset through repeated iteration, and it is used to further screen the variables selected by the correlation matrix.
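The greedy loop behind RFE can be sketched independently of any particular model. In the sketch below, `rank_importance` is a hypothetical callback that stands in for "refit the prediction model on the current subset and return each variable's importance"; the source does not say which model or importance measure drives the elimination.

```python
def recursive_feature_elimination(features, rank_importance, n_keep):
    """Repeatedly drop the least important feature until n_keep remain.

    features: list of variable names.
    rank_importance: callable(subset) -> dict of name -> importance score,
        assumed to refit a model on the given subset each time.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        importance = rank_importance(remaining)
        weakest = min(remaining, key=lambda name: importance[name])
        remaining.remove(weakest)
    return remaining


# Toy importance table standing in for a fitted model (illustrative only).
toy_scores = {"coal_feed": 0.5, "oxygen": 0.3,
              "furnace_temp": 0.15, "mill_current": 0.05}
selected = recursive_feature_elimination(
    list(toy_scores),
    lambda subset: {name: toy_scores[name] for name in subset},
    n_keep=2,
)
```

With a real model the importances change after each refit, which is why the ranking is recomputed inside the loop rather than once up front.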
Step 4: divide all samples screened in the previous step into a training set, a validation set and a test set at a ratio of 4:1:1, and adopt XGBoost and LightGBM as the prediction models; validation uses five-fold cross-validation.
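A 4:1:1 split can be sketched as below. The function name is hypothetical, and splitting in the original sample order without shuffling is an assumption; the text does not say whether the samples are shuffled first.

```python
def split_4_1_1(samples):
    """Split a list into training, validation and test parts at 4:1:1."""
    n = len(samples)
    n_train = n * 4 // 6
    n_val = n // 6
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_4_1_1(list(range(60)))
```

When the sample count is not a multiple of six, the leftover samples fall into the test set in this sketch.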
Step 5: based on the prediction models of the invention, use the Bayesian optimization (BO) algorithm for hyperparameter tuning. Model performance is evaluated with 5-fold cross-validation using RMSE as the metric, and the optimization runs for 100 sequential iterations. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are established and used to predict the fly ash carbon content; the specific procedure is shown in Figures 2 and 3.
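The quantity that the optimizer minimizes, the 5-fold cross-validated RMSE, can be sketched without any modelling library. The Bayesian optimization loop itself is not shown; `fit` is a hypothetical callback that would train XGBoost or LightGBM with one candidate hyperparameter set and return a prediction function.

```python
from math import sqrt


def kfold_rmse(fit, X, y, k=5):
    """Mean RMSE over k contiguous folds.

    fit: callable(X_train, y_train) -> predict(x), a stand-in for training
        a model with one candidate hyperparameter set.
    X, y: lists of inputs and targets with len(X) >= k.
    """
    n = len(X)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    scores, start = [], 0
    for size in fold_sizes:
        stop = start + size
        # Train on everything outside the held-out fold.
        predict = fit(X[:start] + X[stop:], y[:start] + y[stop:])
        squared = [(predict(x) - t) ** 2
                   for x, t in zip(X[start:stop], y[start:stop])]
        scores.append(sqrt(sum(squared) / size))
        start = stop
    return sum(scores) / k


def mean_model(X_train, y_train):
    """Toy model that always predicts the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


score = kfold_rmse(mean_model, list(range(10)), [2.0] * 10)
```

An optimizer would call `kfold_rmse` once per iteration, 100 times under the setting described above, and keep the hyperparameters with the lowest score.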
Step 6: combine the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models using the Sequential Least Squares Programming (SLSQP) algorithm to obtain the final predicted value.
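The weight search can be illustrated for the two-model case. The sketch below assumes the weights satisfy $w_1 + w_2 = 1$ with both non-negative, a constraint the surviving text does not state. Under that assumption the mean-square-error objective is a one-dimensional convex quadratic with a closed-form minimizer, so the sketch uses that closed form as a stand-in for the SLSQP solver rather than running SLSQP itself.

```python
def combine_weights(y_true, pred_a, pred_b):
    """Weights (w_a, w_b), w_a + w_b = 1 and both >= 0, that minimize the MSE
    of w_a * pred_a + w_b * pred_b against y_true."""
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        return 0.5, 0.5  # identical predictions: any split gives the same MSE
    # Clipping the unconstrained minimizer is exact for a convex 1-D quadratic.
    w_a = min(1.0, max(0.0, num / den))
    return w_a, 1.0 - w_a


y_true = [1.0, 2.0, 3.0]
pred_xgb = [2.0, 3.0, 4.0]   # toy predictions, each 1 too high
pred_lgbm = [0.0, 1.0, 2.0]  # toy predictions, each 1 too low
w1, w2 = combine_weights(y_true, pred_xgb, pred_lgbm)
combined = [w1 * a + w2 * b for a, b in zip(pred_xgb, pred_lgbm)]
```

With more than two models or additional constraints, a general solver such as `scipy.optimize.minimize` with `method="SLSQP"` would replace the closed form.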
The method is applied to the following embodiments to achieve the technical effects of the present invention, and the detailed steps in the embodiments will not be described again.
Table 1 compares the performance of the proposed method with that of other methods. The proposed method achieves the lowest MAPE and RMSE and the highest $R^2$. Compared with the other methods, RMSE is reduced by 1.8% to 26.2% and MAPE by 0.7% to 19.24%, so the error is further reduced and the measurement precision is improved. $R^2$ is improved by 1.3% to 20.9%, which indicates a better fit of the prediction curve and therefore higher accuracy and reliability. Specifically, LM-Garson-BP, AQPSO-SVR and FPA-RF tune their parameters with heuristic algorithms and predict with a regression model, which improves the prediction precision of the corresponding models to some extent. However, hyperparameter tuning is a non-convex, multimodal optimization problem with a high evaluation cost. For such problems the BO algorithm chooses the next evaluation point from the information already obtained about the unknown objective function, and so reaches a good solution in few evaluations. It thereby avoids problems such as failing to use the feedback from earlier iterations and searching slowly. From the perspective of the prediction model, LightGBM and XGBoost are decision-tree ensemble algorithms. Their objective functions use a second-order Taylor expansion, which lets the model learn fully, and include regularization terms, which reduce model complexity. They help prevent overfitting, support parallel and distributed computation, and can effectively improve prediction precision. The combined model merges the advantages of the two single models and improves robustness.
Therefore, its prediction performance is better than that of the six comparison models.
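For reference, the three metrics used in the comparison can be computed as in the sketch below. The formulas are the standard definitions of RMSE, MAPE and $R^2$; it is an assumption that the patent uses exactly these forms, and the function names and numbers are illustrative only.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [2.0, 4.0]
predicted = [1.0, 5.0]
scores = (rmse(measured, predicted), mape(measured, predicted),
          r2(measured, predicted))
```

Lower RMSE and MAPE and a higher $R^2$ indicate a better model, which is the sense in which the comparison above ranks the methods.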
TABLE 1 prediction results of different models
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (6)
1. A method for soft measurement of the carbon content of fly ash based on a LightGBM and XGBoost combined model, characterized by comprising the following steps:
step 1, acquiring DCS (Distributed Control System) data of a boiler and performing data mining on the acquired DCS data, wherein the data mining specifically comprises removing obvious outliers and resampling the data;
step 2, acquiring historical data variables of the measured parameter values at the boiler operating-condition measurement points, including the relative operating-condition measurement points and the reference operating-condition measurement points, over a certain period; and, because the boiler combustion process is multivariable, nonlinear and strongly coupled, first finding the variables strongly coupled with the fly ash carbon content through a correlation matrix, removing the variables weakly correlated with the fly ash carbon content, and further extracting the important variables through a wrapper method as the input of the subsequent models;
step 3, dividing the important variables finally extracted in step 2 into a training set, a validation set and a test set, and adopting XGBoost and LightGBM models as the prediction models;
step 4, performing hyperparameter optimization using a Bayesian optimization algorithm, with 5-fold cross-validation used to evaluate the prediction models, RMSE as the evaluation metric and the number of iterations set to N, and, after the optimal hyperparameters are selected, establishing the BO-LightGBM and BO-XGBoost models for predicting the fly ash carbon content;
step 5, combining the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models using the Sequential Least Squares Programming (SLSQP) algorithm to obtain the final predicted value.
2. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: in step 2, the correlation matrix is expressed by correlation coefficients; the expression of the correlation coefficient is given by equation (1), and its sign represents a direct or inverse relation with the target variable;
3. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: the historical data variables in step 2 include the coal feed rate of each coal mill; the primary air pressure, air temperature, air volume, separator outlet temperature and current of each coal mill; the secondary air damper opening of each layer; the temperature, pressure, air volume and oxygen content of the primary and secondary air at the air preheater; the supply air temperature, pressure and air volume of the forced draft fan; the oxygen content and exhaust gas temperature of the tail flue; and the generated power, total primary air volume, total secondary air volume, furnace pressure and furnace temperature.
4. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: when the BO-XGBoost and BO-LightGBM models are combined using the Sequential Least Squares Programming (SLSQP) algorithm in step 5,
the objective function Obj is a mean square error function, and $\bar{y}$ is the mean of the true values of all samples;
the ratio of the mean square errors between the predicted values and the true values of the two models is selected as the initial value of the weights, as shown in equation (7);
where n is the total number of samples, i indexes the i-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost and BO-LightGBM models, $y_{1i}$ is the predicted value of the i-th sample from the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample from the BO-LightGBM model, and $y_i$ is the true value of the i-th sample; the predicted value of the combined model is given by equation (8);
5. The method for soft measurement of fly ash carbon content based on a LightGBM and XGBoost combined model according to claim 1, wherein: for a given training set, the predicted value $\hat{y}_i$ of the LightGBM model can be expressed by equation (2):

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$

where $\hat{y}_i$ is the predicted value of the LightGBM model, K is the number of decision trees, $f_k$ is the prediction of the k-th decision tree, $x_i$ is the i-th input sample, and F is the set of all decision trees; the objective function $L^{(t)}$ of LightGBM is given by equation (3):

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$

in equation (3), n is the total number of samples, i is the index of the current sample, and $l$ is the loss function, which measures the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for the regression problem it is the mean square error loss, i.e. $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the prediction accumulated over the previous t-1 rounds at the t-th iteration, $f_t(x_i)$ is the prediction of the t-th round, and $\Omega(f_t)$ is the model complexity, given by equation (4):

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2 \tag{4}$$

in equation (4), $\gamma$ and $\lambda$ are regularization coefficients that keep the decision tree from becoming too complex, T is the number of leaf nodes, and w is the weight coefficient of the leaf nodes.
6. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: for a given training set, the predicted value of the XGBoost model can be expressed by the following formula:

$$f(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

where $f(x_i)$ is the predicted value of the XGBoost model, K is the number of decision trees, $f_k$ is the prediction of the k-th decision tree, $x_i$ is the i-th input sample, and F is the set of all decision trees; the objective function of the XGBoost model is given by equation (5);
where n is the total number of samples, i is the index of the current sample, $g_i$ is the first derivative of the loss function at sample $x_i$, $h_i$ is the second derivative of the loss function at sample $x_i$, the loss function is the mean square error loss, $f_t(x_i)$ is the prediction of the t-th round, $\lambda$ is the regularization coefficient, T is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210318954.1A CN114896860B (en) | 2022-03-29 | 2022-03-29 | Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114896860A (en) | 2022-08-12 |
CN114896860B (en) | 2024-05-14 |
Family
ID=82716399
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210318954.1A Active CN114896860B (en) | 2022-03-29 | 2022-03-29 | Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896860B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019144386A1 (en) * | 2018-01-26 | 2019-08-01 | 大连理工大学 | Method for predicting key performance parameters of aviation engine transition state |
WO2021004198A1 (en) * | 2019-07-10 | 2021-01-14 | 江苏金恒信息科技股份有限公司 | Plate performance prediction method and apparatus |
CN112287598A (en) * | 2020-09-28 | 2021-01-29 | 山西漳山发电有限责任公司 | Fly ash carbon content prediction method based on particle swarm parameter optimization |
CN113591930A (en) * | 2021-07-06 | 2021-11-02 | 武汉纺织大学 | Virus-host correlation prediction method based on network fusion and graph embedding |
CN113918881A (en) * | 2021-10-21 | 2022-01-11 | 武汉纺织大学 | Soft measurement method and monitoring system for carbon content in fly ash based on hierarchical polynomial model |
Non-Patent Citations (2)
Title |
---|
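卞和营; 王军敏: "Application of support vector regression in soft measurement of fly ash carbon content", Computer Measurement & Control (计算机测量与控制), no. 02, 25 February 2014 (2014-02-25) * |
曹渝昆; 朱萌: "Ultra-short-term wind farm power prediction based on principal component analysis and LightGBM", Journal of Shanghai University of Electric Power (上海电力学院学报), no. 06, 15 December 2019 (2019-12-15) * |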
Also Published As
Publication number | Publication date |
---|---|
CN114896860B (en) | 2024-05-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||