CN114896860A - Soft measurement method for carbon content in fly ash based on LightGBM and XGboost combined model - Google Patents

Soft measurement method for carbon content in fly ash based on LightGBM and XGBoost combined model

Info

Publication number
CN114896860A
Authority
CN
China
Prior art keywords
lightgbm
model
xgboost
fly ash
carbon content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210318954.1A
Other languages
Chinese (zh)
Other versions
CN114896860B (en)
Inventor
刘军平
骆海瑞
彭涛
胡新荣
何儒汉
朱强
张俊杰
熊明福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Textile University
Priority to CN202210318954.1A
Publication of CN114896860A
Application granted
Publication of CN114896860B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, comprising the following steps: 1) removing obvious erroneous values from the DCS data and extracting steady-state data using data mining techniques; 2) using a correlation matrix and a wrapper method from feature engineering to eliminate redundant features in the soft measurement of fly ash carbon content; 3) combining the processed data set with the LightGBM, XGBoost and Bayesian optimization (BO) algorithms to build fly ash carbon content prediction models, selecting the optimal hyperparameters to improve prediction accuracy; and 4) combining the BO-XGBoost and BO-LightGBM models using the sequential least squares programming algorithm. Compared with common soft measurement models for fly ash carbon content, the method provides a more detailed and reasonable feature processing procedure that eliminates redundant features and benefits the subsequent prediction modeling. Because the LightGBM and XGBoost models are combined by a sequential quadratic programming algorithm, the combined model generalizes better and predicts more accurately, and it outperforms traditional methods on the fly ash carbon content soft measurement task.

Description

Soft measurement method for carbon content in fly ash based on LightGBM and XGBoost combined model
Technical Field
The invention belongs to the technical field of measuring the carbon content of boiler fly ash, and particularly relates to a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model.
Background
The carbon content of boiler fly ash is one of the important indexes for evaluating the combustion state of a coal-fired boiler. Monitoring it in real time helps keep it within a reasonable range, which reduces power generation cost and improves the economy of the unit. Fly ash heat loss is the second largest boiler heat loss, after flue gas heat loss. In actual operation it is difficult to tune the boiler to its optimal working condition, and carbon measuring instruments are expensive. An economical and effective method for obtaining the fly ash carbon content accurately is therefore needed, both to improve combustion efficiency and to guide the operation of boiler thermal power generating units.
Existing methods for obtaining the fly ash carbon content of coal-fired boilers fall into three types: manual sampling with laboratory assay, physical measurement, and soft measurement. Manual sampling requires dedicated personnel to take and prepare samples regularly; it consumes manpower and material resources, its results lag behind operation, and it is prone to errors and omissions. Commonly used physical methods include the combustion weight loss (loss-on-ignition) method, spectral analysis and the microwave method; for technical or cost reasons, none of them is easy to deploy widely. The soft measurement method incorporates knowledge of the production process through mechanism analysis, reflects the fly ash carbon content quickly and accurately under different working conditions, and is more economical.
Some prior art has studied soft measurement of fly ash carbon content. However, the boiler combustion process is a multivariable, nonlinear and strongly coupled thermal process. For example, the DCS records parameters such as air volume, air pressure and air temperature at each coal mill outlet. When used as boiler combustion modeling variables, these parameters are highly correlated, which introduces variable redundancy, degrades the estimation accuracy of the model and increases computational complexity. A more detailed feature engineering method is therefore needed to reduce the influence of redundant variables. In addition, most existing studies test on limited data and working conditions that cannot represent the full operating range of the boiler. Traditional regression methods include linear regression, support vector machines and time series analysis. These are relatively simple models and generally do not predict well on complex, high-dimensional, noisy data. Ensemble learning fuses the predictions of multiple learners through various voting mechanisms and obtains more accurate results. A more accurate model is therefore built by combining an ensemble model with feature engineering and hyperparameter tuning, and is applied to an actual combustion system.
Disclosure of Invention
The present invention is made to solve the above problems, and its object is to provide a soft measurement method for the carbon content of fly ash, based on a combined LightGBM and XGBoost model, that obtains a more accurate fly ash carbon content.
To achieve this object, the invention adopts the following scheme:
As shown in Fig. 1, the invention provides a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, comprising:
step 1, acquiring DCS (Distributed Control System) data of a boiler and performing data mining on the acquired DCS data, specifically removing obvious abnormal values and resampling the data;
step 2, acquiring historical data variables of the measured parameter values of the boiler operating-condition measuring points, including the related operating-condition measuring points and the reference operating-condition measuring points, over a certain period; in view of the multivariable, nonlinear and strongly coupled nature of the boiler combustion process, first using a correlation matrix to find the variables strongly coupled with the fly ash carbon content and to remove the variables with low correlation with the fly ash carbon content, and then using a wrapper method to further extract the important variables as the input of the subsequent models;
step 3, dividing the data of the important variables finally extracted in step 2 into a training set, a validation set and a test set, and using the XGBoost and LightGBM models each as a prediction model;
step 4, performing hyperparameter optimization with a Bayesian optimization algorithm, evaluating the prediction models with 5-fold cross-validation using RMSE as the metric and N iterations, and, after the optimal hyperparameters are selected, building the BO-LightGBM and BO-XGBoost models for fly ash carbon content prediction;
and step 5, combining the fly ash carbon content predictions of the BO-XGBoost model and the BO-LightGBM model using the sequential least squares programming algorithm to obtain the final predicted value.
Further, in step 2, the correlation matrix is expressed by correlation coefficients. The correlation coefficient is given by formula (1), and its sign indicates whether a variable is positively or negatively related to the target variable:
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1} $$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable x, $y_i$ is the corresponding i-th value of variable y, $i \in [1, n]$, n is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively.
Further, the historical data variables in step 2 include the coal feeding amount of each coal mill, the primary air pressure, the air temperature, the air volume, the separator outlet temperature and current of the coal mill, the secondary air door opening of each layer, the temperature, the pressure, the air volume and the oxygen content of the primary air and the secondary air related to the air preheater, the air supply temperature, the pressure and the air volume of the air feeder, the oxygen content and the exhaust gas temperature of the tail flue, the power generation power, the total primary air volume, the total secondary air volume, the furnace pressure and the furnace temperature.
Furthermore, when the BO-XGBoost model and the BO-LightGBM model are combined using the sequential least squares programming algorithm in step 5, the combination problem is expressed as:
[Formula (6): equation image not recovered]
wherein the objective function Obj is the mean square error function
[Mean square error formula: equation image not recovered]
and Y is the average of the real values corresponding to all samples;
the initial values of the weights are chosen from the ratio of the mean square errors between the predicted values of the two models and the true values, as shown in formula (7):
[Formula (7): equation image not recovered]
where n is the total number of sample data, i denotes the i-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the i-th sample from the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample from the BO-LightGBM model, and $y_i$ is the real value of the i-th sample. The predicted value of the combined model is given by formula (8):
[Formula (8): equation image not recovered]
wherein the two averaged terms are the mean predicted values over all samples of the BO-XGBoost model and the BO-LightGBM model, respectively.
Further, for a given training set, the predicted value $\hat{y}_i$ of the LightGBM model can be expressed by formula (2):
$$ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2} $$
where $\hat{y}_i$ is the predicted value of the LightGBM model, K is the number of decision trees, $f_k$ is the predicted value of the k-th decision tree, $x_i$ is the i-th input sample, and F is the set of all decision trees. The objective function $L^{(t)}$ of LightGBM is given by formula (3):
$$ L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3} $$
In formula (3), n is the total number of samples and i is the index of the current sample. $l(\cdot,\cdot)$ is the loss function, which measures the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for the regression problem it is the mean square error loss, i.e. $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$. $\hat{y}_i^{(t-1)}$ is the predicted value accumulated over the first t-1 rounds at the t-th iteration, $f_t(x_i)$ is the predicted value of the t-th round, and $\Omega(f_t)$ is the model complexity, given by formula (4):
$$ \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4} $$
In formula (4), $\gamma$ and $\lambda$ are regularization coefficients that keep the decision tree from becoming too complex, T is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
Further, for a given training set, the predicted value of the XGBoost model can be expressed by the following formula:
$$ f(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F $$
where $f(x_i)$ is the predicted value of the XGBoost model, K is the number of decision trees, $f_k$ is the predicted value of the k-th decision tree, $x_i$ is the i-th input sample, and F is the set of all decision trees. The objective function of the XGBoost model is given by formula (5):
$$ L^{(t)} \simeq \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5} $$
where n is the total number of samples, i is the index of the current sample, $g_i$ is the first derivative of the loss function at sample $x_i$, $h_i$ is the second derivative of the loss function at sample $x_i$, the loss function is the mean square error loss, $f_t(x_i)$ is the predicted value of the t-th round, $\gamma$ and $\lambda$ are the regularization coefficients as in formula (4), T is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
Compared with the prior art, the scheme of the invention has the beneficial effects that:
The method is based on actual operating data of a coal-fired power plant boiler and, for the first time, integrates a data-driven approach that combines several machine learning algorithms with data mining techniques to analyze the relationship between the fly ash carbon content and the operating parameters of the boiler. A correlation matrix and a wrapper method are used in two stages to remove redundant features and extract the important ones. The data are then fed to the LightGBM and XGBoost models for training, prediction and validation, and the two models are combined by the sequential least squares programming algorithm to obtain the fly ash carbon content closest to that of the actual combustion system. This improves soft measurement accuracy and supports the reliability of fly ash carbon content soft measurement in power plants.
Drawings
Fig. 1 is a flow chart of the soft measurement method for fly ash carbon content according to the invention;
Fig. 2 is a flow chart of Bayesian optimization of the LightGBM model hyperparameters according to the invention;
Fig. 3 is a flow chart of Bayesian optimization of the XGBoost model hyperparameters according to the invention;
Fig. 4 is a flow chart of model combination using the sequential least squares programming algorithm according to the invention.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
Step 1: DCS (Distributed Control System) data of the boiler are acquired and data mining is performed on them, specifically the removal of obvious abnormal values and data resampling. Part of the acquired DCS data contains abnormal values caused by system restarts or other reasons, so data outside the reasonable range of each measuring point are removed. For example, the actual load recorded by the DCS contains some invalid data at the beginning of the record because the power plant was shut down. Because the output of a thermal power generating unit must follow the grid load during operation, the load changes sharply and the unit keeps switching between steady-state and transition conditions, which can reduce the correlation between data. This effect can be minimized by merging the data into larger time intervals, that is, by resampling the data over an appropriate time period.
Step 2: the boiler combustion process is a multivariable, nonlinear and strongly coupled thermal process. For example, the DCS records parameters such as air volume, air pressure and air temperature at each coal mill outlet. When used as boiler combustion modeling variables, these parameters are highly correlated, which introduces variable redundancy, degrades the estimation accuracy of the model and increases computational complexity. A feature engineering method is therefore needed to reduce the influence of redundant variables. First, the strongly coupled variables are found through a correlation matrix and the variables with low correlation with the fly ash carbon content are removed; then the important variables are further extracted by a wrapper method.
First, a correlation matrix is constructed to quantify the dependency between variables. The correlation matrix is a table showing how the variables correlate with the predicted quantity, and it is expressed by correlation coefficients as in formula (1). The correlation coefficient can be negative or positive, indicating a negative or positive relationship with the target variable.
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1} $$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable x, $y_i$ is the corresponding i-th value of variable y, $i \in [1, n]$, n is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively.
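The correlation screening can be illustrated with a short sketch. It computes the coefficient of formula (1) in plain Python and keeps only the variables whose absolute correlation with the fly ash carbon content reaches a threshold. The variable names, the sample values and the 0.5 threshold are assumptions made for illustration; the patent does not state a threshold.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient of formula (1) for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements (not data from the patent).
carbon_content = [2.0, 2.5, 3.0, 3.5, 4.0]
candidates = {
    "flue_o2_pct": [4.0, 3.5, 3.0, 2.5, 2.0],   # falls as carbon content rises
    "unrelated_tag": [1.0, 3.0, 2.0, 3.0, 1.0], # no linear relation
}

THRESHOLD = 0.5  # assumed cut-off on |r|
correlations = {name: pearson_r(values, carbon_content)
                for name, values in candidates.items()}
selected = [name for name, r in correlations.items() if abs(r) >= THRESHOLD]
print(correlations, selected)
```

A negative coefficient is kept as long as its magnitude is large, since an inverse relationship is as informative for the model as a direct one.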
The wrapper method selects variables according to a specific prediction model; here Recursive Feature Elimination (RFE) is used. RFE is a greedy optimization algorithm that selects an optimal set of variables through repeated iteration.
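A minimal sketch of recursive feature elimination is given below, using scikit-learn's `RFE` class. The patent does not say which estimator drives the elimination, so `LinearRegression` is used here purely as a stand-in; the synthetic data, the feature names and the choice of keeping two features are likewise assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: only two of the five features drive the target.
rng = np.random.default_rng(0)
feature_names = ["coal_feed", "primary_air_pressure", "flue_o2",
                 "furnace_temp", "noise_tag"]  # hypothetical names
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.normal(size=200)

# RFE refits the estimator repeatedly and drops the weakest feature each round
# (step=1) until the requested number of features remains.
selector = RFE(estimator=LinearRegression(), n_features_to_select=2, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print(selected)
```

In practice the estimator would more plausibly be one of the tree models used later, and the number of retained features would be chosen by validation rather than fixed in advance.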
Step 3: the data processed in steps 1 and 2 are divided into a training set, a validation set and a test set, and the LightGBM and XGBoost models are used as prediction models.
LightGBM is an ensemble machine learning algorithm developed by Microsoft in 2017. It is a distributed gradient boosting framework, an advanced implementation of the gradient boosting decision tree (GBDT) algorithm, that adds the GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) algorithms on top of GBDT. It supports parallel learning and processes large-scale data quickly, so it is more efficient while preserving accuracy and interpretability. GBDT is the core of LightGBM: a strong learner is built by iteratively adding weak learners fitted to the negative gradient of the loss function. GOSS computes information gain mainly from the data instances with larger gradients, so a relatively accurate gain estimate is obtained from less data. EFB bundles mutually exclusive features, which reduces the number of features. Both techniques reduce computation time and memory use so that training completes faster.
For a given training set D, the predicted value $\hat{y}_i$ of LightGBM can be expressed by formula (2):
$$ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2} $$
where $\hat{y}_i$ is the predicted value of the model, K is the number of decision trees, $f_k$ is the predicted value of the k-th decision tree, $x_i$ is the i-th input sample, and F is the set of all decision trees. The objective function $L^{(t)}$ of LightGBM is given by formula (3):
$$ L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3} $$
In formula (3), n is the number of samples and i is the index of the current sample. $l(\cdot,\cdot)$ is the loss function, which measures the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for regression problems it is often the mean square error loss, that is, $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$. $\hat{y}_i^{(t-1)}$ is the predicted value accumulated over the first t-1 rounds at the t-th iteration, $f_t(x_i)$ is the predicted value of the t-th round, and $\Omega(f_t)$ is the model complexity, usually given by formula (4):
$$ \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4} $$
In formula (4), $\gamma$ and $\lambda$ are regularization coefficients that keep the decision tree from becoming too complex, T is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
The XGBoost (extreme gradient boosting) algorithm, proposed by Tianqi Chen, is one of the machine learning algorithms most widely used by data scientists and has achieved good results in numerous machine learning competitions. XGBoost is also an improvement of the GBDT algorithm. Unlike LightGBM, XGBoost traverses the data more exhaustively during computation, can load the data entirely into memory, and accelerates computation through parallelism. Its predicted value is computed in the same way as for LightGBM, and its objective function is given by formula (5):
$$ L^{(t)} \simeq \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5} $$
where n is the number of samples, i is the index of the current sample, $g_i$ is the first derivative of the loss function at sample $x_i$, $h_i$ is the second derivative of the loss function at sample $x_i$, $f_t(x_i)$ is the predicted value of the t-th round, $\gamma$ and $\lambda$ are the regularization coefficients as in formula (4), T is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
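To make the roles of the first and second derivatives concrete, the sketch below evaluates a second-order objective of this kind for a single leaf under the squared error loss $l = (y - \hat{y})^2$, for which the first derivative is $2(\hat{y}^{(t-1)} - y)$ and the second derivative is 2. The closed-form leaf weight $w^* = -G/(H + \lambda)$, with G and H the sums of the derivatives in the leaf, is the standard result from the gradient boosting literature and is not stated in the patent; the sample values and the settings of λ and γ are assumptions.

```python
# Hypothetical samples that all fall into one leaf.
y_true = [3.0, 3.4, 2.8]   # measured carbon content
y_prev = [3.0, 3.0, 3.0]   # prediction after round t-1

LAMBDA = 1.0  # assumed L2 regularization coefficient
GAMMA = 0.0   # assumed per-leaf penalty (constant for a fixed tree shape)

# Derivatives of the squared error loss with respect to the prediction.
g = [2.0 * (p - t) for t, p in zip(y_true, y_prev)]
h = [2.0 for _ in y_true]
G, H = sum(g), sum(h)

def leaf_objective(w):
    """Second-order objective for one leaf that outputs the weight w."""
    return G * w + 0.5 * (H + LAMBDA) * w ** 2 + GAMMA

# Setting the derivative of leaf_objective to zero gives the optimal weight.
w_star = -G / (H + LAMBDA)
print(w_star, leaf_objective(w_star))
```

A larger λ shrinks the leaf weight toward zero, which is how the regularization term limits the influence of any single tree.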
Step 4: hyperparameter tuning is performed with the Bayesian optimization (BO) algorithm. The model is evaluated with 5-fold cross-validation using RMSE as the metric, and the optimization runs for 100 sequential iterations. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are built to predict the fly ash carbon content.
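The tuning loop can be sketched as follows. This is a deliberately small, hand-written Bayesian optimization over a single hyperparameter, under several assumptions: scikit-learn's `GradientBoostingRegressor` stands in for XGBoost and LightGBM, only the learning rate is tuned, a Gaussian process with the expected improvement criterion serves as the surrogate, the data are synthetic, and 8 iterations are used instead of 100 to keep the run short. The objective is the 5-fold cross-validated RMSE, as in the text.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))                      # synthetic features
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=120)

def cv_rmse(learning_rate):
    """5-fold cross-validated RMSE for one hyperparameter value."""
    model = GradientBoostingRegressor(n_estimators=30,
                                      learning_rate=learning_rate,
                                      random_state=0)
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    return float(-scores.mean())

N_ITER = 8                                          # the text uses 100
candidates = np.linspace(0.01, 0.5, 50).reshape(-1, 1)
tried = [0.01, 0.25, 0.5]                           # initial design points
scores = [cv_rmse(lr) for lr in tried]

for _ in range(N_ITER):
    # Fit the surrogate to all evaluations made so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=0)
    gp.fit(np.array(tried).reshape(-1, 1), np.array(scores))
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    # Expected improvement over the best RMSE found so far (minimization).
    best = min(scores)
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    next_lr = float(candidates[int(np.argmax(ei)), 0])
    tried.append(next_lr)
    scores.append(cv_rmse(next_lr))

best_lr = tried[int(np.argmin(scores))]
print(best_lr, min(scores))
```

Each iteration uses every earlier evaluation to decide where to evaluate next, which is the property that distinguishes this approach from grid or random search.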
Step 5: to improve prediction accuracy and overcome the limited robustness of a single model, the XGBoost and LightGBM models are combined using the sequential least squares programming algorithm. The SQP (sequential quadratic programming) algorithm is widely applied in many fields, such as least squares problems, nonlinear optimization problems, economics and system analysis. The combined model problem can be expressed by formula (6):
[Formula (6): the objective Obj and its linear constraints on the weights; equation image not recovered]
wherein the objective function Obj is the mean square error function
[Mean square error formula: equation image not recovered]
and Y is the average of the real values corresponding to all samples.
Since the objective in formula (6) is a nonlinear quadratic function and the constraints are linear, this is a quadratic programming problem that can be solved with the sequential least squares programming algorithm.
The initial values of the weights are chosen from the ratio of the mean square errors between the predicted values of the two models and the true values, as shown in formula (7). This speeds up the solution and helps avoid being trapped in a local optimum.
[Formula (7): equation image not recovered]
where n is the total number of sample data, i denotes the i-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the i-th sample from the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample from the BO-LightGBM model, and $y_i$ is the real value of the i-th sample. The predicted value of the combined model is given by formula (8):
[Formula (8): equation image not recovered]
wherein the two averaged terms are the mean predicted values over all samples of the BO-XGBoost model and the BO-LightGBM model, respectively.
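One way the combination step could look in code is sketched below with SciPy's `minimize` and `method="SLSQP"`. Because the formula images are not available, the exact objective and constraints are assumptions: the objective is taken as the mean square error of the weighted sum of the two models' per-sample predictions, the weights are constrained to sum to 1 and to lie in [0, 1], and the initial weights give each model a share proportional to the other model's mean square error. The prediction arrays are synthetic stand-ins for the BO-XGBoost and BO-LightGBM outputs.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic stand-ins for true values and the two models' predictions.
rng = np.random.default_rng(42)
y_true = rng.uniform(1.0, 5.0, size=200)
y_xgb = y_true + rng.normal(0.0, 0.30, size=200)   # hypothetical BO-XGBoost output
y_lgb = y_true + rng.normal(0.0, 0.20, size=200)   # hypothetical BO-LightGBM output

def mse(pred):
    return float(np.mean((pred - y_true) ** 2))

def objective(w):
    """MSE of the weighted combination w[0]*y_xgb + w[1]*y_lgb."""
    return mse(w[0] * y_xgb + w[1] * y_lgb)

# Assumed initialization: the more accurate model starts with the larger weight.
mse_xgb, mse_lgb = mse(y_xgb), mse(y_lgb)
w0 = np.array([mse_lgb, mse_xgb]) / (mse_xgb + mse_lgb)

result = minimize(
    objective,
    w0,
    method="SLSQP",
    bounds=[(0.0, 1.0), (0.0, 1.0)],
    constraints=[{"type": "eq", "fun": lambda w: w[0] + w[1] - 1.0}],
)
w1, w2 = result.x
y_combined = w1 * y_xgb + w2 * y_lgb
print(w1, w2, mse(y_combined))
```

Since each single model corresponds to a feasible weight vector, the optimized combination cannot do worse than the better of the two on the data used to fit the weights; whether that gain carries over to unseen data has to be checked on the test set.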
Example 1
Step one, all historical operating-condition data of a power plant over a period of time (for example, 50 days) are acquired. About 70 operating-condition measuring points are collected in total, including: the coal feed rate of each coal mill; the primary air pressure, air temperature, air volume, separator outlet temperature and current of the coal mills; the secondary air damper opening of each layer; the temperature, pressure, air volume and oxygen content of the primary and secondary air at the air preheater; the supply air temperature, pressure and air volume of the forced draft fan; the oxygen content and exhaust gas temperature of the tail flue; and other general parameters such as generated power, total primary air volume, total secondary air volume, furnace pressure and furnace temperature.
Step two, obvious abnormal values are removed from the data, and the data are resampled by averaging over 5-minute intervals.
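The cleaning and resampling in step two can be sketched in plain Python as follows. Readings outside a plausible range are discarded, and the remaining readings are averaged within 5-minute (300 second) bins. The record layout of (timestamp in seconds, value) pairs, the sample readings and the range limits are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def clean_and_resample(records, low, high, interval_s=300):
    """Drop out-of-range readings, then average the rest per time bin."""
    bins = defaultdict(list)
    for timestamp, value in records:
        if high >= value >= low:                       # keep plausible readings only
            bin_start = timestamp // interval_s * interval_s
            bins[bin_start].append(value)
    return [(start, mean(values)) for start, values in sorted(bins.items())]

# Hypothetical furnace temperature readings; -9999.0 mimics a sensor fault code.
records = [(0, 600.0), (60, 602.0), (120, -9999.0), (180, 604.0),
           (300, 610.0), (360, 612.0)]
resampled = clean_and_resample(records, low=0.0, high=1000.0)
print(resampled)  # [(0, 602.0), (300, 611.0)]
```

Removing the faulty reading before averaging matters here: a single fault code left in the first bin would pull its mean far below any physically possible temperature.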
Step three, feature dimensionality reduction is performed using machine learning feature reduction methods. In view of the multivariable, nonlinear and strongly coupled nature of the boiler combustion process, the strongly coupled variables are first found through a correlation matrix and the variables with low correlation with the fly ash carbon content are removed; the important variables are then further extracted by a wrapper method.
In step three, the correlation matrix is expressed by correlation coefficients. The correlation coefficient is given by formula (1), and its sign indicates whether a variable is positively or negatively related to the target variable:
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1} $$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable x, $y_i$ is the corresponding i-th value of variable y, $i \in [1, n]$, n is the total number of values, and $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively.
The wrapper method selects variables according to a specific prediction model; here Recursive Feature Elimination (RFE) is used. RFE is a greedy optimization algorithm that selects an optimal set of variables through repeated iteration. The variables selected by the correlation matrix are further screened by the wrapper method.
Step four, all samples screened in step three are divided into a training set, a validation set and a test set at a ratio of 4:1:1, and the XGBoost and LightGBM models are each used as prediction models; validation uses five-fold cross-validation.
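A 4:1:1 split can be sketched as below. The text does not say whether the split is random or chronological, so the seeded random shuffle here is an assumption, as are the function name and the sample count.

```python
import random

def split_4_1_1(samples, seed=0):
    """Shuffle indices with a fixed seed, then cut them into 4:1:1 parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = len(indices) * 4 // 6
    n_val = len(indices) // 6
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test

samples = list(range(600))          # placeholder for 600 resampled records
train, val, test = split_4_1_1(samples)
print(len(train), len(val), len(test))  # 400 100 100
```

For time-ordered plant data, a chronological split (earliest records for training, latest for testing) may be the safer choice, because a random split can place nearly identical neighbouring records in both the training and test sets.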
Step five, based on the prediction models of the invention, hyperparameter tuning is performed with the Bayesian optimization (BO) algorithm. The models are evaluated with 5-fold cross-validation using RMSE as the metric, and the optimization runs for 100 sequential iterations. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are built to predict the fly ash carbon content; the procedure is shown in Figs. 2 and 3.
Step six, the fly ash carbon content predictions of the BO-XGBoost model and the BO-LightGBM model are combined using the sequential least squares programming algorithm to obtain the final predicted value.
The method is applied to the following embodiments to achieve the technical effects of the present invention, and the detailed steps in the embodiments will not be described again.
Table 1 compares the performance of the proposed method with that of the other methods. The proposed method achieves the lowest MAPE and RMSE and the highest R². Compared with the other methods, RMSE is reduced by 1.8-26.2% and MAPE by 0.7-19.24%, so the error is further reduced and the measurement precision is improved. R² is improved by 1.3%-20.9%, which means the prediction curve fits better and the method is more accurate and reliable. Specifically, LM-Garson-BP, AQPSO-SVR and FPA-RF tune their parameters with heuristic algorithms and predict with a regression model, which improves the prediction precision of the corresponding models to some extent. From the perspective of hyperparameter tuning, however, when facing a non-convex, multi-modal optimization problem with a high evaluation cost, such as hyperparameter tuning, the BO algorithm selects the next evaluation point according to the information already obtained about the unknown objective function, so that the optimal solution is reached as quickly as possible. The BO algorithm thus avoids problems such as failing to make effective use of feedback from earlier iterations and slow search speed. From the perspective of the prediction model, LightGBM and XGBoost are ensemble algorithms built on decision trees. Their objective functions use a second-order Taylor expansion, so the model can learn fully, and include regularization terms that reduce model complexity. They help prevent overfitting, support parallel and distributed computation, and can effectively improve prediction precision. The combined model builds on the single models, combines the advantages of both, and improves the robustness of the model.
Therefore, the prediction effect is better than that of the six comparison models.
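The three metrics named above can be computed as in the following minimal sketch. It assumes the standard definitions of RMSE, MAPE (expressed in percent) and R²; the function names are illustrative, and the patent does not give its own implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; true values must be non-zero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and MAPE and an R² closer to 1 indicate a better fit, which is the sense in which the models are ranked.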
Table 1. Prediction results of different models
Figure BDA0003569781720000131
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (6)

1. A soft measurement method for the carbon content of fly ash based on a LightGBM and XGBoost combined model, characterized by comprising the following steps:
step 1, acquiring DCS (Distributed Control System) data of a boiler and performing data mining on the acquired DCS data, wherein the data mining specifically comprises removing obvious abnormal values and resampling the data;
step 2, acquiring historical data variables of the measured parameter values at the boiler working-condition measuring points, including the relative working-condition measuring points and the reference working-condition measuring points, over a certain period; in view of the multivariable, nonlinear and strongly coupled characteristics of the boiler combustion process, first finding the variables strongly coupled with the carbon content of the fly ash through a correlation matrix and removing the variables with low correlation with the carbon content of the fly ash, and then further extracting important variables by a wrapper method as the input of the subsequent model;
step 3, dividing the data of the important variables finally extracted in step 2 into a training set, a validation set and a test set, and adopting XGBoost and LightGBM models respectively as prediction models;
step 4, performing hyperparameter optimization with a Bayesian optimization algorithm, using 5-fold cross-validation to evaluate the performance of the prediction models, with RMSE as the evaluation metric and the number of iterations set to N, and, after selecting the optimal hyperparameters, establishing BO-LightGBM and BO-XGBoost models for predicting the carbon content of the fly ash;
step 5, combining the fly ash carbon content predictions of the BO-XGBoost model and the BO-LightGBM model by using the sequential least squares programming (SLSQP) algorithm to obtain the final predicted value.
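Step 4 scores each hyperparameter candidate by 5-fold cross-validated RMSE. The sketch below shows one way such a score could be computed; a Bayesian optimizer would then minimize it over the hyperparameters. The contiguous, unshuffled fold split, the `fit_predict` callback signature and the `mean_model` stand-in are assumptions made for illustration and are not taken from the patent.

```python
import math


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_rmse(X, y, fit_predict, k=5):
    """Mean RMSE over k folds.

    fit_predict(X_train, y_train, X_val) must return one prediction per
    validation row; it wraps whichever model and hyperparameters are scored.
    """
    scores = []
    for val_idx in k_fold_indices(len(y), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in val_idx])
        sq_err = [(y[i] - p) ** 2 for i, p in zip(val_idx, preds)]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / len(scores)


def mean_model(X_train, y_train, X_val):
    """Hypothetical stand-in model: predicts the training mean for every row."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_val)
```

In practice `fit_predict` would train an XGBoost or LightGBM regressor with the candidate hyperparameters, and the returned score would be the value reported back to the optimizer at each of the N iterations.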
2. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: in step 2, the correlation matrix is expressed by correlation coefficients; the correlation coefficient is given by equation (1) and indicates a directly or inversely proportional relationship with the target variable;
Figure FDA0003569781710000011
where r is the correlation coefficient, x_i is the i-th value of the x variable, y_i is the corresponding i-th value of the y variable (i ∈ [1, n]), n is the total number of values, and
Figure FDA0003569781710000012
are the average values of the x and y variables, respectively.
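Equation (1) is only available as an image, but the symbol definitions in this claim match the standard Pearson correlation coefficient. The sketch below assumes that form and shows how low-correlation variables could be filtered out against the fly ash carbon content. The function names and the threshold value are hypothetical; the patent does not state a cut-off.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_by_correlation(features, target, threshold=0.3):
    """Keep the names of features whose |r| with the target reaches the threshold.

    features maps a variable name to its list of values; threshold is an
    illustrative assumption.
    """
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) >= threshold]
```

A positive r corresponds to the directly proportional case and a negative r to the inversely proportional case, so the filter uses the absolute value.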
3. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: the historical data variables in step 2 include, for each coal mill, the coal feed rate, primary air pressure, air temperature, air volume, separator outlet temperature and mill current; the secondary air damper opening of each layer; the temperature, pressure, air volume and oxygen content of the primary air and secondary air at the air preheater; the supply air temperature, pressure and air volume of the forced draft fan; the oxygen content and exhaust gas temperature of the tail flue; and the generated power, total primary air volume, total secondary air volume, furnace pressure and furnace temperature.
4. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: when the BO-XGBoost model and the BO-LightGBM model are combined by using the sequential least squares programming algorithm in step 5,
Figure FDA0003569781710000021
Figure FDA0003569781710000022
where the objective function Obj is the mean square error function,
Figure FDA0003569781710000023
Y is the average of the true values corresponding to all samples;
the ratio of the mean square errors between the predicted values and the true values of the two models is selected as the initial value of the weights, as shown in formula (7);
Figure FDA0003569781710000024
where n is the total number of sample data, i denotes the i-th sample data, w_1 and w_2 are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, y_1i is the predicted value obtained for the i-th sample data by the BO-XGBoost model, y_2i is the predicted value obtained for the i-th sample data by the BO-LightGBM model, and y_i is the true value corresponding to the i-th sample data; the predicted value of the combined model is given by formula (8);
Figure FDA0003569781710000025
where
Figure FDA0003569781710000026
and
Figure FDA0003569781710000027
are the average values of the predicted values of the BO-XGBoost model and the BO-LightGBM model over all samples.
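The weighting idea in this claim can be illustrated with a small sketch. It is not the SLSQP procedure itself: for two models, and under the assumption that the weights are non-negative and sum to 1 (the constraints are in the equation images and are not reproduced here), the weights that minimize the mean square error have a closed form, which is what an iterative solver such as SLSQP would be expected to converge to. Function names are illustrative.

```python
def combine_weights(y_true, pred_a, pred_b):
    """Return (w_a, w_b) minimizing the MSE of w_a*pred_a + w_b*pred_b.

    Assumes w_a + w_b = 1 and 0 <= w_a <= 1. Substituting w_b = 1 - w_a
    turns the problem into a one-dimensional least-squares fit.
    """
    diff = [a - b for a, b in zip(pred_a, pred_b)]       # pred_a - pred_b
    resid = [t - b for t, b in zip(y_true, pred_b)]      # y - pred_b
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical predictions: every split gives the same error.
        return 0.5, 0.5
    w_a = sum(d * r for d, r in zip(diff, resid)) / denom
    w_a = min(1.0, max(0.0, w_a))                        # clip to the feasible interval
    return w_a, 1.0 - w_a


def combine(pred_a, pred_b, w_a, w_b):
    """Weighted combination of two prediction lists."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]
```

With more than two models or additional constraints there is no such shortcut, and a general constrained solver started from the initial weights of formula (7) becomes the natural choice.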
5. The method for soft measurement of fly ash carbon content based on a LightGBM and XGBoost combined model according to claim 1, wherein: for a given training set, the predicted value of the LightGBM model
Figure FDA0003569781710000028
can be expressed by equation (2):
Figure FDA0003569781710000029
where
Figure FDA0003569781710000031
represents the predicted value of the LightGBM model, K represents the number of decision trees, f_k denotes the predicted value of the k-th decision tree, x_i represents the i-th input sample, and F represents the set of all decision trees; the objective function L^(t) of LightGBM is given by formula (3):
Figure FDA0003569781710000032
in equation (3), n represents the total number of samples, i is the index of the current sample,
Figure FDA0003569781710000033
is the loss function, which represents the difference between the target value y_i and the predicted value
Figure FDA0003569781710000034
and, for the regression problem, is expressed by the mean square error loss function, i.e. the loss function is
Figure FDA0003569781710000035
Figure FDA0003569781710000036
is the predicted value of the first t-1 rounds at the t-th iteration, f_t(x_i) is the predicted value of the t-th round, and Ω(f_t) is the model complexity, expressed by equation (4);
Figure FDA0003569781710000037
in formula (4), r and λ are regularization coefficients that prevent the decision tree from becoming too complex, T represents the number of leaf nodes, and w is the weight coefficient of the leaf nodes.
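Equations (2) to (4) are only available as images. For orientation, the block below writes out the standard additive-tree formulation from the gradient boosting literature, which agrees with the symbol definitions given in this claim. It is an assumption about what the images contain, not a transcription; in particular, γ is assumed to be the coefficient that the claim text writes as r.

```latex
\begin{align}
\hat{y}_i &= \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F \tag{2} \\
L^{(t)} &= \sum_{i=1}^{n} l\!\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3} \\
\Omega(f_t) &= \gamma T + \frac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2} \tag{4}
\end{align}
```

Read this way, each boosting round t adds one tree f_t to the running prediction, and the penalty Ω grows with both the number of leaves T and the size of the leaf weights w_j.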
6. The method for soft measurement of carbon content in fly ash based on a LightGBM and XGBoost combined model according to claim 1, wherein: for a given training set, the predicted value of the XGBoost model can be expressed by the following formula:
Figure FDA0003569781710000038
where f(x_i) represents the predicted value of the XGBoost model, K represents the number of decision trees, f_k denotes the predicted value of the k-th decision tree, x_i represents the i-th input sample, and F represents the set of all decision trees; the objective function of the XGBoost model is given by formula (5);
Figure FDA0003569781710000039
where n represents the total number of samples, i is the index of the current sample, g_i represents the first derivative of the loss function for sample x_i, h_i represents the second derivative of the loss function for sample x_i, the loss function is the mean square error loss function, f_t(x_i) is the predicted value of the t-th round, λ is the regularization coefficient, T represents the number of leaf nodes, and w_j represents the weight coefficient of the j-th leaf node.
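The roles of g_i and h_i can be made concrete with a short sketch. It assumes the squared-error loss l = (y - ŷ)² without a 1/2 factor and the usual second-order form of the objective with a leaf-count penalty gamma; formula (5) itself is only available as an image, so both the loss scaling and the gamma term are assumptions, and the function names are illustrative.

```python
def grad_hess_squared_error(y_true, y_prev):
    """First and second derivatives of l = (y - y_hat)^2 with respect to y_hat,
    evaluated at the previous-round predictions y_prev."""
    g = [2.0 * (p - t) for t, p in zip(y_true, y_prev)]
    h = [2.0] * len(y_true)
    return g, h


def second_order_objective(g, h, f_t, leaf_weights, lam=1.0, gamma=0.0):
    """sum_i [g_i*f_t(x_i) + 0.5*h_i*f_t(x_i)^2] + gamma*T + 0.5*lam*sum_j w_j^2.

    f_t holds the new tree's output for each sample; leaf_weights holds the
    weight of each of its T leaves.
    """
    data_term = sum(gi * fi + 0.5 * hi * fi * fi
                    for gi, hi, fi in zip(g, h, f_t))
    penalty = gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
    return data_term + penalty
```

For a squared-error loss the second-order expansion is exact, so with lam and gamma set to zero the objective equals the change in total loss caused by adding the new tree.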
CN202210318954.1A 2022-03-29 2022-03-29 Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model Active CN114896860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210318954.1A CN114896860B (en) 2022-03-29 2022-03-29 Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model

Publications (2)

Publication Number Publication Date
CN114896860A true CN114896860A (en) 2022-08-12
CN114896860B CN114896860B (en) 2024-05-14

Family

ID=82716399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210318954.1A Active CN114896860B (en) 2022-03-29 2022-03-29 Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model

Country Status (1)

Country Link
CN (1) CN114896860B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144386A1 (en) * 2018-01-26 2019-08-01 Dalian University of Technology Method for predicting key performance parameters of aviation engine transition state
WO2021004198A1 (en) * 2019-07-10 2021-01-14 Jiangsu Jinheng Information Technology Co., Ltd. Plate performance prediction method and apparatus
CN112287598A (en) * 2020-09-28 2021-01-29 Shanxi Zhangshan Power Generation Co., Ltd. Fly ash carbon content prediction method based on particle swarm parameter optimization
CN113591930A (en) * 2021-07-06 2021-11-02 Wuhan Textile University Virus-host correlation prediction method based on network fusion and graph embedding
CN113918881A (en) * 2021-10-21 2022-01-11 Wuhan Textile University Soft measurement method and monitoring system for carbon content in fly ash based on hierarchical polynomial model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
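卞和营; 王军敏: "支持向量回归在飞灰含碳量软测量中的应用" [Application of support vector regression in soft measurement of carbon content in fly ash], 计算机测量与控制 [Computer Measurement & Control], no. 02, 25 February 2014 (2014-02-25) *
曹渝昆; 朱萌: "基于主成分分析和LightGBM的风电场发电功率超短期预测" [Ultra-short-term prediction of wind farm power generation based on principal component analysis and LightGBM], 上海电力学院学报 [Journal of Shanghai University of Electric Power], no. 06, 15 December 2019 (2019-12-15) *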

Also Published As

Publication number Publication date
CN114896860B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant