CN114896860B - Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model - Google Patents

Info

Publication number
CN114896860B
Authority
CN
China
Prior art keywords
lightgbm
xgboost
model
fly ash
carbon content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210318954.1A
Other languages
Chinese (zh)
Other versions
CN114896860A (en
Inventor
刘军平
骆海瑞
彭涛
胡新荣
何儒汉
朱强
张俊杰
熊明福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Textile University filed Critical Wuhan Textile University
Priority to CN202210318954.1A priority Critical patent/CN114896860B/en
Publication of CN114896860A publication Critical patent/CN114896860A/en
Application granted granted Critical
Publication of CN114896860B publication Critical patent/CN114896860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a soft measurement method for the carbon content of fly ash based on a combined LightGBM and XGBoost model, comprising the following steps: 1) removing obviously erroneous values from the DCS data and extracting steady-state data using data mining techniques; 2) addressing the redundant-feature problem in the soft measurement of fly ash carbon content by means of a correlation matrix and a wrapper method from feature engineering; 3) applying LightGBM and XGBoost, each tuned with a Bayesian optimization (BO) algorithm, to the processed data set to build prediction models of fly ash carbon content, so that the optimal hyperparameters are selected and the prediction accuracy is improved; 4) combining the BO-XGBoost and BO-LightGBM models using a sequential least squares programming algorithm. Compared with general soft measurement models for fly ash carbon content, the invention provides a more detailed and reasonable feature processing method that eliminates redundant features and benefits subsequent predictive modeling. Combining the LightGBM and XGBoost models with the sequential least squares programming algorithm gives the model stronger generalization capability and higher prediction accuracy, and the results obtained on the fly ash carbon content soft measurement task are better than those of traditional methods.

Description

Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model
Technical Field
The invention belongs to the technical field of boiler fly ash carbon content measurement, and particularly relates to a soft measurement method for fly ash carbon content based on a combined LightGBM and XGBoost model.
Background
The carbon content of boiler fly ash is one of the important indexes for evaluating the combustion state of a coal-fired boiler. Monitoring it in real time helps keep the fly ash carbon content within a reasonable range, which reduces the cost of power generation and improves the economy of the unit. The heat loss due to fly ash is the second largest heat loss of the boiler, next to the flue gas heat loss. In actual operation, the boiler is difficult to adjust to its optimal working condition, and carbon measuring instruments are expensive, so an economical and effective method for accurately obtaining the fly ash carbon content in real time is important for improving combustion efficiency and guiding the operation of thermal power units.
Current methods for obtaining the fly ash carbon content of coal-fired boilers fall into three types: manual sampling and laboratory assay, physical measurement methods, and soft measurement methods. Manual sampling and assay requires dedicated personnel to take and prepare samples periodically, which consumes manpower and material resources and suffers from data lag and a tendency to errors and omissions. Physical methods usually include the loss-on-ignition method, spectroscopic analysis, the microwave method and the like; for technical or cost reasons they are difficult to popularize widely. Soft measurement methods combine knowledge of the production process through mechanism analysis, can quickly and accurately reflect the fly ash carbon content under different working conditions, and are more economical.
Some prior studies have addressed soft measurement of fly ash carbon content. However, the boiler combustion process is a multivariable, nonlinear, strongly coupled thermodynamic process. For example, the DCS may record parameters such as air flow, air pressure and air temperature for each outlet of the coal mills. When these parameters are used as boiler combustion modeling variables, they are highly correlated with one another, which introduces variable redundancy, degrades the estimation accuracy of the model and increases the computational complexity. More careful feature engineering is therefore needed to reduce the effect of redundant variables. Most current studies are tested on limited data and working conditions and cannot effectively represent the full range of boiler operating conditions. Traditional regression methods include linear regression, support vector machines, time series analysis and the like. These models are relatively simple and often perform poorly on complex, high-dimensional, noisy data. Ensemble learning methods fuse the predictions of several learners through various voting mechanisms and obtain more accurate results. An ensemble model is therefore combined with feature engineering, and a more accurate model is built by means such as hyperparameter tuning and applied to an actual combustion system.
Disclosure of Invention
The invention is made to solve the above problems, and its object is to provide a soft measurement method for fly ash carbon content based on a combined LightGBM and XGBoost model that obtains a more accurate fly ash carbon content.
In order to achieve the above object, the present invention adopts the following scheme:
As shown in FIG. 1, the invention provides a soft measurement method for fly ash carbon content based on a combined LightGBM and XGBoost model, which comprises the following steps:
Step 1, acquiring DCS (Distributed Control System) data of a boiler and performing data mining on the acquired DCS data, including removal of obvious outliers and data resampling;
Step 2, acquiring historical data variables of the measured parameter values of the boiler working condition measuring points, including relative working condition measuring points and reference working condition measuring points, over a certain period; in view of the multivariable, nonlinear and strongly coupled nature of the boiler combustion process, first finding the variables strongly coupled with the fly ash carbon content through a correlation matrix and removing the variables with low correlation with the fly ash carbon content, then further extracting the important variables through a wrapper method to serve as input of the subsequent models;
Step 3, dividing the important variables finally extracted in step 2 into a training set, a validation set and a test set, and adopting the XGBoost and LightGBM models respectively as prediction models;
Step 4, performing hyperparameter tuning with a Bayesian optimization algorithm, using 5-fold cross-validation with RMSE as the evaluation metric when evaluating the prediction models and setting the number of iterations to N, and, after the optimal hyperparameters are selected, establishing the BO-LightGBM and BO-XGBoost models for predicting the fly ash carbon content;
Step 5, combining the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models using a sequential least squares programming algorithm to obtain the final predicted value.
Further, in step 2, the correlation matrix is expressed through correlation coefficients; the correlation coefficient is given by equation (1) and indicates a positive or negative relationship with the target variable:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable $x$, $y_i$ is the i-th value of variable $y$, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the averages of the $x$ and $y$ variables respectively.
Further, the historical data variables in the step 2 include the coal feeding amount of each coal mill, the primary air pressure, the air temperature, the air quantity, the outlet temperature and the current of the separator of each coal mill, the opening degree of the secondary air door of each layer, the temperature, the pressure, the air quantity and the oxygen content of the primary air and the secondary air related to the air preheater, the air feeding temperature, the pressure and the air quantity of the blower, the oxygen content and the exhaust gas temperature of the tail flue, the power generation power, the total primary air quantity, the total secondary air quantity, the hearth pressure and the hearth temperature.
Further, when the BO-XGBoost and BO-LightGBM models are combined in step 5 using the sequential least squares programming algorithm, the combination problem is expressed by formula (6),
where the objective function Obj is a mean square error function and $\bar{y}$ is the average of the true values of all samples;
the initial values of the weights are chosen as the ratio of the mean square errors between the predicted values and the true values of the two models, as shown in formula (7);
where $n$ is the total number of samples, $i$ denotes the i-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the i-th sample obtained by the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample obtained by the BO-LightGBM model, and $y_i$ is the true value of the i-th sample; the predicted value of the combined model is given by formula (8);
where $\bar{y}_1$ and $\bar{y}_2$ are the averages of the predictions of the BO-XGBoost model and the BO-LightGBM model over all samples.
Further, for a given training set, the predicted value $\hat{y}_i$ of the LightGBM model can be represented by formula (2):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$
where $\hat{y}_i$ is the predicted value of the LightGBM model, $K$ is the number of decision trees, $f_k$ is the prediction of the k-th decision tree, $x_i$ is the i-th input sample, and $F$ is the set of all decision trees; the objective function $L^{(t)}$ of LightGBM is represented by formula (3):
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$
In formula (3), $n$ is the total number of samples, $i$ is the index of the current sample, and $l(\cdot)$ is the loss function measuring the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for the regression problem it is the mean square error loss, i.e. $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the prediction of the first t-1 rounds at the t-th iteration, $f_t(x_i)$ is the prediction of the t-th round, and $\Omega(f_t)$ is the model complexity, expressed by formula (4):
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
In formula (4), $\gamma$ and $\lambda$ are regularization coefficients that prevent the decision tree from becoming too complex, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
Further, for a given training set, the predicted value of the XGBoost model may be expressed by the following formula:
$$f(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $f(x_i)$ is the predicted value of the XGBoost model, $K$ is the number of decision trees, $f_k$ is the prediction of the k-th decision tree, $x_i$ is the i-th input sample, and $F$ is the set of all decision trees; the objective function of the XGBoost model is shown in formula (5):
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
where $n$ is the total number of samples, $i$ is the index of the current sample, $g_i$ is the first derivative of the loss function for sample $x_i$, $h_i$ is the second derivative of the loss function for sample $x_i$, the loss function is the mean square error loss, $f_t(x_i)$ is the prediction of the t-th round, $\gamma$ and $\lambda$ are regularization coefficients, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
Compared with the prior art, the scheme of the invention has the beneficial effects that:
Based on actual operating data of a coal-fired power plant boiler, the invention integrates, for the first time, multiple machine learning algorithms and data mining techniques in a data-driven method to analyze the relationship between fly ash carbon content and the operating parameters of the boiler. Redundant features are removed in two steps, using the correlation matrix and the wrapper method, and the important features are extracted. The data are fed into the LightGBM and XGBoost models for training, learning, prediction and validation, and the models are combined by the sequential least squares programming algorithm, so that actual power plant operating conditions are reflected truly and comprehensively, the predicted fly ash carbon content is closest to that of the actual combustion system, the soft measurement accuracy is improved, and the reliability and accuracy of soft measurement of fly ash carbon content in power plants are ensured.
Drawings
FIG. 1 is a flowchart of the soft measurement method for fly ash carbon content according to the invention;
FIG. 2 is a flowchart of hyperparameter optimization of the LightGBM model by Bayesian optimization according to the invention;
FIG. 3 is a flowchart of hyperparameter optimization of the XGBoost model by Bayesian optimization according to the invention;
FIG. 4 is a flowchart of model combination by the sequential least squares programming algorithm according to the invention.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and detailed description.
Step 1, DCS (Distributed Control System) data of the boiler are acquired and data mining is performed on them, specifically the removal of obvious outliers and data resampling. Part of the acquired DCS data contains outliers caused by system restarts or other reasons; for example, because of a plant shutdown, the actual load recorded by the DCS contains some invalid data at the beginning of the record. All measuring-point data outside a reasonable range are rejected. Because a thermal power unit must adjust its output according to the grid load, the load fluctuates strongly and the unit continually cycles through steady state, transition and steady state. This may reduce the correlation between the data. The effect can be minimized by aggregating the data over larger time intervals, that is, by resampling the data over an appropriate period.
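The cleaning and resampling of step 1 might be sketched as follows. This is only an illustration under stated assumptions: the `(timestamp, value)` record layout, the valid range of 0 to 700 and the function name `clean_and_resample` are hypothetical, and the 5-minute averaging window is taken from Example 1.

```python
from collections import defaultdict
from statistics import mean


def clean_and_resample(records, valid_range, interval_s=300):
    """records: iterable of (timestamp_seconds, value).

    Returns a list of (window_start_seconds, mean_value), one per window.
    """
    low, high = valid_range
    buckets = defaultdict(list)
    for ts, value in records:
        # Reject readings outside the plausible range (e.g. restart artifacts).
        if low <= value <= high:
            buckets[ts - ts % interval_s].append(value)
    # Average the surviving readings inside each time window.
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical load readings in MW, one per minute; -999.0 mimics an invalid value.
raw = [(0, 300.0), (60, 302.0), (120, -999.0), (180, 304.0), (300, 310.0), (360, 312.0)]
resampled = clean_and_resample(raw, valid_range=(0.0, 700.0))
print(resampled)  # [(0, 302.0), (300, 311.0)]
```

In practice each measuring point would need its own plausible range, which the text leaves to engineering judgment.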
Step 2, the boiler combustion process is a multivariable, nonlinear, strongly coupled thermodynamic process. For example, the DCS may record parameters such as air flow, air pressure and air temperature for each outlet of the coal mills. When these parameters are used as boiler combustion modeling variables, they are highly correlated with one another, which introduces variable redundancy, degrades the estimation accuracy of the model and increases the computational complexity. Feature engineering methods are therefore needed to reduce the effect of redundant variables. First, the strongly coupled variables are found through a correlation matrix and the variables with low correlation with the fly ash carbon content are removed; the important variables are then further extracted through a wrapper method.
First, a correlation matrix is constructed to quantify the dependence between variables. The correlation matrix is a table showing how each variable is related to the predicted quantity and is expressed through correlation coefficients, as shown in equation (1). The correlation coefficient can be negative or positive, indicating a negative or positive relationship with the target variable.
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable $x$, $y_i$ is the i-th value of variable $y$, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the averages of the $x$ and $y$ variables respectively.
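As a rough illustration of the correlation screening, the sketch below computes the correlation coefficient from its definition (products of deviations from the means, normalized by the two standard deviations) and keeps only variables whose absolute correlation with the target reaches a threshold. The variable names, the toy data and the threshold of 0.5 are assumptions made for the example; the text does not state a threshold.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold):
    """Return {name: r} for features with |r| >= threshold against the target."""
    kept = {}
    for name, column in features.items():
        r = pearson_r(column, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical toy columns; real inputs would be the resampled DCS variables.
features = {
    "total_air_flow": [1.0, 2.0, 3.0, 4.0],
    "furnace_pressure": [1.0, -1.0, -1.0, 1.0],
    "exhaust_temp": [8.0, 6.0, 4.0, 2.0],
}
carbon_content = [2.0, 4.0, 6.0, 8.0]
kept = select_by_correlation(features, carbon_content, threshold=0.5)
print(sorted(kept))  # ['exhaust_temp', 'total_air_flow']
```

Both strongly positive and strongly negative correlations are retained, since the sign only indicates the direction of the relationship.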
The wrapper method selects variables according to a specific prediction model; here the recursive feature elimination (RFE) method is adopted. RFE is a greedy optimization algorithm that selects the best set of variables through repeated iteration.
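The idea of recursive feature elimination can be sketched in a few lines. The version below is a simplified stand-in, not the procedure of the invention: it assumes NumPy is available, uses an ordinary least squares fit on standardized features as the "specific prediction model", and drops the feature with the smallest absolute coefficient in each round. The synthetic data and the function name are hypothetical.

```python
import numpy as np


def recursive_feature_elimination(X, y, n_keep):
    """Return the column indices of X that survive elimination."""
    # Standardize so coefficient magnitudes are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # Refit on the remaining features, then drop the least important one.
        coef, *_ = np.linalg.lstsq(Xs[:, remaining], yc, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))
        del remaining[weakest]
    return remaining


# Synthetic example: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=200)
selected = recursive_feature_elimination(X, y, n_keep=2)
print(selected)
```

With a tree-based model, the importance score used for elimination would typically come from the model's feature importances rather than from regression coefficients.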
Step 3, the data processed in steps 1 and 2 are divided into a training set, a validation set and a test set, and the LightGBM and XGBoost models are adopted as prediction models:
LightGBM is an ensemble machine learning algorithm developed by Microsoft in 2017. It is an efficient, distributed implementation of the gradient boosting decision tree (GBDT) framework and adds the GOSS (Gradient-based One-Side Sampling) and EFB (Exclusive Feature Bundling) algorithms on top of GBDT. LightGBM supports parallel learning and fast processing of large-scale data, so it is more efficient while preserving accuracy and interpretability. The GBDT algorithm is the core of LightGBM: it builds a strong learner by iteratively adding weak learners fitted to the negative gradient of the loss function. GOSS retains the data instances with larger gradients and only samples from those with smaller gradients when computing the information gain, so a relatively accurate estimate of the information gain is obtained from less data. EFB reduces the number of features by bundling mutually exclusive features. Together the two methods reduce computation time and memory usage, so training completes faster.
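To make the GOSS idea concrete, the sketch below selects instances from a list of per-sample gradients: the fraction with the largest absolute gradients is always kept, a random fraction of the remainder is sampled, and the sampled small-gradient instances are up-weighted by (1 - top_rate) / other_rate, as described in the LightGBM literature. The rates, the seed and the function name are illustrative assumptions; this is not LightGBM's internal code.

```python
import random


def goss_sample(gradients, top_rate, other_rate, seed=0):
    """Return {sample_index: weight} for the instances kept by one GOSS pass."""
    n = len(gradients)
    # Rank samples by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = random.Random(seed).sample(rest, n_other)
    # Up-weight the sampled small-gradient instances to compensate for the
    # instances that were discarded.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


grads = [0.9, -0.05, 0.02, -1.2, 0.1, 0.03, -0.01, 0.4, 0.06, -0.02]
kept = goss_sample(grads, top_rate=0.2, other_rate=0.3)
print(len(kept), kept[3], kept[0])  # 5 1.0 1.0
```

Here the two samples with the largest gradients (indices 3 and 0) keep weight 1.0, and three of the remaining eight are drawn at random with weight 0.8 / 0.3.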
For a given training set D, the predicted value $\hat{y}_i$ of LightGBM can be represented by formula (2):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$
where $\hat{y}_i$ is the predicted value of the model, $K$ is the number of decision trees, $f_k$ is the prediction of the k-th decision tree, $x_i$ is the i-th input sample, and $F$ is the set of all decision trees; the objective function $L^{(t)}$ of LightGBM is represented by formula (3):
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$
In formula (3), $n$ is the number of samples, $i$ is the index of the current sample, and $l(\cdot)$ is the loss function measuring the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for regression problems it is usually the mean square error loss, i.e. $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the prediction of the first t-1 rounds at the t-th iteration, $f_t(x_i)$ is the prediction of the t-th round, and $\Omega(f_t)$ is the model complexity, usually expressed by formula (4):
$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$
In formula (4), $\gamma$ and $\lambda$ are regularization coefficients that prevent the decision tree from becoming too complex, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
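The structure described by formulas (2) to (4) can be illustrated numerically: the ensemble prediction is the sum of the tree outputs, and the objective for a candidate new tree is the squared-error loss of the updated prediction plus a complexity penalty on the number and weights of its leaves. The toy "trees" below are plain Python functions, and `gamma` and `lam` stand for the two regularization coefficients; all values are hypothetical.

```python
def predict(trees, x):
    # Ensemble prediction: sum of the outputs of all trees built so far.
    return sum(tree(x) for tree in trees)


def complexity(leaf_weights, gamma, lam):
    # Penalty: gamma * (number of leaves) + 0.5 * lam * (sum of squared leaf weights).
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(trees, new_tree, new_leaf_weights, data, gamma, lam):
    # Squared-error loss of (previous prediction + new tree) plus the penalty.
    loss = sum((y - (predict(trees, x) + new_tree(x))) ** 2 for x, y in data)
    return loss + complexity(new_leaf_weights, gamma, lam)


# One existing stump and one candidate stump, each splitting at x = 2.
existing = [lambda x: 1.0 if x < 2 else 3.0]
candidate_weights = [0.5, 1.0]
candidate = lambda x: candidate_weights[0] if x < 2 else candidate_weights[1]
data = [(1.0, 2.0), (3.0, 4.0)]  # (input, target) pairs

value = objective(existing, candidate, candidate_weights, data, gamma=0.1, lam=1.0)
print(round(value, 3))  # 1.075
```

In this toy case the loss is 0.25 and the penalty is 0.825, so a tree with more or heavier leaves would need to reduce the loss by more than its added penalty to be preferred.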
The XGBoost algorithm, i.e. the extreme gradient boosting algorithm proposed by Tianqi Chen, is one of the machine learning algorithms most widely used by data scientists at present and has achieved good results in numerous machine learning competitions. XGBoost is an improvement of the GBDT algorithm. It differs from LightGBM in that XGBoost traverses the data more exhaustively during computation; the data can be fully loaded into memory, and computation is accelerated through parallelism. The predicted value of the XGBoost algorithm has the same form as that of LightGBM, and its objective function is shown in formula (5):
$$L^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$
where $n$ is the number of samples, $i$ is the index of the current sample, $g_i$ is the first derivative of the loss function for sample $x_i$, $h_i$ is the second derivative of the loss function for sample $x_i$, $f_t(x_i)$ is the prediction of the t-th round, $\gamma$ and $\lambda$ are regularization coefficients, $T$ is the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node.
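For the squared-error loss, the derivatives that enter the second-order objective are simple: with loss (y - p)^2, the first derivative with respect to the prediction p is 2(p - y) and the second derivative is 2. The sketch below computes them for the samples falling in one leaf and then the leaf weight that minimizes the leaf's contribution G w + 0.5 (H + lam) w^2, namely w* = -G / (H + lam). The sample values and `lam` are hypothetical, and the per-leaf form follows the usual XGBoost derivation rather than text from this document.

```python
def grad_hess(y_true, y_prev):
    """First and second derivatives of (y - p)**2 with respect to p."""
    g = [2.0 * (p - t) for t, p in zip(y_true, y_prev)]
    h = [2.0 for _ in y_true]
    return g, h


def leaf_objective(w, g, h, lam):
    # Contribution of one leaf with weight w to the second-order objective.
    return sum(g) * w + 0.5 * (sum(h) + lam) * w * w


def optimal_leaf_weight(g, h, lam):
    # Minimizer of leaf_objective with respect to w.
    return -sum(g) / (sum(h) + lam)


# Two samples in the same leaf; the previous rounds predict 2.0 for both.
y_true = [3.0, 5.0]
y_prev = [2.0, 2.0]
g, h = grad_hess(y_true, y_prev)
w_star = optimal_leaf_weight(g, h, lam=1.0)
print(g, h, w_star)  # [-2.0, -6.0] [2.0, 2.0] 1.6
```

Without regularization (lam = 0) the optimal weight would be 2.0, the mean residual; the penalty shrinks it to 1.6.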
Step 4, hyperparameter tuning is performed with a Bayesian optimization (BO) algorithm. In the model evaluation, 5-fold cross-validation is used with RMSE as the evaluation metric, and the optimization process runs for 100 sequential iterations. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are established to predict the fly ash carbon content.
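The quantity that the Bayesian optimizer minimizes in this step is the cross-validated RMSE of a model trained with a given hyperparameter setting. The sketch below shows only that scoring function, under loud assumptions: NumPy is assumed available, a ridge regression with a single hyperparameter `alpha` stands in for LightGBM or XGBoost, and a fixed candidate list stands in for the points a Bayesian optimizer would propose one after another. The optimizer itself is not implemented here.

```python
import numpy as np


def cv_rmse(X, y, alpha, n_folds=5, seed=0):
    """Mean RMSE over n_folds folds for ridge regression with penalty alpha."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    scores = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Closed-form ridge fit on the training folds.
        A = X[train].T @ X[train] + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        err = X[val] @ w - y[val]
        scores.append(float(np.sqrt(np.mean(err ** 2))))
    return float(np.mean(scores))


# Synthetic data; in the method these would be the selected DCS variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

candidates = [0.01, 1.0, 100.0]  # stand-in for optimizer proposals
scores = {alpha: cv_rmse(X, y, alpha) for alpha in candidates}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores)
```

A Bayesian optimizer would replace the fixed candidate list by choosing each new setting from a surrogate model of the scores observed so far, for the stated 100 iterations.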
Step 5, in order to improve the prediction accuracy and overcome the limited robustness of a single model, the XGBoost and LightGBM models are combined using a sequential least squares programming algorithm. The sequential quadratic programming (SQP) algorithm is widely used in fields such as least squares problem solving, nonlinear optimization, economics and system analysis. The combined-model problem can be expressed by formula (6),
where the objective function Obj is a mean square error function and $\bar{y}$ is the average of the true values of all samples.
Since the objective in formula (6) is a quadratic function and the constraint is linear, this is a quadratic programming problem that can be solved with the sequential least squares programming algorithm.
The ratio of the mean square errors between the predicted values and the true values of the two models, as shown in formula (7), is chosen as the initial value of the weights, which speeds up the solution and helps avoid getting trapped in a local optimum.
Here $n$ is the total number of samples, $i$ denotes the i-th sample, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the i-th sample obtained by the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample obtained by the BO-LightGBM model, and $y_i$ is the true value of the i-th sample; the predicted value of the combined model is given by formula (8),
where $\bar{y}_1$ and $\bar{y}_2$ are the averages of the predictions of the BO-XGBoost model and the BO-LightGBM model over all samples.
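One way the weight search could look in code is sketched below, assuming SciPy and NumPy are available. It minimizes the mean square error of a weighted sum of two prediction vectors with `scipy.optimize.minimize(method="SLSQP")`. Several details are assumptions made for the illustration and are not taken from the formulas of this document: the weights are constrained to sum to 1 and to lie in [0, 1], the combined prediction is the plain weighted sum, and the initial guess weights each model by the other model's share of the total MSE.

```python
import numpy as np
from scipy.optimize import minimize


def combine_weights(y_true, pred_a, pred_b):
    """Return (weights, combined_mse) for a two-model weighted average."""
    preds = np.column_stack([pred_a, pred_b])

    def mse(w):
        return float(np.mean((y_true - preds @ w) ** 2))

    # Assumed initial guess: the model with the lower MSE starts with more weight.
    mse_a = np.mean((y_true - pred_a) ** 2)
    mse_b = np.mean((y_true - pred_b) ** 2)
    w0 = np.array([mse_b, mse_a]) / (mse_a + mse_b)

    result = minimize(
        mse,
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0), (0.0, 1.0)],
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x, mse(result.x)


# Synthetic "true" carbon contents and two noisy model outputs.
rng = np.random.default_rng(0)
y_true = rng.uniform(1.0, 5.0, size=200)
pred_xgb = y_true + 0.3 * rng.normal(size=200)
pred_lgb = y_true + 0.5 * rng.normal(size=200)

weights, combined_mse = combine_weights(y_true, pred_xgb, pred_lgb)
single_mse = min(np.mean((y_true - pred_xgb) ** 2), np.mean((y_true - pred_lgb) ** 2))
print(weights, combined_mse, single_mse)
```

Because each single model corresponds to a feasible weight vector, (1, 0) or (0, 1), the optimized combination cannot have a higher MSE on the fitting data than the better single model; whether it generalizes better has to be checked on held-out data.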
Example 1
Step 1, historical data of all power plant working conditions over a period of time (for example, 50 days) are acquired. The acquired measuring points include the coal feed rate of each coal mill; the primary air pressure, air temperature, air flow, separator outlet temperature, current and the like of each coal mill; the secondary air damper opening of each layer; the temperature, pressure, air flow and oxygen content of the primary and secondary air associated with the air preheater; the supply air temperature, pressure and air flow of the blower; the oxygen content and exhaust gas temperature of the tail flue; and other general parameters such as generated power, total primary air flow, total secondary air flow, furnace pressure and furnace temperature, about 70 parameters in total.
Step 2, obvious outliers in the data are removed, and the data are resampled by averaging over 5-minute intervals.
Step 3, feature dimension reduction is performed with machine-learning feature selection methods. In view of the multivariable, nonlinear and strongly coupled nature of the boiler combustion process, the strongly coupled variables are first found through a correlation matrix and the variables with low correlation with the fly ash carbon content are removed; the important variables are then further extracted through a wrapper method.
In this step, the correlation matrix is expressed through correlation coefficients; the correlation coefficient is given by equation (1) and indicates a positive or negative relationship with the target variable:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{1}$$
where $r$ is the correlation coefficient, $x_i$ is the i-th value of variable $x$, $y_i$ is the i-th value of variable $y$, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the averages of the $x$ and $y$ variables respectively.
The wrapper method selects variables according to a specific prediction model; here the recursive feature elimination (RFE) method is adopted. RFE is a greedy optimization algorithm that selects the optimal set of variables through repeated iteration; the variables selected by the correlation matrix are further screened by the wrapper method.
Step 4, all samples screened in step 3 are divided into a training set, a validation set and a test set at a ratio of 4:1:1, and the XGBoost and LightGBM models are adopted respectively as prediction models; validation uses 5-fold cross-validation.
Step 5, based on the prediction models provided by the invention, hyperparameter tuning is performed with a Bayesian optimization (BO) algorithm. In the model evaluation, 5-fold cross-validation is used with RMSE as the evaluation metric, and the optimization process runs for 100 sequential iterations. After the optimal hyperparameters are selected, the BO-LightGBM and BO-XGBoost models are established to predict the fly ash carbon content; the specific procedures are shown in FIGS. 2 and 3.
Step 6, the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models are combined using the sequential least squares programming algorithm to obtain the final predicted value.
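The 4:1:1 division into training, validation and test sets used in this example can be sketched as below. Whether the samples are shuffled before splitting is not stated in the text; the shuffle, the seed and the function name `split_4_1_1` are assumptions made for the illustration, and for strongly time-correlated data a chronological split may be the safer choice.

```python
import random


def split_4_1_1(samples, seed=0):
    """Split samples into train/validation/test parts at a 4:1:1 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n = len(shuffled)
    n_train = n * 4 // 6
    n_val = n // 6
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_4_1_1(range(600))
print(len(train), len(val), len(test))  # 400 100 100
```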
The above method is applied to the following embodiments to embody the technical effects of the present invention, and specific steps in the embodiments will not be described in detail.
Table 1 compares the performance of the proposed method with other methods. The proposed method achieves the lowest MAPE and RMSE and the highest $R^2$. Compared with the other methods, it reduces RMSE by 1.8% to 26.2% and MAPE by 0.7% to 19.24%, indicating a further reduction in error and improved measurement accuracy. $R^2$ is improved by 1.3% to 20.9%, indicating a better fit of the prediction curve and higher accuracy and reliability. Specifically, LM-Garson-BP, AQPSO-SVR and FPA-RF all use heuristic algorithms for parameter tuning in combination with a regression model, which improves the prediction accuracy of the corresponding models to some extent. From the perspective of hyperparameter tuning, however, when facing the complex optimization problem of hyperparameter tuning, which is non-convex, multi-modal and expensive to evaluate, the BO algorithm chooses the next evaluation point according to the information already obtained about the unknown objective function and thus reaches the optimal solution quickly. The BO algorithm avoids problems such as ineffective use of iteration feedback and slow search. From the perspective of the prediction model, LightGBM and XGBoost are ensemble algorithms based on decision trees; their objective functions use a second-order Taylor expansion, so the model can learn fully, and include a regularization term that reduces model complexity. They offer advantages such as resistance to overfitting and support for parallel and distributed computation, and can effectively improve prediction accuracy. The combined model effectively combines the advantages of the two single models and improves the robustness of the model. Thus, the prediction effect is better than that of the 6 models compared.
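For reference, the three metrics used in the comparison follow their standard definitions and can be computed as sketched below. The numbers in the example are made up for illustration and are not results from Table 1.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical fly ash carbon contents (percent) and model predictions.
y_true = [2.0, 4.0, 5.0]
y_pred = [2.0, 3.0, 5.0]
print(round(rmse(y_true, y_pred), 4), round(mape(y_true, y_pred), 2), round(r2(y_true, y_pred), 4))
# 0.5774 8.33 0.7857
```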
TABLE 1 prediction results for different models
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims (3)

1. The soft measurement method for the carbon content of the fly ash based on LightGBM and XGBoost combined model, characterized by comprising the following steps:
Step 1, acquiring DCS system data of a boiler, and performing data mining on the acquired DCS system data, wherein the data mining comprises removal of obvious outliers and data resampling;
Step 2, acquiring historical data variables of the measured parameter values at boiler working condition measuring points, including relative working condition measuring points and reference working condition measuring points, over a certain period; in view of the multivariable, nonlinear and strongly coupled characteristics of the boiler combustion process, first using a correlation matrix to find the variables strongly coupled with the carbon content of the fly ash and removing the variables with low correlation with the carbon content of the fly ash, and then extracting important variables by a wrapper method to serve as the input of the subsequent model;
Step 3, dividing the important variables finally extracted in step 2 into a training set, a validation set and a test set, and adopting XGBoost and LightGBM models respectively as prediction models;
for a given training set, the predicted value $\hat{y}_i$ of the LightGBM model can be expressed by formula (2):

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2}$$

wherein $\hat{y}_i$ represents the predicted value of the LightGBM model, $K$ represents the number of decision trees, $f_k$ represents the predicted value of the k-th decision tree, and $x_i$ represents the i-th input sample; $F$ represents the set of all decision trees; the objective function $L^{(t)}$ of LightGBM is expressed by formula (3):

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{3}$$

in formula (3), $n$ is the number of samples and $l$ is the loss function, representing the difference between the target value $y_i$ and the predicted value $\hat{y}_i$; for the regression problem it is the mean square error loss function, i.e., $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$; $\hat{y}_i^{(t-1)}$ is the predicted value of the previous t-1 rounds at the t-th iteration, $f_t(x_i)$ is the predicted value of the t-th round, and $\Omega(f_t)$ is the model complexity, expressed by formula (4):

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{4}$$

in formula (4), $\gamma$ and $\lambda$ are regularization coefficients that prevent the decision tree from becoming too complex, $T$ represents the number of leaf nodes, and $w_j$ is the weight coefficient of the j-th leaf node;

for a given training set, the predicted value of the XGBoost model can be expressed by the following formula:

$$f(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

wherein $f(x_i)$ represents the predicted value of the XGBoost model, $K$ represents the number of decision trees, $f_k$ represents the predicted value of the k-th decision tree, and $x_i$ represents the i-th input sample; $F$ represents the set of all decision trees; the objective function of the XGBoost model is shown in formula (5):

$$L^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{5}$$

wherein $g_i$ represents the first derivative of the loss function for sample $x_i$, $h_i$ represents the second derivative of the loss function for sample $x_i$, the loss function is the mean square error loss function, $f_t(x_i)$ is the predicted value of the t-th round, $\gamma$ and $\lambda$ are regularization coefficients, $T$ represents the number of leaf nodes, and $w_j$ represents the weight coefficient of the j-th leaf node;
Step 4, performing hyperparameter tuning by using a Bayesian optimization (BO) algorithm, wherein the prediction model is evaluated with 5-fold cross validation, the evaluation metric is set to RMSE and the number of iterations is set to N, and, after the optimal hyperparameters are selected, establishing the BO-LightGBM and BO-XGBoost models for predicting the carbon content of the fly ash;
Step 5, using a sequential least squares programming (SLSQP) algorithm to combine the fly ash carbon content predictions of the BO-XGBoost and BO-LightGBM models to obtain a final predicted value;
when the sequential least squares programming algorithm is used to combine the BO-XGBoost and BO-LightGBM models in step 5,
wherein the objective function Obj is a mean square error function, and $Y$ is the average of the true values corresponding to all samples;
the initial values of the weights are taken from the ratio of the mean square errors between the predicted values and the true values of the two models, as shown in formula (7);
where $n$ is the total number of sample data, $i$ denotes the i-th sample data, $w_1$ and $w_2$ are the weight coefficients of the BO-XGBoost model and the BO-LightGBM model, $y_{1i}$ is the predicted value of the i-th sample data obtained by the BO-XGBoost model, $y_{2i}$ is the predicted value of the i-th sample data obtained by the BO-LightGBM model, and $y_i$ is the true value corresponding to the i-th sample data; the predicted value of the combined model is shown in formula (8);
wherein the two mean-value terms are the averages of the predictions over all samples of the BO-XGBoost model and the BO-LightGBM model, respectively.
2. The soft measurement method for the carbon content of the fly ash based on LightGBM and XGBoost combined model as claimed in claim 1, wherein: in step 2, the correlation matrix is composed of correlation coefficients; the correlation coefficient is given by formula (1) and indicates a direct or inverse relationship with the target variable:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1}$$

where $r$ is the correlation coefficient, $x_i$ is the i-th value of the x variable, $y_i$ is the i-th value of the y variable, $i \in [1, n]$, $n$ is the total number of values, and $\bar{x}$ and $\bar{y}$ are the averages of the x and y variables, respectively.
3. The soft measurement method for the carbon content of the fly ash based on LightGBM and XGBoost combined model as claimed in claim 1, wherein: the historical data variables in the step 2 comprise the coal feeding amount of each coal mill, the primary air pressure, the air temperature, the air quantity, the outlet temperature and the current of a separator of each coal mill, the opening degree of a secondary air door of each layer, the temperature, the pressure, the air quantity and the oxygen content of primary air and secondary air related to an air preheater, the air supply temperature, the pressure and the air quantity of a blower, the oxygen content and the exhaust gas temperature of a tail flue, and the power generation, the total primary air quantity, the total secondary air quantity, the furnace pressure and the furnace temperature.
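The weight combination of step 5 in claim 1 can be sketched as follows. This is a minimal illustration, not the claimed formulas: formulas (6)-(8) are not reproduced in this text, so the sketch assumes a plain mean square error objective over the weighted sum of the two predictions, assumes the constraints w1 + w2 = 1 and 0 <= w <= 1, and assumes initial weights inversely proportional to each model's mean square error. The function name `combine_weights` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def combine_weights(y_true, pred_xgb, pred_lgbm):
    """Fit weights (w1, w2) for two prediction vectors with SLSQP."""
    y = np.asarray(y_true, dtype=float)
    p1 = np.asarray(pred_xgb, dtype=float)
    p2 = np.asarray(pred_lgbm, dtype=float)

    def objective(w):
        # Assumed objective: MSE of the weighted combination.
        return np.mean((w[0] * p1 + w[1] * p2 - y) ** 2)

    # Assumed initialisation: the model with the smaller MSE starts
    # with the larger weight.
    mse1 = np.mean((p1 - y) ** 2)
    mse2 = np.mean((p2 - y) ** 2)
    total = mse1 + mse2
    w0 = np.array([mse2 / total, mse1 / total]) if total > 0 else np.array([0.5, 0.5])

    result = minimize(
        objective,
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0), (0.0, 1.0)],
        # Assumed constraint: the two weights sum to one.
        constraints=[{"type": "eq", "fun": lambda w: w[0] + w[1] - 1.0}],
    )
    return result.x


# Synthetic stand-ins for the true values and the two models' predictions.
rng = np.random.default_rng(0)
y_demo = rng.uniform(1.0, 5.0, size=200)
xgb_demo = y_demo + rng.normal(0.0, 0.3, size=200)
lgbm_demo = y_demo + rng.normal(0.0, 0.5, size=200)

w_demo = combine_weights(y_demo, xgb_demo, lgbm_demo)
combined_demo = w_demo[0] * xgb_demo + w_demo[1] * lgbm_demo
print("weights:", w_demo)
```

Because each single model corresponds to a feasible weight vector, (1, 0) or (0, 1), the fitted combination should not have a larger mean square error on the fitting data than either model alone, up to solver tolerance.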
CN202210318954.1A 2022-03-29 2022-03-29 Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model Active CN114896860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210318954.1A CN114896860B (en) 2022-03-29 2022-03-29 Soft measurement method for carbon content of fly ash based on LightGBM and XGBoost combined model

Publications (2)

Publication Number Publication Date
CN114896860A (en) 2022-08-12
CN114896860B (en) 2024-05-14

Family

ID=82716399

Country Status (1)

Country Link
CN (1) CN114896860B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant