CN113449476B - Stacking-based soft measurement method for butane content in debutanizer - Google Patents
- Publication number
- CN113449476B (application number CN202110771243.5A)
- Authority
- CN
- China
- Prior art keywords
- training
- prediction
- learner
- butane content
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/28—Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2113/00—Details relating to the application field
- G06F2113/08—Fluids
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/14—Force analysis or force optimisation, e.g. static or dynamic forces
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a Stacking-based soft measurement method for butane content in a debutanizer. The method includes the steps of carrying out feature expansion on process variables at the current sampling moment through a time sliding window, solving the problem of process dynamics by using historical process data, enhancing the diversity among first-layer learners in an ensemble learning model by introducing a feature disturbance mechanism and adopting a non-homogeneous learner, and further combining the output of the first-layer learner by using a Stacking integration strategy and a second-layer learner to obtain a final predicted value of butane content.
Description
Technical Field
The invention belongs to the field of prediction and soft measurement of industrial processes, and particularly relates to a soft measurement method for butane content in a debutanizer based on Stacking.
Background
There are many variables in an actual industrial process that are difficult or costly to measure directly, and these variables tend to affect product quality to a large extent. The industrial process soft measurement technology is a method for estimating the true value of a variable to be measured by establishing a mathematical model between the variable to be measured and other variables easy to measure. The soft measurement technology is mainly divided into two types of modeling through mechanism and modeling based on data, and the development of computer technology and machine learning enables the application of the soft measurement method based on data modeling to be more and more extensive. Common soft measurement methods based on data modeling include: support vector machines, partial least squares, neural networks, and the like. These methods have better prediction effects on simple data sets, however, these single model methods perform poorly in the face of soft measurement problems of process nonlinearity, data non-gaussian distributions, and process dynamics.
The debutanization process is used in industrial oil refining processes to remove propane and butane from naphtha gases, and monitoring and controlling the butane content at the bottom of the debutanizer is important to maximize the production of liquefied petroleum gas. Gas chromatographs are used in the industry to monitor butane content, but the location of this hardware sensor installation can result in a delay of over 30 minutes in the measurement. Therefore, in order to realize real-time monitoring of the butane content, a soft measurement model of the butane content at the bottom of the debutanizer needs to be established so as to fully utilize other process data which can be monitored in real time in the debutanizing process. The debutanizing process has strong nonlinearity and dynamics, the butane content at the current moment and the process data at the previous sampling moments have close relation, and the traditional single prediction model is difficult to meet the requirements.
Patent publication No. CN103389360A discloses a soft measurement method for the butane content of a debutanizer based on a probability principal component regression model, introduces a probability modeling method, and provides a soft measurement model based on probability principal component regression, which can simultaneously model process data and noise information. Patent publication No. CN108647373A discloses "a method for soft measurement of industrial process based on xgboost model", which includes: independent repeated sampling and data preprocessing are carried out on historical data, an xgboost model is established by utilizing a training sample, and a soft measurement model aiming at a target variable is established through cross validation and parameter adjustment. The methods reported in the above patents all use a soft measurement method based on a specific regression model, and when the method is applied to the butane content prediction of a debutanizer, the characteristics of dynamicity, nonlinearity and the like of debutanizer process data cannot be well processed at the same time, and when the method is applied to the establishment of a butane content soft measurement model, the prediction accuracy is difficult to guarantee.
Patent publication No. CN110066895A discloses "a method for predicting a blast furnace molten iron mass interval based on Stacking", which comprises: acquiring original historical data of a blast furnace and preprocessing the data; extracting a sample data set from the preprocessed blast furnace original historical data according to the input and output parameters; establishing a Stacking algorithm molten iron quality model based on an N-fold model and calculating a modeling error prediction interval; and predicting the real-time collected blast furnace data according to the Stacking algorithm molten iron quality model of the N-fold model to obtain a predicted value and a predicted interval. In fact, the first-layer learner in the Stacking model reported in the patent only has one random weight neural network, and the second-layer learner adopts the same model as the first-layer learner, so that the advantages of the Stacking model cannot be well utilized.
Disclosure of Invention
The invention aims to provide a soft measuring method for butane content in a debutanizer based on Stacking, aiming at overcoming the defects of the prior art in solving the problems of nonlinearity and dynamics of debutanizing process data.
The purpose of the invention is realized by the following technical scheme: a Stacking-based ensemble learning soft measurement method for butane content in a debutanizer, comprising the following steps:
(1) Continuously sampling the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 of the debutanizer n times with a fixed sampling period T, and obtaining the butane content at each sampling moment through off-line laboratory analysis. The resulting n samples form the original sample set, expressed as $D_{train} = \{(X_{i'}, y_{i'}) \mid i' = 1, 2, \ldots, n\}$, where $X_{i'} \in \mathbb{R}^7$ is the feature vector with seven columns, representing respectively the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 at the i'-th sampling moment, and $y_{i'} \in \mathbb{R}^1$ is the prediction target with one column, representing the butane content at the i'-th sampling moment.
(2) Performing feature screening and feature disturbance on the original training sample set by using the XGBOOST model and a time sliding window mechanism, respectively, and constructing two training sets $D_{train1} = \{(X_{i\_train1}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$ and $D_{train2} = \{(X_{i\_train2}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $W_1$ is the width of the time sliding window used for feature expansion of $D_{train}$.
(3) Establishing a two-layer ensemble learning butane content prediction model under the Stacking learning strategy from the two training sample sets obtained in step (2), with the following specific steps:
(3.1) Using a K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, on $D_{train1}$, and two different learners, denoted $L_3$ and $L_4$, on $D_{train2}$, giving four learners in total as the first-layer learners of the prediction model; and obtaining the predicted value of the t-th learner on the i-th training sample, where $t = 1, 2, 3, 4$ and $i = W_1, W_1+1, \ldots, n$.
(3.2) Averaging the predicted values of $L_1$ and $L_2$ on the training samples to obtain $y'_i$, averaging the predicted values of $L_3$ and $L_4$ on the training samples to obtain $y''_i$, and using the butane content $y_i$ from the original training set $D_{train}$ to construct the second-layer learner training set $D_{constructed} = \{(X''_i, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X''_i = [y'_i, y''_i] \in \mathbb{R}^2$.
(3.3) Using $D_{constructed}$ to train the second-layer learner, denoted $L$.
(4) Predicting the real-time butane content of the debutanizer by using the Stacking-based ensemble learning soft measurement model: performing the feature disturbance and expansion of step (2) on the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 obtained from the sensors, together with historical sample data in the database, to obtain two new samples; using the first sample as the input of the two learners trained on $D_{train1}$ in step (3.1) and the second sample as the input of the two learners trained on $D_{train2}$ in step (3.1), to obtain four predicted values; averaging the four predicted values in pairs as in step (3.2) to obtain two feature values, which are used as the input of the second-layer learner of step (3.3); and taking the output of the second-layer learner as the final real-time predicted value of the butane content.
Further, in the step (2), feature screening and feature disturbance are respectively performed on the original training sample set by using the XGBOOST model and the time sliding window mechanism, and the specific steps are as follows:
(2.1) Taking $X_{i'}$ of the training sample set $D_{train}$ as the input of the XGBOOST model and $y_{i'}$ as its target output, training the XGBOOST model; after training, deleting the features with lower scores according to the feature importance attribute of the XGBOOST model, and obtaining the feature-screened training sample set $D_{train\_screened} = \{(X'_{i'}, y_{i'}) \mid i' = 1, 2, \ldots, n\}$, where each column of $X'_{i'}$ represents a retained feature and $y_{i'} \in \mathbb{R}^1$ still represents the butane content at the i'-th sampling moment.
(2.2) Determining a first time sliding window width $W_1$ and expanding the feature variables in $D_{train}$ through the time sliding window; the training sample set after feature expansion is expressed as $D_{train1} = \{(X_{i\_train1}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X_{i\_train1} \in \mathbb{R}^{7W_1}$ represents the feature variable of the i-th sample after feature expansion through the first time sliding window and has $7W_1$ columns in total.
(2.3) Determining a second time sliding window width $W_2$ ($W_2 < W_1$) and expanding the feature variables in $D_{train\_screened}$ through the time sliding window; the training sample set after feature expansion is expressed as $D_{train2} = \{(X_{i\_train2}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X_{i\_train2} \in \mathbb{R}^{4W_2}$ represents the feature variable of the i-th sample after feature screening and feature expansion through the second time sliding window and has $4W_2$ columns in total.
Further, the training of the first-layer learners based on the K-fold cross training method in step (3.1) specifically comprises the following steps:
(3.1.1) Dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's predictions on that prediction set after training; after K rounds, the learner's predicted values on all $X_{i\_train1}$, $i = W_1, W_1+1, \ldots, n$, are obtained. Two different learners are trained in this way, giving the predicted values of both learners on $X_{i\_train1}$, $i = W_1, W_1+1, \ldots, n$.
(3.1.2) Dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's predictions on that prediction set after training; after K rounds, the learner's predicted values on all $X_{i\_train2}$, $i = W_1, W_1+1, \ldots, n$, are obtained. Two different learners are trained in this way, giving the predicted values of both learners on $X_{i\_train2}$, $i = W_1, W_1+1, \ldots, n$.
Further, in step (3.1), $D_{train1}$ and $D_{train2}$ are each divided equally into K subsets for training the learners.
Further, the learners include an XGBOOST regression model, an RBF kernel-based support vector machine regression model, a multi-layer perceptron regression model, a Bayesian ridge regression model, an RBF kernel-based ridge regression model, and the like.
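The learner families listed above can be collected in a small registry, sketched here with scikit-learn. This is an illustration only: the abbreviations follow FIG. 3, `GradientBoostingRegressor` is a plainly named stand-in for XGBOOST (so that only scikit-learn is required), and every hyperparameter shown is a placeholder assumption rather than a value from the patent.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import BayesianRidge
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR


def make_candidate_learners(random_state=0):
    """Return one unfitted regressor per learner family named in the text."""
    return {
        # Stand-in for XGBOOST; xgboost.XGBRegressor could be used instead
        # if the xgboost package is installed.
        "XGBOOST": GradientBoostingRegressor(random_state=random_state),
        "KSVR": SVR(kernel="rbf"),
        # Hidden-layer size and iteration cap are illustrative placeholders.
        "MLP": MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                            random_state=random_state),
        "BRR": BayesianRidge(),
        "KRR": KernelRidge(kernel="rbf"),
    }


learners = make_candidate_learners()
print(sorted(learners))
```

Using non-homogeneous model families in the first layer is what the text relies on for diversity; the registry simply makes it easy to pick two learners per training set and one for the second layer.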
The invention has the beneficial effects that: the method and the device have the advantages that the process variable at the current sampling moment is subjected to characteristic expansion through the time sliding window, the problem of process dynamics is solved by using historical process data, the diversity among first-layer learners in the ensemble learning model is enhanced by introducing a characteristic disturbance mechanism and adopting a non-homogeneous learner, and then the final predicted value of the butane content is obtained by combining the output of the first-layer learner by using a Stacking integration strategy and a second-layer learner.
Drawings
FIG. 1 is a schematic flow diagram of a debutanizer column;
FIG. 2 is a schematic diagram of feature expansion using a width 3 time sliding window;
FIG. 3 is a flow chart of a soft measurement model for real-time estimation of butane content according to an embodiment of the present invention; wherein, XGBOOST represents XGBOOST regression model, KSVR represents support vector regression model based on RBF kernel, MLP represents multi-layer perceptron regression model, BRR represents bayesian ridge regression model, KRR represents ridge regression model based on RBF kernel;
FIG. 4 is a graph of the results of prediction of butane content in a Stacking-based ensemble learning debutanizer according to an embodiment of the present invention; wherein "+" is the laboratory analysis value of the butane content of each sampling point, and "+" is the predicted value of the butane content of each sampling point.
Detailed Description
The invention is further described in detail below with reference to the figures and examples.
The invention relates to a Stacking-based ensemble learning debutanizer soft measurement method, which comprises the following steps:
(1) Continuously sampling the tower top temperature (U1), tower top pressure (U2), reflux quantity (U3), flow to the next stage (U4), sixth tower plate temperature (U5), tower bottom temperature 1 (U6) and tower bottom temperature 2 (U7) of the debutanizer n times with a fixed sampling period T, and obtaining the butane content at each sampling moment through off-line laboratory analysis. The resulting n samples form the original sample set, expressed as $D_{train} = \{(X_{i'}, y_{i'}) \mid i' = 1, 2, \ldots, n\}$, where $X_{i'} \in \mathbb{R}^7$ is the feature vector with seven columns, representing respectively the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 at the i'-th sampling moment, and $y_{i'} \in \mathbb{R}^1$ is the prediction target with one column, representing the butane content at the i'-th sampling moment. FIG. 1 illustrates the seven process variables that need to be collected at each sampling point.
(2) Performing feature screening and feature disturbance on the original training sample set by using the XGBOOST model and a time sliding window mechanism, respectively, and constructing two training sets $D_{train1}$ and $D_{train2}$, with the following specific steps:
(2.1) Taking $X_{i'}$ of the training sample set $D_{train}$ as the input of the XGBOOST model and $y_{i'}$ as its target output, training the XGBOOST model; after training, deleting the three features with the lowest scores according to the feature importance attribute of the XGBOOST model, and obtaining the feature-screened training sample set $D_{train\_screened} = \{(X'_{i'}, y_{i'}) \mid i' = 1, 2, \ldots, n\}$, where $X'_{i'} \in \mathbb{R}^4$ has four columns, representing respectively the tower top temperature, reflux quantity, flow to the next stage and sixth tower plate temperature at the i'-th sampling moment, and $y_{i'} \in \mathbb{R}^1$ still represents the butane content at the i'-th sampling moment.
(2.2) Determining a first time sliding window width $W_1$ and expanding the feature variables in $D_{train}$ through the time sliding window; the training sample set after feature expansion is expressed as $D_{train1} = \{(X_{i\_train1}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X_{i\_train1} \in \mathbb{R}^{7W_1}$ represents the feature variable of the i-th sample after feature expansion through the first time sliding window and has $7W_1$ columns in total. FIG. 2 illustrates the case $W_1 = 3$.
(2.3) Determining a second time sliding window width $W_2$ ($W_2 < W_1$) and expanding the feature variables in $D_{train\_screened}$ through the time sliding window; the training sample set after feature expansion is expressed as $D_{train2} = \{(X_{i\_train2}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X_{i\_train2} \in \mathbb{R}^{4W_2}$ represents the feature variable of the i-th sample after feature screening and feature expansion through the second time sliding window and has $4W_2$ columns in total.
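The time sliding window expansion of step (2) can be sketched in plain Python as follows. The sketch assumes that the window for sample i holds the current sample plus its W-1 predecessors, concatenated oldest first; this matches the index range i = W, ..., n and the column counts given in the text, but the ordering and the helper names `expand_with_window` and `align_to_wider` are assumptions made for illustration. Under this convention n - W + 1 rows are produced, whereas the embodiment reports n - W samples, so the authors' indexing may differ by one sample.

```python
def expand_with_window(X, y, width):
    """Concatenate each sample with its width-1 predecessors.

    X is a list of per-sample feature lists, y the matching targets.
    Row j of the result corresponds to original sample index j + width - 1
    (0-based), so len(X) - width + 1 rows are produced.
    """
    if not 1 <= width <= len(X):
        raise ValueError("width must be between 1 and len(X)")
    rows, targets = [], []
    for i in range(width - 1, len(X)):
        window = X[i - width + 1 : i + 1]  # oldest sample first
        rows.append([v for sample in window for v in sample])
        targets.append(y[i])
    return rows, targets


def align_to_wider(rows, targets, width, wider_width):
    """Drop leading samples so a narrower-window set lines up with a wider one."""
    drop = wider_width - width
    return rows[drop:], targets[drop:]


# Toy data: 6 samples with 2 features each (the text uses 7 and 4 features).
X = [[i, 10 * i] for i in range(6)]
y = [float(i) for i in range(6)]

X1, y1 = expand_with_window(X, y, width=3)  # plays the role of D_train1
X2, y2 = align_to_wider(*expand_with_window(X, y, width=2),
                        width=2, wider_width=3)  # plays the role of D_train2
print(len(X1), len(X1[0]), len(X2), len(X2[0]))
```

For brevity the toy example expands the same feature matrix with both widths; in the method, the second window is applied to the feature-screened set. After alignment both sets cover the same sampling moments, which is what allows their predictions to be paired in the second layer.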
(3) Establishing a two-layer ensemble learning butane content prediction model under the Stacking learning strategy from the two training sample sets obtained in step (2), with the following specific steps:
(3.1) Using a K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, on $D_{train1}$, and two different learners, denoted $L_3$ and $L_4$, on $D_{train2}$, giving four learners in total as the first-layer learners of the prediction model; and obtaining the predicted value of the t-th learner on the i-th training sample, where $t = 1, 2, 3, 4$ and $i = W_1, W_1+1, \ldots, n$. The specific steps are as follows:
(3.1.1) Equally dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's predictions on that prediction set after training; after K rounds, the learner's predicted values on all $X_{i\_train1}$, $i = W_1, W_1+1, \ldots, n$, are obtained. Two different learners are trained in this way, giving the predicted values of both learners on $X_{i\_train1}$, $i = W_1, W_1+1, \ldots, n$.
(3.1.2) Equally dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's predictions on that prediction set after training; after K rounds, the learner's predicted values on all $X_{i\_train2}$, $i = W_1, W_1+1, \ldots, n$, are obtained. Two different learners are trained in this way, giving the predicted values of both learners on $X_{i\_train2}$, $i = W_1, W_1+1, \ldots, n$.
(3.2) Averaging the predicted values of $L_1$ and $L_2$ on the training samples to obtain $y'_i$, averaging the predicted values of $L_3$ and $L_4$ on the training samples to obtain $y''_i$, and using the butane content $y_i$ from the original training set $D_{train}$ to construct the second-layer learner training set $D_{constructed} = \{(X''_i = [y'_i, y''_i], y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X''_i \in \mathbb{R}^2$, $i = W_1, W_1+1, \ldots, n$.
(3.3) Using $D_{constructed}$ to train the second-layer learner, denoted $L$.
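Steps (3.1) and (3.2) can be sketched in plain Python as follows: each first-layer learner produces one out-of-fold prediction per sample, and the predictions are averaged within each learner pair to form the two-column input of the second layer. The sketch assumes contiguous, nearly equal folds; the two toy learners (`fit_mean`, `fit_last_value`) and all function names are hypothetical stand-ins, not the regressors used in the embodiment.

```python
def kfold_slices(n, k):
    """Split range(n) into k contiguous, nearly equal folds."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for j in range(k):
        size = base + (1 if j < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds


def out_of_fold_predictions(fit, X, y, k):
    """One prediction per sample, each made by a model that never saw it."""
    preds = [None] * len(X)
    for held_out in kfold_slices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in held_out:
            preds[i] = model(X[i])
    return preds


def build_second_layer_set(fits_1, X1, fits_2, X2, y, k):
    """Average out-of-fold predictions within each pair to get X'' in R^2."""
    def pair_average(fits, X):
        per_learner = [out_of_fold_predictions(f, X, y, k) for f in fits]
        return [sum(p) / len(p) for p in zip(*per_learner)]

    y_prime = pair_average(fits_1, X1)         # average of L1 and L2
    y_double_prime = pair_average(fits_2, X2)  # average of L3 and L4
    return [[a, b] for a, b in zip(y_prime, y_double_prime)], list(y)


# Hypothetical toy learners: each "fit" returns a predict function.
def fit_mean(X, y):
    mean = sum(y) / len(y)
    return lambda x: mean


def fit_last_value(X, y):
    last = y[-1]
    return lambda x: last


X = [[float(i)] for i in range(6)]
y = [float(i) for i in range(6)]
Z, t = build_second_layer_set([fit_mean, fit_last_value], X,
                              [fit_mean, fit_mean], X, y, k=3)
print(Z[0], Z[-1])
```

The pairs `(Z[i], t[i])` play the role of $D_{constructed}$, on which a second-layer learner would then be fitted. With 1546 samples and K = 3, `kfold_slices` yields folds of 516, 515 and 515 samples.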
(4) Predicting the real-time butane content of the debutanizer by using the Stacking-based ensemble learning debutanizer butane content soft measurement model established in step (3):
The feature disturbance and expansion of step (2) are performed on the seven process variables obtained in real time from the sensors (tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2), together with historical sample data in the database, to obtain two new samples. The first sample is used as the input of the two learners trained in step (3.1.1), and the second sample as the input of the two learners trained in step (3.1.2), giving four predicted values. The four predicted values are averaged in pairs as in step (3.2) to obtain two feature values, which are used as the input of the second-layer learner of step (3.3); the output of the second-layer learner is the final real-time predicted value of the butane content.
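The prediction path of step (4) can be sketched as a single function, assuming the four first-layer learners and the second-layer learner are already fitted and exposed as plain callables. The function name `predict_butane` and the toy callables below are hypothetical and serve only to show the data flow.

```python
def predict_butane(first_pair, second_pair, meta, x_train1_style, x_train2_style):
    """Two-layer prediction for one pair of expanded samples.

    first_pair / second_pair: fitted predict functions for (L1, L2) and (L3, L4).
    meta: fitted predict function of the second-layer learner L, taking [y', y''].
    x_train1_style: current sample expanded like D_train1.
    x_train2_style: current sample screened and expanded like D_train2.
    """
    y_prime = sum(f(x_train1_style) for f in first_pair) / len(first_pair)
    y_double_prime = sum(f(x_train2_style) for f in second_pair) / len(second_pair)
    return meta([y_prime, y_double_prime])


# Toy stand-ins for fitted learners (illustrative only).
L1 = lambda x: sum(x)
L2 = lambda x: max(x)
L3 = lambda x: x[0]
L4 = lambda x: x[-1]
L = lambda z: 0.5 * z[0] + 0.5 * z[1]

estimate = predict_butane([L1, L2], [L3, L4], L, [1.0, 2.0, 3.0], [2.0, 4.0])
print(estimate)  # 3.75
```

Because the second-layer learner only ever sees the two pair averages, the on-line path needs one window of recent history per first-layer input and nothing else from the training stage.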
Examples
The invention is illustrated below with reference to a specific debutanizer butane content prediction example:
The debutanizer is continuously sampled to obtain 2394 samples; the first 1596 samples are used as the training set for training the ensemble learning soft measurement model, and the last 798 samples are used as the test set for verifying its effectiveness. Seven process variables are selected to model the butane content at the bottom of the debutanizer: the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2, as shown in FIG. 1.
The following will explain the implementation steps of the present invention in detail with reference to the specific process, as shown in fig. 3, specifically:
1. An XGBOOST regression model is trained using the 1596 training samples, and the importance scores of the seven process variables are obtained from the feature importance attribute of the XGBOOST model: 0.151, 0.001, 0.166, 0.095, 0.585, 0 and 0.003, respectively. According to these scores, the 2nd, 6th and 7th process variables of each sample are deleted in feature screening.
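The screening rule of this step can be sketched as follows, using the seven importance scores quoted above. The helper name `screen_features` is hypothetical; it simply drops the lowest-scoring features and keeps the rest in their original order.

```python
def screen_features(importances, n_drop):
    """Return the 0-based indices of the features kept, in original order."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j])
    dropped = set(ranked[:n_drop])
    return [j for j in range(len(importances)) if j not in dropped]


# Importance scores of the seven process variables, as listed in the text.
scores = [0.151, 0.001, 0.166, 0.095, 0.585, 0.0, 0.003]
kept = screen_features(scores, n_drop=3)
print(kept)  # [0, 2, 3, 4]
```

The 0-based indices 1, 5 and 6 are dropped, which corresponds to the 2nd, 6th and 7th variables (tower top pressure and the two tower bottom temperatures); the tower top temperature, reflux quantity, flow to the next stage and sixth tower plate temperature are retained.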
2. A time sliding window of width 50 is used for feature expansion of the 1596 original training samples to obtain $D_{train1}$, which contains 1546 new training samples, each with a feature vector of dimension 350.
A time sliding window of width 35 is used for feature expansion of the 1596 samples obtained by feature screening to obtain $D_{train2}$, which contains 1561 new training samples, each with a feature vector of dimension 140. To keep the number of training samples in $D_{train1}$ and $D_{train2}$ consistent, the first 15 samples of $D_{train2}$ are deleted.
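As a quick consistency check, the figures quoted in this step follow from the window widths and feature counts alone (a worked calculation using only the numbers stated above):

$$7 \times 50 = 350, \qquad 4 \times 35 = 140, \qquad 1596 - 50 = 1546, \qquad 1596 - 35 = 1561, \qquad 1561 - 1546 = 15.$$

The last difference is the number of leading samples removed from $D_{train2}$ so that both training sets cover the same 1546 sampling moments.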
3. Establishing a Stacking-based ensemble learning debutanizer butane content soft measurement model according to a detailed method in the implementation steps:
(1) Using $D_{train1}$, an XGBOOST regression model (XGBOOST) and an RBF kernel-based support vector machine regression model (KSVR) are trained; using $D_{train2}$, a multi-layer perceptron regression model (MLP) and a Bayesian ridge regression model (BRR) are trained. These four models serve as the first-layer learners.
(2) Using a 3-fold cross training method, $D_{train1}$ is divided into three subsets containing 516, 515 and 515 training samples, denoted subset 1, subset 2 and subset 3 in order. In the first round, subsets 1 and 2 are used as the training set to train the XGBOOST regression model and the RBF kernel-based support vector machine regression model, subset 3 is used as the prediction set of the two models, and the average of the two models' predictions is stored. In the second round, subsets 1 and 3 are used as the training set, subset 2 is used as the prediction set, and the average of the two models' predictions is stored. In the third round, subsets 2 and 3 are used as the training set, subset 1 is used as the prediction set, and the average of the two models' predictions is stored. After the three rounds, the predicted averages of the XGBOOST regression model and the RBF kernel-based support vector machine regression model on all 1546 samples are obtained. Similarly, $D_{train2}$ is divided into three subsets, and the predicted averages of the multi-layer perceptron regression model and the Bayesian ridge regression model on the 1546 samples are obtained through three rounds of training.
(3) These two predicted averages are used as the feature values of the second-layer learner's input samples, and the 1546 label values of $D_{train1}$ are used as the output labels of the second-layer learner, thereby constructing the second-layer learner training set $D_{constructed}$; an RBF kernel-based ridge regression model (KRR) is trained on it as the second-layer learner.
4. 798 test samples were used to verify the validity of the soft measurement model proposed by the present invention:
After each test sample, together with the historical samples, is feature-expanded through the first time sliding window, the resulting test sample with a feature vector dimension of 350 is used as the input of the first-layer learners $L_1$ and $L_2$; the two predicted values are averaged to obtain the first feature of the second-layer learner's input sample.
After the test sample undergoes feature screening and feature expansion through the second time sliding window, its feature vector dimension becomes 140, and it is used as the input of the first-layer learners $L_3$ and $L_4$; the two predicted values are averaged to obtain the second feature of the second-layer learner's input sample.
The output value of the second-layer learner, namely the RBF kernel-based ridge regression model, is taken as the final estimate of the real-time butane content.
FIG. 4 is a graph comparing the butane content prediction results and the butane content analysis values of the soft measurement model of the present invention over 798 test samples; wherein "+" is the laboratory analysis value of the butane content of each sampling point, and "+" is the model prediction value of the butane content of each sampling point. It can be seen that, at most sampling points, the estimate of the butane content obtained by the soft measurement model provided by the present invention tracks the off-line analysis value with a small error. The root mean square error (RMSE) between the 798 estimated values and the 798 off-line analysis values is about 0.1. In this embodiment, the RMSE is calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{798}\sum_{k=1}^{798}\left(y_k - \hat{y}_k\right)^2}$$
where $y_k$ is the off-line analysis value of the butane content at each sampling point and $\hat{y}_k$ is the soft measurement estimate of the butane content at each sampling point.
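The RMSE described above can be computed with a few lines of Python. The sample values below are made up purely to exercise the function and are not data from the patent.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Illustrative values only: every estimate is off by exactly 0.1.
lab_values = [0.20, 0.25, 0.30, 0.35]
estimates = [0.30, 0.15, 0.40, 0.25]
error = rmse(lab_values, estimates)
print(round(error, 3))
```

In the embodiment the same calculation is carried out over the 798 test samples.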
The above examples are intended to illustrate the invention, but not to limit the invention, and any modifications and variations of the invention within the spirit of the invention and the scope of the claims are within the scope of the invention.
Claims (5)
1. A method for integrally learning soft measurement of butane content in a debutanizer based on Stacking is characterized by comprising the following steps:
(1) continuously sampling the tower top temperature, the tower top pressure, the reflux quantity, the flow to the next stage, the sixth tower plate temperature, the tower bottom temperature 1 and the tower bottom temperature 2 of the debutanizer for n times in a fixed sampling period T, and introducingAnalyzing in off-line laboratory to obtain butane content value at each sampling time, and obtaining n samples as original sample set, denoted as Dtrain={(Xi′,yi′) 1,2, ·, n }, where X ═ i' ═ 1,2i′For the feature vector, there are seven columns, Xi′∈R7Each column represents the column top temperature, column top pressure, reflux amount, flow to the next stage, column tray temperature, column bottom temperature 1, column bottom temperature 2, y at the ith' sampling time, respectivelyi′For the prediction target, there is one column in total, yi′∈R1Represents the butane content at the i' th sampling time;
(2) performing feature screening and feature perturbation on the original training sample set by using an XGBOOST model and a time sliding window mechanism, respectively, and constructing two training sets $D_{train1} = \{(X_{i\_train1}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$ and $D_{train2} = \{(X_{i\_train2}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, wherein $W_1$ is the width of the time sliding window used for feature expansion of $D_{train}$;
(3) establishing, from the two training sample sets obtained in step (2), a two-layer ensemble learning butane content prediction model based on the Stacking learning strategy, specifically comprising the following steps:
(3.1) based on the K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, using $D_{train1}$, and training two different learners, denoted $L_3$ and $L_4$, using $D_{train2}$, so that four learners in total are obtained as the first-layer learners of the prediction model; obtaining the predicted value of each learner on the training samples, the predicted value of the t-th learner on the i-th training sample being denoted $\hat{y}_i^{(t)}$, where $t = 1, 2, 3, 4$ and $i = W_1, W_1+1, \ldots, n$;
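(3.2) averaging the predicted values of $L_1$ and $L_2$ on the training samples, averaging the predicted values of $L_3$ and $L_4$ on the training samples, and, together with the butane content $y_i$ in the original training set $D_{train}$, constructing the second-layer learner training set $D_{stacking} = \{(X''_i, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, where $X''_i \in R^2$, $X''_i = [y'_{i,1}, y'_{i,2}]$, $y'_{i,1}$ being the average of the predicted values of $L_1$ and $L_2$ and $y'_{i,2}$ the average of the predicted values of $L_3$ and $L_4$ on the i-th training sample, $i = W_1, W_1+1, \ldots, n$;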
(3.3) training a second-layer learner, denoted L, using $D_{stacking}$;
(4) predicting the real-time butane content of the debutanizer by using the Stacking-based ensemble learning soft measurement model: the tower top temperature, the tower top pressure, the reflux quantity, the flow to the next stage, the sixth tray temperature, the tower bottom temperature 1 and the tower bottom temperature 2 obtained by the sensors, together with the historical sample data in the database, are subjected to the feature perturbation and expansion of step (2) to obtain two new samples; the first sample is used as the input of the two learners trained on $D_{train1}$ in step (3.1), and the second sample is used as the input of the two learners trained on $D_{train2}$ in step (3.1), so that four predicted values are obtained; the four predicted values are averaged in pairs as in step (3.2) to obtain two feature values, which are used as the input of the second-layer learner of step (3.3), and the output of the second-layer learner is taken as the final real-time predicted value of the butane content.
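As a rough illustration of steps (3.2), (3.3) and (4), the sketch below averages four first-layer predictions in pairs and feeds the two resulting features to a second-layer learner. The names `second_layer_features` and `TwoFeatureLinear`, the no-intercept linear model used as the second-layer learner, and all numbers are assumptions made for this example; the claims do not fix the type of the second-layer learner here.

```python
def second_layer_features(p1, p2, p3, p4):
    """Average the L1/L2 and the L3/L4 predictions into two features per sample."""
    return [[(a + b) / 2.0, (c + d) / 2.0] for a, b, c, d in zip(p1, p2, p3, p4)]

class TwoFeatureLinear:
    """Stand-in second-layer learner: y ~ w1*x1 + w2*x2 (no intercept)."""

    def fit(self, X, y):
        # Normal equations for two features, solved with Cramer's rule.
        a11 = sum(r[0] * r[0] for r in X)
        a12 = sum(r[0] * r[1] for r in X)
        a22 = sum(r[1] * r[1] for r in X)
        b1 = sum(r[0] * t for r, t in zip(X, y))
        b2 = sum(r[1] * t for r, t in zip(X, y))
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            raise ValueError("second-layer features are collinear")
        self.w_ = ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
        return self

    def predict(self, X):
        return [self.w_[0] * r[0] + self.w_[1] * r[1] for r in X]

# Hypothetical first-layer predictions on four training samples.
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [1.2, 1.8, 3.2, 3.8]
p3 = [0.5, 2.5, 2.0, 4.5]
p4 = [0.7, 2.3, 2.2, 4.3]
X_stack = second_layer_features(p1, p2, p3, p4)

# Toy target built from known weights so the fit can be checked by eye.
y_stack = [0.6 * r[0] + 0.4 * r[1] for r in X_stack]
meta = TwoFeatureLinear().fit(X_stack, y_stack)

# Online use: four first-layer predictions for one new sample give one estimate.
x_new = second_layer_features([2.0], [2.2], [1.8], [2.0])
print(meta.predict(x_new))
```

In this toy setup the fitted weights come out close to 0.6 and 0.4 because the target was generated from them; with real plant data the second-layer learner would be whichever regression model is chosen for L.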
2. The Stacking-based ensemble learning debutanizer butane content soft measurement method according to claim 1, wherein performing feature screening and feature perturbation on the original training sample set by using the XGBOOST model and the time sliding window mechanism in step (2) comprises the following specific steps:
(2.1) taking $D_{train}(X_{i'})$ of the training sample set as the input of the XGBOOST model and $D_{train}(y_{i'})$ as the target output of the XGBOOST model, and training the XGBOOST model; after training is completed, deleting the features with lower scores according to the feature attributes of the XGBOOST model, to obtain the feature-screened training sample set $D_{train\_screened} = \{(X'_{i'}, y_{i'}) \mid i' = 1, 2, \ldots, n\}$, where each column of $X'_{i'}$ represents a retained feature and $y_{i'} \in R^1$ still represents the butane content at the i'-th sampling time;
(2.2) determining a first time sliding window width $W_1$, and expanding the feature variables in $D_{train}$ through the time sliding window, the feature-expanded training sample set being expressed as $D_{train1} = \{(X_{i\_train1}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, wherein $X_{i\_train1}$, the feature variable of the i-th sample after feature expansion through the first time sliding window, has $7W_1$ columns in total;
(2.3) determining a second time sliding window width $W_2$, $W_2 < W_1$, and expanding the feature variables in $D_{train\_screened}$ through the time sliding window, the feature-expanded training sample set being expressed as $D_{train2} = \{(X_{i\_train2}, y_i) \mid i = W_1, W_1+1, \ldots, n\}$, wherein $X_{i\_train2}$, the feature variable of the i-th sample after feature screening and feature expansion through the second time sliding window, has $4W_2$ columns in total.
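Steps (2.1) to (2.3) can be sketched as below. This is only an illustration under stated assumptions: the helper names `expand_with_window` and `screen_features` are hypothetical, the scores are made-up numbers standing in for the scores of a trained XGBOOST model, and the ordering of samples inside the window (newest first) is an assumption, since the expanded-vector definition is not legible in the text.

```python
def expand_with_window(X, y, width, start):
    """Concatenate each sample with its `width - 1` predecessors.

    X: feature rows ordered by sampling time; y: matching butane contents.
    start: 1-based index of the first expanded sample (W1 in the claims).
    Assumption: inside a window the newest sample comes first.
    """
    if width > start:
        raise ValueError("window width must not exceed the start index")
    rows, targets = [], []
    for i in range(start, len(X) + 1):      # i = start, ..., n (1-based)
        window = X[i - width:i]             # samples i-width+1 .. i
        rows.append([v for row in reversed(window) for v in row])
        targets.append(y[i - 1])
    return rows, targets

def screen_features(X, scores, keep):
    """Keep the `keep` highest-scoring columns, preserving column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    return [[row[j] for j in kept] for row in X], kept

# Toy data: n = 6 samples with 7 process variables each.
X = [[float(10 * i + j) for j in range(7)] for i in range(1, 7)]
y = [0.1 * i for i in range(1, 7)]
W1, W2 = 3, 2

# Hypothetical per-feature scores standing in for the XGBOOST output.
scores = [0.30, 0.05, 0.20, 0.02, 0.25, 0.03, 0.15]
X_screened, kept = screen_features(X, scores, keep=4)

X_train1, y_train1 = expand_with_window(X, y, W1, start=W1)           # 7*W1 columns
X_train2, y_train2 = expand_with_window(X_screened, y, W2, start=W1)  # 4*W2 columns
print(len(X_train1), len(X_train1[0]), len(X_train2[0]), kept)
```

Both sets start at sample $W_1$, as in step (2), so they contain the same targets even though the second window is shorter.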
3. The Stacking-based ensemble learning debutanizer butane content soft measurement method according to claim 1, wherein training the first-layer learners based on the K-fold cross training method in step (3.1) comprises the following specific steps:
(3.1.1) dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each time selecting as the prediction set a subset different from those selected before, and storing the prediction output of the learner on the prediction set after each round of training; after the K rounds of training and prediction, obtaining the predicted values of the learner on all $X_{i\_train1}$, $i = W_1, W_1+1, \ldots, n$; training two different learners in this way, the predicted values of the two learners on $X_{i\_train1}$ being $\hat{y}_i^{(1)}$ and $\hat{y}_i^{(2)}$, $i = W_1, W_1+1, \ldots, n$;
(3.1.2) dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each time selecting as the prediction set a subset different from those selected before, and storing the prediction output of the learner on the prediction set after each round of training; after the K rounds of training and prediction, obtaining the predicted values of the learner on all $X_{i\_train2}$, $i = W_1, W_1+1, \ldots, n$; training two different learners in this way, the predicted values of the two learners on $X_{i\_train2}$ being $\hat{y}_i^{(3)}$ and $\hat{y}_i^{(4)}$, $i = W_1, W_1+1, \ldots, n$.
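The K-fold procedure of steps (3.1.1) and (3.1.2) can be sketched as follows. The two learner classes are deliberately simple stand-ins, not the regression models of the patent, and the contiguous fold split is an assumption, since the claims only require K subsets; all names are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def out_of_fold_predictions(make_learner, X, y, k):
    """Train on k-1 folds, predict the held-out fold, and repeat k times."""
    preds = [None] * len(X)
    for held_out in kfold_indices(len(X), k):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        learner = make_learner()
        learner.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(held_out, learner.predict([X[i] for i in held_out])):
            preds[i] = p
    return preds

class MeanLearner:
    """Stand-in learner: always predicts the training-set mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
    def predict(self, X):
        return [self.mean_ for _ in X]

class FirstFeatureLine:
    """Stand-in learner: least-squares line on the first feature only."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]

# Toy training set: the target is an exact line in the first feature.
X = [[float(i), float(i % 3)] for i in range(10)]
y = [2.0 * i + 1.0 for i in range(10)]
oof_1 = out_of_fold_predictions(MeanLearner, X, y, k=5)
oof_2 = out_of_fold_predictions(FirstFeatureLine, X, y, k=5)
print(oof_1[0], round(oof_2[0], 6))
```

Each training sample receives exactly one prediction, made by a model that never saw that sample, which is what makes these values usable as inputs for the second-layer learner.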
4. The Stacking-based ensemble learning debutanizer butane content soft measurement method according to claim 3, wherein in step (3.1), $D_{train1}$ and $D_{train2}$ are each divided equally into K subsets for training the learners.
5. The Stacking-based ensemble learning debutanizer butane content soft measurement method according to claim 1, wherein the learner comprises an XGBOOST regression model, an RBF kernel-based support vector machine regression model, a multi-layer perceptron regression model, a Bayesian ridge regression model, and an RBF kernel-based ridge regression model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110771243.5A CN113449476B (en) | 2021-07-08 | 2021-07-08 | Stacking-based soft measurement method for butane content in debutanizer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113449476A (en) | 2021-09-28 |
CN113449476B (en) | 2022-07-05 |
Family
ID=77815400
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110771243.5A Active CN113449476B (en) | 2021-07-08 | 2021-07-08 | Stacking-based soft measurement method for butane content in debutanizer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113449476B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||