CN113449476B - Stacking-based soft measurement method for butane content in debutanizer - Google Patents

Stacking-based soft measurement method for butane content in debutanizer

Info

Publication number
CN113449476B
Authority
CN
China
Prior art keywords
training
prediction
learner
butane content
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110771243.5A
Other languages
Chinese (zh)
Other versions
CN113449476A (en)
Inventor
葛志强
庄新镇
孔祥印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202110771243.5A
Publication of CN113449476A
Application granted
Publication of CN113449476B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/28 Design optimisation, verification or simulation using fluid dynamics, e.g. using Navier-Stokes equations or computational fluid dynamics [CFD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/08 Fluids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 Force analysis or force optimisation, e.g. static or dynamic forces
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a Stacking-based soft measurement method for butane content in a debutanizer. The method includes the steps of carrying out feature expansion on process variables at the current sampling moment through a time sliding window, solving the problem of process dynamics by using historical process data, enhancing the diversity among first-layer learners in an ensemble learning model by introducing a feature disturbance mechanism and adopting a non-homogeneous learner, and further combining the output of the first-layer learner by using a Stacking integration strategy and a second-layer learner to obtain a final predicted value of butane content.

Description

Stacking-based soft measurement method for butane content in debutanizer
Technical Field
The invention belongs to the field of prediction and soft measurement of industrial processes, and particularly relates to a soft measurement method for butane content in a debutanizer based on Stacking.
Background
Many variables in an actual industrial process are difficult or costly to measure directly, yet they often strongly affect product quality. Industrial process soft measurement estimates the true value of such a variable by establishing a mathematical model between it and other variables that are easy to measure. Soft measurement techniques fall into two main types, mechanism-based modeling and data-based modeling, and the development of computer technology and machine learning has made data-based soft measurement increasingly widely used. Common data-based soft measurement methods include support vector machines, partial least squares and neural networks. These methods predict well on simple data sets, but as single-model methods they perform poorly on soft measurement problems involving process nonlinearity, non-Gaussian data distributions and process dynamics.
The debutanization process is used in industrial oil refining processes to remove propane and butane from naphtha gases, and monitoring and controlling the butane content at the bottom of the debutanizer is important to maximize the production of liquefied petroleum gas. Gas chromatographs are used in the industry to monitor butane content, but the location of this hardware sensor installation can result in a delay of over 30 minutes in the measurement. Therefore, in order to realize real-time monitoring of the butane content, a soft measurement model of the butane content at the bottom of the debutanizer needs to be established so as to fully utilize other process data which can be monitored in real time in the debutanizing process. The debutanizing process has strong nonlinearity and dynamics, the butane content at the current moment and the process data at the previous sampling moments have close relation, and the traditional single prediction model is difficult to meet the requirements.
Patent publication No. CN103389360A discloses a soft measurement method for the butane content of a debutanizer based on a probability principal component regression model, introduces a probability modeling method, and provides a soft measurement model based on probability principal component regression, which can simultaneously model process data and noise information. Patent publication No. CN108647373A discloses "a method for soft measurement of industrial process based on xgboost model", which includes: independent repeated sampling and data preprocessing are carried out on historical data, an xgboost model is established by utilizing a training sample, and a soft measurement model aiming at a target variable is established through cross validation and parameter adjustment. The methods reported in the above patents all use a soft measurement method based on a specific regression model, and when the method is applied to the butane content prediction of a debutanizer, the characteristics of dynamicity, nonlinearity and the like of debutanizer process data cannot be well processed at the same time, and when the method is applied to the establishment of a butane content soft measurement model, the prediction accuracy is difficult to guarantee.
Patent publication No. CN110066895A discloses "a method for predicting a blast furnace molten iron mass interval based on Stacking", which comprises: acquiring original historical data of a blast furnace and preprocessing the data; extracting a sample data set from the preprocessed blast furnace original historical data according to the input and output parameters; establishing a Stacking algorithm molten iron quality model based on an N-fold model and calculating a modeling error prediction interval; and predicting the real-time collected blast furnace data according to the Stacking algorithm molten iron quality model of the N-fold model to obtain a predicted value and a predicted interval. In fact, the first-layer learner in the Stacking model reported in the patent only has one random weight neural network, and the second-layer learner adopts the same model as the first-layer learner, so that the advantages of the Stacking model cannot be well utilized.
Disclosure of Invention
The invention aims to provide a soft measuring method for butane content in a debutanizer based on Stacking, aiming at overcoming the defects of the prior art in solving the problems of nonlinearity and dynamics of debutanizing process data.
The purpose of the invention is realized by the following technical scheme: a Stacking-based ensemble learning soft measurement method for butane content in a debutanizer, comprising the following steps:
(1) continuously sampling the tower top temperature, the tower top pressure, the reflux quantity, the flow to the next stage, the sixth tower plate temperature, the tower bottom temperature 1 and the tower bottom temperature 2 of the debutanizer n times with a fixed sampling period T, and obtaining the butane content value at each sampling moment through off-line laboratory analysis, so that n samples are obtained as the original sample set, expressed as $D_{train}=\{(X_{i'},y_{i'}) \mid i'=1,2,\ldots,n\}$, where $X_{i'}\in R^{7}$ is the feature vector, with seven columns that respectively represent the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 at the i'-th sampling moment, and $y_{i'}\in R^{1}$ is the prediction target, with one column that represents the butane content at the i'-th sampling moment.
(2) Performing feature screening and feature disturbance on the original training sample set by using an XGBOOST model and a time sliding window mechanism, and constructing two training sets $D_{train1}=\{(X_{i\_train1},y_i) \mid i=W_1,W_1+1,\ldots,n\}$ and $D_{train2}=\{(X_{i\_train2},y_i) \mid i=W_1,W_1+1,\ldots,n\}$, where $W_1$ is the width of the time sliding window used for feature expansion of $D_{train}$.
(3) Establishing a two-layer ensemble learning butane content prediction model based on the Stacking learning strategy based on the two training sample sets obtained in the step (2), and specifically comprising the following steps:
(3.1) Using a K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, with $D_{train1}$, and training two different learners, denoted $L_3$ and $L_4$, with $D_{train2}$, so that four learners in total are obtained as the first-layer learners of the prediction model; obtaining the predicted value of each learner on the training samples, the predicted value of the t-th learner on the i-th training sample being expressed as
Figure BDA0003153513210000021
where $t=1,2,3,4$ and $i=W_1,W_1+1,\ldots,n$.
(3.2) Averaging the predicted values of $L_1$ and $L_2$ on the training samples, averaging the predicted values of $L_3$ and $L_4$ on the training samples, and, together with the butane content $y_i$ in the original training set $D_{train}$, constructing the second-layer learner training set $D_{constructed}=\{(X''_i,y_i) \mid i=W_1,W_1+1,\ldots,n\}$, where $X''_i\in R^{2}$,
Figure BDA0003153513210000022
Figure BDA0003153513210000023
(3.3) Training the second-layer learner, denoted L, with $D_{constructed}$.
(4) Predicting the real-time butane content of the debutanizer by using the Stacking-based ensemble learning soft measurement model: performing the feature disturbance and expansion of step (2) on the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 obtained by the sensors, together with historical sample data in the database, to obtain two new samples; using the first sample as the input of the two learners trained with $D_{train1}$ in step (3.1) and the second sample as the input of the two learners trained with $D_{train2}$ in step (3.1), to obtain four predicted values; averaging the four predicted values in pairs as in step (3.2) to obtain two feature values, which are used as the input of the second-layer learner of step (3.3); and using the output of the second-layer learner as the final real-time predicted value of the butane content.
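The construction of the second-layer training set in step (3.2) can be illustrated with a minimal Python sketch. The function name `build_second_layer_set` and the toy prediction values are hypothetical and chosen only for illustration; the sketch assumes the four lists already hold the first-layer learners' predictions on the same training samples.

```python
from typing import List, Sequence, Tuple


def build_second_layer_set(
    pred_l1: Sequence[float],
    pred_l2: Sequence[float],
    pred_l3: Sequence[float],
    pred_l4: Sequence[float],
    y: Sequence[float],
) -> List[Tuple[List[float], float]]:
    """Pair the (L1, L2) average and the (L3, L4) average with each label.

    Each returned item is ([first_feature, second_feature], label), i.e. a
    two-dimensional input for the second-layer learner plus its target.
    """
    lengths = {len(pred_l1), len(pred_l2), len(pred_l3), len(pred_l4), len(y)}
    if len(lengths) != 1:
        raise ValueError("all prediction lists and labels must have equal length")
    return [
        ([(a + b) / 2.0, (c + d) / 2.0], target)
        for a, b, c, d, target in zip(pred_l1, pred_l2, pred_l3, pred_l4, y)
    ]


# Toy values (hypothetical): two training samples, four first-layer learners.
d_constructed = build_second_layer_set(
    pred_l1=[0.25, 0.5],
    pred_l2=[0.75, 0.5],
    pred_l3=[0.5, 1.0],
    pred_l4=[0.0, 0.0],
    y=[0.5, 0.25],
)
```

Averaging within each pair keeps the second-layer input at two columns regardless of how wide the first-layer inputs are.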
Further, in the step (2), feature screening and feature disturbance are respectively performed on the original training sample set by using the XGBOOST model and the time sliding window mechanism, and the specific steps are as follows:
(2.1) taking $D_{train}(X_{i'})$ of the training sample set as the input of the XGBOOST model and $D_{train}(y_{i'})$ as its target output, training the XGBOOST model, and, after training, deleting the features with lower scores according to the feature importance scores of the XGBOOST model (feature_importances_), to obtain the feature-screened training sample set $D_{train\_screened}=\{(X'_{i'},y_{i'}) \mid i'=1,2,\ldots,n\}$, where each column of $X'_{i'}$ represents a retained feature and $y_{i'}\in R^{1}$ still represents the butane content at the i'-th sampling moment.
(2.2) determining a first time sliding window width $W_1$, and expanding the feature variables in $D_{train}$ through a time sliding window, the feature-expanded training sample set being expressed as
Figure BDA0003153513210000031
where
Figure BDA0003153513210000032
represents the feature variable of the i-th sample after feature expansion through the first time sliding window; $X_{i\_train1}$ has $7W_1$ columns in total,
Figure BDA0003153513210000033
(2.3) determining a second time sliding window width $W_2$ ($W_2<W_1$), and expanding the feature variables in $D_{train\_screened}$ through a time sliding window, the feature-expanded training sample set being expressed as
Figure BDA0003153513210000034
Figure BDA0003153513210000035
where
Figure BDA0003153513210000036
represents the feature variable of the i-th sample after feature screening and feature expansion through the second time sliding window; $X_{i\_train2}$ has $4W_2$ columns in total,
Figure BDA0003153513210000037
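The time sliding window expansion of steps (2.2) and (2.3) can be sketched in plain Python as follows. Because the defining equations are only available as images here, the sketch makes its own assumptions: each window contains the current sample plus its `width - 1` predecessors, ordered oldest first, and samples without enough history are dropped. The function name `expand_with_window` is hypothetical. Under this indexing a window of width W yields n - W + 1 samples; the sample counts reported in the Examples section suggest the patent's exact indexing may differ by one sample.

```python
from typing import List, Sequence


def expand_with_window(rows: Sequence[Sequence[float]], width: int) -> List[List[float]]:
    """Concatenate each sample with its (width - 1) predecessors.

    Assumption: the window includes the current sample and is ordered
    oldest first. The first (width - 1) samples lack history and are dropped.
    """
    if width < 1 or width > len(rows):
        raise ValueError("width must be between 1 and len(rows)")
    expanded = []
    for i in range(width - 1, len(rows)):
        window = rows[i - width + 1 : i + 1]
        # Flatten the window into one long feature vector.
        expanded.append([value for row in window for value in row])
    return expanded


# Toy data (hypothetical): 5 sampling moments, 2 process variables each.
samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
expanded = expand_with_window(samples, width=3)
```

With seven process variables and width $W_1$, each expanded vector has $7W_1$ values, which matches the column count stated in step (2.2).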
Further, in step (3.1), training the first-layer learners based on the K-fold cross training method specifically comprises the following steps:
(3.1.1) dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training, so that after K rounds the learner's predicted values on all $X_{i\_train1}$, $i=W_1,W_1+1,\ldots,n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train1}$, $i=W_1,W_1+1,\ldots,n$, are respectively:
Figure BDA0003153513210000038
(3.1.2) dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training, so that after K rounds the learner's predicted values on all $X_{i\_train2}$, $i=W_1,W_1+1,\ldots,n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train2}$, $i=W_1,W_1+1,\ldots,n$, are respectively:
Figure BDA0003153513210000039
Further, in step (3.1), $D_{train1}$ and $D_{train2}$ are each equally divided into K subsets for training the learners.
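The K-fold cross training procedure of steps (3.1.1) and (3.1.2) amounts to collecting out-of-fold predictions. The following Python sketch is an illustration under stated assumptions, not the patented implementation: the folds are contiguous blocks whose sizes differ by at most one, and `fit_mean` is a trivial stand-in learner that replaces the XGBOOST, support vector, perceptron and ridge models named in the text. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
Fit = Callable[[List[Sequence[float]], List[float]], Model]


def fold_bounds(n: int, k: int) -> List[Tuple[int, int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    bounds, start = [], 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def out_of_fold_predictions(
    X: Sequence[Sequence[float]], y: Sequence[float], fit: Fit, k: int
) -> List[float]:
    """Predict every sample with a model that never saw it during training."""
    predictions = [0.0] * len(X)
    for start, stop in fold_bounds(len(X), k):
        # Train on the other K-1 folds, then predict the held-out fold.
        train_X = list(X[:start]) + list(X[stop:])
        train_y = list(y[:start]) + list(y[stop:])
        model = fit(train_X, train_y)
        for i in range(start, stop):
            predictions[i] = model(X[i])
    return predictions


def fit_mean(train_X: List[Sequence[float]], train_y: List[float]) -> Model:
    """Stand-in learner (assumption): always predicts the mean training label."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


X = [[float(i)] for i in range(6)]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
oof = out_of_fold_predictions(X, y, fit_mean, k=3)
```

Because every stored prediction comes from a model trained without that sample, the second-layer learner is trained on predictions that resemble those it will receive for unseen data.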
Further, the learners include an XGBOOST regression model, an RBF-kernel support vector machine regression model, a multilayer perceptron regression model, a Bayesian ridge regression model, an RBF-kernel ridge regression model, and the like.
The beneficial effects of the invention are as follows: the process variables at the current sampling moment are feature-expanded through a time sliding window, so that historical process data are used to address process dynamics; the diversity among the first-layer learners in the ensemble learning model is enhanced by introducing a feature disturbance mechanism and adopting non-homogeneous learners; and the outputs of the first-layer learners are then combined by a Stacking integration strategy and a second-layer learner to obtain the final predicted value of the butane content.
Drawings
FIG. 1 is a schematic flow diagram of a debutanizer column;
FIG. 2 is a schematic diagram of feature expansion using a width 3 time sliding window;
FIG. 3 is a flow chart of a soft measurement model for real-time estimation of butane content according to an embodiment of the present invention; wherein, XGBOOST represents XGBOOST regression model, KSVR represents support vector regression model based on RBF kernel, MLP represents multi-layer perceptron regression model, BRR represents bayesian ridge regression model, KRR represents ridge regression model based on RBF kernel;
FIG. 4 is a graph of the results of prediction of butane content in a Stacking-based ensemble learning debutanizer according to an embodiment of the present invention; wherein "+" is the laboratory analysis value of the butane content of each sampling point, and "+" is the predicted value of the butane content of each sampling point.
Detailed Description
The invention is further described in detail below with reference to the figures and examples.
The invention relates to a Stacking-based ensemble learning debutanizer soft measurement method, which comprises the following steps:
(1) continuously sampling the tower top temperature (U1), the tower top pressure (U2), the reflux quantity (U3), the flow to the next stage (U4), the sixth tower plate temperature (U5), the tower bottom temperature 1 (U6) and the tower bottom temperature 2 (U7) of the debutanizer n times with a fixed sampling period T, and obtaining the butane content value at each sampling moment through off-line laboratory analysis, so that n samples are obtained as the original sample set, expressed as $D_{train}=\{(X_{i'},y_{i'}) \mid i'=1,2,\ldots,n\}$. Here $X_{i'}\in R^{7}$ is the feature vector, with seven columns that respectively represent the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2 at the i'-th sampling moment; $y_{i'}\in R^{1}$ is the prediction target, with one column that represents the butane content at the i'-th sampling moment. Fig. 1 illustrates the seven process variables that need to be collected at each sampling point.
(2) Performing feature screening and feature disturbance on the original training sample set by using an XGBOOST model and a time sliding window mechanism, and constructing two training sets $D_{train1}$ and $D_{train2}$, with the following specific steps:
(2.1) taking $D_{train}(X_{i'})$ of the training sample set as the input of the XGBOOST model and $D_{train}(y_{i'})$ as its target output, training the XGBOOST model, and, after training, deleting the three features with the lowest scores according to the feature importance scores of the XGBOOST model (feature_importances_), to obtain the feature-screened training sample set $D_{train\_screened}=\{(X'_{i'},y_{i'}) \mid i'=1,2,\ldots,n\}$, where $X'_{i'}\in R^{4}$ has four columns that respectively represent the tower top temperature, reflux quantity, flow to the next stage and sixth tower plate temperature at the i'-th sampling moment, and $y_{i'}\in R^{1}$ still represents the butane content at the i'-th sampling moment.
(2.2) determining a first time sliding window width $W_1$, and expanding the feature variables in $D_{train}$ through a time sliding window, the feature-expanded training sample set being expressed as
Figure BDA0003153513210000051
Figure BDA0003153513210000052
where
Figure BDA0003153513210000053
represents the feature variable of the i-th sample after feature expansion through the first time sliding window; $X_{i\_train1}$ has $7W_1$ columns in total,
Figure BDA0003153513210000054
FIG. 2 illustrates the case $W_1=3$.
(2.3) determining a second time sliding window width $W_2$ ($W_2<W_1$), and expanding the feature variables in $D_{train\_screened}$ through a time sliding window, the feature-expanded training sample set being expressed as
Figure BDA0003153513210000055
where
Figure BDA0003153513210000056
represents the feature variable of the i-th sample after feature screening and feature expansion through the second time sliding window; $X_{i\_train2}$ has $4W_2$ columns in total,
Figure BDA0003153513210000057
(3) establishing a two-layer ensemble learning butane content prediction model based on the Stacking learning strategy based on the two training sample sets obtained in the step (2), and specifically comprising the following steps:
(3.1) Using a K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, with $D_{train1}$, and training two different learners, denoted $L_3$ and $L_4$, with $D_{train2}$, so that four learners in total are obtained as the first-layer learners of the prediction model; obtaining the predicted value of each learner on the training samples, the predicted value of the t-th learner on the i-th training sample being expressed as
Figure BDA0003153513210000058
where $t=1,2,3,4$ and $i=W_1,W_1+1,\ldots,n$. The specific steps are as follows:
(3.1.1) equally dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training, so that after K rounds the learner's predicted values on all $X_{i\_train1}$, $i=W_1,W_1+1,\ldots,n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train1}$, $i=W_1,W_1+1,\ldots,n$, are respectively:
Figure BDA0003153513210000059
(3.1.2) equally dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training, so that after K rounds the learner's predicted values on all $X_{i\_train2}$, $i=W_1,W_1+1,\ldots,n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train2}$, $i=W_1,W_1+1,\ldots,n$, are respectively:
Figure BDA00031535132100000510
(3.2) averaging the predicted values of $L_1$ and $L_2$ on the training samples
Figure BDA00031535132100000511
averaging the predicted values of $L_3$ and $L_4$ on the training samples
Figure BDA00031535132100000512
and, together with the butane content $y_i$ in the original training set $D_{train}$, constructing the second-layer learner training set $D_{constructed}=\{(X''_i=[y'_i,y''_i],y_i) \mid i=W_1,W_1+1,\ldots,n\}$, where $X''_i\in R^{2}$, $i=W_1,W_1+1,\ldots,n$.
(3.3) training the second-layer learner, denoted L, with $D_{constructed}$.
(4) Predicting the real-time butane content of the debutanizer by using the Stacking-based ensemble learning soft measurement model established in step (3):
performing the feature disturbance and expansion of step (2) on the seven process variables obtained in real time by the sensors (tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tower plate temperature, tower bottom temperature 1 and tower bottom temperature 2), together with historical sample data in the database, to obtain two new samples; taking the first sample as the input of the two learners trained in step (3.1.1) and the second sample as the input of the two learners trained in step (3.1.2), to obtain four predicted values; averaging the four predicted values in pairs as in step (3.2) to obtain two feature values, which are taken as the input of the second-layer learner of step (3.3); and taking the output of the second-layer learner as the final real-time predicted value of the butane content.
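The prediction path of step (4) can be sketched as a small Python function. The name `predict_butane`, the argument names and the toy stand-in learners below are hypothetical; the sketch assumes the two expanded samples have already been built by the feature disturbance and expansion of step (2) and that the five learners are already trained.

```python
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


def predict_butane(
    sample_1: Sequence[float],
    sample_2: Sequence[float],
    l1: Predictor,
    l2: Predictor,
    l3: Predictor,
    l4: Predictor,
    second_layer: Predictor,
) -> float:
    """Run one real-time prediction through the two-layer model.

    sample_1 feeds the learners trained on D_train1, sample_2 feeds the
    learners trained on D_train2; each pair of outputs is averaged and the
    two averages form the second-layer input.
    """
    first_feature = (l1(sample_1) + l2(sample_1)) / 2.0
    second_feature = (l3(sample_2) + l4(sample_2)) / 2.0
    return second_layer([first_feature, second_feature])


# Toy stand-in learners (assumptions, not the models named in the text).
estimate = predict_butane(
    sample_1=[1.0, 2.0],
    sample_2=[4.0],
    l1=lambda x: sum(x),
    l2=lambda x: max(x),
    l3=lambda x: x[0],
    l4=lambda x: 0.0,
    second_layer=lambda z: 0.5 * z[0] + 0.5 * z[1],
)
```

Passing the learners in as callables keeps the sketch independent of any particular modelling library.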
Examples
The invention is illustrated below with reference to a specific debutanizer butane content prediction example:
The debutanizer is continuously sampled to obtain 2394 samples; the first 1596 samples are used as the training set for training the ensemble learning soft measurement model, and the last 798 samples are used as the test set for verifying its effectiveness. Seven process variables are selected to model the butane content at the bottom of the debutanizer: the overhead temperature, overhead pressure, reflux, flow to the next stage, sixth tray temperature, bottom temperature 1 and bottom temperature 2, as shown in fig. 1.
The following will explain the implementation steps of the present invention in detail with reference to the specific process, as shown in fig. 3, specifically:
1. An XGBOOST regression model is trained with the 1596 training samples, and the importance scores of the seven process variables are obtained from the feature importance scores of the XGBOOST model (feature_importances_); they are, respectively: 0.151, 0.001, 0.166, 0.095, 0.585, 0, 0.003. According to these scores, the 2nd, 6th and 7th process variables of each sample are deleted in feature screening.
2. A time sliding window of width 50 is used to feature-expand the 1596 original training samples, giving $D_{train1}$, which contains 1546 new training samples, each with a feature vector dimension of 350.
A time sliding window of width 35 is used to feature-expand the 1596 samples obtained by feature screening, giving $D_{train2}$, which contains 1561 new training samples with a feature vector dimension of 140. To keep the number of training samples in $D_{train1}$ and $D_{train2}$ consistent, the first 15 samples in $D_{train2}$ are deleted.
3. A Stacking-based ensemble learning soft measurement model for debutanizer butane content is established according to the detailed method in the implementation steps:
(1) An XGBOOST regression model (XGBOOST) and an RBF-kernel support vector machine regression model (KSVR) are trained with $D_{train1}$, and a multilayer perceptron regression model (MLP) and a Bayesian ridge regression model (BRR) are trained with $D_{train2}$, as the first-layer learners.
(2) Using a 3-fold cross training method, $D_{train1}$ is divided into three subsets containing 516, 515 and 515 training samples respectively, denoted subset 1, subset 2 and subset 3 in order. In the first round, subset 1 and subset 2 are used as the training set to train the XGBOOST regression model and the RBF-kernel support vector machine regression model, subset 3 is then used as the prediction set of the two models, and the average of the two models' predictions is stored. In the second round, subset 1 and subset 3 are used as the training set, subset 2 is used as the prediction set, and the average of the two models' predictions is stored. In the third round, subset 2 and subset 3 are used as the training set, subset 1 is used as the prediction set, and the average of the two models' predictions is stored. After the three rounds of training, the predicted averages of the XGBOOST regression model and the RBF-kernel support vector machine regression model on the 1546 samples are obtained. Similarly, $D_{train2}$ is divided into three subsets, and the predicted averages of the multilayer perceptron regression model and the Bayesian ridge regression model on the 1546 samples are obtained through three rounds of training.
(3) These two predicted averages are used as the feature values of the second-layer learner's input samples, and the 1546 label values of $D_{train1}$ are used as the output labels of the second-layer learner, to construct the second-layer learner training set $D_{constructed}$, with which an RBF-kernel ridge regression model (KRR) is trained as the second-layer learner.
4. The 798 test samples are used to verify the validity of the soft measurement model proposed by the invention:
After each test sample, together with the historical samples, is feature-expanded through the first time sliding window, the resulting sample with a feature vector dimension of 350 is used as the input of the first-layer learners $L_1$ and $L_2$, and the two predicted values are averaged to give the first feature of the second-layer learner's input sample.
After the test sample undergoes feature screening and feature expansion through the second time sliding window, the resulting sample with a feature vector dimension of 140 is used as the input of the first-layer learners $L_3$ and $L_4$, and the two predicted values are averaged to give the second feature of the second-layer learner's input sample.
The output value of the second-layer learner, namely the RBF-kernel ridge regression model, is taken as the final estimate of the real-time butane content.
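Steps 1 and 2 of the example can be checked with a short Python sketch that ranks the seven reported importance scores and computes the expanded feature dimensions. The helper name `keep_top` is hypothetical, and the variable names follow the order in which the example lists the seven process variables; the sketch only reproduces the arithmetic and does not train an XGBOOST model.

```python
from typing import List, Sequence

# Importance scores reported in step 1 of the example, in variable order.
names = [
    "overhead temperature",
    "overhead pressure",
    "reflux",
    "flow to the next stage",
    "sixth tray temperature",
    "bottom temperature 1",
    "bottom temperature 2",
]
scores = [0.151, 0.001, 0.166, 0.095, 0.585, 0.0, 0.003]


def keep_top(importances: Sequence[float], n_keep: int) -> List[int]:
    """Return the 0-based indices of the n_keep highest scores, in column order."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:n_keep])


kept = keep_top(scores, n_keep=4)
kept_names = [names[j] for j in kept]

# Feature dimensions after expansion: variables per sample times window width.
dim_train1 = len(names) * 50
dim_train2 = len(kept) * 35
```

The retained indices correspond to the 1st, 3rd, 4th and 5th variables, so the 2nd, 6th and 7th are the ones removed, and the two dimensions come out as 350 and 140, in agreement with the example.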
FIG. 4 compares the butane content predictions of the soft measurement model of the invention with the butane content analysis values over the 798 test samples; wherein "+" is the laboratory analysis value of the butane content of each sampling point, and "+" is the model prediction value of the butane content of each sampling point. It can be seen that, at most sampling points, the butane content estimate obtained by the soft measurement model of the invention tracks the off-line analysis value with a small error. The root mean square error (RMSE) between the 798 estimated values and the 798 off-line analysis values is about 0.1. In this embodiment, the RMSE is calculated as:
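$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2}$$
wherein $y_k$ represents the offline analysis value of the butane content at the $k$-th sampling point, $\hat{y}_k$ represents the soft measurement estimate of the butane content at the $k$-th sampling point, and $N$ is the number of test samples (798 in this embodiment).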
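The RMSE described above can be computed as in the following short sketch. The function name `rmse` and the sample values are illustrative only and are not data from the embodiment.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between offline analysis values and estimates."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only.
print(rmse([0.2, 0.4, 0.6], [0.2, 0.5, 0.5]))
```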
The above embodiment is intended to illustrate the invention rather than to limit it; any modification or variation made within the spirit of the invention and the scope of the claims falls within the scope of protection of the invention.

Claims (5)

1. A Stacking-based ensemble learning soft measurement method for the butane content in a debutanizer, characterized by comprising the following steps:
(1) continuously sampling the tower top temperature, the tower top pressure, the reflux quantity, the flow to the next stage, the sixth tray temperature, the tower bottom temperature 1 and the tower bottom temperature 2 of the debutanizer $n$ times at a fixed sampling period $T$, and obtaining the butane content value at each sampling time through offline laboratory analysis, so that $n$ samples are obtained as the original sample set, denoted $D_{train}=\{(X_{i'}, y_{i'}) \mid i'=1,2,\ldots,n\}$, where $X_{i'}\in R^{7}$ is the feature vector with seven columns, which respectively represent the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tray temperature, tower bottom temperature 1 and tower bottom temperature 2 at the $i'$-th sampling time, and $y_{i'}\in R^{1}$ is the prediction target with one column, representing the butane content at the $i'$-th sampling time;
(2) performing feature screening and feature perturbation on the original training sample set by using an XGBOOST model and a time sliding window mechanism, and constructing two training sets $D_{train1}=\{(X_{i\_train1}, y_i) \mid i=W_1, W_1+1, \ldots, n\}$ and $D_{train2}=\{(X_{i\_train2}, y_i) \mid i=W_1, W_1+1, \ldots, n\}$, wherein $W_1$ is the width of the time sliding window used for feature expansion of $D_{train}$;
(3) establishing a two-layer ensemble learning butane content prediction model with the Stacking learning strategy from the two training sample sets obtained in step (2), specifically comprising the following steps:
(3.1) based on the K-fold cross training method, training two different learners, denoted $L_1$ and $L_2$, with $D_{train1}$, and training two different learners, denoted $L_3$ and $L_4$, with $D_{train2}$, so that four learners in total are obtained as the first-layer learners of the prediction model; obtaining the predicted value of each learner on the training samples, the predicted value of the $t$-th learner on the $i$-th training sample being denoted $\hat{y}_i^{(t)}$, where $t=1,2,3,4$ and $i=W_1, W_1+1, \ldots, n$;
(3.2) averaging the predicted values of $L_1$ and $L_2$ on the training samples, averaging the predicted values of $L_3$ and $L_4$ on the training samples, and, together with the butane content $y_i$ in the original training set $D_{train}$, constructing the second-layer learner training set $D_{constructed}=\{(X''_i, y_i) \mid i=W_1, W_1+1, \ldots, n\}$, where $X''_i\in R^{2}$, $X''_i=[y'_i, y''_i]$, $y'_i=\frac{1}{2}\left(\hat{y}_i^{(1)}+\hat{y}_i^{(2)}\right)$, $y''_i=\frac{1}{2}\left(\hat{y}_i^{(3)}+\hat{y}_i^{(4)}\right)$, $i=W_1, W_1+1, \ldots, n$;
(3.3) training the second-layer learner, denoted $L$, with $D_{stacking}$;
(4) predicting the real-time butane content of the debutanizer with the Stacking-based ensemble learning soft measurement model: the tower top temperature, tower top pressure, reflux quantity, flow to the next stage, sixth tray temperature, tower bottom temperature 1 and tower bottom temperature 2 obtained by the sensors, together with historical sample data in a database, are subjected to the feature perturbation and expansion of step (2) to obtain two new samples; the first sample is input to the two learners trained with $D_{train1}$ in step (3.1), and the second sample is input to the two learners trained with $D_{train2}$ in step (3.1), giving four predicted values; the four predicted values are averaged in pairs as in step (3.2) to obtain two feature values, which are used as the input of the second-layer learner of step (3.3); and the output of the second-layer learner is used as the final real-time predicted value of the butane content.
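The prediction flow of step (4) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the five stand-in learners are hypothetical placeholders, chosen only so that the example runs, and are not the learners of the claimed method.

```python
def second_layer_features(x_win1, x_win2, l1, l2, l3, l4):
    """Return [y', y'']: the mean of L1/L2 on x_win1 and of L3/L4 on x_win2."""
    y_prime = (l1(x_win1) + l2(x_win1)) / 2.0
    y_double_prime = (l3(x_win2) + l4(x_win2)) / 2.0
    return [y_prime, y_double_prime]


def stacked_predict(x_win1, x_win2, first_layer, second_layer):
    """Two-layer Stacking prediction for one new sample."""
    l1, l2, l3, l4 = first_layer
    features = second_layer_features(x_win1, x_win2, l1, l2, l3, l4)
    return second_layer(features)


# Hypothetical stand-in learners, for illustration only.
def l1(x):
    return sum(x) / len(x)


def l2(x):
    return max(x)


def l3(x):
    return min(x)


def l4(x):
    return x[-1]


def second(f):
    return 0.5 * f[0] + 0.5 * f[1]


# x_win1 and x_win2 stand for the two window-expanded samples of step (2).
estimate = stacked_predict([1.0, 2.0, 3.0], [2.0, 4.0], (l1, l2, l3, l4), second)
print(estimate)
```

Any trained regressors exposing a prediction callable could be substituted for the stand-ins.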
2. The Stacking-based ensemble learning soft measurement method for the butane content in a debutanizer according to claim 1, wherein in step (2) the feature screening and feature perturbation of the original training sample set using the XGBOOST model and the time sliding window mechanism comprise the following specific steps:
(2.1) using $D_{train}(X_{i'})$ of the training sample set as the input of the XGBOOST model and $D_{train}(y_{i'})$ as its target output, training the XGBOOST model, and, after training is completed, deleting the features with lower scores according to the feature attributes of the XGBOOST model, to obtain the training sample set after feature screening $D_{train\_screened}=\{(X'_{i'}, y_{i'}) \mid i'=1,2,\ldots,n\}$, where each column of $X'_{i'}$ represents a retained feature and $y_{i'}\in R^{1}$ still represents the butane content at the $i'$-th sampling time;
(2.2) determining a first time sliding window width $W_1$ and expanding the feature variables in $D_{train}$ through the time sliding window, the training sample set after feature expansion being expressed as $D_{train1}=\{(X_{i\_train1}, y_i) \mid i=W_1, W_1+1, \ldots, n\}$, wherein $X_{i\_train1}\in R^{7W_1}$ represents the feature variable of the $i$-th sample after feature expansion through the first time sliding window, with $7W_1$ columns in total, $X_{i\_train1}=[X_{i-W_1+1}, \ldots, X_{i-1}, X_i]$;
(2.3) determining a second time sliding window width $W_2$, $W_2<W_1$, and expanding the feature variables in $D_{train\_screened}$ through the time sliding window, the training sample set after feature expansion being expressed as $D_{train2}=\{(X_{i\_train2}, y_i) \mid i=W_1, W_1+1, \ldots, n\}$, wherein $X_{i\_train2}\in R^{4W_2}$ represents the feature variable of the $i$-th sample after feature screening and feature expansion through the second time sliding window, with $4W_2$ columns in total, $X_{i\_train2}=[X'_{i-W_2+1}, \ldots, X'_{i-1}, X'_i]$.
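The feature screening and time-sliding-window expansion of steps (2.1) to (2.3) can be sketched with NumPy as below. Several things here are assumptions for illustration: the function names, the importance scores (in the method they would come from a trained XGBOOST model), the window widths, the synthetic data, and the oldest-first ordering of samples inside each expanded row.

```python
import numpy as np


def screen_features(X, scores, n_keep):
    """Keep the n_keep highest-scoring feature columns, in original order."""
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return X[:, keep], keep


def window_expand(X, width, first_index):
    """Concatenate each sample with its width-1 predecessors.

    Rows are produced for 1-based sample indices first_index..n, so two
    expansions with different widths can share the same index range.
    """
    n = X.shape[0]
    if not width <= first_index <= n:
        raise ValueError("need width <= first_index <= n")
    # 1-based samples i-width+1..i are the 0-based slice [i-width, i).
    rows = [X[i - width:i].ravel() for i in range(first_index, n + 1)]
    return np.vstack(rows)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))  # synthetic stand-in for the 7 process variables
scores = np.array([0.30, 0.05, 0.20, 0.02, 0.25, 0.08, 0.10])  # hypothetical
X_screened, kept = screen_features(X, scores, n_keep=4)

W1, W2 = 10, 5  # illustrative widths with W2 < W1
X_train1 = window_expand(X, W1, first_index=W1)           # 7 * W1 columns
X_train2 = window_expand(X_screened, W2, first_index=W1)  # 4 * W2 columns
print(X_train1.shape, X_train2.shape)
```

Starting both expansions at index $W_1$ keeps the two training sets aligned sample by sample, which is what lets their first-layer predictions be paired in the second layer.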
3. The Stacking-based ensemble learning soft measurement method for the butane content in a debutanizer according to claim 1, wherein training the first-layer learners based on the K-fold cross training method in step (3.1) comprises the following specific steps:
(3.1.1) dividing $D_{train1}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training is completed, so that after the K rounds the learner's predicted values on all $X_{i\_train1}$, $i=W_1, W_1+1, \ldots, n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train1}$ are $\hat{y}_i^{(1)}$ and $\hat{y}_i^{(2)}$, $i=W_1, W_1+1, \ldots, n$;
(3.1.2) dividing $D_{train2}$ into K subsets, taking K-1 subsets as the training set to train a learner and the remaining subset as the prediction set; performing K rounds of training and prediction, each round selecting a different subset as the prediction set and storing the learner's prediction output on that prediction set after training is completed, so that after the K rounds the learner's predicted values on all $X_{i\_train2}$, $i=W_1, W_1+1, \ldots, n$, are obtained; two different learners are trained in this way, and their predicted values on $X_{i\_train2}$ are $\hat{y}_i^{(3)}$ and $\hat{y}_i^{(4)}$, $i=W_1, W_1+1, \ldots, n$.
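The K-fold procedure of steps (3.1.1) and (3.1.2) yields one out-of-fold prediction per training sample. A minimal sketch follows, assuming contiguous, nearly equal folds and a hypothetical least-squares learner as a stand-in for the first-layer learners; the function names and data are illustrative.

```python
import numpy as np


def fit_least_squares(X, y):
    """Hypothetical stand-in learner: linear least squares with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([X_new, np.ones(len(X_new))]) @ coef


def out_of_fold_predictions(X, y, fit, k):
    """Predict every sample with a model trained on the other K-1 subsets."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)  # K nearly equal, contiguous subsets
    oof = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        predict = fit(X[train], y[train])
        oof[held_out] = predict(X[held_out])
    return oof


# Synthetic, noise-free linear data (not data from the patent).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
oof = out_of_fold_predictions(X, y, fit_least_squares, k=5)
print(oof[:3], y[:3])
```

Because each prediction comes from a model that never saw that sample, the second-layer learner is trained on features that resemble what it will receive at prediction time.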
4. The Stacking-based ensemble learning soft measurement method for the butane content in a debutanizer according to claim 3, wherein in step (3.1), $D_{train1}$ and $D_{train2}$ are each equally divided into K subsets for training the learners.
5. The Stacking-based ensemble learning soft measurement method for the butane content in a debutanizer according to claim 1, wherein the learners comprise an XGBOOST regression model, an RBF-kernel support vector machine regression model, a multi-layer perceptron regression model, a Bayesian ridge regression model and an RBF-kernel ridge regression model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110771243.5A CN113449476B (en) 2021-07-08 2021-07-08 Stacking-based soft measurement method for butane content in debutanizer


Publications (2)

Publication Number Publication Date
CN113449476A (en) 2021-09-28
CN113449476B (en) 2022-07-05

Family

ID=77815400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110771243.5A Active CN113449476B (en) 2021-07-08 2021-07-08 Stacking-based soft measurement method for butane content in debutanizer

Country Status (1)

Country Link
CN (1) CN113449476B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8357795B2 (en) * 2008-08-04 2013-01-22 Allergan, Inc. Hyaluronic acid-based gels including lidocaine
CN105205224B (en) * 2015-08-28 2018-10-30 江南大学 Time difference Gaussian process based on fuzzy curve analysis returns soft-measuring modeling method
CN109240090B (en) * 2018-11-08 2020-10-23 浙江大学 Adaptive soft measurement modeling method for incremental learning XGBOST model based on time difference
CN110066895B (en) * 2019-04-10 2021-01-12 东北大学 Stacking-based blast furnace molten iron quality interval prediction method
CN111914476B (en) * 2020-06-23 2023-09-29 宁波大学 On-line soft measurement method for butane content of debutanizer bottom product
CN112884079A (en) * 2021-03-30 2021-06-01 河南大学 Method for estimating near-surface nitrogen dioxide concentration based on Stacking integrated model

Also Published As

Publication number Publication date
CN113449476A (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN109884892B (en) Process industrial system prediction model based on cross correlation time-lag grey correlation analysis
Zhu et al. Deep learning based soft sensor and its application on a pyrolysis reactor for compositions predictions of gas phase components
CN105205224A (en) Modeling method for soft measurement of time difference gaussian process regression based on fuzzy curve analysis
CN110096810B (en) Industrial process soft measurement method based on layer-by-layer data expansion deep learning
CN107526927B (en) Blast furnace molten iron quality online robust soft measurement method
CN107168063B (en) Soft measurement method based on integrated variable selection type partial least square regression
CN106227910B (en) A kind of accelerated degradation test reliability estimation method based on gray system theory
CN113761787A (en) Blast furnace molten iron silicon content online prediction method and system based on deep migration network
CN110175425A (en) A kind of prediction technique of the gear remaining life based on MMALSTM
CN113361690A (en) Water quality prediction model training method, water quality prediction device, water quality prediction equipment and medium
Graziani et al. Design of a soft sensor for an industrial plant with unknown delay by using deep learning
CN114117919B (en) Instant learning soft measurement modeling method based on sample collaborative representation
CN110188399B (en) Dam safety monitoring single-measuring-point evaluation method based on multiple correlation sequences
CN113449476B (en) Stacking-based soft measurement method for butane content in debutanizer
CN112949836A (en) Method for carrying out regression prediction on-line migration learning on time-varying distribution data
CN116821695A (en) Semi-supervised neural network soft measurement modeling method
CN116842358A (en) Soft measurement modeling method based on multi-scale convolution and self-adaptive feature fusion
CN116386756A (en) Soft measurement modeling method based on integrated neural network reliability estimation and weighted learning
CN113707240B (en) Component parameter robust soft measurement method based on semi-supervised nonlinear variation Bayesian hybrid model
CN115186584A (en) Width learning semi-supervised soft measurement modeling method integrating attention mechanism and adaptive composition
CN112182854B (en) Intelligent monitoring method and system for abnormal furnace conditions of blast furnace
CN107977742B (en) Construction method of medium-long term power load prediction model
CN106339588A (en) Discrete modeling method of accelerated degradation data based on grey system theory
CN111650894A (en) Bayesian network complex industrial process soft measurement method based on hidden variables
CN111126694A (en) Time series data prediction method, system, medium and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant