CN108734207B - Method for predicting concentration of butane at bottom of debutanizer tower based on model of double-optimization semi-supervised regression algorithm - Google Patents


Info

Publication number
CN108734207B
CN108734207B (application CN201810454373.4A)
Authority
CN
China
Prior art keywords
sample
samples
unlabeled
label
labeled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810454373.4A
Other languages
Chinese (zh)
Other versions
CN108734207A (en)
Inventor
熊伟丽
程康明
马君霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Minglong Electronic Technology Co ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201810454373.4A priority Critical patent/CN108734207B/en
Publication of CN108734207A publication Critical patent/CN108734207A/en
Application granted granted Critical
Publication of CN108734207B publication Critical patent/CN108734207B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Complex Calculations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for predicting the butane concentration at the bottom of a debutanizer column based on a model of a double-optimization semi-supervised regression algorithm, and belongs to the field of semi-supervised regression. The center of the labeled-sample dense region is found through a double-optimization strategy; unlabeled samples are screened by their similarity to this center, and labeled samples are screened by their similarity to one another. A Gaussian process regression model is then built on the selected labeled samples as an auxiliary learner that predicts labels for the selected unlabeled samples. Finally, the pseudo-labeled samples are used to improve the prediction of the main learner. This solves the problem that, when labeled samples are few, the quality of the unlabeled samples cannot be guaranteed and accurate prediction therefore fails, and achieves accurate prediction from only a few labeled samples.

Description

Method for predicting concentration of butane at bottom of debutanizer tower based on model of double-optimization semi-supervised regression algorithm
Technical Field
The invention relates to a method for predicting the concentration of butane at the bottom of a debutanizer tower based on a model of a double-optimization semi-supervised regression algorithm, belonging to the field of semi-supervised regression.
Background
Some important quality variables in chemical, metallurgical, fermentation and other industrial processes cannot be measured by online instruments, and offline laboratory analysis suffers from serious lag; such quality variables therefore need to be predicted from sample data that can be measured directly.
With the development of science and technology, especially industrial big-data technology, large numbers of unlabeled samples are increasingly easy to obtain, while labeled samples remain expensive to acquire. Labeled samples are therefore scarce in some industrial processes, and conventional modeling methods struggle to guarantee prediction performance under this condition.
To address these problems, semi-supervised learning, which uses a small number of labeled samples together with a large number of unlabeled samples to improve learning performance, has attracted much attention. Research on semi-supervised clustering and semi-supervised classification is comparatively mature, while semi-supervised regression has been studied less. Existing semi-supervised regression methods include algorithms based on manifold learning, co-training algorithms, semi-supervised support vector regression, selective ensemble algorithms, and the like. When labeled samples are few, however, these methods cannot guarantee the quality of the unlabeled samples, and accurate prediction cannot be achieved.
Disclosure of Invention
To solve these problems and exploit unlabeled samples more accurately, the invention considers two facts: some unlabeled samples cannot be predicted accurately from a small number of labeled samples, and outliers among the few labeled samples degrade the prediction of the unlabeled samples. Two preference criteria are therefore defined, one for screening unlabeled samples and one for screening labeled samples, so that the selected unlabeled samples can be predicted accurately and the model improves after they are used. The method comprises the following steps:
Step 1: screen the unlabeled samples according to preference criterion 1 and preference criterion 2 with the unlabeled-sample screening algorithm to obtain the unlabeled sample set M1; the unlabeled samples come from actual sampling of the real debutanizer process;
Preference criterion 1 is described as follows: given a threshold θ1, the Mahalanobis distance is used to measure the similarity between an unlabeled sample x′i and the dense-region center C of the labeled samples; if the distance di between x′i and C is less than θ1, then x′i satisfies the criterion, where di is obtained from formulas (1) to (3); the labeled samples come from actual sampling of the real debutanizer process;
di = sqrt[(x′i − C)′ M⁻¹ (x′i − C)]  (1)
M = (1/n) Σi=1..n (x′i − x̄′)(x′i − x̄′)′  (2)
x̄′ = (1/n) Σi=1..n x′i  (3)
where M is the covariance matrix of the unlabeled samples, n is the number of unlabeled samples, and x̄′ is the mean of the unlabeled samples;
Preference criterion 2 is described as follows: given a threshold θ2, the Mahalanobis distance d(xi, xj) is used to measure the similarity between samples; for a sample xi, count the number m of surrounding samples xj whose Mahalanobis distance to xi is less than θ2; if m ≥ 2, then xi satisfies the criterion, where d(xi, xj) is obtained from formulas (4) to (6);
d(xi, xj) = sqrt[(xi − xj)′ S⁻¹ (xi − xj)]  (4)
S = (1/n) Σi=1..n (xi − x̄)(xi − x̄)′  (5)
x̄ = (1/n) Σi=1..n xi  (6)
where S is the covariance matrix of the labeled samples, n is the number of labeled samples, and x̄ is the mean of the labeled samples;
The Mahalanobis distance is a covariance-aware distance between data points and is an effective way to measure the similarity of two unknown sample sets;
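As an illustrative sketch that is not part of the patent text, the Mahalanobis distance of formulas (1) and (4) can be computed with NumPy; the sample values below are hypothetical stand-ins for process data:

```python
import numpy as np

def mahalanobis(x, center, cov):
    # Mahalanobis distance of formulas (1)/(4): sqrt((x - c)' * cov^-1 * (x - c))
    diff = x - center
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical 2-D samples; the last one is an outlier.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [10.0, 10.0]])
center = X.mean(axis=0)        # stand-in for the dense-region center C
cov = np.cov(X, rowvar=False)  # sample covariance matrix (M or S)
d = [mahalanobis(x, center, cov) for x in X]
```

Unlike the Euclidean distance, this metric rescales each direction by the data covariance, so strongly correlated dimensions are not double-counted.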
The unlabeled-sample screening algorithm is as follows:
Step 1: initialization 1: set i = 1 and choose a threshold θ3;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2 with θ3 substituted for θ2 as the similarity constraint, and collect the qualifying labeled samples into a matrix A;
Step 3: use the matrix A to find the center C of the sample-dense region:
Ci = (1/l) Σj=1..l Aji
where l is the number of dense-region samples contained in A and i indexes the sample dimension;
Step 4: compute the distance di between each unlabeled sample x′i and C according to formulas (1) to (3), select the unlabeled samples that satisfy preference criterion 1, and store them in the matrix M1;
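The screening steps above can be sketched in Python as follows. This is a non-authoritative illustration: the thresholds θ1 and θ3 follow the text, the m ≥ 2 neighbor rule comes from preference criterion 2, and the function names and any data supplied by a caller are assumptions:

```python
import numpy as np

def pairwise_mahalanobis(X):
    # All pairwise Mahalanobis distances within sample set X (formulas (4)-(6)).
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diffs = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.einsum('ijk,kl,ijl->ij', diffs, S_inv, diffs))

def screen_unlabeled(X_lab, X_unlab, theta1, theta3):
    """Sketch of the screening algorithm: find the dense-region center C from
    the labeled samples (criterion 2 with theta3), then keep the unlabeled
    samples whose Mahalanobis distance to C is below theta1 (criterion 1)."""
    D = pairwise_mahalanobis(X_lab)
    # Criterion 2: at least m >= 2 neighbors closer than theta3 (self excluded).
    neighbor_counts = (D < theta3).sum(axis=1) - 1
    A = X_lab[neighbor_counts >= 2]
    C = A.mean(axis=0)  # dense-region center, per-dimension mean of A
    M_inv = np.linalg.inv(np.cov(X_unlab, rowvar=False))
    d = np.sqrt(np.einsum('ij,jk,ik->i', X_unlab - C, M_inv, X_unlab - C))
    return X_unlab[d < theta1]  # selected unlabeled set M1
```

A quick sanity check of such an implementation: a very large θ1 keeps every unlabeled sample, while a very small θ1 rejects all of them.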
Step 2: select labeled samples according to preference criterion 2 with the auxiliary-learner construction algorithm and build a more targeted auxiliary learner f1;
The auxiliary learner predicts the label of the unlabeled sample by utilizing a model established by the labeled sample;
the auxiliary learner set-up algorithm is as follows:
Step 1: initialization 2: set i = 1;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2, and collect the qualifying labeled samples into a matrix B;
Step 3: build the auxiliary learner f1 from B using Gaussian process regression (GPR);
GPR is a nonparametric probabilistic model based on statistical learning theory; modeling with GPR proceeds as follows:
Given a training sample set X ∈ R^(D×N) and y ∈ R^N, where X = {xi ∈ R^D}, i = 1…N, and y = {yi ∈ R}, i = 1…N, denote the D-dimensional input data and the output data respectively, the relationship between input and output is generated by formula (7):
y = f(x) + ε  (7)
where f is an unknown function and ε is Gaussian noise with mean 0 and variance σn²; for a new input x*, the corresponding probabilistic prediction output y* also follows a Gaussian distribution, whose mean and variance are given by formulas (8) and (9):
y*(x*) = cᵀ(x*) C⁻¹ y  (8)
σ²y*(x*) = c(x*, x*) − cᵀ(x*) C⁻¹ c(x*)  (9)
where c(x*) = [c(x*, x1), …, c(x*, xN)]ᵀ is the covariance vector between the training data and the test datum, C = Σ + σn²·I is the covariance matrix of the training data, I is the N×N identity matrix, and c(x*, x*) is the autocovariance of the test datum;
GPR selects the Gaussian covariance function:
c(xi, xj) = v·exp[−(1/2) Σd=1..D ωd (xi^d − xj^d)²]  (10)
where v controls the overall magnitude of the covariance and ωd represents the relative importance of each component x^d;
The unknown parameters v, ω1, …, ωD in formula (10) and the Gaussian noise variance σn² are collected into a parameter vector θ = [v, ω1, …, ωD, σn²], which is obtained by maximum-likelihood estimation, i.e. by maximizing the log likelihood
L(θ) = −(1/2)·ln|C| − (1/2)·yᵀ C⁻¹ y − (N/2)·ln 2π;
The procedure for finding the value of the parameter θ is as follows:
to escape local optima, the parameter θ is initialized with random values drawn from ranges of different magnitudes, one value per range, the magnitudes being 0.001, 0.01, 0.1, 1, 10, and so on;
the optimized parameters are then obtained with a conjugate-gradient method;
after the optimal parameter θ is obtained, the output value of the GPR model for a test sample x* is estimated by formulas (8) and (9);
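A minimal numerical sketch of the predictor defined by formulas (8)-(10), assuming the hyperparameters v, ωd and σn² are already known; the function names are illustrative, not from the patent:

```python
import numpy as np

def gauss_cov(Xa, Xb, v, omega):
    # Gaussian covariance function of formula (10):
    # c(xi, xj) = v * exp(-0.5 * sum_d omega_d * (xi_d - xj_d)^2)
    diff = Xa[:, None, :] - Xb[None, :, :]
    return v * np.exp(-0.5 * np.einsum('ijd,d->ij', diff**2, omega))

def gpr_predict(X, y, x_star, v, omega, noise_var):
    # Posterior mean (8) and variance (9) with C = Sigma + sigma_n^2 * I.
    C = gauss_cov(X, X, v, omega) + noise_var * np.eye(len(X))
    c = gauss_cov(X, x_star[None, :], v, omega)[:, 0]
    mean = c @ np.linalg.solve(C, y)     # formula (8)
    var = v - c @ np.linalg.solve(C, c)  # formula (9), with c(x*, x*) = v
    return mean, var
```

On a smooth one-dimensional target the predictor interpolates the training points almost exactly when the noise variance is small.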
Step 3: use the auxiliary learner f1 to predict labels for the unlabeled sample set M1, add the resulting pseudo-labeled sample set S1 to the initial labeled sample set S0, and build the main learner with the GPR method, where S0 is the initial labeled sample set;
a pseudo-labeled sample is a sample whose label has been generated by the auxiliary learner rather than measured; the main learner tracks the test samples using a model built from the labeled samples combined with the pseudo-labeled samples.
Optionally, the method further includes:
selecting the dense-region center by first selecting the samples that belong to the sample-dense region;
the sample-dense region is a region in which the samples are concentrated in distribution, and the dense-region center C is the center of this region.
Optionally, the method is applied in industrial processes to predict, using unlabeled samples, variables that cannot be measured directly.
Optionally, the industrial process comprises environmental, metallurgical and chemical processes.
The invention has the beneficial effects that:
Through the double-optimization strategy, the center of the labeled-sample dense region is found; unlabeled samples are screened by their similarity to this center, and labeled samples are screened by their similarity to one another. A Gaussian process regression model is then built on the selected labeled samples as an auxiliary learner that predicts labels for the selected unlabeled samples. Finally, the pseudo-labeled samples are used to improve the prediction of the main learner. This solves the problem that, when labeled samples are few, the quality of the unlabeled samples cannot be guaranteed and accurate prediction therefore fails, and achieves accurate prediction from only a few labeled samples.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general algorithm flow diagram;
FIG. 2 is the histogram distribution of labeled and unlabeled samples;
FIG. 3 is a diagram of a numerically simulated dual-preferred semi-supervised prediction effect;
FIG. 4 is a longitudinal comparison of different methods;
FIG. 5 is a comparison of the prediction errors of different methods;
FIG. 6 shows histogram statistics of predicted versus actual values for the different methods.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example (b):
This embodiment provides a model prediction method based on the double-optimization semi-supervised regression algorithm, taking a common chemical process, the debutanizer process, as an example. The experimental data come from actual sampling of the real process, and the butane concentration is predicted. Referring to FIG. 1, the method comprises:
Step 1: screen the unlabeled samples according to preference criterion 1 and preference criterion 2 with the unlabeled-sample screening algorithm to obtain the unlabeled sample set M1.
Preferred criteria 1 are as follows: given a threshold value theta1Measure unlabeled sample x 'using Mahalanobis distance'iSimilarity to labeled sample dense center C, diX'iThe distance from C is less than theta1X'iThe preferred conditions are satisfied. Wherein d isiObtained from the formulas (1) to (3).
di=sqrt[(x′i-C)′M-1(x′i-C)] (1)
Figure GDA0002990457990000051
Figure GDA0002990457990000052
Wherein M is the unlabeled sample covariance matrix, n is the number of unlabeled samples,
Figure GDA0002990457990000053
mean unlabeled samples.
Preference criterion 2 is as follows: given a threshold θ2, the Mahalanobis distance d(xi, xj) is used to measure the similarity between samples; for a sample xi, count the number m of surrounding samples xj whose Mahalanobis distance to xi is less than θ2; if m ≥ 2, then xi satisfies the criterion, where d(xi, xj) is obtained from formulas (4) to (6).
d(xi, xj) = sqrt[(xi − xj)′ S⁻¹ (xi − xj)]  (4)
S = (1/n) Σi=1..n (xi − x̄)(xi − x̄)′  (5)
x̄ = (1/n) Σi=1..n xi  (6)
where S is the covariance matrix of the labeled samples, n is the number of labeled samples, and x̄ is the mean of the labeled samples.
mahalanobis distance (Mahalanobis distance) is proposed by the indian statistician Mahalanobis (p.c. Mahalanobis) and represents the covariance distance of the data. The method is an effective method for calculating the similarity of two unknown sample sets.
The unlabeled-sample screening algorithm is as follows:
Step 1: initialization 1: set i = 1 and choose a threshold θ3;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2 under the θ3 limit (i.e. θ3 is substituted for θ2 as the similarity constraint), and collect the qualifying labeled samples into a matrix A;
Step 3: use the matrix A to find the center C of the sample-dense region:
Ci = (1/l) Σj=1..l Aji
where l is the number of dense-region samples contained in A and i indexes the sample dimension;
Step 4: compute the distance di between each unlabeled sample x′i and C by formulas (1) to (3), select the unlabeled samples that satisfy preference criterion 1, and store them in the matrix M1.
Step 2: utilizing an auxiliary learner to establish an algorithm, selecting labeled samples according to an optimal selection criterion 2, and establishing a more targeted auxiliary learner f1
The secondary learner uses a model built from the labeled exemplars to predict the labels of the unlabeled exemplars.
The auxiliary learner set-up algorithm is as follows:
Step 1: initialization 2: set i = 1;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2, and collect the qualifying labeled samples into a matrix B;
Step 3: build the auxiliary learner f1 from B using Gaussian process regression (GPR).
GPR is a nonparametric probabilistic model based on statistical learning theory; modeling with GPR proceeds as follows:
Given a training sample set X ∈ R^(D×N) and y ∈ R^N, where X = {xi ∈ R^D}, i = 1…N, and y = {yi ∈ R}, i = 1…N, denote the D-dimensional input data and the output data respectively, the relationship between input and output is generated by formula (7):
y = f(x) + ε  (7)
where f is an unknown function and ε is Gaussian noise with mean 0 and variance σn². For a new input x*, the corresponding probabilistic prediction output y* also follows a Gaussian distribution, whose mean and variance are given by formulas (8) and (9):
y*(x*) = cᵀ(x*) C⁻¹ y  (8)
σ²y*(x*) = c(x*, x*) − cᵀ(x*) C⁻¹ c(x*)  (9)
where c(x*) = [c(x*, x1), …, c(x*, xN)]ᵀ is the covariance vector between the training data and the test datum, C = Σ + σn²·I is the covariance matrix of the training data, I is the N×N identity matrix, and c(x*, x*) is the autocovariance of the test datum.
GPR can use different covariance functions c(xi, xj) to generate the covariance matrix Σ, as long as the chosen function guarantees that the generated covariance matrix is positive semi-definite. The Gaussian covariance function is chosen here:
c(xi, xj) = v·exp[−(1/2) Σd=1..D ωd (xi^d − xj^d)²]  (10)
where v controls the overall magnitude of the covariance and ωd represents the relative importance of each component x^d.
For the unknown parameters v, ω in equation (10)1,…,ωDSum of Gaussian noise variance
Figure GDA0002990457990000065
The simplest method is to obtain the parameters by maximum likelihood estimation
Figure GDA0002990457990000066
Figure GDA0002990457990000067
The procedure for finding the value of the parameter θ is as follows:
to escape local optima, the parameter θ is initialized with random values drawn from ranges of different magnitudes, one value per range, the magnitudes being 0.001, 0.01, 0.1, 1 and 10 respectively;
the optimized parameters are then obtained with a conjugate-gradient method;
after the optimal parameter θ is obtained, the output values of the GPR model for a test sample x* can be estimated by formulas (8) and (9).
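The multi-start conjugate-gradient search described above can be sketched as follows. This is an assumed implementation, not the patent's: SciPy's general-purpose CG minimizer stands in for the unspecified conjugate-gradient method, the starts are deterministic at each magnitude instead of random draws, and the objective is the standard Gaussian-process log marginal likelihood optimized in log-parameter space:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_theta, X, y):
    # theta = (v, omega_1..omega_D, sigma_n^2), optimized in log space so all
    # parameters stay positive; clipping keeps the search numerically stable.
    log_theta = np.clip(log_theta, -10.0, 10.0)
    v, *rest = np.exp(log_theta)
    omega, noise = np.array(rest[:-1]), rest[-1]
    diff = X[:, None, :] - X[None, :, :]
    C = v * np.exp(-0.5 * np.einsum('ijd,d->ij', diff**2, omega))
    C += noise * np.eye(len(X))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

def fit_hyperparams(X, y, starts=(0.001, 0.01, 0.1, 1.0, 10.0)):
    # One start per magnitude, each refined by a conjugate-gradient minimizer;
    # the best result over all starts is kept, to escape local optima.
    best = None
    for s in starts:
        x0 = np.log(np.full(X.shape[1] + 2, s))
        res = minimize(neg_log_marginal_likelihood, x0, args=(X, y), method='CG')
        if best is None or res.fun < best.fun:
            best = res
    return np.exp(np.clip(best.x, -10.0, 10.0))  # (v, omega_1..omega_D, sigma_n^2)
```

Keeping the best of several starts is a pragmatic trade-off: each CG run is cheap on small labeled sets, and the restarts guard against the non-convex likelihood surface.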
And step 3: using auxiliary learning devices f1For unlabeled sample set M1Predicting the label, and collecting the obtained pseudo label sample set S1Added to the initially labeled sample set S0(S0An initial set of labeled samples), a master learner is established using the GPR method.
The pseudo label sample is a sample obtained by artificially predicting a non-label sample by using an auxiliary learner, and the main learner tracks the test sample by using a model established by combining the label sample with the pseudo label sample.
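The overall auxiliary-learner/main-learner flow of steps 2 and 3 can be sketched with scikit-learn's Gaussian process regressor standing in for the GPR formulation above; the data, kernel choice and function names are illustrative assumptions, not the patent's specification:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def semi_supervised_gpr(X_lab, y_lab, X_unlab_sel, X_test):
    """Sketch of steps 2-3: an auxiliary GPR f1 built on the screened labeled
    samples assigns pseudo labels to the screened unlabeled set M1; labeled
    and pseudo-labeled samples together then train the main learner."""
    f1 = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    f1.fit(X_lab, y_lab)
    y_pseudo = f1.predict(X_unlab_sel)       # pseudo-label set S1
    X_aug = np.vstack([X_lab, X_unlab_sel])  # S0 together with S1
    y_aug = np.concatenate([y_lab, y_pseudo])
    main = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    main.fit(X_aug, y_aug)
    return main.predict(X_test)
```

Because the pseudo labels come from the auxiliary learner itself, this pipeline only helps when the unlabeled inputs have been screened to lie near the labeled dense region, which is exactly what the double-optimization strategy enforces.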
FIG. 2 shows the histogram distributions of the labeled and unlabeled samples, illustrating the necessity of double optimization theoretically; FIG. 3 and FIGS. 4-6 show the results of the numerical simulation and of the debutanizer process simulation respectively, demonstrating the effectiveness of double optimization experimentally.
According to the invention, through the double-optimization strategy, the center of the labeled-sample dense region is found; unlabeled samples are screened by their similarity to this center, and labeled samples are screened by their similarity to one another. A Gaussian process regression model is then built on the selected labeled samples as an auxiliary learner that predicts labels for the selected unlabeled samples. Finally, the pseudo-labeled samples are used to improve the prediction of the main learner. This solves the problem that, when labeled samples are few, the quality of the unlabeled samples cannot be guaranteed and accurate prediction therefore fails, and achieves accurate prediction from only a few labeled samples.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A method for predicting the concentration of butane at the bottom of a debutanizer tower based on a model of a double-preference semi-supervised regression algorithm, which is characterized by comprising the following steps:
Step 1: screening the unlabeled samples according to preference criterion 1 and preference criterion 2 with the unlabeled-sample screening algorithm to obtain the unlabeled sample set M1, the unlabeled samples coming from actual sampling of the real debutanizer process;
preference criterion 1 being described as follows: given a threshold θ1, the Mahalanobis distance is used to measure the similarity between an unlabeled sample x′i and the dense-region center C of the labeled samples; if the distance di between x′i and C is less than θ1, then x′i satisfies the criterion, where di is obtained from formulas (1) to (3), the labeled samples coming from actual sampling of the real debutanizer process;
di = sqrt[(x′i − C)′ M⁻¹ (x′i − C)]  (1)
M = (1/n) Σi=1..n (x′i − x̄′)(x′i − x̄′)′  (2)
x̄′ = (1/n) Σi=1..n x′i  (3)
where M is the covariance matrix of the unlabeled samples, n is the number of unlabeled samples, and x̄′ is the mean of the unlabeled samples;
preference criterion 2 being described as follows: given a threshold θ2, the Mahalanobis distance d(xi, xj) is used to measure the similarity between samples; for a sample xi, count the number m of surrounding samples xj whose Mahalanobis distance to xi is less than θ2; if m ≥ 2, then xi satisfies the criterion, where d(xi, xj) is obtained from formulas (4) to (6);
d(xi, xj) = sqrt[(xi − xj)′ S⁻¹ (xi − xj)]  (4)
S = (1/n) Σi=1..n (xi − x̄)(xi − x̄)′  (5)
x̄ = (1/n) Σi=1..n xi  (6)
where S is the covariance matrix of the labeled samples, n is the number of labeled samples, and x̄ is the mean of the labeled samples;
the Mahalanobis distance being a covariance-aware distance between data points that effectively measures the similarity of two unknown sample sets;
the unlabeled-sample screening algorithm being as follows:
Step 1: initialization 1: set i = 1 and choose a threshold θ3;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2 with θ3 substituted for θ2 as the similarity constraint, and collect the qualifying labeled samples into a matrix A;
Step 3: use the matrix A to find the center C of the sample-dense region:
Ci = (1/l) Σj=1..l Aji
where l is the number of dense-region samples contained in A and i indexes the sample dimension;
Step 4: compute the distance di between each unlabeled sample x′i and C according to formulas (1) to (3), select the unlabeled samples that satisfy preference criterion 1, and store them in the matrix M1;
Step 2: selecting labeled samples according to preference criterion 2 with the auxiliary-learner construction algorithm and building a more targeted auxiliary learner f1;
The auxiliary learner predicts the label of the unlabeled sample by utilizing a model established by the labeled sample;
the auxiliary learner set-up algorithm is as follows:
Step 1: initialization 2: set i = 1;
Step 2: judge in turn whether each labeled sample xi satisfies preference criterion 2, and collect the qualifying labeled samples into a matrix B;
Step 3: build the auxiliary learner f1 from B using Gaussian process regression GPR;
GPR being a nonparametric probabilistic model based on statistical learning theory, with modeling as follows:
given a training sample set X ∈ R^(D×N) and y ∈ R^N, where X = {xi ∈ R^D}, i = 1…N, and y = {yi ∈ R}, i = 1…N, denote the D-dimensional input data and the output data respectively, the relationship between input and output is generated by formula (7):
y = f(x) + ε  (7)
where f is an unknown function and ε is Gaussian noise with mean 0 and variance σn²; for a new input x*, the corresponding probabilistic prediction output y* also follows a Gaussian distribution, whose mean and variance are given by formulas (8) and (9):
y*(x*) = cᵀ(x*) C⁻¹ y  (8)
σ²y*(x*) = c(x*, x*) − cᵀ(x*) C⁻¹ c(x*)  (9)
where c(x*) = [c(x*, x1), …, c(x*, xN)]ᵀ is the covariance vector between the training data and the test datum, C = Σ + σn²·I is the covariance matrix of the training data, I is the N×N identity matrix, and c(x*, x*) is the autocovariance of the test datum;
GPR selecting the Gaussian covariance function:
c(xi, xj) = v·exp[−(1/2) Σd=1..D ωd (xi^d − xj^d)²]  (10)
where v controls the overall magnitude of the covariance and ωd represents the relative importance of each component x^d;
the unknown parameters v, ω1, …, ωD in formula (10) and the Gaussian noise variance σn² being collected into a parameter vector θ = [v, ω1, …, ωD, σn²], obtained by maximum-likelihood estimation, i.e. by maximizing the log likelihood
L(θ) = −(1/2)·ln|C| − (1/2)·yᵀ C⁻¹ y − (N/2)·ln 2π;
the procedure for finding the value of the parameter θ being as follows:
to escape local optima, the parameter θ is initialized with random values drawn from ranges of different magnitudes, one value per range, the magnitudes being 0.001, 0.01, 0.1, 1 and 10 respectively;
the optimized parameters are then obtained with a conjugate-gradient method;
after the optimal parameter θ is obtained, the output value of the GPR model for a test sample x* is estimated by formulas (8) and (9);
Step 3: using the auxiliary learner f1 to predict labels for the unlabeled sample set M1, adding the resulting pseudo-labeled sample set S1 to the initial labeled sample set S0, and building the main learner with the GPR method, where S0 is the initial labeled sample set;
a pseudo-labeled sample being a sample whose label has been generated by the auxiliary learner rather than measured, the main learner tracking the test samples using a model built from the labeled samples combined with the pseudo-labeled samples; that is, the established model is used to predict the butane concentration at the bottom of the debutanizer.
2. The method of claim 1, further comprising:
selecting the dense-region center by first selecting the samples that belong to the sample-dense region;
the sample-dense region being a region in which the samples are concentrated in distribution, and the dense-region center C being the center of this region.
3. The method of claim 1, wherein the method is applied in an industrial process to predict, using unlabeled samples, variables that cannot be measured directly.
4. The method of claim 3, wherein the industrial process comprises environmental, metallurgical and chemical processes.
CN201810454373.4A 2018-05-14 2018-05-14 Method for predicting concentration of butane at bottom of debutanizer tower based on model of double-optimization semi-supervised regression algorithm Active CN108734207B (en)

Publications (2)

Publication Number / Publication Date
CN108734207A (en) — 2018-11-02
CN108734207B (en) — 2021-05-28


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5814460A (en) * 1990-02-14 1998-09-29 Diatide, Inc. Method for generating and screening useful peptides
CN102693452A (en) * 2012-05-11 2012-09-26 上海交通大学 Multiple-model soft-measuring method based on semi-supervised regression learning
CN104778298A (en) * 2015-01-26 2015-07-15 江南大学 Gaussian process regression soft measurement modeling method based on EGMM (Error Gaussian Mixture Model)
CN104914723A (en) * 2015-05-22 2015-09-16 浙江大学 Industrial process soft measurement modeling method based on cooperative training partial least squares model
CN105205219A (en) * 2015-08-25 2015-12-30 华南师范大学 Production prediction method and system based on nonlinear regression model parameters
CN107451102A (en) * 2017-07-28 2017-12-08 江南大学 Semi-supervised Gaussian process regression soft-sensor modeling method based on an improved self-training algorithm

Non-Patent Citations (2)

Title
Application of multiple linear regression, central composite; Su Sin Chong et al.; Measurement; 2015-07-04; pp. 78–86 *
Research on dynamic soft-sensing methods based on Gaussian mixture regression; Su Yong; China Master's Theses Full-text Database, Basic Sciences; 2018-01-15 (No. 01); full text *


Similar Documents

Publication Publication Date Title
Sun et al. Using Bayesian deep learning to capture uncertainty for residential net load forecasting
WO2022262500A1 (en) Steof-lstm-based method for predicting marine environmental elements
Liu et al. A driving intention prediction method based on hidden Markov model for autonomous driving
CN108734207B (en) Method for predicting concentration of butane at bottom of debutanizer tower based on model of double-optimization semi-supervised regression algorithm
Angelov et al. Identification of evolving fuzzy rule-based models
CN108764295B (en) Method for predicting concentration of butane at bottom of debutanizer tower based on soft measurement modeling of semi-supervised ensemble learning
CN104699894A (en) JITL (just-in-time learning) based multi-model fusion modeling method adopting GPR (Gaussian process regression)
CN113012766B (en) Self-adaptive soft measurement modeling method based on online selective integration
CN109543731A (en) Semi-supervised regression algorithm with triple optimal selection under a self-training framework
CN114912195B (en) Aerodynamic sequence optimization method for commercial vehicle
CN105913078A (en) Multi-mode soft measurement method for improving adaptive affine propagation clustering
CN107704426A (en) Water level prediction method based on extension wavelet-neural network model
Kasiviswanathan et al. Quantification of prediction uncertainty in artificial neural network models
CN115099511A (en) Photovoltaic power probability estimation method and system based on optimized copula
CN115742855A (en) Electric automobile remaining mileage prediction method and device, electric automobile and medium
Lu et al. Neural network interpretability for forecasting of aggregated renewable generation
Billert et al. A method of developing quantile convolutional neural networks for electric vehicle battery temperature prediction trained on cross-domain data
CN110491443B (en) lncRNA protein correlation prediction method based on projection neighborhood non-negative matrix decomposition
Rottmann et al. Learning non-stationary system dynamics online using gaussian processes
JP4220169B2 (en) Actual vehicle coating thickness prediction method, actual vehicle coating thickness prediction system, and recording medium
CN101226521A (en) Machine learning method for ambiguity data object estimation modeling
Yi et al. Efficient global optimization using a multi-point and multi-objective infill sampling criteria
CN112163632A (en) Application of semi-supervised extreme learning machine based on bat algorithm in industrial detection
CN115691140B (en) Analysis and prediction method for space-time distribution of automobile charging demand
Ito et al. Design space exploration using Self-Organizing Map based adaptive sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221024

Address after: 230000 B-1015, Woyuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui

Patentee after: HEFEI MINGLONG ELECTRONIC TECHNOLOGY Co.,Ltd.

Address before: No. 1800 Lihu Avenue, Wuxi City, Jiangsu Province

Patentee before: Jiangnan University