CN112541558A - Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data - Google Patents


Info

Publication number
CN112541558A
Authority
CN
China
Prior art keywords
distribution
data
bayesian
parameter
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011576333.0A
Other languages
Chinese (zh)
Inventor
任世锦
唐娴
潘剑寒
魏明生
苏陈澄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University filed Critical Jiangsu Normal University
Publication of CN112541558A publication Critical patent/CN112541558A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Abstract

The invention discloses a Bayesian semi-supervised robust PPLS (BSRPPLS) soft measurement method based on incomplete data, which also serves as a fault monitoring method. Unlike traditional PPLS modeling methods based on the multivariate Student's t-distribution, the noise of each data vector is modeled with an independent Student's t-distribution, and the distribution contains an adjustable robustness degree-of-freedom parameter, which improves modeling flexibility. The posterior distribution parameters are estimated by Bayesian variational inference. The model reconstructs the original data from the uncontaminated data elements, reducing the influence of contaminated elements on the reconstructed data; it thereby handles the missing data and outliers that would otherwise degrade model accuracy, has good robustness, and helps improve the monitoring performance of the industrial process and the level of understanding of process operation.

Description

Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data
Technical Field
The invention belongs to the technical field of PPLS soft measurement, and particularly relates to a Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data.
Background
With the advent of the Industry 4.0 era, modern industrial automation systems are continuously developing toward greater complexity, informatization and intelligence. Process monitoring, as a key to ensuring stable product quality and the safe, stable operation of production equipment, has become an indispensable component of modern complex industrial systems. In practice, because of external environmental changes, fluctuations in raw-material quality, and the accuracy of the measuring equipment and the complexity of the equipment itself, it is difficult to directly establish a mathematical process monitoring model. Data-based process monitoring theory and technology, which can help operators and engineers further understand the production process, have therefore received wide attention and achieved good results in practical applications [1-5]. Typical process monitoring methods mainly include Principal Component Analysis (PCA) and its modified forms, Partial Least Squares (PLS), the Gaussian Mixture Model (GMM), and other statistical learning methods [5-8]. Considering the locality of process modal data and the fact that the essential features of a process often lie in a low-dimensional subspace of the data, manifold learning has a powerful ability to describe the geometry of data, perform nonlinear dimensionality reduction, and represent the local characteristics of data. Fully combining the advantages of manifold learning in describing data structure with classical statistical analysis methods is a feasible way to improve the accuracy and interpretability of fault diagnosis. Common manifold learning methods for fault diagnosis mainly include Maximum Variance Unfolding (MVU), statistics Locality Preserving Projections (LPP), Neighborhood Preserving Embedding (NPE) and their extended forms [9-13].
In practice, process data and variables exhibit the various complexities of the system itself; in addition, start-up and shutdown of the production process, or switching between operating conditions, often introduce strong dynamic characteristics, making the data representation of an industrial process very complex. This complexity mainly manifests itself as outliers and missing values, while industrial process variables exhibit dynamics, Gaussian and non-Gaussian characteristics, and randomness. Because of this complexity, process operation and quality characteristics are often hidden in the data, posing challenges to process monitoring, pattern understanding, and the acquisition of knowledge about industrial operation. How to extract hidden data features from such complex and uncertain process data, and thereby improve the accuracy of process fault diagnosis and the interpretability of the process operating mechanism, is a key fundamental problem of process fault diagnosis. The process operating characteristics often lie in a hidden low-dimensional space, and process data show strong random uncertainty under changes of the external environment and equipment operating conditions. By exploiting the strong capabilities of linear latent variable models in dimensionality reduction and information extraction, and of probabilistic graphical theory in modeling random data, modeling methods that improve model robustness and information extraction capability have gradually become important approaches to process fault monitoring and soft measurement modeling [15-21].
Considering uncertainties such as environmental noise and the difference between the statistics of the PCA principal component space and the residual space, document [15] proposes a global-local method that automatically determines the number of factors based on a Factor Analysis (FA) model, and constructs monitoring statistics using NLLP augmented with variable variance information. The Student's t-distribution can approximate both Gaussian and non-Gaussian distributions; document [7] proposes a PICA algorithm based on the t-distribution and, on that basis, a two-stage probabilistic ICA (PICA) and PPCA to extract the Gaussian and non-Gaussian information in the data respectively, improving the comprehensiveness of the model. Document [16] introduces a Bayesian regularization factor into mixture Principal Component Regression (PCR) to automatically select the number of principal components. The Student's t-distribution belongs to the family of generalized Gaussian distributions, is better suited to modeling non-Gaussian distributions, and has become an important tool for modeling non-Gaussian processes and improving model robustness [7, 20]. For the case where both process noise and observation noise follow Student's t-distributions, document [20] provides a robust filter based on the Student's t-distribution and, by fusing information from multiple sensors, improves the navigation accuracy of an integrated navigation system and its adaptability to special conditions. Considering the problems of missing data and outliers in the process, document [7] proposes a robust probabilistic PCA (RPPCA) algorithm based on an EM optimization method, describes non-Gaussian hidden variables with independent t-distributions, and discusses a PCA modeling method for missing data.
To improve the robustness of mixture models, document [21] proposes a semi-supervised robust mixture linear regression modeling method in which the input variables obey t-distributions, and applies it successfully to multi-modal process quality prediction. Document [17] proposes a Maximum-Likelihood Mixture Factor Analysis model (MLMFA) to address noisy factors, non-Gaussian components, and multi-modality. For the problems of non-Gaussian sensor noise and missing measurement data in actual systems, document [22] provides a robust principal component analysis method for incomplete data that has been used successfully for data denoising.
The Partial Least Squares (PLS) model is a widely used industrial soft-sensing and fault monitoring technique, with great advances in both theory and practice [19,23,24]. Document [19] studies the latent variable model-PLS modeling theory and probabilistic integrated modeling methods in detail on the basis of probabilistic graphical theory, and compares several latent variable models. A detailed review of recent developments of PLS shows that PLS and its extended forms remain important tools for soft measurement and fault monitoring of industrial processes [22]. However, conventional PLS is mainly directed at comparable amounts of process (input) and quality (output) data, complete data, and Gaussian noise. Actual industrial processes are characterized by the high cost of acquiring quality data, frequent loss of measurement data, measurement noise that deviates from the Gaussian distribution, and random uncertainty, all of which seriously affect the performance of the PLS model. The invention therefore aims to make full use of the information in both labeled and unlabeled process data, eliminate the influence of missing data and outliers on PLS modeling, and accurately model non-Gaussian noise.
[1] Zeyu Yang, Zhiqiang Ge. Industrial virtual sensing for big process data based on parallelized nonlinear variational Bayesian factor regression. IEEE Transactions on Instrumentation and Measurement, 2020.
[2] Jing Yang, Guo Xie, Yanxi Yang. An improved ensemble fusion autoencoder model for fault diagnosis from imbalanced and incomplete data. Control Engineering Practice, 98 (2020) 104358.
[3] Tipping, M.E., Lawrence, N.D. Variational inference for Student's t-models: robust Bayesian interpolation and generalized component analysis. Neurocomputing, 69:123-141, 2005.
[4] Bei Wang, Zhichao Li, Zhenwen Dai, Neil Lawrence, Xuefeng Yan. Data-driven mode identification and unsupervised fault detection for nonlinear multimode processes. IEEE Transactions on Industrial Informatics, 16(6):3651-3660, 2020.
[5] Wende Tian, Yujia Ren, Yuxi Dong, Shaoguang Wang, Lingzhen Bu. Fault monitoring based on mutual information feature engineering modeling in chemical process. Chinese Journal of Chemical Engineering, 27 (2019) 2491-2497.
[6] Zhaikun, Duvenxia, Lufeng, Chongtao, Xiyuan. An improved dynamic kernel principal component analysis fault detection method. CIESC Journal, 2019, 70(2):716-.
[7] Zhujinlin. Data-driven robust monitoring of industrial processes. Doctoral dissertation, Zhejiang University, 2016.
[8] Jie Yu. A nonlinear kernel Gaussian mixture model based inferential monitoring approach for fault detection and diagnosis of chemical processes. Chemical Engineering Science, 68 (2012) 506-519.
[9] Yuan-Jui Liu, Tao Chen, Yuan Yao. Nonlinear process monitoring and fault isolation using extended maximum variance unfolding. Journal of Process Control, 24 (2014) 880-891.
[10] Fei He, Jinwu Xu. A novel process monitoring and fault detection approach based on statistics locality preserving projections. Journal of Process Control, 37:46-57, 2016.
[11] Xiaoxia Chen, Chudong Tong, Ting Lan, Lijia Luo. Dynamic process monitoring based on orthogonal dynamic inner neighborhood preserving embedding model. Chemometrics and Intelligent Laboratory Systems, 193 (2019) 103812.
[12] Bing Song, Shuai Tan, Hongbo Shi. Time-space locality preserving coordination for multimode process monitoring. Chemometrics and Intelligent Laboratory Systems, 151:190-200, 2016.
[13] Yue Li, Yijie Zeng, Yuanyuan Qing, Guang-Bin Huang. Learning local discriminative representations via extreme learning machine for machine fault diagnosis. Neurocomputing, 409:275-285, 2020.
[14] Baiting, Wangsing, Waihong. Factor analysis monitoring method based on variable probability information. CIESC Journal, 2017, 68(7):2844-2850.
[15] Zhiqiang Ge. Mixture Bayesian regularization of PCR model and soft sensing application. IEEE Transactions on Industrial Electronics, 62(7):4336-4343, 2015.
[16] Tangjun Miao, Shuhaizhen, Shixuhua, Tongdong. Dynamic monitoring method of chemical process based on latent variable autoregressive algorithm. CIESC Journal, 2019, 70(3):987-.
[17] Ge Z.Q., Song Z.H. Maximum-likelihood mixture factor analysis model and its application for process monitoring. Chemometrics & Intelligent Laboratory Systems, 2010, 102(1):53-61.
[18] Weiming Shao, Zhiqiang Ge, Le Yao, Zhihuan Song. Bayesian nonlinear Gaussian mixture regression and its application to virtual sensing for multimode industrial processes. IEEE Transactions on Automation Science and Engineering, 17(2):423-437, 2020.
[19] Zhengjunhua. Latent variable regression modeling of industrial process data and its application. Doctoral dissertation, Zhejiang University, 2017.
[20] Markuo, Wuhang. Information fusion algorithm under the Student's t filter framework. Journal of Zhejiang University (Engineering Science), 54(3):581-.
[21] Weiming Shao, Zhiqiang Ge, Zhihuan Song, et al. Semisupervised robust modeling of multimode industrial processes for quality variable prediction based on Student's t mixture model. IEEE Transactions on Industrial Informatics, 16(5):2965-2976, 2020.
[22] Jaakko Luttinen, Alexander Ilin, Juha Karhunen. Bayesian robust PCA of incomplete data. Neural Processing Letters, 36(2):189-202, 2012.
[23] Chenjiayi, Zhao Zhonggai, Liu Fei. Robust PPLS model and its application in process monitoring. CIESC Journal, 67(7):2907-2915, 2016.
[24] Luo Jia Yu, et al. Quality-related fault detection method based on local information increment and MPLS. Control and Decision, https://doi.org/10.13195/j.kzyjc.2019.1402.
Disclosure of Invention
The invention aims to solve the technical problem of providing, in view of the defects of the background art, a Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data: the noise of each data vector is modeled with an independent Student's t-distribution whose adjustable robustness degree-of-freedom parameter improves the flexibility of modeling, and the posterior distribution is estimated by Bayesian variational inference. The model not only makes full use of labeled and unlabeled data, but also reconstructs the original data from the uncontaminated data elements as far as possible, reducing the influence of contaminated elements on the reconstructed data; it handles missing data and outliers, has good robustness, improves model accuracy, and helps improve the monitoring performance of the industrial process and the level of understanding of process operation.
The invention adopts the following technical scheme for solving the technical problems:
a Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data comprises a Bayesian semi-supervised robust PPLS model of incomplete data and model parameter learning of Bayesian variational inference; the method specifically comprises the following steps;
step 1, initializing prior distribution parameters and hidden variable distribution hyperparameters;
step 2, determining initial model parameters and hidden variable parameters from the training data set by using the PPLS method based on the expectation-maximization (EM) algorithm;
step 3, calculating the posterior distribution q(Δ) of the hidden variables and updating the distribution parameters;
step 4, solving an optimization problem to obtain an optimal prior hyperparameter v;
step 5, calculating the variational lower bound of the log-likelihood function according to the approximate posterior distribution;
step 6, judging whether a convergence condition is met, if so, predicting quality data corresponding to the unknown data sample to realize quality soft measurement; otherwise, returning to the step 3.
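The iteration defined by steps 1 to 6 is a standard coordinate-ascent variational loop. As a minimal runnable sketch of that control flow (the two functions below are placeholder stand-ins, not the patent's actual variational updates of q(T), q(μ,W|τ), q(U), q(α), q(β), q(τ)):

```python
import numpy as np

# Skeleton of the iteration in steps 1-6. The two functions below are
# placeholder stand-ins (assumptions for illustration), not the patent's
# actual variational updates.
def update_posteriors(state):
    # steps 3-4 stand-in: move the parameter halfway toward its target
    state["theta"] = 0.5 * (state["theta"] + state["target"])
    return state

def lower_bound(state):
    # step 5 stand-in: any quantity that increases as the fit improves
    return -(state["theta"] - state["target"]) ** 2

state = {"theta": 0.0, "target": 1.0}   # steps 1-2: initialisation
thr, L_old = 1e-5, -np.inf              # convergence threshold from step 6
for it in range(1000):
    state = update_posteriors(state)    # steps 3-4
    L = lower_bound(state)              # step 5
    if abs(L - L_old) < thr:            # step 6: |L(t+1) - L(t)| < thr
        break
    L_old = L
print(it, state["theta"])
```

On this toy update the loop stops after a handful of iterations once successive lower-bound values differ by less than the threshold; the real method returns to step 3 in exactly the same way.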
As a preferred scheme of the Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data, in step 6 the model convergence condition is that the change of the likelihood function L(q(Δ), Θ) is smaller than a predetermined threshold thr, namely

|L(q(Δ^(t+1)), Θ^(t+1)) - L(q(Δ^(t)), Θ^(t))| < thr

where L(q(Δ^(t)), Θ^(t)) and L(q(Δ^(t+1)), Θ^(t+1)) are the values at the t-th and (t+1)-th iterations respectively, and the threshold thr is set to 10^-5.
As an optimal scheme of the Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data, the Bayesian semi-supervised robust PPLS model of incomplete data is specifically as follows.

Given the outputs {y_n, n = 1, 2, …, N_L} and inputs {x_n, n = 1, 2, …, N_L} of the labeled samples and the unlabeled samples {x_n, n = N_L+1, …, N}, satisfying N = N_L + N_u, where the observation noise and the process noise obey independent t-distributions, N_L is the number of labeled samples, and the PLS input data and output data share the hidden variable t ∈ R^M, the probabilistic PLS model can be expressed as

x = P t + μ_x + ε_x,  y = C t + μ_y + ε_y   (1)

where P ∈ R^(D×M) and C ∈ R^(E×M) are weight matrices, t is the shared hidden variable, and μ_x and μ_y are the mean vectors of the process variables and the observation variables respectively; x and y are column vectors of dimensions D and E. The process data noise ε_x obeys an independent t-distribution with parameters v_x = [v_x,1, v_x,2, …, v_x,D] and τ_x = [τ_x,1, τ_x,2, …, τ_x,D]; the observation data noise ε_y also obeys an independent t-distribution of the same form. The t-distribution has the following form
St(x | μ, τ, v) = ∫_0^∞ N(x | μ, (u τ)^(-1)) Ga(u | v/2, v/2) du   (2)

where Γ(·) is the Gamma function, Ga(u | a', b') denotes a Gamma distribution with shape parameter a' and inverse scale parameter b', and v is the degree of freedom. As can be seen from the above equation, the t-distribution can be interpreted as an infinite mixture of Gaussian distributions. For the latent variable t, the priors of the model parameters P, C, μ and the noise level τ are similar to those of PPCA, in the form
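The Gaussian scale-mixture form of the t-distribution above can be checked numerically: drawing u from the Gamma distribution and then x from the corresponding Gaussian reproduces a Student's t sample. A short sketch (the values v = 5, τ = 2 are illustrative assumptions, not values from the patent):

```python
import numpy as np

# Hierarchical sampling of the Student's t as a Gamma mixture of Gaussians:
# u ~ Ga(v/2, rate v/2), then x | u ~ N(mu, (u * tau)^-1).
rng = np.random.default_rng(0)
v, tau, mu = 5.0, 2.0, 0.0
n = 200_000

u = rng.gamma(shape=v / 2, scale=2.0 / v, size=n)  # numpy's scale = 1/rate
x = rng.normal(mu, 1.0 / np.sqrt(u * tau))

# Marginally x follows a Student's t with v degrees of freedom;
# for v > 2 its variance is v / ((v - 2) * tau).
print(x.var(), v / ((v - 2) * tau))
```

The empirical variance of the mixture samples matches the closed-form t variance, confirming the "infinite mixture of Gaussians" interpretation.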
p(t) = N(t | 0, I_M)   (3)
Figure BDA0002864125690000064
Figure BDA0002864125690000065
Figure BDA0002864125690000066
where τ represents τ_x and τ_y, μ represents μ_x and μ_y, and β represents β_x and β_y; the parameter τ enters the priors of P, C and μ, and the prior of the noise parameter τ is a product of mutually independent Gamma distributions, i.e.

p(τ) = ∏_d Ga(τ_d | a_τ, b_τ)

The priors of the parameter α = [α_x, α_y] and of β are

Figure BDA0002864125690000067

p(β) = Ga(β | a_β, b_β)

Figure BDA0002864125690000068

In the simulations, to obtain broader distributions, the hyperparameters are set to a_τ = b_τ = a_α = b_α = a_β = b_β = 10^-5; for isotropic noise, one can set τ_m = τ.
Let z_n = [x_n; y_n], W = [P; C], and μ = [μ_x; μ_y]. If the elements of z_n are observed independently, so that the assumption of independently contaminated elements z_dn holds, an independent Student's t-distribution can be used to model each element of ε_n. The likelihood function of the labeled samples is

p(Z | Δ) = ∏_(dn∈O) St(z_dn | w_d t_n + μ_d, τ_d, v_d)   (7)

where O represents the set of observable indices dn of z_dn, w_d is the d-th row vector of the matrix W, d = 1, 2, …, D+E, and n = 1, 2, …, N_L. For the unlabeled samples X_u the corresponding likelihood function is

p(X_u | Δ) = ∏_(d'n'∈O') St(x_d'n' | w_d' t_n' + μ_d', τ_d', v_d')   (8)

where O' represents the observable index set d'n' of the unlabeled samples X_u, w_d' is equal to p_d', d' = 1, 2, …, D, n' = N_L+1, N_L+2, …, N, and μ_1:D, the subvector of μ from element 1 to element D, equals μ_x.

Introducing the hidden variables U and U' and constructing the Student's t-distribution hierarchically from Gaussian distributions, the likelihood function of all labeled and unlabeled samples is

p(Z, X_u, U, U' | Δ) = ∏_(dn∈O) N(z_dn | w_d t_n + μ_d, (u_dn τ_d)^(-1)) Ga(u_dn | v_d/2, v_d/2) ∏_(d'n'∈O') N(x_d'n' | w_d' t_n' + μ_d', (u_d'n' τ_d')^(-1)) Ga(u_d'n' | v_d'/2, v_d'/2)   (9)

where W_(1:D,:), the matrix formed by rows 1 to D of W, is the matrix P.
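The element-wise likelihood above, with missing entries simply dropped from the observable index set O, can be sketched as follows (a simplified stand-in using SciPy's Student's t density; the function name and the NaN convention for missing elements are assumptions for illustration):

```python
import numpy as np
from scipy import stats

def masked_t_loglik(Z, W, T, mu, tau, v):
    """Log-likelihood with independent element-wise Student's t noise.

    Missing entries of Z are encoded as NaN and simply dropped from the
    product over the observable index set O.
    """
    M = W @ T + mu[:, None]                 # model mean of each element z_dn
    obs = ~np.isnan(Z)                      # observation indicator set O
    scale = np.broadcast_to((1.0 / np.sqrt(tau))[:, None], Z.shape)
    df = np.broadcast_to(v[:, None], Z.shape)
    return stats.t.logpdf(Z[obs], df=df[obs], loc=M[obs], scale=scale[obs]).sum()

rng = np.random.default_rng(0)
Z = rng.normal(size=(3, 4)); Z[1, 2] = np.nan      # one missing element
W = rng.normal(size=(3, 2)); T = rng.normal(size=(2, 4))
print(masked_t_loglik(Z, W, T, np.zeros(3), np.ones(3), 4.0 * np.ones(3)))
```

Because each element contributes an independent t-density term, removing an element from Z changes the log-likelihood by exactly that element's term, which is what makes ignoring missing elements well-defined.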
As an optimal scheme of the Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data, the model parameter learning by Bayesian variational inference specifically comprises:

step 2.1, Bayesian variational inference;

step 2.2, posterior distribution parameter learning by Bayesian variational inference.
As the optimal scheme of the Bayes semi-supervised robust PPLS soft measurement method based on incomplete data, the Bayes variational inference specifically comprises the following steps;
the bayesian variational reasoning principle is that given a training data set Ω and the model parameters to be optimized and hidden variables Δ ═ { T, μ, W, τ, U, α, β, Θ }, the true posterior distribution p (Δ | Ω) and the arbitrary form probability distribution q (Δ) about Δ can be decomposed into log-likelihood functions lnp (Ω)
lnp(Ω)=L(q)+KL(q||p)
Where l (q) ═ q (Δ) ln (p (Δ, Ω)/q (Δ)) d Δ, KL (q | | p) ═ q (Δ) ln (q (Δ)/p (Δ | Ω)) d Δ represents Kullback-leibler (KL) divergence. Since KL (q | | p) ≧ 0, lnp (Ω) ≧ l (q), which maximizes lnp (Ω) is equivalent to maximizing l (q), the approximation of q (Δ) to p (Δ | Ω) is achieved by optimizing q (Δ) such that KL (q | | p) is 0, which takes the form of an optimization problem of
Figure BDA0002864125690000073
Suppose q (Δ) can be decomposed into products of respective optimized parameter distributions
Figure BDA0002864125690000074
Obtaining an optimal approximate distribution q byii) I.e. by
Figure BDA0002864125690000081
Wherein,-ΔiRemoving Δ for ΔiThe latter set of optimization parameters.
Based on Bayesian variational inference theory, the joint posterior probability distribution function is obtained from the model probability structure diagram and the likelihood function as

Figure BDA0002864125690000082

where, by the principle of conditional independence, p(W, μ, τ | α, β) = p(W | τ, α) p(μ | τ, β) p(τ);

According to mean-field theory, the posterior probability of the hidden variables can be approximated as

p(W, T, μ, τ, U, α, β | Z, X_u) ≈ q(T) q(μ, W | τ) q(U) q(α) q(β) q(τ)   (14)
Let Δ = {T, μ, W, τ, U, α, β, Θ} and q(Δ) = q(T) q(μ, W | τ) q(U) q(α) q(β) q(τ). By the Bayesian variational principle, the variational lower bound of the log-likelihood function is defined as

L(q(Δ), Θ) = ⟨ln p(Δ, Z, X_u | Θ)⟩_Δ - ⟨ln q(Δ)⟩_Δ + const   (15)

where const is a constant term independent of Δ. Solving for L(q(Δ), Θ) amounts to deriving each of the variational distributions separately.
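The decomposition ln p(Ω) = L(q) + KL(q||p) and the bound L(q) ≤ ln p(Ω) can be verified on a toy conjugate model in which the exact posterior is known, so that choosing q equal to it makes KL = 0 (a self-contained numerical check, not part of the claimed method):

```python
import numpy as np

# Toy conjugate model: mu ~ N(0, 1), x_i | mu ~ N(mu, 1), i = 1..n.
rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=8)
n, xbar = len(x), x.mean()

# Exact log evidence: x ~ N(0, I + 11^T), det(I + 11^T) = 1 + n,
# (I + 11^T)^{-1} = I - 11^T / (1 + n).
quad = x @ x - x.sum() ** 2 / (1 + n)
ln_evidence = -0.5 * (n * np.log(2 * np.pi) + np.log(1 + n) + quad)

def elbo(m, s2):
    """L(q) for q(mu) = N(m, s2): E_q[ln p(mu, x)] plus the entropy of q."""
    e_prior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + s2)
    e_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * (((x - m) ** 2).sum() + n * s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return e_prior + e_lik + entropy

# With q equal to the exact posterior N(n*xbar/(n+1), 1/(n+1)), KL = 0,
# so L(q) equals ln p(x); any other q stays strictly below the evidence.
m_post, s2_post = n * xbar / (n + 1), 1.0 / (n + 1)
print(ln_evidence, elbo(m_post, s2_post), elbo(m_post, 2 * s2_post))
```

The first two printed values coincide, while the deliberately widened q gives a strictly smaller lower bound, which is exactly why maximizing L(q) drives q(Δ) toward the true posterior.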
As an optimal scheme of the Bayes semi-supervised robust PPLS soft measurement method based on incomplete data, the posterior distribution parameter learning of Bayes variational inference specifically comprises the following steps;
considering q (μ, W | τ, α, β), note that
Figure BDA0002864125690000083
Figure BDA0002864125690000085
Taking the derivative of the variational lower bound L(q(Δ), Θ) with respect to q(μ, W | τ) yields the following relevant terms:
Figure BDA0002864125690000084
Figure BDA0002864125690000091
where -W denotes the remaining optimization parameters of Δ excluding W, and ⟨τ_d⟩ is the expectation of the random variable τ_d; rearranging the above formula gives
Figure BDA0002864125690000092
Has a mean and a variance of
Figure BDA0002864125690000093
Figure BDA0002864125690000094
When d = D+1, D+2, …, D+E, O'_d is the empty set. By the same method one computes
Figure BDA0002864125690000095
The mean value and variance are updated in the form of
Figure BDA0002864125690000096
Figure BDA0002864125690000097
Since q (W | τ, α) and q (μ | τ, β) follow a normal distribution, then
Figure BDA0002864125690000098
According to equations (16) - (20), the corresponding mean and variance of equation (21) are expressed as follows
Figure BDA0002864125690000099
Figure BDA00028641256900000910
Obtaining
Figure BDA00028641256900000911
Then the expectations related to
Figure BDA00028641256900000912
have the following form
Figure BDA00028641256900000913
Figure BDA0002864125690000101
Figure BDA0002864125690000102
Figure BDA0002864125690000103
Figure BDA0002864125690000104
Figure BDA0002864125690000105
where
Figure BDA0002864125690000106
is the covariance matrix corresponding to
Figure BDA0002864125690000107
,
Figure BDA0002864125690000108
is the variance corresponding to μ_d, and
Figure BDA0002864125690000109
denotes the covariance vector between
Figure BDA00028641256900001010
and μ_d; all of these can be obtained directly from the covariance matrix Σ_d;
taking into account posterior distribution
Figure BDA00028641256900001011
Derived from q (tau)
Figure BDA00028641256900001012
Rearranging the above formula gives
Figure BDA00028641256900001013
Figure BDA00028641256900001014
Figure BDA00028641256900001015
Here, N_d is the sum of the number of labeled samples x_dn and the number of unlabeled samples x_dn for a given d, N_d' is the number of labeled sample outputs y_dn for a given d, O_d = {n | dn ∈ O} is the index set n of all observable labeled samples z_dn, and similarly O'_d = {n | dn ∈ O'} is the index set n of all observable unlabeled samples x_dn;
taking into account posterior distribution
Figure BDA00028641256900001016
The updated form is
Figure BDA0002864125690000111
The parametric form of q (T) can be obtained according to the above formula
Figure BDA0002864125690000112
Figure BDA0002864125690000113
Wherein, O:n={d|dn∈O},O':n={d|dn∈O'};
Note the posterior distribution
Figure BDA0002864125690000114
the derivation for q(α) gives
Figure BDA0002864125690000115
Rearranging the above formula easily yields the following parametric form of q(α)
Figure BDA0002864125690000116
Figure BDA0002864125690000117
Similarly, for the posterior distribution
Figure BDA0002864125690000118
Derived from q (beta)
Figure BDA0002864125690000119
The parameters are computed as
Figure BDA0002864125690000121
Figure BDA0002864125690000122
For the independent Student's t-distribution model, the posterior distribution
Figure BDA0002864125690000123
can be derived as
Figure BDA0002864125690000124
The distribution parameter of q (U) can be obtained from the above formula
Figure BDA0002864125690000125
Figure BDA0002864125690000126
where
Figure BDA0002864125690000127
is calculated in the form of
Figure BDA0002864125690000128
Figure BDA0002864125690000129
Is calculated in the form of
Figure BDA00028641256900001210
For a multidimensional Student's t-distribution noise model, one simply sets the parameters v_d and u_dn to v_d = v and u_dn = u_n;
The hyperparameter
Figure BDA00028641256900001211
can be obtained by maximizing the following optimization problem
Figure BDA0002864125690000131
v_d (d = D+1, D+2, …, D+E) can be obtained by maximizing the following optimization problem
Figure BDA0002864125690000132
where
Figure BDA0002864125690000133
is the Student's t-distribution in dimension d;
after modeling, for unlabeled data xnWith the corresponding hidden variable tnThen the predicted output is
Figure BDA0002864125690000134
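The prediction step can be sketched for the Gaussian limit of the model (all u_dn = 1, fully observed x), where the posterior mean of the shared hidden variable is ⟨t_n⟩ = (I + τ PᵀP)⁻¹ τ Pᵀ(x_n − μ_x). A minimal sketch with illustrative sizes and parameters (a noise-free sanity check, not the patent's full inference):

```python
import numpy as np

# Sizes, parameters and the precision tau are illustrative assumptions.
rng = np.random.default_rng(2)
D, E, M, tau = 6, 2, 3, 1e6            # high tau ~ nearly noise-free input

P = rng.normal(size=(D, M)); C = rng.normal(size=(E, M))
mu_x = rng.normal(size=D); mu_y = rng.normal(size=E)

t_true = rng.normal(size=M)
x = P @ t_true + mu_x                  # noise-free input for the sanity check

# Posterior mean of the shared hidden variable in the Gaussian limit:
# <t_n> = (I + tau * P^T P)^{-1} * tau * P^T (x - mu_x)
t_hat = np.linalg.solve(np.eye(M) + tau * P.T @ P, tau * P.T @ (x - mu_x))

y_hat = C @ t_hat + mu_y               # predicted output: C <t_n> + mu_y
print(np.abs(y_hat - (C @ t_true + mu_y)).max())
```

With high input precision the posterior mean recovers the generating hidden variable, so the predicted output matches the noise-free model output, which is the soft-measurement use of the trained model.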
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
the invention discloses a Bayesian semi-supervised robust probability PLS (BRPPLS) fault monitoring method for incomplete data, which is different from the existing multivariate student t-distribution PPLS-based modeling method, wherein independent student t-distribution is used for modeling noise of each data vector, and an adjustable robust degree of freedom parameter is included in the t-distribution, so that the flexibility of modeling is improved; solving the estimated posterior distribution by using a Bayesian variational inference method; the model not only makes full use of the marked data and the unmarked data, but also reconstructs original data by using the pollution-free data elements as much as possible, reduces the influence of the polluted elements on the reconstructed data, solves the problems of data loss and wilderness, has good robustness and high precision, and is beneficial to improving the monitoring performance of the industrial process and the understanding and cognition level of the process operation.
Drawings
FIG. 1 is a Bayesian semi-supervised robust PPLS probability structure diagram of incomplete data of the present invention;
fig. 2 is a flow chart of a soft measurement framework proposed by the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data comprises a Bayesian semi-supervised robust PPLS model of incomplete data and model parameter learning by Bayesian variational inference, and specifically comprises the following steps:
step 1, initializing prior distribution parameters and hidden variable distribution hyperparameters;
step 2, determining initial model parameters and hidden variable parameters from the training data set by using the PPLS (probabilistic partial least squares) method based on the EM (expectation-maximization) algorithm;
step 3, calculating the posterior distribution q(Δ) of the hidden variables and updating the distribution parameters;
step 4, solving an optimization problem and solving an optimal prior hyperparameter v;
step 5, calculating the variational lower bound of the log-likelihood function according to the approximate posterior distribution;
step 6, judging whether a convergence condition is met, if so, predicting quality data corresponding to the unknown data sample to realize quality soft measurement; otherwise, returning to the step 3.
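The iteration in steps 1-6 can be sketched as a generic alternating loop. This is a hedged illustration only: the function arguments are placeholders standing in for the concrete update formulas of equations (17)-(45), not the patent's implementation.

```python
def train_loop(init, update_q, update_v, lower_bound, thr=1e-5, max_iter=500):
    """Skeleton of steps 1-6: alternate posterior updates (step 3) and
    hyper-parameter optimization (step 4), evaluate the variational lower
    bound (step 5), and stop when it changes by less than thr (step 6)."""
    state = init()                       # steps 1-2: priors + EM-PPLS initialization
    L_old = float("-inf")
    for it in range(max_iter):
        state = update_q(state)          # step 3: update q(Delta) parameters
        state = update_v(state)          # step 4: optimal prior hyper-parameter v
        L_new = lower_bound(state)       # step 5: variational lower bound
        if abs(L_new - L_old) < thr:     # step 6: convergence check
            return state, L_new, it + 1
        L_old = L_new
    return state, L_new, max_iter

# toy demo: the "state" halves each pass and the bound L = -state rises to 0
state, L, iters = train_loop(lambda: 1.0, lambda s: s / 2, lambda s: s, lambda s: -s)
```

Any concrete model supplies its own `init`, `update_q`, `update_v` and `lower_bound`; the loop itself only encodes the control flow of the flow chart in FIG. 2.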
As shown in fig. 1, the Bayesian semi-supervised robust PPLS probabilistic structure diagram for incomplete data: in FIG. 1, yn is the output of the labeled data and xn is the input of both the labeled and unlabeled data. tn is the hidden variable shared by yn and xn; it not only eliminates redundancy and noise in the input variables, but can also express the essential link between changes in xn and changes in yn. The hidden variables α, β and τ are the probability distribution parameters of the model parameters P, C and the mean μ; they limit oscillation of the model parameters and improve the convergence rate of model training, where μ comprises the means μx and μy. Introducing the hidden variable un converts the student t-distribution into a Gaussian distribution, which facilitates solving the model optimization problem. The parameter v is the Gamma distribution parameter of the hidden variable un and improves the descriptive capability for the input and output variables. An arrow in the figure indicates that the variable at its head depends on the variable at its tail; for example, the input variable xn depends on the parameters P, tn, εx and μx, and the variable C depends on the hidden variables α and τ. Because the student t-distribution is used to model the noise, the non-Gaussian noise present in actual systems is well described and the modeling precision is improved. The model fully utilizes the information of both labeled and unlabeled samples, is suitable for situations where labeled samples are far fewer than unlabeled samples, and enlarges the application range of the method. Missing elements in the data vectors are ignored, and the normal elements of each data vector are used to reconstruct the missing data, improving the robustness and performance of the model.
The notation for the prior probability parameters is simplified in the figure; only the training data, model parameters, hidden variables and parameters to be updated are labeled. Both the complete likelihood distribution function and the form of the hidden variable posterior distribution are obtained from fig. 1.
Fig. 2 shows the implementation flow chart of the present invention. The selection of the initial prior distribution parameters and hidden variable distribution hyper-parameters is discussed above: the prior distribution parameters are determined mainly from experience, and the initial model parameters and hidden variable parameters are determined from the training data set using the EM algorithm-based PPLS method proposed in reference [19]. This improves the convergence rate of the optimization algorithm, makes it less likely to become trapped in local extrema, and improves model performance.
The posterior distribution parameters of the hidden variables are updated according to the parameter update formulas of equations (17)-(42). When updating, each parameter uses the most recently updated values of the other parameters.
Solving the optimization problems shown in equations (43) and (44) updates the student t-distribution parameter v values corresponding to the labeled and unlabeled samples; a MATLAB optimization function can be used for the solution.
The convergence condition of the model is that the change of the likelihood function L(q(Δ), Θ) shown in equation (15) is smaller than a predetermined threshold thr, that is
|L(q(Δ^(t+1)), Θ^(t+1)) - L(q(Δ^(t)), Θ^(t))| < thr
where L(q(Δ^(t)), Θ^(t)) and L(q(Δ^(t+1)), Θ^(t+1)) are the values at the t-th and (t+1)-th iterations respectively, and the threshold thr is generally set to 10^-5.
After model training is finished, the quality data corresponding to unknown data samples are predicted using equation (45), realizing soft measurement of quality.
1 Bayesian semi-supervised robust PPLS model for incomplete data
Given the output of the labeled samples
Figure BDA0002864125690000151
And input
Figure BDA0002864125690000152
and the unlabeled samples
Figure BDA0002864125690000153
satisfying N = NL + Nu, where NL is the number of labeled samples; the observation noise and the process noise obey independent t-distributions. The shared hidden variables between the PLS input data and the output data are
Figure BDA0002864125690000154
The probabilistic PLS model can then be expressed as
Figure BDA0002864125690000155
where
Figure BDA0002864125690000156
and
Figure BDA0002864125690000157
are weight matrices,
Figure BDA0002864125690000158
is the shared hidden variable; μx and μy are the mean vectors of the process variables and observation variables respectively; x and y are column vectors of dimensions D and E; the process data noise
Figure BDA0002864125690000159
vx=[vx,1, vx,2, …, vx,D], τx=[τx,1, τx,2, …, τx,D]; the observation data noise εy also obeys an independent t-distribution of the same form. The student t-distribution has the following form
Figure BDA00028641256900001510
where
Figure BDA00028641256900001511
is the Gamma function,
Figure BDA00028641256900001512
represents a Gamma distribution with shape parameter a' and inverse scale parameter b', and v' is the degree of freedom. As can be seen from the above equation, the t-distribution can be interpreted as an infinite mixture of Gaussian distributions. For the hidden variable t, the priors of the model parameters P, C, μ and the noise level τ are similar to those of PPCA, in the form
p(t)=N(t|0,IM) (3)
Figure BDA0002864125690000161
Figure BDA0002864125690000162
Figure BDA0002864125690000163
Where τ denotes τx and τy, μ denotes μx and μy, and β denotes βx and βy. The purpose of applying priors to P and C is to reduce the risk of over-fitting and to help automatically determine the dimension of the subspace. The parameter τ is used in the priors of P, C and μ; a discussion of these priors is given in reference [2]. More complex priors could of course be used for P, C and μ, but we use simple priors for the model parameters since the focus here is on the distribution of the noise. The prior of the noise parameter τ is a product of mutually independent Gamma distributions, i.e.
p(τ)=∏Ga(τd|aτ,bτ)
The priors of the parameters α = [αx, αy] and β are
Figure BDA0002864125690000164
p(β)=Ga(β|aβ,bβ)
Figure BDA0002864125690000165
In the simulation, to obtain broader distributions, the parameters are set to aτ = bτ = aα = bα = aβ = bβ = 10^-5. For isotropic noise, one can set τm = τ for each m.
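The remark below equation (2) that the student t-distribution is an infinite mixture of Gaussians can be checked numerically: drawing a precision scale u ~ Ga(v/2, v/2) and then x ~ N(0, 1/(u·τ)) yields t-distributed samples. The values v = 6 and τ = 1 below are illustrative assumptions, not part of the patent's procedure.

```python
import random
import statistics

random.seed(0)
v, tau = 6.0, 1.0                       # assumed degrees of freedom and precision
samples = []
for _ in range(200_000):
    # u ~ Gamma(shape=v/2, rate=v/2); gammavariate takes (shape, scale)
    u = random.gammavariate(v / 2, 2.0 / v)
    # x | u ~ N(0, 1/(u*tau)); marginalizing over u gives a student t
    samples.append(random.gauss(0.0, (u * tau) ** -0.5))

emp_var = statistics.pvariance(samples)
theory_var = v / ((v - 2) * tau)        # t-distribution variance, valid for v > 2
```

Up to Monte Carlo error, the empirical variance agrees with the t-distribution value v/((v-2)τ), which is what justifies the hierarchical Gaussian construction used for the likelihood in equation (9).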
Let
Figure BDA0002864125690000166
Figure BDA0002864125690000167
Figure BDA0002864125690000168
As can be seen from the above definition,
Figure BDA0002864125690000169
if the elements of zn are observed independently, so that the contaminated elements zdn are mutually independent, then an independent student t-distribution can be used to model each element of εn. The likelihood function for the labeled samples is
Figure BDA00028641256900001610
Wherein O represents the set of indices dn for which zdn is observable, wd is the d-th row vector of W, d = 1, 2, …, D+E, n = 1, 2, …, NL. For the unlabeled samples
Figure BDA0002864125690000175
the corresponding likelihood function is
Figure BDA0002864125690000171
Wherein O' represents the set of indices d'n' for which the elements of the unlabeled samples Xu are observable; wd' is equivalent to pd', d' = 1, 2, …, D, n' = NL+1, NL+2, …, N. μ1:D denotes the vector formed by the 1st through D-th elements of μ (i.e., μx).
By introducing the hidden variables U and U', the student t-distribution can be constructed hierarchically from Gaussian distributions; the likelihood function of all labeled and unlabeled samples is then
Figure BDA0002864125690000172
Wherein W1:D,: denotes the matrix consisting of the row vectors of rows 1 to D of the matrix (in fact, the matrix P). Given the observed data, Bayesian inference is based on estimating the posterior distribution of the unknown variables. We use the variational Bayesian method to deal with the intractability of the joint posterior distribution.
2 Model parameter learning by Bayesian variational inference
2.1 Bayesian variational inference
The principle of Bayesian variational inference is as follows: given a training data set Ω, the model parameters and hidden variables to be optimized Δ = {T, μ, W, τ, U, α, β, Θ}, the true posterior distribution p(Δ|Ω), and a probability distribution q(Δ) of arbitrary form over Δ, the log-likelihood function lnp(Ω) can be decomposed as
lnp(Ω)=L(q)+KL(q||p) (10)
Where L(q) = ∫q(Δ)ln(p(Δ, Ω)/q(Δ))dΔ and KL(q||p) = ∫q(Δ)ln(q(Δ)/p(Δ|Ω))dΔ is the Kullback-Leibler (KL) divergence. Since KL(q||p) ≥ 0, lnp(Ω) ≥ L(q), so maximizing lnp(Ω) is equivalent to maximizing L(q); q(Δ) is made to approximate p(Δ|Ω) by optimizing q(Δ) so that KL(q||p) approaches 0, which takes the form of the optimization problem
Figure BDA0002864125690000173
It is assumed that q(Δ) can be decomposed into a product of the individual optimized parameter distributions
Figure BDA0002864125690000174
The optimal approximate distribution qi(Δi) is obtained by
Figure BDA0002864125690000181
Here, Δ-i denotes the set of optimization parameters Δ with Δi removed.
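The decomposition in equation (10) can be verified on a toy conjugate-Gaussian model (prior θ ~ N(0,1), likelihood x|θ ~ N(θ,1)), where the evidence, the posterior, L(q) and KL(q||p) are all available in closed form. This is purely an illustration of the identity, not the patent's model:

```python
import math

# Toy model: theta ~ N(0,1), x | theta ~ N(theta,1), variational q = N(m, s2).
def elbo(m, s2, x):
    e_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    e_prior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return e_lik + e_prior + entropy       # L(q) = <ln p(x,theta)> - <ln q>

def kl_to_posterior(m, s2, x):
    mu0, v0 = x / 2, 0.5                   # exact posterior is N(x/2, 1/2)
    return 0.5 * math.log(v0 / s2) + (s2 + (m - mu0) ** 2) / (2 * v0) - 0.5

x = 1.7
log_evidence = -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4   # x ~ N(0, 2)
for m, s2 in [(0.0, 1.0), (0.85, 0.5), (-2.0, 3.0)]:
    gap = elbo(m, s2, x) + kl_to_posterior(m, s2, x) - log_evidence
    assert abs(gap) < 1e-9                 # ln p(x) = L(q) + KL(q||p) for any q
```

Because KL(q||p) ≥ 0 and lnp(Ω) is fixed, maximizing L(q) drives q toward the true posterior, which is exactly how equations (11)-(13) are used.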
Based on Bayesian variational inference theory, according to the model probability structure diagram shown in FIG. 1 and the likelihood function shown in equation (9), the joint posterior probability distribution function is
Figure BDA0002864125690000182
Here, by conditional independence, p(W, μ, τ|α, β) = p(W|τ, α)p(μ|τ, β)p(τ).
According to mean-field theory, the posterior probability of the hidden variables can be factorized as
p(W,T,μ,τ,U,α,β)≈q(T)q(μ,W|τ)q(U)q(α)q(β)q(τ) (14)
Let Δ = {T, μ, W, τ, U, α, β, Θ} and q(Δ) = q(T)q(μ, W|τ)q(U)q(α)q(β)q(τ); according to the Bayesian variational principle, the variational lower bound of the log-likelihood function is
L(q(Δ),Θ)=〈lnp(Δ,Z,Xu|Θ)〉Δ-〈lnq(Δ)〉Δ+const (15)
Here const is an independent term that does not depend on Δ and is treated as a constant. Solving for L(q(Δ), Θ) is equivalent to deriving each of the variational distributions separately.
2.2 Posterior distribution parameter learning by Bayesian variational inference
Considering q (μ, W | τ, α, β), note that
Figure BDA0002864125690000183
Figure BDA0002864125690000184
Taking the derivative of the variational lower bound L(q(Δ), Θ) with respect to q(μ, W|τ) yields the following related terms:
Figure BDA0002864125690000185
Figure BDA0002864125690000191
here, -W denotes the remaining optimization parameters of Δ excluding W, and 〈τd〉 is the expectation of the random variable τd. Rearranging the above formula yields
Figure BDA0002864125690000192
whose mean and variance are
Figure BDA0002864125690000193
Figure BDA0002864125690000194
It is noted that O'd is the empty set when d = D+1, D+2, …, D+E. The same method yields
Figure BDA0002864125690000195
The mean value and variance are updated in the form of
Figure BDA0002864125690000196
Figure BDA0002864125690000197
Since q (W | τ, α) and q (μ | τ, β) follow a normal distribution, then
Figure BDA0002864125690000198
According to equations (16) - (20), the corresponding mean and variance of equation (21) are expressed as follows
Figure BDA0002864125690000199
Figure BDA00028641256900001910
Obtaining
Figure BDA00028641256900001911
Then, with
Figure BDA00028641256900001912
and μd, the related expectations have the following forms
Figure BDA0002864125690000201
Figure BDA0002864125690000202
Figure BDA0002864125690000203
Figure BDA0002864125690000204
Figure BDA0002864125690000205
Figure BDA0002864125690000206
where
Figure BDA0002864125690000207
is the covariance matrix corresponding to
Figure BDA0002864125690000208
, and
Figure BDA0002864125690000209
is the variance corresponding to μd;
Figure BDA00028641256900002010
denotes the covariance vector between
Figure BDA00028641256900002011
and μd. All of these can be obtained directly from the covariance matrix Σd.
Considering the posterior distribution
Figure BDA00028641256900002012
and taking the derivative with respect to q(τ) yields
Figure BDA00028641256900002013
Further rearranging the above formula yields
Figure BDA00028641256900002014
Figure BDA00028641256900002015
Figure BDA00028641256900002016
Here, Nd is the sum of the number of labeled samples xdn and unlabeled samples xdn for a given d, Nd' is the number of labeled sample outputs ydn for a given d, Od = {n|dn ∈ O} denotes the set of indices n of all observable labeled samples zdn, and similarly O'd = {n|dn ∈ O'} denotes the set of indices n of all observable unlabeled samples xdn.
Considering the posterior distribution
Figure BDA0002864125690000211
The updated form is
Figure BDA0002864125690000212
The parametric form of q (T) can be obtained according to the above formula
Figure BDA0002864125690000213
Figure BDA0002864125690000214
Here, O:n={d|dn∈O},O':n={d|dn∈O'}。
Considering the posterior distribution
Figure BDA0002864125690000215
and taking the derivative with respect to q(α) yields
Figure BDA0002864125690000216
Rearranging the above formula, the following parameter form of q(α) is easily obtained
Figure BDA0002864125690000217
Figure BDA0002864125690000218
Similarly, considering the posterior distribution
Figure BDA0002864125690000219
and taking the derivative with respect to q(β) yields
Figure BDA0002864125690000221
The parameters are calculated as
Figure BDA0002864125690000222
Figure BDA0002864125690000223
For the independent student t-distribution model, the posterior distribution
Figure BDA0002864125690000224
can be differentiated to obtain
Figure BDA0002864125690000225
The distribution parameter of q (U) can be obtained from the above formula
Figure BDA0002864125690000226
Figure BDA0002864125690000227
where
Figure BDA0002864125690000228
is calculated in the form
Figure BDA0002864125690000229
Figure BDA00028641256900002210
Is calculated in the form of
Figure BDA0002864125690000231
For a multidimensional student t-distribution noise model, one simply sets the parameters vd = v and udn = un.
Hyper-parameter
Figure BDA0002864125690000232
can be obtained by solving the following maximization problem
Figure BDA0002864125690000233
vd (d = D+1, D+2, …, D+E) can be obtained by solving the following maximization problem
Figure BDA0002864125690000234
where
Figure BDA0002864125690000235
is the student t-distribution of the d-th dimension.
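The degree-of-freedom maximizations of equations (43)-(44) are one-dimensional problems; where the text uses a MATLAB optimization function, the same idea can be sketched in Python with a crude grid search over the t log-likelihood. The data, grid, and search method here are illustrative stand-ins, not the patent's procedure:

```python
import math
import random

def t_loglik(xs, v):
    # log-likelihood of a standard student t with v degrees of freedom
    c = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
         - 0.5 * math.log(v * math.pi))
    return sum(c - (v + 1) / 2 * math.log(1 + x * x / v) for x in xs)

random.seed(1)
true_v = 5.0
xs = []
for _ in range(5000):       # t samples drawn via the Gamma-Gaussian hierarchy
    u = random.gammavariate(true_v / 2, 2.0 / true_v)
    xs.append(random.gauss(0.0, u ** -0.5))

grid = [0.5 + 0.25 * k for k in range(1, 200)]   # candidate v values
v_hat = max(grid, key=lambda v: t_loglik(xs, v))
```

With 5000 samples the maximizer lands near the generating value v = 5; in the patent's setting, the same one-dimensional search is performed per dimension d.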
After modeling, for unlabeled data xn with corresponding hidden variable tn, the predicted output is
Figure BDA0002864125690000236
In implementation, the problem of selecting the hidden space dimension must be solved first. One approach is to compute the L2-norm of each column of the projection matrix, select the projection dimensions with larger column norms, and ignore the dimensions with smaller norms. The other approach is to roughly select the projection dimension by the norm method and then refine the choice by cross-validation.
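The column-norm rule described above can be sketched as follows; the 10% cutoff and the toy matrix are illustrative assumptions, since the text itself leaves the threshold and the cross-validation refinement open:

```python
import math

def select_dims(P, ratio=0.1):
    """Keep latent dimensions whose column L2-norm in the projection
    matrix P is at least `ratio` times the largest column norm."""
    norms = [math.sqrt(sum(row[j] ** 2 for row in P)) for j in range(len(P[0]))]
    top = max(norms)
    return [j for j, nrm in enumerate(norms) if nrm >= ratio * top]

P = [[0.9, 0.01, 0.4],      # toy 3x3 projection matrix; column 1 is negligible
     [0.8, 0.02, 0.3],
     [0.7, 0.01, 0.5]]
kept = select_dims(P)
```

Here `kept` retains columns 0 and 2 and drops the near-zero column, after which cross-validation can refine the choice as described.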
Additionally, initial values of the model parameters need to be determined. The initial values of C, P, μy, μx and the hidden variables can be obtained by applying the PPLS of reference [19] to the output quality data and the input data respectively. The advantage is that model learning under the Bayesian variational inference iterative framework is more stable and convergence is accelerated.
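Under the linear-Gaussian structure of equation (1), the prediction step has the standard PPLS/PPCA form ŷ = C·E[t|x] + μy with E[t|x] = (I + PᵀΛP)⁻¹PᵀΛ(x − μx), where Λ is the diagonal noise-precision matrix. This is a hedged sketch assuming that standard form; the patent's exact equation (45) is given as an image in the original:

```python
import numpy as np

rng = np.random.default_rng(0)
D, E, M = 4, 2, 2                       # input dim, output dim, latent dim
P = rng.normal(size=(D, M))             # input loading matrix
C = rng.normal(size=(E, M))             # output loading matrix
mu_x, mu_y = rng.normal(size=D), rng.normal(size=E)
lam = 1e6 * np.ones(D)                  # noise precisions (near-noiseless check)

t_true = rng.normal(size=M)
x = P @ t_true + mu_x                   # a noise-free input sample

# E[t|x] for the linear-Gaussian model, then the soft-sensor prediction
L = np.diag(lam)
t_post = np.linalg.solve(np.eye(M) + P.T @ L @ P, P.T @ L @ (x - mu_x))
y_hat = C @ t_post + mu_y
```

With large noise precisions, t_post recovers t_true and y_hat approaches C·t_true + μy, confirming the sketch behaves as the generative model in equation (1) predicts.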
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention. While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (6)

1. A Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data, characterized by comprising a Bayesian semi-supervised robust PPLS model for incomplete data and model parameter learning by Bayesian variational inference; the method specifically comprises the following steps;
step 1, initializing prior distribution parameters and hidden variable distribution hyperparameters;
step 2, determining initial model parameters and hidden variable parameters from the training data set by using the PPLS (probabilistic partial least squares) method based on the EM (expectation-maximization) algorithm;
step 3, calculating the posterior distribution q(Δ) of the hidden variables according to the Bayesian variational inference method and updating the model parameters and hidden variables;
step 4, solving an optimization problem and solving an optimal prior hyperparameter v;
step 5, calculating the variational lower bound of the log-likelihood function according to the approximate posterior distribution;
step 6, judging whether a convergence condition is met, if so, predicting quality data corresponding to the unknown data sample, and realizing soft measurement of the quality index; otherwise, returning to the step 3.
2. The Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data as recited in claim 1, wherein: in step 6, the model convergence condition is that the change of the likelihood function L (q (Δ), Θ) is smaller than the predetermined threshold thr, i.e. the model convergence condition is
|L(q(Δ^(t+1)), Θ^(t+1)) - L(q(Δ^(t)), Θ^(t))| < thr
wherein L(q(Δ^(t)), Θ^(t)) and L(q(Δ^(t+1)), Θ^(t+1)) respectively represent the values at the t-th and (t+1)-th iterations, and the threshold thr is set to 10^-5.
3. The Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data as recited in claim 1, wherein: the Bayes semi-supervised robust PPLS model of incomplete data specifically comprises the following steps;
given the output of the labeled samples
Figure FDA0002864125680000011
And input
Figure FDA0002864125680000012
and the unlabeled samples
Figure FDA0002864125680000013
and satisfying N = NL + Nu; the observation noise and the process noise both obey independent t-distributions, where NL and Nu respectively represent the numbers of labeled and unlabeled samples, and D and E respectively represent the input and output dimensions of the samples; let the shared hidden variable between the PLS input data and output data be
Figure FDA0002864125680000014
The probabilistic PLS model can be expressed as
Figure FDA0002864125680000015
wherein
Figure FDA0002864125680000016
and
Figure FDA0002864125680000017
are weight matrices,
Figure FDA0002864125680000018
is the shared hidden variable; μx and μy are the mean vectors of the process variables and observation variables respectively; x and y are column vectors of dimensions D and E respectively, and the process data noise obeys the t-distribution
Figure FDA0002864125680000021
vx=[vx,1, vx,2, …, vx,D], τx=[τx,1, τx,2, …, τx,D]; the observation data noise εy also obeys an independent t-distribution of the same form, wherein the student t-distribution has the form
Figure FDA0002864125680000022
wherein
Figure FDA0002864125680000023
is the Gamma function,
Figure FDA0002864125680000024
represents a Gamma distribution with shape parameter a' and inverse scale parameter b', v' being the degree of freedom; the t-distribution is interpreted as an infinite mixture of Gaussian distributions, and u' is the hidden variable controlling the noise level of the variable; for the hidden variable t, the priors of the model parameters P, C, μ and the noise level τ are similar to those of PPCA, in the form
p(t)=N(t|0,IM) (3)
Figure FDA0002864125680000025
Figure FDA0002864125680000026
Figure FDA0002864125680000027
wherein
Figure FDA0002864125680000028
denotes the column vector composed of τx and τy,
Figure FDA0002864125680000029
represents the vector composed of μx and μy, β denotes βx and βy, and the superscript T denotes the transpose of a matrix or vector; the parameter τ is used in the priors of P, C and μ; the prior of the noise level parameter τ is a Gamma distribution, with the distributions of the individual variables mutually independent, i.e.
Figure FDA00028641256800000210
The priors of the parameters α = [αx, αy] and β are
Figure FDA00028641256800000211
p(β)=Ga(β|aβ,bβ)
Figure FDA00028641256800000212
In the simulation, to obtain broader distributions, the hyper-parameters are set to aτ = bτ = aα = bα = aβ = bβ = 10^-5; for isotropic noise, all noise level parameters are the same, i.e., τm = τ can be set;
Let
Figure FDA00028641256800000213
denote the matrix composed of the labeled data,
Figure FDA00028641256800000214
denote a labeled data sample,
Figure FDA00028641256800000215
denote the matrix of all inputs from the labeled and unlabeled sample sets,
Figure FDA00028641256800000216
denote the weight matrix,
Figure FDA00028641256800000217
denote the noise of the labeled samples,
Figure FDA00028641256800000218
denote the mean vector,
Figure FDA00028641256800000219
denote the hyper-parameter vector of the noise distribution,
Figure FDA00028641256800000220
denote the noise level vectors of the process variables and observation variables,
Figure FDA0002864125680000031
representing shared hidden variables corresponding to marked samples and unmarked samples,
Figure FDA0002864125680000032
representing hidden variables corresponding to marked samples and unmarked samples,
Figure FDA0002864125680000033
for the n-th marked sample
Figure FDA0002864125680000034
If the elements of zn are observed independently, then its elements zdn are mutually independent; under this assumption, an independent student t-distribution can be used to model each element of the noise εn of zn, n = 1, 2, …, NL, d = 1, 2, …, D+E; the likelihood function for the labeled samples is
Figure FDA0002864125680000035
wherein O denotes the set of indices dn for which zdn is observable, wd is the d-th row vector of W, d = 1, 2, …, D+E, n = 1, 2, …, NL; for the unlabeled samples
Figure FDA0002864125680000036
the corresponding likelihood function is
Figure FDA0002864125680000037
wherein O' denotes the set of indices d'n' for which the elements of the unlabeled samples Xu are observable; note that here wd' is equivalent to pd', d' = 1, 2, …, D, n' = NL+1, NL+2, …, N, and μ1:D, the vector formed by the 1st through D-th elements of μ, is the vector μx
By introducing the hidden variable U, the student t-distribution can be constructed from Gaussian distributions; the likelihood function of all labeled and unlabeled samples is
Figure FDA0002864125680000038
wherein W1:D,:, the matrix formed by the row vectors of rows 1 to D of the matrix W, is the matrix P.
4. The Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data as recited in claim 1, wherein: model parameter learning of Bayesian variational inference specifically comprises the following steps;
step 2.1 Bayesian variational reasoning;
and 2.1, learning posterior distribution parameters of Bayesian variational inference.
5. The Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data as recited in claim 4, wherein: bayes variational reasoning, which specifically comprises the following steps;
the principle of Bayesian variational inference is as follows: given a training data set Ω, the model parameters and hidden variables to be optimized Δ = {T, μ, W, τ, U, α, β, Θ}, the true posterior distribution p(Δ|Ω), and a probability distribution q(Δ) of arbitrary form over Δ, the log-likelihood function ln p(Ω) can be decomposed as
ln p(Ω)=L(q)+KL(q||p) (10)
Wherein L(q) = ∫q(Δ)ln(p(Δ, Ω)/q(Δ))dΔ and KL(q||p) = ∫q(Δ)ln(q(Δ)/p(Δ|Ω))dΔ is the Kullback-Leibler (KL) divergence; since KL(q||p) ≥ 0, ln p(Ω) ≥ L(q), so maximizing ln p(Ω) is equivalent to maximizing L(q); q(Δ) is made to approximate p(Δ|Ω) by optimizing q(Δ) so that KL(q||p) approaches 0, in the form of the optimization problem
Figure FDA0002864125680000041
Suppose q (Δ) can be decomposed into products of respective optimized parameter distributions
Figure FDA0002864125680000042
The optimal approximate distribution qi(Δi) can be obtained by
Figure FDA0002864125680000043
wherein Δ-i denotes the set of optimization parameters Δ with Δi removed.
Based on Bayesian variational inference theory, according to the model probability structure diagram and the likelihood function, the joint posterior probability distribution function is
Figure FDA0002864125680000044
wherein, by conditional independence, p(W, μ, τ|α, β) = p(W|τ, α)p(μ|τ, β)p(τ);
from the mean field theory and the model probability map, the hidden variable posterior probability can be written as
p(W,T,μ,τ,U,α,β)≈q(T)q(μ,W|τ)q(U)q(α)q(β)q(τ) (14)
Let Δ ═ T, μ, W, τ, U, α, β, Θ }, q (Δ) ═ q (T) q (μ, W | τ) q (U) q (α) q (β) q (τ), and the lower bound of the variation of the log-likelihood function be, according to the bayes principle of variation
L(q(Δ),Θ)=<ln p(Δ,Z,Xu|Θ)>Δ-<lnq(Δ)>Δ+const (15)
Here const is an independent term that does not depend on Δ and is treated as a constant. Solving for L(q(Δ), Θ) is equivalent to deriving each of the variational distributions separately.
6. The Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data as recited in claim 4, wherein: the posterior distribution parameter learning of Bayes variational inference specifically comprises the following steps;
(1) taking into account the posterior distribution q (μ, W | τ, α, β) of the model parameters (μ, W), it is noted that
Figure FDA0002864125680000051
Figure FDA0002864125680000052
Taking the derivative of the variational lower bound L(q(Δ), Θ) with respect to q(μ, W|τ) yields the following related terms:
Figure FDA0002864125680000053
wherein -W denotes the remaining optimization parameters of Δ excluding W, Od = {n|dn ∈ O} denotes the set of indices n of all observable labeled samples zdn in the set O, O'd = {n|dn ∈ O'} denotes the set of indices n of all observable samples xdn in the set O', n = 1, 2, …, N, d = 1, 2, …, D+E, and <τd> denotes the expectation of the random variable τd; rearranging the above formula yields
Figure FDA0002864125680000054
whose mean and variance are
Figure FDA0002864125680000055
Figure FDA0002864125680000056
Note that O'd is the empty set when d = D+1, D+2, …, D+E. The same method yields
Figure FDA0002864125680000057
The mean value and variance are updated in the form of
Figure FDA0002864125680000058
Figure FDA0002864125680000059
Since q (W | τ, α) and q (μ | τ, β) follow a normal distribution, then
Figure FDA0002864125680000061
According to equations (16) - (20), the corresponding mean and variance of equation (21) are expressed as follows
Figure FDA0002864125680000062
Figure FDA0002864125680000063
Obtaining
Figure FDA0002864125680000064
Then, with
Figure FDA0002864125680000065
and μd, the related expectations have the following forms
Figure FDA0002864125680000066
Figure FDA0002864125680000067
Figure FDA0002864125680000068
Figure FDA0002864125680000069
Figure FDA00028641256800000610
Figure FDA00028641256800000611
wherein
Figure FDA00028641256800000612
is the covariance matrix corresponding to
Figure FDA00028641256800000613
, and
Figure FDA00028641256800000614
is the variance corresponding to μd;
Figure FDA00028641256800000615
denotes the covariance vector between
Figure FDA00028641256800000616
and μd; all of these can be obtained directly from the covariance matrix Σd;
(2) Considering the posterior distribution of the hidden variables
Figure FDA00028641256800000617
and taking the derivative with respect to q(τ) yields
Figure FDA00028641256800000618
Further rearranging the above formula yields
Figure FDA0002864125680000071
Figure FDA0002864125680000072
Figure FDA0002864125680000073
Here, Nd is the sum of the number of labeled samples xdn and unlabeled samples xdn for a given d, Nd' is the number of labeled sample outputs ydn for a given d, Od = {n|dn ∈ O} denotes the set of indices n of all observable labeled samples zdn, and similarly O'd = {n|dn ∈ O'} denotes the set of indices n of all observable unlabeled samples xdn;
(3) consideration of the posterior distribution of hidden variables T
Figure FDA0002864125680000074
The updated form is
Figure FDA0002864125680000075
The parametric form of q (T) can be obtained according to the above formula
Figure FDA0002864125680000076
Figure FDA0002864125680000077
Wherein, O:n={d|dn∈O},O':n={d|dn∈O'};
(4) Considering the posterior distribution of the hidden variable α
Figure FDA0002864125680000078
and taking the derivative with respect to q(α) yields
Figure FDA0002864125680000079
Rearranging the above formula, the following parameter form of q(α) is easily obtained
Figure FDA0002864125680000081
Figure FDA0002864125680000082
(5) Considering the posterior distribution of the hidden variable β
Figure FDA0002864125680000083
and taking the derivative with respect to q(β) yields
Figure FDA0002864125680000084
The parameters are calculated as
Figure FDA0002864125680000085
Figure FDA0002864125680000086
(6) For the independent student t-distribution model, the posterior distribution of the hidden variable U
Figure FDA0002864125680000087
can be differentiated to obtain
Figure FDA0002864125680000088
The distribution parameter of q (U) can be obtained from the above formula
Figure FDA0002864125680000089
Figure FDA00028641256800000810
wherein
Figure FDA00028641256800000811
is calculated in the form of
Figure FDA0002864125680000091
Figure FDA0002864125680000092
Is calculated in the form of
Figure FDA0002864125680000093
For a multidimensional student t-distribution noise model, one simply sets the parameters vd = v and udn = un;
(7) The hyper-parameter
Figure FDA0002864125680000094
can be obtained by solving the following maximization problem
Figure FDA0002864125680000095
v_d (d = D+1, D+2, …, D+E) can be obtained by solving the following maximization problem
Figure FDA0002864125680000096
where
Figure FDA0002864125680000097
is the Student's t-distribution in dimension d;
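The maximization over v_d (its objective is in the image formulas, not reproduced here) is, in standard robust-EM treatments of the Student's t-distribution, a one-dimensional problem whose stationarity condition has the form log(v/2) − ψ(v/2) + 1 + c = 0, where ψ is the digamma function and c collects the statistics (1/N)Σ(E[log u_n] − E[u_n]) from q(U). A hedged sketch of that standard root-find (hypothetical c; not the patent's exact objective):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

# c = (1/N) * sum(E[log u_n] - E[u_n]) from q(U); by Jensen's inequality c < -1,
# so a hypothetical value below -1 is used here for illustration.
c = -1.2

def f(v):
    # Stationarity condition of the usual Student-t degrees-of-freedom update.
    return np.log(v / 2.0) - digamma(v / 2.0) + 1.0 + c

v_opt = brentq(f, 1e-3, 1e3)   # one-dimensional root find over a wide bracket
print(round(v_opt, 3))
```

Because f is continuous and strictly decreasing from +∞ to 1 + c < 0 on the bracket, `brentq` is guaranteed to find the unique root.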
After modeling is complete, for unlabeled data x_n with corresponding hidden variable t_n, the predicted output is
Figure FDA0002864125680000098
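The prediction formula itself is in the image above. As a hedged sketch of the usual latent-variable regression route it describes: infer the posterior mean of t_n from the inputs alone, then map it through the output loadings. All parameter names below (P for input loadings, C for output loadings) are hypothetical stand-ins for the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
D, E, K = 6, 2, 3                  # input dim, output dim, latent dim (hypothetical)
P = rng.standard_normal((D, K))    # input loading matrix (assumed learned)
C = rng.standard_normal((E, K))    # output loading matrix (assumed learned)
beta = 1.5                         # input noise precision (assumed learned)

def predict(x):
    # E-step on the inputs only: posterior mean of t_n given x_n,
    # then the regression through the output loadings.
    Sigma = np.linalg.inv(np.eye(K) + beta * P.T @ P)
    t_mean = beta * Sigma @ P.T @ x
    return C @ t_mean              # predicted output y_n

x_new = rng.standard_normal(D)
y_hat = predict(x_new)
print(y_hat.shape)                 # (2,): one value per output dimension
```

This is the soft-sensor use case: once trained on incomplete, partly labeled data, the model predicts the hard-to-measure outputs from the readily measured inputs alone.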
CN202011576333.0A 2020-09-18 2020-12-28 Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data Pending CN112541558A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010985728 2020-09-18
CN202010985728X 2020-09-18

Publications (1)

Publication Number Publication Date
CN112541558A true CN112541558A (en) 2021-03-23

Family

ID=75017665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011576333.0A Pending CN112541558A (en) 2020-09-18 2020-12-28 Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data

Country Status (1)

Country Link
CN (1) CN112541558A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542126A (en) * 2011-10-10 2012-07-04 上海交通大学 Soft measurement method based on half supervision learning
CN103279030A (en) * 2013-03-07 2013-09-04 清华大学 Bayesian framework-based dynamic soft measurement modeling method and device
US20170061305A1 (en) * 2015-08-28 2017-03-02 Jiangnan University Fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression
CN110083065A (en) * 2019-05-21 2019-08-02 浙江大学 A kind of adaptive soft-sensor method having supervision factorial analysis based on streaming variation Bayes
EP3620983A1 (en) * 2018-09-05 2020-03-11 Sartorius Stedim Data Analytics AB Computer-implemented method, computer program product and system for data analysis
CN111142501A (en) * 2019-12-27 2020-05-12 浙江科技学院 Fault detection method based on semi-supervised autoregressive dynamic hidden variable model
US10678196B1 (en) * 2020-01-27 2020-06-09 King Abdulaziz University Soft sensing of a nonlinear and multimode processes based on semi-supervised weighted Gaussian regression


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUNHUA ZHENG et al.: "Semisupervised learning for probabilistic partial least squares regression model and soft sensor application", Journal of Process Control, vol. 64, 7 March 2018 (2018-03-07), pages 123-131 *
LIU Ziwei: "Soft sensor modeling based on Bayesian networks", China Master's Theses Full-text Database (Information Science and Technology), no. 8, 15 August 2019 (2019-08-15), pages 140-101 *
ZHENG Junhua: "Latent variable regression modeling of industrial process data and its application", China Doctoral Dissertations Full-text Database (Basic Science), no. 8, 15 August 2018 (2018-08-15), pages 002-14 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113050602A (en) * 2021-03-26 2021-06-29 杭州电子科技大学 Industrial process fault method based on robust semi-supervised discriminant analysis
CN113050602B (en) * 2021-03-26 2022-08-09 杭州电子科技大学 Industrial process fault classification method based on robust semi-supervised discriminant analysis

Similar Documents

Publication Publication Date Title
Grbić et al. Adaptive soft sensor for online prediction and process monitoring based on a mixture of Gaussian process models
Buizza et al. Data learning: Integrating data assimilation and machine learning
Rezende et al. Stochastic backpropagation and variational inference in deep latent gaussian models
Dubois et al. Data-driven predictions of the Lorenz system
Silverman et al. Bayesian multinomial logistic normal models through marginally latent matrix-T processes
Baldacchino et al. Variational Bayesian mixture of experts models and sensitivity analysis for nonlinear dynamical systems
Zhou et al. Surrogate modeling of high-dimensional problems via data-driven polynomial chaos expansions and sparse partial least square
Guo et al. A just-in-time modeling approach for multimode soft sensor based on Gaussian mixture variational autoencoder
O’Reilly et al. Univariate and multivariate time series manifold learning
Bühlmann Causal statistical inference in high dimensions
Luo et al. Bayesian deep learning with hierarchical prior: Predictions from limited and noisy data
Chen et al. Low-rank autoregressive tensor completion for multivariate time series forecasting
Xiong et al. Soft sensor modeling with a selective updating strategy for Gaussian process regression based on probabilistic principle component analysis
Delasalles et al. Spatio-temporal neural networks for space-time data modeling and relation discovery
Guo et al. A mutual information-based Variational Autoencoder for robust JIT soft sensing with abnormal observations
Kidd et al. Bayesian nonstationary and nonparametric covariance estimation for large spatial data (with discussion)
Mora et al. Probabilistic neural data fusion for learning from an arbitrary number of multi-fidelity data sets
CN112541558A (en) Bayesian semi-supervised robust PPLS soft measurement method based on incomplete data
Williams et al. Sensing with shallow recurrent decoder networks
Cailliez et al. Bayesian calibration of force fields for molecular simulations
Pilosov et al. Parameter estimation with maximal updated densities
Ren et al. Tuning-free heterogeneous inference in massive networks
CN114757245A (en) Multi-mode monitoring method for semi-supervised identification mixed probability principal component analysis
Tiomoko et al. Deciphering lasso-based classification through a large dimensional analysis of the iterative soft-thresholding algorithm
Glaude et al. Subspace identification for predictive state representation by nuclear norm minimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination