CN109917777B

CN109917777B - Fault detection method based on mixed multi-sampling rate probability principal component analysis model

Info

Publication number: CN109917777B
Application number: CN201910304064.3A
Authority: CN
Inventors: 周乐; 丛亚; 葛志强; 武晓莉; 成忠; 单胜道
Original assignee: Zhejiang Lover Health Science and Technology Development Co Ltd
Current assignee: Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2020-08-25
Anticipated expiration: 2039-04-16
Also published as: CN109917777A

Abstract

The invention discloses a fault detection method based on a mixed multi-sampling rate probability principal component analysis model, comprising the following steps: (1) collecting data at different sampling rates under multiple modes of normal operation of a chemical process to be monitored to form a training sample set for modeling; (2) preprocessing the training sample set; (3) constructing a mixed multi-sampling rate probability principal component analysis model from the preprocessed training sample set; (4) collecting online monitoring samples of the multi-sampling-rate chemical process to be monitored; (5) calculating monitoring statistics of the monitoring samples based on the constructed model; and (6) judging whether a fault has occurred according to the monitoring statistics. Compared with traditional multi-mode modeling methods, the method considers the multi-sampling-rate characteristics of the data and the multi-mode characteristics of the process simultaneously, which improves its applicability and fault detection accuracy for multi-mode, multi-sampling-rate processes.

Description

Fault detection method based on mixed multi-sampling rate probability principal component analysis model

Technical Field

The invention belongs to the technical field of fault detection, and particularly relates to a fault detection method based on a mixed multi-sampling rate probability principal component analysis model.

Background

With the development of modern industry, process safety and product quality have received wide attention. With the wide application of Distributed Control Systems (DCS) in industry, a large number of process variables can be collected and stored by sensors at high sampling rates, whereas key quality variables related to production safety and product quality are usually obtained at low sampling rates through laboratory analysis. The data therefore have multi-sampling-rate characteristics, and important variables are difficult to acquire, which poses a challenge for the management of actual industrial processes. Meanwhile, with the continuous progress of multivariate statistical process monitoring (MSPM) and soft measurement technology, dimension reduction, reconstruction and visualization of massive process data have been realized, and these methods are widely applied in pharmaceuticals, the chemical industry, pollution control and other fields. Traditional static principal component analysis (PCA) and partial least squares (PLS) models can effectively extract the cross-correlation among variables, but they are not effective for multi-modal problems. Techniques based on Gaussian mixture models (GMMs) and mixture probability models can effectively extract and analyze the multi-modal characteristics of the process, but they cannot use complete multi-sampling-rate data. Methods based on multi-sampling-rate probability principal component analysis can fully use multi-sampling-rate data and estimate model parameters effectively with the expectation-maximization (EM) algorithm, but they handle multi-modal data poorly. Therefore, a modeling technique is needed that fully uses multi-sampling-rate data while also accounting for the multi-modal characteristics of industrial processes.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a fault detection method for multi-mode, multi-sampling-rate industrial processes based on a mixed multi-sampling rate probability principal component analysis model.

To address fault detection under multi-modal operating conditions in chemical processes, multi-sampling-rate data under several normal operating modes are collected with a distributed control system, and a mixed multi-sampling rate probability principal component analysis model is established, whose parameters are estimated with the expectation-maximization algorithm. On this basis, online data of the pre-decarbonization unit are sampled to obtain a multi-sampling-rate test sample, the latent variables of the test sample are estimated with the established model, and the statistical monitoring indices at that moment are calculated to diagnose faults in the pre-decarbonization unit.

A fault detection method based on a mixed multi-sampling rate probability principal component analysis model comprises the following steps:

(1) collecting data at different sampling rates under M modes of normal operation of the chemical process to be monitored, where M ≥ 1, to form a training sample set X for modeling;

(2) preprocessing a training sample set X;

(3) constructing a mixed multi-sampling rate probability principal component analysis model by utilizing the preprocessed training sample set X;

(4) collecting online a monitoring sample x_new of the multi-sampling-rate chemical process to be monitored;

(5) calculating monitoring statistics of the monitoring sample based on the constructed mixed multi-sampling rate probability principal component analysis model;

(6) judging whether a fault has occurred according to the monitoring statistics.

Preferably, in step (1) and/or step (4), the training samples or monitoring samples are collected with a distributed control system (DCS). As a mature data acquisition system, the DCS helps ensure the stability and accuracy of the data.

A multi-mode process means that a chemical unit may operate under several working conditions, each corresponding to a different product requirement, and the number of variables and the sampling rates may change from one working condition to another. Taking the pre-decarbonization unit of the ammonia synthesis process as an example, several working conditions may exist because of different component contents of the raw materials or different requirements on the final product, and these working conditions may differ in parameters such as the steady-state operating point, internal reaction parameters and final product quality indices. In addition, the differences in sampling rates between working conditions hinder the modeling and fault detection of the process. The invention provides a technical solution to this problem in the prior art.

Of course, the method of the invention can also be applied to simple single-mode chemical processes. Preferably, the chemical process is a multi-modal chemical process, such as the pre-decarbonization process of ammonia synthesis or a synthesis unit for organic-compound intermediates. More preferably, the chemical process to be monitored is the pre-decarbonization process of ammonia synthesis.

In the invention, the training sample set X can be represented as:

$$X = [X^{(1)}, X^{(2)}, \ldots, X^{(i)}, \ldots, X^{(S)}] \in \mathbb{R}^{N \times d}$$

where N is the number of samples, S is the number of sampling rates, X^{(i)} ∈ R^{N_i×d_i} is the data matrix at the i-th sampling rate, with N_i samples and sample dimension d_i, and

$$d = \sum_{i=1}^{S} d_i$$

is the total dimension of the training sample set X.

Preferably, the training sample set X is preprocessed so that all elements of each data sample fluctuate around 0, where a value greater than 0 indicates a level above average and a value less than 0 indicates a level below average. The preprocessed data are linearly related to the latent variables, which gives the following static model for the m-th mode:

$$x_n(i) = W^{m}(i)\, t_{n,m} + \mu^{m}(i) + \eta^{m}(i)$$

where x_n(i) is the training sample at the i-th sampling rate; W^m(i) and μ^m(i) are the divergence matrix (the loading matrix of the probabilistic PCA model) and the mean corresponding to samples at the i-th sampling rate in the m-th mode; and η^m(i) is the corresponding noise. η^m(i) follows a zero-mean Gaussian distribution, that is,

$$\eta^{m}(i) \sim N\left(0, \beta^{m}(i)\, I_{d_i}\right)$$

where β^m(i) is the noise variance at that sampling rate, I_{d_i} is the d_i×d_i identity matrix, β^m(i) = σ^m(i)², and σ^m(i) is the standard deviation of the noise at that sampling rate. t_{n,m} ∈ R^q is the latent variable of the training sample in the m-th mode and follows the standard normal distribution, t_{n,m} ~ N(0, I_q), where q is the dimension of the latent variable and I_q is the q×q identity matrix.

Thus, for the global model of the m-th mode, the divergence matrix W^m, the mean μ^m and the noise covariance matrix Σ^m can be expressed as:

$$W^{m} = [W^{m}(1); W^{m}(2); \ldots; W^{m}(i); \ldots; W^{m}(S)]$$

$$\mu^{m} = [\mu^{m}(1); \mu^{m}(2); \ldots; \mu^{m}(i); \ldots; \mu^{m}(S)]$$

$$\Sigma^{m} = \mathrm{diag}\{\beta^{m}(1) I_{d_1}, \beta^{m}(2) I_{d_2}, \ldots, \beta^{m}(S) I_{d_S}\}$$

where W^m(i) is the divergence matrix corresponding to training samples at the i-th sampling rate in the m-th mode, μ^m(i) is the corresponding mean, and Σ^m is the noise covariance of the training samples in the m-th mode.
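To make the block structure concrete, here is a minimal NumPy sketch of one mode of the model. It assumes the per-rate parameters W^m(i), μ^m(i) and β^m(i) are held in Python lists indexed by sampling rate (a hypothetical layout chosen for illustration): `global_params` stacks them into W^m, μ^m and Σ^m, and `sample_mode` draws complete samples from the static model, with one latent vector shared by all rates. In a real multi-rate process, the slow-rate blocks would be available only at some time instants.

```python
import numpy as np

def global_params(W_blocks, mu_blocks, betas):
    """Stack the per-rate blocks of one mode into W^m, mu^m and Sigma^m.

    W_blocks[i]: (d_i, q) divergence (loading) matrix W^m(i)
    mu_blocks[i]: (d_i,) mean vector mu^m(i)
    betas[i]: scalar noise variance beta^m(i)
    """
    W = np.vstack(W_blocks)                       # W^m = [W^m(1); ...; W^m(S)]
    mu = np.concatenate(mu_blocks)                # mu^m = [mu^m(1); ...; mu^m(S)]
    diag = np.concatenate([np.full(Wb.shape[0], b) for Wb, b in zip(W_blocks, betas)])
    return W, mu, np.diag(diag)                   # Sigma^m = diag{beta^m(i) I_{d_i}}

def sample_mode(W_blocks, mu_blocks, betas, n, rng):
    """Draw n complete samples from x_n(i) = W^m(i) t + mu^m(i) + eta^m(i).

    t ~ N(0, I_q) is shared by all rates; eta^m(i) ~ N(0, beta^m(i) I_{d_i}).
    Returns one (n, d_i) array per sampling rate.
    """
    q = W_blocks[0].shape[1]
    t = rng.standard_normal((n, q))
    blocks = []
    for W, mu, b in zip(W_blocks, mu_blocks, betas):
        noise = rng.normal(0.0, np.sqrt(b), size=(n, W.shape[0]))
        blocks.append(t @ W.T + mu + noise)
    return blocks
```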

For the n-th training sample x_n, the probability density function can be expressed as:

$$p(x_n) = \sum_{m=1}^{M} \pi_m\, p(x_n \mid m), \qquad p(x_n \mid m) = \int p(x_n \mid t_{n,m}, m)\, p(t_{n,m} \mid m)\, \mathrm{d}t_{n,m}$$

where π_m is the probability that the sample belongs to the m-th mode; p(x_n | m) is the probability of the training sample in the m-th mode; t_{n,m} is the latent variable of the sample in the m-th mode; p(x_n | t_{n,m}, m) is the conditional probability of the data sample given the latent variable t_{n,m} in the m-th mode; and p(t_{n,m} | m) is the probability of the latent variable in the m-th mode.
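Because the model is linear Gaussian, the integral over t_{n,m} has a closed form: p(x_n | m) is Gaussian with mean μ^m and covariance W^m (W^m)^T + Σ^m. The sketch below evaluates log p(x_n) for a fully observed sample on that basis; the function names are illustrative, and the log-sum-exp step is a numerical-stability choice rather than part of the patent.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of N(x; mean, cov), computed through a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * (x.size * np.log(2 * np.pi) + 2 * np.log(np.diag(L)).sum() + z @ z)

def log_mixture_density(x, pis, Ws, mus, Sigmas):
    """log p(x_n) = log sum_m pi_m p(x_n | m).

    Integrating t_{n,m} ~ N(0, I_q) out of the linear Gaussian model gives
    p(x_n | m) = N(x_n; mu^m, W^m W^m^T + Sigma^m).
    """
    terms = np.array([np.log(pi) + log_gauss(x, mu, W @ W.T + S)
                      for pi, W, mu, S in zip(pis, Ws, mus, Sigmas)])
    top = terms.max()
    return top + np.log(np.exp(terms - top).sum())   # log-sum-exp for stability
```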

Thus, the model parameter set of MrMPPCA (the mixed multi-sampling rate probability principal component analysis model) is:

$$\{\pi_m, W^{m}(i), \mu^{m}(i), \beta^{m}(i)\}, \quad i = 1, 2, \ldots, S;\ m = 1, 2, \ldots, M$$

where π_m is the probability that the current training sample belongs to the m-th mode; W^m(i) is the divergence matrix of training samples at the i-th sampling rate in the m-th mode; μ^m(i) is the mean of training samples at the i-th sampling rate in the m-th mode; β^m(i) is the noise variance of training samples at the i-th sampling rate in the m-th mode; and M is the total number of modes.

The parameters of the mixed multi-sampling rate probability principal component analysis model are estimated with the expectation-maximization algorithm: in the E step, the posterior probabilities of the latent variables are estimated with the current model parameters; in the M step, the parameters of the mixed multi-sampling rate model are updated by maximizing the likelihood function; and the E and M steps are repeated until the model convergence condition is reached.

First, the model parameters {π_m, W^m(i), μ^m(i), β^m(i)} (i = 1, 2, ..., S; m = 1, 2, ..., M) are randomly initialized. Suppose a training sample x_n contains S_n different sampling rates; then x_n can be expressed as

$$x_n = [x_n(s_1); x_n(s_2); \ldots; x_n(s_{S_n})]$$

where s_j is the index of a sampling rate, satisfying 1 ≤ s_j ≤ S, and x_n(s_j) is the training sample at the s_j-th sampling rate. Accordingly, one can write:

$$\mu_n^{m} = [\mu^{m}(s_1); \mu^{m}(s_2); \ldots; \mu^{m}(s_{S_n})]$$

$$W_n^{m} = [W^{m}(s_1); W^{m}(s_2); \ldots; W^{m}(s_{S_n})]$$

$$\Sigma_n^{m} = \mathrm{diag}\{\beta^{m}(s_1) I_{d_{s_1}}, \beta^{m}(s_2) I_{d_{s_2}}, \ldots, \beta^{m}(s_{S_n}) I_{d_{s_{S_n}}}\}$$

where μ_n^m is the mean corresponding to sample x_n in the m-th mode, and μ^m(s_j) is the mean corresponding to x_n at the s_j-th sampling rate in the m-th mode; W_n^m is the divergence matrix corresponding to x_n in the m-th mode, and W^m(s_j) is the divergence matrix corresponding to x_n at the s_j-th sampling rate; Σ_n^m is the noise covariance corresponding to x_n in the m-th mode, and β^m(s_j) is the noise variance corresponding to x_n at the s_j-th sampling rate; I_{d_i} is the d_i×d_i identity matrix; and diag{ } denotes a block-diagonal matrix.
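A sample usually carries only some of the S rates, so the E step works with the stacked sub-blocks μ_n^m, W_n^m and Σ_n^m. This sketch assumes a sample is stored as a dict from rate index to subvector (a hypothetical layout with 0-based indices) and selects and stacks the blocks for the rates that are present.

```python
import numpy as np

def observed_blocks(x_parts, rates, W_blocks, mu_blocks, betas):
    """Stack x_n, mu_n^m, W_n^m and Sigma_n^m over the rates s_1..s_{S_n} present in x_n.

    x_parts:  dict {rate index: observed subvector x_n(rate)}  (hypothetical layout)
    rates:    observed rate indices, 0-based here (the patent counts from 1)
    W_blocks, mu_blocks, betas: per-rate parameters of one mode m
    """
    x = np.concatenate([x_parts[i] for i in rates])
    mu = np.concatenate([mu_blocks[i] for i in rates])
    W = np.vstack([W_blocks[i] for i in rates])
    noise = np.concatenate([np.full(len(mu_blocks[i]), betas[i]) for i in rates])
    return x, mu, W, np.diag(noise)   # Sigma_n^m = diag{beta^m(s_j) I_{d_{s_j}}}
```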

Then, in the E step of parameter estimation, the estimates of the model latent variables are updated from the current values of the model parameters; the main formulas are as follows:

To simplify the above formulas, we define:

where ⟨z_{n,m}⟩ is the posterior expectation that the training sample x_n belongs to the m-th mode; ⟨t_{n,m}⟩ is the posterior expectation of the latent variable of x_n in the m-th mode; and ⟨t_{n,m} t_{n,m}^T⟩ is the posterior second-order moment matrix of the latent variable of x_n in the m-th mode.
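The E-step equations themselves were lost in extraction. For reference, a linear Gaussian model with prior t ~ N(0, I_q) and diagonal noise Σ_n^m has the standard posterior M = I_q + W^T Σ^{-1} W, ⟨t⟩ = M^{-1} W^T Σ^{-1} (x − μ), ⟨t t^T⟩ = M^{-1} + ⟨t⟩⟨t⟩^T, and mode responsibilities proportional to π_m N(x; μ, W W^T + Σ). The sketch below assumes those standard forms; it is not taken from the patent text, and the function name is hypothetical.

```python
import numpy as np

def e_step_sample(x_obs, comps, pis):
    """E step for one sample x_n whose observed rates are already stacked.

    comps: list over modes of (mu, W, Sigma) restricted to the observed rates
    pis:   mixing weights pi_m
    Returns <z_{n,m}>, <t_{n,m}> and <t_{n,m} t_{n,m}^T> for every mode.
    """
    logr, t_means, t_moments = [], [], []
    for (mu, W, Sigma), pi in zip(comps, pis):
        Sinv = np.diag(1.0 / np.diag(Sigma))           # Sigma is diagonal
        M = np.eye(W.shape[1]) + W.T @ Sinv @ W        # posterior precision of t
        Minv = np.linalg.inv(M)
        r = x_obs - mu
        t = Minv @ W.T @ Sinv @ r                      # <t_{n,m}>
        t_means.append(t)
        t_moments.append(Minv + np.outer(t, t))        # <t t^T> = Cov + mean mean^T
        C = W @ W.T + Sigma                            # marginal covariance in mode m
        _, logdet = np.linalg.slogdet(C)
        logr.append(np.log(pi) - 0.5 * (logdet + r @ np.linalg.solve(C, r)
                                        + r.size * np.log(2 * np.pi)))
    logr = np.array(logr)
    z = np.exp(logr - logr.max())                      # normalise in a stable way
    return z / z.sum(), t_means, t_moments
```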

The log-likelihood value Θ_new corresponding to the new model parameters is compared with the value Θ_old corresponding to the previous parameters. If ||Θ_new − Θ_old||² < ε, where ε is the convergence threshold of the model, the procedure proceeds to step (4); otherwise, the EM iterations continue. The complete-data log-likelihood of the model is:

where Θ denotes the log-likelihood value, const denotes a constant, and trace( ) denotes the trace of a matrix.
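The iteration logic around the convergence test can be sketched as a generic driver. The callables `e_step`, `m_step` and `loglik` are hypothetical placeholders for the E step, the M step and the log-likelihood Θ; only the stopping rule ||Θ_new − Θ_old||² < ε is taken from the text.

```python
def run_em(params, e_step, m_step, loglik, eps=1e-6, max_iter=500):
    """Generic EM driver using the stopping rule ||Theta_new - Theta_old||^2 < eps.

    e_step(params) -> posterior statistics (<z>, <t>, <t t^T>)
    m_step(stats)  -> updated parameters
    loglik(params) -> log-likelihood value Theta
    """
    theta_old = loglik(params)
    for it in range(1, max_iter + 1):
        params = m_step(e_step(params))
        theta_new = loglik(params)
        if (theta_new - theta_old) ** 2 < eps:      # convergence threshold epsilon
            return params, theta_new, it
        theta_old = theta_new
    return params, theta_old, max_iter                # not converged within max_iter
```

For a scalar Θ, the squared-norm test reduces to |Θ_new − Θ_old| < √ε.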

In the M step, the updated values of the model parameters {π_m, W^m(i), μ^m(i), β^m(i)} (i = 1, 2, ..., S; m = 1, 2, ..., M) are obtained from the results of the E step as follows:

where x_n(i) is the subvector of sample x_n consisting of the variables at the i-th sampling rate; Σ denotes summation over all collected samples that contain that sampling rate; and trace( ) denotes the trace of a matrix.
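The M-step formulas were also lost in extraction. The sketch below uses the update order of standard mixture PPCA (mean with the current W, then W, then the noise variance), restricted for each rate i to the samples that contain that rate, which matches the summation convention described above. These are assumed forms, and the data layout and function name are hypothetical.

```python
import numpy as np

def m_step_mode(samples, z, t_means, t_moments, W_old, n_total):
    """Assumed M-step updates for one mode m.

    samples:   list of dicts {rate index: x_n(rate)}; a rate is absent if not sampled
    z:         responsibilities <z_{n,m}> of this mode, one per sample
    t_means:   posterior means <t_{n,m}>; t_moments: <t_{n,m} t_{n,m}^T>
    W_old:     current W^m(i) per rate (used in the mean update)
    n_total:   total number of training samples N
    Assumes every rate is observed in at least one sample with nonzero weight.
    """
    pi = float(np.sum(z)) / n_total
    W_new, mu_new, beta_new = [], [], []
    for i, W in enumerate(W_old):
        idx = [n for n, s in enumerate(samples) if i in s]   # samples containing rate i
        zi = np.array([z[n] for n in idx])
        X = np.array([samples[n][i] for n in idx])           # (N_i, d_i)
        T = np.array([t_means[n] for n in idx])              # (N_i, q)
        TT = np.array([t_moments[n] for n in idx])           # (N_i, q, q)
        mu = (zi[:, None] * (X - T @ W.T)).sum(0) / zi.sum()
        R = X - mu
        A = np.einsum('n,nd,nq->dq', zi, R, T)               # sum z (x - mu) <t>^T
        B = np.einsum('n,npq->pq', zi, TT)                   # sum z <t t^T>
        Wn = A @ np.linalg.inv(B)
        err = (np.einsum('n,nd->', zi, R ** 2)
               - 2 * np.einsum('n,nd,dq,nq->', zi, R, Wn, T)
               + np.einsum('n,npq,pq->', zi, TT, Wn.T @ Wn))  # sum z trace(<tt^T> Wn^T Wn)
        W_new.append(Wn)
        mu_new.append(mu)
        beta_new.append(err / (zi.sum() * W.shape[0]))        # divide by sum z times d_i
    return pi, W_new, mu_new, beta_new
```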

In step (4), a new multi-sampling-rate process monitoring sample x_new of the chemical process is collected online. If the monitoring sample contains S_new different sampling rates, x_new can be expressed as:

$$x_{new} = [x_{new}(s_1); x_{new}(s_2); \ldots; x_{new}(s_{S_{new}})]$$

where s_j is the index of a sampling rate, satisfying 1 ≤ s_j ≤ S, and x_new(s_j) is the monitoring sample at the s_j-th sampling rate.

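In steps (5) and (6):

(5-1) Based on the constructed mixed multi-sampling rate probability principal component analysis model, the T² statistic T²_{new,m} and the SPE statistic SPE_{new,m} of the monitoring sample x_new are calculated in each mode m, giving M T² statistics

$$\{T^2_{new,1}, T^2_{new,2}, \ldots, T^2_{new,M}\}$$

and M SPE statistics

$$\{SPE_{new,1}, SPE_{new,2}, \ldots, SPE_{new,M}\}$$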

After the monitoring sample is obtained, it is given the same preprocessing and standardization as the training data. From the constructed model, the mean vector μ_new^m, the divergence matrix W_new^m and the noise covariance matrix Σ_new^m corresponding to x_new in the m-th mode (m = 1, 2, ..., M) are, respectively:

$$\mu_{new}^{m} = [\mu^{m}(s_1); \mu^{m}(s_2); \ldots; \mu^{m}(s_{S_{new}})]$$

$$W_{new}^{m} = [W^{m}(s_1); W^{m}(s_2); \ldots; W^{m}(s_{S_{new}})]$$

$$\Sigma_{new}^{m} = \mathrm{diag}\{\beta^{m}(s_1) I_{d_{s_1}}, \beta^{m}(s_2) I_{d_{s_2}}, \ldots, \beta^{m}(s_{S_{new}}) I_{d_{s_{S_{new}}}}\}$$

where W^m(s_j) is the divergence matrix, μ^m(s_j) the mean vector and β^m(s_j) the noise variance of the monitoring sample x_new at the s_j-th sampling rate in the m-th mode.

First, the posterior probability ⟨z_{new,m}⟩ that the monitoring sample x_new belongs to the m-th mode is obtained:

Under the model of the m-th mode, the expected value of the posterior distribution of the latent variable t_{new,m} of x_new is:

where, for simplicity, we define:

The T² statistic T²_{new,m} of x_new in the m-th mode can then be obtained by:

Further, the conditional probability distribution of x_new in the m-th mode is obtained:

where:

Then the residual of the observation sample x_new in the m-th mode is:

and the SPE statistic SPE_{new,m} of x_new in the m-th mode can be expressed as:

This yields M T² statistics and M SPE statistics in total.
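A sketch of the per-mode statistics follows. The definitions are assumptions, since the patent's equations were lost: T² is taken as ⟨t⟩^T⟨t⟩, the squared distance of the posterior score from the N(0, I_q) prior, and SPE as the squared norm of the residual e = x_new − (W⟨t⟩ + μ) with respect to the conditional mean. A noise-weighted variant, e^T Σ^{-1} e, is a possible alternative. The function name is hypothetical.

```python
import numpy as np

def mode_statistics(x, mu, W, Sigma):
    """Assumed T^2 and SPE of a monitoring sample in one mode (observed rates stacked).

      M    = I + W^T Sigma^{-1} W
      <t>  = M^{-1} W^T Sigma^{-1} (x - mu)
      T^2  = <t>^T <t>
      e    = x - (W <t> + mu)
      SPE  = e^T e
    """
    Sinv = np.diag(1.0 / np.diag(Sigma))
    M = np.eye(W.shape[1]) + W.T @ Sinv @ W
    t = np.linalg.solve(M, W.T @ Sinv @ (x - mu))   # posterior mean of the latent score
    e = x - (W @ t + mu)                            # residual w.r.t. the conditional mean
    return float(t @ t), float(e @ e)
```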

(5-2) The fault probabilities of the T² and SPE statistics of the monitoring sample x_new in the m-th mode are calculated.

For the M T² statistics and M SPE statistics obtained, the fault probabilities of the T²_{new,m} and SPE_{new,m} statistics of x_new in the m-th mode are, respectively:

where the prior probabilities of the normal condition (N) and the fault condition (F) are assumed to be:

where (1 − α) is the confidence level, which may be set to 0.99, i.e., α = 0.01.

The conditional probabilities of x_new under the normal condition (N) and the fault condition (F) for the T²_{new,m} and SPE_{new,m} statistics are:

where the confidence limits of the T² and SPE statistics in the m-th mode follow a weighted χ² distribution g·χ²_h, in which g and h can be approximated by:

where the mean and variance are computed from the T² and SPE statistics, respectively, of the modeling samples belonging to the m-th mode; mean( ) denotes the mean and Var( ) the variance.
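Matching the first two moments of a weighted chi-square distribution g·χ²_h (mean gh, variance 2g²h) to the training statistics gives g = Var/(2·mean) and h = 2·mean²/Var. The sketch applies this and, to stay within the Python standard library, approximates the χ² quantile with the Wilson-Hilferty formula; `scipy.stats.chi2.ppf(1 - alpha, h)` would give the exact quantile. The function name is illustrative.

```python
from statistics import NormalDist, mean, variance

def chi2_weighted_limit(train_stats, alpha=0.01):
    """Confidence limit g * chi2_h(1 - alpha) fitted to one mode's training statistics.

    Moment matching (mean = g h, var = 2 g^2 h):
      g = Var / (2 mean),  h = 2 mean^2 / Var
    The chi-square quantile uses the Wilson-Hilferty approximation.
    """
    m, v = mean(train_stats), variance(train_stats)
    g, h = v / (2 * m), 2 * m * m / v
    z = NormalDist().inv_cdf(1 - alpha)                 # standard normal quantile
    q = h * (1 - 2 / (9 * h) + z * (2 / (9 * h)) ** 0.5) ** 3
    return g * q, g, h
```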

(5-3) The posterior probabilities of the monitoring sample x_new for each mode are combined to obtain the fused T²_new and SPE_new statistics.

Given the posterior probability ⟨z_{new,m}⟩ of the online sample x_new for each mode, the fused T²_new and SPE_new statistics are:

In step (6), the fused statistics T²_new and SPE_new of the online sample x_new are compared with the significance level α to determine whether the sample is faulty.

In the fault detection method of the invention, a multi-sampling-rate probability principal component analysis model is established for each mode, and the sub-mode models are fused through a mixture model, extracting the mode information of the process and the correlation relationships among the variables; these are used for fault diagnosis, and a corresponding method for constructing online fault detection statistics is provided. Compared with traditional multi-mode modeling methods, the MrMPPCA model (the mixed multi-sampling rate probability principal component analysis model) considers the multi-sampling-rate data characteristics and the multi-mode process characteristics simultaneously, which improves applicability and fault detection accuracy for multi-mode, multi-sampling-rate processes.

Detailed Description

The invention is further explained by taking a pre-decarbonization unit in the synthetic ammonia process as an example:

A fault detection method based on a mixed multi-sampling rate probability principal component analysis model is disclosed. To address fault detection under multi-mode operating conditions of the pre-decarbonization unit in the ammonia synthesis process, the method first collects multi-sampling-rate data under several normal operating modes with a distributed control system and establishes a mixed multi-sampling rate probability principal component analysis model, whose parameters are estimated with the expectation-maximization algorithm. On this basis, online data of the pre-decarbonization unit are sampled to obtain a multi-sampling-rate test sample, the latent variables of the test sample are estimated with the established model, and the statistical monitoring indices at that moment are calculated to diagnose faults in the pre-decarbonization unit.

A multi-mode process means that a chemical unit may operate under several working conditions, each corresponding to a different product requirement, and the number of variables and the sampling rates may change from one working condition to another. Taking the pre-decarbonization unit of the ammonia synthesis process as an example, several working conditions may exist because of different component contents of the raw materials or different requirements on the final product, and these working conditions may differ in parameters such as the steady-state operating point, internal reaction parameters and final product quality indices. In addition, the differences in sampling rates between working conditions hinder the modeling and fault detection of the process.

For the pre-decarbonization process of ammonia synthesis, the multi-mode multi-sampling-rate fault detection method based on the mixed multi-sampling rate probability principal component analysis model comprises the following steps:

The first step: data at different sampling rates under M (M ≥ 1) modes of normal operation of the pre-decarbonization process of ammonia synthesis are collected with a distributed control system to form a training sample set X for modeling, expressed as:

$$X = [X^{(1)}, X^{(2)}, \ldots, X^{(i)}, \ldots, X^{(S)}] \in \mathbb{R}^{N \times d}$$

where N is the number of samples, S is the number of sampling rates, and

$$d = \sum_{i=1}^{S} d_i$$

is the total dimension of the training sample set X.

The second step: the data set X is preprocessed and normalized. For a training sample, the mean of all its elements is computed and subtracted from each element, so that each normalized variable value (element) fluctuates around 0: a value greater than 0 indicates a level above average and a value less than 0 indicates a level below average. The normalized data are linearly related to the latent variables, which gives the following static model in the m-th mode (m = 1, 2, …, M):

$$x_n(i) = W^{m}(i)\, t_{n,m} + \mu^{m}(i) + \eta^{m}(i)$$

where x_n(i) is the training sample at the i-th sampling rate; W^m(i) and μ^m(i) are the divergence matrix and the mean corresponding to samples at the i-th sampling rate in the m-th mode; and η^m(i) is the corresponding noise, which follows a zero-mean Gaussian distribution, that is,

$$\eta^{m}(i) \sim N\left(0, \beta^{m}(i)\, I_{d_i}\right)$$

where β^m(i) is the noise variance at that sampling rate, I_{d_i} is the d_i×d_i identity matrix, β^m(i) = σ^m(i)², and σ^m(i) is the standard deviation of the noise at that sampling rate. t_{n,m} ∈ R^q is the latent variable of the training sample in the m-th mode and follows the standard normal distribution, t_{n,m} ~ N(0, I_q), where q is the dimension of the latent variable and I_q is the q×q identity matrix. For the global model, the divergence matrix W^m, the mean μ^m and the noise covariance matrix Σ^m can therefore be expressed as:

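$$W^{m} = [W^{m}(1); W^{m}(2); \ldots; W^{m}(i); \ldots; W^{m}(S)]$$

$$\mu^{m} = [\mu^{m}(1); \mu^{m}(2); \ldots; \mu^{m}(i); \ldots; \mu^{m}(S)]$$

$$\Sigma^{m} = \mathrm{diag}\{\beta^{m}(1) I_{d_1}, \beta^{m}(2) I_{d_2}, \ldots, \beta^{m}(S) I_{d_S}\}$$

where Σ^m is the noise covariance of the training samples in the m-th mode. For the n-th training sample x_n, the probability density function can be expressed as:

$$p(x_n) = \sum_{m=1}^{M} \pi_m\, p(x_n \mid m), \qquad p(x_n \mid m) = \int p(x_n \mid t_{n,m}, m)\, p(t_{n,m} \mid m)\, \mathrm{d}t_{n,m}$$

where π_m is the probability that the training sample belongs to the m-th mode; p(x_n | m) is the probability of the training sample in the m-th mode; t_{n,m} is the latent variable of the training sample in the m-th mode; p(x_n | t_{n,m}, m) is the conditional probability of the data sample given the latent variable t_{n,m} in the m-th mode; and p(t_{n,m} | m) is the probability of the latent variable in the m-th mode.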

Thus, the model parameter set of MrMPPCA (the mixed multi-sampling rate probability principal component analysis model) is:

$$\{\pi_m, W^{m}(i), \mu^{m}(i), \beta^{m}(i)\}, \quad i = 1, 2, \ldots, S;\ m = 1, 2, \ldots, M$$

where W^m(i) is the divergence matrix corresponding to training samples at the i-th sampling rate in the m-th mode; μ^m(i) is the mean of training samples at the i-th sampling rate in the m-th mode; and β^m(i) is the noise variance corresponding to training samples at the i-th sampling rate in the m-th mode.

The third step: the model parameters are updated with the expectation-maximization (EM) algorithm. In the E step, the posterior probabilities of the latent variables are estimated with the current model parameters; in the M step, the parameters of the mixed multi-sampling rate model are updated by maximizing the likelihood function. The E and M steps are repeated until the model convergence condition is reached.

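If a training sample x_n contains S_n different sampling rates, x_n can be expressed as

$$x_n = [x_n(s_1); x_n(s_2); \ldots; x_n(s_{S_n})]$$

where s_j is the index of a sampling rate, satisfying 1 ≤ s_j ≤ S, and x_n(s_j) is the training sample at the s_j-th sampling rate. Accordingly, one can write:

$$\mu_n^{m} = [\mu^{m}(s_1); \mu^{m}(s_2); \ldots; \mu^{m}(s_{S_n})]$$

$$W_n^{m} = [W^{m}(s_1); W^{m}(s_2); \ldots; W^{m}(s_{S_n})]$$

$$\Sigma_n^{m} = \mathrm{diag}\{\beta^{m}(s_1) I_{d_{s_1}}, \beta^{m}(s_2) I_{d_{s_2}}, \ldots, \beta^{m}(s_{S_n}) I_{d_{s_{S_n}}}\}$$

where μ_n^m is the mean corresponding to sample x_n in the m-th mode, and μ^m(s_j) is the mean corresponding to x_n at the s_j-th sampling rate in the m-th mode; W_n^m is the divergence matrix corresponding to x_n in the m-th mode, and W^m(s_j) is the divergence matrix corresponding to x_n at the s_j-th sampling rate; Σ_n^m is the noise covariance corresponding to x_n in the m-th mode, and β^m(s_j) is the noise variance corresponding to x_n at the s_j-th sampling rate; I_{d_i} is the d_i×d_i identity matrix; and diag{ } denotes a block-diagonal matrix.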

To simplify the above formulas, we define:

The log-likelihood value Θ_new corresponding to the new model parameters is compared with the value Θ_old corresponding to the previous parameters. If ||Θ_new − Θ_old||² < ε, where ε is the convergence threshold of the model, the procedure proceeds to the fourth step; otherwise, the EM iterations continue. The complete-data log-likelihood of the model is:

The fourth step: a multi-sampling-rate process monitoring sample x_new of the ammonia synthesis pre-decarbonization process is collected online. If the monitoring sample contains S_new different sampling rates, x_new can be expressed as:

$$x_{new} = [x_{new}(s_1); x_{new}(s_2); \ldots; x_{new}(s_{S_{new}})]$$

where s_j is the index of a sampling rate, satisfying 1 ≤ s_j ≤ S, and x_new(s_j) is the monitoring sample at the s_j-th sampling rate.

The monitoring sample x_new is given the same preprocessing and normalization as in the second step. From the constructed model, the mean vector μ_new^m, the divergence matrix W_new^m and the noise covariance matrix Σ_new^m corresponding to x_new in the m-th mode (m = 1, 2, ..., M) are, respectively:

$$\mu_{new}^{m} = [\mu^{m}(s_1); \mu^{m}(s_2); \ldots; \mu^{m}(s_{S_{new}})]$$

$$W_{new}^{m} = [W^{m}(s_1); W^{m}(s_2); \ldots; W^{m}(s_{S_{new}})]$$

$$\Sigma_{new}^{m} = \mathrm{diag}\{\beta^{m}(s_1) I_{d_{s_1}}, \beta^{m}(s_2) I_{d_{s_2}}, \ldots, \beta^{m}(s_{S_{new}}) I_{d_{s_{S_{new}}}}\}$$

where W^m(s_j) is the divergence matrix, μ^m(s_j) the mean vector and β^m(s_j) the noise variance of the monitoring sample x_new at the s_j-th sampling rate in the m-th mode.

First, the posterior probability ⟨z_{new,m}⟩ that the monitoring sample x_new belongs to the m-th mode is obtained:

where, for simplicity, we define:

The T² statistic T²_{new,m} of x_new in the m-th mode can be obtained by:

where:

Then the residual of the observation sample x_new in the m-th mode is:

and the SPE statistic SPE_{new,m} of x_new in the m-th mode can be expressed as:

This yields M T² statistics and M SPE statistics in total.

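Then the fault probabilities of the T²_{new,m} and SPE_{new,m} statistics of the observation sample x_new in the m-th mode are calculated:

where the prior probabilities of the normal condition (N) and the fault condition (F) are assumed to be:

The conditional probabilities of x_new under the normal and fault conditions for the T²_{new,m} and SPE_{new,m} statistics are:

where the confidence limits of the T² and SPE statistics in the m-th mode follow a weighted χ² distribution g·χ²_h, in which g and h can be approximated by:

where the mean and variance are computed from the T² and SPE statistics, respectively, of the modeling samples belonging to the m-th mode. The fused T²_new and SPE_new statistics are: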

The sixth step: the fused statistics T²_new and SPE_new of the online sample x_new are compared with the significance level α to determine whether the sample is faulty.

Claims

1. A fault detection method based on a mixed multi-sampling rate probability principal component analysis model is characterized by comprising the following steps:

(2) preprocessing a training sample set X;

(4) collecting online monitoring samples x_new of the multi-sampling-rate process of the chemical process to be monitored;

(6) judging whether the fault occurs according to the monitoring statistics;

preprocessing a training sample set X to enable all elements in each data sample in the training sample set X to fluctuate around 0;

a linear correlation relationship exists between the preprocessed training sample set X and latent variables;

when constructing the mixed multi-sampling rate probability principal component analysis model, the model parameter set is:

{π_m, W^m(i), μ^m(i), β^m(i)}, (i = 1, 2, ..., S; m = 1, 2, ..., M)

where π_m is the probability that the current training sample belongs to the m-th mode; W^m(i) is the divergence matrix of training samples at the i-th sampling rate in the m-th mode; μ^m(i) is the mean of training samples at the i-th sampling rate in the m-th mode; β^m(i) is the noise variance of training samples at the i-th sampling rate in the m-th mode; and M is the total number of modes;

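in step (5) and step (6):

(5-1) calculating, based on the constructed model, the T² statistic and the SPE statistic of the monitoring sample x_new in each mode, obtaining M T² statistics and M SPE statistics;

(5-2) calculating the fault probabilities of the T² and SPE statistics of the monitoring sample x_new in the m-th mode;

(5-3) combining the posterior probabilities of x_new for each mode, whereby the fused T²_new and SPE_new statistics are obtained.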

2. The fault detection method based on the mixed multi-sampling rate probability principal component analysis model according to claim 1, wherein in step (1) or step (4), a distributed control system is used to collect data.

3. The method of claim 1, wherein the chemical process to be monitored is the pre-decarbonization process of an ammonia synthesis process.

4. The fault detection method based on the mixed multi-sampling rate probability principal component analysis model according to claim 1, wherein the model parameters are updated with an expectation-maximization algorithm when the model is constructed: in the E step of the algorithm, the posterior probabilities of the latent variables are estimated with the current model parameters; in the M step, the parameters of the mixed multi-sampling rate model are updated by maximizing the likelihood function; and the E and M steps are repeated until the model convergence condition is reached.

5. The fault detection method based on the mixed multi-sampling rate probability principal component analysis model according to claim 1, wherein in step (6), the fused statistics of the monitoring sample x_new