CN111160811A

CN111160811A - Batch process fault monitoring method based on multi-stage FOM-SAE

Info

Publication number: CN111160811A
Application number: CN202010057396.9A
Authority: CN
Inventors: 金辰; 王普
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-05-15
Anticipated expiration: 2040-01-17
Also published as: CN111160811B

Abstract

The invention relates to a batch industrial process fault monitoring method based on multi-stage FOM-SAE; such monitoring is necessary for ensuring the safety and stability of a production process and for improving product quality. The method comprises two steps, off-line modeling and on-line monitoring. Off-line modeling first processes normal data and divides the whole batch production process into stages, then establishes an FOM-SAE monitoring model and constructs a monitoring statistic for each stage, and determines the control limit of the statistic by kernel density estimation. On-line monitoring selects the corresponding sub-model according to the current sampling time, calculates the statistic, compares it with the control limit, and judges whether the system is currently normal. The method handles the multi-stage, non-Gaussian and non-linear characteristics of batch data simultaneously, effectively reduces the missed alarm rate of monitoring, greatly improves the accuracy of fault monitoring, and has high practical application value.

Description

Batch process fault monitoring method based on multi-stage FOM-SAE

Technical Field

The invention relates to batch process fault diagnosis based on multi-stage FOM-SAE, and belongs to the field of industrial process fault diagnosis and soft measurement.

Background

In recent decades, batch processes have been widely used in biopharmaceuticals, semiconductor processing and other fields, because they meet the demand for high value-added products and can be adjusted quickly in response to market demand. Such industrial processes have complex mechanisms and are difficult to operate and control, and product quality and production efficiency are easily affected by uncertain factors. To ensure the safety and stability of the industrial system, an effective process monitoring method is needed to detect abnormal conditions in time. At present, Multivariate Statistical Process Monitoring (MSPM) methods based on Principal Component Analysis (PCA), Partial Least Squares (PLS) and similar techniques are widely applied to batch industrial processes. However, these methods are subject to two limitations: first, the data must follow a Gaussian distribution; second, the relationships between variables must be linear.

To address the non-Gaussian characteristics of batch process data, Independent Component Analysis (ICA) was introduced into the process monitoring field. Although ICA can handle non-Gaussian data, it is unstable when solving the unmixing matrix, which degrades monitoring performance. The related literature shows that the fourth-order moment (FOM) can effectively extract the non-Gaussian information of the data. Furthermore, to handle the non-linear relationships between batch process variables, the kernel trick was introduced into traditional MSPM algorithms, giving methods such as KPCA, KPLS and KICA. However, these non-linear monitoring methods require the kernel parameters to be set manually, which affects the monitoring performance.

A batch process has several operation stages, and monitoring the whole batch with a single model inevitably degrades the monitoring performance in the individual stages. Many researchers have studied this feature and shown that monitoring accuracy can be improved by dividing the batch data into stages in a reasonable way and establishing a separate monitoring model for each sub-stage.

The purpose of process monitoring is to extract characteristic information from industrial data and to separate normal data from fault data. The autoencoder (AE) is an unsupervised feature extraction method proposed by Hinton and consists of an encoder and a decoder. The AE projects normal samples into a low-dimensional feature space through a non-linear transformation, which effectively handles the non-linearity of the data while extracting its deep information. The stacked autoencoder (SAE) has a stronger feature extraction capability than the AE. SAE has been used in image processing, natural language processing and other fields, and its application to fault monitoring is beginning to receive attention.
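The encoder and decoder structure described above can be illustrated with a minimal NumPy sketch of a single untrained autoencoder layer and its per-sample reconstruction error. The layer sizes, the tanh activation, the random weights and all function names are assumptions made for illustration only; the source does not specify them.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def encode(x, w_en, b_en):
    """Encoder: non-linear projection into the low-dimensional feature space."""
    return np.tanh(x @ w_en + b_en)

def decode(h, w_de, b_de):
    """Decoder: map the features back to the input space."""
    return h @ w_de + b_de

def reconstruction_error(x, params):
    """Squared reconstruction error, one value per sample (row of x)."""
    w_en, b_en, w_de, b_de = params
    x_hat = decode(encode(x, w_en, b_en), w_de, b_de)
    return np.sum((x - x_hat) ** 2, axis=1)

rng = np.random.default_rng(0)
n_vars, n_hidden = 10, 4                 # hypothetical layer sizes
w_en, b_en = init_layer(n_vars, n_hidden, rng)
w_de, b_de = init_layer(n_hidden, n_vars, rng)
params = (w_en, b_en, w_de, b_de)

x = rng.random((5, n_vars))              # five synthetic samples
features = encode(x, w_en, b_en)
re_values = reconstruction_error(x, params)
```

In a stacked autoencoder, the features of one trained layer would serve as the input of the next layer; training itself is omitted here.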

A multi-stage batch process fault monitoring method based on fourth-order moments and stacked autoencoders can effectively improve the fault monitoring capability for a batch process. The method handles the non-Gaussian, non-linear and multi-stage characteristics of process data simultaneously and achieves better fault monitoring performance.

Disclosure of Invention

Aiming at the problems of multiple stages, non-Gaussian and non-linearity of batch process data, the invention provides a batch industrial process fault monitoring method based on multi-stage FOM-SAE, and an overall flow chart of a model is shown in figure 1.

The invention adopts the following technical scheme and implementation steps:

A. an off-line modeling stage:

1) Collect multiple batches of historical data under normal operating conditions to form a three-dimensional array $X$, whose three dimensions are the production batch $i = 1, \dots, I$, the process variable $j = 1, \dots, J$, and the sampling time $k = 1, \dots, K$;

2) Divide the multi-batch historical data $X$ into optimal stages using the Fisher algorithm;

3) Split the three-dimensional data $X$ along the batch direction into $I$ batch slice matrices, where the rows of the matrix $T_i$ of the $i$-th batch correspond to the sampling times $k = 1, \dots, K$ and the columns to the process variables $j = 1, \dots, J$, as shown in fig. 3;

4) Calculate the FOM of each batch slice matrix $T_i$, $i = 1, \dots, I$, to obtain $I$ FOM matrices, where the FOM matrix $T_{FOM_i}$ of the $i$-th batch has the form:

$$T_{FOM_i} = \begin{bmatrix} f_1(1) & f_2(1) & \cdots & f_J(1) \\ f_1(2) & f_2(2) & \cdots & f_J(2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(K) & f_2(K) & \cdots & f_J(K) \end{bmatrix}$$

where the fourth-order moment $f_j(k)$ of the element $t_j(k)$ in row $k$ and column $j$ of the $i$-th batch slice is calculated as:

$$f_j(k) = t_j(k)\,t_j(k-1)\,t_j(k-2)\,t_j(k-3)$$

The first 3 sampling times are padded with the value at the 4th time, i.e. $f_j(1) = f_j(2) = f_j(3) = f_j(4)$;

5) According to the optimal stage division obtained in step 2), combine the data of stage $s$ from the FOM matrices of all batches into a new two-dimensional data matrix $T_{Stage_s}$, $s = 1, \dots, S$, whose rows correspond to sampling times, $I \times (p_{s+1} - p_s)$ rows in total with $p_s$ the start time of stage $s$, and whose columns correspond to the process variables $j = 1, \dots, J$, as shown in fig. 4;

6) Normalize all elements of each stage data matrix $T_{Stage_s}$, $s = 1, \dots, S$, to obtain the normalized stage data, where the element $f_j(k)$ in row $k$ and column $j$ of the stage-$s$ data matrix is normalized as:

$$\tilde{f}_j(k) = \frac{f_j(k) - f_j^{min}}{f_j^{max} - f_j^{min}}$$

where $f_j^{max}$ and $f_j^{min}$ are the maximum and minimum values of column $j$ of the current stage-$s$ data matrix $T_{Stage_s}$, and $\tilde{f}_j(k)$ denotes the normalized value of $f_j(k)$;

7) Set the hyper-parameters of the sub-model $SAE_s$ of stage $s$ and, using the normalized stage data as input, train the $SAE_s$ sub-model layer by layer, minimizing the reconstruction error function of each layer:

where $\theta_{En}$ and $\theta_{De}$ are the parameters of the $SAE_s$ sub-model, $x^{(i)}$ and $s^{(i)}$ are the input and output of the $i$-th layer of the $SAE_s$ sub-model, and $M$ is the total number of samples; the output of each layer of the $SAE_s$ sub-model is used as the input of the next layer, and the layers are trained one by one until the last layer completes the training of the $SAE_s$ model;

8) Take the reconstruction error function of the last layer of the $SAE_s$ model as the RE statistic, calculate the RE statistic of all data in each stage, and determine the control limit of the RE statistic for sub-stage $s$ by kernel density estimation, for use in on-line monitoring of stage $s$;

9) Repeat steps 5) to 8) to train the monitoring models of the $S$ sub-stages and calculate the control limit of each sub-stage.
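The data preparation of off-line steps 3) to 6) can be sketched in NumPy as follows. This is a minimal illustration, assuming the array is stored with shape (batches, variables, samples), that stage boundaries are given as 0-based row indices, and that normalization is min-max scaling per column; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def fom_matrix(t):
    """FOM of one batch slice t (K x J): f_j(k) = t_j(k) t_j(k-1) t_j(k-2) t_j(k-3)."""
    t = np.asarray(t, dtype=float)
    f = np.empty_like(t)
    # Rows with 0-based index >= 3 (sampling times k >= 4) have three predecessors.
    f[3:] = t[3:] * t[2:-1] * t[1:-2] * t[:-3]
    # Pad the first three sampling times with the value at the 4th time.
    f[:3] = f[3]
    return f

def stage_matrix(fom_batches, start, stop):
    """Stack rows start..stop-1 (0-based) of every batch FOM matrix."""
    return np.vstack([f[start:stop] for f in fom_batches])

def minmax_fit(stage):
    """Column-wise minimum and maximum of one stage matrix."""
    return stage.min(axis=0), stage.max(axis=0)

def minmax_apply(f, f_min, f_max):
    """Min-max scaling; assumes f_max > f_min in every column."""
    return (f - f_min) / (f_max - f_min)

rng = np.random.default_rng(0)
X = rng.random((3, 2, 8)) + 0.5          # synthetic data: I=3, J=2, K=8
foms = [fom_matrix(X[i].T) for i in range(X.shape[0])]

bounds = [0, 4, 8]                        # hypothetical stage starts (0-based) plus K
stages = [stage_matrix(foms, a, b) for a, b in zip(bounds[:-1], bounds[1:])]

scalers = [minmax_fit(st) for st in stages]
scaled = [minmax_apply(st, lo, hi) for st, (lo, hi) in zip(stages, scalers)]
```

The per-stage minima and maxima in `scalers` are the values that the on-line stage would reuse to normalize new samples.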

B. an on-line monitoring stage:

1) Obtain the sampled data at the current time $k$ and select the corresponding stage monitoring model according to the stage to which time $k$ belongs;

2) Calculate the FOM vector $x_f = [f_1(k), f_2(k), \dots, f_J(k)]$ of the current sample, where the FOM value $f_j(k)$ of the sampled value $x_j(k)$ of variable $j$ at time $k$ is obtained as:

$$f_j(k) = x_j(k)\,x_j(k-1)\,x_j(k-2)\,x_j(k-3)$$

The first 3 sampling times are padded with the value at the 4th time, i.e. $f_j(1) = f_j(2) = f_j(3) = f_j(4)$.

3) Normalize the $x_f$ obtained in the previous step using the maximum and minimum values obtained in off-line modeling step 6), where the FOM value $f_j(k)$ of variable $j$ at time $k$ is normalized as:

$$\tilde{f}_j(k) = \frac{f_j(k) - f_j^{min}}{f_j^{max} - f_j^{min}}$$

where $f_j^{max}$ and $f_j^{min}$ are the maximum and minimum values of column $j$ of the stage-$s$ off-line data $T_{Stage_s}$, and $\tilde{f}_j(k)$ denotes the normalized value of $f_j(k)$;

4) Input the normalized FOM vector obtained in the previous step into the corresponding stage sub-model $SAE_s$ and calculate the statistic RE;

5) Compare the monitoring statistic RE obtained in the previous step with the control limit of the corresponding stage obtained in off-line modeling step 8). If RE exceeds the control limit, a fault is judged to have occurred; otherwise the process is judged to be normal. Monitoring continues in this way until the batch process ends.
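One on-line monitoring step can be sketched as below. This is an illustration under stated assumptions: the stage start times, scalers, control limits and the `reconstruct_fns` callables standing in for trained $SAE_s$ sub-models are all hypothetical, RE is taken as the squared reconstruction error of the normalized FOM vector, and the sketch covers only times $k \ge 4$, for which four samples are available.

```python
import numpy as np

def online_fom(window):
    """FOM vector from the last four samples (4 x J array, oldest first)."""
    return np.prod(np.asarray(window, dtype=float), axis=0)

def find_stage(k, starts):
    """0-based index of the stage containing time k; starts are sorted 1-based start times."""
    return int(np.searchsorted(starts, k, side="right")) - 1

def monitor_sample(k, window, starts, scalers, reconstruct_fns, limits):
    """Return (stage index, RE, alarm flag) for the sample at time k."""
    s = find_stage(k, starts)
    f_min, f_max = scalers[s]                      # from off-line step 6)
    z = (online_fom(window) - f_min) / (f_max - f_min)
    z_hat = reconstruct_fns[s](z)                  # stand-in for the trained SAE_s
    re = float(np.sum((z - z_hat) ** 2))           # assumed form of the RE statistic
    return s, re, re > limits[s]

# Hypothetical two-stage setup with toy "models".
starts = [1, 5]
scalers = [(np.zeros(2), np.full(2, 2.0)), (np.zeros(2), np.full(2, 4.0))]
reconstruct_fns = [lambda z: z, lambda z: np.zeros_like(z)]
limits = [0.1, 0.1]
window = np.ones((4, 2))

normal_result = monitor_sample(4, window, starts, scalers, reconstruct_fns, limits)
fault_result = monitor_sample(6, window, starts, scalers, reconstruct_fns, limits)
```

The first toy model reconstructs its input perfectly, so no alarm is raised; the second reconstructs nothing, so RE exceeds the limit and an alarm is raised.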

Advantageous effects

Compared with the prior art, the method divides the whole production batch into several operation stages, takes full account of the stage characteristics of the batch process, establishes an FOM-SAE model for each stage, and constructs a monitoring statistic for fault monitoring. Because the FOM effectively extracts the non-Gaussian information in the data, the instability of solving an ICA model is avoided. The SAE handles the non-linearity of the batch data while effectively extracting deep information about the batch process. The method reduces missed alarms during monitoring and improves the accuracy of fault monitoring.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a schematic diagram of unfolding the three-dimensional data $X_{30\times10\times400}$ into two-dimensional time slice matrices $x_k$;

FIG. 3 is a schematic diagram of unfolding the three-dimensional data $X_{30\times10\times400}$ into two-dimensional batch slice matrices $X_I$;

FIG. 4 is a schematic diagram of the two-dimensional data matrix $T_{Stage_s}$ of each stage reconstructed by data processing;

FIG. 5 is a structural diagram of the stacked autoencoder monitoring model;

FIG. 6 is a graph of the relationship between the number of stages of the penicillin batch fermentation process and α;

FIG. 7 shows the results of the staging of a penicillin batch fermentation process;

FIG. 8(a) is a diagram showing the monitoring effect of the SAE method on No. 5 fault

FIG. 8(b) is a diagram showing the monitoring effect of FOM-SAE method on No. 5 fault

FIG. 8(c) is a diagram showing the effect of the multi-stage SAE method on the monitoring of No. 5 fault

FIG. 8(d) is a diagram showing the effect of the multi-stage FOM-SAE method on the monitoring of No. 5 fault

FIG. 9(a) is a diagram showing the monitoring effect of the SAE method on No. 7 fault

FIG. 9(b) is a diagram showing the monitoring effect of the FOM-SAE method on No. 7 fault

FIG. 9(c) is a diagram showing the monitoring effect of the multi-stage SAE method on No. 7 fault

FIG. 9(d) is a diagram showing the effect of the multi-stage FOM-SAE method on the monitoring of No. 7 fault

Detailed Description

Penicillin fed-batch fermentation is a typical batch industrial process with obvious non-Gaussian, non-linear and multi-stage characteristics. The Birol model, an improvement of the Bajpai mechanistic model, and the Pensim 2.0 software developed by the Illinois Institute of Technology have become a standard test platform for monitoring, fault diagnosis and control of the penicillin fermentation process. Relevant studies have demonstrated the effectiveness and practicality of this simulation platform.

In the experiment, Pensim 2.0 simulation data are the object of study. The sampling interval is set to 1 hour and the fermentation time of each batch to 400 hours, and 10 main process variables are selected to monitor the process operating condition, as shown in Table 1. The platform is used to generate 30 batches of normal data, forming the three-dimensional matrix $X_{30\times10\times400}$ for modeling, and 10 additional batches of fault data are generated to test the effectiveness of the monitoring model. The fault types, amplitudes and start and end times are shown in Table 2.

TABLE 1 Process variables used for modeling

TABLE 2 Fault batch data

The application process of the invention in the simulation platform for penicillin fermentation production is specifically stated as follows:

A. an off-line modeling stage:

1) The historical data $X$ collected from the batch process under normal operating conditions form a three-dimensional array, whose three dimensions are the production batch $i = 1, \dots, 30$, the process variable $j = 1, \dots, 10$, and the sampling point $k = 1, \dots, 400$.

2) The Fisher algorithm is used to divide the time slice matrices into stages, as follows:

2.1) Split the three-dimensional data along the time direction into 400 time slice matrices, where the rows of the time slice matrix $x_k$ at time $k$ correspond to the production batches $i = 1, \dots, 30$ and the columns to the process variables $j = 1, \dots, 10$;

2.2) Divide the time slice matrices into $S$ stages in time order, with the start time of each stage set arbitrarily to $p_s$, $s = 1, \dots, S$, where $p_1 = 1 < p_2 < \dots < p_S \le K$. Calculate the class diameter of each stage; for stage $s$ it is calculated by the following formula (when $s = S$, $p_{s+1} - 1 = K$):

where the mean term in the formula is the average of all time slice matrices in this stage. The error function of dividing the time slice matrices into $S$ stages is then given by the following formula:

2.3) To obtain the optimal division quickly, the minimum of the error function is calculated by a recurrence formula, which yields the optimal division into $S$ stages:

when the process is divided into two stages,

when the number of stages is greater than 2,

the minimum value of the error function is calculated by the above formulas for $S = 1, 2, \dots, K$.

2.4) With the recurrence formula, a discriminant function can be calculated

The closer $\alpha$ is to 1, the more accurate the stage division. Combining the stage characteristics of penicillin fermentation with the value of $\alpha$, the process is divided into 5 stages; the variation of $\alpha$ and the division result are shown in fig. 6 and fig. 7.

3) Split the three-dimensional data $X$ by batch into 30 batch slice matrices, where the rows of the matrix $T_i$ of the $i$-th batch correspond to the sampling times $k = 1, \dots, 400$ and the columns to the process variables $j = 1, \dots, 10$;

4) Calculate the FOM of each batch slice matrix $T_i$, $i = 1, \dots, 30$, to obtain 30 FOM matrices, where the FOM matrix $T_{FOM_i}$ of the $i$-th batch has the form:

$$T_{FOM_i} = \begin{bmatrix} f_1(1) & f_2(1) & \cdots & f_{10}(1) \\ f_1(2) & f_2(2) & \cdots & f_{10}(2) \\ \vdots & \vdots & \ddots & \vdots \\ f_1(400) & f_2(400) & \cdots & f_{10}(400) \end{bmatrix}$$

where the fourth-order moment $f_j(k)$ of the element $t_j(k)$ in row $k$ and column $j$ of the $i$-th batch slice is calculated as:

$$f_j(k) = t_j(k)\,t_j(k-1)\,t_j(k-2)\,t_j(k-3)$$

5) According to the optimal stage division obtained in step 2), combine the data of stage $s$ from the FOM matrices of all batches into a new two-dimensional data matrix $T_{Stage_s}$, $s = 1, \dots, 5$, whose rows correspond to sampling times, $30 \times (p_{s+1} - p_s)$ rows in total, and whose columns correspond to the process variables $j = 1, \dots, 10$;

6) Normalize all elements of each stage data matrix $T_{Stage_s}$, $s = 1, \dots, 5$, to obtain the normalized value of each $f_j(k)$;

7) Set the hyper-parameters of the sub-model $SAE_s$ of stage $s$ and train it layer by layer using the normalized stage data as input, where $\theta_{En}$ and $\theta_{De}$ are the parameters of the $SAE_s$ sub-model, $x^{(i)}$ and $s^{(i)}$ are the input and output of the $i$-th layer of the $SAE_s$ sub-model, and $M$ is the total number of samples; the output of each layer of the $SAE_s$ sub-model is used as the input of the next layer, and the layers are trained one by one until the last layer completes the training of the $SAE_s$ model. The SAE model has a well-known structure, shown in fig. 5.

9) Repeat steps 5) to 8) to train the monitoring models of the 5 sub-stages and calculate the control limit of each sub-stage.
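The stage division of step 2) can be sketched with the standard dynamic-programming form of Fisher's optimal partition of ordered samples. The formulas of the source are not reproduced in this text, so the sketch assumes that the class diameter of a stage is the sum of squared deviations of its time slice matrices from their mean and that the error function is the sum of the class diameters; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def diameters(slices):
    """D[a, b]: squared deviation of slices a..b (0-based, inclusive) from their mean."""
    K = slices.shape[0]
    flat = slices.reshape(K, -1)
    D = np.zeros((K, K))
    for a in range(K):
        for b in range(a + 1, K):
            seg = flat[a:b + 1]
            D[a, b] = np.sum((seg - seg.mean(axis=0)) ** 2)
    return D

def fisher_partition(slices, n_stages):
    """Return (minimum error, 0-based stage start indices) for an ordered split."""
    K = slices.shape[0]
    D = diameters(slices)
    # err[s, n]: minimal error of splitting the first n slices into s stages.
    err = np.full((n_stages + 1, K + 1), np.inf)
    cut = np.zeros((n_stages + 1, K + 1), dtype=int)
    err[0, 0] = 0.0
    for s in range(1, n_stages + 1):
        for n in range(s, K + 1):
            for p in range(s - 1, n):          # p: 0-based start of the last stage
                cand = err[s - 1, p] + D[p, n - 1]
                if cand < err[s, n]:
                    err[s, n], cut[s, n] = cand, p
    # Backtrack the stage starts from the last stage to the first.
    starts, n = [], K
    for s in range(n_stages, 0, -1):
        p = int(cut[s, n])
        starts.append(p)
        n = p
    return float(err[n_stages, K]), starts[::-1]

# Synthetic example: K=9 time slices (2 batches x 2 variables) with three flat levels.
levels = np.repeat(np.array([0.0, 5.0, 10.0]), 3)
slices = levels[:, None, None] * np.ones((1, 2, 2))

err1, starts1 = fisher_partition(slices, 1)
err3, starts3 = fisher_partition(slices, 3)
```

Running the partition for increasing numbers of stages gives the sequence of minimum errors from which a discriminant such as the $\alpha$ of step 2.4) would be computed.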

B. an on-line monitoring stage:

1) Obtain the sampled data at the current time $k$ and select the corresponding stage monitoring sub-model according to the stage to which time $k$ belongs;

2) Calculate the FOM values of the current sample:

$$f_j(k) = x_j(k)\,x_j(k-1)\,x_j(k-2)\,x_j(k-3)$$

3) Normalize $f_j(k)$ with the maximum and minimum values obtained in off-line modeling step 6) to obtain its normalized value;

4) Input the normalized values obtained in the previous step into the corresponding stage sub-model $SAE_s$ and calculate the statistic RE;
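The control limit against which the RE statistic is compared is obtained off-line by kernel density estimation. A minimal NumPy sketch is given below, assuming a Gaussian kernel, a normal-reference rule-of-thumb bandwidth and a 99% confidence level; none of these choices is stated in the source, and the function name and the synthetic RE values are hypothetical.

```python
import numpy as np

def kde_control_limit(re_values, confidence=0.99, grid_size=2000):
    """Control limit as the `confidence` quantile of a Gaussian KDE of the RE values."""
    re_values = np.asarray(re_values, dtype=float)
    n = re_values.size
    # Normal-reference rule-of-thumb bandwidth (an assumption, not from the source).
    h = 1.06 * re_values.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(re_values.min() - 3 * h, re_values.max() + 3 * h, grid_size)
    # Gaussian kernel density evaluated on the grid.
    u = (grid[:, None] - re_values[None, :]) / h
    density = np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    # Cumulative distribution by trapezoidal integration, normalized to end at 1.
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    # First grid point at which the cumulative probability reaches the confidence level.
    return float(grid[np.searchsorted(cdf, confidence)])

rng = np.random.default_rng(0)
re_train = rng.chisquare(3, 500)          # synthetic stand-in for one stage's RE values
limit_99 = kde_control_limit(re_train, 0.99)
limit_999 = kde_control_limit(re_train, 0.999)
```

One limit would be computed per stage from the RE values of that stage's normal training data.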

The above steps are the specific application of the invention on the penicillin fermentation simulation platform. To verify the effectiveness of the monitoring algorithm, faults No. 5 and No. 7 are selected and the SAE, FOM-SAE, multi-stage SAE and multi-stage FOM-SAE algorithms are compared; the detailed experimental results are shown in figs. 8 and 9. Each graph includes 5 straight lines parallel to the abscissa, which are the control limits obtained by kernel density estimation, and the fluctuating curve is the value of the monitoring statistic. If the curve exceeds the control limit, a fault is indicated at that time. For a more intuitive comparison of monitoring performance, the missed alarm rate (MAR) and the accuracy (ACC) of each model on the 10 faults are calculated as monitoring indices:
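The two monitoring indices can be computed as sketched below. The formulas of the source are not reproduced in this text, so the sketch assumes the usual definitions: MAR is the fraction of fault samples that raise no alarm, and ACC is the fraction of all samples classified correctly. The function name and the toy data are hypothetical.

```python
import numpy as np

def monitoring_metrics(re, limits, fault_start, fault_end):
    """Return (MAR, ACC); the fault interval is 1-based and inclusive."""
    re = np.asarray(re, dtype=float)
    alarms = re > np.asarray(limits, dtype=float)   # scalar or per-sample limits
    k = np.arange(1, re.size + 1)
    faulty = (k >= fault_start) & (k <= fault_end)
    mar = np.mean(~alarms[faulty])                  # missed alarms among fault samples
    acc = np.mean(alarms == faulty)                 # correct decisions among all samples
    return float(mar), float(acc)

# Toy batch of 10 samples with a fault from time 5 to time 8 and a control limit of 1.
re_curve = [0, 0, 0, 2, 0, 2, 2, 2, 0, 0]
mar, acc = monitoring_metrics(re_curve, 1.0, fault_start=5, fault_end=8)
```

In the toy batch, the sample at time 5 is a missed alarm and the sample at time 4 is a false alarm, so MAR is 1/4 and ACC is 8/10.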

Fig. 8 shows the monitoring results for fault No. 5, a ramp fault in which the agitation rate increases by 2 W/h; the fault is introduced at time 70 and lasts until time 150. For a ramp fault, the fault magnitude accumulates over time, so the fault is harder to detect in its early phase. Fig. 8(a) shows that the single-stage SAE model has a high missed alarm rate of 86.42%, while the multi-stage SAE model of fig. 8(c) captures the stage characteristics better and reduces the missed alarm rate to 67.9%; both models show obvious alarm lag. After FOM is added in fig. 8(b), the accuracy improves markedly to 95%, showing that the model improves monitoring accuracy by extracting non-Gaussian information. The monitoring result of the invention is shown in fig. 8(d), with the highest accuracy of 95.25%.

Fig. 9 shows the monitoring results for fault No. 7, a step fault in which the substrate feed rate increases by 3%; the fault is introduced at time 58 and lasts until time 130. As can be seen from fig. 9(a) and 9(b), the single-stage models have high missed alarm rates and have lost their monitoring ability. The multi-stage SAE of fig. 9(c) improves slightly on the single-stage SAE but still has a missed alarm rate of 87.67%. The method of the invention, shown in fig. 9(d), keeps both the missed alarm rate and the false alarm rate low, and its accuracy reaches 97.75%.

The monitoring performance indices for the ten faults are shown in Table 3. The proposed multi-stage FOM-SAE achieves higher accuracy across the various faults and improves fault monitoring for the penicillin fermentation process.

TABLE 3 monitoring Performance indicators

Claims

1. A fault monitoring method for complex industrial processes based on multi-stage FOM-SAE, characterized in that the method comprises two steps, 'off-line modeling' and 'on-line monitoring', with the following technical scheme and implementation steps:

A. an off-line modeling stage:

3) splitting the three-dimensional data $X$ along the batch direction into $I$ batch slice matrices, where the rows of the matrix $T_i$ of the $i$-th batch correspond to the sampling times $k = 1, \dots, K$ and the columns to the process variables $j = 1, \dots, J$;

$$f_j(k) = t_j(k)\,t_j(k-1)\,t_j(k-2)\,t_j(k-3)$$

5) according to the optimal stage division obtained in step 2), combining the data of stage $s$ from the FOM matrices of all batches into a new two-dimensional data matrix $T_{Stage_s}$, $s = 1, \dots, S$, whose rows correspond to sampling times, $I \times (p_{s+1} - p_s)$ rows in total, and whose columns correspond to the process variables $j = 1, \dots, J$;

6) normalizing all elements of each stage data matrix $T_{Stage_s}$, $s = 1, \dots, S$:

$$\tilde{f}_j(k) = \frac{f_j(k) - f_j^{min}}{f_j^{max} - f_j^{min}}$$

where $f_j^{max}$ and $f_j^{min}$ are the maximum and minimum values of column $j$ of the current stage-$s$ data matrix $T_{Stage_s}$, and $\tilde{f}_j(k)$ denotes the normalized value of $f_j(k)$;

7) setting the hyper-parameters of the sub-model $SAE_s$ of stage $s$ and training it layer by layer using the normalized stage data as input, where $\theta_{En}$ and $\theta_{De}$ are the parameters of the $SAE_s$ sub-model, $x^{(i)}$ and $s^{(i)}$ are the input and output of the $i$-th layer of the $SAE_s$ sub-model, and $M$ is the total number of samples; the output of each layer of the $SAE_s$ sub-model is used as the input of the next layer, and the layers are trained one by one until the last layer completes the training of the $SAE_s$ model;

B. an on-line monitoring stage:

1) obtaining the sampled data at the current time $k$;

2) calculating the FOM values of the current sample:

$$f_j(k) = x_j(k)\,x_j(k-1)\,x_j(k-2)\,x_j(k-3)$$

where $f_j^{max}$ and $f_j^{min}$ are the maximum and minimum values of column $j$ of the stage-$s$ off-line data $T_{Stage_s}$, used to obtain the normalized value of $f_j(k)$;

4) inputting the normalized values obtained in the previous step into the corresponding stage sub-model $SAE_s$ and calculating the statistic RE;

2. The multi-stage FOM-SAE-based fault monitoring method for complex industrial processes according to claim 1, characterized in that: the specific steps of the optimal phase division in the off-line modeling phase are as follows:

2.1) splitting the three-dimensional data $X$ along the time direction into $K$ time slice matrices, where $x_k$ denotes the time slice matrix at sampling time $k$, whose rows correspond to the production batches $i = 1, \dots, I$ and whose columns correspond to the process variables $j = 1, \dots, J$;

2.2) dividing the multi-batch historical data $X$ arbitrarily into $S$ stages, calculating the class diameter of each stage, and calculating the error function of the stage division from the class diameters obtained, where the class diameter of stage $s$ is calculated by the following formula:

where $p_s$ denotes the start time of stage $s$, $p_{s+1}$ denotes the start time of stage $s + 1$, $p_{s+1} - 1$ denotes the end time of stage $s$, i.e. the sampling time immediately before $p_{s+1}$, and the mean term in the formula is the average of all time slice matrices within stage $s$;

the error function for dividing the multi-batch historical data $X$ into $S$ stages is as follows:

2.3) the division with the minimum error function is the optimal division.