CN114280935A

CN114280935A - Multi-stage fermentation process fault monitoring method based on semi-supervised FCM and SAE of information entropy

Info

Publication number: CN114280935A
Application number: CN202111541540.7A
Authority: CN
Inventors: 高学金; 李学凤; 高慧慧; 韩华云
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2022-04-05

Abstract

The invention discloses a multi-stage fermentation process fault monitoring method based on information-entropy semi-supervised fuzzy C-means clustering (FCM) and a sparse autoencoder (SAE). First, the fermentation process data are divided into stable stages using a semi-supervised fuzzy C-means clustering algorithm based on information entropy; the Silhouette coefficient is then introduced to identify the transition stage between every two stable stages. Fault monitoring proceeds as follows: a monitoring model is established for each stable stage and each transition stage using a sparse autoencoder, and the reconstruction error is used as the statistic index. The control limit of the statistic index is then determined by kernel density estimation. Finally, the test batch samples are fed into the model, and their statistic index is calculated and compared with the control limit of the normal samples; a sample whose statistic index exceeds the control limit is judged to be a fault sample. The invention is more sensitive to faults, enhances the robustness of the monitoring model, improves the accuracy of fault monitoring, and can reduce false alarms and missed alarms in process monitoring.

Description

Multi-stage fermentation process fault monitoring method based on semi-supervised FCM and SAE of information entropy

Technical Field

The invention relates to the technical field of data-driven fault monitoring, and in particular to fault monitoring for fermentation processes. It is a specific application of data-driven methods to monitoring faults in the fermentation process.

Background

With the rapid development of industrial automation technology, modern industrial systems are becoming increasingly integrated and complex. To detect faults in time, improving the reliability of system fault monitoring is very important. The fermentation process is an industrial process that is mainly run in batch mode: within a limited time, raw materials are converted into small-batch products with high added value. Because daily demand for these products is extremely high, fault monitoring has attracted great attention in industry as a way to keep fermentation production safe and orderly. At present, common fault monitoring methods at home and abroad fall into two types: methods based on mechanism models and data-driven methods. Mechanism-model-based methods require an accurate mathematical description of the system's mechanism; their practical use is limited by the incomplete understanding of the underlying complex physicochemical phenomena, the constantly changing process operating conditions, and the difficulty of developing the basic model. Data-driven methods can make full use of soft-sensing and sensor technology to obtain large amounts of real-time data, and processing and modeling the historical data obtained has become one of the most commonly used approaches.

Data-driven methods such as Multi-way Principal Component Analysis (MPCA) and Multi-way Partial Least Squares (MPLS) have been widely used. However, these methods model the production data of the fermentation process as a whole and ignore its multi-stage nature, nonlinearity and dynamics. In establishing a fermentation process monitoring model, the two key issues are therefore how to achieve effective stage division and how to establish an accurate local model.

To handle the multi-stage characteristic of the fermentation process, many stage division methods adopt a clustering algorithm. Clustering divides all samples of a data set into clusters according to some rule, so that sample points within a cluster are similar and sample points in different clusters are dissimilar. The commonly used K-means clustering is a hard clustering algorithm: the membership degree can only be 0 or 1, so a sample can belong to only one cluster. Such a strict assignment causes larger classification errors. Fuzzy C-means clustering is a soft clustering method that makes up for this defect of K-means, but the number of clusters still has to be determined manually.

To handle the nonlinear and dynamic characteristics of the fermentation process, the traditional Multi-way Kernel Principal Component Analysis (MKPCA) and Dynamic Kernel Partial Least Squares (DKPCA) methods can be used, but introducing the kernel function greatly increases the computation of the whole algorithm, and different kernel parameters also strongly affect the monitoring of the fermentation process.

The autoencoder (AE), a representative neural network, is a model that reduces dimensionality and extracts nonlinear features from data by minimizing the reconstruction error between input and output. However, when an AE has more hidden nodes than input nodes, its ability to extract sample features is greatly reduced and the generalization ability of the model deteriorates.

Disclosure of Invention

To overcome the defects of the above methods, a fermentation process fault monitoring method based on information Entropy Semi-supervised Fuzzy C-means clustering (ESFCM) and the Sparse Autoencoder (SAE) is provided. Information entropy measures the degree of disorder of data, so it can be introduced into the FCM algorithm: the more reasonable the stage division, the smaller the information entropy, and the cluster number corresponding to the minimum information entropy is the optimal cluster number. To explain and verify the stage division result, the Euclidean distances and the time-slice labels of each stage are substituted into the Silhouette coefficient formula, which both measures the stage division performance and identifies the transition stages of the fermentation process. Then, to improve the generalization capability of the SAE and to compensate for the tendency of the traditional sigmoid function to saturate, the Swish activation function is introduced into the traditional SAE to construct an SAE network with strong generalization capability. Finally, an SAE model is established for each sub-stage to carry out fault monitoring.
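The Silhouette coefficient mentioned above can be illustrated with a minimal sketch. It uses the standard definition s = (b - a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the smallest mean distance to another cluster. The function name `silhouette_values` and the toy data are hypothetical, and reading a low value as a sign of a transition sample is an assumption made for illustration, not a rule stated in the text.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-sample Silhouette coefficient using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix, shape (n, n)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a singleton cluster is assigned a silhouette of 0
        a = D[i, same].sum() / (n_same - 1)  # excludes the zero self-distance
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Toy time slices: two tight groups plus one sample lying between them
X = np.array([[0.0], [0.1], [5.0], [5.1], [2.5]])
labels = np.array([0, 0, 1, 1, 0])
s = silhouette_values(X, labels)
```

In this toy example the sample at 2.5 receives the lowest value, which is the kind of ambiguous point that lies between two stable groups.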

The invention adopts the following technical scheme and implementation steps:

A. staging

1) Cluster the two-dimensional data of the fermentation process using the information-entropy-based semi-supervised FCM algorithm (ESFCM). Given three-dimensional sample data X (I × J × K) collected under normal operating conditions of the fermentation process, first unfold it along the batch direction into two-dimensional data X (I × KJ), where I is the number of batches, J is the number of process variables, and K is the number of sampling instants. The two-dimensional matrix is then standardized by column. The standardization formula is

$$\tilde{x}_{k,j} = \frac{x_{k,j} - \bar{x}_j}{s_j}$$

where $x_{k,j}$ is the j-th column element of the time-slice matrix at the k-th sampling instant, $\tilde{x}_{k,j}$ is its standardized value, and $\bar{x}_j$ and $s_j$ are the mean and standard deviation of column j, respectively.
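The unfolding and column-wise standardization described in step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the array layout (I, J, K), the time-major column ordering, the use of the sample standard deviation (`ddof=1`), and the function name `unfold_and_standardize` are choices made here, not details given in the text.

```python
import numpy as np

def unfold_and_standardize(X):
    """Unfold X of shape (I, J, K) batch-wise into (I, K*J) and z-score each column."""
    I, J, K = X.shape
    # Time-major column order: the J variables at k=0, then the J variables at k=1, ...
    X2d = X.transpose(0, 2, 1).reshape(I, K * J)
    mean = X2d.mean(axis=0)
    std = X2d.std(axis=0, ddof=1)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X2d - mean) / std, mean, std

# Synthetic stand-in for 40 batches, 10 variables, 400 sampling instants
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(40, 10, 400))
Z, mean, std = unfold_and_standardize(X)
```

The returned `mean` and `std` are kept because the online stage needs the same statistics to standardize new samples.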

2) When clustering the two-dimensional data with the information-entropy-based semi-supervised FCM, first set the maximum cluster number $m_{max}$ and the minimum cluster number $m_{min}$ for the two-dimensional sample matrix. Since a clustering algorithm needs at least 2 clusters, $m_{min} = 2$. The growth cycle of microorganisms in the fermentation process is divided into a growth adaptation phase, a logarithmic growth phase, a stationary phase and a decay phase, so $m_{max} = 4$. Then randomly initialize the membership matrix U of the FCM, set t = 0, and update m; the membership matrix U expresses the degree to which a sample x belongs to each cluster. After initialization, update the membership matrix U and t. The information entropy describes the degree of disorder of the sample points: the more reasonable the division, the smaller the information entropy of the clusters. Compute the information entropy of the k-th sample point over the clusters, where N is the total number of samples, p denotes the cluster label, and $u_{px}$ is the membership value of sample point x for cluster label p. After the information entropy has been calculated, the update stops when $\|U^{t+1} - U^{t}\| > e$, and the current cluster number m is determined. When $m \ge m_{max}$, the current cluster number m is taken as the final cluster number.
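As a rough illustration of how an entropy score can rank candidate partitions, the sketch below computes the Shannon entropy of each sample's membership vector and averages it over the N samples. The exact entropy expression of the method is not reproduced in this text, so the Shannon form, the averaging, and the name `membership_entropy` are assumptions.

```python
import numpy as np

def membership_entropy(U, eps=1e-12):
    """Shannon entropy of each column of a membership matrix U (clusters x samples).

    Returns the per-sample entropies and their mean over all samples.
    """
    U = np.clip(U, eps, 1.0)          # avoid log(0)
    per_sample = -(U * np.log(U)).sum(axis=0)
    return per_sample, per_sample.mean()

# Two hypothetical partitions of three samples into two clusters
U_sharp = np.array([[0.98, 0.01, 0.02],
                    [0.02, 0.99, 0.98]])
U_fuzzy = np.full((2, 3), 0.5)

_, h_sharp = membership_entropy(U_sharp)
_, h_fuzzy = membership_entropy(U_fuzzy)
```

A partition with near-crisp memberships gives the smaller value, matching the statement that a more reasonable division has lower information entropy.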

3) Substitute the m obtained in step 2 into the FCM algorithm, compute the Euclidean distance, and substitute it into the FCM objective function, where d is the Euclidean distance, the membership value is that of sample point x at the k-th sampling instant for a cluster p, m is the number of clusters from step 2, and N is the number of sample time slices. $\|\cdot\|$ measures the distance from each time slice to the cluster center. Let M be the maximum number of iterations and o the iteration index, a positive integer with o ∈ (0, M). Compute the objective function values of the o-th and (o-1)-th iterations of the algorithm; when the absolute difference between them is smaller than the iteration error v, that is $|R_o - R_{o-1}| < v$, the clustering algorithm terminates and the stage division is complete.
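The iteration in step 3 can be sketched with the textbook fuzzy C-means updates, stopping when the objective changes by less than v. This is not the patent's exact objective, which is not reproduced in this text: the fuzzifier `q = 2`, the squared-distance objective, and the function name `fcm` are assumptions.

```python
import numpy as np

def fcm(X, m, q=2.0, v=1e-3, max_iter=100, seed=0):
    """Textbook fuzzy C-means. X: (N, d) samples, m: number of clusters."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((m, N))
    U /= U.sum(axis=0)                       # memberships of each sample sum to 1
    R_prev = np.inf
    for o in range(1, max_iter + 1):
        Uq = U ** q
        centers = (Uq @ X) / Uq.sum(axis=1, keepdims=True)
        # Squared Euclidean distances from every center to every sample, shape (m, N)
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)           # avoid division by zero
        R = (Uq * d2).sum()                  # objective value at iteration o
        if abs(R - R_prev) < v:              # |R_o - R_{o-1}| < v
            break
        R_prev = R
        inv = d2 ** (-1.0 / (q - 1.0))
        U = inv / inv.sum(axis=0)            # standard membership update
    return U, centers, R, o

# Two well-separated synthetic groups of time slices
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
U, centers, R, n_iter = fcm(X, m=2)
labels = U.argmax(axis=0)
```

Taking the cluster with the largest membership for each time slice gives a preliminary hard assignment to stages.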

B. Offline modeling:

4) Input the samples of each stage obtained in step 3 into the sparse autoencoder model. The method replaces the traditional sigmoid activation function with the Swish activation function, whose expression is

$$f(x) = x \cdot \mathrm{sigmoid}(\beta x) = \frac{x}{1 + e^{-\beta x}}$$

Here x is the input variable of a neuron in the next SAE layer and is determined by the input of the first SAE layer. The input of the first layer is the fermentation process sample matrix, whose dimension matches that of each batch sample matrix, J × K. β is a parameter: as β goes from 0 to infinity, the sigmoid factor takes values between 0 and 1, and a family of smooth functions between the linear function and the ReLU function is obtained. The regularization effect comes from the part x < 0, which is bounded below, that is, the activation curve lies below the x-axis. Because the activation curve is approximately linear at β = 0.1 and already close to the ReLU function at β = 10, where the regularization effect deteriorates, β is taken greater than 0.1 and less than 10 in this method; values chosen at random within this range make no obvious difference to the experimental results.

In the encoding network, the output of the encoder is $f(X) = W_{KJ} h'_t + b_J$, where X is the sample matrix formed from the samples x and $W_{KJ}$ is the weight matrix of the encoder. K is the number of rows of $W_{KJ}$; J is the number of columns of $W_{KJ}$ and also the dimension of the output neurons of the corresponding hidden layer of the encoding network. $h'_t$ is the matrix of input variables of the hidden layer neurons of the encoder at time t. $b_J$ is the bias matrix of the encoder, and J is the encoder output dimension.

To prevent symmetric weights during SAE training, the network weights can be configured by random initialization. When the neurons of layer l use the Swish activation function, the variance of the weight parameters is obtained from $M_{l-1}$, the number of neurons in the previous layer. This initialization only needs to keep the gradient variance of the activation function unchanged to complete the configuration of the weight parameters.

After encoding, the data enter the decoding network $g(f(X)) = g(W'_{KJ} h'_t + b'_J)$, where $W'_{KJ}$ is the weight matrix of the decoder. h' is the number of rows of $W'_{KJ}$ and also the number of hidden layers of the decoder; J is the number of columns of $W'_{KJ}$ and also the dimension of the output neurons of the corresponding hidden layer of the decoding network. $h'_t$ is the matrix of input variables of the hidden layer neurons of the decoder at time t. $b'_J$ is the bias matrix of the decoder. $g(f(X))$ is the reconstruction produced by the decoder, obtained by decoding the output f(X) of the encoding network.
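The behaviour of the Swish activation for different β can be checked numerically with a short sketch. It uses the standard definition x · sigmoid(βx); the test points and the two extreme β values are chosen here only to show the limiting cases.

```python
import numpy as np

def swish(x, beta=0.5):
    """Swish activation: x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-5.0, 5.0, 11)
near_linear = swish(x, beta=1e-6)   # beta -> 0: approximately the linear function x / 2
near_relu = swish(x, beta=50.0)     # large beta: approximately ReLU, max(x, 0)
negative_part = swish(-1.0, beta=0.5)  # small negative output for x < 0
```

For moderate β the output for negative inputs is slightly below zero rather than exactly zero, which is the bounded negative part the text associates with a regularization effect.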

5) The reconstruction error E is then used to measure the difference between the original samples and the reconstructed samples, where $e_k$ is the sum-of-squares error between the original sample and the reconstructed sample at the k-th instant, $\|\cdot\|_2$ denotes the two-norm, and K is the number of sampling instants of the fermentation process.

6) Input the normal operating data X (IK × J) of the fermentation process used for modeling into the trained sparse autoencoder model, and compute the SPE statistic of each normal sample from the model reconstruction error. The SPE is calculated from the $e_k$ of step 5, where $x_k$ is the sample input at the k-th instant, $a_k$ is the corresponding reconstructed sample at the k-th instant, and N is the number of samples.
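A minimal sketch of an SPE computation is shown below. It assumes the usual squared-prediction-error form, the squared two-norm of the difference between each input sample and its reconstruction; the exact formula is not reproduced in this text, and the reconstructions `A` are made-up values standing in for the output of a trained SAE.

```python
import numpy as np

def spe(X, A):
    """Squared prediction error per row: ||x_k - a_k||_2^2."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    return ((X - A) ** 2).sum(axis=1)

X = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]])   # input samples x_k
A = np.array([[1.0, 2.0], [1.0, 1.0], [3.0, 2.0]])   # hypothetical reconstructions a_k
spe_values = spe(X, A)   # -> [0., 2., 4.]
```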

7) Kernel density estimation (KDE) is applied to the SPE values of the historical normal samples to obtain the control limit of the SPE statistic of the normal samples. The control limit serves as the reference line for fault monitoring: when the statistic of a test sample exceeds the control limit of the normal samples, a fault has occurred. The kernel density estimate is

$$f(SPE) = \frac{1}{N\delta} \sum_{k=1}^{N} G\left(\frac{SPE - spe_k}{\delta}\right)$$

where $spe_k$ is the SPE value of the k-th normal sample, δ is the window width, G is the kernel function, N is the total number of samples, and f(SPE) is the probability density corresponding to the SPE of the normal samples.
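One common way to turn a kernel density estimate into a control limit is to integrate the estimated density and take the SPE value at a chosen confidence level. The sketch below does this with a Gaussian kernel. The 99% level, the normal-reference rule of thumb for the window width, the grid resolution, and the name `kde_control_limit` are all assumptions, since the text does not state them.

```python
import numpy as np

def kde_control_limit(spe, alpha=0.99, delta=None, grid_size=2000):
    """Control limit as the alpha-quantile of a Gaussian-kernel density estimate."""
    spe = np.asarray(spe, dtype=float)
    N = spe.size
    if delta is None:
        # Normal-reference rule of thumb for the window width (an assumption)
        delta = 1.06 * spe.std(ddof=1) * N ** (-0.2)
    grid = np.linspace(0.0, spe.max() + 5.0 * delta, grid_size)
    u = (grid[:, None] - spe[None, :]) / delta
    # f(SPE) = 1 / (N * delta) * sum_k G((SPE - spe_k) / delta), G = standard normal pdf
    density = np.exp(-0.5 * u ** 2).sum(axis=1) / (N * delta * np.sqrt(2.0 * np.pi))
    cdf = np.cumsum(density) * (grid[1] - grid[0])
    cdf /= cdf[-1]                       # normalize the numerical integral to 1
    return grid[np.searchsorted(cdf, alpha)]

# Synthetic SPE values standing in for the statistics of normal samples
rng = np.random.default_rng(0)
spe_normal = rng.chisquare(df=10, size=400)
limit = kde_control_limit(spe_normal)
fraction_below = (spe_normal <= limit).mean()
```

With this choice, roughly the stated fraction of normal samples falls below the limit, so the confidence level directly trades false alarms against sensitivity.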

C. Online monitoring:

8) Sample online to obtain the data $x_{new,k}$ (1 × J) of a new fermentation batch at the k-th instant, where I is the number of batches of the fermentation process and J is the number of process variables. Determine the stage to which the sample belongs from its sampling time, and standardize it with the mean and standard deviation of the corresponding k-th instant from the offline modeling stage to obtain

$$\tilde{x}_{new,k} = [\tilde{x}_{new,k,1}, \ldots, \tilde{x}_{new,k,J}]$$

where $\tilde{x}_{new,k}$ is the standardized data at the k-th sampling instant, $\tilde{x}_{new,k,1}$ is the standardized first variable at the k-th instant, and $\tilde{x}_{new,k,J}$ is the standardized J-th variable at the k-th instant. Each $\tilde{x}_{new,k,j}$ is standardized as follows:

$$\tilde{x}_{new,k,j} = \frac{x_{new,k,j} - \bar{x}_{k,j}}{s_{k,j}}$$

where $\bar{x}_{k,j}$ is the mean of the original samples of the j-th variable at the k-th instant in the batch direction, and $s_{k,j}$ is the corresponding standard deviation. Then $\tilde{x}_{new,k}$ is input into the established sparse autoencoder model to obtain the reconstructed vector $a_{new,k}$, which is computed from the decoder weight matrix W', the encoder weight matrix W, the encoder offset vector b, the decoder offset vector b', and the output vector of the encoder.

9) Compute the monitoring statistic $SPE_k$, the squared prediction error at the k-th instant, from the standardized sample and $a_{new,k}$, the reconstructed variable at the k-th instant obtained with the sparse autoencoder model.

10) Compare the $SPE_k$ obtained in the previous step for the current instant with the control limit calculated by KDE from the historical batch samples in offline modeling, where $spe_k$ is the SPE value of the k-th test sample, δ is the window width, G is the kernel function, N is the total number of samples, and f(SPE) gives the control limit corresponding to the SPE of the current test sample. If the value for the test batch sample is higher than the normal sample control limit, the current sampling point is judged to be a fault point. The fault points over the whole sampling period are determined in this way so that the false alarm rate and the missed alarm rate can be calculated.
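Steps 8 to 10 can be sketched for a single online sample as follows. The `reconstruct` callable is a hypothetical stand-in for the trained SAE of the relevant stage, the squared-error form of SPE is assumed, and the numeric values are made up for illustration.

```python
import numpy as np

def monitor_sample(x_new, mean_k, std_k, reconstruct, limit):
    """Standardize one online sample, reconstruct it, and compare its SPE to the limit."""
    x_tilde = (x_new - mean_k) / std_k          # statistics from offline modeling at instant k
    a_new = reconstruct(x_tilde)                # reconstruction from the stage model
    spe_k = float(((x_tilde - a_new) ** 2).sum())
    return spe_k, spe_k > limit

# Hypothetical stand-in for a trained SAE: it only reproduces the first variable
def reconstruct(z):
    return np.array([z[0], 0.0, 0.0])

mean_k = np.array([1.0, 2.0, 3.0])
std_k = np.array([1.0, 2.0, 0.5])

spe_ok, fault_ok = monitor_sample(np.array([1.5, 2.0, 3.0]), mean_k, std_k, reconstruct, limit=1.0)
spe_bad, fault_bad = monitor_sample(np.array([1.0, 6.0, 4.0]), mean_k, std_k, reconstruct, limit=1.0)
```

In a multi-stage setting, `mean_k`, `std_k`, `reconstruct` and `limit` would all be selected according to the stage that contains the sampling instant k.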

Advantageous effects

1) The method introduces information entropy into the FCM to construct a semi-supervised learning algorithm that determines the number of clusters. After the stable stages have been clustered with this algorithm, the dynamic transition stages are identified using the Silhouette coefficient, which also measures the accuracy of the stage division. Compared with traditional methods, this removes the need to determine the number of clusters manually, gives more accurate clustering results, and improves the monitoring performance of the subsequent SAE.

2) The invention introduces the Swish activation function into the SAE in place of the traditional sigmoid activation function, which improves the generalization performance of the model and addresses the vanishing gradient problem of the traditional sigmoid function. An SAE monitoring model is established for each stage of the stage division to reconstruct the original features. Compared with global modeling, the multi-stage modeling strategy is more sensitive to faults, which enhances the robustness of the monitoring model, improves fault monitoring accuracy, and reduces false alarms and missed alarms in process monitoring.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a flow chart of the ESFCM algorithm;

FIG. 3 is a schematic diagram of a sparse encoder;

FIG. 4 is an off-line modeling flow diagram;

FIG. 5 is a flow chart of online monitoring;

FIG. 6 is a diagram showing the results of ESFCM preliminary stage division according to the present invention;

FIG. 7 is a diagram of ESFCM transition phase identification according to the method of the present invention;

FIG. 8 is a diagram of an ESFCM global phase partition of the method of the present invention;

FIG. 9 is a FCM preliminary stage partition diagram;

FIG. 10 is a FCM transition phase identification diagram;

FIG. 11 is a FCM global phase partition diagram;

FIG. 12 is a KFCM preliminary stage division result diagram;

FIG. 13 is a KFCM transition stage identification diagram;

FIG. 14 is a KFCM global phase partition diagram;

FIG. 15 shows the fault monitoring result of ESFCM-SAE, the method of the present invention, for the aeration rate fault (step fault);

FIG. 16 shows the fault monitoring result of ESFCM-SAE, the method of the present invention, for the substrate feed rate fault (ramp fault);

FIG. 17 shows the fault monitoring result of ESFCM-SAE, the method of the present invention, for the agitation power fault (ramp fault);

FIG. 18 shows the fault monitoring result of the comparison method AE for the aeration rate fault (step fault);

FIG. 19 shows the fault monitoring result of the comparison method AE for the substrate feed rate fault (ramp fault);

FIG. 20 shows the fault monitoring result of the comparison method AE for the agitation power fault (ramp fault);

Detailed Description

Penicillin is an antibiotic of great clinical value. The internationally used Pensim 2.0 simulation platform is used here to study fault monitoring of the penicillin fermentation process.

For experimental convenience, each batch on the platform is set to ferment continuously for 400 h with a sampling interval of 1 h. The initial conditions of each batch vary slightly within the allowable range. Forty batches of normal data are simulated and 10 process variables are selected, giving 40 × 400 × 10 data for offline modeling. The process variables are shown in Table 1. To better simulate actual production conditions, Gaussian noise is added to the training samples. To verify the fault monitoring performance of the proposed method, three sets of fault batch samples are selected for testing, as shown in Table 2.

TABLE 1 penicillin simulation platform Process variables

TABLE 2 penicillin fermentation Process Primary variables

The application of the method of the invention to the penicillin fermentation process simulation object comprises two major steps of stage division and fault monitoring, and the method specifically comprises the following steps:

A. staging

Step 1: data pre-processing

The three-dimensional penicillin fermentation data are unfolded along the batch direction into two-dimensional data. Unfolding in the batch direction means slicing the three-dimensional data $X_{40\times10\times400}$ by time and unfolding it into the two-dimensional time-slice matrix $X_{400\times(40\times10)}$, which is then standardized by column. The standardization formula is:

$$\tilde{x}_{k,j} = \frac{x_{k,j} - \bar{x}_j}{s_j}, \quad k = 1, \ldots, 400, \quad j = 1, \ldots, 400$$

where $\tilde{x}_{k,j}$ is the standardized value, and $\bar{x}_j$ and $s_j$ are the mean and standard deviation of column j, respectively.

Step 2: Divide the two-dimensional data into stages using the semi-supervised FCM based on information entropy. First set the maximum cluster number to 4 and the minimum cluster number $m_{min}$ for the two-dimensional time-slice matrix, then randomly initialize the membership matrix of the FCM, set t = 0, and update the membership matrix U and t. When $\|U^{t+1} - U^{t}\| > e$, the information entropy is computed. The information entropy describes the degree of disorder of the sample points: the more reasonable the division, the smaller the information entropy of the clusters. Compute the information entropy of the k-th sample point over the clusters, where p denotes the cluster label and $u_{px}$ is the membership value of sample point x for cluster label p.

After the information entropy has been calculated, the update stops when $\|U^{t+1} - U^{t}\| > e$. Values of e within 0.001 do not greatly affect the result of the algorithm; to ensure clustering accuracy, e = 0.0001 is used, and the current cluster number m is updated. When $m \ge m_{max}$, the current cluster number m is taken as the final cluster number; the algorithm gives m = 4 as the stage division result for penicillin.

Step 3: Substitute the cluster number m = 4 into the FCM algorithm, compute the Euclidean distance, and substitute it into the FCM objective function, where the membership value is that of sample point x at the k-th sampling instant for a cluster p, m is the number of clusters from step 2, and N is the number of sample time slices. $\|\cdot\|$ measures the distance from each time slice to the cluster center. Let M be the maximum number of iterations and o the iteration index, a positive integer with o ∈ (0, M). Compute the objective function values of the o-th and (o-1)-th iterations of the algorithm. Iteration errors v within 0.001 do not affect the result, so v = 0.001 is used; when the absolute difference between the objective function values of the two iterations is smaller than v, that is $|R_o - R_{o-1}| < 0.001$, the clustering algorithm terminates and the stage division is complete.

B. Offline modeling

After the two-dimensional time-slice matrix has been divided into stages by the semi-supervised FCM algorithm based on information entropy, each sub-stage is modeled and trained offline with an autoencoder model.

Step 1: First, the previously standardized data are rearranged into three-dimensional form; the new three-dimensional data are $X'_{40\times(400\times10)}$. Each batch of rearranged samples consists of 400 rows and 10 columns, where the 400 rows are the 400 sampling instants and the 10 columns are the 10 variables of each sample. Each batch of 400 × 10 two-dimensional time-slice samples is then input into a sparse autoencoder with the Swish activation function. The experiments use an Intel(R) Core(TM) i5-9500 CPU, PyCharm 3.8 and the TensorFlow 2.0 platform, and the SAE network is configured by manual parameter tuning, Bayesian optimization and random initialization. The manually tuned parameters are: 3 hidden layers with 52 network units each, β = 0.5, the Adam optimizer, a learning rate of 0.05, 900 epochs, a batch size of 1, and a network input vector dimension of 10. Bayesian optimization is used to optimize the sliding window width and the hidden layer dimension, giving a sliding window width of 8 and a hidden layer dimension of 32. The remaining parameters are tuned manually according to the fermentation process data.

In the encoding network, the output of the encoder is $f(X) = W_{KJ} h'_t + b_J = W_{400\times10} h'_t + b_{10}$, where X is the sample matrix formed from the samples x and $W_{KJ}$ is the weight matrix of the encoder. K is the number of rows of $W_{KJ}$; J is the number of columns of $W_{KJ}$ and also the dimension of the output neurons of the corresponding hidden layer of the encoding network. $h'_t$ is the matrix of input variables of the hidden layer neurons of the encoder at time t. $b_J$ is the bias matrix of the encoder, and J is the encoder output dimension.

To prevent symmetric weights during SAE training, the network weights can be configured by random initialization. When the neurons of layer l use the Swish activation function, the variance of the weight parameters is obtained from $M_{l-1}$, the number of neurons in the previous layer. This initialization only needs to keep the gradient variance of the activation function unchanged to complete the configuration of the weight parameters.

f(X) is the output of the encoder. After encoding, the data enter the decoding network $g(f(X)) = g(W'_{KJ} h'_t + b'_J) = g(W'_{400\times10} h'_t + b'_{10})$, where $W'_{KJ}$ is the weight matrix of the decoder. h' is the number of rows of $W'_{KJ}$ and also the number of hidden layers of the decoder; J is the number of columns of $W'_{KJ}$ and also the dimension of the output neurons of the corresponding hidden layer of the decoding network. $h'_t$ is the matrix of input variables of the hidden layer neurons of the decoder at time t. $b'_J$ is the bias matrix of the decoder. $g(f(X))$ is the reconstruction produced by the decoder, obtained by decoding the output f(X) of the encoding network. The reconstruction error E is then used to measure the difference between the original samples and the reconstructed samples.

where $e_k$ is the sum-of-squares error between the original sample and the reconstructed sample at the k-th instant, $\|\cdot\|_2$ denotes the two-norm, and K is the number of sampling instants of the fermentation process. The sampling period depends on the simulated or actual sampling data; on the Pensim simulation platform the penicillin fermentation process can be set to sample once per hour, so the number of samples equals the 400 sampling instants. E is the resulting reconstruction error, and the model training requirement is met when the reconstruction error is at its minimum.

Step 2: Input the normal operating data X (IK × J) of the fermentation process used for modeling into the trained sparse autoencoder model, and compute the SPE statistic of each normal sample from the model reconstruction error. The SPE is calculated from $e_k$, where $x_k$ is the sample input at the k-th instant, $a_k$ is the corresponding reconstructed sample at the k-th instant, and N is the number of samples. The value 400 is the number of both the original samples and the reconstructed samples.

Step 3: Kernel density estimation (KDE) is applied to the SPE values of the historical normal samples to obtain the control limit of the SPE statistic of the normal samples. The control limit serves as the reference line for fault monitoring: when the statistic of a test sample exceeds the control limit of the normal samples, a fault has occurred.

C. on-line monitoring

After offline modeling is finished, online monitoring is carried out. First the stage to which the fault sample belongs is determined; then the test sample containing the fault is input into the corresponding autoencoder model, and its SPE is compared with the control limit, where $spe_k$ is the SPE value of the k-th test sample, δ is the window width, G is the kernel function, N is the total number of test samples (a penicillin fermentation test batch has 400 samples), and $f_k(x)$ is the probability density corresponding to each normal sample SPE, which is used as the control limit of the test sample. If the SPE at a point of the test sample is larger than the corresponding control limit, that point is a fault sample point.

The steps above are the specific application of the method to fault monitoring on the penicillin simulation platform. To verify the effectiveness of the method, online monitoring experiments were carried out for fault 1 (a step fault in the aeration rate), fault 2 (a ramp fault in the substrate feed rate) and fault 3 (a ramp fault in the agitation power). FIG. 6 shows the preliminary stage division obtained with ESFCM in the method of the invention; this preliminary division covers only the stable stages of penicillin fermentation, and the 4 stable stages of the fermentation process are 1-76, 77-171, 172-276 and 277-400. FIG. 7 shows the transition stage division obtained with the Silhouette coefficient in the method of the invention; the transition stages of the fermentation process are 47-112, 164-212 and 261-287. FIG. 8 shows the stage division of the whole fermentation process in the method of the invention: stable stage 1-46, transition stage 47-112, stable stage 113-163, transition stage 164-212, stable stage 213-260, transition stage 261-287 and stable stage 288-400. FIG. 9 shows the preliminary stage division of the comparison method FCM, in which the stable stages are 1-200, 201-. FIG. 10 shows the transition stage division of the comparison method FCM; serious misclassification occurs in the transition stage between the first and second stable stages. FIG. 12 shows the preliminary stage division of KFCM, in which the stable stages are 1-75, 76-187, 188-. FIG. 13 shows the transition stage identification, and FIG. 14 shows the global stage division.
As can be seen, the division is stable stage 0-41, transition stage 42-75, stable stage 76-141, transition stage 142-187, stable stage 188-245, transition stage 246-275 and stable stage 276-400. Because of the added kernel function, KFCM requires more iterations and a longer computation time. On comprehensive comparison, the ESFCM model is more stable and its stage division is more accurate. The accuracy of the stage division was evaluated using the Silhouette coefficient, as shown in Table 3. FCM fails at transition stage division because misclassification is serious when the transition stage division is performed in the first stage, which results in a low accuracy of the final stage division. The KFCM has higher accuracy than FCM but is caused by KFCM
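For reference, the Silhouette coefficient used to evaluate the stage division can be sketched as follows. This uses the standard definition s = (b − a) / max(a, b), where a is the mean distance from a sample to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name `silhouette_samples` and the toy one-dimensional data are hypothetical; how the per-sample values are thresholded to mark transition stages is not stated in this passage and is not modeled here.

```python
import math

def silhouette_samples(points, labels):
    """Per-sample Silhouette values s = (b - a) / max(a, b) (standard definition)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            # Convention: a sample that is alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

# Toy time slices: two well-separated groups.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
good_labels = [0, 0, 0, 1, 1, 1]   # division that matches the groups
bad_labels = [0, 1, 0, 1, 0, 1]    # division that mixes the groups

good_score = sum(silhouette_samples(points, good_labels)) / len(points)
bad_score = sum(silhouette_samples(points, bad_labels)) / len(points)
```

A mean value close to 1 indicates compact, well-separated stages, while values near or below 0 indicate samples that sit between clusters, which is the behavior expected of samples near a stage boundary.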

TABLE 3 Algorithm stage division accuracy comparison

To verify the performance of ESFCM in fault monitoring, SAE and AE were each used to monitor fault 1, fault 2 and fault 3 in the penicillin fermentation process; the monitoring results of ESFCM-SAE and of the comparison method AE are shown in FIGS. 15-20. The red dotted line is the threshold of the normal batch samples, and the black line is the SPE statistic when a fault batch is introduced in online monitoring. An alarm occurs when the black line exceeds the red dotted line. For fault 1, FIG. 15 shows false alarms only at the 18th and 113th sample points, whereas FIG. 18 shows a large number of false alarms in the first transition stage. For fault 2, FIG. 16 shows false alarms only at the 24th, 33rd and 178th sample points, whereas FIG. 19 shows a large number of false alarms, which are especially serious within the 200-250 sample interval. For fault 3, FIG. 17 shows false alarms only at the 23rd, 24th and 25th sample points and within the 200-215 sample interval, whereas FIG. 20 shows no false alarms but very serious missed alarms within the 200-320 interval. The proposed method is compared with the traditional AE method on two evaluation indices, the false alarm rate and the missed alarm rate, as shown in Tables 4 and 5.
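The two evaluation indices can be computed from the SPE sequence of a test batch as in the sketch below. The definitions used here are the common ones and are an assumption, since the text does not give formulas: the false alarm rate is the fraction of normal samples that raise an alarm, and the missed alarm rate is the fraction of faulty samples that do not. The function name `alarm_rates`, the parameter `fault_start` (index of the first faulty sample, with the fault assumed to persist to the end of the batch) and the toy values are hypothetical.

```python
def alarm_rates(spe, limit, fault_start):
    """Return (false_alarm_rate, missed_alarm_rate) for one test batch."""
    alarms = [s > limit for s in spe]
    normal = alarms[:fault_start]   # samples before the fault is introduced
    faulty = alarms[fault_start:]   # samples while the fault is active
    false_alarm_rate = sum(normal) / len(normal)
    missed_alarm_rate = sum(not a for a in faulty) / len(faulty)
    return false_alarm_rate, missed_alarm_rate

# Toy SPE sequence: fault introduced at index 4, control limit 1.0.
spe = [0.2, 0.3, 1.5, 0.4, 2.0, 0.8, 2.5, 3.0]
far, mar = alarm_rates(spe, limit=1.0, fault_start=4)
```

In this toy sequence one of the four normal samples exceeds the limit and one of the four faulty samples stays below it, so both rates are 0.25.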

Table 4: comparison of algorithms on false alarm rate indicators

Table 5: comparison of algorithms on missing report rate index

In summary, the monitoring results (FIGS. 15-20) and Tables 3-5 show that the ESFCM method of the invention achieves higher stage division accuracy than the FCM and KFCM methods, because the number of clusters is configured more flexibly during stage division; a division accuracy of 90% meets the requirement of subsequent fault monitoring. When faults are monitored with an SAE model containing the Swish activation function, ESFCM-SAE reaches a fault state monitoring accuracy of 99% compared with the traditional method, which demonstrates the effectiveness and accuracy of the method.

Claims

1. A multi-stage fermentation process fault monitoring method based on semi-supervised FCM and sparse automatic encoder of information entropy comprises 3 stages of 'stage division', 'off-line modeling' and 'on-line monitoring', and is characterized by comprising the following specific steps:

A. Stage division:

1) Clustering two-dimensional data of the fermentation process using the information entropy semi-supervised FCM algorithm (ESFCM): for a given three-dimensional data sample under normal operating conditions of the fermentation process, the three-dimensional data X (I × J × K) is first unfolded along the batch direction into two-dimensional data X (I × KJ), where I is the number of batches of the fermentation process, J is the number of fermentation process variables, and K is the number of sampling instants; the two-dimensional matrix is then standardized column by column; the standardization formula is

$$\tilde{x}_{k,j} = \frac{x_{k,j} - \bar{x}_j}{s_j}$$

where $x_{k,j}$ is the jth column element in the time-slice matrix at the kth sampling instant, $\tilde{x}_{k,j}$ is its value after standardization, and $\bar{x}_j$ and $s_j$ are the mean and standard deviation of column j, respectively;

2) when the information entropy semi-supervised FCM is used to cluster the two-dimensional data, the maximum cluster number m_max and the minimum cluster number m_min are first set for the two-dimensional sample matrix; since the number of clusters in a clustering algorithm is at least 2, m_min = 2; the growth cycle of the microorganism in the fermentation process is divided into a growth adaptation phase, a logarithmic growth phase, a growth stabilization phase and a decay phase, so m_max = 4; the membership matrix U of the FCM is then randomly initialized, t is set to 0 and m is updated, where the membership matrix U is a function representing the degree to which a sample x belongs to a given set; after initialization, the membership matrix U and t are updated; the information entropy is computed

It represents the information entropy of the k-th sample point on the cluster, N is the total number of samples, wherein,

p denotes the cluster label, and u_px represents the membership value of sample point x in the cluster with label p; after the information entropy has been computed, when ‖U^(t+1) − U^t‖ > e, updating stops and the current cluster number m is determined; when m ≥ m_max, the current cluster number m is taken as the final cluster number; e is within 0.001;

d is the Euclidean distance,

is the membership value of sample point x at the kth sampling instant in cluster p, m is the number of clusters from step 2), and N is the number of sample time slices; ‖·‖ denotes the measure of the distance from each time slice to the cluster center; the maximum number of iterations is M, and the iteration index o ∈ (0, M) is a positive integer; the objective function values of the o-th and (o−1)-th iterations of the algorithm are computed, and when the absolute value of their difference is less than the iteration error v, that is, |R_o − R_(o−1)| < v, the clustering algorithm terminates and the stage division is complete; v is within 0.001;
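The clustering loop in this step can be illustrated with the Python sketch below. It is plain fuzzy C-means, not the semi-supervised ESFCM of the claim, whose objective function is not reproduced in this text. The fuzzifier value of 2, the use of the mean Shannon entropy of the membership rows as the information entropy, and the names `fcm` and `membership_entropy` are assumptions made for illustration. The stopping rule follows the text: iteration ends when the change in the objective value between two iterations is below the tolerance.

```python
import math
import random

def fcm(points, n_clusters, fuzzifier=2.0, tol=1e-3, max_iter=100, seed=0):
    """Standard fuzzy C-means; returns (membership matrix U, cluster centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix U, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (fuzzifier - 1.0)
    prev_obj = None
    for _ in range(max_iter):
        # Cluster centers: membership-weighted means of the samples.
        centers = []
        for p in range(n_clusters):
            w = [u[k][p] ** fuzzifier for k in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][j] for k in range(n)) / w_sum
                for j in range(dim)))
        # Euclidean distance d from every sample to every center.
        dist = [[max(math.dist(x, c), 1e-12) for c in centers] for x in points]
        # Membership update from the distance ratios.
        u = [[1.0 / sum((row[p] / row[r]) ** power for r in range(n_clusters))
              for p in range(n_clusters)] for row in dist]
        # Objective value R_o; stop when |R_o - R_(o-1)| < tol.
        obj = sum(u[k][p] ** fuzzifier * dist[k][p] ** 2
                  for k in range(n) for p in range(n_clusters))
        if prev_obj is not None and abs(obj - prev_obj) < tol:
            break
        prev_obj = obj
    return u, centers

def membership_entropy(u):
    """Mean Shannon entropy of the membership rows (assumed form of the entropy)."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / len(u)

# Toy time slices forming two well-separated stages.
points = [(0.1 * i,) for i in range(5)] + [(10.0 + 0.1 * i,) for i in range(5)]
u, centers = fcm(points, n_clusters=2)
labels = [row.index(max(row)) for row in u]
entropy = membership_entropy(u)
```

A low entropy means that each sample belongs almost entirely to one cluster. Repeating the call for every cluster number between m_min and m_max and comparing the resulting entropies is one way such a measure could guide the choice of m, although the exact selection rule of ESFCM is not shown here.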

B. Offline modeling:

where x is the input variable of the SAE neurons in the next layer, which is determined by the input variable of the first SAE layer; the input variable of the first layer is the fermentation process sample matrix, whose dimension J × K is consistent with that of each batch's sample matrix; β is a random parameter taking a value greater than 0.1 and less than 10;

in the encoding network, the output information of the encoder is f(X) = W_KJ h′_t + b_J, where X is the sample matrix formed from the samples x and W_KJ is the weight matrix of the encoder; K is the number of rows of the weight matrix W_KJ, and J is the number of columns of W_KJ, which is also the dimension of the output neurons of the encoding network at the corresponding hidden layer; h′_t is the matrix of input variables of the encoder's hidden-layer neurons at time t; b_J is the bias matrix of the encoder, and J is the output dimension of the encoder; the network weights are configured by random initialization, and when the first-layer neurons use the Swish activation function, the parameters

Variance is obtained

where M_(l−1) is the number of neurons in the previous layer; this initialization method completes the configuration of the weight parameters simply by keeping the gradient variance of the activation function unchanged; after the encoding operation, the decoding network g(f(X)) = g(W′_KJ h′_t + b′_J) is entered, where W′_KJ is the weight matrix of the decoder; h′ is the number of rows of the weight matrix W′_KJ and also the number of hidden layers of the decoder, and J is the number of columns of W′_KJ and also the dimension of the output neurons of the decoding network at the corresponding hidden layer; h′_t is the matrix of input variables of the decoder's hidden-layer neurons at time t; b′_J is the bias matrix of the decoder; g(f(X)) is the reconstructed information produced by the decoder, which decodes the output information f(X) of the encoding network;

5) then, the reconstruction error E is utilized to obtain the reconstruction difference between the original sample and the reconstructed sample;

where e_k represents the sum of squared errors between the original sample and the reconstructed sample at the kth instant, ‖·‖₂ denotes the two-norm, and K is the number of sampling instants of the fermentation process;

where x_k is the sample matrix input at the kth instant, a_k is the corresponding sample matrix reconstructed at the kth instant, and N is the number of samples;

7) kernel density estimation is used to calculate the kernel density estimate corresponding to the SPE of the historical normal samples, giving the control limit of the SPE statistic of the normal samples; this control limit serves as the reference line for fault monitoring, and a fault occurs when the control limit of a fault sample exceeds the control limit of the normal samples; the kernel density estimation formula is:

$$f(\mathrm{SPE}) = \frac{1}{N\delta}\sum_{k=1}^{N} G\!\left(\frac{\mathrm{SPE} - spe_k}{\delta}\right)$$

where spe_k is the SPE value of the kth normal sample, δ is the bandwidth and is a random parameter, G is the kernel function, N is the total number of samples, and f(SPE) is the probability density corresponding to the SPE of each normal sample;
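The offline modeling quantities can be illustrated with a short Python sketch. It assumes the usual form of the Swish activation, f(x) = x · sigmoid(βx), and takes the SPE of a sample as the squared two-norm of the difference between the sample and its reconstruction. The single hidden layer, the linear decoder output, the toy identity weights and the names `swish`, `dense`, `reconstruct` and `spe` are hypothetical; training and the sparsity penalty of the SAE are not shown.

```python
import math

def swish(x, beta=1.0):
    """Swish activation x * sigmoid(beta * x), written to avoid overflow."""
    z = beta * x
    if z >= 0:
        return x / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return x * ez / (1.0 + ez)

def dense(weights, bias, vec):
    """Affine map W * vec + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def reconstruct(x, w_enc, b_enc, w_dec, b_dec, beta=1.0):
    """Encoder with Swish activation followed by a linear decoder (assumed layout)."""
    hidden = [swish(z, beta) for z in dense(w_enc, b_enc, x)]
    return dense(w_dec, b_dec, hidden)

def spe(x, a):
    """Squared prediction error ||x - a||_2^2 between sample x and reconstruction a."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

# Toy weights (identity matrices, zero biases); a trained model would supply real ones.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]

x = [0.5, -1.0, 2.0]
a = reconstruct(x, identity, zeros, identity, zeros, beta=1.0)
e = spe(x, a)
```

The SPE values of all normal training samples computed in this way are the inputs to the kernel density estimate that yields the control limit.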

C. online monitoring:

8) sample online to obtain the sample data x_new,k (1 × J) at the kth instant of a new fermentation batch, where I is the number of batches of the fermentation process and J is the number of fermentation process variables; determine the stage to which the sample belongs according to the sampling instant, and normalize the sample using the mean and standard deviation of the corresponding kth instant from the offline modeling stage to obtain $\tilde{x}_{new,k}$, where $\tilde{x}_{new,k}$ is the normalized data at the kth sampling instant, $\tilde{x}_{new,k,1}$ is the normalized data of the first variable at the kth instant, and $\tilde{x}_{new,k,j}$ is the normalized data of the jth variable at the kth instant; $\tilde{x}_{new,k,j}$ is normalized as follows:

$$\tilde{x}_{new,k,j} = \frac{x_{new,k,j} - \bar{x}_{k,j}}{s_{k,j}}$$

where $\bar{x}_{k,j}$ represents the mean of the original samples of the jth variable at the kth instant along the batch direction, and $s_{k,j}$ represents the standard deviation of the original samples of the jth variable at the kth instant along the batch direction; $\tilde{x}_{new,k}$ is then input to the model to obtain the output vector of the encoder;

9) computing

where a_new,k is the reconstructed variable at the kth instant, obtained with the sparse automatic encoder model;

where spe_k is the SPE value of the kth test sample, δ is the bandwidth, G is the kernel function, N is the total number of samples, and f(SPE) is the control limit corresponding to the SPE of the current test sample; the sample control limit of the test batch is compared with the normal sample control limit, and if it is higher than the control limit of the normal samples, the current sampling point is judged to be a fault point; the fault points over the whole sampling period are determined in this way, and the false alarm rate and the missed alarm rate are calculated from them.
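The online normalization in step 8) can be illustrated with the following Python sketch, which scales a new sample at instant k with the mean and standard deviation computed across the historical normal batches at that same instant. The data layout `batches[i][k][j]` (batch, instant, variable), the use of the sample standard deviation, and the names `time_slice_stats` and `normalize` are assumptions made for illustration.

```python
import statistics

def time_slice_stats(batches, k):
    """Per-variable mean and standard deviation at instant k across all batches."""
    n_vars = len(batches[0][k])
    columns = [[batch[k][j] for batch in batches] for j in range(n_vars)]
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std (assumed)
    return means, stds

def normalize(x_new, means, stds):
    """Scale a 1 x J online sample with the offline statistics of the same instant."""
    return [(x - m) / s for x, m, s in zip(x_new, means, stds)]

# Three historical normal batches, K = 2 instants, J = 2 variables (toy values).
batches = [
    [[1.0, 10.0], [2.0, 5.0]],
    [[2.0, 20.0], [4.0, 6.0]],
    [[3.0, 30.0], [6.0, 7.0]],
]

means, stds = time_slice_stats(batches, k=0)
x_norm = normalize([3.0, 40.0], means, stds)
```

The normalized sample is then passed through the stage model selected for instant k, its SPE is computed from the reconstruction, and the value is compared with the control limit of that stage.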