CN105739489B

CN105739489B - Batch process fault detection method based on ICA-KNN

Info

Publication number: CN105739489B
Application number: CN201610313490.XA
Authority: CN
Inventors: 何建; 章文; 邹见效; 凡时财; 张刚
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2016-05-12
Filing date: 2016-05-12
Publication date: 2018-04-13
Anticipated expiration: 2036-05-12
Also published as: CN105739489A

Abstract

The present invention discloses a batch process fault detection method based on ICA-KNN. ICA is applied to the training data set, and a small number of independent components is selected to replace the original high-dimensional data while extracting its main features. The KNN method is then applied in the independent-component space to obtain the statistical control limit used for fault detection. The method achieves a high fault detection rate for non-Gaussian, nonlinear batch production processes and reduces computational complexity compared with KICA.

Description

A Fault Detection Method for Batch Processes Based on ICA-KNN

Technical field

The invention belongs to the field of batch process technology and more specifically relates to an ICA-KNN-based batch process fault detection method.

Background

Batch processes are widely used in the production of small-volume, high-value-added products because of their operational flexibility, and they have become the main mode of production in industries such as fine chemicals, biopharmaceuticals and deep processing of agricultural products. Semiconductor batch production is characterized by unequal batch lengths, process-center drift, nonlinear variables and multiple operating conditions. To reduce the scrap rate in semiconductor wafer production, fault detection has become a key research topic.

Multivariate statistical analysis methods, such as principal component analysis (PCA), partial least squares (PLS) and independent component analysis (ICA), are widely used in the chemical industry. PCA is an important tool for multivariate statistical process monitoring and an effective tool for data compression and information extraction. Because PCA assumes that the process is linear, online monitoring of strongly nonlinear production processes with PCA is unreliable and suffers from an excessively high false alarm rate. In particular, the control limits of the T² and SPE statistics used in PCA-based fault detection are derived under the assumption that the variables in the training set follow a multivariate Gaussian distribution, and this assumption does not hold for most semiconductor batch processes.

Unlike PCA, ICA does not require the observed variables to be Gaussian. It separates or estimates statistically independent source signals from higher-order statistics, which gives it stronger statistical meaning, and these latent signals usually have a physical interpretation or reflect the essential characteristics of the object under study. ICA therefore has better feature-extraction ability for non-Gaussian process data. However, ICA is itself a linear method, so its monitoring performance on the nonlinear data found in batch processes is unsatisfactory. To address this, kernel independent component analysis (KICA) has been proposed for batch process fault detection, with good results. Its basic idea is to project the input data into a high-dimensional feature space through a nonlinear mapping and then apply linear ICA in that feature space. KICA, however, must compute the kernel matrix, which is N×N for N samples; its size grows with the square of the number of samples, so the computation becomes expensive when N is large. Q. P. He and J. Wang proposed a fault detection method based on the k-nearest neighbor rule (FD-KNN). It does not depend on whether the data are linear, copes with the nonlinearity and multiple operating conditions of semiconductor data, and has performed well in practice.

FD-KNN also has drawbacks. For example, once batch process data are unfolded, the number of variables grows rapidly, so FD-KNN spends a lot of time on computation and needs a large amount of storage for the data; the sheer data volume makes FD-KNN difficult to apply.
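To put the kernel-matrix cost in concrete terms, here is a rough estimate under stated assumptions: that KICA is applied to the same N = 7380 rearranged training rows used in the embodiment below, and that each kernel entry is stored as an 8-byte double. The kernel matrix alone then needs about

```latex
N^2 = 7380^2 = 54\,464\,400 \ \text{entries}, \qquad
54\,464\,400 \times 8\ \text{B} = 435\,715\,200\ \text{B} \approx 436\ \text{MB}
```

and an eigendecomposition of an N×N matrix, as typically used for whitening in kernel feature space, scales roughly as O(N³).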

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art and to provide an ICA-KNN-based batch process fault detection method that, for semiconductor production processes with nonlinearity and multiple operating conditions, effectively improves fault detection accuracy while reducing computational complexity.

To achieve the above object, the ICA-KNN-based batch process fault detection method of the present invention is characterized by comprising the following steps:

(1) Data preprocessing

The three-dimensional sample matrix X(I×J×K) collected from the batch process is first unfolded batch-wise into the two-dimensional matrix X(I×KJ). Each column of X(I×KJ) is then standardized along the batch direction to mean 0 and variance 1. Finally, the standardized X(I×KJ) is rearranged vertically into the matrix X(KI×J). Here I is the number of batches, J the number of observed variables, and K the number of sampling instants;
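A minimal NumPy sketch of step (1), assuming the raw data are held in an array of shape (I, J, K) and that the KJ columns are ordered time-major (all J variables at instant 1, then instant 2, and so on). The function name `unfold_and_standardize` is hypothetical, and grouping the rearranged rows by batch is an assumption; the row order does not affect the later ICA or KNN steps.

```python
import numpy as np

def unfold_and_standardize(X3):
    """Step (1): batch-wise unfolding, column standardization, vertical rearrangement.

    X3 has shape (I, J, K) = (batches, variables, sampling instants).
    Returns X (K*I x J) plus the column means and standard deviations of X(I x KJ),
    which are needed again when new data are standardized in step (4).
    """
    I, J, K = X3.shape
    # Batch-wise unfolding to X(I x KJ); columns are time-major:
    # [all J variables at instant 1, all J variables at instant 2, ...]
    X_ikj = X3.transpose(0, 2, 1).reshape(I, K * J)
    # Standardize every column along the batch direction (mean 0, variance 1).
    mean = X_ikj.mean(axis=0)
    std = X_ikj.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    X_std = (X_ikj - mean) / std
    # Vertical rearrangement to X(KI x J): one row per (batch, instant) pair.
    return X_std.reshape(I * K, J), mean, std

rng = np.random.default_rng(0)
X, mu, sd = unfold_and_standardize(rng.normal(size=(82, 18, 90)))
print(X.shape)  # (7380, 18)
```

The training means and standard deviations are returned because new data must be standardized with exactly these statistics in step (4).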

(2) Perform ICA dimensionality reduction on the matrix X(KI×J) to obtain d independent components S_d reflecting the batch process information and the main part separation matrix W_d

(2.1) First whiten the matrix X(KI×J) to obtain the whitened vector Z:

$$Z = QX$$

where Q is the whitening matrix, Q = Λ^(-1/2) U^T, Λ = diag(λ₁, …, λ_n), λ_i (i = 1, …, n) are the first n eigenvalues of the covariance matrix E{XX^T}, and U is the matrix formed by the eigenvectors corresponding to these n eigenvalues;
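A sketch of the whitening step (2.1) in NumPy. The text writes Z = QX with variables along the rows of X; the sketch assumes the more common layout with one sample per row, so each whitened row is Q x and the whole matrix becomes X Qᵀ. The helper name `whiten` is hypothetical, and the input is assumed to be column-centred (which holds after step (1)).

```python
import numpy as np

def whiten(X, n):
    """Whiten X (N samples x J variables, column-centred) onto its n leading eigen-directions.

    Returns Z (N x n) and Q (n x J) with Q = Lambda^(-1/2) U^T, so every row z = Q x.
    """
    N = X.shape[0]
    C = X.T @ X / (N - 1)                      # sample estimate of E{x x^T}
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n]        # keep the n largest
    lam, U = eigvals[idx], eigvecs[:, idx]
    Q = np.diag(lam ** -0.5) @ U.T             # whitening matrix
    Z = X @ Q.T                                # apply z = Q x to every row
    return Z, Q

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
X -= X.mean(axis=0)
Z, Q = whiten(X, 4)
print(np.round(Z.T @ Z / (len(Z) - 1), 6))    # approximately the 4 x 4 identity
```

Because U^T C U = Λ for the retained eigenvectors, the covariance of Z is Λ^(-1/2) Λ Λ^(-1/2) = I, which is what makes the subsequent one-unit updates well defined.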

(2.2) Decompose the whitened vector Z to obtain d independent components S_d reflecting the batch process information and the main part separation matrix W_d;

(2.2.1) Construct an initial random vector b_k and set k = 1, k ∈ [1, n]; b_k is updated by:

$$b_k = E\{Z\,g(b_k^T Z)\} - E\{g'(b_k^T Z)\}\,b_k$$

where the function g() is the first derivative of the selected non-quadratic function G, g'() is the derivative of g(), and E{} denotes expectation;
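The update rule above, with expectations replaced by sample means, can be sketched as follows. The sketch assumes Z is stored with one whitened sample per row and uses g(u) = tanh(a₁u), the derivative of the log-cosh contrast. `fastica_step` is a hypothetical name, and it covers a single update only; deflation, normalization and the stopping test belong to the following steps.

```python
import numpy as np

def fastica_step(b, Z, a1=1.0):
    """One fixed-point update b <- E{z g(b^T z)} - E{g'(b^T z)} b with g = tanh(a1 * u)."""
    y = Z @ b                                   # b^T z for every sample (row) of Z
    g = np.tanh(a1 * y)                         # g(b^T z)
    g_prime = a1 * (1.0 - g ** 2)               # g'(b^T z)
    return (Z * g[:, None]).mean(axis=0) - g_prime.mean() * b
```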

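(2.2.2) Iterate b_k, removing its projections onto the previously obtained vectors:

$$b_k = b_k - \sum_{i=1}^{k-1} \left(b_k^T b_i\right) b_i$$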

(2.2.3) Normalize the iterated b_k, i.e. b_k = b_k/‖b_k‖, where ‖b_k‖ denotes the norm of b_k;

(2.2.4) Check the normalized b_k: if |b_k^T b_k| = 1 ± 5%, output the vector b_k and go to step (2.2.5); otherwise set k = k + 1 and return to step (2.2.2), continuing the iteration until |b_k^T b_k| = 1 ± 5% is satisfied, then go to step (2.2.5);
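Steps (2.2.1) to (2.2.4) can be sketched as a deflationary FastICA loop. This is an interpretation, not a verbatim transcription: after step (2.2.3), b_k^T b_k equals 1 by construction, so the sketch uses the standard FastICA stopping test |b_new^T b_old| ≈ 1, keeps iterating the same k until it holds, and only then moves on to k + 1. The tolerance, iteration cap, seed and the name `fastica_deflation` are assumptions.

```python
import numpy as np

def fastica_deflation(Z, n_components, a1=1.0, tol=1e-6, max_iter=200, seed=0):
    """Estimate b_1..b_n one at a time (deflation) with g = tanh(a1 * u).

    Z: whitened data, shape (N, n), one sample per row.
    Returns B of shape (n, n_components) whose columns are the vectors b_k.
    """
    rng = np.random.default_rng(seed)
    N, n = Z.shape
    B = np.zeros((n, n_components))
    for k in range(n_components):
        b = rng.normal(size=n)                  # step (2.2.1): random start
        b /= np.linalg.norm(b)
        for _ in range(max_iter):
            y = Z @ b
            g = np.tanh(a1 * y)
            b_new = (Z * g[:, None]).mean(axis=0) - (a1 * (1.0 - g ** 2)).mean() * b
            # Step (2.2.2): subtract projections onto b_1..b_{k-1} (Gram-Schmidt).
            b_new -= B[:, :k] @ (B[:, :k].T @ b_new)
            b_new /= np.linalg.norm(b_new)      # step (2.2.3)
            converged = abs(abs(b_new @ b) - 1.0) < tol   # step (2.2.4), standard form
            b = b_new
            if converged:
                break
        B[:, k] = b
    return B
```

The absolute value in the stopping test accepts a sign flip of b_k, which FastICA cannot resolve and which does not matter for the later distance calculations.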

(2.2.5) Construct the matrix B = [b₁, …, b_n], obtain the independent components from S = B^T Z and the separation matrix from W = B^T Q. Then sort the independent components S by their degree of non-Gaussianity, select the first d as the independent components S_d, and take the corresponding first d rows of W as the main part separation matrix W_d;
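A sketch of step (2.2.5), again with one sample per row, so S = B^T Z becomes `Z @ B`. The text does not say how the degree of non-Gaussianity is measured; the sketch assumes the common negentropy approximation J(y) ∝ (E{G(y)} - E{G(v)})² with G = log cosh and v a standard Gaussian variable. `select_components` and `E_GAUSS` are hypothetical names.

```python
import numpy as np

def logcosh(u):
    """Numerically stable log(cosh(u))."""
    return np.logaddexp(u, -u) - np.log(2.0)

# E{log cosh(v)} for a standard Gaussian v, by a Riemann sum on a fine grid.
_u = np.linspace(-12.0, 12.0, 24001)
E_GAUSS = np.sum(logcosh(_u) * np.exp(-_u ** 2 / 2.0)) * (_u[1] - _u[0]) / np.sqrt(2.0 * np.pi)

def select_components(Z, B, Q, d):
    """Form S = B^T Z and W = B^T Q, then keep the d most non-Gaussian components.

    Z: whitened data (N x n, one sample per row); B: (n x n) with columns b_k; Q: (n x J).
    Returns S_d (N x d), W_d (d x J) and the negentropy scores of the kept components.
    """
    S = Z @ B                                   # column k holds b_k^T z for every sample
    W = B.T @ Q                                 # separation matrix: s = W x
    score = (logcosh(S).mean(axis=0) - E_GAUSS) ** 2   # negentropy approximation
    keep = np.argsort(score)[::-1][:d]          # most non-Gaussian first
    return S[:, keep], W[keep], score[keep]
```

Since Z = X Qᵀ, the kept components satisfy S_d = X W_dᵀ, which is exactly the relation reused for new data in step (4).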

(3) Apply the KNN algorithm to the independent components S_d to obtain the statistical control limit CL

In S_d = [s₁, …, s_d], compute the squared Euclidean distance between every pair of rows, use these distances to determine the m nearest neighbors of each row, and compute the row's KNN squared distance D_s:

$$D_s = \sum_{j=1}^{m} d_{ij}^2$$

where d_ij² denotes the square of the Euclidean distance between the i-th row of S_d and its j-th nearest row;
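The KNN squared distance of every training row can be sketched as below; a row is not counted as its own neighbor. The function name is hypothetical. The full N×N distance matrix is formed for clarity; for the 7380 training rows of the embodiment it is large, so processing rows in blocks may be preferable in practice.

```python
import numpy as np

def knn_sq_distance(S, m):
    """KNN squared distance D_s of every row of S (training data, self excluded).

    D_s(i) = sum of the m smallest squared Euclidean distances from row i to the other rows.
    """
    sq = (S ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (S @ S.T)   # all pairwise squared distances
    np.maximum(D2, 0.0, out=D2)                        # clip tiny negative round-off
    np.fill_diagonal(D2, np.inf)                       # a row is not its own neighbor
    # np.partition moves the m smallest values of each row into the first m slots.
    return np.partition(D2, m - 1, axis=1)[:, :m].sum(axis=1)

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(knn_sq_distance(S, 1))  # [1. 1. 4.]
```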

Since D_s approximately follows a χ² distribution, the control limit CL can be determined from the significance level, where α is the confidence level and N is the number of rows of the independent components S_d;
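The exact control-limit formula is not reproduced in this text. One common way to turn a χ² approximation into a number, shown here as an assumption rather than the patent's formula, is to moment-match the training D_s to a scaled distribution g·χ²_h with g = var/(2·mean) and h = 2·mean²/var, and take its α-quantile. The sketch uses the Wilson-Hilferty approximation for the χ² quantile so that only NumPy and the Python standard library are needed; `control_limit` and `chi2_quantile_wh` are hypothetical names, and α is treated as the confidence level (e.g. 0.99), as in the text above.

```python
import numpy as np
from statistics import NormalDist

def chi2_quantile_wh(p, h):
    """Wilson-Hilferty approximation of the p-quantile of a chi-square with h degrees of freedom."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * h)
    return h * (1.0 - c + z * np.sqrt(c)) ** 3

def control_limit(D_s, alpha=0.99):
    """CL from a moment-matched scaled chi-square g * chi2_h fitted to the training D_s."""
    mean, var = float(np.mean(D_s)), float(np.var(D_s, ddof=1))
    g = var / (2.0 * mean)          # scale factor
    h = 2.0 * mean ** 2 / var       # effective degrees of freedom (need not be an integer)
    return g * chi2_quantile_wh(alpha, h)
```

With SciPy available, `scipy.stats.chi2.ppf(alpha, h)` could replace the approximation; alternatively, the empirical quantile `np.quantile(D_s, alpha)` avoids the distributional assumption altogether.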

(4) Standardize the data x' to be detected as in step (1) to obtain x, then compute the independent components of x with the main part separation matrix W_d, i.e. W_d x;

(5) Compute the KNN squared distance D_x of these independent components as in step (3), i.e. the sum of their squared distances to the m nearest rows of S_d, and compare D_x with the control limit CL: if D_x > CL, the sample is considered a fault sample; otherwise, it is considered a normal sample.
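Steps (4) and (5) can be sketched as follows for one new batch x' of shape (J, K). The sketch assumes that the training means and standard deviations from step (1), W_d from step (2), the training rows of S_d and CL from step (3) are available, and that the same time-major column order is used as in preprocessing. `monitor_batch` is a hypothetical name. Because each column of X(I×KJ) is standardized separately, sample k of a running batch needs only the statistics of instant k, so the check can also be applied as samples arrive.

```python
import numpy as np

def monitor_batch(x_raw, mean, std, W_d, S_train, m, CL):
    """Return D_x for every sampling instant of a new batch and the fault flags D_x > CL.

    x_raw: (J, K) new batch; mean, std: (K*J,) training statistics in time-major order;
    W_d: (d, J); S_train: (N, d) training independent components.
    """
    J, K = x_raw.shape
    x = (x_raw.T.reshape(K * J) - mean) / std      # standardize with training statistics
    x = x.reshape(K, J)                            # one row per sampling instant
    s = x @ W_d.T                                  # independent components W_d x, (K, d)
    D2 = ((s[:, None, :] - S_train[None, :, :]) ** 2).sum(axis=2)   # (K, N)
    D_x = np.partition(D2, m - 1, axis=1)[:, :m].sum(axis=1)       # m nearest training rows
    return D_x, D_x > CL
```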

The object of the invention is achieved as follows:

In the ICA-KNN-based batch process fault detection method of the present invention, ICA is applied to the training data set and a small number of independent components is selected to replace the original high-dimensional data while extracting its main features; the KNN method is then applied in the independent-component space to obtain the statistical control limit used for fault detection. This gives a high fault detection rate for non-Gaussian, nonlinear batch production processes and reduces computational complexity compared with KICA.

Description of drawings

Fig. 1 is a flow chart of the ICA-KNN-based batch process fault detection method;

Fig. 2 is the I² detection chart of the KICA method;

Fig. 3 is the SPE detection chart of the KICA method;

Fig. 4 is the detection chart of the KNN method;

Fig. 5 is the detection chart of the ICA-KNN method.

Detailed description of the embodiments

Specific embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Embodiment

For convenience, the technical terms used in the embodiments are first defined:

ICA (Independent Component Analysis): independent component analysis;

KNN (K-Nearest Neighbor): k-nearest neighbor;

FD-KNN (Fault Detection based on K-Nearest Neighbor): fault detection method based on the k-nearest neighbor rule;

KICA (Kernel Independent Component Analysis): kernel independent component analysis;

Fig. 1 is a flow chart of the ICA-KNN-based batch process fault detection method.

In this embodiment, as shown in Fig. 1, the ICA-KNN-based batch process fault detection method of the present invention comprises the following steps:

S1. Data preprocessing

The three-dimensional sample matrix X(I×J×K) collected from the batch process is first unfolded batch-wise into the two-dimensional matrix X(I×KJ). Each column of X(I×KJ) is then standardized along the batch direction to mean 0 and variance 1: subtracting the column means removes the average trajectory of all batches, and dividing by the standard deviations removes the influence of different units. Finally, the standardized X(I×KJ) is rearranged vertically into the matrix X(KI×J), where I is the number of batches, J the number of observed variables and K the number of sampling instants.

In this embodiment, the data come from a semiconductor aluminum etch process run on a Lam 9600 and comprise 107 batches of normal data and 20 batches of fault data. 82 normal batches are used as training samples and 25 normal batches as test samples; finally, the 20 fault batches are tested to see whether they are detected in time.

The training data thus form a three-dimensional sample matrix of size 82×18×90. Batch-wise unfolding gives an 82×1620 two-dimensional sample matrix, which is standardized by subtracting each column's mean and dividing by its standard deviation; finally, the standardized 82×1620 matrix is rearranged vertically into a 7380×18 two-dimensional matrix for the subsequent ICA dimensionality reduction.
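The sizes quoted above follow from I = 82 batches, J = 18 variables and K = 90 sampling instants; both reshapes preserve the element count of 132,840:

```latex
X(I \times J \times K) = 82 \times 18 \times 90
\;\xrightarrow{\text{batch-wise unfolding}}\;
X(I \times KJ) = 82 \times (90 \cdot 18) = 82 \times 1620
\;\xrightarrow{\text{rearrangement}}\;
X(KI \times J) = (90 \cdot 82) \times 18 = 7380 \times 18
```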

S2. Perform ICA dimensionality reduction on the matrix X(KI×J) to obtain d independent components S_d reflecting the batch process information and the main part separation matrix W_d

S2.1. To remove the correlation in the sample data and simplify the extraction of independent components, the matrix X(KI×J) is first whitened to obtain the whitened vector Z:

$$Z = QX$$

S2.2. Decompose the whitened vector Z to obtain d independent components S_d reflecting the batch process information and the main part separation matrix W_d;

S2.2.1. Construct an initial random vector b_k, k ∈ [1, n];

In this embodiment, the non-quadratic function G can take several forms, for example:

$$G(x) = \frac{1}{a_1}\log\cosh(a_1 x) \quad\text{or}\quad G(x) = -\frac{1}{a_2}\exp\!\left(-\frac{a_2 x^2}{2}\right)$$

where 1 ≤ a₁ ≤ 2, a₂ = 1, and cosh() returns the hyperbolic cosine of its argument.
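The two contrast functions, together with the derivatives g and g' used in the update rule, can be written as below. The function names are hypothetical, and log cosh is evaluated via `np.logaddexp` to avoid overflow for large arguments.

```python
import numpy as np

def contrast_logcosh(u, a1=1.0):
    """G(u) = log cosh(a1 u) / a1; returns (G, g, g') with g = G' and g' = G''."""
    G = (np.logaddexp(a1 * u, -a1 * u) - np.log(2.0)) / a1
    g = np.tanh(a1 * u)
    g_prime = a1 * (1.0 - g ** 2)
    return G, g, g_prime

def contrast_exp(u, a2=1.0):
    """G(u) = -exp(-a2 u^2 / 2) / a2; returns (G, g, g')."""
    e = np.exp(-a2 * u ** 2 / 2.0)
    return -e / a2, u * e, (1.0 - a2 * u ** 2) * e
```

Either pair (g, g') can be substituted into the fixed-point update; log cosh is a general-purpose choice, while the exponential form is often suggested when robustness to outliers matters.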

S2.2.2. Starting from k = 1, iterate b_k;

S2.2.3. Normalize the iterated b_k, i.e. b_k = b_k/‖b_k‖, where ‖b_k‖ denotes the norm of b_k;

S2.2.4. Check the normalized b_k: if |b_k^T b_k| = 1 ± 5%, output the vector b_k and go to step S2.2.5; otherwise set k = k + 1 and return to step S2.2.2, continuing the iteration until |b_k^T b_k| = 1 ± 5% is satisfied, then go to step S2.2.5;

S2.2.5. Construct the matrix B = [b₁, …, b_n], obtain the independent components from S = B^T Z and the separation matrix from W = B^T Q. Then sort the independent components S by their degree of non-Gaussianity, select the first d as the independent components S_d, and take the corresponding first d rows of W as the main part separation matrix W_d;

S3. Apply the KNN algorithm to the independent components S_d to obtain the statistical control limit CL

In this embodiment, taking an example matrix S_d, the squared distance between row 1 and row 2 is:

(1-1)² + (1-2)² + (1-1)² = 1. Similarly, the squared distances between row 1 and the other rows are 1, 1, 1, 3 and 12. In this embodiment m = 3, and the KNN squared distance D_s of each row is computed in the same way;

S4. Standardize the data x' to be detected as in step S1 to obtain x, then compute the independent components of x with the main part separation matrix W_d;

S5. Compute the KNN squared distance D_x of these independent components as in step S3, and compare D_x with the control limit CL: if D_x > CL, the sample is considered a fault sample; otherwise, it is considered a normal sample.

To verify the effectiveness of the proposed method, data from the aluminum stack etch process in semiconductor production were used in a simulation, and the results were compared with the KICA and FD-KNN methods. Fig. 2 shows the I² statistic of KICA, Fig. 3 the SPE statistic of KICA, Fig. 4 the KNN detection chart and Fig. 5 the ICA-KNN detection chart. The comparison shows that ICA-KNN achieves a higher fault detection rate, while the false alarm rates of the methods differ little. In terms of computation time, ICA-KNN effectively reduces complexity compared with FD-KNN and KICA, which demonstrates the advantage of the method. The simulation shows that the ICA-KNN method is simple and effective and has good application prospects.
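The two figures of merit used in this comparison can be computed as below, assuming the usual definitions: the fault detection rate is the fraction of fault samples whose statistic exceeds CL, and the false alarm rate is the fraction of normal test samples that do. The helper name is hypothetical, and the example values are illustrative only, not results from this patent.

```python
import numpy as np

def detection_rates(D_normal, D_fault, CL):
    """Return (fault detection rate, false alarm rate) for statistics compared with CL."""
    fdr = float(np.mean(np.asarray(D_fault) > CL))    # fault samples correctly flagged
    far = float(np.mean(np.asarray(D_normal) > CL))   # normal samples wrongly flagged
    return fdr, far

print(detection_rates([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 1.0, 7.0], 3.5))  # (0.75, 0.25)
```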

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the concept of the present invention are protected.

Claims

1. An ICA-KNN-based batch process fault detection method, characterized by comprising the following steps:

(1) data preprocessing

A three-dimensional sample matrix X(I×J×K) collected from the batch process is unfolded batch-wise into a two-dimensional matrix X(I×KJ); X(I×KJ) is standardized along the batch direction so that each column has mean 0 and variance 1; finally, the standardized X(I×KJ) is rearranged vertically into a matrix X(KI×J), where I is the number of batches, J is the number of observed variables and K is the number of sampling instants;

(2) ICA dimensionality reduction is performed on the matrix X(KI×J) to obtain d independent components S_d reflecting the batch process information and a main part separation matrix W_d

(2.1) Whitening the matrix X(KI×J) to obtain a whitened vector Z;

$$Z = QX$$

where Q is the whitening matrix, Q = Λ^(-1/2) U^T, Λ = diag(λ₁, …, λ_n), λ_i (i = 1, …, n) are the first n eigenvalues of the covariance matrix E{XX^T}, and U is the matrix formed by the eigenvectors corresponding to the n eigenvalues;

(2.2) Decomposing the whitened vector Z to obtain d independent components S_d reflecting the batch process information and a main part separation matrix W_d;

(2.2.1) Constructing an initial random vector b_k, k ∈ [1, n];

$$b_k = E\{Z\,g(b_k^T Z)\} - E\{g'(b_k^T Z)\}\,b_k$$

where the function g() is the first derivative of the selected non-quadratic function G, g'() is the derivative of g(), and E{} denotes expectation;

(2.2.2) Letting k = 1 and iterating b_k:

$$b_k = b_k - \sum_{i=1}^{k-1} \left(b_k^T b_i\right) b_i$$

(2.2.3) Normalizing the iterated b_k, i.e. b_k = b_k/‖b_k‖, where ‖b_k‖ denotes the norm of b_k;

(2.2.4) Checking the normalized b_k: if |b_k^T b_k| = 1 ± 5%, outputting the vector b_k and entering step (2.2.5); otherwise, setting k = k + 1 and returning to step (2.2.2) to continue the iteration until |b_k^T b_k| = 1 ± 5% is satisfied, then entering step (2.2.5);

(2.2.5) Constructing the matrix B = [b₁, …, b_n], obtaining the independent components from S = B^T Z and the separation matrix from W = B^T Q; then arranging the independent components S by their degree of non-Gaussianity, selecting the first d as the independent components S_d and the corresponding first d rows of W as the main part separation matrix W_d;

(3) Using the KNN algorithm on the independent components S_d to obtain a statistical control limit CL

In the independent components S_d = [s₁, …, s_d], calculating the squared distance between rows, determining the m nearest neighbors of each row according to the distance, and calculating its KNN squared distance D_s:

$$D_s = \sum_{j=1}^{m} d_{ij}^2$$

where d_ij² denotes the square of the Euclidean distance between the i-th row of S_d and its j-th nearest row;

since D_s approximately follows a χ² distribution, a control limit CL can be determined from it, where α is the confidence level and N is the number of rows of the independent components S_d;

(4) Standardizing the data x' to be detected according to step (1) to obtain data x, and then calculating the independent components of x using the main part separation matrix W_d;

(5) Calculating the KNN squared distance D_x of these independent components according to step (3); comparing D_x with the control limit CL: if D_x > CL, the sample is considered a fault sample; otherwise, it is considered a normal sample.

2. The ICA-KNN-based batch process fault detection method of claim 1, wherein the non-quadratic function G can be selected from two forms:

$$G(x) = \frac{1}{a_1}\log\cosh(a_1 x) \quad\text{or}\quad G(x) = -\frac{1}{a_2}\exp\!\left(-\frac{a_2 x^2}{2}\right)$$

where a₁ and a₂ are constants and cosh() returns the hyperbolic cosine of its argument.