CN111352408B

CN111352408B - A fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors

Info

Publication number: CN111352408B
Application number: CN202010164092.2A
Authority: CN
Inventors: 贺凯迅; 苏照阳; 钟麦英
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2020-03-11
Filing date: 2020-03-11
Publication date: 2020-12-08
Anticipated expiration: 2040-03-11
Also published as: CN111352408A

Abstract

The invention discloses a fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors, belonging to the field of industrial process monitoring. Normal historical data are first standardized and clustered into several normal working conditions. The correlation degree between each training sample and each working condition is then calculated, and an evidence information base is established. For a sample to be detected, the K-nearest-neighbor model selects K samples from the training set, and the evidence information of these K samples is combined using D-S evidence theory to obtain the final detection result. A fault diagnosis model built on evidence K-nearest neighbors adapts well to the multi-condition running state of the system; the method reduces the false alarm rate during monitoring and improves the accuracy of fault alarms.

Description

A fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors

Technical field

The invention belongs to the field of industrial process monitoring, and in particular relates to a fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors.

Background

With the widespread adoption of distributed control systems (DCS), collecting data from industrial processes has become increasingly easy. Advances in data mining and database technology also provide strong technical support for developing and applying data-driven modeling methods in industrial processes. As a result, data-driven process monitoring, control and system optimization are widely used.

As a complex, large-scale energy conversion device, the boiler is one of the most important parts of a thermal power plant, and its stable operation is closely related to power quality. However, the boiler combustion process is complex and variable, so it is difficult to build an accurate model of it by traditional mechanism analysis. At the same time, the boiler load changes with external demand; under different loads the operating mode of the boiler changes, and the operating process shows pronounced multi-condition characteristics.

Traditional fault diagnosis methods such as principal component analysis (PCA) and partial least squares (PLS) have long been used for fault detection and diagnosis in industrial processes and are considered very effective. It should be noted, however, that methods such as PCA and PLS are formulated under the assumption that the process variables are independent, uniformly distributed and linearly correlated, and that the system has only a single normal working condition. Actual industrial production rarely satisfies these assumptions, which leads to inaccurate monitoring results and a large number of false alarms.

Summary of the invention

In view of the above technical problems, the invention proposes a fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors, which overcomes the deficiencies of the prior art and achieves good results.

To achieve the above object, the invention adopts the following technical solution:

A fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors, comprising the following steps:

Step 1: Obtain normal historical working-condition data of the industrial process, use these data as the training set, standardize the training set, and calculate the standardized values;

Step 2: Divide the standardized training set into several normal working conditions using a clustering algorithm;

Step 3: Determine the evidence information according to a correlation criterion. Since each training sample carries evidence information, obtain the correlation degree between each training sample and each normal working condition, and then establish the evidence information base; the evidence information base is the collection of the evidence information carried by all normal samples;

Step 4: Calculate the discount factor γ_i of each training sample from its local density, so that the fault boundary adapts to the local characteristics of the training samples;

Step 5: Standardize the data to be tested, and find the k training samples closest to the data to be tested;

Step 6: Combine the evidence information of the k samples from step 5 using Dempster's rule of combination from D-S evidence theory to obtain the fault detection result.

Preferably, in step 1, the standardized value is calculated according to formula (1) for the normal-condition data set $X = \{x_i \in \mathbb{R}^p,\ i = 1, \dots, n\}$, where n is the number of normal-condition samples, i denotes the i-th sample, and p is the dimension of each sample:

$$\tilde{x}_i^j = \frac{x_i^j - x_{\min}^j}{x_{\max}^j - x_{\min}^j} \tag{1}$$

where $x_i^j$ is the actual value, $x_{\max}^j$ is the maximum value of the variable $x^j$, $x_{\min}^j$ is the minimum value of the variable $x^j$, $\tilde{x}_i^j$ is the standardized value, and j indexes the sample dimension.
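The description of formula (1) (actual value, per-variable maximum and minimum, standardized value) matches min-max scaling. The following is a minimal sketch under that reading; the function name `min_max_normalize` and the handling of constant columns are assumptions made for illustration, not part of the patent.

```python
def min_max_normalize(samples):
    """Scale every variable (column) to [0, 1] using its own min and max."""
    n_dims = len(samples[0])
    mins = [min(row[j] for row in samples) for j in range(n_dims)]
    maxs = [max(row[j] for row in samples) for j in range(n_dims)]
    scaled = []
    for row in samples:
        scaled.append([
            # Assumption: a constant column (max == min) is mapped to 0.0
            (row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_dims)
        ])
    return scaled, mins, maxs


train = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
scaled, mins, maxs = min_max_normalize(train)
```

The returned `mins` and `maxs` would be reused to standardize samples arriving online, so that training and test data share one scale.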

Preferably, the clustering algorithm in step 2 is any one of K-means clustering, Gaussian mixture model clustering, and density peak clustering.
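Of the clustering options listed, K-means is the simplest to illustrate. The sketch below is a plain Lloyd-style K-means in standard-library Python; the function name `kmeans`, the explicit `init_centers` argument and the fixed iteration count are assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(points, init_centers, n_iter=20):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        labels = [
            min(range(len(centers)), key=lambda q: math.dist(p, centers[q]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members
        for q in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == q]
            if members:
                centers[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
labels, centers = kmeans(points, init_centers=[points[0], points[2]])
```

Each resulting cluster plays the role of one normal working condition, and its center can serve as the reference point when computing correlation degrees.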

Preferably, the correlation criterion in step 3 is one of Euclidean distance, Mahalanobis distance, and local density, or a combination of two of them.

Preferably, in step 3, the evidence information of each training sample is calculated according to formula (2), where q denotes the q-th normal working condition, d_iq is the correlation degree between the i-th sample and the q-th normal working condition, c is the total number of normal working conditions, the sum Σ_{q=1}^{c} d_iq is the total correlation degree between the i-th sample and all working conditions, and m_i({ω_q}) is the probability that the i-th sample belongs to the q-th normal working condition.
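One plausible reading of formula (2), given that the text defines both d_iq and the sum over all working conditions, is that each correlation degree is divided by that sum. The sketch below follows this reading. The helper `relevance_to_centers`, which turns a Euclidean distance into a correlation degree via exp(-distance), is purely hypothetical and only there to produce example inputs.

```python
import math


def relevance_to_centers(sample, centers):
    # Hypothetical correlation degree: larger when the sample is closer to a center
    return [math.exp(-math.dist(sample, c)) for c in centers]


def evidence_masses(relevance):
    """Normalize correlation degrees d_iq so they sum to 1 over the c conditions."""
    total = sum(relevance)
    return [d / total for d in relevance]


centers = [[0.0, 0.0], [1.0, 1.0]]  # assumed centers of two working conditions
masses = evidence_masses(relevance_to_centers([0.1, 0.1], centers))
```

Collecting `masses` for every training sample gives a table with one row per sample and one column per working condition, which is one way to store the evidence information base.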

Preferably, in step 4, the discount factor γ_i of each training sample is calculated according to formula (3), where D_i is the set of distances between sample x_i and its k nearest training samples, and ε is the parameter of the quantile function.
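The two ingredients named for formula (3), the neighbor-distance set D_i and a quantile with parameter ε, can be sketched as follows. How formula (3) maps the quantile to γ_i is not given in this text, so the sketch stops at the quantile. The function names and the linear-interpolation quantile are assumptions.

```python
import math


def knn_distances(i, samples, k):
    """Distances from samples[i] to its k nearest other training samples (D_i)."""
    dists = sorted(
        math.dist(samples[i], s) for j, s in enumerate(samples) if j != i
    )
    return dists[:k]


def quantile(values, eps):
    """Linear-interpolation quantile for 0 <= eps <= 1."""
    ordered = sorted(values)
    pos = eps * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


samples = [[0.0], [1.0], [3.0], [6.0]]
D_0 = knn_distances(0, samples, k=3)
q_0 = quantile(D_0, 0.5)
```

A sample in a dense region has small neighbor distances and therefore a small quantile, which is what lets a density-dependent discount factor tighten or relax the fault boundary locally.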

Preferably, step 6 comprises the following steps:

Step 6.1: Calculate the adaptive discount factor γ^s according to formula (4), where λ adjusts the magnitude of γ^s and γ_i is the discount factor of a training sample;

Step 6.2: Calculate the weight of each training sample according to formula (5), where d_i denotes the Euclidean distance between the sample to be tested and the training sample, and γ^s is the adaptive discount factor;

Step 6.3: Calculate the mass function of the sample to be tested with respect to each working condition according to formula (6), where α is the confidence level of the training set and m_i(A) is the evidence information of the sample to be tested. Formula (6) also uses the index set of the K training samples closest to the sample to be tested, and d_si denotes the distance between the sample to be tested and a training sample;
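Formulas (5) and (6) are not reproduced in this text. As a stand-in, the sketch below uses the classical evidential K-NN form, in which a neighbor's evidence is scaled by α·exp(−γ·d²) and the remaining mass is assigned to the whole frame Ω. Whether the patent's formulas have exactly this form is an assumption; the function name and the example values are illustrative only.

```python
import math


def discounted_mass(neighbor_masses, distance, alpha, gamma):
    """Discount one neighbor's evidence by its distance to the test sample.

    neighbor_masses: masses m_i({w_q}) of the neighbor, summing to 1.
    Returns (masses on each {w_q}, mass on Omega).
    """
    # Assumed reliability: classical evidential K-NN weighting
    beta = alpha * math.exp(-gamma * distance ** 2)
    singles = [beta * m for m in neighbor_masses]
    omega = 1.0 - sum(singles)  # everything not committed goes to Omega
    return singles, omega


near_singles, near_omega = discounted_mass([0.8, 0.2], 0.0, alpha=0.95, gamma=1.0)
far_singles, far_omega = discounted_mass([0.8, 0.2], 10.0, alpha=0.95, gamma=1.0)
```

A neighbor that is far from the test sample contributes almost all of its mass to Ω, so a test sample far from every normal working condition ends up with a large m^s(Ω).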

Step 6.4: Combine the mass functions according to formulas (7), (8) and (9) to obtain the diagnosis result, where K₁ is the normalization factor, m^s(Ω) represents the fault probability of the test sample, and m^s({ω_q}) represents the probability that the test sample belongs to each working condition.
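Formulas (7) to (9) are not reproduced in this text, but Dempster's rule of combination itself is standard. The sketch below applies it to mass functions whose focal elements are the singletons {ω_q} and the whole frame Ω. The tuple representation and function names are assumptions; `norm` plays the role of the normalization factor.

```python
from functools import reduce


def dempster_combine(m1, m2):
    """Combine two mass functions given as (singleton masses, mass on Omega)."""
    s1, o1 = m1
    s2, o2 = m2
    # {w_q} results from {w_q}&{w_q}, {w_q}&Omega and Omega&{w_q}
    singles = [a * b + a * o2 + o1 * b for a, b in zip(s1, s2)]
    omega = o1 * o2  # Omega only results from Omega&Omega
    norm = sum(singles) + omega  # equals 1 minus the conflicting mass
    if norm == 0.0:
        raise ValueError("total conflict: the masses cannot be combined")
    return [v / norm for v in singles], omega / norm


def combine_all(masses):
    """Fold Dempster's rule over the evidence of all neighbors."""
    return reduce(dempster_combine, masses)


neighbors = [([0.6, 0.1], 0.3), ([0.5, 0.2], 0.3), ([0.0, 0.0], 1.0)]
fused_singles, fused_omega = combine_all(neighbors)
```

The third neighbor carries vacuous evidence (all mass on Ω) and leaves the combination unchanged, which is the expected behavior of Dempster's rule.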

Beneficial technical effects of the invention:

(1) The fault diagnosis model based on K-nearest neighbors adapts well to the multi-condition operating state of the system; (2) the method reduces the false alarm rate during monitoring and improves the accuracy of fault alarms.

Description of drawings

Figure 1 is a flow chart of the method of the invention.

Figure 2 is a schematic diagram of the projection of the training-set clustering result onto a two-dimensional space in an embodiment of the invention.

Figure 3 is a schematic diagram of the fault detection result obtained in an embodiment of the invention using the fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings and a specific embodiment:

The data in this example were generated by a coal mill in the boiler system of a power plant; 2880 samples of 28 variables were collected. The fault interval is samples 2133-2500. The first 1000 normal samples are used as the training set and the remaining samples as the test set. As shown in Figure 1, the method of the invention comprises two stages, offline modeling and online monitoring. The offline modeling stage comprises:

Step 1: Standardize the training set:

$$\tilde{x}_i^j = \frac{x_i^j - x_{\min}^j}{x_{\max}^j - x_{\min}^j}$$

In the above formula, $x_i^j$ is the actual value, $x_{\max}^j$ is the maximum value of the variable $x^j$, $x_{\min}^j$ is the minimum value of the variable $x^j$, $\tilde{x}_i^j$ is the standardized value, and j indexes the sample dimension.

Step 2: Use PCA to project the high-dimensional data space formed by the process data onto a low-dimensional feature subspace, as shown in Figure 2; determine the optimal number of clusters by inspection, and cluster the training set with the K-means clustering algorithm (parameter K = 5), dividing normal operation into two working conditions, w1 and w2;

Step 3: Calculate the center points of w1 and w2, calculate the correlation degree between each training sample and w1 and w2 to establish the evidence information base, and calculate the discount factor γ_i of each training sample;

In formula (3), D_i is the set of distances between training sample x_i and the training samples selected by the K-nearest-neighbor model, and the parameter ε = 0.85.

The online monitoring stage comprises:

Step 1: A sample x_i to be tested is first standardized. Its 5 nearest neighbors in the training set are then found to form the set Θ.
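The neighbor search in this step can be sketched as a brute-force Euclidean search. The function name `k_nearest` and the toy data are assumptions; both the query and the training samples are taken to be already standardized.

```python
import math


def k_nearest(query, train, k=5):
    """Return the indices of the k training samples closest to the query."""
    order = sorted(range(len(train)), key=lambda i: math.dist(query, train[i]))
    return order[:k]


train = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0], [5.0]]
theta = k_nearest([0.45], train, k=5)
```

For the 1000-sample training set of this embodiment a linear scan like this is cheap; a spatial index would only matter for much larger training sets.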

Step 2: Calculate the adaptive discount factor γ^s of the sample x_i to be tested from the discount factors of the training samples in Θ, according to formula (4). In formula (4), the parameter λ = 0.2, and γ_i is the discount factor of a training sample;

Step 3: Calculate the weight between the sample x_i to be tested and each training sample in Θ, according to formula (5). In formula (5), d_i denotes the Euclidean distance between the sample to be tested and the training sample, and γ^s is the adaptive discount factor;

Step 4: Calculate the mass function of the sample to be tested with respect to each working condition, according to formula (6). In formula (6), the parameter α = 0.95 is the confidence level of the training set, and m_i(A) is the evidence information of the training sample;

Step 5: Use Dempster's rule of combination to combine the evidence carried by each training sample in Θ and obtain the final detection result. m^s(Ω) = 0.2 is defined as the early fault warning boundary and m^s(Ω) = 0.9 as the fault boundary. The monitoring results are shown in Figure 3.
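The two boundaries on m^s(Ω) amount to a three-level decision, sketched below. The function name, the label strings and the choice to treat the boundaries as inclusive are assumptions; only the values 0.2 and 0.9 come from the embodiment.

```python
def classify_alarm(m_omega, warn=0.2, fault=0.9):
    """Map the fused mass on Omega to a monitoring state."""
    if m_omega >= fault:
        return "fault"
    if m_omega >= warn:
        return "early warning"
    return "normal"


states = [classify_alarm(v) for v in (0.05, 0.5, 0.95)]
```

Applying this function to each test sample in time order would give the kind of alarm sequence that a monitoring chart displays.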

The above description does not limit the invention, and the invention is not limited to the above examples. Changes, modifications, additions or substitutions made by those skilled in the art within the essential scope of the invention also fall within its scope of protection.

Claims

1. A fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors, characterized by comprising the following steps:

Step 1: obtaining normal historical working-condition data of an industrial process, using these data as a training set, standardizing the training set, and calculating the standardized values;

Step 2: dividing the standardized training set into a plurality of normal working conditions using a clustering algorithm;

Step 3: determining evidence information according to a correlation criterion and, since each training sample carries evidence information, obtaining the correlation degree between each training sample and each normal working condition, thereby establishing an evidence information base; the evidence information base is the set of the evidence information carried by all normal samples;

Step 4: calculating the discount factor γ_i of each training sample from its local density, so that the fault boundary adapts to the local characteristics of the training samples;

Step 5: standardizing the data to be tested, and finding the k training samples closest to the data to be tested;

Step 6: combining the evidence information of the k samples from step 5 according to Dempster's rule of combination from D-S evidence theory to obtain a fault detection result; step 6 comprises the following steps:

Step 6.1: calculating the adaptive discount factor γ^s according to formula (4), wherein λ adjusts the magnitude of γ^s and γ_i is the discount factor of a training sample;

Step 6.2: calculating the weight of each training sample according to formula (5), wherein d_i denotes the Euclidean distance between the sample to be tested and the training sample, and γ^s is the adaptive discount factor;

Step 6.3: calculating the mass function of the sample to be tested with respect to each working condition according to formula (6), wherein α is the confidence level of the training set and m_i(A) is the evidence information of the sample to be tested; formula (6) also uses the index set of the K training samples closest to the sample to be tested, and d_si denotes the distance between the sample to be tested and a training sample;

Step 6.4: combining the mass functions according to formulas (7), (8) and (9) to obtain a diagnosis result, wherein K₁ is the normalization factor, m^s(Ω) represents the fault probability of the test sample, and m^s({ω_q}) represents the probability that the test sample belongs to each working condition.

2. The fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors according to claim 1, wherein: in step 1, the standardized value is calculated according to formula (1) for the normal-condition data set $X = \{x_i \in \mathbb{R}^p,\ i = 1, \dots, n\}$, where n is the number of normal-condition samples, i denotes the i-th sample, and p is the dimension of each sample:

$$\tilde{x}_i^j = \frac{x_i^j - x_{\min}^j}{x_{\max}^j - x_{\min}^j} \tag{1}$$

wherein $x_i^j$ is the actual value, $x_{\max}^j$ is the maximum value of the variable $x^j$, $x_{\min}^j$ is the minimum value of the variable $x^j$, $\tilde{x}_i^j$ is the standardized value, and j indexes the sample dimension.

3. The fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors according to claim 1, wherein: the clustering algorithm in step 2 is any one of K-means clustering, Gaussian mixture model clustering, and density peak clustering.

4. The fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors according to claim 1, wherein: the correlation criterion in step 3 is one of Euclidean distance, Mahalanobis distance, and local density, or a combination of two of them.

5. The fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors according to claim 1, wherein: in step 3, the evidence information of each training sample is calculated according to formula (2), wherein q denotes the q-th normal working condition, d_iq is the correlation degree between the i-th sample and the q-th normal working condition, c is the total number of normal working conditions, the sum Σ_{q=1}^{c} d_iq is the total correlation degree between the i-th sample and all working conditions, and m_i({ω_q}) is the probability that the i-th sample belongs to the q-th normal working condition.

6. The fault detection method for multi-condition process-industry processes based on evidence K-nearest neighbors according to claim 1, wherein: in step 4, the discount factor γ_i of each training sample is calculated according to formula (3), wherein D_i is the set of distances between sample x_i and the k training samples nearest to x_i, and ε is the parameter of the quantile function.