CN105809203A - Hierarchical clustering-based system steady state detection algorithm

Info

Publication number: CN105809203A
Application number: CN201610146318.XA
Authority: CN
Inventors: 赵均; 季丁; 季一丁; 邵之江; 徐祖华
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2016-03-15
Filing date: 2016-03-15
Publication date: 2016-07-27
Anticipated expiration: 2036-03-15
Also published as: CN105809203B

Abstract

The invention discloses a system steady-state detection algorithm based on hierarchical clustering. For industrial data collected over a continuous time interval, the distances between all pairs of classes are computed to form a matrix. The two classes with the smallest distance in the matrix are merged into a new class and their rows and columns are deleted, the matrix is updated, and the procedure is iterated to obtain the expression matrix of the hierarchical clustering tree. Working from the value of the last iteration back toward the value of the first, a clustering-result rationality value is computed for each iteration, and the final threshold is derived from these values. The industrial data are then clustered with this final threshold to obtain the final clustering result sequence, and the steady state of the system is determined by combining that sequence with the time order of the samples. The invention is widely applicable and avoids the sharp growth in computation that data dimensionality otherwise causes.

Description

A System Steady State Detection Algorithm Based on Hierarchical Clustering

Technical Field

The invention relates to a system detection method, and in particular to a system steady-state detection algorithm based on hierarchical clustering.

Background

Steady state is the most important and most common assumption in the study of a process system. Whether the system is at steady state directly determines which methods can subsequently be used to model, control and optimize it. The internal principles and structure of a process system are complex: its physical quantities are strongly coupled and the system is highly nonlinear. When the system is not at steady state, the data characteristics of its variables change sharply, the values appear unstable or abnormal, and they deviate considerably from the true input-output relationship of the system. Only under stable operating conditions do the parameters and variables show strong state consistency. For this reason, evaluating equipment performance and analyzing plant characteristics and controller behavior all presuppose that the steady-state operation of the system has been identified.

As the process industry develops, production systems and production methods are becoming more complex. The process objects involved are often multivariable, high-dimensional and strongly coupled, and the system as a whole is nonlinear, time-varying, uncertain and incompletely known. Such complexity makes mechanistic study and modeling very difficult, but the use of DCS control systems and smart instruments means that more and more process data are being recorded. The strong and complex coupling among the variables of a real process system makes it possible to study the stability of the whole system from measured data. With the continuing development of data mining and statistical learning theory, the process industry is gradually adopting such algorithms to solve practical problems, giving rise to fields such as statistical process control. For steady-state detection, the big-data approach also differs from traditional process control research. The traditional approach usually decides whether the system is stable by defining steady-state criteria on a few key variables, whereas the big-data approach analyzes the system from all of the data. In theory, results based on all the data reflect the actual condition of the system more completely and faithfully, and are therefore more accurate and reliable.

Since the steady-state detection problem was first raised in the 1980s, many researchers in China and abroad have proposed steady-state detection methods. Because field data are complex, however, many existing methods whose effectiveness has been verified in simulation do not give very accurate results in practice. Each method is also constrained by its own limitations and by the specific requirements of the application, so the methods suit different situations.

Clustering is an important technique in unsupervised learning. It divides the individuals in a data sample into classes so that individuals within a class are as similar as possible and individuals in different classes are as different as possible. The idea of hierarchical clustering was first proposed by S. C. Johnson in 1967. Unlike other clustering methods such as EM and K-means, hierarchical clustering does not need the number of classes in advance and does not need an iterative optimization process; a given similarity function and threshold are enough to obtain the clustering result. Because no optimization problem is solved, the "curse of dimensionality" can be avoided. However, hierarchical clustering must compute the distance between every pair of samples, so its complexity grows quadratically with the sample size.

The flowchart of the traditional computation is shown in Figure 1. Assuming the sample data set to be clustered contains N individuals, the hierarchical clustering algorithm proceeds as follows:

1) Define the similarity function and the threshold for the termination condition (generally the maximum intra-class distance or the minimum inter-class distance);

2) Place each individual in the data set in its own class, giving N classes;

3) With the current data set divided into k classes, compute the similarity between every pair of classes, where d(i,j) denotes the similarity (expressed as a distance) between class i and class j. If d(i,j) is the smallest value in the current set of pairwise similarities, merge class i and class j. The number of classes then changes from k to k-1;

4) Compute the current value of the termination condition and end the algorithm if the threshold is met;

5) Return to 3). When k = 1, all sampling points have been grouped into one class, clustering cannot continue, and the algorithm stops.
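Steps 1) to 5) can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation: the function name `agglomerative_labels` is hypothetical, the similarity is assumed to be the Euclidean distance, the distance between two classes is assumed to be the largest point-to-point distance, and the stop test is applied just before each merge rather than just after it.

```python
import math


def agglomerative_labels(points, threshold):
    """Merge the closest pair of classes until the smallest inter-class
    distance exceeds `threshold` or only one class remains."""
    # Step 2: every sample starts as its own class (N classes).
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Assumption: class distance = largest point-to-point distance.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    # Step 5: stop when k = 1.
    while len(clusters) > 1:
        # Step 3: find the closest pair among the current k classes.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 4: termination test against the threshold.
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # Turn the class membership lists into one label per sample.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

The nested search over all class pairs is what makes the cost grow quadratically with the number of samples, as noted in the background.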

The main drawbacks of the traditional methods are therefore a strong dependence on the process object and poor generality. They also mainly perform steady-state detection on a single variable, so their applicability is limited, and they cannot avoid the sharp growth in computation caused by data dimensionality.

Summary of the Invention

To solve the problems described in the background, the invention proposes a steady-state detection algorithm based on hierarchical clustering.

The technical solution adopted by the invention is as follows:

STEP1: Generate the clustering tree

1.1) For industrial data {d_i}, i = 1, 2, 3, …, N, comprising N sampling points in a continuous time interval, each sampling point in the set is taken as a class, and d_i denotes the industrial data of class i;

1.2) Compute the distance between every pair of classes to obtain an N×N matrix A. The element in row i and column j of A is denoted a_ij, with a_ii = 0, and a_ij is the distance between class d_i and class d_j. The resulting matrix A is:

$A = \begin{bmatrix} 0 & a_{12} & \cdots & a_{1N} \\ a_{21} & 0 & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & 0 \end{bmatrix}$
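As a small illustration of step 1.2), the following sketch builds the matrix A for a list of sample vectors. The function name `distance_matrix` is hypothetical, and the Euclidean distance is used because the detailed description selects it for point-to-point distances.

```python
import math


def distance_matrix(points):
    """Return the N x N matrix A with A[i][j] = Euclidean distance
    between sample i and sample j, and zeros on the diagonal."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric, so each pair is computed once.
            a[i][j] = a[j][i] = math.dist(points[i], points[j])
    return a
```

Only the upper triangle is computed, which halves the work but leaves the number of distance evaluations quadratic in N.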

1.3) Find the smallest off-diagonal distance a_mn in matrix A. The value a_mn is the distance between the two classes and indicates that class m and class n are the closest pair. Merge class m and class n into a new class, delete rows m and n and columns m and n from A, and then process the remaining classes together with the new class in the same way as step 1.2) to obtain the updated matrix A;

1.4) Repeat steps 1.2) to 1.3) until A becomes a 1×1 matrix. The values of m, n and a_mn recorded at each iteration form the N×3 expression matrix Z of the hierarchical clustering tree:

$Z = \begin{bmatrix} Z_{11} & Z_{21} & Z_{31} \\ \vdots & \vdots & \vdots \\ Z_{1N} & Z_{2N} & Z_{3N} \end{bmatrix}$
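STEP1 as a whole can be sketched as follows. This is a hedged illustration rather than the patented procedure itself: the name `build_cluster_tree` is hypothetical, class indices are assumed to start at 1, each merged class is assumed to receive the next unused index, and the distance between classes is recomputed as the largest point-to-point Euclidean distance. The sketch records one row per merge, so N samples give N-1 rows.

```python
import math


def build_cluster_tree(points):
    """Return the merge records (m, n, a_mn), one tuple per merge."""
    n_points = len(points)
    # Active classes: class index -> indices of the samples it contains.
    members = {i + 1: [i] for i in range(n_points)}
    next_id = n_points + 1
    tree = []

    def distance(a, b):
        # Assumption: class distance = largest point-to-point distance.
        return max(math.dist(points[i], points[j])
                   for i in members[a] for j in members[b])

    while len(members) > 1:
        ids = sorted(members)
        # Steps 1.2) and 1.3): find the closest pair of active classes.
        d, m, n = min((distance(a, b), a, b)
                      for pos, a in enumerate(ids) for b in ids[pos + 1:])
        tree.append((m, n, d))
        # The merged class gets a new, previously unused index.
        members[next_id] = members.pop(m) + members.pop(n)
        next_id += 1
    return tree
```

Recomputing every class distance from the raw points keeps the sketch short; a practical version would update the matrix A in place instead.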

STEP2: Threshold selection

2.1) Process the expression matrix Z in order from the value of the last iteration to the value of the first iteration, computing as follows at each step:

Cluster the industrial data {d_i} using the current minimum distance a_mn as the threshold. The resulting clustering result sequence is T_k, where k is the iteration index and T_k is a 1×N integer sequence. T_k(i) denotes the i-th clustering result in T_k, and T_k(i) = p means that the i-th sampling point is assigned to class p. Compute the difference sequence D of T_k by subtracting each pair of adjacent elements, and take the number of zeros in D as the clustering-result rationality value D_zero(k).

2.2) The rationality values D_zero(k) from all iterations form the sequence D_zero. Compute the difference sequence of D_zero, again by subtracting each pair of adjacent elements, find the largest difference and its index k, and take Z_3k as the final threshold.

In STEP2, computing in order from the last iteration value toward the first iteration value means continuing either to the first iteration value, k = 1, or to the middle iteration value, k = N/2.

When k is too small, the clustering result no longer has any meaning for the final steady-state identification, so k = N/2 is generally chosen as the stopping condition to reduce computation.
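The selection rule of STEP2 can be sketched as below, under stated assumptions. The names `count_zero_diffs` and `select_threshold` are hypothetical. The caller is assumed to supply the candidate thresholds ordered from largest to smallest together with the label sequence T_k obtained at each one, and decides how many candidates to pass (for example, stopping at k = N/2). The sketch returns the last threshold before the sharpest drop in the zero count, which is one reading of step 2.2) and not necessarily the exact indexing the patent intends.

```python
def count_zero_diffs(labels):
    """D_zero: number of zeros in the difference sequence of `labels`,
    i.e. the number of adjacent samples that share a class."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a == b)


def select_threshold(thresholds, label_sequences):
    """Pick a threshold from candidates ordered largest to smallest.

    label_sequences[k] is the clustering result obtained with
    thresholds[k]. Returns (chosen_threshold, zero_counts).
    """
    zero_counts = [count_zero_diffs(t) for t in label_sequences]
    # Drop in the zero count when moving to the next smaller threshold.
    drops = [zero_counts[k] - zero_counts[k + 1]
             for k in range(len(zero_counts) - 1)]
    # Assumption: keep the threshold just before the largest drop.
    sharpest = max(range(len(drops)), key=drops.__getitem__)
    return thresholds[sharpest], zero_counts
```

A sharp drop in the zero count signals that samples which are adjacent in time have started alternating between classes, so the threshold used just before that drop is kept.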

STEP3: Determine the steady state jointly with the time order

If the elements T_i of the final clustering result sequence T satisfy the following condition, the system is considered to be at steady state from the m-th sampling point to the (m+k-1)-th sampling point:

$T_i = c, \quad i = m, m+1, m+2, \ldots, m+k-1$

where T_i is the clustering result of the i-th sampling point in the final clustering result sequence T, c is a constant, k = τ/T_s, τ is the time-length threshold, and T_s is the sampling interval of the data.

The value of τ is determined by the time characteristics of the object under study. Generally τ = 3t*, where t* is the settling time of the system's unit step response.
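The STEP3 test amounts to finding runs of identical labels that last at least k = τ/T_s samples. The sketch below is illustrative: the name `steady_state_flags` is hypothetical, and rounding τ/T_s to the nearest integer is an assumption made to absorb floating-point error, since the source does not say how a non-integer ratio is handled.

```python
def steady_state_flags(labels, tau, sampling_interval):
    """Return a list with 1 where the sample belongs to a run of
    identical labels lasting at least tau, and 0 elsewhere."""
    # k = tau / T_s, rounded to a whole number of samples (assumption).
    min_run = max(1, round(tau / sampling_interval))
    flags = [0] * len(labels)
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the data or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_run:
                flags[start:i] = [1] * (i - start)
            start = i
    return flags
```

Samples in runs shorter than k are left at 0, which matches the 1 for stable and 0 for unstable coding used in the first embodiment.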

In step 1.3), the new class obtained by merging is placed at the end of the updated matrix A. The new row/column appended at the end can be written as [a_1(N+1) a_2(N+1) …]^T.

The row and column indices of the remaining classes in matrix A stay unchanged. The new class obtained by merging is given a new row and column index that differs from the indices of all earlier classes, so that every class has a unique index throughout the whole procedure.

In step 1.3), the distance between the merged class and each remaining class in matrix A is computed with the following formula:

$a_{si} = \alpha a_{mi} + (1 - \alpha) a_{ni}$

where α is a weight parameter with 0 ≤ α ≤ 1, a_si is the distance between class s (the new class formed by merging classes m and n) and class i, a_ni is the distance between class n and class i, and a_mi is the distance between class m and class i.
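The update formula is simple enough to show directly. In this sketch the function name `merged_distance` and the default α = 0.5 are illustrative assumptions; the source only requires 0 ≤ α ≤ 1.

```python
def merged_distance(a_mi, a_ni, alpha=0.5):
    """Distance a_si between the merged class s and a remaining class i,
    computed as alpha * a_mi + (1 - alpha) * a_ni."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * a_mi + (1.0 - alpha) * a_ni
```

For example, with a_mi = 2, a_ni = 4 and α = 0.5 the new distance is 3, while α = 1 keeps only a_mi and α = 0 keeps only a_ni. Because both a_mi and a_ni are at least as large as the merged pair's own distance a_mn, the merge distances recorded in Z never decrease from one iteration to the next.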

Each sampling point of the system is regarded as a "feature vector" (or feature point) in a state space. When the system is at steady state, the feature points fluctuate within a certain range and follow a high-dimensional Gaussian distribution centered on a particular point. When the system is in a transition or unsteady state, the state points leave the original Gaussian distribution and become scattered. Because the state differs greatly from one instant to the next when the system is unsteady, steady state can be detected by comparing how much the system state feature vectors differ. This is where clustering comes in: steady-state data are highly similar to one another and are grouped into one class, while dynamic data points differ greatly from steady-state data and are assigned to different classes.

The nature of the system guarantees only that the state distribution is concentrated at steady state; when the system is fluctuating, its distribution in the state space varies widely. In other words, steady-state sampling points are similar to one another, but dynamic sampling points have very little similarity, and dynamic points from a continuous time span may end up in many different classes. The number of classes therefore cannot be known before clustering, which is why the invention uses a hierarchical clustering algorithm.

The beneficial effects of the invention are:

The invention brings the ideas and techniques of "big data" to steady-state detection. It detects steady state by comparing the similarity between the sampling points in the data set while also using the time order of the data, and it provides a method for determining the clustering threshold.

The invention is widely applicable and avoids the sharp growth in computation that data dimensionality otherwise causes.

Brief Description of the Drawings

Figure 1 is a flowchart of the hierarchical clustering algorithm.

Figure 2 is a flowchart of the threshold selection process of the method of the invention.

Figure 3 is a flowchart of how the method of the invention determines steady state using the time order.

Figure 4 shows the input and output curves of the second-order simulated system in Embodiment 1.

Figure 5 shows the clustering result for the state points of the second-order system in Embodiment 1.

Figure 6 shows the clustering-based detection result of Embodiment 1.

Figure 7 shows the input and output curves of the second-order simulated system in Embodiment 2.

Figure 8 compares the clustering results obtained with different thresholds in Embodiment 2.

Figure 9 shows the curves of the input data to be processed in Embodiment 3.

Figure 10 shows the clustering result at the 50th iteration in Embodiment 3.

Figure 11 shows the number of zeros in D, and its change, over all 50 iterations in Embodiment 3.

Figure 12 shows representative clustering results from Embodiment 3.

Figure 13 shows the steady-state detection result of Embodiment 3.

Detailed Description

The invention is further described below with reference to the drawings and embodiments.

The method of the invention is applied to steady-state detection of process systems. As shown in Figure 2, the procedure is as follows. For the data points {d_i} to be classified, first define how the distance between points in the state space and the distance between classes are measured. Because the state points of a process system are randomly distributed in the state space with a certain probability and degree of aggregation, the Euclidean distance is used to measure the distance between points. For the distance between classes, the steady state of interest should appear as a single class in the clustering result and each class should be as compact as possible, so the maximum distance is the better choice.

Input of the algorithm: the time-ordered historical state points {d_i} of the system, where each d_i is a p-dimensional vector, i = 1, 2, 3, …, N, and N is the number of sampling points.

Output of the algorithm: the class T of each sampling point, where T is a one-dimensional sequence of length N, that is, the sequence of class numbers of the sampling points. The class numbers only distinguish the classes and have no physical or numerical meaning.

Following the steps of hierarchical clustering, the algorithm then iterates by computing the distances between points in the data set, checks at each step whether the termination condition has been reached, and outputs the clustering result T when it ends.

Because the data being analyzed form a time series, the spatial distribution of the states is relatively concentrated whenever the system stays at steady state for a period of time. Using the time order of the data, a segment can be regarded as steady state if it satisfies two conditions: 1) the sampling points are consecutive in time; 2) the sampling points are highly similar, that is, the clustering algorithm assigns them to the same class.

Obtaining the steady state from the clustering result requires a clustering threshold and a minimum time interval T for judging stability. Under the chosen threshold δ, if the state points of the system are grouped into the same class over a period no shorter than T, the system is at steady state during that period. The minimum time interval T can be determined from the time characteristics of the system. The following discussion concentrates on choosing the clustering threshold δ and on how the threshold affects the algorithm.

During clustering, the data set contains N samples, and the process of gradually merging the data from N classes into 1 class is recorded in the N×3 clustering tree matrix Z. Z_1i records the index of the first of the two classes merged at step i, Z_2i records the index of the second, and Z_3i records the distance between the two merged classes. Finding the optimal clustering threshold therefore amounts to finding the row of Z at which the algorithm should stop; the Z_3i of that row is the threshold. The best threshold is found by enumerating the result for every row.
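Enumerating a row of Z means cutting the tree at that row's merge distance and reading off one class label per sample. The sketch below shows one way to do this. It is hypothetical in several respects: the name `labels_at_threshold` is invented, class indices are assumed to start at 1, the class created at merge step s is assumed to receive the index N + s, and the merge distances are assumed to be non-decreasing so that processing can stop at the first merge above the threshold.

```python
def labels_at_threshold(n_samples, tree, threshold):
    """Replay the merges whose distance does not exceed `threshold`
    and return one class label per sample.

    tree: list of (m, n, distance) tuples, one per merge, in merge order.
    """
    # Active classes: class index -> indices of the samples it contains.
    members = {i + 1: [i] for i in range(n_samples)}
    for step, (m, n, dist) in enumerate(tree, start=1):
        if dist > threshold:
            break  # later merges are assumed to be at least as distant
        members[n_samples + step] = members.pop(m) + members.pop(n)

    # Number the surviving classes 1, 2, ... in order of class index.
    labels = [0] * n_samples
    for label, class_id in enumerate(sorted(members), start=1):
        for idx in members[class_id]:
            labels[idx] = label
    return labels
```

The labels carry no meaning beyond telling classes apart, which matches the description of the output sequence T.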

The innovation of the invention is to use a well-defined quantitative index to find a reasonable threshold. The principle is as follows. T_k is the clustering result of the k-th enumeration, and its difference sequence D = diff(T_k) measures the overall clustering quality. As the threshold shrinks, the number of zero elements in D gradually decreases. In general, as in (1) to (9), the number of zeros in D decreases slowly. When a steady-state class is split, however, the result shows the system points switching frequently between two classes for a period of time, and the number of zeros in D drops sharply. The clustering results are therefore enumerated until the threshold becomes too small to be meaningful for steady-state identification. The step at which the number of zeros in D changes most is the step at which threshold reduction should stop, and its threshold is the appropriate steady-state threshold for the current data. The flowchart for determining the threshold is shown in Figure 3.

Embodiments of the invention are as follows:

Embodiment 1

Embodiment 1 considers a second-order single-input single-output system. The input and output used in the simulation are shown in Figure 4. The simulation lasts 100 s with a sampling interval of 0.1 s. Figure 4(a) shows the input variable and Figure 4(b) shows the output variable.

No noise is added to the variables in Embodiment 1. The curves in Figure 4 show that the system starts at steady state (t = 0 to 300), a ramp adjustment of the input begins at t = 300 and the system enters a transition state, the ramp ends after t = 600, and the output settles until the system reaches another steady state. Because no noise is added, the system states are very tightly grouped at steady state. The threshold selection procedure described above is therefore not used here; instead a very small threshold of 0.01 is chosen manually and the data are clustered, giving the result shown in Figure 5.

The curves in Figure 4 show that the system has two different steady states over the whole process. In the clustering result of Figure 5, data points in the same steady state carry the same class label, and Figure 5 contains two horizontal segments that correspond in time to the two steady states of the system.

The clustering result reflects the distances between state points in the state space, and deciding whether the system is at steady state also requires the time order of the data. When the system stays at steady state for a period of time, the data in that interval are highly similar and are grouped into the same class. Conversely, this can serve as the steady-state criterion: when all data within a continuous time span τ are assigned to one class by the clustering algorithm, the system is considered to be at steady state. The length τ is determined by the time characteristics of the object under study; here τ = 2t*, where t* is the settling time of the system's unit step response.

The resulting steady-state detection is shown in Figure 6. Figure 6(a) shows the clustering result for the state data points of the second-order system, and Figure 6(b) shows the final steady-state detection result, in which 1 means stable and 0 means unstable. The system leaves the first steady state at about t = 300, enters the transition state, and reaches steady state again close to t = 700. Comparing input and output, the input ramp ends and the input becomes constant at t = 600, but the output y needs a further period of adjustment before it settles.

Embodiment 2

Embodiment 2 adds noise to the simulation and uses the single-input single-output system of Embodiment 1 to illustrate the threshold selection process further. The input and output of the simulated system with noise are shown in Figure 7. Figure 7(a) shows the input variable and Figure 7(b) shows the output variable.

Figure 7 shows that the system is at steady state for t = 0 to 300; the input u then ramps, and after a transition period the system reaches another steady state. In the state space of the whole process, the state points of the two steady states are tightly grouped, while the transition points are scattered. Thresholds are enumerated from largest to smallest, and as the clustering threshold shrinks, each class becomes more compact. At some step j, the threshold becomes small enough that points belonging to one steady state are split into several classes. The threshold is then smaller than desired, and there is no need to reduce it further. The threshold reached at the previous step, step j-1, is the appropriate value.

The first 12 steps of enumerated clustering for the single-input single-output second-order system of Embodiment 2 are shown in Figure 8. In Figure 8(1) the threshold is large and the algorithm splits the data into only two classes. In Figures 8(2) to 8(9), the data in the intermediate transition state are progressively separated out. As the clustering threshold shrinks, each class becomes more compact and the result of the method becomes more precise. In Figure 8(10), however, the state points for t from 700 to 1000, which originally belonged to one class, lose many points after the threshold is reduced at this step, and the two resulting classes interleave in time. If both classes were treated as steady states, Figure 8(10) would mean that the system switches frequently between two steady states during this period with no transition state in between, which does not occur in a natural process system.

It is therefore concluded that the state in Figure 8(10) has reached the limit of threshold reduction, and the threshold corresponding to the result in Figure 8(9) should be taken as the final distance threshold.

Example 3:

Example 3 applies the method to historical data from a real process: boiler data from a 60 MW unit of a power plant. The data mainly comprise the boiler load command, generated power, coal feed rate, air intake, and the steam and water temperatures and pressures at the various measurement points, 180 dimensions in total. 10,000 data points are selected and the clustering algorithm described above is used for steady-state detection. The main variables over this period are shown in Figure 9:

Figure 9 shows that the system load undergoes two major adjustments during the period studied, producing roughly three segments of steady-state data. The main steam temperature and main steam pressure both fluctuate considerably, and the system's changes cannot even be discerned from the steam temperature curve alone. The state points formed from all system variables are then clustered. Following the threshold selection method above, the total number of iterations is set to 50; the data experiment shows that the result of the 50th iteration, given in Figure 10, no longer yields useful information, so iteration stops there. The resulting number of zeros in D and its variation are shown in Figure 11: Figure 11(a) shows the number of zeros in D over the 50 iterations, and Figure 11(b) shows the change in that number over the 50 iterations.

At step 16 the number of zero elements in D changes abruptly. To show the change in clustering more intuitively, some representative iteration results are given in Figure 12. As Figure 12 shows, from step 16 to step 17 the points in the period 0 to 2000 are split into two classes; this change in the clustering result coincides with the abrupt change in the number of zero elements in D, which also demonstrates the soundness of the method of the invention.
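The zero-count criterion can be illustrated with a minimal Python sketch. It assumes the cluster label sequence for each enumeration step is already available as a list of integers; the function names zero_count and largest_jump are hypothetical, and the label sequences are invented toy data, not values from the boiler experiment.

```python
def zero_count(labels):
    """Count the zeros in the first-difference sequence D of a label sequence."""
    return sum(1 for a, b in zip(labels, labels[1:]) if b - a == 0)


def largest_jump(d_zero):
    """Return the 1-based pair of consecutive steps between which the
    zero count changes the most (assumption: magnitude of change is used)."""
    diffs = [abs(b - a) for a, b in zip(d_zero, d_zero[1:])]
    i = max(range(len(diffs)), key=diffs.__getitem__)
    return i + 1, i + 2


# Toy label sequences for three successively smaller thresholds (invented data).
steps = [
    [1, 1, 1, 1, 2, 2, 2, 2],  # two compact classes
    [1, 1, 1, 3, 2, 2, 2, 2],  # one transition point separated out
    [1, 4, 1, 4, 2, 2, 2, 2],  # a steady class broken into interleaved labels
]
d_zero = [zero_count(s) for s in steps]
before, after = largest_jump(d_zero)
```

In this toy case the largest change falls between the second and third sequences, where a formerly compact class starts to interleave in time. Which of the two adjacent steps supplies the final threshold follows the rule stated in the text and is not decided by the sketch.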

Based on the clustering result, a suitable time length is selected for steady-state detection. Given the characteristics of the industrial plant, the system is considered to be in steady state when it remains stable over 500 sampling points. In this example, the system is judged to be in steady state wherever all points in an interval of at least 500 points of the clustering result share the same cluster number. The resulting steady-state detection output is shown in Figure 13; in Figure 13(c), 1 denotes steady and 0 denotes unsteady.
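The run-length rule above can be sketched in a few lines of Python. This is only an illustration: the function name steady_state_mask and the parameter min_len are hypothetical, and the short label sequence is invented; in the example described here min_len would be 500.

```python
def steady_state_mask(labels, min_len):
    """Mark with 1 every sample lying in a run of at least min_len
    identical cluster labels, and with 0 every other sample."""
    n = len(labels)
    mask = [0] * n
    start = 0  # index where the current run of equal labels begins
    for i in range(1, n + 1):
        # A run ends at the end of the data or when the label changes.
        if i == n or labels[i] != labels[start]:
            if i - start >= min_len:
                mask[start:i] = [1] * (i - start)
            start = i
    return mask


# Invented toy sequence with a minimum run length of 4 samples.
toy_labels = [1, 1, 1, 1, 2, 3, 2, 2, 2, 2, 2]
toy_mask = steady_state_mask(toy_labels, 4)
```

The two short runs in the middle of the toy sequence are marked 0, while the runs of four and five identical labels are marked 1, matching the 1/0 convention of Figure 13(c).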

In summary, the invention achieves a notable technical effect: its steady-state detection is widely applicable and avoids the sharp growth in computation that data dimensionality would otherwise cause.

The above examples do not limit the invention, and the invention is not restricted to them; any implementation that meets the requirements of the invention falls within its scope of protection.

Claims

1. A system steady-state detection algorithm based on hierarchical clustering, characterized by:

STEP1, generating the clustering tree:

1.1) For industrial data {d_i}, i = 1, 2, 3, ..., N, comprising N sampling points within a continuous time interval, each sampling point in the set is taken as a class, and d_i denotes the industrial data of the class;

1.2) The distances between all pairs of classes are calculated to obtain an N × N matrix A, in which the element in row p and column q is denoted a_pq; a_pq represents the distance between class d_p and class d_q, and a_pp = 0;

1.3) The minimum inter-class distance a_mn in matrix A is found, meaning that class m and class n are currently the two closest classes; class m and class n are merged into a new class, and row m, row n, column m and column n are deleted from matrix A; the inter-class distances between the remaining classes and the newly merged class are calculated in the same manner as in step 1.2) to obtain the updated matrix A;

1.4) Steps 1.2) to 1.3) are repeated iteratively until matrix A becomes a 1 × 1 matrix; m, n and a_mn are recorded at each iteration to form the N × 3 expression matrix Z of the hierarchical clustering tree, with m, n and a_mn as the values of the first, second and third columns of Z respectively;

Z = \begin{bmatrix} Z_{11} & Z_{21} & Z_{31} \\ \vdots & \vdots & \vdots \\ Z_{1N} & Z_{2N} & Z_{3N} \end{bmatrix}

STEP2, selecting the threshold:

For the expression matrix Z, the clustering result rationality value D_zero(k) of each iteration is calculated in order from the last iteration value toward the first iteration value; the final threshold is determined from the calculated rationality values D_zero(k), and the industrial data {d_i} are clustered with Z_3k as the final threshold to obtain the final clustering result sequence T;

STEP3, judging steady state jointly with the time sequence:

If the elements T_i of the final clustering result sequence T satisfy the following condition, the system is considered to be in steady state from the m-th sampling point to the (m+l-1)-th sampling point:

T_i=c, i=m, m+1, m+2 ..., m+l-1

where T_i is the clustering result of the i-th sampling point in the final clustering result sequence T, c is a constant result value, l = τ/T_s, τ is the time length threshold, and T_s is the sampling interval of the data.

2. The system steady-state detection algorithm based on hierarchical clustering according to claim 1, characterized in that: in step 1.3), the new class obtained by merging is placed at the end of the updated matrix A; the row and column numbers of the remaining classes in matrix A remain unchanged, and the new class is given a new row and column number that differs from the numbers of all previous classes.

3. The system steady-state detection algorithm based on hierarchical clustering according to claim 1, characterized in that: in step 1.3), the distance between the class obtained by merging and each remaining class in matrix A is calculated as follows: denoting the class formed by merging class m and class n as class s, and letting class i be any class other than class s, the distance between class s and class i is:

a_si = α a_mi + (1-α) a_ni

where α is a weight parameter with 0 ≤ α ≤ 1, a_si is the distance between class s and class i, a_ni is the distance between class n and class i, and a_mi is the distance between class m and class i.

4. The system steady-state detection algorithm based on hierarchical clustering according to claim 1, characterized in that: each calculation in STEP2 is performed as follows: the industrial data {d_i} are clustered with the current minimum distance value a_mn as the threshold, giving the clustering result sequence T_k, where k is the iteration ordinal and T_k is a 1 × N integer sequence; the difference sequence D of T_k is computed, and the number of zeros in D is taken as the clustering result rationality value D_zero(k).

5. The system steady-state detection algorithm based on hierarchical clustering according to claim 1, characterized in that: in STEP2, the final threshold is determined from the calculated clustering result rationality values D_zero(k) as follows: the values D_zero(k) form the rationality value sequence D_zero; the difference sequence of D_zero is computed; the maximum difference value in that difference sequence and its index j are found; and Z_3j is taken as the final threshold.

6. The system steady-state detection algorithm based on hierarchical clustering according to claim 1, characterized in that: in STEP2, calculating in order from the last iteration value toward the first iteration value means calculating down to the first iteration value, k = 1, or down to the middle iteration value, k = N/2.
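As an illustration of STEP1 with the weighted distance update of claim 3, a minimal Python sketch follows. It is not the claimed implementation: the function name build_tree, the Euclidean distance between samples, the default α = 0.5 and the 0-based class numbers are assumptions made for the example. The sketch records one row per merge, so N samples give N-1 rows.

```python
import math


def build_tree(points, alpha=0.5):
    """Agglomerative clustering sketch; returns one (m, n, a_mn) row per merge."""
    n_points = len(points)
    active = list(range(n_points))  # numbers of the classes still present
    dist = {}                       # dist[(p, q)] with p < q: inter-class distance
    for p in range(n_points):
        for q in range(p + 1, n_points):
            dist[(p, q)] = math.dist(points[p], points[q])

    def d(p, q):
        return dist[(p, q)] if p < q else dist[(q, p)]

    rows = []
    next_id = n_points  # a merged class gets a number never used before (claim 2)
    while len(active) > 1:
        # Find the closest pair among the classes that remain.
        pairs = ((p, q) for k, p in enumerate(active) for q in active[k + 1:])
        m, n = min(pairs, key=lambda pq: d(*pq))
        rows.append((m, n, d(m, n)))
        rest = [c for c in active if c not in (m, n)]
        for c in rest:
            # Weighted update of claim 3: a_si = alpha * a_mi + (1 - alpha) * a_ni
            dist[(c, next_id)] = alpha * d(m, c) + (1 - alpha) * d(n, c)
        active = rest + [next_id]   # the new class is appended at the end
        next_id += 1
    return rows


# Invented one-dimensional toy data: two nearby samples and one distant sample.
tree = build_tree([(0.0,), (1.0,), (10.0,)])
```

For the toy data the two nearby samples merge first at distance 1.0, and the merged class then joins the distant sample at 0.5 × 10.0 + 0.5 × 9.0 = 9.5.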