CN103440368B

CN103440368B - Multi-model dynamic soft measurement modeling method

Info

Publication number: CN103440368B
Application number: CN201310349985.4A
Authority: CN
Inventors: 王昕�; 唐苦
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2013-08-12
Filing date: 2013-08-12
Publication date: 2017-06-13
Anticipated expiration: 2033-08-12
Also published as: CN103440368A

Abstract

A multi-model dynamic soft-sensing modeling method: first, a plurality of sub-models are established using an adaptive fuzzy kernel clustering method and least squares support vector machines; then, probability assignment functions constructed with the evidence combination rule are used as weight factors to fuse the sub-model outputs into a multi-model output; finally, the prediction error of the multi-model output is estimated dynamically with an autoregressive moving average model.

Description

A multi-model dynamic soft sensor modeling method

Technical field

The invention relates to a soft-sensing method for the esterification rate in industrial polyester production, and in particular to a multi-model dynamic soft-sensing method based on the combination rule of evidence theory and an autoregressive moving average model.

Background art

Figure 1 shows the basic esterification process. As a key stage of the whole polyester production process, the esterification reaction is decisive for stable polyester production. The esterification rate at the outlet of the first esterification reactor is a key quality index: it directly affects the subsequent reactions and the crystallization behavior of the polyester product, so the whole production process is often controlled by controlling the esterification rate. Different polycondensation processes, however, place different requirements on the esterification rate, so operating conditions such as the reaction pressure and the raw-material ratio must be adjusted during production to reach the required rate. Sudden changes in operating conditions cause the esterification rate to fluctuate, which hampers real-time control of the whole process. Moreover, most esterification processes use two esterification reactors to reach the required esterification rate, and the strong nonlinearity, time variation and uncertainty of the reaction system make online measurement of the esterification rate more difficult.

On-site analytical instruments are expensive and complicated to maintain, and measuring the esterification rate with them usually involves a long time lag, which degrades control quality and makes it hard to meet production requirements. The basic idea of soft sensing is to combine automatic control theory with knowledge of the production process and, using computer technology, to infer and estimate a primary variable that is difficult or temporarily impossible to measure from easily measured secondary variables through a mathematical relationship, so that software replaces the function of the on-site analytical instrument. Because soft-sensing methods respond quickly, provide the primary variable continuously, and are cheap to deploy and simple to maintain, they have been widely studied and applied in many fields. In recent decades, however, modern industrial production has placed ever higher demands on the production process: data volumes have grown sharply, data types have become more complex, and operating conditions are complex and changeable. In addition, industrial processes are generally dynamic, and static soft-sensing methods usually cannot capture their dynamic information and global characteristics, so the resulting models adapt poorly and cannot be used over the long term. Simple, conventional soft-sensing methods therefore no longer meet the needs of modern production and tend to suffer from poor matching of process characteristics, low prediction accuracy and poor adaptability.

To obtain soft-sensing methods that are more generally suitable for predicting and analyzing esterification-rate data, many improvements have been proposed, mainly in the following directions: building a model of the sample set to predict the primary variable with various modeling methods, such as mechanistic analysis, artificial neural networks, least squares support vector machines and Gaussian processes; tuning the model parameters with intelligent optimization methods, such as particle swarm optimization, genetic algorithms and evolutionary algorithms; and partitioning the sample set into several sub-classes with clustering methods, such as K-means clustering, fuzzy C-means clustering, manifold clustering and affinity propagation clustering, and building several sub-models to improve prediction performance.

Summary of the invention

To address the above deficiencies of the prior art, the present invention provides a multi-model dynamic soft-sensing modeling method based on the combination rule of evidence theory (D-S rule) and the autoregressive moving average (ARMA) model. Compared with the prior art, it adapts better and achieves higher accuracy in soft sensing of the esterification rate.

The present invention is realized through the following technical solution:

A multi-model dynamic soft-sensing modeling method, comprising the following steps:

S1. Data preprocessing: select the training sample data set X_{m×n}, where m is the sample dimension and n is the number of samples; remove abnormal data and normalize the data;

S2. Adaptive fuzzy kernel clustering analysis: cluster the training sample data set X_{m×n} with the adaptive fuzzy kernel clustering method to obtain the fuzzy membership of each sample and the cluster centers, and automatically determine the optimal number of clusters c;

S3. Building the sub-models: train least squares support vector machines on the training sample sets of the c clusters, with the Gaussian kernel function as the kernel function; establish the c sub-models and determine their parameters, the penalty factor C and the kernel parameter σ, by cross-validation; and obtain the output of each sub-model;

S4. Sub-model output fusion based on the combination rule of evidence theory: compute the evidence probability assignment function value of each sub-model, use it as the weight factor of that sub-model, and then fuse the sub-model outputs by evidence fusion to obtain the static multi-model output;

S5. Making the model output dynamic: use an autoregressive moving average model to dynamically adjust the multi-model output at the current time t. First determine whether the multi-model output is a stationary sequence and, if it is not, convert it into a stationary sequence; otherwise directly take the difference between it and the true measured value y to obtain a time series of the output error Δy. Then model this time series with an autoregressive moving average model (p, q) to obtain an autoregressive moving average model of the prediction error. Finally, combine the two models for prediction to obtain the final dynamic multi-model output.
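Step S1 can be illustrated with a minimal sketch. The text does not say which outlier rule or which normalization is used, so the 3-sigma rule, the min-max scaling and the function name `preprocess` below are assumptions chosen for illustration. Note also that the code stores samples as rows (shape n × m), whereas the text writes the data set as X_{m×n}.

```python
import numpy as np

def preprocess(X, k=3.0):
    """Illustrative S1: drop outlier rows, then min-max normalize each column.

    X has shape (n_samples, n_features). The k-sigma rule and min-max scaling
    are assumptions; the source only says "remove abnormal data and normalize".
    Returns the normalized data, the boolean keep-mask, and the (min, max) used.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # constant columns never flag outliers
    keep = np.all(np.abs(X - mean) <= k * std, axis=1)
    X_clean = X[keep]
    lo = X_clean.min(axis=0)
    hi = X_clean.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return (X_clean - lo) / span, keep, (lo, hi)
```

The returned `(lo, hi)` pair would be reused to scale test samples with the same mapping as the training data.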

Preferably, in step S2, the adaptive fuzzy kernel clustering method comprises the following steps:

S21: Clustering objective function: for the training sample set X = {x_i | i = 1, 2, ..., n}, the objective function of the adaptive fuzzy kernel clustering method is defined as

J_{\Phi}(U, c) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert 1 - K(x_i, v_j) \rVert^{2}, \quad \text{s.t. } U \in M_{fc}

where m is the fuzzy control index, μ_ij is the membership value of the i-th sample in the j-th cluster, v_j is the j-th cluster center, and K(x_i, v_j) is the Gaussian kernel function;

S22: Membership update:

\mu_{ij} = \frac{(1 - K(x_i, v_j))^{-1/(m-1)}}{\sum_{k=1}^{c} (1 - K(x_i, v_k))^{-1/(m-1)}}

S23: Cluster center update:

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j)\, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j)}

S24: Evaluation of the clustering result: after clustering ends, the clustering result is evaluated with the validity index

V_{GX}(c) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ki}^{m} (1 - K(x_k, v_i)) + \frac{1}{c} \sum_{i=1}^{c} (1 - K(v_i, \bar{v}))}{\min_{i \neq k} (1 - K(v_k, v_i))}
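The membership and center updates of S22 and S23, as stated in claim 1, can be sketched as an alternating iteration for a fixed number of clusters c. The kernel form exp(-||x - v||² / (2σ²)), the kernel width `sigma`, the initialization, the stopping rule and all function names are assumptions; the adaptive choice of c via the validity index is not shown here.

```python
import numpy as np

def gaussian_kernel(X, V, sigma):
    """Gaussian kernel K(x_i, v_j) for all pairs; returns shape (n, c)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def memberships(X, V, m, sigma):
    """S22: mu_ij proportional to (1 - K(x_i, v_j))^(-1/(m-1)); rows sum to 1."""
    dist = np.maximum(1.0 - gaussian_kernel(X, V, sigma), 1e-12)  # guard 0 ** negative
    inv = dist ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def kernel_fcm(X, c, m=2.0, sigma=1.0, n_iter=100, tol=1e-8, V0=None, seed=0):
    """Alternate S22 and S23 until the centers stop moving (assumed stopping rule)."""
    X = np.asarray(X, dtype=float)
    if V0 is None:
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=c, replace=False)].copy()
    else:
        V = np.array(V0, dtype=float)
    for _ in range(n_iter):
        U = memberships(X, V, m, sigma)
        W = (U ** m) * gaussian_kernel(X, V, sigma)   # weights mu_ij^m * K(x_i, v_j)
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]    # S23: weighted mean of samples
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return memberships(X, V, m, sigma), V
```

Because each center is a weighted mean of the samples, the centers always stay inside the bounding box of the data; the result can still depend on the initial centers.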

Preferably, step S4 comprises:

S41: Evidence probability assignment function of the first sub-model: take all c sub-models obtained by clustering as the frame of discernment of evidence theory, and regard each sub-model as a focal element C_j (j = 1, 2, ..., c). For sample x_1, compute its fuzzy membership in the first sub-model, that is, the first focal element C_1, and, following evidence theory, treat it as one piece of evidence whose probability assignment function is m({C_1} | x_1) = μ_11;

For all n test samples X = {x_i | i = 1, 2, ..., n}, n pieces of evidence are obtained in the same way, with probability assignment functions m({C_1} | x_i) = μ_i1, i = 1, 2, ..., n;

These probability assignment functions are then fused with the combination rule of evidence theory, and the fused probability assignment function is taken as the probability assignment function of the first sub-model, where the conflict factor reflects the degree of conflict among the pieces of evidence;

S44: Evidence probability assignment functions of all sub-models: proceeding in the same way for all c sub-models, following step S43, yields c evidence probability assignment functions m({C_1} | X), ..., m({C_c} | X);

S45: Multi-model output: compute the sub-output of X for each sub-model, use the c probability assignment functions from S44 as the weight factors of the sub-models, and fuse the sub-model outputs by weighting to obtain the multi-model output of the training sample data set.
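One possible reading of S41 to S45 is sketched below. It assumes that each sample's membership row (μ_i1, ..., μ_ic) is one piece of evidence whose mass sits only on the singleton focal elements {C_j}. Under that assumption Dempster's rule reduces to a normalized product of the memberships. The patent's own fusion formula is not reproduced in this text, so this may differ from it; the function names are hypothetical.

```python
import numpy as np

def ds_weights(U):
    """Combine the n rows of U (shape (n, c)) with Dempster's rule.

    Assumption: every row is a mass function on singleton focal elements only,
    so the combined mass of C_j is prod_i mu_ij / sum_k prod_i mu_ik.
    The product is evaluated in log space to avoid underflow for large n.
    """
    logp = np.log(np.clip(U, 1e-300, 1.0)).sum(axis=0)
    logp -= logp.max()
    w = np.exp(logp)
    return w / w.sum()

def fuse(sub_outputs, weights):
    """Weighted fusion of sub-model outputs; sub_outputs has shape (c,) or (n, c)."""
    return np.asarray(sub_outputs, dtype=float) @ weights
```

For example, the two pieces of evidence (0.8, 0.2) and (0.6, 0.4) combine to weights proportional to (0.48, 0.08). With many samples the product tends to concentrate on a single sub-model, which is a known property of this rule rather than a feature claimed by the source.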

Preferably, step S5 comprises:

S51: Dynamically correct the static multi-model output with an autoregressive moving average model. The autoregressive moving average model describes the response of the system at the current time t as depending not only on its own previous observations but also on the current and lagged values of the system disturbance. In the autoregressive moving average model (p, q), p is the number of autoregressive terms and q is the number of moving-average terms;

S52: Introduce the linear shift operator B, with which the formula in S51 can be rewritten in operator form, where ε_t is a white-noise sequence satisfying N(0, σ²), and Φ(B) and θ(B) are polynomials of order m and order n in the shift operator B.

According to the basic theory of linear operators on Hilbert space, a stationary, normal, zero-mean random time series can be approximated to arbitrary accuracy by an autoregressive moving average model (p, q).
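For reference, the textbook form of an ARMA(p, q) model and its shift-operator form are sketched below. The patent's own equations are not reproduced in this text, so the symbol e_t for the modeled series, the coefficients φ_i and θ_j, and the sign convention are assumptions taken from the standard Box-Jenkins notation.

```latex
e_t = \sum_{i=1}^{p} \varphi_i \, e_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j},
\qquad \varepsilon_t \sim N(0, \sigma^2)

B^k e_t = e_{t-k}
\;\Longrightarrow\;
\Phi(B)\, e_t = \theta(B)\, \varepsilon_t,
\quad
\Phi(B) = 1 - \sum_{i=1}^{p} \varphi_i B^i,
\quad
\theta(B) = 1 - \sum_{j=1}^{q} \theta_j B^j
```

In this notation the first line expresses the dependence on past observations and on current and lagged disturbances described in S51, and the second line is the operator form referred to in S52.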

The method first exploits the focusing property of the evidence combination rule in handling uncertain information: multiple evidence probability assignment functions are established for the sub-models obtained by the affinity propagation clustering method and used as the weight factors of the sub-models, and the sub-model outputs are fused by weighting to obtain the multi-model output for the test samples. This avoids the oscillation caused by switching between models, removes the influence of misassigned samples on output accuracy, and improves the prediction ability of the model. The error of the static multi-model output is then corrected dynamically with an autoregressive moving average model, which significantly improves the dynamic response of the system.

Description of the drawings

Figure 1 is a schematic diagram of the esterification process;

Figure 2 is a flowchart of the multi-model dynamic soft-sensing modeling method of the present invention;

Figure 3 is a table of the sub-model parameters C and σ;

Figure 4 compares the values predicted by the LSSVM method with the manually analyzed values for the esterification-rate test samples;

Figure 5 compares the values predicted by the SFKCM-LSSVM method with the manually analyzed values for the esterification-rate test samples;

Figure 6 compares the values predicted by the AP-LS-SVM method with the manually analyzed values for the esterification-rate test samples;

Figure 7 compares the values predicted by the present invention with the manually analyzed values for the esterification-rate test samples;

Figure 8 compares the performance of the present invention with that of existing measurement methods.

Detailed description

The technical solutions in the embodiments of the present invention are described and discussed clearly and completely below with reference to the accompanying drawings. The embodiments described here are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

To aid understanding of the embodiments of the present invention, specific embodiments are explained further below with reference to the drawings; none of them limits the embodiments of the present invention.

The method first exploits the focusing property of the evidence combination rule in handling uncertain information: multiple evidence probability assignment functions are established for the sub-models obtained by the affinity propagation clustering method and used as the weight factors of the sub-models, and the sub-model outputs are fused by weighting to obtain the multi-model output for the test samples. This avoids the oscillation caused by switching between models, removes the influence of misassigned samples on output accuracy, and improves the prediction ability of the model. The error of the static multi-model output is then corrected dynamically with an autoregressive moving average model (ARMA, Auto-Regressive Moving Average Model), which significantly improves the dynamic response of the system.

The technical solution adopted by the method is as follows:

Referring to Figure 2, a multi-model dynamic soft-sensing modeling method based on the combination rule of evidence theory and the autoregressive moving average model comprises the following steps:

S1: Data preprocessing: select the training sample data set X_{m×n}, where m is the sample dimension and n is the number of samples; remove abnormal data and normalize the data;

S2: Adaptive fuzzy kernel clustering analysis: cluster the sample set X_{m×n} with the adaptive fuzzy kernel clustering method to obtain the fuzzy membership of each sample and the cluster centers, and automatically determine the optimal number of clusters c;

S3: Building the sub-models: train a least squares support vector machine (LS-SVM) on each sub-training sample set and determine the parameters of each sub-model. The Gaussian kernel function is chosen as the kernel function of the LS-SVM, and the parameters of each sub-model, the penalty factor C and the kernel parameter σ, are determined by cross-validation, as shown in Figure 3;

S4: D-S based sub-model output fusion: obtain the evidence probability assignment function value of each sub-model according to formula (6), use it as the weight factor of that sub-model, and then fuse the sub-model outputs by evidence fusion with formula (7) to obtain the multi-model output;

S5: Making the model output dynamic: after the static model above has produced the multi-model output for the samples, use the ARMA model to dynamically adjust the multi-model output at the current time t. First determine whether the multi-model output is a stationary sequence and, if it is not, convert it into a stationary sequence; otherwise directly take the difference between it and the true measured value y to obtain a time series of the output error Δy. Then model this time series with an ARMA model (p, q) to obtain an ARMA model of the prediction error. Finally, combine the two models for prediction to obtain the final output for the samples.
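Step S3 can be illustrated with the standard LS-SVM regression formulation, in which training reduces to solving one linear system. This is a sketch under assumptions: the source does not spell out the LS-SVM equations, the kernel form exp(-||a - b||² / (2σ²)) is assumed, the function names are hypothetical, and the cross-validation search over C and σ is left out.

```python
import numpy as np

def rbf(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, C, sigma):
    """Solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y] for bias b and alpha."""
    n = len(X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / C   # C is the penalty factor
    rhs = np.concatenate(([0.0], np.asarray(y, dtype=float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Sub-model output: f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf(X_new, X_train, sigma) @ alpha + b
```

One such model would be fitted per cluster, each with its own (C, σ) pair as in Figure 3.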

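In step S2, the adaptive fuzzy kernel clustering method proceeds as follows:

S21: Clustering objective function: for the training sample set X = {x_i | i = 1, 2, ..., n}, the objective function of the adaptive fuzzy kernel clustering method is defined as

J_{\Phi}(U, c) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert 1 - K(x_i, v_j) \rVert^{2}, \quad \text{s.t. } U \in M_{fc}

where m is the fuzzy control index, μ_ij is the membership value of the i-th sample in the j-th cluster, v_j is the j-th cluster center, and K(x, y) is the Gaussian kernel function.

S22: Membership update:

\mu_{ij} = \frac{(1 - K(x_i, v_j))^{-1/(m-1)}}{\sum_{k=1}^{c} (1 - K(x_i, v_k))^{-1/(m-1)}}

S23: Cluster center update:

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j)\, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j)}

S24: Evaluation of the clustering result: after clustering ends, the clustering result is evaluated with the following validity index

V_{GX}(c) = \frac{\sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ki}^{m} (1 - K(x_k, v_i)) + \frac{1}{c} \sum_{i=1}^{c} (1 - K(v_i, \bar{v}))}{\min_{i \neq k} (1 - K(v_k, v_i))}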
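The validity index V_GX(c) stated in claim 1 can be evaluated as in the sketch below. Several points are assumptions: the first sum is read as running over samples k and clusters i, the reference point v̄ is taken to be the mean of the cluster centers, and the kernel is exp(-||a - b||² / (2σ²)). The source does not say whether a smaller or larger value is better; by analogy with compactness-over-separation indices such as Xie-Beni, a smaller value presumably indicates a better partition.

```python
import numpy as np

def _k(a, b, sigma):
    """Gaussian kernel along the last axis (broadcasting over leading axes)."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2.0 * sigma ** 2))

def validity_index(X, U, V, m=2.0, sigma=1.0):
    """Compactness plus center spread, divided by the minimum center separation.

    X: (n, d) samples, U: (n, c) memberships, V: (c, d) centers, with c >= 2.
    """
    c = len(V)
    K_xv = _k(X[:, None, :], V[None, :, :], sigma)      # (n, c)
    compact = np.sum((U ** m) * (1.0 - K_xv))
    v_bar = V.mean(axis=0)                              # assumed meaning of v-bar
    spread = np.mean(1.0 - _k(V, v_bar, sigma))         # (1/c) * sum_i (1 - K(v_i, v_bar))
    K_vv = _k(V[:, None, :], V[None, :, :], sigma)      # (c, c)
    off_diag = ~np.eye(c, dtype=bool)
    separation = np.min(1.0 - K_vv[off_diag])
    return (compact + spread) / separation
```

Under the smaller-is-better reading, the number of clusters c would be chosen by running the clustering for several candidate values and keeping the one with the lowest index.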

In step S4, the model prediction output based on the combination rule of evidence theory is obtained as follows:

S41: Evidence probability assignment function of the first sub-model: take all c sub-models obtained by clustering as the frame of discernment of evidence theory, and regard each sub-model as a focal element C_j (j = 1, 2, ..., c). For sample x_1, first compute from formula (3) its fuzzy membership in the first sub-model, that is, the first focal element C_1. Following evidence theory, treat it as one piece of evidence whose probability assignment function is m({C_1} | x_1) = μ_11.

For all n test samples X = {x_i | i = 1, 2, ..., n}, n pieces of evidence are obtained in the same way, with probability assignment functions m({C_1} | x_i) = μ_i1, i = 1, 2, ..., n.

These probability assignment functions are then fused with the combination rule of evidence theory, and the fused probability assignment function is taken as the probability assignment function of the first sub-model, as given by formula (6), in which the magnitude of the conflict factor reflects the degree of conflict among the pieces of evidence.

S44: Evidence probability assignment functions of all sub-models: proceeding in the same way for all c sub-models, following step S43, yields c evidence probability assignment functions m({C_1} | X), ..., m({C_c} | X).

S45: Multi-model output: compute the sub-output of X for each of the sub-models LS-SVM1, ..., LS-SVMc, use the c probability assignment functions obtained above as the weight factors of the sub-models, and fuse the sub-model outputs by weighting to obtain the multi-model output of the test sample set.
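For orientation, the textbook form of Dempster's combination rule for n pieces of evidence m_1, ..., m_n is given below, together with its special case when every piece of evidence assigns mass only to the singleton focal elements {C_j}. Formula (6) itself is not reproduced in this text, so this is the standard rule rather than a transcription of the patent's equation; the symbol K for the conflict factor and the notation ŷ for the outputs are assumptions.

```latex
m(A) = \frac{1}{1 - K} \sum_{A_1 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i), \quad A \neq \emptyset,
\qquad
K = \sum_{A_1 \cap \cdots \cap A_n = \emptyset} \; \prod_{i=1}^{n} m_i(A_i)

m(\{C_j\} \mid X) = \frac{\prod_{i=1}^{n} \mu_{ij}}{\sum_{k=1}^{c} \prod_{i=1}^{n} \mu_{ik}},
\qquad
\hat{y} = \sum_{j=1}^{c} m(\{C_j\} \mid X) \, \hat{y}_j
```

The second line follows because singleton sets intersect only when they are identical, so 1 - K is the sum of the per-cluster products; the last expression is the weighted fusion of the sub-model outputs ŷ_j described in S45.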

In step S5, the model output is made dynamic as follows:

S51: Dynamically correct the static multi-model output obtained above with an autoregressive moving average (ARMA) model. The ARMA model describes the response of the system at the current time t as depending not only on its own previous observations but also on the current and lagged values of the system disturbance. In the ARMA model (p, q) of formula (8), AR denotes autoregression and p is the number of autoregressive terms; MA denotes moving average and q is the number of moving-average terms.

S52: Introduce the linear shift operator B, with which formula (8) can be rewritten in operator form.

According to the basic theory of linear operators on Hilbert space, a stationary, normal, zero-mean random time series can be approximated to arbitrary accuracy by an ARMA model (p, q).
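The error-correction idea of S5 can be sketched with a deliberately simplified model: a pure autoregressive AR(p) fit by least squares, that is, an ARMA(p, 0) model with the moving-average part omitted. This is a swapped-in simplification, not the patent's ARMA estimation. The sign convention Δy = y − ŷ (so that the correction is added to the static output) and all function names are assumptions.

```python
import numpy as np

def fit_ar(err, p):
    """Least-squares AR(p) fit: err[t] ~ sum_k phi[k] * err[t-1-k]."""
    err = np.asarray(err, dtype=float)
    # Each row holds the p previous errors, most recent first.
    rows = np.array([err[t - p:t][::-1] for t in range(p, len(err))])
    phi, *_ = np.linalg.lstsq(rows, err[p:], rcond=None)
    return phi

def predict_next_error(err, phi):
    """One-step-ahead error forecast from the last p observed errors."""
    p = len(phi)
    recent = np.asarray(err, dtype=float)[-p:][::-1]
    return float(np.dot(phi, recent))

def dynamic_output(y_static_next, err_history, phi):
    """Final output = static multi-model output + forecast error (err = y - y_hat)."""
    return y_static_next + predict_next_error(err_history, phi)
```

A full ARMA(p, q) fit would also estimate the moving-average coefficients, typically by maximum likelihood; the stationarity check mentioned in S5 is likewise not shown.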

An embodiment based on actual data is given below:

Step 1: Process the data collected on site to obtain 1000 sets of standard data. Use 900 sets as the training data set X for building the model, and the remaining 100 sets as the test data set for checking the prediction ability of the model.

Step 2: Cluster the training data set with the affinity propagation clustering method, which gives the optimal number of clusters c = 4 and the corresponding cluster centers v.

Step 3: For the four sub-training sample sets obtained by clustering, build four sub-models with the LS-SVM method, train them, and determine the LS-SVM parameters by cross-validation, as shown in Figure 3.

Step 4: Compute the probability assignment functions of the test sample set for the sub-models according to formula (6) and use them as the weight factors of the sub-models. Then compute the output of the test samples X^test for each sub-model and fuse the sub-model outputs with formula (7) to obtain the output for the test samples.

Step 5: Take the difference between the predicted value of the test sample at the current time t and the manually analyzed value y, and build an ARMA model of the time series of the output error Δy. The prediction performance of the algorithm is best at the optimal order p = 4.

Figures 4 to 7 show the prediction curves of three other measurement methods and of the method of the present invention. The simulation results show that, compared with a single model and with conventional multi-model methods, the multi-model dynamic soft-sensing modeling method of the present invention, based on the combination rule of evidence theory and the autoregressive moving average model, predicts the esterification rate considerably better. The esterification reaction is strongly nonlinear and runs under multiple operating conditions. In single-model modeling, one model has to account for all training samples, which limits its accuracy. Conventional multi-model methods do cluster the training data set and build separate sub-models, but when predicting the output for test samples they do not take full account of the differences between test and training samples, of how the samples are partitioned, or of the effect of process dynamics on the multi-model output, so their prediction performance does not improve significantly. The method of the present invention first uses the affinity propagation clustering method to group samples with the same operating conditions and similar characteristics. When predicting the output for test samples, it takes full account of the influence of each sub-model output on the final result: weight factors constructed with the combination rule of evidence theory fuse the sub-model outputs into the final output for the test samples, which avoids the oscillation caused by switching between models and the prediction bias caused by misassigned samples. In addition, to account for the dynamics of the real industrial process, the autoregressive moving average model corrects the multi-model output dynamically, which improves the dynamic response of the system. The method therefore adapts better and achieves a good fit in soft sensing of the esterification rate.

Figure 8 lists the performance parameters of the different soft-sensing methods. It shows that the multi-model dynamic soft-sensing modeling method provided by the present invention improves on the existing modeling methods in both root mean square error and maximum relative error.
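The two performance measures named for Figure 8 can be computed as sketched below. The exact definitions used in the figure are not given in the text, so the usual ones are assumed: root mean square error over the test samples, and the largest absolute error relative to the manually analyzed value. The function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between analyzed values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def max_relative_error(y_true, y_pred):
    """Largest |prediction - analyzed value| / |analyzed value| (y_true must be nonzero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(y_pred - y_true) / np.abs(y_true)))
```

Both functions would be applied to the 100 test samples of the embodiment, once per soft-sensing method being compared.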

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A multi-model dynamic soft-sensing modeling method, characterized by comprising the following steps:

S1, data preprocessing: select the training sample data set X_{m×n}, where m is the sample dimension and n is the number of samples; remove abnormal data and normalize the data;

S2, adaptive fuzzy kernel clustering analysis: cluster the training sample data set X_{m×n} with the adaptive fuzzy kernel clustering method to obtain the fuzzy membership of each sample and the cluster centers, and automatically determine the optimal number of clusters c;

S3, building the sub-models: train least squares support vector machines on the training sample sets of the c clusters, with the Gaussian kernel function as the kernel function of the least squares support vector machine; establish the c sub-models and determine their parameters, the penalty factor C and the kernel parameter σ, by cross-validation; and obtain the output of each sub-model;

S4, sub-model output fusion based on the combination rule of evidence theory: compute the evidence probability assignment function value of each sub-model, use it as the weight of that sub-model, and then fuse the sub-model outputs by evidence fusion to obtain the static multi-model output;

S5, making the model output dynamic: use an autoregressive moving average model to dynamically adjust the multi-model output at the current time t: first determine whether the multi-model output is a stationary sequence and, if it is not, convert it into a stationary sequence; otherwise directly take the difference between it and the true measured value y to obtain a time series of the output error Δy; then model this time series with an autoregressive moving average model (p, q) to obtain an autoregressive moving average model of the prediction error; finally, combine the static multi-model output of step S4 with the output of the autoregressive moving average model of S5 for model prediction to obtain the final dynamic multi-model output;

In step S2, the adaptive fuzzy kernel clustering method comprises the following steps:

S21: Clustering objective function: For the training sample set X = {x_i | i = 1, 2, ..., n}, the objective function of the adaptive fuzzy kernel clustering method is defined as

\begin{cases} J_{\Phi}(U, c) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \| 1 - K(x_i, v_j) \|^{2} \\ \text{s.t. } U \in M_{fc} \end{cases}

In the formula, m is the fuzzy weighting exponent, μ_ij is the membership value of the i-th sample in the j-th cluster, v_j is the j-th cluster center, and K(x_i, v_j) is the Gaussian kernel function;

S22: Membership update:

\mu_{ij} = \frac{(1 - K(x_i, v_j))^{-1/(m-1)}}{\sum_{k=1}^{c} (1 - K(x_i, v_k))^{-1/(m-1)}};

S23: Cluster center update:

v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j) \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m} K(x_i, v_j)};

S24: Evaluation of the clustering result: After clustering ends, the clustering result is evaluated with the validity index

V_{GX}(c) = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n} \mu_{ij}^{m} (1 - K(x_i, v_j)) + \frac{1}{c} \sum_{j=1}^{c} (1 - K(v_j, \bar{v}))}{\min_{j \neq k} (1 - K(v_j, v_k))};
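
Steps S22 to S24 can be sketched in NumPy as an alternating update of memberships and cluster centers, followed by the validity index. This is a sketch under assumptions, not the claimed procedure: samples are rows, `v_bar` is taken to be the mean of the cluster centers (the text does not define it), distances are floored at a small constant to avoid division by zero, and the names `fuzzy_kernel_cmeans`, `validity_index`, and the `init` argument are hypothetical.

```python
import numpy as np

def kernel_to_centers(X, V, sigma):
    """Gaussian kernel K(x_i, v_j) between every sample and every center."""
    sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma**2))

def memberships(X, V, m, sigma):
    """S22: mu_ij proportional to (1 - K(x_i, v_j))^(-1/(m-1)), rows sum to 1."""
    d = np.maximum(1.0 - kernel_to_centers(X, V, sigma), 1e-12)
    U = d ** (-1.0 / (m - 1.0))
    return U / U.sum(axis=1, keepdims=True)

def fuzzy_kernel_cmeans(X, c, m=2.0, sigma=1.0, n_iter=100, tol=1e-6,
                        init=None, seed=0):
    """Alternate S22 and S23 until the centers stop moving."""
    if init is None:
        rng = np.random.default_rng(seed)
        V = X[rng.choice(X.shape[0], size=c, replace=False)].astype(float)
    else:
        V = np.array(init, dtype=float)
    for _ in range(n_iter):
        U = memberships(X, V, m, sigma)
        W = (U ** m) * kernel_to_centers(X, V, sigma)
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]   # S23: weighted mean
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return memberships(X, V, m, sigma), V

def validity_index(X, U, V, m=2.0, sigma=1.0):
    """S24: (compactness + center spread) / minimum center separation; needs c >= 2."""
    c = V.shape[0]
    compact = ((U ** m) * (1.0 - kernel_to_centers(X, V, sigma))).sum()
    v_bar = V.mean(axis=0, keepdims=True)            # assumption: mean of centers
    spread = (1.0 - kernel_to_centers(V, v_bar, sigma)).sum() / c
    off_diag = ~np.eye(c, dtype=bool)
    separation = (1.0 - kernel_to_centers(V, V, sigma))[off_diag].min()
    return (compact + spread) / separation
```

One plausible way to determine c automatically is to run the clustering for c = 2, 3, ... and keep the value with the smallest index, assuming that smaller values indicate compact, well-separated clusters, as with other compactness-to-separation indices.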

Step S4 comprises:

S41: Evidential probability assignment function of the first sub-model: Take all c sub-models obtained by clustering as the frame of discernment in evidence theory, and regard each sub-model as a focal element C_j, where j = 1, 2, ..., c. For sample x₁, calculate its fuzzy membership degree in the first sub-model, i.e. the first focal element C₁, and, according to evidence theory, treat it as one piece of evidence, denoting the probability assignment function of this evidence as m({C₁}|x₁) = μ₁₁;

For all n test sample data X = {x_i | i = 1, 2, ..., n}, n pieces of evidence are obtained in the same way, and their probability assignment functions are denoted m({C₁}|x_i) = μ_{i1}, i = 1, 2, ..., n;

Then these probability assignment functions are fused using the evidence theory combination rule, and the fused probability assignment function is taken as the probability assignment function of the first sub-model:

\begin{cases} m(\{C_1\} | X) = \dfrac{\sum_{\{C_1\}|x_1 \cap \dots \cap \{C_1\}|x_n = \{C_1\}|X} m_1(\{C_1\} | x_1) \cdots m_n(\{C_1\} | x_n)}{1 - k} \\ m(\emptyset) = 0 \end{cases}

where the conflict factor k reflects the degree of conflict among the pieces of evidence;

S44: Evidential probability assignment functions of all sub-models: By analogy, for all c sub-models, c evidential probability assignment functions m({C₁}|X), ..., m({C_c}|X) are obtained;

S45: Multi-model output: Calculate the output ŷ_j of each sub-model for X, respectively. Using the c probability assignment functions from S44 as the weights of the sub-models, the sub-model outputs are fused by weighting, so that the multi-model output for the training sample data set is expressed as:

\hat{y} = m(\{C_1\} | X) \, \hat{y}_1 + m(\{C_2\} | X) \, \hat{y}_2 + \dots + m(\{C_c\} | X) \, \hat{y}_c;
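
The fusion of steps S41 to S45 can be sketched as follows. The sketch relies on one assumption about how the combination rule is applied: every piece of evidence assigns mass only to the singleton focal elements {C_j}, with m({C_j}|x_i) = μ_ij and each row of memberships summing to 1. Under that assumption, Dempster's rule reduces to a normalized product of memberships, with 1 - k equal to the sum of those products. The function names are hypothetical, and the weights are computed in the log domain only for numerical stability.

```python
import numpy as np

def dempster_weights(U):
    """Fused masses m({C_j}|X) from memberships U[i, j] = mu_ij.

    With singleton focal elements, the combined mass of C_j is
    prod_i mu_ij divided by (1 - k) = sum_j prod_i mu_ij.
    """
    log_prod = np.log(np.clip(U, 1e-300, None)).sum(axis=0)
    agree = np.exp(log_prod - log_prod.max())   # rescaled products
    return agree / agree.sum()

def conflict_factor(U):
    """k = 1 - sum_j prod_i mu_ij (direct form, fine for few samples)."""
    return 1.0 - np.prod(U, axis=0).sum()

def fuse_outputs(weights, sub_outputs):
    """S45: weighted sum of sub-model outputs; sub_outputs has shape (n, c)."""
    return np.asarray(sub_outputs) @ weights

# Two samples, two sub-models: products are 0.42 and 0.12, so 1 - k = 0.54.
U_demo = np.array([[0.6, 0.4],
                   [0.7, 0.3]])
w_demo = dempster_weights(U_demo)               # [7/9, 2/9]
y_demo = fuse_outputs(w_demo, [[1.0, 10.0]])    # 1 * 7/9 + 10 * 2/9 = 3.0
```

Because the fused mass is a product over all samples, the weights tend to concentrate on a single sub-model as the number of samples grows.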

Step S5 comprises:

S51: Use the autoregressive moving-average model to dynamically correct the static multi-model output ŷ. The autoregressive moving-average model describes the system response at the current time t, i.e. ŷ_t, as depending not only on its own earlier observations but also on the current and lagged values of the system disturbance. The autoregressive moving-average model (p, q) can be expressed as

\hat{y}_t = \varphi_1 \hat{y}_{t-1} + \dots + \varphi_p \hat{y}_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}

where p is the autoregressive order and q is the number of moving-average terms;

S52: Introducing the linear backward-shift operator B, for which B^k ŷ_t = ŷ_{t-k}, the formula in S51 can be transformed into

\Phi(B) \hat{y}_t = \theta(B) \varepsilon_t

In the formula, ε_t is a white-noise sequence satisfying N(0, σ²), and Φ(B) and θ(B) are polynomials in the shift operator B of order p and q, respectively:

\begin{cases} \Phi(B) = 1 - \varphi_1 B - \dots - \varphi_p B^{p} \\ \theta(B) = 1 - \theta_1 B - \dots - \theta_q B^{q} \end{cases}

According to the basic theory of linear operators on Hilbert space, a random time series that is stationary, normally distributed, and zero-mean can be approximated to arbitrary accuracy by an autoregressive moving-average model (p, q).
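
The dynamic correction of step S5 can be illustrated with a one-step-ahead forecast of the error series. The sketch assumes that the coefficients have already been estimated, that the error is defined as Δy = y − ŷ so that the correction is added to the static output, and that the sign convention is Φ(B) = 1 − φ₁B − ... and θ(B) = 1 − θ₁B − ...; the function names and argument layout are hypothetical.

```python
import numpy as np

def arma_one_step(series, residuals, phi, theta):
    """One-step-ahead ARMA(p, q) forecast.

    With Phi(B) x_t = theta(B) eps_t and zero-mean eps_t, the forecast is
    sum_k phi_k * x_{t-k} - sum_k theta_k * eps_{t-k}.
    `series` and `residuals` are ordered oldest to newest.
    """
    ar = sum(phi[k] * series[-1 - k] for k in range(len(phi)))
    ma = sum(theta[k] * residuals[-1 - k] for k in range(len(theta)))
    return ar - ma

def dynamic_output(y_static_now, err_hist, res_hist, phi, theta):
    """Static multi-model output plus the forecast prediction error
    (assumes the error was defined as measured value minus static output)."""
    return y_static_now + arma_one_step(err_hist, res_hist, phi, theta)

# ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.2:
# forecast error = 0.5 * 0.4 - 0.2 * 0.1 = 0.18
err_demo = arma_one_step([0.1, 0.4], [0.0, 0.1], phi=[0.5], theta=[0.2])
y_dyn_demo = dynamic_output(2.0, [0.1, 0.4], [0.0, 0.1], [0.5], [0.2])
```

If the error series is not stationary, it would first be differenced and the forecast integrated back, which this sketch does not show.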