CN110309491A

CN110309491A - A Transient Phase Division Method and System Based on Local Gaussian Mixture Model

Info

Publication number: CN110309491A
Application number: CN201910571289.5A
Authority: CN
Inventors: 刘井响; 王丹; 彭周华; 刘陆
Original assignee: Dalian Maritime University
Current assignee: Dalian Maritime University
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2019-10-08
Anticipated expiration: 2039-06-26
Also published as: CN110309491B

Abstract

The invention discloses a phase division method and system based on a local Gaussian mixture model, comprising: S1, collecting samples and creating a historical training data set; S2, selecting some samples from the historical training data set to create a first Gaussian distribution model to Determine the first steady-state phase; S3. Based on the previously determined Gaussian model, create a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase; S4. Based on the determined two adjacent steady-state phases The state phase model determines the possible transient phase between the two steady state phases; S5, repeating S3 and S4 to complete the phase division of all sample data. The invention greatly reduces redundant calculation, improves calculation efficiency, adopts a step-by-step update strategy, and determines the steady-state phase and the transient phase step by step according to the sampling time sequence, and the number of phase divisions does not need to be pre-specified and divided. As a result, there is no need for subsequent processing and other advantages.

Description

A Transient Phase Division Method and System Based on Local Gaussian Mixture Model

Technical Field

The invention relates to the technical field of statistical modeling of batch processes, and in particular to a method and system for dividing transient phases in a multiphase batch process.

Background Art

Batch processes are a very common mode of production in modern industry. They are widely used in the fine chemical, pharmaceutical, metallurgical and semiconductor industries. As technology develops and requirements diversify, batch processes have become increasingly complex. The most direct manifestation is that a single batch contains several distinct operating stages, or several distinct reaction/change stages. Such a process is called a multiphase batch process, and each such stage is called a phase.

Penicillin fermentation is an example. Suppose one run lasts 400 h: the first 45 h are the pre-culture stage, and the remaining 355 h are the fed-batch stage, i.e., feed is added to the reactor from the 45th hour on. In terms of reaction mechanism, a typical penicillin fermentation can be divided into four phases (stages): the lag stage, the exponential growth stage, the stationary stage and the autolysis stage.

Penicillin fermentation is a typical slowly time-varying process. The transition between phases is gradual rather than abrupt, so phase boundaries are not very distinct. As a result, samples lying between two steady-state phases partly retain the characteristics of the first steady-state phase while also exhibiting those of the next one. A phase with this property is called a transient phase.
Accurately and reasonably dividing a multiphase batch process into its phases helps deepen understanding of the process mechanism and improves the accuracy of process modeling.

Some results on multiphase division already exist. One is multiphase principal component analysis, which resembles an exhaustive search and divides a batch using a repetition-factor index. It is no longer applicable to processes without an obvious change inflection point. Clustering methods are also widely used for phase division of batch processes. However, K-means-based methods require the number of clusters to be specified in advance and ignore the temporal relationship between samples. Their division results are therefore fragmented, require further post-processing, and are hard to interpret. In phase division, the temporal order of samples is an important factor that cannot be ignored. A method must not only divide the steady-state phases effectively but also identify the transient phases accurately. Neither of the above approaches addresses temporal order or transient-phase division.

Summary of the Invention

Existing phase division methods do not consider temporal order or transient-phase division. To address this, a phase division method based on a local Gaussian mixture model is proposed.

A phase division method based on a local Gaussian mixture model comprises the following steps:

S1. Collect samples and create a historical training data set;

S2. In sampling-time order, select some samples from the historical training data set to create the first Gaussian distribution model and thereby determine the first steady-state phase;

S3. Based on the previously determined Gaussian model, create a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase;

S4. Based on the two determined adjacent steady-state phase models, determine any transient phase that may exist between the two steady-state phases;

S5. Repeat S3 and S4 until all sample data have been divided into phases.

Optionally, in one embodiment, creating the first Gaussian distribution model from some samples of the historical training data set, to determine the first steady-state phase, comprises:

S21. Select the first $N_1$ samples from the historical training data set and compute their mean and variance to obtain the corresponding Gaussian distribution model $p(x \mid 1)$. Here $p(x \mid 1)$ is the probability density function of the first Gaussian model and $x$ denotes a collected sample;

S22. Extract sample points for steady-state phase verification. Starting from the $(N_1/2)$-th sample point, search for three consecutive sample points that satisfy the first verification condition, and mark the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold;

S23. Determine whether $N_1$ equals […]. If so, the result has converged, i.e., the first steady-state phase is determined, and the method proceeds to the next step. Otherwise, let […] and return to step S21, iterating until $N_1$ equals […].
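Steps S21 to S23 form a fixed-point iteration on the phase length $N_1$. The sketch below illustrates this in Python/NumPy; it is not the patented implementation.

The verification formula and the marked index are not reproduced above, so the sketch makes several hypothetical assumptions:

- a sample fails verification when $p(x_n \mid 1) < \rho$;
- the marked index is the first of the three consecutive failing samples;
- scanning may continue past $N_1$ to the end of the data.

The helper names and the small ridge added to the covariance are also illustrative, and $N_1/2$ is rounded down, which the method permits.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated on each row of X (shape n x d)."""
    d = mean.shape[0]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)  # squared Mahalanobis distances
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))


def first_run_below(p, start, rho, run=3):
    """Index of the first of `run` consecutive entries p[n] < rho, scanning from `start`.

    Returns len(p) if no such run exists (the phase reaches the end of the data).
    """
    count = 0
    for n in range(start, len(p)):
        count = count + 1 if p[n] < rho else 0
        if count == run:
            return n - run + 1
    return len(p)


def first_steady_phase(X, n1, rho=1e-3, ridge=1e-6, max_iter=100):
    """S21-S23: refit p(x|1) on X[:n1] and move n1 to the detected boundary until it stops changing."""
    d = X.shape[1]
    for _ in range(max_iter):
        mean = X[:n1].mean(axis=0)  # S21: statistics of the first N1 samples
        cov = np.cov(X[:n1], rowvar=False).reshape(d, d) + ridge * np.eye(d)
        # S22 (assumed rule): first run of three samples with p(x_n|1) < rho, from N1 // 2
        boundary = first_run_below(gaussian_pdf(X, mean, cov), n1 // 2, rho)
        if boundary == n1:  # S23: converged
            return n1, mean, cov
        n1 = boundary
    raise RuntimeError("N1 did not converge")


# Toy data: 2-D points on a small circle around (0, 0), then a jump to (20, 20) at n = 50.
n = np.arange(100)
X = 0.5 * np.column_stack([np.sin(n), np.cos(n)])
X[50:] += 20.0
n1, mean, cov = first_steady_phase(X, n1=10)
print(n1)  # 50
```

On this toy data the first fit uses only 10 samples, and the scan flags the jump at index 50. The second pass, refitted on 50 samples, confirms $N_1 = 50$.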

Optionally, in one embodiment, creating a two-component Gaussian mixture model from the previously determined Gaussian model, to determine the next steady-state phase, comprises:

S31. Based on the previously determined Gaussian model, create and train a mixture model containing two Gaussian distribution functions. Without loss of generality, assume that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2. The mixture model is

$$p(x \mid \theta_c) = \alpha_{c-1}\, p(x \mid c-1) + \alpha_c\, p(x \mid c)$$

The symbols are defined as follows:

- The probability density function $p(x \mid c-1)$ of the $(c-1)$-th Gaussian model is already determined. The $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is denoted $X_{c-1}$.
- The probability density function $p(x \mid c)$ of the $c$-th Gaussian model is to be determined. The $c$-th steady-state phase, assumed to contain $N_c$ samples, is denoted $X_c$.
- $X_m = \{X_{c-1}, X_c\}$ is the training data of this Gaussian mixture model, and $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$ are its training parameters.
- $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components.
- $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian density $p(x \mid c)$.

The mixture model is trained with the expectation-maximization (EM) algorithm;

S32. Extract sample points to verify the steady-state phase of the mixture model. Starting from the $(N_{c-1}/2)$-th sample point, search for three consecutive sample points that satisfy the second verification condition, and mark the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold;

S33. Determine whether […] equals $N_c + N_{c-1}$. If so, the result has converged, i.e., the $c$-th steady-state phase is determined. Otherwise, let […] and return to step S31, iterating until […] equals $N_c + N_{c-1}$.
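Steps S31 to S33 can be sketched in the same way. EM updates only $\alpha_{c-1}$, $\alpha_c$, $\mu_c$ and $\Sigma_c$ while $p(x \mid c-1)$ stays fixed. The window $N_c$ is then refined until the marked index equals $N_{c-1} + N_c$. This is a minimal Python/NumPy sketch under stated assumptions, not the patent's code:

- The second verification condition, which is not reproduced above, is assumed to be a mixture density $p(x_n \mid \theta_c) < \rho$ on three consecutive samples.
- The new component is initialised from the samples that follow $X_{c-1}$.
- All function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log density of N(mean, cov) for each row of X (shape n x d)."""
    d = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def em_fixed_previous(Xm, n_prev, mean_prev, cov_prev, n_iter=200, tol=1e-8, ridge=1e-6):
    """EM for alpha_{c-1} p(x|c-1) + alpha_c p(x|c) with p(x|c-1) held fixed."""
    d = Xm.shape[1]
    tail = Xm[n_prev:]  # assumption: initialise p(x|c) from the samples after X_{c-1}
    mu = tail.mean(axis=0)
    cov = np.cov(tail, rowvar=False).reshape(d, d) + ridge * np.eye(d)
    alpha = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(alpha_k * p(x_n|k)) and responsibilities gamma[n, k]
        log_w = np.column_stack([
            np.log(alpha[0]) + gaussian_logpdf(Xm, mean_prev, cov_prev),
            np.log(alpha[1]) + gaussian_logpdf(Xm, mu, cov),
        ])
        log_norm = np.logaddexp(log_w[:, 0], log_w[:, 1])
        gamma = np.exp(log_w - log_norm[:, None])
        # M-step: only alpha_{c-1}, alpha_c, mu_c and Sigma_c are updated
        nk = gamma.sum(axis=0)
        alpha = nk / len(Xm)
        mu = gamma[:, 1] @ Xm / nk[1]
        diff = Xm - mu
        cov = (gamma[:, 1:] * diff).T @ diff / nk[1] + ridge * np.eye(d)
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return alpha, mu, cov


def first_run_below(p, start, rho, run=3):
    """Index of the first of `run` consecutive p[n] < rho from `start`, else len(p)."""
    count = 0
    for n in range(start, len(p)):
        count = count + 1 if p[n] < rho else 0
        if count == run:
            return n - run + 1
    return len(p)


def next_steady_phase(X, n_prev, mean_prev, cov_prev, n_c, rho=1e-3, max_iter=100):
    """S31-S33 with X starting at the first sample of X_{c-1}; returns N_c and theta_c."""
    for _ in range(max_iter):
        alpha, mu, cov = em_fixed_previous(X[: n_prev + n_c], n_prev, mean_prev, cov_prev)  # S31
        p_mix = (alpha[0] * np.exp(gaussian_logpdf(X, mean_prev, cov_prev))
                 + alpha[1] * np.exp(gaussian_logpdf(X, mu, cov)))
        n_hat = first_run_below(p_mix, n_prev // 2, rho)  # S32 (assumed condition)
        if n_hat == n_prev + n_c:  # S33: converged
            return n_c, alpha, mu, cov
        if n_hat - n_prev < 2:
            raise ValueError("boundary fell inside X_{c-1}; check rho or N_{c-1}")
        n_c = n_hat - n_prev
    raise RuntimeError("N_c did not converge")


# Toy data: three 2-D clusters of 40, 60 and 50 samples; phase c-1 is the first one.
n = np.arange(150)
centres = np.repeat([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]], [40, 60, 50], axis=0)
X = centres + 0.5 * np.column_stack([np.sin(n), np.cos(n)])
mean_prev, cov_prev = X[:40].mean(axis=0), np.cov(X[:40], rowvar=False)
n_c, alpha, mu, cov = next_steady_phase(X, 40, mean_prev, cov_prev, n_c=20)
print(n_c, np.round(alpha, 2))  # 60 [0.4 0.6]
```

The example starts from a deliberately short guess, $N_c = 20$. The first pass flags the jump to the third cluster at index 100, so $N_c$ becomes 60, and the second pass confirms it.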

Optionally, in one embodiment, determining the transient phase that may exist between two steady-state phases, based on the two determined adjacent steady-state phase models, comprises:

For the two determined steady-state phases $X_{c-1}$ and $X_c$, test the samples starting from the first sample point of the $c$-th steady-state phase, and find the consecutive sample points satisfying $p(x_n \mid c) < \rho$. These are recorded as the transient phase $X_{c-1,c}$.
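The transient-phase rule reduces to counting the leading samples of $X_c$ that the $c$-th Gaussian fails to describe. In the minimal Python/NumPy sketch below, the mean and covariance of $p(x \mid c)$ are passed in directly; in the method they come from the EM step. The function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Density of N(mean, cov) evaluated on each row of X (shape n x d)."""
    d = mean.shape[0]
    diff = X - mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))


def transient_phase(Xc, mean_c, cov_c, rho=1e-3):
    """Split X_c into its leading run with p(x_n|c) < rho (taken as X_{c-1,c}) and the rest."""
    p = gaussian_pdf(Xc, mean_c, cov_c)
    k = 0
    while k < len(p) and p[k] < rho:  # stop at the first sample the c-th model explains
        k += 1
    return Xc[:k], Xc[k:]


# Toy phase c: four samples drifting in from (0, 0), then 40 samples around (5, 5).
n = np.arange(40)
steady = 5.0 + 0.5 * np.column_stack([np.sin(n), np.cos(n)])
ramp = np.array([[0.5, 0.5], [1.5, 1.5], [2.5, 2.5], [3.0, 3.0]])
Xc = np.vstack([ramp, steady])
trans, rest = transient_phase(Xc, steady.mean(axis=0), np.cov(steady, rowvar=False))
print(len(trans), len(rest))  # 4 40
```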

In addition, to overcome the shortcomings of conventional techniques, a phase division system based on a local Gaussian mixture model is also proposed.

A phase division system based on a local Gaussian mixture model comprises:

a collection unit, configured to collect samples and create a historical training data set;

a first Gaussian distribution creation unit, configured to select, in sampling-time order, some samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase data;

a Gaussian mixture model creation unit, configured to create, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase, and to cooperate with the transient phase acquisition unit to complete the phase division of all sample data;

a transient phase acquisition unit, configured to determine the corresponding transient phase data based on two adjacent steady-state phase data sets.

Optionally, in one embodiment, the first Gaussian distribution creation unit comprises:

a first data acquisition module, configured to select the first $N_1$ samples from the historical training data set and compute their mean and variance to obtain the corresponding Gaussian distribution function $p(x \mid 1)$. Here $p(x \mid 1)$ is the probability density function of the first Gaussian distribution model and $x$ denotes a collected sample;

a first steady-state phase verification module, configured to extract sample points for steady-state phase verification. Starting from the $(N_1/2)$-th sample point, it searches for three consecutive sample points that satisfy the first verification condition and marks the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold;

a first steady-state phase determination module, configured to determine whether $N_1$ equals […]. If so, the result has converged, i.e., the first steady-state phase is determined, and processing proceeds to the next step. Otherwise, it lets […], and the first steady-state phase verification module iterates again until $N_1$ equals […].

Optionally, in one embodiment, the Gaussian mixture model creation unit comprises:

a second data acquisition module, configured to create and train, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components. Assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x \mid \theta_c) = \alpha_{c-1}\, p(x \mid c-1) + \alpha_c\, p(x \mid c)$$

The mixture model is trained with the expectation-maximization (EM) algorithm, where $X_m = \{X_{c-1}, X_c\}$ and the corresponding training parameters are $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$;

a second steady-state phase verification module, configured to extract sample points to verify the steady-state phase of the mixture model. Starting from the $(N_{c-1}/2)$-th sample point, it searches for three consecutive sample points that satisfy the second verification condition and marks the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold;

a second steady-state phase determination module, configured to determine whether […] equals $N_c + N_{c-1}$. If so, the result has converged, i.e., the $c$-th steady-state phase is determined. Otherwise, it lets […], and the second steady-state phase verification module iterates again until […] equals $N_c + N_{c-1}$.

Optionally, in one embodiment, the transient phase acquisition unit works on the two determined steady-state phases $X_{c-1}$ and $X_c$. It tests the samples starting from the first sample point of the $c$-th steady-state phase and finds the consecutive sample points satisfying $p(x_n \mid c) < \rho$. These are recorded as the transient phase $X_{c-1,c}$.

In addition, to overcome the shortcomings of conventional techniques, a computer-readable storage medium is also proposed. It comprises computer instructions that, when run on a computer, cause the computer to perform the method described above.

Implementing the embodiments of the present invention overcomes the failure of existing phase division methods to consider temporal order and transient phases. It also yields the following benefits:

(1) Each steady-state phase is described by an independent Gaussian distribution, and each transient phase by a mixture of the two adjacent Gaussian distributions. The method can therefore divide the steady-state phases effectively and determine the transient phases at the same time.

(2) Each iteration uses only local data for modeling and verification. This greatly reduces redundant computation and improves computational efficiency.

(3) A step-by-step update strategy determines the steady-state and transient phases one after another in sampling-time order. The number of phases therefore need not be specified in advance, and the division result needs no post-processing.

Description of Drawings

The drawings used to describe the embodiments and the prior art are briefly introduced below. They show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

In the drawings:

Fig. 1a is a schematic diagram of the initial model update in one embodiment;

Fig. 1b is a schematic diagram of the mixture model update in one embodiment;

Fig. 2 is a schematic diagram of local Gaussian mixture model phase division in one embodiment;

Fig. 3 is a schematic diagram of the penicillin fermentation process in one embodiment;

Fig. 4 is a flowchart of the core steps in one embodiment.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the drawings and embodiments, to make its objectives, technical solutions and advantages clearer. The specific embodiments described here serve only to explain the invention and do not limit it.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by a person skilled in the art to which the invention belongs. The terms used in this specification describe specific embodiments only and are not intended to limit the invention. The terms "first", "second", etc. may be used herein to describe various elements, but they only distinguish one element from another and do not limit the elements. For example, without departing from the scope of this application, a first element could be called a second element, and similarly a second element could be called a first element. The first element and the second element are both elements, but they are not the same element.

This embodiment addresses the failure of existing phase division methods to consider temporal order and transient phases. It proposes a phase division method based on a local Gaussian mixture model, which divides a multiphase batch process into phases using an improved local Gaussian mixture model and step-by-step probabilistic modeling. At each step the method builds a local Gaussian mixture model: either an initial model with a single Gaussian component, or a mixture model with two Gaussian components. Through iteration, the steady-state and transient phases of the process are determined together. Specifically, as shown in Fig. 4, the method comprises the following steps:

S1. Collect samples and create a historical training data set. In some specific embodiments, batch process data […] are collected and unfolded into […] to obtain the historical training data set;

S2. In sampling-time order, select some samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase data. This step establishes an initial model containing only one Gaussian component; the initial model update is illustrated in Fig. 1a. In some specific embodiments, this step comprises:

S22. Extract sample points for steady-state phase verification. Starting from the $(N_1/2)$-th sample point, search for three consecutive sample points that satisfy the first verification condition, and mark the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold, e.g. $\rho = 0.001$. If $N_1/2$ is not an integer, it may be rounded either down or up; for example, if $N_1/2 = 16.5$, either 16 or 17 may be used;

S3. Based on the previously determined Gaussian model, create a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase. This step establishes a mixture model containing two Gaussian components. Without loss of generality, the first $c-1$ steady-state phases have already been determined, so only the $c$-th steady-state phase remains to be found. The mixture model update is illustrated in Fig. 1b. In some specific embodiments, creating the Gaussian mixture model to determine the next steady-state phase data comprises:

S31. Create and train a mixture model containing two Gaussian distribution functions; the mixture model is

$$p(x \mid \theta_c) = \alpha_{c-1}\, p(x \mid c-1) + \alpha_c\, p(x \mid c)$$

where the first $c-1$ steady-state phases are assumed to be determined, and $c$ is an integer greater than or equal to 2. The $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is denoted $X_{c-1}$; the $c$-th steady-state phase, containing $N_c$ samples, is denoted $X_c$;

The mixture model is trained with the expectation-maximization (EM) algorithm, where $X_m = \{X_{c-1}, X_c\}$ and the corresponding training parameters are $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$. The first $c-1$ steady-state phases are already determined; for example, the mean and variance of the first Gaussian component were determined in S2. This step therefore only needs to estimate $\theta_c$;

S32. Extract sample points to verify the steady-state phase of the mixture model. Starting from the $(N_{c-1}/2)$-th sample point, search for three consecutive sample points that satisfy the second verification condition, and mark the index of the sample point satisfying this condition as […]. The verification condition is […]

where $\rho$ is a pre-specified threshold, e.g. $\rho = 0.001$;

S4. Based on the two determined adjacent steady-state phase models, determine any transient phase that may exist between the two steady-state phases. The phase division of the present invention is illustrated in Fig. 2. In some specific embodiments, this step works on the two determined steady-state phases $X_{c-1}$ and $X_c$. It tests the samples starting from the first sample point of the $c$-th steady-state phase and finds the consecutive sample points satisfying $p(x_n \mid c) < \rho$. These are recorded as the transient phase $X_{c-1,c}$.
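The unfolding formula in step S1 is not reproduced in the source. One common convention in batch-process modeling, assumed here purely for illustration, is variable-wise unfolding of an $I \times J \times K$ array (batches × variables × sampling times) into a $(KI) \times J$ matrix. The rows stay in sampling-time order, so later steps can take "the first $N_1$ samples" from the top. A minimal NumPy sketch follows; the function name is hypothetical.

```python
import numpy as np


def unfold_time_major(batches: np.ndarray) -> np.ndarray:
    """Unfold an (I, J, K) array of batches x variables x times into (K*I, J).

    Row k * I + i holds the J variables of batch i at sampling time k, so the
    rows of the result are ordered by sampling time.
    """
    n_batches, n_vars, n_times = batches.shape
    return batches.transpose(2, 0, 1).reshape(n_times * n_batches, n_vars)


# Shapes from the penicillin example: 20 batches, 11 variables, 400 samples.
batches = np.zeros((20, 11, 400))  # placeholder values, not real process data
X = unfold_time_major(batches)
print(X.shape)  # (8000, 11)
```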

In addition, to overcome the shortcomings of conventional techniques, a phase division system based on a local Gaussian mixture model is also proposed, comprising:

a collection unit, configured to collect samples and create a historical training data set. In some specific embodiments, batch process data […] are collected and unfolded into […] to obtain the historical training data set;

a first Gaussian distribution creation unit, configured to select, in sampling-time order, some samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase data. In some specific embodiments, the first Gaussian distribution creation unit comprises:

a first data acquisition module, configured to select the first $N_1$ samples from the historical training data set and compute their mean and variance to obtain the corresponding Gaussian distribution model $p(x \mid 1)$. Here $p(x \mid 1)$ is the probability density function of the first Gaussian distribution model and $x$ denotes a collected sample;

a Gaussian mixture model creation unit, configured to create, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase. It cooperates with the transient phase acquisition unit to complete the phase division of all sample data: the Gaussian mixture model creation unit determines the steady-state phases, and the transient phase acquisition unit determines the transient phases. In some specific embodiments, the Gaussian mixture model creation unit comprises:

a second data acquisition module, configured to create and train, based on the previously determined Gaussian model, a mixture model containing two Gaussian distribution functions. Without loss of generality, assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x \mid \theta_c) = \alpha_{c-1}\, p(x \mid c-1) + \alpha_c\, p(x \mid c)$$

a transient phase acquisition unit, configured to determine the corresponding transient phase data from two adjacent steady-state phase data sets, thereby completing the phase division of all sample data. In some specific embodiments, the unit works on the two determined steady-state phases $X_{c-1}$ and $X_c$. It tests the samples starting from the first sample point of the $c$-th steady-state phase and finds the consecutive sample points satisfying $p(x_n \mid c) < \rho$. These are recorded as the transient phase $X_{c-1,c}$.

基于相同的发明构思，本发明还提出了一种计算机可读存储介质，包括计算机指令，当所述计算机指令在计算机上运行时，使得计算机执行所述的方法。Based on the same inventive concept, the present invention also provides a computer-readable storage medium, comprising computer instructions, which, when the computer instructions are executed on a computer, cause the computer to execute the method.

基于上述技术方案，下面以具体实验例-青霉素发酵过程为例进行验证本发明的有效性，青霉素发酵过程的示意图如图3所示。Based on the above-mentioned technical solution, the following takes a specific experimental example—the penicillin fermentation process as an example to verify the effectiveness of the present invention. The schematic diagram of the penicillin fermentation process is shown in FIG. 3 .

具体的：specific:

在采集样本并创建历史训练数据集的阶段：这里一共产生20批正常数据用作相位划分，并且每批数据都加入了大小为N(0,0.04)的白噪音；设定每个批次的反应时间是400h，每1h采样一次，因此每个批次包含400个样本点，每个样本点包含11个变量，参见表1中。In the stage of collecting samples and creating historical training data sets: a total of 20 batches of normal data are generated for phase division, and white noise of size N(0, 0.04) is added to each batch of data; The reaction time is 400h, and sampling is performed every 1h, so each batch contains 400 sample points, each sample point contains 11 variables, see Table 1.

Table 1
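The data layout above (20 batches × 400 hourly samples × 11 variables with additive white noise) can be sketched in Python/NumPy. This sketch is illustrative only. It assumes that 0.04 in N(0, 0.04) is the variance, giving a standard deviation of 0.2. The simulated noise-free trajectories are not given here, so an all-zero array stands in for them as a hypothetical placeholder. `add_white_noise` is a hypothetical helper.

```python
import numpy as np

N_BATCHES, N_SAMPLES, N_VARS = 20, 400, 11  # 20 batches, 400 h sampled hourly, 11 variables
NOISE_VAR = 0.04                            # assumption: 0.04 is the variance of N(0, 0.04)

def add_white_noise(clean, var=NOISE_VAR, seed=0):
    """Add i.i.d. zero-mean Gaussian white noise with the given variance."""
    rng = np.random.default_rng(seed)
    return clean + rng.normal(0.0, np.sqrt(var), size=clean.shape)

# Hypothetical placeholder: real noise-free trajectories would come from a
# penicillin fermentation simulator, which is not described here.
clean = np.zeros((N_BATCHES, N_SAMPLES, N_VARS))
noisy = add_white_noise(clean)
print(noisy.shape)  # (20, 400, 11)
```

This section does not state whether the batches are concatenated or processed one at a time before phase division, so the sketch stops at the noisy array.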

Determining the first and subsequent steady-state phases in sampling-time order and dividing all sample data into phases: the number of sample points of the first phase, used as the initial modeling samples, is set to three times the number of variables, i.e., $N_1=33$. The division result for $\rho=0.001$ is shown in Table 2. It shows that the whole process is roughly divided into 10 steady-state phases and three transient phases, with three very small phases between the first and fifth steady-state phases. In the actual fermentation, the initial pre-culture stage is relatively stable and corresponds to the first steady-state phase. The process then enters a vigorous reaction stage, corresponding to the three small steady-state phases that follow. Next, the process enters the fed-batch feeding stage. After a period of transition it enters the stable fermentation stage, and finally the autolysis stage. The phases obtained by this method therefore correspond well to the actual process stages.

Table 2

Implementing the embodiments of the present invention provides the following beneficial effects:

Besides overcoming the shortcomings of existing phase division methods, which ignore the time ordering of samples and do not identify transient phases, the present invention has the following beneficial effects: (1) from the perspective of Gaussian distributions, it describes each steady-state phase with an independent Gaussian distribution and each transient phase with a mixture of two adjacent Gaussian distributions, so the method can both divide the steady-state phases effectively and determine the transient phases at the same time; (2) each iteration uses only local data for modeling and verification, which greatly reduces redundant computation and improves computational efficiency; (3) it adopts a step-by-step update strategy that determines the steady-state and transient phases successively in sampling-time order, so the number of phases need not be specified in advance and the division result needs no post-processing.

The embodiments described above represent only several implementations of the present application. Although they are described in specific detail, they shall not be construed as limiting the scope of this patent. Those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present application, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent shall be defined by the appended claims.

Claims

1. A phase division method based on a local Gaussian mixture model, comprising the steps of:

S1. Collect samples and create a historical training data set;

S2. In sampling-time order, select part of the samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase;

S3. Based on the previously determined Gaussian model, create a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase;

S4. Based on the models of the two adjacent steady-state phases already determined, determine any transient phase that may exist between them;

S5. Repeat S3 and S4 to complete the phase division of all sample data.

2. The phase division method according to claim 1, wherein selecting part of the samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase data comprises:

S21. Take the first $N_1$ samples of the historical training data set in order and calculate their mean and covariance to obtain the corresponding Gaussian distribution model $p(x\mid 1)$, where $p(x\mid 1)$ is the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;

S22. Extract sample points for steady-state phase verification, i.e., starting the verification from the $(N_1/2)$-th sample point, find three consecutive sample points that satisfy the first verification condition and record the serial number of the corresponding sample point as […]. The verification condition is […],

where ρ is a pre-specified threshold;

S23. Determine whether $N_1$ equals […]. If so, the result has converged, i.e., the first steady-state phase has been determined, and the next step is performed; otherwise, let […] and return to step S21, iterating until $N_1$ equals […].
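Steps S21 to S23 can be sketched as a fixed-point iteration on $N_1$ in Python/NumPy. The verification condition and the recorded serial number are not reproduced in this text, so the sketch makes three assumptions. (a) The first verification condition is $p(x_n\mid 1)<\rho$, by analogy with the transient condition of claim 4. (b) The recorded serial number is the first of the three consecutive failing samples. (c) The update in S23 sets $N_1$ to that serial number. The sketch also uses 0-based indices, adds a small ridge term to the covariance, and caps the number of iterations. All function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(X, mu, cov):
    """Row-wise log N(x | mu, cov) for an (n, d) array X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def run_start(fail, start, run=3):
    """0-based index of the first sample of the first `run` consecutive
    failures at or after `start`; len(fail) if there is no such run."""
    count = 0
    for n in range(start, len(fail)):
        count = count + 1 if fail[n] else 0
        if count == run:
            return n - run + 1
    return len(fail)

def first_steady_phase(X, n1, rho, run=3, max_iter=100, ridge=1e-6):
    """Iterate S21-S23 until the phase length N_1 stops changing."""
    log_rho = np.log(rho)
    for _ in range(max_iter):
        # S21: Gaussian model from the first n1 samples
        mu = X[:n1].mean(axis=0)
        cov = np.cov(X[:n1], rowvar=False) + ridge * np.eye(X.shape[1])
        # S22: assumed verification condition p(x_n | 1) < rho, scanned from n1 // 2
        fail = gaussian_logpdf(X, mu, cov) < log_rho
        n_tilde = run_start(fail, n1 // 2, run)
        # S23: converged when the recorded index equals N_1
        if n_tilde == n1:
            return n1
        n1 = max(n_tilde, 2)  # np.cov needs at least two samples
    return n1

# Usage on synthetic 2-D data: 100 samples around 0, then 100 around 10.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2))])
n1 = first_steady_phase(X_demo, 20, 1e-4)
```

Starting from a deliberately short guess of 20 samples, the estimate grows and is expected to settle at 100, where the synthetic operating point changes.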

3. The phase division method according to claim 2, wherein creating, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase comprises:

S31. Based on the previously determined Gaussian model, create and train a mixture model containing two Gaussian distribution functions. Assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x\mid\theta_c)=\alpha_{c-1}\,p(x\mid c-1)+\alpha_c\,p(x\mid c)$$

Here the probability density function $p(x\mid c-1)$ of the $(c-1)$-th Gaussian model has already been determined, and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$. The probability density function $p(x\mid c)$ of the $c$-th Gaussian model is to be determined. Let the $c$-th steady-state phase, containing $N_c$ samples, be $X_c$, and denote the training data of this Gaussian mixture model by $X_m=\{X_{c-1},X_c\}$. The corresponding training parameters are $\theta_c=\{\alpha_{c-1},\alpha_c,\mu_c,\Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components, and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x\mid c)$. The mixture model is trained with the expectation-maximization (EM) algorithm;

S32. Extract sample points to perform steady-state phase verification on the mixture model, i.e., starting the verification from the $(N_{c-1}/2)$-th sample point, find three consecutive sample points that satisfy the second verification condition and record the serial number of the corresponding sample point as […]. The verification condition is […],

where ρ is a pre-specified threshold;

S33. Determine whether […] equals $N_c+N_{c-1}$. If so, the result has converged, i.e., the $c$-th steady-state phase has been determined; otherwise, let […] and return to step S31, iterating until […] equals $N_c+N_{c-1}$.
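Steps S31 to S33 can be sketched the same way. This minimal Python/NumPy sketch is not the patented implementation. Following the parameter set $\theta_c$ in S31, the EM routine updates only $\alpha_{c-1}$, $\alpha_c$, $\mu_c$ and $\Sigma_c$ while keeping $p(x\mid c-1)$ fixed. The sketch also makes three assumptions, since the text here does not specify these details. (a) The second verification condition is $p(x_n\mid\theta_c)<\rho$ on the fitted mixture. (b) The recorded serial number is the first of the three failing samples. (c) $X_c$ starts immediately after $X_{c-1}$, before any transient samples are split off. All function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(X, mu, cov):
    """Row-wise log N(x | mu, cov) for an (n, d) array X."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def em_fixed_prev(X_m, n_prev, mu_prev, cov_prev, n_iter=200, tol=1e-8, ridge=1e-6):
    """S31: EM for alpha_{c-1} p(x|c-1) + alpha_c p(x|c) with p(x|c-1) held fixed.
    Returns (alpha_prev, alpha_c, mu_c, cov_c)."""
    d = X_m.shape[1]
    alpha_c = 1.0 - n_prev / len(X_m)                    # initialise from sample counts
    mu_c = X_m[n_prev:].mean(axis=0)
    cov_c = np.cov(X_m[n_prev:], rowvar=False) + ridge * np.eye(d)
    log_prev = gaussian_logpdf(X_m, mu_prev, cov_prev)   # fixed component, computed once
    prev_ll = -np.inf
    for _ in range(n_iter):
        a = np.log(1.0 - alpha_c) + log_prev             # E-step (log space)
        b = np.log(alpha_c) + gaussian_logpdf(X_m, mu_c, cov_c)
        log_mix = np.logaddexp(a, b)
        gamma = np.exp(b - log_mix)                      # responsibility of component c
        nk = gamma.sum()                                 # M-step
        alpha_c = float(np.clip(nk / len(X_m), 1e-12, 1 - 1e-12))
        mu_c = gamma @ X_m / nk
        diff = X_m - mu_c
        cov_c = (gamma[:, None] * diff).T @ diff / nk + ridge * np.eye(d)
        ll = log_mix.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return 1.0 - alpha_c, alpha_c, mu_c, cov_c

def next_steady_phase(X, s, n_prev, n_c, mu_prev, cov_prev, rho, run=3, max_iter=50):
    """S31-S33: X[s:s+n_prev] is X_{c-1}; returns N_c once the boundary stops moving."""
    tail = X[s:]                                         # indices in X_m coordinates
    for _ in range(max_iter):
        a_prev, a_c, mu_c, cov_c = em_fixed_prev(tail[:n_prev + n_c], n_prev, mu_prev, cov_prev)
        # S32: assumed condition p(x_n | theta_c) < rho, scanned from n_prev // 2
        log_mix = np.logaddexp(np.log(a_prev) + gaussian_logpdf(tail, mu_prev, cov_prev),
                               np.log(a_c) + gaussian_logpdf(tail, mu_c, cov_c))
        fail = log_mix < np.log(rho)
        n_tilde, count = len(tail), 0
        for n in range(n_prev // 2, len(tail)):
            count = count + 1 if fail[n] else 0
            if count == run:
                n_tilde = n - run + 1
                break
        if n_tilde == n_prev + n_c:                      # S33: converged
            return n_c
        n_c = max(n_tilde - n_prev, 2)
    return n_c

# Usage on synthetic 2-D data with three operating points (0, 8, 16).
rng = np.random.default_rng(2)
X_demo = np.vstack([rng.normal(m, 1, (100, 2)) for m in (0, 8, 16)])
mu1, cov1 = X_demo[:100].mean(axis=0), np.cov(X_demo[:100], rowvar=False)
n_c = next_steady_phase(X_demo, 0, 100, 20, mu1, cov1, 1e-4)
```

Evaluating the densities in log space with `np.logaddexp` keeps the responsibilities finite even when one component's density underflows. The transient check of claim 4 would then be applied to the start of the returned $X_c$.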

4. The phase division method according to claim 3, wherein determining, based on the models of the two adjacent steady-state phases already determined, any transient phase that may exist between them comprises: for the two determined steady-state phases $X_{c-1}$ and $X_c$, starting from the first sample point of the $c$-th steady-state phase, finding the consecutive sample points satisfying $p(x_n\mid c)<\rho$ and recording them as the transient phase $X_{c-1,c}$.

5. A phase division system based on a local Gaussian mixture model, characterized by comprising:

A collection unit, configured to collect samples and create a historical training data set;

A first Gaussian distribution creation unit, configured to select, in sampling-time order, part of the samples from the historical training data set to create a first Gaussian distribution model and determine the first steady-state phase data;

A Gaussian mixture model creation unit, configured to create, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase, and to cooperate with the transient phase acquisition unit to complete the phase division of all sample data;

A transient phase acquisition unit, configured to determine, based on the models of the two adjacent steady-state phases already determined, any transient phase that may exist between them.

6. The system according to claim 5, wherein the first Gaussian distribution creation unit comprises:

A first data acquisition module, configured to take the first $N_1$ samples of the historical training data set in order and calculate their mean and covariance to obtain the corresponding Gaussian distribution function $p(x\mid 1)$, where $p(x\mid 1)$ is the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;

A first steady-state phase verification module, configured to extract sample points for steady-state phase verification, i.e., starting the verification from the $(N_1/2)$-th sample point, to find three consecutive sample points that satisfy the first verification condition and record the serial number of the corresponding sample point as […]. The verification condition is […],

where ρ is a pre-specified threshold;

A first steady-state phase determination module, configured to determine whether $N_1$ equals […]. If so, the result has converged, i.e., the first steady-state phase has been determined, and the next step is performed; otherwise, let […], and the first steady-state phase verification module iterates again until $N_1$ equals […].

7. The system according to claim 6, wherein the Gaussian mixture model creation unit comprises:

A second data acquisition module, configured to create and train, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components. Assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x\mid\theta_c)=\alpha_{c-1}\,p(x\mid c-1)+\alpha_c\,p(x\mid c)$$

A second steady-state phase verification module, configured to extract sample points to perform steady-state phase verification on the mixture model, i.e., starting the verification from the $(N_{c-1}/2)$-th sample point, to find three consecutive sample points that satisfy the second verification condition and record the serial number of the corresponding sample point as […]. The verification condition is […],

where ρ is a pre-specified threshold;

A second steady-state phase determination module, configured to determine whether […] equals $N_c+N_{c-1}$. If so, the result has converged, i.e., the $c$-th steady-state phase has been determined; otherwise, let […], and the second steady-state phase verification module iterates again until […] equals $N_c+N_{c-1}$.

8. The system according to claim 7, wherein the processing of the transient phase acquisition unit comprises: for the two determined steady-state phases $X_{c-1}$ and $X_c$, starting from the first sample point of the $c$-th steady-state phase, checking and finding the consecutive sample points satisfying $p(x_n\mid c)<\rho$, and recording them as the transient phase $X_{c-1,c}$.