JP2001202358A

JP2001202358A - Bayesian inference method for mixed model and recording medium with recorded bayesian inference program for mixed model

Info

Publication number: JP2001202358A
Application number: JP2000013545A
Authority: JP
Inventors: Shuko Ueda; 修功上田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-01-21
Filing date: 2000-01-21
Publication date: 2001-07-27

Abstract

PROBLEM TO BE SOLVED: To provide a Bayesian inference method for mixture models, and a recording medium storing a program therefor, with which the optimal number of mixture components can be searched for from the viewpoint of maximizing the posterior distribution over the number of components. SOLUTION: The method comprises a step of inferring the posterior distribution of the parameters, using a general Bayesian inference method, for an initial parameter value and an initial number of mixture components; a step of obtaining the best merge result; a step of obtaining the best merge-and-split result; a step of obtaining the best split result; a step of comparing the best merge, best merge-and-split, and best split results obtained in the respective steps and selecting the one whose lower bound on the log ensemble likelihood function is largest; and a step of repeating this series of steps until the lower bound on the log ensemble likelihood function no longer increases.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to basic techniques for Bayesian estimation in parametric statistics, such as Bayesian estimation of Gaussian mixture distributions, and in particular to Bayesian estimation of mixture models, in which the posterior distribution of the parameters of a mixture model is estimated. More specifically, it relates to a Bayesian estimation method for mixture models in which the probability density function of the mixture model is given parametrically and the parameters that maximize a lower bound on the log ensemble likelihood, computed from that probability density function and the observed data, are obtained by successive iteration, and to a recording medium on which a Bayesian estimation program for mixture models is recorded.

[0002]

2. Description of the Related Art Let the complexity of a model be measured by its number of mixture components. The class of parametric probability distributions (probability models) of the mixture model specified by the number of components m and the model parameter θ is written as

(Equation 1)

and is called the hypothesis space (here · denotes an arbitrary variable).

[0003] Statistical estimation can be described as "searching" the hypothesis space, on the basis of observed data D = {d_1, d_2, …, d_n}, for the hypothesis p(D | θ, m) that best approximates the true model. The "likelihood" is used as the measure of how good the approximation is.

[0004] Once the observed data D are obtained, the likelihood of the model parameter θ can be computed:

(Equation 2)

In likelihood-based estimation, the probability model p(· | θ̂, m) whose parameter θ̂ maximizes this likelihood is taken as the best model, and θ̂ is called the maximum likelihood estimate. Because the likelihood in general increases monotonically as m increases, the best model is estimated after the model index has been fixed in advance. In other words, maximum likelihood (ML) estimation is a model search within a hypothesis space in which m is fixed.

[0005] Bayesian (VB: Variational Bayes) estimation considers, in addition to the likelihood, the prior distribution p(θ | m) of the parameter θ; that is, the parameters are also treated as random variables. Rather than finding a single hypothesis p(D | θ̂, m) as in maximum likelihood estimation, Bayesian estimation takes the weighted average of the hypotheses p(d_{n+1} | θ, m) for unseen data d_{n+1}, weighting by the posterior distribution p(θ | D, m) of θ given the observed data D. This yields the "posterior predictive distribution" p(d_{n+1} | D, m), which is used to make probabilistic statements about d_{n+1}. It is computed by the following equation.

[0006]

(Equation 3)

In Bayesian estimation, the number of mixture components m can also be treated as a random variable. Taking the distribution P(m) of m into account as well, equation (2) is rewritten as follows.

[0007]

(Equation 4)

For example, consider a regression model in which d_i = (x_i, y_i), i = 1, …, n, are generated from y = f(x; θ) + ε, where ε is Gaussian noise with mean 0 and variance 1. The expected predictive output for an unknown input x_{n+1},

(Equation 5)

is obtained by taking the expectation of both sides of equation (3), which gives the following equation.

[0008]

(Equation 6)

Except in special cases, the posterior predictive distribution above is difficult to obtain analytically, so some approximation is used. One such approximation is the Laplace approximation (D. MacKay, "A practical Bayesian framework for backpropagation networks," Neural Computation, vol. 4, pp. 448-472, 1992), which approximates the posterior distribution by a Gaussian function so that the integral above can be evaluated analytically. However, this approximation is derived under the assumption of an infinite number of samples, so its accuracy is questionable for finite data.
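The equation images for (2) to (4) are not reproduced in this text. As a reading aid only, the posterior predictive quantities described in paragraphs [0005] to [0007] would plausibly take the following form. This is a reconstruction from the surrounding prose, not the patent's own typesetting; in particular, weighting by the posterior P(m | D) of the model index is an assumption.

```latex
\begin{align}
p(d_{n+1} \mid D, m) &= \int p(d_{n+1} \mid \theta, m)\, p(\theta \mid D, m)\, d\theta \tag{2} \\
p(d_{n+1} \mid D) &= \sum_{m} P(m \mid D) \int p(d_{n+1} \mid \theta, m)\, p(\theta \mid D, m)\, d\theta \tag{3} \\
\langle y_{n+1} \rangle &= \sum_{m} P(m \mid D) \int f(x_{n+1}; \theta)\, p(\theta \mid D, m)\, d\theta \tag{4}
\end{align}
```

Equation (4) follows from (3) because the noise ε has mean 0, so the expectation of y given x_{n+1} and θ is f(x_{n+1}; θ).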

[0009] A more accurate approximation is the Markov chain Monte Carlo (MCMC) method (D. Gamerman, "Markov chain Monte Carlo," Chapman & Hall, 1997). Writing the expectation computed in Bayesian estimation in the general form

(Equation 7)

if samples

(Equation 8)

that follow the distribution p(x) can be generated, then Φ can be approximated by the following equation.

[0010]

(Equation 9)

This is the basic idea of the MCMC method.

[0011] The difference from simple Monte Carlo is that, rather than evaluating the whole of x-space, a finite set {x_t} approximating p(x) is "generated" by sampling. The Metropolis method and Gibbs sampling are well-known concrete sampling techniques.
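The sampling idea in paragraphs [0009] to [0011] can be sketched in a few lines. The following is a minimal random-walk Metropolis sampler, not the procedure of any cited reference; the target density (a standard normal), the function f(x) = x², the step size, and all function names are illustrative assumptions. The expectation Φ is approximated by the average of f over the generated samples.

```python
import math
import random


def metropolis(log_p, x0, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for an unnormalized log density log_p."""
    rng = random.Random(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for t in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        if t >= burn_in:
            samples.append(x)
    return samples


def monte_carlo_expectation(f, samples):
    """Approximate Phi = integral of f(x) p(x) dx by the sample average."""
    return sum(f(x) for x in samples) / len(samples)


def log_p(x):
    # Assumed target: standard normal, known only up to a constant.
    return -0.5 * x * x


samples = metropolis(log_p, x0=0.0, n_samples=20000)
phi = monte_carlo_expectation(lambda x: x * x, samples)  # E[x^2] is 1 for this target
```

The samples are correlated, so many more of them are needed than with independent draws; this is the cost noted in the text that follows.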

[0012] However, these MCMC methods require an enormous amount of time for sampling, and judging convergence is generally not easy. More recently, variational Bayesian estimation has been proposed as a third approach to Bayesian estimation that is more accurate than the Laplace approximation and far more efficient than MCMC (S. R. Waterhouse, D. MacKay and A. J. Robinson, "Bayesian methods for mixture of experts," Advances in Neural Information Processing Systems (NIPS 8), 1995).

[0013] In a mixture model, it is unknown which component model generated each observation x_n. A latent variable Z_i^n is therefore introduced, with Z_i^n = 1 if x_n was generated by the i-th component model and Z_i^n = 0 otherwise. The set of these latent variables is written Z = {Z_i^n | i = 1, …, C, n = 1, …, N}, where C is the number of mixture components and N is the number of observations.

[0014] In Bayesian estimation, as stated above, all unknown quantities Z, θ, and m are treated as random variables. In the original formulation of Bayesian estimation m was fixed (treated as a constant), but Attias extended MacKay's formulation by treating m as a random variable as well, so that model selection can be carried out within Bayesian estimation (H. Attias, "Inferring parameters and structure of Graphical models by variational Bayes," to appear in Advances in Neural Information Processing Systems (NIPS 12)). This is described in detail below.

[0015] Consider the ensemble likelihood given by the following equation, in which all unknown quantities are marginalized out.

[0016]

(Equation 10)

Note that L is a function of the observed data D only.

[0017] The joint distribution p(D, Z, θ, m) of all the random variables can be factorized as

(Equation 11) p(D, Z, θ, m) = p(D, Z | m) p(θ | ψ, m) p(m | M) … (7)

The first factor on the right-hand side of equation (7) is the likelihood of the complete data (D, Z) given the model index, the second is the prior distribution of the parameter θ given the model index, and the third is the prior distribution of the model index. ψ and M are hyperparameters (constants) that specify the prior distributions.

[0018] Introducing a new distribution Q and applying Jensen's inequality for the logarithm yields the following.

[0019]

(Equation 12)

Here the notation ⟨f(x)⟩_{p(x)} denotes the expectation of f(x) with respect to the distribution p(x) of x:

(Equation 13)
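Since the images for equations (6) and (8) are missing here, the following sketch shows how the bound described in paragraphs [0015] to [0019] is usually written. It is a reconstruction under the assumption that L denotes the logarithm of the marginal probability of D and that Q is a distribution over (Z, θ, m); the patent's own typesetting may differ.

```latex
\begin{align}
L(D) = \log p(D) &= \log \sum_{m} \sum_{Z} \int p(D, Z, \theta, m)\, d\theta \tag{6} \\
&\ge \sum_{m} \sum_{Z} \int Q(Z, \theta, m)
   \log \frac{p(D, Z, \theta, m)}{Q(Z, \theta, m)}\, d\theta
 = \left\langle \log \frac{p(D, Z, \theta, m)}{Q(Z, \theta, m)} \right\rangle_{Q(Z, \theta, m)}
 \equiv F[Q] \tag{8} \\
\langle f(x) \rangle_{p(x)} &= \int f(x)\, p(x)\, dx
\end{align}
```

The inequality is Jensen's inequality applied to the concave logarithm after multiplying and dividing the integrand of (6) by Q.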

[0020] F is a functional of Q (Q being the function that is varied) and is a lower bound on the log ensemble likelihood L. The following relation holds between L and F.

[0021]

(Equation 14)

Here KL(p(x) ‖ q(x)) is a measure of the distance between two distributions p(x) and q(x), called the Kullback-Leibler information (Sakamoto, Ishiguro, Kitagawa, "Jōhōryō tōkeigaku" [Information statistics], Kyoritsu Shuppan, 1991), and is defined by the following equation.

(Equation 15)

Noting that L in equation (9) is a constant that depends only on D, maximizing F[Q] with respect to Q, so as to maximize the lower bound, is equivalent to minimizing the KL information between Q and the true posterior distribution p(· | D). In other words, the distribution Q that maximizes F is the best approximation to the true posterior distribution. Because it is a posterior distribution that variationally approximates the true posterior, Q is called the variational posterior distribution. (Since Q is a posterior distribution it should properly be written Q(· | m, D), but D is omitted to simplify the notation.)
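The relation between L, F, and the KL information can be checked numerically on a toy problem. The sketch below assumes a single discrete unknown z with three values and made-up joint probabilities p(D, z) for one fixed data set D; none of these numbers come from the patent. It confirms that the gap between log p(D) and the lower bound equals KL(Q ‖ p(z | D)), and that the bound is tight when Q is the true posterior.

```python
import math

# Hypothetical joint probabilities p(D, z) for one fixed D and z in {0, 1, 2}.
joint = [0.10, 0.05, 0.02]
evidence = sum(joint)                      # p(D)
posterior = [p / evidence for p in joint]  # true posterior p(z | D)


def lower_bound(q):
    """F[Q] = sum_z Q(z) log( p(D, z) / Q(z) )."""
    return sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)


def kl(q, p):
    """KL(q || p) = sum_z q(z) log( q(z) / p(z) )."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)


q = [0.5, 0.3, 0.2]                # an arbitrary trial distribution Q
log_evidence = math.log(evidence)  # plays the role of L
gap = log_evidence - lower_bound(q)
```

Because log p(D) does not depend on Q, raising F[Q] can only shrink the KL term, which is the equivalence stated above.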

[0022] Q is assumed to factorize over the unknown quantities as

(Equation 16) Q = Q(Z | m) Q(θ | m) Q(m) … (10)

while the class of each distribution may be arbitrary. Because the true posterior is estimated under the restricted form of equation (10), the estimate does not in general coincide with the true distribution; nevertheless, its accuracy can be said to be far higher than that of the Laplace approximation, which approximates the joint posterior distribution of all parameters by a single Gaussian.

[0023] Given the model index m, the optimal variational posterior distribution Q(θ | m) of θ is obtained by maximizing F[Q] with respect to Q under the constraint ∫ Q(θ | m) dθ = 1.

[0024]

(Equation 17)

where C_θ is the normalization constant that ensures ∫ Q(θ | m) dθ = 1. Similarly,

(Equation 18)

As is clear from equations (11) and (12), Q(θ | m) and Q(Z | m) depend on each other and cannot be solved in closed form, so they are obtained iteratively. That is, letting Q(Z | m)^(t) and Q(θ | m)^(t) denote the estimates of the posterior distributions at the t-th iteration, the estimates at the (t+1)-th iteration are computed as follows.

[0025]

(Equation 19)

Iterating equations (13) and (14) until convergence yields the locally optimal posterior distributions Q(Z | m)^* and Q(θ | m)^*.
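The alternating scheme of equations (13) and (14) can be illustrated on a deliberately simple model. The sketch below is not the patent's implementation: it assumes a one-dimensional Gaussian mixture with unit component variance, fixed equal mixing weights, and a zero-mean Gaussian prior with variance `prior_var` on each component mean, so that both updates have closed forms. All names and the synthetic data are hypothetical. The lower bound is recorded after every sweep and never decreases, but the limit it reaches depends on the initial values.

```python
import math
import random


def log_normalize(logits):
    """Return log-probabilities whose exponentials sum to 1."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def update_q_z(x, means, variances):
    """Q(Z|m) update: r[n][i] proportional to exp <log p(x_n | mu_i)> under Q(theta|m)."""
    resp = []
    for xn in x:
        logits = [-0.5 * ((xn - mu) ** 2 + v) for mu, v in zip(means, variances)]
        resp.append([math.exp(v) for v in log_normalize(logits)])
    return resp


def update_q_theta(x, resp, prior_var):
    """Q(theta|m) update: Gaussian N(mean_i, var_i) for each component mean."""
    means, variances = [], []
    for i in range(len(resp[0])):
        n_i = sum(r[i] for r in resp)
        s_i = sum(r[i] * xn for r, xn in zip(resp, x))
        var_i = 1.0 / (1.0 / prior_var + n_i)
        variances.append(var_i)
        means.append(var_i * s_i)
    return means, variances


def lower_bound(x, resp, means, variances, prior_var):
    """Lower bound for this toy model with the current Q(Z|m) and Q(theta|m)."""
    n_comp = len(means)
    total = 0.0
    for xn, r in zip(x, resp):
        for i in range(n_comp):
            if r[i] > 0.0:
                expected_loglik = (-math.log(n_comp) - 0.5 * math.log(2 * math.pi)
                                   - 0.5 * ((xn - means[i]) ** 2 + variances[i]))
                total += r[i] * (expected_loglik - math.log(r[i]))
    for mu, v in zip(means, variances):
        # <log prior> minus <log Q(theta|m)> for one component mean.
        total += 0.5 * math.log(v / prior_var) + 0.5 - (mu * mu + v) / (2 * prior_var)
    return total


def fit(x, n_comp, prior_var=100.0, sweeps=30, seed=0):
    rng = random.Random(seed)
    means = [rng.gauss(0.0, 1.0) for _ in range(n_comp)]  # arbitrary initial values
    variances = [1.0] * n_comp
    history = []
    for _ in range(sweeps):
        resp = update_q_z(x, means, variances)
        means, variances = update_q_theta(x, resp, prior_var)
        history.append(lower_bound(x, resp, means, variances, prior_var))
    return means, variances, history


data_rng = random.Random(1)
data = ([data_rng.gauss(-3.0, 1.0) for _ in range(100)]
        + [data_rng.gauss(3.0, 1.0) for _ in range(100)])
means, variances, history = fit(data, n_comp=2)
```

Each update maximizes the bound over one factor with the other held fixed, which is why `history` is non-decreasing and why only a local optimum is guaranteed.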

[0026] Once Q(Z | m)^* and Q(θ | m)^* have been obtained, the optimal posterior distribution of the model index m is found analytically, by maximizing F with respect to Q(m), as

(Equation 20)

where C_m is a normalization constant. Then, clearly,

(Equation 21)

is the optimal number of mixture components in the sense of maximizing the posterior distribution. This is the conventional Bayesian estimation method for mixture models.

[0027]

PROBLEMS TO BE SOLVED BY THE INVENTION As described above, Bayesian estimation of a mixture model can be carried out with the conventional Bayesian estimation method. However, the conventional method of equations (13) and (14) merely converges to a local optimum of the lower bound on the log ensemble likelihood function and does not necessarily maximize that lower bound. The reliability of the optimal number of mixture components given by equation (15), which is computed using equations (13) and (14), is therefore also open to question. Moreover, the conventional way of determining the optimal number of components is a "model selection" approach, in which the number of components that maximizes equation (15) is chosen from among several candidates.

[0028] The present invention has been made in view of the above problems. Its object is to provide a Bayesian estimation method for mixture models, and a recording medium on which a Bayesian estimation program for mixture models is recorded, that solve the local-optimality problem of the above Bayesian estimation for mixture models and, further, show that in Bayesian estimation the model parameters and the number of mixture components can be estimated simultaneously as the maximization of a single objective function.

[0029]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the invention according to claim 1 is a Bayesian estimation method for a mixture model in which the probability density function of the mixture model is given parametrically and the posterior distribution of the parameters and the posterior distribution of the number of mixture components that maximize a lower bound on the log ensemble likelihood function, computed from the probability density function and the observed data, are obtained by successive iteration. The method comprises: a step of estimating the posterior distribution of the parameters, using a general Bayesian estimation method, for initial parameter values and an initial number of mixture components; a step of obtaining the best merge result, by selecting two component models, merging them into one new component model, estimating the posterior distribution of the parameters of the new component model, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the merge increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best merge result, and if it does not, the state before the merge is restored and a merge with a different component model is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of obtaining the best merge-and-split result, by selecting three component models, merging two of them into one new component model, splitting the remaining one into two new component models, estimating the posterior distributions of the parameters of the new component models, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the merge-and-split increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best merge-and-split result, and if it does not, the state before the merge-and-split is restored and a merge-and-split with different component models is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of obtaining the best split result, by selecting one component model, splitting it into two new component models, estimating the posterior distributions of the parameters of the new component models, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the split increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best split result, and if it does not, the state before the split is restored and a split of a different component model is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of comparing the best merge result, the best merge-and-split result, and the best split result obtained in the preceding steps and selecting the result whose lower bound on the log ensemble likelihood function is largest; and a step of repeating the above series of steps until the lower bound on the log ensemble likelihood function no longer increases.

[0030] With the invention of claim 1, the optimal number of mixture components, in the sense of maximizing the posterior distribution of the number of components, can be searched for while avoiding local optima.
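The control flow of the claimed search can be sketched independently of any particular mixture model. In the sketch below, `candidates` and `apply_and_refit` are hypothetical callbacks standing in for candidate selection and for the operation followed by re-estimation of all posteriors; the toy objective, in which the state is just the number of components m and the bound is −(m − 3)², is an assumption made only so that the loop can be run.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

State = TypeVar("State")
OPERATIONS = ("merge", "merge_split", "split")


def best_for_operation(state, bound, op, candidates, apply_and_refit):
    """Try the finite candidate list for one operation.

    Returns the first candidate whose refitted lower bound exceeds the current
    one; a candidate that does not improve is discarded, which amounts to
    reverting to `state`. Returns None when the candidates are exhausted.
    """
    for cand in candidates(state, op):
        new_state, new_bound = apply_and_refit(state, cand, op)
        if new_bound > bound:
            return new_state, new_bound
    return None


def search(state, bound, candidates, apply_and_refit):
    """Repeat merge / merge-and-split / split until the bound stops increasing."""
    while True:
        results = [best_for_operation(state, bound, op, candidates, apply_and_refit)
                   for op in OPERATIONS]
        improved = [r for r in results if r is not None]
        if not improved:
            return state, bound
        # Keep whichever of the three results has the largest lower bound.
        state, bound = max(improved, key=lambda r: r[1])


# Toy stand-ins (assumptions): the state is the number of components m.
def toy_bound(m):
    return float(-(m - 3) ** 2)


MIN_COMPONENTS = {"merge": 2, "merge_split": 3, "split": 1}
DELTA = {"merge": -1, "merge_split": 0, "split": +1}


def toy_candidates(m, op):
    return [0] if m >= MIN_COMPONENTS[op] else []


def toy_apply(m, cand, op):
    new_m = m + DELTA[op]
    return new_m, toy_bound(new_m)


final_m, final_bound = search(6, toy_bound(6), toy_candidates, toy_apply)
```

In the toy run a merge lowers m by one, a split raises it by one, and a merge-and-split leaves it unchanged, so the search walks from m = 6 down to m = 3 and then stops. In a real model a merge-and-split can still raise the bound at fixed m by moving components between data regions.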

[0031] The invention according to claim 2 is a recording medium on which is recorded a Bayesian estimation program for a mixture model in which the probability density function of the mixture model is given parametrically and the posterior distribution of the parameters and the posterior distribution of the number of mixture components that maximize a lower bound on the log ensemble likelihood function, computed from the probability density function and the observed data, are obtained by successive iteration. The program causes a computer to execute: a step of estimating the posterior distribution of the parameters, using a general Bayesian estimation method, for initial parameter values and an initial number of mixture components; a step of obtaining the best merge result, by selecting two component models, merging them into one new component model, estimating the posterior distribution of the parameters of the new component model, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the merge increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best merge result, and if it does not, the state before the merge is restored and a merge with a different component model is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of obtaining the best merge-and-split result, by selecting three component models, merging two of them into one new component model, splitting the remaining one into two new component models, estimating the posterior distributions of the parameters of the new component models, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the merge-and-split increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best merge-and-split result, and if it does not, the state before the merge-and-split is restored and a merge-and-split with different component models is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of obtaining the best split result, by selecting one component model, splitting it into two new component models, estimating the posterior distributions of the parameters of the new component models, and then re-estimating the posterior distributions of the parameters of all component models with the Bayesian estimation method, wherein, if the split increases the lower bound on the log ensemble likelihood function, the estimate is adopted as the best split result, and if it does not, the state before the split is restored and a split of a different component model is attempted, this process being carried out until a predetermined finite set of candidates is exhausted; a step of comparing the best merge result, the best merge-and-split result, and the best split result obtained in the preceding steps and selecting the result whose lower bound on the log ensemble likelihood function is largest; and a step of repeating the above series of steps until the lower bound on the log ensemble likelihood function no longer increases.

[0032] With the invention of claim 2, because the Bayesian estimation program for mixture models is recorded on a recording medium, the program can be distributed more readily by means of that recording medium.

[0033]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS First, an outline of the present invention is given, taking one embodiment of the invention as an example.

[0034] Let F[Q] be a functional of Q. Collecting the terms of F[Q] that do not contain Q(m) and writing them as F_m gives the following equation.

[0035]

(Equation 22)

Here F_m is given by the following equation.

[0036]

(Equation 23)

The first term on the right-hand side of equation (16) depends, among the factors of Q, on Q(Z | m) and Q(θ | m), and the second term depends on Q(m). The maximization of F[Q] described above is therefore in practice equivalent to the following two-step maximization.

[0037] Step 1: For each m, maximize F_m with respect to Q(Z | m) and Q(θ | m).

[0038] Step 2: For each m, maximize F_m with respect to Q(m).

[0039] Letting F_m^* denote the optimal value of F_m obtained in Step 1, equation (16) gives the following.

[0040]

(Equation 24)

Accordingly, the optimal Q(m) in Step 2 is found by maximizing equation (18) with respect to Q(m) under the constraint Σ_m Q(m) = 1, which gives

(Equation 25)

It is easily verified that equation (19) is equivalent to equation (15). Looking closely at equation (19), the denominator does not depend on m, so maximizing Q(m) with respect to m amounts to nothing other than maximizing

(Equation 26)

If, for simplicity, the prior distribution of m is taken to be the uniform distribution P(m | M) = 1/M, maximizing Q(m) reduces simply to maximizing F_m.
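The images for equations (18) and (19) are not available in this text. A plausible reconstruction, consistent with the statements that the denominator of (19) does not depend on m and that a uniform prior reduces the problem to maximizing F_m, is given below together with the short Lagrange-multiplier argument that connects them. The exact form of (18) is an assumption.

```latex
\begin{align}
F[Q] &= \sum_{m} Q(m) \left\{ F_m^{*} + \log \frac{P(m \mid M)}{Q(m)} \right\} \tag{18} \\
0 &= \frac{\partial}{\partial Q(m)}
     \left[ F[Q] + \lambda \Bigl( \sum_{m'} Q(m') - 1 \Bigr) \right]
   = F_m^{*} + \log P(m \mid M) - \log Q(m) - 1 + \lambda \\
Q(m)^{*} &= \frac{P(m \mid M)\, \exp\{F_m^{*}\}}
                 {\sum_{m'} P(m' \mid M)\, \exp\{F_{m'}^{*}\}} \tag{19}
\end{align}
```

The quantity maximized over m is then P(m | M) exp{F_m^*}, and with P(m | M) = 1/M the maximizer of Q(m)^* is the m with the largest F_m^*.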

[0041] This means that by maximizing F_m not only with respect to Q(θ | m) and Q(Z | m) but simultaneously with respect to m as well, the optimal model index m is obtained at the same time, without computing equation (15).
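A small numerical sketch makes the point concrete. It assumes that Q(m) is proportional to P(m | M) exp(F_m^*), as the discussion of equation (19) indicates, and uses made-up values of F_m^* for four candidate models; the function name and the numbers are hypothetical. With a uniform prior, the m that maximizes Q(m) is the m with the largest F_m^*, so the normalized posterior never has to be formed just to pick the model.

```python
import math


def model_posterior(f_star, prior=None):
    """Q(m) proportional to P(m|M) * exp(F_m^*), normalized stably in log space."""
    ms = sorted(f_star)
    if prior is None:
        prior = {m: 1.0 / len(ms) for m in ms}  # uniform prior P(m|M) = 1/M
    log_w = {m: math.log(prior[m]) + f_star[m] for m in ms}
    top = max(log_w.values())
    norm = sum(math.exp(v - top) for v in log_w.values())
    return {m: math.exp(log_w[m] - top) / norm for m in ms}


# Hypothetical optimized lower bounds F_m^* for m = 1..4 (not from the patent).
f_star = {1: -520.3, 2: -498.7, 3: -501.2, 4: -507.9}

q_m = model_posterior(f_star)
m_from_posterior = max(q_m, key=q_m.get)
m_from_bound = max(f_star, key=f_star.get)
```

Subtracting the largest log weight before exponentiating avoids underflow, since values of F_m^* are typically large negative numbers.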

[0042] In other words, by taking F_m rather than F[Q] as the objective function and maximizing it simultaneously with respect to Q and m, the model parameters and model index that are optimal in the maximum a posteriori (MAP) sense are obtained as follows.

[0043]

(Equation 27)

Once θ_MAP and m_MAP have been obtained, the predictive distribution and the expected predictive output for unseen data d_{n+1}, given in equations (3) and (4), are obtained approximately as

(Equation 28)

respectively.

[0044] It has been shown above that the optimal values of θ and m can be estimated simultaneously with a single objective function. Next, a method that also deals with the local-optimality problem when finding the optimal value of θ is described in detail.

[0045] For a mixture model, the hypothesis space H_m is given as a direct sum with respect to m:

(Equation 29)

[0046] In this case, most local solutions correspond to a situation in which too many component models are allocated to one region of the data while too few are allocated to another. Indeed, the maximization of F_m in Step 1 described above, that is, the iterative increase by equations (13) and (14), converges to such an unbalanced allocation of component models (a poor local solution) unless suitable initial values are set.

[0047] To remove this imbalance and achieve a better allocation of component models, the merge and split operations that the present inventor previously proposed within the framework of maximum likelihood estimation are introduced into Bayesian estimation (Japanese Patent Application No. 10-340639, "Maximum likelihood estimation method for mixture models and recording medium on which a maximum likelihood estimation program for mixture models is recorded," and Sections 3 and 4 of Naonori Ueda and Ryohei Nakano, "EM algorithm with merge and split operations for mixture models," Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J82-D-11, no. 5, pp. 930-940, 1999).

[0048] Here, however, the approach is extended further in that the model index m is optimized at the same time. This is described in detail below.

[0049] When equation (21) holds, F_m can be written as follows.

[0050]

(Equation 30)

Here, f_i(Q(Z, θ | m)) denotes the objective function corresponding to the i-th model when the model index (the model complexity) is m. Now, for a given m, let Q^* denote the posterior distribution (a local optimal solution) obtained by equations (13) and (14), and let F_m^* denote the value of F_m at that point. Equation (22) can then be further written as follows.

[0051]

(Equation 31)

where

(Equation 32)

Focusing only on the term

(Equation 33)

on the right-hand side of equation (23), element models j and k are merged into a new element model j′, and element model l is split into two element models k′ and l′, in an attempt to increase the value of F_m further.
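The simultaneous merge and split move can be sketched in a few lines of Python. This is a minimal illustration, assuming one-dimensional normal element models summarized by point values (weight, mean, variance) rather than full posterior distributions. The `Component` class, the moment-matching merge rule, and the symmetric split rule are hypothetical choices made for this sketch; they are not the initialization prescribed by the cited paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical point summary of one element model."""
    weight: float
    mean: float
    var: float


def merge(a: Component, b: Component) -> Component:
    """Merge two element models by matching the first two moments (an assumed rule)."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    second = (a.weight * (a.var + a.mean ** 2) + b.weight * (b.var + b.mean ** 2)) / w
    return Component(w, mean, second - mean ** 2)


def split(c: Component, spread: float = 1.0) -> Tuple[Component, Component]:
    """Split one element model into two with halved weights and shifted means (an assumed rule)."""
    shift = spread * c.var ** 0.5
    half = c.weight / 2.0
    return Component(half, c.mean - shift, c.var), Component(half, c.mean + shift, c.var)


def merge_and_split(components: List[Component], j: int, k: int, l: int) -> List[Component]:
    """Merge element models j and k into j', and split element model l into k' and l'."""
    merged = merge(components[j], components[k])
    k_new, l_new = split(components[l])
    rest = [c for i, c in enumerate(components) if i not in (j, k, l)]
    return rest + [merged, k_new, l_new]
```

Because one merge and one split are applied together, the number of element models m is unchanged, and the total mixing weight is preserved. In the method described here the new element models would then be re-estimated before F_m is compared.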

[0052] The selection of element models j, k, and l, as well as the initialization and re-estimation of the new element models j′, k′, and l′, may be carried out in the same manner as in the above-mentioned method (N. Ueda and R. Nakano, "EM Algorithm with Merge and Split Operations for Mixture Models," Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J82-D-II, no. 5, pp. 930-940, 1999).

[0053] As mentioned earlier, in maximum likelihood estimation the likelihood generally increases (decreases) when m is increased (decreased). Therefore, if, for example, only splitting is performed, it is difficult to tell whether the likelihood increased because splitting and re-estimation escaped a local solution and reached a better one, or simply because m increased. For this reason, the above-mentioned method (Japanese Patent Application No. Hei 10-340639) performed merging and splitting simultaneously so as to keep m fixed.

[0054] In the method of the present embodiment, by contrast, both the parameters and the model complexity can be optimized using the objective function F_m. That is, the value of F_m does not simply increase as m increases; it takes its maximum at the optimal value of m.

[0055] Therefore, in addition to the simultaneous merge and split operation based on equation (23), a "merge only" operation and a "split only" operation are also attempted. Clearly, a "merge only" operation decreases m by one, and a "split only" operation increases m by one.

[0056] Therefore, by executing these three types of operations so as to increase F_m, the problem of local optimality and the problem of determining the optimal number of mixtures can be solved simultaneously.

[0057] Next, an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the functional configuration of an apparatus for carrying out a Bayesian estimation method for a mixture model according to an embodiment of the present invention.

[0058] In the embodiment shown in FIG. 1, observation data for training, which may be arbitrary data such as weather data, is supplied from outside to the posterior distribution estimation unit 3 via the observation data acquisition unit 1, which includes storage means such as a database constituting a computer system. The posterior distribution estimation unit 3, using a computer or the like, estimates the posterior distribution of the parameters and the posterior distribution of the number of mixtures from this observation data. The posterior distribution estimation unit 3 sequentially executes steps S11 through S37 described below to obtain the posterior distributions, and then outputs them via the posterior distribution output unit 5.

[0059] First, in step S11, m is set appropriately, and the optimal posterior distribution is obtained using equations (13) and (14). Next, in step S13, the posterior distributions at convergence are denoted Q(θ | m)^* and Q(Z | m)^*, and we set

(Equation 34)

[0060] The process then proceeds to step S15, in which the candidate element models for merging and splitting are sorted on the basis of the optimal posterior distribution.

[0061] Next, the following procedures, steps S17, S19, and S21, are each executed independently. In FIG. 2, steps S17, S19, and S21 are executed one after another, but they may instead be executed concurrently in parallel.

[0062] First, step S17 is described. In step S17, a search using only the merge operation is performed over the C candidate element models for merging, in order, until the objective function exceeds F^*. The value of the objective function at that point is denoted F_{m-1}^{**}.

[0063] In step S19, a search using the simultaneous merge and split operation is performed over the C candidate element models for merging and splitting, in order, until the objective function exceeds F^*. The value of the objective function at that point is denoted F_m^{**}.

[0064] In step S21, a search using only the split operation is performed over the C candidate element models for splitting, in order, until the objective function exceeds F^*. The value of the objective function at that point is denoted F_{m+1}^{**}.
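The candidate search shared by steps S17, S19, and S21 can be sketched as one generic routine. This is a minimal sketch under stated assumptions: `propose` (applies a merge, a merge and split, or a split to a copy of the state) and `refit` (re-estimates the posteriors and returns the new state together with its objective value) are hypothetical callables supplied by the caller, and the state type is left abstract.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

State = TypeVar("State")  # hypothetical: the current posteriors together with m
Cand = TypeVar("Cand")    # hypothetical: indices of element models to merge and/or split


def first_improving_candidate(
    state: State,
    f_star: float,
    candidates: Iterable[Cand],
    propose: Callable[[State, Cand], State],
    refit: Callable[[State], Tuple[State, float]],
    max_candidates: int,
) -> Optional[Tuple[State, float]]:
    """Try at most max_candidates sorted candidates; stop at the first one whose
    re-estimated objective exceeds f_star. Return None if none does."""
    for tried, cand in enumerate(candidates):
        if tried >= max_candidates:
            break
        new_state, f_value = refit(propose(state, cand))
        if f_value > f_star:
            return new_state, f_value
        # Otherwise the trial is discarded and the search continues from the
        # unchanged original state, which corresponds to backtracking.
    return None
```

The routine assumes that `propose` returns a new state rather than modifying its argument, so a rejected trial leaves the pre-operation state intact.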

[0065] Next, in step S23, the process terminates if none of steps S17, S19, and S21 described above produced a candidate exceeding F^*.

[0066] If, on the other hand, some candidate exceeds F^*, the process proceeds to step S25 and sets

(Equation 35)

If, in step S27,

(Equation 36)

holds (YES), the process proceeds to step S29, adopts the result found in step S17, sets m ← m − 1, and returns to step S15.

[0067] If, in step S31,

(Equation 37)

holds (YES), the process proceeds to step S33, adopts the result found in step S19, and returns to step S15.

[0068] Further, if, in step S35,

(Equation 38)

holds (YES), the process proceeds to step S37, adopts the result found in step S21, sets m ← m + 1, and returns to step S15.
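The selection logic of steps S23 through S37 can be sketched as follows. The sketch assumes, in line with the claims, that the result with the largest objective value among the three searches is adopted. The function name `select_move` and the dictionary layout (the change in m mapped to the value found, or `None` when a search found no improvement) are hypothetical.

```python
from typing import Dict, Optional, Tuple


def select_move(
    f_star: float, results: Dict[int, Optional[float]]
) -> Optional[Tuple[int, float]]:
    """Pick the best of the three search results.

    results maps the change in m to the objective value found:
      -1 -> merge only (step S17), 0 -> merge and split (step S19),
      +1 -> split only (step S21). None means no candidate was found.
    Returns (delta_m, new F*) or None when the algorithm should terminate.
    """
    improving = {d: f for d, f in results.items() if f is not None and f > f_star}
    if not improving:
        return None  # step S23: no candidate exceeds F*, so stop
    delta_m = max(improving, key=improving.get)
    return delta_m, improving[delta_m]
```

For instance, `select_move(-100.0, {-1: -90.0, 0: -95.0, 1: None})` returns `(-1, -90.0)`, meaning that the merge-only result is adopted and m is decreased by one before returning to step S15.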

[0069] Each of steps S17, S19, and S21 of the above algorithm, with m held fixed, escapes from a local solution of Q(Z, θ | m) and guides the search toward a better solution. Then, from the standpoint of optimal model selection, the best of these three model complexities is selected in steps S23 through S37. By repeating these steps, the optimal model can be searched for while avoiding local solutions. Because the procedures of steps S17, S19, and S21 are so-called greedy searches, the above algorithm is a search for a better local maximum of F_m, and there is no theoretical guarantee that the global maximum will be obtained.

[0070] Nevertheless, F^* is guaranteed to increase monotonically, and the search for a better local maximum can be carried out efficiently.
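The overall loop and the monotonic increase of F^* can be illustrated with a deliberately simplified toy. In this sketch the converged objective is replaced by a hypothetical function `toy_objective` that depends only on m and peaks at m = 5, so no actual posterior estimation takes place and the merge and split move (which leaves m unchanged) can never improve the value. In the real method that move can improve F_m by escaping a local solution.

```python
from typing import List, Tuple


def toy_objective(m: int) -> float:
    """Hypothetical stand-in for the converged value of F_m; peaks at m = 5."""
    return float(-((m - 5) ** 2))


def search_number_of_mixtures(m_init: int) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Greedy search over m using merge-only, merge-and-split, and split-only moves."""
    m, f_star = m_init, toy_objective(m_init)
    history = [(m, f_star)]
    while True:
        # Steps S17, S19, S21: merge only (m - 1), merge and split (m), split only (m + 1).
        trials = {m + d: toy_objective(m + d) for d in (-1, 0, 1) if m + d >= 1}
        better = {k: v for k, v in trials.items() if v > f_star}
        if not better:
            return m, f_star, history  # step S23: no improvement, so terminate
        m = max(better, key=better.get)  # steps S25 to S37: adopt the best result
        f_star = better[m]
        history.append((m, f_star))


m_final, f_final, history = search_number_of_mixtures(10)
```

Since a move is adopted only when it strictly exceeds the current F^*, the recorded values in `history` increase at every iteration, and the loop stops at a local maximum of the toy objective.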

[0071] FIG. 3 shows experimentally the effectiveness of the Bayesian estimation of a mixture model according to this embodiment. The experiment used a two-dimensional normal mixture estimation problem, for which the estimation results can be visualized. The true number of mixtures is 5; the dotted ellipses in FIG. 3 represent the five true two-dimensional normal distributions, and the points are data generated artificially from these five normal distributions.
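Data of this kind can be generated with a short script. The sketch below is only an illustration: the component means, the standard deviations, the equal number of points per component, the axis-aligned (diagonal) covariances, and the seed are all assumptions, since the actual parameters of the experiment are not given in the text.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_mixture(
    means: Sequence[Point],
    stds: Sequence[Point],
    n_per_component: int,
    seed: int = 0,
) -> List[Point]:
    """Draw n_per_component points from each axis-aligned 2-D normal component."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    points: List[Point] = []
    for (mx, my), (sx, sy) in zip(means, stds):
        for _ in range(n_per_component):
            points.append((rng.gauss(mx, sx), rng.gauss(my, sy)))
    return points


# Five hypothetical components; these values are not taken from the patent.
MEANS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
STDS = [(0.5, 0.5)] * 5
data = sample_mixture(MEANS, STDS, n_per_component=100)
```

A model with a deliberately excessive number of element models, such as the 10 used in the experiment, could then be fitted to `data` to check whether the search recovers five components.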

[0072] The ten solid-line ellipses in FIG. 3(a) show the normal distributions whose parameters correspond to the maximum of each posterior distribution (the MAP estimates) when the number of mixtures is set to 10 and the posterior distribution of each parameter is given a suitable initial value. FIG. 3(b) shows the MAP estimation result of the conventional Bayesian estimation method. FIG. 3(c) shows the MAP estimates of the parameter posterior distributions for the optimal number of mixtures finally obtained by repeating merging and splitting, following the above procedure, starting from FIG. 3(b). Although the number of mixtures was initialized to 10, the true number of mixtures was found, and a good result close to the true distribution was obtained.

[0073] The values of the objective function for FIG. 3(a), (b), and (c) are −1.37 × 10³, −1.01 × 10³, and −0.89 × 10³, respectively, which confirms that better estimates are obtained as the objective function increases.

[0074] As described above, in the present invention the Bayesian estimation program for a mixture model is recorded on a recording medium, so the program can be distributed more readily by means of that recording medium.

[0075]

[Effects of the Invention] As described above, the present invention comprises the following steps.

A step of estimating the posterior distribution of the parameters for an initial parameter value and an initial number of mixtures using a general Bayesian estimation method.

A step of, in order to obtain a best merge result, selecting two element models, merging these two element models into one new element model, estimating the posterior distribution of the parameters of the new element model, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method; if the merge increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge result, and if it does not, the process returns to the state before the merge and a merge with a different element model is attempted; this processing is executed until a predetermined finite number of candidates is exhausted.

A step of, in order to obtain a best merge and split result, selecting three element models, merging two of them into one new element model, splitting the remaining one into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method; if the merge and split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge and split result, and if it does not, the process returns to the state before the merge and split and a merge and split with different element models is attempted; this processing is executed until a predetermined finite number of candidates is exhausted.

A step of, in order to obtain a best split result, selecting one element model, splitting it into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method; if the split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best split result, and if it does not, the process backtracks to the state before the split and a split of a different element model is attempted; this processing is executed until a predetermined finite number of candidates is exhausted.

A step of comparing the best merge result, the best merge and split result, and the best split result obtained in the respective steps, and selecting the result for which the corresponding lower bound of the log ensemble likelihood function is largest.

A step of repeating the above series of steps until the lower bound of the log ensemble likelihood function no longer increases.

Consequently, in the Bayesian estimation of a mixture model, the optimal number of mixtures can be searched for, from the standpoint of maximizing the posterior distribution of the number of mixtures, while avoiding local optimal solutions.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an apparatus for carrying out a Bayesian estimation method for a mixture model according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the posterior distribution estimation procedure of the Bayesian estimation method shown in FIG. 1.

FIG. 3 is a diagram showing experimentally the effectiveness of the present invention.

[Explanation of symbols]

1: Observation data acquisition unit
3: Posterior distribution estimation unit
5: Posterior distribution output unit

Claims

[Claims]

1. A Bayesian estimation method for a mixture model, in which a probability density function of the mixture model, defined as a linear sum of a plurality of probability density functions, is given parametrically, and a posterior distribution of parameters and a posterior distribution of the number of mixtures that maximize a lower bound of a log ensemble likelihood function, calculated using the probability density function and observation data, are obtained by an iterative method, the method comprising:

a step of estimating the posterior distribution of the parameters for an initial parameter value and an initial number of mixtures using a general Bayesian estimation method;

a step of, in order to obtain a best merge result, selecting two element models in the mixture model, merging these two element models into one new element model, estimating the posterior distribution of the parameters of the new element model, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the merge increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge result, and, if it does not, the process returns to the state before the merge and a merge with a different element model is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of, in order to obtain a best merge and split result, selecting three element models, merging two of them into one new element model, splitting the remaining one into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the merge and split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge and split result, and, if it does not, the process returns to the state before the merge and split and a merge and split with different element models is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of, in order to obtain a best split result, selecting one element model, splitting it into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best split result, and, if it does not, the process returns to the state before the split and a split of a different element model is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of comparing the best merge result, the best merge and split result, and the best split result obtained in the respective steps, and selecting the result for which the corresponding lower bound of the log ensemble likelihood function is largest; and

a step of repeating the above series of steps until the lower bound of the log ensemble likelihood function no longer increases.

2. A recording medium on which is recorded a Bayesian estimation program for a mixture model, in which a probability density function of the mixture model, defined as a linear sum of a plurality of probability density functions, is given parametrically, and a posterior distribution of parameters and a posterior distribution of the number of mixtures that maximize a lower bound of a log ensemble likelihood function, calculated using the probability density function and observation data, are obtained by an iterative method, the program comprising:

a step of estimating the posterior distribution of the parameters for an initial parameter value and an initial number of mixtures using a general Bayesian estimation method;

a step of, in order to obtain a best merge result, selecting two element models in the mixture model, merging these two element models into one new element model, estimating the posterior distribution of the parameters of the new element model, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the merge increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge result, and, if it does not, the process returns to the state before the merge and a merge with a different element model is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of, in order to obtain a best merge and split result, selecting three element models, merging two of them into one new element model, splitting the remaining one into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the merge and split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best merge and split result, and, if it does not, the process returns to the state before the merge and split and a merge and split with different element models is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of, in order to obtain a best split result, selecting one element model, splitting it into two new element models, estimating the posterior distributions of the parameters of the new element models, and then re-estimating the posterior distributions of the parameters of all element models by the Bayesian estimation method, wherein, if the split increases the lower bound of the log ensemble likelihood function, the estimate is adopted as the best split result, and, if it does not, the process returns to the state before the split and a split of a different element model is performed, this processing being executed until a predetermined finite number of candidates is exhausted;

a step of comparing the best merge result, the best merge and split result, and the best split result obtained in the respective steps, and selecting the result for which the corresponding lower bound of the log ensemble likelihood function is largest; and

a step of repeating the above series of steps until the lower bound of the log ensemble likelihood function no longer increases.