JP2010170422A

JP2010170422A - Model selection apparatus, selection method for model selection apparatus, and program

Info

Publication number: JP2010170422A
Application number: JP2009013503A
Authority: JP
Inventors: Ryohei Fujimaki; 遼平藤巻; Satoshi Morinaga; 聡森永; Michiya Monma; 道也門馬; Kenji Aoki; 健児青木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-01-23
Filing date: 2009-01-23
Publication date: 2010-08-05
Anticipated expiration: 2029-01-23
Also published as: JP5332647B2

Abstract

PROBLEM TO BE SOLVED: To achieve fast model selection for the model selection problem of mixture distributions.
SOLUTION: A model selection apparatus includes: mixture number setting means for selecting a candidate value of the number of mixture components that has not yet been optimized; initialization means for initializing the cluster assignment of the data for the selected number of mixture components; expectation calculation means for calculating the expected cluster assignment; optimization means for calculating the expectation of the information criterion with respect to the posterior probability of the cluster assignment and optimizing it; information criterion calculation means for calculating the information criterion value; optimality determination means for repeating the expectation calculation if the optimization is determined to be incomplete; mixture number determination means for determining whether parameters optimizing the expected information criterion have been calculated for every candidate value of the number of mixture components; and optimal distribution selection means for comparing the optimized information criteria and selecting the optimal number of mixture components as the optimal model.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a model selection apparatus, a selection method for the model selection apparatus, and a program that achieve fast model selection using the expected value of an information criterion.

A mixture distribution is a model that represents the distribution of data as a combination of several distributions, and it is an important model for industrial data modeling. Examples include Gaussian mixture models and mixtures of hidden Markov models.

In general, once the number of mixture components and the type of each component are fixed, the distribution parameters can be estimated with known techniques such as the EM algorithm (see, for example, Non-Patent Document 1).
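For reference, EM with a fixed number of components can be sketched for a one-dimensional Gaussian mixture. This is the standard textbook form, not the patent's own procedure; the function name, the quantile-based initialization, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def em_gmm_1d(x, K, n_iter=200):
    """Fit a 1-D Gaussian mixture with a fixed number of components K by EM."""
    pi = np.full(K, 1.0 / K)                           # mixing ratios pi_k
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)      # spread initial means over the data
    var = np.full(K, x.var())                          # initial variances
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(z_ik = 1 | x_i), computed in log space
        log_p = np.log(pi) - 0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9  # small floor avoids zero variance
    return pi, mu, var


rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
pi, mu, var = em_gmm_1d(x, K=2)
print(np.round(mu, 2))  # means close to -5 and 5
```

The number of components K is an input here; choosing K (and the component types) is exactly the model selection problem discussed next.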

To estimate the parameters, the number of mixture components and the type of each component must first be determined. This problem of specifying the form of the model is generally called the "model selection problem" or the "system identification problem". It is critical for building reliable models, and various techniques for it have been proposed.

For model selection in mixture distributions, the usual approach is to compute a model selection criterion, such as the minimum description length (MDL) (see, for example, Non-Patent Document 2), the Akaike Information Criterion (AIC) (see, for example, Non-Patent Document 3), or cross-validation, for each of several candidate models, and to select the model that optimizes the criterion. Models that optimize an information criterion are known to have desirable statistical properties, such as consistency with the true distribution for the MDL criterion and minimum prediction error for AIC.
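A minimal sketch of criterion-based selection: given each candidate's maximized log-likelihood ln L and its number of free parameters k, compute AIC = -2 ln L + 2k and an asymptotic MDL code length -ln L + (k/2) ln N, then pick the minimizer. The log-likelihood values below are made up for illustration (k = 3K - 1, as for a one-dimensional Gaussian mixture with K components).

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: -2 ln L + 2k (smaller is better)."""
    return -2.0 * log_lik + 2.0 * k


def mdl(log_lik, k, n):
    """Asymptotic two-part MDL code length in nats: -ln L + (k / 2) ln n (smaller is better)."""
    return -log_lik + 0.5 * k * math.log(n)


# Hypothetical fits: candidate name -> (maximized log-likelihood, number of free parameters)
fits = {"K=1": (-2100.0, 2), "K=2": (-2000.0, 5), "K=3": (-1995.0, 8)}
n = 1000

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_mdl = min(fits, key=lambda m: mdl(*fits[m], n))
print(best_aic, best_mdl)  # K=3 K=2
```

With these numbers AIC picks the larger model while MDL picks K = 2, because the MDL penalty per parameter grows with ln N whereas the AIC penalty does not.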

However, although the prior art could optimize the number of mixture components, optimizing the types of the mixed components as well was practically impossible because of the computational cost.

The computational problem can be illustrated by the selection of a mixture of polynomial curves. Polynomial curves come in multiple degrees: a straight line (degree 1), a quadratic curve, a cubic curve, and so on.

When the optimal model is selected by searching the number of mixture components from 1 to Cmax and the curve degree from 1 to Dmax, the prior art must compute the information criterion for every candidate model, for example one straight line and two quadratic curves (three components), or three cubic curves and two quartic curves (five components). The number of candidate models is roughly 100,000 for Cmax = 10 and Dmax = 10, and in the tens of billions for Cmax = 20 and Dmax = 20; it grows exponentially with the complexity of the models to be searched.
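The growth can be checked by counting, assuming that a candidate model is determined by the multiset of its component degrees (the order of the components does not matter). For K components and Dmax degrees there are C(K + Dmax - 1, K) such multisets. The exact totals (184,755 for Cmax = Dmax = 10 and 137,846,528,819 for Cmax = Dmax = 20) are somewhat larger than the rounded figures above but show the same explosive growth. The function name is hypothetical.

```python
from math import comb


def count_candidate_models(c_max, d_max):
    """Number of mixtures with 1..c_max polynomial components of degree 1..d_max,
    counting a model as a multiset of component degrees."""
    # Multisets of size K drawn from d_max degree types: C(K + d_max - 1, K)
    return sum(comb(K + d_max - 1, K) for K in range(1, c_max + 1))


print(count_candidate_models(10, 10))  # 184755
print(count_candidate_models(20, 20))  # 137846528819
```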

To address this, Non-Patent Document 4 proposes a two-stage technique that first optimizes the number of mixture components and then optimizes the component types.

Non-Patent Document 1: Christopher M. Bishop, "Pattern Recognition and Machine Learning", New edition, Springer-Verlag, August 17, 2006, pp. 438-441.
Non-Patent Document 2: Kenji Yamanishi and Te Sun Han, "Introduction to MDL: From the Viewpoint of Information Theory", Journal of the Japanese Society for Artificial Intelligence, Vol. 7, No. 3, May 1992, pp. 427-434.
Non-Patent Document 3: Hidetoshi Shimodaira and three others, "Model Selection: The Intersection of Prediction, Testing, and Estimation" (Frontiers of Statistical Science, Vol. 3), Iwanami Shoten, December 2004, pp. 24-25.
Non-Patent Document 4: Yue Wang, Lan Luo, Matthew T. Freedman, and Sun-Yuan Kung, "Probabilistic Principal Component Subspaces: A Hierarchical Finite Mixture Model for Data Visualization", IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000, pp. 625-636.

However, heuristic methods such as the one described in Non-Patent Document 4 lose the desirable statistical properties of the information criterion.

The present invention was made in view of these problems, and its object is to achieve fast model selection for the model selection problem of mixture distributions.

To solve the above problems, a model selection apparatus according to the present invention comprises: mixture number setting means for selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized; initialization means for initializing the cluster assignment of the data using the number of mixture components selected by the mixture number setting means; expected value calculation means for calculating the expected cluster assignment for the data initialized by the initialization means; optimization means for using the expected cluster assignment calculated by the expected value calculation means to compute the expectation of the information criterion with respect to the posterior probability of the cluster assignment, and for calculating, for each component, the component type and parameters that optimize it; information criterion calculation means for calculating the information criterion value of the distribution optimized by the optimization means; optimality determination means for judging whether the information criterion calculated by the information criterion calculation means is optimal and, if it is not, causing the expected value calculation means to calculate the expected value again; mixture number determination means for determining, once the optimality determination means has judged the result optimal, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not, sending the data back to the mixture number setting means; and optimal distribution selection means for comparing the information criteria of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.

A selection method for the model selection apparatus according to the present invention comprises: a mixture number setting step of selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized; an initialization step of initializing the cluster assignment of the data using the number of mixture components selected in the mixture number setting step; an expected value calculation step of calculating the expected cluster assignment for the data initialized in the initialization step; an optimization step of using the expected cluster assignment calculated in the expected value calculation step to compute the expectation of the information criterion with respect to the posterior probability of the cluster assignment, and calculating, for each component, the component type and parameters that optimize it; an information criterion calculation step of calculating the information criterion value of the distribution optimized in the optimization step; an optimality determination step of judging whether the information criterion calculated in the information criterion calculation step is optimal and, if it is not, performing the expected value calculation step again; a mixture number determination step of determining, once the optimality determination step has judged the result optimal, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not, repeating the processing from the mixture number setting step; and an optimal distribution selection step of comparing the information criteria of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.

A program according to the present invention causes a computer to execute: a mixture number setting process of selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized; an initialization process of initializing the cluster assignment of the data using the number of mixture components selected in the mixture number setting process; an expected value calculation process of calculating the expected cluster assignment for the data initialized in the initialization process; an optimization process of using the expected cluster assignment calculated in the expected value calculation process to compute the expectation of the information criterion with respect to the posterior probability of the cluster assignment, and calculating, for each component, the component type and parameters that optimize it; an information criterion calculation process of calculating the information criterion value of the distribution optimized in the optimization process; an optimality determination process of judging whether the information criterion calculated in the information criterion calculation process is optimal and, if it is not, performing the expected value calculation process again; a mixture number determination process of determining, once the optimality determination process has judged the result optimal, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not, repeating the processing from the mixture number setting process; and an optimal distribution selection process of comparing the information criteria of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.

The present invention makes it possible to achieve fast model selection for mixture distributions while preserving the statistical properties of the information criterion.

FIG. 1 is a block diagram of the model selection apparatus according to an embodiment of the present invention. FIG. 2 is a flowchart of the model selection apparatus according to the embodiment of the present invention.

The best mode for carrying out the invention is described in detail below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram of the model selection apparatus according to an embodiment of the present invention. The model selection apparatus 100 comprises a data input unit 101, a mixture number setting unit 102, a distribution initialization processing unit 103, a cluster assignment expected value calculation processing unit 104, an expected information criterion optimization processing unit 105, an information criterion calculation processing unit 106, an optimality determination processing unit 107, a mixture number loop end determination processing unit 108, an optimal distribution selection processing unit 109, and a model selection result output unit 110.

When input data 111 is supplied, the model selection apparatus 100 optimizes the number of mixture components and the type of each component for the input data 111 and outputs the result as a model selection result 112.

The data input unit 101 is a functional unit for receiving the input data 111. The input data 111 also includes the parameters required for model selection, such as the candidate types of the components to be mixed and the candidate values for the number of mixture components.

The mixture number setting unit 102 selects the number of mixture components of the model from the input candidate values and sets it. Hereinafter, the set number of mixture components is denoted by K.

The distribution initialization processing unit 103 performs the initialization required for estimation. Any initialization method may be used.

The cluster assignment expected value calculation processing unit 104 calculates, for each input data point, the expected value of which component of the mixture distribution it belongs to (this membership is called the cluster assignment).

The expected information criterion optimization processing unit 105 uses the calculated expected cluster assignments to determine, for each component, the component type that optimizes the expected information criterion, together with the corresponding parameters.

Here, the expected information criterion is the expectation of the information criterion with respect to the posterior probability of the cluster assignment. The key point of this processing is that each component can be optimized individually, without considering combinations across all components. Consequently, even when the number of candidate mixture numbers and the number of component types grow so that the number of model combinations grows exponentially, model selection can be performed in time linear in the number of candidate mixture numbers and in the number of component types. Any optimization method may be used for the per-component optimization.
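The linear-order claim can be illustrated with a small sketch, assuming that, with the responsibilities held fixed, the expected criterion is a sum of per-component terms cost[k][s] (component k, candidate type s). The cost table is made up; the point is that minimizing each row separately finds the same combination as brute force over all |S|^K combinations, using K·|S| evaluations instead of |S|^K. Function names are hypothetical.

```python
from itertools import product

# Hypothetical expected-criterion values: cost[k][s] for component k and candidate type s,
# computed with the responsibilities held fixed.
cost = [
    [12.0, 9.5, 10.1],
    [7.2, 8.0, 6.9],
    [15.0, 14.2, 14.8],
    [3.3, 4.1, 3.9],
]


def joint_search(cost):
    """Brute force over all |S|^K type combinations; total cost is the sum over components."""
    K, S = len(cost), len(cost[0])
    best = min(product(range(S), repeat=K),
               key=lambda combo: sum(cost[k][s] for k, s in enumerate(combo)))
    return list(best), S ** K


def per_component_search(cost):
    """Minimize each component's own term independently: K * |S| evaluations."""
    choice = [min(range(len(row)), key=row.__getitem__) for row in cost]
    return choice, sum(len(row) for row in cost)


print(joint_search(cost))          # ([1, 2, 1, 0], 81)
print(per_component_search(cost))  # ([1, 2, 1, 0], 12)
```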

The information criterion calculation processing unit 106 calculates the information criterion for the model whose component types and parameters have been optimized.

The optimality determination processing unit 107 compares the information criterion value calculated in the current iteration with the value calculated in the previous iteration to determine whether the optimization has converged. If it has not converged, processing loops back to the cluster assignment expected value calculation processing unit 104.

The mixture number loop end determination processing unit 108 determines whether the optimal information criterion value has been calculated for every input candidate value of the number of mixture components. If not, processing loops back to the mixture number setting unit 102.

The optimal distribution selection processing unit 109 compares the information criterion values calculated for all candidate values of the number of mixture components and selects the number whose criterion value is optimal. Because the type and parameters of each component have already been optimized for this number, the corresponding distribution is selected as the optimal distribution.

The model selection result output unit 110 outputs the optimal number of mixture components, the component types, the parameters, and so on as the model selection result 112.

FIG. 2 is a flowchart of the embodiment of the present invention. The model selection apparatus 100 according to this embodiment operates roughly as follows.

First, when input data 111 is supplied to the data input unit 101 (step S100), the mixture number setting unit 102 selects and sets, from the input candidate values, a number of mixture components that has not yet been optimized (step S101).

Next, the distribution initialization processing unit 103 initializes the cluster assignment of the data for the number of mixture components set by the mixture number setting unit 102 (step S102).

Next, the cluster assignment expected value calculation processing unit 104 calculates the expected cluster assignment of the data under the distribution currently being optimized for the specified number of mixture components (step S103). Immediately after initialization, optimization of the distribution has not yet started, so this step is skipped.

Next, for each component, the expected information criterion optimization processing unit 105 uses the calculated expected cluster assignments to compute the expectation of the information criterion with respect to the posterior probability of the cluster assignment, and determines the component type and parameters that optimize it (step S104).

Next, the information criterion calculation processing unit 106 calculates the information criterion value of the distribution optimized in step S104 (step S105).

Next, the optimality determination processing unit 107 compares the information criterion value calculated in step S105 with the value calculated in the previous iteration and determines whether the optimization has finished (step S106).

If the optimization is judged not to have finished, steps S103 to S106 are repeated. If it has finished, the mixture number loop end determination processing unit 108 determines whether optimization has been completed, and the information criterion value calculated, for all candidate values of the number of mixture components (step S107). If optimization has not been completed for all candidates, steps S101 to S107 are repeated.

When optimization has been completed for all candidates, the optimal distribution selection processing unit 109 compares the optimized information criterion values for the different numbers of mixture components and selects the number whose value is optimal as the optimal model (step S108). For the selected model, the component types and parameters have already been optimized in steps S103 to S106, so a distribution with the optimal number of mixture components and the optimal component types is obtained.
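Steps S101 to S108 can be sketched for the running example of a mixture of polynomial curves. This is a minimal sketch under several assumptions not taken from this document: Gaussian noise around each curve; an MDL-like expected criterion made of the negative expected log-likelihood, the code length of the cluster assignments, a per-component penalty (M_k / 2) ln N_k, and ((K - 1) / 2) ln N for the mixing ratios; initialization by splitting the sorted y values into K groups; and a hypothetical `var_floor` that keeps a component from collapsing onto a few points. All function names are hypothetical.

```python
import numpy as np


def log_gauss(res, var):
    """Elementwise log N(res; 0, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + res ** 2 / var)


def optimize_component(x, y, r, degrees, var_floor):
    """S104 for one component: pick the degree minimizing its expected criterion."""
    nk = r.sum()
    best = None
    for d in degrees:
        coef = np.polyfit(x, y, d, w=np.sqrt(r + 1e-12))  # weighted least squares
        res = y - np.polyval(coef, x)
        var = max((r * res ** 2).sum() / max(nk, 1e-12), var_floor)
        m_k = d + 2                          # d + 1 coefficients plus the noise variance
        cost = -(r * log_gauss(res, var)).sum() + 0.5 * m_k * np.log(max(nk, 1.0))
        if best is None or cost < best[0]:
            best = (cost, d, coef, var)
    return best


def fit_for_k(x, y, K, degrees, var_floor=0.05, max_iter=200, tol=1e-6):
    n = len(y)
    # S102: initialize hard cluster assignments by splitting the sorted y values into K groups
    r = np.zeros((n, K))
    for k, idx in enumerate(np.array_split(np.argsort(y), K)):
        r[idx, k] = 1.0
    prev = np.inf
    for it in range(max_iter):
        if it > 0:  # S103: E-step (skipped right after initialization)
            log_p = np.log(pi) + np.column_stack(
                [log_gauss(y - np.polyval(c[2], x), c[3]) for c in comps])
            log_p -= log_p.max(axis=1, keepdims=True)
            r = np.exp(log_p)
            r /= r.sum(axis=1, keepdims=True)
        # S104: each component is optimized independently of the others
        comps = [optimize_component(x, y, r[:, k], degrees, var_floor) for k in range(K)]
        pi = np.maximum(r.sum(axis=0), 1e-12) / n
        # S105: component costs + code length of z + penalty for the K - 1 free mixing ratios
        crit = (sum(c[0] for c in comps) - (r * np.log(pi)).sum()
                + 0.5 * (K - 1) * np.log(n))
        if abs(prev - crit) < tol:  # S106: the criterion has stopped changing
            break
        prev = crit
    return crit, [c[1] for c in comps]


def select_model(x, y, k_candidates, degrees):
    # S101/S107: loop over every candidate K; S108: keep the K with the smallest criterion
    results = {K: fit_for_k(x, y, K, degrees) for K in k_candidates}
    best_k = min(results, key=lambda K: results[K][0])
    return best_k, results


# Hypothetical data: a line and a parabola, well separated in y
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 300)
y = np.where(np.arange(300) < 150, 2 * x + 10, x ** 2 - 10) + rng.normal(0, 0.5, 300)
best_k, results = select_model(x, y, k_candidates=[1, 2, 3], degrees=[1, 2, 3])
print(best_k, sorted(results[best_k][1]))
```

On this data the criterion for K = 2 should be far below that for K = 1, and the selected degrees would typically be 1 and 2, although with a finite sample an extra degree can occasionally win. Each iteration costs K × |degrees| weighted fits rather than |degrees|^K joint combinations.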

Finally, the model selection result output unit 110 outputs the model selection result 112 (step S109).

Models to which the proposed model selection apparatus can be applied are described concretely below.

An example in which the MDL criterion is used as the information criterion is described.

First, the mixture distribution to be learned is expressed, for the random variable X corresponding to the input data, by (Equation 1): P(X; θ) = Σ_{k=1}^{K} π_k P(X; η_k). Here, π_k is the mixing ratio of the k-th component, η_k is the parameter of the distribution of the k-th component, θ = [π, η_1, …, η_K], and π = (π_1, …, π_K). The distribution P(X; η_k) of each component is an element of the set S of candidate components; for example, (Equation 1) may mix several different distributions, such as normal and exponential distributions.

Next, the MDL criterion selects as optimal the model that minimizes the sum l(x^N | M) + l(M) of the description length of the data and the description length of the model, as expressed by (Equation 2). When the MDL criterion is used as the information criterion, the information criterion calculation processing unit 106 stores the method for calculating the MDL criterion and calculates the MDL criterion value of the distribution according to (Equation 2).

Here, l is a description length function, x^N = (x_1, …, x_N) is the input data set, and M is a model. x_i denotes a single data point, and X is the random variable corresponding to the data. For example, if the distribution of X determined by model M is P(X; θ), where θ is the distribution parameter, l(x^N | M) can be calculated as in (Equation 3) or (Equation 4).

Here, log denotes the base-2 logarithm and ln the natural logarithm. A hat (^) indicates that a parameter is the maximum likelihood estimate, and I(θ) is the Fisher information matrix. Various coding methods have been proposed for the description length functions l(x^N | M) and l(M), depending on the type of M, and any of them may be used in the embodiment of the present invention.
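As a concrete illustration of a two-part description length in bits, here is a sketch for a Bernoulli model, assuming the simple choice l(x^N | M) = -log P(x^N; θ^) and l(M) = (k/2) log N for k free parameters. This drops the Fisher-information refinement mentioned above and is not necessarily the form of (Equation 3) or (Equation 4); the function name is hypothetical.

```python
import math


def two_part_mdl_bits(heads, n, fit_p):
    """Two-part code length in bits, l(x^N | M) + l(M), for n binary outcomes.
    M0 (fit_p=False): p fixed at 0.5, no free parameters.
    M1 (fit_p=True): p estimated by maximum likelihood, costing (1/2) log2 n bits."""
    if not fit_p:
        return float(n)  # each outcome costs exactly 1 bit when p = 0.5
    p = heads / n
    data_bits = -sum(c * math.log2(q) for c, q in ((heads, p), (n - heads, 1 - p)) if c > 0)
    return data_bits + 0.5 * math.log2(n)


for heads in (50, 80):
    m0 = two_part_mdl_bits(heads, 100, fit_p=False)
    m1 = two_part_mdl_bits(heads, 100, fit_p=True)
    print(heads, round(m0, 2), round(m1, 2), "M1" if m1 < m0 else "M0")
```

With 50 heads out of 100 the fitted parameter does not pay for its own description (103.32 bits against 100), so MDL keeps the simpler model; with 80 heads it does (about 75.51 bits).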

Let z_i be the cluster assignment of data point x_i, and let z^N = (z_1, …, z_N). Here z_i = (z_i1, …, z_iK), where z_ik is a variable that takes the value 1 if x_i belongs to the k-th cluster and 0 otherwise. Because z^N cannot actually be observed, it is often called a hidden (latent) variable in the learning of mixture distributions.

The cluster assignment expected value calculation processing unit 104 stores the method for calculating the expectation with respect to the posterior probability of the cluster assignment given the data x^N. The posterior probability of the cluster assignment depends on P(X; θ) and can be calculated by any known method. Hereinafter, E_z[A] denotes the expectation of A with respect to the posterior probability of the cluster assignment.
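For a mixture, the posterior of the cluster assignment has the standard form E_z[z_ik] = π_k P(x_i; η_k) / Σ_j π_j P(x_i; η_j). A minimal, numerically stable sketch follows (the function name is hypothetical); summing its columns gives the expected cluster sizes E_z[N_k] = Σ_i E_z[z_ik].

```python
import numpy as np


def responsibilities(log_pi, log_lik):
    """Posterior cluster assignments E_z[z_ik] for a mixture.
    log_pi: shape (K,), log mixing ratios; log_lik: shape (N, K), log P(x_i; eta_k)."""
    a = log_pi + log_lik
    a = a - a.max(axis=1, keepdims=True)  # log-sum-exp shift: avoids 0/0 when all densities underflow
    r = np.exp(a)
    return r / r.sum(axis=1, keepdims=True)


log_pi = np.log([0.5, 0.5])
log_lik = np.array([[0.0, np.log(3.0)],     # second component is 3 times as likely
                    [-1000.0, -1001.0]])    # naive exp() would underflow to 0 / 0 here
r = responsibilities(log_pi, log_lik)
print(r)               # rows approx [0.25, 0.75] and [0.731, 0.269]
print(r.sum(axis=0))   # expected cluster sizes E_z[N_k]
```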

The description length for coding both x^N and z^N is calculated by (Equation 5).

As with l(x^N | M) and l(M), any description length function may be used for l(x^N, z^N; M). Examples include (Equation 6) and (Equation 7).

Here, M_k is the dimension of η_k, and N_k is the number of data points belonging to the k-th cluster, which can be calculated by (Equation 8), N_k = Σ_{i=1}^{N} z_ik. P(z_i; π_k) denotes the probability that the cluster assignment of the i-th data point takes the value 1 or 0.

The expected description length calculated by the expected description length optimization processing unit 105 (the expected information criterion optimization processing unit 105 when MDL is used) is the expectation of l(x^N, z^N; M) with respect to the posterior probability of the cluster assignment, E_z[l(x^N, z^N; M)]. Because the parameters and models that optimize the expected description length have no dependencies between components, they can be optimized component by component. Any known technique may be used to estimate the distribution parameters of each component.
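Why the expected description length separates by component can be seen under an assumed form of l(x^N, z^N; M), since (Equation 6) and (Equation 7) are not reproduced here: suppose l(x^N, z^N; M) = -Σ_i Σ_k z_ik [ln π_k + ln P(x_i; η_k)] + Σ_k (M_k / 2) ln N_k + ((K - 1) / 2) ln N. Taking E_z and approximating E_z[ln N_k] by the logarithm of its expectation gives

```latex
\mathbb{E}_{z}\!\left[l(x^N, z^N; M)\right] \approx
\sum_{k=1}^{K}\Big[-\bar N_k \ln \pi_k
  - \sum_{i=1}^{N} r_{ik}\ln P(x_i;\eta_k)
  + \tfrac{M_k}{2}\ln \bar N_k\Big]
  + \tfrac{K-1}{2}\ln N,
\qquad r_{ik} = \mathbb{E}_z[z_{ik}],\quad \bar N_k = \sum_{i=1}^{N} r_{ik}.
```

With the responsibilities r_ik fixed, each bracket depends only on π_k and on the type and parameters η_k of component k, so the components can be optimized separately; minimizing over π on the simplex gives π_k = N̄_k / N.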

(Mixtures of distributions with different independence structures)
With the model selection apparatus of this embodiment, for mixtures of multidimensional distributions whose components have different independence structures, the number of mixture components and the independence structure of each component can be optimized quickly.

For example, a three-dimensional normal distribution admits five independence structures: all dimensions independent, only the first dimension independent, only the second dimension independent, only the third dimension independent, and all dimensions dependent. A mixture of distributions with such diverse independence structures can be calculated.
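The five structures listed above are exactly the ways of partitioning the dimensions {1, 2, 3} into groups of mutually dependent dimensions (a block-diagonal covariance), so for d dimensions the number of candidate structures per component is the Bell number (5 for d = 3, 15 for d = 4). A small sketch that enumerates them, with hypothetical function names and 0-based dimension indices:

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks.
    Each block is a group of mutually dependent dimensions; different blocks are independent."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part


def covariance_mask(partition, d):
    """d x d boolean mask of covariance entries allowed to be non-zero (block-diagonal)."""
    mask = [[False] * d for _ in range(d)]
    for block in partition:
        for a in block:
            for b in block:
                mask[a][b] = True
    return mask


structures = list(set_partitions([0, 1, 2]))
print(len(structures))  # 5
for p in structures:
    print(p)
```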

The model selection apparatus of this embodiment is not limited to multidimensional normal distributions and can be applied to any multidimensional distribution.

(Mixtures of heterogeneous distributions)
With the model selection apparatus of this embodiment, for a mixture of several different types of distributions, the number of mixture components and the distribution type of each component can be optimized.

For example, when the normal distribution and the exponential distribution are the candidate component distributions, a mixture in which the numbers of normal and exponential components are optimized can be computed.

The model selection apparatus according to the present embodiment is not limited to the normal and exponential example; it can be applied to any set of candidate distribution types.
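A minimal sketch of the per-component choice between the two candidate families, assuming one-dimensional data, posterior weights `w` for a single component, closed-form weighted maximum-likelihood estimates, and an assumed (d/2) log N_k parameter cost (d = 2 for the normal, d = 1 for the exponential). The function names are hypothetical. Because the expected criterion separates by component, the choice is made for each component independently, and the numbers of normal and exponential components follow from the individual choices.

```python
import math

def weighted_normal_cost(x, w):
    """Weighted normal NLL at the weighted MLE plus an assumed (2/2) log N_k cost."""
    nk = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / nk
    var = max(sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / nk, 1e-6)
    nll = 0.5 * nk * (math.log(2 * math.pi * var) + 1.0)
    return nll + 0.5 * 2 * math.log(nk)

def weighted_exponential_cost(x, w):
    """Weighted exponential NLL at the weighted MLE plus an assumed (1/2) log N_k cost."""
    if any(wi > 0 and xi < 0 for wi, xi in zip(w, x)):
        return math.inf                           # support is x >= 0
    nk = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / nk
    rate = 1.0 / max(mean, 1e-12)
    nll = nk * (1.0 - math.log(rate))             # -sum w log(rate) + rate * sum w x
    return nll + 0.5 * 1 * math.log(nk)

def choose_family(x, w):
    """Return the candidate family with the smaller cost for this component."""
    costs = {"normal": weighted_normal_cost(x, w),
             "exponential": weighted_exponential_cost(x, w)}
    return min(costs, key=costs.get), costs

print(choose_family([0.1, 0.1, 0.1, 0.1, 3.0], [1.0] * 5)[0])
```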

(Mixtures of different types of stochastic regression functions)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the regression function of each component can be optimized for a mixture of different types of stochastic regression functions.

For example, consider a mixture model of regression curves given by polynomial curves (polynomial surfaces in the multidimensional case). In this case, polynomial curves of different degrees are the candidates for each component. With the model selection apparatus proposed in the present invention, the number of mixture components and the polynomial degree of each component can be optimized.

The model selection apparatus according to the present embodiment is not limited to polynomial curves; it can be applied to a mixture model of any combination of regression function types.
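The following Python sketch shows how the polynomial degree of one component could be chosen, assuming one-dimensional inputs, Gaussian residuals, posterior weights `w` for that component, weighted least squares, and an assumed (d/2) log N_k cost with d = degree + 2 (coefficients plus noise variance). The function names are hypothetical and the penalty is illustrative, not the specification's exact criterion.

```python
import math

def solve(a, b):
    """Solve the linear system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def weighted_polyfit_cost(x, y, w, degree):
    """Weighted least-squares polynomial fit plus an assumed (d/2) log N_k cost."""
    feats = [[xi ** p for p in range(degree + 1)] for xi in x]
    a = [[sum(wi * f[i] * f[j] for wi, f in zip(w, feats)) for j in range(degree + 1)]
         for i in range(degree + 1)]
    b = [sum(wi * f[i] * yi for wi, f, yi in zip(w, feats, y)) for i in range(degree + 1)]
    coef = solve(a, b)
    nk = sum(w)
    resid = [yi - sum(c * fi for c, fi in zip(coef, f)) for f, yi in zip(feats, y)]
    var = max(sum(wi * r * r for wi, r in zip(w, resid)) / nk, 1e-6)
    nll = 0.5 * nk * (math.log(2 * math.pi * var) + 1.0)
    return nll + 0.5 * (degree + 2) * math.log(nk), coef

def best_degree(x, y, w, candidates):
    """Candidate degree with the smallest cost, and its coefficients."""
    scored = {d: weighted_polyfit_cost(x, y, w, d) for d in candidates}
    d = min(scored, key=lambda k: scored[k][0])
    return d, scored[d][1]

xs = list(range(-5, 6))
ys = [xi ** 2 + 0.1 * (-1) ** i for i, xi in enumerate(xs)]   # quadratic plus small noise
deg, coef = best_degree(xs, ys, [1.0] * len(xs), [0, 1, 2])
print(deg, coef)
```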

(Mixtures of different types of probabilistic discriminant functions)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the classification function of each component can be optimized for a mixture of different types of probabilistic classification functions.

For example, consider a mixture model of classification functions of different orders, such as linear and quadratic discriminant functions. In this case, classification functions of different orders are the candidates for each component.

With the model selection apparatus according to the present embodiment, the number of mixture components and the order of each component's classification function can be optimized. The apparatus is not limited to classification functions of different orders; it can be applied to a mixture model of any combination of classification function types.
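The sketch below illustrates the choice between a linear and a quadratic discriminant for one component, assuming one-dimensional inputs, two classes, and generative Gaussian class-conditional models: a shared variance gives a linear decision boundary and separate variances give a quadratic one. Scoring by the weighted joint likelihood, the (d/2) log N_k cost, and the function names are all illustrative assumptions.

```python
import math

def discriminant_cost(x, labels, w, shared_variance):
    """Weighted negative joint log-likelihood of a two-class 1-D Gaussian
    discriminant model plus an assumed (d/2) log N_k cost.
    shared_variance=True gives a linear discriminant, False a quadratic one."""
    nk = sum(w)
    nll = 0.0
    stats = []
    for c in (0, 1):
        wc = [wi if yi == c else 0.0 for wi, yi in zip(w, labels)]
        n_c = sum(wc)
        mu = sum(wi * xi for wi, xi in zip(wc, x)) / n_c
        ss = sum(wi * (xi - mu) ** 2 for wi, xi in zip(wc, x))
        stats.append((n_c, ss))
        nll -= n_c * math.log(n_c / nk)          # class-prior term
    if shared_variance:
        var = max((stats[0][1] + stats[1][1]) / nk, 1e-6)
        nll += 0.5 * nk * (math.log(2 * math.pi * var) + 1.0)
        d = 4                                    # prior, two means, one variance
    else:
        for n_c, ss in stats:
            var = max(ss / n_c, 1e-6)
            nll += 0.5 * n_c * (math.log(2 * math.pi * var) + 1.0)
        d = 5                                    # prior, two means, two variances
    return nll + 0.5 * d * math.log(nk)

def choose_discriminant(x, labels, w):
    """Pick the discriminant order with the smaller cost."""
    lin = discriminant_cost(x, labels, w, shared_variance=True)
    quad = discriminant_cost(x, labels, w, shared_variance=False)
    return "linear" if lin <= quad else "quadratic"

print(choose_discriminant([-0.1, 0.0, 0.1, 0.0, 10.0, 20.0], [0, 0, 0, 1, 1, 1], [1.0] * 6))
```

When the class variances match, both models reach the same likelihood and the penalty favors the linear form; very unequal variances favor the quadratic form.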

(Mixtures of hidden Markov models with different numbers of hidden states)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the number of hidden states of each component can be optimized for a mixture of hidden Markov models with different numbers of hidden states.
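For the hidden Markov model case, a component's cost for each candidate number of hidden states needs the sequence log-likelihood and a parameter count. The Python sketch below computes both for discrete emissions using the scaled forward algorithm. Fitting each candidate HMM (for example with the Baum-Welch algorithm) is omitted: the sketch assumes parameters are already estimated for each candidate state count, and the (d/2) log N_k cost and the function names are hypothetical. A component would keep the candidate with the smallest cost.

```python
import math

def forward_loglik(seq, init, trans, emit):
    """log p(seq) for a discrete-emission HMM via the scaled forward algorithm."""
    n_states = len(init)
    alpha = [init[s] * emit[s][seq[0]] for s in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for obs in seq[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs]
                 for s in range(n_states)]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def hmm_n_params(n_states, n_symbols):
    """Free parameters: initial distribution, transition rows, emission rows."""
    return (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)

def hmm_component_cost(seqs, weights, hmm, n_symbols):
    """Weighted NLL of the sequences under one fitted HMM (one mixture component)
    plus an assumed (d/2) log N_k parameter cost."""
    init, trans, emit = hmm
    nk = sum(weights)
    nll = -sum(w * forward_loglik(s, init, trans, emit) for s, w in zip(seqs, weights))
    return nll + 0.5 * hmm_n_params(len(init), n_symbols) * math.log(nk)

print(forward_loglik([0, 1, 1, 0], [1.0], [[1.0]], [[0.5, 0.5]]))
```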

(Mixtures of mixture distributions with different numbers of components)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the number of sub-components within each component can be optimized for a mixture of mixture distributions with different numbers of components.

For example, consider a mixture of normal mixture distributions, that is, a mixture whose components are themselves mixtures of normal distributions. With this method, the overall number of mixture components and the number of sub-components within each component can be optimized simultaneously.

The model selection apparatus according to the present embodiment is not limited to the normal example; it can be applied to a mixture model whose components are arbitrary mixture distributions.
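For a mixture of normal mixtures, the inner number of sub-components of one outer component can be chosen by fitting an inner mixture with the outer posterior weights held fixed. The Python sketch below assumes one-dimensional data, EM with quantile initialization and a variance floor, and an assumed (d/2) log N_k cost with d = 3m - 1 for m inner sub-components (means, variances, weights). These choices and the function names are illustrative, not taken from the specification.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def inner_mixture_loglik(x, w, m, iters=100, var_floor=1e-4):
    """EM fit of an m-sub-component 1-D normal mixture with fixed outer weights w;
    returns the weighted log-likelihood at the final parameters."""
    nk = sum(w)
    xs = sorted(xi for xi, wi in zip(x, w) if wi > 0)
    mus = [xs[int((j + 0.5) * len(xs) / m)] for j in range(m)]   # quantile start
    mean = sum(wi * xi for wi, xi in zip(w, x)) / nk
    var0 = max(sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / nk, var_floor)
    vars_, pis = [var0] * m, [1.0 / m] * m
    for _ in range(iters):
        # E-step: inner responsibilities, scaled by each point's outer weight
        r = []
        for xi, wi in zip(x, w):
            p = [pis[j] * normal_pdf(xi, mus[j], vars_[j]) for j in range(m)]
            s = sum(p) or 1e-300
            r.append([wi * pj / s for pj in p])
        # M-step: weighted maximum likelihood for each sub-component
        for j in range(m):
            rj = sum(row[j] for row in r) or 1e-300
            pis[j] = rj / nk
            mus[j] = sum(row[j] * xi for row, xi in zip(r, x)) / rj
            vars_[j] = max(sum(row[j] * (xi - mus[j]) ** 2 for row, xi in zip(r, x)) / rj,
                           var_floor)
    return sum(wi * math.log(sum(pis[j] * normal_pdf(xi, mus[j], vars_[j])
                                 for j in range(m)) or 1e-300)
               for xi, wi in zip(x, w))

def best_inner_count(x, w, candidates=(1, 2, 3)):
    """Inner sub-component count with the smallest assumed penalized cost."""
    nk = sum(w)
    cost = {m: -inner_mixture_loglik(x, w, m) + 0.5 * (3 * m - 1) * math.log(nk)
            for m in candidates}
    return min(cost, key=cost.get)

pts = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
print(best_inner_count(pts, [1.0] * len(pts), candidates=(1, 2)))
```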

(Mixtures of stochastic dimensionality-reduction models with different dimensions)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the reduced dimensionality of each component can be optimized for a mixture of stochastic dimensionality-reduction models with different numbers of dimensions.

For example, consider a mixture of probabilistic principal component analysis models. With this method, the number of principal components (the reduced dimensionality) of each component can be optimized together with the number of mixture components.

The model selection apparatus according to the present embodiment is not limited to probabilistic principal component analysis; it can be applied to a mixture of any stochastic dimensionality-reduction models.
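Probabilistic principal component analysis has a closed-form maximized log-likelihood (Tipping and Bishop): with the eigenvalues of the sample covariance sorted in decreasing order, the noise variance is the mean of the discarded eigenvalues. The Python sketch below uses that result to pick the number of principal components q for one component. It assumes the eigenvalues come from the component's posterior-weighted covariance with effective sample size N_k, and the (d/2) log N_k cost and the function names are illustrative assumptions.

```python
import math

def ppca_loglik(eigvals, n, q):
    """Maximized PPCA log-likelihood with q latent dimensions, given the
    covariance eigenvalues and effective sample size n."""
    lam = sorted(eigvals, reverse=True)
    d = len(lam)
    sigma2 = sum(lam[q:]) / (d - q)                  # ML noise variance
    log_det = sum(math.log(l) for l in lam[:q]) + (d - q) * math.log(sigma2)
    return -0.5 * n * (d * math.log(2 * math.pi) + log_det + d)

def ppca_n_params(d, q):
    """Mean, loading matrix up to rotation, and noise variance."""
    return d + d * q - q * (q - 1) // 2 + 1

def best_latent_dim(eigvals, n):
    """Number of principal components with the smallest assumed penalized cost."""
    d = len(eigvals)
    cost = {q: -ppca_loglik(eigvals, n, q) + 0.5 * ppca_n_params(d, q) * math.log(n)
            for q in range(d)}
    return min(cost, key=cost.get)

print(best_latent_dim([10.0, 1.0, 1.0, 1.0], 100))
```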

(Mixtures of different time-series models)
By using the model selection apparatus according to the present embodiment, the number of mixture components and the type of time-series model of each component can be optimized for a mixture of different time-series models.

For example, several time-series models, such as autoregressive, autoregressive moving-average, and seasonal variation models, can be used as candidates, and a model in which the number of components of each candidate type is optimized can be computed.

The model selection apparatus according to the present embodiment is not limited to the time-series models above; it can be applied to any time-series model.
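As a deliberately simplified illustration of choosing a model type per component, the Python sketch below compares two stand-in candidates on one series: an AR(1) model and a lag-`period` regression used as a crude seasonal model. The autoregressive moving-average candidate is omitted. It assumes each whole series belongs to one component (in a mixture its cost would be multiplied by that series' posterior weight), conditional least-squares fits on a common window, and an assumed (d/2) log n cost; the function names are hypothetical.

```python
import math

def lag_regression_cost(x, lag, start):
    """Conditional Gaussian cost of x_t = phi * x_{t-lag} + e_t over t >= start.

    phi is the least-squares estimate; the (d/2) log n cost with d = 2
    (phi and the noise variance) is an illustrative assumption.
    """
    ts = range(start, len(x))
    den = sum(x[t - lag] ** 2 for t in ts) or 1e-12
    phi = sum(x[t - lag] * x[t] for t in ts) / den
    n = len(ts)
    var = max(sum((x[t] - phi * x[t - lag]) ** 2 for t in ts) / n, 1e-6)
    return 0.5 * n * (math.log(2 * math.pi * var) + 1.0) + 0.5 * 2 * math.log(n)

def choose_series_model(x, period):
    """Pick between an AR(1) model and a seasonal lag-`period` model."""
    start = period                     # same conditioning window for both candidates
    costs = {
        "AR(1)": lag_regression_cost(x, 1, start),
        "seasonal": lag_regression_cost(x, period, start),
    }
    return min(costs, key=costs.get), costs

series = [1.0, 3.0, -1.0, -3.0] * 6    # period-4 pattern
print(choose_series_model(series, 4)[0])
```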

(Mixtures of time-series models with different orders)
When the model selection apparatus according to the present embodiment is used, the number of mixture components and the order of the time-series model of each component can be optimized for a mixture of time-series models with different orders.

For example, for a mixture of autoregressive models, the order of each autoregressive model can be optimized together with the number of mixture components. Likewise, for state-space models, the number of state variables of each component can be optimized together with the number of mixture components.
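For the autoregressive case, a minimal Python sketch of order selection for one component: conditional least-squares AR(p) fits without an intercept (a zero-mean assumption), a common conditioning window so every candidate is scored on the same points, a tiny ridge term for numerical safety, and an assumed (d/2) log n cost with d = p + 1. The state-space case is not covered, and all names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol

def ar_cost(x, p, start, ridge=1e-9):
    """Conditional least-squares AR(p) fit on t >= start plus an assumed (d/2) log n cost."""
    rows = [[x[t - i] for i in range(1, p + 1)] for t in range(start, len(x))]
    ys = x[start:]
    a = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(a, b)
    n = len(ys)
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(rows, ys)]
    var = max(sum(e * e for e in resid) / n, 1e-6)
    cost = 0.5 * n * (math.log(2 * math.pi * var) + 1.0) + 0.5 * (p + 1) * math.log(n)
    return cost, coef

def best_ar_order(x, orders):
    """AR order with the smallest cost, and its coefficients."""
    start = max(orders)
    costs = {p: ar_cost(x, p, start) for p in orders}
    p = min(costs, key=lambda k: costs[k][0])
    return p, costs[p][1]

series = [1.0, 2.0]
for _ in range(28):
    series.append(series[-1] - series[-2])   # x_t = x_{t-1} - x_{t-2}
order, coef = best_ar_order(series, [1, 2])
print(order, coef)
```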

Although the embodiments have been described above, various modifications and changes can be made to these embodiments and specific examples without departing from the broad spirit and scope of the present invention as defined in the claims.

DESCRIPTION OF SYMBOLS
101 Data input unit
102 Mixture number setting unit
103 Initialization processing unit
104 Cluster assignment expected value calculation processing unit
105 Expected information criterion optimization processing unit
106 Information criterion calculation processing unit
107 Optimality determination processing unit
108 Mixture number loop end determination processing unit
109 Optimal distribution selection processing unit
110 Model selection result output unit

Claims

A model selection apparatus comprising:
a mixture number setting means for selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized;
an initialization means for initializing the cluster assignment of the data using the number of mixture components selected by the mixture number setting means;
an expected value calculation means for calculating the expected value of the cluster assignment for the data initialized by the initialization means;
an optimization means for using the expected value of the cluster assignment calculated by the expected value calculation means to calculate the expected value of the information criterion with respect to the posterior probability of the cluster assignment, and for calculating the optimal component type and parameters for each component;
an information criterion calculation means for calculating the information criterion value of the distribution optimized by the optimization means;
an optimality determination means for determining the optimality of the information criterion value calculated by the information criterion calculation means and, when the value is determined not to be optimal, causing the expected value calculation means to calculate the expected value again;
a mixture number determination means for determining, for the data determined to be optimal by the optimality determination means, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not all candidate values have been optimized, passing the data to the mixture number setting means; and
an optimum distribution selection means for comparing the information criterion values of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.
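The loop recited in the claim above can be illustrated, under simplifying assumptions, for a one-dimensional Gaussian mixture: hard initialization by sorted position, per-component weighted maximum likelihood as the optimization step, and a penalized log-likelihood with an assumed (d/2) log N cost (d = 3k - 1) as the information criterion. The comments map each step to the units in the list of symbols; the function names are hypothetical, and this is a sketch, not the claimed apparatus itself.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_for_k(x, k, max_iter=200, tol=1e-8, var_floor=1e-4):
    """Inner loop for one candidate mixture number k (units 103 to 107)."""
    n = len(x)
    # Initialization (unit 103): hard cluster assignment by sorted position.
    q = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(sorted(range(n), key=lambda i: x[i])):
        q[i][rank * k // n] = 1.0
    prev = math.inf
    for _ in range(max_iter):
        # Optimization (unit 105): weighted ML parameters, each component separately.
        params = []
        for j in range(k):
            nk = sum(row[j] for row in q) or 1e-300
            mu = sum(row[j] * xi for row, xi in zip(q, x)) / nk
            var = max(sum(row[j] * (xi - mu) ** 2 for row, xi in zip(q, x)) / nk, var_floor)
            params.append((nk / n, mu, var))
        # Information criterion (unit 106): -log-likelihood + assumed (d/2) log N cost.
        nll = -sum(math.log(sum(p * normal_pdf(xi, m, v) for p, m, v in params) or 1e-300)
                   for xi in x)
        crit = nll + 0.5 * (3 * k - 1) * math.log(n)
        # Optimality check (unit 107): stop when the criterion no longer improves.
        if prev - crit < tol:
            break
        prev = crit
        # Expected cluster assignment (unit 104): posterior responsibilities.
        q = []
        for xi in x:
            dens = [p * normal_pdf(xi, m, v) for p, m, v in params]
            s = sum(dens) or 1e-300
            q.append([d / s for d in dens])
    return crit, params

def select_mixture_number(x, candidates):
    """Outer loop over mixture numbers (units 102 and 108) and selection (unit 109)."""
    results = {k: fit_for_k(x, k) for k in candidates}
    best = min(results, key=lambda k: results[k][0])
    return best, results[best][1]

data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
best, params = select_mixture_number(data, [1, 2])
print(best, params)
```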

The model selection apparatus according to claim 1, wherein the information criterion is the MDL (minimum description length) criterion.

The model selection apparatus according to claim 1, wherein the number of mixture components and the independence structure of each component are optimized for a mixture of multiple distributions with different independence structures over multidimensional data.

The model selection apparatus according to claim 1, wherein the number of mixture components and the distribution type of each component are optimized for a mixture of several different distributions.

The model selection apparatus according to claim 1, wherein the number of mixture components and the regression function of each component are optimized for a mixture of different types of stochastic regression functions.

The model selection apparatus according to claim 1, wherein the number of mixture components and the classification function of each component are optimized for a mixture of different types of probabilistic classification functions.

The model selection apparatus according to claim 1, wherein the number of mixture components and the number of hidden states of each component are optimized for a mixture of hidden Markov models with different numbers of hidden states.

The model selection apparatus according to claim 1, wherein the number of mixture components and the number of sub-components of each component are optimized for a mixture of mixture distributions with different numbers of components.

The model selection apparatus according to claim 1, wherein the number of mixture components and the reduced dimensionality of each component are optimized for a mixture of stochastic dimensionality-reduction models with different dimensions.

The model selection apparatus according to claim 1, wherein the number of mixture components and the type of time-series model of each component are optimized for a mixture of different time-series models.

The model selection apparatus according to claim 1, wherein the number of mixture components and the order of the time-series model of each component are optimized for a mixture of time-series models with different orders.

A selection method of a model selection device, comprising:
a mixture number setting step of selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized;
an initialization step of initializing the cluster assignment of the data using the number of mixture components selected in the mixture number setting step;
an expected value calculation step of calculating the expected value of the cluster assignment for the data initialized in the initialization step;
an optimization step of using the expected value of the cluster assignment calculated in the expected value calculation step to calculate the expected value of the information criterion with respect to the posterior probability of the cluster assignment, and calculating the optimal component type and parameters for each component;
an information criterion calculation step of calculating the information criterion value of the distribution optimized in the optimization step;
an optimality determination step of determining the optimality of the information criterion value calculated in the information criterion calculation step and, when the value is determined not to be optimal, performing the expected value calculation step again;
a mixture number determination step of determining, for the data determined to be optimal in the optimality determination step, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not all candidate values have been optimized, repeating the process from the mixture number setting step; and
an optimum distribution selection step of comparing the information criterion values of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.

A program causing a computer to execute:
a mixture number setting process of selecting, from the candidate values of the number of mixture components, a candidate value that has not yet been optimized;
an initialization process of initializing the cluster assignment of the data using the number of mixture components selected in the mixture number setting process;
an expected value calculation process of calculating the expected value of the cluster assignment for the data initialized in the initialization process;
an optimization process of using the expected value of the cluster assignment calculated in the expected value calculation process to calculate the expected value of the information criterion with respect to the posterior probability of the cluster assignment, and calculating the optimal component type and parameters for each component;
an information criterion calculation process of calculating the information criterion value of the distribution optimized in the optimization process;
an optimality determination process of determining the optimality of the information criterion value calculated in the information criterion calculation process and, when the value is determined not to be optimal, performing the expected value calculation process again;
a mixture number determination process of determining, for the data determined to be optimal in the optimality determination process, whether parameters optimizing the expected information criterion have been calculated for all candidate values of the number of mixture components and, if not all candidate values have been optimized, repeating the process from the mixture number setting process; and
an optimum distribution selection process of comparing the information criterion values of the parameters optimized for each number of mixture components and selecting the optimal number of mixture components as the optimal model.