JP5493867B2

JP5493867B2 - Statistical model learning apparatus, method and program

Info

Publication number: JP5493867B2
Application number: JP2009547010A
Authority: JP
Inventors: 祥史大西
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-12-20
Filing date: 2008-12-03
Publication date: 2014-05-14
Anticipated expiration: 2028-12-03
Also published as: WO2009081707A1; JPWO2009081707A1

Description

The present invention relates to a statistical model learning apparatus, method, and program, and more particularly to a technique for calculating a predictive distribution that takes into account the influence of the density of states of the model parameters.

Statistical models are used for phoneme models in speech recognition systems and for speaker models in speaker verification systems. Specifically, the hidden Markov model (hereinafter "HMM") is often used as a phoneme model, and the Gaussian mixture model (hereinafter "GMM") is often used as a speaker model.

One framework for learning these models is based on Bayesian estimation; a mathematical treatment is given, for example, in Non-Patent Document 1.

Let the learning data, which are learning examples of a random variable x, be written as Equation 1; the model parameters, which are the parameters of the statistical model, as Equation 2; the learning model, which gives the probability density function of the random variable x given the model parameters, as Equation 3; and the prior distribution of the model parameters as Equation 4. In the framework of Bayesian estimation, the posterior distribution, which is the probability density of the model parameters given the learning data, is then expressed by Equation 5. The denominator Z(X^n) in Equation 5 is called the partition function and is expressed by Equation 6.
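The numbered equations are not reproduced in this text. As a hedged reconstruction only, the standard Bayesian formulation suggests the forms below. The symbols X^n and Z(X^n) appear in the text and ω appears later in the description; the notation p(x | ω) for the learning model and φ(ω) for the prior is an assumption made here for illustration.

```latex
\begin{align}
X^n &= \{x_1, x_2, \dots, x_n\}
  && \text{(Equation 1: learning data)} \\
\omega &
  && \text{(Equation 2: model parameters)} \\
p(x \mid \omega) &
  && \text{(Equation 3: learning model)} \\
\varphi(\omega) &
  && \text{(Equation 4: prior distribution, notation assumed)} \\
p(\omega \mid X^n) &= \frac{1}{Z(X^n)}\,\varphi(\omega)\prod_{i=1}^{n} p(x_i \mid \omega)
  && \text{(Equation 5: posterior distribution)} \\
Z(X^n) &= \int \varphi(\omega)\prod_{i=1}^{n} p(x_i \mid \omega)\,d\omega
  && \text{(Equation 6: partition function)}
\end{align}
```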

In this case, the probability density function of the random variable x given the learning data is expressed, as the predictive distribution, by Equation 7.
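Equation 7 is not reproduced in this text. In the standard Bayesian formulation the predictive distribution is the learning model averaged over the posterior distribution of the model parameters, so a plausible form (with p(x | ω) as assumed notation for the learning model and p(ω | X^n) for the posterior) is:

```latex
\begin{equation}
p(x \mid X^n) = \int p(x \mid \omega)\, p(\omega \mid X^n)\, d\omega
\end{equation}
```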

However, because the integration over model parameters in Equation 6 is difficult, the methods commonly used in practice for learning probability density distributions are the maximum a posteriori estimation method (hereinafter "MAP method") and the maximum likelihood estimation method.

In the MAP method, the log-likelihood function is given by Equation 8, the model parameter that maximizes the log-likelihood function is taken as the estimated parameter of Equation 9, and the predictive distribution is calculated as Equation 10.

The maximum likelihood estimation method corresponds to the MAP method without a prior distribution; Equation 8 is simply replaced by Equation 11.
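The equations for the MAP and maximum likelihood methods are not reproduced in this text. A hedged sketch of their usual forms follows; the symbols L(ω), \hat{\omega}, and φ(ω) are notation assumed here, and the last line is one plausible reading of the maximum likelihood variant, in which the prior term is dropped.

```latex
\begin{align}
L(\omega) &= \sum_{i=1}^{n} \log p(x_i \mid \omega) + \log \varphi(\omega)
  && \text{(MAP log-likelihood function)} \\
\hat{\omega} &= \operatorname*{arg\,max}_{\omega}\, L(\omega)
  && \text{(estimated parameter)} \\
p(x \mid X^n) &\approx p(x \mid \hat{\omega})
  && \text{(predictive distribution from a single point)} \\
L(\omega) &= \sum_{i=1}^{n} \log p(x_i \mid \omega)
  && \text{(maximum likelihood: no prior term)}
\end{align}
```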

The MAP method and the maximum likelihood estimation method amount to forming the predictive distribution of Equation 7 from only the single model parameter point at which the likelihood is maximal. A statistical model learning apparatus that uses such maximum likelihood estimation is described below by way of example.

FIG. 4 shows a conventional statistical model learning apparatus based on the maximum likelihood estimation method. It comprises a learning data input unit 401, parameter calculation means 402, and predictive distribution calculation means 403, and operates as follows.

In the statistical model learning apparatus of FIG. 4, the parameter calculation means 402 calculates model parameters satisfying Equation 9 from the learning data input through the learning data input unit 401. For learning models such as HMMs and GMMs, this calculation uses the EM algorithm. The predictive distribution calculation means 403 then uses the calculated model parameters to calculate the predictive distribution as Equation 10.

Other background art related to the present invention includes the general method of learning an acoustic model using an HMM described in Non-Patent Document 2. The "statistical model creation method" described in Patent Document 1 is also related to the present invention.

Patent Document 1: JP 2001-083986 A
Non-Patent Document 1: Sumio Watanabe, "Algebraic Geometry and Learning Theory", Morikita Publishing Co., Ltd., April 20, 2006
Non-Patent Document 2: L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition" (Japanese edition), NTT Advanced Technology Corp., 1995

The problem with conventional learning based on the MAP method or the maximum likelihood estimation method is that the performance of the predictive distribution can be poor. The reason is that only the likelihood is considered: the predictive distribution is calculated from the single point of maximum likelihood, and the distribution of the model parameters is not taken into account. As a result, the density of parameters having the same likelihood (the density of states expressed by Equation 12) is ignored.

As pointed out in Non-Patent Document 1, learning models such as HMMs and GMMs are singular models: they have singular points at which different parameters yield the same distribution, satisfying Equation 13.

Around these singular points the density of states takes large values and therefore strongly influences the predictive distribution, so an estimation method that uses only a single parameter point can perform poorly. In other words, a model parameter whose likelihood is not maximal but whose density of states at that likelihood is large contributes heavily to the predictive distribution of Equation 7, and the MAP method or maximum likelihood estimation method, which uses only the single point of maximum likelihood, may fail to give an adequate predictive distribution.

An object of the present invention is to provide a statistical model learning apparatus that calculates a predictive distribution which takes the influence of the density of states into account while avoiding the computational cost of the integration over model parameters in Equation 6.

To achieve this object, the present invention comprises: self-response function calculating means for calculating a self-response function of model parameters; eigenvalue analysis means for calculating the eigenvalues and eigenvectors of a matrix whose components are the values of the self-response function between M model parameters; and predictive distribution calculating means for calculating weights of the model parameters using, among the eigenvalues and eigenvectors calculated by the eigenvalue analysis means, the M′ largest eigenvalues (M′ smaller than M) and the corresponding eigenvectors, and for calculating a predictive distribution as a weighted sum in which each model parameter carries its weight.

According to the present invention, it is possible to provide a statistical model learning apparatus that calculates a predictive distribution which takes the influence of the density of states into account while avoiding the computational cost problem.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIGS. 1 and 2 show a statistical model learning apparatus according to the present embodiment. The apparatus comprises a learning data input unit 201, learning data dividing means 202, a divided learning data storage unit 203, parameter calculation means 204, parameter storage units 205 and 101, self-response function calculating means 102, eigenvalue analysis means 103, and predictive distribution calculation means 104. The apparatus is implemented, for example, on a personal computer. The parameter storage units 205 and 101 may be one and the same.

The learning data input unit 201 is a program that inputs learning data, for example from the local computer or from another computer over a network. The learning data dividing means 202 is a program that divides the learning data input through the learning data input unit 201 into M parts. The divided learning data storage unit 203 is, for example, a hard disk device or a memory, and stores the M divided data sets.

The learning data dividing means 202 randomly divides the entire learning data into M subsets, which may overlap. When the amount of learning data is sufficient, a division without overlap may be used. The number of divisions M can be chosen to meet a target computational cost, chosen experimentally to maximize performance, or chosen as a trade-off between computational cost and performance.
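The random division into M subsets, with or without overlap, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_learning_data` and the parameters `overlap`, `subset_size`, and `seed` are hypothetical, and drawing each overlapping subset independently is only one of several ways to allow overlap.

```python
import random

def split_learning_data(data, m, overlap=False, subset_size=None, seed=0):
    """Randomly divide `data` into m subsets (hypothetical helper).

    overlap=False: random partition, each sample lands in exactly one subset.
    overlap=True : each subset is drawn independently without replacement,
                   so the same sample may appear in several subsets.
    """
    rng = random.Random(seed)
    items = list(data)
    if m < 1 or m > len(items):
        raise ValueError("m must be between 1 and len(data)")
    if not overlap:
        rng.shuffle(items)
        # Deal the shuffled samples round-robin into m subsets.
        return [items[i::m] for i in range(m)]
    size = subset_size or max(1, len(items) // m)
    return [rng.sample(items, size) for _ in range(m)]

# Non-overlapping split of ten samples into M = 3 subsets.
subsets = split_learning_data(range(10), m=3)
```

Passing `overlap=True` with a `subset_size` larger than `len(data) // m` gives overlapping subsets, which may be useful when the learning data are scarce.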

The parameter calculation means 204 is a program that estimates M model parameters, one for each of the M divided learning data sets. The parameter storage units 205 and 101 are, for example, hard disk devices or memories, and store the M estimated model parameters.

For the given learning data, the parameter calculation means 204 learns the model parameters so that the likelihood of the learning model expressed by Equation 3 is maximized. Estimation is not limited to maximum likelihood; MAP estimation or the like, expressed by Equations 8 and 9, may also be used.
As the learning model, a complex model with hidden variables, such as an HMM or a GMM, can be used; maximum likelihood or MAP estimation of its model parameters can be carried out with the EM algorithm. The M estimated model parameters are expressed by Equation 14.
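As a rough illustration of estimating M model parameters with the EM algorithm, the sketch below fits a one-dimensional GMM by maximum likelihood to each of M subsets of synthetic data. Everything in it is an assumption made for illustration: the 1-D model, the function name `fit_gmm_em`, the fixed iteration count, the variance floor, and the synthetic data. The source does not specify these details.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_em(data, means, n_iter=50, var_floor=1e-3):
    """Maximum-likelihood EM for a 1-D GMM, started from the given means."""
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate mixture weights, means, and variances.
        for c in range(k):
            n_c = sum(r[c] for r in resp) or 1e-300
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            spread = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data))
            variances[c] = max(var_floor, spread / n_c)
    return weights, means, variances

# Synthetic learning data from two clusters, divided into M = 4 subsets.
rng = random.Random(0)
data = ([rng.gauss(-2.0, 0.5) for _ in range(200)]
        + [rng.gauss(3.0, 0.5) for _ in range(200)])
rng.shuffle(data)
M = 4
subsets = [data[i::M] for i in range(M)]
# One model parameter set (weights, means, variances) per subset.
models = [fit_gmm_em(s, means=[-1.0, 1.0]) for s in subsets]
```

Each entry of `models` plays the role of one of the M model parameter candidates; because the subsets differ, the estimates differ slightly from one another.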

In this way, M model parameter candidates with high likelihood for the learning data are obtained.

The self-response function calculating means 102 is a program that calculates the self-response function between model parameters through the random variable x. Specifically, it calculates the self-response function between model parameters ω and ω′ through the random variable x, expressed by Equation 15.

The integration over the random variable x can be carried out analytically when the learning model is a GMM. For a more complex model such as an HMM, it can be carried out by replacing the integral with a sum over the values of the learning data. This sum need not use all of the learning data; conversely, it may also include points other than the learning data, for example linear combinations of learning data points.
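Equation 15 is not reproduced in this text, so the exact form of the self-response function is not known here. The sketch below ASSUMES it is the overlap integral of the two densities, ∫ p(x | ω) p(x | ω′) dx, which is consistent with the later statement that the value is large when the probability density functions overlap strongly. Under that assumption, the 1-D GMM case has a closed form, and the integral can alternatively be replaced by a sum over sample points. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, gmm):
    """Density of a 1-D GMM given as a list of (weight, mean, variance)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in gmm)

def self_response_analytic(gmm_a, gmm_b):
    """Closed form of the ASSUMED overlap integral of two 1-D GMM densities."""
    # The product of two Gaussians integrates to N(m_a; m_b, v_a + v_b).
    return sum(wa * wb * normal_pdf(ma, mb, va + vb)
               for wa, ma, va in gmm_a
               for wb, mb, vb in gmm_b)

def self_response_data_sum(gmm_a, gmm_b, points):
    """Replace the integral by a sum over sample points (unnormalized)."""
    return sum(gmm_pdf(x, gmm_a) * gmm_pdf(x, gmm_b) for x in points)

# Two similar models and one very different model.
a = [(0.5, -2.0, 0.30), (0.5, 3.0, 0.30)]
b = [(0.4, -2.1, 0.25), (0.6, 3.1, 0.35)]
c = [(1.0, 10.0, 0.30)]
k_ab = self_response_analytic(a, b)  # large: the densities overlap
k_ac = self_response_analytic(a, c)  # close to zero: almost no overlap
```

The data-sum variant is not scaled to match the integral; whether and how to normalize it is a design choice the source does not settle.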

The eigenvalue analysis means 103 is a program that calculates the eigenvalues and eigenvectors of the matrix whose entries are the values calculated by the self-response function calculating means 102 between the M model parameters calculated by the parameter calculation means 204.

The eigenvalue analysis means 103 forms an M×M matrix whose components are the values of the self-response function between the M model parameters, and calculates the eigenvalues of this matrix (Equation 16) and the corresponding eigenvectors (Equation 17).
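Only the M′ largest eigenvalues and their eigenvectors are needed downstream, so one simple way to obtain them is power iteration with deflation, sketched below using only the standard library. This is an illustrative choice, not the method prescribed by the source; it assumes the matrix is symmetric positive semidefinite with distinct leading eigenvalues, and in practice a library routine for symmetric eigenproblems would normally be used instead. The matrix `K` holds made-up self-response values.

```python
import math
import random

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def top_eigenpairs(matrix, m_prime, n_iter=1000, seed=0):
    """Leading m_prime (eigenvalue, eigenvector) pairs of a symmetric
    positive semidefinite matrix, by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    a = [row[:] for row in matrix]  # working copy that gets deflated
    pairs = []
    for _pair in range(m_prime):
        v = [rng.uniform(0.5, 1.5) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(n_iter):
            w = matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-300:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflation: subtract the component just found.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

# Hypothetical 3x3 self-response matrix: models 1 and 2 are similar,
# model 3 differs from both.
K = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
pairs = top_eigenpairs(K, m_prime=2)
```

In this example the leading eigenvector puts most of its mass on the two mutually similar models, which is the behavior the weighting scheme relies on.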

The predictive distribution calculation means 104 is a program that, from the M eigenvalues and eigenvectors calculated by the eigenvalue analysis means 103, takes the M′ largest eigenvalues (M′ < M) and their corresponding eigenvectors, calculates the weights of the M model parameters from them, and calculates the predictive distribution as the weighted sum.

The predictive distribution calculation means 104 calculates the predictive distribution using Equation 18.

Here, Equation 19 shows that, in the learning model, the square of the component of the first eigenvector corresponds to the eigenvalue.

M′ can be chosen to meet a target computational cost, chosen experimentally to maximize performance, or chosen as a trade-off between computational cost and performance.

Comparing Equation 18 with Equation 7 shows that, in the present invention, the weighting is given not by the posterior distribution of the model parameters (Equation 5) but by the eigenvalues and eigenvectors calculated from the self-response function of the model parameters.

This avoids computational difficulties such as the integration over model parameters in Equation 6. By assigning larger weights to model parameters that have high likelihood for the learning data and a large self-response function, that is, model parameters that yield the same likelihood, a predictive distribution that reflects the influence of the density of states is calculated.

Next, another embodiment of the present invention is described. FIG. 3 shows its configuration, which comprises a learning data input unit 301, a parameter initial value storage unit 302, parameter calculation means 303, and a parameter storage unit 304.

The parameter calculation means 303 uses the input learning data and M different initial parameters to calculate M model parameters, one for each initial parameter, and stores them in the parameter storage unit 304.

For the given learning data, the parameter calculation means 303 learns the model parameters so that the likelihood of the learning model expressed by Equation 3 is maximized. Estimation is not limited to maximum likelihood; MAP estimation or the like, expressed by Equations 8 and 9, may also be used.

As the learning model, a complex model with hidden variables, such as an HMM or a GMM, can be used; maximum likelihood or MAP estimation of its model parameters can be carried out with the EM algorithm. Because the EM algorithm depends on the initial parameters, different initial values lead to different estimated model parameters even for the same learning data. Alternatively, this embodiment can be combined with the preceding one, so that the M model parameters are calculated using both division of the learning data and different initial values. Either way, M model parameter candidates with high likelihood for the learning data are obtained.
This concludes the description of the other embodiment of the present invention.

Next, the configuration and operation of the main parts of the embodiments described above are summarized.
The statistical model learning apparatus of the embodiments comprises a parameter storage unit 101 that stores M model parameters, self-response function calculating means 102 that calculates the self-response function of the model parameters, eigenvalue analysis means 103 that calculates the eigenvalues and eigenvectors of a matrix whose components are the values of the self-response function between the M model parameters, and predictive distribution calculation means 104 that calculates a predictive distribution using the calculated eigenvalues and eigenvectors.

In the present embodiment, the self-response function calculating means 102 calculates the self-response function, through the random variable x, between the M model parameters stored in the parameter storage unit 101. The value of the self-response function is large when the probability density functions of the random variable x obtained under the respective model parameters overlap strongly, so it is large between model parameters that yield the same likelihood.

An M×M matrix whose components are the values of the self-response function between the M model parameters is formed, and the eigenvalue analysis means 103 calculates its eigenvalues and eigenvectors. Using the M′ largest eigenvalues (M′ < M) and their corresponding eigenvectors, the weights of the M model parameters are calculated, and the predictive distribution is calculated as the weighted sum. In this way, without performing the integration over model parameters in Equation 6, the embodiment assigns larger weights to model parameters with a large self-response function, that is, model parameters that yield the same likelihood, and thereby calculates a predictive distribution that reflects the influence of the density of states.

The effect of the present embodiment is therefore that a predictive distribution reflecting the influence of the density of states of the model parameters can be calculated without integrating over the model parameters. This is because the weights of the model parameters are calculated by eigenvalue analysis of a matrix whose entries are the self-response function between model parameters, and the predictive distribution is calculated as the weighted sum.

Next, the best mode for carrying out the present invention is described using a specific example.
In this example, an HMM is used as the acoustic model for speech recognition, with a GMM as the probability density function that outputs feature vectors from each state of the HMM, and the parameters of this GMM are learned. The configuration of this example is shown in FIGS. 1 and 2.

In speech recognition, speech data are converted into feature vectors by cepstrum analysis or the like and used as learning data.
By performing Viterbi alignment with an existing HMM, the learning data can be assigned to the individual states of the HMM. Focusing now on the GMM of a state s, the feature vectors assigned to state s are input to the learning data input unit 201.

The learning data dividing means 202 randomly divides the input data into M parts. The parameter calculation means 204 calculates M model parameters with the EM algorithm, using the learning data divided into M parts.
A model parameter here is the collection of the mixture weights, mean vectors, and covariance matrices of the GMM.

The self-response function calculating means 102 calculates the self-response function between model parameters using Equation 15; for a GMM, the integration over the feature vector x can be carried out analytically. Alternatively, it can be carried out approximately by replacing the integral with a sum over the feature vectors input to the learning data input unit 201.

The eigenvalue analysis means 103 performs the eigenvalue analysis. In the simplest case, M′ = 1, and the largest eigenvalue and its eigenvector are used. The predictive distribution calculation means 104 sets M′ = 1 in Equation 18 and calculates the predictive distribution.
In this case, Equation 18 takes the form of a weighted sum of M GMMs, so the predictive distribution as a whole is again a GMM. This property is not limited to GMMs: when a mixture-type model is learned, Equation 18 preserves it even when M′ is not 1.
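The observation that a weighted sum of M GMMs is again a GMM can be checked with a short sketch. Equation 18 is not reproduced in this text, so the way the per-model weights are derived from the leading eigenvector is NOT shown here; the values in `model_weights` are hypothetical placeholders, normalized to sum to one purely for illustration, and the function name `merge_gmms` is likewise an assumption.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, gmm):
    """Density of a 1-D GMM given as a list of (weight, mean, variance)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in gmm)

def merge_gmms(gmms, model_weights):
    """Flatten a weighted sum of GMMs into one GMM.

    Each component keeps its mean and variance; its mixture weight is
    multiplied by the normalized weight of the model it came from.
    """
    total = sum(model_weights)
    merged = []
    for gmm, a in zip(gmms, model_weights):
        for w, m, v in gmm:
            merged.append((a / total * w, m, v))
    return merged

# Three hypothetical models estimated from different subsets.
gmms = [
    [(0.5, -2.0, 0.30), (0.5, 3.0, 0.30)],
    [(0.4, -2.1, 0.25), (0.6, 3.1, 0.35)],
    [(1.0, 10.0, 0.30)],
]
# Placeholder per-model weights (e.g. derived from a leading eigenvector).
model_weights = [0.70, 0.70, 0.15]
merged = merge_gmms(gmms, model_weights)
```

The merged model has as many components as the M input models combined, so the size of the predictive distribution grows with M.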

This application claims priority based on Japanese Patent Application No. 2007-328435, filed on December 20, 2007, the entire disclosure of which is incorporated herein.

FIG. 1 is a block diagram (part 1) showing the configuration of a statistical model learning apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram (part 2) showing the configuration of the statistical model learning apparatus according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of another embodiment of the present invention.
FIG. 4 is a diagram for explaining the prior art.

Explanation of symbols

101 Parameter storage unit
102 Self-response function calculating means
103 Eigenvalue analysis means
104 Predictive distribution calculation means
201 Learning data input unit
202 Learning data dividing means
203 Divided learning data storage unit
204 Parameter calculation means
205 Parameter storage unit
301 Learning data input unit
302 Parameter initial value storage unit
303 Parameter calculation means
304 Parameter storage unit
401 Learning data input unit
402 Parameter calculation means
403 Predictive distribution calculation means

Claims

A statistical model learning apparatus comprising:
self-response function calculating means for calculating a self-response function of model parameters, the model parameters being parameters of a statistical model of an acoustic model for speech recognition;
eigenvalue analysis means for calculating eigenvalues and eigenvectors of a matrix whose components are values of the self-response function between M model parameters; and
predictive distribution calculating means for calculating weights of the model parameters using, among the eigenvalues and eigenvectors calculated by the eigenvalue analysis means, the M′ largest eigenvalues (M′ smaller than M) and the eigenvectors corresponding to them, and for calculating a predictive distribution as a weighted sum in which each model parameter carries its weight.

The statistical model learning apparatus according to claim 1, further comprising model parameter calculating means for calculating M model parameters from learning data, which is data for learning the statistical model of the acoustic model for speech recognition, such that the likelihood of a learning model having hidden variables is maximized,
wherein the self-response function calculating means uses the calculated M model parameters as the model parameters.

The statistical model learning apparatus according to claim 2, wherein the model parameter calculating means calculates the M model parameters using the learning data and M mutually different initial values, the initial values being initial values of the model parameters used when the model parameters are calculated with the statistical model.

The statistical model learning apparatus according to claim 2, further comprising learning data dividing means for dividing the learning data into M subsets,
wherein the model parameter calculating means calculates the M model parameters using each of the M subsets.

A statistical model learning method comprising:
a self-response function calculation process of calculating a self-response function of model parameters, the model parameters being parameters of a statistical model of an acoustic model for speech recognition;
an eigenvalue analysis process of calculating eigenvalues and eigenvectors of a matrix whose components are values of the self-response function between M model parameters; and
a predictive distribution calculation process of calculating weights of the model parameters using, among the eigenvalues and eigenvectors calculated in the eigenvalue analysis process, the M′ largest eigenvalues (M′ smaller than M) and the eigenvectors corresponding to them, and of calculating a predictive distribution as a weighted sum in which each model parameter carries its weight.

A statistical model learning program causing an information processing apparatus to execute:
a self-response function calculation process of calculating a self-response function of model parameters, the model parameters being parameters of a statistical model of an acoustic model for speech recognition;
an eigenvalue analysis process of calculating eigenvalues and eigenvectors of a matrix whose components are values of the self-response function between M model parameters; and
a predictive distribution calculation process of calculating weights of the model parameters using, among the eigenvalues and eigenvectors calculated in the eigenvalue analysis process, the M′ largest eigenvalues (M′ smaller than M) and the eigenvectors corresponding to them, and of calculating a predictive distribution as a weighted sum in which each model parameter carries its weight.