WO2009081707A1 - Statistical model learning device, method, and program - Google Patents

Statistical model learning device, method, and program

Info

Publication number
WO2009081707A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
eigenvalues
self-response function
parameter
Prior art date
Application number
PCT/JP2008/071983
Other languages
French (fr)
Japanese (ja)
Inventor
Yoshifumi Onishi
Original Assignee
Nec Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nec Corporation filed Critical Nec Corporation
Priority to JP2009547010A (granted as JP5493867B2)
Publication of WO2009081707A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided is a statistical model learning device that can calculate a prediction distribution taking the effects of the state density into account while avoiding the computational problem attributable to integration over the model parameters. The statistical model learning device includes: self-response function calculation means (102) which calculates a self-response function of model parameters; eigenvalue analysis means (103) which calculates the eigenvalues and eigenvectors of a matrix having as its components the values of the self-response function between M model parameters; and prediction distribution calculation means (104) which calculates weights for the model parameters by using, in descending order of eigenvalue, M′ eigenvalues (M′ < M) and the values of the corresponding eigenvectors, and calculates a prediction distribution as the sum weighted by those weights.

Description

Statistical model learning apparatus, method, and program
The present invention relates to a statistical model learning apparatus, method, and program, and more particularly to a technique for calculating a predictive distribution that takes into account the influence of the state density of the model parameters.
Statistical models are used as phoneme models in speech recognition systems and as speaker models in speaker verification systems. Specifically, the hidden Markov model (hereinafter "HMM") is commonly used as a phoneme model, and the Gaussian mixture model (hereinafter "GMM") as a speaker model.
One framework for learning these models is Bayesian estimation; a mathematical exposition is given in Non-Patent Document 1, for example.
Write the learning data, training examples of the random variable x, as Equation 1; the model parameter of the statistical model as Equation 2; the learning model, which gives the probability density function under which x appears when the model parameter is given, as Equation 3; and the prior distribution of the model parameter as Equation 4. In the framework of Bayesian estimation, the posterior distribution, that is, the probability density of the model parameter given the learning data, is then expressed by Equation 5. The denominator Z(X^n) in Equation 5 is called the partition function and is expressed by Equation 6.
Equation 1 (this and the equations below are reconstructed editorially from the surrounding text; the original images are not reproduced):  X^n = \{ x_1, x_2, \ldots, x_n \}
Equation 2:  \omega
Equation 3:  p(x \mid \omega)
Equation 4:  \varphi(\omega)
Equation 5:  p(\omega \mid X^n) = \frac{\varphi(\omega) \prod_{i=1}^{n} p(x_i \mid \omega)}{Z(X^n)}
Equation 6:  Z(X^n) = \int \varphi(\omega) \prod_{i=1}^{n} p(x_i \mid \omega) \, d\omega
The probability density function under which the random variable x appears given the learning data is then the predictive distribution, expressed by Equation 7.
Equation 7:  p(x \mid X^n) = \int p(x \mid \omega) \, p(\omega \mid X^n) \, d\omega
However, because the integration over the model parameter in Equation 6 is difficult, the probability density learning methods most often used in practice are the maximum a posteriori method (MAP estimation; hereinafter the "MAP method") and the maximum likelihood estimation method.
In the MAP method, the log-likelihood function is given by Equation 8, the model parameter maximizing it is taken as the estimated parameter of Equation 9, and the predictive distribution is calculated as Equation 10.
Equation 8:  L(\omega) = \sum_{i=1}^{n} \log p(x_i \mid \omega) + \log \varphi(\omega)
Equation 9:  \hat{\omega} = \arg\max_{\omega} L(\omega)
Equation 10:  p(x \mid X^n) = p(x \mid \hat{\omega})
The maximum likelihood estimation method is the case of the MAP method in which no prior distribution is used; it suffices to make the substitution of Equation 11.
Equation 11:  \varphi(\omega) = \text{const.}
The MAP and maximum likelihood methods thus correspond, in Equation 7 of Bayesian estimation, to forming the predictive distribution from only the single model parameter of maximum likelihood. A statistical model learning apparatus using such maximum likelihood estimation is described next by way of example.
FIG. 4 shows a conventional statistical model learning apparatus based on the maximum likelihood estimation method. It comprises a learning data input unit 401, parameter calculation means 402, and predictive distribution calculation means 403, and operates as follows.
From the learning data input through the learning data input unit 401, the parameter calculation means 402 calculates the model parameter satisfying Equation 9; for learning models such as HMMs and GMMs, the calculation uses the EM algorithm. The predictive distribution calculation means 403 then calculates the predictive distribution as Equation 10 using the calculated model parameter.
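For orientation only, the following minimal sketch shows this conventional plug-in procedure in Python; the use of scikit-learn's GaussianMixture for the EM step and all variable names are editorial assumptions, not part of the patent.

```python
# Sketch of the conventional apparatus of FIG. 4 (editorial illustration).
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed dependency

X = np.random.randn(1000, 12)   # stand-in learning data (unit 401)

# Parameter calculation means 402: EM yields the maximum-likelihood
# parameter of Equation 9.
gm = GaussianMixture(n_components=4, covariance_type="diag").fit(X)

# Predictive distribution calculation means 403: Equation 10 is the
# plug-in density p(x | omega_hat); score_samples returns its logarithm.
log_density = gm.score_samples(X[:5])
```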
Further background art related to the present invention includes the general method of training acoustic models using HMMs described in Non-Patent Document 2. The "statistical model creation method" described in Patent Document 1 is also a related technique.

Patent Document 1: JP 2001-083986 A.
Non-Patent Document 1: Sumio Watanabe, "Algebraic Geometry and Learning Theory", Morikita Publishing Co., Ltd., April 20, 2006.
Non-Patent Document 2: L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition", NTT Advanced Technology, 1995.
The problem with learning by the conventional MAP method or maximum likelihood estimation method is that the performance of the predictive distribution can degrade. The reason is that only the likelihood is considered: the predictive distribution is calculated from the single point of maximum likelihood, and the distribution of the model parameters is not taken into account. Because the parameter distribution is ignored, so is the density of parameters having the same likelihood, the state density expressed by Equation 12.
Equation 12:  V(t) = \int \delta\bigl(t - L(\omega)\bigr) \, d\omega  (the state density; this standard form is an editorial reconstruction, since the original image is not reproduced)
As pointed out in Non-Patent Document 1, learning models such as HMMs and GMMs are so-called singular models: there exist singular points at which different parameters give the identical distribution, satisfying Equation 13. For example, in a two-component GMM p(x) = w N(x; \mu_1, \sigma^2) + (1 - w) N(x; \mu_2, \sigma^2), every parameter with \mu_1 = \mu_2 yields the same density regardless of the mixture weight w, so a continuum of distinct parameters satisfies Equation 13.
Equation 13:  p(x \mid \omega_1) = p(x \mid \omega_2) \quad (\omega_1 \neq \omega_2)
Because the state density takes large values around these singular points, it strongly influences the predictive distribution, and an estimation method using only a single parameter point can perform poorly. That is, even a model parameter that does not attain the maximum likelihood can, if the state density of parameters attaining the same likelihood is large, exert a large effect on the predictive distribution of Equation 7; the MAP method or the maximum likelihood estimation method, which use only the single maximum-likelihood point, may therefore fail to give an adequate predictive distribution.
An object of the present invention is to provide a statistical model learning apparatus that calculates a predictive distribution while avoiding the computational-cost problem caused by the integration over the model parameter in Equation 6 and while taking the influence of the state density into account.
To achieve this object, the present invention comprises: self-response function calculation means for calculating a self-response function of model parameters; eigenvalue analysis means for calculating the eigenvalues and eigenvectors of a matrix whose components are the values of the self-response function between M model parameters; and predictive distribution calculation means for calculating weights of the model parameters by using, of the eigenvalues and eigenvectors calculated by the eigenvalue analysis means and in descending order of eigenvalue, M′ eigenvalues (M′ < M) and the values of the corresponding eigenvectors, and for calculating the predictive distribution as the sum weighted by those weights.
According to the present invention, it is possible to provide a statistical model learning apparatus that calculates a predictive distribution while avoiding the computational-cost problem and taking the influence of the state density into account.
Embodiments of the present invention will now be described in detail with reference to the drawings.
Referring to FIGS. 1 and 2, the statistical model learning apparatus according to the present embodiment comprises a learning data input unit 201, learning data dividing means 202, a divided learning data storage unit 203, parameter calculation means 204, parameter storage units 205 and 101, self-response function calculation means 102, eigenvalue analysis means 103, and predictive distribution calculation means 104. The apparatus is realized on, for example, a personal computer. The parameter storage units 205 and 101 may be one and the same.
The learning data input unit 201 is a program that inputs learning data, for example from its own computer or from another computer over a network. The learning data dividing means 202 is a program that divides the learning data input by the learning data input unit 201 into M subsets. The divided learning data storage unit 203 is, for example, a hard disk or memory, and stores the M divided data sets.
The learning data dividing means 202 randomly divides the entire learning data into M subsets; overlap between the subsets may be permitted, and when the amount of learning data is sufficient, a division without overlap may also be used. The number of divisions M can be set to meet an expected computational cost, set experimentally so as to raise performance, or specified in view of the balance between computational cost and performance.
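A minimal sketch of one way to realize this random division follows; the function name and the bootstrap-style overlapping draw are editorial choices, not prescribed by the patent.

```python
# Sketch of learning data dividing means 202 (editorial illustration).
import numpy as np

def split_learning_data(X, M, overlap=True, rng=None):
    """Randomly divide the rows of X into M subsets.

    With overlap=True, each subset is drawn with replacement, so subsets
    may share samples; with overlap=False, the data is randomly
    partitioned into M disjoint parts.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    if overlap:
        return [X[rng.integers(0, n, size=n // M)] for _ in range(M)]
    perm = rng.permutation(n)
    return [X[idx] for idx in np.array_split(perm, M)]
```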
The parameter calculation means 204 is a program that estimates M model parameters, one for each of the M divided learning data sets. The parameter storage units 205 and 101 are, for example, hard disks or memories, and store the M estimated model parameters.
The parameter calculation means 204 learns the model parameter so that the likelihood of the learning model expressed by Equation 3 is maximized for the given learning data. Estimation is not limited to maximum likelihood; MAP estimation or the like, represented by Equations 8 and 9, may also be used.

As the learning model, a complex model having hidden variables, such as an HMM or a GMM, can be used, and maximum likelihood or MAP estimation of its parameters can be executed with the EM algorithm. The M estimated model parameters are written as Equation 14.
Equation 14:  \hat{\omega}_1, \hat{\omega}_2, \ldots, \hat{\omega}_M
In this way, M model parameter candidates having high likelihood for the learning data are obtained.
The self-response function calculation means 102 is a program that calculates the self-response function between model parameters through the random variable x; that is, it calculates the self-response function between model parameters ω and ω′ expressed by Equation 15.
Equation 15:  K(\omega, \omega') = \int p(x \mid \omega) \, p(x \mid \omega') \, dx  (the symbol K is editorial; the formula is reconstructed from the description of the self-response function)
Here, the integration over the random variable x can be calculated analytically when the learning model is a GMM. For a more complex model such as an HMM, it can be executed by replacing the integral with a sum over the values of the learning data. Not all of the learning data need be used in this sum; conversely, the sum can also include points other than the learning data, such as linear combinations of learning data.
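The data-sum approximation just described might look like the following minimal sketch; the function name, the averaging (a constant rescaling of the matrix leaves its eigenvectors unchanged), and the scikit-learn-style score_samples interface are all editorial assumptions.

```python
# Sketch of self-response function calculation means 102 (editorial).
import numpy as np

def self_response(model_a, model_b, X):
    """Approximate Equation 15, the integral over x of p(x|a) p(x|b),
    by an average over the learning data X. Each model is assumed to
    expose score_samples(X) returning per-sample log-densities."""
    log_pa = model_a.score_samples(X)
    log_pb = model_b.score_samples(X)
    # Averaging instead of summing rescales every matrix entry by the
    # same constant, which changes the eigenvalues but not the eigenvectors.
    return float(np.mean(np.exp(log_pa + log_pb)))
```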
The eigenvalue analysis means 103 is a program that calculates the eigenvalues and eigenvectors of the matrix whose entries are the values calculated by the self-response function calculation means 102 between the M model parameters calculated by the parameter calculation means 204.
Specifically, the eigenvalue analysis means 103 constructs the M-row, M-column matrix whose components are the values of the self-response function between the M model parameters, and calculates the eigenvalues of this matrix (Equation 16) and the corresponding eigenvectors (Equation 17).
Equation 16:  \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M
Equation 17:  v_1, v_2, \ldots, v_M
The predictive distribution calculation means 104 is a program that, from the M eigenvalues and eigenvectors calculated by the eigenvalue analysis means 103, uses the M′ largest eigenvalues (M′ < M) and the eigenvectors corresponding to them to calculate weights for the M model parameters, and calculates the predictive distribution as the weighted sum.
The predictive distribution calculation means 104 calculates the predictive distribution by Equation 18.
Equation 18:  [the predictive distribution, calculated as a weighted sum of p(x \mid \hat{\omega}_i) over the M model parameters, with weights obtained from the M′ largest eigenvalues and their eigenvectors; the original equation image is not reproduced]
Here, it can be seen from Equation 19 that, in the learning model, the squared components of the first eigenvector correspond to the eigenvalue.
Equation 19:  [the relation by which the squared components of the first eigenvector correspond to the eigenvalue; the original equation image is not reproduced]
M′ can be set to meet an expected computational cost, set experimentally so as to raise performance, or specified in view of the balance between computational cost and performance.
Comparing Equation 18 with Equation 7, the present invention weights the model parameters not by their posterior distribution (Equation 5) but by values derived from the eigenvalues and eigenvectors calculated from the self-response function of the model parameters.
This avoids the computational difficulty of integrating over the model parameter as in Equation 6: by assigning large weights to model parameters that have high likelihood for the learning data and a large self-response function, that is, to model parameters giving the same likelihood, a predictive distribution that takes the influence of the state density into account is calculated.
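Assembled end to end, the first embodiment might look like the sketch below, which reuses the split_learning_data and self_response helpers sketched above. The weighting rule used here, the squared components of the leading eigenvector renormalized to sum to one, is an editorial reading of Equations 18 and 19 (whose images are not reproduced) with M′ = 1, not the patent's verbatim formula.

```python
# End-to-end sketch of the first embodiment (editorial illustration).
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed dependency

def learn_predictive(X, M=8, n_components=4, rng=0):
    subsets = split_learning_data(X, M, overlap=True, rng=rng)   # means 202/203
    models = [GaussianMixture(n_components=n_components).fit(S)  # means 204
              for S in subsets]

    # Means 102: M x M matrix of self-response values (data-sum form).
    K = np.array([[self_response(a, b, X) for b in models] for a in models])

    # Means 103: eigen-decomposition of the symmetric matrix K.
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

    # Means 104, assumed weighting with M' = 1: squared components of the
    # leading eigenvector, renormalized to sum to one.
    w = v1 ** 2 / np.sum(v1 ** 2)

    def predictive_density(x):             # x: array of shape (n_points, dim)
        dens = np.array([np.exp(m.score_samples(x)) for m in models])
        return w @ dens                    # Equation 18 as a weighted sum
    return predictive_density

# Usage: the returned density is itself a weighted mixture of M GMMs.
p_hat = learn_predictive(np.random.randn(2000, 12), M=8)
print(p_hat(np.random.randn(3, 12)))
```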
Next, another embodiment of the present invention will be described. Referring to FIG. 3, this embodiment comprises a learning data input unit 301, a parameter initial value storage unit 302, parameter calculation means 303, and a parameter storage unit 304.
The parameter calculation means 303 calculates M model parameters from the input learning data using M different initial parameters, and stores them in the parameter storage unit 304.
The parameter calculation means 303 learns the model parameter so that the likelihood of the learning model expressed by Equation 3 is maximized for the given learning data. Estimation is not limited to maximum likelihood; MAP estimation or the like, represented by Equations 8 and 9, may also be used.
As the learning model, a complex model having hidden variables, such as an HMM or a GMM, can be used, and maximum likelihood or MAP estimation of its parameters can be executed with the EM algorithm. Because the EM algorithm depends on its initial parameters, different model parameters are estimated when different initial values are used, even from the same learning data. Alternatively, this embodiment can be combined with the embodiment described above, so that the M model parameters are calculated using both division of the learning data and different initial values. Either way, M model parameter candidates having high likelihood for the learning data are obtained, as in the sketch below.
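A minimal sketch of this different-initializations variant follows; seeding scikit-learn's random initialization in place of the patent's parameter initial value storage unit 302 is an editorial simplification.

```python
# Sketch of parameter calculation means 303 (editorial illustration):
# M runs of EM on the same data from M different random initializations.
from sklearn.mixture import GaussianMixture  # assumed dependency

def fit_from_m_inits(X, M=8, n_components=4):
    return [GaussianMixture(n_components=n_components,
                            init_params="random",   # random initial parameters
                            n_init=1,
                            random_state=seed).fit(X)
            for seed in range(M)]
```

This concludes the description of this other embodiment of the present invention.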
Next, the configuration and operation of the essential parts of the embodiments described above are reviewed.

The statistical model learning apparatus of the embodiments described above comprises a parameter storage unit 101 that stores M model parameters, self-response function calculation means 102 that calculates the self-response function of the model parameters, eigenvalue analysis means 103 that calculates the eigenvalues and eigenvectors of the matrix whose components are the values of the self-response function between the M model parameters, and predictive distribution calculation means 104 that calculates the predictive distribution using the calculated eigenvalues and eigenvectors.
In the present embodiment, the self-response function calculation means 102 thus calculates the self-response function, through the random variable x, between the M model parameters stored in the model parameter storage unit 101. Because the value of the self-response function is large when the probability density functions of x obtained under the respective model parameters overlap strongly, it becomes large between model parameters that give the same likelihood.
The M-row, M-column matrix whose components are the values of the self-response function between the M model parameters is constructed; its eigenvalues and eigenvectors are calculated by the eigenvalue analysis means 103; and the M′ largest eigenvalues (M′ < M) and the eigenvectors corresponding to them are used to calculate weights for the M model parameters, from which the predictive distribution is calculated as the weighted sum. In this way, without performing the integration over the model parameter of Equation 6, the present embodiment assigns large weights to model parameters with a large self-response function, that is, to model parameters giving the same likelihood, and thereby calculates a predictive distribution that takes the influence of the state density into account.
The effect of the present embodiment is therefore that a predictive distribution reflecting the influence of the state density of the model parameters can be calculated without integrating over the model parameter. The reason is that the weights of the model parameters are calculated by eigenvalue analysis of the matrix whose entries are the values of the self-response function between model parameters, and the predictive distribution is calculated as the corresponding weighted sum.
Next, the best mode for carrying out the present invention is described with a concrete example.

This example uses an HMM as the acoustic model for speech recognition; in particular, with a GMM serving as the probability density function that outputs the feature vector from each state of the HMM, the parameters of this GMM are learned. The configuration of this example is shown in FIGS. 1 and 2.
In speech recognition, speech data is converted into feature vectors by cepstral analysis or the like, and these serve as the learning data.

By performing Viterbi alignment with an existing HMM, the learning data can be assigned to the individual states of the HMM. Focusing on the GMM of a given state s, the feature vectors assigned to state s are input to the learning data input unit 201.
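A minimal sketch of this alignment step follows; the hmmlearn package and all names here are editorial assumptions standing in for the existing HMM.

```python
# Sketch of collecting the learning data of one HMM state s (editorial).
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed dependency

feats = np.random.randn(500, 12)      # stand-in cepstral feature vectors
hmm = GaussianHMM(n_components=3).fit(feats)

# Viterbi alignment: the most likely state for each frame.
_, states = hmm.decode(feats, algorithm="viterbi")

s = 1
X_state_s = feats[states == s]        # input to learning data input unit 201
```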
The learning data dividing means 202 randomly divides the input data into M subsets, and the parameter calculation means 204 calculates M model parameters by the EM algorithm using the learning data divided into M subsets.

A model parameter here is the set of the GMM's mixture weights, mean vectors, and covariance matrices.
The self-response function calculation means 102 calculates the self-response function between model parameters using Equation 15; the integration over the feature vector x can be executed analytically in the case of a GMM, or approximately by replacing it with a sum over the feature vectors input to the learning data input unit 201.
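The analytic case rests on the standard Gaussian identity \int N(x; \mu, \Sigma) N(x; \mu', \Sigma') \, dx = N(\mu - \mu'; 0, \Sigma + \Sigma'), which reduces the overlap of two GMMs to a double sum over component pairs. A sketch for diagonal covariances, with editorial function names:

```python
# Analytic Equation 15 for diagonal-covariance GMMs (editorial sketch).
import numpy as np

def gaussian_overlap(mu1, var1, mu2, var2):
    """Integral of N(x; mu1, diag(var1)) * N(x; mu2, diag(var2)) over x,
    which equals the density N(mu1 - mu2; 0, diag(var1 + var2))."""
    var = var1 + var2
    d = mu1 - mu2
    return np.exp(-0.5 * (np.sum(d * d / var)
                          + np.sum(np.log(2.0 * np.pi * var))))

def gmm_self_response(w1, mu1, var1, w2, mu2, var2):
    """Overlap of two GMMs: the double sum over component pairs of
    w1[k] * w2[l] * gaussian_overlap(mu1[k], var1[k], mu2[l], var2[l])."""
    return sum(w1[k] * w2[l] * gaussian_overlap(mu1[k], var1[k], mu2[l], var2[l])
               for k in range(len(w1)) for l in range(len(w2)))
```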
The eigenvalue analysis means 103 performs the eigenvalue analysis; in the simplest case, M′ = 1, and the maximum eigenvalue and the value of its eigenvector are used. The predictive distribution calculation means 104 then sets M′ = 1 in Equation 18 and calculates the predictive distribution.

In this case, Equation 18 takes the form of a weighted sum of M GMMs, so the entire predictive distribution is again a GMM. This property is not limited to GMMs: whenever a mixture-type model is learned, it is preserved by Equation 18 even when M′ is not 1.
This application claims priority based on Japanese Patent Application No. 2007-328435, filed on December 20, 2007, the entire disclosure of which is incorporated herein.
FIG. 1 is a block diagram (part 1) showing the configuration of a statistical model learning apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram (part 2) showing the configuration of a statistical model learning apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of another embodiment of the present invention.
FIG. 4 is a diagram for explaining the prior art.
Explanation of symbols
101 Parameter storage unit
102 Self-response function calculation means
103 Eigenvalue analysis means
104 Predictive distribution calculation means
201 Learning data input unit
202 Learning data dividing means
203 Divided learning data storage unit
204 Parameter calculation means
205 Parameter storage unit
301 Learning data input unit
302 Parameter initial value storage unit
303 Parameter calculation means
304 Parameter storage unit
401 Learning data input unit
402 Parameter calculation means
403 Predictive distribution calculation means

Claims (6)

  1.  A statistical model learning apparatus comprising:
      self-response function calculation means for calculating a self-response function of model parameters;
      eigenvalue analysis means for calculating eigenvalues and eigenvectors of a matrix having as its components the values of the self-response function between M model parameters; and
      prediction distribution calculation means for calculating weights of the model parameters by using, of the eigenvalues and eigenvectors calculated by the eigenvalue analysis means and in descending order of eigenvalue, M′ eigenvalues smaller in number than M and the values of the eigenvectors corresponding to the M′ eigenvalues, and for calculating a prediction distribution as a weighted sum in which each model parameter carries its weight.
  2.  The statistical model learning apparatus according to claim 1, further comprising model parameter calculation means for calculating M model parameters, wherein the self-response function calculation means uses the calculated M model parameters as the model parameters.
  3.  The statistical model learning apparatus according to claim 2, wherein the model parameter calculation means calculates the M model parameters using learning data and M different initial values.
  4.  The statistical model learning apparatus according to claim 2, further comprising learning data dividing means for dividing learning data into M subsets, wherein the model parameter calculation means calculates the M model parameters using the divided learning data.
  5.  A statistical model learning method comprising:
      a self-response function calculation process of calculating a self-response function of model parameters;
      an eigenvalue analysis process of calculating eigenvalues and eigenvectors of a matrix having as its components the values of the self-response function between M model parameters; and
      a prediction distribution calculation process of calculating weights of the model parameters by using, of the eigenvalues and eigenvectors calculated in the eigenvalue analysis process and in descending order of eigenvalue, M′ eigenvalues smaller in number than M and the values of the eigenvectors corresponding to the M′ eigenvalues, and of calculating a prediction distribution as a weighted sum in which each model parameter carries its weight.
  6.  A statistical model learning program causing an information processing apparatus to execute:
      a self-response function calculation process of calculating a self-response function of model parameters;
      an eigenvalue analysis process of calculating eigenvalues and eigenvectors of a matrix having as its components the values of the self-response function between M model parameters; and
      a prediction distribution calculation process of calculating weights of the model parameters by using, of the eigenvalues and eigenvectors calculated in the eigenvalue analysis process and in descending order of eigenvalue, M′ eigenvalues smaller in number than M and the values of the eigenvectors corresponding to the M′ eigenvalues, and of calculating a prediction distribution as a weighted sum in which each model parameter carries its weight.
PCT/JP2008/071983 2007-12-20 2008-12-03 Statistical model learning device, method, and program WO2009081707A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009547010A JP5493867B2 (en) 2007-12-20 2008-12-03 Statistical model learning apparatus, method and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2007328435 2007-12-20
JP2007-328435 2007-12-20

Publications (1)

Publication Number Publication Date
WO2009081707A1 (en) 2009-07-02

Family

ID=40801018

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/071983 WO2009081707A1 (en) 2007-12-20 2008-12-03 Statistical model learning device, method, and program

Country Status (2)

Country Link
JP (1) JP5493867B2 (en)
WO (1) WO2009081707A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05257492A (en) * 1992-03-13 1993-10-08 Toshiba Corp Voice recognizing system
JP2001126056A (en) * 1999-10-26 2001-05-11 Mitsubishi Electric Inf Technol Center America Inc Method for modeling system operating in plural forms and device for modeling dynamic system operating in various forms

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NAGATA K. ET AL.: "Bayes Jigo Bunpu no Saiteki Kinjiho no Teian to Yukosei ni Tsuite", IEICE TECHNICAL REPORT, vol. 104, no. 760, 23 March 2005 (2005-03-23), pages 195 - 200 *
PARK HYUNSIN ET AL.: "Onso Bubun Kukan no Togo ni yoru Onsei Tokuchoryo Chushutsu no Kento", INFORMATION PROCESSING SOCIETY OF JAPAN KENKYU HOKOKU, vol. 2007, no. 129, 20 December 2007 (2007-12-20), pages 241 - 246 *
TAKAMATSU S. ET AL.: "Kyokushoka Bayes Gakushuho", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D, vol. J89-D, no. 10, 1 October 2006 (2006-10-01), pages 2260 - 2268 *

Also Published As

Publication number Publication date
JP5493867B2 (en) 2014-05-14
JPWO2009081707A1 (en) 2011-05-06

Similar Documents

Publication Publication Date Title
US11158305B2 (en) Online verification of custom wake word
Arisoy et al. Bidirectional recurrent neural network language models for automatic speech recognition
Tjandra et al. VQVAE unsupervised unit discovery and multi-scale code2spec inverter for zerospeech challenge 2019
US20190206389A1 (en) Method and apparatus with a personalized speech recognition model
US9400955B2 (en) Reducing dynamic range of low-rank decomposition matrices
JP5229478B2 (en) Statistical model learning apparatus, statistical model learning method, and program
CN117709426A (en) Method, system and computer storage medium for training machine learning model
CN110349597B (en) Voice detection method and device
US20180300610A1 (en) Select one of plurality of neural networks
Sadhu et al. Continual Learning in Automatic Speech Recognition.
Saito et al. Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis
JP2008203469A (en) Speech recognition device and method
JP4817250B2 (en) Voice quality conversion model generation device and voice quality conversion system
Huang et al. Speaker adaptation of RNN-BLSTM for speech recognition based on speaker code
JPWO2007105409A1 (en) Standard pattern adaptation device, standard pattern adaptation method, and standard pattern adaptation program
Tanaka et al. Automated structure discovery and parameter tuning of neural network language model based on evolution strategy
Saeidi et al. Particle swarm optimization for sorted adapted gaussian mixture models
Triefenbach et al. Can non-linear readout nodes enhance the performance of reservoir-based speech recognizers?
JP5493867B2 (en) Statistical model learning apparatus, method and program
JP2006201265A (en) Voice recognition device
Tjandra et al. Stochastic gradient variational bayes for deep learning-based ASR
Samarakoon et al. Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models.
Mun et al. Bootstrap equilibrium and probabilistic speaker representation learning for self-supervised speaker verification
Junior et al. A speech recognition system for embedded applications using the SOM and TS-SOM networks
Lu Sequence training and adaptation of highway deep neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08864234

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2009547010

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08864234

Country of ref document: EP

Kind code of ref document: A1