JP2005141049A

JP2005141049A - Pattern recognition device, voice recognition device, and program

Info

Publication number: JP2005141049A
Application number: JP2003378078A
Authority: JP
Inventors: Tadashi Emori (江森正); Kenichi Iso (磯健一)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2003-11-07
Filing date: 2003-11-07
Publication date: 2005-06-02

Abstract

PROBLEM TO BE SOLVED: To provide a pattern recognition device, a speech recognition device, and a program that make the standard pattern more accurate by taking discriminative performance into account when adjusting the number of probability distributions in speech recognition that uses a plurality of probability distributions.

SOLUTION: In a pattern recognition device that uses a plurality of probability distributions as its standard pattern, the number of probability distributions is optimized with a criterion that combines at least (i) the consistency between the training data and the standard pattern, (ii) a parameter amount penalty computed from the number of parameters of the probability distributions in the standard pattern, and (iii) an identification penalty expressing the degree to which each probability distribution is misrecognized.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a pattern recognition device, a speech recognition device, and a program, and more particularly to a technique for automatically determining the optimal structure of a standard pattern when creating standard patterns for speech recognition that uses mixture probability density models.

In recent years, speech recognition using hidden Markov models (HMMs) has been studied. HMMs are described in detail on pages 102 to 187 of Rabiner and Juang, translated by Furui, "Fundamentals of Speech Recognition (Vol. 2)," NTT Advanced Technology (1995) (hereinafter, Non-Patent Document 1). In HMM-based speech recognition, Gaussian mixture distributions, which combine several Gaussian distributions, are widely used to represent the distribution of each state. It is known that the more Gaussian distributions a state has, the more accurately its distribution can be represented, and hence the better the recognition performance. The parameters of the individual Gaussian distributions are estimated during training, but estimating each Gaussian accurately requires a certain amount of training data.
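As a concrete picture of a state distribution built from several Gaussians, the sketch below computes the log-likelihood of one feature frame under a Gaussian mixture. It assumes diagonal covariances (common in HMM acoustic models but not stated here), and the function name is hypothetical; the log-sum-exp step keeps the sum over components numerically stable.

```python
import numpy as np


def gmm_state_loglik(x, weights, means, variances):
    """log sum_m w_m N(x; mu_m, diag(var_m)) for one frame x of dimension K.

    weights:   (M,)   mixture weights of the state's Gaussians
    means:     (M, K) component means
    variances: (M, K) diagonal variances (assumed diagonal covariance)
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = x.shape[0]
    # Log normalizer and exponent of each diagonal Gaussian component.
    log_norm = -0.5 * (k * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_expo = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    comp = np.log(weights) + log_norm + log_expo
    # Log-sum-exp over components.
    m = np.max(comp)
    return float(m + np.log(np.sum(np.exp(comp - m))))
```

Adding components lets the mixture fit the state's data more closely, which is why more Gaussians per state tend to help, provided each one has enough data to be estimated reliably.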

On the other hand, because the amount of training data is generally limited, the number of Gaussian distributions that can be estimated accurately is also limited. A criterion that optimally sets the number of Gaussian distributions relative to the amount of data is therefore needed. For example, Japanese Patent Laid-Open No. 2002-268675 (hereinafter, Patent Document 1) determines the number of Gaussian distributions per state using the minimum description length (MDL) criterion. The MDL criterion expresses the principle that the given training data should be represented well with as few parameters (here, Gaussian distributions) as possible. As shown in [Formula 18] of Patent Document 1 (reproduced below as Equation 1), the MDL consists of a term representing the consistency between the data and the model and a term that penalizes an increase in the number of Gaussian distributions.

$$L_{\mathrm{MDL}}(i) = -\log P_{\theta(i)}(x^N) + \frac{1}{2}\alpha_i \log N \qquad (1)$$
The symbols in Equation 1 are explained below, reading the term "model" in Patent Document 1 as "state". x^N is the training data for state i, and N is the number of data samples. θ(i) is the maximum-likelihood estimate for state i obtained from the training data x^N. α_i is the number of dimensions of state i, that is, the product of the number of Gaussian distributions and the number of parameters per Gaussian. Patent Document 1 also includes a term for the number of states, but it is omitted here because only the number of Gaussian distributions per state is considered. The first term on the right side of Equation 1 becomes smaller as the consistency between the data and the model improves. Because consistency improves as more Gaussians are used for a given amount of data, determining the number of Gaussians from the first term alone would push the number of Gaussians upward as the amount of data grows.
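To make the trade-off in Equation 1 concrete, the sketch below evaluates L_MDL for several candidate mixture sizes and keeps the smallest. The per-size log-likelihoods, the 78 parameters per Gaussian (mean and variance of a hypothetical 39-dimensional diagonal Gaussian), and the function names are all illustrative assumptions; in practice the log-likelihood would come from a model trained with each mixture size.

```python
import math


def mdl_description_length(log_likelihood, num_gaussians, params_per_gaussian, num_frames):
    """Equation 1: L_MDL(i) = -log P(x^N) + 0.5 * alpha_i * log N.

    alpha_i is taken as num_gaussians * params_per_gaussian, as in the text.
    """
    alpha = num_gaussians * params_per_gaussian
    return -log_likelihood + 0.5 * alpha * math.log(num_frames)


def select_num_gaussians(loglik_by_m, params_per_gaussian, num_frames):
    """Return the mixture size M whose log-likelihood gives the smallest L_MDL."""
    return min(
        loglik_by_m,
        key=lambda m: mdl_description_length(loglik_by_m[m], m, params_per_gaussian, num_frames),
    )


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one state's data for M = 1, 2, 4, 8 Gaussians.
    loglik = {1: -60000.0, 2: -59500.0, 4: -59300.0, 8: -59200.0}
    print(select_num_gaussians(loglik, params_per_gaussian=78, num_frames=1000))
```

With these numbers each extra Gaussian costs about 269 in the penalty term, so the gain from 2 to 4 Gaussians (200) no longer pays for itself and M = 2 is chosen.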

The second term on the right side of Equation 1, on the other hand, increases as the number of Gaussian distributions increases. Thus, even though the first term pushes the number of Gaussians upward, the second term imposes a penalty that grows with the number of Gaussians, and an optimal number is determined by the balance between the two. Besides MDL, criteria of this kind include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). To minimize L_MDL in Equation 1, Patent Document 1 computes Δ_MDL, given in [Formula 2] of Patent Document 1 (reproduced below as Equation 2), which is the difference between the description length L_MDL after increasing the number of Gaussians and that before the increase. If Δ_MDL < 0, the number of Gaussians is increased; if Δ_MDL > 0, it is not.

$$\Delta_{\mathrm{MDL}} = \tfrac{1}{2}\Gamma(S_1)\log|\Sigma_{S_1}| + \tfrac{1}{2}\Gamma(S_2)\log|\Sigma_{S_2}| - \tfrac{1}{2}\Gamma(S_0)\log|\Sigma_{S_0}| + K\log|V| \qquad (2)$$
Equation 2 is explained as follows. For simplicity, assume that state i has two Gaussian distributions S1 and S2 and that, to decide the number of Gaussians in state i, a Gaussian distribution S0 has been created as the parent node of S1 and S2, as shown in Patent Document 1. Γ(S0), Γ(S1), and Γ(S2) are the amounts of data assigned to the Gaussians S0, S1, and S2 when they were created, and Σ_S0, Σ_S1, and Σ_S2 are the covariances of S0, S1, and S2. K is the dimensionality of the variance, and V = Γ(S1) + Γ(S2).
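A minimal sketch of Equation 2, assuming diagonal covariances so that log|Σ| is the sum of the log variances, and taking K as the length of the variance vector. The function name and the diagonal assumption are illustrative, not taken from Patent Document 1.

```python
import numpy as np


def delta_mdl(gamma0, var0, gamma1, var1, gamma2, var2):
    """Equation 2: change in description length when parent S0 is split into S1 and S2.

    gamma*: data amounts Γ(S0), Γ(S1), Γ(S2)
    var*:   diagonal variance vectors of S0, S1, S2 (assumed diagonal covariance)
    """
    def log_det(v):
        # log|Σ| of a diagonal covariance is the sum of log variances.
        return float(np.sum(np.log(np.asarray(v, dtype=float))))

    k = len(var0)             # dimensionality K of the variance
    v = gamma1 + gamma2       # V = Γ(S1) + Γ(S2)
    consistency = (0.5 * gamma1 * log_det(var1)
                   + 0.5 * gamma2 * log_det(var2)
                   - 0.5 * gamma0 * log_det(var0))
    return consistency + k * np.log(v)
```

If the children fit their data much more tightly than the parent (smaller variances), the consistency terms become strongly negative and outweigh K log V, so Δ_MDL < 0 and the split is accepted; if the children are no tighter than the parent, only the positive penalty remains and the split is rejected.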

A conventional speech recognition apparatus that adjusts the number of Gaussian distributions in an HMM using the MDL criterion, as shown in Patent Document 1, is described with reference to FIGS. 3 and 4. The conventional apparatus consists of standard pattern creation means 100, standard pattern adjustment means 200, standard pattern storage means 300, input pattern creation means 500, and recognition means 400. As described in Patent Document 1, the standard pattern creation means 100 creates the pre-adjustment standard pattern H0 from training speech; the input pattern creation means 500 computes and outputs feature vectors V from the input speech; and the standard pattern adjustment means 200 adjusts the number of Gaussian distributions in H0 using the MDL criterion and outputs a new, adjusted standard pattern H. The standard pattern storage means 300 stores the pre-adjustment standard pattern H0. The recognition means 400 performs recognition using the feature vectors V and the standard pattern H and outputs the recognition result.

The standard pattern adjustment means 200 is described with reference to FIG. 4, a flowchart of the conventional standard pattern adjustment means 200. The consistency calculation step 200a computes the first, second, and third terms on the right side of Equation 2, which together represent the change in consistency between the model and the data. The parameter amount penalty calculation step 200b computes the fourth term on the right side of Equation 2, which is the change in the penalty term caused by increasing the number of Gaussians; this value is called the parameter amount penalty. The mixture number determination step 200c computes Δ_MDL in Equation 2 from the values obtained in steps 200a and 200b, splits the distribution if Δ_MDL < 0, and stops splitting if Δ_MDL > 0. Details of these operations are given in Patent Document 1.
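The three steps of FIG. 4 can be sketched as a top-down walk over a binary tree of candidate Gaussians, in which each node plays the role of a parent S0 with children S1 and S2. The tree representation, the precomputed Γ and log|Σ| per node, the function names, and treating Δ_MDL = 0 as "stop" are all assumptions made for illustration; Patent Document 1 gives the actual procedure.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    gamma: float                  # data amount Γ assigned to this Gaussian
    log_det: float                # log|Σ| of this Gaussian's covariance
    children: List["Node"] = field(default_factory=list)  # [] or [S1, S2]


def consistency_term(s0: Node, s1: Node, s2: Node) -> float:
    """Step 200a (sketch): first three terms on the right side of Equation 2."""
    return 0.5 * (s1.gamma * s1.log_det + s2.gamma * s2.log_det - s0.gamma * s0.log_det)


def parameter_penalty(s1: Node, s2: Node, dim: int) -> float:
    """Step 200b (sketch): fourth term of Equation 2, K * log V with V = Γ(S1) + Γ(S2)."""
    return dim * math.log(s1.gamma + s2.gamma)


def select_gaussians(node: Node, dim: int) -> List[Node]:
    """Step 200c (sketch): keep splitting while Δ_MDL < 0, otherwise keep this node."""
    if len(node.children) == 2:
        s1, s2 = node.children
        delta = consistency_term(node, s1, s2) + parameter_penalty(s1, s2, dim)
        if delta < 0:
            return select_gaussians(s1, dim) + select_gaussians(s2, dim)
    return [node]
```

The Gaussians returned by `select_gaussians` for a state's root node would form that state's mixture in the adjusted standard pattern H.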

Whereas Patent Document 1 determines the number of Gaussian distributions with a criterion based on the amount of data and the number of Gaussians, Normandin, "Optimal Splitting of HMM Gaussian Mixture Component with MMIE Training," ICASSP 1995 (hereinafter, Non-Patent Document 2) proposes controlling the number of Gaussians with the maximum mutual information (MMI) criterion. Non-Patent Document 2 uses MMI as a criterion for reducing the degree to which each state is confused with other states. The method is characterized by using MMI to increase the number of Gaussians for states that are highly confusable with other states, while not increasing it for states with little confusion.

Patent Document 1: Japanese Patent Laid-Open No. 2002-268675
Non-Patent Document 1: Rabiner and Juang, translated by Furui, "Fundamentals of Speech Recognition (Vol. 2)," NTT Advanced Technology (1995), pp. 102-187
Non-Patent Document 2: Normandin, "Optimal Splitting of HMM Gaussian Mixture Component with MMIE Training," ICASSP 1995

Controlling the number of Gaussian distributions with the MDL criterion (or BIC, AIC, and the like) as in Patent Document 1 is essentially a way of determining the number of Gaussians from the amount of training data, and it prevents Gaussians from being over-allocated. Consequently, when the total number of Gaussians in the whole HMM is fixed, the number given to each state follows that state's share of the training data: states with much data receive many Gaussians, and states with little data receive few.

Speech recognition, however, aims to discriminate between words, syllables, phonemes, and the like. With the method of Patent Document 1, some states may receive Gaussians that do nothing for discrimination simply because they have a lot of data, while states that have little training data but are highly confusable with others may receive too few Gaussians to resolve that confusion.

In approaches such as Non-Patent Document 2 that control the number of Gaussian distributions with a discriminative criterion, there is no clear criterion for the upper limit on the number of Gaussians. In the extreme case, if Gaussians keep being added discriminatively to raise discriminative power on the training data, the amount of training data per Gaussian shrinks and the distribution estimates become unstable. In addition, a discriminative criterion requires computing the degree of confusion for each class (here, each state) used as the recognition unit. HMMs used in speech recognition typically have 1,000 to 10,000 states, so computing all combinations of them may require a large amount of computation.
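For a rough sense of that cost, counting only unordered pairs of states (an assumption; a confusion measure over ordered pairs or over longer sequences would be larger still) gives:

$$\binom{S}{2} = \frac{S(S-1)}{2}:\qquad S = 1{,}000 \Rightarrow 499{,}500,\qquad S = 10{,}000 \Rightarrow 49{,}995{,}000$$

The pair count grows quadratically, so a tenfold increase in states multiplies the work by roughly one hundred.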
[Object of invention]
An object of the present invention is to provide a speech recognition apparatus, based on HMMs with Gaussian distributions, that can adjust the number of Gaussian distributions effectively.

The first pattern recognition apparatus of the present invention is a pattern recognition apparatus that uses a plurality of probability distributions as a standard pattern, characterized in that, when optimally adjusting the number of probability distributions, it adjusts that number using a criterion that combines a parameter amount penalty, computed according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty representing the degree to which each probability distribution is misrecognized.

The second pattern recognition apparatus of the present invention is a pattern recognition apparatus that uses a plurality of probability distributions as a standard pattern, characterized in that, when optimally adjusting the number of probability distributions, it adjusts that number using a criterion that combines the consistency between the training data and the standard pattern, a parameter amount penalty computed according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty representing the degree to which each probability distribution is misrecognized.

The third pattern recognition apparatus of the present invention is a pattern recognition apparatus using probability distributions, comprising: recognition means that outputs a recognition result using input data and a standard pattern created from training data; standard pattern storage means that holds the standard pattern; and discriminative mixture distribution adjustment means that optimally adjusts the number of probability distributions using the standard pattern and the recognition result. The discriminative mixture distribution adjustment means comprises: consistency calculation means that computes the consistency between the training data and the standard pattern; parameter amount penalty calculation means that computes a parameter amount penalty according to the number of parameters of the probability distributions; misidentification penalty calculation means that computes an identification penalty from the recognition result; decision criterion calculation means that computes, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and mixture distribution determination means that adjusts the number of probability distributions using that criterion.

The fourth pattern recognition apparatus of the present invention is the third pattern recognition apparatus in which the misidentification penalty calculation means uses, as the identification penalty, the likelihood difference between the recognition result and the correct answer for each recognition unit, the posterior probability of each recognition unit, or the reciprocal of the sum of the Kullback-Leibler divergences between the target recognition unit and the other recognition units.

The speech recognition apparatus of the present invention is a speech recognition apparatus that performs pattern recognition using probability distributions, comprising: recognition means that outputs a recognition result using speech input data and a standard pattern created from training speech data; standard pattern storage means that holds the standard pattern; and discriminative mixture distribution adjustment means that optimally adjusts the number of probability distributions using the standard pattern and the recognition result. The discriminative mixture distribution adjustment means comprises: consistency calculation means that computes the consistency between the training data and the standard pattern; parameter amount penalty calculation means that computes a parameter amount penalty according to the number of parameters of the probability distributions; misidentification penalty calculation means that computes an identification penalty from the recognition result; decision criterion calculation means that computes, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and mixture distribution determination means that adjusts the number of probability distributions using that criterion.

The first program of the present invention is a pattern recognition program that uses a plurality of probability distributions as a standard pattern, characterized by causing a computer to execute a procedure that, when optimally adjusting the number of probability distributions, adjusts that number using a criterion that combines a parameter amount penalty, computed according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty representing the degree to which each probability distribution is misrecognized.

The second program of the present invention is a pattern recognition program that uses a plurality of probability distributions as a standard pattern, characterized by causing a computer to execute a procedure that, when optimally adjusting the number of probability distributions, adjusts that number using a criterion that combines the consistency between the training data and the standard pattern, a parameter amount penalty computed according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty representing the degree to which each probability distribution is misrecognized.

The third program of the present invention is a pattern recognition program using probability distributions, characterized by causing a computer to execute: a procedure for outputting a recognition result using input data and a standard pattern created from training data; a procedure for optimally adjusting the number of probability distributions using the standard pattern and the recognition result; a procedure for computing the consistency between the training data and the standard pattern; a procedure for computing a parameter amount penalty according to the number of parameters of the probability distributions; a procedure for computing an identification penalty from the recognition result; a procedure for computing, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and a procedure for adjusting the number of probability distributions using that criterion.

The fourth program of the present invention is the third program in which the procedure for computing the identification penalty uses, as the identification penalty, the likelihood difference between the recognition result and the correct answer for each recognition unit, the posterior probability of each recognition unit, or the reciprocal of the sum of the Kullback-Leibler divergences between the target recognition unit and the other recognition units.

The fifth program of the present invention is a speech recognition program that performs pattern recognition using probability distributions, characterized by causing a computer to execute: a procedure for outputting a recognition result using input data and a standard pattern created from training data; a procedure for optimally adjusting the number of probability distributions using the standard pattern and the recognition result; a procedure for computing the consistency between the training data and the standard pattern; a procedure for computing a parameter amount penalty according to the number of parameters of the probability distributions; a procedure for computing an identification penalty from the recognition result; a procedure for computing, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and a procedure for adjusting the number of probability distributions using that criterion.

Because the number of Gaussian distributions is determined by also taking a discriminative quantity into account, rather than being adjusted solely on the basis of data amount and consistency, the number of Gaussians needed for discrimination can be secured. At the same time, because the criterion still accounts for the amount of data, the resulting acoustic model generalizes better than one built with a discriminative criterion alone.

As a result, Gaussian mixture components are allocated optimally for recognition, and an acoustic model with higher recognition performance can be created than by using the MDL criterion or a discriminative criterion separately.

The best mode for carrying out the present invention is described with reference to FIGS. 1 and 2. The configuration of the present invention consists of standard pattern creation means 100, discriminative mixture distribution adjustment means 2000, standard pattern storage means 300, input pattern creation means 500, recognition means 400, and input pattern calculation means. Apart from the discriminative mixture distribution adjustment means 2000, the configuration is the same as that of Patent Document 1 described in the prior art. The discriminative mixture distribution adjustment means 2000 is described with reference to FIG. 2, a flowchart showing its operation. First, as in the conventional example, the discriminative mixture distribution adjustment means 2000 performs the consistency calculation step 200a and the parameter amount penalty calculation step 200b. The misidentification penalty calculation step 2000a then computes an identification penalty representing discrimination performance; the calculation method is described later. In the decision criterion calculation step 2000b, a new splitting criterion is computed from the consistency, the parameter amount penalty, and the identification penalty.

The identification penalty dL computed in the misidentification penalty calculation step 2000a is as follows. The recognition result R is obtained by matching the input pattern, created by the input pattern creation means 500 from speech data 2, against the pre-adjustment standard pattern H0 created by the standard pattern creation means 100, and selecting the symbol sequence with the highest similarity. Speech data 2 may be the speech data used for training, speech data for evaluation, development speech data for the target task, or the like. Likewise, the recognition dictionary may be a syllable dictionary, the dictionary for the target task, or the like. Here, a task means the vocabulary set used for recognition and training.

Next, alignments are computed for the recognition result R and for the corresponding correct answer. As shown in Non-Patent Document 1, alignment is performed with the Viterbi algorithm using the pre-adjustment standard pattern H0 and the input pattern. After alignment, the correct alignment and the alignment of the recognition result R are compared frame by frame, and the log-likelihood difference (hereinafter simply called the likelihood difference) is accumulated for each correct state according to Equation 3.
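Non-Patent Document 1 covers the Viterbi algorithm in full; the sketch below is a generic version for reference. It assumes the frame log-likelihoods under H0 are already available as a (T, S) array and that, for alignment against a symbol sequence, the state axis holds the concatenated HMM states of that sequence (correct answer or recognition result R). The function name is hypothetical.

```python
import numpy as np


def viterbi_align(log_emit: np.ndarray, log_trans: np.ndarray, log_init: np.ndarray) -> np.ndarray:
    """Return the most likely state index for every frame.

    log_emit:  (T, S) frame log-likelihoods log P(x_t | s)
    log_trans: (S, S) log transition probabilities (row = from, column = to)
    log_init:  (S,)   log initial-state probabilities
    """
    t_len, n_states = log_emit.shape
    score = log_init + log_emit[0]                  # best log score ending in each state
    back = np.zeros((t_len, n_states), dtype=int)   # back-pointers
    for t in range(1, t_len):
        cand = score[:, None] + log_trans           # cand[from, to]
        back[t] = np.argmax(cand, axis=0)           # best predecessor of each state
        score = cand[back[t], np.arange(n_states)] + log_emit[t]
    path = np.empty(t_len, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(t_len - 1, 0, -1):               # trace back-pointers
        path[t - 1] = back[t, path[t]]
    return path
```

Running this once over the correct answer's states and once over the recognized sequence's states yields the two frame-level alignments that are compared in Equation 3.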

$$dL(s_{\mathrm{correct}}) = \sum_t \left\{ \log P(x_t \mid s_{\mathrm{correct}}) - \log P(x_t \mid s_{\mathrm{recognized}}) \right\} \qquad (3)$$
Equation 3 is explained as follows. x_t is a frame that is aligned to the correct state when alignment is computed with the correct answer; when alignment is computed with the recognition result, the same frame is aligned to a state of the recognition result. Σ_t denotes summation over such x_t. log P(x_t | s_correct) is the log-likelihood of x_t for the correct state, and log P(x_t | s_recognized) is its log-likelihood for the state in the recognition result. The likelihood difference accumulated in this way is the identification penalty dL. Here the likelihood difference is accumulated per state, but any recognition unit, such as each distribution, phoneme, or word, can be used. The identification penalty dL may also be the accumulated likelihood difference divided by the number of frames in which it occurred, the posterior probability of each state or phoneme, or the per-frame average of that posterior. Alternatively, as shown in Patent Document 1, it may be the reciprocal of the sum of the Kullback-Leibler divergences between the state s0 being evaluated and all other states, given by Equation 4.
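A minimal sketch of Equation 3, assuming the frame log-likelihoods under H0 are given as a (T, S) array and the two alignments as per-frame state indices. The `per_frame` option implements the variant that divides by the number of frames; the function and parameter names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def likelihood_difference_penalty(frame_loglik, correct_path, recog_path, per_frame=False):
    """Equation 3: accumulate log P(x_t|correct) - log P(x_t|recognized) per correct state.

    frame_loglik: (T, S) array of log P(x_t | s) under the pre-adjustment model H0
    correct_path, recog_path: length-T state indices from the two alignments
    per_frame: if True, divide each total by the number of frames aligned to that state
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for t, (c, r) in enumerate(zip(correct_path, recog_path)):
        c, r = int(c), int(r)
        # The term is zero on frames where both alignments agree.
        total[c] += float(frame_loglik[t, c] - frame_loglik[t, r])
        count[c] += 1
    if per_frame:
        return {s: total[s] / count[s] for s in total}
    return dict(total)
```

A state whose frames are often captured by a competing state accumulates a large negative total, so its magnitude |dL| signals how much that state suffers from misrecognition.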

$$dL = \left( \sum_{s} \mathrm{KL}(s_0, s) \right)^{-\kappa} \qquad \text{(Equation 4)}$$
(Equation 4) is explained as follows. κ is a positive normalization constant. KL(s0, s) denotes the Kullback divergence between the distribution of state s0 and the distribution of state s, and Σ_s denotes the sum over states s. The Kullback divergence measures how much two distributions differ: the larger its value, the less the distributions overlap and the easier they are to discriminate. Because (Equation 4) is the reciprocal of the sum of the Kullback divergences between the distribution of state s0 and the distributions of the other states (exactly the reciprocal when κ = 1), a small divergence sum means that s0 overlaps heavily with the other states, which is unfavorable for discrimination. dL is therefore designed to become large in that case, so that the number of Gaussian distributions in state s0 is increased. Instead of the reciprocal of the divergence sum itself, the reciprocal of the divergence sum divided by the number of frames in which the distribution of state s0 appears may also be used.
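A sketch of (Equation 4) follows. For illustration only, it assumes that each state is summarized by a single diagonal-covariance Gaussian, for which the Kullback-Leibler divergence has a closed form; the mixture distributions actually used per state have no closed-form divergence and would need an approximation. The names `kl_diag_gauss` and `kl_penalty`, and the optional `frames` argument for the per-frame variant, are hypothetical.

```python
import math


def kl_diag_gauss(mean0, var0, mean1, var1):
    """KL(N0 || N1) for diagonal-covariance Gaussians (closed form)."""
    return 0.5 * sum(
        math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0
        for m0, v0, m1, v1 in zip(mean0, var0, mean1, var1)
    )


def kl_penalty(target, others, kappa=1.0, frames=None):
    """Equation 4: dL = (sum_s KL(s0, s)) ** (-kappa).

    target and each entry of others are (mean, var) pairs (an assumption:
    one Gaussian per state). If frames is given, the divergence sum is first
    divided by the number of frames in which s0 appears.
    A zero divergence sum raises ZeroDivisionError.
    """
    total = sum(kl_diag_gauss(*target, *other) for other in others)
    if frames:
        total /= frames
    return total ** (-kappa)


s0 = ([0.0], [1.0])
print(kl_penalty(s0, [([2.0], [1.0])]))  # 0.5: well separated, small penalty
print(kl_penalty(s0, [([1.0], [1.0])]))  # 2.0: closer neighbor, larger penalty
```

The second call shows the intended behavior: the more state s0 overlaps a competing state, the larger dL becomes.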

In decision criterion calculation step 2000b, the description-length difference Δ_MDL is first calculated with (Equation 2) from the consistency and the parameter amount penalty. Then, using the obtained Δ_MDL, a new quantity Δ′ that adds the identification penalty dL is defined by (Equation 5) and evaluated.

$$\Delta' = \Delta_{\mathrm{MDL}} - \beta \, |dL| \qquad \text{(Equation 5)}$$
β is a coefficient that absorbs differences in the amount of data used to compute Δ_MDL and dL, and differences in their units (dimensions). Like the likelihood difference and the reciprocal of the Kullback divergence described above, the identification penalty dL has a large absolute value when discriminative power is weak and a small absolute value when discrimination is sufficient. As in the conventional method, mixture distribution number determination step 200c splits the Gaussian distributions when Δ′ < 0 and does not split them when Δ′ > 0. The operation of step 200c is therefore the same as in the conventional method; only its input differs: instead of the difference in description length L_MDL, it receives Δ′, the description-length difference with a discriminative term added.
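The decision in step 200c using (Equation 5) reduces to a sign test, sketched below. Δ_MDL is taken as a given input because (Equation 2) is defined elsewhere; the function names, the per-state dictionaries, and the choices to treat Δ′ = 0 as "do not split" and a missing dL as 0 are assumptions of this sketch, not part of the described method.

```python
def discriminative_criterion(delta_mdl, dL, beta):
    """Equation 5: delta' = delta_MDL - beta * |dL|."""
    return delta_mdl - beta * abs(dL)


def split_decisions(delta_mdl_by_state, dL_by_state, beta):
    """Return {state: True if its Gaussian distributions should be split}.

    Split when delta' < 0, keep when delta' > 0; treating delta' == 0 as
    "do not split" and a missing dL as 0.0 are assumptions of this sketch.
    """
    return {
        s: discriminative_criterion(d, dL_by_state.get(s, 0.0), beta) < 0
        for s, d in delta_mdl_by_state.items()
    }


decisions = split_decisions({"a": 1.0, "b": 1.0}, {"a": 0.0, "b": -2.0}, beta=0.75)
print(decisions)  # {'a': False, 'b': True}
```

With β = 0 the rule reduces to the conventional description-length test. In the example, both states have the same Δ_MDL, so the conventional test would split neither, but state "b" is split because its misrecognition yields a large |dL|.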

Although standard pattern creation means 100 and recognition means 400 have been described as performing speech recognition, the invention is not limited to speech recognition; any device involved in recognition using probability distributions may be used. For example, standard pattern creation means 100 can create a standard pattern from music data, noise data, image data, character data, or the like instead of speech data. In that case, recognition means 400 can match the standard pattern created from image data or the like against input image data or the like, and output the symbol sequence with the highest similarity as recognition result R.

According to the present invention, a highly accurate standard pattern can be created for speech recognition, which broadens the fields in which speech recognition can be used, such as automatic telephone reservation systems, automatic speech interpreters, automatic dictation systems, and games. In addition, because tuning can reflect the recognition results, words with poor recognition performance can be tuned whenever the vocabulary can be restricted, as with the names, telephone numbers, and stock names used in automatic telephone answering systems. The invention can therefore be applied to fields in which more important information, such as personal or transaction information, is entered by voice, for example automatic telephone ticket reservation systems and automatic telephone stock trading systems.

[BRIEF DESCRIPTION OF THE DRAWINGS]
A block diagram showing the configuration of the best mode for carrying out the first invention of the present invention.
A flowchart explaining in detail the discriminative mixture number adjustment means of the first invention.
A block diagram explaining a conventional example.
A flowchart explaining in detail the standard pattern adjustment means of the conventional example.

Explanation of symbols

100 Standard pattern creation means
200 Standard pattern adjustment means
300 Standard pattern storage means
400 Recognition means
500 Input pattern creation means
2000 Discriminative mixture distribution adjustment means
200a Consistency calculation step
200b Parameter amount penalty calculation step
200c Mixture distribution number determination step
2000a Misidentification penalty calculation step
2000b Decision criterion calculation step
H0 Standard pattern before adjustment
H Standard pattern after adjustment
R Recognition result

Claims

A pattern recognition apparatus that uses a plurality of probability distributions as a standard pattern, characterized in that, when the number of probability distributions is optimally adjusted, the adjustment is performed using a criterion that combines a parameter amount penalty, calculated according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty, which is the degree of erroneous recognition of each probability distribution.

A pattern recognition apparatus that uses a plurality of probability distributions as a standard pattern, characterized in that, when the number of probability distributions is optimally adjusted, the adjustment is performed using a criterion that combines the consistency between learning data and the standard pattern, a parameter amount penalty calculated according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty, which is the degree of erroneous recognition of each probability distribution.

A pattern recognition apparatus using probability distributions, comprising: recognition means that outputs a recognition result using input data and a standard pattern created from learning data; standard pattern storage means that holds the standard pattern; and discriminative mixture distribution adjustment means that optimally adjusts the number of probability distributions using the standard pattern and the recognition result, wherein the discriminative mixture distribution adjustment means comprises: consistency calculation means that calculates the consistency between the learning data and the standard pattern; parameter amount penalty calculation means that calculates a parameter amount penalty according to the number of parameters of the probability distributions; misidentification penalty calculation means that calculates an identification penalty from the recognition result; decision criterion calculation means that calculates, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and mixture distribution number determination means that adjusts the number of probability distributions using the criterion.

The pattern recognition apparatus according to claim 3, wherein the misidentification penalty calculation means uses, as the identification penalty, the likelihood difference between the recognition result and the correct answer for each recognition unit, the posterior probability for each recognition unit, or the reciprocal of the sum of the Kullback divergences between the target recognition unit and the other recognition units.

A speech recognition apparatus that recognizes patterns using probability distributions, comprising: recognition means that outputs a recognition result using speech input data and a standard pattern created from learning speech data; standard pattern storage means that holds the standard pattern; and discriminative mixture distribution adjustment means that optimally adjusts the number of probability distributions using the standard pattern and the recognition result, wherein the discriminative mixture distribution adjustment means comprises: consistency calculation means that calculates the consistency between the learning data and the standard pattern; parameter amount penalty calculation means that calculates a parameter amount penalty according to the number of parameters of the probability distributions; misidentification penalty calculation means that calculates an identification penalty from the recognition result; decision criterion calculation means that calculates, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions; and mixture distribution number determination means that adjusts the number of probability distributions using the criterion.

A pattern recognition program that uses a plurality of probability distributions as a standard pattern, the program causing a computer to execute a procedure that, when the number of probability distributions is optimally adjusted, performs the adjustment using a criterion that combines a parameter amount penalty, calculated according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty, which is the degree of erroneous recognition of each probability distribution.

A pattern recognition program that uses a plurality of probability distributions as a standard pattern, the program causing a computer to execute a procedure that, when the number of probability distributions is optimally adjusted, performs the adjustment using a criterion that combines the consistency between learning data and the standard pattern, a parameter amount penalty calculated according to the number of parameters of the probability distributions in the standard pattern, and an identification penalty, which is the degree of erroneous recognition of each probability distribution.

A pattern recognition program using probability distributions, the program causing a computer to execute: a procedure for outputting a recognition result using input data and a standard pattern created from learning data; and a procedure for optimally adjusting the number of probability distributions using the standard pattern and the recognition result, the latter comprising a procedure for calculating the consistency between the learning data and the standard pattern, a procedure for calculating a parameter amount penalty according to the number of parameters of the probability distributions, a procedure for calculating an identification penalty from the recognition result, a procedure for calculating, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions, and a procedure for adjusting the number of probability distributions using the criterion.

The program according to claim 8, wherein the procedure for calculating the identification penalty uses, as the identification penalty, the likelihood difference between the recognition result and the correct answer for each recognition unit, the posterior probability for each recognition unit, or the reciprocal of the sum of the Kullback divergences between the target recognition unit and the other recognition units.

A speech recognition program for pattern recognition using probability distributions, the program causing a computer to execute: a procedure for outputting a recognition result using input data and a standard pattern created from learning data; and a procedure for optimally adjusting the number of probability distributions using the standard pattern and the recognition result, the latter comprising a procedure for calculating the consistency between the learning data and the standard pattern, a procedure for calculating a parameter amount penalty according to the number of parameters of the probability distributions, a procedure for calculating an identification penalty from the recognition result, a procedure for calculating, from the consistency, the parameter amount penalty, and the identification penalty, a criterion for adjusting the number of parameters of the probability distributions, and a procedure for adjusting the number of probability distributions using the criterion.