JP2738508B2

JP2738508B2 - Statistical language model creation device and speech recognition device

Info

Publication number: JP2738508B2
Application number: JP6263531A
Authority: JP
Inventors: 仁一村上
Original assignee: Ei Tei Aaru Onsei Honyaku Tsushin Kenkyusho Kk
Current assignee: Ei Tei Aaru Onsei Honyaku Tsushin Kenkyusho Kk
Priority date: 1994-10-27
Filing date: 1994-10-27
Publication date: 1998-04-08
Anticipated expiration: 2013-04-08
Also published as: JPH08123463A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a statistical language model creation device that automatically creates a statistical language model for speech recognition, and to a speech recognition device that, by referring to the statistical language model so created, recognizes input utterances consisting of word sequences.

【０００２】[0002]

[Prior Art] Language models used in speech recognition include syntactic models, typified by network grammars and context-free grammars, and statistical models, typified by bigrams and trigrams. Syntactic language models have been studied widely in natural language processing, but they require a great deal of labor because a person must write the syntax rules from knowledge of the language. Statistical models such as bigrams and trigrams, on the other hand, are widely used as language models in speech recognition because they are simple, but they are too simple to represent a language. To offset the weaknesses of both, probabilistic network grammars and probabilistic context-free grammars, which attach probabilities to a syntactic model, have been proposed and studied.

[0003] Hidden Markov models (hereinafter, HMMs) are widely used in speech recognition. Among the types of HMM, an ergodic discrete HMM (hereinafter, E-HMM), in which transitions between all states are allowed as shown in FIG. 2, has a structure formally similar to a network grammar description when words are taken as its output symbols. It may therefore be possible to create a probabilistic network grammar automatically, for example on a digital computer, from a large amount of word-sequence data by using the Baum-Welch learning algorithm.

【０００４】[0004]

[Problems to Be Solved by the Invention] However, as the number of states grows, the memory and computation that a digital computer needs to create a probabilistic network grammar with the Baum-Welch learning algorithm increase exponentially, so the computation becomes infeasible in practice. According to the inventor's estimate, the computation becomes impossible once the number of states exceeds 30. Earlier studies therefore used few states and reported speech recognition performance and perplexity worse than those of a word bigram (see, for example, Murase et al., "A study of language modeling using context-free grammars and bigram/trigram," IEICE Technical Report, SP90-75, NLC90-47, pp. 49-56, 1990).

[0005] An object of the present invention is to solve the above problems and to provide a statistical language model creation device that can create a statistical language model automatically using the Baum-Welch learning algorithm and can achieve a higher speech recognition rate than the prior art, together with a speech recognition device that uses it.

【０００６】[0006]

[Means for Solving the Problems] The statistical language model creation device according to claim 1 of the present invention comprises a storage device that stores a statistical language model under training, and control means that, based on input training text data, trains and creates, using the Baum-Welch learning algorithm, a statistical language model for speech recognition or morphological analysis, the model being an ergodic hidden Markov model with a predetermined number of states whose parameters include initial state probabilities, state transition probabilities, and symbol output probabilities. The device is characterized in that the control means comprises: deletion means that, when the parameters are estimated with the Baum-Welch learning algorithm and a symbol output probability is smaller than a predetermined threshold, sets that symbol output probability to 0 and deletes the transition-destination state of that symbol output probability from the storage device; and parameter calculation means that, when the parameters are trained with the Baum-Welch learning algorithm, successively doubles the number of states of the ergodic hidden Markov model and calculates the parameters of the ergodic hidden Markov model for the larger number of states.

[0007] The statistical language model creation device according to claim 2 is the device according to claim 1, characterized in that the parameter calculation means, when calculating the parameters for a second number of states that is twice a first number of states from those for the first number of states, comprises: first calculation means that, for every two states, calculates two parameters that share the same initial parameter of the initial state probability for the first number of states, and multiplies the two calculated initial parameters by a normalizing factor of 0.5 to obtain the initial parameters of the initial state probability for the second number of states; second calculation means that, for every four states, calculates four parameters that share the same initial parameter of the state transition probability for the first number of states, and multiplies the four calculated initial parameters by a normalizing factor of 0.5 to obtain the initial parameters of the state transition probability for the second number of states; and third calculation means that, for every four states, calculates four parameters that share the same initial parameter of the symbol output probability for the first number of states, multiplies each of the four calculated initial parameters by a mutually different random number generated by a predetermined method, and normalizes the products so that the symbol output probabilities for the second number of states sum to 1, thereby obtaining the parameters of the symbol output probability for the second number of states.

[0008] The speech recognition device according to claim 3 of the present invention comprises the statistical language model creation device according to claim 1 or 2, which creates a statistical language model for speech recognition, and speech recognition means that recognizes an input utterance consisting of a word sequence, based on its speech signal, by referring to the statistical language model created by the statistical language model creation device.

【０００９】[0009]

[Operation] In the statistical language model creation device of claim 1 configured as above, the deletion means, when the parameters are estimated with the Baum-Welch learning algorithm and a symbol output probability is smaller than the predetermined threshold, sets that symbol output probability to 0 and deletes its transition-destination state from the storage device. The parameter calculation means, when the parameters are trained with the Baum-Welch learning algorithm, successively doubles the number of states of the ergodic hidden Markov model and calculates the parameters of the model for the larger number of states. The deletion means thus greatly reduces the memory required in the storage device and the computation required of the control means, while the parameter calculation makes it possible to create an E-HMM with a larger number of states. In other words, a statistical language model comprising an E-HMM can be trained and created with the Baum-Welch learning algorithm.

[0010] In the statistical language model creation device of claim 2, the parameter calculation means, when calculating the parameters for a second number of states that is twice a first number of states from those for the first number of states, preferably comprises the first to third calculation means. The first calculation means, for every two states, calculates two parameters that share the same initial parameter of the initial state probability for the first number of states, and multiplies the two calculated initial parameters by a normalizing factor of 0.5 to obtain the initial parameters of the initial state probability for the second number of states. The second calculation means, for every four states, calculates four parameters that share the same initial parameter of the state transition probability for the first number of states, and multiplies the four calculated initial parameters by a normalizing factor of 0.5 to obtain the initial parameters of the state transition probability for the second number of states. The third calculation means, for every four states, calculates four parameters that share the same initial parameter of the symbol output probability for the first number of states, multiplies each of the four calculated initial parameters by a mutually different random number generated by a predetermined method, and normalizes the products so that the symbol output probabilities for the second number of states sum to 1, thereby obtaining the parameters of the symbol output probability for the second number of states.

[0011] In the speech recognition device of claim 3, the statistical language model creation device creates a statistical language model for speech recognition. The speech recognition means then recognizes an input utterance consisting of a word sequence, based on its speech signal, by referring to the statistical language model created by the statistical language model creation device.

【００１２】[0012]

[Embodiments] A speech recognition device according to an embodiment of the present invention is described below with reference to the drawings. The speech recognition device of this embodiment, shown in FIG. 1, is characterized in particular by a language model creation unit 11. Based on training text data consisting of word sequences stored in a training text data memory 10, the language model creation unit 11 executes the language model creation process described below, using the Baum-Welch learning algorithm and a model memory 12 that is connected to it and serves as a working area for temporarily storing the statistical language model under training. It thereby creates an E-HMM statistical language model and stores it in a statistical language model memory 7. The speech recognition device comprises a microphone 1; a feature extraction unit 2; a buffer memory 3; a phoneme matching unit 4 that, based on the input speech data, performs phoneme matching with reference to the HMMs, which are acoustic models, in an HMM memory 5 and outputs phoneme data; and a One-pass DP speech recognition unit 6 (hereinafter, speech recognition unit) that, based on the phoneme data from the phoneme matching unit 4, performs speech recognition using the One-pass DP (Viterbi search) algorithm with reference to the statistical language model in the statistical language model memory 7.

[0013] First, the language model creation process executed by the language model creation unit 11 is described, beginning with the Baum-Welch learning algorithm. The parameters that define an HMM are the transition probability a_N(i, j) between state i and state j when the number of states is N, and the symbol output probability b_N(i, j, k) from each state. A major feature of HMMs is that algorithms exist for estimating these parameters from speech data. The representative one is the Baum-Welch learning algorithm, also called the forward-backward algorithm. It repeats forward and backward computations over the trellis and updates the parameters so that they gradually assign a higher probability to the training data. The Baum-Welch learning algorithm is as follows.

[0014] First, given a model λ and an observed sequence Y = {y_1, y_2, ..., y_r} of, for example, feature parameters of speech data, the probability γ_t(i) of being in, or passing through, state i at time t is defined by Equation 1.

【００１５】[0015]

[Equation 1] γ_t(i) = P(q_t = i | Y, λ)

[0016] Here q_t denotes the index of the state occupied at time t. The probability γ_t(i) of Equation 1 can be expressed as in Equation 2 using two quantities: the forward variable α_t(i), which is the probability of observing the partial sequence {y_1, y_2, y_3, ..., y_t} from time 1 to time t and being in state i at time t, and the backward variable β_t(i), which is the probability of observing the partial sequence {y_{t+1}, y_{t+2}, y_{t+3}, ..., y_r} from time t+1 to the end when in state i at time t.

【００１７】[0017]

[Equation 2]
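The body of Equation 2 is not reproduced in this text. As a hedged reconstruction from the definitions of the forward and backward variables, the standard textbook identity is shown below; it is an assumption about what Equation 2 states, not a transcription of it.

```latex
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(Y \mid \lambda)},
\qquad
P(Y \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i)\,\beta_t(i)
```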

[0018] The probability ξ_t(i, j) of being in state i at time t and in state j at time t+1 is given by Equation 3.

【００１９】[0019]

[Equation 3] ξ_t(i, j) = P(q_t = i, q_{t+1} = j | Y, λ)

[0020] From the definitions of the forward variable α_t(i) and the backward variable β_t(i), Equation 4 follows.

【００２１】[0021]

[Equation 4] ξ_t(i, j) = {α_t(i) a_N(i, j) b_N(i, j, k) β_{t+1}(j)} / P(Y | λ)

[0022] Equation 4 can be rewritten as Equation 5.

【００２３】[0023]

[Equation 5]
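The body of Equation 5 is not reproduced in this text. One common way to rewrite Equation 4, given here only as an assumption about its likely form, is to expand the denominator P(Y | λ) over all state pairs. The primed indices i' and j' are introduced for illustration, and k is taken to be the symbol observed on the transition from time t to time t+1.

```latex
\xi_t(i,j) =
\frac{\alpha_t(i)\,a_N(i,j)\,b_N(i,j,k)\,\beta_{t+1}(j)}
     {\sum_{i'=1}^{N}\sum_{j'=1}^{N}
      \alpha_t(i')\,a_N(i',j')\,b_N(i',j',k)\,\beta_{t+1}(j')}
```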

[0024] The probabilities γ_t(i) and ξ_t(i, j) are related as shown in Equation 6.

【００２５】[0025]

[Equation 6]
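The body of Equation 6 is not reproduced in this text. From the definitions in Equations 1 and 3, the usual relation is that γ is the marginal of ξ over the destination state; this is a hedged reconstruction, not a transcription.

```latex
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)
```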

[0026] From the above, when the observed signal y_t is a discrete code, the re-estimated initial state probability π̂(i), state transition probability â_N(i, j), and symbol output probability b̂_N(i, j, k) are given by Equations 7, 8, and 9, respectively.

【００２７】[0027]

[Equation 7] π̂(i) = probability of being in state i at time 0 = γ_1(i)

[Equation 8]

[Equation 9]
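The bodies of Equations 8 and 9 are not reproduced in this text. For a transition-output discrete HMM, the standard Baum-Welch re-estimation formulas take the following form, offered here as an assumption about what the two equations state. The restricted sum in the second formula runs over the time steps at which symbol k is the one emitted on the transition.

```latex
\hat{a}_N(i,j) = \frac{\sum_{t} \xi_t(i,j)}{\sum_{t} \gamma_t(i)},
\qquad
\hat{b}_N(i,j,k) =
\frac{\sum_{t\,:\,\text{symbol } k \text{ emitted}} \xi_t(i,j)}
     {\sum_{t} \xi_t(i,j)}
```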

[0028] The Baum-Welch learning algorithm guarantees that re-estimation yields P(Y | λ̂) > P(Y | λ), where P(Y | λ̂) is the probability that the re-estimated model λ̂ outputs the sequence Y and P(Y | λ) is the probability that the model λ before re-estimation outputs it. Repeating this procedure therefore brings the model closer to the model λ being sought. In this way, starting from suitable initial values and iterating so as to increase the probability that the training text data are observed from the model, HMM parameters based on the training text data are obtained. If many phoneme-pattern samples are available for each phoneme, the Baum-Welch learning algorithm can estimate the HMM parameters that maximize the probability of generating those data. Moreover, if enough speech data are available to cover every variation of the phoneme parameters, a better model can be obtained.
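The re-estimation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it handles a single short symbol sequence, uses no scaling against numerical underflow, indexes states and symbols from 0, and all names (`forward_backward`, `reestimate`, `random_dist`) are hypothetical. The re-estimation formulas are the standard ones for a transition-output discrete HMM.

```python
import random

def forward_backward(pi, a, b, y):
    """Return (alpha, beta, P(Y|model)) for a transition-output discrete HMM."""
    n, T = len(pi), len(y)
    alpha = [list(pi)] + [[0.0] * n for _ in range(T)]
    for t in range(T):
        for j in range(n):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][y[t]]
                                  for i in range(n))
    beta = [[0.0] * n for _ in range(T)] + [[1.0] * n]
    for t in range(T - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[i][j][y[t]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta, sum(alpha[T])

def reestimate(pi, a, b, y):
    """One Baum-Welch update; unvisited states and arcs keep their old values."""
    n, T, V = len(pi), len(y), len(b[0][0])
    alpha, beta, p = forward_backward(pi, a, b, y)
    # xi[t][i][j]: posterior probability of taking arc i -> j at step t.
    xi = [[[alpha[t][i] * a[i][j] * b[i][j][y[t]] * beta[t + 1][j] / p
            for j in range(n)] for i in range(n)] for t in range(T)]
    gamma = [[sum(xi[t][i]) for i in range(n)] for t in range(T)]
    new_pi = gamma[0]
    new_a = [row[:] for row in a]
    new_b = [[list(b[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T))
        for j in range(n):
            arc = sum(xi[t][i][j] for t in range(T))
            if visits > 0.0:
                new_a[i][j] = arc / visits
            if arc > 0.0:
                new_b[i][j] = [sum(xi[t][i][j] for t in range(T) if y[t] == k) / arc
                               for k in range(V)]
    return new_pi, new_a, new_b

def random_dist(rng, size):
    """Random strictly positive probability vector (illustrative initial values)."""
    w = [rng.random() + 0.1 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
N, V = 2, 3                      # two states, three-word vocabulary
y = [0, 1, 2, 0, 1, 2, 0, 1]     # toy word-index sequence
pi = random_dist(rng, N)
a = [random_dist(rng, N) for _ in range(N)]
b = [[random_dist(rng, V) for _ in range(N)] for _ in range(N)]
likelihoods = []
for _ in range(5):
    likelihoods.append(forward_backward(pi, a, b, y)[2])
    pi, a, b = reestimate(pi, a, b, y)
```

The list `likelihoods` records P(Y | λ) before each update, so the non-decreasing behaviour described above can be inspected directly. A practical implementation for long text would need scaling or log-domain arithmetic.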

[0029] The parameters λ = (Π, A, B) of the transition-output E-HMM with N states (N a natural number) used in the language model creation process of this embodiment are the initial state probability Π, the state transition probability A from state i to state j, and the symbol output probability B on the transition from state i to state j, defined by Equations 10 to 12, respectively.

【００３０】[0030]

[Equation 10] Π = π_N(i); i = 1, 2, ..., N

[Equation 11] A = a_N(i, j); i = 1, 2, ..., N, j = 1, 2, ..., N

[Equation 12] B = b_N(i, j, k); i = 1, 2, ..., N, j = 1, 2, ..., N, k = 1, 2, ..., V

[0031] Here V is the vocabulary size. The language model creation process executed by the language model creation unit 11 of this embodiment has the following features.
(Feature 1) States with small symbol output probabilities are deleted. When the E-HMM parameters are estimated with the Baum-Welch learning algorithm and a symbol output probability b_N(i, j, k) is smaller than a predetermined threshold, that probability is set to 0 and excluded from re-estimation, and its transition-destination state is deleted from the model memory 12. This greatly reduces the memory required in the model memory 12 and the computation required of the language model creation unit 11. The threshold is a number greater than 0 and less than 1; its preferred value is 10^-300, which is the value used in the simulation described in detail later.
(Feature 2) The number of states is increased successively. Retraining the parameters of an E-HMM with many states requires a large amount of model memory 12, so the number of states is increased step by step. Assuming the parameters of the N-state E-HMM have already been estimated, the initial parameters π_2N(i) and a_2N(i, j) of the initial state probability and the state transition probability of the 2N-state E-HMM are calculated as shown in Equations 13 and 14, respectively.
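Feature 1 can be illustrated with a short Python sketch. The text does not specify how deletion from the model memory is realized, so the sparse dictionary used here is an assumption, as are the names `prune_emissions` and `emission`. Only the comparison against the threshold and the value 10^-300 come from the description.

```python
THRESHOLD = 1e-300  # preferred threshold value stated in the description

def prune_emissions(b, threshold=THRESHOLD):
    """Keep only symbol output probabilities that are not below the threshold.

    b[i][j][k] is the probability of emitting symbol k on the arc i -> j.
    Entries below the threshold are treated as 0 and are simply not stored,
    which models their removal from the model memory (an assumption).
    """
    return {(i, j, k): p
            for i, row in enumerate(b)
            for j, dist in enumerate(row)
            for k, p in enumerate(dist)
            if p >= threshold}

def emission(sparse_b, i, j, k):
    """Look up a pruned table; missing entries count as probability 0."""
    return sparse_b.get((i, j, k), 0.0)
```

Because pruned entries are absent from the table, a re-estimation loop that iterates only over stored keys skips them automatically, which is where the memory and computation savings would come from in this sketch.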

【００３２】[0032]

[Equation 13] π_2N(i) = 0.5 × π_N(fru(i/2)); i = 1, 2, ..., 2N

[Equation 14] a_2N(i, j) = 0.5 × a_N(fru(i/2), fru(j/2)); i = 1, 2, ..., 2N, j = 1, 2, ..., 2N

[0033] Here the function fru(·) rounds its argument up to the nearest integer. Equation 13 means that, for every two states, two initial parameters are derived from the same original initial parameter π_N(fru(i/2)) of the initial state probability for N states, and each is multiplied by the normalizing factor 0.5. This gives the initial parameters π_2N(i) of the initial state probability for 2N states, that is, for twice as many states and twice as many parameters. Equation 14 means that, for every four states, four initial parameters are derived from the same original initial parameter a_N(fru(i/2), fru(j/2)) of the state transition probability for N states, and each is multiplied by the normalizing factor 0.5. This gives the initial parameters a_2N(i, j) of the state transition probability for 2N states, that is, for twice as many states and four times as many parameters. The initial parameters b_2N(i, j, k) of the symbol output probability are given by Equation 15.
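Equations 13 and 14 translate directly into code. The sketch below uses 0-based indices, so the 1-based fru(i/2) becomes the integer division `i // 2`; the function names are hypothetical.

```python
def double_initial(pi_n):
    """Equation 13: each original state is split into two states with half its mass."""
    return [0.5 * pi_n[i // 2] for i in range(2 * len(pi_n))]

def double_transitions(a_n):
    """Equation 14: each original arc is copied to four arcs, each scaled by 0.5."""
    n = len(a_n)
    return [[0.5 * a_n[i // 2][j // 2] for j in range(2 * n)]
            for i in range(2 * n)]
```

The factor 0.5 keeps both tables normalized: every original probability appears twice in the doubled initial-state vector and twice in each row of the doubled transition table, so the sums remain 1.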

【００３４】[0034]

[Equation 15] b_2N(i, j, k) = b_N(fru(i/2), fru(j/2), k) × random(i, j, k); i = 1, 2, ..., 2N, j = 1, 2, ..., 2N, k = 1, 2, ..., V

[0035] Here the function random(·) returns a random number generated by a predetermined conventional random number generation method with the arguments i, j, and k as the seed. The initial parameters b_2N(i, j, k) of the symbol output probability are further normalized so as to satisfy Equation 16, that is, so that the symbol output probabilities for 2N states sum to 1.

【００３６】[0036]

[Equation 16]
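The body of Equation 16 is not reproduced in this text. The surrounding description says only that the symbol output probabilities for 2N states are normalized to sum to 1. A natural reading, stated here as an assumption, is that the sum is taken over the vocabulary for each arc:

```latex
\sum_{k=1}^{V} b_{2N}(i,j,k) = 1,
\qquad i = 1,\dots,2N,\quad j = 1,\dots,2N
```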

[0037] Equations 15 and 16 thus mean that, for every four states, four initial parameters are derived from the same original initial parameter b_N(fru(i/2), fru(j/2), k) of the symbol output probability for N states. Because the Baum-Welch learning algorithm cannot be applied when the parameters are identical, the four initial parameters are made to differ by multiplying each by a mutually different random number random(i, j, k) calculated by a predetermined random number generation method, and the products are then multiplied by a normalizing coefficient. This gives the initial parameters b_2N(i, j, k) of the symbol output probability for 2N states, that is, for twice as many states and four times as many parameters.
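The perturb-and-normalize step of Equations 15 and 16 might look as follows in Python. Several details are assumptions: a single shared generator `rng` replaces the seeded function random(i, j, k), the multipliers are drawn from [0.5, 1.5) so that they are never zero, and normalization is done per arc over the vocabulary. The name `double_emissions` is hypothetical.

```python
import random

def double_emissions(b_n, rng):
    """Copy each arc's output distribution to four arcs, perturb, and renormalize."""
    n = len(b_n)
    b_2n = []
    for i in range(2 * n):
        row = []
        for j in range(2 * n):
            source = b_n[i // 2][j // 2]
            # Distinct random factors break the symmetry between the copies.
            raw = [p * (0.5 + rng.random()) for p in source]
            total = sum(raw)
            row.append([x / total for x in raw] if total > 0.0 else list(source))
        b_2n.append(row)
    return b_2n
```

Probabilities that were already 0, for example after pruning, stay 0 under this scheme because they are only ever multiplied.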

[0038] For an E-HMM with many states, the same processing is repeated, deleting small symbol output probabilities and successively doubling the number of states, until an E-HMM statistical language model with the desired number of states Nd is obtained. The concrete procedure, namely the Baum-Welch training used in the language model creation process to create a statistical language model with up to Nd states in general, is described below with reference to the flowchart of FIG. 3.

[0039] First, in step S1, based on the training text data consisting of word sequences stored in the training text data memory 10, the initial parameters π_1(i), a_1(i, j), and b_1(i, j, k) of the one-state E-HMM are estimated using Equations 10 to 12 and stored in the model memory 12. Next, in step S2, the parameters of the one-state E-HMM are re-estimated from the same training text data using the Baum-Welch learning algorithm and stored in the model memory 12. During this step, any symbol output probability smaller than the threshold, preferably 10^-300 for example, is set to 0, and its transition-destination state is deleted from the re-estimated parameters in the model memory 12. Then, in step S3, the control parameter N, which indicates the number of states, is set to 2.

[0040] Next, in step S4, based on the training text data consisting of word sequences stored in the training text data memory 10, the initial parameters π_N(i), a_N(i, j), and b_N(i, j, k) of the N-state E-HMM are estimated using Equations 10 to 12 and stored in the model memory 12. Then, in step S5, the parameters of the N-state E-HMM are re-estimated from the same training text data using the Baum-Welch learning algorithm and stored in the model memory 12. During this step, any symbol output probability smaller than the threshold, preferably 10^-300 for example, is set to 0, and its transition-destination state is deleted from the re-estimated parameters in the model memory 12. Processing then proceeds to step S6, which determines whether the control parameter N equals the desired number of states Nd. If N = Nd (YES in step S6), the language model creation process ends. If N ≠ Nd (NO in step S6), step S7 doubles the control parameter N, and steps S4 and S5 are repeated. In other words, the number of states is doubled and the re-estimation training is repeated.
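The control flow of steps S1 to S7 can be summarized as a small driver loop. This is a sketch, not the flowchart of FIG. 3 itself: the callables `initialize`, `reestimate`, and `double` are hypothetical placeholders for the one-state initialization, the Baum-Welch re-estimation with pruning, and the state-doubling initialization, and using `double` in step S4 is an assumption based on Feature 2. Because step S6 tests N = Nd for equality while N only ever doubles, the sketch requires Nd to be a power of two of at least 2.

```python
def train_by_doubling(n_target, initialize, reestimate, double):
    """Train a 1-state model, then repeatedly double the state count up to n_target."""
    if n_target < 2 or n_target & (n_target - 1):
        raise ValueError("n_target must be a power of two and at least 2")
    model = reestimate(initialize())        # steps S1 and S2
    n = 1
    schedule = [n]
    while n != n_target:                    # step S6
        n *= 2                              # steps S3 and S7
        model = reestimate(double(model))   # steps S4 and S5
        schedule.append(n)
    return model, schedule
```

For a target of 8 states the loop visits 1, 2, 4, and 8 states in turn, so each training run starts from a model that has already been fitted at half the size.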

[0041] After the language model creation process, the language model creation unit 11 copies the re-estimated statistical language model from the model memory 12 into the statistical language model memory 7. The configuration and operation of the speech recognition device that uses the statistical language model of this embodiment, a probabilistic network grammar based on an E-HMM created and stored as described above, are described next.

[0042] In FIG. 1, speech uttered by a speaker is input to the microphone 1, converted into a speech signal, and then input to the feature extraction unit 2. The feature extraction unit 2 performs A/D conversion on the input speech signal and then executes, for example, LPC analysis to extract 34-dimensional feature parameters comprising log power, 16th-order cepstrum coefficients, Δ log power, and 16th-order Δ cepstrum coefficients. The time series of the extracted feature parameters is input to the phoneme matching unit 4 via the buffer memory 3. Each HMM in the HMM memory 5 connected to the phoneme matching unit 4 consists of a plurality of states and arcs representing the transitions between the states, and each arc has a transition probability between states and an output probability for the input code. The phoneme matching unit 4 executes phoneme matching processing on the input data and outputs phoneme data to the speech recognition unit 6.
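The 34 dimensions are 1 + 16 + 1 + 16. The sketch below only shows how such a vector could be assembled per frame. It assumes the log power and cepstrum values are already available from the LPC analysis, and it assumes a simple frame-to-frame difference for the Δ terms, since the text does not say how they are computed. The function name `assemble_features` is hypothetical.

```python
from typing import List, Sequence

CEPSTRUM_ORDER = 16


def assemble_features(log_power: Sequence[float],
                      cepstra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return one 34-dimensional vector per frame.

    Layout: [log power, 16 cepstra, delta log power, 16 delta cepstra].
    Deltas are a plain difference from the previous frame (0 for frame 0);
    this delta definition is an assumption made for the example.
    """
    frames = []
    for t in range(len(log_power)):
        static = [log_power[t], *cepstra[t]]
        if len(static) != 1 + CEPSTRUM_ORDER:
            raise ValueError("expected 16 cepstrum coefficients per frame")
        previous = [log_power[t - 1], *cepstra[t - 1]] if t > 0 else static
        delta = [cur - prev for cur, prev in zip(static, previous)]
        frames.append(static + delta)
    return frames
```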

[0043] The statistical language model memory 7, which stores the statistical language model created as described above, is connected to the speech recognition unit 6. Referring to the statistical language model in the statistical language model memory 7, the speech recognition unit 6 uses a predetermined One-pass DP algorithm to process the input phoneme data from left to right without backtracking, selects the words with the higher occurrence probability as the speech recognition result data, and outputs the resulting speech recognition result data (character string data).
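To illustrate what "referring to the statistical language model" can mean in practice, the sketch below scores a candidate word sequence under an E-HMM with the forward recursion. It is not the One-pass DP search itself. It assumes a state-transition output model, so the output table `b` is keyed by (from state, to state, word id), and it treats missing keys as probability 0. The function name and data layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sequence_log_prob(pi: List[float], a: List[List[float]],
                      b: Dict[Tuple[int, int, int], float],
                      words: Sequence[int]) -> float:
    """Natural-log probability of a word-id sequence under an E-HMM.

    Forward recursion for a state-transition output model:
    alpha'[j] = sum_i alpha[i] * a[i][j] * b[(i, j, word)].
    """
    n = len(pi)
    alpha = list(pi)
    for w in words:
        alpha = [sum(alpha[i] * a[i][j] * b.get((i, j, w), 0.0) for i in range(n))
                 for j in range(n)]
    total = sum(alpha)
    return math.log(total) if total > 0.0 else float("-inf")
```

The recursion is left unscaled for brevity, so very long sequences would underflow; a practical implementation would rescale `alpha` at each step or work in the log domain.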

[0044] Using the algorithm of the language model creation processing described above, the present inventor ran a simulation in which a statistical language model in the form of a probabilistic network grammar was created from a large amount of word string data. The results are given below. The learning text data was the "telephone dialogues inquiring about an international conference" portion of the dialogue database owned by the present applicant. The simulation conditions are shown in Table 1.

[0045]

[Table 1] Conditions of the language model generation simulation
────────────────────────────────
HMM structure: state-transition output type
────────────────────────────────
Training vocabulary size: 6,420 words
────────────────────────────────
Number of learning text sentences: 8,475 sentences
────────────────────────────────
Total number of words: 57,354
────────────────────────────────
Termination condition of the Baum-Welch algorithm in HMM parameter re-estimation: 40 iterations
────────────────────────────────

[0046] First, FIG. 4 shows how the entropy changes as the number of states increases. For comparison, FIG. 4 also shows the entropy obtained with statistical language models based on word bigrams and word trigrams. FIG. 4 shows that the entropy of the device of this embodiment falls below that of the word bigram when the number of states is 128 or more. In this embodiment with 512 states, the number of parameters with a nonzero symbol output probability was 2,106,967. Accordingly, compared with using only the basic Baum-Welch learning algorithm, the language model creation processing of this embodiment reduces the memory required in the model memory 12 and the computation required of the language model creation unit 11 to about 0.125%, since only 2,106,967 of the 512 × 512 × 6,420 possible symbol output probabilities are retained.
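The 0.125% figure can be checked with a short calculation. The sketch assumes that an unpruned model stores one symbol output probability per (from state, to state, word) triple and that the number of symbols equals the training vocabulary size in Table 1; both are assumptions about the table layout rather than statements from the text.

```python
STATES = 512                 # number of E-HMM states
VOCABULARY = 6420            # training vocabulary size (Table 1)
NONZERO_OUTPUTS = 2_106_967  # retained symbol output probabilities

# Dense table: one probability per (from state, to state, word) triple.
dense_outputs = STATES * STATES * VOCABULARY
retained_percent = 100.0 * NONZERO_OUTPUTS / dense_outputs

print(dense_outputs)               # 1682964480
print(round(retained_percent, 3))  # 0.125
```

Under these assumptions the retained parameters are about 0.125% of the dense table, which matches the figure quoted for the memory and computation of the pruned model.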

[0047] Next, the present inventor ran a continuous speech recognition simulation on the speech recognition device of FIG. 1, using the E-HMM trained by the language model creation unit 11 of this embodiment as the statistical language model. The results are described below. In the continuous speech recognition simulation, continuous-distribution HMMs were used as the phoneme models, beam search was used as the search algorithm, and the E-HMM was used as the language model. The test data of uttered speech (hereinafter, the test data) consisted of 38 conversational sentences from the same task as the learning text data in the memory 10 that was used to train the E-HMM. Evaluation was by sentence recognition rate. A simulation was also run for the case where the statistical language model was created from the learning text data with the test data added. The other simulation conditions are shown in Table 2.

[0048]

[Table 2] Conditions of the continuous speech recognition simulation
────────────────────────────
Phoneme acoustic model: 4-state, 3-loop mixture-distribution HMM
────────────────────────────
Acoustic parameters: log power + 16th-order LP cepstrum + Δ log power + 16th-order Δ cepstrum
────────────────────────────
Training data: one male announcer, 2,620 uttered words
────────────────────────────
Recognition vocabulary size: 435 words
────────────────────────────
Beam width: 4096
────────────────────────────
Test data: 38 sentences uttered by the same speaker
────────────────────────────
Speaking style: read speech
────────────────────────────

[0049] FIG. 5 shows the simulation results when the statistical language model was created from the learning text data with the test data added. For comparison, it also shows the simulation results for speech recognition using statistical language models based on word bigrams and word trigrams. As FIG. 5 shows, a sentence recognition rate higher than that of the word bigram was obtained when the number of states was 128.

[0050] FIG. 6 shows the simulation results when the statistical language model was created from the learning text data alone. In addition to the sentence recognition rate, FIG. 6 shows the word percent correct and the word percent accuracy. For comparison, it also shows the simulation results for a statistical language model based on word trigrams. Here, the word percent correct and the word percent accuracy are given by Equations 17 and 18, respectively, as already established in this technical field.

[0051]

[Equation 17] Percent correct = {(N − D − S) / N} × 100 [%]

[Equation 18] Percent accuracy = {(N − D − S − I) / N} × 100 [%]

[0052] Here, N is the total number of words, D is the number of deletion errors, S is the number of substitution errors, and I is the number of insertion errors. As is clear from FIG. 6, in this simulation a sentence recognition rate of 44.7% was obtained with the 128-state E-HMM, whereas the sentence recognition rate with the word trigram model was 36.8%. The sentence recognition rate was therefore improved, and these simulation results demonstrate the effectiveness of the language model creation processing of this embodiment.
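Equations 17 and 18 translate directly into code. In the sketch below the function names and the counts in the usage note are illustrative only; the source gives no word-level error counts.

```python
def percent_correct(n: int, d: int, s: int) -> float:
    """Equation 17: share of reference words that were neither deleted nor substituted."""
    return 100.0 * (n - d - s) / n


def percent_accuracy(n: int, d: int, s: int, i: int) -> float:
    """Equation 18: as Equation 17, but insertion errors are also penalised."""
    return 100.0 * (n - d - s - i) / n


def sentence_recognition_rate(correct_sentences: int, total_sentences: int) -> float:
    """Percentage of test sentences recognised without any error."""
    return 100.0 * correct_sentences / total_sentences
```

With hypothetical counts N = 100, D = 5, S = 10 and I = 3, the percent correct is 85.0 and the percent accuracy is 82.0. As a consistency check on the reported figures, 17 and 14 correct sentences out of 38 would give 44.7% and 36.8% after rounding to one decimal place.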

[0053] In the embodiment described above, the feature extraction unit 2, the phoneme matching unit 4, the speech recognition unit 6, and the language model creation unit 11 are implemented by, for example, a digital computer.

[0054] As described above, the language model creation processing of this embodiment greatly reduces the memory required in the model memory 12 and the computation required of the language model creation unit 11, which makes it possible to train and create a statistical language model based on an E-HMM using the Baum-Welch learning algorithm. When the created statistical language model is applied to a speech recognition device, a higher speech recognition rate can be obtained than with the conventional examples.

[0055] The Baum-Welch learning algorithm used in this embodiment learns so that the error in the probability of each state transition is minimized locally, so the initial parameters are important. In this embodiment, the symbol output probabilities are weighted with random numbers when the number of states is successively increased. Alternatively, a statistical language model that is a probabilistic network grammar may be created by using, for example, a thesaurus to give the same weight to words that belong to the same category or part of speech. Speech recognition using a probabilistic network grammar created with such a thesaurus is expected to achieve a higher speech recognition rate than speech recognition using the statistical language model of this embodiment.
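The random weighting applied when the number of states is doubled can be sketched as follows, following the procedure recited in claim 2. Several points are assumptions made for the example: the copy pattern (state i and state i + N both inherit from state i), one random weight per table entry drawn uniformly from [0.5, 1.5], normalisation of the output probabilities over words for each (from, to) pair, and the name `double_states`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Outputs = Dict[Tuple[int, int, int], float]  # (from, to, word) -> probability


def double_states(pi: List[float], a: List[List[float]], b: Outputs,
                  rng: random.Random):
    """Build initial parameters for 2N states from trained N-state parameters."""
    n = len(pi)
    # Initial state probabilities: two copies of each value, halved.
    pi2 = [0.5 * pi[i % n] for i in range(2 * n)]
    # Transition probabilities: four copies of each value, halved.
    a2 = [[0.5 * a[i % n][j % n] for j in range(2 * n)] for i in range(2 * n)]
    # Group the sparse output table by (from, to) pair.
    rows = defaultdict(dict)
    for (i, j, k), p in b.items():
        rows[(i, j)][k] = p
    # Output probabilities: four copies, each entry scaled by its own random
    # weight, then renormalised so every copied row sums to 1.
    b2: Outputs = {}
    for (i, j), row in rows.items():
        for di in (0, n):
            for dj in (0, n):
                weighted = {k: p * rng.uniform(0.5, 1.5) for k, p in row.items()}
                total = sum(weighted.values())
                if total > 0.0:
                    for k, p in weighted.items():
                        b2[(i + di, j + dj, k)] = p / total
    return pi2, a2, b2
```

The random weights break the symmetry between the two copies of each state; without them both copies would receive identical updates under Baum-Welch re-estimation and the extra states would add nothing. A thesaurus-based variant would replace `rng.uniform` with a weight shared by all words of the same category.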

[0056] Although a first-order E-HMM was used in the embodiment described above, the present invention is not limited to this, and a statistical language model may be created by applying the language model creation processing of this embodiment in the same way to a second-order E-HMM. In the embodiment described above, the language model creation processing using the Baum-Welch learning algorithm, with reduced memory in the model memory 12 and reduced computation in the language model creation unit 11, was used to create a statistical language model that is a probabilistic network grammar. However, the present invention is not limited to this and may also be applied to a morphological analysis device or the like.

[0057]

[Effects of the Invention] As described in detail above, the present invention provides a statistical language model creation device comprising a storage device that stores a statistical language model during training, and control means that, based on input learning text data, trains and creates, using the Baum-Welch learning algorithm, a statistical language model for speech recognition or morphological analysis, the model being an ergodic hidden Markov model with a predetermined number of states whose parameters include initial state probabilities, state transition probabilities, and symbol output probabilities. The control means comprises deleting means and parameter calculation means. When the parameters are estimated using the Baum-Welch learning algorithm and a symbol output probability is smaller than a predetermined threshold, the deleting means sets that symbol output probability to 0 and deletes the transition destination state of that symbol output probability from the storage device. When the parameters are learned using the Baum-Welch learning algorithm, the parameter calculation means successively doubles the number of states of the ergodic hidden Markov model and calculates the parameters of the ergodic hidden Markov model for the larger number of states. Accordingly, the deleting means greatly reduces the memory required in the storage device and the computation required of the control means, while the parameter calculation means makes it possible to create an E-HMM with a larger number of states. That is, a statistical language model based on an E-HMM can be trained and created using the Baum-Welch learning algorithm. The created statistical language model can be applied to a speech recognition device or a morphological analysis device, in which case a higher speech recognition rate, or a lower analysis error in morphological analysis, can be obtained than with the conventional examples.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a state transition diagram showing an example of the ergodic HMM used in the speech recognition device of FIG. 1.

FIG. 3 is a flowchart showing the language model creation processing executed by the language model creation unit of FIG. 1.

FIG. 4 is a graph showing entropy against the number of states, as a result of the simulation of the speech recognition device of FIG. 1.

FIG. 5 is a graph showing the sentence recognition rate against the number of states, as a result of the simulation of the speech recognition device of FIG. 1 when the statistical language model was created from the learning text data with the test data added.

FIG. 6 is a graph showing the recognition rate against the number of states, as a result of the simulation of the speech recognition device of FIG. 1 when the statistical language model was created from the learning text data alone.

[Explanation of symbols]

1: microphone, 2: feature extraction unit, 3: buffer memory, 4: phoneme matching unit, 5: hidden Markov model (HMM) memory, 6: One-pass DP speech recognition unit, 7: statistical language model memory, 10: learning text data memory, 11: language model creation unit, 12: model memory.

Continuation of the front page

(56) References
JP-A-6-202687 (JP, A)
Proceedings of the Acoustical Society of Japan (October to November 1994), 1-Q-6, pp. 153-154
IEICE Technical Report "Speech", SP93-26, pp. 17-24, June 1993
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. I-357 to I-360, 1994

Claims

(57) [Claims]

1. A statistical language model creation device comprising: a storage device that stores a statistical language model during training; and control means that, based on input learning text data, trains and creates, using the Baum-Welch learning algorithm, a statistical language model for speech recognition or morphological analysis, the model being an ergodic hidden Markov model with a predetermined number of states whose parameters include initial state probabilities, state transition probabilities, and symbol output probabilities, wherein the control means comprises: deleting means that, when the parameters are estimated using the Baum-Welch learning algorithm and a symbol output probability is smaller than a predetermined threshold, sets that symbol output probability to 0 and deletes the transition destination state of that symbol output probability from the storage device; and parameter calculation means that, when the parameters are learned using the Baum-Welch learning algorithm, successively doubles the number of states of the ergodic hidden Markov model and calculates the parameters of the ergodic hidden Markov model for the larger number of states.

2. The statistical language model creation device according to claim 1, wherein the parameter calculation means, when calculating the parameters for a second number of states that is twice a first number of states from the parameters for the first number of states, comprises: first calculating means that calculates two parameters identical to each initial state probability parameter at the first number of states and multiplies each of the two calculated parameters by 0.5 for normalization, thereby calculating the initial state probability parameters at the second number of states; second calculating means that calculates, for each state, four parameters identical to the state transition probability parameter at the first number of states and multiplies each of the four calculated parameters by 0.5 for normalization, thereby calculating the state transition probability parameters at the second number of states; and third calculating means that calculates, for each state, four parameters identical to the symbol output probability parameter at the first number of states, multiplies the four calculated parameters by mutually different random numbers generated by a predetermined method, and normalizes the products so that the symbol output probabilities at the second number of states sum to 1, thereby calculating the symbol output probability parameters at the second number of states.

3. A speech recognition device comprising: the statistical language model creation device according to claim 1, wherein the statistical language model is created for speech recognition; and speech recognition means that recognizes input uttered speech by referring to the statistical language model created by the statistical language model creation device.