JPH0627989A

JPH0627989A - Background HMM parameter extraction method and speech recognition device using the same

Info

Publication number: JPH0627989A
Application number: JP4185086A
Authority: JP
Inventors: Yasuyuki Masai; 康之正井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-07-13
Filing date: 1992-07-13
Publication date: 1994-02-04

Abstract

PURPOSE: To train a background HMM (hidden Markov model) suited to the purpose at high speed, with fast parameter convergence. CONSTITUTION: Training label sequences for each specific speech type used to train the background HMM are supplied from a speech quantization unit 2 to an HMM parameter estimation unit 91, which estimates HMM parameters for each speech type and stores them in an HMM parameter storage unit 92. To obtain the background HMM parameters, the per-speech HMM parameters used to train the background HMM are retrieved from the HMM parameter storage unit 92, a background HMM parameter calculation unit 93 calculates, for example, the mean of the respective parameters, and the result is stored as the background HMM parameters in a background HMM parameter storage unit 6.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a background HMM parameter extraction method suitable for extracting the parameters of a background HMM that is used in combination with keyword HMMs to recognize input speech, and to a speech recognition device using the method.

[0002]

[Prior Art] In recent years, a successful approach to recognizing input speech has been to convert the speech into a fixed code sequence by vector quantization, matrix quantization, or the like, and to recognize the quantized code sequence with a hidden Markov model (HMM).

[0003] To make a speech recognition device based on this kind of method more practical, word spotting is indispensable: the target words must be recognized correctly in input speech in which non-target words, ambient noise, and the like precede and follow the words to be recognized.

[0004] Word spotting requires comparing HMM scores across a very large number of different candidate speech intervals. Because the intervals being compared differ, the scores vary widely from interval to interval, and high recognition performance cannot be obtained if the score obtained from the HMM of each target word is used directly in recognition.

[0005] As an HMM score normalization method suited to word spotting, Richard C. Rose et al. of MIT (USA) have reported a method (ICASSP-90, S2.24) that takes the difference between the log-likelihood Lk of the HMM of a speech (word) to be recognized, that is, a keyword (hereinafter, keyword HMM), and the log-likelihood Lb of the HMM of non-target speech (words) and noise (hereinafter, background HMM), and uses this difference as the HMM score.
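By way of illustration only, the normalization described in this paragraph can be written as follows. The symbols S, O, M_k, and M_b are notation introduced here for the sketch (normalized score, observed label sequence, keyword HMM, and background HMM) and do not appear in the original text.

```latex
S = L_k - L_b,
\qquad
L_k = \log \Pr(O \mid M_k),
\qquad
L_b = \log \Pr(O \mid M_b)
```

Both log-likelihoods are evaluated over the same speech interval, so effects that depend mainly on the interval itself are expected to cancel in the difference to a large extent.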

[0006] Conventionally, the parameters of this background HMM were determined (estimated) by feeding a variety of different speech and noise other than the keywords into a single HMM for training.

[0007]

[Problems to Be Solved by the Invention] As described above, the parameters of the background HMM used in the HMM score normalization method suited to word spotting were conventionally obtained by feeding a variety of different speech and noise other than the keywords into a single HMM for training.

[0008] Consequently, whenever the kinds of speech used to train the background HMM were changed, the background HMM parameters had to be trained again, which was a problem. In addition, because the parameters were obtained by feeding different kinds of speech into a single HMM, there was the further problem that the parameters failed to converge.

[0009] The object of the present invention is therefore to provide a background HMM parameter extraction method, and a speech recognition device using the method, in which HMM parameters specific to each non-target speech or noise used to train the background HMM are obtained and stored, and the background HMM parameters are then derived from these per-speech or per-noise parameters. This resolves the problem of poor parameter convergence. Furthermore, when the kinds of speech or noise used to train the background HMM are changed, only the HMM parameters specific to the changed speech or noise need to be obtained, so a background HMM suited to the purpose can be trained quickly.

[0010]

[Means for Solving the Problems] The present invention is characterized in that, for each of the arbitrary kinds of speech or noise used to train (extract the parameters of) a background HMM that is used in combination with keyword HMMs to recognize input speech, HMM parameters specific to that speech or noise are obtained and stored in storage means, and the background HMM parameters used to compute the log-likelihood for the background HMM are obtained from the HMM parameters obtained for each speech or noise.

[0011] The present invention is further characterized in that, when the kinds of speech or noise used to train the background HMM are changed, HMM parameters are obtained and stored in the storage means only for those kinds of speech or noise in the changed set whose HMM parameters are not yet stored, and the background HMM parameters are re-extracted from the stored HMM parameters of the kinds of speech or noise in the changed set. The present invention is also characterized in that the above background HMM parameter extraction function is incorporated in a speech recognition device.

[0012]

[Operation] In the above configuration, HMM parameters are obtained for each arbitrary speech (or noise), other than the keywords (the speech to be recognized), that is needed to train the background HMM, and the background HMM parameters are obtained from these per-speech (or per-noise) HMM parameters. For this calculation, a method such as taking the average or the weighted average of the HMM parameters obtained for each speech (or noise) is applied.
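As a non-limiting illustration, the average and weighted average mentioned here can be expressed in one formula. The symbols are assumptions introduced for this sketch: s indexes the kinds of speech or noise, θ_s stands for any one HMM parameter estimated for kind s, w_s is its weight, and θ_b is the corresponding background HMM parameter.

```latex
\theta_b = \frac{\sum_{s} w_s \, \theta_s}{\sum_{s} w_s}
```

Setting every w_s to 1 gives the plain average. Because the weights are normalized, averaging probability distributions of the same shape element by element yields values that again sum to 1.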

[0013] Thus, in the above configuration, the background HMM parameters are obtained from the HMMs obtained for each speech (or noise) needed to train the background HMM. When the speech (or noise) used to train the background HMM is changed, it therefore suffices to obtain HMM parameters only for the speech (or noise) for which they have not yet been obtained; the background HMM parameters can then be re-extracted from the HMM parameters of each speech (or noise) in the changed set. There is no need to redo the background HMM parameter calculation over all of the speech in the changed set, and the background HMM can be trained at high speed.

[0014] Furthermore, in the above configuration, the speech (or noise) used to train the background HMM is not used directly in the background HMM parameter calculation as in the conventional method. Instead, HMM parameters are first calculated for each speech (or noise), and the background HMM parameters are then calculated from them. As a result, the parameters converge quickly and training can be performed at high speed.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention. Processing in the speech recognition device of FIG. 1 basically takes phonetically meaningful segments (phonetic segments; hereinafter PS) as the unit of recognition processing: the input speech is quantized into PSs using a recognition dictionary organized in PS units, and the HMM recognition unit then matches the result against words. This processing in the speech recognition device of FIG. 1 is described in more detail below.

[0017] First, a speech signal (input speech) entered through a microphone (not shown) is fed to the acoustic analysis unit 1. The acoustic analysis unit 1 acoustically analyzes the input speech to obtain feature parameters. As shown in detail in FIG. 2, it consists of an A/D converter 11, a power calculation unit 12, and an LPC analysis unit 13.

[0018] The input speech fed to the acoustic analysis unit 1 is quantized by the A/D converter 11, for example at a sampling frequency of 12 kHz with 12 bits, and is then input to the power calculation unit 12, where its speech power is calculated. It is further input to the LPC analysis unit 13 and subjected to LPC (linear predictive coding) mel-cepstrum analysis (LPC analysis). This LPC analysis is performed, for example, with a frame length of 16 msec and a frame period of 8 msec, using a 16th-order LPC mel-cepstrum as the analysis parameter. The acoustic analysis in the acoustic analysis unit 1 is not limited to LPC mel-cepstrum analysis; BPF (band-pass filter) analysis or the like may also be used.
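The example figures in this paragraph imply 192 samples per frame and a shift of 96 samples between frames (12 kHz × 16 msec and 12 kHz × 8 msec). The following sketch only works out that arithmetic; the function name `frame_bounds` is hypothetical, and dropping a trailing partial frame is an assumption not stated in the text.

```python
SAMPLE_RATE_HZ = 12_000   # example sampling frequency from the text
FRAME_LENGTH_MS = 16      # example frame length from the text
FRAME_PERIOD_MS = 8       # example frame period from the text


def frame_bounds(num_samples,
                 sample_rate_hz=SAMPLE_RATE_HZ,
                 frame_length_ms=FRAME_LENGTH_MS,
                 frame_period_ms=FRAME_PERIOD_MS):
    """Return (start, end) sample indices of every complete analysis frame."""
    frame_len = sample_rate_hz * frame_length_ms // 1000  # 192 samples
    hop = sample_rate_hz * frame_period_ms // 1000        # 96 samples
    bounds = []
    start = 0
    # Assumption: a trailing partial frame is simply discarded.
    while start + frame_len <= num_samples:
        bounds.append((start, start + frame_len))
        start += hop
    return bounds
```

With these values, consecutive frames overlap by half a frame, so each sample away from the signal edges is analyzed in two frames.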

[0019] The feature parameters of the input speech obtained by the acoustic analysis unit 1 are sent to the speech quantization unit 2. Using the PS dictionary storage unit 3, in which a recognition dictionary of predetermined PS (phonetic segment) units is registered, the speech quantization unit 2 obtains a label (PS label) for each frame from these feature parameters. That is, the speech quantization unit 2 continuously matches, along the time axis, the feature parameters of the input speech analyzed by the acoustic analysis unit 1 against the PS-unit recognition dictionary registered in the PS dictionary storage unit 3, and outputs, for each frame, the PS with the highest similarity as the quantization result. This continuous PS matching in the speech quantization unit 2 uses the composite LPC mel-cepstrum similarity measure given by equation (1) below.
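The per-frame labeling step described here (choose the PS whose dictionary entry gives the highest similarity) can be sketched as below. Equation (1) is not reproduced in this text, so the similarity function is an assumed stand-in of the common multiple-similarity form, a weighted sum of squared inner products with eigenvectors divided by the squared norm; it should not be read as the patent's exact measure. The names `multiple_similarity`, `label_frames`, and the dictionary layout are hypothetical.

```python
def multiple_similarity(c, weights, eigvecs):
    """Assumed stand-in: sum_m w_m * (c . phi_m)^2 / ||c||^2."""
    norm_sq = sum(x * x for x in c)
    if norm_sq == 0.0:
        return 0.0
    total = 0.0
    for w, phi in zip(weights, eigvecs):
        dot = sum(a * b for a, b in zip(c, phi))
        total += w * dot * dot
    return total / norm_sq


def label_frames(frames, ps_dictionary, similarity=multiple_similarity):
    """Return, for each feature vector, the PS name with maximum similarity.

    ps_dictionary maps a PS name to a (weights, eigvecs) pair.
    """
    labels = []
    for c in frames:
        best = max(ps_dictionary,
                   key=lambda name: similarity(c, *ps_dictionary[name]))
        labels.append(best)
    return labels
```

Passing the similarity measure as a parameter keeps the labeling loop independent of whichever measure is actually used.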

[0020]

[Equation 1]

[0021] In equation (1), C is the LPC mel-cepstrum, and W_m^(Ki) and φ_m^(Ki) are, respectively, the weight obtained from the eigenvalues of the PS named Ki and the corresponding eigenvector. (·) denotes the inner product and ‖ ‖ denotes the norm. The PSs used in this embodiment include, for example, the following.
(1) Sustained segments:
  (1-1) steady part of a vowel
  (1-2) fricative consonant part
(2) Consonant segments: the part including the transition into a vowel [demisyllable]
(3) Syllable boundary segments:
  (3-1) vowel boundary
  (3-2) vowel-consonant boundary
  (3-3) vowel-silence boundary
(4) Other segments: devoiced vowels, etc.

[0022] Of these, some of the segments in (1), (2), and (4) are also often adopted when syllables are used as the recognition segments. The distinguishing feature of the PSs in this embodiment is that the syllable boundary segments of (3) are adopted in addition to the segments of (1), (2), and (4).

[0023] The PS label sequence obtained by the continuous matching in the speech quantization unit 2 is sent to the HMM recognition unit 4. The HMM recognition unit 4 recognizes the label sequence obtained by the speech quantization unit 2 using a keyword HMM for each of the several kinds of speech (keywords) to be recognized, together with a background HMM (shared by all keyword HMMs). The parameters of each keyword HMM are obtained in advance by the keyword HMM parameter estimation unit 8, described later, and stored in the keyword HMM parameter storage unit 5. The parameters of the background HMM shared by the keyword HMMs are obtained by the background HMM parameter extraction unit 9, described later, and stored in the background HMM parameter storage unit 6. It is also possible to prepare a separate background HMM for each keyword HMM.

[0024] The HMM recognition unit 4 consists of a keyword HMM likelihood calculation unit 41, which calculates the log-likelihood Lk of each keyword HMM; a background HMM likelihood calculation unit 42, which calculates the log-likelihood Lb of the background HMM; a score normalization unit 43, which normalizes the score of each keyword HMM from these two log-likelihoods Lk and Lb; and a score comparison unit 44, which compares the scores normalized by the score normalization unit 43.

[0025] The label sequence sent from the speech quantization unit 2 to the HMM recognition unit 4 is passed to the keyword HMM likelihood calculation unit 41 and the background HMM likelihood calculation unit 42 within the recognition unit 4.

[0026] The keyword HMM likelihood calculation unit 41 obtains the log-likelihood Lk of the label sequence from the speech quantization unit 2 with respect to each keyword HMM, using the parameters of each keyword HMM (output probabilities, transition probabilities, and so on, described later) stored in the keyword HMM parameter storage unit 5.

[0027] The background HMM likelihood calculation unit 42, for its part, obtains the log-likelihood Lb of the label sequence from the speech quantization unit 2 with respect to the background HMM, using the background HMM parameters (output probabilities, transition probabilities, and so on) stored in the background HMM parameter storage unit 6.

[0028] The log-likelihoods Lk and Lb obtained by the calculation units 41 and 42 for each keyword HMM and for the background HMM are sent to the score normalization unit 43. The score normalization unit 43 takes the difference between the log-likelihood Lk of each keyword HMM and the log-likelihood Lb of the background HMM (shared by all keyword HMMs), thereby normalizing the score of each keyword HMM. The keyword HMM scores normalized by the score normalization unit 43 are compared by the score comparison unit 44, and the comparison result is passed to the recognition result output unit 7.

[0029] The recognition result output unit 7 outputs the speech recognition result for the input speech. Based on the comparison result from the score comparison unit 44, it outputs as the recognition result the keyword (corresponding to the keyword HMM) with the highest score.
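The flow through the score normalization unit 43, the score comparison unit 44, and the recognition result output unit 7 can be sketched as follows, assuming the log-likelihoods have already been computed. The function name `recognize` and the example keywords are hypothetical.

```python
def recognize(keyword_log_likelihoods, background_log_likelihood):
    """Normalize each keyword score as Lk - Lb and pick the maximum.

    keyword_log_likelihoods maps a keyword to its log-likelihood Lk.
    Returns (best_keyword, normalized_scores).
    """
    scores = {kw: lk - background_log_likelihood
              for kw, lk in keyword_log_likelihoods.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = recognize({"yes": -42.0, "no": -55.5}, -50.0)
```

For a single interval and one shared background HMM, subtracting Lb shifts every keyword score by the same amount and so does not change which keyword wins; the normalization matters when scores from different speech intervals are compared, as word spotting requires.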

[0030] The general formulation of an HMM is now described. An HMM has N states S_1, S_2, ..., S_N, and the initial state is assumed to be distributed probabilistically over these N states. For speech, a model is used that changes state with a certain probability (the transition probability) at every fixed frame period. On each transition a label is output with a certain probability (the output probability), although null transitions, which change state without outputting a label, are sometimes introduced. Even when the output label sequence is given, the state transition sequence is not uniquely determined. Because only the label sequence can be observed, the model is called a hidden Markov model (HMM). An HMM model M is defined by the following six parameters.
N: number of states (states S_1, S_2, ..., S_N)
K: number of labels (labels R = 1, 2, ..., K)
p_ij: transition probability; the probability of a transition from S_i to S_j
q_ij(k): the probability of outputting label k on the transition from S_i to S_j
m_i: initial state probability; the probability that the initial state is S_i
F: the set of final states
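The six parameters listed above map naturally onto a small data structure. The following is a minimal sketch, not the device's actual representation: the class name `DiscreteHMM`, the field names, the use of 0-based label indices, and the rule that a final state may have no outgoing transitions are all assumptions made for illustration. Null transitions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    n_states: int            # N
    n_labels: int            # K (labels are 0..K-1 here, 1..K in the text)
    trans: list              # trans[i][j] = p_ij
    emit: list               # emit[i][j][k] = q_ij(k)
    initial: list            # initial[i] = m_i
    final_states: frozenset  # F

    def is_consistent(self, tol=1e-9):
        """Check shapes and that each probability distribution sums to 1."""
        n, k = self.n_states, self.n_labels
        if len(self.initial) != n or abs(sum(self.initial) - 1.0) > tol:
            return False
        if len(self.trans) != n or len(self.emit) != n:
            return False
        for i in range(n):
            if len(self.trans[i]) != n or len(self.emit[i]) != n:
                return False
            row_sum = sum(self.trans[i])
            # Assumption: a final state may have no outgoing transitions.
            dead_end = row_sum == 0.0 and i in self.final_states
            if abs(row_sum - 1.0) > tol and not dead_end:
                return False
            for j in range(n):
                if len(self.emit[i][j]) != k:
                    return False
                # Output probabilities only matter on arcs that can be taken.
                if self.trans[i][j] > 0.0 and abs(sum(self.emit[i][j]) - 1.0) > tol:
                    return False
        return all(0 <= s < n for s in self.final_states)
```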

[0031] Next, constraints on transitions that reflect the characteristics of speech are imposed on the model M. For speech, loop transitions that return from a state S_i to a previously visited state (S_i-1, S_i-2, ...) are generally not allowed, because they would violate temporal order. FIG. 3 shows a typical example of this kind of HMM structure.
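In matrix terms, this constraint means that the transition probability p_ij is zero whenever j is less than i, while self-loops and forward transitions remain allowed. A minimal check, with the hypothetical function name `is_left_to_right` and 0-based state indices, might look like this:

```python
def is_left_to_right(trans, tol=0.0):
    """Return True if no transition leads back to an earlier state.

    trans[i][j] is the probability of moving from state i to state j.
    """
    return all(p <= tol
               for i, row in enumerate(trans)
               for j, p in enumerate(row)
               if j < i)
```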

[0032] Evaluating an HMM means obtaining the probability Pr(O/M) that the model M outputs the first-ranked label sequence O_1 = o_11, o_21, ..., o_T1. At recognition time, the HMM recognition unit 4 hypothesizes each model in turn and searches for the model M that maximizes Pr(O/M). The models hypothesized by the HMM recognition unit 4, that is, the keyword HMM models (their parameters), are obtained by training the keyword HMMs and are stored in the keyword HMM parameter storage unit 5.
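One standard way to evaluate Pr(O/M) for a model of this kind is the forward algorithm, sketched below for an HMM whose labels are output on transitions. The text does not state which algorithm the likelihood calculation units use, so this is an assumption. The sketch omits null transitions and does no scaling, so it would underflow on long label sequences; the function name and argument layout are hypothetical.

```python
import math


def log_likelihood(labels, trans, emit, initial, final_states):
    """Return log Pr(O | M) for a label sequence, or -inf if impossible.

    trans[i][j] = p_ij, emit[i][j][k] = q_ij(k), initial[i] = m_i,
    final_states = F. Labels are 0-based integers.
    """
    n = len(initial)
    alpha = list(initial)  # probability mass per state before any label
    for o in labels:
        nxt = [0.0] * n
        for i in range(n):
            if alpha[i] == 0.0:
                continue
            for j in range(n):
                # Take arc i -> j and output label o on that arc.
                nxt[j] += alpha[i] * trans[i][j] * emit[i][j][o]
        alpha = nxt
    # Only paths that end in a final state count.
    total = sum(alpha[s] for s in final_states)
    return math.log(total) if total > 0.0 else float("-inf")
```

Computing this value once with each keyword HMM and once with the background HMM would give the Lk and Lb used for score normalization.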

[0033] In the speech recognition device of FIG. 1, a keyword HMM parameter estimation unit 8 is provided so that the keyword HMMs can be trained. To train the keyword HMMs using the keyword HMM parameter estimation unit 8, the speech recognition device of FIG. 1 is set to keyword HMM training mode, and a speaker is asked to utter the speech (words) serving as keywords one after another. The acoustic analysis unit 1 receives the input speech from the speaker and performs LPC analysis to obtain feature parameters, from which the speech quantization unit 2 obtains a PS label sequence as training data.

[0034] By having multiple speakers utter multiple keywords in this way, training PS label sequences from each speaker for the same keyword are obtained for a predetermined set of keywords. These many training PS label sequences are passed to the keyword HMM parameter estimation unit 8 via a storage unit (not shown).

[0035] The keyword HMM parameter estimation unit 8 feeds these training PS label sequences to an HMM keyword by keyword and estimates the parameters of the model M corresponding to each keyword so as to maximize Pr(O/M). It then registers the estimated model parameters for each keyword in the keyword HMM parameter storage unit 5. Next, the extraction of background HMM parameters, which relates directly to the present invention, is described.

[0036] First, the speech recognition device of FIG. 1 is provided with a background HMM parameter extraction unit 9 so that the background HMM parameters can be extracted. The background HMM parameter extraction unit 9 consists of: an HMM parameter estimation unit 91, which trains HMM parameters for each of the several kinds of (non-target) speech (or noise) needed to extract the background HMM parameters; an HMM parameter storage unit 92, which stores the HMM parameters obtained by the estimation unit 91 separately for each corresponding speech (or noise); a background HMM parameter calculation unit 93, which retrieves from the storage unit 92 the HMM parameters of the speech (or noise) to be used as background and calculates the background HMM parameters; and a background model management unit 94. The management unit 94 controls the HMM parameter estimation unit 91 and the background HMM parameter calculation unit 93.

[0037] In this embodiment, to obtain the background HMM parameters with the background HMM parameter extraction unit 9, the speech recognition device of FIG. 1 must be set to background HMM training mode. In this state, training data for the speech needed for the intended parameter extraction is supplied to the background HMM parameter extraction unit 9 through the acoustic analysis unit 1 and the speech quantization unit 2.

[0038] Suppose that the speech needed to train the background HMM, that is, the non-target speech to serve as background, consists of three kinds: speech A, B, and C. In that case, a large amount of training data for speech A, B, and C is supplied to the background HMM parameter extraction unit 9.

[0039] From this training data, the HMM parameter estimation unit 91 in the background HMM parameter extraction unit 9 estimates the HMM parameters of each of speech A, B, and C separately (in the same way as the parameter estimation in the keyword HMM parameter estimation unit 8). The HMM parameter estimation unit 91 then stores the estimated HMM parameters of speech A, B, and C in the areas of the HMM parameter storage unit 92 that the background model management unit 94 designates for each speech.

[0040] Under the direction of the background model management unit 94, the background HMM parameter calculation unit 93 in the background HMM parameter extraction unit 9 retrieves from the HMM parameter storage unit 92 the HMM parameters of speech A, B, and C needed to train the background HMM. The background HMM parameter calculation unit 93 then calculates, for example, the average of the retrieved parameters of speech A, B, and C to obtain the background HMM parameters, and stores them in the background HMM parameter storage unit 6.
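The averaging performed by the background HMM parameter calculation unit 93 can be sketched as an element-wise mean over identically shaped parameter arrays. The sketch assumes that the per-speech HMMs share the same numbers of states and labels, which the text does not state explicitly; the function name `average_params` and the optional `weights` argument (covering the weighted-average variant) are hypothetical.

```python
def average_params(models, weights=None):
    """Element-wise (weighted) mean of same-shaped nested lists of floats.

    models is a list with one entry per kind of speech, for example the
    transition matrices of speech A, B, and C.
    """
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)

    def combine(items):
        # Leaf level: items holds one number per model.
        if isinstance(items[0], (int, float)):
            return sum(w * x for w, x in zip(weights, items)) / total
        # Otherwise recurse position by position.
        return [combine([item[idx] for item in items])
                for idx in range(len(items[0]))]

    return combine(models)


trans_a = [[0.5, 0.5], [0.0, 1.0]]
trans_b = [[0.7, 0.3], [0.0, 1.0]]
trans_c = [[0.9, 0.1], [0.0, 1.0]]
background_trans = average_params([trans_a, trans_b, trans_c])
```

The same function would be applied in turn to the transition, output, and initial-state probabilities. Each row of the result still sums to 1, since it is a normalized combination of rows that each sum to 1.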

[0041] Thus, in this embodiment, HMM parameters are first obtained for each speech needed to train the background HMM and accumulated in the HMM parameter storage unit 92; the background HMM parameters are then obtained by a calculation based on the HMM parameters of each of those kinds of speech.

[0042] Consequently, when the speech used to train the background HMM is changed from speech A, B, and C to, for example, speech A, B, and D, the HMMs of speech A and B need not be re-estimated (because their parameters are already stored in the HMM parameter storage unit 92); only the HMM parameters of the new speech D need to be estimated. This is described concretely below.

[0043] First, the HMM parameters of speech D are estimated by supplying training data for speech D to the HMM parameter estimation unit 91, in the same way as for speech A, B, and C above. The HMM parameters of speech D estimated by the HMM parameter estimation unit 91 are stored in the area of the HMM parameter storage unit 92 that the background model management unit 94 designates for speech D.

[0044] Under the direction of the background model management unit 94, the background HMM parameter calculation unit 93 retrieves from the HMM parameter storage unit 92 the previously obtained HMM parameters of speech A and B and the newly obtained HMM parameters of speech D, and calculates their average to obtain the updated background HMM parameters. The background HMM parameter calculation unit 93 then overwrites the contents of the background HMM parameter storage unit 6 with these newly obtained background HMM parameters.
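The change from speech A, B, C to speech A, B, D can be sketched as a small cache keyed by speech name, in which estimation runs only for names not yet stored. This is an illustrative sketch under stated assumptions: the class name `BackgroundModelManager`, the flat-list parameter layout, and the `estimate` callable (standing in for the HMM parameter estimation unit 91) are all hypothetical, and the toy estimator in the usage lines is not an HMM trainer.

```python
class BackgroundModelManager:
    """Estimate per-speech parameters once, then average any chosen subset."""

    def __init__(self, estimate):
        self._estimate = estimate  # callable: training data -> flat list of floats
        self._store = {}           # plays the role of storage unit 92
        self.estimated = []        # names for which estimation actually ran

    def extract(self, training_data):
        """Return averaged parameters for the speech kinds in training_data."""
        for name, data in training_data.items():
            if name not in self._store:  # only estimate what is missing
                self._store[name] = self._estimate(data)
                self.estimated.append(name)
        selected = [self._store[name] for name in training_data]
        return [sum(column) / len(selected) for column in zip(*selected)]


# Toy stand-in estimator: one "parameter" equal to the mean of the data.
manager = BackgroundModelManager(lambda data: [sum(data) / len(data)])
first = manager.extract({"A": [1.0], "B": [2.0], "C": [3.0]})
second = manager.extract({"A": [1.0], "B": [2.0], "D": [6.0]})
```

On the second call only speech D is estimated; the stored parameters of A and B are reused, and C simply stays in the store in case it is selected again later.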

[0045] By contrast, in the conventional method, in which all the voices used for learning the background HMM are input to a single HMM to estimate the background HMM parameters directly, the parameters in the above example would have to be estimated using all of voices A, B, and D, which greatly increases the amount of computation. Moreover, when data from different voices are input to a single model, the parameters converge poorly.

[0046] The gist of the present invention is as follows. When obtaining the parameters of the background HMM, which is a model of speech outside the recognition targets, the different voices (or noises) used for learning the background HMM are not input to a single model to estimate the parameters directly. Instead, model parameters are estimated for each voice (or noise), and the background HMM parameters are then obtained from these per-voice parameters.

[0047] Accordingly, the structure of the HMM and the like are not limited to those shown in FIG. 3. In the above embodiment, the average of the per-voice HMM parameters is used as the background HMM parameters, but the invention is not limited to this. For example, a weighted average may be applied: the HMM parameters of each voice are weighted before averaging, and the result is used as the background HMM parameters.
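A weighted average of this kind might look like the sketch below. The text does not say how the weights are chosen, so the weights here are purely hypothetical (for instance, they could reflect how much learning data each voice contributed). The function name `weighted_average` is also an assumption. Dividing by the weight total keeps each averaged probability row summing to 1.

```python
from typing import List, Sequence


def weighted_average(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted mean of equally long probability rows, one row per voice."""
    if len(rows) != len(weights):
        raise ValueError("need exactly one weight per row")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalized weights sum to 1
    return [
        sum(w * row[i] for w, row in zip(norm, rows))
        for i in range(len(rows[0]))
    ]


# One transition row from each of voices A, B, and D (invented numbers),
# with voice A counted twice as heavily as the others.
row = weighted_average([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]], [2.0, 1.0, 1.0])
```

With equal weights the function reduces to the plain average used in the embodiment.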

[0048] The keyword HMM parameter estimation unit 8 and the background HMM parameter extraction unit 9 need not be provided in the speech recognition device itself. The keyword HMM parameters and the background HMM parameters may instead be obtained on a separate device using the keyword HMM parameter estimation unit 8 and the background HMM parameter extraction unit 9, and then stored in the keyword HMM parameter storage unit 5 and the background HMM parameter storage unit 6 via a disk device or the like.

[0049] In the above embodiment, the unit of speech quantization is the PS (speech segment), but the unit of quantization may instead be a phoneme or a syllable, or a unit obtained by a clustering method that is not based on acoustic classification. The present invention can also be modified and implemented in various other ways without departing from its gist.

[0050]

[Effects of the Invention] As described above, according to the present invention, the background HMM parameters are obtained from HMM parameters that have been estimated for each voice (or noise) used for learning the background HMM and stored in the storage means. Consequently, when, for example, the voices used for learning the background HMM are changed, the parameter computation need not be redone for all voices, and a background HMM suited to the purpose can be learned at high speed.

[0051] Furthermore, according to the present invention, different voices are not used directly in computing the background HMM parameters. Instead, HMM parameters are computed for each voice, and the background HMM parameters are computed from those results. The parameters therefore converge quickly, and learning can be performed at high speed.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the detailed configuration of the acoustic analysis unit 1 in the device of FIG. 1.

FIG. 3 is a diagram showing a typical example of the structure of an HMM.

[Explanation of symbols]

1 ... Acoustic analysis unit, 2 ... Speech quantization unit (label extraction means), 4 ... HMM recognition unit, 5 ... Keyword HMM parameter storage unit, 6 ... Background HMM parameter storage unit, 7 ... Recognition result output unit, 8 ... Keyword HMM parameter estimation unit, 9 ... Background HMM parameter extraction unit, 41 ... Keyword HMM likelihood calculation unit, 42 ... Background HMM likelihood calculation unit, 43 ... Score normalization unit, 44 ... Score comparison unit, 91 ... HMM parameter estimation unit, 92 ... HMM parameter storage unit (storage means), 93 ... Background parameter calculation unit.

Claims

[Claims]

1. A background HMM parameter extraction method for use in a speech recognition method in which a label is obtained for each frame from feature parameters obtained by acoustically analyzing an input speech signal, a log-likelihood of the resulting label sequence with respect to an HMM (Hidden Markov Model) of a keyword to be recognized and a log-likelihood of the same label sequence with respect to a background HMM, which is an HMM of arbitrary speech or noise other than the keyword, are obtained, and a keyword score is generated from these two log-likelihoods to recognize the input speech, the method characterized in that, for each of the various voices or noises used for parameter extraction of the background HMM, HMM parameters specific to that voice or noise are obtained and stored in storage means, and the parameters of the background HMM used for obtaining the log-likelihood with respect to the background HMM are obtained on the basis of the HMM parameters stored in the storage means for each voice or noise.

2. The background HMM parameter extraction method according to claim 1, wherein, when the various voices or noises used for parameter extraction of the background HMM are changed, HMM parameters specific to a voice or noise are obtained and stored in the storage means only for those voices or noises, among the various voices or noises after the change, whose parameters are not yet stored in the storage means, and the parameters of the background HMM are extracted again on the basis of those HMM parameters stored in the storage means that belong to the various voices or noises after the change.

3. The background HMM parameter extraction method according to claim 1, wherein the parameters of the background HMM are obtained by taking an average or a weighted average of the HMM parameters of the respective voices or noises.

4. A speech recognition device comprising: acoustic analysis means for obtaining feature parameters by acoustically analyzing an input speech signal; label extraction means for obtaining a label for each frame from the feature parameters obtained by the acoustic analysis means; first likelihood calculation means for obtaining a log-likelihood of the label sequence obtained by the label extraction means with respect to an HMM of a keyword to be recognized; second likelihood calculation means for obtaining a log-likelihood of the same label sequence with respect to a background HMM, which is an HMM of arbitrary speech or noise other than the keyword; score calculation means for obtaining a score of the keyword to be recognized on the basis of the calculation results of both the first likelihood calculation means and the second likelihood calculation means; means for recognizing the input speech on the basis of the score obtained by the score calculation means; parameter estimation means for obtaining, for each of the various voices or noises used for parameter extraction of the background HMM, HMM parameters specific to that voice or noise; storage means for storing the HMM parameters of each voice or noise obtained by the parameter estimation means; and parameter calculation means for obtaining the parameters of the background HMM on the basis of the HMM parameters of each voice or noise stored in the storage means, wherein the second likelihood calculation means obtains the log-likelihood using the parameters of the background HMM obtained by the parameter calculation means.

5. The speech recognition device according to claim 4, wherein, when the various voices or noises used for parameter extraction of the background HMM are changed, the parameter estimation means obtains HMM parameters specific to a voice or noise, and stores them in the storage means, for those voices or noises after the change whose parameters are not yet stored in the storage means, and the parameter calculation means extracts the parameters of the background HMM again on the basis of those HMM parameters stored in the storage means that belong to the various voices or noises after the change.