JPH0997095A

JPH0997095A - Speech recognition device

Info

Publication number: JPH0997095A
Application number: JP7253146A
Authority: JP
Inventors: Nobuyuki Kono; 信幸香野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-09-29
Filing date: 1995-09-29
Publication date: 1997-04-08

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device using an HMM (Hidden Markov Model) in which the rejection threshold for handling invalid utterances, such as throat clearing, can be set and used at a likelihood suited to the individual user. SOLUTION: In addition to a speech input means 1, a word speech segmentation unit 2, a feature extraction unit 3, a state count estimation unit 4, a learning unit 5, and so on, the device includes a likelihood output unit 6 that computes a likelihood from feature data and HMM parameters, and a threshold setting unit 8 that sets the rejection threshold. This eliminates the problem of an utterance being rejected no matter how many times the user repeats a word, and improves usability.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that recognizes word speech and outputs the recognition result.

[0002]

[Prior Art] To describe a conventional speech recognition device that recognizes word speech using a Hidden Markov Model (abbreviated as HMM in this document), the HMM-based speech recognition method is explained first. An HMM is a Markov model that has N states S1, S2, ..., SN, moves from state to state at fixed intervals with a certain probability (the transition probability), and on each transition outputs one label (feature data) with a certain probability (the output probability).

[0003] When speech is regarded as a time series of labels (feature data), each word is uttered several times at training time to create an HMM that models those utterances. At recognition time, recognition is performed by searching for the HMM that maximizes the probability (likelihood) of outputting the label sequence of the input speech. A concrete description follows with reference to the drawings.

[0004] FIG. 5 shows an example of a conventional HMM. It is the simple HMM example given in "Speech Recognition Based on Hidden Markov Model," Journal of the Acoustical Society of Japan, Vol. 42, No. 12 (1986). This HMM consists of three states and outputs label sequences made up of only two kinds of labels, a and b. The initial state is S1. From S1, the model either transitions to S1 itself with a probability of 0.3 (outputting label a; label b is never output because its output probability is 0.0) or transitions to S2 with a probability of 0.7 (outputting label a with a probability of 0.5 and label b with a probability of 0.5). From state S2, the model either transitions to S2 itself with a probability of 0.2 (outputting label a or b with probabilities of 0.3 and 0.7, respectively) or transitions to the final state S3 with a probability of 0.8 (outputting label b; label a is never output because its output probability is 0.0).

[0005] Now consider the probability (likelihood) that this HMM outputs the label sequence (sequence of feature data) abb. Only two state sequences are permitted by this HMM, S1S1S2S3 and S1S2S2S3, and their probabilities are 0.3 * 1.0 * 0.7 * 0.5 * 0.8 * 1.0 = 0.0840 and 0.7 * 0.5 * 0.2 * 0.7 * 0.8 * 1.0 = 0.0392, respectively. Since either sequence may occur, the HMM outputs abb with a total probability of 0.0840 + 0.0392 = 0.1232.
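The path-by-path calculation above can also be reproduced with the forward algorithm, which sums over all permitted state sequences without enumerating them. The following is a minimal sketch, assuming the FIG. 5 model is encoded as an arc-emission HMM (labels are output on transitions); the names `TRANSITIONS` and `sequence_likelihood` are illustrative and do not come from the patent.

```python
# FIG. 5 model as an arc-emission HMM: every transition has its own
# label output distribution.
# TRANSITIONS[src] = [(dst, transition_prob, {label: output_prob}), ...]
TRANSITIONS = {
    "S1": [("S1", 0.3, {"a": 1.0, "b": 0.0}),
           ("S2", 0.7, {"a": 0.5, "b": 0.5})],
    "S2": [("S2", 0.2, {"a": 0.3, "b": 0.7}),
           ("S3", 0.8, {"a": 0.0, "b": 1.0})],
    "S3": [],  # final state: no outgoing transitions
}


def sequence_likelihood(labels, transitions=TRANSITIONS,
                        initial="S1", final="S3"):
    """Probability that the HMM emits `labels` and ends in `final`."""
    # alpha[state] = probability of having emitted the labels seen so far
    # and currently being in `state`.
    alpha = {initial: 1.0}
    for label in labels:
        next_alpha = {}
        for src, p_src in alpha.items():
            for dst, p_trans, outputs in transitions[src]:
                p = p_src * p_trans * outputs.get(label, 0.0)
                if p > 0.0:
                    next_alpha[dst] = next_alpha.get(dst, 0.0) + p
        alpha = next_alpha
    return alpha.get(final, 0.0)


print(round(sequence_likelihood("abb"), 4))  # prints 0.1232
```

The two contributions that reach S3 on the last label are exactly the 0.0840 and 0.0392 terms of the hand calculation.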

[0006] Therefore, if an HMM is trained in advance for each word, yielding the state transition probabilities best suited to that word and the label output probabilities at each state transition, then when the label sequence of an unknown word is input, computing the probability (likelihood) against each HMM shows which word's HMM is most likely to output that label sequence, and the word can thereby be recognized. This is the HMM-based speech recognition method.
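Recognition by maximum likelihood can be sketched as follows. This is an illustration only: `word_A` reuses the FIG. 5 probabilities, while `word_B` and all function names are hypothetical values invented for the example, not taken from the patent.

```python
def likelihood(model, labels, initial="S1", final="S3"):
    """Forward algorithm for an arc-emission HMM (labels output on transitions)."""
    alpha = {initial: 1.0}
    for label in labels:
        next_alpha = {}
        for src, p_src in alpha.items():
            for dst, p_trans, outputs in model[src]:
                p = p_src * p_trans * outputs.get(label, 0.0)
                if p > 0.0:
                    next_alpha[dst] = next_alpha.get(dst, 0.0) + p
        alpha = next_alpha
    return alpha.get(final, 0.0)


WORD_MODELS = {
    # Probabilities from FIG. 5.
    "word_A": {
        "S1": [("S1", 0.3, {"a": 1.0, "b": 0.0}),
               ("S2", 0.7, {"a": 0.5, "b": 0.5})],
        "S2": [("S2", 0.2, {"a": 0.3, "b": 0.7}),
               ("S3", 0.8, {"a": 0.0, "b": 1.0})],
        "S3": [],
    },
    # Hypothetical second word model (made-up probabilities).
    "word_B": {
        "S1": [("S1", 0.5, {"a": 0.2, "b": 0.8}),
               ("S2", 0.5, {"a": 0.1, "b": 0.9})],
        "S2": [("S2", 0.4, {"a": 0.9, "b": 0.1}),
               ("S3", 0.6, {"a": 1.0, "b": 0.0})],
        "S3": [],
    },
}


def recognize(labels, models):
    """Return the word whose HMM gives the highest likelihood, plus all scores."""
    scores = {word: likelihood(model, labels) for word, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = recognize("abb", WORD_MODELS)
print(best, scores)
```

Note that this plain argmax always returns some word, even for input that matches no model well, which is the situation the rejection threshold discussed later is meant to handle.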

[0007] FIG. 6 is an example showing the correspondence between the speech waveform, the time series of feature data, and the HMM states in conventional speech recognition, for the utterance "hajime" ("beginning"). As shown, the HMM is expressed with a small number of states, roughly equal to the number of phonemes in the word, relative to the time series of speech feature data.

[0008] In a conventional speech recognition device that recognizes word speech using HMMs, at training time, for each word registered in the device, a small number of states, roughly the number of phonemes in the word, is determined from changes in the phoneme spectrum and the like. The output probabilities of the feature data at each state transition and the transition probabilities between states are then estimated by training and modeled as an HMM. At recognition time, the input speech is fitted to all of these models, the likelihoods are computed, and recognition is performed.

[0009]

[Problem to Be Solved by the Invention] In a speech recognition device, to cope with invalid input such as throat clearing, it is important for operability that the device not always return the highest-likelihood candidate to the user. Instead, if the highest-likelihood candidate does not exceed a certain threshold, the candidate is rejected and the user is prompted to speak again. However, this rejection threshold is a fixed value that the provider of the device determines in advance by evaluating the device. As a result, for some users the recognition candidate is rejected no matter how many times they repeat the utterance, and recognition fails. Incidentally, Section 10.2, "Problems in Speech Recognition," in Chapter 10 of Sadateru Furui's "Digital Signal Processing" (Tokai University Press) states that "there is the problem that, although they are a small proportion of all speakers, some speakers have an extremely low recognition rate."

[0010] This happens because, for speakers with a low recognition rate, factors such as individual differences in vocal clarity or mumbling place their feature data outside the range of variation of the average speaker's feature data, so the likelihood is computed lower than usual (in the sense that the data are probabilistically unlikely). As a result, such speakers never exceed a likelihood threshold that was set from the feature data of average speakers.

[0011] Accordingly, an object of the present invention is to provide a speech recognition device that uses HMMs to recognize word speech and in which the rejection threshold for handling invalid utterances such as throat clearing can be set and used at a likelihood suited to the user.

[0012]

[Means for Solving the Problem] To this end, the speech recognition device of the present invention comprises: a speech input means for inputting speech containing word speech; a word speech segmentation unit that cuts out only the word speech portion from the speech containing word speech; a feature extraction unit that extracts feature data from the cut-out word speech; a state count estimation unit that estimates, from the feature data, the number of states for the word speech when it is modeled by an HMM; a learning unit that fits the feature data to a word model to obtain HMM parameters; a likelihood output unit that computes a likelihood from the feature data and the HMM parameters; a speech dictionary file consisting of the learned HMM parameters and likelihood information; a threshold setting unit that sets the threshold for rejection; a matching determination unit that computes the likelihood for each word model and determines the recognition candidate; and a determination result output unit that outputs the recognition result.

[0013]

[Operation] When a word is registered in the speech recognition device, the HMM parameters obtained by training are used to recognize the speech that was input for registration, and the likelihood at that time is obtained. That is, even for a speaker whose recognition rate tends to be low, a (lower) likelihood reflecting that speaker's utterance is obtained as the user's likelihood. This likelihood is registered in the speech dictionary file together with the HMM parameters. At recognition time, the likelihood information in the speech dictionary file is read and used as a reference value for the likelihood threshold. A likelihood threshold suited to the user can thus be set, enabling accurate recognition. Because the likelihood threshold can be set per user in this way, the situation in which "for some users the recognition candidate is rejected no matter how many times they repeat the utterance, and recognition fails" no longer arises.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a configuration block diagram of the speech recognition device in one embodiment of the present invention. In the figure, 1 is a speech input means for inputting speech containing word speech; 2 is a word speech segmentation unit that cuts out only the word speech portion from the speech containing word speech; 3 is a feature extraction unit that extracts feature data from the cut-out word speech; 4 is a state count estimation unit that estimates, from the feature data, the number of states for the word speech when it is modeled by an HMM; 5 is a learning unit that fits the feature data to a word model to obtain HMM parameters; 6 is a likelihood output unit that computes a likelihood from the feature data and the HMM parameters; 7 is a speech dictionary file consisting of the learned HMM parameters and likelihood information; 8 is a threshold setting unit that sets the threshold for rejection; 9 is a matching determination unit that computes the likelihood for each word model and determines the recognition candidate; and 10 is a determination result output unit that outputs the recognition result.

[0015] FIG. 2 is a circuit block diagram of the speech recognition device in one embodiment of the present invention. In the figure, 11 is a microphone, 12 is a read-only memory (ROM), 13 is a central processing unit (CPU), 14 is a writable memory (RAM), 15 is a monitor, and 16 is a file device.

[0016] The speech input means 1 shown in FIG. 1 is implemented by the microphone 11. The word speech segmentation unit 2, feature extraction unit 3, state count estimation unit 4, learning unit 5, likelihood output unit 6, threshold setting unit 8, and matching determination unit 9 are implemented by the CPU 13 executing a program stored in the ROM 12 while exchanging data with the microphone 11, the ROM 12, the RAM 14, and the file device 16. The speech dictionary file 7 is implemented by the file device 16, and the determination result output unit 10 by the monitor 15.

[0017] FIG. 3 is a flowchart of registration in the speech recognition device in one embodiment of the present invention, and FIG. 4 is a flowchart of recognition in the same device. The case in which a word speech is registered in the speech recognition device configured as described above is explained with reference to the flowchart of FIG. 3.

[0018] In step 1, an utterance containing word speech is input through the speech input means 1. In step 2, the word speech segmentation unit 2 cuts the word speech out of the utterance. This can be done by detecting and removing the silent or low-noise portions before and after the word speech based on, for example, the power of the speech. In step 3, the feature extraction unit 3 extracts features, for example by obtaining the LPC cepstrum coefficients of the word speech through linear predictive analysis (LPC analysis). In step 4, the state count estimation unit 4 estimates the number of states for the word speech from the feature data extracted in step 3. The number of states can be estimated based on "On the Number of States and Mixtures of HMMs in Continuous Digit Speech Recognition," Proceedings of the Acoustical Society of Japan (March 1990).
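The power-based segmentation of step 2 can be illustrated with a short sketch. The patent does not specify frame length, threshold, or the exact detection rule, so `frame_len`, `power_threshold`, and the function names below are assumptions chosen for the example.

```python
def frame_powers(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def cut_out_word(samples, frame_len=4, power_threshold=0.01):
    """Drop leading and trailing frames whose power is at or below the threshold.

    Returns the samples between the first and last frame whose power exceeds
    the threshold, or an empty list if no frame does.
    """
    powers = frame_powers(samples, frame_len)
    active = [i for i, p in enumerate(powers) if p > power_threshold]
    if not active:
        return []
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return samples[start:end]


# Silence, a short burst standing in for the word, then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4] * 2 + [0.0] * 8
print(cut_out_word(signal))
```

Real input would use much longer frames (on the order of tens of milliseconds of samples) and a threshold tuned to the background noise level; the tiny values here only keep the example readable.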

[0019] In step 5, the learning unit 5 trains on the feature data of the word speech using an HMM with the number of states obtained in step 4, obtains the HMM parameters, namely the transition probabilities between states and the output probabilities of the feature data at each transition, and stores them in the speech dictionary file 7. In step 6, the likelihood output unit 6 uses the feature data of the word speech to compute the likelihood against the HMM parameters obtained in step 5 and read back from the speech dictionary file 7. This likelihood information is also stored in the speech dictionary file 7.
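Steps 5 and 6 amount to storing, for each registered word, both the trained HMM parameters and the likelihood that the registration utterance itself obtains under those parameters. The sketch below shows one possible shape for such a dictionary entry. The patent does not define a file format or a training procedure, so `DictionaryEntry`, `register_word`, and the `train` and `likelihood` callables are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class DictionaryEntry:
    hmm_params: dict              # transition and output probabilities (step 5)
    enrollment_likelihood: float  # likelihood of the registration utterance (step 6)


def register_word(
    dictionary: Dict[str, DictionaryEntry],
    word: str,
    features: Sequence,
    train: Callable[[Sequence], dict],
    likelihood: Callable[[dict, Sequence], float],
) -> DictionaryEntry:
    """Train an HMM for `word`, then score the same utterance against it."""
    # Step 5: learn HMM parameters and store them in the dictionary.
    dictionary[word] = DictionaryEntry(train(features), 0.0)
    # Step 6: read the stored parameters back, compute the likelihood of the
    # registration utterance, and store that likelihood alongside them.
    entry = dictionary[word]
    entry.enrollment_likelihood = likelihood(entry.hmm_params, features)
    return entry


# Stand-ins for the real training and likelihood routines.
speech_dictionary: Dict[str, DictionaryEntry] = {}
entry = register_word(
    speech_dictionary,
    "hajime",
    ["a", "b", "b"],
    train=lambda feats: {"n_states": 3},
    likelihood=lambda params, feats: 0.1232,
)
print(entry)
```

Keeping the likelihood in the same record as the parameters means the recognition stage can derive a per-user threshold without any extra lookup.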

[0020] Next, the operation when a word speech is recognized is described with reference to the flowchart of FIG. 4. In step 11, an utterance containing word speech is input through the speech input means 1. In step 12, the word speech segmentation unit 2 cuts the word speech out of the utterance. In step 13, the feature extraction unit 3 extracts features from the word speech. In step 14, the matching determination unit 9 uses the feature data of the word speech to compute the likelihood against the HMM parameters of each word model read from the speech dictionary file 7, and determines the word model with the highest likelihood to be the recognition candidate.

[0021] In step 15, the threshold setting unit 8 sets the rejection threshold from the likelihood information read from the speech dictionary file 7. The threshold can be set, for example, by "using the read likelihood information as the threshold as is" or by "taking the threshold determined by evaluating the speech recognition device and weighting it with the read likelihood information." In step 16, the matching determination unit 9 determines whether the likelihood of the recognition candidate obtained in step 14 exceeds the threshold set in step 15. If it does, processing proceeds to step 17. If it does not, the candidate is rejected and processing returns to step 11 so that the user can input the utterance again. In step 17, the determination result output unit 10 notifies the user of the recognition result.
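Steps 14 to 17 can be sketched as a small decision function. The patent names two ways of setting the threshold but does not give a weighting formula, so the linear blend controlled by `weight` is an assumption, as is the choice to look up the stored likelihood of the candidate word; all names are illustrative.

```python
def set_threshold(enrollment_likelihood, vendor_threshold=None, weight=0.5):
    """Step 15: derive the rejection threshold from the stored likelihood.

    With no vendor threshold, the stored likelihood is used as is.
    Otherwise the two values are blended linearly (assumed formula).
    """
    if vendor_threshold is None:
        return enrollment_likelihood
    return weight * enrollment_likelihood + (1.0 - weight) * vendor_threshold


def decide(scores, enrollment_likelihoods, vendor_threshold=None, weight=0.5):
    """Return the recognized word, or None if the candidate is rejected."""
    # Step 14: the highest-likelihood word model is the recognition candidate.
    candidate = max(scores, key=scores.get)
    # Step 15: threshold from the likelihood stored at registration time.
    threshold = set_threshold(
        enrollment_likelihoods[candidate], vendor_threshold, weight
    )
    # Step 16: accept only if the candidate's likelihood exceeds the threshold.
    if scores[candidate] > threshold:
        return candidate  # step 17: report the result
    return None           # rejected: the caller prompts the user to repeat


stored = {"hajime": 0.04, "owari": 0.03}
print(decide({"hajime": 0.05, "owari": 0.01}, stored))    # hajime
print(decide({"hajime": 0.001, "owari": 0.002}, stored))  # None
```

In practice likelihoods of long utterances are tiny and are usually handled as log values; the same comparison applies unchanged in the log domain.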

[0022]

[Effect of the Invention] As described above, according to the speech recognition device of the present invention, the speech input at registration is recognized using the HMM parameters obtained by training, and its likelihood information is obtained at that time. The rejection threshold can therefore be set to suit the user at recognition time, so the inconvenience of a user being rejected no matter how many times they repeat an utterance does not arise, and usability is improved.

[Brief description of drawings]

FIG. 1 is a configuration block diagram of the speech recognition device in one embodiment of the present invention.

FIG. 2 is a circuit block diagram of the speech recognition device in one embodiment of the present invention.

FIG. 3 is a flowchart of registration in the speech recognition device in one embodiment of the present invention.

FIG. 4 is a flowchart of recognition in the speech recognition device in one embodiment of the present invention.

FIG. 5 shows an example of a conventional Hidden Markov Model.

FIG. 6 is an example showing the correspondence between the speech waveform, the time series of feature data, and the HMM states in conventional speech recognition.

[Explanation of symbols]

1 Speech input means
2 Word speech segmentation unit
3 Feature extraction unit
4 State count estimation unit
5 Learning unit
6 Likelihood output unit
7 Speech dictionary file
8 Threshold setting unit
9 Matching determination unit
10 Determination result output unit
11 Microphone
12 ROM
13 CPU
14 RAM
15 Monitor
16 File device

Claims

[Claims]

1. A speech recognition device comprising: a speech input means for inputting speech containing word speech; a word speech segmentation unit that cuts out only the word speech portion from the speech containing word speech; a feature extraction unit that extracts feature data from the cut-out word speech; a state count estimation unit that estimates, from the feature data, the number of states for the word speech when it is modeled by an HMM; a learning unit that fits the feature data to a word model to obtain HMM parameters; a likelihood output unit that computes a likelihood from the feature data and the HMM parameters; a speech dictionary file consisting of the learned HMM parameters and likelihood information; a threshold setting unit that sets a threshold for rejection; a matching determination unit that computes the likelihood for each word model and determines a recognition candidate; and a determination result output unit that outputs a recognition result.