JPH096387A

JPH096387A - Voice recognition device

Info

Publication number: JPH096387A
Application number: JP7151598A
Authority: JP
Inventors: Nobuyuki Kono; 信幸香野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-06-19
Filing date: 1995-06-19
Publication date: 1997-01-10

Abstract

PURPOSE: To provide a voice recognition device with an excellent ability to distinguish between similar words. CONSTITUTION: The device comprises a feature extraction unit 3 that extracts feature data from a word voice cut out of the input voice; a state number estimation unit 4 that estimates, from the feature data, the number of states for the word voice when it is modeled by a Markov model; a similar word determination unit 5 that determines whether a word similar to the word voice to be newly registered has already been registered; a state number addition unit 6 that increases the estimated number of states; a learning unit 7 that applies the feature data to a word model to obtain Markov model parameters; a voice dictionary file 8 consisting of the Markov model parameters; and a collation determination unit 9 that calculates the likelihood for each word model and determines recognition candidates.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device that recognizes word voices and outputs the recognition result.

[0002]

[Description of the Related Art] Before describing a conventional voice recognition device, the principle of speech recognition using the Hidden Markov Model (referred to in this specification as the "Markov model"), which conventional voice recognition devices use, is explained.

[0003] The Markov model has N states S1, S2, ..., SN. At fixed intervals it moves from state to state with a certain probability (the transition probability), and on each transition it outputs one label (feature data) with a certain probability (the output probability).

[0004] Speech is then regarded as a time series of labels (feature data). At learning time, each word is uttered several times and a Markov model of those utterances is created. At recognition time, recognition is performed by searching for the Markov model that maximizes the probability (likelihood) of outputting the label sequence of the input voice.

[0005] A concrete explanation follows with reference to the drawings. FIG. 5 is an explanatory diagram of a Markov model in a conventional voice recognition device. It shows a simple example of a Markov model presented in "Speech Recognition Based on Hidden Markov Model", Journal of the Acoustical Society of Japan, Vol. 42, No. 12 (1986). This Markov model consists of three states and outputs label sequences made up of only two kinds of labels, label a and label b.

[0006] The initial state is S1. From S1, the model either transitions to S1 itself with probability 0.3 (outputting label a; label b has an output probability of 0.0 and is never output on this transition) or transitions to S2 with probability 0.7 (outputting label a with probability 0.5 and label b with probability 0.5).

[0007] From state S2, the model either transitions to S2 itself with probability 0.2 (outputting label a or label b with probabilities 0.3 and 0.7, respectively) or transitions to the final state S3 with probability 0.8 (outputting label b; label a has an output probability of 0.0 and is never output on this transition).
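As a minimal sketch, the model of FIG. 5 can be written down as a table of arcs, each carrying a transition probability and the output probabilities of the labels emitted on that arc. The dictionary layout and the names `FIG5_MODEL` and `check_model` are illustrative assumptions and do not come from the patent; only the probabilities are taken from the text.

```python
# Transition-emitting Markov model of FIG. 5.
# Key: (source state, destination state)
# Value: (transition probability, {label: output probability on this arc})
# The layout and names are assumptions made for illustration.
FIG5_MODEL = {
    ("S1", "S1"): (0.3, {"a": 1.0, "b": 0.0}),
    ("S1", "S2"): (0.7, {"a": 0.5, "b": 0.5}),
    ("S2", "S2"): (0.2, {"a": 0.3, "b": 0.7}),
    ("S2", "S3"): (0.8, {"a": 0.0, "b": 1.0}),
}


def check_model(model, tol=1e-9):
    """Return True if, for every state, the outgoing transition probabilities
    sum to 1 and, for every arc, the output probabilities sum to 1."""
    outgoing = {}
    for (src, _dst), (p_trans, outputs) in model.items():
        outgoing[src] = outgoing.get(src, 0.0) + p_trans
        if abs(sum(outputs.values()) - 1.0) > tol:
            return False
    return all(abs(total - 1.0) <= tol for total in outgoing.values())


print(check_model(FIG5_MODEL))
```

The final state S3 has no outgoing arcs, so it does not appear as a source state in the table.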

[0008] Now consider the probability (likelihood) that this Markov model outputs the label sequence (sequence of feature data) abb. This model permits only two state sequences for that output, S1S1S2S3 and S1S2S2S3, and their probabilities are, respectively:

S1S1S2S3: 0.3 × 1.0 × 0.7 × 0.5 × 0.8 × 1.0 = 0.0840
S1S2S2S3: 0.7 × 0.5 × 0.2 × 0.7 × 0.8 × 1.0 = 0.0392

Since either sequence may occur, the model outputs abb with a total probability of 0.0840 + 0.0392 = 0.1232.
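The calculation above can be reproduced with a short sketch that sums the probability over all permitted state paths, one label at a time (the forward recursion). The function name `sequence_likelihood` and the data layout are assumptions for illustration; the probabilities are those given for FIG. 5.

```python
# FIG. 5 model: (source, destination) -> (transition prob., {label: output prob.})
FIG5_MODEL = {
    ("S1", "S1"): (0.3, {"a": 1.0, "b": 0.0}),
    ("S1", "S2"): (0.7, {"a": 0.5, "b": 0.5}),
    ("S2", "S2"): (0.2, {"a": 0.3, "b": 0.7}),
    ("S2", "S3"): (0.8, {"a": 0.0, "b": 1.0}),
}


def sequence_likelihood(model, labels, start="S1", final="S3"):
    """Probability that the model, starting in `start`, emits `labels`
    and ends in `final`, summed over all state paths."""
    # alpha[state] = probability of emitting the labels seen so far
    # and being in `state` afterwards.
    alpha = {start: 1.0}
    for label in labels:
        next_alpha = {}
        for (src, dst), (p_trans, outputs) in model.items():
            if src in alpha:
                p = alpha[src] * p_trans * outputs.get(label, 0.0)
                next_alpha[dst] = next_alpha.get(dst, 0.0) + p
        alpha = next_alpha
    return alpha.get(final, 0.0)


likelihood = sequence_likelihood(FIG5_MODEL, "abb")
print(round(likelihood, 4))  # 0.1232
```

Paths that reach S3 before the last label are dropped automatically, because S3 has no outgoing arc, which leaves exactly the two paths S1S1S2S3 and S1S2S2S3.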

[0009] If the Markov model of each word is learned in advance, so that the state transition probabilities and the label output probabilities at each state transition best suited to that word are obtained, then when the label sequence of an unknown word is input, the probability (likelihood) can be calculated for each Markov model to find which word's Markov model is most likely to output that label sequence, and recognition can be performed on that basis. This is the principle of speech recognition using Markov models.
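A hedged sketch of this recognition principle: score the input label sequence against every word model and rank the words by likelihood. The word names `word_A` and `word_B`, the probabilities of `word_B`, and the function names are invented for illustration; `word_A` reuses the FIG. 5 values.

```python
def path_sum_likelihood(model, labels, start="S1", final="S3"):
    """Likelihood of `labels` under a transition-emitting Markov model."""
    alpha = {start: 1.0}
    for label in labels:
        next_alpha = {}
        for (src, dst), (p_trans, outputs) in model.items():
            if src in alpha:
                p = alpha[src] * p_trans * outputs.get(label, 0.0)
                next_alpha[dst] = next_alpha.get(dst, 0.0) + p
        alpha = next_alpha
    return alpha.get(final, 0.0)


# Hypothetical dictionary of learned word models (values for word_B are made up).
WORD_MODELS = {
    "word_A": {
        ("S1", "S1"): (0.3, {"a": 1.0, "b": 0.0}),
        ("S1", "S2"): (0.7, {"a": 0.5, "b": 0.5}),
        ("S2", "S2"): (0.2, {"a": 0.3, "b": 0.7}),
        ("S2", "S3"): (0.8, {"a": 0.0, "b": 1.0}),
    },
    "word_B": {
        ("S1", "S1"): (0.5, {"a": 0.2, "b": 0.8}),
        ("S1", "S2"): (0.5, {"a": 0.1, "b": 0.9}),
        ("S2", "S2"): (0.4, {"a": 0.9, "b": 0.1}),
        ("S2", "S3"): (0.6, {"a": 1.0, "b": 0.0}),
    },
}


def rank_words(word_models, labels):
    """Return (likelihood, word) pairs sorted from most to least likely."""
    scored = [(path_sum_likelihood(m, labels), w) for w, m in word_models.items()]
    return sorted(scored, reverse=True)


best_score, best_word = rank_words(WORD_MODELS, "abb")[0]
print(best_word)  # word_A
```

A practical system would usually work with log-likelihoods to avoid numerical underflow on long label sequences; plain probabilities are kept here to match the worked example in the text.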

[0010] FIG. 6 is an exemplary diagram showing the correspondence between a voice waveform, the time series of feature data, and the states of a Markov model in a conventional voice recognition device, for the utterance "hajime" (はじめ). As shown, the Markov model represents the time series of voice feature data with a small number of states, roughly equal to the number of phonemes in the word.

[0011] In a conventional voice recognition device that recognizes word voices using Markov models, at learning time, for each word to be registered in the device, a small number of states roughly equal to the number of phonemes of the word is determined from, for example, changes in the phoneme spectrum. The output probabilities of the feature data at each state transition and the transition probabilities between states are then estimated by learning, and the word is modeled as a Markov model. At recognition time, the input voice is applied to all of these models, and recognition is performed by likelihood calculation.

[0012]

[Problems to Be Solved by the Invention] When words are registered at learning time with a conventional voice recognition device, suppose, for example, that the word "Satou" (さとう) has been registered and the user then tries to register an acoustically similar word such as "Katou" (かとう). The two words end up with similar numbers of states and similar Markov model parameters, so if the new word is registered as is, the two words become difficult to distinguish at recognition time. For this reason, when a similar word had already been registered, the user had to be asked to rephrase, for example saying "Katou kachou" (かとうかちょう) instead of "Katou", before the word could be registered. Because a conventional voice recognition device has a low ability to distinguish similar voices, it frequently has to ask the user to rephrase.

[0013] An object of the present invention is therefore to provide a voice recognition device with an excellent ability to distinguish between similar words.

[0014]

[Means for Solving the Problems] The voice recognition device of the present invention comprises: a voice input means for inputting a voice including a word voice; a word voice cutout unit that cuts out only the word voice portion from the input voice; a feature extraction unit that extracts feature data from the cut-out word voice; a state number estimation unit that estimates, from the feature data, the number of states for the word voice when it is modeled by a Markov model; a similar word determination unit that determines whether a word similar to the word voice to be newly registered has already been registered; a state number addition unit that increases the estimated number of states; a learning unit that applies the feature data to a word model to obtain Markov model parameters; a voice dictionary file consisting of the learned Markov model parameters; a collation determination unit that calculates the likelihood for each word model and determines recognition candidates; and a determination result output unit that outputs the recognition result.

[0015]

[Operation] With the above configuration, the state number addition unit further increases the estimated number of states, so the features of a word are represented in finer detail and the voice recognition device can distinguish similar words more easily. Similar words can therefore be registered as is, and as a result the frequency with which the user is asked to rephrase so that similar words can be distinguished is kept low.

[0016]

[Embodiments] Embodiments of the present invention will now be described with reference to the drawings.

[0017] FIG. 1 is a functional block diagram of a voice recognition device according to an embodiment of the present invention. In FIG. 1, 1 is a voice input means for inputting a voice including a word voice; 2 is a word voice cutout unit that cuts out only the word voice portion from the voice including the word voice; 3 is a feature extraction unit that extracts feature data from the cut-out word voice; 4 is a state number estimation unit that estimates, from the feature data, the number of states for the word voice when it is modeled by a Markov model; 5 is a similar word determination unit that determines whether a word similar to the word voice to be newly registered has already been registered; 6 is a state number addition unit that increases the estimated number of states; 7 is a learning unit that applies the feature data to a word model to obtain Markov model parameters; 8 is a voice dictionary file containing the learned Markov model parameters; 9 is a collation determination unit that calculates the likelihood for each word model and determines recognition candidates; and 10 is a determination result output unit that outputs the recognition result.

[0018] FIG. 2 is a circuit block diagram of the voice recognition device according to an embodiment of the present invention. In FIG. 2, 11 is a microphone; 12 is a ROM (read-only memory) that stores a program; 13 is a CPU (central processing unit) that executes the program in the ROM 12 and controls the whole device; 14 is a RAM (writable memory) that temporarily stores information needed while the CPU 13 executes the program; 15 is a monitor that displays the processing status and other information to the user; and 16 is a file device that stores information.

[0019] The voice input means 1 in FIG. 1 is realized by the microphone 11. The word voice cutout unit 2, feature extraction unit 3, state number estimation unit 4, similar word determination unit 5, state number addition unit 6, learning unit 7, and collation determination unit 9 are realized by the CPU 13 executing the program stored in the ROM 12 while exchanging data with the microphone 11, the ROM 12, the RAM 14, and the file device 16. The voice dictionary file 8 is stored in the file device 16, and the determination result output unit 10 is realized by the monitor 15.

[0020] FIG. 3 is a flowchart of registration in an embodiment of the present invention, and FIG. 4 is a flowchart of recognition in an embodiment of the present invention.

[0021] The operation of the voice recognition device of this embodiment, configured as described above, when the word voice "Katou" (かとう) is registered is described with reference to the flowchart of FIG. 3. It is assumed here that "Satou" (さとう), a word acoustically similar to "Katou", has already been registered in the voice dictionary file 8.

[0022] First, in step 1, an utterance containing the word voice "Katou" is input from the voice input means 1. In step 2, the word voice cutout unit 2 cuts out the word voice "Katou" from the utterance containing it. This can be realized by having the word voice cutout unit 2 detect and remove the silent or low-noise portions before and after the word voice "Katou", based on the power of the voice or the like.
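The power-based cutout in step 2 could look roughly like the following sketch, which keeps everything between the first and last frame whose mean power exceeds a threshold. The frame length, the threshold, and the function name `cut_out_word` are assumptions chosen for a toy signal; the patent only states that power or the like is used.

```python
def cut_out_word(samples, frame_len=4, threshold=0.01):
    """Return the samples between the first and last frame whose mean
    power exceeds `threshold`; return [] if no frame does."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    powers = [sum(x * x for x in frame) / len(frame) for frame in frames]
    voiced = [i for i, power in enumerate(powers) if power > threshold]
    if not voiced:
        return []
    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


# Toy input: silence, a short burst standing in for the word, then low noise.
burst = [0.5, -0.4, 0.6, -0.5, 0.3, -0.2, 0.4, -0.3]
signal = [0.0] * 8 + burst + [0.01] * 8
word = cut_out_word(signal)
print(len(word))  # 8
```

Real endpoint detection typically adds smoothing and hangover frames so that weak consonants at word boundaries are not clipped; those refinements are left out of this sketch.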

[0023] In step 3, the feature extraction unit 3 extracts features, for example by obtaining LPC cepstrum coefficients for the word voice "Katou" through linear predictive analysis (LPC analysis). In step 4, the state number estimation unit 4 estimates the number of states for the word voice from the feature data extracted from the word voice "Katou" in step 3. The number of states is estimated on the basis of "On the Number of States and the Number of Mixtures of HMMs in Continuous Digit Speech Recognition", Proceedings of the Acoustical Society of Japan (March 1990).
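The patent names LPC analysis and LPC cepstrum coefficients for step 3 but gives no formulas. The sketch below uses the textbook autocorrelation method with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion. The function names, the analysis order, and the omission of windowing and pre-emphasis are assumptions, not details from the patent.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] for x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent frame: leave the remaining coefficients at 0
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        error *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a):
    """c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k], n = 1..len(a)."""
    p = len(a)
    c = [0.0] * (p + 1)  # c[0] is unused
    for n in range(1, p + 1):
        c[n] = a[n - 1] + sum((k / n) * c[k] * a[n - k - 1] for k in range(1, n))
    return c[1:]


def lpc_cepstrum(frame, order):
    """LPC cepstrum coefficients of one frame (no windowing, no pre-emphasis)."""
    return lpc_to_cepstrum(levinson_durbin(autocorrelation(frame, order), order))


# Check against a first-order model with r[k] = 0.5**k, where c[n] = 0.5**n / n.
print(lpc_to_cepstrum(levinson_durbin([1.0, 0.5, 0.25], 2)))  # [0.5, 0.125]
```

Applying `lpc_cepstrum` to successive frames of the cut-out word voice would yield the time series of feature data that the later steps operate on.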

[0024] In step 5, the similar word determination unit 5 determines whether a word similar to the word voice "Katou" already exists in the voice dictionary file 8. This determination is realized either by using a general DP matching technique, or by running recognition on the word voice "Katou" with this voice recognition device and checking whether any recognition candidate is returned. If a similar word exists, the process proceeds to step 6; if not, it proceeds to step 7. Here, because the similar word "Satou" exists, the process proceeds to step 6.
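One way to realize the DP matching option in step 5 is dynamic time warping over the stored feature sequences, as in the sketch below. The Euclidean local distance, the length normalization, the threshold value, and the names `dp_matching_distance` and `find_similar_words` are all assumptions; the patent only refers to a general DP matching technique.

```python
def dp_matching_distance(seq_a, seq_b):
    """Length-normalized DP matching (dynamic time warping) distance
    between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        return float("inf")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            local = sum((x - y) ** 2
                        for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b frame
                                     cost[i][j - 1],      # stretch seq_a frame
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def find_similar_words(new_features, dictionary, threshold):
    """Registered words whose stored features lie within `threshold`."""
    return [word for word, features in dictionary.items()
            if dp_matching_distance(new_features, features) <= threshold]


# Toy one-dimensional features; real features would be cepstrum vectors.
registered = {
    "satou": [[1.0], [2.0], [3.0]],
    "hajime": [[9.0], [8.0], [9.0], [7.0]],
}
katou = [[1.1], [2.0], [2.0], [3.1]]
print(find_similar_words(katou, registered, threshold=0.1))  # ['satou']
```

The threshold plays the role of the similarity criterion and would have to be tuned on real data.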

[0025] In step 6, the state number addition unit 6 increases the number of states estimated in step 4 by a certain percentage. This percentage (for example, a 10 percent increase over the estimated number of states) can be determined in advance by evaluating the voice recognition device while changing the percentage little by little until similar words can be distinguished. In this way, the number of states can be made considerably larger than a number of states roughly equal to the number of phonemes in the word.
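A small sketch of step 6 and of the advance tuning of the percentage. Rounding the increase up to a whole state, the candidate percentages, and the callback `can_distinguish` (which stands in for an offline evaluation of the device) are assumptions; the patent gives only the 10 percent example and the idea of changing the value little by little.

```python
def add_states(estimated_states, percent=10):
    """Increase the estimated number of states by `percent`,
    rounding the increase up and adding at least one state."""
    extra = (estimated_states * percent + 99) // 100  # integer ceiling
    return estimated_states + max(1, extra)


def calibrate_percent(can_distinguish, candidates=range(5, 55, 5)):
    """Return the smallest candidate percentage for which the evaluation
    callback reports that similar words can be distinguished, else None."""
    for percent in candidates:
        if can_distinguish(percent):
            return percent
    return None


print(add_states(10))  # 11
print(add_states(6))   # 7
```

With integer arithmetic, a 10 percent increase of 10 states yields exactly 11 states, avoiding the off-by-one that floating-point rounding of 10 × 1.1 could introduce.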

[0026] In step 7, the learning unit 7 learns the feature data of the word voice "Katou" using a Markov model that has the number of states arrived at through step 5 or step 6 (that is, the step 4 estimate when no similar word exists, or the increased number from step 6; in this example the latter, because a similar word exists). It obtains the Markov model parameters, namely the transition probabilities between states and the output probabilities of the feature data on each transition, and stores the obtained Markov model parameters in the voice dictionary file 8.

[0027] Also in step 7, when the feature data of the word voice "Katou" is learned and the Markov model parameters are obtained, both the obtained Markov model parameters and the feature data are stored in the voice dictionary file 8. When a similar word exists, the feature data of the word voice "Satou", which is already registered in the voice dictionary file 8 and was judged to be a similar word, is read and learned again using a Markov model with the number of states obtained in step 6, and the resulting Markov model parameters replace the Markov model parameters previously stored for "Satou".
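The registration flow of steps 4 to 7 can be summarized in a sketch that takes the individual units as callables. The function `register_word`, the dictionary layout, and the stub units in the usage example are hypothetical; in particular, `train` merely stands in for Markov model parameter estimation, which is not implemented here.

```python
def register_word(word, features, dictionary,
                  estimate_states, find_similar, add_states, train):
    """Register `word`, and relearn any similar registered words with the
    increased number of states. Returns the number of states used."""
    n_states = estimate_states(features)          # step 4
    similar = find_similar(features, dictionary)  # step 5
    if similar:
        n_states = add_states(n_states)           # step 6
    # Step 7: store the parameters together with the feature data,
    # so that this word can itself be relearned later.
    dictionary[word] = {
        "n_states": n_states,
        "params": train(features, n_states),
        "features": features,
    }
    for other in similar:
        entry = dictionary[other]
        entry["n_states"] = n_states
        entry["params"] = train(entry["features"], n_states)  # replace old params
    return n_states


# Usage with stub units (all hypothetical stand-ins for the real units).
voice_dictionary = {
    "satou": {"n_states": 5, "params": ("trained", 5), "features": [1, 2, 3]},
}
used = register_word(
    "katou", [1, 2, 4], voice_dictionary,
    estimate_states=lambda f: 5,
    find_similar=lambda f, d: [w for w in d if w == "satou"],
    add_states=lambda n: n + 1,
    train=lambda f, n: ("trained", n),
)
print(used, voice_dictionary["satou"]["n_states"])  # 6 6
```

Storing the feature data alongside the parameters is what makes the relearning of "Satou" possible without asking the user to speak the word again.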

[0028] Next, the operation when the word voice "Katou" is recognized is described with reference to the flowchart of FIG. 4. First, in step 11, an utterance containing the word voice "Katou" is input from the voice input means 1. In step 12, the word voice cutout unit 2 cuts out the word voice "Katou" from the utterance containing it. In step 13, the feature extraction unit 3 extracts features from the word voice "Katou".

[0029] In step 14, the collation determination unit 9 uses the feature data of the word voice "Katou" to calculate the likelihood under the Markov model parameters of each word model read from the voice dictionary file 8, and determines the word models with high likelihood to be recognition candidates. In step 15, the determination result output unit 10 displays the recognition result to the user.

[0030]

[Effects of the Invention] In the present invention, when a word is judged to be similar to a registered word, the number of states used for Markov modeling is deliberately increased so that the features are represented in detail, which makes it possible to distinguish between similar words. As a result, the user is asked to rephrase less often, and recognition accuracy can also be improved.

[Brief description of drawings]

FIG. 1 is a functional block diagram of a voice recognition device according to an embodiment of the present invention.

FIG. 2 is a circuit block diagram of a voice recognition device according to an embodiment of the present invention.

FIG. 3 is a flowchart of registration in an embodiment of the present invention.

FIG. 4 is a flowchart of recognition in an embodiment of the present invention.

FIG. 5 is an explanatory diagram of a Markov model in a conventional voice recognition device.

FIG. 6 is an exemplary diagram showing the correspondence between a voice waveform, the time series of feature data, and the states of a Markov model in a conventional voice recognition device.

[Explanation of symbols]

1 voice input means
2 word voice cutout unit
3 feature extraction unit
4 state number estimation unit
5 similar word determination unit
6 state number addition unit
7 learning unit
8 voice dictionary file
9 collation determination unit
10 determination result output unit

Claims

[Claims]

1. A voice recognition device comprising: a voice input means for inputting a voice including a word voice; a word voice cutout unit that cuts out only the word voice portion from the input voice; a feature extraction unit that extracts feature data from the cut-out word voice; a state number estimation unit that estimates, from the feature data, the number of states for the word voice when it is modeled by a Markov model; a similar word determination unit that determines whether a word similar to the word voice to be newly registered has already been registered; a state number addition unit that increases the estimated number of states; a learning unit that applies the feature data to a word model to obtain Markov model parameters; a voice dictionary file consisting of the learned Markov model parameters; a collation determination unit that calculates the likelihood for each word model and determines recognition candidates; and a determination result output unit that outputs the recognition result.

2. The voice recognition device according to claim 1, wherein, in order to relearn already-registered words with the increased number of states, the feature data of the word voices is stored in addition to the learned Markov model parameters.