JPS607498A

JPS607498A - Word voice recognition equipment and method

Info

Publication number: JPS607498A
Application number: JP11505483A
Authority: JP
Inventors: 晋太木村; 裕二木島
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-06-28
Filing date: 1983-06-28
Publication date: 1985-01-16

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]
Technical Field of the Invention
The present invention relates to a speech recognition device, and more particularly to a device that recognizes words input by speech without requiring every recognition target word to be registered by voice in advance.

Technical Background, Prior Art, and Problems
Conventional word speech recognition devices generally fall into two types: all-word-registration speaker-dependent recognition devices and speaker-independent recognition devices. The former identifies the speaker and requires that speaker to register, by voice, every word to be recognized in advance. Devices of this type achieve a good recognition rate and can handle a large vocabulary. However, because every recognition target word must be registered beforehand, the registration burden on the user becomes excessive when the vocabulary exceeds, for example, 1,000 words.

The latter does not identify the speaker and has the advantage that the user need not register any speech in advance. However, recognition is generally harder for the device, and constraints on equipment and response time limit the vocabulary to, for example, a few dozen words. Moreover, the recognition target words cannot be changed freely.

Objects of the Invention
An object of the present invention is to solve the above problems of the prior art and to provide a word speech recognition device that can recognize a large number of words with a minimum of speech registration. Another object of the present invention is to provide a speech recognition device whose recognition target words can be changed easily.

These objects are achieved by performing phoneme segmentation at registration time, rather than at recognition time as in conventional methods, and then recognizing spoken word input with the phoneme dictionary segmented in this way.

Structure of the Invention
The present invention provides a word speech recognition device comprising: speech input means that receives a speech input, passes it through a plurality of parallel bandpass filters with different center frequencies, rectifies the results to produce speech data signals of a plurality of channels, and generates a single power signal from those speech data signals; labeling means that applies to the speech input section the speech input of registration words, which are independent of the words to be recognized, and creates a phoneme dictionary from the speech data signals of that input based on the temporal change of its power signal; phoneme lattice creation means that applies the speech input of a word to be recognized to the speech input section, creates phoneme data for that input from the power signal and speech data signals output by the speech input section, and creates phoneme lattice data for the phonemes most similar to the phonemes of the previously obtained phoneme dictionary; and label matching means that matches the phoneme lattice data against prestored recognition target words and extracts the word with the highest degree of similarity.

The present invention also provides a word speech recognition method comprising the steps of: inputting by voice registration words independent of the words to be recognized and creating a phoneme dictionary in advance from that speech input; inputting by voice a word to be recognized, creating phoneme data from that speech input, and matching the phoneme data against the phoneme dictionary to create a phoneme lattice; and comparing the phoneme lattice with the recognition target words to extract the word with the highest degree of similarity.

Embodiment of the Invention
An embodiment of the present invention is described below with reference to the accompanying drawings.

Fig. 1 is a block diagram of a speech recognition device according to the present invention. The device has a microphone 1 for speech input; a speech input section 2 that receives and processes the signal from the microphone; and a labeling section 5 that creates a phoneme dictionary from the output signal S2 of the speech input section 2 and the signal S8 from the registered word storage section 8, and stores it in the phoneme dictionary storage section 6. For speech input to be recognized, the device further has a phoneme lattice creation section 3 that creates a phoneme lattice from the signal S2 of the speech input section 2 by referring to the phoneme dictionary data in the phoneme dictionary storage section 6; a label matching section 4 that matches the phoneme lattice against the recognition target words in the recognition target word storage section 7 to determine the corresponding word; a display 9 for showing the recognized word or the word to be registered and for other man-machine communication with the user; and a keyboard (not shown).

Fig. 2 is an internal circuit diagram of the speech input section 2. It shows filter channel circuits 211 to 21n and an adder 22 that sums the outputs of circuits 211 to 21n. Fig. 3 shows circuit 211 in detail: BPF1 is a bandpass filter with center frequency f1, D is a diode, R is a resistor, and C is a capacitor. The other circuits 212 to 21n have the same configuration as Fig. 3, except that the center frequencies of their bandpass filters are f2 to fn, respectively, rather than f1.

As is clear from the circuits of Figs. 2 and 3, when the (AC) speech signal from microphone 1 enters the speech input section 2, each channel bandpass-filters it, the diode D rectifies it, and the RC time-constant circuit smooths the resulting DC. Digitizers 231 to 23n then convert these signals into digital form, giving the per-channel output data signals SD1 to SDn. Meanwhile, the signals SD1 to SDn are applied to the adder 22, whose output represents the power (S_PWR); S_PWR is likewise converted into a digital signal by digitizer 24. The output signal S2 of the speech input section 2 is the collective name for SD1 to SDn and S_PWR.
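The front end described above can be sketched in software. This is a minimal Python sketch, not the patented circuit. It assumes an RBJ Audio EQ Cookbook biquad for each bandpass filter BPF, a half-wave rectifier for diode D, a one-pole low-pass filter for the RC smoother, and once-per-frame sampling of the smoothed envelope in place of digitizers 231 to 23n and 24. The function names, the Q of 4, and the 5 ms time constant are illustrative choices.

```python
import math


def bandpass_coeffs(fc, fs, q=4.0):
    """Band-pass biquad coefficients (RBJ Audio EQ Cookbook, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Direct-form I difference equation for one filter channel."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def envelope(x, fs, tau=0.005):
    """Half-wave rectifier (diode D) followed by a one-pole RC smoother."""
    alpha = (1.0 / fs) / (tau + 1.0 / fs)
    y, out = 0.0, []
    for xn in x:
        y += alpha * (max(xn, 0.0) - y)
        out.append(y)
    return out


def front_end(signal, fs, centers, frame_len):
    """Return per-frame channel data (SD1..SDn) and per-frame power (S_PWR)."""
    envs = []
    for fc in centers:
        b, a = bandpass_coeffs(fc, fs)
        envs.append(envelope(biquad(signal, b, a), fs))
    channels, power = [], []
    for k in range(len(signal) // frame_len):
        idx = (k + 1) * frame_len - 1          # sample each envelope at the frame end
        frame = [env[idx] for env in envs]      # stands in for digitizers 231..23n
        channels.append(frame)
        power.append(sum(frame))                # adder 22 (then digitizer 24)
    return channels, power
```

For example, feeding a 1 kHz tone sampled at 8 kHz through channels centered at 300 Hz, 1 kHz and 3 kHz with 10 ms frames (`frame_len=80`) leaves almost all of the energy in the 1 kHz channel.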

The operation of the device of Fig. 1 is described below. Speech recognition with this device is broadly divided into a registration phase and a recognition phase.

First, the registration phase is described. The user (speaker) is identified, and the following steps are performed for that user. In the registration phase, phoneme segmentation (described below) is performed to create the phoneme dictionary. The registration words used to create the phoneme dictionary are predetermined and stored in the registered word storage section 8.

A registration word from the registered word storage section 8, for example "AGA", is shown on the display 9, and the user is prompted to say it. The user speaks the registration word into microphone 1. The speech input signal S1 of microphone 1 is applied to the speech input section 2, producing the channel data SD1 to SDn and the power S_PWR. These signals are applied as S2 to the labeling section 5. The power S_PWR appears as the characteristic curve shown in Fig. 4(A), where the horizontal axis t represents time.

The labeling section 5 divides time t into equal intervals Δt, called frames, and computes the power change dS_PWR between adjacent frames (Fig. 4(B)).
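The sketch below computes the frame-wise power change. It assumes S_PWR has already been sampled once per frame (Δt). The helper names and the zero change assigned to the first frame are assumptions. The second function lists the frames where |dS_PWR| has a local peak, which is one way to find the candidate phoneme boundaries that the labeling step must reconcile with the symbol string.

```python
def power_change(power):
    """dS_PWR for each frame: S_PWR[k] - S_PWR[k-1] (0.0 for the first frame)."""
    return [0.0] + [power[k] - power[k - 1] for k in range(1, len(power))]


def change_peaks(dpower):
    """Frames where |dS_PWR| is a nonzero local maximum."""
    mags = [abs(d) for d in dpower]
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > 0 and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]


# Power per frame for "AGA": high on the vowels, a dip on the consonant G.
dp = power_change([5, 5, 1, 1, 5, 5])
print(dp, change_peaks(dp))  # [0.0, 0, -4, 0, 4, 0] [2, 4]
```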

Based on the power change dS_PWR obtained in this way and the symbol string of the word "AGA", the labeling section assigns a phoneme label to each frame, either "A" or "G" in the example of Fig. 4(C). Labeling is done so that the peaks of the power change are consistent with the symbol string of the word. Once the frames are labeled (Fig. 4(C)), each frame is stored in the phoneme dictionary storage section 6 as a phoneme dictionary entry in the form shown in Fig. 5.

The phoneme dictionary of Fig. 5 is described in detail below. "SE LBL" denotes the phoneme label. Here each frame carries either the vowel "A" or "G", the onset of voiced consonants such as ga, gi, and gu. "SE DAT" denotes the phoneme data: for a phoneme such as "A" in a frame where the power changed, the corresponding values among the channel data SD1 to SDn are stored as DAT1. This is the data that represents the phoneme's features.
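The patent does not spell out the labeling rule beyond keeping the power-change peaks consistent with the symbol string. The sketch below is a hypothetical heuristic that meets that constraint. It cuts the utterance at the len(symbols) - 1 frames with the largest |dS_PWR|, assigns the symbols to the resulting segments in order, and then stores one Fig. 5 style entry per frame. The keys SE_LBL and SE_DAT mirror the figure's field names; everything else is an assumption.

```python
def label_frames(power, symbols):
    """Hypothetical labeling: cut at the len(symbols) - 1 frames with the largest
    |power change| and give each resulting segment the next symbol in order."""
    dpower = [0.0] + [power[k] - power[k - 1] for k in range(1, len(power))]
    n_cuts = len(symbols) - 1
    by_change = sorted(range(1, len(power)), key=lambda k: -abs(dpower[k]))
    cuts = sorted(by_change[:n_cuts])      # a cut at k starts a new segment at frame k
    labels, seg = [], 0
    for k in range(len(power)):
        while seg < len(cuts) and k >= cuts[seg]:
            seg += 1
        labels.append(symbols[seg])
    return labels


def build_dictionary(channels, labels):
    """One Fig. 5 style entry per frame: phoneme label plus that frame's channel data."""
    return [{"SE_LBL": lbl, "SE_DAT": list(vec)} for lbl, vec in zip(labels, channels)]


print(label_frames([5, 5, 1, 1, 5, 5], ["A", "G", "A"]))  # ['A', 'A', 'G', 'G', 'A', 'A']
```

With a power dip over the consonant frames, "AGA" is split into a vowel segment, a consonant segment, and a second vowel segment. Each labeled frame then becomes one dictionary entry.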

Once the phoneme dictionary has been created for "AGA" as described above, dictionary entries are created in the same way for the next registration word. The registration words are not the words to be recognized; they are independent of them, and only as many are needed as it takes to build a phoneme dictionary sufficient to recognize all the target words. In other words, once enough registration words have been spoken to build a phoneme dictionary of a given size, no further voice registration is needed even if the number of recognition target words grows.

If, while the phoneme dictionary is being created, the labeling turns out to be inconsistent with the known symbol string of the registration word, the user is prompted via display 9 to say the word again. The registration process is then repeated to ensure that the phoneme dictionary is created correctly.

Next, the recognition phase is described, in which speech is actually recognized using the above phoneme dictionary.

Assume that speech to be recognized is applied to microphone 1 and that the power of its output signal S1 is as shown in Fig. 6. The output signal S2 of the speech input section 2, which includes the S_PWR of Fig. 6, is applied to the phoneme lattice creation section 3. For every frame (Δt) of the power signal of Fig. 6, the phoneme lattice creation section 3 generates phoneme data and computes its distance to the previously obtained phoneme dictionary. It then creates a phoneme lattice table (Fig. 7) that lists phonemes in ascending order of distance. In Fig. 7, the horizontal axis is time, that is, the frame number, and the vertical axis lists phonemes and their distances in ascending order of distance d. For example, for frame 1 in the leftmost column of Fig. 7, the phoneme lattice obtained is phoneme "O" (distance 50), "A" (distance 52), "O" (distance 53), and "A" (distance 70).
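The sketch below shows one way to build the phoneme lattice. The patent does not state the distance measure, so Euclidean distance between channel vectors (`math.dist`) is an assumption here. The `top_k` cutoff and the dictionary layout are also assumptions; the layout reuses the SE_LBL/SE_DAT fields of Fig. 5.

```python
import math


def lattice(frames, dictionary, top_k=4):
    """Per input frame, the top_k dictionary labels in ascending order of distance."""
    table = []
    for vec in frames:
        cands = [(e["SE_LBL"], math.dist(vec, e["SE_DAT"])) for e in dictionary]
        cands.sort(key=lambda c: c[1])   # smallest distance first, as in Fig. 7
        table.append(cands[:top_k])
    return table
```

Because the dictionary keeps one entry per registered frame, the same label can appear more than once in a column. This matches the two "O" and two "A" candidates listed for frame 1 of Fig. 7.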

The label matching section 4 compares the phoneme lattice table obtained in this way with the character strings of the target words stored in the recognition target word storage section 7. Storage section 7 holds every word to be recognized, and each of these words is given a matching window WINDW as shown in Fig. 8. For the example word "OOSAKA" in Fig. 8, a band of frames to be compared is set for each phoneme "O", "O", "S", "A", "K", and "A" (the shaded areas lie outside the comparison range). Matching is therefore performed only over an appropriate range of frames, which improves both the recognition rate and efficiency.
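The patent does not give the rule for placing each matching window. In the hypothetical sketch below, phoneme i of an m-phoneme word gets its proportional share of the n input frames, widened by `slack` frames on each side. The function name and the `slack` parameter are illustrative only.

```python
def matching_window(n_frames, phonemes, slack=1):
    """Hypothetical windows as (phoneme, start, end); frames start..end-1 are allowed."""
    m = len(phonemes)
    windows = []
    for i, p in enumerate(phonemes):
        start = max(0, (i * n_frames) // m - slack)
        end = min(n_frames, ((i + 1) * n_frames) // m + slack)
        windows.append((p, start, end))
    return windows


print(matching_window(12, ["O", "O", "S", "A", "K", "A"])[:2])  # [('O', 0, 3), ('O', 1, 5)]
```

Neighboring windows overlap by the slack. This tolerates some variation in phoneme duration while still excluding clearly implausible frames, which corresponds to the shaded areas of Fig. 8.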

For the phoneme lattice of a given input (Fig. 10), the matching window for "OOSAKA" of Fig. 8 is applied. For each frame, the smallest distance among the phoneme labels that match a character of the string "OOSAKA" is taken as the distance of that frame, and the distance (d) and phoneme label (SE LBL) are computed for every frame (see Fig. 9). These distances are then summed and divided by the number of frames n. The value $D = \frac{1}{n}\sum_{i=1}^{n} d_i$ is taken as the distance between the target word string and the phoneme lattice of the input speech.

This distance is computed for every recognition target word string. The string with the smallest distance is extracted as the closest match and shown on display 9 as the recognized word S4.
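The scoring and selection steps can be sketched as follows. For each frame, only lattice candidates whose label is allowed by a matching window covering that frame are considered. The smallest such distance is d_i, and the word distance is the sum of the d_i divided by the number of frames n. The penalty charged when no allowed label appears in a frame is an assumption, since the patent does not say how such frames are handled. Windows are passed in as (phoneme, start, end) triples with `end` exclusive; these names are hypothetical.

```python
def word_distance(table, windows, miss_penalty=100.0):
    """Average over frames of the smallest lattice distance whose label is
    allowed by a matching window covering that frame."""
    if not table:
        raise ValueError("empty phoneme lattice")
    total = 0.0
    for t, cands in enumerate(table):
        allowed = {p for p, start, end in windows if start <= t < end}
        hits = [d for lbl, d in cands if lbl in allowed]
        total += min(hits) if hits else miss_penalty   # penalty value is an assumption
    return total / len(table)


def recognize(table, vocabulary):
    """vocabulary maps each target word to its matching windows; smallest distance wins."""
    scores = {w: word_distance(table, win) for w, win in vocabulary.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Scoring every word against the same lattice keeps the cost of adding a vocabulary word to one more call of `word_distance`, with no new voice registration.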

In this example, "OOSAKA" (Osaka) is the recognized word.

If the recognized word is the intended one, the user confirms this with the keyboard or a similar input; otherwise, the input is retried.

Effects of the Invention
As described above, according to the present invention, the recognition target words themselves are not registered by voice. The user only registers as many words as are needed to create the phoneme dictionary. Because the phoneme dictionary can be used unchanged even when the number of recognition target words grows, a large number of words can be recognized with a modest amount of speech registration.

Furthermore, according to the present invention, the recognition target words are not registered by voice and are independent of the registration words, so recognition target words can easily be added, removed, or changed.

[Brief explanation of drawings]

Fig. 1 is a block diagram of a speech recognition device according to an embodiment of the present invention; Fig. 2 is a circuit diagram of the speech input section of the device of Fig. 1; Fig. 3 is a detailed circuit diagram of one example of the circuit of Fig. 2; Fig. 4(A) is a power characteristic diagram of the output of the circuit of Fig. 2; Fig. 4(B) shows the frame-by-frame power change for Fig. 4(A); Fig. 4(C) shows the phoneme data obtained from Fig. 4(B); Fig. 5 shows an example of a phoneme dictionary based on Fig. 4(C); Fig. 6 is a power characteristic diagram of a speech input to be recognized; Fig. 7 shows the phoneme lattice table obtained from the data of Fig. 6; Fig. 8 shows an example of a matching window; Fig. 9 shows the phoneme data obtained by applying the matching window of Fig. 8 to the phoneme lattice data of Fig. 10; and Fig. 10 shows a phoneme lattice table for a speech input to be recognized.

(Explanation of reference numerals) 1: microphone; 2: speech input section; 3: phoneme lattice creation section; 4: label matching section; 5: labeling section; 6: phoneme dictionary storage section; 7: recognition target word storage section; 8: registered word storage section; 9: display.

Patent applicant: Fujitsu Limited. Patent attorneys: Akira Aoki, Kazuyuki Nishidate, Yukio Uchida, Akiyuki Yamaguchi.

Claims

[Claims]
1. A word speech recognition device comprising: speech input means for receiving a speech input, passing it through a plurality of parallel bandpass filters with different center frequencies, rectifying the results to obtain speech data signals of a plurality of channels, and generating one power signal from the plurality of speech data signals; labeling means for applying to the speech input section a speech input of registration words independent of the words to be recognized, and creating a phoneme dictionary from the plurality of speech data signals of that speech input based on the temporal change of its power signal; phoneme lattice creation means for applying a speech input of a word to be recognized to the speech input section, creating phoneme data for that speech input from the power signal and the plurality of speech data signals output by the speech input section, and creating phoneme lattice data for phonemes with a high degree of similarity to the phonemes of the previously obtained phoneme dictionary; and label matching means for matching the phoneme lattice data against prestored recognition target words and extracting the word with the highest degree of similarity.
2. A word speech recognition method comprising the steps of: inputting by voice registration words independent of the words to be recognized and creating a phoneme dictionary in advance from that speech input; inputting by voice a word to be recognized, creating phoneme data from that speech input, and matching the phoneme data against the phoneme dictionary to create a phoneme lattice; and comparing the phoneme lattice with the recognition target words to extract the word with the highest degree of similarity.