JPH0667695A - Method and device for speech recognition

Info

Publication number: JPH0667695A
Application number: JP3097893A
Authority: JP
Inventors: Shigemi Otsu; 茂実大津
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1991-04-04
Filing date: 1991-04-04
Publication date: 1994-03-11

Abstract

PURPOSE: To improve speech recognition accuracy by eliminating the misrecognition that arises when only a single speech recognition processing means is used. CONSTITUTION: The method and device comprise a speech cut-out means 1 which cuts out an input speech signal into prescribed sections, a 1st speech recognition processing means 2 which performs pattern matching on the cut-out speech signal, a 2nd speech recognition processing means 3 which recognizes the speech signal from its vowels, and an integrated speech recognition processing means 4 which narrows down the recognition result of one of the speech recognition processing means by means of the other.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition method and a speech recognition apparatus, and more particularly to a speech recognition method and apparatus capable of reducing misrecognition that cannot be resolved by pattern matching alone.

[0002]

[Prior Art] Speech recognition, in which a machine recognizes human speech, commonly uses pattern matching: the word patterns to be identified are registered in advance as standard speech word patterns, and at recognition time the pattern of an unknown input word is compared against these standard patterns, the distance between them is calculated, and the standard pattern with the minimum distance is taken as the recognized word. In the pattern matching method, the input speech signal is first analyzed and converted into parameters that represent the features of its speech pattern. Techniques such as band-pass filter analysis, linear prediction analysis, and cepstrum analysis are used for this analysis. The matching process must be able to cope with variation between the input pattern and the standard pattern. The time axis of a speech pattern varies with the manner of pronunciation, the connection to the preceding and following sounds, and so on. This variation is not a uniform stretching or compression of the whole pattern but a non-linear change. The best-known technique for dealing with it, so-called time-axis normalization, is DP (Dynamic Programming) matching. DP matching is an optimization technique that non-linearly warps the time axis of either the standard pattern or the input pattern so that it best fits the other pattern, and takes the accumulated distance between the vectors at corresponding time points as the distance between the two patterns. The above prior art is disclosed in Japanese Patent Laid-Open No. 2-108936 and in the Journal of the Acoustical Society of Japan, Vol. 42, No. 9, 1986 (pp. 725-730).
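As a rough illustration of the DP matching described above, the following Python sketch accumulates frame-to-frame distances along the best non-linear time alignment and ranks registered words by the result. The names `dp_matching_distance` and `rank_candidates`, the Euclidean frame distance, and the three-way step pattern are assumptions made for illustration; the text does not specify them.

```python
import math


def dp_matching_distance(reference, observed):
    """Accumulated distance between two feature-vector sequences under
    the best non-linear time alignment (basic dynamic-programming recurrence)."""
    n, m = len(reference), len(observed)
    if n == 0 or m == 0:
        raise ValueError("both patterns must contain at least one frame")
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning reference[:i] with observed[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_distance = math.dist(reference[i - 1], observed[j - 1])
            # Assumed step pattern: stretch either axis or advance both.
            cost[i][j] = frame_distance + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def rank_candidates(templates, observed):
    """Return (word, distance) pairs sorted by ascending distance.

    templates: dict mapping a word to its registered standard pattern.
    """
    scored = [
        (word, dp_matching_distance(pattern, observed))
        for word, pattern in templates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])
```

In this sketch an input that is merely a time-stretched copy of a registered pattern receives distance zero, which is the time-axis normalization effect the paragraph describes.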

[0003]

[Problems to Be Solved by the Invention] Many spoken words have speech patterns that closely resemble one another. With the conventional technique described above, when many registered patterns are close to the input speech, the registered pattern with the smallest distance to the input is not necessarily the correct input word, so misrecognition occurs. An object of the present invention is to solve this problem of the prior art and to provide a speech recognition method and apparatus that can identify the correct input word and perform accurate speech recognition even when, for example, the correct word is among the second or lower-ranked candidates.

[0004]

[Means for Solving the Problems] To achieve the above object, the present invention uses vowel recognition and word recognition as speech recognition processes and recognizes a single correct input word on the basis of their respective results. That is, the method of the present invention is characterized in that an input speech signal is cut out in predetermined sections, the cut-out speech signal is subjected to recognition processing by one of pattern matching processing and vowel recognition processing, and the result of that recognition processing is narrowed down by the other recognition processing. The apparatus is characterized by comprising: speech cut-out means (1) for cutting out an input speech signal in predetermined sections; first speech recognition processing means (2) for calculating the distance between the input speech pattern of the cut-out speech signal and standard speech word patterns registered in advance by the speaker, and recognizing the input word on the basis of the result; second speech recognition processing means (3) for recognizing vowels by extracting, as feature parameters, logarithmic difference information between the adjacent first and second formant frequencies and between the second and third formant frequencies among the first to third formants obtained by frequency analysis of the cut-out speech signal; and integrated speech recognition processing means (4) for narrowing down the recognition result of one of the first and second speech recognition processing means by means of the other.

[0005]

[Operation] As described above, word recognition normally uses a pattern recognition method (DP matching or an HMM, Hidden Markov Model), but when the distance d between a registered word and the input speech pattern exceeds a certain level, several candidates exist. In that case, for example when DP matching as the first speech recognition processing extracts a plurality of word candidates, vowel recognition processing using the logarithmic ratios between formant frequencies is used as the second speech recognition processing to narrow them down. The candidates may instead be narrowed down in the reverse order, by first applying vowel recognition processing, which narrows down the words on the basis of their vowels, and then applying pattern matching processing to the result. The recognition processes are not limited to the first and second; two or more additional speech recognition processes of other types may also be applied. These two or more speech recognition processes can run in real time by using a processor for parallel processing. For the vowel recognition unit, it is desirable to use a speaker-independent vowel recognition method such as that disclosed in Japanese Patent Laid-Open No. 2-108936. This vowel recognition method uses the logarithmic ratios between formant frequencies and achieves a large recognition effect with a small amount of computation.
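The reverse ordering mentioned above, vowel recognition first and pattern matching second, might look like the following Python sketch. Representing each registered word by a romanized reading, the helper `vowel_string`, and the `distance_to_input` callable are all hypothetical choices made for illustration; the text does not say how registered words are associated with their vowels.

```python
VOWELS = "aiueo"


def vowel_string(reading):
    """Vowel sequence of a romanized reading, e.g. 'sakura' -> 'aua'."""
    return "".join(ch for ch in reading.lower() if ch in VOWELS)


def narrow_then_match(vocabulary, recognized_vowels, distance_to_input):
    """Keep only words whose vowels match, then pick the closest survivor.

    vocabulary: iterable of romanized words (hypothetical representation).
    recognized_vowels: vowel string produced by the vowel recognizer.
    distance_to_input: callable mapping a word to its pattern-matching
        distance from the input (assumed to be supplied by the caller).
    Returns the best word, or None when no word has the recognized vowels.
    """
    survivors = [w for w in vocabulary if vowel_string(w) == recognized_vowels]
    if not survivors:
        return None
    return min(survivors, key=distance_to_input)
```

Filtering by vowels first means the comparatively expensive pattern matching only has to be evaluated for the surviving words.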

[0006]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram illustrating the basic configuration of a speech recognition apparatus to which the speech recognition method of the present invention is applied. Reference numeral 1 denotes speech cut-out means for cutting out the input speech signal word by word with reference to a predetermined threshold; 2 denotes first speech recognition processing means for recognizing the input speech by DP matching; 3 denotes second speech recognition processing means for recognizing the input speech by vowel recognition; and 4 denotes integrated speech recognition processing means for narrowing down the word candidates from the results of the first speech recognition processing means 2 and the second speech recognition processing means 3. In FIG. 1, the word signal cut out by the speech cut-out means 1 is supplied in parallel to the first speech recognition processing means 2 and the second speech recognition processing means 3. The first speech recognition processing means 2 compares the speech pattern, generated by waveform analysis, against the speech patterns of words registered in advance, using the DP matching method or the HMM method. The result of this comparison is passed to the integrated speech recognition processing means 4. Meanwhile, the second speech recognition processing means 3 performs vowel recognition, for example by a speaker-independent vowel recognition method that uses the logarithmic ratios between formant frequencies, and passes its result to the integrated speech recognition processing means 4.

[0007] FIG. 2 is a block diagram illustrating the schematic configuration of the first speech recognition processing means. The distance evaluation unit 18 evaluates the distance between the speech pattern cut out by the speech cut-out means 1 and the standard patterns registered in the standard pattern registration unit 17, and the comparison and estimation unit 18 collates the results to extract candidate words.

[0008] FIG. 3 is a block diagram illustrating the schematic configuration of the second speech recognition processing means. The frequency analysis unit 20 samples the speech pattern cut out by the speech cut-out means 1 at a fixed period (for example, 16 kHz) and performs frequency analysis. The formant extraction unit 21 extracts the spectral peaks of the analyzed signal, for example those at or below 5 kHz, to obtain the first, second, and third formant frequencies f₁, f₂, and f₃. The parameter extraction unit 22 extracts, as feature parameters, the difference on a logarithmic scale between the first formant frequency f₁ and the second formant frequency f₂, and the corresponding difference between the second formant frequency f₂ and the third formant frequency f₃. The speech discrimination unit 23 holds in advance reference values of the first and second parameters for the vowels "a, i, u, e, o". Using the first parameter log(f₁/f₂) obtained by the feature parameter extraction unit 22, it classifies the input speech into three groups: "a" and "o"; "u" and "e"; and "i". It then uses the second parameter log(f₂/f₃) to separate "a" from "o" and "u" from "e", thereby identifying the vowel.
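The two-stage vowel discrimination described above can be sketched in Python as follows. The reference formant values in `REFERENCE_FORMANTS` are illustrative placeholders chosen for this example and are not taken from the patent, which gives no numerical reference values; the nearest-reference decision rule and the function names are likewise assumptions.

```python
import math

# Hypothetical reference formants in Hz (f1, f2, f3); illustrative values
# only, not taken from the patent.
REFERENCE_FORMANTS = {
    "a": (800.0, 1200.0, 2500.0),
    "i": (300.0, 2300.0, 3000.0),
    "u": (350.0, 1300.0, 2300.0),
    "e": (500.0, 1900.0, 2600.0),
    "o": (500.0, 850.0, 2500.0),
}

# The three groups separated by the first parameter, as described in the text.
GROUPS = (("a", "o"), ("u", "e"), ("i",))


def log_ratio_parameters(f1, f2, f3):
    """Return (log(f1/f2), log(f2/f3)), the two feature parameters."""
    return math.log(f1 / f2), math.log(f2 / f3)


def classify_vowel(f1, f2, f3):
    """Two-stage decision: pick a group by the first parameter, then
    separate the vowels inside that group by the second parameter."""
    p1, p2 = log_ratio_parameters(f1, f2, f3)
    refs = {v: log_ratio_parameters(*f) for v, f in REFERENCE_FORMANTS.items()}

    def group_center(group):
        # Mean first-parameter reference value of the vowels in the group.
        return sum(refs[v][0] for v in group) / len(group)

    group = min(GROUPS, key=lambda g: abs(p1 - group_center(g)))
    return min(group, key=lambda v: abs(p2 - refs[v][1]))
```

Because both parameters are ratios of formant frequencies, scaling all three formants by the same factor leaves them unchanged, so `classify_vowel(960.0, 1440.0, 3000.0)` gives the same answer as `classify_vowel(800.0, 1200.0, 2500.0)` in this sketch.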

[0009] FIG. 4 is a block diagram illustrating the configuration of one embodiment of the speech recognition apparatus according to the present invention. Reference numeral 10 denotes an input speech analysis unit; 11, a microphone; 12, a signal conditioning device with functions such as amplification of the input speech, removal of unwanted frequency components, and automatic gain control; 13, a frequency analysis unit consisting of a large number of band-pass filters; 14, a multiplexer; and 15, an A/D converter. Reference numerals 1 to 5 denote the same parts as in FIG. 1. In FIG. 4, speech input from the microphone 11 undergoes the prescribed signal conditioning in the signal conditioning device 12 and is supplied to the frequency analysis unit 13. The frequency analysis unit 13 divides the input signal into a large number (for example, n = 15) of frequency bands (channels) and analyzes the frequencies that make up the input signal. The frequency band components obtained by the frequency analysis unit 13 are converted from parallel to serial by the multiplexer 14 and converted to digital form by the A/D converter 15. The A/D-converted signal is cut out by the speech cut-out means 1, for example over the period during which the signal level continuously exceeds a predetermined threshold, and is supplied to the DP matching means 2 and the vowel recognition means 3. The DP matching means 2 compares the input against the speech patterns registered in advance in the standard pattern registration unit 17 by DP matching, calculates the distance between the input signal pattern and each registered signal pattern, sorts the candidates in ascending order of distance, and passes the candidate words Wn, together with the distances dn between each candidate word pattern and the input signal pattern, to the integrated speech recognition processing means 4.
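The threshold-based cut-out mentioned in the paragraph above can be sketched in Python as follows. The function name `cut_out_segments` and the `min_length` parameter are assumptions added for illustration; the text only states that the signal is cut out while its level stays above a predetermined threshold.

```python
def cut_out_segments(levels, threshold, min_length=1):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive samples whose level exceeds the threshold.

    levels: sequence of signal levels (for example, per-frame energy).
    min_length: hypothetical guard that drops runs shorter than this.
    """
    segments = []
    start = None  # index where the current above-threshold run began
    for index, level in enumerate(levels):
        if level > threshold:
            if start is None:
                start = index
        elif start is not None:
            if index - start >= min_length:
                segments.append((start, index))
            start = None
    # Close a run that is still open at the end of the signal.
    if start is not None and len(levels) - start >= min_length:
        segments.append((start, len(levels)))
    return segments
```

For example, with levels `[0, 0, 5, 6, 7, 0, 0, 4, 0]` and a threshold of 3, the sketch returns two segments; raising `min_length` to 2 discards the single-sample burst.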

[0010] Meanwhile, the vowel recognition means 3 extracts a vowel string on the basis of the logarithmic ratios of the formant frequencies and passes it to the integrated speech recognition processing means 4. The integrated speech recognition processing means 4 narrows down the candidate words using the candidate words Wn and the distances dn between each candidate word pattern and the input signal pattern, passed from the DP matching means 2, together with the vowel string passed from the vowel recognition means 3, and extracts the correct word. The extracted word is output as the recognition result to recognition output means 5 such as a display device. Implementing the DP matching means 2, the vowel recognition means 3, and the integrated speech recognition processing means 4 on parallel processors makes real-time processing possible. Although the DP matching processing and the vowel recognition processing are executed in parallel in the above embodiment, they may of course be executed sequentially instead.

[0011] FIG. 5 is a flowchart illustrating an example of the speech recognition algorithm in the integrated speech recognition processing means. First, the distance d₁ of the first word candidate W₁ passed from the DP matching means 2 is examined (step 1). If it is smaller than a fixed value Th₁, W₁ is taken as the recognized word. If it is larger than a fixed value Th₂, the input is judged unrecognizable and processing ends. If the distance d₁ of the first word candidate W₁ lies between Th₁ and Th₂, the vowel string sent from the vowel recognition means 3 is used for the decision. In this case, the difference d₂ − d₁ between the distances of the first word candidate W₁ and the second word candidate W₂ is calculated (step 3); if it is larger than a fixed value Th₃, the input is judged unrecognizable and processing ends. If the difference d₂ − d₁ is smaller than Th₃, the candidate is compared with the vowel string, and only if they match is the word candidate W₂ taken as the recognized word (step 4). If they do not match, n is incremented (step 5) and processing returns to step 3 to apply the same procedure to the next word candidate W₃; if a match is found in step 4, W₃ is taken as the recognized word. If there is again no match, n is set to n + 1 and the same procedure is repeated to search the candidate words for the recognized word. The word candidate is determined in this way.
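The decision procedure described for FIG. 5 can be sketched in Python as follows. Several details are assumptions rather than statements from the text: candidates are represented by romanized readings so that their vowels can be read off with the hypothetical helper `vowel_string`; for n greater than 2 the difference is taken relative to d₁; values exactly equal to a threshold fall into the intermediate case; and running out of candidates is treated as unrecognizable. The sketch also follows the text literally in that the first candidate is never compared with the vowel string.

```python
VOWELS = "aiueo"


def vowel_string(reading):
    """Vowel sequence of a romanized reading, e.g. 'sakura' -> 'aua'."""
    return "".join(ch for ch in reading.lower() if ch in VOWELS)


def integrate(candidates, recognized_vowels, th1, th2, th3):
    """Combine DP matching candidates with the recognized vowel string.

    candidates: list of (romanized_word, distance), ascending by distance.
    Returns the recognized word, or None when recognition is judged impossible.
    """
    if not candidates:
        return None
    first_word, d1 = candidates[0]
    if d1 < th1:
        # Step 1: the best candidate is close enough to accept outright.
        return first_word
    if d1 > th2:
        # Even the best candidate is too far away.
        return None
    # Intermediate case: consult the vowel string for W2, W3, ...
    for word, dn in candidates[1:]:
        if dn - d1 > th3:
            # Step 3: this candidate is too far behind the first one.
            return None
        if vowel_string(word) == recognized_vowels:
            # Step 4: vowels agree, so accept this candidate.
            return word
        # Step 5: no match, move on to the next candidate.
    return None
```

With candidates `[("sakana", 0.40), ("sakura", 0.45)]`, a recognized vowel string `"aua"`, and thresholds 0.2, 1.0, and 0.3, the sketch passes over the closest candidate and returns `"sakura"`.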

[0012]

[Effects of the Invention] As described above, the present invention provides a speech recognition method and apparatus that eliminate the misrecognition which, with pattern matching alone, occurs when the correct word is among the second or lower-ranked word candidates, and that can recognize the correct word corresponding to the input word.

[Brief description of drawings]

FIG. 1 is a block diagram illustrating the basic configuration of a speech recognition apparatus to which the speech recognition method of the present invention is applied.

FIG. 2 is a block diagram illustrating the schematic configuration of the first speech recognition processing means.

FIG. 3 is a block diagram illustrating the schematic configuration of the second speech recognition processing means.

FIG. 4 is a block diagram illustrating the configuration of one embodiment of the speech recognition apparatus according to the present invention.

[Explanation of symbols]

1: speech cut-out means; 2: first speech recognition processing means; 3: second speech recognition processing means; 4: integrated speech recognition processing means; 17: standard pattern registration unit.

─────────────────────────────────────────────────────

[Written amendment]

[Submission date] August 11, 1993

[Amendment 1]

[Document to be amended] Specification

[Item to be amended] Brief description of drawings

[Method of amendment] Change

[Content of amendment]

[Brief description of drawings]

FIG. 1 is a block diagram illustrating the basic configuration of a speech recognition apparatus to which the speech recognition method of the present invention is applied.

FIG. 2 is a block diagram illustrating the schematic configuration of the first speech recognition processing means.

FIG. 3 is a block diagram illustrating the schematic configuration of the second speech recognition processing means.

FIG. 4 is a block diagram illustrating the configuration of one embodiment of the speech recognition apparatus according to the present invention.

FIG. 5 is a flowchart illustrating an example of the speech recognition algorithm in the integrated speech recognition processing means.

[Explanation of symbols] 1: speech cut-out means; 2: first speech recognition processing means; 3: second speech recognition processing means; 4: integrated speech recognition processing means; 17: standard pattern registration unit.

Claims

[Claims]

1. A speech recognition method for recognizing input speech by analyzing the input speech and matching it against standard patterns registered in advance, characterized in that the input speech signal is cut out in predetermined sections, the cut-out speech signal is subjected to recognition processing by one of pattern matching processing and vowel recognition processing, and the result of that recognition processing is narrowed down by the other recognition processing.

2. A speech recognition apparatus for recognizing input speech by analyzing the input speech and matching it against standard patterns registered in advance, characterized by comprising: speech cut-out means for cutting out the input speech signal in predetermined sections; first speech recognition processing means for calculating the distance between the input speech pattern of the cut-out speech signal and standard speech word patterns registered in advance by a speaker, and recognizing the input word on the basis of the result; second speech recognition processing means for recognizing vowels by extracting, as feature parameters, logarithmic difference information between the adjacent first and second formant frequencies and between the second and third formant frequencies among the first to third formants obtained by frequency analysis of the cut-out speech signal; and integrated speech recognition processing means for narrowing down the recognition result of one of the first speech recognition processing means and the second speech recognition processing means by means of the other.