JPS62217298A - Voice recognition equipment

Info

Publication number: JPS62217298A
Application number: JP61061725A
Authority: JP
Inventors: 安田　晴剛; 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-03-19
Filing date: 1986-03-19
Publication date: 1987-09-24
Anticipated expiration: 2010-10-09
Also published as: JPH0792675B2

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech recognition device and, more particularly, to the recognition processing and dictionary creation processing of such a device.

Prior Art

In speech interval detection in a speech recognition device, it is difficult to obtain 100% perfect input. In noisy environments in particular, many devices adopt a variable-threshold method, so consonant blocks at the beginning or end of a word frequently go undetected. In speaker-independent recognition and similar schemes there is also variation between individuals, and the power and duration of a consonant block can differ drastically from speaker to speaker. In such cases a particular block is missing, and the resulting mismatch prevents a proper dictionary from being created or causes misrecognition at recognition time.

Conventionally, when a speech block is dropped by some speakers and not by others, the word was designated in advance as difficult to recognize, and multi-template processing was performed with dictionaries for both the form with the block missing and the form with the block present. In addition, during dictionary creation the utterances were simply added as they were, so the accuracy of the dictionary often deteriorated.

Purpose

The present invention was made in view of the circumstances described above. In particular, its object is to improve the accuracy of the dictionary in a speech recognition device and thereby improve speech recognition accuracy.

Configuration

To achieve the above object, the present invention is a speech recognition device comprising means for extracting feature quantities of speech input from a microphone, means for detecting a speech interval, means for separating voiced intervals from silent intervals within one word, means for registering an additive dictionary block by block based on the silent intervals, and means for performing recognition processing based on the silent intervals, characterized in that words containing a consonant block or the like that is relatively easily dropped are designated in advance at dictionary creation time, and at recognition time the similarity to the input pattern is calculated both for the case where that block is missing and for the case where it is not, with the higher value taken as the similarity of the word. The invention is described below on the basis of an embodiment.

FIG. 1 is an electrical block diagram for explaining an embodiment of the present invention. In the figure, 1 is a microphone, 2 is a preprocessing unit, 3 is a feature extraction unit, 4 is a speech interval detection unit, 5 is a recognition processing and dictionary creation unit, 6 is an identifier check unit, and 7 is a word dictionary. As an example, the following describes application to a speaker-independent recognition device that uses the BTSP method and focuses on silent intervals.

FIG. 2 is a diagram for explaining an example of a recognition method that focuses on silent intervals. This method treats a silent interval B occurring within a speech interval A as a node and performs pattern matching by pairing up the voiced blocks C on both sides of it. However, for the word 'stop' (ストップ) shown in the figure, for example, some speakers drop the final 'p' (プ), as shown in FIG. 3. If the 'p' is missing from the input, the input 'sto' (スト) is matched against the dictionary entry 'stop', which causes misrecognition.
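The block structure described above can be illustrated with a minimal sketch. It assumes that each frame is represented by a single power value and that a run of low-power frames counts as a silent interval; the function name `split_voiced_blocks`, the `threshold` and `min_silence` parameters, and the sample values are all hypothetical and are not taken from the publication.

```python
from typing import List, Tuple


def split_voiced_blocks(
    power: List[float], threshold: float, min_silence: int = 2
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of voiced blocks.

    A frame is voiced when its power is at least `threshold`. A run of at
    least `min_silence` unvoiced frames closes the current block, so shorter
    dips stay inside the block. Both parameters are illustrative assumptions.
    """
    blocks: List[Tuple[int, int]] = []
    start = None        # first frame of the block being built, if any
    silence_run = 0     # consecutive unvoiced frames seen inside a block
    for i, p in enumerate(power):
        if p >= threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence:
                # The block ended where the silent run began.
                blocks.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # Close a block that runs to the end, trimming trailing silence.
        blocks.append((start, len(power) - silence_run))
    return blocks


# 'sto' + silence + 'p': two voiced blocks
print(split_voiced_blocks([0, 5, 6, 5, 0, 0, 0, 4, 0], threshold=3))
# the same word with the weak final 'p' lost: one voiced block
print(split_voiced_blocks([0, 5, 6, 5, 0, 0, 0, 0, 0], threshold=3))
```

With these made-up values the first call yields two blocks and the second only one, which is the mismatch in block count that FIG. 3 illustrates.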

The present invention was made to prevent such misrecognition. Words of this kind are given an identifier in advance at dictionary creation time. For a word carrying the identifier, the recognition calculation is performed twice, once with the word-final block of the dictionary pattern included and once with it excluded, and the larger similarity is taken as the similarity of the word. Dropped word endings of this kind are especially common in English and German; the same applies to 'skip', 'up', and so on. Furthermore, if the identifier is varied according to whether the block is dropped at the end or at the beginning of the word, how to set up the two calculations at recognition time can be determined easily.
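The two-pass calculation can be sketched as follows. This is only an illustration under stated assumptions: each voiced block is modeled as a short list of binary features, similarity is the fraction of agreeing bits, and patterns with different block counts score 0.0. The publication does not specify the similarity measure, so `block_similarity`, `word_similarity`, and `similarity_with_droppable_block` are hypothetical names, not the BTSP computation itself.

```python
from typing import List, Optional, Sequence

Block = Sequence[int]  # assumed binary feature vector for one voiced block


def block_similarity(a: Block, b: Block) -> float:
    """Fraction of agreeing positions (an assumed, simplified measure)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest


def word_similarity(inp: List[Block], ref: List[Block]) -> float:
    """Average block similarity; 0.0 when the block counts differ."""
    if not ref or len(inp) != len(ref):
        return 0.0
    return sum(block_similarity(a, b) for a, b in zip(inp, ref)) / len(ref)


def similarity_with_droppable_block(
    inp: List[Block], ref: List[Block], droppable: Optional[int]
) -> float:
    """Score the input with and without the flagged block; keep the higher.

    `droppable` plays the role of the identifier: None means the word is not
    flagged, otherwise it is the index of the block that may be missing
    (-1 for the word-final block).
    """
    full = word_similarity(inp, ref)
    if droppable is None or len(ref) < 2:
        return full
    idx = droppable % len(ref)
    reduced = list(ref[:idx]) + list(ref[idx + 1:])
    return max(full, word_similarity(inp, reduced))


stop_ref = [[1, 1, 0, 0], [0, 1, 1, 0]]   # 'sto', 'p' (made-up features)
heard = [[1, 1, 0, 0]]                    # speaker dropped the final 'p'
print(word_similarity(heard, stop_ref))                      # 0.0
print(similarity_with_droppable_block(heard, stop_ref, -1))  # 1.0
```

Taking the maximum means a flagged word is never penalized for a complete utterance: when both blocks are present, the first pass already gives the higher score.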

Next, application of the present invention to dictionary creation is described. Because the dictionary, like recognition, is created by focusing on silent intervals, the quality of the dictionary deteriorates if a voiced block within a word is missing. When the dictionary is built by, for example, adding three utterances, one approach is to compute and check the similarity as each addition is made. In computing that similarity, two similarity calculations are therefore performed, one assuming the block is missing and one assuming it is not. The calculation yielding the larger similarity gives the correct correspondence. If it indicates that the block is missing, that block is excluded and only the other blocks are added to the dictionary, which makes it possible to create an accurate dictionary.
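A rough sketch of this checked addition is given below. It assumes binary block features, per-position sums as the additive dictionary, a majority vote as the reference for scoring, and a complete first utterance; the class `AdditiveDictionary` and its fields are hypothetical and only illustrate the idea of adding the blocks that are actually present.

```python
from typing import List, Sequence

Block = Sequence[int]  # assumed binary feature vector for one voiced block


def block_sim(block: Block, acc: List[int], count: int) -> float:
    """Agreement between a block and the majority vote of an accumulator."""
    if count == 0 or len(block) != len(acc):
        return 0.0
    ref = [1 if 2 * v >= count else 0 for v in acc]
    return sum(1 for x, y in zip(block, ref) if x == y) / len(ref)


class AdditiveDictionary:
    """Per-block sums for one word (assumes the first utterance is complete)."""

    def __init__(self, first_utterance: List[Block], droppable: int) -> None:
        self.sums = [list(b) for b in first_utterance]
        self.counts = [1] * len(first_utterance)
        self.droppable = droppable  # index of the block that may be missing

    def _score(self, utt: List[Block], indices: List[int]) -> float:
        if not indices or len(utt) != len(indices):
            return 0.0
        total = sum(
            block_sim(b, self.sums[i], self.counts[i])
            for b, i in zip(utt, indices)
        )
        return total / len(indices)

    def add(self, utt: List[Block]) -> bool:
        """Add an utterance; return True if the block was judged missing."""
        all_idx = list(range(len(self.sums)))
        kept_idx = [i for i in all_idx if i != self.droppable]
        # Two similarity calculations: block present vs. block missing.
        missing = self._score(utt, kept_idx) > self._score(utt, all_idx)
        indices = kept_idx if missing else all_idx
        if len(utt) != len(indices):
            raise ValueError("utterance fits neither hypothesis")
        # Add only the blocks that are actually present in the utterance.
        for b, i in zip(utt, indices):
            self.sums[i] = [s + x for s, x in zip(self.sums[i], b)]
            self.counts[i] += 1
        return missing


d = AdditiveDictionary([[1, 1, 0, 0], [0, 1, 1, 0]], droppable=1)
print(d.add([[1, 1, 0, 0]]))                  # True: final block missing
print(d.add([[1, 0, 0, 0], [0, 1, 1, 0]]))    # False: both blocks present
print(d.counts)                               # [3, 2]
```

Keeping a separate count per block is one way to stop an utterance with a dropped block from diluting the block it did not contain.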

FIG. 4 is a diagram showing an example of this, where the two marked correspondences are the first matching and the second matching, respectively. The identifier mentioned above is easy to provide when, for example, the dictionary is prepared separately and supplied with the device, but when the dictionary is created on the device itself, the identifier must be generated automatically. To do so, the two similarity calculations described above are performed; if the similarity is higher with the block missing, the input may be dropped in that way, so the identifier is attached to that word. In this way, the identifier is assigned at dictionary creation time, and at recognition time the similarity calculation is performed several times according to the identifier, with the highest similarity taken as the recognition result.
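One possible shape for the automatic identifier assignment and the identifier-driven recognition is sketched below. The `DropFlag` values, the bit-agreement similarity, and the sample patterns are assumptions made for illustration; the publication states only that the identifier may differ for word-initial and word-final drops.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple

Block = Sequence[int]   # assumed binary feature vector for one voiced block
Pattern = List[Block]


class DropFlag(Enum):
    """Hypothetical identifier values: which block, if any, may be dropped."""
    NONE = 0
    INITIAL = 1
    FINAL = 2


def similarity(inp: Pattern, ref: Pattern) -> float:
    """Average fraction of agreeing bits; 0.0 when block counts differ."""
    if not ref or len(inp) != len(ref):
        return 0.0
    total = 0.0
    for a, b in zip(inp, ref):
        longest = max(len(a), len(b))
        if longest:
            total += sum(1 for x, y in zip(a, b) if x == y) / longest
    return total / len(ref)


def reduced(ref: Pattern, flag: DropFlag) -> Pattern:
    """Reference pattern with the flagged block removed."""
    if flag is DropFlag.INITIAL:
        return ref[1:]
    if flag is DropFlag.FINAL:
        return ref[:-1]
    return ref


def infer_flag(ref: Pattern, utterance: Pattern) -> DropFlag:
    """Attach an identifier when a drop hypothesis beats the full match."""
    best_flag, best_score = DropFlag.NONE, similarity(utterance, ref)
    for flag in (DropFlag.INITIAL, DropFlag.FINAL):
        score = similarity(utterance, reduced(ref, flag))
        if score > best_score:
            best_flag, best_score = flag, score
    return best_flag


def recognize(inp: Pattern, dictionary: Dict[str, Tuple[Pattern, DropFlag]]) -> str:
    """Return the word whose best hypothesis has the highest similarity."""
    def score(word: str) -> float:
        ref, flag = dictionary[word]
        return max(similarity(inp, ref), similarity(inp, reduced(ref, flag)))
    return max(dictionary, key=score)


stop = [[1, 1, 0, 0], [0, 1, 1, 0]]           # made-up features for 'stop'
flag = infer_flag(stop, [[1, 1, 0, 0]])       # training utterance lost 'p'
words = {"stop": (stop, flag), "go": ([[0, 0, 1, 1]], DropFlag.NONE)}
print(flag)                                   # DropFlag.FINAL
print(recognize([[1, 1, 0, 0]], words))       # stop
```

Storing the flag next to each pattern lets recognition decide per word how many similarity passes to run, so unflagged words effectively cost a single comparison.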

Effects

As is clear from the above description, the present invention makes it possible to create an accurate dictionary for, and to recognize, even words in which a block is frequently dropped.

[Brief explanation of drawings]

FIG. 1 is an electrical block diagram for explaining an embodiment of the speech recognition device according to the present invention. FIG. 2 is a diagram for explaining an example of a recognition method that focuses on silent intervals. FIG. 3 is a diagram for explaining the drawback of the conventional recognition method. FIG. 4 is a diagram for explaining one method of dictionary creation.

1: microphone, 2: preprocessing unit, 3: feature extraction unit, 4: speech interval detection unit, 5: recognition processing and dictionary creation unit, 6: identifier check unit, 7: word dictionary.

Claims

[Claims]

(1) A speech recognition device comprising means for extracting feature quantities of speech input from a microphone, means for detecting a speech interval, means for separating voiced intervals from silent intervals within one word, means for registering an additive dictionary block by block based on the silent intervals, and means for performing recognition processing based on the silent intervals, characterized by a pattern matching method in which words containing a consonant block or the like that is relatively easily dropped are designated in advance at dictionary creation time, and at recognition time the similarity to the input pattern is calculated both for the case where that block is missing and for the case where it is not, the higher value being taken as the similarity of the word.

(2) The speech recognition device according to claim (1), characterized in that when creating the dictionary, dictionary addition is performed while determining similarity.