JPH09244692A

JPH09244692A - Uttered word certifying method and device executing the same method

Info

Publication number: JPH09244692A
Application number: JP8049843A
Authority: JP
Inventors: Kazuhiro Arai; 和博荒井; Mikio Kitai; 幹雄北井; Shigeki Sagayama; 茂樹嵯峨山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-03-07
Filing date: 1996-03-07
Publication date: 1997-09-19

Abstract

PROBLEM TO BE SOLVED: To judge correctly whether an uttered word has the correct content by recognizing an acoustically similar word as it is when one is uttered, and comparing the recognition result directly with the authentication target word. SOLUTION: A word list generating part 4 combines the pseudo words generated in a pseudo word generating part 2 with the KANA (Japanese syllabary) notation preset in a word setting part 1 into a word list used for word voice recognition, and transmits the list to a voice recognition part 6. The voice recognition part 6 performs voice recognition processing based on the voice data input from a voice input part 5 and the word list generated in the word list generating part 4, and outputs, as the recognition result, the KANA notation of the word given the highest likelihood among the words registered in the word list. A word certifying part 7 compares the KANA notation output by the voice recognition part 6 as the recognition result with the KANA notation of the word input to the word setting part 1 and, when they coincide, judges that the target word was uttered.

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention: The present invention relates to a spoken word authentication method and an apparatus for implementing the method. More particularly, it relates to a method and apparatus that perform speech recognition using a preset word and words acoustically similar to it as the authentication target vocabulary, and that authenticate the utterance as correct when the recognition result is the preset word.

[0002]

Description of the Related Art: To authenticate whether a speaker (the user) has correctly uttered a preset authentication target word, the conventional approach runs speech recognition with only the set word as the recognition target vocabulary, and decides whether the utterance is the preset word from the magnitude of the resulting likelihood.

[0003] Another spoken word authentication method in use builds a so-called garbage model, in effect a model of human speech in general, from speech data covering every syllable or phoneme of many speakers. Speech recognition computes the likelihood this model assigns to the uttered speech data, and also the likelihood of the input speech data for the syllable or phoneme sequence of the preset word. The two likelihoods are compared, and the preset word is deemed to have been uttered when they differ significantly.
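The comparison used by this prior-art garbage-model method can be sketched as follows. This is a minimal illustration under stated assumptions: the text only says the two likelihoods must differ "significantly", so the log-likelihood form, the direction of the comparison, and the `margin` parameter are all hypothetical choices, not details from the source.

```python
def accepted_by_garbage_model(word_loglik, garbage_loglik, margin):
    """Return True when the preset word's log-likelihood beats the garbage
    model's log-likelihood by at least `margin`.

    `margin` is an assumed tuning parameter; the source does not say how a
    "significant difference" is measured.
    """
    return (word_loglik - garbage_loglik) >= margin
```

Note that this decision never looks at words that resemble the preset word, which is the weakness the description goes on to discuss.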

[0004]

Problems to Be Solved by the Invention: However, in the method that recognizes speech with only the set word as the recognition target vocabulary, the resulting likelihood is easily affected by noise and other conditions of the environment in which recognition is performed. Some speakers may also obtain only a low likelihood even when they utter the set word. The method therefore cannot be called a spoken word authentication method in the strict sense.

[0005] The method using a garbage model, when speech recognition is performed in syllable units, can compute the likelihood of the preset word's syllable sequence for the input speech data. However, it makes no relative comparison between this likelihood and the likelihood given by words whose syllable sequences resemble the set word. Consequently, it may falsely authenticate an utterance of a word containing a syllable or syllable sequence that differs only partially from the preset word.

[0006] The present invention provides a spoken word authentication method that solves the above problems, and an apparatus that implements the method.

[0007]

Means for Solving the Problems: The spoken word authentication method performs speech recognition using a preset word and words acoustically similar to it as the recognition target vocabulary, compares the recognition result of the input speech with the preset word, and authenticates the utterance as correct when the recognition result is the preset word.

[0008] In the method, similar words are created by replacing each syllable of the set word with an acoustically similar syllable. A syllable conversion table is referred to when each syllable of the set word is replaced in this way. Further, when a similar syllable is selected from the syllable conversion table, an arbitrary syllable is selected.

[0009] Alternatively, when similar syllables are selected from the syllable conversion table, a desired number of syllables are selected in order by referring to a statistic representing how easily one syllable is replaced by another, and a plurality of words are created. The spoken word authentication apparatus comprises: a word setting unit 1 in which the word to be authenticated is preset; a pseudo word generation unit 2 that generates pseudo words with similar syllable sequences based on the word input from the word setting unit 1; a syllable conversion table 3 that stores syllables that are acoustically similar and easily misrecognized as one another, and supplies them to the pseudo word generation unit 2; a word list generation unit 4 that combines the pseudo words generated by the pseudo word generation unit 2 with the word preset in the word setting unit 1 to create the word list used for word speech recognition; a voice input unit 5 that receives the uttered voice and converts it into voice data; a voice recognition unit 6 that performs speech recognition based on the input voice data from the voice input unit 5 and the word list created by the word list generation unit 4, and outputs as the recognition result the word given the highest likelihood among the words registered in the word list; and a word authentication unit 7 that compares the recognition result output by the voice recognition unit 6 with the word preset in the word setting unit 1 and determines that the target word was uttered when the two match.

[0010]

Embodiments of the Invention: An embodiment of the invention is described with reference to the example of FIG. 1, in which 1 is a word setting unit, 2 is a pseudo word generation unit, 3 is a syllable conversion table, 4 is a word list generation unit, 5 is a voice input unit, 6 is a voice recognition unit, and 7 is a word authentication unit. The kana notation of the word to be authenticated is input to the word setting unit 1. The input kana notation word is passed to the pseudo word generation unit 2, which generates a desired number of pseudo words with similar syllable sequences based on it and sends them to the word list generation unit 4 as registered words.

[0011] The way pseudo words are generated is as follows. FIG. 2 shows an example of the syllable conversion table 3. The table stores, class by class, syllables that are acoustically similar and easily misrecognized as one another. For example, the syllable “は” (ha) stored in class 1 of FIG. 2 is likely to be misrecognized as “あ” (a) through loss of the consonant, or as “さ” (sa), “た” (ta), or another syllable through substitution of the consonant. In addition, in the example of FIG. 2, syllables containing a contracted sound (yōon) are stored in a separate class even when they share the same vowel.
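One way to picture such a class-based table is the sketch below. It is only an illustration: FIG. 2 is not reproduced in the text, so the class members, the class numbers, and the names `SYLLABLE_CLASSES` and `build_lookup` are assumptions rather than the patent's actual table.

```python
# Hypothetical excerpt of a syllable conversion table. Each class groups
# syllables assumed to be easily confused with one another; the members are
# illustrative, not copied from FIG. 2.
SYLLABLE_CLASSES = {
    1: ["あ", "か", "さ", "た", "だ", "な", "は", "ま", "や", "ら", "わ"],  # plain /a/ syllables
    2: ["きゃ", "しゃ", "ちゃ", "にゃ", "ひゃ"],  # contracted (yōon) /a/ syllables, kept apart
}


def build_lookup(classes):
    """Map each syllable to the other members of its class (its substitutes)."""
    lookup = {}
    for members in classes.values():
        for syllable in members:
            lookup[syllable] = [m for m in members if m != syllable]
    return lookup


LOOKUP = build_lookup(SYLLABLE_CLASSES)
```

With this layout, `LOOKUP["は"]` contains “あ”, “さ”, and “た” but no contracted syllable such as “きゃ”, mirroring the separation of classes described above.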

[0012] The given kana notation word is decomposed into syllables. For each syllable, a similar syllable is selected from the syllable conversion table 3, either at random or based on precomputed syllable occurrence frequencies. The selected syllables are concatenated in order to produce one pseudo word. The pseudo word generation unit 2 repeats this processing to generate the desired number of pseudo words. When the processing is repeated, the pseudo word generation unit 2 checks whether a pseudo word with the same syllable sequence has already been generated, so that all generated pseudo words have different syllable sequences. If one has, the newly generated pseudo word is rejected. If not, the newly generated pseudo word is sent to the word list generation unit 4 as a registered word.
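The generate-and-reject loop described in this paragraph can be sketched as below, using random selection. The function names, the `max_tries` safeguard, and the decision to also reject a candidate identical to the target word are assumptions made for the illustration; the source only requires that the generated pseudo words differ from one another.

```python
import random

SMALL_KANA = set("ゃゅょ")  # small kana that form contracted syllables


def split_syllables(kana):
    """Split a hiragana string into syllables, attaching small ゃ/ゅ/ょ to the
    preceding kana (for example しょ is one syllable)."""
    syllables = []
    for ch in kana:
        if ch in SMALL_KANA and syllables:
            syllables[-1] += ch
        else:
            syllables.append(ch)
    return syllables


def generate_pseudo_words(word, lookup, count, rng=None, max_tries=10000):
    """Generate up to `count` distinct pseudo words for `word`.

    `lookup` maps a syllable to its list of similar syllables. A syllable with
    no entry is kept unchanged. Fewer than `count` words are returned if
    `max_tries` attempts are exhausted first.
    """
    rng = rng or random.Random()
    syllables = split_syllables(word)
    seen = {word}  # assumption: the target word itself is never a pseudo word
    generated = []
    for _ in range(max_tries):
        if len(generated) == count:
            break
        candidate = "".join(rng.choice(lookup.get(s) or [s]) for s in syllables)
        if candidate in seen:
            continue  # same syllable sequence already generated: reject
        seen.add(candidate)
        generated.append(candidate)
    return generated
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing word lists of different sizes.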

[0013] FIG. 3 shows an example of pseudo word generation. In this example, “かいえだしょういち” (kaieda shōichi) is input to the word setting unit 1 as the kana notation word, and is passed to the pseudo word generation unit 2. To generate a pseudo word, the pseudo word generation unit 2 refers to the syllable conversion table 3 and selects a similar syllable for each given syllable. If, when selecting similar syllables from the syllable conversion table 3, a desired number of syllables are selected in order by referring to a statistic representing how easily one syllable is replaced by another, and a plurality of words are created, the resulting pseudo words help reduce false authentication. In the example of FIG. 3, “た” (ta) is selected for “か” (ka), “り” (ri) for “い” (i), “れ” (re) for “え” (e), and so on, and the selected syllables are concatenated in order to produce the pseudo word “たりれあにょふみじ” (tarireanyofumiji). The pseudo word generation unit 2 sends the pseudo word generated in this way to the word list generation unit 4 as a registered word.
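One possible reading of the statistic-based selection is sketched below: for each syllable, substitutes are ranked by a confusion score, and the i-th pseudo word uses the i-th ranked substitute at every position. The text does not spell out how the selected syllables are combined into words, so this pairing scheme, the `confusion` data layout, and the function names are assumptions.

```python
def top_substitutes(syllable, confusion, k):
    """Return up to k substitutes for `syllable`, most confusable first.

    `confusion[syllable]` is a hypothetical dict mapping each substitute to a
    score that expresses how easily it replaces `syllable`.
    """
    ranked = sorted(confusion.get(syllable, {}).items(),
                    key=lambda item: item[1], reverse=True)
    return [substitute for substitute, _ in ranked[:k]]


def ranked_pseudo_words(syllables, confusion, k):
    """Build k pseudo words; word i uses the i-th ranked substitute of each
    syllable, keeping the original syllable when too few substitutes exist."""
    columns = [top_substitutes(s, confusion, k) for s in syllables]
    words = []
    for rank in range(k):
        words.append("".join(
            column[rank] if rank < len(column) else original
            for original, column in zip(syllables, columns)
        ))
    return words
```

Under this scheme the first pseudo word is the one most likely to be confused with the target, so a short list already contains the hardest competitors.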

[0014] The word list generation unit 4 combines the pseudo words generated by the pseudo word generation unit 2 with the kana notation of the word to be authenticated, which is preset in the word setting unit 1, to create the word list used for word speech recognition, and sends the list to the voice recognition unit 6. FIG. 4 shows an example of word list generation in the word list generation unit 4. The pseudo word generation unit 2 can generate any number of pseudo words; the example in FIG. 4 uses ten. All generated pseudo words are combined with the kana notation preset in the word setting unit 1 to create the word list shown on the right side of FIG. 4. The word list created by the word list generation unit 4 is sent to the voice recognition unit 6 and set as the recognition target vocabulary.
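The list construction itself is simple; a minimal sketch follows. The function name and the choice to place the target word first are assumptions for illustration, since the text does not state any ordering.

```python
def build_word_list(target, pseudo_words):
    """Combine the target word with the pseudo words into one recognition
    vocabulary, dropping any repeated entries."""
    word_list = [target]
    for word in pseudo_words:
        if word not in word_list:
            word_list.append(word)
    return word_list
```

With ten pseudo words, as in the FIG. 4 example, the resulting vocabulary has eleven entries.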

[0015] The voice recognition unit 6 performs speech recognition based on the input voice data from the voice input unit 5 and the word list created by the word list generation unit 4, and outputs as the recognition result the kana notation of the word given the highest likelihood among the words registered in the word list. The word authentication unit 7 compares the kana notation output by the voice recognition unit 6 as the recognition result with the kana notation of the word input to the word setting unit 1. If the two kana notations match, it determines that the target word was uttered. If they do not match, it determines that a wrong word was uttered.
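The decision logic of the voice recognition unit 6 and the word authentication unit 7 can be sketched as follows. The acoustic scoring is out of scope here, so `score` is a hypothetical callable that returns a likelihood (or log-likelihood) for each candidate word given the input speech; the function names are likewise assumptions.

```python
def recognize(word_list, score):
    """Return the word in `word_list` with the highest likelihood.

    `score` stands in for the recognizer's acoustic scoring of one candidate
    word against the input speech.
    """
    return max(word_list, key=score)


def authenticate(target, word_list, score):
    """Accept the utterance only if the best-scoring word is the target."""
    return recognize(word_list, score) == target


# Usage with stubbed scores: the pseudo word wins, so authentication fails.
stub_scores = {"かいえだしょういち": -15.0, "たりれあにょふみじ": -12.5}
result = authenticate("かいえだしょういち", list(stub_scores), stub_scores.get)
```

Unlike a threshold on a single likelihood, the outcome depends only on the relative ranking of the target against its acoustically similar competitors.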

[0016]

Effects of the Invention: As described above, according to the present invention, when an acoustically similar word is uttered, it is recognized as it is and the result is compared directly with the authentication target word, so whether the utterance has the correct content can be determined accurately. The difficulty of speech recognition is roughly proportional to the size of the recognition target vocabulary, and according to the present invention the vocabulary size can be set freely by adjusting the number of pseudo words generated. This setting therefore makes it easy to vary the difficulty of recognition, and the strictness of spoken word authentication can be set freely.

[0017] Further, by changing the definition of similar syllables described in the syllable conversion table, the characteristics of the generated similar words can be changed easily.

[Brief description of drawings]

FIG. 1 is a diagram illustrating an embodiment.

FIG. 2 is a diagram showing a syllable conversion table.

FIG. 3 is a diagram showing an example of pseudo word generation.

FIG. 4 is a diagram showing an example of word list generation.

[Explanation of symbols]

1: word setting unit; 2: pseudo word generation unit; 3: syllable conversion table; 4: word list generation unit; 5: voice input unit; 6: voice recognition unit; 7: word authentication unit

Claims

1. A spoken word authentication method, characterized in that speech recognition is performed using a preset word and words acoustically similar to the preset word as a recognition target vocabulary, the recognition result of the input speech is compared with the preset word, and a correct utterance is authenticated as having been made when the recognition result is the preset word.

2. The spoken word authentication method according to claim 1, characterized in that a similar word is created by replacing each syllable of the set word with an acoustically similar syllable.

3. The spoken word authentication method according to claim 2, characterized in that a syllable conversion table is referred to when each syllable of the set word is replaced with an acoustically similar syllable.

4. The spoken word authentication method according to claim 3, characterized in that an arbitrary syllable is selected when a similar syllable is selected from the syllable conversion table.

5. The spoken word authentication method according to claim 3, characterized in that, when similar syllables are selected from the syllable conversion table, a desired number of syllables are selected in order by referring to a statistic representing the ease of replacement between syllables, and a plurality of words are created.

6. A spoken word authentication apparatus, characterized by comprising: a word setting unit in which a word to be authenticated is preset; a pseudo word generation unit that generates pseudo words having similar syllable sequences based on the word input from the word setting unit; a syllable conversion table that stores syllables that are acoustically similar and easily misrecognized as one another, and supplies them to the pseudo word generation unit; a word list generation unit that combines the pseudo words generated by the pseudo word generation unit with the word preset in the word setting unit to create a word list used for word speech recognition; a voice input unit that receives an uttered voice and converts it into voice data; a voice recognition unit that performs speech recognition processing based on the input voice data from the voice input unit and the word list created by the word list generation unit, and outputs as a recognition result the word given the highest likelihood among the words registered in the word list; and a word authentication unit that compares the recognition result of the input speech output by the voice recognition unit with the word preset in the word setting unit, and determines that the target word was uttered when the two match.