JPH09244691A

JPH09244691A - Input speech rejecting method and device for executing same method

Info

Publication number: JPH09244691A
Application number: JP8049842A
Authority: JP
Inventors: Mikio Kitai; 幹雄北井; Kazuhiro Arai; 和博荒井; Shigeki Sagayama; 茂樹嵯峨山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-03-07
Filing date: 1996-03-07
Publication date: 1997-09-19

Abstract

PROBLEM TO BE SOLVED: To accurately reject utterances that are not acoustically similar to the recognition target vocabulary, while not rejecting recognition target vocabulary that is uttered indistinctly. SOLUTION: Recognition target character strings are stored (1); similar character strings that are acoustically similar to them are generated (2) and stored (3); and reject character strings that are acoustically similar to neither the recognition target character strings nor the similar character strings are generated (4) and stored (5). Speech recognition is then performed (9) on input speech (6) over the recognition target, similar, and reject character strings. When a reject character string is present with at least a predetermined certainty among a predetermined number of top recognition results, ranked in descending order of recognition likelihood, it is judged (10) that a character string other than a recognition target character string was uttered.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an input speech rejection method and an apparatus for carrying out the method. More particularly, it relates to an input speech rejection method that determines whether an uttered vocabulary item belongs to the recognition target vocabulary and rejects any item that does not, and to an apparatus for carrying out the method.

[0002]

[Prior Art] Conventionally, to determine whether a user has uttered a word in the recognition target vocabulary, the likelihood of the recognition target vocabulary obtained by recognizing the input speech was compared with a preset threshold, and the utterance was judged to lie outside the recognition target vocabulary when the likelihood was below the threshold. However, this likelihood fluctuates under the influence of noise, speaker differences, and other conditions. Consequently, when speaker-independent recognition is to be performed in a noisy environment, it is difficult to determine stably whether a recognition target word was uttered by relying only on a likelihood that fluctuates in this way.

[0003]

A method of normalizing the likelihood of the recognition target vocabulary was therefore devised. Unlike a model whose likelihood is high only for a specific, predetermined recognition target vocabulary, this method uses a likelihood normalization model that outputs a reasonably plausible likelihood for every syllable or phoneme context. The likelihood of the recognition target vocabulary is normalized with this model, and the normalized likelihood is compared with a threshold to determine whether a recognition target word was uttered. Examples of the normalized likelihood include a likelihood normalized with a garbage model, which is a recognition model built from speech other than the recognition target vocabulary, and a likelihood output by a phonetic typewriter.

[0004]

[Problem to Be Solved by the Invention] As described above, recognizing speaker-independent speech, or performing speech recognition over the telephone, requires a model that normalizes the likelihood stably. If the garbage model mentioned above is used, for example, it must first be built, and building it requires a large amount of speech data uttered by many speakers calling from various regions. Collecting, preparing, and improving such speech data takes a great deal of money and time.

[0005]

Moreover, even when the likelihood is normalized stably with a normalization model, there remains a high probability that a syllable string that is not in the recognition target vocabulary but resembles it will, when uttered, be misjudged as a recognition target word. The present invention provides an input speech rejection method, and an apparatus for carrying out the method, that solve the above problems.

[0006]

[Means for Solving the Problem] An input speech rejection method is configured as follows. Recognition target character strings are stored in advance. Similar character strings that are acoustically similar to the recognition target character strings are created, and reject character strings that are acoustically similar to neither the recognition target character strings nor the similar character strings are also created. Speech recognition is performed on the input speech over the recognition target, similar, and reject character strings. When a reject character string is present with at least a predetermined certainty among a predetermined number of top recognition results, ranked in descending order of recognition likelihood, the input is judged to be an utterance other than a recognition target character string.

[0007]

In this method, a recognition target character string may be written as syllables, and a similar character string created by replacing some or all of the syllables with other syllables that are acoustically similar to them. In particular, some or all of the syllables that contain a consonant may be replaced with their vowels.

[0008]

Further, the syllables from which reject character strings are built may be stored, and a plurality of the stored syllables selected at random to create reject character string candidates. The reject character strings are then obtained by excluding, from the candidates, those that contain a prohibited character string and those that share a run of consecutive syllables with a recognition target character string or a similar character string.

[0009]

Alternatively, reject character string candidates may be created by replacing some or all of the syllables of a recognition target character string with a plurality of predetermined syllables, and the generated candidates, excluding those identical to a recognition target character string or a similar character string, used as reject character strings.

An input speech rejection apparatus is also configured, comprising: a recognition target character string storage unit 1; a similar character string creation unit 2 that generates character strings acoustically similar to each character string stored in advance in the recognition target character string storage unit 1; a similar character string storage unit 3 that stores the character strings created by the similar character string creation unit 2; a reject character string creation unit 4 that creates character strings acoustically similar to neither the recognition target character strings nor the similar character strings; a reject character string storage unit 5 that stores the character strings created by the reject character string creation unit 4; a recognition dictionary creation unit 7 that creates a dictionary for speech recognition from the character strings stored in the recognition target character string storage unit 1, the similar character string storage unit 3, and the reject character string storage unit 5; a recognition dictionary storage unit 8; a speech input unit 6 that accepts speech input; a speech recognition unit 9 that recognizes the speech input to the speech input unit 6 on the basis of the recognition dictionary stored in the recognition dictionary storage unit 8 and outputs a predetermined number N of top character strings, in descending order of likelihood, together with their likelihoods; and a reject determination unit 10 that judges the input to be an utterance outside the recognition targets when a reject character string is present among the top N recognition results with at least a predetermined certainty.

[0010]

The similar character string creation unit 2 may comprise a syllable conversion rule storage unit 21, which describes for each syllable the syllables that may replace it, and a similar character string generation unit 22, which replaces some or all of the syllables of a recognition target character string according to the rules stored in the syllable conversion rule storage unit 21. The rewriting rules in the syllable conversion rule storage unit 21 may be rules that replace each syllable containing a consonant with its vowel.

[0011]

Further, the reject character string creation unit 4 may comprise: a syllable storage unit 41 that stores all syllables; a random syllable output unit 42 that outputs the syllables stored in the syllable storage unit 41 at random; a reject character string candidate generation unit 43 that uses the syllables output by the random syllable output unit 42 to generate character strings of a predetermined length; a prohibited substring storage unit 44 that stores predetermined disallowed syllable sequences; and a reject character string selection unit 45. The reject character string selection unit 45 writes a character string generated by the reject character string candidate generation unit 43 to the reject character string storage unit 5 if the string contains none of the character strings stored in the prohibited substring storage unit 44 and, in syllable notation, has no portion of a predetermined length or more that matches a character string stored in the recognition target character string storage unit 1 or the similar character string storage unit 3. Generation of new reject character string candidates is repeated, up to a predetermined finite number of times, until the number of character strings stored in the reject character string storage unit 5 reaches a predetermined number.

[0012]

Alternatively, the reject character string creation unit 4 may comprise a reject syllable conversion rule storage unit 4-1, which describes for each syllable the syllables that may replace it, and a reject character string candidate generation and selection unit 4-2. The unit 4-2 generates character strings in which some or all of the syllables of a character string stored in the recognition target character string storage unit 1 are replaced according to the rules stored in the reject syllable conversion rule storage unit 4-1, and writes a generated character string to the reject character string storage unit 5 if it is stored in neither the recognition target character string storage unit 1 nor the similar character string storage unit 3.

[0013]

[Embodiments of the Invention] An embodiment of the invention is described with reference to the example of Fig. 1. Reference numeral 1 denotes a recognition target character string storage unit that stores in advance a set A of character strings to be recognized. Reference numeral 2 denotes a similar character string creation unit that creates character strings acoustically similar to the character strings stored in the recognition target character string storage unit 1.

[0014]

Reference numeral 3 denotes a similar character string storage unit that stores a set B of the character strings generated by the similar character string creation unit 2. Reference numeral 4 denotes a reject character string creation unit that generates character strings acoustically similar to neither the character strings stored in the recognition target character string storage unit 1 nor those stored in the similar character string storage unit 3.

[0015]

Reference numeral 5 denotes a reject character string storage unit that stores a set C of the character strings generated by the reject character string creation unit 4. Reference numeral 6 denotes a speech input unit that accepts speech input at recognition time. Reference numeral 7 denotes a recognition dictionary creation unit that creates the dictionary used for speech recognition from the character strings stored in the recognition target character string storage unit 1, the similar character string storage unit 3, and the reject character string storage unit 5.
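The role of the recognition dictionary creation unit 7 can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Kind` and `build_dictionary`, the precedence given to set A when a string appears in more than one set, and the example strings are all hypothetical, and a real recognition dictionary would also carry pronunciation or model information that the text does not detail.

```python
from enum import Enum


class Kind(Enum):
    TARGET = "A"   # recognition target strings (storage unit 1)
    SIMILAR = "B"  # acoustically similar strings (storage unit 3)
    REJECT = "C"   # reject strings (storage unit 5)


def build_dictionary(targets, similars, rejects):
    """Merge the sets A, B and C into one lookup table: string -> Kind."""
    dictionary = {}
    # Insert in reverse priority so that a target label wins if the same
    # string was (unexpectedly) placed in two sets. This precedence is an
    # assumption of the sketch, not something stated in the text.
    for kind, strings in ((Kind.REJECT, rejects),
                          (Kind.SIMILAR, similars),
                          (Kind.TARGET, targets)):
        for s in strings:
            dictionary[s] = kind
    return dictionary
```

Keeping the set label next to each entry lets a later decision step tell directly whether a recognition result came from set A, B, or C.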

[0016]

Reference numeral 8 denotes a recognition dictionary storage unit that stores the dictionary created by the recognition dictionary creation unit 7. Reference numeral 9 denotes a speech recognition unit that recognizes the speech input to the speech input unit 6 on the basis of the recognition dictionary stored in the recognition dictionary storage unit 8 and outputs a predetermined number N of top character strings, in descending order of likelihood, together with their likelihoods. Reference numeral 10 denotes a reject determination unit. If a character string belonging to the set C stored in the reject character string storage unit 5 is present among the top N recognition results with at least a predetermined certainty L, this unit judges that the input speech lies outside the recognition targets; otherwise it judges that the input speech lies within them.

[0017]

The similar character string creation unit 2 comprises a syllable conversion rule storage unit 21, which describes for each syllable the syllables that may replace it, and a similar character string generation unit 22, which replaces some or all of the syllables of the character strings stored in the recognition target character string storage unit 1 according to the syllable conversion rule storage unit 21.

The reject character string creation unit 4 comprises: a syllable storage unit 41 that stores all syllables; a random syllable output unit 42 that outputs the syllables stored in the syllable storage unit 41 at random; a reject character string candidate generation unit 43 that uses the syllables output by the random syllable output unit 42 to generate character strings of a predetermined length; a prohibited substring storage unit 44 that stores predetermined disallowed syllable sequences; and a reject character string selection unit 45. The reject character string selection unit 45 writes a character string created by the reject character string candidate generation unit 43 to the reject character string storage unit 5 if the string contains none of the character strings stored in the prohibited substring storage unit 44 and, in syllable notation, has no portion of a predetermined length or more that matches a character string stored in the recognition target character string storage unit 1 or the similar character string storage unit 3. Otherwise the string is not written.

[0018]

The processing and operation of each unit are now described more concretely. Assume that the recognition target vocabulary consists of four words: "hai" (はい, "yes"), "iie" (いいえ, "no"), "sō desu" (そーです, "that is right"), and "chigaimasu" (ちがいます, "that is wrong"). The contents of the recognition target character string storage unit 1 are then as pictured in Fig. 2. If the rule is to replace each syllable containing a consonant with its vowel alone, the contents of the syllable conversion rule storage unit 21, which describes the replaceable syllables for each syllable, are as pictured in Fig. 3. The first line, for example, means that か (ka), さ (sa), た (ta), な (na), は (ha), ま (ma), や (ya), ら (ra), わ (wa), が (ga), ざ (za), だ (da), ば (ba), and ぱ (pa) are each rewritten as their vowel あ (a) alone. The rules of Fig. 3 are incomplete if loanwords are also taken into account, but this point is omitted to keep the explanation simple.

[0019]

Assuming that the syllable conversion rule storage unit 21 holds the rules of Fig. 3, the similar character string generation unit 22 replaces some or all of the syllables of each character string of Fig. 2, stored in the recognition target character string storage unit 1, according to these conversion rules. The results are written to the similar character string storage unit 3 as the character strings shown in Fig. 4. Fig. 4 contains no similar character string for "iie" (いいえ). In this example, no two of the similar character strings obtained for the recognition target character strings are identical; if identical strings do arise, all but one of them must be discarded.
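The vowel-replacement rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: only the first row of the table (the あ row) is quoted in the text, the other four rows are filled in by analogy, each kana character is treated as one syllable, and the names `VOWEL_ROWS`, `TO_VOWEL`, and `similar_strings` are hypothetical. Because Figs. 3 and 4 are not reproduced here, the output is not claimed to match Fig. 4.

```python
from itertools import product

# Row "あ" is the rule quoted in the text for the first line of Fig. 3.
# The remaining rows are assumptions made by analogy.
VOWEL_ROWS = {
    "あ": "かさたなはまやらわがざだばぱ",
    "い": "きしちにひみりぎじぢびぴ",
    "う": "くすつぬふむゆるぐずづぶぷ",
    "え": "けせてねへめれげぜでべぺ",
    "お": "こそとのほもよろをごぞどぼぽ",
}
TO_VOWEL = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}


def similar_strings(word):
    """Return every variant of `word` in which one or more consonant
    syllables have been replaced by their vowel."""
    # For each character, offer the original and, if a rule exists, its vowel.
    choices = [(ch, TO_VOWEL[ch]) if ch in TO_VOWEL else (ch,) for ch in word]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(word)  # the unchanged string is not a similar string
    return variants
```

Under these assumptions, `similar_strings("はい")` yields the single string "あい", and `similar_strings("いいえ")` is empty because every syllable is already a vowel, which agrees with the remark that "iie" has no similar character string. Collecting the variants in a set also keeps only one copy of any duplicate.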

[0020]

Now that Figs. 2 and 4 are fixed, the way reject character strings are generated is described next. First, assume that the syllable storage unit 41, which stores the syllables used to build reject character strings, has the contents pictured in Fig. 5. The syllables shown in Fig. 5 differ from syllables in the usual sense, but any inventory will do as long as each sound is about one ordinary character long and the inventory as a whole covers most of the sound sequences that can occur. This is why "ー" (the long-vowel mark), "っ", "ぁ", "ぃ", "ぇ", and "ぉ" are included.

[0021]

Fig. 5 lists 117 syllables. When four syllables at a time are output at random from the syllable storage unit 41 to create 31 reject character string candidates, a result like that of Fig. 6 is obtained. The result depends on the order in which the syllables are written in the syllable storage unit 41. The "prohibited string" column of Fig. 6 indicates whether a candidate is a prohibited character string, as described below: × marks a candidate that is not prohibited, and ○ marks one that is. The "overlap" column indicates whether a candidate shares a run of two or more consecutive syllables with a recognition target character string of Fig. 2 or a similar character string of Fig. 4: ○ marks a candidate with such an overlap, and × marks one without.

[0022]

Fig. 7 shows an example of the contents of the prohibited substring storage unit 44, which stores disallowed syllable sequences. In Fig. 7, * stands for an arbitrary character string. For example, rule 1 prohibits character strings beginning with "ん" (n), rule 2 likewise prohibits character strings beginning with "っ" (small tsu), and rule 3 disallows character strings containing the sequence "んー". Under the rules of Fig. 7, only the 28th character string of Fig. 6 is prohibited. Suppose that the required number of reject character strings has been set to 30 in advance and that an overlap of two or more consecutive syllables with a recognition target character string or a similar character string is not allowed. Since none of the character strings of Fig. 6 has such an overlap, the 30 character strings other than the 28th are written to the reject character string storage unit 5 as reject character strings.
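The two filters described above, the prohibited patterns and the overlap check, might be sketched in Python like this. The three patterns are the ones the text quotes from Fig. 7; everything else is an assumption of the sketch: the function names are hypothetical, each kana character is treated as one syllable, and the standard-library `fnmatch` wildcard matching is swapped in for whatever matching the original system used.

```python
from fnmatch import fnmatchcase

# The three rules quoted from Fig. 7; "*" stands for any string.
FORBIDDEN_PATTERNS = ["ん*", "っ*", "*んー*"]


def is_forbidden(candidate, patterns=FORBIDDEN_PATTERNS):
    """True if the candidate matches any prohibited pattern."""
    return any(fnmatchcase(candidate, p) for p in patterns)


def overlaps(candidate, references, min_len=2):
    """True if the candidate shares a run of `min_len` consecutive
    syllables (characters here) with any reference string."""
    runs = {candidate[i:i + min_len]
            for i in range(len(candidate) - min_len + 1)}
    return any(run in ref for run in runs for ref in references)


def is_acceptable(candidate, references):
    """A candidate is kept only if it passes both filters."""
    return not is_forbidden(candidate) and not overlaps(candidate, references)
```

Here `references` would hold the recognition target and similar character strings together. Checking only runs of exactly `min_len` syllables is sufficient, because any longer shared run contains a shared run of that length.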

[0023]

In the actual reject processing, the reject character string candidate generation unit 43 generates the character strings of Fig. 6 one at a time. The reject character string selection unit 45 checks each one for whether it is a prohibited character string (third column of Fig. 6) and whether it overlaps a recognition target character string or a similar character string in two or more consecutive syllables (fourth column), and then decides whether to write it to the reject character string storage unit 5. This processing is repeated until the number of character strings stored in the reject character string storage unit 5 reaches the predetermined number or the number of iterations reaches a predetermined limit.
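The generate-and-select loop with its two stopping conditions can be sketched as follows. This is a minimal sketch, assuming that the acceptance test is passed in as a function; `build_reject_set`, its parameter names, and the default limit of 1000 attempts are hypothetical, and skipping duplicate candidates is an assumption the text does not state.

```python
import random


def build_reject_set(syllables, accept, length=4, target=30,
                     max_tries=1000, rng=None):
    """Draw random syllable strings until `target` strings have been
    accepted or `max_tries` candidates have been generated."""
    if rng is None:
        rng = random.Random()
    rejects = []
    for _attempt in range(max_tries):
        if len(rejects) >= target:
            break  # enough reject strings have been stored
        candidate = "".join(rng.choice(syllables) for _pos in range(length))
        # Skipping duplicates is an assumption of this sketch.
        if candidate not in rejects and accept(candidate):
            rejects.append(candidate)
    return rejects
```

The cap on attempts guarantees termination even when the filters are so strict that the target count cannot be reached; in that case fewer than `target` strings are returned.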

[0024]

Next, the recognition dictionary creation unit 7 creates the dictionary used for speech recognition from the character strings in the recognition target character string storage unit 1, the similar character string storage unit 3, and the reject character string storage unit 5, and this dictionary is stored in the recognition dictionary storage unit 8. This completes the preparation for recognizing input speech.

As an example, consider the case where the word "maruchimedia" (まるちめでぃあ, "multimedia"), which is not contained in the recognition target character string storage unit 1, is input. Fig. 8 shows the top three recognition results obtained when "maruchimedia" was spoken into the speech recognition apparatus described in the reference cited below. In all three trials, the top two candidates were reject character strings. All three utterances are therefore rejected, because the first-ranked recognition result is a reject character string.

Now suppose that several top candidates, for example the top three, are examined, and that the input speech is judged to be an utterance outside the recognition targets when they include a reject character string with at least a predetermined certainty, for example one whose likelihood differs from that of the first-ranked candidate by no more than 2000. Under this criterion the three utterances of "maruchimedia" are reliably rejected. Even when the first-ranked recognition result is a recognition target character string, the input speech is likewise judged to be an utterance outside the recognition targets if there is a reject character string whose likelihood differs from that of the first-ranked candidate by no more than 2000. (Reference: Yamada, Noda, Imoto, Sagayama, "HMM-LR continuous speech recognition system with client-server configuration and its applications," IPSJ SIG Technical Report SIG-SLP94-5, pp. 39-46, Feb. 1995.)
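The decision rule in this paragraph can be sketched in Python as follows. The top-three window and the likelihood gap of 2000 are the example values given in the text; the function name `should_reject`, the representation of the N-best list as `(string, likelihood)` pairs sorted best first, and the handling of an empty list are assumptions of the sketch.

```python
def should_reject(nbest, reject_strings, top_n=3, max_gap=2000):
    """Decide whether the input lies outside the recognition targets.

    nbest: list of (string, likelihood) pairs, best candidate first.
    reject_strings: the set C of reject character strings.
    """
    if not nbest:
        # Assumption of this sketch: an empty result list is an error.
        raise ValueError("nbest must contain at least one candidate")
    best_score = nbest[0][1]
    for string, score in nbest[:top_n]:
        # A reject string close enough to the top candidate triggers rejection.
        if string in reject_strings and best_score - score <= max_gap:
            return True
    return False
```

With this rule a first-ranked reject string always triggers rejection, since its gap to itself is zero, and a lower-ranked reject string triggers rejection only when it scores within `max_gap` of the top candidate. The value 2000 is tied to the likelihood scale of the recognizer used in the text and would need retuning for any other recognizer.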

[0025]

When the invention is put into practice, the rejection performance is expected to improve if, in addition to the reject character strings created by the above procedure, vocabulary that speakers are likely to utter by mistake in the situation of use is also registered as reject character strings. When the invention is applied to the telephone, for example, it would be advisable to add stock phrases used in ordinary telephone calls, such as "moshi moshi" (もしもし, "hello"), for the speech recognition performed at the start of a call.

[0026]

Another embodiment of the invention is described with reference to Fig. 9. The embodiment of Fig. 9 differs from that of Fig. 1 in the reject character string creation unit 4; all other components are the same. The reject character string creation unit 4 comprises a reject syllable conversion rule storage unit 4-1, which stores predetermined rules that replace each syllable with a different syllable, and a reject character string candidate generation and selection unit 4-2. The unit 4-2 generates reject character strings by replacing some or all of the syllables of the character strings stored in advance in the recognition target character string storage unit 1 according to these rules, and writes each generated character string to the reject character string storage unit 5 only if it is stored in neither the recognition target character string storage unit 1 nor the similar character string storage unit 3.

[0027]

Here the recognition target character strings are only two of the four in the previous embodiment: "hai" (はい) and "iie" (いいえ). In this case the contents of the recognition target character string storage unit 1 are as pictured in Fig. 10, and the contents of the similar character string storage unit 3 are as pictured in Fig. 11. The operation of the reject syllable conversion rule storage unit 4-1 and the reject character string candidate generation and selection unit 4-2, and an example of recognition, are described below.

【００２８】[0028] For simplicity, the contents of the reject syllable conversion rule storage unit 4-1 are assumed to be those shown in FIG. 12. In this case, the reject character string candidate generation/selection unit 4-2 generates the reject character string candidates shown in FIG. 13. In this example, every character string in FIG. 13 differs from the character strings in FIG. 10 and FIG. 11, so they become the contents of the reject character string storage unit 5 as they are. When vowels occur in succession, as in "ei" and "ou", it must be decided whether each vowel is pronounced exactly as written, lengthened as in "ee" and "oo", or whether both are allowed; in this example, the vowels are assumed to be pronounced exactly as written.

【００２９】[0029] Next, the recognition dictionary creation unit 7 creates the dictionary used for speech recognition from the character strings in the recognition target character string storage unit 1, the similar character string storage unit 3, and the reject character string storage unit 5. The dictionary is stored in the recognition dictionary storage unit 8, and preparation for recognizing input speech is complete. As in the previous embodiment, consider the case where "maruchimedia" ("multimedia"), a word not among the recognition target character strings, is input. FIG. 14 shows the top three recognition results obtained when this word was input to the speech recognition device described in the previously cited document. In all three utterances, all of the top three candidates were reject character strings. If the rule is that the input speech is judged to be an utterance other than a recognition target whenever the top three candidates include a reject character string that differs from the first-ranked candidate by 2000 or less, then all three utterances of "maruchimedia" are correctly rejected.
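The decision rule in this paragraph (look at the top three candidates and reject if any reject character string lies within 2000 of the first-ranked candidate) can be sketched as below. The function name, the romanized strings, and the scores are hypothetical, and the sketch assumes that a larger score means a higher likelihood; the actual values of FIG. 14 are not reproduced.

```python
def is_rejected(results, reject_strings, top_n=3, max_gap=2000):
    """Return True if a reject string appears among the top_n results
    with a score within max_gap of the best score.

    results: list of (text, score) pairs; larger score = more likely
    (an assumption of this sketch).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)[:top_n]
    if not ranked:
        return False
    best = ranked[0][1]
    return any(
        text in reject_strings and best - score <= max_gap
        for text, score in ranked
    )


# Made-up reject strings and recognition scores, for illustration only.
reject_strings = {"mou", "kuo", "moe"}

out_of_vocabulary = [("mou", -41000), ("kuo", -41800), ("hai", -45000)]
in_vocabulary = [("hai", -30000), ("ai", -30500), ("mou", -36000)]

rejected_oov = is_rejected(out_of_vocabulary, reject_strings)
rejected_iv = is_rejected(in_vocabulary, reject_strings)
```

In the second made-up case a reject string does appear in the top three, but it trails the first candidate by more than 2000, so the utterance is accepted. Widening `top_n` or `max_gap` makes rejection more aggressive.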

【００３０】[0030] When carrying out the present invention, the reject effect can be expected to improve if, in addition to the reject character strings created by the above procedure, vocabulary that the speaker is likely to utter by mistake in the situation of use is also registered as reject character strings. For example, when the invention is applied to a telephone, it would be advisable to add idiomatic phrases used in ordinary telephone calls, such as "moshi moshi" ("hello"), for speech recognition at the start of the call.

【００３１】[0031]

[Effects of the Invention] As described above, the present invention creates similar character strings that are acoustically similar to the recognition target character strings, creates reject character strings that are acoustically similar to neither the recognition target character strings nor the similar character strings, performs speech recognition processing on input speech among the recognition target character strings, the similar character strings, and the reject character strings, and, when a reject character string is present among the recognition results, judges the input to be an utterance other than a recognition target character string. Because the utterance is rejected on the grounds that a reject character string, which is acoustically dissimilar to the recognition target and similar character strings, was recognized, an utterance that is not acoustically similar to the recognition target vocabulary can be rejected accurately, and malfunction can be prevented when the invention is used in a speaker recognition device or other recognition devices.

【００３２】[0032] Furthermore, because vocabulary acoustically similar to the recognition target vocabulary is stored in advance together with it and is itself treated as recognition target vocabulary, rejection can be avoided even when a recognition target word is uttered somewhat ambiguously.
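As a rough illustration of how such acoustically similar vocabulary might be derived, the sketch below replaces syllables that have a consonant with their vowel, the rule named in claims 3 and 8. It assumes romanized syllables that end in their vowel; the function names and the example words are hypothetical.

```python
from itertools import product

VOWELS = "aiueo"


def to_vowel(syllable):
    """Return the vowel of a romanized syllable ("ha" -> "a").
    Syllables that do not end in a vowel are returned unchanged."""
    return syllable[-1] if syllable[-1] in VOWELS else syllable


def similar_strings(word):
    """Return all variants of `word` (a tuple of syllables) in which some
    or all consonant-bearing syllables are replaced by their vowel."""
    options = []
    for s in word:
        v = to_vowel(s)
        options.append([s] if v == s else [s, v])
    return [variant for variant in product(*options) if variant != tuple(word)]


similar_hai = similar_strings(("ha", "i"))      # "hai"
similar_iie = similar_strings(("i", "i", "e"))  # "iie" has no consonants
```

Under this rule alone, "hai" yields the single variant "ai", while "iie" yields none because it contains only vowels; a fuller rule table would be needed to cover such words.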

[Brief description of drawings]

FIG. 1 is a diagram illustrating an embodiment.

FIG. 2 is a diagram showing the contents of the recognition target character strings.

FIG. 3 is a diagram showing the contents of the syllable conversion rule storage unit.

FIG. 4 is a diagram showing the contents of the similar character string storage unit.

FIG. 5 is a diagram showing the contents of the syllable storage unit.

FIG. 6 is a diagram showing reject character string candidates.

FIG. 7 is a diagram showing the contents of the prohibited partial character string storage unit.

FIG. 8 is a diagram showing recognition results.

FIG. 9 is a diagram illustrating another embodiment.

FIG. 10 is a diagram showing the contents of the recognition target character strings.

FIG. 11 is a diagram showing the contents of the similar character string storage unit.

FIG. 12 is a diagram showing the contents of the syllable conversion rule storage unit.

FIG. 13 is a diagram showing reject character string candidates.

FIG. 14 is a diagram showing recognition results.

[Explanation of symbols]

1 recognition target character string storage unit
2 similar character string creation unit
3 similar character string storage unit
4 reject character string creation unit
5 reject character string storage unit
6 speech input unit
7 recognition dictionary creation unit
8 recognition dictionary storage unit
9 speech recognition unit
10 reject determination unit

Claims

[Claims]

1. An input speech reject method, characterized by: storing a recognition target character string in advance; creating a similar character string that is acoustically similar to the recognition target character string; creating a reject character string that is acoustically similar to neither the recognition target character string nor the similar character string; performing speech recognition processing on input speech among the recognition target character string, the similar character string, and the reject character string; and, when a reject character string is present with at least a predetermined certainty among a predetermined number of top-ranked recognition results taken in descending order of recognition likelihood, determining that the utterance is something other than the recognition target character string.

2. The input speech reject method according to claim 1, wherein the recognition target character string is expressed in syllable notation, and the similar character string is created by replacing some or all of its syllables with other syllables acoustically similar to them.

3. The input speech reject method according to claim 2, wherein some or all of the syllables that have a consonant are replaced with their vowels.

4. The input speech reject method according to claim 1, wherein syllables for creating reject character strings are stored; a plurality of the stored syllables are randomly selected and concatenated to create reject character string candidates; prohibited character strings are excluded from the reject character string candidates; and candidates whose syllables overlap with the recognition target character string or the similar character string are excluded, the remaining candidates being used as reject character strings.

5. The input speech reject method according to claim 1, wherein reject character string candidates are created by replacing some or all of the syllables of the recognition target character string with a plurality of predetermined syllables, and candidates that coincide with a recognition target character string or a similar character string are excluded, the remaining candidates being used as reject character strings.

6. An input speech reject device, comprising: a recognition target character string storage unit; a similar character string creation unit that creates character strings acoustically similar to each character string stored in advance in the recognition target character string storage unit; a similar character string storage unit that stores the character strings created by the similar character string creation unit; a reject character string creation unit that creates character strings acoustically similar to neither the recognition target character strings nor the similar character strings; a reject character string storage unit that stores the character strings created by the reject character string creation unit; a recognition dictionary creation unit that creates a dictionary used for speech recognition from the character strings stored in the recognition target character string storage unit, the similar character string storage unit, and the reject character string storage unit; a recognition dictionary storage unit; a speech input unit that receives speech input; a speech recognition unit that performs speech recognition processing on the speech input to the speech input unit on the basis of the recognition dictionary stored in the recognition dictionary storage unit and outputs a predetermined top N character strings and their likelihoods in descending order of likelihood; and a reject determination unit that determines that the input is an utterance other than a recognition target when a reject character string is present among the output character strings with at least a predetermined certainty.

7. The input speech reject device according to claim 6, wherein the similar character string creation unit comprises a syllable conversion rule storage unit in which, for each syllable, the syllables that may replace it are described, and a similar character string generation unit that replaces some or all of the syllables of a recognition target character string on the basis of the syllables stored in the syllable conversion rule storage unit.

8. The input speech reject device according to claim 7, wherein the rewriting rule of the syllable conversion rule storage unit is a rule that replaces each syllable having a consonant with its vowel.

9. The input speech reject device according to claim 6, wherein the reject character string creation unit comprises: a syllable storage unit that stores all syllables; a random syllable output unit that randomly outputs syllables stored in the syllable storage unit; a reject character string candidate generation unit that generates character strings of a predetermined length using the syllables output from the random syllable output unit; a prohibited partial character string storage unit that stores predetermined unacceptable syllable sequences; and a reject character string selection unit that writes a character string generated by the reject character string candidate generation unit to the reject character string storage unit when the character string contains no character string stored in the prohibited partial character string storage unit and has no portion whose syllable notation matches, over a predetermined length or more, a character string stored in the recognition target character string storage unit or the similar character string storage unit; and wherein the generation of new reject character string candidates is repeated, up to a predetermined finite number of times, until the number of character strings stored in the reject character string storage unit reaches a predetermined number.

10. The input speech reject device according to claim 6, wherein the reject character string creation unit comprises: a reject syllable conversion rule storage unit in which, for each syllable, the syllables that may replace it are described; and a reject character string candidate generation/selection unit that generates character strings in which some or all of the syllables of the character strings stored in the recognition target character string storage unit are replaced on the basis of the syllables stored in the reject syllable conversion rule storage unit, and that writes a generated character string to the reject character string storage unit when it is stored in neither the recognition target character string storage unit nor the similar character string storage unit.