JP2002372988A

JP2002372988A - Recognition dictionary preparing device and rejection dictionary and rejection dictionary generating method

Info

Publication number: JP2002372988A
Application number: JP2001180707A
Authority: JP
Inventors: Toru Iwazawa; 透岩沢; Takashi Tonomura; 孝史外村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-06-14
Filing date: 2001-06-14
Publication date: 2002-12-26

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition dictionary that, in a manner suited to the way speech recognition based on discrete word recognition is used, rejects noise and utterances of words other than the recognition words and returns a correct recognition result only when a recognition word is uttered; and to provide a framework for distinguishing, within the constructed rejection dictionary, rejection vocabulary due to noise from rejection vocabulary due to user utterances.

SOLUTION: A group of words to be rejected is stored in a rejection word database 5. To obtain the rejection words actually used for speech recognition, a rejection word removal unit 6 removes from the rejection words in the rejection word database 5 the recognition words stored in a recognition word storage unit 41, the similar words generated by a similar word generation unit 7, and the recognition results obtained when correct speech for the recognition words, input by a correct speech input unit 8, is misrecognized as a rejection word. Of the remaining words, those contained in a noise rejection word database are stored in a noise rejection word storage unit 422, and the others are stored in an utterance rejection word storage unit 421. At recognition time, speech recognition is performed against a dictionary comprising the words stored in the recognition word storage unit 41, the utterance rejection word storage unit 421, and the noise rejection word storage unit 422.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a technique for rejecting, in a speech recognition system, ambient noise such as machine noise and background voices, as well as out-of-grammar utterances.

[0002]

[Prior art] A major problem in systems that use speech recognition is that noise or surrounding voices are misrecognized as recognition words, causing the system to malfunction. The same misrecognition occurs not only with noise but also when the user utters vocabulary other than the recognition words.

[0003] One way to prevent such misrecognition is to reject noise as speech other than the recognition words. As a conventional noise rejection method, Japanese Patent Application Laid-Open No. H11-288295 describes a word dictionary unit in which multiple recognition target words for a single utterance are registered. The word dictionary unit consists of a word storage unit, in which the word data to be recognized is registered, and a noise word storage unit, in which noise word data resembling noise is registered.

[0004] The word data is a group of registered recognition target words, identical to the words that are actually uttered. The noise word data is a group of registered noise words close to the noise to be removed, that is, words that noise is likely to be misrecognized as. In other words, the words used are those selected by actually having noise recognized once, and the necessary kinds of words are registered according to the acoustic environment and their similarity to the word data 9 to be recognized.

[0005] For example, single syllables such as "a" (あ) and "u" (う), and words combining two such syllables such as "aa" (ああ) and "au" (あう), are registered in the word storage unit as word data. True noise, for example incidental sounds, the sound of pages being turned, or footsteps, is then recognized, and the words recognized as a result are picked out and registered in the noise word storage unit 3 as noise word data.

[0006] Meanwhile, Japanese Patent Application Laid-Open No. H2-89100 and others describe noise that enters the microphone and reference to recognition words registered in a dictionary.

[0007]

[Problems to be solved by the invention] However, when the task is to recognize words spoken to a robot by unspecified speakers from arbitrary positions, the method of simply registering noise in advance as recognition words and rejecting it is limited in the vocabulary that can be registered, and sufficient rejection performance was sometimes not obtained. Nor could this method cope with misrecognition when the user uttered something other than a recognition word.

[0008] An object of the present invention is to construct a speech recognition dictionary that rejects noise and utterances other than the recognition words and returns a correct recognition result only when a recognition word is uttered. This is done by building a huge rejection dictionary in advance using a syllable net and then removing from it only those rejection words that interfere with the recognition words used for speech recognition. A further object is to provide a framework for distinguishing, within the constructed rejection dictionary, rejection vocabulary due to noise from rejection vocabulary due to user utterances, and for using the two separately.

[0009] Using this method is also expected to make it possible to construct a speech recognition dictionary containing noise rejection vocabulary adapted to the way speech recognition is used.

[0010]

[Means for solving the problems] A system that takes human speech as input and returns a speech recognition result is characterized by comprising: a speech input unit that accepts a human utterance and outputs a speech input signal; a speech recognition unit that outputs a speech recognition result in text form based on the speech input signal; a speech output unit that outputs the speech recognition result to an external output device or another program; a recognition dictionary unit, used by the speech recognition unit at recognition time, consisting of a recognition word storage unit that stores recognition words and a rejection word storage unit that stores vocabulary for rejecting invalid utterances and noise; a rejection word database that exhaustively stores rejection words; and a rejection word removal unit that removes from the rejection word database the vocabulary matching the recognition vocabulary passed from the recognition word storage unit and stores the remaining vocabulary in the rejection word storage unit.

The system is further characterized in that the rejection word storage unit consists of an utterance rejection word storage unit and a noise rejection word storage unit, and in that the system comprises a noise rejection word database that stores rejection vocabulary corresponding to surrounding voices, machine noise, and environmental noise, and a rejection word determination unit that stores each rejection word passed from the rejection word removal unit in the noise rejection word storage unit if it is contained in the noise rejection word database, and in the utterance rejection word storage unit if it is not.

The method is characterized by comprising the steps of: removing from the rejection word database the rejection words that match the recognition vocabulary; removing the rejection vocabulary that matches vocabulary similar to the recognition vocabulary; and removing rejection vocabulary using multiple sets of pronunciation data for the recognition vocabulary.

[0011]

[Embodiments of the invention] Embodiments of the present invention will be described in detail with reference to the drawings.

[0012] FIG. 1 is a configuration diagram of a first embodiment of the noise rejection apparatus for speech recognition according to the present invention.

[0013] The noise rejection apparatus for speech recognition of the first embodiment comprises a speech input unit 1 that accepts a human utterance and outputs a speech input signal, a speech recognition unit 2 that outputs a speech recognition result in text or a similar format based on the speech input signal, and a speech output unit 3 that outputs the obtained speech recognition result to an external output device, another program, or the like.

[0014] The recognition dictionary 4 consists of a recognition word storage unit 41 that stores recognition words and a rejection word storage unit 42 that stores vocabulary for rejecting invalid utterances and noise, and is used by the speech recognition unit 2 at recognition time.

[0015] Further, the recognition dictionary unit 4 comprises a rejection word database 5 that exhaustively stores rejection words, and a rejection word removal unit 6 that removes from the rejection word database 5 the vocabulary matching the recognition vocabulary passed from the recognition word storage unit 41 and stores the remaining vocabulary in the rejection word storage unit 42.

[0016] Speech input through the speech input unit 1 is recognized by the speech recognition unit 2 based on the vocabulary stored in the recognition dictionary unit 4, and the result is output to the speech recognition result output unit 3. Because the recognition dictionary unit 4 consists of the recognition word storage unit 41 and the rejection word storage unit 42, the speech recognition result output unit 3 can tell whether the result it obtains is a recognition word or a rejection word. Consequently, when the recognition result is contained in the rejection word storage unit 42, the system can respond by rejecting it, returning a misrecognition response, or the like.

[0017] Next, the method of generating the rejection vocabulary stored in the rejection word storage unit 42 will be described.

[0018] In FIG. 6, the speech recognition vocabulary ("ohayou" (おはよう), "konnichiwa" (こんにちは), "konbanwa" (こんばんは)) is mapped into the space as "◆", and the speech recognition engine interprets the speech recognition space enclosed by each surrounding dotted circle as a correct recognition result. Where circles overlap, as with "konnichiwa" and "konbanwa" in FIG. 2, the nearer recognition word becomes the recognition result. The radius of the dotted circle is the threshold for returning a recognition result: the lower the rejection threshold, the larger the radius. The space enclosed by the solid frame inside each circle, by contrast, represents the space of utterances that should be correctly recognized.
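The picture described above can be imitated with a small geometric sketch. Everything in it is an assumption made for illustration: the 2-D coordinates, the names `VOCAB` and `nearest_within_radius`, and the use of Euclidean distance are not taken from the source, which only describes the figure informally.

```python
import math

# Hypothetical positions of the recognition words in a 2-D "recognition space".
VOCAB = {
    "ohayou": (0.0, 0.0),
    "konnichiwa": (4.0, 0.0),
    "konbanwa": (5.0, 1.0),
}

def nearest_within_radius(point, vocabulary, radius):
    """Return the word nearest to `point`, or None if it lies outside `radius`."""
    best_word, best_dist = None, float("inf")
    for word, centre in vocabulary.items():
        d = math.dist(point, centre)
        if d < best_dist:  # where circles overlap, the nearer word wins
            best_word, best_dist = word, d
    return best_word if best_dist <= radius else None

# A clear utterance close to "konnichiwa" is accepted with a small radius.
print(nearest_within_radius((4.2, 0.2), VOCAB, radius=1.0))
# An unrelated sound is rejected with a small radius ...
print(nearest_within_radius((2.5, 3.0), VOCAB, radius=1.0))
# ... but is misrecognized once the radius is enlarged.
print(nearest_within_radius((2.5, 3.0), VOCAB, radius=3.5))
```

In this toy model, enlarging the radius is what a lower rejection threshold does: more genuine utterances are accepted, and so are more unrelated sounds.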

[0019] Because the utterances to be correctly recognized come from unspecified speakers and from a range of distances, from near to far, the solid-frame space inevitably becomes large. To cover this space, the threshold must be lowered and the radius enlarged. As a result, unwanted sounds fall inside the dotted circle more easily, and misrecognition increases.

[0020] Environmental noise and spoken utterances can be distinguished, and environmental noise can be absorbed with a relatively small number of syllables. FIG. 7 shows a conceptual diagram of the speech recognition space after the rejection dictionary has been constructed.

[0021] The idea behind constructing the rejection dictionary is that, by scattering rejection words, the space other than the vicinity of the recognition vocabulary is covered by the recognition ranges of the rejection words. The speech recognition space is divided into a region where environmental noise is recognized and the remaining region, where mainly human utterances are recognized. A noise rejection dictionary consisting of noise rejection words (× in the figure) covering the environmental noise region and an utterance rejection dictionary consisting of utterance rejection words (● in the figure) covering the remaining region can therefore be constructed separately.

[0022] FIG. 8 shows an example of the rejection word database 5, using "konnichiwa" (こんにちは) as the example recognition word.

[0023] The rejection word database 5 takes the form of a syllable net in which syllables are concatenated into strings of one to several characters. Four characters are used as the example in FIG. 6.
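As a rough illustration of such a syllable net, the sketch below enumerates every string of one to four syllables over a small inventory. The inventory `SYLLABLES` and the function name `build_syllable_net` are assumptions for this example; the source does not list the syllables it uses, and a real inventory would be much larger.

```python
from itertools import product

# Hypothetical, deliberately tiny syllable inventory (one kana per syllable).
SYLLABLES = ["こ", "ん", "に", "ち", "は", "ほ", "い"]

def build_syllable_net(syllables, max_len=4):
    """Return every string of 1..max_len syllables as candidate rejection words."""
    words = set()
    for length in range(1, max_len + 1):
        for combo in product(syllables, repeat=length):
            words.add("".join(combo))
    return words

rejection_db = build_syllable_net(SYLLABLES, max_len=4)
print(len(rejection_db))            # 7 + 49 + 343 + 2401 = 2800
print("ほんにち" in rejection_db)    # True
```

Even with seven syllables the database already holds 2800 strings, which is why the choice of maximum length matters.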

[0024] As for the number of characters in the syllable net of the rejection word database, too few characters gives insufficient rejection performance, while too many makes the number of words registered in the recognition dictionary 4 (the recognition word storage unit 41 and the rejection word storage unit 42) enormous, so that speech recognition takes a long time, among other problems. For Japanese, about four characters is considered appropriate.
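The growth in dictionary size can be made concrete with a simple count. The symbols are assumptions for this illustration: N is the number of syllables in the inventory and L the maximum string length, neither of which the source specifies numerically beyond "about four characters".

```latex
\[
  \sum_{k=1}^{L} N^{k} \;=\; \frac{N\,\bigl(N^{L}-1\bigr)}{N-1}, \qquad N > 1 .
\]
```

For an illustrative N = 50, L = 4 gives 6,377,550 strings and L = 5 gives 318,877,550, so each additional character multiplies the database size by roughly N.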

[0025] On the other hand, because storage capacity and computing speed continue to improve, there is no particular need to insist on four characters, although building the rejection dictionary is effective.

[0026] FIG. 9 is a flowchart showing the operation of the rejection word removal unit 6.

[0027] First, the rejection word removal unit 6 copies all rejection words contained in the rejection word database 5 (S1). Next, it removes all recognition words passed from the recognition word storage unit 41 from the rejection words (S2), and then outputs the remaining rejection words (S3).
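Steps S1 to S3 amount to a set difference. The following is a minimal sketch under that reading; the function name `remove_recognition_words` and the use of Python sets are assumptions, not the patent's implementation.

```python
def remove_recognition_words(rejection_db, recognition_words):
    """Sketch of S1-S3: copy the database, drop recognition words, return the rest."""
    candidates = set(rejection_db)         # S1: copy every rejection word
    candidates -= set(recognition_words)   # S2: remove all recognition words
    return candidates                      # S3: output the remaining rejection words

db = {"こんにち", "ほんにち", "あああ"}
remaining = remove_recognition_words(db, {"こんにち"})
print(sorted(remaining))
```

Working on a copy keeps the full database intact, so the dictionary can be rebuilt later with a different recognition vocabulary.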

[0028] Step S2 of FIG. 9, in which all recognition words passed from the recognition word storage unit 41 are removed from the rejection words, will now be described in detail.

[0029] The rejection dictionary can be constructed from a rejection word database containing all four-character strings by the following processes: process 1, removing rejection words that match the speech recognition vocabulary; process 2, removing rejection vocabulary that matches vocabulary similar to the speech recognition vocabulary; and process 3, removing rejection vocabulary using utterance data of the speech recognition vocabulary from multiple speakers.

[0030] As an example of similar vocabulary in process 2, consider "konnichi" (こんにち), the first four characters of "konnichiwa" (こんにちは). Because the consonants k and h are easily confused, and the first syllable is "ko" (こ), "honnichi" (ほんにち) is removed.

[0031] In process 3, correct speech is recognized using the speech recognition dictionary. If it is misrecognized as a rejection word, the dictionary is rebuilt with that rejection word removed and recognition is run again. This process is repeated.

[0032] FIG. 6 shows an example in which the rejection words "kon'ichi" (こんいち), "kon'ihi" (こんいひ), and "hon'ichi" (ほんいち) were misrecognized before the correct "konnichiwa" (こんにちは) was recognized.

[0033] Process 3 may be omitted if some degree of misrecognition is acceptable.

[0034] FIG. 6 was used to explain the rejection dictionary for spoken vocabulary, but the same method can also be used for environmental noise.

[0035] In that case, the character strings may have fewer characters than those used for spoken vocabulary.

[0036] For environmental noise, there is no problem even if the number of characters is smaller than in the database for spoken vocabulary. Furthermore, in addition to spoken utterances, environmental noises, for example a knock on a door or the sound of a fire alarm, may be entered in the recognition dictionary as recognition words. Next, a second embodiment of the present invention will be described with reference to FIG. 2.

[0037] The noise rejection apparatus for speech recognition of the second embodiment comprises, in addition to the configuration of the first embodiment shown in FIG. 1, a similar word generation unit 7 that generates similar words of the recognition words passed from the recognition word storage unit 41 and outputs them to the rejection word removal unit 6. The similar word generation unit 7 generates the similar words of each recognition word, including the recognition word itself, and outputs them to the rejection word removal unit 6.

[0038] In speech recognition, it is rare for a user's utterance to be recognized exactly as its acoustics and to yield the expected result. Therefore, when rejection words are removed on the basis of the recognition words, similar words of the recognition words must also be generated and removed.

[0039] Various methods of generating similar words are conceivable; two are described here.

[0040] One is acoustic similar word generation, and the other is similar word generation using partial matching. In acoustic similar word generation, pairs of phonemes that are easily confused at the phoneme level are picked out in advance, and similar words of the recognition word are generated using syllables that have those phonemes as consonants and share the same vowel (these are called similar syllables). For example, if the phonemes 'K' (ka row) and 'H' (ha row) are registered as similar phonemes and the recognition word contains a ka-row syllable, the word obtained by replacing that syllable with its similar syllable is registered as a similar word.

[0041] FIG. 10 shows a flowchart of the algorithm that generates similar words for one recognition word. First, the recognition word itself is registered as a similar word (S4). The first syllable of the recognition word is extracted (S5). If a similar syllable exists for that syllable (S6), then for every similar word registered so far, a copy with the corresponding syllable replaced by the similar syllable is added as a similar word (S7). Steps S6 and S7 are repeated, extracting the next syllable each time (S9), for every syllable of the recognition word (S8).

[0042] Using this algorithm, when 'K' (ka row) and 'H' (ha row) are registered as similar phonemes and the recognition word "kenkai" (けんかい, "view") is input to the similar word generation unit, four similar words are generated: "kenkai" (けんかい), "henkai" (へんかい), "kenhai" (けんはい), and "henhai" (へんはい).
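The flow of S4 to S9 can be sketched as follows. The table `SIMILAR`, the function name `acoustic_similar_words`, and the simplification that one kana character equals one syllable are assumptions for this example; only the ka-row/ha-row pairing comes from the text.

```python
# Hypothetical similar-syllable table: ka row <-> ha row with the same vowel.
KA_ROW = "かきくけこ"
HA_ROW = "はひふへほ"
SIMILAR = {**dict(zip(KA_ROW, HA_ROW)), **dict(zip(HA_ROW, KA_ROW))}

def acoustic_similar_words(word, similar=SIMILAR):
    """Generate acoustic similar words of `word`, including `word` itself."""
    words = [word]                           # S4: register the word itself
    for i, syllable in enumerate(word):      # S5, S8, S9: visit each syllable in turn
        alternative = similar.get(syllable)  # S6: is there a similar syllable?
        if alternative is None:
            continue
        # S7: for every similar word so far, add a copy with syllable i swapped.
        words += [w[:i] + alternative + w[i + 1:] for w in words]
    return words

print(acoustic_similar_words("けんかい"))
# ['けんかい', 'へんかい', 'けんはい', 'へんはい']
```

Because each replaceable syllable doubles the list, a word with n replaceable syllables yields 2^n similar words, matching the four words obtained for "kenkai".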

[0043] Next, similar word generation using partial matching will be described. A similar word by partial matching is a word in which a contiguous run of syllables of some fixed length in the recognition word is held fixed and the remaining syllables are changed to arbitrary syllables. FIG. 11 shows similar words for the recognition word "kenkai" (けんかい) using partial matching with one character fixed and with two characters fixed.

[0044] When the recognition word has more syllables than the maximum number of syllables in the rejection word database, a string of that maximum length is cut out starting from the first character of the recognition word and used instead.
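Rather than enumerating every partial-match similar word, it is often simpler to test each rejection word against the pattern. The sketch below takes that approach. The function names, the default maximum length of four, the requirement that a similar word has the same length as the (truncated) recognition word, and the one-kana-per-syllable simplification are all assumptions for illustration.

```python
def truncate(word, max_len=4):
    """Cut the word to the database's maximum syllable count, from the first character."""
    return word[:max_len]

def is_partial_match(candidate, word, fixed_len):
    """True if `candidate` shares a run of `fixed_len` syllables with `word` at the same position."""
    if len(candidate) != len(word):
        return False
    return any(
        candidate[i:i + fixed_len] == word[i:i + fixed_len]
        for i in range(len(word) - fixed_len + 1)
    )

def remove_partial_matches(rejection_words, word, fixed_len, max_len=4):
    """Drop every rejection word that is a partial-match similar word of `word`."""
    key = truncate(word, max_len)
    return {r for r in rejection_words if not is_partial_match(r, key, fixed_len)}

print(truncate("こんにちは"))                       # こんにち
print(is_partial_match("けんあい", "けんかい", 2))   # True: "けん" is shared
print(is_partial_match("あんあい", "けんかい", 2))   # False: no shared run of two
print(is_partial_match("あんあい", "けんかい", 1))   # True: "ん" is shared
```

The last two calls show the effect of the fixed length: with one character fixed, far more rejection words count as similar and are removed than with two characters fixed.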

[0045] The shorter the fixed partial syllable length, the more similar words are generated and therefore the more rejection vocabulary can be removed, but rejection performance degrades. Conversely, a longer partial syllable length yields fewer similar words and makes rejection too strong. The fixed partial syllable length is best chosen according to the size of the recognition vocabulary.

[0046] Acoustic similar words and partial-match similar words can also be generated in combination.

[0047] Next, a third embodiment of the present invention will be described with reference to FIG. 3. The noise rejection apparatus for speech recognition of the third embodiment comprises, in addition to the configuration of the second embodiment shown in FIG. 2, a correct speech input unit 8 that inputs to the speech recognition unit 2 correct speech in which a recognition word is correctly uttered, and a rejection word determination unit 9 that obtains from the speech recognition result output unit 3 the recognition result for the correct speech and, if that result is a rejection word, outputs it to the rejection word removal unit.

[0048] The third embodiment removes, on the basis of users' speech, rejection words that cannot be handled by removal based on the recognition words alone. The correct speech input unit 8 plays back prerecorded speech for which the correct answer was obtained under speech recognition restricted to the recognition words only (this is called correct speech). The recognition result for this correct speech is passed from the speech recognition result output unit 3 to the rejection word determination unit 9. If the result is not the correct answer and has instead been misrecognized as vocabulary contained in the rejection word storage unit 42, the result is passed to the rejection word removal unit 6. Following the flowchart shown in FIG. 9, the rejection word removal unit 6 removes the recognition words, their similar words, and the recognition result from the rejection word database 5 and stores what remains in the rejection word storage unit 42. In practice, correct speech must be played back continuously and every misrecognized rejection word removed, so the rejection word determination unit 9 must keep all rejection words removed so far and pass them, together with the newly removed rejection word, to the rejection word removal unit 6.
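The repeated recognize, remove, and rebuild cycle can be sketched as a loop. The recognizer is passed in as a callable because the source does not specify an engine; `prune_with_correct_speech`, `toy_recognize`, and the representation of a recording as a ranked candidate list are all hypothetical stand-ins for this example.

```python
def prune_with_correct_speech(recognition_words, rejection_words, recordings, recognize):
    """Remove every rejection word that any correct-speech recording is misrecognized as."""
    removed = set()  # all rejection words removed so far (held by unit 9 in the text)
    for audio in recordings:
        while True:
            active = set(rejection_words) - removed
            # Rebuild the dictionary without the removed words and recognize again.
            result = recognize(audio, set(recognition_words) | active)
            if result not in active:
                break            # no longer misrecognized as a rejection word
            removed.add(result)
    return set(rejection_words) - removed, removed

def toy_recognize(audio, dictionary):
    """Toy stand-in for an engine: `audio` is a list of candidates, best match first."""
    for candidate in audio:
        if candidate in dictionary:
            return candidate
    return None

recording = ["こんいち", "こんいひ", "ほんいち", "こんにちは"]
remaining, removed = prune_with_correct_speech(
    {"こんにちは"},
    {"こんいち", "こんいひ", "ほんいち", "ああああ"},
    [recording],
    toy_recognize,
)
print(sorted(removed))    # the three misrecognized rejection words
print(sorted(remaining))  # ['ああああ']
```

The inner loop always terminates because each pass either stops or shrinks the set of active rejection words by one.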

[0049] For speaker-dependent speech recognition, correct speech from that specific speaker suffices. For speaker-independent speech recognition, it is desirable to use correct speech from a large number of people, balanced across age and gender.

[0050] Next, a fourth embodiment of the present invention will be described with reference to FIG. 4.

[0051] The noise rejection apparatus for speech recognition of the fourth embodiment is characterized in that, in the configuration of the third embodiment shown in FIG. 3, the rejection word determination unit 9 outputs the recognition result not to the rejection word removal unit 6 but to the similar word generation unit 7, which generates the similar words of the rejection word to be removed and then outputs them to the rejection word removal unit 6.

[0052] Removing in advance the vocabulary similar to a misrecognized rejection word reduces the effort of removing rejection words through correct speech input. On the other hand, it may also remove rejection words that would not interfere with correct speech input, weakening rejection performance.

[0053] The technique of the fourth embodiment is considered effective when removing rejection words using correct speech, as in the third embodiment, would take an enormous amount of time.

[0054] Next, a fifth embodiment of the present invention will be described with reference to FIG. 5.

[0055] In the noise rejection apparatus for speech recognition of the fifth embodiment, in addition to the configuration of the third or fourth embodiment, the rejection word storage unit 42 consists of an utterance rejection word storage unit 421 and a noise rejection word storage unit 422. The apparatus further comprises a noise rejection word database 10 that stores rejection vocabulary corresponding to surrounding voices, machine noise, and environmental noise, and a rejection word determination unit 11 that stores each rejection word passed from the rejection word removal unit 6 in the noise rejection word storage unit 422 if it is contained in the noise rejection word database 10, and in the utterance rejection word storage unit 421 if it is not.

[0056] The noise rejection word database 10 is built by inputting prerecorded noise audio to a speech recognition dictionary covering the entire vocabulary of the rejection word database, and collecting and storing the recognition results. The input noise audio should preferably contain a variety of noise sources, chosen with the intended recognition environment in mind.

[0057] The noise audio does not include speech in which the user utters something other than a recognition word. The rejection word determination unit 11 checks whether each rejection vocabulary item passed from the rejection word removal unit 6 is contained in the noise rejection word database 10, and stores it in the noise rejection word storage unit 422 if it is, or in the utterance rejection word storage unit 421 if it is not.
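The sorting performed by the rejection word determination unit 11 is a membership test against the noise rejection word database. A minimal sketch, assuming sets of strings and a hypothetical function name `split_rejection_words`:

```python
def split_rejection_words(rejection_words, noise_rejection_db):
    """Return (utterance_rejections, noise_rejections) for the given rejection words."""
    noise = {w for w in rejection_words if w in noise_rejection_db}  # goes to unit 422
    utterance = set(rejection_words) - noise                        # goes to unit 421
    return utterance, noise

# Hypothetical data: words that recorded noise was recognized as.
noise_db = {"ああ", "うう", "かた"}
utterance_words, noise_words = split_rejection_words({"ああ", "こんいち", "かた"}, noise_db)
print(sorted(noise_words))      # ['ああ', 'かた']
print(sorted(utterance_words))  # ['こんいち']
```

Words in the noise database that were already removed as similar words of a recognition word never reach this step, so they end up in neither storage unit.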

[0058] Sorting the rejection words in this way makes most noise audio likely to be recognized as vocabulary in the noise rejection word storage unit, and erroneous utterances other than recognition words likely to be recognized as vocabulary in the utterance rejection word storage unit.

[0059]

[Effects of the invention] The present invention removes, from a huge rejection word database built in advance using a syllable net, the vocabulary that interferes with the recognition words needed for speech recognition, and incorporates the resulting rejection words into the speech recognition dictionary. This makes it possible to reject noise and utterances other than the recognition words, and to return a correct recognition result only when a recognition word is correctly uttered.

[0060] Moreover, by removing rejection words based on the results of recognizing correct speech, a noise rejection dictionary can be generated that is adapted not only to speech recognition interfaces that assume clear, close-range speech but also to the actual way speech recognition is used.

[0061] In addition, dividing the rejection words into noise rejection words and utterance rejection words makes it possible to distinguish noise speech from erroneous utterances other than the recognition words. As a result, an interface can be built that ignores noise speech and, for erroneous utterances, informs the user accordingly.
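
One way such an interface could dispatch on the recognition result is sketched below. The `Action` names and the prompt text are assumptions made for the example; the source only states that noise is ignored and the user is informed of erroneous utterances.

```python
from enum import Enum
from typing import Optional, Set, Tuple


class Action(Enum):
    ACCEPT = "accept"  # a recognition word was uttered
    IGNORE = "ignore"  # result is a noise rejection word
    PROMPT = "prompt"  # result is an utterance rejection word


def handle_result(
    word: str,
    recognition_words: Set[str],
    utterance_rejection_words: Set[str],
    noise_rejection_words: Set[str],
) -> Tuple[Action, Optional[str]]:
    """Decide how the interface reacts to one recognition result."""
    if word in recognition_words:
        return Action.ACCEPT, word
    if word in noise_rejection_words:
        return Action.IGNORE, None
    if word in utterance_rejection_words:
        return Action.PROMPT, "That word is not available. Please try again."
    raise ValueError(f"{word!r} is not in the recognition dictionary")


result = handle_result("suu", {"denki"}, {"akari"}, {"suu", "ban"})
```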

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a first embodiment of the present invention.

FIG. 2 is a configuration diagram of a second embodiment of the present invention.

FIG. 3 is a configuration diagram of a third embodiment of the present invention.

FIG. 4 is a configuration diagram of a fourth embodiment of the present invention.

FIG. 5 is a configuration diagram of a fifth embodiment of the present invention.

FIG. 6 is a diagram showing an image of the speech recognition space.

FIG. 7 is a diagram showing an image of the speech recognition space after the rejection dictionary has been constructed.

FIG. 8 is a diagram showing an example of the noise rejection database.

FIG. 9 is a flowchart showing the operation of the rejection word removal unit.

[Explanation of symbols]

1 voice input unit
2 speech recognition unit
3 speech recognition result output unit
4 recognition dictionary unit
41 recognition word storage unit
42 rejection word storage unit
421 utterance rejection word storage unit
422 noise rejection word storage unit
5 rejection word database
6 rejection word removal unit
7 similar word generation unit
8 correct speech input unit
9 rejection word determination unit
10 noise rejection word database
11 rejection word determination unit

Claims

[Claims]

1. A recognition dictionary creation device for a speech recognition system having voice input means for receiving speech and outputting a voice input signal, speech recognition means for outputting a speech recognition result based on the voice input signal, speech recognition result output means for outputting the speech recognition result, and a recognition dictionary used for speech recognition by the speech recognition means, wherein the recognition dictionary comprises recognition word storage means for storing recognition words and rejection word storage means for storing rejection words used to reject invalid utterances or noise, the device comprising: a rejection word database that comprehensively stores rejection words; and rejection word removal means that removes from the rejection word database any vocabulary matching the recognition vocabulary passed from the recognition word storage means and stores the remaining vocabulary in the rejection word storage means.

2. The recognition dictionary creation device according to claim 1, further comprising similar word generation means that generates words similar to the recognition words passed from the recognition word storage means and outputs the similar words to the rejection word removal means.

3. The recognition dictionary creation device according to claim 2, further comprising: correct speech input means that inputs, to the speech recognition means, correct speech in which a recognition word is uttered correctly; and rejection word determination means that acquires from the speech recognition result output means the recognition result obtained when the correct speech is input and, when that recognition result is a rejection word, outputs the recognition result to the rejection word removal means.

4. The recognition dictionary creation device according to claim 3, wherein the rejection word determination means outputs the recognition result to the similar word generation means instead of the rejection word removal means, and similar words of the recognition result are output to the rejection word removal means.

5. The recognition dictionary creation device according to claim 3, wherein the rejection word storage means comprises utterance rejection word storage means and noise rejection word storage means, the device further comprising: a noise rejection word database that stores rejection vocabulary corresponding to surrounding speech, machine noise, and environmental noise; and rejection word determination means that stores a rejection word passed from the rejection word removal means in the noise rejection word storage means if the word is included in the noise rejection word database, and in the utterance rejection word storage means if it is not.

6. A recognition dictionary used for speech recognition, comprising a recognition word storage unit that stores recognition words and a rejection word storage unit that stores vocabulary for rejecting invalid utterances or noise, wherein the rejection word storage unit includes an utterance rejection word storage area that stores vocabulary for rejecting invalid utterances and a noise rejection word storage area that stores vocabulary for rejecting environmental noise.

7. A method for generating a rejection dictionary, comprising the steps of: removing from a rejection word database rejection words that match the recognition vocabulary; and removing rejection vocabulary that matches vocabulary similar to the recognition vocabulary.

8. A method for generating a rejection dictionary, comprising the steps of: removing from a rejection word database rejection words that match the recognition vocabulary; removing rejection vocabulary that matches vocabulary similar to the recognition vocabulary; and removing rejection vocabulary using utterance data of a plurality of the recognition vocabulary.