JP3112037B2

JP3112037B2 - Voice recognition device

Info

Publication number: JP3112037B2
Application number: JP03298253A
Authority: JP
Inventors: 哲也室井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-10-17
Filing date: 1991-10-17
Publication date: 2000-11-27
Anticipated expiration: 2015-11-27
Also published as: JPH05108091A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application]

The present invention relates to a speech recognition apparatus that recognizes input speech.

【０００２】[0002]

[Prior Art]

In general, a speech recognition apparatus collates the feature pattern of the input speech with various standard patterns registered in advance in a dictionary or the like, and selects the standard patterns that resemble the feature pattern of the input speech as candidates (recognition results). Of the selected candidates, the first candidate, which has the highest similarity, is basically chosen as the final recognition result. Conventional apparatuses go further: even when a first candidate with the highest similarity has been obtained, if the recognition reliability of that candidate is low, the apparatus either outputs a rejection as the final recognition result, or asks the user to confirm the recognition result and outputs only confirmed results as correct. Adding such a function prevents erroneous recognition when an acoustic signal other than the speech to be recognized is input, or when the input speech is unstable.

【０００３】[0003]

[Problems to be Solved by the Invention]

Thus, in a conventional speech recognition apparatus, when the recognition reliability of the obtained candidate is lower than, for example, a predetermined threshold, functions such as rejection can effectively prevent erroneous recognition. On the other hand, because these functions are added, the speech for certain specific words tends to be rejected even when it is input, so a correct recognition result cannot be obtained immediately. For example, when the recognition target vocabulary contains acoustically similar words, the recognition reliability is low. Repeating the utterance any number of times does not remove the cause of the low reliability, so the repeated word tends to be rejected just as often, and a correct recognition result cannot be obtained quickly.

[0004] The present invention solves this drawback of the prior art. Its object is to provide a speech recognition apparatus with which, when a recognition result is not adopted and the user repeats the utterance, a correct recognition result is easily obtained for the repeated word and can be obtained early.

【０００５】[0005]

[Means for Solving the Problems]

To achieve the above object, the invention according to claim 1 comprises voice input means for inputting speech, recognition means for recognizing the input speech, and determination means for determining whether or not to adopt the recognition result obtained by the recognition means. When the determination means does not adopt the recognition result, the recognition means recognizes the next input speech with a changed recognition target vocabulary, and in doing so sets the changed recognition target vocabulary to the top candidates of the previous recognition result.

【０００６】[0006]

[0007] The invention according to claim 2 comprises voice input means for inputting speech, recognition means for recognizing the input speech, and determination means for determining whether or not to adopt the recognition result obtained by the recognition means. When the determination means does not adopt the recognition result, the recognition means recognizes the next input speech with a changed recognition target vocabulary, and in doing so sets the changed recognition target vocabulary to the top candidates of the previous recognition result and the synonyms of those top candidates.

[0008] The invention according to claim 3 comprises voice input means for inputting speech, recognition means for recognizing the input speech, and determination means for determining whether or not to adopt the recognition result. When the determination means does not adopt the recognition result, the recognition means obtains an integrated score by combining the score of the recognition result for the next input speech with the score of the previous recognition result.

[0009] In the invention according to claim 4, when the determination means does not adopt the recognition result, the recognition means combines the scores of the same recognition target word into the integrated score.

[0010] In the invention according to claim 5, when the determination means does not adopt the recognition result, the recognition means combines the scores of synonyms into the integrated score.

【００１１】[0011]

[Operation]

In the invention according to claim 1, when a recognition result is not adopted and the user repeats the utterance, the repeated utterance is recognized with a changed recognition target vocabulary. Setting the changed vocabulary to the top candidates of the previous recognition result reduces the number of words to be recognized in the subsequent recognition process.

【００１２】[0012]

[0013] In the invention according to claim 2, the recognition target vocabulary for the repeated utterance is set to the top candidates of the previous recognition result and their synonyms. Consequently, when the previous utterance could not be recognized, correct recognition is possible even if the user repeats the utterance using a different word.

[0014] In the inventions according to claims 3 and 4, when a recognition result is not adopted and the user repeats the utterance, the repeated utterance is recognized by combining the previous score with the current score.

[0015] Furthermore, as in the invention according to claim 5, combining the previous score across synonyms yields still higher recognition performance.

【００１６】[0016]

[Embodiments]

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of a speech recognition apparatus according to the present invention. The apparatus of FIG. 1 has a voice input unit 1 for inputting speech; a recognition unit 2 that collates the feature pattern of the input speech with various standard patterns registered in advance and selects, as candidates (recognition results), the standard patterns that resemble the feature pattern of the input speech; and a determination unit 3 that basically selects the candidate with the highest similarity as the final recognition result, but also takes the recognition reliability of that candidate into account when determining whether or not to adopt it as the final recognition result.

[0017] Various recognition methods can be used in the recognition unit 2. For example, the recognition unit 2 prepares one word standard pattern for each word in the recognition target vocabulary, uses DP matching to obtain the distance between the feature pattern of the input speech and each standard pattern as the similarity value (score) of that word, and selects candidates up to a predetermined rank in order of similarity value. Because the score in this example is a distance, a smaller score indicates a closer match.
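The DP matching step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify the feature representation, the local distance, or the path constraints, so the sketch assumes each pattern is a list of equal-length numeric frames, uses Euclidean distance between frames, and allows the three basic step directions. The function names `dp_matching_distance` and `rank_candidates` are hypothetical.

```python
import math


def dp_matching_distance(input_pattern, standard_pattern):
    """Accumulated DP-matching distance between two feature sequences."""
    n, m = len(input_pattern), len(standard_pattern)
    inf = float("inf")
    # cost[i][j]: best accumulated distance when the first i input frames
    # are aligned with the first j frames of the standard pattern.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance: Euclidean distance between frames.
            local = math.dist(input_pattern[i - 1], standard_pattern[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def rank_candidates(input_pattern, standard_patterns, top_n=3):
    """Return the top_n (word, score) pairs, smallest distance first."""
    scores = [
        (word, dp_matching_distance(input_pattern, pattern))
        for word, pattern in standard_patterns.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])[:top_n]
```

Here `standard_patterns` is assumed to map each vocabulary word to its single standard pattern, matching the one-pattern-per-word arrangement in the text.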

[0018] Various methods can likewise be used to obtain the recognition reliability in the determination unit 3. For example, the difference or the ratio between the similarity value R(1) of the first candidate, which has the highest similarity, and the similarity value R(2) of the second candidate that follows it can be used as the recognition reliability S of the first candidate. For example, the recognition reliability S is obtained from the difference between the similarity values R(1) and R(2), as in the following equation.

【００１９】[0019]

[Equation 1] S = R(2) - R(1)

[0020] The determination unit 3 then determines whether or not to adopt the first candidate as the recognition result according to whether the recognition reliability S is greater than, for example, a threshold TH. If the candidate is not adopted, the user is prompted to speak again, that is, to repeat the utterance.
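The reliability test in Equation 1 and the threshold decision can be sketched as below, assuming the candidates arrive as (word, score) pairs sorted best-first with distance scores, so that R(2) - R(1) is non-negative. The function names are hypothetical. In the example data, the score 31 for "Kinou" and the reliability S = 2 are the values the text gives for FIG. 4(a); the assignment of 33 to "Kinyou" follows from S = 2 only if "Kinyou" is the second candidate, and the score 40 for "Kyou" is purely illustrative.

```python
def recognition_confidence(ranked):
    """S = R(2) - R(1) for candidates sorted best-first by distance score."""
    if len(ranked) < 2:
        raise ValueError("need at least two candidates")
    return ranked[1][1] - ranked[0][1]


def decide(ranked, threshold):
    """Return the first candidate's word if S > threshold, else None.

    None stands for "not adopted": the caller should prompt the user
    to repeat the utterance.
    """
    if recognition_confidence(ranked) > threshold:
        return ranked[0][0]
    return None


# Example data; only 31 and S = 2 come from the text, the rest is assumed.
FIRST_PASS = [("Kinou", 31), ("Kinyou", 33), ("Kyou", 40)]
```

With TH = 5, as in the later worked example, `decide(FIRST_PASS, 5)` returns `None`, so the user would be asked to speak again.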

[0021] In the first embodiment of the present invention, when the determination unit 3 does not adopt the first candidate as the recognition result and prompts the user to repeat the utterance, and the user speaks again, the recognition unit 2 recognizes this next input speech with a changed recognition target vocabulary. Specifically, the recognition unit 2 sets the changed recognition target vocabulary either to the top candidates of the previous recognition result, or to those top candidates together with their synonyms, and then performs recognition on the next input speech.

[0022] The operation of the speech recognition apparatus of the first embodiment is described first for the case where the recognition unit 2 sets the changed recognition target vocabulary to the top candidates of the previous recognition result. Suppose the user inputs speech, the recognition unit 2 recognizes it and obtains candidates from first place down to a predetermined rank, and the determination unit 3 finds that the recognition reliability S of the first candidate is smaller than the threshold TH. The current recognition result is then presented, for example as shown in FIG. 2, with a message to the effect of "I: (first-place word), Ro: (second-place word), Ha: (third-place word). Please answer with I, Ro, or Ha", where I, Ro, and Ha (イ, ロ, ハ) are ordering labels comparable to A, B, and C. The message is shown on a display or given by synthesized speech to prompt the user to repeat the utterance.

[0023] When the user speaks again and this speech is sent to the recognition unit 2, the recognition unit 2 takes the top candidates of the above recognition result (for example, first through third place, that is, I, Ro, and Ha) as the recognition target vocabulary and performs recognition on the repeated speech input. The determination unit 3 obtains the recognition reliability S of the first candidate from that recognition result and, when S is greater than the threshold TH, takes this first candidate as the final recognition result.
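The retry flow of the first embodiment can be sketched as follows. To keep the example self-contained, each utterance is represented by a dictionary of word-to-distance scores standing in for the output of the matcher; all scores, the word "Kinu", and the function names are hypothetical and chosen only to show how restricting the vocabulary can raise the reliability S.

```python
def rank(scores, vocabulary, top_n=3):
    """Rank the words in vocabulary by distance score, smallest first."""
    pairs = [(word, scores[word]) for word in vocabulary]
    return sorted(pairs, key=lambda pair: pair[1])[:top_n]


def confidence(ranked):
    """S = R(2) - R(1) for a best-first ranking."""
    return ranked[1][1] - ranked[0][1]


def recognize_with_retry(first, second, vocabulary, threshold, top_n=3):
    """Recognize the first utterance; on rejection, retry on the second."""
    ranked = rank(first, vocabulary, top_n)
    if confidence(ranked) > threshold:
        return ranked[0][0]
    # Not adopted: the repeated utterance is matched only against the
    # top candidates of the previous result.
    reduced = [word for word, _ in ranked]
    ranked = rank(second, reduced, top_n)
    if confidence(ranked) > threshold:
        return ranked[0][0]
    return None


# Hypothetical scores for two utterances of the same word.
VOCABULARY = ["Kinou", "Kinyou", "Kyou", "Kinu"]
FIRST = {"Kinou": 31, "Kinyou": 33, "Kyou": 40, "Kinu": 50}
SECOND = {"Kinou": 28, "Kinyou": 41, "Kyou": 47, "Kinu": 30}
```

In this made-up data, the second utterance scored against the full vocabulary would again give S = 2 because "Kinu" is a close competitor, whereas scoring it against the three previous top candidates gives S = 13, which exceeds a threshold of 5.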

[0024] Thus, even when the first utterance of a particular word does not yield a final recognition result, the user is made to repeat the word and the repeated input is recognized with a changed recognition target vocabulary. This raises the likelihood that the recognition reliability S of the first candidate exceeds the threshold TH, making a correct recognition result easier to obtain.

[0025] Next, the operation is described for the case where the recognition unit 2 sets the changed recognition target vocabulary to the top candidates of the previous recognition result and the synonyms of those candidates. The changed vocabulary is extended to synonyms of the top candidates because a user may conclude that the first utterance was not recognized because the word was not in the recognition target vocabulary, and may therefore rephrase using a different word with the same meaning.

[0026] With the recognition unit 2 configured in this way, suppose that recognition of the first utterance yields candidates down to a predetermined rank as shown, for example, in FIG. 3(a). The repeated utterance is then recognized using, for example, the top three candidates in FIG. 3(a), "Kinou" (yesterday), "Kinyou" (Friday), and "Kyou" (today), as the recognition target vocabulary, together with their synonyms. For example, if today is Tuesday, "Getsuyoubi" (Monday) as a synonym of "Kinou" and "Kayoubi" (Tuesday) as a synonym of "Kyou" are also added, giving the recognition target vocabulary shown in FIG. 3(b). As a result, when the first utterance does not yield a final recognition result, a correct recognition result is easily obtained even if the user speaks again using a different word with the same meaning.

[0027] The synonyms may be registered in advance, or may be changed dynamically according to the operating state of the apparatus. In addition, when the recognition result is presented as in FIG. 2, the labels "I", "Ro", and "Ha" may be recognized as synonyms of "Kinou", "Kinyou", and "Kyou", respectively.
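The synonym handling above can be sketched as below, assuming the dynamic case in which synonyms depend on the current day of the week. The table layout, the weekday index convention (0 = Monday), and the function names are assumptions made for illustration; the text only states that synonyms may be pre-registered or changed dynamically.

```python
WEEKDAYS = [
    "Getsuyoubi",  # Monday
    "Kayoubi",     # Tuesday
    "Suiyoubi",    # Wednesday
    "Mokuyoubi",   # Thursday
    "Kinyoubi",    # Friday
    "Doyoubi",     # Saturday
    "Nichiyoubi",  # Sunday
]


def dynamic_synonyms(today_index):
    """Synonyms that depend on the current day (0 = Monday ... 6 = Sunday)."""
    return {
        "Kyou": [WEEKDAYS[today_index]],             # today
        "Kinou": [WEEKDAYS[(today_index - 1) % 7]],  # yesterday
    }


def expand_vocabulary(top_candidates, synonyms):
    """Top candidates followed by their synonyms, without duplicates."""
    vocabulary = []
    for word in top_candidates:
        for entry in [word] + synonyms.get(word, []):
            if entry not in vocabulary:
                vocabulary.append(entry)
    return vocabulary
```

For a Tuesday, `expand_vocabulary(["Kinou", "Kinyou", "Kyou"], dynamic_synonyms(1))` adds "Getsuyoubi" and "Kayoubi" to the three top candidates, matching the example in the text.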

[0028] In the second embodiment of the present invention, when the determination unit 3 does not adopt the first candidate as the recognition result and prompts the user to repeat the utterance, and the user speaks again, the recognition unit 2 combines the similarity value (score) of each candidate obtained in the current recognition with the similarity value (score) of each candidate obtained for the next input speech to obtain an integrated similarity value (score). Specifically, the recognition unit 2 either combines the scores of the same recognition target word, or combines the scores of synonyms, into the integrated score, and uses it to recognize the next input speech.

[0029] The operation of the second embodiment is described first for the case where the recognition unit 2 combines the similarity values (scores) of the same recognition target word. Suppose the user inputs speech, the recognition unit 2 recognizes it and obtains candidates from first place down to a predetermined rank, and the determination unit 3 finds that the recognition reliability S of the first candidate is smaller than the threshold TH. The current recognition result is then retained, for example as in FIG. 4(a). In the example of FIG. 4(a), the recognition reliability S of the first candidate "Kinou" is 2 according to Equation 1, which is smaller than a threshold TH of, say, 5. Therefore, for the candidates from first place down to a predetermined rank, each candidate name is retained together with the distance between that candidate's standard pattern and the feature pattern of the input speech, that is, its similarity value (score).

[0030] Next, as in the first embodiment, the user is prompted to repeat the utterance. When the user speaks again, the input speech is recognized with only the top candidates as the recognition target vocabulary. Suppose this gives the result shown, for example, in FIG. 4(b). The recognition unit 2 then combines, for each of the top candidates (for example, the top three), the similarity value (score) for the first utterance shown in FIG. 4(a) with the similarity value (score) for the second utterance shown in FIG. 4(b), for example by adding them. The integrated similarity value (score) of each candidate is then as shown in FIG. 4(c). The determination unit 3 calculates the recognition reliability 2S of the first candidate from the integrated scores, sets the threshold to, for example, twice TH, and determines whether or not to adopt the first candidate as the recognition result according to whether 2S is greater than the threshold 2TH. In the example of FIG. 4(c), the recognition reliability 2S of the first candidate "Kinou" is 17 according to Equation 1, which is greater than the threshold 2TH of 10, so the first candidate "Kinou" is obtained as the final recognition result.
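The additive integration just described can be sketched as follows. The figures themselves are not reproduced in the text, so most of the per-word scores below are hypothetical: only the first-utterance score 31 for "Kinou", S = 2, TH = 5, and the resulting 2S = 17 are stated, and the remaining values were chosen so that the sketch reproduces 2S = 17. The function names are also assumptions.

```python
def integrate_scores(first, second, candidates):
    """Add the first- and second-utterance distance scores per word."""
    return {word: first[word] + second[word] for word in candidates}


def decide_integrated(first, second, candidates, threshold):
    """Return (adopted word or None, reliability 2S) from integrated scores."""
    integrated = integrate_scores(first, second, candidates)
    ranked = sorted(integrated.items(), key=lambda pair: pair[1])
    reliability = ranked[1][1] - ranked[0][1]
    # Two scores were summed, so the threshold is doubled as in the text.
    adopted = ranked[0][0] if reliability > 2 * threshold else None
    return adopted, reliability


# Hypothetical stand-ins for FIG. 4(a) and FIG. 4(b).
FIG_4A = {"Kinou": 31, "Kinyou": 33, "Kyou": 40}
FIG_4B = {"Kinou": 30, "Kinyou": 45, "Kyou": 50}
```

With these stand-in values the integrated scores are 61, 78, and 90, so 2S = 17 exceeds 2TH = 10 and "Kinou" is adopted.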

[0031] Thus, even when the first utterance does not yield a final recognition result, the user is made to repeat the word, recognition is performed on the second utterance, and the first and second similarity values (scores) are combined. This makes a correct recognition result easier to obtain and improves recognition accuracy.

[0032] Next, the operation is described for the case where the recognition unit 2 combines the similarity values (scores) of synonyms into the integrated score. Suppose that, as in the example above, the recognition result for the first utterance is retained as in FIG. 4(a). When the user speaks again, the repeated utterance is recognized with the top candidates and their synonyms as the recognition target vocabulary. For example, the top three candidates in FIG. 4(a), "Kinou", "Kinyou", and "Kyou", are used as the recognition target vocabulary, and, if today is Tuesday, "Getsuyoubi" as a synonym of "Kinou" and "Kayoubi" as a synonym of "Kyou" are added. Suppose recognition over this vocabulary gives the result shown, for example, in FIG. 5. The recognition unit 2 then calculates an integrated similarity value (score) by combining the score for the first utterance in FIG. 4(a) with the score for the second utterance in FIG. 5.

[0033] Here, for a pair of synonyms such as "Kinou" and "Getsuyou", the one with the smaller similarity value (score), that is, the smaller distance, is used for integration. Specifically, in FIG. 5, "Kinou" has a smaller score than "Getsuyou", so the score of "Kinou", 64, is used. Adding it to the score of "Kinou" in FIG. 4(a), 31, gives an integrated score of 95 for "Kinou, Getsuyou". In the same way, an integrated score of 86 is obtained for "Kinyou" and 64 for "Kyou, Kayou". The first candidate is then the one with the smallest integrated score, "Kyou, Kayou". Its recognition reliability 2S is 22 according to Equation 1, which is greater than the threshold 2TH of 10, so the first candidate "Kyou, Kayou" is obtained as the final recognition result.
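The synonym-based integration can be sketched as below. The scores 31 and 64 for "Kinou", the integrated totals 95, 86, and 64, and the reliability 22 are taken from the text; the individual values for "Getsuyou", "Kinyou", "Kyou", and "Kayou" are not given there and are hypothetical values chosen to reproduce those totals. The function names and the grouping table are assumptions.

```python
def integrate_with_synonyms(first, second, groups):
    """Integrate scores per synonym group.

    groups maps each first-utterance candidate to the words (itself plus
    synonyms) scored for the second utterance. The smaller second score,
    that is, the smaller distance, is added to the first score.
    """
    integrated = {}
    for word, members in groups.items():
        best_second = min(second[member] for member in members)
        integrated[", ".join(members)] = first[word] + best_second
    return integrated


def first_candidate(integrated, threshold):
    """Return (adopted group or None, reliability) for best-first ranking."""
    ranked = sorted(integrated.items(), key=lambda pair: pair[1])
    reliability = ranked[1][1] - ranked[0][1]
    adopted = ranked[0][0] if reliability > threshold else None
    return adopted, reliability


# Stand-ins for FIG. 4(a) and FIG. 5; only 31 and 64 for "Kinou" are stated.
FIG_4A = {"Kinou": 31, "Kinyou": 33, "Kyou": 40}
FIG_5 = {"Kinou": 64, "Getsuyou": 70, "Kinyou": 53, "Kyou": 60, "Kayou": 24}
GROUPS = {
    "Kinou": ["Kinou", "Getsuyou"],
    "Kinyou": ["Kinyou"],
    "Kyou": ["Kyou", "Kayou"],
}
```

Calling `first_candidate(integrate_with_synonyms(FIG_4A, FIG_5, GROUPS), 10)` with the doubled threshold 2TH = 10 selects "Kyou, Kayou" with a reliability of 22, in line with the worked example.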

[0034] Thus, by combining the first and second similarity values (scores) across synonyms as well, a correct recognition result becomes still easier to obtain and recognition accuracy is further improved.

【００３５】[0035]

[Effects of the Invention]

As described above, in the invention according to claim 1, when a recognition result is not adopted and the user repeats the utterance, the repeated utterance is recognized with a changed recognition target vocabulary, so a correct recognition result is easily obtained and can be obtained early. In particular, setting the changed vocabulary to the top candidates of the previous recognition result reduces the number of words to be recognized in the subsequent recognition process, which makes a correct recognition result easier to obtain.

【００３６】[0036]

[0037] In the invention according to claim 2, the recognition target vocabulary for the repeated utterance is set to the top candidates of the previous recognition result and their synonyms. Consequently, when the previous utterance could not be recognized, correct recognition is possible even if the user repeats the utterance using a different word.

[0038] In the inventions according to claims 3 and 4, when a recognition result is not adopted and the user repeats the utterance, the repeated utterance is recognized by combining the previous score with the current score, so a correct recognition result is easily obtained and can be obtained early.

[0039] In the invention according to claim 5, when the user repeats the utterance, the repeated utterance is recognized with the top candidates of the previous recognition result and their synonyms as the recognition target vocabulary. Consequently, when the previous utterance could not be recognized, the word is recognized correctly even if the user repeats the utterance using a different word. In addition, combining the previous score across synonyms yields still higher recognition performance.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech recognition apparatus according to the present invention.

FIG. 2 is a diagram showing an example of a message prompting the user to repeat the utterance.

FIGS. 3(a) and 3(b) are diagrams for explaining processing in the first embodiment of the present invention.

FIGS. 4(a), 4(b), and 4(c) are diagrams for explaining processing in the second embodiment of the present invention.

FIG. 5 is a diagram for explaining processing in the second embodiment of the present invention.

[Explanation of symbols]

1: voice input unit, 2: recognition unit, 3: determination unit

Continuation of front page

(58) Fields searched (Int. Cl.⁷, DB name): G10L 15/00 - 17/00

Claims

(57) [Claims]

1. A speech recognition apparatus comprising: voice input means for inputting speech; recognition means for recognizing the input speech; and determination means for determining whether or not to adopt the recognition result obtained by the recognition means, wherein, when the determination means does not adopt the recognition result, the recognition means recognizes the next input speech with a changed recognition target vocabulary, and in doing so sets the changed recognition target vocabulary to the top candidates of the previous recognition result.

2. A speech recognition apparatus comprising: voice input means for inputting speech; recognition means for recognizing the input speech; and determination means for determining whether or not to adopt the recognition result obtained by the recognition means, wherein, when the determination means does not adopt the recognition result, the recognition means recognizes the next input speech with a changed recognition target vocabulary, and in doing so sets the changed recognition target vocabulary to the top candidates of the previous recognition result and the synonyms of those top candidates.

3. A speech recognition apparatus comprising: voice input means for inputting speech; recognition means for recognizing the input speech; and determination means for determining whether or not to adopt the recognition result, wherein, when the determination means does not adopt the recognition result, the recognition means obtains an integrated score by combining the score of the recognition result for the next input speech with the score of the previous recognition result.

4. The speech recognition apparatus according to claim 3, wherein, when the determination means does not adopt the recognition result, the recognition means combines the scores of the same recognition target word into the integrated score.

5. The speech recognition apparatus according to claim 3, wherein the recognition means combines the scores of synonyms into the integrated score.