JPH02103599A

JPH02103599A - Voice recognizing device

Info

Publication number: JPH02103599A
Application number: JP63258266A
Authority: JP
Inventors: Shoji Kuriki; 章次栗木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-10-13
Filing date: 1988-10-13
Publication date: 1990-04-16

Abstract

PURPOSE: To stabilize the user's utterance and improve the recognition rate by stopping the output of a voice response unit when voice to be recognized is detected by a voice section detection unit during a voice response. CONSTITUTION: Voice from a microphone 1 is input to a feature extraction unit 2 and a voice section detection unit 3. The feature extraction unit 2 extracts feature quantities from the input voice, and the voice section detection unit 3 detects the voice section in it. In a recognition unit 4, the feature quantities within the voice section are compared with a voice recognition dictionary 5, and the most similar dictionary word is taken as the correct answer. Meanwhile, a voice response unit 8 outputs voice from voice response data on command from a response control unit 6. The device is configured to stop the output of the voice response unit 8 when voice to be recognized is detected by the voice section detection unit 3 during the voice response. A high recognition rate is thus obtained even if the user starts speaking while the response voice is being output.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a voice recognition device.

Prior Art

Voice response is commonly used as a means for the user to confirm the recognition results of a voice recognition device. Voice response is also used to provide guidance in systems that use a recognition device. These are used as follows. In the case of guidance, the user speaks in accordance with the guidance after the output of the voice response unit has finished. In the case of a voice response that reports a recognition result, the user confirms the output of the voice response unit and then makes the next utterance.

However, as users become accustomed to the system, they begin to speak without listening to the output of the voice response unit to the end. This is because listening to the end means waiting for the response voice to finish, which reduces the number of utterances that can be made and slows down data input by voice. In this case, recognition can still begin, because the recognition unit operates independently of the voice response unit.

However, the user then speaks while still hearing the output of the voice response unit. Because the voice response is output through a headset, handset, or the like, it is output close to the user's ear. In general, both the loudness and the manner of speaking differ between speaking in a quiet place and speaking in a noisy place. Speaking while listening to the output of the voice response unit closely resembles speaking in noisy surroundings: the manner of speaking becomes unstable and the speaking volume rises, with the drawback that the recognition rate deteriorates. In addition, users find it unpleasant to have to hear the output of the voice response unit while they themselves are speaking.

Purpose

The present invention was made in view of the circumstances described above, and in particular its purpose is to provide a voice recognition device that can obtain a high recognition rate even if the user starts speaking while the response voice is being output.

Constitution

To achieve the above purpose, the present invention is a voice recognition device having a microphone that picks up voice, a feature extraction unit that extracts features of the voice, voice section detection means that detects voice sections, a voice recognition dictionary, a recognition unit that compares the input voice with the voice recognition dictionary and outputs the most similar dictionary entry as the correct answer, a voice response unit, a voice response data unit, a voice output unit, a response control unit that controls the operation of the voice response unit, and a timer unit, characterized in that the output of the voice response unit is stopped when voice to be recognized is detected by the voice section detection unit during a voice response. The invention is described below on the basis of embodiments.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a feature extraction unit, 3 is a voice section detection unit, 4 is a recognition unit, 5 is a voice dictionary, 6 is a response control unit, 7 is a voice response data unit, 8 is a voice response unit, 9 is a voice output unit, and 10 is a speaker. Voice input from the microphone 1 is fed to the feature extraction unit 2 and the voice section detection unit 3.

The feature extraction unit 2 extracts feature quantities from the input voice. The voice section detection unit 3 detects voice sections in the input voice. In the recognition unit 4, the feature quantities within a voice section are compared with the voice recognition dictionary 5, and the most similar dictionary word is taken as the correct answer. Meanwhile, the voice response unit 8 outputs voice from the voice response data 7 on command from the response control unit 6.
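The matching step in the recognition unit 4 can be pictured as a nearest-template search. The following is a minimal sketch, not the method of the patent: the fixed-length feature vectors, the Euclidean distance, and the names `distance`, `recognize`, and `dictionary` are all assumptions made for illustration. The source states only that the most similar dictionary word is chosen.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, dictionary):
    """Return the dictionary word whose stored template is closest to `features`."""
    return min(dictionary, key=lambda word: distance(features, dictionary[word]))


# Hypothetical two-word dictionary with made-up three-dimensional templates.
dictionary = {"yes": [1.0, 0.0, 0.5], "no": [0.0, 1.0, 0.2]}
result = recognize([0.9, 0.1, 0.4], dictionary)
```

A practical recognizer would compare variable-length feature sequences rather than single vectors, but the selection rule (smallest distance wins) is the same idea.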

Next, the relationship between the response output operation and the recognition operation is described, taking recognition with guidance as the example. At the start of operation, the response control unit 6 gives the voice response unit 8 a command to output guidance (command signal b to the voice response unit). On this command, the voice response unit 8 outputs the guidance as voice. At the same time, the recognition unit 4 starts detecting voice sections (voice section detection signal). If the user speaks after the voice response output has finished, only the normal recognition operation is needed. If the user speaks during the voice response, however, the device operates as follows. Because the voice section detection unit 3 is already able to detect voice sections, the voice section of the utterance is detected. On sensing that a voice section has been detected, the response control unit 6 stops the operation of the voice response unit 8. The speaker can then speak normally without hearing the voice response output.

Next, once an utterance has been recognized, the recognition result is generally output from the voice response unit 8 to inform the user of it. Here, the user can sometimes tell whether the recognition result is right or wrong without listening to the whole response output. In that case, the user makes the next utterance as soon as this is clear, since doing so speeds up data input by voice.

Even in this state, a voice section can be detected provided the device is ready for recognition immediately after the recognition result is output. If a voice section is detected, the response control unit 6 therefore stops the response output, allowing the user to speak properly.
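The interaction between the response control unit 6 and the voice response unit 8 described above might be sketched as follows. This is an illustrative model only: the class names, method names, and the guidance text are hypothetical, and real audio output is replaced by a simple `playing` flag.

```python
class VoiceResponseUnit:
    """Stand-in for voice response unit 8: only tracks whether output is playing."""

    def __init__(self):
        self.playing = False
        self.message = None

    def start(self, message):
        self.message = message
        self.playing = True

    def stop(self):
        self.playing = False


class ResponseController:
    """Stand-in for response control unit 6."""

    def __init__(self, response_unit):
        self.response_unit = response_unit

    def play_guidance(self, message):
        # Corresponds to the command that tells the response unit to output guidance.
        self.response_unit.start(message)

    def on_voice_section_detected(self):
        """Stop the response if one is in progress; report whether it was interrupted."""
        if self.response_unit.playing:
            self.response_unit.stop()
            return True
        return False


unit = VoiceResponseUnit()
controller = ResponseController(unit)
controller.play_guidance("Please say the item number")  # made-up guidance text
interrupted = controller.on_voice_section_detected()    # user speaks mid-response
```

The same handler serves both cases in the text: guidance output and the spoken recognition result are each cut off as soon as the detector reports a voice section.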

When operating as described above, the recognition unit must always be waiting for an utterance. If the microphone picks up noise, however, the response output is stopped by mistake, which is inconvenient. Therefore, as shown in FIG. 2(a), the voice response output is stopped only if the voice section, once detected (time t1 in the figure), continues for a certain time (T in the figure). A value of about 150 ms is appropriate for T, because words are generally not shorter than that, while noise often is.

FIG. 3 is a block diagram for explaining an embodiment of the voice recognition device according to the present invention that operates as described above. In the figure, 11 is a timer, and parts that function in the same way as in the embodiment shown in FIG. 1 are given the same reference numerals as in FIG. 1. As in the operation of the embodiment shown in FIG. 1, when a voice section is detected during voice response output, the response control unit uses the signal from the timer to determine whether the voice section has been detected continuously for a certain time. If the voice section signal lasts less than the time T, as shown in FIG. 2(b), the output of the voice response unit continues. If it lasts longer than T, as in FIG. 2(a), the output of the response unit is stopped (time t2 in the figure). In this way, noise no longer stops the response output.
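The timer check can be illustrated with a frame-based sketch. Everything here beyond the 150 ms value of T is an assumption: the 10 ms frame period, the per-frame boolean flags standing in for the voice section signal, the function name `stop_time_ms`, and the choice to stop as soon as the run reaches T rather than strictly exceeds it.

```python
def stop_time_ms(voice_flags, frame_ms=10, hold_ms=150):
    """Return the time (ms) at which response output would be stopped, or None.

    voice_flags: per-frame booleans from the voice section detector,
    sampled while the response is playing (assumed representation).
    hold_ms: the time T a voice section must persist before output stops.
    """
    run_ms = 0  # duration of the current uninterrupted voice section
    for index, is_voice in enumerate(voice_flags):
        run_ms = run_ms + frame_ms if is_voice else 0
        if run_ms >= hold_ms:
            # End of the frame that completes T (t2 in FIG. 2(a)).
            return (index + 1) * frame_ms
    return None


# Case like FIG. 2(b): an 80 ms noise burst, shorter than T.
noise = [False] * 5 + [True] * 8 + [False] * 20
# Case like FIG. 2(a): a 400 ms utterance starting at 50 ms.
speech = [False] * 5 + [True] * 40

noise_stop = stop_time_ms(noise)    # None: the response keeps playing
speech_stop = stop_time_ms(speech)  # 200: 50 ms onset plus T = 150 ms
```

Resetting the run length whenever a non-voice frame appears is what makes short noise bursts harmless: they never accumulate to T.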

Effect

As is clear from the above description, in the voice recognition device of claim 1, the response output stops when the user starts speaking during a voice response, so the utterance is stable and the recognition rate improves. In the voice recognition device of claim 2, voice responses are no longer stopped by mistake because of ambient noise, and the user can hear stable voice response output.

[Brief Description of the Drawings]

FIG. 1 is a block diagram for explaining one embodiment of the voice recognition device according to the present invention, FIG. 2 is a time chart for explaining the operation of the present invention, and FIG. 3 is a block diagram for explaining another embodiment of the present invention.

1: microphone, 2: feature extraction unit, 3: voice section detection unit, 4: recognition unit, 5: voice dictionary, 6: response control unit, 7: voice response data unit, 8: voice response unit, 9: voice output unit, 10: speaker, 11: timer unit.

Claims

[Scope of Claims]

1. A voice recognition device having a microphone that picks up voice, a feature extraction unit that extracts features of the voice, voice section detection means that detects voice sections, a voice recognition dictionary, a recognition unit that compares the input voice with the voice recognition dictionary and outputs the most similar dictionary entry as the correct answer, a voice response unit, a voice response data unit, a voice output unit, a response control unit that controls the operation of the voice response unit, and a timer unit, characterized in that the output of the voice response unit is stopped when voice to be recognized is detected by the voice section detection unit during a voice response.

2. A voice recognition device having a microphone that picks up voice, a feature extraction unit that extracts features of the voice, a voice section detection unit that detects voice sections, a voice recognition dictionary, a recognition unit that compares the input voice with the voice recognition dictionary and outputs the most similar dictionary entry as the correct answer, a voice response unit, a voice response data unit, a voice output unit, a response control unit that controls the operation of the voice response unit, and a timer unit, characterized in that the output of the voice response unit is stopped when voice to be recognized is detected continuously for a certain time by the voice section detection unit during a voice response.