JPH02149900A

JPH02149900A - Voice recognizing and answering device

Info

Publication number: JPH02149900A
Application number: JP63303518A
Authority: JP
Inventors: Takanori Murata; 村田　隆憲; Katsumi Takahashi; 勝美高橋; Yoshinao Umezawa; 梅澤　義尚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1988-11-30
Filing date: 1988-11-30
Publication date: 1990-06-08

Abstract

PURPOSE: To improve service to users by providing a continuous utterance detection means that detects that the user has spoken continuously and, in response, sends out an utterance instruction signal prompting the user to correct the utterance. CONSTITUTION: Regardless of whether the user speaks continuously, a voice section detection unit 123 detects the voice length of every input voice signal supplied to a voice recognition unit 160. An excessive utterance length detection unit 127 judges whether the detected voice length is excessive and, if so, outputs an excessive utterance length signal. A continuous utterance detection unit 151, for its part, judges whether the voice was uttered continuously and, if so, outputs a candidate signal for continuous utterance. Only when the excessive utterance length signal and the candidate signal are input at the same time is continuous utterance reliably decided; a continuous utterance determination unit 152, and hence the continuous utterance detection means 150, then immediately sends out the utterance instruction signal prompting the user to correct the utterance. Consequently, service to users is improved.

Description

[Detailed Description of the Invention]
(Industrial Field of Application) This invention relates to a voice recognition response device, and in particular to a voice recognition response device that reduces misrecognition caused by faulty utterance on the part of the user.

(Prior Art) Voice recognition response devices operating over telephone lines have now been introduced in the financial industry, the distribution industry, and elsewhere, and provide users with services such as balance inquiry and order entry. In these services, the user enters a personal identification number (PIN), membership number, service code, or the like by voice from a telephone or similar terminal.

Before the present invention is explained, a conventional voice recognition response device is briefly described.

FIG. 2 is an explanatory diagram of a conventional voice recognition response device, showing each component as a block diagram.

In the figure, reference numeral 10 denotes the voice recognition response device. Voice uttered by the user into the transmitter of a telephone, which constitutes the external voice input unit 20, is supplied to it through the exchange 30 as an input voice signal. Reference numeral 40 denotes a host computer constituting the external processing device. In response to signals that the voice recognition response device 10 outputs in reaction to the input voice signal, the host computer controls the components of the voice recognition response device 10 and outputs control signals to other external equipment.

The voice recognition response device 10 comprises: a line control unit 110, which is connected to the exchange 30 and handles the interface with the public line or the like; a voice recognition unit 120, which recognizes the voice uttered by the user from the input voice signal sent by the line control unit 110; a main control unit 130, which controls the components of the voice recognition response device 10 in response to the output of the voice recognition unit 120 and/or the output of the host computer 40; and a voice synthesis unit 140, which outputs synthesized voice as instructed by the main control unit 130.

The voice recognition unit 120 in turn comprises: a cue sound generator 121, which, prior to recognition, causes the receiver of the telephone 20 to emit a cue sound prompting the user to speak; a voice analysis unit 122, which analyzes the voice the user utters in response to the cue sound and extracts feature parameters; a voice section detection unit 123, which cuts out the voiced portion of the input and determines the voice section (voice length); a voice dictionary unit 124, which stores, in readable form, the feature parameters that serve as the standard pattern of each voice; a matching calculation unit 125, which compares the feature parameters of the detected voice section with each set of feature parameters stored in the voice dictionary unit 124 and calculates the distance (dissimilarity) between them; a decision unit 126, which, based on the distance obtained for each category by the matching calculation unit 125, takes the category with the smallest distance as the recognition result; and an excessive utterance length detection unit 127, which detects excessive utterance length when the input voice length detected by the voice section detection unit 123 exceeds a predetermined threshold Tth. The threshold Tth, referred to here as the excessive utterance length threshold, is the maximum voice length at which one uttered character (one digit, in the case of numerals) can still be recognized correctly. Because this configuration of the voice recognition response device 10 is conventional, further details are omitted.
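The matching and decision steps performed by the matching calculation unit 125 and the decision unit 126 can be sketched as follows. This is only a minimal illustration, not the device's actual implementation: the patent does not specify the feature parameters or the distance measure, so the two-dimensional feature vectors, the Euclidean distance, and the names `DICTIONARY`, `distance`, and `recognize` are all assumptions made for this example.

```python
import math

# Hypothetical voice dictionary (unit 124): one standard pattern per category.
# Real feature parameters would be far richer; these vectors are invented.
DICTIONARY = {
    "0": (0.0, 1.0),
    "1": (1.0, 0.0),
    "2": (1.0, 1.0),
}


def distance(a, b):
    """Dissimilarity between two feature vectors (Euclidean, as an assumption)."""
    return math.dist(a, b)


def recognize(features):
    """Return the category whose standard pattern is closest to the input."""
    return min(
        DICTIONARY,
        key=lambda category: distance(features, DICTIONARY[category]),
    )
```

Calling `recognize((0.9, 0.1))` selects the category whose stored pattern lies nearest to the input vector, which mirrors the "smallest distance wins" rule described for the decision unit 126.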

Next, an example of normal operation of the conventional voice recognition response device shown in FIG. 2 is briefly described with reference to Appended Tables I and II. As Appended Table I shows, once the device has started, a command from the host computer 40 starts the voice synthesis unit 140 through the main control unit 130 of the voice recognition response device 10. The voice synthesis unit 140 then issues an utterance instruction (guidance) in synthesized voice, such as "Please say your PIN," from the receiver of the telephone 20 via the line control unit 110 and the exchange 30. The voice the user utters in response travels from the transmitter through the exchange 30 and the line control unit 110 and enters the voice recognition unit 120 as an input voice signal. The voice recognition unit 120 recognizes the voice, and the recognition result is confirmed. The result is then sent through the main control unit 130 to the host computer 40, where it is used as a signal for executing the required external processing.

In voice recognition, the cue sound generator 121 sends a "beep" cue sound to the receiver of the telephone 20, after which the recognition operation starts. The input voice is recognized one uttered character per cue sound, that is, one digit at a time in the case of numerals.

However, even though the device announces before voice input, "From now on, please speak one character at a time (or one digit at a time, for numbers) after each beep," users often speak continuously out of carelessness. Appended Table II shows an example of such faulty operation. This is especially likely with users who are using the device 10 for the first time. The device is designed on the premise that an unnatural, drawn-out pronunciation of, for example, the digit "1" makes the input voice too long and misrecognition more likely. To prevent this, it rejects the input for excessive utterance length and prompts the user to speak again with guidance such as "Your input could not be confirmed. Please enter it again."

(Problem to Be Solved by the Invention) The conventional device detects excessive utterance length but does not determine whether it was caused by continuous utterance. Users therefore do not realize that speaking continuously is the problem and often do so again, and the device may eventually have to break off the session. The result is poorer service to users.

The object of this invention is to provide a voice recognition response device that, when a user carelessly enters voice input by continuous utterance, informs the user of the faulty utterance before the device is finally forced to break off the session, so that subsequent operation of the device remains normal and service to users is improved.

(Means for Solving the Problem) To achieve this object, the voice recognition response device of this invention is characterized in that, in addition to the conventional functions, it is provided with continuous utterance detection means that detects that the user has uttered continuously and, in response to this detection, sends out an utterance instruction signal prompting the user to correct the utterance.

In practicing this invention, the continuous utterance detection means preferably consists of a continuous utterance detection unit and a continuous utterance determination unit. In that case, the continuous utterance detection unit is preferably provided in the voice recognition unit and configured to output a candidate signal indicating continuous utterance when the voice length of the input voice is equal to or greater than a predetermined continuous utterance threshold. The continuous utterance determination unit is preferably provided in the main control unit and configured to produce the logical AND of the candidate signal and the excessive utterance length signal, which the voice recognition unit outputs when the voice length of the input voice is equal to or greater than a predetermined excessive utterance length threshold.

(Operation) With this configuration, regardless of whether the user has uttered continuously, the voice section detection unit detects the voice length of every input voice signal supplied to the voice recognition unit of the voice recognition response device. On the one hand, the excessive utterance length detection unit judges whether the detected voice length is excessive and, if so, outputs an excessive utterance length signal. On the other hand, the continuous utterance detection unit, which forms one part of the continuous utterance detection means, makes a tentative judgment as to whether the utterance is continuous and, if it tentatively judges so, outputs a candidate signal for continuous utterance. The continuous utterance determination unit, which is provided in the main control unit and forms the remaining part of the continuous utterance detection means, decides that the utterance was indeed continuous, rather than a voice prolonged deliberately or unconsciously by the user, only when the excessive utterance length signal and the candidate signal are input at the same time. In response to this decision, the continuous utterance determination unit, and hence the continuous utterance detection means, immediately sends out an utterance instruction signal prompting the user to correct the utterance.

Thus, when the continuous utterance detection means determines that the user's voice was uttered continuously, it immediately informs the user of this and sends out guidance prompting the user to correct the way of speaking. Subsequent voice input can therefore proceed smoothly, and service to users is improved.

(Embodiment) An embodiment of the voice recognition response device of this invention is described below with reference to the drawings.

FIG. 1 is a block diagram for explaining one embodiment of the voice recognition response device of this invention. Components identical to those shown in FIG. 2 are given the same reference numerals, and their detailed description is omitted, except where specifically noted.

According to this invention, the voice recognition response device is provided with continuous utterance detection means 150. The voice recognition response device of this invention is denoted by reference numeral 100. The continuous utterance detection means 150 may have any configuration as long as it detects that the user has uttered continuously and, in response, sends out an utterance instruction signal prompting the user to correct the utterance. Preferably, however, it consists of a continuous utterance detection unit 151 and a continuous utterance determination unit 152. In this embodiment, the continuous utterance detection unit 151 is added to the conventional components of the voice recognition unit, which is therefore denoted by reference numeral 160. Likewise, the continuous utterance determination unit 152 is added to the conventional components (not shown) of the main control unit, which is therefore denoted by reference numeral 170.

The continuous utterance detection unit 151 can be configured to output a candidate signal indicating tentatively that the utterance is continuous when it determines that the voice length detected from the input voice signal by the voice section detection unit 123 is equal to or greater than a predetermined threshold Tthc (referred to here as the continuous utterance threshold), for example by comparing the input voice length with the threshold Tthc.

The threshold Tthc in this case is given by the following equation.

$$T_{thc} = n \times t_A \tag{1}$$

Here n is the number of characters to be input (the number of digits, in the case of numerals). It is fixed according to the purpose for which the device 100 is used, is stored, for example on a command from the host computer 40, in any suitable memory (not shown) provided in the main control unit 170, and is read out as required. tA is the statistically obtained average minimum voice length of one character (one digit); it is stored in advance for each character (each digit) in any suitable memory and is likewise read out as required.

Accordingly, when the device 100 is used to verify a PIN, n is normally 5 digits, and the threshold Tthc for each digit can be calculated in advance from the minimum voice length of each of the digits 0 through 9 and stored in readable form at any suitable location in the voice recognition response device 100, for example in any suitable memory (not shown) provided in the continuous utterance detection unit 151.

The continuous utterance determination unit 152, meanwhile, receives the excessive utterance length signal, which the excessive utterance length detection unit 127 outputs when it judges that the voice length of the utterance corresponding to the input voice signal is excessive, and the candidate signal from the continuous utterance detection unit 151 described above. From these it reliably determines that the excessive voice length results from continuous utterance rather than from the user drawing out the utterance. The continuous utterance determination unit 152 is preferably an AND circuit that outputs the logical AND of the excessive utterance length signal and the candidate signal. With such a circuit, an AND signal giving a reliable determination of continuous utterance is obtained only when both signals are input.
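The two comparisons and the AND circuit can be expressed in a few lines. The sketch below is an assumption-laden software analogue of what the patent describes as circuits: the function names are hypothetical, and the comparisons use "greater than or equal to" as in the embodiment.

```python
def excessive_length_signal(voice_length, t_th):
    """Unit 127: asserted when the voice length reaches the threshold Tth."""
    return voice_length >= t_th


def candidate_signal(voice_length, t_thc):
    """Unit 151: asserted when the voice length reaches the threshold Tthc."""
    return voice_length >= t_thc


def continuous_utterance_decided(voice_length, t_th, t_thc):
    """Unit 152: logical AND of the two signals above."""
    return excessive_length_signal(voice_length, t_th) and candidate_signal(
        voice_length, t_thc
    )
```

With illustrative thresholds `t_th=800` and `t_thc=1000`, a voice length of 1500 asserts both signals and the decision is true, whereas a length of 900 asserts only the excessive length signal and the decision is false.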

This AND signal is output to the voice synthesis unit 140 as the utterance instruction signal prompting correction of the utterance. In response, the voice synthesis unit 140 can inform the user through the receiver of the telephone 20 that the latest utterance was continuous and can give guidance for speaking again.

Next, an example of the operation of the voice recognition response device of this invention shown in FIG. 1 is described with reference to the appended table. The example concerns the utterance of a five-digit PIN.

When the device 100 is set to the operating state, the continuous utterance threshold Tthc obtained from equation (1) is stored, on a command from the host computer 40 and via the main control unit 170, in, for example, a memory (not shown) provided in the continuous utterance detection unit 151.

First, on a command from the main control unit 170, the voice synthesis unit 140 sends guidance to the receiver of the telephone 20, telling the user, "From now on, please speak one digit at a time after each beep." It then instructs, "Please say your PIN," and the cue sound generator 121 emits a beep. If the user responds by uttering "hachi nana yon zero go" (8 7 4 0 5) continuously, an input voice signal enters the voice recognition response device 100 from the transmitter of the telephone 20, and the voice section detection unit 123 detects the voice length L. The excessive utterance length detection unit 127 and the continuous utterance detection unit 151 compare L with their respective thresholds Tth and Tthc. If L ≥ Tth, the excessive utterance length detection unit 127 produces the excessive utterance length signal; if L ≥ Tthc, the continuous utterance detection unit 151 produces the candidate signal. Both signals are input to the continuous utterance determination unit 152. When both are detected and the continuous utterance determination unit 152 produces their AND signal, that signal indicates the decision that the user's latest utterance was continuous. The main control unit 170 therefore sends a command to the voice synthesis unit 140, using the AND signal directly or converted into any other suitable signal, and the voice synthesis unit 140 outputs synthesized guidance to the telephone receiver such as, "Your input could not be confirmed because you spoke continuously. Please speak one digit at a time after each beep, like this: beep, 'ichi'; beep, 'ni'." This alerts the user to the way of speaking.
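The choice of guidance in this operating example can be summarized as a small decision routine. The sketch below is a hypothetical software rendering, not the patented circuitry: the `Action` enum, `choose_action`, and the `GUIDANCE` table are names introduced here, and the middle branch assumes that an input that is excessively long but not a continuous utterance candidate still receives the conventional rejection guidance, which the text implies by retaining the conventional functions but does not spell out.

```python
from enum import Enum


class Action(Enum):
    RECOGNIZE = "recognize"                  # pass the input on to recognition
    REJECT_TOO_LONG = "reject_too_long"      # conventional rejection (assumed)
    REJECT_CONTINUOUS = "reject_continuous"  # continuous utterance decided


# Guidance wording paraphrased from the description.
GUIDANCE = {
    Action.REJECT_TOO_LONG: (
        "Your input could not be confirmed. Please enter it again."
    ),
    Action.REJECT_CONTINUOUS: (
        "Your input could not be confirmed because you spoke continuously. "
        "Please speak one digit at a time after each beep."
    ),
}


def choose_action(voice_length, t_th, t_thc):
    """Pick the device's reaction from the voice length L and both thresholds."""
    excessive = voice_length >= t_th    # signal from unit 127
    candidate = voice_length >= t_thc   # signal from unit 151
    if excessive and candidate:         # AND signal from unit 152
        return Action.REJECT_CONTINUOUS
    if excessive:
        return Action.REJECT_TOO_LONG
    return Action.RECOGNIZE
```

For example, with placeholder thresholds `t_th=800` and `t_thc=1000`, `choose_action(1500, 800, 1000)` yields `Action.REJECT_CONTINUOUS`, and the matching entry in `GUIDANCE` is the text handed to the voice synthesis unit.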

Following this instruction, the user correctly utters the digits one at a time, each after a beep from the cue sound generator 121: beep, "hachi"; beep, "nana"; beep, "yon"; beep, "zero"; beep, "go". Once the digits have been uttered correctly in this way, the device 100 operates as a conventional device does and sends the user the message, "Your PIN is 8, 7, 4, 0, 5, correct?" After this message and a further beep, the user need only answer "hai" (yes).

(Effects of the Invention) As is clear from the above description, the voice recognition response device of this invention detects continuous utterance caused by the user's carelessness and can therefore immediately inform the user of the faulty utterance and how to correct it. This makes the voice recognition response device easier to use and improves service to users.

Appended Table I: Example of normal operation of the voice recognition response device
Appended Table II: Example of faulty operation of the voice recognition response device

[Brief explanation of the drawing]

FIG. 1 is a block diagram for explaining an embodiment of the voice recognition response device of this invention, and FIG. 2 is a block diagram for explaining a conventional voice recognition response device.

20 ... external voice input unit (for example, a telephone)
30 ... exchange
40 ... external processing device (for example, a host computer)
100 ... voice recognition response device
110 ... line control unit
121 ... cue sound generator
122 ... voice analysis unit
123 ... voice section detection unit
124 ... voice dictionary unit
125 ... matching calculation unit
126 ... decision unit
127 ... excessive utterance length detection unit
140 ... voice synthesis unit
150 ... continuous utterance detection means
151 ... continuous utterance detection unit
152 ... continuous utterance determination unit
160 ... voice recognition unit
170 ... main control unit

Patent applicant: Oki Electric Industry Co., Ltd.
Agent: Patent Attorney Takashi Ogaki

Claims

[Claims]

(1) A voice recognition response device comprising a voice recognition unit that recognizes, character by character, voice input from an external voice input unit, a voice synthesis unit that outputs synthesized voice based on voice data for giving utterance instructions to a user, and a main control unit, the device being configured to send out, prior to voice recognition, a cue sound prompting the user to speak, to recognize the uttered voice one character per cue sound, and to output the recognition result to an external processing device, characterized in that the device is provided with continuous utterance detection means that detects that the user has uttered continuously and, in response to the detection, sends out an utterance instruction signal prompting the user to correct the utterance.

(2) The voice recognition response device according to claim 1, wherein the continuous utterance detection means comprises: a continuous utterance detection unit, provided in the voice recognition unit, that outputs a candidate signal indicating continuous utterance when the voice length of the input voice is equal to or greater than a predetermined continuous utterance threshold; and a continuous utterance determination unit, provided in the main control unit, that produces the logical AND of the candidate signal and an excessive utterance length signal output from the voice recognition unit when the voice length of the input voice is equal to or greater than a predetermined excessive utterance length threshold.