JPH06100959B2

JPH06100959B2 - Voice interaction device

Info

Publication number: JPH06100959B2
Application number: JP18022685A
Authority: JP
Inventors: 一男住田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-08-16
Filing date: 1985-08-16
Publication date: 1994-12-12
Anticipated expiration: 2009-12-12
Also published as: JPS6240577A

Description

[Technical Field of the Invention] The present invention relates to a voice interaction device that responds by voice to a user's voice input.

[Technical Background of the Invention] In recent years, voice interaction devices have come into widespread use for tasks such as bank balance inquiries. Such a device comprises, among other components, a voice recognition unit that recognizes a voice input signal from the user; a conversation control unit that generates a predetermined message in response to the recognized voice, or an instruction message for the user; and a voice response unit that synthesizes the generated message as a voice signal and sends it to the user. The voice synthesized by the voice response unit is sent to the user in groups, each group being a run of words delimited as, for example, one phrase, one word, or one sentence.

[Problems of the Background Art] In a conventional voice interaction device, however, whether the device outputs voice in response to a user's command or question or generates voice for its own purposes, once voice output begins it continues to the end. If the user misses part of the output, that part cannot be heard again.

Voice interaction devices that allow the output to be heard again have therefore been devised. In these devices, however, the user must wait for the current voice output to finish and then enter a command requesting that the voice be repeated. This takes time, and the user has to listen to the same voice output twice.

[Object of the Invention] An object of the present invention is to solve the above problems by providing a voice interaction device that lets the user have the output repeated quickly.

[Summary of the Invention] The present invention is a voice interaction device comprising a voice recognition unit that recognizes voice sent from a user and a voice response unit that outputs a message to the user word group by word group. The voice recognition unit is provided with a specific word recognition circuit that recognizes specific words, and the voice response unit is provided with an output circuit that, when a specific word is recognized during message output, outputs the message again starting from the word group that was being output when the specific word was spoken. This allows the user to have the output repeated quickly.

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a voice interaction device according to an embodiment of the present invention. The device comprises a voice recognition unit 1, a conversation control unit 3, a voice response unit 5, and a task interface unit 7.

The voice recognition unit 1 and the conversation control unit 3 are connected by a connection line 9, a data line 11, and an interrupt signal line 13. The voice response unit 5 and the conversation control unit 3 are connected by a connection line 15, a data line 17, and an interrupt signal line 19. The voice recognition unit 1 and the voice response unit 5 are connected by a connection line 21. The conversation control unit 3 and the task interface unit 7 are connected by connection lines 23 and 25.

The voice recognition unit 1 recognizes a voice input signal from the user. When the user provides voice input, it sends an interrupt signal to the conversation control unit 3 over the interrupt signal line 13. The voice recognition unit 1 is also provided with a specific word recognition circuit (not shown) that recognizes specific words. This circuit consists of a first memory (not shown) that stores specific words such as "Eh?" ("えっ") and "What did you say?" ("なんですって"), and a comparison circuit (not shown) that compares the recognized input signal, converted into an electrical signal, with the contents of the first memory. If the user's voice input is "What did you say?", the voice recognition unit 1 sends the conversation control unit 3 information that this phrase is a specific word; if the voice input is not a specific word, it sends information to that effect.
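
The first memory and the comparison circuit can be pictured in software as a stored set of words and a membership test. The following is a minimal sketch under that assumption. The patent describes hardware circuits and gives no code, so the names `SPECIFIC_WORDS` and `is_specific_word` are hypothetical, and the sketch assumes the recognizer has already converted the input into a text string.

```python
# Hypothetical software analogue of the specific word recognition circuit.

# "First memory": the stored specific words. The two entries are the
# examples quoted in the description; any others could be added.
SPECIFIC_WORDS = frozenset({"えっ", "なんですって"})


def is_specific_word(recognized: str, memory: frozenset = SPECIFIC_WORDS) -> bool:
    """"Comparison circuit": compare the recognized input with the first memory."""
    return recognized.strip() in memory
```

For example, `is_specific_word("なんですって")` is true, while an ordinary answer such as `"はい"` is reported as not being a specific word.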

The voice recognition unit 1 receives, over the connection line 21, a signal indicating that the voice response unit 5 is outputting. While this signal is present, the voice recognition unit 1 recognizes only specific words.

When it detects voice, the voice recognition unit 1 sends an interrupt signal and passes the conversation control unit 3 data indicating that voice has been detected. It then performs specific word recognition, and once it has determined whether the detected word is a specific word, it sends another interrupt signal and passes the conversation control unit 3 data indicating the result.

When the conversation control unit 3 receives an interrupt signal from the voice recognition unit 1 and the interrupt was raised at the moment of voice detection, it sends an interrupt signal to the voice response unit 5 together with a command to suspend voice output. If the voice input recognized by the voice recognition unit 1 is a specific word, the conversation control unit 3 sends a re-synthesis command to the voice response unit 5; if it is not a specific word, it sends a suspension release command.
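
The decision made by the conversation control unit 3 amounts to a three-way mapping from the cause of the interrupt to the command sent to the voice response unit 5. The sketch below illustrates that mapping only. The enum names `Cause` and `Command` and the function `on_recognizer_interrupt` are hypothetical, and the patent does not say how the cause is encoded on the data line.

```python
from enum import Enum, auto


class Cause(Enum):
    """Cause reported by the voice recognition unit (names are assumptions)."""
    VOICE_DETECTED = auto()      # first interrupt: some voice was detected
    SPECIFIC_WORD = auto()       # second interrupt: input was a specific word
    NOT_SPECIFIC_WORD = auto()   # second interrupt: input was something else


class Command(Enum):
    """Command sent to the voice response unit (names are assumptions)."""
    SUSPEND = auto()        # stop after the current word group
    RESYNTHESIZE = auto()   # output again from the interrupted word group
    RELEASE = auto()        # lift the suspension and continue


def on_recognizer_interrupt(cause: Cause) -> Command:
    """Return the command the conversation control unit issues for a cause."""
    if cause is Cause.VOICE_DETECTED:
        return Command.SUSPEND
    if cause is Cause.SPECIFIC_WORD:
        return Command.RESYNTHESIZE
    return Command.RELEASE
```

A single utterance therefore produces two calls: one with `Cause.VOICE_DETECTED` as soon as voice is detected, and a second once recognition has decided whether the word is a specific word.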

The task interface unit 7 receives commands from the conversation control unit 3, performs the corresponding task, and returns the resulting information to the conversation control unit 3.

The voice response unit 5 sends voice messages to the user. The output voice is sent in groups, each group being a run of words delimited as, for example, one phrase, one word, or one sentence.

When the voice response unit 5 receives an interrupt signal and a command to suspend voice output from the conversation control unit 3, it suspends voice output once the word group currently being output has finished.

When a re-synthesis command is received from the conversation control unit 3, the voice response unit 5 synthesizes the message again starting from the word group that was being output when the voice input occurred. When a suspension release command is received, it resumes, synthesizing and outputting the next message.
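
The three commands handled by the voice response unit 5 can be illustrated with a small state model. This is a sketch under simplifying assumptions, not the patent's circuit: one call to `step()` stands for the output of one whole word group, `suspend()` is assumed to arrive while the most recently started word group is still being output, and the class and method names are hypothetical.

```python
class VoiceResponseUnit:
    """Toy model: each call to step() outputs one whole word group."""

    def __init__(self, word_groups):
        self.word_groups = list(word_groups)
        self.next_index = 0     # word group to output next
        self.resume_index = 0   # word group that was being output at the interrupt
        self.suspended = False

    def step(self):
        """Output the next word group, or return None if suspended or finished."""
        if self.suspended or self.next_index >= len(self.word_groups):
            return None
        group = self.word_groups[self.next_index]
        self.next_index += 1
        return group

    def suspend(self):
        """Suspension command: the current word group finishes, then output stops."""
        self.resume_index = max(self.next_index - 1, 0)
        self.suspended = True

    def resynthesize(self):
        """Re-synthesis command: output again from the interrupted word group."""
        self.next_index = self.resume_index
        self.suspended = False

    def release(self):
        """Suspension release command: carry on with the next word group."""
        self.suspended = False
```

With the word groups `["振込人名は", "山本", "和夫です"]` (an assumed segmentation of the example message), calling `suspend()` after the second `step()` and then `resynthesize()` makes the next `step()` return `"山本"` again, whereas `release()` makes it return `"和夫です"`.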

The operation of this embodiment is described next.

FIG. 2 is a time chart of a conversation between the device side S and the user U in this embodiment. It assumes that the user U asks "What did you say?" ("なんですって") just as the device side S has begun to say "The remitter's name is Yamamoto" ("振込人名は山本").

On instruction from the conversation control unit 3, the voice response unit 5 sends the voice "The remitter's name is Yamamoto" to the user, as described above. While the voice response unit 5 is outputting, a signal indicating this is input to the voice recognition unit 1 over the signal line 21, so at this point the voice recognition unit 1 recognizes only the specific words stored internally, such as "What did you say?" and "Eh?".

When voice input occurs at time a, the voice recognition unit 1 sends an interrupt and reports the cause of the interrupt over the data line 9.

When the conversation control unit 3 receives the interrupt signal from the voice recognition unit 1, it sends an interrupt signal to the voice response unit 5 and at the same time sends, over the data line 15, a command to suspend voice output once the phrase currently being output has finished.

Because the voice input occurred at time a in FIG. 2, the voice response unit 5 suspends voice output once it has output the word group "Yamamoto" ("山本").

The voice recognition unit 1 is recognizing only specific words. Because the user U has said "What did you say?", the voice recognition unit 1 recognizes this specific word and informs the conversation control unit 3 over the data line 9 that a specific word has been recognized.

When a specific word has been recognized, the conversation control unit 3 sends a re-synthesis command to the voice response unit 5. On receiving the command, the voice response unit 5 outputs again from the word group that was being output when the interrupt was received. That is, at time S₂ the voice response unit 5 outputs "...is Kazuo Yamamoto" ("山本和夫です").

If the voice input from the user U at time a is not a specific word, the conversation control unit 3 still sends an interrupt signal to the voice response unit 5, and the voice response unit 5 suspends voice output once the word group being output has finished. Because the input is not a specific word, however, at time S₂ the conversation control unit 3 issues a suspension release command to the voice response unit 5, and in response the voice response unit 5 synthesizes and outputs the next word group of the message.
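
The two cases walked through for FIG. 2 can be reproduced as a simple trace. The sketch below is illustrative only: `time_chart` and its parameters are hypothetical names, the word-group segmentation of the example message is an assumption, and the trace lists the user's utterance after the word group it overlaps, although in the time chart the two occur at the same time.

```python
# Stored specific words, taken from the examples in the description.
SPECIFIC_WORDS = frozenset({"えっ", "なんですって"})


def time_chart(word_groups, spoken_during, utterance):
    """Return (speaker, text) pairs: "S" is the device side, "U" is the user.

    spoken_during is the index of the word group being output when the
    user speaks (time a). Only one user utterance is modelled.
    """
    chart = []
    index = 0
    user_has_spoken = False
    while index < len(word_groups):
        # The word group in progress is always output to its end.
        chart.append(("S", word_groups[index]))
        if not user_has_spoken and index == spoken_during:
            user_has_spoken = True
            chart.append(("U", utterance))
            if utterance in SPECIFIC_WORDS:
                continue  # re-synthesis: start again from the same word group
        index += 1  # otherwise (or after release) go on to the next word group
    return chart
```

Calling `time_chart(["振込人名は", "山本", "和夫です"], 1, "なんですって")` yields a trace in which "山本" is output twice, once before and once after the user's utterance. With an utterance that is not a specific word, the trace simply continues with "和夫です".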

Thus, when the user utters a specific word such as "What did you say?" or "Eh?", voice output to the user starts again from the word group that the voice response unit 5 was outputting when the specific word was uttered, so the user can have the output repeated quickly.

The specific words set in the specific word recognition circuit are not limited to "What did you say?" and "Eh?"; various other words can also be set.

[Effects of the Invention] As described in detail above, the present invention lets the user have the output repeated quickly and thus provides an efficient voice interaction device.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a voice interaction device according to an embodiment of the present invention, and FIG. 2 is a time chart showing the operation of that embodiment.

1 ... voice recognition unit
3 ... conversation control unit
5 ... voice response unit

Claims

[Claims]

1. A voice interaction device comprising a voice recognition unit that recognizes voice sent from a user and a voice response unit that outputs a message to the user word group by word group, characterized in that the voice recognition unit is provided with a specific word recognition circuit that recognizes specific words, and the voice response unit is provided with an output circuit that, when a specific word is recognized during message output, outputs the message again starting from the word group that was being output at the time the specific word was input by voice.