JP2003228449A

JP2003228449A - Interactive apparatus and recording medium with recorded program

Info

Publication number: JP2003228449A
Application number: JP2002025830A
Authority: JP
Inventors: Keiko Watanuki; 啓子綿貫
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2002-02-01
Filing date: 2002-02-01
Publication date: 2003-08-15

Abstract

PROBLEM TO BE SOLVED: To provide an interactive apparatus that enables natural dialog between machine and man by reducing the waiting time during the interaction.

SOLUTION: An interactive apparatus 10 is equipped with an audio input unit for inputting a user's voice, an action acquiring unit for acquiring the actions of each body part of the user, an audio processing unit for recognizing the audio signals, an action analyzer for recognizing the actions, a dialog coordinating unit 200 for coordinating dialogs with the user, a dialog administrative unit for administrating the progress of the dialogs and the result determined by a dialog progress determining unit, an audio control unit, an action control unit 107, an audio output unit, and an action output unit. The dialog coordinating unit 200 is equipped with a voice attribute analyzer for analyzing attributes of responding voices on the basis of the inputted voices, an action attribute analyzer for analyzing attributes of responding actions on the basis of the inputted actions, and the dialog progress determining unit, which determines the progress of the dialog on the basis of the attributes of the voices or the actions. The dialog coordinating unit also sends an advance notice of the end of the dialog on the basis of the result determined by the dialog progress determining unit.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION. 1. Field of the Invention. The present invention relates to a dialogue apparatus that realizes natural dialogue between a user and a machine, and to a recording medium on which a dialogue processing program is recorded. More specifically, it relates to a dialogue apparatus for interacting with a user, provided with means for inputting the user's voice and means for detecting the user's movement, and to a recording medium on which a dialogue processing program is recorded.

[0002]

2. Description of the Related Art. In recent years, along with high-performance workstations and personal computers (hereinafter, PCs), storage media such as large-capacity magneto-optical disks have become cheaper, and high-resolution display devices and multimedia-capable peripherals have also dropped sharply in price. In document processing, image data processing, and other fields, data processing functions are required to keep up with the growing volume of information in the data to be processed. A variety of processing devices are therefore being developed that can apply diverse processing to audio and moving images in addition to the processing conventionally applied mainly to characters and numerical values.

[0003] Speech recognition technology has also advanced and is being installed in devices such as television receivers, radio receivers, car navigation systems, mobile phones, and PCs. Speech recognition is usually built in as part of each device. Techniques for recognizing moving images input from a camera are also under development. If multiple input devices, such as a microphone, a camera, a keyboard, a mouse, or a pointing device such as a light pen or tablet, are connected to a computer so that the user can use whichever input device is most convenient at each point, the result is a very easy-to-use interface. An interface that lets the user choose freely among several different input modes and combine them to convey intent to the system is called a multimodal interface.

[0004] Conventionally, dialogue apparatuses that respond to voice input have been considered as an interface between humans and computers. Such an apparatus recognizes speech uttered by a person, changes the internal state of the system accordingly, and produces a predetermined output, thereby attempting to realize a dialogue with the person. In addition, to make dialogue with a computer smoother, output synthesis systems have been proposed in which an animation or the like responds to input speech.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION. However, in interfaces between humans and computers, the predominant form has been one in which the computer takes the initiative in asking questions and the user proceeds with the task by answering passively. As a result, the order of the user's utterances is fixed, and no consideration is given to how easily the user can speak or to the naturalness of the dialogue.

[0006] When a person talks with a computer, the dialogue loses its naturalness if the computer's and the user's utterances are poorly timed. Japanese Unexamined Patent Publication No. H09-218770 describes an apparatus that extracts backchannel (aizuchi) expressions, conveyed by spoken language or non-verbal messages, from the user's input and controls the dialogue accordingly, as well as an apparatus that generates expressions requesting a backchannel from the user. Japanese Unexamined Patent Publication No. S62-40577 describes an apparatus that detects utterances such as "Eh, what was that?" so that the user can ask the computer to repeat itself partway through the computer's utterance. However, neither publication describes a technique for predicting the end of a dialogue from backchannels. Moreover, they describe only backchannel responses arising in a single kind of dialogue between one dialogue apparatus and one person.

[0007] Meanwhile, as services such as consultation counters become mechanized in the future, cases in which multiple people use multiple interactive processing devices will increase. How to handle the waiting queue then becomes an issue if multiple people are to use the multiple devices efficiently. If the next person heads for a freed machine only after seeing that one dialogue has ended and the previous person has left, the time spent walking over is wasted.

[0008] The present invention has been made in view of these problems. Its object is to provide a dialogue apparatus, and a recording medium on which a dialogue processing program is recorded, that can shorten waiting time in dialogue between machines and humans and realize natural dialogue.

[0009]

MEANS FOR SOLVING THE PROBLEMS. The dialogue apparatus of the present invention is a dialogue apparatus that responds to voice input, comprising: voice input means for inputting a user's voice; motion input means for inputting the user's motion; dialogue progress determination means for determining the progress of the dialogue with the user based on the voice input and/or the motion input; and dialogue end notice notification means for issuing a notification to that effect when the dialogue progress determination means predicts the end of the dialogue.

[0010] The dialogue apparatus of the present invention is also a dialogue apparatus that responds to voice input, comprising: voice input means for inputting a user's voice; motion input means for inputting the user's motion; dialogue management means for analyzing the voice input and the motion input and managing the dialogue with the user; and dialogue end guidance means for controlling the apparatus's own voice output and motion output so as to guide the dialogue to an end when the dialogue management means decides to end the dialogue.

[0011] More preferably, the apparatus may further comprise dialogue end guidance means for controlling the voice output and motion output so as to guide the dialogue to an end when the dialogue progress determination means predicts the end of the dialogue. The present invention is further characterized by comprising a plurality of the dialogue apparatuses according to claim 1 or 2, together with dialogue end notice notification means for notifying the outside of the predicted end timing of each of the plurality of dialogue apparatuses along with the ID of that apparatus.
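
The multi-apparatus arrangement described here can be pictured with a small sketch. The names `EndNotice` and `Dispatcher`, the field `seconds_until_free`, and the first-come queue policy are all assumptions made for illustration; the specification states only that each apparatus reports its predicted end timing together with its ID.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class EndNotice:
    """Hypothetical message: which apparatus expects to finish, and how soon."""
    device_id: str
    seconds_until_free: float


class Dispatcher:
    """Hypothetical external receiver that directs waiting users to apparatuses."""

    def __init__(self):
        self.waiting = deque()  # users in arrival order (assumed policy)
        self.assigned = {}      # device_id -> user told to get ready

    def enqueue(self, user):
        self.waiting.append(user)

    def on_end_notice(self, notice):
        # Ignore repeated notices from an apparatus that already has a user
        # lined up, and notices that arrive when nobody is waiting.
        if notice.device_id in self.assigned or not self.waiting:
            return None
        user = self.waiting.popleft()
        self.assigned[notice.device_id] = user
        return user
```

In this sketch the waiting user can start moving toward the apparatus as soon as the notice arrives, rather than after the previous user has left.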

[0012] More preferably, the dialogue progress determination means may determine the progress states of information understanding/information sharing and of dialogue end. In a preferred specific aspect, the voice judged by the voice feature determination means may be a backchannel utterance, and the motion judged by the motion feature determination means may be a backchannel motion.

[0013] More preferably, when the dialogue management means decides to end the dialogue, the dialogue end guidance means may control the voice control means or the motion control means so as to output at least one of a backchannel utterance and a backchannel motion. The motion output may be presented by displaying the apparatus's own motion as video, or by the motion of a robot having a mechanical structure.

[0014] Furthermore, the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to function as a dialogue apparatus that responds to voice input, the apparatus comprising: voice input means for inputting a user's voice; motion input means for inputting the user's motion; dialogue progress determination means for determining the progress of the dialogue with the user based on the voice input and/or the motion input; and dialogue end notice notification means for issuing a notification to that effect when the dialogue progress determination means predicts the end of the dialogue.

[0015] The present invention is also a computer-readable recording medium on which is recorded a program for causing a computer to function as a dialogue apparatus that responds to voice input, the apparatus comprising: voice input means for inputting a user's voice; motion input means for inputting the user's motion; dialogue management means for analyzing the voice input and the motion input and managing the dialogue with the user; and dialogue end guidance means for controlling the voice output and motion output so as to guide the dialogue to an end when the dialogue management means decides to end the dialogue.

[0016]

BEST MODE FOR CARRYING OUT THE INVENTION. Preferred embodiments of the dialogue apparatus of the present invention are described in detail below with reference to the accompanying drawings. First, the basic idea of the present invention is explained.

[0017] The inventor recorded and analyzed actual dialogues between people. For the recording, an optical motion capture system was introduced: in addition to audio and image data, markers attached to the subjects' bodies were captured optically with infrared cameras, and three-dimensional numerical data on their positions were collected. This made it possible to collect multimodal dialogue data in which the image and audio data of the two interlocutors and the marker position information can be analyzed on the same time axis. Analysis of this dialogue data revealed the following.

[0018] (1) When one speaker raises a topic and the other speaker understands the information on that topic (information understanding), or shares the same opinion about the information or expresses emotion (information sharing), backchannel utterances or motions tend to occur during the partner's speech.
(2) When there is nothing more to say about the raised topic and the dialogue is drawing to a close (dialogue end), backchannel motions or utterances come to occur at the end of the partner's utterance or after it (in the partner's speech pause), as if to fill the gap, and the backchannel utterances at that time tend to be nn (んー) or soo (そー).
(3) The backchannel utterances at the time of information understanding or information sharing are most often ah (あっ), aa (あー), nnn (んんん), hee (へー), eh (えっ), ee (ええ), hai (はい), usoo (うそー), or hontoo (ほんとー); in the dialogue-end case these backchannel utterances do not appear.
(4) Backchannel utterances at dialogue end tend to have a smaller rms (root mean square) amplitude than those at information understanding or information sharing.
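
Findings (1) to (4) can be read as a simple decision rule. The sketch below is only an illustration under stated assumptions: the function name, the returned labels, the "undetermined" fallback, and the `quiet_rms` threshold are hypothetical, and the specification does not give a threshold value. Only the two sets of backchannel types come from the findings.

```python
# Backchannel types listed in findings (2) and (3).
END_TYPES = {"んー", "そー"}
UNDERSTANDING_TYPES = {"あっ", "あー", "んんん", "へー", "えっ", "ええ", "はい", "うそー", "ほんとー"}


def classify_backchannel(kind, during_partner_speech, rms, quiet_rms):
    """Map one recognized backchannel to a dialogue stage.

    kind: recognized backchannel type (kana string).
    during_partner_speech: True if it overlapped the partner's utterance,
        False if it came at the utterance end or in the following pause.
    rms: rms amplitude of the backchannel utterance.
    quiet_rms: hypothetical threshold below which an utterance counts as quiet.
    """
    # Findings (1) and (3): these types, uttered while the partner speaks.
    if kind in UNDERSTANDING_TYPES and during_partner_speech:
        return "information understanding/sharing"
    # Findings (2) and (4): gap-filling types, after the utterance, and quiet.
    if kind in END_TYPES and not during_partner_speech and rms < quiet_rms:
        return "dialogue end"
    return "undetermined"
```

A real system would presumably combine this with the motion features as well; the sketch covers the voice side only.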

[0019] Based on the above findings, the present invention is a dialogue apparatus characterized by having the following functions.

[0020] (1) The apparatus recognizes the user's backchannel utterance or motion occurring during the system's voice or motion output, or within several frames after the output (for example, 30 frames = 1 sec). When that utterance or motion is judged to indicate dialogue end, the apparatus gives the next user in line advance notice that the dialogue is ending.
(2) When the system wants to end the dialogue, it outputs a backchannel utterance or backchannel motion that guides the dialogue to an end, at the end of the user's utterance or within several frames after it ends (for example, 30 frames = 1 sec), thereby guiding the dialogue to an end.
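
The two timing conditions can be expressed as frame-index checks. This is a minimal sketch assuming integer frame indices at 30 frames per second; the function and parameter names are hypothetical, and only the 30-frame example window comes from the text.

```python
FRAME_RATE = 30  # frames per second, from the "30 frames = 1 sec" example


def in_response_window(event_frame, output_start, output_end,
                       window_frames=FRAME_RATE):
    """Function (1): did the user's backchannel occur during the system's
    output or within window_frames after the output ended?"""
    return output_start <= event_frame <= output_end + window_frames


def may_emit_closing_backchannel(frame, user_utterance_end, wants_to_end,
                                 window_frames=FRAME_RATE):
    """Function (2): the system may emit an end-inducing backchannel at the
    end of the user's utterance or within window_frames after it."""
    return wants_to_end and (
        user_utterance_end <= frame <= user_utterance_end + window_frames
    )
```

For example, with system output spanning frames 90 to 120, a user backchannel at frame 150 still falls inside the window, while one at frame 151 does not.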

[0021] As a result, in dialogue between a machine and a user, the end of the dialogue can be predicted and the dialogue can be guided to an end. In dialogue between multiple users and multiple systems, users waiting their turn can be told in advance that their turn is near, which eases the frustration of waiting. In addition, by controlling the utterances or motions of the system (for example, CG (computer graphics) or a robot) to guide the dialogue to an end, waiting time can be shortened and smooth dialogue between user and system can be realized.

[0022] FIG. 1 is a block diagram showing the basic configuration of the dialogue apparatus of the first embodiment of the present invention, based on the basic idea above. It is an example in which the dialogue apparatus is applied to an information processing apparatus using a multimodal interface. In FIG. 1, the dialogue apparatus 10 comprises: a voice input unit 101 (voice input means) that inputs a voice signal from the user; a motion input unit 102 (motion input means) that inputs the movement of each part of the user's body; a voice processing unit 103 that performs recognition processing on the input voice signal; a motion processing unit 104 that performs recognition processing on the input movement; a dialogue adjustment unit 200 (including a dialogue progress determination unit 203) that adjusts the dialogue with the user; a dialogue management unit 105 (dialogue management means) that manages the progress of the dialogue based on input from the user and manages the determination results of the dialogue progress determination unit 203; a voice control unit 106 that controls the voice to be output; a motion control unit 107 that controls the motion to be output; a voice output unit 108 that outputs the voice; and a motion output unit 109 that outputs the motion.

[0023] The dialogue adjustment unit 200 is means for determining the progress of the dialogue from the user's backchannels. It comprises: a voice feature determination unit 201 that determines the features of backchannel utterances from the input voice; a backchannel motion feature determination unit 202 (dialogue progress determination means) that determines the features of backchannel motions from the input motion; a dialogue progress determination unit 203 (part of the dialogue progress determination means) that determines the progress state of the dialogue from these voice and/or motion features; an end notice unit 204 that, based on the determination result of the dialogue progress determination unit 203, gives the next user advance notice of the end of the dialogue; and an end notice notification unit 205 that delivers the end notice to the next user. The dialogue adjustment unit 200 outputs the determination result to the dialogue management unit 105. The end notice unit 204 and the end notice notification unit 205 together constitute the dialogue end notice notification means, which issues a notification to that effect when the dialogue progress determination unit 203 predicts the end of the dialogue.

[0024] FIG. 2 shows a concrete system configuration of the dialogue apparatus 10. In FIG. 2, the dialogue apparatus 10 comprises: a CPU 1; a work memory 2 consisting of RAM; an input unit 3 that inputs voice signals and moving-image signals; a database 4 that stores the input signals and the results of computation by the CPU 1; an output unit 5 consisting of a display such as a dot-matrix liquid crystal display (LCD) or of a robot that outputs motion shapes; an external storage device 7 such as an SRAM (static RAM) card that retains written information by power backup, an FD, or a CD-ROM; and an external storage driver 6, which is the reading device for the external storage device 7.

[0025] The CPU 1 is a control unit that controls the entire apparatus, including execution of voice and moving-image processing. In accordance with a built-in system program, it executes the dialogue management program using the work memory 2, which stores the data used in computation, and creates the processing results in the work memory 2.

[0026] The work memory 2 is a so-called working memory that temporarily stores data relating to character display, data used in computation, computation results, and the like. It stores the programs used in the program processing executed by the CPU 1, voice and moving-image processing data, and so on. These programs may instead be stored as system programs in a ROM (not shown). Part of the storage area of the RAM used as the work memory is either backed up by a power supply or formed of non-volatile memory such as EEPROM (electrically erasable programmable ROM) or flash memory, so that setting conditions are retained even after power-off. Various setting data, processing data, and the like are stored in this storage area.

[0027] The output unit 5 is a dot-matrix liquid crystal display (LCD), a robot, or the like, and presents the voice information and motion information stored in the database 4 and the results of computation by the CPU 1 by display or by motion. The robot may be a hardware robot with a mechanical structure, or one synthesized and displayed, for example in CG, on the display screen of the output unit 5.

[0028] The external storage driver 6 is the reading device for the external storage device 8, which stores the dialogue management program. The external storage device 8, such as a memory card, FD, or CD-ROM, is a storage medium on which are recorded a program for realizing the present speaking-right (turn-taking) management function, the voice and moving-image processing programs described later, and the like.

[0029] The operation of the dialogue apparatus configured as above is described below, beginning with the basic operation of the dialogue apparatus 10. In FIG. 1, the voice signal from the user is input through the voice input unit 101, such as a microphone, and the movement of each part of the user's body, including the head, is input through the motion input unit 102, such as a video camera.

[0030] The input voice signal undergoes recognition processing in the voice processing unit 103 and is output to the dialogue management unit 105. The input movement of each part of the user's body undergoes recognition processing in the motion processing unit 104 and is likewise output to the dialogue management unit 105.

[0031] The dialogue management unit 105 manages the progress of the dialogue based on the voice and motion information recognized by the voice processing unit 103 and the motion processing unit 104. It outputs the presence or absence of its own (system) utterance (corresponding to W1 or W2 in FIG. 9, described later) to the dialogue progress determination unit 203, and passes the voice and motion that the system should output to the voice control unit 106 and the motion control unit 107 at appropriate timing.

[0032] The voice control unit 106 and the motion control unit 107 control the voice and the motion, which are presented by the voice output unit 108 and the motion output unit 109. The voice signal recognized by the voice processing unit 103 and the movement recognized by the motion processing unit 104 are also input to the dialogue adjustment unit 200. The voice feature determination unit 201 analyzes backchannel utterances in the input voice, and the backchannel motion feature determination unit 202 measures the magnitude of backchannel motions in the input motion. Based on both determination results, the dialogue progress determination unit 203 then determines whether the dialogue is at the "information understanding/information sharing" stage or the "dialogue end" stage.

[0033] The dialogue progress determination unit 203 outputs the determination result to the dialogue management unit 105 and, when the stage is judged to be "dialogue end", also outputs the result to the end notice unit 204. The end notice unit 204 gives the next user, who is waiting, advance notice of the end of the dialogue through the end notice notification unit 205.
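
The routing of the determination result can be sketched as follows. The class name, the callback parameters, and the stage strings are hypothetical; the sketch shows only the flow stated in the text, in which every result goes to the dialogue management unit 105 and only a "dialogue end" result is additionally forwarded toward the end notice unit 204.

```python
class DialogueProgressUnit:
    """Sketch of unit 203's output routing (names are assumptions)."""

    def __init__(self, report_to_manager, forward_to_end_notice):
        self.report_to_manager = report_to_manager          # stands in for unit 105
        self.forward_to_end_notice = forward_to_end_notice  # stands in for unit 204

    def handle(self, stage):
        # Every determination result is reported to the dialogue manager.
        self.report_to_manager(stage)
        # Only a "dialogue end" result triggers the advance notice path.
        if stage == "dialogue end":
            self.forward_to_end_notice(stage)
```

Passing plain callables keeps the sketch independent of how units 105, 204, and 205 are actually implemented.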

[0034] Next, the operation of the dialogue adjustment unit 200 is described concretely. In the voice feature determination unit 201, the user's voice data input from the voice input unit 101, such as a microphone, is A/D converted. The unit outputs the recognition result (the type of backchannel) for voice data recognized as a backchannel, and measures the rms amplitude (power) of that voice data for each frame (for example, 1/30 sec). Backchannel utterances generally appear as sounds such as ee (ええ), hai (はい), and n (ん). The rms amplitude is the root-mean-square average of the variation in sound pressure of the speech wave, and the loudness of a sound depends on the rms amplitude.
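
The per-frame rms measurement can be sketched as below. This is an illustrative implementation, not the one in the specification: the function name, the use of a plain list of samples, and the choice to drop a trailing partial frame are assumptions. The 1/30 sec frame length follows the example in the text.

```python
import math


def rms_per_frame(samples, sample_rate, frame_rate=30):
    """Return the rms amplitude of each complete frame of `samples`.

    samples: sequence of A/D-converted amplitude values.
    sample_rate: samples per second (assumed known from the A/D converter).
    frame_rate: frames per second; 30 gives the 1/30 sec frames of the text.
    """
    frame_len = sample_rate // frame_rate
    if frame_len <= 0:
        raise ValueError("sample_rate must be at least frame_rate")
    values = []
    # Step through complete frames only; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values
```

For a signal alternating between +1.0 and -1.0, every frame has an rms amplitude of 1.0, whereas a silent frame gives 0.0.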

[0035] In the backchannel motion feature determination unit 202, the magnitude of the user's movement is measured for each frame (for example, 1/30 sec). The movement in question is one that has been input from the motion input unit 102, such as a video camera, A/D converted in the motion processing unit 104, and recognized as a backchannel. The motion input unit 102 may be a system that extracts position information for one or more parts of a body such as a human's, for example a motion capture system; in that case, three-dimensional position information for each body part is input to the motion processing unit 104. A backchannel motion is generally captured as vertical movement of the head (along the Y axis in three-dimensional coordinates).
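
Measuring the size of the head's vertical movement per frame can be sketched as a finite difference over the Y coordinate. The function names and the `min_speed` threshold are hypothetical, and using speed as the measure is an assumption; the text specifies only a per-frame magnitude of the movement.

```python
def vertical_speed_per_frame(head_y, frame_rate=30):
    """Absolute vertical head speed between consecutive frames.

    head_y: head Y coordinate for each frame (e.g. from motion capture).
    Returns one value per frame transition, in units of head_y per second.
    """
    return [abs(b - a) * frame_rate for a, b in zip(head_y, head_y[1:])]


def looks_like_nod(head_y, min_speed, frame_rate=30):
    """Hypothetical test: any vertical speed at or above min_speed counts."""
    return any(v >= min_speed
               for v in vertical_speed_per_frame(head_y, frame_rate))
```

A fuller detector would presumably also check for the down-then-up pattern of a nod; this sketch measures magnitude only.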

[0036] The operation by which the dialogue progress determination unit 203 determines the progress of the dialogue is described below with reference to FIGS. 3 to 6. Analysis of recorded data of actual human dialogues revealed the following tendency. Viewed broadly, a dialogue flows as follows: one speaker raises a topic, the other speaker understands the content of the information under discussion, and the two proceed while sharing the topic. Once there is nothing more to say, both come to respond with backchannels such as nn (んー), soo (そー), and nods, the topic gradually moves toward its end, and the conversation shifts to the next topic. That is, a dialogue can be thought of as proceeding roughly from information understanding to information sharing to topic end.

[0037] FIGS. 3 to 6 show examples of the analysis results for the dialogue data. FIG. 3 shows an example of backchannels at the "information understanding" stage, FIGS. 4 and 5 show examples at the "information sharing" stage, and FIG. 6 shows an example at the "topic end" stage.

[0038] In each figure, (a) shows the voice data (rms, pitch, transcription) and (b) shows the speed of movement of the head (Head). As motion information, the amount of movement of each part, or its acceleration, may be used instead of speed.

【００３９】このように人間同士の対話を分析した結
果、以下の傾向が判明した。（１）一方の話者が話題を提起し、相手話者がその話題
の情報を理解（情報理解）したり、あるいは情報につい
て同じ意見を共有していたり、感情を表出（情報共有）
する場合、相づち音声又は動作が、相手発話中に起こる
傾向がある。例えば、図３（ａ）において、一方の話者
が「ＸＸのー」と発話中に、対話相手は情報理解して
「あっ」という相づち音声を表出すると共に、図３
（ｂ）に示すように頭部の動き（相づち動作）が生じ
る。さらに、一方の話者が「…ねんねんわりびきをー」
と発話中に、対話相手は情報理解して「あー」という相
づち音声を表出し、同時に図３（ｂ）に示すように頭部
の動き（相づち動作）が数回続く。図４では、一方の話
者が発話中に、対話相手が情報共有して相づち音声（図
４（ａ））及び相づち動作（図４（ｂ））を表出してい
る。図５でも同様に、情報共有の相づち音声及び相づち
動作が見られる。（２）提起された話題についてそれ以上話すことがなく
なり、対話が終了しつつある（対話終了）と、相手発話
末又は相手発話後（相手発話休止区間）に、“間”を埋
めるように相づち動作又は音声が起こるようになり、そ
の時の相づち音声の種類は「んー」「そー」である傾向
がある。例えば、図６（ａ）において、一方の話者が
「んー」と発話すると、対話相手はその発話後の“間”
を埋めるように、「そーなの」と相づちを入れている。
また、「めずらしーね」と話すと、「そーかね」という
相づちを相手発話休止区間に間を埋めるようにうつ。（３）情報理解又は情報共有時の相づち音声の種類は、
図３乃至図５に示すように「あっ」「あー」「んんん」
「へー」「えっ」「ええ」「はい」「うそー」「ほんと
ー」であることが多いが、対話が終了しつつある（対話
終了）場合は、図６に示すようにこれらの相づち音声は
現れない。（４）対話終了時の相づち音声は、情報理解・情報共有
時の相づち音声に比べて、ｒｍｓ振幅値が小さい傾向が
ある。例えば、図６に示す対話終了時の相づち音声は、
図３に示す情報理解や図４に示す情報共有の相づち音声
よりもｒｍｓ振幅値が小さい。Analysis of human-to-human dialogue in this way revealed the following tendencies.
(1) When one speaker raises a topic and the other speaker understands the information on that topic (information understanding), or holds the same opinion about it or expresses emotion (information sharing), backchannel utterances or gestures tend to occur while the partner is still speaking. For example, in FIG. 3(a), while one speaker is saying “XX no-”, the dialogue partner understands the information and utters the backchannel “ah”, and a head movement (backchannel gesture) occurs as shown in FIG. 3(b). Further, while the speaker is saying “...nennen waribiki o-”, the partner understands the information and utters the backchannel “a-”, and at the same time the head movement (backchannel gesture) continues several times as shown in FIG. 3(b). In FIG. 4, while one speaker is speaking, the partner shares the information and produces a backchannel utterance (FIG. 4(a)) and a backchannel gesture (FIG. 4(b)). FIG. 5 likewise shows backchannel utterances and gestures of information sharing.
(2) When there is nothing more to say about the raised topic and the dialogue is coming to an end (dialogue end), backchannel gestures or utterances come to occur at the end of, or after, the partner's utterance (the partner's utterance pause), as if to fill the pause, and the backchannel utterances at that time tend to be of the type “n-” or “so-”. For example, in FIG. 6(a), when one speaker says “n-”, the partner inserts the backchannel “so- na no” so as to fill the pause after that utterance. Likewise, when one speaker says “mezurashi- ne”, the partner gives the backchannel “so- ka ne” during the utterance pause, filling the gap.
(3) The backchannel utterances during information understanding or information sharing are often “ah”, “a-”, “nnn”, “he-”, “eh”, “ee”, “hai”, “uso-”, or “honto-”, as shown in FIGS. 3 to 5; when the dialogue is coming to an end (dialogue end), these utterances do not appear, as shown in FIG. 6.
(4) Backchannel utterances at dialogue end tend to have a smaller rms amplitude than those during information understanding or information sharing. For example, the backchannel utterances at dialogue end shown in FIG. 6 have a smaller rms amplitude than the backchannel utterances of information understanding shown in FIG. 3 or of information sharing shown in FIG. 4.

【００４０】上記知見に基づき、以下の手順で対話終了
を判定する。図７及び図８は、対話進行判定部２０３に
おいて対話の進行を判定する動作のフローチャートであ
り、本フローは図２のＣＰＵ１において実行される。図
中、Ｓはフローの各ステップを示す。また、図９はシス
テム発話とユーザの相づち挿入のタイミングを説明する
図である。図中、Ｗ１はシステム発話中、Ｗ２はシステ
ム発話末及び発話後を示す。Based on the above findings, the end of the dialogue is determined by the following procedure. FIGS. 7 and 8 are flowcharts of the operation by which the dialogue progress determination unit 203 determines the progress of the dialogue; this flow is executed by the CPU 1 of FIG. 2. In the figures, S denotes each step of the flow. FIG. 9 illustrates the timing of system utterances and the insertion of the user's backchannel responses. In FIG. 9, W1 denotes the period during a system utterance, and W2 denotes the end of a system utterance and the period after it.

【００４１】対話進行判定処理がスタートすると、対話
進行判定部２０３では、対話管理部１０５から自己（シ
ステム）発話が発話中であるか（図９のＷ１区間）ある
いは発話末・発話後（図９のＷ２区間）であるかを抽出
する（ステップＳ１０１）。次いで、ステップＳ１０２
で対話進行判定部２０３は、音声特徴判定部２０１と相
づち動作特徴判定部２０２の両者に基づき、相づち音声
区間及び相づち動作区間を抽出し、以下の手順で対話終
了を判定する。When the dialogue progress determination process starts, the dialogue progress determination unit 203 obtains from the dialogue management unit 105 whether the system's own utterance is in progress (section W1 in FIG. 9) or is at its end or has finished (section W2 in FIG. 9) (step S101). Next, in step S102, the dialogue progress determination unit 203 extracts the backchannel utterance sections and backchannel gesture sections based on the outputs of both the voice feature determination unit 201 and the backchannel gesture feature determination unit 202, and determines the end of the dialogue by the following procedure.

【００４２】（１）自己（システム）の発話中（図９の
Ｗ１区間）にユーザの相づち音声又は動作が現れるとき
は、ステップＳ１０３で「情報理解・情報共有」と判定
して本フローを終了する。例えば、図９において、自己
（システム）の発話中（図９のＷ１区間）にユーザの相
づち音声「Ｖ１」が現れるときは、このステップＳ１０
３で「情報理解・情報共有」と判定される。（２）自己（システム）の発話末又は発話後（図９のＷ
２区間）にユーザの相づち音声又は動作が現れるとき
は、ステップＳ１０４で相づちが音声を含むか否かを判
定する。例えば、図９において、自己（システム）の発
話末又は発話後（図９のＷ２区間）にユーザの相づち音
声「Ｖ２」が現れるときは、まずこのステップＳ１０４
で「情報理解・情報共有」と判定され、次いで相づちの
種類又は韻律、あるいは動きの速さによりさらに詳細に
判定される。(1) When a backchannel utterance or gesture of the user appears during the system's own utterance (section W1 in FIG. 9), the state is determined to be “information understanding / information sharing” in step S103, and the flow ends. For example, when the user's backchannel utterance “V1” appears during the system's utterance (section W1 in FIG. 9), step S103 determines “information understanding / information sharing”.
(2) When a backchannel utterance or gesture of the user appears at the end of, or after, the system's own utterance (section W2 in FIG. 9), step S104 determines whether the backchannel includes voice. For example, when the user's backchannel utterance “V2” appears at the end of or after the system's utterance (section W2 in FIG. 9), the state is first determined to be “information understanding / information sharing” at step S104, and is then determined in more detail from the type or prosody of the backchannel, or from the speed of the movement.

【００４３】すなわち、上記ステップＳ１０４で相づち
が動作のみの時は、ステップＳ１０５で動きの速さを測
定し、速さの平均値が所定の閾値Ｄ１以上のときは、ス
テップＳ１０７で対話終了と判定して本フローを終了
し、速さの平均値が所定の閾値Ｄ１より小さいときは、
ステップＳ１０６で「情報理解・情報共有」と判定して
本フローを終了する。That is, when step S104 finds that the backchannel consists of a gesture only, the speed of the movement is measured in step S105. When the mean speed is equal to or greater than a predetermined threshold D1, step S107 determines “dialogue end” and the flow ends; when the mean speed is smaller than the threshold D1, step S106 determines “information understanding / information sharing” and the flow ends.

【００４４】上記ステップＳ１０４で相づち音声がある
場合は、ステップＳ１０８で音声の種類を判定し、相づ
ち音声が「あっ」「あー」「んんん」「へー」「えっ」
「ええ」「はい」「うそー」など、「情報理解」又は
「情報共有」を意味する種類又は韻律を持つ場合は、ス
テップＳ１０９で「情報理解・情報共有」と判定して本
フローを終了する。When step S104 finds that a backchannel utterance is present, the type of the utterance is determined in step S108. When the utterance has a type or prosody signifying “information understanding” or “information sharing”, such as “ah”, “a-”, “nnn”, “he-”, “eh”, “ee”, “hai”, or “uso-”, step S109 determines “information understanding / information sharing” and the flow ends.

【００４５】上記ステップＳ１０８で相づち音声が「ん
ー」「そー」など、「対話終了」を意味する種類又は韻
律を持つ場合は、ステップＳ１１０で音声のｒｍｓ振幅
を測定し、音声のｒｍｓ振幅の平均値が所定の閾値Ｄ２
よりも小さいか否かを判別する。音声のｒｍｓ振幅の平
均値が所定の閾値Ｄ２よりも小さいときは、ステップＳ
１１２で「対話終了」と判定して本フローを終了し、音
声のｒｍｓ振幅の平均値が所定の閾値Ｄ２以上のとき
は、ステップＳ１１１で「情報理解・情報共有」と判定
して本フローを終了する。When step S108 finds that the backchannel utterance has a type or prosody signifying “dialogue end”, such as “n-” or “so-”, the rms amplitude of the utterance is measured in step S110, and it is determined whether the mean rms amplitude is smaller than a predetermined threshold D2. When the mean rms amplitude is smaller than the threshold D2, step S112 determines “dialogue end” and the flow ends; when it is equal to or greater than the threshold D2, step S111 determines “information understanding / information sharing” and the flow ends.
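The decision flow of steps S103 to S112 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Backchannel` record, the `judge_progress` function, and the field names are hypothetical; utterance type is matched only by its transcription (prosody is ignored); and any utterance type not listed as a “dialogue end” type is assumed to fall through to step S109. The thresholds D1 and D2 are passed in because the source gives no values for them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Progress(Enum):
    INFO = "information understanding / information sharing"
    END = "dialogue end"


# Utterance types the text associates with "dialogue end" (step S108).
END_TYPES = {"んー", "そー"}


@dataclass
class Backchannel:
    in_w1: bool                       # True: during system utterance (W1); False: W2
    voice_type: Optional[str] = None  # None means the backchannel is a gesture only
    mean_speed: float = 0.0           # mean movement speed, compared with D1 (S105)
    mean_rms: float = 0.0             # mean rms amplitude, compared with D2 (S110)


def judge_progress(bc: Backchannel, d1: float, d2: float) -> Progress:
    """Hypothetical rendering of the flow in steps S103 to S112."""
    if bc.in_w1:                    # S103: backchannel during system utterance
        return Progress.INFO
    if bc.voice_type is None:       # S104: gesture only, so measure speed (S105)
        return Progress.END if bc.mean_speed >= d1 else Progress.INFO  # S107 / S106
    if bc.voice_type in END_TYPES:  # S108: end-type utterance, so measure rms (S110)
        return Progress.END if bc.mean_rms < d2 else Progress.INFO     # S112 / S111
    return Progress.INFO            # S109: any other utterance type (assumed)
```

For instance, under these assumptions a quiet “そー” in section W2 with a mean rms below D2 yields `Progress.END`, while the same utterance during section W1 yields `Progress.INFO`.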

【００４６】上記処理フローによる判定結果が「情報理
解・情報共有」である場合は、対話進行判定部２０３は
判定結果を対話管理部１０５に出力し、処理を終了す
る。また、上記判定結果が「対話終了」である場合は、
対話進行判定部２０３は判定結果を対話管理部１０５に
出力するとともに、終了予告部２０４に入力する。終了
予告部２０４では、次ユーザに対話の終了を予告するこ
ととし、終了予告通知部２０５に入力する。第１の実施
の形態では、このような対話過程における相づちのパタ
ーンの変化を捉えることで、対話の終了を予測し、次ユ
ーザに対話の終了を予告することができる。When the determination result of the above flow is “information understanding / information sharing”, the dialogue progress determination unit 203 outputs the result to the dialogue management unit 105 and the process ends. When the result is “dialogue end”, the dialogue progress determination unit 203 outputs the result to the dialogue management unit 105 and also inputs it to the end notice unit 204. The end notice unit 204 decides to give the next user advance notice of the end of the dialogue and passes this to the end notice notification unit 205. In the first embodiment, capturing such changes in backchannel patterns over the course of a dialogue makes it possible to predict the end of the dialogue and to give the next user advance notice of it.

【００４７】以上のように、本実施の形態の対話装置１
０は、ユーザからの音声信号を入力する音声入力部１０
１、ユーザの身体各部位の動きを入力する動作入力部１
０２、入力された音声信号を認識処理する音声処理部１
０３、入力された動きを認識処理する動作処理部１０
４、ユーザとの対話を調整する対話調整部２００、ユー
ザからの入力に基づいて対話の進行を管理するととも
に、対話進行判定部２０３の判定結果を管理する対話管
理部１０５、音声制御部１０６、動作制御部１０７、音
声出力部１０８及び動作出力部１０９を備え、対話調整
部２００は、入力された音声から相づち音声の特徴を判
定する音声特徴判定部２０１、入力された動作から相づ
ち動作の特徴を判定する相づち動作特徴判定部２０２、
これら音声ないし動作の特徴から対話の進行状態を判定
する対話進行判定部２０３を備え、対話進行判定部２０
３の判定結果に基づき次ユーザに対話の終了を予告通知
するので、機械とユーザとの対話において、対話の終了
を予測し、順番を待つ次ユーザに順番が近いことを予告
することが可能になり、順番待ちのイライラを緩和する
ことができ、ユーザとシステムとの間に円滑な対話を実
現できる。As described above, the dialogue device 10 of the present embodiment comprises: a voice input unit 101 that inputs a voice signal from the user; a motion input unit 102 that inputs the movements of each part of the user's body; a voice processing unit 103 that performs recognition processing on the input voice signal; a motion processing unit 104 that performs recognition processing on the input movements; a dialogue adjustment unit 200 that adjusts the dialogue with the user; a dialogue management unit 105 that manages the progress of the dialogue based on input from the user and manages the determination result of the dialogue progress determination unit 203; a voice control unit 106; a motion control unit 107; a voice output unit 108; and a motion output unit 109. The dialogue adjustment unit 200 comprises a voice feature determination unit 201 that determines features of backchannel utterances from the input voice, a backchannel gesture feature determination unit 202 that determines features of backchannel gestures from the input motion, and the dialogue progress determination unit 203 that determines the state of progress of the dialogue from these voice or motion features. Because the next user is given advance notice of the end of the dialogue based on the determination result of the dialogue progress determination unit 203, the device can, in a dialogue between a machine and a user, predict the end of the dialogue and tell the next user waiting in line that their turn is near. This eases the frustration of waiting and realizes a smooth dialogue between the user and the system.

【００４８】第２の実施の形態図１０は、本発明の第２の実施の形態の対話装置の基本
構成を示すブロック図である。図１と同一構成部分に
は、同一番号を付して重複箇所の説明を省略する。図１
０において、本実施の形態の対話装置２０は、対話調整
部３００が、終了誘導部２０６（対話終了誘導手段）を
備えて構成される。また、対話管理部１０５Ａ（対話管
理手段）及び対話進行判定部２０３Ａ（対話進行判定手
段）の処理が第１の実施の形態と異なる。
Second Embodiment
FIG. 10 is a block diagram showing the basic configuration of a dialogue device according to the second embodiment of the present invention. Components identical to those in FIG. 1 are given the same reference numerals, and duplicate description is omitted. In FIG. 10, the dialogue device 20 of the present embodiment is configured such that the dialogue adjustment unit 300 includes an end guidance unit 206 (dialogue end guidance means). In addition, the processing of the dialogue management unit 105A (dialogue management means) and the dialogue progress determination unit 203A (dialogue progress determination means) differs from that of the first embodiment.

【００４９】以下、上述のように構成された対話装置２
０の動作を説明する。第１の実施の形態の対話装置１０
の動作と異なる点について述べる。The operation of the dialogue device 20 configured as described above is explained below, focusing on the points that differ from the operation of the dialogue device 10 of the first embodiment.

【００５０】本実施の形態では、対話管理部１０５Ａ
は、ユーザからの入力に基づいて対話の進行を管理する
とともに、対話進行判定部２０３Ａの判定結果を管理す
る。そして、対話進行判定部２０３Ａの判定結果を基
に、例えばなかなか「対話終了」にならず、対話時間が
長くなり過ぎた場合等に、システム側から「対話終了」
を決定し、終了誘導部２０６に対話終了を通知する。終
了誘導部２０６はシステム側から対話を終了させるよう
誘導するための相づちを生成するための相づちを音声及
び動作で提示するための特徴を付加し、音声制御部１０
６及び動作制御部１０７に出力する。具体的には、ユー
ザの発話末又は終了後数フレーム（３０フレーム＝１
秒）の間に「対話終了」に特徴的な相づち音声又は相づ
ち動作を出力する。音声制御部１０６、動作制御部１０
７ではこれらの特徴を元の音声及び動作に反映するよう
に制御して、音声出力部１０８及び動作出力部１０９で
提示する。In the present embodiment, the dialogue management unit 105A manages the progress of the dialogue based on input from the user and manages the determination result of the dialogue progress determination unit 203A. Based on that determination result, when, for example, the state “dialogue end” is slow to arrive and the dialogue has gone on too long, the system side decides on “dialogue end” and notifies the end guidance unit 206 of the end of the dialogue. The end guidance unit 206 generates a backchannel response that guides the user toward ending the dialogue from the system side, adds the features needed to present it as voice and motion, and outputs them to the voice control unit 106 and the motion control unit 107. Specifically, a backchannel utterance or backchannel gesture characteristic of “dialogue end” is output at the end of the user's utterance or within several frames after it ends (30 frames = 1 second). The voice control unit 106 and the motion control unit 107 perform control so that these features are reflected in the original voice and motion, which are then presented by the voice output unit 108 and the motion output unit 109.
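The timing described here can be sketched in Python as follows. This is only an illustration: the function names, the maximum dialogue length, and the size of the output window in frames are hypothetical parameters, since the source says only that the dialogue has become “too long” and that the output occurs within “several frames”. The only figure taken from the text is the frame rate of 30 frames per second.

```python
FRAME_RATE = 30  # frames per second, from "30 frames = 1 second" in the text


def frames_to_seconds(frames):
    """Convert a frame count to seconds at the stated frame rate."""
    return frames / FRAME_RATE


def should_guide_end(elapsed_seconds, judged_end, max_seconds):
    """True when "dialogue end" has not yet been judged and the dialogue has
    run longer than max_seconds (a hypothetical limit chosen by the system)."""
    return (not judged_end) and elapsed_seconds > max_seconds


def in_guidance_window(frame, utterance_end_frame, window_frames):
    """True at the end of the user's utterance or within window_frames after it.

    window_frames stands in for the unspecified "several frames".
    """
    return 0 <= frame - utterance_end_frame <= window_frames
```

With an assumed window of 10 frames, for example, the system-side backchannel would have to start within one third of a second after the user stops speaking.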

【００５１】第２の実施の形態では、システム側（例え
ば、ＣＧやロボット）の発話又は動作を制御して対話が
終了するよう誘導することで、待ち時間を短縮すること
ができるようになる。また、対話の終了を予測するとと
もにシステム側から対話が終了するよう誘導すること
で、対話の終了を促進し、待ち時間をさらに短縮できる
ようになる。In the second embodiment, waiting time can be shortened by controlling the utterances or motions of the system side (for example, a CG character or a robot) so as to guide the dialogue to an end. Moreover, by predicting the end of the dialogue and also guiding it to an end from the system side, the end of the dialogue is hastened and waiting time can be shortened further.

【００５２】第３の実施の形態図１１は、本発明の第３の実施の形態の対話装置の基本
構成を示すブロック図である。図１及び図１０と同一構
成部分には、同一番号を付して重複箇所の説明を省略す
る。
Third Embodiment
FIG. 11 is a block diagram showing the basic configuration of a dialogue device according to the third embodiment of the present invention. Components identical to those in FIGS. 1 and 10 are given the same reference numerals, and duplicate description is omitted.

【００５３】図１１において、本実施の形態の対話装置
３０は、対話調整部４００が、終了誘導部２０６を備
え、対話進行判定部２０３Ｂ（対話進行判定手段）で
「対話終了」と判定されたとき、対話進行判定部２０３
Ｂは判定結果を終了予告部２０４と終了誘導部２０６に
出力する。In FIG. 11, the dialogue device 30 of the present embodiment is configured such that the dialogue adjustment unit 400 includes the end guidance unit 206, and when the dialogue progress determination unit 203B (dialogue progress determination means) determines “dialogue end”, it outputs the determination result to both the end notice unit 204 and the end guidance unit 206.

【００５４】これにより、対話進行判定部２０３Ｂで
「対話終了」と判定されたとき、次ユーザに対話の終了
を予告するとともに、現ユーザに対話の終了を誘導する
システム側の相づちを生成し、出力することが可能にな
る。Consequently, when the dialogue progress determination unit 203B determines “dialogue end”, it becomes possible both to give the next user advance notice of the end of the dialogue and to generate and output a system-side backchannel response that guides the current user toward ending the dialogue.

【００５５】第３の実施の形態では、対話の終了を次ユ
ーザに予告するとともに、現ユーザとの対話を終了する
よう誘導することができる。なお、第２の実施の形態の
構成と第３の実施の形態の構成を両方持っていても良い
ことは勿論である。In the third embodiment, the next user can be given advance notice of the end of the dialogue while the dialogue with the current user is guided to an end. Needless to say, a device may have both the configuration of the second embodiment and that of the third embodiment.

【００５６】第４の実施の形態図１２は、本発明の第４の実施の形態の対話システムの
構成を示すブロック図であり、図１、図１０又は図１１
の対話装置１０，２０，３０が複数ある対話システムで
ある。図１２において、２０３Ｃは、各対話装置<１>〜
<３>の対話進行判定部であり、図１、図１０又は図１１
の対話進行判定部２０３，２０３Ａと同様の機能を有す
るものである。また、各対話装置<１>〜<３>の終了予告
部２０４Ｃ及び終了予告通知部２０５Ｃは、共通に構成
される。
Fourth Embodiment
FIG. 12 is a block diagram showing the configuration of a dialogue system according to the fourth embodiment of the present invention, which is a dialogue system having a plurality of the dialogue devices 10, 20, and 30 of FIG. 1, FIG. 10, or FIG. 11. In FIG. 12, 203C denotes the dialogue progress determination unit of each of the dialogue devices <1> to <3>, and has the same function as the dialogue progress determination units 203 and 203A of FIG. 1, FIG. 10, or FIG. 11. The end notice unit 204C and the end notice notification unit 205C are shared by the dialogue devices <1> to <3>.

【００５７】各対話装置<１>〜<３>は、対話進行判定部
２０３Ｃにより「対話終了」の段階であると判定される
と、判定された時刻とともに判定結果を終了予告部２０
４Ｃに出力する。終了予告部２０４Ｃは、各対話装置<
１>〜<３>から送られてくる判定時刻を基に、最も早く
「対話終了」となると予測される装置のＩＤ番号を、待
ち状態にある次ユーザに終了予告通知部２０５Ｃで予告
をする。When the dialogue progress determination unit 203C of any of the dialogue devices <1> to <3> determines that the dialogue is at the “dialogue end” stage, the device outputs the determination result, together with the time of the determination, to the end notice unit 204C. Based on the determination times sent from the dialogue devices <1> to <3>, the end notice unit 204C uses the end notice notification unit 205C to announce, to the next user who is waiting, the ID number of the device predicted to reach “dialogue end” earliest.

【００５８】これにより、複数のユーザと複数のシステ
ムとの対話において、順番を待つユーザに、どの対話装
置に進めばよいかを教示することができる。ここでは、
判定時刻の例を述べたが、全システムの対話終了予測時
刻あるいは対話終了までの時間を通知してもよい。This makes it possible, in dialogues between a plurality of users and a plurality of systems, to tell a user waiting in line which dialogue device to proceed to. Although an example using the determination time has been described here, the predicted dialogue end time of every system, or the time remaining until the end of the dialogue, may be notified instead.
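One simple way to realize the selection made by the shared end notice unit is sketched below in Python. It assumes, for illustration only, that the device whose “dialogue end” determination time is earliest is the one predicted to finish first; the function name and the dictionary format of the reports are hypothetical.

```python
def earliest_ending_device(reports):
    """Return the ID of the device with the earliest "dialogue end" determination.

    reports maps a device ID to the time at which that device judged
    "dialogue end", or to None if it has not yet made that judgment.
    Returns None when no device has reported "dialogue end".
    """
    judged = {dev: t for dev, t in reports.items() if t is not None}
    if not judged:
        return None
    return min(judged, key=judged.get)
```

For three devices reporting `{1: 12.5, 2: None, 3: 9.0}`, this sketch would direct the waiting user to device 3.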

【００５９】なお、本発明の対話装置は、上述の実施の
形態に限定されるものではなく、本発明の要旨を逸脱し
ない範囲内において種々変更を加え得ることは勿論であ
る。例えば、上述したようなマルチモーダルインタフェ
ースを用いた情報処理装置に適用することもできるが、
これには限定されず、全ての装置に適用可能である。The dialogue device of the present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the invention. For example, it can be applied to an information processing apparatus using a multimodal interface as described above, but it is not limited to this and is applicable to any apparatus.

【００６０】また、上記各実施の形態に係る対話装置
が、ＰＤＡ（Personal Digital Assistants）等の携帯
情報端末やパーソナルコンピュータの音声・動画像処理
機能として組み込まれたものでもよい。また、上記各実
施の形態では、対話装置の名称を用いているが、これは
説明の便宜上であり、例えば音声・動画像処理装置、マ
ルチモーダルインタフェース装置でもよい。さらに、上
記対話装置を構成する各回路部等の種類、データベース
などは前述した実施形態に限られない。The dialogue device according to each of the above embodiments may also be incorporated as an audio and video processing function of a portable information terminal such as a PDA (personal digital assistant) or of a personal computer. The name “dialogue device” is used in the above embodiments for convenience of description; the device may equally be called, for example, an audio and video processing device or a multimodal interface device. Furthermore, the types of circuit units, databases, and other elements constituting the dialogue device are not limited to those of the embodiments described above.

【００６１】以上説明した対話装置は、この処理装置を
機能させるためのプログラムでも実現される。このプロ
グラムはコンピュータで読み取り可能な記録媒体に格納
されている。本発明では、この記録媒体として、メイン
メモリそのものがプログラムメディアであってもよい
し、また外部記憶装置としてプログラム読み取り装置が
設けられ、そこに記録媒体を挿入することで読み取り可
能なプログラムメディアであってもよい。いずれの場合
においても、格納されているプログラムはＣＰＵがアク
セスして実行させる構成であってもよいし、あるいはい
ずれの場合もプログラムを読み出し、読み出されたプロ
グラムは、図示されていないプログラム記憶エリアにダ
ウンロードされて、そのプログラムが実行される方式で
あってもよい。このダウンロード用のプログラムは予め
本体装置に格納されているものとする。The dialogue device described above can also be realized by a program that causes this processing device to function. The program is stored on a computer-readable recording medium. In the present invention, the recording medium may be the main memory itself serving as the program medium, or it may be a program medium that becomes readable when inserted into a program reading device provided as an external storage device. In either case, the stored program may be accessed and executed directly by the CPU, or the program may be read out, downloaded to a program storage area (not shown), and then executed. The program used for this download is assumed to be stored in the main unit in advance.

【００６２】ここで、上記プログラムメディアは、本体
と分離可能に構成される記録媒体であり、例えばＰＣカ
ード（ＳＲＡＭカード）のほか、磁気テープやカセット
テープ等のテープ系、フロッピーディスク（登録商標）
やハードディスク等の磁気ディスクやＣＤ−ＲＯＭ／Ｍ
Ｏ／ＭＤ／ＤＶＤ等の光ディスクのディスク系、ＩＣカ
ード／光カード等のカード系、あるいはマスクＲＯＭ、
ＥＰＲＯＭ、ＥＥＰＲＯＭ、フラッシュＲＯＭ等による
半導体メモリを含めた固定的にプログラムを担持する媒
体であってもよい。Here, the program medium is a recording medium configured to be separable from the main unit, and may be a medium that carries the program in a fixed manner, including, for example, a PC card (SRAM card); tape media such as magnetic tape and cassette tape; disk media such as magnetic disks (floppy disk (registered trademark), hard disk) and optical disks (CD-ROM, MO, MD, DVD); card media such as IC cards and optical cards; and semiconductor memory such as mask ROM, EPROM, EEPROM, and flash ROM.

【００６３】さらに、外部の通信ネットワークとの接続
が可能な通信装置を備えている場合には、その通信装置
を介して通信ネットワークからプログラムをダウンロー
ドするように、流動的にプログラムを担持する媒体であ
ってもよい。なお、このように通信ネットワークからプ
ログラムをダウンロードする場合には、そのダウンロー
ド用プログラムは予め本体装置に格納しておくか、ある
いは別な記録媒体からインストールされるものであって
もよい。なお、記録媒体に格納されている内容としては
プログラムに限定されず、データであってもよい。Further, when a communication device capable of connecting to an external communication network is provided, the medium may be one that carries the program in a fluid manner, such that the program is downloaded from the communication network via that communication device. When the program is downloaded from a communication network in this way, the program used for the download may be stored in the main unit in advance or installed from a separate recording medium. The content stored on the recording medium is not limited to a program and may be data.

【００６４】[0064]

【発明の効果】以上、詳述したように、本発明によれ
ば、ユーザの相づちから対話の終了を判定し、次ユーザ
に対話終了を予告する構成としたので、機械とユーザと
の対話において、対話の終了を予測し、順番を待つ次ユ
ーザに順番が近いことを予告することができ、順番待ち
のイライラを緩和することができる。As described in detail above, according to the present invention, the end of a dialogue is determined from the user's backchannel responses and the next user is given advance notice of it. Consequently, in a dialogue between a machine and a user, the end of the dialogue can be predicted and the next user waiting in line can be told that their turn is near, which eases the frustration of waiting.

【００６５】また、システム側から対話を終了したいと
きは、対話終了を誘導する相づちを生成し、対話の終了
を誘導する構成としたので、システム側（例えばＣＧや
ロボット）の発話ないし動作を制御して対話が終了する
よう誘導することで、待ち時間を短縮することができ
る。In addition, when the system side wishes to end the dialogue, a backchannel response that guides the dialogue toward its end is generated. Controlling the utterances or motions of the system side (for example, a CG character or a robot) to guide the dialogue to an end in this way shortens waiting time.

【００６６】さらに、ユーザの相づちから対話の終了を
判定し、次ユーザに対話終了を予告するとともに、シス
テム側から対話終了を誘導する相づちを生成し、対話の
終了を誘導する構成としたので、対話の終了を予測する
とともにシステム側から対話が終了するよう誘導するこ
とで、対話の終了を促進し、待ち時間をさらに短縮でき
る。したがって、待ち時間を短縮することができ、ユー
ザとシステムとの間に自然な対話を実現することができ
る。Furthermore, the end of the dialogue is determined from the user's backchannel responses, the next user is given advance notice of it, and the system side generates a backchannel response that guides the dialogue toward its end. Predicting the end of the dialogue while also guiding it to an end from the system side hastens the end of the dialogue and shortens waiting time further. As a result, waiting time is reduced and a natural dialogue can be realized between the user and the system.

[Brief description of drawings]

【図１】本発明の第１の実施の形態の対話装置の基本構
成を示すブロック図である。FIG. 1 is a block diagram showing a basic configuration of a dialogue device according to a first embodiment of the present invention.

【図２】本実施の形態の対話装置の具体的なシステム構
成を示す図である。FIG. 2 is a diagram showing a specific system configuration of the dialog device according to the present embodiment.

【図３】本実施の形態の対話装置の対話データの分析結
果の「情報理解」の段階での相づちの例を示す図であ
る。FIG. 3 is a diagram showing an example of backchannel responses at the “information understanding” stage in the dialogue data analysis results for the dialogue device of the present embodiment.

【図４】本実施の形態の対話装置の対話データの分析結
果の「情報共有」の段階での相づちの例を示す図であ
る。FIG. 4 is a diagram showing an example of backchannel responses at the “information sharing” stage in the dialogue data analysis results for the dialogue device of the present embodiment.

【図５】本実施の形態の対話装置の対話データの分析結
果の「情報共有」の段階での相づちの例を示す図であ
る。FIG. 5 is a diagram showing an example of backchannel responses at the “information sharing” stage in the dialogue data analysis results for the dialogue device of the present embodiment.

【図６】本実施の形態の対話装置の対話データの分析結
果の「話題終了」の段階での相づちの例を示す図であ
る。FIG. 6 is a diagram showing an example of backchannel responses at the “topic end” stage in the dialogue data analysis results for the dialogue device of the present embodiment.

【図７】本実施の形態の対話装置の対話進行判定部にお
いて対話の進行を判定する動作のフローチャートであ
る。FIG. 7 is a flowchart of an operation of determining the progress of a dialogue in the dialogue progress determination unit of the dialogue device of the present embodiment.

【図８】本実施の形態の対話装置の対話進行判定部にお
いて対話の進行を判定する動作のフローチャートであ
る。FIG. 8 is a flowchart of an operation of determining the progress of a dialogue in the dialogue progress determination unit of the dialogue device of the present embodiment.

【図９】本実施の形態の対話装置のシステム発話とユー
ザの相づち挿入のタイミングを説明する図である。FIG. 9 is a diagram explaining the timing of system utterances and of the user's backchannel insertions in the dialogue device of the present embodiment.

【図１０】本発明の第２の実施の形態の対話装置の基本
構成を示すブロック図である。FIG. 10 is a block diagram showing the basic configuration of a dialogue device according to the second embodiment of the present invention.

【図１１】本発明の第３の実施の形態の対話装置の基本
構成を示すブロック図である。FIG. 11 is a block diagram showing the basic configuration of a dialogue device according to the third embodiment of the present invention.

【図１２】本発明の第４の実施の形態の対話システムの
構成を示すブロック図である。FIG. 12 is a block diagram showing a configuration of a dialogue system according to a fourth embodiment of this invention.

[Explanation of symbols]

１ＣＰＵ２ワークメモリ３入力部４データベース５出力部６外部記憶ドライバ７外部記憶装置（記録媒体）１０，２０，３０対話装置１０１音声入力部（音声入力手段）１０２動作入力部（動き検出手段）１０３音声処理部１０４動作処理部１０５対話管理部（対話管理手段）１０６音声制御部１０７動作制御部１０８音声出力部（音声出力手段）１０９動作出力部（動作出力手段）２００，３００，４００対話調整部２０１音声特徴判定部２０２相づち動作特徴判定部（対話進行判定手段）２０３，２０３Ａ，２０３Ｂ対話進行判定部（対話進
行判定手段の一部）２０４，２０４Ｃ終了予告部２０５，２０５Ｃ終了予告通知部２０６終了誘導部（対話終了誘導手段）
1: CPU
2: work memory
3: input unit
4: database
5: output unit
6: external storage driver
7: external storage device (recording medium)
10, 20, 30: dialogue device
101: voice input unit (voice input means)
102: motion input unit (motion detection means)
103: voice processing unit
104: motion processing unit
105: dialogue management unit (dialogue management means)
106: voice control unit
107: motion control unit
108: voice output unit (voice output means)
109: motion output unit (motion output means)
200, 300, 400: dialogue adjustment unit
201: voice feature determination unit
202: backchannel gesture feature determination unit (dialogue progress determination means)
203, 203A, 203B: dialogue progress determination unit (part of the dialogue progress determination means)
204, 204C: end notice unit
205, 205C: end notice notification unit
206: end guidance unit (dialogue end guidance means)

フロントページの続き (51)Int.Cl.⁷ 識別記号ＦＩテーマコート゛(参考）Ｇ１０Ｌ 3/00 ５７１Ｑ
Continuation of front page (51) Int. Cl.⁷ / Identification code / FI / Theme code (reference): G10L 3/00 571Q

Claims

[Claims]

1. A dialogue device supporting voice input, comprising: voice input means for inputting voice from a user; motion input means for inputting the user's motion; dialogue progress determination means for determining the progress of a dialogue with the user based on the voice input and/or the motion input; and dialogue end notice means for giving notice of the end of the dialogue when the dialogue progress determination means predicts the end of the dialogue.

2. A dialogue device supporting voice input, comprising: voice input means for inputting voice from a user; motion input means for inputting the user's motion; dialogue management means for analyzing the voice input and the motion input and managing the dialogue with the user; and dialogue end guidance means for controlling voice output and motion output so as to guide the dialogue to an end when the dialogue management means decides to end the dialogue.

3. The dialogue device according to claim 1, further comprising dialogue end guidance means for controlling voice output and motion output so as to guide the dialogue to an end when the dialogue progress determination means predicts the end of the dialogue.

4. A dialogue device comprising a plurality of the dialogue devices according to claim 1 or 2, and further comprising dialogue end notice notification means for externally notifying the predicted end timings of the plurality of dialogue devices together with the IDs of the respective dialogue devices.

5. The dialogue device according to claim 1, wherein the dialogue progress determination means determines whether the state of progress is information understanding / information sharing or dialogue end.

6. The dialogue device according to claim 1, wherein the voice determined by the voice feature determination means is a backchannel utterance.

7. The dialogue device according to claim 1, wherein the motion determined by the motion feature determination means is a backchannel gesture.

8. The dialogue device according to claim 3, wherein, when the dialogue management means decides to end the dialogue, the dialogue end guidance means controls the voice control means or the motion control means so as to output at least one of a backchannel utterance and a backchannel gesture.

9. The dialogue device according to claim 2, wherein the action output displays the action of the user by means of an image or is expressed by the action of a robot having a mechanical structure.

10. A computer-readable recording medium on which is recorded a program for causing a computer to function as a dialogue device supporting voice input, the dialogue device comprising: voice input means for inputting voice from a user; motion input means for inputting the user's motion; dialogue progress determination means for determining the progress of a dialogue with the user based on the voice input and/or the motion input; and dialogue end notice notification means for giving notice of the end of the dialogue when the dialogue progress determination means predicts the end of the dialogue.

11. A computer-readable recording medium on which is recorded a program for causing a computer to function as a dialogue device supporting voice input, the dialogue device comprising: voice input means for inputting voice from a user; motion input means for inputting the user's motion; dialogue management means for analyzing the voice input and the motion input and managing the dialogue with the user; and dialogue end guidance means for controlling the device's own voice output and motion output so as to guide the dialogue to an end when the dialogue management means decides to end the dialogue.