JPH10111786A

JPH10111786A - Rhythm control dialog device

Info

Publication number: JPH10111786A
Application number: JP8262779A
Authority: JP
Inventors: Keiko Watanuki; 啓子綿貫
Original assignee: GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Sharp Corp
Current assignee: GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Sharp Corp
Priority date: 1996-10-03
Filing date: 1996-10-03
Publication date: 1998-04-28

Abstract

PROBLEM TO BE SOLVED: To achieve natural dialogue by recognizing the operation state from the user's voice, gestures, and the like, detecting the rhythm of the user's speech, and controlling the output accordingly. SOLUTION: The voice pitch of input data (1₁) is A/D converted, the voice pitch level is recognized for each predetermined processing unit, and a rising pitch of a predetermined level is recognized and sent to rhythm detection means 3. Rhythm detection means 3 detects a period (rhythm) from the user's voice pitch and, together with time information, detects the rhythm of the user's dialogue. Dialogue management means 4 estimates, according to the user's rhythm, the timing of the backchannel responses (aizuchi) and turn-taking responses output from the computer. Output means 5 then gives the backchannel response by having a human model synthesized with CG (computer graphics) output the voice "yes" while nodding its head vertically.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a dialogue device between a human and a computer, and more particularly to a dialogue device that recognizes the information a user intends to convey through voice, gestures, and the like, and conducts a dialogue accordingly.

[0002]

[Prior art] Conventionally, dialogue devices that respond to voice input have been considered as an interface between a human and a computer. Such a device recognizes the voice uttered by a human, changes the internal state of the system on the computer side accordingly, and produces a predetermined output to the human, thereby attempting to realize a dialogue with the human. In addition, to make such dialogue with a computer smoother, voice response systems in which an animation or the like responds to the input voice have been proposed.

[0003] As one concrete method for realizing a dialogue device of this kind, processing to understand the speech is often started only after the utterance has ended, or once a pause of a certain length (pause section) has been detected. In this case, however, the user is kept waiting for the entire processing time and receives the system's response only some time after finishing the utterance. Such a delay in response irritates the user and hinders smooth dialogue between the computer and the user. Some devices, such as the one disclosed in Japanese Unexamined Patent Publication No. 6-110835, free up the user's own speech, for example by allowing the user to interrupt the computer's voice output and speak, but they do not control the output from the computer so as to promote the dialogue.

[0004]

[Problem to be solved by the invention] The present invention has been made in view of these problems in the prior art. Its object is to provide a rhythm control dialogue device for human-computer dialogue that realizes natural dialogue by extracting the rhythm of the user's dialogue from a plurality of signal features generated in correspondence with the actions the user performs during the dialogue, and by controlling the audio information, image information, and the like output from the computer according to that rhythm.

[0005]

[Means for solving the problem] The invention of claim 1 is characterized by comprising: input means and output means for voice and the like expressing information to be conveyed; recognition means for recognizing an operation mode, and the operation state in that operation mode, from the user's voice, gestures, and the like input from the input means; rhythm detection means for detecting the rhythm of the user's utterance from the recognition result of the recognition means; and dialogue management means for controlling the output from the output means in accordance with the detection result of the rhythm detection means. This makes it possible to realize smooth dialogue with the computer that matches the user's rhythm.

[0006] The invention of claim 2 is the rhythm control dialogue device according to claim 1, further comprising time giving means. The time giving means assigns the start and end times of the user's operation state recognized by the recognition means, and also assigns the operation times associated with the rhythm detection operation of the rhythm detection means and the management operation of the dialogue management means, so that these times can be used for time management. This enables smoother dialogue.

[0007] The invention of claim 3 is the rhythm control dialogue device according to claim 1 or 2, characterized in that the rhythm detection is performed by recognizing the operation state of at least one of the following operation modes generated by the user: voice power, voice pitch, and hand movement. This provides an effective concrete implementation.

[0008] The invention of claim 4 is the rhythm control dialogue device according to any one of claims 1 to 3, characterized in that the dialogue management means controls the timing of the response to the user output from the output means according to the detection result of the rhythm detection means.

[0009] The invention of claim 5 is the rhythm control dialogue device according to any one of claims 1 to 4, characterized in that the dialogue management means comprises rhythm adjustment means for changing the rhythm of the computer's response output from the output means according to the detection result of the rhythm detection means, so that the rhythm with which information is output from the computer to the user matches the user's rhythm.

[0010] The invention of claim 6 is the rhythm control dialogue device according to any one of claims 1 to 5, characterized in that the rhythm detection means further comprises rhythm change detection means for detecting a change in rhythm, and the detection result of the rhythm change detection means is input to the dialogue management means. This makes it possible to give a response that helps the user speak even when the user is at a loss for a reply.

[0011]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing one embodiment of the rhythm control dialogue device of the present invention. The rhythm control dialogue device of this embodiment comprises multi-channel recognition means 2 that recognize input data (1₁), (1₂), ..., which are input from data input means 1 and contain voice signals and gesture time information. Connected to recognition means 2 are time giving means 6, which outputs time information, and rhythm detection means 3, which integrates the recognition results output in parallel from the individual recognition means 2₁, 2₂, ... to detect the rhythm of the user's dialogue. Connected to rhythm detection means 3 are history storage means 7, which holds the detection results of rhythm detection means 3 as a rhythm history, and dialogue management means 4, which performs processing to advance the dialogue based on the rhythm detected by rhythm detection means 3. Connected to dialogue management means 4 is output means 5, which outputs the output data and the like that present to the user the information needed for the dialogue.

[0012] Each of the recognition means 2₁, 2₂, ... has a recognition algorithm suited to its own recognition data, and is configured to obtain from time giving means 6 both the time at which that recognition means started processing the input data that yielded a recognition result and the time at which processing ended with that result.

[0013] Data input means 1 is configured so that a plurality of signals generated in correspondence with the user's behavior or actions can be taken in as the input data described above. For example, a camera, a microphone, a motion sensor, or an electrocardiograph is used, and the signals obtained from them are captured. From these signals the following are recognized, for example: loudness of the voice (voice power), pitch of the voice (voice pitch), speech rate, pause length, length of utterance sections, gestures, face orientation, mouth size and shape, eye size and blinking, gaze direction, hand gestures, head movement, and heart rate. The output means may be, for example, a speaker, a display (visible display means), or a tactile device.

[0014] In the following, an embodiment of the present invention is described as a computer equipped with means for recognizing the voice pitch of voice input data and, as output means, voice output means for outputting synthesized speech and output means for displaying a pseudo-human rendered with CG (computer graphics). First, the voice pitch of input data (1₁) is A/D converted, and the "voice pitch" level is recognized for each predetermined processing unit (frame; one frame is 1/30 second). A "rising pitch" of a predetermined level is then recognized and sent to rhythm detection means 3. The information passed from each recognition means 2₁, 2₂, ... consists of a start time, an end time, a mode, and a recognition result. The start and end times are values passed from time giving means 6. They represent, respectively, the time at which the recognition means started processing the input data that yielded the recognition result, and the time at which processing ended with that result.
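The four-field record passed from each recognition means to the rhythm detection means can be pictured as a small data structure. The following is only an illustrative sketch: the class name `RecognitionResult`, the field names, the string values, and the use of frame counts as the time unit are all assumptions made here and do not appear in the specification.

```python
from dataclasses import dataclass

# One frame is 1/30 second, as stated in the description.
FRAME_SEC = 1.0 / 30.0


@dataclass(frozen=True)
class RecognitionResult:
    """Hypothetical record passed from a recognition means to rhythm detection."""
    start_frame: int  # processing start time, obtained from the time giving means
    end_frame: int    # processing end time, obtained from the time giving means
    mode: str         # operation mode, e.g. "voice_pitch" (label is an assumption)
    result: str       # recognition result, e.g. "rising_pitch" (label is an assumption)

    def duration_sec(self) -> float:
        """Length of the recognized interval in seconds."""
        return (self.end_frame - self.start_frame) * FRAME_SEC


example = RecognitionResult(start_frame=0, end_frame=30,
                            mode="voice_pitch", result="rising_pitch")
```

Keeping times as integer frame counts and converting to seconds only when needed avoids accumulating floating-point error across a long dialogue.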

[0015] The operation mode, which in this embodiment is voice pitch, denotes the type of each of the multiple outputs generated simultaneously by the user. The recognition result is obtained according to the mode; for voice pitch it is obtained as a "rising pitch". Rhythm detection means 3 detects a period (rhythm) from the user's input voice pitch and, together with the time information, detects the rhythm of the user's dialogue. Analysis of human-to-human dialogue has shown that the rhythm of a dialogue can be extracted from voice power (presence or absence of voice), voice pitch (rising pitch), and hand motion (acceleration of the movement). An example in which the "rising pitch" of the voice is detected and the rhythm is derived from it is described below. FIG. 2 illustrates an example of the processing in rhythm detection means 3. It shows how the voice changes over time when the user utters "when you reproduce it inside the computer". The vertical axis is pitch (Hz) and the horizontal axis is time (frames). The rising pitches are marked in FIG. 2. From the start times t1, t2, t3, ... of the rising pitches, the period at which rising pitches appear is obtained by autocorrelation, and the rhythm of the user's dialogue is detected together with the time information. In this example, the period (rhythm) Rt2 at time t2 is 57 frames (= 1.9 sec), and the period (rhythm) Rt3 at time t3 is 53 frames (1.8 sec).
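The period estimate described above can be sketched as follows. The specification states only that the period of rising-pitch onsets is obtained by autocorrelation; the binary onset train, the lag search range (`min_lag`, `max_lag`), and the function names are assumptions introduced for illustration, not the patented procedure.

```python
FRAME_SEC = 1.0 / 30.0  # one frame is 1/30 second, as stated in the description


def estimate_period(onset_frames, min_lag=15, max_lag=150):
    """Return the lag (in frames) that maximizes the autocorrelation of a
    binary onset train, or None if no lag in range yields any overlap.

    onset_frames: frame indices at which a rising pitch starts (t1, t2, ...).
    min_lag, max_lag: search range in frames (assumed values).
    """
    if len(onset_frames) < 2:
        return None
    start = min(onset_frames)
    length = max(onset_frames) - start + 1
    train = [0] * length
    for f in onset_frames:
        train[f - start] = 1  # mark each rising-pitch onset

    best_lag, best_score = None, 0
    for lag in range(min_lag, min(max_lag, length - 1) + 1):
        # Autocorrelation at this lag: count onsets that line up with
        # another onset exactly `lag` frames later.
        score = sum(train[i] * train[i + lag] for i in range(length - lag))
        if score > best_score:  # strict '>' keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


def frames_to_sec(frames):
    """Convert a frame count to seconds."""
    return frames * FRAME_SEC


period = estimate_period([0, 57, 114, 171])
```

For onsets spaced exactly 57 frames apart, the estimate is 57 frames, which corresponds to the 1.9 sec figure given for Rt2; 53 frames likewise rounds to the 1.8 sec given for Rt3. Real onsets are not exactly periodic, so a practical version would probably smooth the onset train before correlating.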

[0016] Dialogue management means 4 estimates, according to the user's rhythm detected by rhythm detection means 3 as described above, the timing of the responses output from the computer, such as backchannel responses (aizuchi) and turn-taking. An example of estimating the timing of a backchannel response is described below.

[0017] FIG. 3 illustrates an example of the backchannel response processing in dialogue management means 4. It shows the timing of the computer's backchannel response when the user utters "when you reproduce it inside the computer". The example shown is one in which the backchannel response is timed with reference to the start time tr of a rising pitch. Rtr is the rhythm at time tr, and tr-1 is the start time of the rising pitch immediately preceding time tr.

[0018] Dialogue management means 4 uses the rhythm Rtr at time tr, obtained by rhythm detection means 3 as described above, and gives a backchannel response when the current time ta, obtained as needed from time giving means 6, satisfies ta = tr + Rtr - Tm. Here, Tm is a value obtained by analyzing the timing at which backchannels are inserted in human-to-human dialogue; in this example it is 0.2 (sec). The value of Tm may also be varied according to the internal state of the computer. Output means 5 comprises voice output 5₁, image output 5₂, and so on. It gives the backchannel response by having a human-shaped model synthesized with CG (computer graphics) output the voice "yes" while nodding its head vertically. Having the model blink is also conceivable.
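The timing condition ta = tr + Rtr - Tm can be written out directly. In the sketch below the function names and the `tolerance` parameter are assumptions: a real system samples the clock at discrete instants, so an exact equality test would rarely fire, and a tolerance of half a frame is used here purely for illustration.

```python
def backchannel_time(tr, rtr, tm=0.2):
    """Scheduled time (sec) of the backchannel response: tr + Rtr - Tm.

    tr:  start time of the reference rising pitch, in seconds
    rtr: rhythm (period) at time tr, in seconds
    tm:  lead time; 0.2 sec is the value given in the description
    """
    return tr + rtr - tm


def should_backchannel(ta, tr, rtr, tm=0.2, tolerance=1.0 / 60.0):
    """True when the current time ta matches the scheduled time.

    `tolerance` (half a 1/30-sec frame) is an assumed parameter that
    stands in for the exact equality stated in the description.
    """
    return abs(ta - backchannel_time(tr, rtr, tm)) <= tolerance


scheduled = backchannel_time(tr=10.0, rtr=1.9)
```

With a rising pitch starting at 10.0 sec and a rhythm of 1.9 sec, the response is scheduled 0.2 sec before the next expected rising pitch, that is, at about 11.7 sec.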

[0019] FIG. 4 is a block diagram showing another embodiment of the rhythm control dialogue device of the present invention, which is described below with reference to that figure. Analysis of human-to-human dialogue has shown that the rhythm of a dialogue differs between speaking while thinking and speaking eagerly. In this embodiment, the rhythm on the computer side is changed in response to such changes in the rhythm of the user's dialogue. As shown in FIG. 4, the configuration of this embodiment adds to the configuration of the embodiment described above rhythm adjustment means 8, located in dialogue management means 4, which adjusts the rhythm of the computer's dialogue to match the rhythm of the user's dialogue.

[0020] The operation of adjusting the rhythm of the computer's dialogue with the configuration shown in FIG. 4 is described next. The rhythm information detected by rhythm detection means 3 is sent, together with the time information from time giving means 6, to rhythm adjustment means 8 in dialogue management means 4. FIG. 5 illustrates an example of the processing in rhythm adjustment means 8. Rhythm adjustment means 8 adjusts the output so that the rhythm Rt obtained by rhythm detection means 3 becomes the rhythm with which information is output from the computer to the user. FIG. 5 shows an example in which the rhythm is adjusted by changing the speech rate so that the rising pitches of the computer's utterance appear once every Rt, matching the user's rhythm Rt. Besides the computer's utterance, the rhythm of the CG model's hand movements may also be adjusted. In this way the computer's rhythm can be changed to follow changes in the user's rhythm, and a dialogue with a sense of rhythm is realized between the user and the computer.
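One simple way to picture the speech-rate adjustment is as a scale factor applied to the synthesizer's default rate. The specification does not say how the rate is computed, so everything in this sketch is an assumption: the function name, the idea that the synthesized utterance has a known natural interval between rising pitches at its default rate, and the clamping limits.

```python
def speech_rate_factor(natural_interval, user_rhythm,
                       min_factor=0.5, max_factor=2.0):
    """Return a multiplier for the default speech rate such that rising
    pitches in the computer's utterance recur every `user_rhythm` seconds.

    natural_interval: interval (sec) between rising pitches at the default rate
    user_rhythm:      the user's rhythm Rt (sec)
    min_factor, max_factor: assumed limits that keep the speech intelligible
    """
    if natural_interval <= 0 or user_rhythm <= 0:
        raise ValueError("intervals must be positive")
    # Speaking `factor` times faster shrinks the interval by the same factor:
    # natural_interval / factor == user_rhythm.
    factor = natural_interval / user_rhythm
    return max(min_factor, min(max_factor, factor))


factor = speech_rate_factor(natural_interval=1.5, user_rhythm=1.9)
```

For a default interval of 1.5 sec and a user rhythm of 1.9 sec, the factor is below 1, so the computer would speak more slowly to stretch its interval to 1.9 sec.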

[0021] FIG. 6 is a block diagram showing another embodiment of the present invention, which is described below with reference to that figure. Analysis of human-to-human dialogue has shown that the rhythm of a dialogue becomes irregular when a speaker is at a loss for a reply. In this embodiment, detecting such a change in rhythm makes it possible to infer that the user is at a loss for a reply, and the computer outputs a response that helps the user speak, so that the user's confusion is avoided. As shown in FIG. 6, this embodiment adds to the configuration of the embodiment shown in FIG. 1 rhythm change detection means 9, located in rhythm detection means 3, which detects changes in the rhythm of the user's dialogue.

[0022] Next, the operation of detecting a change in the rhythm of the user's dialogue with the configuration of this embodiment shown in FIG. 6 is described. The rhythm information detected by rhythm detection means 3 is sent to rhythm change detection means 9 together with the time information from time giving means 6. When the rhythm Rt at the current time t obtained by rhythm detection means 3 satisfies Rt / Rt-1 < X1, rhythm change detection means 9 detects that the rhythm has slowed, for example because the user is at a loss for a reply, and sends that information to dialogue management means 4. Here, X1 is a constant such as 0.8, and can be changed according to the internal state of the computer. When the rhythm changes in this way, dialogue management means 4 has the computer output through output means 5, for example, the utterance "What's wrong?" or a CG face image with a "What's wrong?" expression. It is also conceivable to speed up the rhythm of the computer's response so as to lead the user to speed up as well.

[0023] On the other hand, when Rt / Rt-1 > X2, it is detected that the rhythm has sped up, for example because the user is excited or angry. Here, X2 is a constant such as 1.2, and can be changed according to the internal state of the computer. In response to such a change in rhythm, dialogue management means 4 has the computer output through output means 5, for example, the utterance "Calm down" or a CG image making a soothing gesture. It is also conceivable to slow down the rhythm of the computer's dialogue so as to lead the user to slow down as well. In this way, a change in the rhythm of the user's dialogue is detected, a situation in which the user is at a loss for a reply is identified, and the computer outputs a response that helps the user speak, so that the user's confusion can be avoided.
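The two threshold tests on the ratio Rt / Rt-1 can be combined into one classifier. The sketch below takes the labels literally from the description (a ratio below X1 is reported as "slower", a ratio above X2 as "faster") and assumes that Rt-1 denotes the rhythm value detected immediately before Rt. The function name, the label strings, and the "steady" case are assumptions added for illustration.

```python
def classify_rhythm_change(rt, rt_prev, x1=0.8, x2=1.2):
    """Classify a rhythm change from the ratio Rt / Rt-1.

    rt, rt_prev: current and previous rhythm values as reported by the
                 rhythm detection means (same unit for both)
    x1, x2:      thresholds; 0.8 and 1.2 are the example constants given
                 in the description
    Labels follow the description as written: ratio < X1 -> "slower",
    ratio > X2 -> "faster". "steady" is an assumed label for the rest.
    """
    if rt_prev <= 0:
        raise ValueError("previous rhythm must be positive")
    ratio = rt / rt_prev
    if ratio < x1:
        return "slower"   # e.g. the user may be at a loss for a reply
    if ratio > x2:
        return "faster"   # e.g. the user may be excited or angry
    return "steady"


label = classify_rhythm_change(rt=53, rt_prev=57)
```

With the values from FIG. 2 (57 frames followed by 53 frames), the ratio is about 0.93, which lies between the two thresholds, so no change is reported.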

[0024]

[Effects of the invention]

Effect of claim 1: The rhythm of the user's dialogue can be detected from the features of a plurality of signals, such as audio and video, generated in correspondence with the actions (operation states) by which the user expresses the information to be conveyed, and the output of audio and visible signals from the computer can be controlled according to that rhythm. A rhythm control dialogue device capable of carrying a dialogue forward smoothly can therefore be provided.

Effect of claim 2: In addition to the effect of claim 1, the further provision of time giving means that outputs time information makes it possible to capture the temporal correlation among the plurality of signal features generated in correspondence with the user's behavior. This allows rhythm detection and dialogue management with fewer errors.

Effect of claim 3: In addition to the effect of claim 1 or 2, the rhythm of the user's dialogue is detected from the loudness of the user's voice (voice power), the pitch of the voice (voice pitch), and hand movement, which makes rhythm detection with fewer errors concretely realizable.

Effect of claim 4: In addition to the effects of claims 1 to 3, the timing of the response from the computer to the user is controlled according to the user's rhythm. This makes it possible to predict a response timing matched to the user's rhythm, so that the computer can respond to the user without delay.

Effect of claim 5: In addition to the effects of claims 1 to 4, the rhythm of the dialogue on the computer side changes to match the user's rhythm. This realizes a dialogue with a sense of rhythm between the user and the computer.

Effect of claim 6: In addition to the effects of claims 1 to 5, a change in the user's rhythm is detected and a situation in which the user is confused is identified. Based on that result, the computer outputs a response that helps the user speak, so that the user's confusion can be avoided and a more satisfying dialogue becomes possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of one embodiment of the rhythm control dialogue device of the present invention.

FIG. 2 is a diagram for explaining an example of the processing in rhythm detection means 3 in the embodiment of FIG. 1.

FIG. 3 is a diagram for explaining an example of the processing in dialogue management means 4 in the embodiment of FIG. 1.

FIG. 4 is a block diagram showing the schematic configuration of another embodiment of the rhythm control dialogue device of the present invention.

FIG. 5 is a diagram for explaining an example of the processing in rhythm adjustment means 8 in the embodiment of FIG. 4.

FIG. 6 is a block diagram showing the schematic configuration of another embodiment of the rhythm control dialogue device of the present invention.

[Explanation of symbols]

1: data input means; 2: recognition means; 3: rhythm detection means; 4: dialogue management means; 5: output means; 6: time giving means; 7: history storage means; 8: rhythm adjustment means; 9: rhythm change detection means.

Continued from the front page: (51) Int. Cl.⁶ G10L 9/00 301; FI G10L 9/00 301C

Claims

[Claims]

1. A rhythm control dialogue device comprising: input means and output means for voice and the like expressing information to be conveyed; recognition means for recognizing an operation mode, and the operation state in that operation mode, from a user's voice, gestures, and the like input from the input means; rhythm detection means for detecting the rhythm of the user's utterance from the recognition result of the recognition means; and dialogue management means for controlling the output from the output means in accordance with the detection result of the rhythm detection means.

2. The rhythm control dialogue device according to claim 1, further comprising time giving means, wherein the time giving means assigns the start and end times of the user's operation state recognized by the recognition means, and also assigns the operation times associated with the rhythm detection operation of the rhythm detection means and the management operation of the dialogue management means, these times being used for time management.

3. The rhythm control dialogue device according to claim 1 or 2, wherein the rhythm detection is performed by recognizing the operation state of at least one of the following operation modes generated by the user: voice power, voice pitch, and hand movement.

4. The rhythm control dialogue device according to any one of claims 1 to 3, wherein the dialogue management means controls the timing of the response to the user output from the output means according to the detection result of the rhythm detection means.

5. The rhythm control dialogue device according to any one of claims 1 to 4, wherein the dialogue management means comprises rhythm adjustment means for changing the rhythm of the computer's response output from the output means according to the detection result of the rhythm detection means, so that the rhythm with which information is output from the computer to the user matches the user's rhythm.

6. The rhythm control dialogue device according to any one of claims 1 to 5, wherein the rhythm detection means further comprises rhythm change detection means for detecting a change in rhythm, and the detection result of the rhythm change detection means is input to the dialogue management means.