JP5982840B2

JP5982840B2 - Dialogue device, dialogue program, and dialogue method

Info

Publication number: JP5982840B2
Application number: JP2012019130A
Authority: JP
Inventors: 岳今井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-01-31
Filing date: 2012-01-31
Publication date: 2016-08-31
Anticipated expiration: 2032-01-31
Also published as: JP2013154458A

Description

The present invention relates to a dialogue apparatus, a dialogue program, and a dialogue method.

Dialogue apparatuses have been developed that engage in friendly interaction with people through conversation, gestures, and the like. Examples of such apparatuses include therapy robots and entertainment robots.

A dialogue apparatus may fail to understand the content of an utterance input by a person. Various techniques have therefore been proposed for continuing the dialogue even when the content of a person's utterance cannot be understood. For example, one proposed technique identifies the topic of the current conversation when an utterance input by a person cannot be recognized, and outputs a response suited to that topic. In another proposed technique, when a user's command or the like cannot be understood, the robot performs a predetermined action, such as tilting its head or cupping a hand to its ear, to notify the user that speech recognition reliability is low or that speech recognition has failed.

JP 2002-169590 A; JP 2002-116792 A; JP 2002-189488 A; JP 2003-260681 A

However, with the conventional techniques, even when the dialogue with the person continues, the utterances of the dialogue apparatus may have unnatural content, making the dialogue with the person unnatural.

An object of the disclosed technology is to provide a dialogue apparatus, a dialogue program, and a dialogue method capable of preventing a dialogue with a person from becoming unnatural.

In one aspect, the dialogue apparatus disclosed in the present application includes a detection unit, a calculation unit, an expression unit, and a change unit. The detection unit detects at least one of an utterance and a motion of a dialogue partner. The calculation unit calculates an evaluation of the dialogue with the dialogue partner based on at least one of the utterance and the motion detected by the detection unit. The expression unit continuously expresses an attitude toward the dialogue partner through gestures. When the evaluation calculated by the calculation unit remains high, the change unit changes the attitude expressed by the expression unit to one indicating a high understanding of the dialogue. When the evaluation remains low, the change unit changes the attitude expressed by the expression unit to one indicating a low understanding of the dialogue.

According to one aspect of the dialogue apparatus disclosed in the present application, a dialogue with a person can be prevented from becoming unnatural.

FIG. 1 is a perspective view illustrating the configuration of the robot according to the first embodiment.
FIG. 2 is a block diagram illustrating the internal configuration of the robot according to the first embodiment.
FIG. 3 is a block diagram illustrating the functional configuration of the control unit of the robot according to the first embodiment.
FIG. 4 is a diagram illustrating an example configuration of the conversation data according to the first embodiment.
FIG. 5 is a diagram illustrating an example configuration of the nonverbal motion data according to the first embodiment.
FIG. 6 is a diagram illustrating an example configuration of the nodding motion data according to the first embodiment.
FIG. 7 is a diagram illustrating an example configuration of the steady-state motion frequency data according to the first embodiment.
FIG. 8 is a diagram illustrating an example of the flow of a dialogue.
FIG. 9 is a flowchart illustrating the procedure of the dialogue process according to the first embodiment.
FIG. 10 is a flowchart illustrating the procedure of the expressed-attitude determination process according to the first embodiment.
FIG. 11 is a flowchart illustrating the procedure of the response process according to the first embodiment.
FIG. 12 is a flowchart illustrating the procedure of the steady-state attitude expression process according to the first embodiment.
FIG. 13 is a diagram for describing an example of a computer that executes a dialogue processing program.

Embodiments of the dialogue apparatus, dialogue program, and dialogue method disclosed in the present application are described in detail below with reference to the drawings. These embodiments do not limit the disclosed technology, and the embodiments may be combined as appropriate as long as their processing contents do not contradict each other. The following embodiments describe a case in which the dialogue apparatus is applied to a robot.

[Overall Configuration of the Robot]
First, the overall configuration of the robot according to the present embodiment is described. FIG. 1 is a perspective view illustrating the configuration of the robot according to the first embodiment. As shown in FIG. 1, the robot 1 has the appearance of a stuffed bear cub, with a skin covered in soft fur and a body shape resembling that of an infant. By adopting this stuffed-bear-cub appearance, the robot 1 encourages physical contact and conveys a presence intermediate between an animal and a human (infant), allowing it to blend into people's daily lives and engage in friendly dialogue.

The robot 1 has a head 2, a torso 3, a right arm 4R, a left arm 4L, a right leg 5R, and a left leg 5L. With the robot 1 positioned so that the head 2 is vertical, the head 2 can rotate in the pitch direction (forward and backward about the left-right X axis), the yaw direction (left and right about the vertical Z axis), and the roll direction (tilting left and right about the front-rear Y axis). That is, the robot 1 can rotate the head 2 about three axes: pitch, yaw, and roll. By rotating the head 2 about these three axes, the robot 1 can perform various motions. For example, the robot 1 can nod by rotating the head 2 in the pitch direction, turn its face sideways by rotating the head 2 in the yaw direction, and tilt the head 2 by rotating it in the roll direction.

The right arm 4R and the left arm 4L can each rotate forward and backward about their connection to the torso 3 when the robot 1 is positioned with the head 2 facing up. That is, the robot 1 can rotate the right arm 4R and the left arm 4L about a single axis in the pitch direction, and can thereby swing them back and forth.

Similarly, the right leg 5R and the left leg 5L can each rotate forward and backward about their connection to the torso 3 when the robot 1 is positioned with the head 2 facing up. The robot 1 can swing the right leg 5R and the left leg 5L back and forth by moving them in the pitch direction.

The head 2 is provided with a right ear 6R, a left ear 6L, two eyes 7, a nose 8, and a mouth 9. The right ear 6R and the left ear 6L can each rotate forward and backward about their connection to the head 2. The eyeballs of the eyes 7 can rotate left and right, allowing the line of sight to move left and right. Each eye 7 is provided with an eyelid 7A, which can move up and down to blink. The mouth 9 can open and close. The nose 8 houses a camera 41, described later, arranged so that its optical axis roughly coincides with the line-of-sight direction of the robot 1. The head 2 also houses a microphone 42 and a speaker 43, described later.

The robot 1 according to the present embodiment holds a dialogue with a person through speech and gestures. When the robot 1 detects at least one of a person's utterance and motion, it calculates an evaluation of the dialogue with the dialogue partner based on the detected utterance and/or motion. The robot 1 then responds with speech or gestures according to this evaluation. In addition, even while the person is not speaking, the robot 1 continuously expresses an attitude toward the dialogue partner through gestures. When the evaluation of the dialogue partner's utterances remains high, the robot 1 changes the expressed attitude to one indicating a high understanding of the dialogue; when the evaluation remains low, it changes the expressed attitude to one indicating a low understanding.
For example, the robot 1 expresses, through gestures and the like, one of three attitudes: a sense of listening, indicating that it understands the dialogue and is highly interested; a sense of uncertainty, indicating that it does not quite understand the dialogue; and a sense of boredom, indicating that it does not understand the dialogue and is bored. When the evaluation remains high, the robot 1 changes the expressed attitude to the sense of listening; when the evaluation remains low, it changes the expressed attitude to the sense of uncertainty or the sense of boredom. In this way, the robot 1 changes the expressed attitude according to how well it understands the dialogue partner's utterances, so the dialogue partner can tell from the expressed attitude whether the robot 1 has a high or low understanding of the dialogue. As a result, even if the robot 1 reaches a state of low understanding and its utterances become unnatural, the dialogue partner has already recognized that the robot's understanding is low, which prevents the dialogue with the person from becoming unnatural.
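The change unit's behavior can be sketched as a small state machine. This sketch is illustrative only: the patent does not say how “high” and “low” are decided or how long a state must “continue”, so the numeric evaluation in [0, 1], the thresholds (`high_threshold`, `low_threshold`), the streak length (`streak_len`), and the one-step-at-a-time movement between attitudes are all assumptions, as are the class and method names.

```python
from enum import IntEnum


class Attitude(IntEnum):
    # Ordered from lowest to highest understanding of the dialogue.
    BORED = 0
    UNCERTAIN = 1
    LISTENING = 2


class ChangeUnit:
    """Hypothetical sketch: move the expressed attitude one step up after the
    evaluation stays high for `streak_len` consecutive updates, and one step
    down after it stays low for `streak_len` consecutive updates."""

    def __init__(self, high_threshold=0.7, low_threshold=0.3, streak_len=3,
                 initial=Attitude.LISTENING):
        self.high_threshold = high_threshold  # assumed cut-off for "high"
        self.low_threshold = low_threshold    # assumed cut-off for "low"
        self.streak_len = streak_len          # assumed meaning of "continues"
        self.attitude = initial
        self._high = 0
        self._low = 0

    def update(self, evaluation: float) -> Attitude:
        if evaluation >= self.high_threshold:
            self._high, self._low = self._high + 1, 0
        elif evaluation <= self.low_threshold:
            self._high, self._low = 0, self._low + 1
        else:
            # A middling evaluation breaks both streaks.
            self._high, self._low = 0, 0

        if self._high >= self.streak_len:
            # Step toward higher understanding, capped at LISTENING.
            self.attitude = Attitude(min(self.attitude + 1, Attitude.LISTENING))
            self._high = 0
        elif self._low >= self.streak_len:
            # Step toward lower understanding, floored at BORED.
            self.attitude = Attitude(max(self.attitude - 1, Attitude.BORED))
            self._low = 0
        return self.attitude
```

Starting from LISTENING, three consecutive low evaluations move the attitude to UNCERTAIN, three more move it to BORED, and three consecutive high evaluations then move it back up to UNCERTAIN. Requiring a streak, rather than reacting to a single evaluation, is one way to model the “state continues” condition without the attitude flickering on every utterance.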

Next, the internal configuration of the robot 1 according to the present embodiment is described. FIG. 2 is a block diagram illustrating the internal configuration of the robot according to the first embodiment.

As shown in FIG. 2, the head 2, the right arm 4R, the left arm 4L, the right leg 5R, and the left leg 5L are connected to the torso 3 of the robot 1 via motors 33P, 33R, 33Y, 34R, 34L, 35R, and 35L. The head 2 of the robot 1 is also provided with small motors 36R, 36L, 37, 38, and 39.

The motors 33P, 33R, and 33Y are provided at the neck joint connecting the head 2 and the torso 3 of the robot 1. The motor 33P rotates the head 2 in the pitch direction, the motor 33R rotates it in the roll direction, and the motor 33Y rotates it in the yaw direction.

The motors 34R and 34L are provided at the shoulder joints connecting the right arm 4R and the left arm 4L to the torso 3. The motor 34R rotates the right arm 4R in the front-rear direction of the robot 1, and the motor 34L rotates the left arm 4L in the front-rear direction of the robot 1.

The motors 35R and 35L are provided at the hip joints connecting the right leg 5R and the left leg 5L to the torso 3. The motor 35R rotates the right leg 5R in the front-rear direction of the robot 1, and the motor 35L rotates the left leg 5L in the front-rear direction of the robot 1.

The motors 36R and 36L are provided where the right ear 6R and the left ear 6L connect to the head 2 of the robot 1. The motor 36R rotates the right ear 6R forward and backward, and the motor 36L rotates the left ear 6L forward and backward.

The motor 37 is provided in the eye 7 portion of the head 2 of the robot 1. Using a transmission mechanism (not shown), the motor 37 rotates the eyeballs of the two eyes 7 left and right, thereby changing the line-of-sight direction of the robot 1.

The motor 38 is also provided in the eye 7 portion of the head 2 of the robot 1. Using a transmission mechanism (not shown), the motor 38 moves the eyelids 7A of the two eyes 7 up and down, causing the robot 1 to blink.

The motor 39 is provided in the mouth 9 portion of the head 2 of the robot 1. Using a transmission mechanism (not shown), the motor 39 opens and closes the mouth 9.

Hereinafter, the motors 33P, 33R, 33Y, 34R, 34L, 35R, 35L, 36R, 36L, 37, 38, and 39 may be collectively referred to as “motors 30” when no distinction is needed. The motors 30 are connected to a motor control unit 40 that controls them. The motor control unit 40 performs overall control, such as controlling the power of the motors 30 that drive the head 2, the right arm 4R, the left arm 4L, the right leg 5R, and the left leg 5L, and monitoring the current flowing through those motors 30. It performs the same kind of control for the motors 30 that drive the right ear 6R, the left ear 6L, the eyes 7, and the mouth 9.
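For reference, the motor assignments described above can be collected into a lookup table. In this Python sketch the reference numerals and the motions they produce come from the description of FIG. 2, while the `MotorSpec` record, the `MOTORS_30` dictionary, and the `motors_for` helper are hypothetical names introduced for illustration; they do not describe the robot's actual control software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorSpec:
    part: str    # part of the robot 1 that the motor drives
    motion: str  # motion the motor produces


# Motors collectively called "motors 30", keyed by reference numeral (FIG. 2).
MOTORS_30 = {
    "33P": MotorSpec("head 2", "pitch (nod)"),
    "33R": MotorSpec("head 2", "roll"),
    "33Y": MotorSpec("head 2", "yaw (turn sideways)"),
    "34R": MotorSpec("right arm 4R", "front-rear rotation"),
    "34L": MotorSpec("left arm 4L", "front-rear rotation"),
    "35R": MotorSpec("right leg 5R", "front-rear rotation"),
    "35L": MotorSpec("left leg 5L", "front-rear rotation"),
    "36R": MotorSpec("right ear 6R", "front-rear rotation"),
    "36L": MotorSpec("left ear 6L", "front-rear rotation"),
    "37": MotorSpec("eyes 7", "eyeball left-right rotation"),
    "38": MotorSpec("eyelids 7A", "up-down (blink)"),
    "39": MotorSpec("mouth 9", "open/close"),
}


def motors_for(part_prefix: str) -> list[str]:
    """Return the reference numerals of all motors whose part name starts
    with `part_prefix`, sorted for stable output."""
    return sorted(ref for ref, spec in MOTORS_30.items()
                  if spec.part.startswith(part_prefix))
```

For example, `motors_for("head")` returns the three neck-joint motors 33P, 33R, and 33Y.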

Each motor 30 transmits power according to instructions from the motor control unit 40, thereby driving the head 2, the right arm 4R, the left arm 4L, the right leg 5R, the left leg 5L, the right ear 6R, the left ear 6L, the eyeballs of the eyes 7, the eyelids 7A, and the mouth 9, respectively.

The head 2 of the robot 1 is provided with the camera 41, the microphone 42, and the speaker 43. The camera 41 is located in the nose 8 of the head 2 and photographs the area in front of the head 2 at a predetermined interval. The microphone 42 can be mounted at any position, but is preferably installed on the front side of the head 2 to pick up the speech of the person in the dialogue. The speaker 43 can likewise be mounted at any position, but is preferably installed on the front side of the head 2 to output sound toward the person in the dialogue.

The torso 3 of the robot 1 is provided with a control unit 10 that controls the robot 1 as a whole. The motor control unit 40, the camera 41, the microphone 42, and the speaker 43 are connected to the control unit 10. The motor control unit 40 operates each motor 30 in accordance with instructions from the control unit 10. The camera 41 outputs captured image data to the control unit 10, and the microphone 42 outputs collected audio data to the control unit 10. The speaker 43 outputs sound under the control of the control unit 10.

Next, the functional configuration of the control unit of the robot according to the present embodiment is described. FIG. 3 is a block diagram illustrating the functional configuration of the control unit of the robot according to the first embodiment. As shown in FIG. 3, the control unit 10 is a processing unit that controls the robot 1 as a whole. A storage unit 11 is connected to the control unit 10.

The storage unit 11 stores the OS (Operating System) executed by the control unit 10 and various programs used to control the robot 1. The storage unit 11 also stores various data needed to execute the programs run by the control unit 10. Examples of such data include conversation data 11a, nonverbal motion data 11b, nodding motion data 11c, and steady-state motion frequency data 11d.

The conversation data 11a is information indicating the content of utterances made during a conversation. For example, the conversation data 11a is registered in advance by the manufacturer of the robot 1. The conversation data 11a is referenced by a response determination unit 10f, described later, to produce utterances.

FIG. 4 is a diagram illustrating an example configuration of the conversation data according to the first embodiment. As shown in FIG. 4, the conversation data 11a has the items “topic”, “question”, “attitude”, and “next question”. The topic item stores information indicating the topic of the dialogue. The question item stores information indicating the question put to the person. The attitude item stores information indicating the attitude to be expressed. The next question item stores information indicating the question to ask next. Thus, for each combination of topic, question, and attitude, the conversation data 11a stores the next question to ask. A record in which both the question and attitude items contain “-” stores the first question to ask when switching to that topic.

In the example of FIG. 4, the first question asked on topic “A” is “question a1”. After “question a1” on topic “A” has been asked and answered, the next question is “question a2” if the attitude to be expressed is “sense of listening”, “question a3” if it is “sense of uncertainty”, and a question from “topic B” if it is “sense of boredom”. The first question asked on topic “B” is “question b1”.
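The FIG. 4 lookup can be sketched as a dictionary keyed by (topic, question, attitude). The data structure, the `next_question` helper, the English attitude labels, and the convention of encoding a topic switch as the string "topic B" are assumptions made for illustration; the patent describes only the table contents.

```python
# Hypothetical in-memory form of conversation data 11a (FIG. 4).
# Key: (topic, question, attitude); "-" marks a topic's opening question.
CONVERSATION_DATA = {
    ("A", "-", "-"): "question a1",
    ("A", "question a1", "listening"): "question a2",
    ("A", "question a1", "uncertain"): "question a3",
    ("A", "question a1", "bored"): "topic B",  # assumed encoding of a topic switch
    ("B", "-", "-"): "question b1",
}

TOPIC_PREFIX = "topic "


def next_question(topic: str, question: str, attitude: str) -> tuple[str, str]:
    """Return the (topic, question) pair to ask next, following topic switches."""
    entry = CONVERSATION_DATA[(topic, question, attitude)]
    if entry.startswith(TOPIC_PREFIX):
        new_topic = entry[len(TOPIC_PREFIX):]
        # On a topic switch, ask that topic's opening ("-", "-") question.
        return new_topic, CONVERSATION_DATA[(new_topic, "-", "-")]
    return topic, entry
```

With this table, a bored response to “question a1” leads straight to the opening question of topic “B”, which mirrors the “-” record convention described above.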

The nonverbal motion data 11b is information indicating the motions to perform for each attitude to be expressed. For example, the nonverbal motion data 11b is registered in advance by the manufacturer of the robot 1. The nonverbal motion data 11b is referenced by a nonverbal response determination unit 10h and a steady-state nonverbal motion determination unit 10i, described later, to instruct nonverbal expressive motions.

FIG. 5 is a diagram illustrating an example configuration of the nonverbal motion data according to the first embodiment. As shown in FIG. 5, the nonverbal motion data 11b has the items “setting element”, “sense of listening”, “sense of uncertainty”, and “sense of boredom”. The setting element item stores information indicating the elements that are set to express an attitude. The sense of listening item stores the setting of each element used when expressing a sense of listening. The sense of uncertainty item stores the setting of each element used when expressing a sense of uncertainty. The sense of boredom item stores the setting of each element used when expressing a sense of boredom.

The setting element item stores the elements in which people's behavior differs when they express a sense of listening, uncertainty, or boredom. The sense of listening, sense of uncertainty, and sense of boredom items each store information indicating the behavior that is characteristic of that attitude. In one aspect, the present embodiment uses “volume”, “pitch”, “speech rate”, “gaze”, “blink”, and “motion/facial expression” as setting elements. The volume element indicates the volume at which speech and other sounds are output from the speaker 43. In the present embodiment, a volume of “low” means quieter than the standard speaking volume, “medium” means the standard speaking volume, and “high” means louder than the standard speaking volume. The standard speaking volume may be determined, for example, by analyzing the voices of many people in advance, or a predetermined volume may be designated as the standard.

The pitch element indicates the pitch of the voice used when speech is output from the speaker 43. In the present embodiment, a pitch of “low” means lower than the standard speaking pitch, “medium” means the standard speaking pitch, and “high” means higher than the standard speaking pitch. The standard speaking pitch may be determined, for example, by analyzing the voices of many people in advance, or a predetermined pitch may be designated as the standard.

The speech rate element indicates the rate at which speech is output from the speaker 43. In the present embodiment, a speech rate of “slow” means slower than the standard speaking rate, and “normal” means the standard speaking rate. The standard speaking rate may be determined, for example, by analyzing the speech of many people in advance, or a predetermined rate may be designated as the standard.
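Because volume, pitch, and speech rate are all specified relative to a standard value, one way to realize them is to multiply a standard voice setting by a per-level factor. The factors below (for example 0.7 for a “low” volume) are purely hypothetical, since the patent says only that a value is below, at, or above the standard; `VoiceSettings`, `voice_settings`, and the units chosen for each field are likewise illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical multipliers relative to the standard value; the patent gives
# only "below / at / above the standard", not numbers.
VOLUME_FACTOR = {"low": 0.7, "medium": 1.0, "high": 1.3}
PITCH_FACTOR = {"low": 0.85, "medium": 1.0, "high": 1.15}
RATE_FACTOR = {"slow": 0.8, "normal": 1.0}


@dataclass(frozen=True)
class VoiceSettings:
    gain: float      # linear output gain for the speaker
    pitch_hz: float  # fundamental frequency of the synthesized voice
    rate: float      # speaking rate, e.g. morae per second


def voice_settings(standard: VoiceSettings, volume: str, pitch: str,
                   rate: str) -> VoiceSettings:
    """Scale a standard voice setting by the per-level factors above."""
    return VoiceSettings(
        gain=standard.gain * VOLUME_FACTOR[volume],
        pitch_hz=standard.pitch_hz * PITCH_FACTOR[pitch],
        rate=standard.rate * RATE_FACTOR[rate],
    )
```

Keeping the standard values separate from the factors means the standard can be re-measured (for example from recordings of many speakers) without touching the per-attitude table.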

The gaze element indicates the line-of-sight direction of the eyes 7. In the present embodiment, a gaze of “gaze at user” means the line of sight remains directed at the user. “Occasionally look away” means the line of sight is directed at the user but periodically moved to a direction away from the user. “Rarely look at user” means the line of sight is directed away from the user and periodically moved toward the user. The period at which the line of sight is moved may be, for example, a fixed interval or a random interval within a predetermined range.
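The three gaze modes can be sketched as a schedule of (start time, target) segments. This is a hypothetical illustration: the mode names, the dwell-period bounds (`period_range`), the duration of each glance (`glance_s`), and the target labels are assumptions. Passing equal bounds in `period_range` gives the fixed-interval variant; unequal bounds give a random interval within a predetermined range.

```python
import random


def gaze_schedule(mode, duration_s, period_range=(2.0, 5.0), glance_s=0.5,
                  rng=None):
    """Return (start_time_s, target) segments for one gaze mode.

    period_range: assumed bounds on how long the eyes stay on the base
    direction before shifting; equal bounds give a fixed period.
    glance_s: assumed duration of each shift to the other direction.
    """
    rng = rng or random.Random()
    if mode == "gaze_at_user":
        return [(0.0, "user")]  # keep looking at the user throughout
    if mode == "occasionally_look_away":
        base, glance = "user", "away"
    elif mode == "rarely_look_at_user":
        base, glance = "away", "user"
    else:
        raise ValueError(f"unknown gaze mode: {mode}")

    segments = []
    t = 0.0
    while t < duration_s:
        segments.append((t, base))
        t += rng.uniform(*period_range)  # dwell on the base direction
        if t >= duration_s:
            break
        segments.append((t, glance))
        t += glance_s  # brief shift to the other direction
    return segments
```

The two non-trivial modes share one loop and differ only in which direction is the resting state, which matches the description of them as mirror images of each other.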

The blink element indicates how the eyelids 7A are moved up and down to blink. In the present embodiment, a blink setting of “normal” means blinking at the standard frequency, and “frequent” means blinking more often than the standard frequency. The standard blink frequency may be determined, for example, by analyzing the blink frequencies of many people in advance, or a predetermined frequency may be designated as the standard.

The motion/facial expression element indicates the motions and facial expressions presented by the robot 1. In the present embodiment, “smile” means operating the parts of the robot 1 so that it appears to be smiling: for example, the right ear 6R and the left ear 6L are raised to their most upright positions, the head 2 and the eyes 7 are turned toward the user, the eyelids 7A and the mouth 9 are fully opened, and the right arm 4R and the left arm 4L are raised. “Tilt head” means operating the parts of the robot 1 so that it appears to be tilting its head: for example, the mouth 9 is closed, the right arm 4R and the left arm 4L are lowered, and the head 2 is rotated in the yaw or pitch direction into a tilted posture. “Sleepy” means operating the parts of the robot 1 so that it appears sleepy: for example, the eyelids 7A are half open, the right arm 4R and the left arm 4L are lowered, and the head 2 is rotated in the pitch direction to face downward.

In the example of FIG. 5, to express a sense of listening, the volume is set to "high", the pitch to "medium", the speaking rate to "normal", the gaze to "gaze at user", the blink to "normal", and the action/expression to "smile". To express a sense of confusion, the volume is set to "low", the pitch to "high", the speaking rate to "normal", the gaze to "look away occasionally", the blink to "frequent", and the action/expression to "tilt head". To express a sense of boredom, the volume is set to "medium", the pitch to "low", the speaking rate to "slow", the gaze to "rarely look at user", the blink to "normal", and the action/expression to "sleepy".
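The FIG. 5 table can be pictured as a lookup from the attitude to be expressed to one column of settings. The sketch below is illustrative only: the class name `VoiceAndMotionSettings`, the dictionary `NON_VERBAL_MOTION_DATA`, the helper `settings_for`, and the English attitude keys are hypothetical stand-ins for the non-verbal motion data 11b, not the patent's actual storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceAndMotionSettings:
    """One column of the FIG. 5 table (labels translated from the figure)."""
    volume: str
    pitch: str
    speech_rate: str
    gaze: str
    blink: str
    action_expression: str


# Hypothetical in-memory form of the non-verbal motion data 11b, keyed by attitude.
NON_VERBAL_MOTION_DATA = {
    "listening": VoiceAndMotionSettings(
        "high", "medium", "normal", "gaze at user", "normal", "smile"),
    "confusion": VoiceAndMotionSettings(
        "low", "high", "normal", "look away occasionally", "frequent", "tilt head"),
    "boredom": VoiceAndMotionSettings(
        "medium", "low", "slow", "rarely look at user", "normal", "sleepy"),
}


def settings_for(attitude: str) -> VoiceAndMotionSettings:
    """Look up the voice and motion settings for the attitude to be expressed."""
    return NON_VERBAL_MOTION_DATA[attitude]
```

A frozen dataclass keeps each column immutable, which matches the description of these settings as data registered in advance rather than values changed at run time.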

The nodding motion data 11c is information specifying the settings for nodding when the robot responds. For example, the nodding motion data 11c is registered in advance by the manufacturer of the robot 1, and it is referenced by the non-verbal response determination unit 10h, described later, to instruct nodding motions.

FIG. 6 is a diagram illustrating an example configuration of the nodding motion data according to the first embodiment. As illustrated in FIG. 6, the nodding motion data 11c has the items "setting element", "sense of listening", "sense of confusion", and "sense of boredom". The setting element item stores information indicating each element of the nodding motion to be set. The sense-of-listening item stores the setting for each element when a sense of listening is expressed. The sense-of-confusion item stores the setting for each element when a sense of confusion is expressed. The sense-of-boredom item stores the setting for each element when a sense of boredom is expressed.

The setting element item stores the setting elements related to the nodding motion, and the sense-of-listening, sense-of-confusion, and sense-of-boredom items store information indicating the nodding motion for each attitude. In one aspect, this embodiment uses "nod depth", "nod speed", and "nod delay" as the setting elements. Nod depth specifies the amplitude of the up-and-down head motion during a nod. Nod speed specifies how fast the robot nods. Nod delay specifies the time from detecting the user's utterance until the robot nods.

A person's nodding changes with their state in the dialogue. For example, a person who is listening attentively nods in response to the other party's utterances. A person who cannot fully understand the other party also nods in response, but with a smaller amplitude, at a higher speed, and more often. A person who cannot understand the other party and is bored nods with a long delay and an even smaller amplitude. Accordingly, in this embodiment, conditions for deriving the nod depth, nod speed, and nod delay from the understanding level and interest level, described later, are set separately for the sense of listening, the sense of confusion, and the sense of boredom.

In the example of FIG. 6, when the attitude to be expressed is a sense of listening, the nod depth is derived as coefficient g × max(understanding level, interest level), the nod speed as coefficient i × interest level, and the nod delay is set to 0.0. Here, max(understanding level, interest level) is a function that returns the larger of its two arguments. When the attitude to be expressed is a sense of confusion, the nod depth is derived as coefficient h × understanding level, the nod speed is set to 1.0, and the nod delay is derived as coefficient j × (1.0 − understanding level). When the attitude to be expressed is a sense of boredom, the nod depth is set to 0.1, the nod speed to 0.5, and the nod delay is derived as coefficient k × (1.0 − interest level). In this embodiment, the nod depth, nod speed, and nod delay are defined in the range 0.0 to 1.0; a value below 0.0 is set to 0.0, and a value above 1.0 is set to 1.0. The coefficient g is set larger than the coefficient h so that nods are deeper for a sense of listening than for a sense of confusion, and the coefficient k is set larger than the coefficient j so that the nod delay is longer for a sense of boredom than for a sense of confusion.
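The FIG. 6 derivation rules, including the clipping to 0.0 to 1.0, can be written as one small function. This is a sketch under assumptions: the function names, the attitude keys, and the numeric values in `NOD_COEFFS` are hypothetical; the text fixes only the formulas and the orderings g > h and k > j.

```python
def clamp01(x: float) -> float:
    """Clip a value to the range [0.0, 1.0] used for all nod parameters."""
    return max(0.0, min(1.0, x))


# Hypothetical coefficient values; the text only requires g > h and k > j.
NOD_COEFFS = {"g": 1.0, "h": 0.5, "i": 1.0, "j": 0.5, "k": 1.0}


def nod_parameters(attitude, understanding, interest, c=NOD_COEFFS):
    """Return (depth, speed, delay) for a responsive nod, per the FIG. 6 conditions."""
    if attitude == "listening":
        depth = c["g"] * max(understanding, interest)
        speed = c["i"] * interest
        delay = 0.0
    elif attitude == "confusion":
        depth = c["h"] * understanding
        speed = 1.0
        delay = c["j"] * (1.0 - understanding)
    elif attitude == "boredom":
        depth = 0.1
        speed = 0.5
        delay = c["k"] * (1.0 - interest)
    else:
        raise ValueError(f"unknown attitude: {attitude!r}")
    # Every parameter is kept within [0.0, 1.0].
    return clamp01(depth), clamp01(speed), clamp01(delay)
```

Passing the coefficients as a parameter makes it easy to try other values while preserving the required orderings.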

The steady motion frequency data 11d is information specifying how often the robot constantly expresses its attitude while there is no utterance from the person. For example, the steady motion frequency data 11d is registered in advance by the manufacturer of the robot 1, and it is referenced by the steady non-verbal motion determination unit 10i, described later, to determine the timing of steady non-verbal motions.

FIG. 7 is a diagram illustrating an example configuration of the steady motion frequency data according to the first embodiment. As illustrated in FIG. 7, the steady motion frequency data 11d has the items "setting element", "sense of listening", "sense of confusion", and "sense of boredom". The setting element item stores information indicating the motions used to express the attitude constantly. The sense-of-listening item stores the setting for each element when a sense of listening is expressed. The sense-of-confusion item stores the setting for each element when a sense of confusion is expressed. The sense-of-boredom item stores the setting for each element when a sense of boredom is expressed.

The setting element item stores motions that differ noticeably depending on whether a person is expressing a sense of listening, confusion, or boredom. For each of the sense-of-listening, sense-of-confusion, and sense-of-boredom items, a condition is set for deriving the frequency from the understanding level and interest level, described later. In one aspect, this embodiment uses "nod frequency" and "gaze frequency" as the setting elements. Nod frequency specifies the cycle at which the robot nods. Gaze frequency specifies the cycle at which the robot looks the user in the eye.

In the example of FIG. 7, when the attitude to be expressed is a sense of listening, the nod frequency is derived as understanding level × coefficient a, and the gaze frequency as understanding level × coefficient d. When the attitude to be expressed is a sense of confusion, the nod frequency is derived as understanding level × coefficient b, and the gaze frequency as interest level × coefficient e. When the attitude to be expressed is a sense of boredom, the nod frequency is derived as interest level × coefficient c, and the gaze frequency as interest level × coefficient f. As described later, in this embodiment the interest level is calculated from the understanding level so that it rises while understanding is high and falls while understanding is low, and the understanding level and interest level are used to determine whether the dialogue state is one of listening, confusion, or boredom. In the listening state, where interest is high, the nod frequency and gaze frequency are both derived from the understanding level. In the confusion state, where interest is high but understanding is low, the nod frequency is derived from the understanding level and the gaze frequency from the interest level. In the boredom state, where interest is low, the nod frequency and gaze frequency are both derived from the interest level. The coefficients a, b, and c are set so that a > b > c, making the nod frequency higher for confusion than for boredom and higher still for listening. Likewise, the coefficients d, e, and f are set so that d > e > f, making the gaze frequency higher for confusion than for boredom and higher still for listening.
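The FIG. 7 conditions reduce to picking which level drives each frequency and scaling it by an attitude-specific coefficient. The sketch below assumes hypothetical names and coefficient values; the values in `FREQ_COEFFS` were chosen only to satisfy the orderings a > b > c and d > e > f stated above.

```python
# Hypothetical coefficients chosen to satisfy a > b > c and d > e > f.
FREQ_COEFFS = {"a": 1.0, "b": 0.7, "c": 0.4, "d": 1.0, "e": 0.6, "f": 0.3}


def steady_frequencies(attitude, understanding, interest, c=FREQ_COEFFS):
    """Return (nod_frequency, gaze_frequency) per the FIG. 7 conditions."""
    if attitude == "listening":
        # High interest: both frequencies follow the understanding level.
        return understanding * c["a"], understanding * c["d"]
    if attitude == "confusion":
        # High interest, low understanding: nods follow understanding, gaze follows interest.
        return understanding * c["b"], interest * c["e"]
    if attitude == "boredom":
        # Low interest: both frequencies follow the interest level.
        return interest * c["c"], interest * c["f"]
    raise ValueError(f"unknown attitude: {attitude!r}")
```

With equal understanding and interest levels, the coefficient orderings alone make listening produce the most nods and gazes and boredom the fewest.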

Returning to FIG. 3, the control unit 10 has an internal memory for storing programs that define various processing procedures and control data, and executes various processes using them. As illustrated in FIG. 3, the control unit 10 includes an utterance analysis unit 10a, an utterance understanding unit 10b, a non-verbal expression analysis unit 10c, an evaluation calculation unit 10d, and a state determination unit 10e. The control unit 10 further includes a response determination unit 10f, an utterance response determination unit 10g, a non-verbal response determination unit 10h, a steady non-verbal motion determination unit 10i, an utterance generation unit 10j, and a non-verbal motion instruction unit 10k.

Various integrated circuits and electronic circuits can be used as the control unit 10. An example of such an integrated circuit is an ASIC (Application Specific Integrated Circuit). Examples of such electronic circuits are a CPU (Central Processing Unit) and an MPU (Micro Processing Unit).

The utterance analysis unit 10a performs speech recognition on the audio data acquired by the microphone 42 and converts the recognized speech into text data. Any speech recognition technique may be used, such as a statistical method, dynamic time warping, or a method based on hidden Markov models.

The utterance understanding unit 10b calculates the proportion of words in the utterance that could be understood. In one aspect, the utterance understanding unit 10b calculates the proportion of words that the utterance analysis unit 10a was able to recognize from the audio data. In another aspect, for each conversation topic, the words that appear in conversations on that topic are stored in advance as topic-specific word data, and the utterance understanding unit 10b calculates the proportion of the text data converted by the utterance analysis unit 10a that consists of words related to the current topic, as stored in the topic-specific word data.
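The topic-based aspect can be sketched as the fraction of recognized words found in the current topic's word list. This is one plausible reading under stated assumptions: the utterance is assumed to be already split into words (the text names no tokenizer, and Japanese would need morphological analysis), the result for an empty utterance is an assumption, and `topic_word_ratio` and `TOPIC_WORDS` are hypothetical names.

```python
def topic_word_ratio(words, topic_words):
    """Fraction of the recognized words that appear in the current topic's word list.

    `words` is assumed to be the recognized utterance already split into words;
    `topic_words` is the set stored for the current topic in the topic-specific word data.
    """
    if not words:
        return 0.0  # nothing recognized: treated as nothing understood (assumption)
    hits = sum(1 for w in words if w in topic_words)
    return hits / len(words)


# Hypothetical topic-specific word data.
TOPIC_WORDS = {"food": {"ramen", "noodles", "soup", "delicious"}}
```

The returned ratio plays the role of p in the understanding-level update described below.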

The non-verbal expression analysis unit 10c analyzes the person's interest in the dialogue from their non-verbal expressions and generates a measurement value indicating that interest. For example, a person who is talking happily is highly interested in the topic. Therefore, in one aspect, the non-verbal expression analysis unit 10c applies image processing, including face recognition, to the image data obtained from the camera 41 to detect the corners of the mouth and the corners of the eyes. From changes in the positions of the detected mouth and eye corners, how widely the mouth is stretched, and similar cues, it calculates as the measurement value a smile degree indicating how likely it is that the person is smiling. A person who is speaking eagerly also tends to make larger and faster gestures. The non-verbal expression analysis unit 10c may therefore track the movement of the person's hands in the image data and calculate measurement values indicating the size and speed of the gestures made by those movements. A person who is speaking eagerly also tends to speak more loudly, with more intonation, and more often. Therefore, in another aspect, the non-verbal expression analysis unit 10c may apply audio analysis to the audio data obtained by the microphone 42 and generate, for example, the intonation and volume of the person's voice or the frequency of utterances as measurement values indicating the strength of non-visual non-verbal reactions. In addition to the camera 41 and the microphone 42, the robot 1 may be provided with a contact sensor that detects touch as a sensor for detecting the user's non-verbal expressions. The non-verbal expression analysis unit 10c may then generate a measurement value indicating the state of interest from the duration and frequency of the user's contact detected by the contact sensor; for example, the longer the contact lasts and the more often it occurs, the higher the calculated measurement value.

The evaluation calculation unit 10d calculates an evaluation of the dialogue with the dialogue partner based on the proportion of understood words calculated by the utterance understanding unit 10b and the measurement values obtained by the non-verbal expression analysis unit 10c. In one aspect, the evaluation calculation unit 10d calculates an understanding level of the utterances as the evaluation of the dialogue, and then calculates from it an interest level representing the degree of interest in the dialogue. For example, when p is the proportion of understood words calculated by the utterance understanding unit 10b for the speech input at time t, the evaluation calculation unit 10d calculates the understanding level U(t) at time t from the understanding level U(t−1) at time t−1 using the following equation (1).

$$U(t) = U(t-1)\,(1-\alpha) + p\,\alpha \tag{1}$$

Here, α is an update coefficient.

Similarly, when q_i is the measurement value for non-verbal expression i detected at time t by the non-verbal expression analysis unit 10c, the evaluation calculation unit 10d calculates the understanding level U(t) at time t from the understanding level U(t−1) at time t−1 using the following equation (2).

$$U(t) = U(t-1)\,(1-\beta) + q_i\,\beta \tag{2}$$

Here, β is an update coefficient.

The evaluation calculation unit 10d then calculates the interest level so that it rises while the calculated understanding level stays high and falls while the understanding level stays low. For example, with U(t) the understanding level calculated at time t, the evaluation calculation unit 10d calculates the interest level I(t) at time t from the interest level I(t−1) at time t−1 using the following equation (3).

$$I(t) = I(t-1) - \bigl(1.0 - U(t)\bigr)\,\gamma + A \tag{3}$$

Here, γ is an update coefficient and A is a constant with 0 < A < 1.0, where γ > A. With this choice, the interest level rises by A per update when U(t) = 1.0 and falls by γ − A when U(t) = 0.0.

In this embodiment, the understanding level and the interest level are defined in the range 0.0 to 1.0. If a calculated U(t) or I(t) is less than 0.0, the evaluation calculation unit 10d sets it to 0.0; if it is greater than 1.0, it sets it to 1.0. At the start of a dialogue, both the understanding level and the interest level are initialized to 1.0.
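Equations (1) to (3), together with the clipping and the initial values of 1.0, can be sketched as two update functions plus a short simulation. Assumptions: the coefficient values `ALPHA`, `GAMMA`, and `A` and the four word-understanding ratios are made up for illustration, and the loop applies equation (3) once after each understanding update because equation (3) uses U(t).

```python
def clamp01(x: float) -> float:
    """Clip to [0.0, 1.0], the range used for understanding and interest."""
    return max(0.0, min(1.0, x))


def update_understanding(u_prev: float, measurement: float, rate: float) -> float:
    """Eq. (1) with measurement=p, rate=alpha; Eq. (2) with measurement=q_i, rate=beta."""
    return clamp01(u_prev * (1.0 - rate) + measurement * rate)


def update_interest(i_prev: float, u_now: float, gamma: float, a: float) -> float:
    """Eq. (3): +A per step when U(t)=1.0, -(gamma - A) when U(t)=0.0."""
    return clamp01(i_prev - (1.0 - u_now) * gamma + a)


# Hypothetical coefficient values (the text only requires gamma > A and 0 < A < 1.0).
ALPHA, GAMMA, A = 0.3, 0.2, 0.05

u, i = 1.0, 1.0  # both levels start at 1.0 when the dialogue begins
for p in [0.2, 0.1, 0.0, 0.3]:  # made-up word-understanding ratios for four turns
    u = update_understanding(u, p, ALPHA)  # U(t) first...
    i = update_interest(i, u, GAMMA, A)    # ...then I(t), which depends on U(t)
```

Both updates are exponential-smoothing style: the understanding level tracks recent measurements, while the interest level integrates how long understanding has stayed low, so it reacts more slowly.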

The state determination unit 10e determines the attitude to be expressed based on the evaluation calculated by the evaluation calculation unit 10d. When the evaluation stays high, the state determination unit 10e changes the expressed attitude toward one that shows higher understanding of the dialogue; when the evaluation stays low, it changes the expressed attitude toward one that shows lower understanding. In one aspect, the state determination unit 10e determines whether the interest level is greater than a predetermined threshold T1. If the interest level is T1 or less, the state determination unit 10e determines that the attitude to be expressed is a sense of boredom. If the interest level is greater than T1, the state determination unit 10e determines whether the understanding level is greater than a predetermined threshold T2. If the understanding level is greater than T2, the attitude to be expressed is a sense of listening; if it is T2 or less, the attitude to be expressed is a sense of confusion. The thresholds T1 and T2 are set so that the calculated understanding and interest levels correspond appropriately to the expressed attitudes. For example, T1 is a value in the range 0.3 to 0.5, and T2 is a value in the range 0.3 to 0.7. T1 and T2 may also be made adjustable externally. As a result, when the understanding level stays high and the interest level rises, the state determination unit 10e changes the expressed attitude from boredom to confusion to listening in turn. Conversely, when the understanding level stays low and the interest level falls, it changes the expressed attitude from listening to confusion to boredom in turn.
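The two-threshold decision can be sketched directly. The threshold values below are hypothetical picks from the ranges given in the text, and `decide_attitude` with its English return values is an illustrative name, not the patent's.

```python
# Hypothetical thresholds picked from the ranges given in the text.
T1 = 0.4  # interest threshold, suggested range 0.3-0.5
T2 = 0.5  # understanding threshold, suggested range 0.3-0.7


def decide_attitude(understanding: float, interest: float,
                    t1: float = T1, t2: float = T2) -> str:
    """Map (understanding, interest) to the attitude to express."""
    if interest <= t1:
        return "boredom"      # low interest dominates
    if understanding > t2:
        return "listening"    # interested and understanding
    return "confusion"        # interested but not understanding
```

Checking interest first means low interest always yields boredom regardless of understanding, which matches the order of the tests described above.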

The response determination unit 10f determines what to say in the dialogue with the person. In one aspect, for the first utterance at the start of a dialogue, the response determination unit 10f selects one of the topics stored in the conversation data 11a, reads the first question for the selected topic, and uses that question as the first utterance. When the response determination unit 10f has asked the user a question on a topic and the utterance understanding unit 10b has analyzed the user's reply, it proceeds as follows: it reads from the conversation data 11a the next question that corresponds, for the immediately preceding question on the current topic, to the attitude determined by the state determination unit 10e, and uses that question as the next utterance.
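The question selection can be sketched as a nested lookup keyed by topic, previous question, and attitude. The layout of `CONVERSATION_DATA` and every question string are hypothetical stand-ins for the conversation data 11a, and picking the topic with `random.choice` is one assumed way of "selecting one of the topics".

```python
import random

# Hypothetical layout for the conversation data 11a: for each topic, the opening
# question and, for each question, the follow-up to ask for each attitude.
CONVERSATION_DATA = {
    "food": {
        "first": "What did you eat today?",
        "next": {
            "What did you eat today?": {
                "listening": "Was it tasty?",
                "confusion": "Did you eat at home?",
                "boredom": "Shall we talk about something else?",
            },
        },
    },
}


def first_utterance(rng=random):
    """Pick any stored topic and return (topic, opening question)."""
    topic = rng.choice(sorted(CONVERSATION_DATA))
    return topic, CONVERSATION_DATA[topic]["first"]


def next_utterance(topic, previous_question, attitude):
    """Return the follow-up question for the previous question and current attitude."""
    return CONVERSATION_DATA[topic]["next"][previous_question][attitude]
```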

The utterance response determination unit 10g determines the voice quality of the utterance. In one aspect, the utterance response determination unit 10g reads from the non-verbal motion data 11b the volume, pitch, and speaking rate settings corresponding to the attitude, and adopts them as the voice settings for the utterance.

The non-verbal response determination unit 10h determines the motions to be expressed. In one aspect, the non-verbal response determination unit 10h reads from the nodding motion data 11c the conditions for deriving the nod depth, nod speed, and nod delay corresponding to the attitude. It then calculates the nod depth, nod speed, and nod delay from the read conditions using the understanding level and interest level calculated by the evaluation calculation unit 10d. For example, if the determined attitude is a sense of listening, the non-verbal response determination unit 10h derives the nod depth as coefficient g × max(understanding level, interest level) and the nod speed as coefficient i × interest level, and sets the nod delay to 0.0. If the determined attitude is a sense of confusion, it derives the nod depth as coefficient h × understanding level, sets the nod speed to 1.0, and derives the nod delay as coefficient j × (1.0 − understanding level). If the determined attitude is a sense of boredom, it sets the nod depth to 0.1 and the nod speed to 0.5, and derives the nod delay as coefficient k × (1.0 − interest level). The non-verbal response determination unit 10h sets any calculated nod depth, nod speed, or nod delay below 0.0 to 0.0, and any above 1.0 to 1.0.

The non-verbal response determination unit 10h also reads from the non-verbal motion data 11b the gaze, blink, and action/expression settings corresponding to the determined attitude, and adopts them as the motion settings.

The steady non-verbal motion determination unit 10i determines the attitude that the robot constantly expresses toward the dialogue partner through nods and gestures, even while the person is not speaking. In one aspect, the steady non-verbal motion determination unit 10i reads from the steady motion frequency data 11d the conditions for deriving the nod frequency and gaze frequency corresponding to the determined attitude. It then calculates the nod frequency and gaze frequency from the read conditions using the understanding level and interest level calculated by the evaluation calculation unit 10d. For example, if the determined attitude is a sense of listening, the steady non-verbal motion determination unit 10i derives the nod frequency as understanding level × coefficient a and the gaze frequency as understanding level × coefficient d. If the determined attitude is a sense of confusion, it derives the nod frequency as understanding level × coefficient b and the gaze frequency as interest level × coefficient e. If the determined attitude is a sense of boredom, it derives the nod frequency as interest level × coefficient c and the gaze frequency as interest level × coefficient f. The steady non-verbal motion determination unit 10i then derives the nodding cycle so that the higher the nod frequency, the shorter the cycle, and likewise derives the cycle for looking at the user so that the higher the gaze frequency, the shorter the cycle. The cycle may be computed from the frequency, for example as x / frequency, where x may be a constant or may vary with the dialogue state. Alternatively, cycle information defining a cycle for each frequency may be stored in the storage unit 11, and the steady non-verbal motion determination unit 10i may derive the cycle corresponding to the frequency from that information.
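The computed variant, cycle = x / frequency, can be sketched in a few lines. Assumptions: x = 10.0 is a hypothetical constant in whatever time unit the robot's scheduler uses, and treating a frequency of zero or less as "never perform the motion" is not stated in the text.

```python
import math


def cycle_from_frequency(frequency: float, x: float = 10.0) -> float:
    """Cycle = x / frequency, so a higher frequency yields a shorter cycle.

    x = 10.0 is a hypothetical constant; the text allows it to vary with the
    dialogue state. A frequency of 0 (or less) is treated as "never" (assumption).
    """
    if frequency <= 0.0:
        return math.inf
    return x / frequency
```

The lookup-table alternative would replace the division with a search over stored (frequency, cycle) pairs, which allows non-linear mappings at the cost of maintaining the table.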

Further, the steady non-verbal motion determination unit 10i reads, from the nodding motion data 11c, the conditions for deriving the nodding depth, nodding speed, and nodding delay that correspond to the determined attitude. It then calculates the nodding depth, speed, and delay from those conditions using the degree of understanding and the degree of interest. It also reads, from the non-verbal motion data 11b, the gaze, blink, and facial-expression settings that correspond to the determined attitude, and adopts them as the motion settings.

The utterance generation unit 10j generates an audio signal corresponding to the utterance content and outputs it to the speaker 43. In one aspect, once the utterance content and its volume, pitch, and speaking rate have been determined, the utterance generation unit 10j generates an audio signal that voices the determined content at the determined volume, pitch, and speaking rate, and outputs it to the speaker 43. The speaker 43 then outputs the corresponding sound, so that the robot speaks.

The non-verbal motion instruction unit 10k generates motion instructions for non-verbal actions such as gestures and outputs them to the motor control unit 40. In one aspect, once the utterance content and its volume, pitch, and speaking rate have been determined and the robot speaks, the non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction to nod with the nodding depth, speed, and delay calculated by the non-verbal response determination unit 10h. It also outputs to the motor control unit 40 a motion instruction to act with the gaze, blinks, and facial expression determined by the non-verbal response determination unit 10h. For example, the non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction that moves the eyes 7 or the head 2 toward the direction in which the non-verbal expression analysis unit 10c recognized a face, and that performs the determined gaze, blinks, and facial expression. The motor control unit 40 controls the motor 30 according to the motion instruction so that the robot 1 performs the instructed motion. In this way, the robot 1 responds to the user's utterance with non-verbal behavior.

In addition, the non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction for the next steady motion once the nodding period or eye-contact period calculated by the steady non-verbal motion determination unit 10i has elapsed since the previous nod or the previous eye contact with the user. For example, it outputs a motion instruction to nod with the nodding depth, speed, and delay determined by the steady non-verbal motion determination unit 10i. It also outputs a motion instruction to act with the gaze, blinks, and facial expression determined by the steady non-verbal motion determination unit 10i. The motor control unit 40 controls the motor 30 according to the motion instruction so that the robot 1 performs the instructed motion. As a result, the robot 1 continually conveys its current state to the user as an attitude expressed through non-verbal behavior.

Next, the flow of a dialogue with the robot 1 will be described. FIG. 8 shows an example of a dialogue flow. As shown in FIG. 8, when the user makes utterance C1, the robot 1 answers with response C2, which combines speech and non-verbal behavior. For example, when the determined state is attentive listening, the robot 1 nods while giving response C2 and expresses attentive listening by gazing at the user.

The robot 1 also continually expresses attitude C3 through non-verbal behavior. When the state determined from the detected user utterances and user attitude is boredom, the robot 1 expresses boredom non-verbally, for example by occasionally looking away from the user. When the user makes utterance C4 and the robot 1 cannot fully understand its content, the robot 1 answers with response C5, tilting its head as it speaks to express confusion.

Likewise, when the state determined from the detected user utterances and user attitude is boredom, the robot 1 expresses boredom through steady non-verbal behavior, such as attitude C6, a sleepy facial expression.

In this way, the robot 1 according to this embodiment changes the attitude it continually expresses through non-verbal behavior according to how well it understands the conversation partner's utterances. The conversation partner can therefore tell from the robot 1's attitude whether the robot 1 understands the dialogue well or poorly.

Next, the processing flow by which the robot 1 according to this embodiment conducts a dialogue will be described. FIG. 9 is a flowchart of the dialogue processing according to the first embodiment. This processing starts when a predetermined operation instructing the robot 1 to begin a dialogue is performed. For example, it starts when the robot 1 is powered on and predetermined initialization has completed.

As shown in FIG. 9, the evaluation calculation unit 10d sets the degree of understanding and the degree of interest to an initial value of 1.0 (step S10). The evaluation calculation unit 10d then determines whether dialogue input from the user has been detected (step S11). For example, when the utterance analysis unit 10a detects a user utterance through speech recognition, the evaluation calculation unit 10d determines that dialogue has been input by voice. When the non-verbal expression analysis unit 10c detects the user's facial expression, movement, or the like, it determines that dialogue has been input through non-verbal expression. If no dialogue input is detected (No at step S11), the process returns to step S11 and waits for input from the user. If dialogue input is detected (Yes at step S11), the evaluation calculation unit 10d updates the degree of understanding based on the proportion of understood words in the recognized utterance and on the measurements obtained by analyzing the user's non-verbal expression (step S12). The evaluation calculation unit 10d then updates the degree of interest based on the degree of understanding (step S13). The state determination unit 10e determines whether to continue the dialogue (step S14). For example, it determines that the dialogue cannot continue when the degree of interest is 0. If the dialogue cannot continue (No at step S14), the processing ends. If it can continue (Yes at step S14), the state determination unit 10e performs the expressed-attitude determination processing, which determines the attitude to be expressed (step S15). The response determination unit 10f determines the content of the spoken response from the conversation data 11a (step S16). Response processing, which replies by speech, is then performed (step S17), and the process returns to step S11.
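The loop of FIG. 9 can be sketched as a minimal simulation driven by scripted turns instead of live speech recognition and face tracking. The patent does not give the update formulas for steps S12 and S13. The weighted blend of word ratio and non-verbal score, the fixed-step interest update, and every function name below are assumptions chosen only to illustrate the control flow.

```python
def update_understanding(known_words, total_words, nonverbal_score, w=0.7):
    """Step S12 (hypothetical formula): blend the ratio of understood words
    with a non-verbal measurement normalised to [0, 1]."""
    ratio = known_words / total_words if total_words else 0.0
    return w * ratio + (1.0 - w) * nonverbal_score


def update_interest(interest, understanding, high=0.6, step=0.25):
    """Step S13 (hypothetical rule): rise while understanding is high,
    fall while it is low, clamped to [0, 1]."""
    interest += step if understanding > high else -step
    return min(1.0, max(0.0, interest))


def run_dialogue(turns):
    """Steps S10-S17 driven by scripted turns instead of live sensors.

    Each turn is (known_words, total_words, nonverbal_score). Returns a log
    of (understanding, interest) for every turn on which the dialogue continued.
    """
    understanding, interest = 1.0, 1.0                          # S10: initial values
    log = []
    for known, total, nv in turns:                              # S11: input detected
        understanding = update_understanding(known, total, nv)  # S12
        interest = update_interest(interest, understanding)     # S13
        if interest <= 0.0:                                     # S14: cannot continue
            break
        # S15-S17 (attitude decision, response content, spoken reply) would run here.
        log.append((round(understanding, 2), round(interest, 2)))
    return log
```

With this rule, four consecutive turns in which nothing is understood drain the degree of interest from 1.0 to 0, and the dialogue ends at step S14 on the fourth turn.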

Next, the flow of the expressed-attitude determination processing according to this embodiment will be described. FIG. 10 is a flowchart of the expressed-attitude determination processing according to the first embodiment. This processing is called from step S15 of the dialogue processing.

As shown in FIG. 10, the state determination unit 10e determines whether the degree of interest is greater than a predetermined threshold T1 (step S30). If the degree of interest is equal to or less than T1 (No at step S30), the state determination unit 10e sets the attitude to be expressed to boredom (step S31). If the degree of interest is greater than T1 (Yes at step S30), the state determination unit 10e determines whether the degree of understanding is greater than a predetermined threshold T2 (step S32). If the degree of understanding is equal to or less than T2 (No at step S32), the state determination unit 10e sets the attitude to be expressed to confusion (step S33). If it is greater than T2 (Yes at step S32), the state determination unit 10e sets the attitude to be expressed to attentive listening (step S34).
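The decision in FIG. 10 is a two-threshold cascade, sketched below. The values of T1 and T2 are hypothetical, since the patent leaves them unspecified, and the function name is illustrative.

```python
def decide_attitude(interest, understanding, t1=0.3, t2=0.6):
    """Expressed-attitude decision of FIG. 10 (threshold values hypothetical)."""
    if interest <= t1:          # S30 No  -> S31
        return "bored"
    if understanding <= t2:     # S32 No  -> S33
        return "confused"
    return "listening"          # S32 Yes -> S34
```

Because the degree of interest is checked first, a low degree of interest yields boredom even when the latest utterance was well understood.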

Next, the flow of the response processing according to this embodiment will be described. FIG. 11 is a flowchart of the response processing according to the first embodiment. This processing is called from step S17 of the dialogue processing.

As shown in FIG. 11, the utterance response determination unit 10g and the non-verbal response determination unit 10h determine the motion parameters for the response (step S40). For example, the utterance response determination unit 10g determines the volume, pitch, and speaking rate that correspond to the attitude from the non-verbal motion data 11b. The non-verbal response determination unit 10h determines the nodding depth, speed, and delay from the nodding motion data 11c. The utterance generation unit 10j generates an audio signal that voices the determined utterance content at the determined volume, pitch, and speaking rate, and outputs it to the speaker 43 (step S41). The non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction to nod with the determined depth, speed, and delay (step S42).

Next, the flow of the steady attitude expression processing, by which the robot 1 according to this embodiment continually expresses its attitude, will be described. FIG. 12 is a flowchart of the steady attitude expression processing according to the first embodiment. This processing starts when a predetermined operation instructing the robot 1 to begin a dialogue is performed. For example, it starts when the robot 1 is powered on and predetermined initialization has completed.

As shown in FIG. 12, the steady non-verbal motion determination unit 10i determines whether a predetermined operation instructing the end of processing has been performed (step S50). If it has (Yes at step S50), the processing ends. If it has not (No at step S50), the steady non-verbal motion determination unit 10i waits for a fixed period (step S51). This period is chosen to suit the expression of a steady attitude through gestures, for example from a few hundred milliseconds to a few seconds.

The steady non-verbal motion determination unit 10i derives the nodding period and the eye-contact period that correspond to the attitude from the steady motion frequency data 11d (step S52). The non-verbal motion instruction unit 10k determines whether it is time to nod (step S53). For example, it determines that it is time to nod when the nodding period has elapsed since the previous nod. If it is not time to nod (No at step S53), the process proceeds to step S56, described below. If it is time to nod (Yes at step S53), the steady non-verbal motion determination unit 10i determines the parameters of the nodding motion (step S54). For example, it determines the nodding depth and speed that correspond to the attitude from the nodding motion data 11c. The non-verbal motion instruction unit 10k then outputs to the motor control unit 40 a motion instruction to nod with the determined depth and speed (step S55).

Next, the non-verbal motion instruction unit 10k determines whether it is time to make eye contact with the user (step S56). For example, it determines that it is time when the eye-contact period has elapsed since the previous eye contact. If it is not time (No at step S56), the process proceeds to step S58, described below. If it is time (Yes at step S56), the non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction to move the eyes 7 or the head 2 toward the user (step S57).

Next, the non-verbal motion instruction unit 10k determines whether it is time to perform a behavior such as a gesture (step S58). For example, it treats the moment at which the attitude determined by the state determination unit 10e changes as a time to perform a behavior. It also determines that it is time to perform a behavior when a predetermined period has elapsed since the last gesture. This predetermined period may be fixed, or it may be chosen at random within a predetermined range. If it is not time to perform a behavior (No at step S58), the process returns to step S50. If it is time (Yes at step S58), the non-verbal motion instruction unit 10k determines the motion parameters for the gesture or similar behavior (step S59). For example, when a facial expression is to be shown as the gesture, the steady non-verbal motion determination unit 10i determines the motion and facial expression from the nodding motion data 11c. The non-verbal motion instruction unit 10k outputs to the motor control unit 40 a motion instruction to perform the determined motion and facial expression (step S60), and the process returns to step S50.
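The loop of FIG. 12 can be sketched on a simulated clock, which makes the timing logic testable without motors. This is a sketch under assumptions: the class and function names, the wait period, and the random gesture-interval range are hypothetical, and actions are returned as strings rather than sent to a motor controller.

```python
import random


class SteadyExpression:
    """Simulated-clock sketch of steps S53-S60 (names hypothetical)."""

    def __init__(self, gesture_gap=(5.0, 10.0), seed=0):
        self.last_nod = 0.0
        self.last_gaze = 0.0
        self.last_gesture = 0.0
        self.last_attitude = None
        self.gesture_gap = gesture_gap      # assumed random interval range, seconds
        self.next_gap = 0.0
        self.rng = random.Random(seed)

    def tick(self, now, attitude, nod_period, gaze_period):
        """Return the actions due at time `now`."""
        actions = []
        if now - self.last_nod >= nod_period:            # S53: nod period elapsed?
            actions.append("nod")                         # S54-S55
            self.last_nod = now
        if now - self.last_gaze >= gaze_period:          # S56: gaze period elapsed?
            actions.append("gaze")                        # S57
            self.last_gaze = now
        changed = attitude != self.last_attitude
        if changed or now - self.last_gesture >= self.next_gap:  # S58
            actions.append(f"gesture:{attitude}")         # S59-S60
            self.last_gesture = now
            self.last_attitude = attitude
            self.next_gap = self.rng.uniform(*self.gesture_gap)
        return actions


def run(expr, attitude_at, periods_for, duration, wait=0.5):
    """S50-S52 loop: stop at `duration`, wait `wait` seconds per pass,
    and re-derive the nod and gaze periods from the current attitude."""
    now, schedule = 0.0, []
    while now < duration:                    # S50: stand-in for the end-of-processing check
        now += wait                          # S51: fixed wait
        attitude = attitude_at(now)
        nod_period, gaze_period = periods_for(attitude)   # S52
        for action in expr.tick(now, attitude, nod_period, gaze_period):
            schedule.append((now, action))
    return schedule
```

Re-deriving the periods on every pass (step S52) lets a change of attitude, for example from attentive listening to boredom, alter the nodding and eye-contact rhythm on the next pass, while the attitude change itself triggers a gesture at once.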

As described above, the robot 1 according to this embodiment detects at least one of the utterances and movements of the conversation partner. Based on what it detects, it calculates an evaluation of the dialogue with the conversation partner. The robot 1 also continually expresses an attitude toward the conversation partner through gestures. When the calculated evaluation stays high, the robot 1 changes the expressed attitude to one that indicates high understanding of the dialogue. When the evaluation stays low, it changes the expressed attitude to one that indicates low understanding. The conversation partner can therefore tell from the robot 1's attitude whether the robot 1 understands the dialogue well or poorly. Even if the robot 1's understanding becomes low and its utterances become unnatural, the conversation partner will already have recognized that the robot 1 understands poorly. The robot 1 therefore keeps the dialogue with a person from feeling unnatural.

As the evaluation, the robot 1 according to this embodiment also calculates the degree of understanding of each utterance and a degree of interest. The degree of interest rises while the degree of understanding stays high and falls while it stays low. The robot 1 changes the expressed attitude based on both the degree of understanding and the degree of interest. Because the attitude depends on the degree of interest as well as on the understanding of a single user utterance, the robot 1 avoids changing its attitude with every exchange with the user.

The robot 1 according to this embodiment can also express attentive listening, confusion, and boredom toward the dialogue through gestures. When the evaluation stays low, it changes the expressed attitude in sequence from attentive listening to confusion to boredom. When understanding stays low, the robot therefore moves step by step toward attitudes that show more clearly that it cannot follow the dialogue, which reduces the awkwardness of the dialogue.
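The staged transition can be traced with a small sketch that combines a degree-of-interest update with the threshold cascade of FIG. 10. The update rule, step size, thresholds, and function names are hypothetical. The sketch only shows that a run of poorly understood utterances moves the attitude from attentive listening to confusion and then, once the degree of interest has drained, to boredom.

```python
def step_interest(interest, understanding, high=0.6, step=0.25):
    """Hypothetical rule: rise while understanding stays high, fall while it stays low."""
    delta = step if understanding > high else -step
    return min(1.0, max(0.0, interest + delta))


def attitude(interest, understanding, t1=0.3, t2=0.6):
    """Same structure as FIG. 10; T1 and T2 values are hypothetical."""
    if interest <= t1:
        return "bored"
    return "listening" if understanding > t2 else "confused"


def trace(understandings, interest=1.0):
    """Return the expressed attitude after each utterance."""
    out = []
    for u in understandings:
        interest = step_interest(interest, u)
        out.append(attitude(interest, u))
    return out


print(trace([0.9, 0.9, 0.2, 0.2, 0.2, 0.2]))
# ['listening', 'listening', 'confused', 'confused', 'bored', 'bored']
```

A single poorly understood utterance only produces confusion. Boredom appears only after the degree of interest has fallen through several low-understanding turns.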

The robot 1 according to this embodiment continually expresses its attitude through any of the following: nodding depth, speed, timing, or frequency; the timing, duration, or frequency of eye contact with the conversation partner; or facial expression. This makes it easy for the user to recognize the state of the robot 1.

Embodiments of the disclosed apparatus have been described above, but the present invention may be carried out in various forms other than those embodiments. Other embodiments included in the present invention are described below.

For example, the above embodiment describes three levels of expressed attitude: attentive listening, confusion, and boredom. However, the disclosed apparatus is not limited to this. The expressed attitude may have two levels, or four or more. The attitude may also change gradually rather than in discrete levels.

In the above embodiment, the degree of understanding and the degree of interest are calculated as the evaluation of the dialogue. However, the disclosed apparatus is not limited to this, and any value that indicates an evaluation of the dialogue may be used. For example, the proportion of understood words, or the measurements obtained through analysis by the non-verbal expression analysis unit 10c, may be used as the evaluation.

The above embodiment describes an interactive device implemented as the robot 1, but the device is not limited to this. For example, it may be implemented on a fixed terminal such as a personal computer (PC), or on a mobile terminal such as a mobile phone, a PHS (Personal Handyphone System), or a PDA (Personal Digital Assistant). On a fixed or mobile terminal, a face image may be shown on a display, and the attitude may be expressed by changing the face image.

[Distribution and integration]
The components of each illustrated device need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the one shown. All or part of each device may be functionally or physically distributed or integrated in arbitrary units according to load, usage conditions, and the like. For example, the processing units of the control unit 10 may be integrated as appropriate, and the processing of one unit may be split across several units. Furthermore, all or any part of each processing function performed by each processing unit may be implemented by a CPU and a program analyzed and executed by that CPU, or as hardware using wired logic.

[Dialogue program]
The various processes described in the above embodiments can be implemented by executing a prepared program on a computer such as a personal computer or a workstation. With reference to FIG. 13, the following describes an example of a computer that executes a dialogue program with the same functions as the above embodiment and converses with the user while displaying a face image on a display.

FIG. 13 illustrates an example of a computer that executes the dialogue processing program. As shown in FIG. 13, the computer 100 has an operation unit 110a, a speaker 110b, a camera 110c, a microphone 110d, a display 120, and a communication unit 130. The computer 100 also has a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

The HDD 170 stores in advance a dialogue program 170a that provides the same functions as the processing units of the control unit 10. Like the components shown in the first embodiment, the dialogue program 170a may be integrated or divided as appropriate. The HDD 170 also need not hold all data at all times; it is sufficient for the HDD 170 to store only the data needed for processing.

The CPU 150 reads the dialogue program 170a from the HDD 170 and loads it into the RAM 180. As shown in FIG. 13, the dialogue program 170a then functions as a dialogue process 180a. The dialogue process 180a loads the various data read from the HDD 170 into its own allocated area of the RAM 180 as appropriate, and executes various processes based on that data. The dialogue process 180a includes the processing performed by the processing units of the control unit 10, for example the processing shown in FIGS. 9 to 12. The processing units implemented virtually on the CPU 150 also need not all run on the CPU 150 at all times; only the units needed for the current processing need to be implemented.

The dialogue program 170a need not be stored in the HDD 170 or the ROM 160 from the outset. For example, the program may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card. The computer 100 may then read the program from that medium and execute it. Alternatively, the program may be stored on another computer or a server connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may obtain the program from there and execute it.

DESCRIPTION OF SYMBOLS
1 Robot
2 Head
4R Right arm
4L Left arm
5R Right leg
5L Left leg
6R Right ear
6L Left ear
7 Eye
7A Eyelid
9 Mouth
10 Control unit
10d Evaluation calculation unit
10e State determination unit
41 Camera
42 Microphone
43 Speaker

Claims

An interactive device comprising:
a detection unit that detects utterances of a dialogue partner;
a calculation unit that calculates, based on the utterances detected by the detection unit, an evaluation of the dialogue with the dialogue partner;
an expression unit that constantly expresses an attitude through one or both of a nodding frequency and a frequency of making eye contact with the dialogue partner; and
a changing unit that, when the evaluation calculated by the calculation unit remains low, changes one or both of the nodding frequency and the frequency of making eye contact with the dialogue partner to a lower level, and, when the evaluation remains high, changes one or both of the nodding frequency and the frequency of making eye contact with the dialogue partner to a higher level.
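Claim 1 does not specify how long an evaluation must stay low or high before the changing unit acts, nor by how much a frequency changes. The following is a minimal Python sketch of one possible reading, assuming evaluations normalised to [0, 1] and a sliding window over the most recent evaluations. The class `ChangingUnit`, its window length, thresholds, and step size are hypothetical illustration values, not values from this patent, and the sketch changes both frequencies together, which is one of the options the claim allows.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChangingUnit:
    """Hypothetical sketch of the changing unit of claim 1.

    All numeric defaults are illustrative assumptions, not values from the patent.
    Frequencies are normalised to the range [0, 1].
    """
    window: int = 3          # evaluations that must agree for a state to "continue"
    low: float = 0.3         # evaluation at or below this counts as low
    high: float = 0.7        # evaluation at or above this counts as high
    step: float = 0.1        # change applied to each frequency per adjustment
    nod_freq: float = 0.5    # current nodding frequency
    gaze_freq: float = 0.5   # current eye-contact frequency
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self) -> None:
        # Keep only the most recent `window` evaluations.
        self._recent = deque(maxlen=self.window)

    def update(self, evaluation: float) -> None:
        """Record one evaluation and adjust both frequencies if warranted."""
        self._recent.append(evaluation)
        if len(self._recent) < self.window:
            return  # not enough history to call the evaluation sustained
        if all(e <= self.low for e in self._recent):
            delta = -self.step   # evaluation has stayed low: lower both
        elif all(e >= self.high for e in self._recent):
            delta = self.step    # evaluation has stayed high: raise both
        else:
            return               # mixed evaluations: leave the attitude unchanged
        self.nod_freq = min(1.0, max(0.0, self.nod_freq + delta))
        self.gaze_freq = min(1.0, max(0.0, self.gaze_freq + delta))
```

Because the history is not cleared after an adjustment, a continued run of low evaluations keeps lowering both frequencies by one step per call until they reach the lower bound; the claims do not say whether the actual device behaves this way.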

The interactive device according to claim 1, wherein the calculation unit calculates, as the evaluation, a degree of understanding of the utterances, and further calculates a degree of interest that rises when the degree of understanding remains high and falls when the degree of understanding remains low, and
the changing unit changes the attitude expressed by the expression unit based on the degree of interest.

The interactive device according to claim 2, wherein the changing unit changes one or both of the nodding frequency and the frequency of making eye contact with the dialogue partner: to a first state, in which they are low, when the degree of interest is equal to or less than a predetermined first threshold; to a second state, in which they are higher than in the first state, when the degree of interest is greater than the first threshold and the degree of understanding is equal to or less than a predetermined second threshold; and to a third state, in which they are higher than in the second state, when the degree of interest is greater than the first threshold and the degree of understanding is greater than the second threshold.
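Claims 2 and 3 can be read as a two-stage rule: update a degree of interest from the recent degrees of understanding, then pick one of three attitude states from the interest and the current understanding. The sketch below is a hypothetical Python illustration of that reading. The names `update_interest`, `select_state`, `AttitudeState`, and `STATE_RATES`, as well as the window, step, both thresholds, and the per-state rates, are assumptions; only the threshold comparisons and the ordering first < second < third state come from claim 3.

```python
from enum import Enum
from typing import Sequence


class AttitudeState(Enum):
    # Relative nodding / eye-contact levels from claim 3 (FIRST < SECOND < THIRD).
    FIRST = "low"
    SECOND = "medium"
    THIRD = "high"


# Hypothetical (nods per minute, fraction of time in eye contact) for each state.
STATE_RATES = {
    AttitudeState.FIRST: (2, 0.2),
    AttitudeState.SECOND: (5, 0.5),
    AttitudeState.THIRD: (8, 0.8),
}


def update_interest(interest: float, understanding_history: Sequence[float],
                    window: int = 3, high: float = 0.7, low: float = 0.3,
                    step: float = 0.1) -> float:
    """Raise interest when the last `window` understanding values are all high,
    lower it when they are all low (claim 2); the result is clamped to [0, 1]."""
    recent = understanding_history[-window:]
    if len(recent) == window:
        if all(u >= high for u in recent):
            interest += step
        elif all(u <= low for u in recent):
            interest -= step
    return min(1.0, max(0.0, interest))


def select_state(interest: float, understanding: float,
                 first_threshold: float = 0.3,
                 second_threshold: float = 0.6) -> AttitudeState:
    """Map interest and understanding to the three states of claim 3."""
    if interest <= first_threshold:
        return AttitudeState.FIRST
    if understanding <= second_threshold:
        return AttitudeState.SECOND
    return AttitudeState.THIRD
```

For example, with these assumed defaults, `select_state(0.5, 0.8)` yields `AttitudeState.THIRD`, while a low interest of 0.2 yields `AttitudeState.FIRST` regardless of understanding, mirroring the priority of the first threshold in claim 3.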

A dialogue program that causes a computer to execute a process comprising:
calculating, based on utterances detected by a detection unit that detects utterances of a dialogue partner, an evaluation of the dialogue with the dialogue partner; and
when the calculated evaluation remains low, changing, to a lower level, one or both of a nodding frequency and a frequency of making eye contact with the dialogue partner as expressed by an expression unit that constantly expresses an attitude through one or both of these frequencies, and, when the evaluation remains high, changing one or both of these frequencies as expressed by the expression unit to a higher level.

A dialogue method in which a computer executes a process comprising:
calculating, based on utterances detected by a detection unit that detects utterances of a dialogue partner, an evaluation of the dialogue with the dialogue partner; and
when the calculated evaluation remains low, changing, to a lower level, one or both of a nodding frequency and a frequency of making eye contact with the dialogue partner as expressed by an expression unit that constantly expresses an attitude through one or both of these frequencies, and, when the evaluation remains high, changing one or both of these frequencies as expressed by the expression unit to a higher level.