JP2012181697A - Dialog system, interaction control method, and program

Info

Publication number: JP2012181697A
Application number: JP2011044406A
Authority: JP
Inventors: Ryohei Sasama; 亮平笹間; Tomoharu Yamaguchi; 智治山口; Mutsuo Sano; 睦夫佐野; Toshimoto Nishiguchi; 敏司西口; Kenzaburo Miyawaki; 健三郎宮脇; Kentaro Mukai; 謙太郎向井; Tomohiro Domoto; 知裕堂本
Original assignee: NEC Corp; Josho Gakuen Educational Foundation
Current assignee: NEC Corp; Josho Gakuen Educational Foundation
Priority date: 2011-03-01
Filing date: 2011-03-01
Publication date: 2012-09-20

Abstract

PROBLEM TO BE SOLVED: To resolve communication gaps in real time and perform persistent and natural communication in a dialog system.

SOLUTION: An interaction state amount calculation part 12 detects an interaction state amount including speaker switching pauses in utterance of a user P, a pitch of an utterance section, and power or mora. A communication synchronization control part 13 calculates a communication synchronization shift amount between the user P and a robot system 10, minimizes the communication synchronization shift amount by continuous pull-in control by a state equation expressing a synchronization model, and simultaneously makes an interaction state amount of the user P similar to that of the robot system 10 by discrete pull-in control of an interaction rule, or makes the interaction state amount of the user P similar to that of the robot system 10 while making the interaction state amount of the robot system 10 similar to that of the user P.

Description

The present invention relates to a dialog system, a dialog control method, and a program. More specifically, it relates to a dialog system, a dialog control method, and a program that converse with a person at home in order to collect information, for example through a medical interview.

Natural communication through conversation requires appropriate control of utterance timing, prosody, and physical behavior such as nodding. To collect information smoothly, as in a medical interview, this control of utterance timing, prosody, and physical behavior such as nodding must be adapted to the user's state on that day.

Non-Patent Documents 1 and 2 concern dialog systems that take acoustic features of utterances and keywords as features and learn human-to-human dialogs with a decision tree, so that backchannels and responses are produced at appropriate utterance timings. These dialog systems generate the timing of backchannels and speaker changes from an average human state derived from offline dialog learning. The dialog systems of Non-Patent Documents 1 and 2 therefore cannot generate utterances in real time in synchrony with the other party's utterances.

Non-Patent Document 3 discloses a technique that extends the above dialog system to generate response timing in real time. However, for the spoken dialog system of Non-Patent Document 3, it is not shown how the system resolves the gaps between the dialog system and the human in prosodic information and in the pause between utterances (the "ma", that is, the switching pause).

Patent Document 1 discloses a voice dialog apparatus intended to conduct a smooth dialog that matches the speaker's sensibility. In the normal state, the voice dialog apparatus of Patent Document 1 controls the pause time before the response voice is uttered and the speech rate of the response so that they correspond to the speaker's speech rate. When an event occurs during output of the response voice that makes the speaker want the response to end early, the apparatus continuously raises the speech rate of the response voice above its previous rate without changing its pitch.

As a robot control technique, Patent Document 2 describes treating the motion of a robot's movable parts as periodic motion and adjusting its phase to achieve global posture stability control. Patent Document 2 uses a method of controlling the phase discretely, for the purpose of traversing rough terrain.

Patent Document 1: JP 2008-26463 A
Patent Document 2: JP 2005-96068 A

Non-Patent Document 1: Kitaoka, N., Takeuchi, M., Nishimura, R., Nakagawa, S., "Response Timing Detection Using Prosodic and Linguistic Information for Human-Friendly Spoken Dialog Systems", Transactions of the Japanese Society for Artificial Intelligence, Vol. 20, No. 3, SP-E, pp. 220-228, 2005
Non-Patent Document 2: Kitaoka, Nakagawa, et al., "Analysis of Backchannel and Speaker-Change Timing in Cooperative Spoken Dialog and a Study of Response Generation Based on It", Grant-in-Aid for Scientific Research report, research project No. 17300064, 2005
Non-Patent Document 3: Nishimura, Nakagawa, "Spoken Dialogue System Considering Response Timing and Its Evaluation", IPSJ SLP Research Report 2009-SLP-77, No. 2, pp. 1-6, 2009

The voice dialog apparatus of Patent Document 1 controls the prosodic features of the response utterance (pause time and speech rate) so that they correspond to the human's speech rate, and converts the speech rate in response to the human's request. However, adjusting the speech rate alone cannot achieve a natural dialog; a mechanism is needed that adaptively controls the pauses between utterances, the intonation, and similar features.

To collect daily information at home, for example through medical interviews, without burdening the user, the system must grasp in real time the communication-related state quantities that arise from changes in the user's physical and mental condition, and must keep collecting information appropriately over time. The challenge here is to collect accurate information within natural communication.

Natural communication that places no mental burden on the user presupposes that the rhythm of the dialog system and the rhythm of the user are synchronized, as typified by the timing of speaker changes. A dialog system may operate through a robot that has a physical body as hardware, or through a robot that has no physical body (one rendered in computer graphics, CG). When there is a gap between the robot's rhythm and the user's rhythm, the robot can induce synchronization by searching for the user's rhythm and bringing its own rhythm close to it. However, unilaterally bringing the robot's communication control parameters close to the user's can hardly be said to achieve natural communication. Communication is established temporarily, but evaluated as a whole over the long term it is unnatural: it destroys the impression that is unique to the robot, the user feels stress in communicating with the robot, and sustained communication may become impossible.

In addition, communication synchronization often breaks down as communication proceeds, so how to resolve communication gaps in real time is a problem. Solving it requires designing the communication control scheme with the dynamics of communication between the user and the robot in mind. A general methodology for modeling these dynamics is to express the behavior of the state quantities to be controlled by a state equation that takes continuous values. Synchronization can be expressed with a state equation based on the principle of oscillators, as used in models of cardiac pacemakers and of resonance in electronic circuits. In the present invention, the communication state quantities are taken to be the state quantity of the pause at a speaker change (the switching pause), the state quantities of prosodic features including the pitch, power, or mora of an utterance section, and/or state quantities indicating physical behavior such as nodding. One conceivable scheme expresses the synchronization of these communication state quantities by a state equation and continuously performs pull-in control between the user and the robot (continuous pull-in control).

However, such a physical synchronization model alone makes it difficult to perform pull-in control between the user and the robot. For example, when there is a gap between the robot's switching pause and the user's, the robot can induce synchronization by searching for the user's switching pause and bringing its own close to it, but unilaterally bringing the robot's switching pause close to the user's can hardly be said to achieve natural, sustained communication. The user may also give up on communicating before synchronization appears, so synchronization needs to appear from an early stage of the communication.

Furthermore, more natural communication requires awareness of a conversation structure suited to the purpose. Discussions in meetings, small talk, soothing responses, and so on each have their own conversation structure. No single conversation structure can handle every type of conversation, and a conversation cannot proceed naturally unless a model is built for its type.

The present invention has been made in view of the above circumstances, and its object is to provide a dialog system, a dialog control method, and a program that resolve communication gaps in real time and carry out sustained, natural communication.

To achieve the above object, a dialog system according to a first aspect of the present invention is
a dialog system having speech recognition means for recognizing the content of a speaker's utterance, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by an expression of physical behavior, the dialog system comprising:
state quantity detection means for detecting, in the speaker's utterance, state quantities of prosodic features including the speaker switching pause and the pitch, power, or mora of an utterance section, and/or detecting state quantities indicating the speaker's physical behavior;
means for calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation amount, which is the amount of deviation between the interaction state quantities of the speaker and those of the response control means;
rule storage means for storing interaction rules, which are rules for changing the interaction state quantities of the response control means;
rule selection means for selecting an interaction rule from the rule storage means based on the communication synchronization deviation amount; and
synchronization control means for minimizing the communication synchronization deviation amount by continuous pull-in control using a state equation that represents a synchronization model and, at the same time, by discrete pull-in control using the interaction rule selected by the rule selection means, either bringing the speaker's interaction state quantities closer to those of the dialog system, or bringing the speaker's interaction state quantities closer to those of the dialog system while bringing the dialog system's interaction state quantities closer to those of the speaker.

A dialog control method according to a second aspect of the present invention is
a dialog control method performed by a dialog system having speech recognition means for recognizing the content of a speaker's utterance, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by an expression of physical behavior, the method comprising:
a state quantity detection step of detecting, in the speaker's utterance, state quantities of prosodic features including the speaker switching pause and the pitch, power, or mora of an utterance section, and/or detecting state quantities indicating the speaker's physical behavior;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation amount, which is the amount of deviation between the interaction state quantities of the speaker and those of the response control means;
a rule selection step of selecting, based on the communication synchronization deviation amount, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantities of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation amount by continuous pull-in control using a state equation that represents a synchronization model and, at the same time, by discrete pull-in control using the interaction rule selected in the rule selection step, either bringing the speaker's interaction state quantities closer to those of the dialog system, or bringing the speaker's interaction state quantities closer to those of the dialog system while bringing the dialog system's interaction state quantities closer to those of the speaker.

A program according to a third aspect of the present invention causes
a computer that controls a dialog system having speech recognition means for recognizing the content of a speaker's utterance, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by an expression of physical behavior, to execute:
a state quantity detection step of detecting, in the speaker's utterance, state quantities of prosodic features including the speaker switching pause and the pitch, power, or mora of an utterance section, and/or detecting state quantities indicating the speaker's physical behavior;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation amount, which is the amount of deviation between the interaction state quantities of the speaker and those of the response control means;
a rule selection step of selecting, based on the communication synchronization deviation amount, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantities of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation amount by continuous pull-in control using a state equation that represents a synchronization model and, at the same time, by discrete pull-in control using the interaction rule selected in the rule selection step, either bringing the speaker's interaction state quantities closer to those of the dialog system, or bringing the speaker's interaction state quantities closer to those of the dialog system while bringing the dialog system's interaction state quantities closer to those of the speaker.

According to the present invention, a dialog system can be controlled so that communication gaps are resolved in real time and daily information collection continues through sustained, natural communication.

A block diagram showing a configuration example of the robot system according to Embodiment 1 of the present invention.
A diagram showing patterns of the pull-in control scheme.
A diagram showing an example of adaptive pull-in control based on user state quantities.
A diagram showing acceleration of the onset of synchronization by interaction rule learning.
A diagram showing the phase transition of pull-in control caused by interaction rules.
A diagram showing an example of pull-in control of the switching pause and prosodic features.
A block diagram showing a configuration example of the robot system according to Embodiment 2 of the present invention.
A block diagram showing a configuration example of the robot system according to Embodiment 3 of the present invention.
A block diagram showing a conversation structure model in an information collection (medical interview) task.
A diagram showing a configuration example of a sustained information collection (medical interview) system that involves prompting the user.
A diagram showing a multi-layer model of pull-in control.
A diagram showing an example of prosodic information control and pause control based on interaction rules.
A diagram showing an example of motion timing control for utterances and nodding by decision tree learning.
A block diagram showing a configuration example of the robot system according to Embodiment 4 of the present invention.
A block diagram showing an example of the physical configuration of the robot system according to an embodiment of the present invention.

The present invention proposes a new dialog control method. Within the framework of continuous pull-in control by a synchronization model in a dialog system, and in the course of minimizing the communication synchronization deviation amount, interaction rules selected by a learning model are applied as discrete pull-in control that takes the timing of application into account. This brings the user's state quantities closer to those of the dialog system, or brings the user's state quantities closer to those of the dialog system while bringing the dialog system's state quantities closer to the user's, so that communication synchronization between the user and the dialog system appears from an early stage.

Embodiments of the present invention are described in detail below with reference to the drawings. The dialog system according to each embodiment described below can continuously control communication so that daily information is collected through sustained, natural communication without causing the user stress.

(Embodiment 1)
First, the configuration of the dialog system according to the embodiment of the present invention is described. In the embodiments below, the dialog system is described as a robot system. The robot system may be a character rendered on a computer screen with no physical body. The dialog system need not visually take the form of a so-called robot, and it may be a spoken dialog system that uses voice alone.

FIG. 1 shows a schematic configuration of the robot system according to the present embodiment. The robot system 10 includes a sensing unit 11, an interaction state quantity calculation unit 12, a communication synchronization control unit 13, an interaction rule learning history unit 19, a state equation parameter learning history unit 20, and a robot interaction control unit 14.

The sensing unit 11 includes, for example, a microphone and a camera. The microphone converts the voice uttered by the user P into an electrical signal to generate audio data. The sensing unit 11 captures the user P with the camera and generates moving image data. It sends the generated audio data and moving image data to the interaction state quantity calculation unit 12 as sensing data.

The interaction state quantity calculation unit 12 recognizes the utterance content from the voice of the user P by speech recognition and sends it to the robot interaction control unit 14. Based on the sensing data, it also calculates interaction state quantities, for example the switching pause at a speaker change and the fundamental frequency ($F_0$), power, and mora length of an utterance, and detects events of body motion such as nodding. It sends these interaction state quantities (hereinafter also simply called state quantities) to the communication synchronization control unit 13.
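As an illustration of one such state quantity, the pause at a speaker change can be computed from utterance boundaries. This is a minimal sketch under stated assumptions: the `Utterance` record, the speaker labels, and the idea that start and end times come from some upstream voice activity detector are all hypothetical and are not specified in the source.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # hypothetical labels, e.g. "user" or "robot"
    start: float   # utterance start time in seconds
    end: float     # utterance end time in seconds


def switching_pauses(utterances):
    """Return (next_speaker, pause_seconds) for every change of speaker.

    The pause is the gap between the end of one party's utterance and the
    start of the other party's next utterance. A negative value means the
    two utterances overlapped.
    """
    pauses = []
    for prev, cur in zip(utterances, utterances[1:]):
        if prev.speaker != cur.speaker:
            pauses.append((cur.speaker, cur.start - prev.end))
    return pauses
```

Consecutive utterances by the same speaker are skipped, so only genuine speaker changes contribute a value.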

The communication synchronization control unit 13 includes a continuous pull-in control unit 15, a state equation parameter storage unit 16, a discrete pull-in control unit 17, and an interaction rule storage unit 18. An interaction rule is a rule that changes the interaction state quantities of the robot system 10. The continuous pull-in control unit 15 takes the interaction state quantities of the user P and of the robot system 10 as inputs and, using the state equation parameters held in the state equation parameter storage unit 16, performs continuous pull-in control with a state equation whose parameters have been learned. The discrete pull-in control unit 17 selects an interaction rule from the interaction rule storage unit 18 and performs discrete pull-in control with the learned interaction rule. Through these pull-in controls, the communication synchronization control unit 13 generates control information for the interaction of the robot system 10 and issues it as a command to the robot interaction control unit 14.
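The discrete side of this control can be pictured as two small functions: one that measures how far apart the two parties' state quantities are, and one that picks a rule from a stored table based on that distance. The sketch below is only an illustration. The Euclidean distance, the threshold-based lookup, the threshold values, and the rule names are all assumptions made here; the source does not specify how the deviation is measured or how rules are encoded.

```python
import math

# Hypothetical rule table: (minimum deviation at which the rule applies, rule name).
# Both the thresholds and the names are invented for illustration.
RULES = [
    (0.0, "keep_current_style"),
    (0.3, "insert_backchannel"),
    (0.8, "slow_down_and_lengthen_pause"),
]


def sync_deviation(user_state, robot_state):
    """Euclidean distance between two interaction state vectors (an assumed metric)."""
    return math.sqrt(sum((u - r) ** 2 for u, r in zip(user_state, robot_state)))


def select_rule(deviation, rules=RULES):
    """Pick the rule with the largest threshold that does not exceed the deviation."""
    applicable = [rule for rule in rules if rule[0] <= deviation]
    return max(applicable)[1]
```

A learned selection model, as the description mentions, would replace the fixed thresholds; the table lookup only shows where such a model would plug in.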

Finally, the robot interaction control unit 14 receives the speech recognition data from the interaction state quantity calculation unit 12 and the interaction control information from the communication synchronization control unit 13, and generates the interaction of the robot system 10, that is, its utterances and motions. It outputs the utterances as voice by speech synthesis and moves the arms, face, and body of the robot system 10 according to the generated interaction.

The interaction rule learning history unit 19 stores the learning results for interaction rules, described later. The state equation parameter learning history unit 20 stores the learning results for state equation parameters, described later. The following describes how the communication synchronization control unit 13 generates interaction control information when there is a gap between the interaction state quantities of the user P and those of the robot system 10.

FIG. 2 is a diagram showing patterns of the pull-in control scheme. For ease of understanding, FIG. 2 shows the state quantity in one dimension. In general the state quantity is multidimensional, and a change in it is represented by a trajectory in the state quantity space. As FIG. 2 shows, when the gap between the state quantities of the user P and the robot system 10 is large, synchronization does not occur, and the level of the state quantity at which synchronization appears must be searched for. Synchronization can be modeled by a nonlinear oscillator. In the present embodiment it is formulated with the Van der Pol equation. Let $x(t)$ and $y(t)$ be the phases at time $t$ of the user P and the robot system 10 to be synchronized, respectively, and let $F(x, y, t)$ be the waveform displacement of the observed state quantity. The nonlinear oscillator is then formulated as

$$x'(t) = y(t)$$
$$y'(t) = \varepsilon\,(1 - x^{2}(t))\,y(t) + \alpha\,F(x, y, t) \qquad (1)$$

The trajectory of the solution of these simultaneous ordinary differential equations is a limit cycle. Here $\varepsilon$ expresses the nonlinearity, and the system has a stable limit cycle when $\varepsilon > 0$; $\alpha$ is an influence parameter.
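The limit-cycle behavior can be checked numerically. The sketch below integrates a forced Van der Pol oscillator with a fourth-order Runge-Kutta step. Two points are assumptions rather than part of the source: the textbook Van der Pol oscillator carries a restoring term $-x$ in the second equation, which equation (1) as printed does not show and which is included here because the limit cycle depends on it; and the function names, step size, and default parameters are chosen only for illustration.

```python
def van_der_pol_step(x, y, t, dt, eps, alpha, forcing):
    """Advance (x, y) by one RK4 step of the forced Van der Pol oscillator."""
    def deriv(x, y, t):
        dx = y
        # The -x restoring term is the standard Van der Pol form (an assumption here).
        dy = eps * (1.0 - x * x) * y - x + alpha * forcing(x, y, t)
        return dx, dy

    k1x, k1y = deriv(x, y, t)
    k2x, k2y = deriv(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, t + 0.5 * dt)
    k3x, k3y = deriv(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, t + 0.5 * dt)
    k4x, k4y = deriv(x + dt * k3x, y + dt * k3y, t + dt)
    x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x, y


def simulate(eps=1.0, alpha=0.0, forcing=lambda x, y, t: 0.0,
             x0=0.1, y0=0.0, dt=0.01, t_end=60.0):
    """Return the sequence of x values sampled every dt up to t_end."""
    xs = []
    x, y = x0, y0
    steps = int(round(t_end / dt))
    for i in range(steps):
        x, y = van_der_pol_step(x, y, i * dt, dt, eps, alpha, forcing)
        xs.append(x)
    return xs
```

With `alpha=0.0` and a positive `eps`, trajectories started from different initial values settle onto the same oscillation, which is the stable limit cycle the text refers to. Passing a periodic `forcing` function and a nonzero `alpha` gives a way to experiment with how an external rhythm influences the oscillator.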

Equation (1) has a single observed state quantity. When synchronizing in a space made up of multiple observed state quantities, this is generally realized by replacing the term α × F(x, y, t) with a linear combination such as
$$\alpha\,F(x, y, t) + \beta\,G(x, y, t) + \cdots$$
Alternatively, a separate equation may be set up for each state quantity and each controlled independently. Independent control is better suited when the state quantities are clearly independent and have no interrelationship. Whether to use linear combination or independent control is decided from the relationship between the state quantities.
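The linear combination of forcing terms can be sketched in Python as a function that builds one combined forcing from weighted parts. The helper name `combine_forcings` and the two example terms are hypothetical and chosen only for illustration; the weights play the role of α and β.

```python
def combine_forcings(weighted_terms):
    """Build F_total(x, y, t) = sum of weight * term(x, y, t)."""
    def combined(x, y, t):
        return sum(weight * term(x, y, t) for weight, term in weighted_terms)
    return combined


# Hypothetical observed quantities: one term for latency, one for pitch.
def latency_term(x, y, t):
    return -x


def pitch_term(x, y, t):
    return 0.5 * y


# alpha = 1.0 weights the latency term, beta = 0.2 the pitch term.
total_forcing = combine_forcings([(1.0, latency_term), (0.2, pitch_term)])
```

For independent control, each state quantity would instead get its own oscillator with its own single forcing term.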

To perform communication control with a synchronization model such as equation (1), the state quantities must be guided to a level at which synchronization emerges. When there is a gap between the state quantity of the robot system 10 and that of the user P, it is possible, as shown in FIG. 2(a), for the robot system 10 to search for the state quantity of the user P and bring its own state quantity close to it, thereby inducing synchronization. However, submissively bringing the communication control parameters of the robot system 10 close to those of the user P establishes communication only temporarily. It also destroys the impression unique to the robot system 10, so the user P may come to feel stress in the communication and sustained communication may become impossible.

As shown in FIG. 2(b), another strategy is to activate an interaction rule of the robot system 10 so as to force the state quantity of the user P toward that of the robot system 10. However, this strategy does not always succeed. Therefore, as shown for example in FIG. 2(c), the state quantity of the robot system 10 is first brought close to that of the user P and then changed so as to gradually pull back to the original state quantity of the robot system 10. Activating interaction rules while adaptively bringing the state quantity of the robot system 10 close to that of the user P in this way, and thereby accelerating the onset of synchronization, reduces the load on the user P while preserving the natural impression that the robot system 10 itself has.
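The approach-then-pull-back pattern of FIG. 2(c) can be sketched for a one-dimensional state quantity as follows. This is a toy model under stated assumptions: the update rates, the number of turns, and in particular the user model (the user drifts toward the robot's state by a fixed fraction each turn) are hypothetical and are not taken from the description.

```python
def approach_then_pull_back(robot_home, user_start,
                            approach_turns=5, return_turns=15,
                            approach_rate=0.5, return_rate=0.2,
                            user_follow_rate=0.3):
    """Return a list of (robot_state, user_state) pairs, one per turn."""
    robot, user = robot_home, user_start
    history = []
    for turn in range(approach_turns + return_turns):
        if turn < approach_turns:
            # Phase 1: the robot moves toward the user's state quantity.
            robot += approach_rate * (user - robot)
        else:
            # Phase 2: the robot gradually returns to its original state.
            robot += return_rate * (robot_home - robot)
        # Hypothetical user model: the user is drawn toward the robot.
        user += user_follow_rate * (robot - user)
        history.append((robot, user))
    return history


history = approach_then_pull_back(robot_home=1.0, user_start=0.0)
final_robot, final_user = history[-1]
closest_robot = min(r for r, _ in history)
```

In this toy run the robot first moves most of the way to the user and then returns toward its own original state, with the user following it.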

An interaction rule can have parameters such as the timing at which it is activated (for example, a condition on the gap), the direction and rate of change of the synchronizing action, the timing of switching to pull-in, and the rate of change of the pull-in. When the state quantity is multidimensional, the gap is also a vector, and there is not necessarily only one interaction rule. Nor is there necessarily only one pattern of change for the synchronizing action of an interaction rule.

The communication synchronization control unit 13 selects an interaction rule according to the gap in state quantity, as described above. After activating the selected rule, it evaluates the rule by, for example, how long communication synchronization persists or the amount of speech. That is, for each selected interaction rule it stores the duration of synchronization, the amount of speech, and so on at that time, and selects the interaction rule whose average past results (or, for example, most recent moving average) is largest. Alternatively, it varies the parameters of the interaction rule and converges on parameters for which the evaluation values of synchronization duration and speech amount are large. The communication synchronization control unit 13 stores the learning results for the interaction rules in the interaction rule learning history unit 19.
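The selection by moving average of past results can be sketched as below. The class name `RuleHistory`, the rule names, and the use of synchronization duration in seconds as the only score are assumptions made for illustration; the description leaves the exact evaluation value open.

```python
from collections import defaultdict, deque


class RuleHistory:
    """Keeps the most recent scores per interaction rule (a moving window)."""

    def __init__(self, window=5):
        self._scores = defaultdict(lambda: deque(maxlen=window))

    def record(self, rule_name, sync_duration_s):
        """Store the synchronization duration observed after using a rule."""
        self._scores[rule_name].append(sync_duration_s)

    def moving_average(self, rule_name):
        """Average of the stored scores, or 0.0 for a rule never tried."""
        scores = self._scores.get(rule_name)
        if not scores:
            return 0.0
        return sum(scores) / len(scores)

    def select(self, candidates):
        """Pick the candidate rule with the largest moving average."""
        return max(candidates, key=self.moving_average)


history = RuleHistory(window=3)
history.record("raise_phrase_final_f0", 10.0)
history.record("raise_phrase_final_f0", 20.0)
history.record("lengthen_phrase_final_mora", 30.0)
best = history.select(["raise_phrase_final_f0", "lengthen_phrase_final_mora"])
```

A pure greedy choice like this never revisits rules that scored poorly early on; a practical system would likely add some exploration, which the sketch omits.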

FIG. 3 shows an example of adaptive pull-in control based on the user state quantity. As shown in FIG. 3, communication synchronization can often break off partway as it progresses. In such cases, which interaction rule to activate, and when, becomes important. The interaction rule is selected on the basis of the interaction rule learning history.

FIG. 4 is a diagram showing how interaction rule learning accelerates the onset of synchronization. The communication synchronization control unit 13 selects the interaction rule that reduces the difference in state quantity to the synchronization onset level along the steepest gradient (in the shortest time).

FIG. 5 is a diagram showing a phase transition of pull-in control caused by an interaction rule. As shown in FIG. 5, once the trajectory enters a limit cycle it converges to a phase-stable point. In a multidimensional state quantity space, however, there are generally several limit cycles with stable convergence points. Applying an appropriate interaction rule makes it possible to cause a phase transition to a more stable limit cycle.

FIG. 6 shows the pull-in control process for an interaction state quantity composed of a state quantity for the pause at speaker change and a state quantity for the prosodic features of the utterance section. Here, the switching latency (the time from the end of one speaker's utterance until the other speaker's utterance begins) is taken as the state quantity for the pause at speaker change. The utterance interval (the time from the start of one speaker's utterance until the start of the other speaker's utterance) may be used instead. The deviation in switching latency between the user P and the robot system 10 converges, oscillating within a stable limit cycle, and is minimized according to the state equation of the synchronization model given by equation (1).
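The two timing measures defined above can be computed from utterance boundaries as in the following sketch. The `Utterance` record and the function names are hypothetical; times are assumed to be in seconds and the utterances sorted by start time.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # e.g. "user" or "robot"
    start: float   # utterance start time in seconds
    end: float     # utterance end time in seconds


def switching_latencies(utterances):
    """(next speaker, gap from previous end to next start) per speaker change."""
    result = []
    for prev, nxt in zip(utterances, utterances[1:]):
        if prev.speaker != nxt.speaker:
            result.append((nxt.speaker, nxt.start - prev.end))
    return result


def utterance_intervals(utterances):
    """(next speaker, gap from previous start to next start) per speaker change."""
    result = []
    for prev, nxt in zip(utterances, utterances[1:]):
        if prev.speaker != nxt.speaker:
            result.append((nxt.speaker, nxt.start - prev.start))
    return result


dialog = [
    Utterance("robot", 0.0, 2.0),
    Utterance("user", 2.6, 5.0),
    Utterance("robot", 5.3, 7.0),
]
latencies = switching_latencies(dialog)
intervals = utterance_intervals(dialog)
```

The difference between the user's and the robot's latencies over successive turns is the deviation that the synchronization model is meant to reduce.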

The fundamental frequency F₀ (pitch feature) is taken up here as a representative prosodic feature. F₀ is calculated at each time of the sampling interval, and the mean F₀ averaged over an utterance section is used as the fundamental frequency representing that section. The fundamental frequency of the phrase-final mora may be used as the representative value instead. From the viewpoint of synchronization control, control is applied not to the absolute value of the fundamental frequency but to its degree of change. This is because the fundamental frequency depends on voice quality and differs between individuals, which makes synchronization control on absolute values difficult.

The degree of change of the fundamental frequency is defined as the variation between the fundamental frequency of the utterance section one turn earlier and that of the current turn, and this variation pattern is used as the input to synchronization control. Synchronization control may be based on the state equation of equation (1), or on a state equation that performs linear prediction. Specifically, with the current turn denoted t and the previous turn (t−1), the fundamental frequency of the robot system 10's utterance section in the current turn may be calculated, by applying state control so that the variation pattern becomes the same, from the fundamental frequency of the user P's utterance section in turn (t−1) and that of the robot system 10's utterance section in turn (t−1).
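One possible reading of this variation-based control is sketched below. It is an interpretation, not the method of the description: the sketch assumes the "variation" is the relative change of the user's mean F₀ between the user's two most recent turns, and that the robot applies the same relative change to its own previous value. The function names, the `gain` parameter, and the treatment of frames with F₀ ≤ 0 as unvoiced are likewise assumptions.

```python
def mean_f0(frames_hz):
    """Mean F0 over voiced frames; frames with F0 <= 0 count as unvoiced."""
    voiced = [f for f in frames_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in utterance section")
    return sum(voiced) / len(voiced)


def next_robot_f0(robot_prev_hz, user_prev_hz, user_before_hz, gain=1.0):
    """Apply the user's relative F0 change to the robot's previous F0.

    gain = 1.0 mirrors the user's change fully; gain = 0.0 ignores it.
    """
    relative_change = user_prev_hz / user_before_hz - 1.0
    return robot_prev_hz * (1.0 + gain * relative_change)


# The user's mean F0 rose from 120 Hz to 132 Hz; the robot was at 200 Hz.
user_section_f0 = mean_f0([0.0, 100.0, 120.0, 0.0, 140.0])
robot_target = next_robot_f0(robot_prev_hz=200.0,
                             user_prev_hz=132.0,
                             user_before_hz=120.0)
```

Working with relative change rather than absolute frequency keeps the robot in its own pitch range, which matches the stated reason for not controlling on absolute values.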

As with the interaction rules, the communication synchronization control unit 13 evaluates the parameters of the state equation by the duration of synchronization, the amount of speech, and so on, and stores the learning results in the state equation parameter learning history unit 20. Referring to the learning results, it then sets the parameters that, for the given gap in state quantity, bring about synchronization soonest and make it last longest.

As described above, the dialog system of the present embodiment can be controlled so as to resolve communication gaps in real time and sustain natural communication.

(Embodiment 2)
In the process of continuous pull-in control and discrete pull-in control, in which the user's state quantity is brought toward that of the dialog system, or the dialog system's state quantity is brought toward the user's while the user's is brought toward the dialog system's, sustained communication cannot be established if the user feels stress. Stress can be measured by calculating autonomic nervous indices, such as sympathetic and parasympathetic nerve indices, from variations in the R-R interval of an electrocardiogram, but users are undeniably reluctant to wear a sensor.

In psychology, questionnaire items on personality and social skills are well established, and differences in personality and social skills can be used to decide the strategy by which the dialog system approaches the user's state quantity. For example, a user with a dependent personality tends not to feel stress even when the user's state quantity is pulled toward that of the dialog system. A user with an independent personality tends to feel less stress when the dialog system first approaches the user's state quantity and then pulls it toward its own than when the user's state quantity is pulled toward the dialog system's from the outset.

This user personality information may be acquired in advance, but a user's disposition can change from day to day depending on circumstances. In such cases it is considered more effective for the dialog system to acquire new personality information during the process of collecting information through conversation with the user.

Embodiment 2 adopts a dialog control method in which the robot system 10 decides its strategy for approaching the user's state quantity according to user personality information, composed of personality and social skills determined in advance, and brings about synchronization in the communication between the user and the robot system 10 early while keeping the user's stress during communication within an acceptable range.

FIG. 7 is a block diagram showing a configuration example of the robot system according to Embodiment 2 of the present invention. In addition to the configuration of Embodiment 1, the robot system 10 of Embodiment 2 includes a user personality information database 21. The user personality information database 21 stores user personality information composed of the personality and/or social skills of the user P.

The communication synchronization control unit 13 acquires the user personality information of the user P in the conversation from the user personality information database 21 and decides, for each user P on the basis of that information, the interaction rule by which the robot system 10 approaches the state quantity of the user P. Keeping the stress of the user P during communication within an acceptable range makes it possible to bring about synchronization between the user P and the robot system 10 early and with reduced stress.
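A minimal sketch of personality-based strategy selection follows. The numeric `dependence_score`, its threshold, the dictionary standing in for the user personality information database 21, and the fallback for unknown users are all hypothetical. Only the mapping itself (dependent users tolerate being pulled directly, independent users do better when the system approaches first) comes from the text.

```python
from enum import Enum


class Strategy(Enum):
    PULL_USER_TO_ROBOT = "pull user toward robot from the start"
    APPROACH_THEN_PULL = "approach user first, then pull toward robot"


def choose_strategy(dependence_score, threshold=0.5):
    """Map a hypothetical 0..1 dependence score to a pull-in strategy."""
    if dependence_score >= threshold:
        return Strategy.PULL_USER_TO_ROBOT
    return Strategy.APPROACH_THEN_PULL


def strategy_for(user_id, personality_db):
    """Look up a user's profile; unknown users get the gentler strategy."""
    profile = personality_db.get(user_id)
    if profile is None:
        # Assumption: without personality data, avoid forcing the user.
        return Strategy.APPROACH_THEN_PULL
    return choose_strategy(profile["dependence_score"])


# Stand-in for the user personality information database 21.
personality_db = {
    "user_a": {"dependence_score": 0.8},
    "user_b": {"dependence_score": 0.2},
}
```

A real system would derive the score from the questionnaire items mentioned earlier and could update it as new personality information is gathered during conversation.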

(Embodiment 3)
Embodiment 3 targets an information-gathering robot for tasks such as medical interviews at home, and aims at natural, sustained information gathering by building into the conversation structure model elements in which the robot actively empathizes and prompts episodes. Embodiment 3 assumes a transition model with three communication modes: a listening mode, a response/empathy mode, and an episode development mode. Natural communication between the user and the robot is maximized by adaptively changing, for each mode, the parameters of the state equation representing the synchronization model and the interaction rules.

A communication mode is the way a speaker engages with an utterance, and includes the listening mode, the response/empathy mode, and the episode development mode. The conversation structure (model) is the order (model) in which the communication modes transition.

The state quantity for switching latency at speaker change is subject to synchronization control in common throughout the conversation. The state quantity for prosodic information, however, is not controlled across the whole conversation. Synchronization control must be applied to it selectively within the conversation structure.

FIG. 8 is a block diagram showing a configuration example of the robot system according to Embodiment 3 of the present invention. In addition to the configuration of Embodiment 1, the robot system 10 of Embodiment 3 includes a conversation structure model description unit 22 and a conversation strategy description unit 23. The conversation structure model description unit 22 stores state transition models of conversation structure. The conversation strategy description unit 23 stores data associating the purpose and state of a conversation with a conversation structure model. Conversation purposes include, for example, medical interview, information provision, encouragement, and diversion. The state of a conversation refers, in the case of a medical interview for example, to whether it is the first session or which subsequent session it is, the time elapsed since the previous conversation for second and later sessions, and the duration and speech amount of past conversations. The conversation strategy description unit 23 specifies the conversation structure model to adopt for each classification of conversation purpose and state.
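The association held by the conversation strategy description unit 23 can be pictured as a lookup from (purpose, state category) to a conversation structure model. In the sketch below every concrete name is hypothetical: the model names, the state categories, the 72-hour gap threshold, and the default model are placeholders, since the description does not enumerate them.

```python
# Hypothetical table: (purpose, state category) -> conversation structure model.
STRATEGY_TABLE = {
    ("medical_interview", "first_session"): "short_listening_model",
    ("medical_interview", "returning_recent"): "standard_interview_model",
    ("medical_interview", "returning_after_gap"): "rapport_rebuilding_model",
    ("encouragement", "first_session"): "empathy_first_model",
}


def classify_state(session_count, hours_since_last=None, gap_hours=72.0):
    """Reduce the conversation state to a coarse category."""
    if session_count <= 1 or hours_since_last is None:
        return "first_session"
    if hours_since_last > gap_hours:
        return "returning_after_gap"
    return "returning_recent"


def select_model(purpose, session_count, hours_since_last=None,
                 default="standard_interview_model"):
    """Pick the conversation structure model for a purpose and state."""
    key = (purpose, classify_state(session_count, hours_since_last))
    return STRATEGY_TABLE.get(key, default)
```

For example, `select_model("medical_interview", 3, hours_since_last=100.0)` falls into the "returning_after_gap" category under these placeholder thresholds.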

FIG. 9 shows a conversation structure model for an information-gathering task such as a medical interview. A medical interview, for example, transitions from the listening mode to the response/empathy mode and gathers information by way of the episode development mode. FIG. 10 diagrams the causal relationship by which the robot's sequence of information gathering → empathy → topic recall leads to increased interest and motivation in the user. The state transitions of FIG. 9 are considered sufficient as the conversation structure that the robot system 10 acts on.

Within such a conversation structure, prosodic synchronization is applied in the conversation states where it is effective. In cases such as the listening mode, where the robot is making requests of the user, the call-and-response relationship needed for synchronization is weak, so prosodic synchronization does not work effectively. In the response/empathy mode, by contrast, backchannel responses and call-and-response exchanges occur frequently, so prosodic synchronization is expected to work effectively. Accordingly, prosodic synchronization control is applied selectively during the transitions based on the conversation structure model, for example as shown in FIG. 11. Interaction rules likewise need to be applied selectively during the state transitions based on the conversation structure model.

FIG. 11 shows that discrete pull-in control by a rule that raises the fundamental frequency (F₀) of the phrase-final mora is applied in the listening mode, and continuous pull-in control by the state equation for prosodic synchronization is performed in the response/empathy mode. Common to all modes, discrete pull-in control is applied through common interaction rules such as a rule that lengthens the phrase-final mora with a long vowel and a rule that generates nods in response to backchannel responses. FIG. 11 also shows that continuous pull-in by the state equation for switching-latency synchronization is applied in common to all modes.
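The assignment of controls to modes described for FIG. 11 can be written down as a small table. The identifiers below are hypothetical labels for the controls named in the text. The text names no mode-specific control for the episode development mode, so that entry is left empty here as an assumption of the sketch.

```python
# Controls applied in every mode, per the description of FIG. 11.
COMMON_CONTROLS = (
    ("discrete", "long_vowel_addition_rule"),
    ("discrete", "nod_generation_rule"),
    ("continuous", "switching_latency_state_equation"),
)

# Mode-specific controls (episode_development: none named in the text).
MODE_CONTROLS = {
    "listening": (("discrete", "phrase_final_f0_rise_rule"),),
    "response_empathy": (("continuous", "prosody_state_equation"),),
    "episode_development": (),
}


def active_controls(mode):
    """Return the (kind, name) controls active in a communication mode."""
    if mode not in MODE_CONTROLS:
        raise ValueError(f"unknown communication mode: {mode}")
    return COMMON_CONTROLS + MODE_CONTROLS[mode]
```

Each entry pairs the layer (continuous state-equation control or discrete interaction-rule control) with the specific control, which mirrors the two-layer structure of the pull-in method.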

FIG. 12(a) shows an interaction rule that raises the fundamental frequency (F₀) of the phrase-final mora. This rule is suited to the listening mode and has the effect of raising the prosodic level of the user's following utterance.

Some interaction rules are suited to selective application in this way, and others to common application. FIG. 12(b) shows the effect of applying the long-vowel addition rule, which lengthens the phrase-final mora. This rule has the effect of extending the pause (switching latency) and is applied to accelerate synchronization with the utterance timing of the robot system 10 when the user's utterance timing is slow. The rule that generates nods in response to backchannel responses is another interaction rule that works in common regardless of the conversation structure model. Which rule to apply at which timing is decided within the conversation structure model by a learning technique such as a decision tree, based on the rule application history. For nod generation, for example, a decision tree such as that shown in FIG. 13 is conceivable.

As described above, the pull-in control method of Embodiment 3 broadly has a two-layer structure of continuous pull-in control by state equations and discrete pull-in control by interaction rules. Control is applied in common or selectively within the conversation structure model, making it possible to maximize natural communication between the user and the robot system in information-gathering tasks.

(Embodiment 4)
In the process of continuous pull-in control and discrete pull-in control, in which the user's state quantity is brought toward that of the dialog system, or the dialog system's state quantity is brought toward the user's while the user's is brought toward the dialog system's, combining Embodiments 1, 2, and 3 makes it possible to realize a dialog control method that brings about synchronization in the communication between the user and the dialog system early and maximizes natural communication between them.

FIG. 14 is a block diagram showing a configuration example of the robot system according to Embodiment 4 of the present invention. The robot system 10 of Embodiment 4 combines the configurations of Embodiments 2 and 3. That is, it adds the user personality information database 21, the conversation structure model description unit 22, and the conversation strategy description unit 23 to the configuration of Embodiment 1.

Specifically, in Embodiment 4 the communication synchronization control unit 13 decides the strategy by which the robot system 10 approaches the state quantity of the user P according to the contents of the user personality information database 21, which is composed of personality and social skills determined in advance. In addition, it determines which conversation state the current conversation is in by comparison with the conversation structure model, and adaptively changes the parameters of the state equation representing the synchronization model and the interaction rules according to the characteristics of the listening mode, the response/empathy mode, and the episode development mode. In this way, synchronization in the communication between the user P and the robot system 10 is brought about early, and natural communication between them is maximized.

(Modification of the Embodiments)
The dialog control methods of the above embodiments assume one dialog system interacting with one user. The robot system 10 of Embodiments 1 to 4 can readily be extended to one robot system interacting with multiple users. For example, the average state quantity of the users can be defined as the representative state quantity of the user group, or the state quantity of the worst-case user (for example, the user with the longest switching latency) can be defined as the representative. Defining a state quantity that represents multiple users extends the method to a process of continuous and discrete pull-in control in which the group's representative state quantity is brought toward that of the robot system, or the robot system's state quantity is brought toward the group's representative state quantity while the latter is brought toward the robot system's.
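The two ways of defining a group's representative state quantity can be sketched for switching latency as follows. The function name, the `method` keyword, and the example values are hypothetical. "Worst case" is taken to mean the longest latency, following the example in the text.

```python
def representative_latency(latency_by_user, method="mean"):
    """Reduce per-user switching latencies (seconds) to one group value.

    method="mean"  : average over all users
    method="worst" : the longest latency in the group
    """
    values = list(latency_by_user.values())
    if not values:
        raise ValueError("no users in group")
    if method == "mean":
        return sum(values) / len(values)
    if method == "worst":
        return max(values)
    raise ValueError(f"unknown method: {method}")


group = {"user_a": 0.5, "user_b": 1.0, "user_c": 1.5}
group_mean = representative_latency(group, "mean")
group_worst = representative_latency(group, "worst")
```

The single value returned would then take the place of the individual user's state quantity in the pull-in control described for one user.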

FIG. 15 is a block diagram showing an example of the physical configuration of the robot system according to the embodiments of the present invention.

As shown in FIG. 15, the robot system 10 includes a control unit 31, a main storage unit 32, an external storage unit 33, an operation unit 34, a display unit 35, an input/output unit 36, and a transmission/reception unit 37. The main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, the input/output unit 36, and the transmission/reception unit 37 are all connected to the control unit 31 via an internal bus 30.

The control unit 31 is composed of a CPU (Central Processing Unit) and the like, and executes processing for interaction control of spoken conversation according to a control program 39 stored in the external storage unit 33.

The main storage unit 32 is composed of RAM (Random-Access Memory) and the like. The control program 39 stored in the external storage unit 33 is loaded into it, and it is used as the work area of the control unit 31.

The external storage unit 33 is composed of nonvolatile memory such as flash memory, a hard disk, DVD-RAM (Digital Versatile Disc Random-Access Memory), or DVD-RW (Digital Versatile Disc ReWritable). It stores in advance the control program 39 that causes the control unit 31 to perform the processing described above, supplies data stored by the control program 39 to the control unit 31 according to instructions from the control unit 31, and stores data supplied from the control unit 31.

The operation unit 34 is composed of a keyboard, a pointing device such as a mouse, and an interface device that connects them to the internal bus 30. User information, state equation parameters, interaction rules, user personality information, conversation structure models, conversation strategies, and various determination conditions are entered via the operation unit 34 and supplied to the control unit 31.

The display unit 35 is composed of a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display), a speaker, and the like, and outputs the speech of the robot system 10. When the robot is represented by a character on a computer screen, the display unit 35 displays that character. It also displays user information, state equation parameters, interaction rules, user personality information, conversation structure models, conversation strategies, and the like.

The input/output unit 36 is composed of a serial interface or a parallel interface. A position sensor, an imaging device, a microphone (none shown), and the like are connected to the input/output unit 36. Furthermore, when the robot has a physical face, arms, and legs and moves them, the control unit 31 sends control signals to their actuators and receives the data detected by their sensors via the input/output unit 36.

The transmission/reception unit 37 is composed of a wireless transceiver, a wireless modem or a network termination device, and a serial interface or LAN (Local Area Network) interface connected to them. Data obtained by recognizing the user's speech is transmitted via the transmission/reception unit 37. User personality information is also collected over the network.

The processing of the sensing unit 11, the interaction state quantity calculation unit 12, the communication synchronization control unit 13, the robot interaction control unit 14, the interaction rule learning history unit 19, the state equation parameter learning history unit 20, the user personality information database 21, the conversation structure model description unit 22, the conversation strategy description unit 23, and so on of the robot system 10 is executed by the control program 39 using the control unit 31, the main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, the input/output unit 36, the transmission/reception unit 37, and so on as resources.

In addition, the hardware configuration and flowcharts described above are examples, and can be changed and modified as desired.

The central part that performs the control processing, comprising the control unit 31, main storage unit 32, external storage unit 33, operation unit 34, internal bus 30, and the like, can be realized with an ordinary computer system rather than a dedicated system. For example, the robot system 10 that executes the above processing may be configured by storing a computer program for executing the above operations on a computer-readable recording medium (flexible disk, CD-ROM, DVD-ROM, etc.), distributing it, and installing the computer program on a computer. Alternatively, the robot system 10 may be configured by storing the computer program in a storage device of a server on a communication network such as the Internet and having an ordinary computer system download it.

Further, when the functions of the robot system 10 are realized by dividing them between an OS (operating system) and an application program, or by cooperation between the OS and an application program, only the application program portion may be stored on the recording medium or in the storage device.

It is also possible to superimpose the computer program on a carrier wave and distribute it via a communication network. For example, the computer program may be posted on a bulletin board system (BBS) on a communication network and distributed via the network. The system may then be configured so that the above processing can be executed by starting this computer program and running it under the control of the OS in the same way as other application programs.

Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.

(Supplementary Note 1)
A dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, the dialog system comprising:
state quantity detection means for detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
means for calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
rule storage means for storing interaction rules, which are rules for changing the interaction state quantity of the response control means;
rule selection means for selecting an interaction rule from the rule storage means based on the communication synchronization deviation; and
synchronization control means that minimizes the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected by the rule selection means, either brings the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or brings the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.
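The control loop described in Supplementary Note 1 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from this document: the interaction state is a normalized three-element vector (switching latency, pitch, power), the deviation is a Euclidean distance, the state equation is a first-order relaxation dx/dt = -gain * (x - u), and the `Rule` class, rule names, and thresholds are hypothetical. The sketch models only the system side moving toward the user; the user's own response to the system cannot be simulated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical interaction rule: a discrete change to the system's state."""
    name: str
    min_deviation: float  # rule applies when the deviation is at least this large
    step: tuple           # discrete offset added to the system's state vector


def sync_deviation(user, system):
    """Euclidean distance between the two (normalized) interaction state vectors."""
    return math.sqrt(sum((s - u) ** 2 for s, u in zip(system, user)))


def select_rule(deviation, rules):
    """Pick the applicable rule with the highest threshold, or None if none applies."""
    applicable = [r for r in rules if deviation >= r.min_deviation]
    return max(applicable, key=lambda r: r.min_deviation, default=None)


def continuous_pull_in(system, user, gain, dt):
    """One Euler step of the assumed state equation dx/dt = -gain * (x - u)."""
    return [s - gain * (s - u) * dt for s, u in zip(system, user)]


def discrete_pull_in(system, rule):
    """Apply the selected rule's discrete offset to the system state."""
    return [s + d for s, d in zip(system, rule.step)]


# Normalized [switching latency, pitch, power]; values are illustrative only.
user = [0.2, 0.5, 0.5]
system = [0.8, 0.5, 0.5]
rules = [
    Rule("hold", 0.0, (0.0, 0.0, 0.0)),
    Rule("shorten_latency", 0.5, (-0.1, 0.0, 0.0)),
]

before = sync_deviation(user, system)
rule = select_rule(before, rules)
system = discrete_pull_in(system, rule)                      # discrete pull-in
system = continuous_pull_in(system, user, gain=1.0, dt=0.1)  # continuous pull-in
after = sync_deviation(user, system)
```

In this sketch the discrete rule makes a coarse jump and the continuous step then shrinks the remaining deviation smoothly, which is one plausible reading of how the two kinds of pull-in control could be combined.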

(Supplementary Note 2)
The dialog system according to Supplementary Note 1, further comprising means for acquiring user personality information representing the personality and/or social skills of the speaker,
wherein the rule selection means determines, based on the user personality information, the order of the interaction rules to be selected as the conversation progresses.

(Supplementary Note 3)
The dialog system according to Supplementary Note 1 or 2, further comprising:
structure model storage means for storing a conversation structure model that defines the transition order of communication modes including a listening mode, a response/empathy mode, and an episode development mode of the conversation; and
adaptive control means for changing the parameters of the state equation representing the synchronization model and/or the interaction rules based on the communication mode, which transitions according to the conversation structure model.
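The mode-dependent adaptation in Supplementary Note 3 can be sketched as a small state machine. This is a hedged illustration only: the cyclic transition order, the `GAIN_BY_MODE` table, and its numeric values are hypothetical and are not given in this document, which states only that the parameters and/or rules change with the communication mode.

```python
from enum import Enum


class Mode(Enum):
    LISTENING = "listening"
    RESPONSE_EMPATHY = "response/empathy"
    EPISODE_DEVELOPMENT = "episode development"


# Hypothetical conversation structure model: a fixed cyclic transition order.
TRANSITION_ORDER = [Mode.LISTENING, Mode.RESPONSE_EMPATHY, Mode.EPISODE_DEVELOPMENT]

# Hypothetical state equation parameter (pull-in gain) for each mode.
GAIN_BY_MODE = {
    Mode.LISTENING: 0.5,
    Mode.RESPONSE_EMPATHY: 1.0,
    Mode.EPISODE_DEVELOPMENT: 0.8,
}


def next_mode(mode):
    """Return the mode that follows `mode` in the assumed transition order."""
    i = TRANSITION_ORDER.index(mode)
    return TRANSITION_ORDER[(i + 1) % len(TRANSITION_ORDER)]


def adapt_gain(mode):
    """Look up the state equation parameter to use while in `mode`."""
    return GAIN_BY_MODE[mode]


mode = Mode.LISTENING
gains_seen = []
for _ in range(3):
    gains_seen.append(adapt_gain(mode))
    mode = next_mode(mode)
```

A lookup table keeps the adaptation explicit; a learned mapping, such as one backed by the state equation parameter learning history unit 20, could replace it without changing the calling code.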

(Supplementary Note 4)
A dialog control method performed by a dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, the method comprising:
a state quantity detection step of detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
a rule selection step of selecting, based on the communication synchronization deviation, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantity of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected in the rule selection step, either bringing the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or bringing the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.

(Supplementary Note 5)
The dialog control method according to Supplementary Note 4, further comprising a step of acquiring user personality information representing the personality and/or social skills of the speaker,
wherein the rule selection step determines, based on the user personality information, the order of the interaction rules to be selected as the conversation progresses.

(Supplementary Note 6)
The dialog control method according to Supplementary Note 4 or 5, further comprising:
a structure model storage step of setting a conversation structure model that defines the transition order of communication modes including a listening mode, a response/empathy mode, and an episode development mode of the conversation; and
an adaptive control step of changing the parameters of the state equation representing the synchronization model and/or the interaction rules based on the communication mode, which transitions according to the conversation structure model.

(Supplementary Note 7)
A program causing a computer that controls a dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, to execute:
a state quantity detection step of detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
a rule selection step of selecting, based on the communication synchronization deviation, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantity of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected in the rule selection step, either bringing the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or bringing the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.

Description of Symbols
10 Robot system (dialog system)
11 Sensing unit
12 Interaction state quantity calculation unit
13 Communication synchronization control unit
14 Robot interaction control unit
15 Continuous pull-in control unit
16 State equation parameter storage unit
17 Discrete pull-in control unit
18 Interaction rule storage unit
19 Interaction rule learning history unit
20 State equation parameter learning history unit
21 User personality information database
22 Conversation structure model description unit
23 Conversation strategy description unit
30 Internal bus
31 Control unit
32 Main storage unit
33 External storage unit
34 Operation unit
35 Display unit
36 Input/output unit
37 Transmission/reception unit
39 Control program

Claims

A dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, the dialog system comprising:
state quantity detection means for detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
means for calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
rule storage means for storing interaction rules, which are rules for changing the interaction state quantity of the response control means;
rule selection means for selecting an interaction rule from the rule storage means based on the communication synchronization deviation; and
synchronization control means that minimizes the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected by the rule selection means, either brings the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or brings the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.

The dialog system according to claim 1, further comprising means for acquiring user personality information representing the personality and/or social skills of the speaker,
wherein the rule selection means determines, based on the user personality information, the order of the interaction rules to be selected as the conversation progresses.

The dialog system according to claim 1, further comprising:
structure model storage means for storing a conversation structure model that defines the transition order of communication modes including a listening mode, a response/empathy mode, and an episode development mode of the conversation; and
adaptive control means for changing the parameters of the state equation representing the synchronization model and/or the interaction rules based on the communication mode, which transitions according to the conversation structure model.

A dialog control method performed by a dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, the method comprising:
a state quantity detection step of detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
a rule selection step of selecting, based on the communication synchronization deviation, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantity of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected in the rule selection step, either bringing the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or bringing the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.

A program causing a computer that controls a dialog system comprising speech recognition means for recognizing the utterance content of a speaker, and response control means for outputting, according to the recognition result, an auditory response by voice and/or a visual response by expression of physical behavior, to execute:
a state quantity detection step of detecting state quantities of prosodic features in the speaker's utterance, including speaker-switching latency, pitch of the utterance section, and power or mora, and/or detecting state quantities indicating the physical behavior of the speaker;
a step of calculating, from interaction state quantities including the state quantities of the prosodic features or the state quantities indicating the physical behavior, a communication synchronization deviation, which is the deviation in interaction state quantity between the speaker and the response control means;
a rule selection step of selecting, based on the communication synchronization deviation, an interaction rule from rule storage means that stores interaction rules, which are rules for changing the interaction state quantity of the response control means; and
a synchronization control step of minimizing the communication synchronization deviation by continuous pull-in control based on a state equation representing a synchronization model and, at the same time, by discrete pull-in control based on the interaction rule selected in the rule selection step, either bringing the interaction state quantity of the speaker closer to the interaction state quantity of the dialog system, or bringing the interaction state quantity of the speaker closer to that of the dialog system while bringing the interaction state quantity of the dialog system closer to that of the speaker.