JP2015148701A

JP2015148701A - Robot control device, robot control method and robot control program

Info

Publication number: JP2015148701A
Application number: JP2014021121A
Authority: JP
Inventors: Takahiro Matsumoto; Shunichi Seko; Ryosuke Aoki; Hitoshi Tsuchikawa; Tomohiro Yamada
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-02-06
Filing date: 2014-02-06
Publication date: 2015-08-20
Anticipated expiration: 2034-02-06
Also published as: JP6122792B2

Abstract

PROBLEM TO BE SOLVED: To make a robot behave as if it were watching video together with a user, so that the robot evokes empathy in the user. SOLUTION: A robot control device acquires comments about the program being viewed from a social media server 7. It then determines what a robot 2 should say from the comments of personality-matching speakers, i.e. posters whose personality matches the personality set in the robot 2. Finally, it extracts the action for the robot 2 to execute from an action database 15, based on the dialogue state of the utterance content and the emotional state of the robot 2. In this way the robot control device makes the robot 2 act in accordance with the content of the program being viewed. When determining the utterance content, the device narrows the comments down by the personality set in the robot, so that the utterances and actions of the robot 2 remain consistent.

Description

The present invention relates to a technique for controlling a robot.

Studies of human interaction show the following effect. When another person responds in sync with a subject's facial expression, that synchronized reaction amplifies the subject's expressions of happiness and weakens expressions of anger and sadness. Watching video together with a robot can therefore work the same way. If the two respond in sync to the content by laughing, rejoicing, grieving or getting angry together, emotions such as amusement and joy are promoted, and emotions such as sadness and anger are suppressed, compared with watching the video alone.

Research on communication between people and computer-graphics (CG) characters has also pointed to a related effect. Having a CG character make empathetic changes in facial expression gives people an affiliation motive toward it. The affiliation motive is defined as the desire to approach, cooperate with, and reciprocate the actions of another person. People are thought to feel this motive toward others whose attitudes resemble their own.

Non-Patent Document 1 discloses one technique for making a user feel empathy with a robot while watching video. It estimates a profile of the user's evaluations of programs from the user's viewing log and from what the user says while watching. If the user seems bored while watching, the robot uses this profile to recommend another TV program, which generates the user's empathy toward the robot.

Non-Patent Document 2 discloses a technique in which the robot acts as an intermediary for social media. The robot uses social media comments about the program being watched as its own utterances when talking to the user. It also posts what the user says to social media as comments.

Takahashi et al., "Social media mediation robot for increasing speech opportunities for the elderly," IEICE Technical Report, vol. 112, no. 233, pp. 21-26, IEICE, Oct. 2012.
Y. Takama et al., "Human-robot communication based on information recommendation while watching TV," Proc. 21st Annual Conference of the Japanese Society for Artificial Intelligence, 2D5-5, 2007.

However, Non-Patent Document 1 only recommends other programs based on per-program user evaluations. It cannot express emotions or speak about the content of the video currently being watched. It does react to user operations on the video or display, such as power ON/OFF and volume adjustment, but its emotional expressions and utterances cover only channel changes. Its effect in creating empathy is therefore limited.

Non-Patent Document 2 likewise does not express emotion through actions matched to the TV content. Its utterances have a further problem: when a robot draws its utterances from social media, they are hard to keep consistent. Non-Patent Document 2 itself reports negative feedback about the robot making inconsistent utterances.

One likely reason inconsistency draws negative impressions is as follows. Suppose a robot has been commenting on a program in a woman's style of speech and then suddenly starts talking like a man. The user's sense of the robot as an agent that has been watching TV with them then breaks down. Keeping the robot's utterances consistent is therefore important.

Non-Patent Document 2 states that consistency is achieved by a "narikiri" (role-playing) method. This method interprets the semantic attributes of social media comments, then extracts and speaks only mutually consistent opinions. In general, however, extracting only consistent opinions from social media is not easy, and the document describes no concrete implementation. Non-Patent Document 2 also does not address emotional expressions or utterances in response to user operations. Its effect in generating empathy is therefore limited as well.

The present invention was made in view of the above. Its object is to make a robot behave as if it were watching video together with the user, so that the robot evokes empathy in the user.

A robot control device according to a first aspect of the present invention causes a robot to behave as if it were watching video together with a user. The device comprises:
comment acquisition means for acquiring comments posted about the video;
utterance sentence generation means for generating, from the comments, an utterance sentence for the robot to speak;
emotion determination means for determining an emotional state of the robot from the comments;
action storage means storing three items in association: the dialogue state of an utterance sentence for the robot to speak, the emotional state of the robot, and an action for the robot to execute;
action determination means for referring to the action storage means and determining the action for the robot to execute from the dialogue state of the utterance sentence generated by the utterance sentence generation means and the emotional state determined by the emotion determination means; and
control means for having the robot output the utterance sentence generated by the utterance sentence generation means as synthesized speech, and for causing the robot to execute a motion based on the action determined by the action determination means.

The robot control device may further comprise personality storage means storing information on the personality set for the robot. In that case, the utterance sentence generation means generates the utterance sentence from comments posted by posters whose personality matches that personality.

The robot control device may further comprise direction acquisition means for acquiring the direction of the user. In that case, the actions include an action of looking toward the user.

A robot control method according to a second aspect of the present invention is a computer-implemented method for causing a robot to behave as if it were watching video together with a user. The method comprises the steps of:
acquiring comments posted about the video;
generating, from the comments, an utterance sentence for the robot to speak;
determining an emotional state of the robot from the comments;
determining the action for the robot to execute, by referring to action storage means that stores in association the dialogue state of an utterance sentence, the emotional state of the robot, and an action for the robot to execute, and using the dialogue state of the utterance sentence generated in the generating step and the emotional state determined in the emotional-state determining step; and
having the robot output the generated utterance sentence as synthesized speech, while causing the robot to execute a motion based on the action determined in the action-determining step.

A robot control program according to a third aspect of the present invention causes a robot to behave as if it were watching video together with a user. The program causes a computer to execute:
processing for acquiring comments posted about the video;
processing for generating, from the comments, an utterance sentence for the robot to speak;
processing for determining an emotional state of the robot from the comments;
processing for determining the action for the robot to execute, by referring to action storage means that stores in association the dialogue state of an utterance sentence, the emotional state of the robot, and an action for the robot to execute, and using the dialogue state of the generated utterance sentence and the determined emotional state; and
processing for having the robot output the generated utterance sentence as synthesized speech, while causing the robot to execute a motion based on the determined action.

According to the present invention, the robot can behave as if it were watching video together with the user, and can thereby evoke empathy in the user.

FIG. 1 is an overall configuration diagram including the robot control device according to this embodiment.
FIG. 2 is a diagram showing an example of robot personality attribute information.
FIG. 3 is a diagram showing examples of the action determination table and the control sequence table stored in the action database.
FIG. 4 is a diagram showing examples of fixed utterance sentences stored in the fixed utterance sentence database.
FIG. 5 is a diagram showing an example of data stored in the positive/negative word database.
FIG. 6 is a diagram showing an example of data stored in the tone conversion database.
FIG. 7 is a diagram showing an example of data stored in the program-related utterance/action tag database.
FIG. 8 is a diagram showing an example of data held by the electronic program guide information database.
FIG. 9 is a diagram showing an example of data held by the program/social media tag association database.
FIG. 10 is a diagram showing an example of social media comment information held by the social media server.
FIG. 11 is a diagram showing an example of user/display direction information held by the position acquisition server.
FIG. 12 is a flowchart showing the process of determining utterance content and action content based on a display state change.
FIG. 13 is a flowchart showing the process of acquiring the excitement value and personality-matching speaker comment information.
FIG. 14 is a flowchart showing the process of determining utterance content and action content based on the excitement value and personality-matching speaker comment information.
FIG. 15 is a diagram showing a map of emotional states based on the excitement value and the positive/negative value.
FIG. 16 is a flowchart showing the process of determining utterance content and action content based on a scenario accompanying the program.
FIG. 17 is a flowchart showing the process of making the robot speak and act based on the determined utterance content and action content.
FIG. 18 is a diagram showing the robot as controlled by the robot control device.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an overall configuration diagram including the robot control device according to this embodiment.

The robot control device 1 in this embodiment determines what the robot 2 should say and do, and controls the robot 2 accordingly. It bases these decisions on three inputs:
(i) state changes of the display 4, which has a TV program display function and is operated with the remote controller 3 (for example, power ON/OFF or volume changes);
(ii) comments posted to the social media server 7 about the channel (= program) being viewed; and
(iii) a scenario accompanying the channel being viewed.

[Configuration of the Robot Control Device]
First, the configuration of the robot control device 1 in this embodiment is described.

The robot control device 1 comprises:
a display information processing unit 11;
a social media information acquisition unit 12;
an utterance/action determination unit 13;
a robot personality attribute information database 14;
an action database 15;
a fixed utterance sentence database 16;
a positive/negative word database 17;
a tone conversion database 18; and
a program-related utterance/action tag database 19.
Each unit of the robot control device 1 may be implemented on a computer with an arithmetic processing unit, a storage device and similar components, with a program carrying out each unit's processing. The program is stored in a storage device of the robot control device 1. It can also be recorded on a recording medium such as a magnetic disk, an optical disc or a semiconductor memory, or provided over a network. FIG. 1 shows the robot control device 1 and the robot 2 separately, but the robot control device 1 may also be built into the robot 2.

The display information processing unit 11 acquires two kinds of information: display state change information, which describes the operations performed on the display 4 with the remote controller 3, and viewing channel change information. It sends the display state change information to the utterance/action determination unit 13. When it acquires viewing channel change information, it obtains information on the newly viewed channel and sends it to the social media information acquisition unit 12. If the program-related utterance/action tag database 19 holds utterance/action tag information for the channel being viewed, the unit also retrieves that information and sends it to the utterance/action determination unit 13.

The social media information acquisition unit 12 receives information on the channel being viewed from the display information processing unit 11, and acquires comments about that channel from the social media server 7. From these comments it derives two things and sends both to the utterance/action determination unit 13:
the comments of posters whose personality matches the personality set for the robot (hereinafter "personality-matching speakers"); and
an excitement value indicating how lively the discussion of the channel being viewed is.

The utterance/action determination unit 13 decides what the robot should say and which action it should perform, and controls the robot accordingly. It bases these decisions on:
the display state change information and utterance/action tag information received from the display information processing unit 11; and
the personality-matching speaker comment information and excitement value received from the social media information acquisition unit 12.
It also takes the robot personality set for the robot 2 into account when deciding the utterance and action content.

[Data Held by the Robot Control Device]
Next, the data held by the robot control device 1 is described.

The robot personality attribute information database 14 stores robot personality attribute information, which represents the personality set for the robot. This information is used in two places: when the social media information acquisition unit 12 extracts personality-matching speakers, and when the utterance/action determination unit 13 determines the utterance and action content.

FIG. 2 shows an example of the robot personality attribute information stored in the robot personality attribute information database 14. It consists of two kinds of personality attribute information: a video-dependent robot personality, shown in FIG. 2(a), and a fixed robot personality, shown in FIG. 2(b).

The video-dependent robot personality defines, for each category, one or more sets of attribute, attribute value and attribute weight.
A category is a type of video, such as sports, news or drama. Some categories have subcategories; for example, under the main category "sports", soccer, baseball and basketball are subcategories.
An attribute is a hobby or preference that an individual may have about a program, such as a favorite team or favorite player for a soccer program, or a favorite genre for a news program.
An attribute value is the specific hobby or preference for that attribute, for example "Team B" for the favorite-team attribute or "entertainment" for the favorite-genre attribute.
An attribute weight is assigned to each attribute. It indicates how much a match on that attribute's value contributes when judging personality similarity.

The fixed robot personality does not depend on the video. It also consists of sets of attribute, attribute value and attribute weight. Its attributes include gender, whose value is male or female, and age group, whose value is 20s, 30s and so on. As in the video-dependent robot personality, the attribute weight is a per-attribute weighting.
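The patent states only that attribute weights set how much each matching attribute contributes to the similarity judgment. It gives no formula. The sketch below makes three assumptions, all of them illustrative rather than the patented method:
the score is the weighted share of the robot's attributes that the poster matches;
each poster's attribute values are already known, for example inferred from a profile; and
a poster counts as a "personality-matching speaker" at or above a threshold of 0.7.
The names `personality_similarity` and `personality_matching_posters` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalityAttribute:
    attribute: str  # e.g. "favorite team", "gender"
    value: str      # e.g. "Team B", "female"
    weight: float   # how much a match on this attribute counts


def personality_similarity(robot, poster):
    """Weighted share of the robot's attributes whose value the poster shares.

    robot:  list of PersonalityAttribute (video-dependent plus fixed ones)
    poster: dict mapping attribute name -> value assumed known for one poster
    Returns a score in [0, 1]; 0.0 if all weights are zero.
    """
    total = sum(a.weight for a in robot)
    if total == 0:
        return 0.0
    matched = sum(a.weight for a in robot if poster.get(a.attribute) == a.value)
    return matched / total


def personality_matching_posters(robot, posters, threshold=0.7):
    """IDs of posters whose similarity reaches the (assumed) threshold."""
    return [pid for pid, attrs in posters.items()
            if personality_similarity(robot, attrs) >= threshold]


robot = [
    PersonalityAttribute("favorite team", "Team B", 3.0),  # video-dependent (soccer)
    PersonalityAttribute("gender", "female", 1.0),          # fixed
    PersonalityAttribute("age group", "20s", 1.0),          # fixed
]
posters = {
    "u1": {"favorite team": "Team B", "gender": "male", "age group": "20s"},
    "u2": {"favorite team": "Team A", "gender": "female", "age group": "20s"},
}
print(personality_matching_posters(robot, posters))  # ['u1']
```

Giving the video-dependent attribute a larger weight lets a shared favorite team outweigh a gender mismatch. Poster u1 scores 4/5 = 0.8, while u2 scores only 2/5 = 0.4.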

The action database 15 stores two tables:
an action determination table, used by the utterance/action determination unit 13 to decide the action of the robot 2; and
a control sequence table, which describes the robot's control sequence for each action.

FIG. 3(a) shows an example of the action determination table, and FIG. 3(b) shows an example of the control sequence table.

Each record in the action determination table consists of a dialogue state trigger, an emotional state trigger, an action name and an execution speed.
The dialogue state trigger takes one of two values, "talk" or "impression". The dialogue state follows from the utterance content decided by the utterance/action determination unit 13: it is "talk" if the utterance addresses the user, and "impression" if it voices an impression.
The emotional state trigger holds one of the emotional states set in the robot 2, such as joy or surprise. The emotional state is determined according to the utterance and action content decided by the utterance/action determination unit 13, and is stored in a storage area of that unit.
The action name is a label identifying an action for the robot 2 to perform, such as nodding, shaking its head, or raising both arms in a banzai cheer.
The execution speed is a parameter for how fast the motion named by the action name is executed; a larger value means a faster action.
The utterance/action determination unit 13 refers to the action determination table and determines the action for the robot 2 from the dialogue state and the emotional state.
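The lookup described above behaves like a dictionary keyed on the pair (dialogue state, emotional state). The sketch below makes that concrete. The rows are hypothetical stand-ins, since the actual contents of FIG. 3(a) are not reproduced in the text, and returning `None` when no row matches is an assumption.

```python
# Hypothetical rows standing in for FIG. 3(a).
# Key:   (dialogue-state trigger, emotional-state trigger)
# Value: (action name, execution speed; larger = faster)
ACTION_TABLE = {
    ("talk", "joy"): ("look at user", 1.0),
    ("impression", "joy"): ("banzai", 1.5),
    ("impression", "surprise"): ("lean back", 2.0),
    ("impression", "sadness"): ("shake head", 0.5),
}


def decide_action(dialogue_state, emotional_state, table=ACTION_TABLE):
    """Return (action name, execution speed), or None if no row matches."""
    return table.get((dialogue_state, emotional_state))


print(decide_action("impression", "joy"))  # ('banzai', 1.5)
print(decide_action("talk", "anger"))       # None
```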

Each record in the control sequence table pairs an action name with an actuator control sequence. The action name is a label that corresponds to the action determination table. It describes the robot motion produced when the robot executes the series of actuator control steps.
An actuator control sequence has two flags, "user direction required" and "display direction required". It also has a list of data made up of motor control locations and values, and sequence movement intervals.
The two flags specify whether the user's direction or the display's direction is needed to perform the action.
Each motor control location and value names the actuator to control and the angle to set for it, such as a head tilt angle of 0 degrees or a left arm tilt angle of 0 degrees. For the nodding motion in FIG. 3(b), for example, the motor control data form an ordered list: head tilt angle 0 degrees, then −40 degrees, then 0 degrees. The actuators are driven to each specified angle in turn, from left to right.
The sequence movement interval gives the time taken to move from one motor control location and value to the next one in the same row. A smaller value means a faster transition.
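The sketch below expands a control sequence into a timed list of actuator poses. The text does not say how the execution speed from the action determination table interacts with the sequence movement interval. Dividing the interval by the speed is an assumption chosen to match "larger speed = faster, smaller interval = faster".

The sketch makes two further assumptions. It uses the placeholders "USER" and "DISPLAY" to mark angles that must be filled in with the user's or the display's direction. It also treats angles and directions as plain degrees. `ControlSequence`, `schedule` and `LOOK_AT_USER` are hypothetical names; the nodding angles come from the FIG. 3(b) example, but the 0.3 s interval is invented.

```python
from dataclasses import dataclass


@dataclass
class ControlSequence:
    needs_user_direction: bool
    needs_display_direction: bool
    steps: list        # ordered poses: {actuator: angle in degrees, or "USER"/"DISPLAY"}
    interval: float    # sequence movement interval in seconds (smaller = faster)


# Nod as in FIG. 3(b): head tilt 0 -> -40 -> 0 degrees (interval value assumed).
NOD = ControlSequence(False, False,
                      [{"head_tilt": 0}, {"head_tilt": -40}, {"head_tilt": 0}], 0.3)
# Hypothetical action that turns the head toward the user.
LOOK_AT_USER = ControlSequence(True, False, [{"head_pan": "USER"}], 0.5)


def schedule(seq, speed=1.0, user_azimuth=None, display_azimuth=None):
    """Expand a control sequence into (start time in seconds, pose) pairs.

    Placeholders "USER"/"DISPLAY" are replaced with the supplied azimuths.
    Dividing the interval by the execution speed is an assumption.
    """
    if seq.needs_user_direction and user_azimuth is None:
        raise ValueError("action requires the user's direction")
    if seq.needs_display_direction and display_azimuth is None:
        raise ValueError("action requires the display's direction")
    directions = {"USER": user_azimuth, "DISPLAY": display_azimuth}
    step = seq.interval / speed
    timeline = []
    for i, pose in enumerate(seq.steps):
        # Numeric angles pass through unchanged; placeholders are resolved.
        resolved = {joint: directions.get(angle, angle) for joint, angle in pose.items()}
        timeline.append((round(i * step, 6), resolved))
    return timeline


print(schedule(NOD, speed=2.0))
# [(0.0, {'head_tilt': 0}), (0.15, {'head_tilt': -40}), (0.3, {'head_tilt': 0})]
```

Checking the direction flags before producing any poses means an action such as looking at the user fails early when no direction is available, rather than sending an unresolved placeholder to a motor.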

The fixed utterance sentence database 16 stores fixed utterance sentences, which describe what the robot 2 says when the state of the display 4 changes. When the utterance/action determination unit 13 receives display state change information, it refers to this database. From the content of the state change, the current emotional state and the robot personality, it determines the utterance for the robot 2. It also obtains the emotional state to adopt when delivering that utterance.

FIG. 4 shows an example of the fixed utterance sentences stored in the fixed utterance sentence database 16. Each fixed utterance sentence is a record consisting of a TV state transition, an original emotional state, an emotional state to execute, a dialogue state, utterance content and a robot personality.
The TV state transition holds the change in the state of the display 4 caused by operating it, such as power ON, power OFF, channel change, volume up or volume down.
The original emotional state and the emotional state to execute hold values indicating the robot's emotional state, such as surprise or joy. The original emotional state may be ALL, which matches every emotional state, or a set of several states, such as fatigue, sleepiness and sadness.
The dialogue state is the speaking attitude used when delivering the utterance, and takes the value "talk" or "impression".
The utterance content holds the character string the robot actually speaks, such as "Let's watch TV together."
The robot personality holds a value indicating the personality that suits the utterance; the example in FIG. 4 lists gender and age group.
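Selecting a fixed utterance then amounts to filtering records on three things:
the TV state transition;
the robot's current emotional state, where an original state of ALL matches anything; and
the robot personality.
The sketch below returns the first matching record. That record supplies the sentence, the emotional state to switch to, and the dialogue state used later for the action lookup. Choosing the first match is an assumption, and only "Let's watch TV together" comes from the text; the second row, its sentence and the `select_fixed_utterance` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedUtterance:
    tv_transition: str        # e.g. "power ON", "channel change"
    from_emotions: frozenset  # original emotional state(s); "ALL" matches any
    to_emotion: str           # emotional state adopted when speaking
    dialogue_state: str       # "talk" or "impression"
    text: str
    gender: str               # robot personality this sentence suits
    age_group: str


TABLE = [
    FixedUtterance("power ON", frozenset({"ALL"}), "joy", "talk",
                   "Let's watch TV together", "female", "20s"),
    # Hypothetical row for illustration.
    FixedUtterance("power OFF", frozenset({"fatigue", "sleepy", "sadness"}), "sleepy",
                   "impression", "I'm getting sleepy too", "female", "20s"),
]


def select_fixed_utterance(table, transition, current_emotion, gender, age_group):
    """First record matching the transition, current emotion and robot personality."""
    for row in table:
        emotion_ok = "ALL" in row.from_emotions or current_emotion in row.from_emotions
        if (row.tv_transition == transition and emotion_ok
                and row.gender == gender and row.age_group == age_group):
            return row
    return None


hit = select_fixed_utterance(TABLE, "power ON", "surprise", "female", "20s")
print(hit.text, hit.to_emotion, hit.dialogue_state)  # Let's watch TV together joy talk
```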

The positive/negative word database 17 stores, for each word, whether its meaning is positive or negative. When the utterance/action determination unit 13 receives personality-matching speaker comment information, it refers to this database and judges whether each word in the comment is positive or negative. From these judgments it calculates a positive/negative value. This value is combined with the excitement value to determine the emotional state. It is also used to set properties such as the pitch of the voice in which the robot 2 speaks.

FIG. 5 shows an example of the data stored in the positive/negative word database 17. Each entry consists of three elements: word, positive, and negative. The word column holds words such as "great", "beautiful", and "disappointing". A word with a positive meaning has 1 in the positive column and 0 in the negative column; a word with a negative meaning has 0 in the positive column and 1 in the negative column.

The tone conversion database 18 stores conversion data for changing the tone (speaking style) of the utterance content. The utterance/action determination unit 13 determines the utterance content from the personality-matching speaker comment information and then converts its tone by referring to the tone conversion database 18.

FIG. 6 shows an example of the data stored in the tone conversion database 18. Each entry is a pair of a conversion source and a conversion destination, both of which are character strings.
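A minimal sketch of how a FIG. 6-style table might be applied, assuming each (conversion source, conversion destination) pair is a plain substring replacement applied in table order. The function name `convert_tone` and the sample English entries are hypothetical; the actual table contents are not given in the text.

```python
def convert_tone(utterance: str, table: list[tuple[str, str]]) -> str:
    """Rewrite an utterance by applying each (source, destination) pair in table order."""
    for source, destination in table:
        # Plain substring replacement; later pairs see the output of earlier ones.
        utterance = utterance.replace(source, destination)
    return utterance


# Hypothetical entries for illustration only.
CASUAL_TONE = [("It is", "It's"), ("very", "super")]

print(convert_tone("It is very exciting", CASUAL_TONE))  # It's super exciting
```

Because later pairs operate on the output of earlier ones, table order matters whenever sources overlap; a real table would need to be ordered with that in mind.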

The program-related utterance/action tag database 19 stores the utterance contents and related data that the robot 2 uses in step with the program shown on the display 4. The display information processing unit 11 acquires information on the channel being viewed and determines whether data for the corresponding program is stored in the program-related utterance/action tag database 19. If data for the program being viewed is stored, the display information processing unit 11 sends it to the utterance/action determination unit 13, which determines the utterance content for the robot 2 based on the playback time of the program and the robot personality.

FIG. 7 shows an example of the data stored in the program-related utterance/action tag database 19. Each entry consists of channel information, a program name, an operation start time, an action to execute, an emotional state to execute, an utterance content, and a robot personality. The channel information indicates the channel of the television program. The program name is the name of the television program broadcast on that channel. The operation start time is the time, measured from the start of the program, at which the action is executed and the utterance is spoken. The action to execute is the name of the motion the robot performs. The emotional state to execute is the emotional state of the robot while the action is executed. The utterance content is the text the robot speaks while executing the action. The robot personality is a value indicating the personality that suits the action and the utterance content; in the example of FIG. 7, gender and age group are given.
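The following is a sketch of how FIG. 7-style entries could be triggered during playback. It assumes the controller periodically compares the elapsed playback time against each entry's operation start time and fires entries reached since the previous check; this polling scheme, the `ProgramTag` record, `due_tags`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgramTag:
    channel: str
    program: str
    start_offset_s: int            # operation start time, in seconds from program start
    action: str                    # name of the motion to execute
    emotion: str                   # emotional state to execute
    utterance: str
    personality: tuple[str, str]   # (gender, age group), as in FIG. 7


def due_tags(tags, channel, program, prev_elapsed_s, elapsed_s, personality):
    """Return the tags for this program and personality whose start offset was
    reached between the previous check (exclusive) and now (inclusive)."""
    return [
        t for t in tags
        if t.channel == channel
        and t.program == program
        and t.personality == personality
        and prev_elapsed_s < t.start_offset_s <= elapsed_s
    ]


# Hypothetical rows for illustration only.
TAGS = [
    ProgramTag("1ch", "Quiz Show", 120, "clap", "joy", "Wow, well done!", ("female", "20s")),
    ProgramTag("1ch", "Quiz Show", 300, "tilt_head", "surprise", "I didn't know that!", ("female", "20s")),
]
print([t.action for t in due_tags(TAGS, "1ch", "Quiz Show", 100, 130, ("female", "20s"))])  # ['clap']
```

Using a half-open window per check means each entry fires once even if the polling interval is irregular.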

[External data used by the robot control device]
Next, the data held by the external servers and external databases used by the robot control apparatus 1 will be described.

FIG. 8 shows an example of the data held in the electronic program guide information database 5. The display information processing unit 11 refers to the electronic program guide information database 5 to acquire the program name and category for the channel being viewed on the display 4.

The electronic program guide information shown in FIG. 8 consists of sets of channel information, category, program name, start time, and end time; that is, it indicates on which channel, and from what time to what time, each program is broadcast. The category classifies the program according to its content.

FIG. 9 shows an example of the data held in the program-social media tag database 6. The social media information acquisition unit 12 uses this database to extract comments related to the program the user is viewing.

The program-social media tag database 6 shown in FIG. 9 holds, for each program name, a set of one or more program-related tags. A program-related tag is a common character string that users of a social media server deliberately include in a comment to indicate that the comment concerns a specific television program. In the example of FIG. 9, a tag is a string of half-width uppercase English letters beginning with the symbol #. Extracting the comments that contain a program-related tag yields only the comments related to that program.
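A minimal sketch of the tag-based filtering, assuming comments are (user ID, comment time, comment content) tuples as in FIG. 10 and that tags follow the FIG. 9 convention of "#" plus half-width uppercase letters. The names `TAG_PATTERN` and `comments_for_program` are hypothetical.

```python
import re

# FIG. 9 convention: '#' followed by half-width uppercase English letters.
TAG_PATTERN = re.compile(r"#[A-Z]+")


def comments_for_program(comments, program_tags):
    """Keep (user_id, comment_time, text) tuples whose text carries one of the program's tags.

    Tags are extracted whole, so "#SOCCERX" does not count as "#SOCCER".
    """
    wanted = set(program_tags)
    return [c for c in comments if wanted & set(TAG_PATTERN.findall(c[2]))]


COMMENTS = [
    ("u1", "20:01", "What a goal! #SOCCER"),
    ("u2", "20:02", "Dinner time"),
    ("u3", "20:02", "Different show #SOCCERX"),
]
print([c[0] for c in comments_for_program(COMMENTS, ["#SOCCER"])])  # ['u1']
```

Extracting whole tags, rather than testing substrings, avoids matching a longer tag that merely starts with the program's tag.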

FIG. 10 shows an example of the social media comment information held by the social media server 7. The social media information acquisition unit 12 accesses the social media server 7 to acquire this information.

The social media comment information shown in FIG. 10 consists of sets of user ID, comment time, and comment content. The user ID is a unique ID assigned to each user who posts comments to the social media server 7. The comment time is the time at which the user sent the comment to the social media server 7. The comment content is the character string of the comment the user sent.

FIG. 11 shows an example of the user/display orientation information held by the position acquisition server 8. This information gives the directions from the robot 2 to the user and to the display 4, and the utterance/action determination unit 13 uses it when having the robot 2 execute an action.

The user/display orientation information shown in FIG. 11 holds an azimuth angle and an elevation angle for each of two targets: the user and the display 4. The azimuth angle is the horizontal direction of the target as seen from the robot 2, with north as 0°. The elevation angle is the vertical angle of the target as seen from the robot 2, with horizontal as 0° and straight up as 90°. The azimuth and elevation angles of each target are assumed to be updated continuously as the user, the robot 2, and the display 4 move.
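One way such orientation data might be consumed is sketched below: converting an (azimuth, elevation) pair into a direction vector and computing the yaw a robot would need to turn to face a target. The text fixes only 0° = north and 0° = horizontal / 90° = straight up; the clockwise azimuth direction, the east/north/up axes, and the robot's own heading (`current_heading_deg`) are assumptions, and both function names are hypothetical.

```python
import math


def direction_vector(azimuth_deg, elevation_deg):
    """Unit vector (east, north, up) pointing from the robot toward a target."""
    az = math.radians(azimuth_deg)    # ASSUMPTION: measured clockwise from north
    el = math.radians(elevation_deg)  # 0 = horizontal, 90 = straight up
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


def turn_to_face(current_heading_deg, target_azimuth_deg):
    """Smallest signed yaw change in degrees, in (-180, 180], to face the target azimuth."""
    delta = (target_azimuth_deg - current_heading_deg) % 360.0
    return delta - 360.0 if delta > 180.0 else delta


print(turn_to_face(350, 10))  # 20.0
```

Wrapping the difference into (-180, 180] makes the robot turn the short way round, for example 20° rather than 340° when the target sits just across north.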

[Operation of the robot control device]
Next, the operation of the robot control apparatus 1 will be described. The following sections describe, in order, the process of acquiring operations on the display 4, the process of determining the utterance content and action content, and the process of having the robot 2 execute the utterance and action.

[Process of acquiring operations on the display]
First, the process of acquiring operations performed by the user on the display 4 will be described.

When the user operates the remote controller 3, it sends the display 4 a display state operation signal or a channel operation signal according to the operation. A display state operation changes the state of the display 4, for example turning it on or off or raising or lowering the volume. A channel operation changes the video shown on the display, for example switching from channel 1 to channel 2.

In addition to sending the signal to the display 4, the remote controller 3 sends display state change information or viewing channel change information to the robot control apparatus 1. The display state change information contains a character string indicating the operation performed on the display 4 (for example, power ON or volume up). The viewing channel change information contains the channel information after the change.

When the robot control apparatus 1 receives viewing channel change information or display state change information, it executes the process, described below, of determining the utterance content and action content.

In the present embodiment, the viewing channel change information and display state change information are sent from the remote controller 3 to the robot control apparatus 1. Alternatively, the robot control apparatus 1 may itself receive the signals the remote controller 3 sends to the display 4 and convert them internally into viewing channel change information and display state change information. For example, when a smartphone application is used as the remote controller 3 to operate the display 4 over a wireless LAN, the robot control apparatus 1 is added as a destination of the operation information.

The display 4 may also have a function of sending viewing channel change information and display state change information to the robot control apparatus 1. For example, to detect display state changes caused by operating the TV with an infrared remote controller, a receiver placed near the infrared receiving port of the display 4 picks up the infrared signal from the remote controller and wirelessly sends the operation indicated by that signal to the robot control apparatus 1.

Furthermore, when the robot 2 mediates display operations in response to the user's voice or remote control operations, display state changes are detected from the voice commands or remote control operations the user directs at the robot. Alternatively, the robot 2 may notify the robot control apparatus 1 of the display state.

[Process of determining the utterance content and action content]
Next, the process of determining the utterance content the robot 2 speaks and the action content it executes will be described.

When the robot control apparatus 1 receives display state change information or viewing channel change information, it determines utterance content and action content suited to the change in the display state or to the channel being viewed. The robot control apparatus 1 of the present embodiment determines the utterance and action content by three methods: (A) a method based on display state changes, (B) a method using social media, and (C) a method based on a scenario associated with the program. Processes (A) to (C) are described in order below. Any one of methods (A) to (C) may be used, or they may be combined. Including method (B), which uses social media, is preferable because the resulting behavior does not follow predetermined rules.

(A) Method based on display state changes
First, the method of determining the utterance content and action content based on display state changes will be described.

FIG. 12 is a flowchart showing the flow of the process of determining the utterance content and action content based on display state changes.

When the display information processing unit 11 receives display state change information from the remote controller 3, it sends the display state change information to the utterance/action determination unit 13 (step S11).

On receiving the display state change information, the utterance/action determination unit 13 acquires the emotional state from its storage area (step S12). When the robot 2 acts for the first time, for example after power-on or initialization, no emotional state has been stored yet.

The utterance/action determination unit 13 extracts, from the fixed utterance sentence database 16, the fixed utterance sentence information that matches the robot personality, the emotional state, and the received display state change information (step S13). Specifically, the utterance/action determination unit 13 acquires the robot personality attribute information set for the robot 2 from the robot personality attribute information database 14, refers to the fixed utterance sentence database 16, and extracts the entries whose original emotional state matches the emotional state acquired in step S12 and whose personality attribute values match the attribute values in the robot personality attribute information. From these entries it then extracts those whose television state transition matches the received display state change information. The utterance content of the entry extracted in step S13 becomes the utterance content spoken by the robot 2. An entry whose original emotional state lists several emotional states is treated as matching if the list includes the acquired emotional state, and an entry whose original emotional state is ALL matches regardless of the emotional state. If step S13 yields several entries, one of them is selected at random.
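Step S13 can be sketched as a filter followed by a random choice. The record layout mirrors FIG. 4, but the class and function names are hypothetical, and the personality rule (every attribute listed on the entry must equal the robot's value) is an assumed reading of "attribute values match".

```python
import random
from dataclasses import dataclass


@dataclass
class FixedUtterance:
    tv_transition: str        # e.g. "power ON", "channel change"
    original_emotions: list   # e.g. ["ALL"] or ["fatigue", "sleepy", "sadness"]
    emotion_to_execute: str
    dialogue_state: str       # "talking" or "impression"
    text: str
    personality: dict         # e.g. {"gender": "female", "age": "20s"}


def emotion_matches(entry, emotion):
    # ALL matches any state; a listed set matches if it contains the current state.
    return "ALL" in entry.original_emotions or emotion in entry.original_emotions


def personality_matches(entry, robot_personality):
    # ASSUMPTION: every attribute on the entry must equal the robot's value.
    return all(robot_personality.get(k) == v for k, v in entry.personality.items())


def select_fixed_utterance(db, emotion, robot_personality, tv_transition, rng=random):
    """Step S13 sketch: filter by emotion, personality and TV transition, then pick one at random."""
    candidates = [
        e for e in db
        if emotion_matches(e, emotion)
        and personality_matches(e, robot_personality)
        and e.tv_transition == tv_transition
    ]
    return rng.choice(candidates) if candidates else None


# Hypothetical rows for illustration only.
DB = [
    FixedUtterance("power ON", ["ALL"], "joy", "talking",
                   "Let's watch TV together", {"gender": "female", "age": "20s"}),
    FixedUtterance("power OFF", ["fatigue", "sleepy", "sadness"], "sleepy", "impression",
                   "I'm getting sleepy", {"gender": "female", "age": "20s"}),
]
ROBOT = {"gender": "female", "age": "20s"}
```

With no stored emotional state (passed here as `None`), only ALL entries can match, which is one plausible way to handle the first action after power-on. The chosen entry's `emotion_to_execute` is what step S14 would then store.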

The emotional state to be executed in the fixed utterance sentence information extracted in step S13 is then stored as the current emotional state in the storage area of the utterance/action determination unit 13 (step S14).

Next, the action content the robot 2 executes is determined.

The utterance/action determination unit 13 extracts, from the action database 15, an action that matches the dialogue state of the utterance content and the emotional state (step S15). Specifically, it refers to the action determination table of the action database 15 and extracts the entries whose dialogue state trigger matches the dialogue state of the fixed utterance sentence information extracted in step S13 and whose emotional state trigger matches the emotional state to be executed in that information. If several entries match, one of them is selected at random; if none match, no action is executed.

The motion name of the entry extracted from the action determination table in step S15 is then looked up in the control sequence table of the action database 15. The actuator control sequence of the retrieved entry is the action content the robot 2 executes. The process of controlling the robot 2 based on the action content is described later.
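A sketch of step S15 and the subsequent control sequence lookup, assuming the two tables are held as a list of rows and a dictionary keyed by motion name. The key names, `select_action`, and the (actuator, angle, duration) format of the control sequences are hypothetical.

```python
import random


def select_action(decision_table, control_sequences, dialogue_state, emotion, rng=random):
    """Step S15 sketch.

    decision_table: rows with keys "dialogue_trigger", "emotion_trigger", "motion".
    control_sequences: motion name -> actuator control sequence.
    Returns None when no row matches, meaning no action is executed.
    """
    rows = [
        r for r in decision_table
        if r["dialogue_trigger"] == dialogue_state and r["emotion_trigger"] == emotion
    ]
    if not rows:
        return None
    motion = rng.choice(rows)["motion"]
    return control_sequences.get(motion)


DECISION_TABLE = [
    {"dialogue_trigger": "talking", "emotion_trigger": "joy", "motion": "wave"},
    {"dialogue_trigger": "impression", "emotion_trigger": "surprise", "motion": "lean_back"},
]
# Hypothetical control sequences: (actuator, target angle in degrees, duration in ms).
CONTROL_SEQUENCES = {
    "wave": [("right_arm", 90, 300), ("right_arm", 60, 200), ("right_arm", 90, 200)],
    "lean_back": [("waist", -15, 400), ("waist", 0, 400)],
}
```

Keeping the trigger table separate from the control sequences lets several trigger rows share one motion definition.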

Through the above process, utterance content and action content that take the robot personality into account are determined from the received display state change information. Once they are determined, the process, described later, of having the robot execute the utterance and action is carried out.

(B) Method using social media
Next, the method of determining the utterance content and action content using social media will be described.

FIG. 13 is a flowchart showing the part of the social-media-based process that acquires the excitement value and the personality-matching speaker comment information. In the process described afterwards, the utterance/action determination unit 13 determines the utterance content and action content from these two inputs.

When the display information processing unit 11 receives viewing channel change information from the remote controller 3 (step S21), it refers to the electronic program guide information database 5 and acquires the data of the program being viewed from the received viewing channel change information and the current time (step S22).
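Step S22 amounts to finding the FIG. 8 entry on the viewed channel whose broadcast interval contains the current time. A minimal sketch follows; the `EpgEntry` record, `current_program`, the half-open [start, end) interval, and the sample rows are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EpgEntry:
    channel: str
    category: str
    program: str
    start: datetime
    end: datetime


def current_program(epg, channel, now):
    """Return the entry on `channel` whose [start, end) interval contains `now`, else None."""
    for entry in epg:
        if entry.channel == channel and entry.start <= now < entry.end:
            return entry
    return None


# Hypothetical rows for illustration only.
EPG = [
    EpgEntry("1ch", "Sports", "Soccer Live", datetime(2014, 2, 6, 19, 0), datetime(2014, 2, 6, 21, 0)),
    EpgEntry("2ch", "News", "Evening News", datetime(2014, 2, 6, 19, 0), datetime(2014, 2, 6, 20, 0)),
]
```

Treating the end time as exclusive means a program that ends at 20:00 and the one starting at 20:00 never both match.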

The display information processing unit 11 then refers to the program-social media tag database 6 and acquires the program-related tags for the program being viewed (step S23). It sends the category of the program being viewed and the program-related tags to the social media information acquisition unit 12 as program domain information and program-related social media tag information, respectively.

On receiving the program domain information and program-related social media tag information, the social media information acquisition unit 12 acquires from the social media server 7 the social media comment information that contains the received program-related tags (step S24).

The social media information acquisition unit 12 calculates an excitement value from the acquired social media comment information (step S25). In the present embodiment, the excitement value is calculated from the increase or decrease in the number of comments at the time of each scene. Specifically, the excitement value is obtained by the following equation (1) from the total number x of social media comments posted within the last minute before the current time, the average μ of the number of comments per minute from the start of the program to the current time, and the variance σ of the per-minute comment counts over the same period.

If the value calculated by equation (1) is below −1.0, the excitement value is set to −1.0; if it is above 1.0, the excitement value is set to 1.0; otherwise, the calculated value is used as the excitement value as is.
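Because the body of equation (1) does not appear here, the sketch below assumes it is a standardized deviation, (x − μ)/σ, with σ taken as the standard deviation of the per-minute counts. That form, the neutral result for a flat history, and the name `excitement_value` are all assumptions; only the clipping to [−1.0, 1.0] comes from the text.

```python
import statistics


def excitement_value(per_minute_counts):
    """per_minute_counts[k] = comments in minute k since the program started;
    the last element is x, the count for the most recent minute."""
    x = per_minute_counts[-1]
    mu = statistics.mean(per_minute_counts)
    sigma = statistics.pstdev(per_minute_counts)  # ASSUMPTION: standard deviation
    if sigma == 0:
        return 0.0                                 # ASSUMPTION: flat history -> neutral
    raw = (x - mu) / sigma                         # ASSUMPTION: stand-in for equation (1)
    return max(-1.0, min(1.0, raw))                # clipping rule stated in the text
```

A burst of comments well above the running average drives the value to +1.0, and a lull well below it drives the value to −1.0.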

The social media information acquisition unit 12 then refers to the robot personality attribute information database 14 and extracts the robot personality attribute information that matches the received program domain information (step S26). To match the program domain information against the robot personality attribute information, it first determines whether any main category of the video-dependent robot personality matches. If no main category matches, it determines whether any subcategory, across all main categories, matches. If a main category matches and has subcategories, one of those subcategories is selected at random, and the attributes, attribute values, and attribute weights of the selected subcategory are combined with the fixed robot personality to form the robot personality attribute information used to identify personality-matching speakers. If the matching main category has no subcategories, or if no main category matches but a subcategory does, the attributes, attribute values, and attribute weights of that matching category are combined with the fixed robot personality instead. If no category of the video-dependent robot personality matches the program domain information, only the fixed robot personality is used as the robot personality attribute information for identifying personality-matching speakers.
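The three-way fallback of step S26 can be sketched as follows. The nested-dictionary layout, `personality_for_domain`, and the rule that a video-dependent attribute overrides a fixed one with the same name are hypothetical choices, not taken from the text.

```python
import random


def personality_for_domain(domain, fixed, video_dependent, rng=random):
    """Step S26 sketch.

    fixed: attribute -> (value, weight) for the fixed robot personality.
    video_dependent: main category -> {"attrs": {...}, "sub": {subcategory -> {...}}}.
    """
    if domain in video_dependent:                    # 1. a main category matches
        main = video_dependent[domain]
        if main["sub"]:
            chosen = main["sub"][rng.choice(sorted(main["sub"]))]
        else:
            chosen = main["attrs"]
        return {**fixed, **chosen}                   # ASSUMPTION: video-dependent wins on clashes
    for main in video_dependent.values():            # 2. a subcategory matches
        if domain in main["sub"]:
            return {**fixed, **main["sub"][domain]}
    return dict(fixed)                               # 3. no match: fixed personality only


# Hypothetical data for illustration only.
FIXED = {"gender": ("female", 1.0), "age": ("20s", 1.0)}
VIDEO = {
    "Sports": {"attrs": {}, "sub": {"Soccer": {"hobby": ("soccer", 2.0)}}},
    "Drama": {"attrs": {"hobby": ("drama", 1.5)}, "sub": {}},
}
```

Sorting the subcategory names before the random choice only makes the candidate order deterministic; the selection itself stays random.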

The social media information acquisition unit 12 then extracts personality-matching speakers using the extracted robot personality attribute information (step S27). Specifically, for each attribute in the robot personality attribute information, it first estimates the attribute value of every user appearing in the social media comment information acquired in step S24. This estimation uses the technique described in Jun Ito, "What is he/she like?: Estimating Twitter User Attributes from Contents and Social Neighbors". Using the estimated attribute values of all users and the attribute values in the robot personality attribute information, it then calculates the degree of personality match between each user and the robot, and designates users whose degree of match exceeds a predetermined threshold as personality-matching speakers. To calculate the degree of match, the user's estimated value is compared with the robot's value for each attribute in the robot personality attribute information, and the attribute weights of the attributes whose values match are summed. This sum divided by the number of attributes in the robot personality attribute information is the degree of personality match between the user and the robot.
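The degree-of-match rule can be written directly from the description above: sum the weights of the attributes whose values agree, then divide by the number of robot attributes. Attribute estimation itself is out of scope here, so the estimated user values are given as input; the function names and sample data are hypothetical.

```python
def match_degree(user_attrs, robot_attrs):
    """robot_attrs: attribute -> (value, weight); user_attrs: attribute -> estimated value."""
    if not robot_attrs:
        return 0.0
    matched_weight = sum(
        weight for attr, (value, weight) in robot_attrs.items()
        if user_attrs.get(attr) == value
    )
    return matched_weight / len(robot_attrs)


def personality_matching_speakers(estimated, robot_attrs, threshold):
    """estimated: user ID -> {attribute: estimated value}. Keeps users strictly above the threshold."""
    return [uid for uid, attrs in estimated.items() if match_degree(attrs, robot_attrs) > threshold]


# Hypothetical data for illustration only.
ROBOT_ATTRS = {"gender": ("female", 1.0), "age": ("20s", 0.5), "hobby": ("soccer", 1.5)}
ESTIMATED = {
    "a": {"gender": "female", "age": "20s", "hobby": "baseball"},  # (1.0 + 0.5) / 3 = 0.5
    "b": {"gender": "male", "age": "20s", "hobby": "soccer"},      # (0.5 + 1.5) / 3 = 0.67
}
print(personality_matching_speakers(ESTIMATED, ROBOT_ATTRS, 0.6))  # ['b']
```

Because the sum is divided by the attribute count rather than by the total weight, the degree can exceed 1 when weights are large, so the threshold has to be chosen on the same scale.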

The comments of the personality-matching speakers are then extracted (step S28) and sent, as personality-matching speaker comment information, to the utterance/action determination unit 13 together with the excitement value.

Through the above process, the excitement value of the program the user is watching and the personality-matching speaker comment information suited to the personality set for the robot are sent to the utterance/action determination unit 13. The process by which the utterance/action determination unit 13 then determines the utterance content and action content is described next.

FIG. 14 is a flowchart showing the part of the social-media-based process that determines the utterance content and action content from the excitement value and the personality-matching speaker comment information.

The utterance/action determination unit 13 refers to the positive/negative word database 17 and calculates a positive/negative value from the received personality-matching speaker comment information (step S31). Specifically, it performs morphological analysis on all the comment contents in the personality-matching speaker comment information and determines, for each resulting word, whether it matches a word in the positive/negative word information stored in the positive/negative word database 17. With wordNMB denoting the total number of words obtained by morphological analysis and w_i the i-th of those words, the positive/negative value PN is obtained by the following equation (2).

In equation (2), J(w_i) is a function that returns 1 if w_i matches a word in the positive/negative word information whose positive value is 1 and negative value is 0, returns −1 if w_i matches a word whose positive value is 0 and negative value is 1, and returns 0 otherwise. If equation (2) yields PN > 100, PN is set to 100; if PN < −100, PN is set to −100; if −100 ≤ PN ≤ 100, the calculated PN is used as the positive/negative value as is.
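The body of equation (2) does not appear here, so the sketch below uses a hypothetical stand-in, PN = scale × ΣJ(w_i) / wordNMB with scale = 100, for which the clipping acts only as a safeguard. J(w_i) and the clipping follow the text exactly. Whitespace splitting stands in for morphological analysis; Japanese text would need a real morphological analyzer. `pn_value` and the sample dictionary are hypothetical.

```python
def pn_value(comments, pn_words, scale=100.0):
    """pn_words: word -> (positive, negative) flags, as in FIG. 5."""
    words = [w for comment in comments for w in comment.split()]  # stand-in for morphological analysis

    def j(word):
        # J(w_i) as defined for equation (2): +1 positive, -1 negative, 0 otherwise.
        positive, negative = pn_words.get(word, (0, 0))
        if positive == 1 and negative == 0:
            return 1
        if positive == 0 and negative == 1:
            return -1
        return 0

    if not words:
        return 0.0
    raw = scale * sum(j(w) for w in words) / len(words)  # ASSUMPTION: stand-in for equation (2)
    return max(-100.0, min(100.0, raw))                   # clipping rule stated in the text


PN_WORDS = {"great": (1, 0), "beautiful": (1, 0), "disappointing": (0, 1)}
print(pn_value(["great great beautiful goal"], PN_WORDS))  # 75.0
```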

The utterance/action determination unit 13 then determines the utterance content from the received personality-matching speaker comment information (step S32). Specifically, it first extracts the comments posted within a fixed time before the current time from the personality-matching speaker comment information and performs morphological analysis on them. It then calculates a TF-IDF value for every word in the analyzed comments. Here, TF is the number of occurrences of each word among all the words of the analyzed comments, and IDF is calculated from a general document corpus, such as a newspaper corpus. Finally, it sums the TF-IDF values of the words in each comment and selects the comment with the largest sum as the utterance content.
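A sketch of step S32 under stated assumptions. Comments are (time in seconds, text) pairs, whitespace splitting stands in for morphological analysis, and IDF uses a common smoothed form, log((1 + N) / (1 + df)) + 1, because the text specifies only that IDF comes from a general corpus. Each comment is scored over its distinct words, one reading of "the TF-IDF values of the words in each comment". `pick_utterance` and the sample data are hypothetical.

```python
import math
from collections import Counter


def pick_utterance(comments, corpus_docs, now, window_s):
    """Step S32 sketch: comments are (time_s, text); returns the highest-scoring recent text."""
    recent = [text for t, text in comments if now - window_s <= t <= now]
    if not recent:
        return None
    tokenized = [text.split() for text in recent]            # stand-in for morphological analysis
    tf = Counter(w for words in tokenized for w in words)    # occurrences over all recent comments
    n_docs = len(corpus_docs)
    df = Counter(w for doc in corpus_docs for w in set(doc.split()))

    def idf(word):
        # ASSUMPTION: smoothed IDF; the source says only that IDF comes from a general corpus.
        return math.log((1 + n_docs) / (1 + df[word])) + 1.0

    # Score each comment by summing TF-IDF over its distinct words.
    scores = [sum(tf[w] * idf(w) for w in set(words)) for words in tokenized]
    return recent[max(range(len(recent)), key=scores.__getitem__)]


CORPUS = ["the game is on", "the weather is nice", "the news tonight"]
COMMENTS = [(10, "an old comment about everything amazing"), (70, "the goal"),
            (80, "the the the"), (95, "amazing goal by tanaka")]
print(pick_utterance(COMMENTS, CORPUS, now=100, window_s=60))  # amazing goal by tanaka
```

Summing over a comment's words favors longer comments made of rare words, which suits picking a specific, content-rich remark over generic filler.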

Then, the utterance/action determination unit 13 determines the emotional state from the excitement value and the positive/negative value (step S33). The two values are plotted on the emotional-state map of FIG. 15, which is defined over the excitement value and the positive/negative value and adapts Russell's circumplex model of affect (James A. Russell, “A Circumplex Model of Affect”), and the emotional state on the map closest to that point is taken as the robot's emotional state. In the map of FIG. 15, the excitement value ranges from ActMIN = −1 to ActMAX = 1, and the positive/negative value ranges from PNMIN = −100 to PNMAX = 100. Letting Act_n and PN_n be the excitement value and positive/negative value of emotional state EM_n on the map, and Act and PN the current excitement value and positive/negative value, the emotional distance EmDist_n to emotional state EM_n is given by the following equation (3).

Using equation (3), the emotional distance EmDist_n to every emotional state EM_n on the map is calculated, and the emotional state EM_n with the smallest EmDist_n is taken as the robot's emotional state. The determined emotional state is stored in the storage area of the utterance/action determination unit 13.
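The nearest-state lookup can be sketched as below. Two things are assumptions, not taken from the text: the form of equation (3), approximated here as a Euclidean distance with each axis divided by its range so that the excitement axis (width 2) and the positive/negative axis (width 200) weigh equally, and the coordinates in `EMOTION_MAP`, which are hypothetical because FIG. 15 is not reproduced.

```python
import math

ACT_MIN, ACT_MAX = -1.0, 1.0      # excitement value range (FIG. 15)
PN_MIN, PN_MAX = -100.0, 100.0    # positive/negative value range (FIG. 15)

# Hypothetical emotion map: name -> (Act_n, PN_n).
EMOTION_MAP: dict[str, tuple[float, float]] = {
    "excited": (0.8, 60.0),
    "happy": (0.3, 90.0),
    "relaxed": (-0.6, 50.0),
    "bored": (-0.7, -40.0),
    "sad": (-0.3, -90.0),
    "angry": (0.7, -70.0),
}


def emotion_distance(act: float, pn: float, act_n: float, pn_n: float) -> float:
    """Assumed form of EmDist_n: Euclidean distance with each axis scaled by its range."""
    d_act = (act - act_n) / (ACT_MAX - ACT_MIN)
    d_pn = (pn - pn_n) / (PN_MAX - PN_MIN)
    return math.hypot(d_act, d_pn)


def nearest_emotion(act: float, pn: float,
                    emotion_map: dict[str, tuple[float, float]] = EMOTION_MAP) -> str:
    """Return the emotional state EM_n with the smallest EmDist_n."""
    return min(emotion_map,
               key=lambda name: emotion_distance(act, pn, *emotion_map[name]))


print(nearest_emotion(0.75, -65.0))  # angry
```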

Alternatively, the emotional state of the robot 2 may be determined by extracting the emotions of a particular character, using a method that infers emotions from the audio in the video (JP 2009-111938 A, JP 2009-251469 A) or a method that infers emotions from the video itself (JP 2011-81445 A).

Then, the utterance/action determination unit 13 applies noise removal and tone conversion to the utterance content determined in step S32 (step S34). Noise removal strips out social media tags and symbol-based emoticons that are unnecessary in speech. Social media tags are removed by deleting character strings that follow the rule defining such tags; in this embodiment, strings of half-width uppercase Latin letters beginning with # are removed. Emoticons in the utterance content are detected and removed using an emoticon analysis system such as the one described in Michal Ptaszynski, “CAO: A Fully Automatic Emoticon Analysis System Based on Theory of Kinesics”. In addition, if the utterance content contains a word registered as a conversion source in the tone conversion database 18, that character string is replaced with the registered conversion-target string.
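The tag removal and tone conversion of step S34 can be sketched with regular expressions. The tag pattern follows the rule stated for this embodiment (# followed by half-width uppercase Latin letters); the `tone_table` contents are a hypothetical stand-in for the tone conversion database 18, and emoticon removal via CAO is not reproduced.

```python
import re

# Social media tags as defined in this embodiment:
# '#' followed by half-width uppercase Latin letters.
HASHTAG = re.compile(r"#[A-Z]+")


def clean_and_convert(text: str, tone_table: dict[str, str]) -> str:
    """Remove tags, then replace conversion-source strings with their targets.

    tone_table: conversion source -> conversion target (hypothetical contents).
    Emoticon removal (done with CAO in the source) is not reproduced here.
    """
    text = HASHTAG.sub("", text)
    if tone_table:
        # Longest sources first, in a single pass, so replaced text is not re-converted.
        pattern = re.compile("|".join(
            re.escape(src) for src in sorted(tone_table, key=len, reverse=True)))
        text = pattern.sub(lambda m: tone_table[m.group(0)], text)
    # Collapse whitespace left behind by removed tags.
    return re.sub(r"\s+", " ", text).strip()


print(clean_and_convert("きれいですね #NHK", {"ですね": "だね"}))  # きれいだね
```

Doing all replacements in one `re.sub` pass avoids a subtle bug of sequential `str.replace` calls, where a target string inserted by one rule could be rewritten again by a later rule.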

Then, the utterance/action determination unit 13 determines the action to be executed by the robot 2 from the utterance content converted in step S34 and the emotional state determined in step S33 (step S35). First, it determines whether the utterance content is a question in order to identify the dialogue state. An utterance that ends with a question mark or with the sentence-final particle “ka” (か) or “kana” (かな) is judged to be a question. If the utterance is a question, the dialogue state is set to “talk” (addressing the user); otherwise, it is set to “impression” (a remark on the content). Then, as in step S15 of the method based on display state changes, the entries in the action database 15 whose dialogue state trigger and emotional state trigger match the dialogue state and the emotional state, respectively, are extracted to determine the action to be executed by the robot 2.
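A minimal sketch of step S35, assuming the action database 15 can be represented as a list of rows with a dialogue state trigger, an emotional state trigger, and an action name. The `ActionRow` type, the state labels "talk"/"impression", and the sample rows are hypothetical; the question test follows the endings listed above, with the full-width question mark added as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRow:
    """Hypothetical row of the action database 15."""
    dialogue_trigger: str   # "talk" or "impression"
    emotion_trigger: str    # emotional state name
    action_name: str


def dialogue_state(utterance: str) -> str:
    """'talk' if the utterance reads as a question, otherwise 'impression'."""
    text = utterance.rstrip()
    if text.endswith(("?", "？", "か", "かな")):
        return "talk"
    return "impression"


def find_actions(utterance: str, emotion: str, action_db: list[ActionRow]) -> list[str]:
    """Names of actions whose two triggers both match the current state."""
    state = dialogue_state(utterance)
    return [row.action_name for row in action_db
            if row.dialogue_trigger == state and row.emotion_trigger == emotion]


db = [ActionRow("talk", "happy", "look_at_user"),
      ActionRow("impression", "happy", "nod"),
      ActionRow("talk", "sad", "tilt_head")]
print(find_actions("最高だね", "happy", db))  # ['nod']
```

The suffix test is deliberately crude, as in the text: any sentence ending in か counts as a question.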

Through the above processing, the utterance content and action content are determined in consideration of the robot personality, using the comments posted to the social media server 7. After they are determined, the process described later for causing the robot to execute the utterance and action is performed.

In the above description, comments containing a tag related to the program are extracted from the social media server 7, to which users freely post comments, to determine the utterance and action content. Alternatively, comments about the program may be extracted from an electronic bulletin board set up for each channel.

Also, when the user watches a video to which comments are attached in correspondence with its playback time, rather than a program broadcast in real time, the comments attached to the video can be used.

(C) Method based on a scenario accompanying the program
Next, a method of determining the utterance content and action content based on a scenario accompanying the program will be described.

FIG. 16 is a flowchart showing the flow of processing for determining the utterance content and action content based on the scenario accompanying the program.

The display information processing unit 11 refers to the program-related utterance/action tag database 19 and extracts the utterance/action tag information corresponding to the program being viewed (step S41). As in step S22 of the method using social media, the program being viewed is identified by referring to the electronic program guide information database 5. The display information processing unit 11 extracts from the program-related utterance/action tag database 19 the utterance/action tag information whose channel and program name match those of the program being viewed.

From the extracted utterance/action tag information, the display information processing unit 11 sends the entries whose operation start time is close to the present to the utterance/action determination unit 13 (step S42). For example, if the difference between the current time and the time obtained by adding an entry's operation start time to the start time of the program being viewed is within a predetermined time (about 10 seconds), that entry is sent to the utterance/action determination unit 13 as utterance/action tag candidate information. Step S42 is repeated until the display 4 is turned off or the program being viewed changes; if the program changes, the process returns to step S41.

Upon receiving the utterance/action tag candidate information, the utterance/action determination unit 13 selects from it the utterance/action tag information that fits the robot personality set for the robot 2, and determines the utterance content and action content (step S43). Specifically, among the received candidates, it selects the utterance/action tag information whose robot personality attribute value matches the robot personality stored in the robot personality attribute information database 14. If several entries match, one of them is chosen at random. The utterance content of the selected entry becomes the utterance content of the robot 2, and its execution action becomes the action to be executed by the robot 2. If no entry matches, there is no utterance content and no action content.
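Step S43 is a filter followed by a random choice. The sketch below assumes a hypothetical `TagInfo` record holding the fields named in the text; returning `None` models the case of no utterance and no action. An explicit `random.Random` is passed in so the choice can be seeded for testing.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TagInfo:
    """Hypothetical record for one utterance/action tag entry."""
    personality: str   # robot personality attribute value
    utterance: str     # utterance content
    action: str        # execution action (operation name)
    emotion: str       # execution emotional state


def choose_tag(candidates: Sequence[TagInfo], robot_personality: str,
               rng: random.Random) -> Optional[TagInfo]:
    """Pick one candidate whose personality matches; None means no utterance or action."""
    matching = [t for t in candidates if t.personality == robot_personality]
    if not matching:
        return None
    return rng.choice(matching)


cands = [TagInfo("cheerful", "Goal!", "raise_arms", "happy"),
         TagInfo("calm", "Nice play.", "nod", "relaxed")]
print(choose_tag(cands, "calm", random.Random(0)).action)  # nod
```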

Once the action to be executed is determined, the utterance/action determination unit 13 searches the control sequence table of the action database 15 for the entry whose operation name is that action; the actuator control sequence of the retrieved entry is the action content executed by the robot 2. The unit also searches the action determination table of the action database 15 for the entry with that operation name to obtain the execution speed of the action.

The utterance/action determination unit 13 also stores the execution emotional state specified in the utterance/action tag information selected in step S43 in its storage area as the robot's emotional state (step S44).

Through the above processing, the utterance content and action content are determined in consideration of the robot personality, based on the scenario accompanying the program stored in the program-related utterance/action tag database 19. After they are determined, the process described later for causing the robot to execute the utterance and action is performed.

[Process for causing the robot to execute the utterance and action]
Next, the process for causing the robot 2 to execute the utterance and action based on the determined utterance content and action content will be described.

FIG. 17 is a flowchart showing the flow of processing for causing the robot 2 to execute the utterance and action based on the determined utterance content and action content.

First, the utterance/action determination unit 13 determines whether the action to be executed by the robot 2 requires user orientation information or display orientation information (step S51). It does so by checking the “user orientation information required” and “display orientation information required” flags in the actuator control sequence of that action. If neither is required, the process proceeds to step S54.

If user orientation information or display orientation information is required, the user orientation information and display orientation information are acquired from the position acquisition server 8 (step S52) and substituted into the relevant fields of the actuator control sequence of the action to be executed by the robot 2 (step S53). For example, in the “look at the user” action shown in FIG. 3(b), the azimuth angle of the user orientation information is substituted into the head tilt angle y₁ (degrees) of the actuator control sequence, and the elevation angle of the user orientation information is substituted into the head roll angle y₂ (degrees). In the “point at the display with the right hand and look at the user” action, the azimuth and elevation angles of the user orientation information are likewise substituted into the head tilt angle y₁ and the head roll angle y₂, respectively; in addition, the azimuth angle of the display orientation information is substituted into the right arm tilt angle d₁, and the elevation angle of the display orientation information into the right arm roll angle d₂.
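The substitution in step S53 can be sketched as filling named placeholders in an actuator control sequence. The placeholder names y1, y2 (user) and d1, d2 (display) mirror y₁, y₂, d₁, d₂ in the text; the `Step` layout and the part names are hypothetical, since FIG. 3(b) is not reproduced.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union

Angle = Union[float, str]   # a number, or a placeholder name such as "y1"


@dataclass(frozen=True)
class Step:
    """One entry of an actuator control sequence (hypothetical layout)."""
    part: str       # actuator part, e.g. "head_tilt"
    angle: Angle    # control angle in degrees, or a placeholder
    time_s: float   # time to reach the angle


def bind_orientation(sequence: list[Step],
                     user_az: Optional[float] = None, user_el: Optional[float] = None,
                     display_az: Optional[float] = None,
                     display_el: Optional[float] = None) -> list[Step]:
    """Replace placeholders y1/y2 (user azimuth/elevation) and d1/d2 (display)."""
    values = {"y1": user_az, "y2": user_el, "d1": display_az, "d2": display_el}
    bound = []
    for step in sequence:
        if isinstance(step.angle, str):
            value = values.get(step.angle)
            if value is None:
                raise ValueError(f"no orientation value for placeholder {step.angle!r}")
            step = replace(step, angle=float(value))
        bound.append(step)
    return bound


point_and_look = [Step("head_tilt", "y1", 0.5), Step("head_roll", "y2", 0.5),
                  Step("right_arm_tilt", "d1", 1.0), Step("right_arm_roll", "d2", 1.0)]
print([s.angle for s in bind_orientation(point_and_look, 30.0, -5.0, 10.0, 2.0)])
# [30.0, -5.0, 10.0, 2.0]
```

Raising on a missing value makes a skipped step S52 fail loudly instead of sending an unresolved placeholder to the robot.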

User orientation information and display orientation information relative to the robot 2 can be acquired with an indoor positioning technique, for example one of the methods surveyed in Y. Gu, “A Survey of Indoor Positioning Systems for Wireless Personal Networks”. By combining the indoor positions of the user, the robot 2, and the display 4 obtained in this way with a sensor mounted on the robot 2 that can measure its heading, gaze control and pointing control toward the user and the display can be performed relative to the robot 2. Another method is to mount a camera on the robot 2 and identify the user and the display by image processing; the user orientation information and display orientation information as seen from the robot 2 are then obtained from the positions of the user and the display in the camera image.

After obtaining the actuator control sequence of the action to be executed by the robot 2, the utterance/action determination unit 13 performs speech synthesis on the utterance content (step S54). The pitch, speaking rate, and volume used in speech synthesis are determined according to the emotional state: the unit retrieves the emotional state from its storage area and obtains the excitement value and positive/negative value corresponding to that state from the map of FIG. 15.

Let ActMAX and ActMIN be the maximum and minimum excitement values; PNMAX and PNMIN the maximum and minimum positive/negative values; Act_m and PN_m the excitement value and positive/negative value corresponding to the emotional state obtained from the map of FIG. 15; STH_MAX and STH_MIN the maximum and minimum pitch in speech synthesis; SS_MAX and SS_MIN the maximum and minimum speaking rate; and SV_MAX and SV_MIN the maximum and minimum volume. The pitch STH_m, speaking rate SS_m, and volume SV_m used in speech synthesis are then obtained by the following equations (4) to (6), respectively.
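Equations (4) to (6) are not reproduced in this text. Given the min/max definitions above, a linear mapping from an emotion axis onto each synthesis range is a natural reading, and the sketch below assumes exactly that. Which axis drives which parameter, and all synthesizer ranges, are hypothetical choices for illustration only.

```python
def lerp(value: float, lo: float, hi: float, out_lo: float, out_hi: float) -> float:
    """Map value in [lo, hi] linearly onto [out_lo, out_hi], clamping at the ends."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


ACT_MIN, ACT_MAX = -1.0, 1.0
PN_MIN, PN_MAX = -100.0, 100.0
# Hypothetical synthesizer ranges
STH_MIN, STH_MAX = 100.0, 200.0   # pitch (Hz)
SS_MIN, SS_MAX = 0.8, 1.2         # speaking-rate multiplier
SV_MIN, SV_MAX = 50.0, 100.0      # volume (%)


def speech_params(act_m: float, pn_m: float) -> tuple[float, float, float]:
    """(STH_m, SS_m, SV_m) under an assumed linear form of equations (4)-(6)."""
    sth = lerp(pn_m, PN_MIN, PN_MAX, STH_MIN, STH_MAX)   # assumed: pitch from PN_m
    ss = lerp(act_m, ACT_MIN, ACT_MAX, SS_MIN, SS_MAX)   # assumed: rate from Act_m
    sv = lerp(act_m, ACT_MIN, ACT_MAX, SV_MIN, SV_MAX)   # assumed: volume from Act_m
    return sth, ss, sv
```

For a neutral state (Act_m = 0, PN_m = 0), every parameter lands at the midpoint of its range.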

The utterance content is synthesized into speech using the pitch, speaking rate, and volume obtained by equations (4) to (6), and the result is generated as an audio file.

Then, the utterance/action determination unit 13 sends the actuator control signal and the audio file to the robot 2 (step S55). The actuator control signal is generated from the actuator control sequence and contains the actuator part to be controlled, the control angle, and the time taken to reach that angle.
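As an illustration of the three fields the control signal carries, the sketch below encodes each sequence entry as a JSON message. The JSON field names and the tuple input format are a hypothetical wire format; the text does not specify how the signal is encoded or transported.

```python
import json


def to_control_signals(sequence: list[tuple[str, float, float]]) -> list[str]:
    """Encode (part, angle_deg, time_to_reach_s) entries as JSON messages.

    The field names are a hypothetical wire format, not the one in the source.
    """
    return [json.dumps({"part": part, "angle_deg": angle, "time_s": time_s})
            for part, angle, time_s in sequence]


print(to_control_signals([("head_tilt", 30.0, 0.5)])[0])
# {"part": "head_tilt", "angle_deg": 30.0, "time_s": 0.5}
```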

According to the received actuator control signal, the robot 2 drives the actuators of the specified parts to the specified angles within the specified times, and plays the received audio file through its built-in speaker.

FIG. 18 shows the robot 2 as controlled by the robot control device 1. FIG. 18(a) shows the robot controlled to turn its gaze toward the display 4, and FIG. 18(b) shows it controlled to turn its gaze toward the user while pointing at the display 4.

Controlling the robot 2 to turn its face and eyes toward the display 4 naturally lets the user know that the robot 2 is paying attention to the display 4. Similarly, although not shown, controlling the robot 2 to turn its face and eyes toward the user naturally lets the user know that the robot 2 is paying attention to the user.

Further, when the robot 2 turns its attention to the user and makes an utterance about the video content, controlling the robot 2 to turn its face and eyes toward the user while pointing at the display 4 naturally conveys that it is speaking to the user about what is shown on the display 4.

In this way, controlling the robot's gaze and pointing toward the user and the display 4 makes the user perceive that the robot 2 is aware of both the user and the display 4, giving the user the feeling that the robot 2 is watching the video together with them.

As described above, according to the present embodiment, the utterance content of the robot 2 is determined with reference to the fixed utterance sentence database 16, based on the operation performed on the display 4 and the emotional state of the robot 2, and the action content matching the dialogue state of the utterance content and the emotional state of the robot 2 is extracted from the action database 15. This allows the robot 2 to execute actions that respond to changes in the state of the display 4.

According to the present embodiment, comments about the program being viewed are acquired from the social media server 7; the utterance content of the robot 2 is determined from the comments of personality-matching speakers, whose personality matches the one set for the robot 2; and the action content is extracted from the action database 15 based on the dialogue state of the utterance content and the emotional state of the robot 2. This allows the robot 2 to execute actions that respond to the content of the program being viewed. As a result, the user can feel as if they are watching the program together with the robot 2, and the robot can evoke empathy with the user. When the user feels this empathy with the robot while watching video, the user develops an affiliation motive toward the robot 2, which can be expected to increase the likelihood that the user will accept recommendations of products, services, or actions that the robot 2 makes to the user.

According to the present embodiment, narrowing down the comments by the personality set for the robot when determining the utterance content allows the robot 2 to execute consistent utterances and actions.

According to the present embodiment, the utterance/action tag information related to the program being viewed is acquired from the program-related utterance/action tag database 19 to determine the utterance content and action content of the robot 2, which allows the robot to execute actions that follow the program the user is watching.

According to the present embodiment, the robot 2 is controlled according to the directions of the user and the display, giving the user the sense that the robot 2 is watching the display 4 together with them and looks at them when speaking. This makes the user feel that the robot 2 recognizes both the video and the user's presence, further fostering empathy with the viewed content.

DESCRIPTION OF SYMBOLS
1 … Robot control device
11 … Display information processing unit
12 … Social media information acquisition unit
13 … Utterance/action determination unit
14 … Robot personality attribute information database
15 … Action database
16 … Fixed utterance sentence database
17 … Positive/negative word database
18 … Tone conversion database
19 … Program-related utterance/action tag database
2 … Robot
3 … Remote control
4 … Display
5 … Electronic program guide information database
6 … Program-social media tag association database
7 … Social media server
8 … Position acquisition server

Claims

A robot control device that causes a robot to act as if it were watching a video together with a user,
Comment acquisition means for acquiring a comment posted to the video;
An utterance sentence generating means for generating an utterance sentence to be uttered by the robot from the comment;
Emotion determining means for determining the emotional state of the robot from the comment;
Action storage means for storing the dialogue state of the utterance sentence to be uttered by the robot, the emotional state of the robot, and the action to be executed by the robot in association with each other;
Action determining means for determining, with reference to the action storage means, an action to be executed by the robot from the dialogue state of the utterance sentence generated by the utterance sentence generating means and the emotional state determined by the emotion determining means;
Control means for causing the robot to output, by speech synthesis, the utterance sentence generated by the utterance sentence generating means, and to perform an operation based on the action determined by the action determining means;
A robot control apparatus comprising:

The robot control apparatus according to claim 1, further comprising personality storage means for storing personality information set in the robot, wherein the utterance sentence generating means generates the utterance sentence from the comments posted by contributors whose personality matches that personality.

The robot control apparatus according to claim 1, further comprising direction acquisition means for acquiring the direction of the user, wherein the action includes an action of looking in the direction of the user.

A robot control method performed by a computer that causes a robot to act as if it were watching a video together with a user,
Obtaining a comment posted to the video;
Generating an utterance sentence to be uttered by the robot from the comment;
Determining an emotional state of the robot from the comments;
Determining an action to be executed by the robot from the dialogue state of the utterance sentence generated in the step of generating the utterance sentence and the emotional state determined in the step of determining the emotional state, with reference to action storage means that stores the dialogue state of an utterance sentence to be uttered by the robot, the emotional state of the robot, and an action to be executed by the robot in association with one another;
Causing the robot to output, by speech synthesis, the utterance sentence generated in the step of generating the utterance sentence, and causing the robot to perform an operation based on the action determined in the step of determining the action;
A robot control method comprising:

A robot control program for causing a robot to act as if it were watching a video together with a user,
Processing for obtaining a comment posted to the video;
Processing to generate an utterance sentence to be uttered by the robot from the comment;
A process for determining the emotional state of the robot from the comment;
A process of determining an action to be executed by the robot from the dialogue state of the utterance sentence generated by the process of generating the utterance sentence and the emotional state determined by the process of determining the emotional state, with reference to action storage means that stores the dialogue state of an utterance sentence to be uttered by the robot, the emotional state of the robot, and an action to be executed by the robot in association with one another;
A process of causing the robot to output, by speech synthesis, the utterance sentence generated by the process of generating the utterance sentence, and to perform an operation based on the action determined by the process of determining the action;
A robot control program for causing a computer to execute.