JP2016093279A

JP2016093279A - Control apparatus, control apparatus operation method, and computer program

Info

Publication number: JP2016093279A
Application number: JP2014230741A
Authority: JP
Inventors: 崇裕松元; Takahiro Matsumoto; 章裕宮田; Akihiro Miyata; 智樹渡部; Tomoki Watabe; 清田中; Kiyoshi Tanaka; 丈二中山; Joji Nakayama; 智広山田; Tomohiro Yamada
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-11-13
Filing date: 2014-11-13
Publication date: 2016-05-26

Abstract

PROBLEM TO BE SOLVED: To shorten the time for preparation and execution required for reminiscence therapy.
SOLUTION: A control apparatus 1, which controls an image presentation apparatus 2 and a robot 3 used by a user U, switches its operation mode among a recollection promotion mode, a dialogue mode, and a display image dialogue mode on the basis of speech by the user U and the direction of the user U's attention, and controls the image presentation apparatus 2 and the robot 3 in coordination.
SELECTED DRAWING: Figure 1

Description

The present invention relates to technology for controlling an image presentation apparatus and a robot, with the aim of using automation to reduce the preparation and execution time required for reminiscence therapy.

Conventionally, reminiscence therapy has required creating conditions in which an elderly person can easily keep talking: asking the person appropriate questions, presenting photographs that prompt conversation, and responding to what the person says with suitable nods and backchannel responses. Implementation therefore demanded substantial human resources, including training therapists skilled in reminiscence therapy, securing therapists' working time to conduct sessions, and preparing photographs that encourage the elderly person to talk.

As one prior-art method for supporting reminiscence therapy, Non-Patent Document 1 proposes support using a memory video and a remote videophone. In that method, a therapist conducts reminiscence therapy from a remote location while operating a memory video, a slideshow of the elderly person's old photographs, so that the therapy can be delivered without the therapist meeting the person face to face. However, creating the memory video requires selecting photographs related to the elderly person in advance, importing them into a PC, and then entering each photograph's place names, personal names, shooting date, and so on as annotation data. Creating a memory video for each individual receiving reminiscence therapy is therefore not easy, and this has been one of the obstacles to implementation. In addition, although therapy by remote videophone saves travel time because the therapist no longer needs to visit the elderly person's home, a therapist must still conduct the therapy from the remote location, so the problem of requiring human labor remains.

Patent Document 1 proposes a method in which, for each user receiving reminiscence therapy, values of a time attribute indicating "when", a spatial attribute indicating "where", an object attribute indicating "what", and an action attribute indicating "what happened" are registered in an event database; question sentences are then generated from the event database and posed to the user to conduct reminiscence therapy. However, this method requires each user's activities to be registered in the event database, and acquiring them automatically requires measures such as attaching various sensors to the user or the environment. Moreover, even with automatic acquisition, it is generally not easy to acquire the object attribute ("what") and the action attribute ("what happened") automatically. It is therefore necessary either to equip the user or the environment with sufficient facilities, such as sensor devices for automatic event extraction, or to have caregivers register daily events for each user, so the problem remains.

The prior art has also proposed inventions such as Patent Document 2, in which the image displayed on an image display device is switched according to the content of a dialogue between people. When such techniques are applied to reminiscence therapy, they have the advantage that image stimuli can be provided to the elderly person in step with the conversation, without preparing in advance images that make reminiscing easier. However, a therapist must still accompany the elderly person during the therapy, and because the displayed image switches even when the elderly person wants to keep talking about it, the attendant or the elderly person must operate the terminal. One of them therefore has to learn how to use the device, and each must operate it appropriately so as not to interrupt the flow of the conversation.

Patent Document 1: JP 2010-92358 A
Patent Document 2: JP 2009-294866 A

Non-Patent Document 1: "Realization and evaluation of a system that enables remote conversation and reminiscence with elderly people using TV phone and content sharing," Journal of Human Interface Society, 9(2), 111-122, 2007.
Non-Patent Document 2: "A study on speech segment detection using Dirichlet prior distribution," IEICE Technical Report, NLC (Language Understanding and Communication), 109(355), 65-70, 2009.
Non-Patent Document 3: "An automatic extraction method for proper terms contained in Japanese sentences," National Conference Proceedings, 41st (latter half of 1990), 227-228, 1990.

The present invention has been made in view of the above problems, and its object is to use automation to reduce the preparation and execution time required for reminiscence therapy.

To solve the above problems, a first aspect of the present invention is a control device that controls an image presentation device and a robot used by a user, the control device comprising means for switching its operation mode among a recollection promotion mode, a dialogue mode, and a display image dialogue mode based on utterances by the user and the direction of the user's attention, and for controlling the image presentation device and the robot in coordination.

For example, in the recollection promotion mode, the robot is controlled to face the user and to utter a question sentence that prompts the user to begin reminiscing. In the dialogue mode, the image presentation device is controlled to display, switching among them, images retrieved with a search word string obtained from the user's utterances, and the robot is controlled to face the user and to perform actions showing that it is listening to the user. In the display image dialogue mode, the image presentation device is controlled to keep displaying an image retrieved with a search word string obtained from the user's utterances, and the robot is controlled to face the image presentation device displaying that image and to utter text corresponding to the search word string.

For example, in the recollection promotion mode, the control device transitions to the dialogue mode when it detects an utterance by the user. In the dialogue mode, it transitions to the display image dialogue mode when it detects that the user has started to pay attention to the image presentation device, and to the recollection promotion mode when it detects that the user has not spoken for a certain time or longer. In the display image dialogue mode, it transitions to the dialogue mode when it detects that the user has finished paying attention to the image presentation device, and to the recollection promotion mode when it detects that the user has not spoken for a certain time or longer.
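The transition rules above form a small three-state machine. The Python sketch below is illustrative only: the mode names follow the text, but the boolean event flags (`utterance_detected`, `silence_exceeded`, `attention_started`, `attention_ended`) are hypothetical inputs assumed to come from upstream detectors, and giving prolonged silence priority over an attention change when both occur at once is an assumption the text does not settle.

```python
from enum import Enum


class Mode(Enum):
    RECOLLECTION_PROMOTION = "recollection promotion mode"
    DIALOGUE = "dialogue mode"
    DISPLAY_IMAGE_DIALOGUE = "display image dialogue mode"


def next_mode(
    mode: Mode,
    *,
    utterance_detected: bool,  # an utterance by the user was detected
    silence_exceeded: bool,    # no utterance for the fixed time or longer
    attention_started: bool,   # user started attending to the image presentation device
    attention_ended: bool,     # user finished attending to the image presentation device
) -> Mode:
    """Return the operation mode after one evaluation step (hypothetical event flags)."""
    if mode is Mode.RECOLLECTION_PROMOTION:
        return Mode.DIALOGUE if utterance_detected else mode
    # Both dialogue-type modes fall back to recollection promotion on prolonged silence.
    # Checking silence first is an assumption; the text does not rank simultaneous events.
    if silence_exceeded:
        return Mode.RECOLLECTION_PROMOTION
    if mode is Mode.DIALOGUE and attention_started:
        return Mode.DISPLAY_IMAGE_DIALOGUE
    if mode is Mode.DISPLAY_IMAGE_DIALOGUE and attention_ended:
        return Mode.DIALOGUE
    return mode
```

Keeping the transition logic as a pure function of the current mode and the detected events makes it easy to test separately from the speech and camera pipelines.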

For example, in the dialogue mode, it is determined that the user has started to pay attention to the image presentation device when both of the following conditions are satisfied: a first condition, that the length of an interval during which the user's dialogue is stopped is longer than a predetermined first threshold; and a second condition, that the length of an interval during which at least one of the user's face or gaze is directed toward the image presentation device and the user's dialogue continues is longer than a predetermined second threshold.

For example, in the display image dialogue mode, it is determined that the user has finished paying attention to the image presentation device when the time elapsed since it was detected that the state in which at least one of the user's face or gaze is directed toward the image presentation device has ended is longer than a predetermined threshold.

A second aspect of the present invention is an operation method of a control device that controls an image presentation device and a robot used by a user, in which the control device switches its operation mode among a recollection promotion mode, a dialogue mode, and a display image dialogue mode based on utterances by the user and the direction of the user's attention, and controls the image presentation device and the robot in coordination.

A third aspect of the present invention is a computer program for causing a computer to function as the control device according to the first aspect.

According to the present invention, automation can reduce the preparation and execution time required for reminiscence therapy.

FIG. 1 is a block diagram showing the schematic configuration of the control device according to the present embodiment.
FIG. 2 is a diagram showing the arrangement of the control device 1 and other devices.
FIG. 3 is a diagram showing an operation example in the recollection promotion mode.
FIG. 4 is a first diagram showing an operation example in the dialogue mode.
FIG. 5 is a second diagram showing an operation example in the dialogue mode.
FIG. 6 is a third diagram showing an operation example in the dialogue mode.
FIG. 7 is a diagram showing an operation example in the display image dialogue mode.
FIG. 8 is a diagram showing an example of operation mode transitions.
FIG. 9 is a transition diagram of the operation modes.
FIG. 10 is a diagram showing the variables used to detect that user U has started to pay attention to the image presentation device 2.
FIG. 11 is a diagram showing the variables used to detect that user U has finished paying attention to the image presentation device 2.
FIG. 12 is a diagram showing the structure of the information in the recollection start question storage device 108.
FIG. 13 is a diagram showing the structure of the information in the collected image storage device 109.
FIG. 14 is a diagram showing the structure of the information in the current operation mode storage device 111.
FIG. 15 is a diagram showing the structure of the information in the attention target result storage device 110.
FIG. 16 is a flowchart of the processing of the speech recognizer 101.
FIG. 17 is a flowchart of the processing of the search word extractor 102.
FIG. 18 is a flowchart of the processing of the attention target determiner 105.
FIG. 19 is a flowchart of the processing of the operation mode determiner 106.
FIG. 20 is a flowchart of the processing of the dialogue-related image searcher 103.
FIG. 21 is a flowchart of the processing of the robot motion generator 107.
FIG. 22 is a flowchart of the processing of the display image operator 104.

Embodiments of the present invention are described below with reference to the drawings.

The present embodiment proposes a method that requires little advance preparation, needs no therapist while reminiscence therapy is being conducted, and switches the displayed content appropriately according to the user's attention.

FIG. 1 is a block diagram showing the schematic configuration of the control device 1 according to the present embodiment. FIG. 2 is a diagram showing the arrangement of the control device 1 and other devices.

The control device 1 controls the image presentation device 2 and the robot 3 used by user U. Based on user U's utterances and the direction of user U's attention, it switches its operation mode among the recollection promotion mode, the dialogue mode, and the display image dialogue mode, and controls the image presentation device 2 and the robot 3 in coordination.

To perform this control, the control device 1 is connected via cables or communication lines (such as the Internet) to a microphone 4, a camera 5 on the image presentation device side, a camera 6 on the robot side, and an external image search device 7. The camera 5 is placed sufficiently close to the image presentation device 2, and the camera 6 sufficiently close to the robot 3. The microphone 4 is directional and does not pick up the voice uttered by the robot 3. The number of external image search devices 7 is not limited to one.

The microphone 4 captures user U's voice. The cameras 5 and 6 measure user U's face and gaze directions and are, for example, web cameras or depth-sensing cameras. The robot 3 is capable of voice output and of adjusting the orientation of its face. The control device 1, the image presentation device 2, the microphone 4, and the cameras 5 and 6 may be included in the robot 3. The cameras 5 and 6 may also be used to measure only one of the face direction and the gaze direction. The voice of the robot 3 may also be output from a separate device.

The control device 1 comprises: a speech recognizer 101 that performs speech recognition on the acquired voice of the user; a search word extractor 102 that extracts search words from the speech recognition result; a dialogue-related image searcher 103 that performs an image search with the search words on the external image search device 7 and acquires the results; a display image operator 104 that displays the images of the search results on the image presentation device 2; an attention target determiner 105 that determines user U's attention target; an operation mode determiner 106 that changes the operation of the image presentation device 2 and the robot 3 according to its inputs; a robot motion generator 107 that operates the robot 3; a recollection start question storage device 108 that the robot 3 refers to when asking questions; a collected image storage device 109 that stores the images acquired from the external image search device 7; an attention target result storage device 110 that stores attention target results; and a current operation mode storage device 111 that stores the current operation mode.

The speech recognizer 101 through the robot motion generator 107 are software modules, and the remaining components are storage devices (hardware) in the control device 1. The search word extractor 102 has, for example, a function of extracting proper nouns from the sentences of the speech recognition result.

The control device 1 operates in one of the recollection promotion mode, the dialogue mode, and the display image dialogue mode, switching among them, and changes the operation of the image presentation device 2 and the robot 3 according to the current mode.

FIG. 3 shows an operation example in the recollection promotion mode. FIGS. 4, 5, and 6 show operation examples in the dialogue mode. FIG. 7 shows an operation example in the display image dialogue mode. FIG. 8 shows an example of operation mode transitions. FIG. 9 is a transition diagram of the operation modes.

In the recollection promotion mode, the control device 1 controls the robot 3 so that its face is directed toward user U and so that it utters a question sentence that prompts user U to begin reminiscing.

The recollection promotion mode continues as long as no utterance by user U is detected. The robot 3 detects the presence of user U's face and gaze and utters a question to user U that serves as a trigger for reminiscence (FIG. 3 (1)).

In the recollection promotion mode, the robot 3 poses a question to user U such as "Tell me about the place you enjoyed most among the places you traveled to as a child." When user U utters a reply to the robot 3's question, the operation mode transitions to the dialogue mode.
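The recollection start question storage device 108 keeps a last-call time for each question ID (item (7) in the data list of this section), which suggests one way to choose the next prompting question. The following is a minimal sketch assuming a least-recently-asked policy; this section does not state the actual selection rule, and `StartQuestion` and `pick_start_question` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StartQuestion:
    question_id: int
    text: str
    last_called: float  # last time this question was uttered (0.0 = never)


def pick_start_question(questions: List[StartQuestion], now: float) -> str:
    """Pick the least recently asked question and record that it was asked at `now`."""
    # Least-recently-asked is an assumed policy; ties go to the smaller ID.
    chosen = min(questions, key=lambda q: (q.last_called, q.question_id))
    chosen.last_called = now  # plays the role of the "reference information update"
    return chosen.text
```

Rotating through questions this way avoids repeating the same prompt in consecutive recollection promotion phases.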

In the dialogue mode, the control device 1 controls the image presentation device 2 to display, switching among them, images retrieved with the search word string obtained from user U's utterances, controls the robot 3 so that its face is directed toward user U, and controls the robot 3 to perform actions showing that it is listening to user U.

The dialogue mode continues as long as utterances by user U are detected. Search words are extracted from user U's utterance (FIG. 4 (2)), an image search is performed using the search words (FIG. 5 (4)), and each resulting image is displayed on the image presentation device 2 for a fixed time. When one search word yields multiple images, the images are switched at fixed time intervals (FIG. 6 (5)). When multiple search words are extracted from an utterance, an image search is performed for each of them, and the images obtained for all the search words are likewise switched at fixed time intervals (FIG. 6 (5)).
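One way to lay out the image rotation described above is a schedule that shows each retrieved image for a fixed interval. This is a hedged sketch: the function name, the tuple layout, and showing all of one word's images before moving to the next word are assumptions, since the text only says that images are switched at fixed intervals.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[float, str, str]  # (display start time, search word, image id)


def build_display_schedule(
    results: Sequence[Tuple[str, Sequence[str]]],  # (search word, retrieved image ids)
    interval: float,                               # fixed display time per image, seconds
    start: float = 0.0,
) -> List[Entry]:
    """Lay out every retrieved image back to back, each shown for `interval` seconds."""
    schedule: List[Entry] = []
    t = start
    for word, images in results:  # search words in the order they were extracted
        for image in images:      # all images found for this word
            schedule.append((t, word, image))
            t += interval
    return schedule
```

Keeping the search word next to each image mirrors the image search word pair in item (11) of the data list, so the word is still available if the image is later frozen in the display image dialogue mode.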

In the dialogue mode, the robot 3 also responds to user U's utterances with listening behavior consisting of nods, backchannel responses, and parroting back the recognized words (FIG. 4 (3)).
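The listening behavior can be expressed as robot control information, which (per item (15) of the data list) pairs one preset motion with optional utterance text. In this sketch the motion names, the backchannel phrase, and the choice to parrot back the most recently extracted search word are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class RobotControl:
    motion: Optional[str]     # hypothetical preset motion name: "face_user", "face_display", "nod"
    utterance: Optional[str]  # text handed to the robot's speech synthesis, if any


def listening_response(search_words: Sequence[str]) -> List[RobotControl]:
    """Face the user, nod, and either parrot back a recognized word or give a backchannel."""
    if search_words:
        text = f"{search_words[-1]}, huh."  # parrot back the most recent recognized word
    else:
        text = "Uh-huh."                    # plain backchannel when nothing was recognized
    return [RobotControl("face_user", None), RobotControl("nod", text)]
```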

In the dialogue mode, when it is detected that user U has stopped speaking for a certain time or longer, the operation mode returns to the recollection promotion mode (FIG. 8 (10)). When it is detected in the dialogue mode that user U has turned their attention to the image presentation device 2, the operation mode shifts to the display image dialogue mode.

In the display image dialogue mode, the control device 1 controls the image presentation device 2 to keep displaying the image retrieved with the search word string obtained from user U's utterances, controls the robot 3 so that its face is directed toward the image presentation device 2 displaying that image, and controls the robot 3 to utter text corresponding to the search word string.

The display image dialogue mode continues as long as user U's attention to the displayed image is detected. In this mode, image switching on the image presentation device 2 stops, and the image that was being presented when attention was determined to have turned to the image presentation device 2 remains displayed (FIG. 7 (6)).

In the display image dialogue mode, the robot 3 also speaks and moves according to the displayed image: it turns its gaze direction (its face or its whole body) toward the image presentation device 2, generates a question sentence such as "Is this N? I'd like to hear more about it," using the search word "N" of the image currently displayed, and utters it.
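The sketch below shows how the frozen image and the question template might fit together: given a display schedule of (start time, search word, image) entries, it keeps the entry on screen at the moment attention was detected and fills that entry's search word into the template quoted above. The schedule layout, the motion name `face_display`, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Tuple[float, str, str]  # (display start time, search word, image id)


@dataclass
class RobotControl:
    motion: Optional[str]     # hypothetical preset motion name
    utterance: Optional[str]  # text for speech synthesis


def frozen_entry(schedule: List[Entry], attention_time: float) -> Optional[Entry]:
    """Image on screen when attention to the display was detected; it stays displayed."""
    shown = [e for e in schedule if e[0] <= attention_time]  # schedule sorted by start time
    return shown[-1] if shown else None


def display_image_question(entry: Entry) -> RobotControl:
    """Turn toward the display and ask about the frozen image's search word N."""
    _, word, _ = entry
    return RobotControl("face_display", f"Is this {word}? I'd like to hear more about it.")
```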

In the display image dialogue mode, once the presented content stops changing, the robot switches to speech and motion corresponding to that content (FIG. 7 (7)).

In the display image dialogue mode, when the end of user U's attention to the image presentation device 2 is detected, the operation mode returns to the dialogue mode and image switching resumes (FIG. 8 (8)). Back in the dialogue mode, the robot performs listening behavior such as nodding, backchannel responses, and parroting (FIG. 8 (9)).

In the display image dialogue mode, when it is detected that user U's conversation has stopped for a certain time, the operation mode returns to the recollection promotion mode, and the robot utters a question that serves as a trigger for reminiscence (FIG. 8 (10)).

As described above, in the recollection promotion mode, the operation mode transitions to the dialogue mode when an utterance by user U is detected.

In the dialogue mode, the operation mode transitions to the display image dialogue mode when it is detected that user U has started to pay attention to the image presentation device 2, and to the recollection promotion mode when it is detected that user U has not spoken for a certain time or longer.

In the display image dialogue mode, the operation mode transitions to the dialogue mode when it is detected that user U has finished paying attention to the image presentation device 2, and to the recollection promotion mode when it is detected that user U has not spoken for a certain time or longer.

In addition to the operation modes and mode transitions described above, the control device 1 has an algorithm for determining that user U's attention has been directed to the displayed image. It is generally known that, during a conversation, people look away from their conversation partner in order to think about what to say, to adjust the level of intimacy with the partner, or to hold the conversational turn. Consequently, if the criterion for the attention target were simply whether the gaze direction has left the robot 3 and turned toward the image presentation device 2, the system would be more likely to judge erroneously that user U's attention has been directed to the displayed image even when it has not. An algorithm that improves the accuracy of detecting user U's shift of attention to the displayed image is therefore used.

FIG. 10 is a diagram showing the variables used to detect that user U has started to pay attention to the image presentation device 2.

This algorithm uses two thresholds and two variables. The dialogue switching interval threshold $t_{\mathrm{Tnext}}$ and the display image attention shift threshold $t_{\mathrm{toV}}$ are real numbers greater than or equal to 0. The first variable, the dialogue stop interval length $t_{\mathrm{Tstop}}$, is the length of the interval from when dialogue detection stops until the next dialogue detection begins. The second variable, $t_{\mathrm{VandT}}$, is the length of the interval from the detected start of dialogue until either detection of the face/gaze toward the displayed image or dialogue detection stops; in other words, $t_{\mathrm{VandT}}$ is the length of the interval during which user U's face/gaze is directed toward the image presentation device 2 and user U's dialogue continues. The start of dialogue is detected using $t_{\mathrm{Tnext}}$ and $t_{\mathrm{Tstop}}$: when $t_{\mathrm{Tnext}} < t_{\mathrm{Tstop}}$, the end time of the $t_{\mathrm{Tstop}}$ interval is taken as the dialogue start time. Next, $t_{\mathrm{VandT}}$ is measured from the dialogue start time, and when $t_{\mathrm{toV}} < t_{\mathrm{VandT}}$, it is determined that user U has started to pay attention to the image presentation device 2.
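The two-stage test can be sketched over a stream of time-stamped samples. This Python version is a hedged illustration, not the patented implementation: it assumes speech and face/gaze detections arrive as boolean samples (`Sample` is a hypothetical type), treats the very first utterance as following an unbounded pause, and takes the literal reading that $t_{\mathrm{VandT}}$ starts at the dialogue start time and ends as soon as either signal drops.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    t: float              # timestamp, seconds
    speaking: bool        # user's dialogue (speech) detected at this sample
    facing_display: bool  # face or gaze directed toward image presentation device 2


def detect_attention_start(
    samples: Iterable[Sample], t_tnext: float, t_tov: float
) -> Optional[float]:
    """Return the time attention to the display is judged to start, or None.

    A dialogue start is a speech onset preceded by a pause t_Tstop > t_Tnext.
    t_VandT runs from that start while speech and facing the display both hold;
    attention starts once t_VandT > t_toV.
    """
    prev_speaking = False
    speech_end: Optional[float] = None      # when the previous speech segment stopped
    dialogue_start: Optional[float] = None  # set only while t_VandT is still running
    for s in samples:
        if s.speaking and not prev_speaking:
            # Assumption: the very first utterance follows an unbounded pause.
            t_tstop = float("inf") if speech_end is None else s.t - speech_end
            dialogue_start = s.t if t_tstop > t_tnext else None
        elif prev_speaking and not s.speaking:
            speech_end = s.t
        if dialogue_start is not None:
            if s.speaking and s.facing_display:
                if s.t - dialogue_start > t_tov:  # t_VandT exceeds t_toV
                    return s.t
            else:
                dialogue_start = None  # a signal stopped before t_toV was exceeded
        prev_speaking = s.speaking
    return None
```

With this structure, a glance at the display in the middle of a turn (after only a short pause) never opens a $t_{\mathrm{VandT}}$ window, which is how the sketch reflects the concern about natural gaze aversion raised above.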

In addition to the algorithm described above for detecting that user U has started to pay attention to the image presentation device 2, the control device 1 likewise has, in its attention target determiner, an algorithm for detecting that user U has finished paying attention to the image presentation device 2.

FIG. 11 is a diagram illustrating the variables used to detect that the user U has stopped paying attention to the image presentation device 2.

This algorithm uses one threshold and one variable. The display-image attention-end threshold t_noV is a non-negative real number. The variable t_Vstop is the time elapsed since face/gaze detection toward the displayed image stopped, that is, since it was detected that the face/gaze direction of the user U is no longer toward the image presentation device 2. When t_noV < t_Vstop is satisfied, it is determined that the user U has stopped paying attention to the image presentation device 2.
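A minimal sketch of the end-of-attention rule, computing t_Vstop from the attention target result log the way step S407 describes. It assumes the log is a list of `(update_time, result_string)` pairs; returning `True` when the log holds no "attention to image presentation device" entry at all is an assumption, since the text does not cover that case.

```python
from typing import List, Tuple

ON_DISPLAY = "attention to image presentation device"


def attention_ended(log: List[Tuple[float, str]], t_now: float, t_nov: float) -> bool:
    """True if t_noV < t_Vstop, where t_Vstop = t_now - t_0."""
    on_display = [t for t, result in log if result == ON_DISPLAY]
    if not on_display:
        return True  # assumption: no record of attention to the display at all
    t_0 = max(on_display)   # most recent time the user was looking at the display
    t_vstop = t_now - t_0   # time since face/gaze detection on the display stopped
    return t_nov < t_vstop
```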

Here, the data exchanged within the control device 1 shown in FIG. 1 will be described.
(1) The voice input is a stream of sound waveform data measured by the microphone 4, associated with time.
(2) The face/gaze direction input is the face/gaze direction of at most one person relative to each camera, obtained from images acquired by the cameras 5 and 6. It consists of the horizontal face angle θ_face_x, the vertical face angle θ_face_y, the horizontal gaze angle θ_eye_x, and the vertical gaze angle θ_eye_y. When the face and gaze of the user U both point straight at the camera, parallel to the camera's optical axis, θ_face_x = 0, θ_face_y = 0, θ_eye_x = 0, and θ_eye_y = 0; each angle takes a value from -90 to 90 degrees. One example of a method for obtaining a person's face and gaze direction is OMRON's Okao-Vision.
(3) The attention target result is information that associates a time with one of three character strings: "attention to image presentation device", "attention to robot", or "no attention target".
(4) The attention target result log is information that aggregates one or more attention target results.
(5) The operation mode result and the previous operation mode result are each character-string information, one of "display image dialogue mode", "dialogue mode", or "reminiscence promotion mode".
(6) The question sentence string is a character string describing the content of the conversation.
(7) The reference information update is command information for rewriting the last call time of a specified ID in the reminiscence start question storage device 108.
(8) The utterance string is character-string information generated from the voice input.
(9) The search word string is character-string information consisting of a proper noun.
(10) The image data is an image file representing a photograph or an illustration.
(11) The image search word pair is a pair of an image-related word and image data.
(12) The collected image information is a set of an image display time, an image storage time, an image-related word, and image data.
(13) The image information update command is a command for rewriting and updating the current display flag and the image display time in the collected image storage device 109.
(14) The image display command is a command for causing the image presentation device 2 to display image data.
(15) The robot control information consists of a motion instruction to be executed by the robot 3 and an utterance text. The robot 3 has preset motions of turning its face toward the image presentation device 2, turning its face toward the user, and nodding (moving its head vertically), and it executes any one of them when that motion is specified in the robot control information. When the robot 3 receives the utterance text in the robot control information, it performs speech synthesis on the text and outputs the resulting speech.
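The data items above can be modelled as plain types. This is only a sketch: the class and field names are hypothetical, and the patent specifies the contents of each item, not a representation. The range check mirrors the -90 to 90 degree range stated in item (2).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AttentionTarget(Enum):          # item (3)
    IMAGE_PRESENTATION_DEVICE = "attention to image presentation device"
    ROBOT = "attention to robot"
    NONE = "no attention target"


class OperationMode(Enum):            # item (5)
    DISPLAY_IMAGE_DIALOGUE = "display image dialogue mode"
    DIALOGUE = "dialogue mode"
    REMINISCENCE_PROMOTION = "reminiscence promotion mode"


class Motion(Enum):                   # preset motions of item (15)
    FACE_DISPLAY = "turn face toward image presentation device"
    FACE_USER = "turn face toward user"
    NOD = "nod"


@dataclass(frozen=True)
class FaceGazeInput:                  # item (2); degrees, 0 = facing the camera
    face_x: float
    face_y: float
    eye_x: float
    eye_y: float

    def __post_init__(self) -> None:
        for angle in (self.face_x, self.face_y, self.eye_x, self.eye_y):
            if not -90.0 <= angle <= 90.0:
                raise ValueError(f"angle out of range: {angle}")


@dataclass(frozen=True)
class AttentionResult:                # item (3): a result string bound to a time
    target: AttentionTarget
    time: float


@dataclass(frozen=True)
class RobotControlInfo:               # item (15): one motion and/or a text to speak
    motion: Optional[Motion] = None
    utterance_text: Optional[str] = None
```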

FIG. 12 is a diagram showing the structure of the information in the reminiscence start question storage device 108.

For each of a plurality of reminiscence start questions, which are posed to the user U to prompt the user U to start reminiscing, the reminiscence start question storage device 108 holds a record containing an ID that identifies the question, a last call time, and the question text. The ID is numerical data unique to each reminiscence start question. The last call time consists of time information and indicates when the question was last read from the reminiscence start question storage device 108; for a question that has never been read, the last call time is null. The reminiscence start question is dialogue text such as the examples shown in the figure.

FIG. 13 is a diagram showing the structure of the information in the collected image storage device 109.

For each piece of image data, the collected image storage device 109 stores collected image information including an ID that identifies it, a current display flag, an image display time, an image storage time, an image-related word, and the image data. The current display flag indicates the image currently displayed on the image presentation device 2; it is true for only one piece of collected image information at a time. The image display time is the time at which the current display flag became true. The image storage time is the time at which the image search word pair was registered in the collected image storage device 109. The image-related word is a character string identical to the image search word.
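The invariant that exactly one record carries a true current display flag can be kept by clearing every other flag whenever an image information update command arrives. The in-memory store below is a hypothetical sketch; its class and method names are illustrative, and sequential IDs starting at 1 are an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CollectedImage:
    image_id: int
    related_word: str                     # equals the search word used to find it
    image_data: bytes
    stored_at: float                      # image storage time
    current_display: bool = False         # current display flag
    displayed_at: Optional[float] = None  # image display time


class CollectedImageStore:
    def __init__(self) -> None:
        self._images: List[CollectedImage] = []

    def register(self, word: str, data: bytes, now: float) -> int:
        """Store an image search word pair and return its new ID."""
        image_id = len(self._images) + 1
        self._images.append(CollectedImage(image_id, word, data, now))
        return image_id

    def set_current(self, image_id: int, now: float) -> None:
        """Apply an image information update: only this image is flagged."""
        if all(img.image_id != image_id for img in self._images):
            raise KeyError(image_id)
        for img in self._images:
            img.current_display = img.image_id == image_id
            if img.current_display:
                img.displayed_at = now

    def current(self) -> Optional[CollectedImage]:
        return next((img for img in self._images if img.current_display), None)
```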

FIG. 14 is a diagram showing the structure of the information in the current operation mode storage device 111.

The current operation mode storage device 111 stores the current operation mode. In the figure, the current operation mode is the display image dialogue mode, but it may also be the dialogue mode or the reminiscence promotion mode.

FIG. 15 is a diagram showing the structure of the information in the attention target result storage device 110.

For each result of detecting, in time series, the target of the user U's attention, the attention target result storage device 110 stores information containing the attention target result and an update time. The attention target result is "attention to image presentation device" when the target is the image presentation device 2, "attention to robot" when the target is the robot 3, and "no attention target" when there is no target. The update time indicates when each attention target result was added to the attention target result storage device 110.

FIG. 16 is a flowchart of the processing performed by the speech recognizer 101.

As long as it receives voice input, the speech recognizer 101 repeats the following processing.

In S101, voice input is received. Once voice input has been received, the process proceeds to S102.

In S102, it is determined whether an utterance section is still continuing in the voice input acquired in S101. Various methods for identifying human utterance sections in sound are generally known; one example is the method of Non-Patent Document 2. If the voice input contains a point at which an utterance section ends, the portion from the start of the voice input up to that point is sent to S103, and the rest of the voice input is held in a queue; the next voice input acquired is appended after the held input. The utterance section is then treated as not continuing, and the process proceeds to S103. If there is no such point, all of the voice input is held in the queue, and the process returns to S101 to wait for further voice input.
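The queueing behaviour of S102 can be sketched independently of any particular voice activity detector. Here the detector is a hypothetical callback `find_utterance_end` that returns the index just past the end of the first completed utterance section, or `None` if the utterance is still continuing; audio is modelled as a plain list of samples for illustration.

```python
from typing import Callable, List, Optional


class UtteranceQueue:
    """Holds voice input until an utterance section can be delimited (S102)."""

    def __init__(self, find_utterance_end: Callable[[List[float]], Optional[int]]) -> None:
        self._queue: List[float] = []
        self._find_end = find_utterance_end  # hypothetical VAD callback

    @property
    def pending(self) -> List[float]:
        return list(self._queue)

    def push(self, samples: List[float]) -> Optional[List[float]]:
        """Append new input; return a completed utterance for S103, or None."""
        self._queue.extend(samples)          # new input follows the held input
        end = self._find_end(self._queue)
        if end is None:
            return None                      # utterance continues: back to S101
        segment, self._queue = self._queue[:end], self._queue[end:]
        return segment                       # the remainder stays queued
```

Only one segment is released per call, matching the step-by-step flow; a caller could call `push([])` again to drain further completed sections.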

In S103, the utterance-section voice input sent from S102 is converted to text by speech recognition, and the process proceeds to S104.

In S104, it is determined whether the speech recognition result contains a character string. If the text data contains no character string, the process returns to S101; otherwise, the process proceeds to S105.

In S105, the speech recognition result (the recognized text) is sent to the search word extractor 102 as an utterance string. When the utterance string has been sent, the process ends.

FIG. 17 is a flowchart of the processing performed by the search word extractor 102.

The search word extractor 102 starts the following processing when it receives an utterance string from the speech recognizer 101.

In S201, the utterance string is received from the speech recognizer 101. When reception is complete, the process proceeds to S202.

In S202, it is determined whether the utterance string received in S201 contains a proper noun. Various methods for extracting proper nouns from text have been proposed, such as that of Non-Patent Document 3. If one or more proper nouns are contained, the process proceeds to S203; if none is contained, the process returns to S201.

In S203, one proper noun is selected at random from those contained in the utterance string and sent to the dialogue-related image searcher 103 as the search word string. When the transmission is complete, the process ends.
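Steps S202 and S203 reduce to "extract, then pick one at random". In the sketch below, proper-noun extraction (for example, the method of Non-Patent Document 3) is abstracted as a caller-supplied function; the function name and the injected `random.Random` instance are illustrative choices.

```python
import random
from typing import Callable, List, Optional


def select_search_word(utterance: str,
                       extract_proper_nouns: Callable[[str], List[str]],
                       rng: random.Random) -> Optional[str]:
    """Return one randomly chosen proper noun, or None if there is none."""
    nouns = extract_proper_nouns(utterance)
    if not nouns:
        return None             # S202: no proper noun, wait for the next utterance
    return rng.choice(nouns)    # S203: the chosen noun becomes the search word string
```

Passing the random generator in makes the choice reproducible in tests.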

FIG. 18 is a flowchart of the processing performed by the attention target determination unit 105.

In S301, the face/gaze direction input θ_face_x, θ_face_y, θ_eye_x, θ_eye_y is received from the camera 5 on the image presentation device side. Each value is 0 degrees when both the face and the gaze point straight at the camera 5. θ_face_x and θ_eye_x represent how many degrees the face direction and the gaze direction, respectively, deviate horizontally from the frontal direction of the camera 5, and θ_face_y and θ_eye_y represent the corresponding vertical deviations. When reception is complete, the process proceeds to S302.

In S302, it is determined whether the user's face/gaze direction is toward the image presentation device 2. For this determination, thresholds θ_THRESHOLD_x and θ_THRESHOLD_y are set for judging whether the face and the gaze point toward the camera that acquired the face/gaze direction; each is a positive real number between 0 and 90. Two conditional expressions are evaluated, one on the face angles (θ_face_x, θ_face_y) and one on the gaze angles (θ_eye_x, θ_eye_y), each comparing the angles against these thresholds. If both are satisfied, it is determined that the user's face/gaze direction is toward the image presentation device 2, and the process proceeds to S303. If either or both are not satisfied, the process proceeds to S304. Only one of the two conditional expressions may be used instead; that is, the determination may be based only on whether the user's face is directed toward the image presentation device 2, or only on whether the user's gaze is directed toward it.

In S303, "attention to image presentation device" is added to the attention target result storage device 110 as an attention target result, with the time of the addition used as the update time. After the addition, the process ends.

In S304, the user's face/gaze direction input θ'_face_x, θ'_face_y, θ'_eye_x, θ'_eye_y is received from the user-side camera 6. Each value is 0 degrees when both the face and the gaze point straight at the camera 6. θ'_face_x and θ'_eye_x represent how many degrees the face direction and the gaze direction, respectively, deviate horizontally from the frontal direction of the camera 6, and θ'_face_y and θ'_eye_y represent the corresponding vertical deviations. When reception is complete, the process proceeds to S305.

In S305, it is determined whether the user's face/gaze direction is toward the robot 3, using two conditional expressions on θ'_face_x, θ'_face_y and θ'_eye_x, θ'_eye_y analogous to those of S302. If both conditional expressions are satisfied, it is determined that the user's face/gaze direction is toward the robot 3, and the process proceeds to S306. If either or both are not satisfied, the process proceeds to S307. Only one of the two conditional expressions may be used instead; that is, the determination may be based only on whether the user's face is directed toward the robot 3, or only on whether the user's gaze is directed toward the robot 3.

In S306, "attention to robot" is added to the attention target result storage device 110 as an attention target result, with the time of the addition used as the update time. After the addition, the process ends.

In S307, "no attention target" is added to the attention target result storage device 110 as an attention target result, with the time of the addition used as the update time. After the addition, the process ends.
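Steps S301 to S307 can be sketched as below. The text does not reproduce the two conditional expressions, so this sketch assumes the common form |θ_x| ≤ θ_THRESHOLD_x and |θ_y| ≤ θ_THRESHOLD_y, applied once to the face angles and once to the gaze angles. It also assumes the same thresholds are used for camera 6 in S305. The `use_face`/`use_eye` switches model the variant that uses only one expression.

```python
from typing import List, Optional, Tuple

Angles = Tuple[float, float, float, float]  # (face_x, face_y, eye_x, eye_y), degrees


def facing_camera(a: Angles, th_x: float, th_y: float,
                  use_face: bool = True, use_eye: bool = True) -> bool:
    face_x, face_y, eye_x, eye_y = a
    face_ok = abs(face_x) <= th_x and abs(face_y) <= th_y  # assumed face condition
    eye_ok = abs(eye_x) <= th_x and abs(eye_y) <= th_y     # assumed gaze condition
    return (face_ok or not use_face) and (eye_ok or not use_eye)


def determine_attention(cam5: Optional[Angles], cam6: Optional[Angles],
                        th_x: float, th_y: float) -> str:
    if cam5 is not None and facing_camera(cam5, th_x, th_y):
        return "attention to image presentation device"    # S302 -> S303
    if cam6 is not None and facing_camera(cam6, th_x, th_y):
        return "attention to robot"                         # S305 -> S306
    return "no attention target"                            # S307


def record_attention(log: List[Tuple[float, str]], now: float, result: str) -> None:
    log.append((now, result))  # the time of addition is used as the update time
```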

FIG. 19 is a flowchart of the processing performed by the operation mode determination unit 106.

While there is voice input, the operation mode determination unit 106 repeats the following processing.

In S401, a new voice input of a fixed duration is received from the microphone 4. When reception is complete, the process proceeds to S402.

In S402, the attention target result log is read from the attention target result storage device 110. When the read is complete, the process proceeds to S403.

In S403, the operation mode stored in the current operation mode storage device 111 is read; this mode is hereinafter called the previous operation mode result. After the previous operation mode result has been read, the process proceeds to S404.

In S404, it is determined whether the fixed-duration voice input received from the microphone 4 contains an utterance section. If it does not, the process proceeds to S411; if it does, the process proceeds to S405.

In S405, it is determined whether the previous operation mode result read in S403 is the display image dialogue mode. If it is, the process proceeds to S407; otherwise, the process proceeds to S406.

In S406, it is determined whether the previous operation mode result read in S403 is the dialogue mode. If it is, the process proceeds to S408; otherwise, the process proceeds to S410.

In S407, the end of the user's attention to the image presentation device 2 is detected based on the attention target result log. Among the attention target results in the log whose result is "attention to image presentation device", let t_0 be the most recent update time. If the current time at which this calculation is performed is t_now, the time t_Vstop since face/gaze detection toward the displayed image last stopped is computed as the difference t_Vstop = t_now - t_0. If t_noV < t_Vstop with respect to the display-image attention-end threshold t_noV, it is determined that the user's attention to the displayed image has ended, and the process proceeds to S408. Otherwise, the process proceeds to S409.

In S408, the start of the user's attention to the image presentation device 2 is detected based on the attention target result log and the fixed-duration voice input.

First, utterance sections are extracted from the fixed-duration voice input, and each point at which the input changes from a non-utterance section to an utterance section is extracted. Let the time of each such point be t_st_i, where i is an integer giving the ordinal position of the point counted from the start of the voice input. If no such point is extracted, the process proceeds to S410. Next, let t_stop_i be the length of the non-utterance section immediately preceding each time t_st_i. Each t_stop_i is compared with the dialogue-switching interval threshold t_Tnext, and for each point i at which t_Tnext < t_stop_i, its time t_st_i is denoted T_st_j, where j is an integer giving the ordinal position, counted from the start of the voice input, among the extracted points that also satisfy t_Tnext < t_stop_i. If no point qualifies as T_st_j, the process proceeds to S410.

Next, in the attention target result log, take as the starting point D_j0 the attention target result with the most recent update time before T_st_j, and denote the results from D_j0 through the one with the latest update time in the log, in chronological order, as D_j0, D_j1, ..., D_jn. These are checked in order of update time, starting from D_j0, for whether the attention target result is "attention to image presentation device"; let D_jk be the first entry that does not satisfy this condition. If all entries satisfy the condition, k = n; if none does, k = 0. In addition, for each time T_st_j at which the input changes from a non-utterance section to an utterance section, let T_et_j be the next time at which it changes back from an utterance section to a non-utterance section; if the voice input ends after T_st_j without such a change, the end time of the voice input is taken as T_et_j. Finally, letting T_jk be the update time of D_jk, take the earlier of T_jk and T_et_j as T_j_end, so that T_j_end - T_st_j corresponds to t_VandT. If there is an extracted point j for which t_toV < T_j_end - T_st_j with respect to the display-image attention-transition threshold t_toV, the process proceeds to S409. If there is no such point, the process proceeds to S410.
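The log-based check of S408 might look like the following sketch. It follows the definition of t_VandT (the interval ends when either the gaze on the display or the dialogue stops), so it takes the earlier of T_jk and T_et_j and tests t_toV < T_j_end - T_st_j. Assumptions: utterance sections arrive as sorted `(start, end)` pairs inside the fixed window, the log is a list of `(update_time, result)` pairs sorted by time, and a dialogue start with no earlier log entry is skipped.

```python
from bisect import bisect_left
from typing import List, Tuple

ON_DISPLAY = "attention to image presentation device"


def attention_start_in_window(speech: List[Tuple[float, float]],
                              window: Tuple[float, float],
                              log: List[Tuple[float, str]],
                              t_tnext: float, t_tov: float) -> bool:
    window_start, window_end = window
    times = [t for t, _ in log]            # log must be sorted by update time
    prev_end = window_start
    for t_st, t_et in speech:              # each non-utterance -> utterance change
        t_stop = t_st - prev_end           # preceding non-utterance length t_stop_i
        prev_end = t_et
        if not t_tnext < t_stop:
            continue                       # not a dialogue start T_st_j
        t_et = min(t_et, window_end)       # T_et_j (input end if speech runs on)
        i0 = bisect_left(times, t_st) - 1  # D_j0: latest entry strictly before T_st_j
        if i0 < 0:
            continue                       # assumption: skip if no earlier entry
        entries = log[i0:]                 # D_j0, ..., D_jn
        k = next((m for m, (_, r) in enumerate(entries) if r != ON_DISPLAY),
                 len(entries) - 1)         # first failing entry, else k = n
        t_j_end = min(entries[k][0], t_et)
        if t_tov < t_j_end - t_st:
            return True
    return False
```

If the user was not looking at the display just before speaking (k = 0), T_j_end precedes T_st_j and the difference is negative, so no transition is detected.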

In S409, the current operation mode in the current operation mode storage device 111 is updated to the display image dialogue mode, and the operation mode result is sent to the robot motion generator 107 and the display image operation unit 104. When the update and transmission are complete, the process ends.

In S410, the current operation mode in the current operation mode storage device 111 is updated to the dialogue mode, and the operation mode result is sent to the robot motion generator 107 and the display image operation unit 104. When the update and transmission are complete, the process ends.

In S411, the current operation mode in the current operation mode storage device 111 is updated to the reminiscence promotion mode, and the operation mode result is sent to the robot motion generator 107 and the display image operation unit 104. When the update and transmission are complete, the process ends.
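The branching of S404 to S411 amounts to a small state-transition function. In this sketch the outcomes of the S407 and S408 checks are passed in as precomputed booleans for clarity; in the described flow, S408 is only evaluated when the branch reaches it. The function name is hypothetical.

```python
DISPLAY = "display image dialogue mode"
DIALOGUE = "dialogue mode"
REMINISCENCE = "reminiscence promotion mode"


def next_mode(prev_mode: str, has_utterance: bool,
              attention_ended: bool, attention_started: bool) -> str:
    if not has_utterance:                  # S404: no utterance section
        return REMINISCENCE                # S411
    if prev_mode == DISPLAY:               # S405 -> S407
        if not attention_ended:
            return DISPLAY                 # S409
        return DISPLAY if attention_started else DIALOGUE  # S408 -> S409 / S410
    if prev_mode == DIALOGUE:              # S406 -> S408
        return DISPLAY if attention_started else DIALOGUE
    return DIALOGUE                        # S406 -> S410 (from reminiscence promotion)
```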

FIG. 20 is a flowchart of the processing performed by the dialogue-related image searcher 103.

The dialogue-related image searcher 103 executes the following processing when it receives a search word string.

In S501, the search word string is received from the search word extractor 102. When reception is complete, the process proceeds to S502.

In S502, the search word string is sent to the external image search device 7. When the transmission is complete, the process proceeds to S503.

In S503, image data of the images found using the search word string is received from the external image search device 7. When reception is complete, the process proceeds to S504.

In S504, each piece of image data received in S503 is stored in the collected image storage device 109 as a pair with the search word string (an image search word pair). When storage is complete, the process ends.

FIG. 21 is a flowchart of the processing performed by the robot motion generator 107.

When the robot motion generator 107 receives an operation mode result, it executes the following processing.

In S601, the operation mode result is received from the operation mode determination unit 106. When reception is complete, the process proceeds to S602.

In S602, it is determined whether the operation mode result received in S601 is the display image dialogue mode. If it is, the process proceeds to S605; otherwise, the process proceeds to S603.

In S603, robot control information for making the robot 3 turn its face toward the user U is sent to the robot 3. When the transmission is complete, the process proceeds to S604. The robot 3 itself, or its gaze, may instead be turned toward the user U.

In S604, it is determined whether the operation mode result received in S601 is the dialogue mode. If it is, the process proceeds to S611; otherwise, the process proceeds to S615.

In S605, robot control information for making the robot 3 turn its face toward the image presentation device 2 is sent to the robot 3. When the transmission is complete, the process proceeds to S606. The robot 3 itself, or its gaze, may instead be turned toward the image presentation device 2.

In S606, voice input of a fixed duration is received from the microphone 4. When reception is complete, the process proceeds to S607.

In S607, the utterance sections in the voice input are first determined. A threshold t_keep_talk, a positive real number representing the time used to judge whether speech is continuing, is set in advance. If the received voice input ends with a non-utterance section of t_keep_talk or longer, the process proceeds to S609; otherwise, the process proceeds to S608.

In S608, robot control information for making the robot 3 nod (move its head vertically) is sent to the robot 3. When the transmission is complete, the process ends.

In S609, the display image search word is received from the display image operation unit 104. When reception is complete, the process proceeds to S610.

In S610, an utterance text is created using the display image search word Word_now_v received in S609. The utterance text can be created by inserting the display image search word into a fixed template, for example: "Hey, hey, I'm really interested in Word_now_v, so I'd like you to tell me a little more." Once the utterance text has been created, robot control information containing it is sent to the robot 3 so that the robot 3 synthesizes the text and outputs it as speech. When the transmission is complete, the process ends.

S611では、マイク４から音声入力を一定時間受信する。音声入力の受信が終了したら処理をS612に進める。 In S611, a voice input is received from the microphone 4 for a predetermined time. When the reception of the voice input is completed, the process proceeds to S612.

S612では、S607と同様にまず音声入力中の発話区間を求める処理を実行する。更に、S607と同様に発話継続を判定する時間を表すスレッショルドｔ_{ｋｅｅｐ＿ｔａｌｋ}を予め設定しておき、受信した音声入力の最後からｔ_{ｋｅｅｐ＿ｔａｌｋ}以上の非発話区間が存在する場合は処理をS613に進める。それ以外の場合は処理をS614に進める。 In S612, as in S607, the utterance sections in the voice input are first detected. As in S607, the threshold t_{keep_talk} giving the time used to judge whether the user is continuing to speak is set in advance. If the received voice input ends with a non-speech interval of t_{keep_talk} or longer, the process proceeds to S613. Otherwise, the process proceeds to S614.

S613では、首を縦に振る（うなずく）動作をロボット３にさせるためのロボット制御情報をロボット３に送信する。送信が完了したら処理を終了する。 In S613, robot control information that causes the robot 3 to nod (move its head vertically) is transmitted to the robot 3. When the transmission is completed, the process ends.

S614では、ロボット３にあいづちをうたせかつ発話テキストを音声合成して出力（発話）させるための発話テキストを含むロボット制御情報をロボット３に送信する。発話テキストとしては、”それで、それで？”など予め決められたものを用いる。送信が終了したら処理を終了する。 In S614, robot control information including an utterance text is transmitted to the robot 3 to make the robot 3 give a back-channel response (aizuchi) and synthesize and output (speak) the utterance text. A predetermined text such as “So, then?” is used as the utterance text. When the transmission is completed, the process ends.
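
The S612 to S614 branch can be sketched in the same way. This sketch follows the branch directions exactly as S612 states them (a trailing pause of t_{keep_talk} or longer leads to a nod, otherwise to a back-channel response plus prompt); the (start, end) section format, the function name `dialogue_mode_response`, and the message dictionaries are assumptions.

```python
from typing import List, Tuple

BACKCHANNEL_TEXT = "So, then?"  # English rendering of the predetermined text in S614


def dialogue_mode_response(segments: List[Tuple[float, float]],
                           input_duration: float,
                           t_keep_talk: float) -> dict:
    """Choose the robot's reaction after S611 (S612 -> S613 or S614)."""
    last_end = max((end for _, end in segments), default=0.0)
    silence = max(0.0, input_duration - last_end)  # trailing non-speech interval
    if silence >= t_keep_talk:
        # S613: nod only
        return {"action": "nod"}
    # S614: back-channel response plus the predetermined prompt
    return {"action": "backchannel_and_speak", "text": BACKCHANNEL_TEXT}
```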

S615では、まず、回想開始質問文記憶装置１０８から回想開始質問文を質問文文字列として読み出す。その際は最終呼び出し時刻を参照しnullのものの回想開始質問文を読み出す。nullのものが存在しなかった場合は、回想開始質問文記憶装置１０８内において最も古い最終呼び出し時刻をもつ回想開始質問文を読み出す。読み出したら、その回想開始質問文のIDの最終呼び出し時刻を現在時刻の値に変更するように参照情報更新を回想開始質問文記憶装置１０８に送信する。送信が完了したら処理をS616に進める。 In S615, a recollection start question sentence is first read from the recollection start question sentence storage device 108 as a question sentence character string. The last call time is consulted, and a recollection start question sentence whose last call time is null is read. If no such sentence exists, the recollection start question sentence with the oldest last call time in the storage device 108 is read instead. After reading, a reference information update that changes the last call time for that question sentence's ID to the current time is transmitted to the recollection start question sentence storage device 108. When the transmission is completed, the process proceeds to S616.
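
The selection rule in S615 amounts to a least-recently-used choice in which never-asked questions come first. A minimal sketch, assuming an in-memory list stands in for the storage device 108 and `None` stands for a null last call time; `QuestionRecord` and `pick_question` are hypothetical names, and which null entry to take when several exist is not specified in the source (this sketch takes the first).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class QuestionRecord:
    qid: int
    text: str
    last_called: Optional[datetime] = None  # None plays the role of "null"


def pick_question(store: List[QuestionRecord], now: datetime) -> QuestionRecord:
    """S615: prefer a never-called question, else the one called longest ago."""
    if not store:
        raise ValueError("question store is empty")
    never_called = [q for q in store if q.last_called is None]
    if never_called:
        chosen = never_called[0]
    else:
        chosen = min(store, key=lambda q: q.last_called)  # oldest last call time
    chosen.last_called = now  # reference information update
    return chosen
```

Repeated calls therefore cycle through the stored questions, so the robot does not open consecutive sessions with the same question.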

S616では、S615で読み出した質問文文字列をロボット３に発話テキストとして音声合成させ出力（発話）させるための質問文文字列を含むロボット制御情報をロボット３に送信する。送信が終了したら処理を終了する。 In S616, the robot control information including the question sentence character string for causing the robot 3 to synthesize and output (speak) the question sentence character string read in S615 as the utterance text is transmitted to the robot 3. When the transmission is completed, the process ends.

図２２は、表示画像操作器１０４の処理のフローチャートである。 FIG. 22 is a flowchart of processing of the display image operation unit 104.

表示画像操作器１０４は動作モード結果を受信した場合、以下の処理を実行する。 When the display image operation unit 104 receives an operation mode result, it executes the following processing.

S701では、動作モード判定器１０６から動作モード結果を受信する。受信が完了したら処理をS702に進める。 In S701, the operation mode result is received from the operation mode determiner 106. If reception is completed, the process proceeds to S702.

S702では、S701で受信した動作モード結果が表示画像対話モードか否かを判定する。動作モード結果が表示画像対話モードであった場合は処理をS703に進める。それ以外の場合は処理をS705に進める。 In S702, it is determined whether the operation mode result received in S701 is the display image dialogue mode. If it is, the process proceeds to S703. Otherwise, the process proceeds to S705.

S703では、収集画像記憶装置１０９を参照し、現在表示フラグがtrueの収集画像情報を読み出す。収集画像情報を読み出したら処理をS704に進める。 In S703, the collected image storage device 109 is referred to, and the collected image information whose current display flag is true is read. When the collected image information is read, the process proceeds to S704.

S704では、S703で読み出した収集画像情報に含まれる画像関連単語を表示画像検索単語としてロボット動作生成器１０７に送信する。送信が終了したら処理を終了する。 In S704, the image related word included in the collected image information read in S703 is transmitted to the robot motion generator 107 as a display image search word. When the transmission is completed, the process ends.

S705では、S701で受信した動作モード結果が対話モードであるか否かを判定する。動作モード結果が対話モードであった場合は処理をS706に進める。それ以外の場合は処理を終了する． In S705, it is determined whether the operation mode result received in S701 is the dialogue mode. If it is, the process proceeds to S706. Otherwise, the process ends.

S706では、収集画像記憶装置１０９を参照し、現在表示フラグがtrueの収集画像情報を読み出す。読み出しが終了したら処理をS707に進める。 In S706, the collected image storage device 109 is referred to, and the collected image information whose current display flag is true is read. When the reading is finished, the process proceeds to S707.

S707では、収集画像情報の画像表示時刻と現在時刻を比較し、現在時刻が画像表示時刻から一定時間以上経過しているか否かを判定する。一定時間以上経過している場合は処理をS708に処理を進める。経過していない場合は処理をS713に進める。 In S707, the image display time of the collected image information is compared with the current time to determine whether a predetermined time or more has elapsed since the image was displayed. If it has, the process proceeds to S708. If not, the process proceeds to S713.

S708では、収集画像記憶装置１０９を参照し、現在時刻から画像記録時刻が一定時間以内でかつ画像表示時刻がnullの収集画像情報を探索する。検索が終了したなら処理をS709に進める。 In S708, the collected image storage device 109 is referenced to search for collected image information whose image recording time is within a predetermined time from the current time and whose image display time is null. If the search ends, the process proceeds to S709.

S709では、S708の条件に満たす収集画像情報が１つ以上あるか否かを判定する。取得された収集画像情報が１つもなかった場合は処理をS713に進める。それ以外の場合は処理をS710に進める。 In S709, it is determined whether at least one item of collected image information satisfies the condition of S708. If none was obtained, the process proceeds to S713. Otherwise, the process proceeds to S710.

S710では、S708で取得された収集画像情報の中からランダムで１つを選択する。選択が終了したら処理をS711に進める。 In S710, one is randomly selected from the collected image information acquired in S708. When the selection is completed, the process proceeds to S711.

S711では、S710で選択された収集画像情報の画像データを画像表示命令としてロボット動作生成器１０７に送信する。送信が完了したら処理をS712に進める。 In S711, the image data of the collected image information selected in S710 is transmitted to the robot motion generator 107 as an image display command. When the transmission is completed, the process proceeds to S712.

S712では、まず収集画像記憶装置１０９の現在表示フラグがtrueとなっている情報をfalseに変更し、S710で選択された収集画像情報の現在表示フラグをtrueに変更する画像情報更新命令を収集画像記憶装置１０９に送信する。さらにS710で選択された収集画像情報の画像表示時刻を現在時刻に変更する画像情報更新命令を送信する．送信が終了したら処理を終了する。 In S712, an image information update command is first transmitted to the collected image storage device 109 that changes every current display flag in the storage device 109 that is true to false and changes the current display flag of the collected image information selected in S710 to true. In addition, an image information update command that changes the image display time of the collected image information selected in S710 to the current time is transmitted. When the transmission is completed, the process ends.

S713では、S706で読み出した収集画像情報の画像関連単語を表示画像検索単語としてロボット動作生成器１０７に送信する。送信が終了したら処理を終了する。 In S713, the image related words of the collected image information read out in S706 are transmitted to the robot motion generator 107 as display image search words. When the transmission is completed, the process ends.
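
The FIG. 22 flow (S701 to S713) can be condensed into one function. This is a sketch under stated assumptions: the mode strings, the in-memory `CollectedImage` records standing in for the storage device 109, the `hold` and `recent` durations (the source's unspecified "certain time" values), and the returned message dictionaries are all hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CollectedImage:
    image_data: bytes
    related_words: List[str]                 # image-related words
    recorded_at: datetime                    # image recording time
    displayed_at: Optional[datetime] = None  # image display time; None stands for null
    current: bool = False                    # current display flag


def operate_display_image(mode: str, images: List[CollectedImage], now: datetime,
                          hold: timedelta, recent: timedelta,
                          rng: random.Random) -> Optional[dict]:
    """One pass of the FIG. 22 flow; returns the message for robot motion generator 107."""
    current = next((img for img in images if img.current), None)  # S703 / S706

    if mode == "display_image_dialogue":  # S702
        # S704: pass the words of the image on screen as display image search words
        return {"search_words": current.related_words} if current else None

    if mode != "dialogue":  # S705: any other mode ends the process
        return None

    # S707: has the current image been on screen for `hold` or longer?
    # (Treating "no current image" as expired is an assumption.)
    expired = (current is None or current.displayed_at is None
               or now - current.displayed_at >= hold)
    if expired:
        # S708: images recorded within `recent` of now that have never been displayed
        candidates = [img for img in images
                      if img.displayed_at is None and now - img.recorded_at <= recent]
        if candidates:                            # S709
            chosen = rng.choice(candidates)       # S710: random pick
            for img in images:                    # S712: move the current display flag
                img.current = False
            chosen.current = True
            chosen.displayed_at = now             # S712: record the display time
            return {"display_image": chosen.image_data}  # S711 (sent last in this sketch)

    # S713: keep the current image and pass its words on
    return {"search_words": current.related_words} if current else None
```

Passing a seeded `random.Random` keeps the S710 random choice reproducible in tests.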

以上のように、本実施の形態に係る制御装置１は、ユーザＵにより使用される画像提示装置２およびロボット３を制御する制御装置であって、ユーザＵによる発話およびユーザＵが向ける注意の方向に基づいて、制御装置１の動作モードを回想促進モード、対話モードおよび表示画像対話モードの中で切り替え、画像提示装置２およびロボット３を連動させて制御することを特徴とする。 As described above, the control device 1 according to the present embodiment is a control device that controls the image presentation device 2 and the robot 3 used by the user U. Based on the utterance by the user U and the direction of the user U's attention, it switches the operation mode of the control device 1 among the recollection promotion mode, the dialogue mode, and the display image dialogue mode, and controls the image presentation device 2 and the robot 3 in conjunction with each other.

具体的には、回想促進モードのときは、ロボット３がユーザＵに向くように制御し、ユーザＵに回想を開始させるための質問文をロボットＵが発話するように制御し、対話モードのときは、ユーザＵの発話から得た検索単語列により検索した画像を画像提示装置２が切り替えて表示するように制御し、ロボット３がユーザＵに向くように制御し、ユーザＵに傾聴している動作をロボット３が実行するように制御し、表示画像対話モードのときは、ユーザＵの発話から得た検索単語列により検索した画像を画像提示装置２が継続して表示するように制御し、継続して表示される画像を表示する画像提示装置２の方にロボット３が向くように制御し、検索単語列に応じた発話テキストをロボット３が発話するように制御する。 Specifically, in the recollection promotion mode, the robot 3 is controlled to face the user U and to speak a question sentence that prompts the user U to start recollecting. In the dialogue mode, the image presentation device 2 is controlled to switch between and display images retrieved with the search word string obtained from the user U's utterance, and the robot 3 is controlled to face the user U and to perform a listening action toward the user U. In the display image dialogue mode, the image presentation device 2 is controlled to keep displaying an image retrieved with the search word string obtained from the user U's utterance, the robot 3 is controlled to face the image presentation device 2 that is displaying that image, and the robot 3 is controlled to speak an utterance text corresponding to the search word string.

さらに具体的には、回想促進モードのときは、ユーザＵの発話を検知した場合は、対話モードに遷移し、対話モードのときは、ユーザＵが画像提示装置に注意を向けることを開始したと検知した場合は、表示画像対話モードに遷移する一方、ユーザＵの発話が一定時間以上ないことを検知した場合は、回想促進モードに遷移し、表示画像対話モードのときは、ユーザＵが画像提示装置２に注意を向けることを終了したと検知した場合は、対話モードに遷移する一方、ユーザＵの発話が一定時間以上ないことを検知した場合は、回想促進モードに遷移する。 More specifically, in the recollection promotion mode, the mode transitions to the dialogue mode when an utterance by the user U is detected. In the dialogue mode, the mode transitions to the display image dialogue mode when it is detected that the user U has started paying attention to the image presentation device 2, and to the recollection promotion mode when it is detected that the user U has not spoken for a certain period of time or longer. In the display image dialogue mode, the mode transitions to the dialogue mode when it is detected that the user U has stopped paying attention to the image presentation device 2, and to the recollection promotion mode when it is detected that the user U has not spoken for a certain period of time or longer.
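
The transitions above form a small state machine. A minimal sketch, assuming the detections arrive as booleans at each step; the `Mode` enum and `next_mode` are hypothetical names, and when the attention event and the silence event occur together the source does not say which wins (this sketch checks attention first).

```python
from enum import Enum


class Mode(Enum):
    RECOLLECTION_PROMOTION = "recollection promotion"
    DIALOGUE = "dialogue"
    DISPLAY_IMAGE_DIALOGUE = "display image dialogue"


def next_mode(mode: Mode, user_spoke: bool, silent_too_long: bool,
              attention_started: bool, attention_ended: bool) -> Mode:
    """One transition step of the operation mode determination (events assumed boolean)."""
    if mode is Mode.RECOLLECTION_PROMOTION:
        return Mode.DIALOGUE if user_spoke else mode
    if mode is Mode.DIALOGUE:
        if attention_started:           # user began looking at the display
            return Mode.DISPLAY_IMAGE_DIALOGUE
        if silent_too_long:             # no utterance for the set time
            return Mode.RECOLLECTION_PROMOTION
        return mode
    # Mode.DISPLAY_IMAGE_DIALOGUE
    if attention_ended:                 # user stopped looking at the display
        return Mode.DIALOGUE
    if silent_too_long:
        return Mode.RECOLLECTION_PROMOTION
    return mode
```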

さらに具体的には、対話モードにおいて、ユーザＵによる対話が停止している区間の長さ（ｔ_{Ｔｓｔｏｐ}）が、予め定められた第１の閾値（ｔ_{Ｔｎｅｘｔ}）より長いという第１の条件と、ユーザＵの顔または視線の少なくとも一方が画像提示装置２に向いており且つユーザＵによる対話が継続している区間の長さ（ｔ_{ＶａｎｄＴ}）が、予め定められた第２の閾値（ｔ_ｔｏＶ）より長いという第２の条件とが充足した場合に、ユーザＵが画像提示装置２に注意を向けることを開始したと判定する。 More specifically, in the dialogue mode, it is determined that the user U has started paying attention to the image presentation device 2 when both of the following are satisfied: a first condition that the length t_{Tstop} of a section in which the user U's dialogue is stopped is longer than a predetermined first threshold t_{Tnext}, and a second condition that the length t_{VandT} of a section in which at least one of the user U's face or line of sight is directed toward the image presentation device 2 while the user U's dialogue continues is longer than a predetermined second threshold t_{toV}.
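
The start-of-attention test is a conjunction of two threshold comparisons. A minimal sketch, assuming all four quantities are durations in seconds that have already been measured; the function name `attention_started` is hypothetical. Both comparisons are strict because the source says "longer than".

```python
def attention_started(t_Tstop: float, t_VandT: float,
                      t_Tnext: float, t_toV: float) -> bool:
    """Dialogue mode: has the user started paying attention to the image presentation device?"""
    first_condition = t_Tstop > t_Tnext   # dialogue-stopped section exceeds the first threshold
    second_condition = t_VandT > t_toV    # facing the display while talking exceeds the second threshold
    return first_condition and second_condition
```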

また、表示画像対話モードにおいて、ユーザＵの顔または視線の少なくとも一方が画像提示装置２に向いている状態が終了したことを検知してからの時間の長さ（ｔ_{Ｖｓｔｏｐ}）が、予め定められた閾値（ｔ_ｎｏＶ）より長いという条件が充足した場合に、ユーザＵが画像提示装置２に注意を向けることを終了したと判定する。 In the display image dialogue mode, it is determined that the user U has stopped paying attention to the image presentation device 2 when the length of time t_{Vstop} since it was detected that at least one of the user U's face or line of sight had stopped being directed toward the image presentation device 2 is longer than a predetermined threshold t_{noV}.
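
The end-of-attention test compares the time elapsed since the user looked away with t_{noV}. A minimal sketch, assuming timestamps in seconds and `None` meaning the face/gaze state toward the display has not ended; the function name `attention_ended` and its arguments are hypothetical.

```python
from typing import Optional


def attention_ended(looking_ended_at: Optional[float], now: float, t_noV: float) -> bool:
    """Display image dialogue mode: has the user stopped paying attention to the display?"""
    if looking_ended_at is None:  # face or gaze is still directed at the display
        return False
    t_Vstop = now - looking_ended_at  # time since the "facing the display" state ended
    return t_Vstop > t_noV
```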

制御装置１によれば、これらの技術の１つまたは組み合わせにより、回想療法に必要な事前準備や実施の時間を低減することができる。 With the control device 1, any one of these techniques, or a combination of them, can reduce the time required to prepare and conduct reminiscence therapy.

なお、制御装置１としてコンピュータを機能させるためのコンピュータプログラムは、半導体メモリ、磁気ディスク、光ディスク、光磁気ディスク、磁気テープなどのコンピュータ読み取り可能な記録媒体に記録でき、また、インターネットなどの通信網を介して伝送させて、広く流通させることができる。 A computer program for causing a computer to function as the control device 1 can be recorded on a computer-readable recording medium such as a semiconductor memory, a magnetic disk, an optical disk, a magneto-optical disk, or a magnetic tape, or transmitted via a communication network such as the Internet, so that it can be widely distributed.

DESCRIPTION OF SYMBOLS
１制御装置 1 Control apparatus
２画像提示装置 2 Image presentation apparatus
３ロボット 3 Robot
４マイク 4 Microphone
５、６カメラ 5, 6 Camera
７外部画像検索装置 7 External image search apparatus
１０１音声認識器 101 Speech recognizer
１０２検索単語抽出器 102 Search word extractor
１０３対話関連画像検索器 103 Dialogue-related image search unit
１０４表示画像操作器 104 Display image operation unit
１０５注意対象判定器 105 Attention target determination unit
１０６動作モード判定器 106 Operation mode determiner
１０７ロボット動作生成器 107 Robot motion generator
１０８回想開始質問文記憶装置 108 Recollection start question sentence storage device
１０９収集画像記憶装置 109 Collected image storage device
１１０注意対象結果記憶装置 110 Attention target result storage device

Claims

A control device for controlling an image presentation device and a robot used by a user, the control device comprising:
means for switching an operation mode of the control device among a recollection promotion mode, a dialogue mode, and a display image dialogue mode based on an utterance by the user and a direction of attention of the user, and for controlling the image presentation device and the robot in conjunction with each other.

The control device according to claim 1, wherein:
in the recollection promotion mode, the robot is controlled to face the user and to speak a question sentence for causing the user to start recollection;
in the dialogue mode, the image presentation device is controlled to switch between and display images retrieved with a search word string obtained from the user's utterance, and the robot is controlled to face the user and to perform an action of listening to the user; and
in the display image dialogue mode, the image presentation device is controlled to keep displaying an image retrieved with a search word string obtained from the user's utterance, the robot is controlled to face the image presentation device displaying that image, and the robot is controlled to speak an utterance text corresponding to the search word string.

The control device according to claim 1 or 2, wherein:
in the recollection promotion mode, the operation mode transitions to the dialogue mode when an utterance by the user is detected;
in the dialogue mode, the operation mode transitions to the display image dialogue mode when it is detected that the user has started paying attention to the image presentation device, and transitions to the recollection promotion mode when it is detected that the user has not spoken for a certain period of time or longer; and
in the display image dialogue mode, the operation mode transitions to the dialogue mode when it is detected that the user has stopped paying attention to the image presentation device, and transitions to the recollection promotion mode when it is detected that the user has not spoken for a certain period of time or longer.

The control device according to claim 3, wherein, in the dialogue mode,
it is determined that the user has started paying attention to the image presentation device when both a first condition, that the length of a section in which the user's dialogue is stopped is longer than a predetermined first threshold, and a second condition, that the length of a section in which at least one of the user's face or line of sight is directed toward the image presentation device and the user's dialogue continues is longer than a predetermined second threshold, are satisfied.

The control device according to claim 3, wherein, in the display image dialogue mode,
it is determined that the user has stopped paying attention to the image presentation device when the length of time since it was detected that the state in which at least one of the user's face or line of sight is directed toward the image presentation device has ended is longer than a predetermined threshold.

An operation method of a control device for controlling an image presentation device and a robot used by a user, wherein
the control device switches an operation mode of the control device among a recollection promotion mode, a dialogue mode, and a display image dialogue mode based on an utterance by the user and a direction of attention of the user, and controls the image presentation device and the robot in conjunction with each other.

A computer program for causing a computer to function as the control device according to any one of claims 1 to 5.