JP2020106746A

JP2020106746A - Control device, control method, control program, and interactive device

Info

Publication number: JP2020106746A
Application number: JP2018247545A
Authority: JP
Inventors: 浩志和田; Hiroshi Wada
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-09

Abstract

To provide an interactive device which can respond to a user at an appropriate timing. SOLUTION: A control device (20) which controls an interactive device (1) includes: an utterance appropriate value management unit (23) which updates an utterance appropriate value indicating how appropriate it is for the interactive device (1) to speak, based on the content of an operation performed by a user on the interactive device (1), the content of an utterance made by the user to the interactive device (1), or the content of a response made by the interactive device (1) to the user; and an utterance appropriateness determination unit (22b) which determines whether or not it is appropriate for the interactive device (1) to speak to the user according to the utterance appropriate value updated by the utterance appropriate value management unit (23). SELECTED DRAWING: Figure 1

Description

The present invention relates to a control device, a control method, and a control program for controlling a dialogue device that interacts with a user. The present invention also relates to a dialogue device that interacts with a user.

In recent years, dialogue devices (smartphones, robots, and the like) that interact with users by performing voice recognition and language processing have been actively developed. For example, a robot has been proposed that stores, for each of a plurality of users who have used the robot in the past, the user's operation history and the history of dialogue between that user and the robot in a storage unit, together with an emotion value of the robot (see Patent Document 1).

The above robot is provided with person identifying means (for example, face image recognition means or voice recognition means) for identifying, from among the plurality of users, the user it has encountered. When the person identifying means identifies a user, the robot generates a new emotion value by referring to its current emotion value and to the operation history, dialogue history, and emotion value stored in the storage unit for that user. This allows the robot to behave differently for each user.

Patent Document 1: JP-A-2003-117866 (published April 23, 2003)

However, in the robot described in Patent Document 1, the timing at which the robot speaks to the user is not optimized. As a result, the robot may fail to speak at a timing that is appropriate for the user.

The present invention has been made in view of the above problem, and its object is to provide a control device, a control method, and a control program for realizing a dialogue device that can speak at a timing appropriate for the user.

In order to solve the above problem, a control device according to one aspect of the present invention is a control device that controls a dialogue device that interacts with a user, and includes: an utterance appropriate value management unit that updates an utterance appropriate value, which indicates how appropriate it is for the dialogue device to speak, based on the content of an operation performed by the user on the dialogue device, the content of an utterance made by the user to the dialogue device, or the content of a response made by the dialogue device to the user; and an utterance appropriateness determination unit that determines whether or not it is appropriate for the dialogue device to speak to the user according to the utterance appropriate value updated by the utterance appropriate value management unit.

In order to solve the above problem, a control method according to one aspect of the present invention is a control method for controlling a dialogue device that interacts with a user, and includes: an utterance appropriate value update process that updates an utterance appropriate value, which indicates how appropriate it is for the dialogue device to speak, based on the content of an operation performed by the user on the dialogue device, the content of an utterance made by the user to the dialogue device, or the content of a response made by the dialogue device to the user; and an utterance appropriateness determination process that determines whether or not it is appropriate for the dialogue device to speak to the user according to the utterance appropriate value updated in the utterance appropriate value update process.

According to one aspect of the present invention, the dialogue device can speak to the user at a timing appropriate for the user.

FIG. 1 is a block diagram showing a schematic configuration of the dialogue device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the initial values of the utterance appropriate value in the dialogue device according to Embodiment 1.
FIG. 3(a) is a diagram showing the relationship between the content of a user operation and the change amount of the utterance appropriate value in the dialogue device according to Embodiment 1, FIG. 3(b) is a diagram showing the relationship between the content of a user utterance and the change amount of the utterance appropriate value, and FIG. 3(c) is a diagram showing the relationship between the content of a response by the dialogue device and the change amount of the utterance appropriate value.
FIG. 4 is a graph showing an example of the change over time of the utterance appropriate value in the dialogue device according to Embodiment 1.
FIG. 5 is a flowchart showing the flow of utterance processing in the dialogue device according to Embodiment 1.
FIG. 6 is a flowchart showing the flow of utterance processing in the dialogue device according to Embodiment 2 of the present invention.

[Embodiment 1]
Hereinafter, the dialogue device 1 according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 5. For convenience of description, members having the same function are denoted by the same reference numerals, and their description is omitted as appropriate.

<Configuration of the dialogue device>
The configuration of the dialogue device 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing a schematic configuration of the dialogue device 1 according to Embodiment 1. The dialogue device 1 is a device (electronic device) that interacts with a user, and may be realized as a non-humanoid device such as a smartphone or as a humanoid device such as a robot.

As shown in FIG. 1, the dialogue device 1 includes an input unit 10, a control unit 20 (an example of the "control device" in the claims), a storage unit 30, and an output unit 40. The dialogue device 1 may further include a communication unit (not shown) that communicates with the outside through a communication network. As the communication network, for example, the Internet, a telephone line network, a mobile communication network, a CATV (Cable Television) communication network, or a satellite communication network can be used.

(Input unit)
The input unit 10 is a device for inputting information to the dialogue device 1. As shown in FIG. 1, for example, the input unit 10 can be configured by a microphone 11, a camera 12, a touch panel 13, and a sensor 14.

The microphone 11 acquires sound around the dialogue device 1 and provides voice data representing the acquired sound to the control unit 20. For example, when a user who is speaking is near the dialogue device 1, the sound represented by the voice data provided to the control unit 20 includes that user's speech.

The camera 12 acquires an image of the surroundings of the dialogue device 1 and provides image data representing the acquired image to the control unit 20. For example, when a user is near the dialogue device 1, the image represented by the image data provided to the control unit 20 includes an image of that user.

The touch panel 13 receives touch operations by the user and provides operation information representing the received touch operation to the control unit 20. Operations that the touch panel 13 can receive include, for example, touch operations that instruct the dialogue device 1 to launch an application, wake from sleep, or interrupt an utterance. Instead of the touch panel 13, a push-button operation device (for example, a physical button) may be used.

The sensor 14 detects the state of the dialogue device 1 or the state of its surroundings, and provides state information representing the detected state to the control unit 20. Examples of the state of the dialogue device 1 include vibration or acceleration of the dialogue device 1. In this case, for example, a vibration sensor or an acceleration sensor is used as the sensor 14. An example of the state of the surroundings of the dialogue device 1 is the presence or absence of a user near the dialogue device 1. In this case, for example, a human presence sensor or a proximity sensor is used as the sensor 14.

(Control unit)
The control unit 20 is a device for controlling the dialogue device 1. The control unit 20 is realized by, for example, a CPU (Central Processing Unit). As shown in FIG. 1, for example, the control unit 20 functions as an input control unit 21, an output control unit 22, and an utterance appropriate value management unit 23. The program that causes the control unit 20 to function as the input control unit 21, the output control unit 22, and the utterance appropriate value management unit 23 may be stored in the storage unit 30 or may be supplied from the outside via the communication unit (not shown).

The input control unit 21 processes information input to the dialogue device 1 through the input unit 10. As shown in FIG. 1, for example, the input control unit 21 can be configured by a voice recognition unit 21a, an image recognition unit 21b, and an event processing unit 21c.

The voice recognition unit 21a executes a detection process that detects the user's speech in the sound represented by the voice data provided by the microphone 11. The speech detected by this detection process may be that of a specific user or that of an unspecified user. When the detection process succeeds, the voice recognition unit 21a executes a conversion process that converts the detected speech into utterance content (for example, text data representing the utterance content as text). This conversion process can be realized by, for example, known STT (Speech To Text) technology. When the conversion process succeeds, the voice recognition unit 21a provides the obtained utterance content to the output control unit 22 and the utterance appropriate value management unit 23. The voice recognition unit 21a also notifies the utterance appropriate value management unit 23 of the success or failure of the detection process and of the conversion process.

The image recognition unit 21b executes a detection process that detects an image of the user in the image represented by the image data provided by the camera 12. The user image detected by this detection process may be that of a specific user or that of an unspecified user. The image recognition unit 21b notifies the output control unit 22 of the success or failure of this detection process.

The event processing unit 21c executes an identification process that identifies the operation content based on the operation information provided by the touch panel 13. When the identification process succeeds, the event processing unit 21c notifies the utterance appropriate value management unit 23 of the identified operation content. In addition, based on the state information provided by the sensor 14, the event processing unit 21c detects an event that triggers an utterance by the dialogue device 1 (hereinafter referred to as an "utterance trigger event"). In the present embodiment, vibration of the dialogue device 1 caused by the user touching the dialogue device 1 is treated as the utterance trigger event. When the event processing unit 21c detects an utterance trigger event, it notifies the output control unit 22 to that effect.

The output control unit 22 causes the dialogue device 1 to react according to the information input to the dialogue device 1 and processed by the input control unit 21. As shown in FIG. 1, for example, the output control unit 22 can be configured by a response generation unit 22a, an utterance appropriateness determination unit 22b, a voice output control unit 22c, and a display control unit 22d.

The response generation unit 22a (1) determines the utterance content and display content of the dialogue device 1 when the user speaks to the dialogue device 1 and when an utterance trigger event occurs, (2) notifies the utterance appropriate value management unit 23 of the determined utterance content, (3) provides voice data representing the determined utterance content to the voice output control unit 22c, and (4) provides image data representing the determined display content to the display control unit 22d. When the user speaks to the dialogue device 1, the voice recognition unit 21a provides the user's utterance content to the response generation unit 22a. In this case, the response generation unit 22a determines the utterance content and display content of the dialogue device 1 according to the user's utterance content provided by the voice recognition unit 21a. When an utterance trigger event occurs, on the other hand, the event processing unit 21c notifies the response generation unit 22a to that effect. In this case, the response generation unit 22a determines the utterance content and display content of the dialogue device 1 according to the situation at that time.

When an utterance trigger event occurs, the utterance appropriateness determination unit 22b determines whether or not it is appropriate for the dialogue device 1 to speak, according to the utterance appropriate value stored in the storage unit 30. The utterance appropriate value is a numerical value indicating how appropriate it is for the dialogue device 1 to speak: it takes a relatively large value when speaking is appropriate and a relatively small value when speaking is inappropriate. The utterance appropriateness determination unit 22b makes the determination by comparing the utterance appropriate value with a predetermined threshold. Specifically, when the utterance appropriate value is equal to or greater than the threshold (or greater than the threshold), it determines that speaking is appropriate; when the utterance appropriate value is less than the threshold (or equal to or less than the threshold), it determines that speaking is inappropriate.
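The threshold comparison described above can be sketched as follows. This is only an illustrative sketch: the function name, the `inclusive` flag (which selects between the "equal to or greater than" and "greater than" variants), and the default threshold of 50 are assumptions, since the text does not give a concrete threshold.

```python
def is_utterance_appropriate(value: int, threshold: int = 50,
                             inclusive: bool = True) -> bool:
    """Return True when the dialogue device may speak.

    threshold=50 is a hypothetical default; the source only says the
    threshold is predetermined.
    """
    if inclusive:
        # Variant: appropriate when value >= threshold.
        return value >= threshold
    # Variant: appropriate when value > threshold.
    return value > threshold
```

With the inclusive variant, a value exactly at the threshold permits an utterance; with the exclusive variant it does not.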

Based on the voice data provided by the response generation unit 22a, the voice output control unit 22c controls the speaker 41 included in the output unit 40 so that the utterance content determined by the response generation unit 22a is output audibly. Based on the image data provided by the response generation unit 22a, the display control unit 22d controls the display 42 included in the output unit 40 so that the display content determined by the response generation unit 22a is output visibly.

The utterance appropriate value management unit 23 manages the utterance appropriate value stored in the storage unit 30 based on the user's utterance content provided by the voice recognition unit 21a and the user's operation content provided by the event processing unit 21c. How the utterance appropriate value management unit 23 manages the utterance appropriate value will be described later with reference to other drawings.

(Storage unit)
The storage unit 30 is a device for storing information in the dialogue device 1. The storage unit 30 is realized by, for example, a semiconductor memory. In addition to the utterance appropriate value described above, the storage unit 30 stores an operation content-change amount table, an utterance content-change amount table, and a response content-change amount table. The contents of these tables will be described later with reference to other drawings.

(Output unit)
The output unit 40 is a device for outputting information from the dialogue device 1. As shown in FIG. 1, for example, the output unit 40 is realized by a speaker 41 and a display 42.

Under the control of the voice output control unit 22c, the speaker 41 audibly outputs the utterance content determined by the response generation unit 22a. Under the control of the display control unit 22d, the display 42 visibly outputs the display content determined by the response generation unit 22a. The display 42 of the output unit 40 may be integrated with the touch panel 13 of the input unit 10.

<Utterance appropriate value and its management>
Next, the utterance appropriate value and its management will be described. The utterance appropriate value is a numerical value indicating how appropriate it is for the dialogue device 1 to speak at a given point in time. The utterance appropriate value is updated by the utterance appropriate value management unit 23 when the user operates the dialogue device 1 and when the user and the dialogue device 1 have a dialogue.

In the present embodiment, the utterance appropriate value is an integer from 0 to 100 inclusive. The utterance appropriate value is 0 when it is least appropriate for the dialogue device 1 to speak and 100 when it is most appropriate. In the present embodiment, the utterance appropriate value is managed individually for each of a plurality of predetermined users, and also individually for each of a plurality of predetermined time zones.

(Initial value of the utterance appropriate value)
The initial value of the utterance appropriate value will be described with reference to FIG. 2. FIG. 2 is a table showing the initial values of the utterance appropriate value.

The table shown in FIG. 2 stores the initial values of the utterance appropriate value for user A, user B, user C, and so on, in the time zones 0:00 to 4:00, 4:00 to 8:00, 8:00 to 12:00, and so on.

For example, the initial utterance appropriate value for user A in the 8:00 to 12:00 time zone is 50. In this time zone, user A is assumed to be receptive to utterances by the dialogue device 1, for reasons such as often being at home, so the initial value is set to this relatively high value. In contrast, the initial utterance appropriate value for user A in the 0:00 to 4:00 time zone is 0. In this time zone, user A is assumed to be unreceptive to utterances by the dialogue device 1, for reasons such as often being asleep, so the initial value is set to this relatively low value.
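A lookup of the initial value by user and time zone could look like the sketch below. The names `INITIAL_VALUES`, `time_zone_start`, and `initial_value` are hypothetical, the four-hour slot width is inferred from the three time zones named in the text, and only the two values the text states for user A are filled in; the other cells of FIG. 2 are not reproduced here, so the lookup returns `None` for them.

```python
from typing import Optional

# Initial values keyed by user, then by the starting hour of the time zone.
# Only the two entries stated in the text are included.
INITIAL_VALUES = {
    "A": {0: 0, 8: 50},
}

def time_zone_start(hour: int, width: int = 4) -> int:
    """Map an hour (0-23) to the starting hour of its time zone.

    width=4 assumes that all time zones are four hours long.
    """
    return (hour // width) * width

def initial_value(user: str, hour: int) -> Optional[int]:
    """Return the initial value, or None when the text does not give it."""
    return INITIAL_VALUES.get(user, {}).get(time_zone_start(hour))
```

For example, `initial_value("A", 9)` falls in the 8:00 to 12:00 time zone and `initial_value("A", 3)` in the 0:00 to 4:00 time zone.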

(Updating the utterance appropriate value)
The utterance appropriate value management unit 23 executes an update process that updates the utterance appropriate value when the user operates the dialogue device 1. This update process is executed, for example, as follows. When the user operates the dialogue device 1, the event processing unit 21c notifies the utterance appropriate value management unit 23 of the operation content. The utterance appropriate value management unit 23 then updates the utterance appropriate value according to the notified operation content.

In the present embodiment, an operation content-change amount table such as the one shown in FIG. 3(a) is stored in the storage unit 30. The utterance appropriate value management unit 23 obtains the updated utterance appropriate value by adding, to the value before the update, the change amount that the table associates with the notified operation content. For example, the change amount for the operation content "launch an application" is +5. Accordingly, when the user operates the dialogue device 1 to launch an application, the utterance appropriate value management unit 23 increases the utterance appropriate value by 5. Likewise, the change amount for the operation content "interrupt the utterance" is -1. Accordingly, when the user operates the dialogue device 1 to interrupt an utterance, the utterance appropriate value management unit 23 decreases the utterance appropriate value by 1.
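The table-driven update for operations can be sketched as below. The dictionary keys, function names, and the handling of unknown operations (no change) are assumptions for illustration. Clamping the result to the 0 to 100 range is also an assumption: the text states that the value is an integer from 0 to 100 but does not say how out-of-range results are handled.

```python
# Change amounts for the two operations named in the text (FIG. 3(a)).
OPERATION_DELTAS = {
    "launch_app": +5,
    "interrupt_utterance": -1,
}

def clamp(value: int, low: int = 0, high: int = 100) -> int:
    """Keep the value inside the 0-100 range (assumed behaviour)."""
    return max(low, min(high, value))

def update_for_operation(value: int, operation: str) -> int:
    """Add the table's change amount for the operation to the value.

    Operations missing from the table leave the value unchanged, which
    is an assumption made for this sketch.
    """
    return clamp(value + OPERATION_DELTAS.get(operation, 0))
```

Starting from 50, launching an application gives 55 and interrupting an utterance gives 49.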

The utterance appropriate value management unit 23 also executes an update process that updates the utterance appropriate value when the user and the dialogue device 1 have a dialogue. This update process is executed, for example, as follows. When the user and the dialogue device 1 have a dialogue, (1) the voice recognition unit 21a notifies the utterance appropriate value management unit 23 of the user's utterance content, and (2) the response generation unit 22a notifies the utterance appropriate value management unit 23 of the response content of the dialogue device 1. The utterance appropriate value management unit 23 (1) updates the utterance appropriate value according to the notified utterance content of the user, and (2) updates the utterance appropriate value according to the notified response content of the dialogue device 1.

In the present embodiment, an utterance content-change amount table such as the one shown in FIG. 3(b) is stored in the storage unit 30. The utterance appropriate value management unit 23 obtains the updated utterance appropriate value by adding, to the value before the update, the change amount that the table associates with the notified utterance content. For example, the change amount for the utterance content "Good morning" (おはよう) is +5. Accordingly, when the user says "Good morning", the utterance appropriate value management unit 23 increases the utterance appropriate value by 5.

In the present embodiment, a response content-change amount table such as the one shown in FIG. 3(c) is also stored in the storage unit 30. The utterance appropriate value management unit 23 obtains the updated utterance appropriate value by adding, to the value before the update, the change amount that the table associates with the notified response content. For example, the change amount for the response content "Have a good day" (いってらっしゃい, said to someone who is leaving) is -3. Accordingly, when the dialogue device 1 responds with "Have a good day", the utterance appropriate value management unit 23 decreases the utterance appropriate value by 3.
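The two dialogue-driven updates (user utterance, then device response) can be sketched together as follows. The function and table names are hypothetical, the tables contain only the single example entry that the text gives for each, and entries missing from a table are assumed to leave the value unchanged. Range limiting to 0 to 100 is omitted to keep the sketch short.

```python
from typing import Optional

# Example entries from the text: FIG. 3(b) and FIG. 3(c).
UTTERANCE_DELTAS = {"おはよう": +5}          # user says "Good morning"
RESPONSE_DELTAS = {"いってらっしゃい": -3}    # device says "Have a good day"

def update_for_dialogue(value: int,
                        user_utterance: Optional[str],
                        device_response: Optional[str]) -> int:
    """Apply the utterance change amount, then the response change amount."""
    if user_utterance is not None:
        value += UTTERANCE_DELTAS.get(user_utterance, 0)
    if device_response is not None:
        value += RESPONSE_DELTAS.get(device_response, 0)
    return value
```

Each table is consulted independently, so one dialogue turn can change the value twice: once for what the user said and once for what the device answered.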

The utterance appropriate value management unit 23 may also update the utterance appropriate value according to (1) the success or failure of the detection process in which the voice recognition unit 21a detects speech, (2) the success or failure of the conversion process in which the voice recognition unit 21a converts the speech into utterance content (text), and (3) the success or failure of the decision process in which the response generation unit 22a decides the response content according to the utterance content. The decision process fails, for example, when the utterance content obtained by the conversion process is not meaningful. When the detection process succeeds and the conversion process fails, the utterance appropriate value management unit 23 increases the utterance appropriate value by 1, for example. When the detection and conversion processes succeed and the decision process fails, it increases the utterance appropriate value by 3, for example. When the detection, conversion, and decision processes all succeed, it increases the utterance appropriate value by 5, for example.
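The staged change amounts above can be expressed as a small decision function. The function name is hypothetical, and returning 0 when detection itself fails is an assumption, since the text does not describe that case.

```python
def recognition_delta(detected: bool, converted: bool, decided: bool) -> int:
    """Change amount based on how far speech processing got.

    detected:  speech detection succeeded
    converted: speech-to-text conversion succeeded
    decided:   a response could be decided for the text
    """
    if not detected:
        return 0  # case not covered by the text; assumed to be no change
    if not converted:
        return 1  # detection succeeded, conversion failed
    if not decided:
        return 3  # detection and conversion succeeded, decision failed
    return 5      # all three stages succeeded
```

The further the processing of a user utterance progresses, the larger the increase, which matches the example values of +1, +3, and +5 in the text.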

In the present embodiment, the utterance appropriate value management unit 23 manages the utterance appropriate value for each time period. That is, the utterance appropriate value management unit 23 resets the utterance appropriate value to an initial value at the start of each time period. The initial value for each time period is determined by referring to the table shown in FIG. 2. However, the present invention is not limited to this. For example, a configuration may be adopted in which the utterance appropriate value is managed per arbitrary time unit (day, week, month, year, etc.). In that case, the utterance appropriate value is reset to its initial value once per time unit.
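
A minimal sketch of the per-period reset follows. The class name `PeriodScopedValue` and the table `INITIAL_VALUES` are hypothetical. Only the entries 8 -> 50 and 12 -> 30 are taken from the worked example in this description; the full contents of FIG. 2 are not reproduced here. The sketch also ignores day boundaries for brevity.

```python
from typing import Dict, Optional

# Initial values keyed by the starting hour of each time period.
# Only 8 -> 50 and 12 -> 30 appear in the text; the remaining rows of
# FIG. 2 are not reproduced here and would have to be filled in.
INITIAL_VALUES: Dict[int, int] = {8: 50, 12: 30}


class PeriodScopedValue:
    """Utterance appropriate value that resets when a new time period begins."""

    def __init__(self, initial_values: Dict[int, int]) -> None:
        self.initial_values = dict(initial_values)
        self.current_period: Optional[int] = None
        self.value: Optional[int] = None

    def _period_start(self, hour: int) -> int:
        # The period containing `hour` is the latest one starting at or before it.
        starts = [s for s in sorted(self.initial_values) if s <= hour]
        if not starts:
            raise ValueError(f"no time period covers hour {hour}")
        return starts[-1]

    def apply(self, hour: int, delta: int) -> int:
        """Reset if `hour` falls in a new period, then add `delta`."""
        period = self._period_start(hour)
        if period != self.current_period:
            self.current_period = period
            self.value = self.initial_values[period]
        self.value += delta
        return self.value
```

With this table, an update at 8:00 starts from 50, and the first update at or after 12:00 starts again from 30.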

Also, in the present embodiment, the utterance appropriate value management unit 23 manages the utterance appropriate value for each user. That is, the utterance appropriate value management unit 23 updates the utterance appropriate value corresponding to a given user when that user operates the dialogue device 1, when that user speaks to the dialogue device 1, or when the dialogue device 1 responds to that user. It does not update that value when another user operates the dialogue device 1, when another user speaks to the dialogue device 1, or when the dialogue device 1 responds to another user. However, the present invention is not limited to this. For example, a configuration may be adopted in which, when anyone operates the dialogue device 1, the utterance appropriate value is updated according to the operation content regardless of which user performed the operation. The same applies when anyone speaks to the dialogue device 1 and when the dialogue device 1 responds to anyone.
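
Per-user management can be pictured as one value per user identifier. The sketch below is illustrative only: the class name `PerUserValues`, the string user identifiers, and the single shared initial value of 50 are assumptions, and the description does not say how users are identified.

```python
from collections import defaultdict
from typing import DefaultDict


class PerUserValues:
    """Keeps one utterance appropriate value per user (hypothetical helper)."""

    def __init__(self, initial: int = 50) -> None:
        # Assumption: every user starts from the same initial value.
        self._values: DefaultDict[str, int] = defaultdict(lambda: initial)

    def update(self, user_id: str, delta: int) -> int:
        """Apply a change only to the user involved in the event."""
        self._values[user_id] += delta
        return self._values[user_id]

    def get(self, user_id: str) -> int:
        return self._values[user_id]
```

An event involving one user leaves every other user's value unchanged, which is the behavior described for the present embodiment. The alternative configuration in the text would correspond to using a single shared key for all users.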

(Example of change over time in the utterance appropriate value)
FIG. 4 is a graph showing an example of how the utterance appropriate value changes over time. For example, the change in the utterance appropriate value during the 8:00 to 12:00 time period in this graph shows the following.

At the start of the 8:00 to 12:00 time period (that is, at 8:00), the utterance appropriate value is set to 50, the initial value for that time period. Next, when the user performs a sleep release operation on the dialogue device 1, the utterance appropriate value increases from 50 to 53. This is because the operation content "sleep release" corresponds to the change amount "+3" in the operation content-change amount table shown in FIG. 3(a). Next, when the user says "Good morning" (おはよう) to the dialogue device 1, the utterance appropriate value increases from 53 to 58. This is because the utterance content "Good morning" corresponds to the change amount "+5" in the utterance content-change amount table shown in FIG. 3(b). Next, when the dialogue device 1 responds "Good morning" to the user, the utterance appropriate value increases from 58 to 59. This is because the response content "Good morning" corresponds to the change amount "+1" in the response content-change amount table shown in FIG. 3(c). Next, when the user says "I'm off" (いってきます) to the dialogue device 1, the utterance appropriate value increases from 59 to 64. This is because the utterance content "I'm off" corresponds to the change amount "+5" in the utterance content-change amount table shown in FIG. 3(b). Next, when the dialogue device 1 responds "See you later" (いってらっしゃい) to the user, the utterance appropriate value decreases from 64 to 61. This is because the response content "See you later" corresponds to the change amount "-3" in the response content-change amount table shown in FIG. 3(c). At the end of the 8:00 to 12:00 time period (that is, at 12:00), the utterance appropriate value is set to 30, the initial value for the next time period, 12:00 to 16:00.
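
The walk-through of the 8:00 to 12:00 time period can be replayed as a short calculation. The three dictionaries below contain only the entries quoted in this passage (the full tables of FIG. 3 are not reproduced), the English keys are translations of the Japanese phrases, and the names `OPERATION_DELTAS`, `UTTERANCE_DELTAS`, `RESPONSE_DELTAS` and `trace` are hypothetical.

```python
from typing import Dict, List, Tuple

# Entries quoted in the text for FIG. 3(a), (b) and (c) respectively.
OPERATION_DELTAS: Dict[str, int] = {"sleep release": 3}
UTTERANCE_DELTAS: Dict[str, int] = {
    "good morning": 5,   # おはよう (said by the user)
    "i'm off": 5,        # いってきます
}
RESPONSE_DELTAS: Dict[str, int] = {
    "good morning": 1,   # おはよう (said by the device)
    "see you later": -3, # いってらっしゃい
}

# The events of the example, in order, each paired with the table it uses.
EVENTS: List[Tuple[Dict[str, int], str]] = [
    (OPERATION_DELTAS, "sleep release"),
    (UTTERANCE_DELTAS, "good morning"),
    (RESPONSE_DELTAS, "good morning"),
    (UTTERANCE_DELTAS, "i'm off"),
    (RESPONSE_DELTAS, "see you later"),
]


def trace(initial: int, events: List[Tuple[Dict[str, int], str]]) -> List[int]:
    """Return the value after each event, starting with the initial value."""
    value = initial
    history = [value]
    for table, key in events:
        value += table[key]
        history.append(value)
    return history


print(trace(50, EVENTS))  # [50, 53, 58, 59, 64, 61]
```

The printed sequence matches the values read from FIG. 4 in the text: 50, 53, 58, 59, 64, 61.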

In this way, in the dialogue device 1, the utterance appropriate value is updated according to the content of the operation the user performed on the dialogue device 1, the content of what the user said to the dialogue device 1, and the content of the response the dialogue device 1 gave to the user. For example, in a situation in which the user is likely to accept an utterance by the dialogue device 1, such as after the user has said "Good morning" to the dialogue device 1, the utterance appropriate value is relatively large. In a situation in which the user is unlikely to accept an utterance by the dialogue device 1, such as after the dialogue device 1 has responded "See you later" (いってらっしゃい) to a user who is leaving, the utterance appropriate value is relatively small. Therefore, if the dialogue device 1 decides whether or not to speak according to this utterance appropriate value, a dialogue device 1 that speaks at an appropriate timing can be realized.

The present embodiment has described a configuration in which the utterance appropriate value updated by the utterance appropriate value management unit 23 is referred to immediately by the utterance appropriateness determination unit 22b, but the present invention is not limited to this. That is, a configuration may also be adopted in which the utterance appropriate value updated by the utterance appropriate value management unit 23 on a given day is referred to by the utterance appropriateness determination unit 22b on the following day or later. In this case, the utterance appropriate value that the utterance appropriateness determination unit 22b refers to in a given time period on a given day is, for example, the utterance appropriate value that the utterance appropriate value management unit 23 last updated in the same time period on the previous day. Alternatively, in this case, the value referred to in a given time period on a given day may be the ratio of the utterance appropriate value last updated in that time period on the previous day to the sum of the utterance appropriate values last updated in each time period on the previous day.
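
The ratio variant described last can be written as a one-line normalization. In the sketch below, the function name `next_day_ratios` and the sample figures in the usage note are hypothetical, and the check for a positive total is an added assumption, since the description does not say what happens when the previous day's values sum to zero or less.

```python
from typing import Dict


def next_day_ratios(last_values: Dict[int, float]) -> Dict[int, float]:
    """Map each time period to (its last value yesterday) / (sum over all periods).

    `last_values` is keyed by the starting hour of each time period and holds
    the value last updated in that period on the previous day.
    """
    total = sum(last_values.values())
    if total <= 0:
        # Assumption: the description does not cover a non-positive total.
        raise ValueError("sum of previous-day values must be positive")
    return {period: value / total for period, value in last_values.items()}
```

For instance, with made-up previous-day values of 60, 30 and 10 for three time periods, the referenced values would be 0.6, 0.3 and 0.1. Any threshold used for the determination would then have to be expressed on this 0-to-1 scale rather than on the scale of the raw values.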

<Utterance processing when an utterance trigger event occurs>
The utterance process S2 that the dialogue device 1 executes when an utterance trigger event occurs will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the flow of the utterance process S2. This flowchart is an example, and the present invention is not limited to it.

The utterance process S2 includes an event determination process S11, an utterance appropriate value acquisition process S12, an utterance appropriateness determination process S13, a response content generation process S14, an utterance execution process S15, and a user presence determination process S16. The content of each process is as follows.

The event determination process S11 is a process in which the event processing unit 21c determines, based on the state information provided by the sensor 14, whether or not an utterance trigger event has occurred. The event determination process S11 is repeated until the event processing unit 21c determines that an utterance trigger event has occurred. When the event processing unit 21c determines that an utterance trigger event has occurred, the subsequent utterance appropriate value acquisition process S12 is executed.

The utterance appropriate value acquisition process S12 is a process in which the utterance appropriateness determination unit 22b reads the utterance appropriate value from the storage unit 30. The utterance appropriateness determination process S13 is a process in which the utterance appropriateness determination unit 22b determines whether or not it is appropriate for the dialogue device 1 to speak, according to the utterance appropriate value acquired in the utterance appropriate value acquisition process S12. For example, in the utterance appropriateness determination process S13, speaking is determined to be appropriate when the acquired utterance appropriate value is equal to or greater than (or greater than) a predetermined threshold, and inappropriate when the acquired utterance appropriate value is less than (or equal to or less than) the threshold. When the utterance appropriateness determination unit 22b determines that speaking is appropriate, the response content generation process S14 described below is executed. When the utterance appropriateness determination unit 22b determines that speaking is inappropriate, the user presence determination process S16 described below is executed.

The response content generation process S14 is a process in which the response generation unit 22a determines the utterance content and display content appropriate to the situation at that time, and then (a) provides voice data representing the determined utterance content to the voice output control unit 22c and (b) provides image data representing the determined display content to the display control unit 22d. The utterance execution process S15 is a process in which (a) the voice output control unit 22c controls the speaker 41, based on the voice data provided by the response generation unit 22a, so that the utterance content determined by the response generation unit 22a is output audibly, and (b) the display control unit 22d controls the display 42, based on the image data provided by the response generation unit 22a, so that the display content determined by the response generation unit 22a is output visibly.

The user presence determination process S16 is a process in which the event processing unit 21c determines, based on the state information provided by the sensor 14, whether or not a user is present around the dialogue device 1. When the event processing unit 21c determines that a user is present around the dialogue device 1, the response content generation process S14 and the utterance execution process S15 described above are executed, and then the flow ends. When the event processing unit 21c determines that no user is present around the dialogue device 1, the flow ends without the response content generation process S14 and the utterance execution process S15 being executed.
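
The branch structure of processes S13 and S16 can be summarized in a few lines. This is a sketch under assumptions, not the flowchart of FIG. 5 itself: the function name `should_speak` is hypothetical, the presence check is passed in as a callable so that it runs only on the "inappropriate" branch, and the comparison uses "equal to or greater than", which is one of the two options the text allows.

```python
from typing import Callable


def should_speak(value: int, threshold: int,
                 user_present: Callable[[], bool]) -> bool:
    """Decide whether S14/S15 (generate and speak) run after a trigger event."""
    if value >= threshold:
        # S13: speaking judged appropriate, go straight to S14 and S15.
        return True
    # S13 judged speaking inappropriate: S16 checks for a nearby user.
    return user_present()
```

For example, with a hypothetical threshold of 50, `should_speak(40, 50, lambda: True)` returns `True` because a user is nearby, while `should_speak(40, 50, lambda: False)` returns `False` and the flow ends without an utterance.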

[Embodiment 2]
The dialogue device 1 according to Embodiment 2 of the present invention will be described below with reference to FIG. 6. For convenience of description, members having the same functions are given the same reference numerals, and their description is omitted as appropriate. In Embodiment 2, the flow of the utterance process S2 performed by the dialogue device 1 differs from that of Embodiment 1.

FIG. 6 is a flowchart showing the flow of the utterance process S2 of the dialogue device 1 in Embodiment 2. This flowchart is an example, and the present invention is not limited to it. The utterance process S2 of the dialogue device 1 in Embodiment 2 includes an event determination process S21, an utterance appropriate value acquisition process S22, an utterance appropriateness determination process S23, a response content generation process S24, an utterance execution process S25, and a user presence determination process S26, and further includes an operation execution process S27 and a voice input presence determination process S28. The content of each process is as follows.

The event determination process S21 is a process in which the event processing unit 21c determines, based on the state information provided by the sensor 14, whether or not an utterance trigger event has occurred. The event determination process S21 is repeated until the event processing unit 21c determines that an utterance trigger event has occurred. When the event processing unit 21c determines that an utterance trigger event has occurred, the subsequent utterance appropriate value acquisition process S22 is executed.

The utterance appropriate value acquisition process S22 is a process in which the utterance appropriateness determination unit 22b reads the utterance appropriate value from the storage unit 30. The utterance appropriateness determination process S23 is a process in which the utterance appropriateness determination unit 22b determines whether or not it is appropriate for the dialogue device 1 to speak, according to the utterance appropriate value acquired in the utterance appropriate value acquisition process S22.

In the utterance appropriateness determination process S23, when the utterance appropriate value acquired in the utterance appropriate value acquisition process S22 is high, that is, equal to or greater than (or greater than) a predetermined first threshold, speaking is determined to be appropriate. In this case, a response content generation process S24 similar to the response content generation process S14 of Embodiment 1 is executed, followed by an utterance execution process S25 similar to the utterance execution process S15 of Embodiment 1.

On the other hand, when the utterance appropriate value acquired in the utterance appropriate value acquisition process S22 is moderate, that is, less than the first threshold and equal to or greater than (or greater than) a second threshold, a user presence determination process S26 similar to the user presence determination process S16 of Embodiment 1 is executed. Here, the second threshold is set to a value smaller than the first threshold.

When the event processing unit 21c determines in the user presence determination process S26 that a user is present around the dialogue device 1, the response content generation process S24 and the utterance execution process S25 described above are executed, and then the flow ends. When the event processing unit 21c determines that no user is present around the dialogue device 1, the flow ends without the response content generation process S24 and the utterance execution process S25 being executed.

When the utterance appropriate value acquired in the utterance appropriate value acquisition process S22 is low, that is, less than (or equal to or less than) the second threshold, the operation execution process S27 and the voice input presence determination process S28 described below are executed.

In the operation execution process S27, the dialogue device 1 continues, for a predetermined time, an operation that attracts the user's attention, for example outputting a predetermined voice message (such as "Let's talk together") from the speaker 41. In the voice input presence determination process S28, the voice recognition unit 21a determines whether or not there has been voice input from the user; this assumes, for example, a state in which the user has not spoken but is moving, so the device waits for voice input. When the voice recognition unit 21a determines that there has been voice input from the user, the response content generation process S24 and the utterance execution process S25 described above are executed. On the other hand, when there is no voice input from the user even after the predetermined time has elapsed, the flow ends without the response content generation process S24 and the utterance execution process S25 being executed.
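
The three-way branch of Embodiment 2 can be sketched as follows. The enum `Branch`, the function `select_branch`, and the threshold values in the usage note are hypothetical. The sketch uses the "equal to or greater than" reading of both comparisons, and it only selects the branch: the follow-up steps (presence check in S26, attention-getting operation and wait for voice input in S27 and S28) are left out.

```python
from enum import Enum


class Branch(Enum):
    SPEAK = "S24/S25: generate response and speak"
    CHECK_PRESENCE = "S26: speak only if a user is nearby"
    ATTRACT_ATTENTION = "S27/S28: attract attention, then wait for voice input"


def select_branch(value: int, first_threshold: int, second_threshold: int) -> Branch:
    """Pick the branch of S23 from the utterance appropriate value."""
    if second_threshold >= first_threshold:
        # The text requires the second threshold to be smaller than the first.
        raise ValueError("second_threshold must be smaller than first_threshold")
    if value >= first_threshold:
        return Branch.SPEAK
    if value >= second_threshold:
        return Branch.CHECK_PRESENCE
    return Branch.ATTRACT_ATTENTION
```

With made-up thresholds of 60 and 30, a value of 61 selects `Branch.SPEAK`, a value of 45 selects `Branch.CHECK_PRESENCE`, and a value of 20 selects `Branch.ATTRACT_ATTENTION`.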

[Example of implementation by software]
Each functional block of the control unit 20 (an example of the "control device" in the claims) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the control unit 20 includes a computer that executes the instructions of a program, which is software that realizes each function. This computer includes, for example, at least one processor and at least one computer-readable recording medium storing the program. The object of the present invention is achieved when, in the computer, the processor reads the program from the recording medium and executes it. As the processor, for example, a CPU (Central Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" can be used, for example a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast waves, etc.). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
(1) A control device (control unit 20) according to Aspect 1 of the present invention is a control device that controls a dialogue device 1 that interacts with a user, and includes: an utterance appropriate value management unit 23 that updates an utterance appropriate value, which represents the appropriateness of the dialogue device 1 speaking, based on the content of an operation performed by the user on the dialogue device 1, the content of an utterance made by the user to the dialogue device 1, or the content of a response made by the dialogue device 1 to the user; and an utterance appropriateness determination unit 22b that determines whether or not it is appropriate for the dialogue device 1 to speak to the user, according to the utterance appropriate value updated by the utterance appropriate value management unit 23.

According to the above configuration, whether or not it is appropriate for the dialogue device 1 to speak to the user can be determined based on the content of an operation performed by the user on the dialogue device 1, the content of an utterance made by the user to the dialogue device 1, or the content of a response made by the dialogue device 1 to the user. Therefore, by deciding whether or not to speak to the user based on this determination result, the utterance can be executed at a timing appropriate for the user.

(2) In a control device (control unit 20) according to Aspect 2 of the present invention, in Aspect 1 above, the utterance appropriate value is managed for each time period, and the utterance appropriate value management unit 23 updates the utterance appropriate value corresponding to each time period by adding, to the current utterance appropriate value for that time period, a change amount corresponding to the content of the operation, the content of the utterance, or the content of the response.

According to the above configuration, whether or not it is appropriate for the dialogue device 1 to speak to the user can be determined independently for each time period. Therefore, the utterance can be executed at a timing more appropriate for the user.

(3) In a control device (control unit 20) according to Aspect 3 of the present invention, in Aspect 1 or 2 above, the utterance appropriate value is managed for each user, and the utterance appropriate value management unit 23 updates the utterance appropriate value corresponding to each user by adding, to the current utterance appropriate value for that user, a change amount corresponding to the content of the operation, the content of the utterance, or the content of the response.

According to the above configuration, whether or not it is appropriate for the dialogue device 1 to speak to each user can be determined independently for each user. Therefore, the utterance can be executed at a timing more appropriate for each user.

(4) A control device (control unit 20) according to Aspect 4 of the present invention, in any of Aspects 1 to 3 above, further includes a voice output control unit 22c that executes an utterance to the user when the utterance appropriateness determination unit 22b determines that speaking is appropriate, or when the utterance appropriateness determination unit 22b determines that speaking is inappropriate and the user is confirmed to be present in the vicinity of the dialogue device 1.

According to the above configuration, when the user is present in the vicinity of the dialogue device 1, an utterance to the user can be executed regardless of the result of the determination based on the utterance appropriate value. This prevents utterances by the dialogue device 1 from being excessively suppressed.

(5) A control device (control unit 20) according to Aspect 5 of the present invention, in any of Aspects 1 to 4 above, further includes a speaker 41 and a display 42 serving as operation execution units that cause the dialogue device 1 to execute a predetermined operation for obtaining the user's attention when the utterance appropriateness determination unit 22b determines that speaking is inappropriate.
According to the above configuration, a situation in which speaking to the user is inappropriate can be changed into a situation in which speaking to the user is appropriate.

(6) A control method according to Aspect 6 of the present invention is a control method for controlling a dialogue device 1 that interacts with a user, and includes: an utterance appropriate value update process of updating an utterance appropriate value, which represents the appropriateness of the dialogue device 1 speaking, based on the content of an operation performed by the user on the dialogue device 1, the content of an utterance made by the user to the dialogue device 1, or the content of a response made by the dialogue device 1 to the user; and an utterance appropriateness determination process of determining whether or not it is appropriate for the dialogue device 1 to speak to the user, according to the utterance appropriate value updated in the utterance appropriate value update process.

According to the above method, whether or not it is appropriate for the dialogue device 1 to speak to the user can be determined based on the content of an operation performed by the user on the dialogue device 1, the content of an utterance made by the user to the dialogue device 1, or the content of a response made by the dialogue device 1 to the user. Therefore, by deciding whether or not to speak to the user based on this determination result, the utterance can be executed at a timing appropriate for the user.

(7) A control program according to Aspect 7 of the present invention is a control program for causing a computer to operate as the control device that controls the dialogue device 1 according to any of Aspects 1 to 5 above, and causes the computer to function as each unit of the control device.

According to the above control program, whether or not it is appropriate for the dialogue device 1 to speak to the user can be determined based on the content of an operation performed by the user on the dialogue device 1, the content of an utterance made by the user to the dialogue device 1, or the content of a response made by the dialogue device 1 to the user. Therefore, by deciding whether or not to speak to the user based on this determination result, the utterance can be executed at a timing appropriate for the user.

(8) A dialogue device 1 according to aspect 8 of the present invention is a dialogue device that interacts with a user, and includes the control device (control unit 20) of any of aspects 1 to 5 that controls the dialogue device.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

1 Dialogue device
10 Input unit
20 Control unit (control device)
21 Input control unit
21a Voice recognition unit
21b Image recognition unit
21c Event processing unit
22 Output control unit
22a Response generation unit
22b Utterance adequacy determination unit
22c Voice output control unit
22d Display control unit
23 Utterance appropriate value management unit
30 Storage unit
40 Output unit
41 Speaker
42 Display

Claims

A control device for controlling a dialogue device that interacts with a user, the control device comprising:
an utterance appropriate value management unit that updates an utterance appropriate value, which represents the appropriateness of the dialogue device speaking, based on the content of an operation performed by the user on the dialogue device, the content of an utterance made by the user to the dialogue device, or the content of a response made by the dialogue device to the user; and
an utterance adequacy determination unit that determines, according to the utterance appropriate value updated by the utterance appropriate value management unit, whether or not it is appropriate for the dialogue device to speak to the user.

The control device according to claim 1, wherein the utterance appropriate value is managed for each time period, and
the utterance appropriate value management unit updates the utterance appropriate value corresponding to each time period by adding, to the current value of that utterance appropriate value, an amount of change according to the content of the operation, the content of the utterance, or the content of the response.

The control device according to claim 1 or 2, wherein the utterance appropriate value is managed for each user, and
the utterance appropriate value management unit updates the utterance appropriate value corresponding to each user by adding, to the current value of that utterance appropriate value, an amount of change according to the content of the operation, the content of the utterance, or the content of the response.

The control device according to any one of claims 1 to 3, further comprising a voice output control unit that executes an utterance to the user when the utterance adequacy determination unit determines that the utterance is appropriate, or when the utterance adequacy determination unit determines that the utterance is inappropriate and the user is confirmed to be present in the vicinity of the dialogue device.

The control device according to claim 4, further comprising an operation execution unit that causes the dialogue device to execute a predetermined operation for attracting the attention of the user when the utterance adequacy determination unit determines that the utterance is inappropriate.

A control method for controlling a dialogue device that interacts with a user, the control method comprising:
an utterance appropriate value update process of updating an utterance appropriate value, which represents the appropriateness of the dialogue device speaking, based on the content of an operation performed by the user on the dialogue device, the content of an utterance made by the user to the dialogue device, or the content of a response made by the dialogue device to the user; and
an utterance adequacy determination process of determining, according to the utterance appropriate value updated in the utterance appropriate value update process, whether or not it is appropriate for the dialogue device to speak to the user.

A control program for causing a computer to operate as the control device according to any one of claims 1 to 5, the control program causing the computer to function as each unit of the control device.

A dialogue device that interacts with a user, comprising the control device according to any one of claims 1 to 5 as a control unit that controls the dialogue device.