JP2021135363A

JP2021135363A - Control system, control device, control method, and computer program

Info

Publication number: JP2021135363A
Application number: JP2020030607A
Authority: JP
Inventors: 拓也岩本; Takuya Iwamoto; 大志鵜口; Hiroshi Uguchi; 康太郎西; Kotaro Nishi; 惇馬場; Atsushi Baba
Original assignee: CyberAgent Inc
Current assignee: CyberAgent Inc
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2021-09-13
Anticipated expiration: 2040-02-26
Also published as: JP6887035B1

Abstract

PROBLEM TO BE SOLVED: To provide a control system capable of indicating more clearly which person an interaction device is speaking to. SOLUTION: A control system 1 includes: an acquisition unit that acquires, for each person, information about people facing a predetermined interaction device that speaks to people; an operation deciding unit 166 that decides which of the people the interaction device is to be directed toward, on the basis of the acquired information and history information on interactions between the persons and the interaction device stored in a predetermined storage device; a speech deciding unit 165 that decides the speech of the interaction device on the basis of a voice uttered by the person; and a device control unit 167 that controls the operation of the interaction device so that it faces the decided direction, and controls a loudspeaker 40 disposed near the interaction device so as to output the speech. SELECTED DRAWING: Figure 1

Description

The present invention relates to a control system, a control device, a control method, and a computer program.

In recent years, research has been conducted on technologies that allow dialogue devices such as robots and dialogue agents to interact with people. Such technologies are used in fields that require dialogue with people, such as customer service. A dialogue device may interact with a plurality of people at the same time.

Japanese Unexamined Patent Publication No. 2005-246502

However, when a dialogue device interacts with a plurality of people at the same time, it becomes difficult to tell which person it is talking to. In view of the above circumstances, an object of the present invention is to provide a technique capable of indicating more clearly which person the dialogue device is talking to.

One aspect of the present invention is a control system including: an acquisition unit that acquires, for each person, information about a plurality of people facing a predetermined dialogue device that speaks to people; an operation determination unit that determines, based on the acquired information and history information on dialogues between the people and the dialogue device stored in a predetermined storage device, toward which of the plurality of people the dialogue device is to be directed; an utterance determination unit that determines the utterance content of the dialogue device based on a voice uttered by the person; and a device control unit that controls the operation of the dialogue device so that it faces the determined direction, and controls a speaker provided near the dialogue device so as to output the utterance content.

One aspect of the present invention is the above control system, wherein the operation determination unit determines that the dialogue device is to be directed toward the person with the highest priority, the priority being calculated based on the acquired information and the history information and determining whether or not the dialogue device is directed toward that person, and also determines the operation content.
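Once a priority exists for every person, choosing whom to face reduces to taking the maximum over per-person priorities. The following is a minimal sketch, assuming priorities are already available as a mapping from person identifier to a number; the function name is hypothetical, and breaking ties by insertion order is an assumption, since the text does not say how ties are resolved.

```python
from typing import Dict, Optional


def choose_target(priorities: Dict[str, float]) -> Optional[str]:
    """Return the identifier of the person with the highest priority.

    Hypothetical helper. Ties go to the person inserted first into the
    mapping (an assumption). Returns None when nobody is present.
    """
    if not priorities:
        return None
    # max() returns the first maximal key it encounters, so dict
    # insertion order decides ties.
    return max(priorities, key=priorities.get)
```

For example, `choose_target({"User001": 15, "User002": 9})` selects `"User001"`.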

One aspect of the present invention is the above control system, further including a priority calculation unit that calculates one or more of the following scores and calculates the priority based on the calculated score or scores: an utterance content score obtained by quantifying the utterance content based on the history information and the utterance content; an utterance history score obtained by quantifying the history, indicated by the history information, of utterances by the dialogue device to the person; and a person state score obtained by quantifying the state of the person indicated by the history information. The operation determination unit determines the direction in which the dialogue device is directed and the operation content based on the priority calculated by the priority calculation unit.
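The text states only that the priority is calculated from one or more of the three scores, not how they are combined. As a sketch under that gap, the example below assumes a weighted sum of whichever scores were actually calculated; the function name, parameter names, and the default weights of 1.0 are all hypothetical.

```python
from typing import Optional, Tuple


def compute_priority(
    utterance_content: Optional[float] = None,
    utterance_history: Optional[float] = None,
    person_state: Optional[float] = None,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Combine the available scores into one priority (assumed weighted sum).

    A score passed as None was not calculated and is skipped, matching
    the "one or more" wording of the text.
    """
    scores = (utterance_content, utterance_history, person_state)
    used = [(s, w) for s, w in zip(scores, weights) if s is not None]
    if not used:
        raise ValueError("at least one score must be calculated")
    return sum(s * w for s, w in used)
```

A weighted sum keeps each score's contribution easy to tune independently; any monotone combination would fit the description equally well.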

One aspect of the present invention is the above control system, wherein the operation determination unit determines toward which of the plurality of people the dialogue device is directed based on an estimator generated by machine learning on teacher data in which the history information and the priority are associated with each other.
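The text does not name the learning method behind the estimator. Purely for illustration, the sketch below substitutes a k-nearest-neighbour regressor trained on (history feature vector, priority) pairs. The choice of model, the feature encoding, and all class and function names are assumptions, not the disclosed method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


class KNNPriorityEstimator:
    """k-nearest-neighbour regressor standing in for the unspecified estimator."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.k = k
        self._data: List[Tuple[Tuple[float, ...], float]] = []

    def fit(self, samples: List[Tuple[Features, float]]) -> "KNNPriorityEstimator":
        # Teacher data: history-derived feature vector paired with a priority.
        self._data = [(tuple(x), float(y)) for x, y in samples]
        return self

    def predict(self, x: Features) -> float:
        if not self._data:
            raise RuntimeError("estimator has not been fitted")
        # Average the priorities of the k training samples closest to x.
        nearest = sorted(self._data, key=lambda item: math.dist(item[0], x))[: self.k]
        return sum(y for _, y in nearest) / len(nearest)


def choose_target_with_estimator(est: KNNPriorityEstimator, people: Dict[str, Features]) -> str:
    """Face the person whose estimated priority is highest."""
    return max(people, key=lambda pid: est.predict(people[pid]))
```

Features could be drawn from the person information record (for example staying time and the number of face turns), but which fields to use is left open here.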

One aspect of the present invention is a control device including: an acquisition unit that acquires, for each person, information about a plurality of people facing a predetermined dialogue device that speaks to people; an operation determination unit that determines, based on the acquired information and history information on dialogues between the people and the dialogue device stored in a predetermined storage device, toward which of the plurality of people the dialogue device is to be directed; an utterance determination unit that determines the utterance content of the dialogue device based on a voice uttered by the person; and a device control unit that controls the operation of the dialogue device so that it faces the determined direction, and controls a speaker provided near the dialogue device so as to output the utterance content.

One aspect of the present invention is a control method including: an acquisition step in which a control device acquires, for each person, information about a plurality of people facing a predetermined dialogue device that speaks to people; an operation determination step in which the control device determines, based on the acquired information and history information on dialogues between the people and the dialogue device stored in a predetermined storage device, toward which of the plurality of people the dialogue device is to be directed; an utterance determination step in which the control device determines the utterance content of the dialogue device based on a voice uttered by the person; and a device control step in which the control device controls the operation of the dialogue device so that it faces the determined direction, and controls a speaker provided near the dialogue device so as to output the utterance content.
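The four steps of the method can be sketched as one control cycle in which each step is an injected callable. This is a structural illustration only: `Command`, `control_cycle`, and the callable signatures are hypothetical, and the real units would wrap the sensors, storage, and robot drivers described below.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Command:
    target_id: str  # person the dialogue device should face
    utterance: str  # content handed to the speaker near the device


def control_cycle(
    acquire: Callable[[], Dict[str, dict]],
    decide_target: Callable[[Dict[str, dict], Dict[str, dict]], str],
    decide_utterance: Callable[[str], str],
    actuate: Callable[[Command], None],
    history: Dict[str, dict],
    heard_voice: str,
) -> Command:
    people = acquire()                       # acquisition step
    target = decide_target(people, history)  # operation determination step
    text = decide_utterance(heard_voice)     # utterance determination step
    command = Command(target, text)
    actuate(command)                         # device control step: turn and speak
    return command
```

Injecting each step keeps the cycle testable without hardware; the same shape applies to the control system and control device aspects.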

One aspect of the present invention is a computer program for causing a computer to function as the above control system.

According to the present invention, it is possible to indicate more clearly which person the dialogue device is talking to.

FIG. 1 is a system configuration diagram showing the system configuration of the control system 1.
FIG. 2 is a diagram showing a specific example of the dialogue device 50.
FIG. 3 is a diagram showing a specific example of the person information table.
FIG. 4 is a diagram showing a specific example of the utterance content score table.
FIG. 5 is a diagram showing a specific example of the utterance history score table.
FIG. 6 is a diagram showing a specific example of the person state score table.
FIG. 7 is a flowchart showing a specific example of processing related to the generation of person information.
FIG. 8 is a flowchart showing a specific example of processing for determining the operation of the dialogue device 50.
FIG. 9 is a flowchart showing a specific example of the flow of the utterance history score calculation process.
FIG. 10 is a flowchart showing a specific example of the flow of the person state score calculation process.
FIG. 11 is a flowchart showing a specific example of the flow of the utterance content score calculation process.
FIG. 12 is a diagram showing a specific example of the operation of the dialogue device 50.
FIG. 13 is a diagram showing a specific example in which the control system 1 includes a predetermined device, such as a display device or an actuator, instead of a robot.

FIG. 1 is a system configuration diagram showing the system configuration of the control system 1. The control system 1 includes a sensor 10, a camera 20, a microphone 30, a speaker 40, a dialogue device 50, and a control device 100. In the control system 1, the control device 100 controls the operation of the dialogue device 50 and causes the speaker 40 to output sound, both based on the information input to the control device 100.

The sensor 10 detects a person who has come near the dialogue device 50. When the sensor 10 detects a person, it outputs information about the detected person to the control device 100. The information output may differ depending on the type of the sensor 10. For example, on detecting a person, the sensor 10 may output to the control device 100 a signal indicating that a person has arrived. Alternatively, a sensor that detects a person's position, such as a depth sensor, may be used as the sensor 10; in this case, the sensor 10 may output coordinate information indicating the position of the detected person to the control device 100. The sensor 10 is installed near the dialogue device 50. A plurality of sensors 10 may be installed; in this case, they may be oriented in different directions so that people in various directions can be detected. The sensor 10 is a specific example of the acquisition unit.
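The two kinds of sensor output described above (a bare "someone arrived" signal, or coordinates from a position sensor such as a depth sensor) can be modelled as a small tagged union and merged across several sensors. This is a minimal sketch; the class names, the `sensor_id` field, and the metre-based coordinates are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class PresenceSignal:
    """Only reports that someone has arrived (hypothetical type)."""
    sensor_id: int


@dataclass(frozen=True)
class PositionReading:
    """Coordinates of a detected person, e.g. from a depth sensor (hypothetical type)."""
    sensor_id: int
    x_m: float
    y_m: float


SensorOutput = Union[PresenceSignal, PositionReading]


def merge_outputs(outputs: List[SensorOutput]) -> Tuple[bool, List[Tuple[float, float]]]:
    """Combine outputs from one or more sensors into (someone_present, positions)."""
    positions = [(o.x_m, o.y_m) for o in outputs if isinstance(o, PositionReading)]
    return bool(outputs), positions
```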

The camera 20 captures moving images of the vicinity of the dialogue device 50, for example of a person facing the dialogue device 50. The camera 20 may be an interface for connecting an imaging device such as a camera to the control device 100. In this case, the camera 20 generates a video signal from the moving image captured by the imaging device and inputs it to the control device 100. The camera 20 may be installed at any position from which it can capture a person facing the dialogue device 50. The camera 20 is a specific example of the acquisition unit.

The microphone 30 picks up sound near the dialogue device 50 and is installed near the dialogue device 50. The microphone 30 picks up, for example, the voice of a person facing the dialogue device 50, such as speech directed at the dialogue device 50. The microphone 30 generates an audio signal from the picked-up sound and outputs the generated audio signal to the control device 100. A plurality of microphones 30 may be installed; in this case, they may be oriented in different directions so that sound from various directions can be picked up. The microphone 30 may also be an interface for connecting a sound collecting device such as an external microphone to the control device 100. In this case, the microphone 30 generates an audio signal from the sound input to the sound collecting device and outputs it to the control device 100. The microphone 30 is a specific example of the acquisition unit.

The speaker 40 outputs, as sound, the audio signal output from the control device 100. The speaker 40 is installed near the dialogue device 50, which makes the dialogue device 50 appear to be talking to the person facing it. A plurality of speakers 40 may be provided, which makes it possible to converse with a plurality of people at the same time. The speaker 40 may be, for example, a directional speaker.

The dialogue device 50 is a device that talks to a person facing it and carries on a dialogue. The dialogue device 50 performs predetermined operations under the control of the control device 100. The dialogue device 50 may be a device such as a robot; in the present embodiment, the dialogue device 50 is described as a robot. The robot may move, for example, by operating drive mechanisms provided at the joints of its neck, shoulders, or arms. The robot may be, for example, an animal-type robot that walks by operating drive mechanisms provided at joints such as the shoulders or legs, or a robot, such as a bipedal robot, that walks autonomously by operating such drive mechanisms. The robot may also be a mobile robot that moves on wheels or tracks. The robot is installed, for example, on a flat platform such as a table or a reception counter.

FIG. 2 is a diagram showing a specific example of the dialogue device 50. The dialogue device 50 includes a head 510 and a body section. The body section consists of the torso of the dialogue device 50 and a left arm portion 520, a right arm portion 530, and a leg portion 540 connected to the torso. The head 510 includes a microphone 511, a camera 512, a first light emitting unit 513, and a second light emitting unit 514. The head 510 is driven along three axes (up/down, left/right, and front/back) under the control of the device control unit 167 of the control device 100. For example, the dialogue device 50 turns the head 510 to a predetermined direction under the control of the device control unit 167. The front of the head 510 is referred to as the face of the dialogue device 50; it is the surface of the head 510 on which the microphone 511, the camera 512, the first light emitting unit 513, and the second light emitting unit 514 are provided.
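Turning the head 510 toward a chosen person requires converting that person's position into a head angle. The sketch below assumes the sensor reports coordinates in a robot-centred frame (x straight ahead, y to the robot's left) and that the head can yaw at most ±90 degrees; both the frame convention and the joint limit are hypothetical, as is the function name.

```python
import math


def head_yaw_deg(px: float, py: float, max_yaw_deg: float = 90.0) -> float:
    """Yaw angle in degrees (positive = turn left) that points the face at (px, py).

    Assumes a robot-centred frame with x forward and y to the left.
    The result is clamped to a hypothetical +/- max_yaw_deg joint range.
    """
    yaw = math.degrees(math.atan2(py, px))
    return max(-max_yaw_deg, min(max_yaw_deg, yaw))
```

For a person standing diagonally ahead and to the left at (1, 1), the head turns about 45 degrees left; a person directly behind is clamped to the joint limit, at which point the leg portion 540 could be used to turn the whole body instead.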

The microphone 511 picks up sound near the dialogue device 50, for example the voice of the person the dialogue device 50 is facing. The microphone 511 generates an audio signal from the picked-up sound. The microphone 511 may record the generated audio signal in the control device 100 or in an external device via a network.

The camera 512 captures moving or still images in the direction in which the front of the dialogue device 50, where the camera 512 is mounted, is facing. For example, the camera 512 captures moving or still images of the person the dialogue device 50 is facing and of that person's surroundings. The camera 512 may record the captured images in the control device 100 or in an external device via a network. The camera 512 may capture moving or still images under the control of the device control unit 167.

The first light emitting unit 513 is a light emitting member such as a full-color LED (Light Emitting Diode). The first light emitting unit 513 may emit light in a predetermined color under the control of the device control unit 167. For example, when the device control unit 167 instructs the first light emitting unit 513 to emit green light, the first light emitting unit 513 emits green light. The first light emitting unit 513 corresponds to the right and left eyes of the dialogue device 50. The first light emitting unit 513 may light the left and right eyes simultaneously in the same color, simultaneously in different colors, alternately in different colors, or alternately in the same color.
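The four eye-lighting modes can be expressed as repeating frame patterns of (left eye, right eye) colours. This is a sketch under assumptions: the mode names and the function are hypothetical, "alternately" is interpreted as one eye lit at a time, and the single-colour modes use the `left` colour for both eyes.

```python
from itertools import cycle, islice
from typing import List, Optional, Tuple

Frame = Tuple[Optional[str], Optional[str]]  # (left eye, right eye); None = off


def eye_frames(mode: str, left: str, right: str, n: int) -> List[Frame]:
    """Return n successive (left, right) colour frames for the given mode."""
    if mode == "same":                   # both eyes together, one colour
        pattern = [(left, left)]
    elif mode == "different":            # both eyes together, two colours
        pattern = [(left, right)]
    elif mode == "alternate_different":  # eyes take turns, two colours
        pattern = [(left, None), (None, right)]
    elif mode == "alternate_same":       # eyes take turns, one colour
        pattern = [(left, None), (None, left)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return list(islice(cycle(pattern), n))
```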

The second light emitting unit 514 is a light emitting member such as a full-color LED. The second light emitting unit 514 may emit light in a predetermined color under the control of the device control unit 167. For example, when the device control unit 167 instructs the second light emitting unit 514 to blink red while the speaker 40 is outputting sound, the second light emitting unit 514 blinks red during that output. The second light emitting unit 514 corresponds to the mouth of the dialogue device 50. With this configuration, the second light emitting unit 514 can be made to blink while the speaker 40 is outputting sound, which makes the dialogue device 50 appear to be talking with the person facing it.
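Blinking the mouth LED red for as long as the speaker 40 is playing can be driven by a simple on/off schedule derived from the length of the utterance audio. The sketch below is hypothetical: the function name and the 0.25 s half-period are assumptions, and it returns (time in seconds, LED on) toggle events that always leave the LED off when the audio ends.

```python
from typing import List, Tuple


def blink_schedule(speech_s: float, half_period_s: float = 0.25) -> List[Tuple[float, bool]]:
    """Return (time_s, led_on) toggle events covering an utterance of speech_s seconds."""
    if half_period_s <= 0:
        raise ValueError("half_period_s must be positive")
    events: List[Tuple[float, bool]] = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * half_period_s < speech_s:
        events.append((i * half_period_s, i % 2 == 0))  # even steps on, odd steps off
        i += 1
    if events and events[-1][1]:
        events.append((speech_s, False))  # mouth goes dark when the audio stops
    return events
```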

The left arm portion 520 includes a left shoulder joint 521 and a left arm joint 522. The left shoulder joint 521 is a drive mechanism that moves the left arm portion 520 up, down, left, and right, and is rotationally driven under the control of the control device 100. For example, when the device control unit 167 instructs the left arm portion 520 to move upward, the left shoulder joint 521 rotates so as to move the left arm portion 520 upward. The right arm portion 530 includes a right shoulder joint 531 and a right arm joint 532. Like the left arm portion 520, the right arm portion 530 drives the right shoulder joint 531 and the right arm joint 532 under the control of the device control unit 167. The leg portion 540 changes the orientation of the dialogue device 50 or moves it forward, backward, left, and right under the control of the device control unit 167.

Returning to FIG. 1, the description of the control system 1 continues. The control device 100 is configured using an information processing device such as a personal computer, a tablet computer, or a server. The control device 100 implements a control function for operating the dialogue device 50; this function may be implemented in hardware or by installing software. The control device 100 includes a communication unit 101, a person information storage unit 102, a score map storage unit 103, an utterance content storage unit 104, an operation content storage unit 105, and a control unit 106.

The communication unit 101 is a communication device such as a network interface. The communication unit 101 connects to a network using a predetermined protocol and, under the control of the control unit 106, exchanges data with other devices over the network.

The person information storage unit 102 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. The person information storage unit 102 stores a person information table, which records, for each person, the history of dialogue between that person and the dialogue device 50. Here, a person means a person detected by the sensor 10. The person information table is described later. The person information storage unit 102 is a specific example of the storage device, and the person information table is a specific example of the history information. The history information indicates the history of dialogue between people and the dialogue device 50, stored in a predetermined storage device.

FIG. 3 is a diagram showing a specific example of the person information table. The person information table contains person information records, one generated for each person. Each record holds the following values: person identifier, facial expression, number of person utterances, line of sight, staying time, number of device utterances, number of face turns, utterance time, language, and priority.

The person identifier uniquely identifies a person information record and is associated with a person who has come near the dialogue device 50. It may be, for example, a combination of numbers and characters; any information may be used as long as it does not duplicate another person identifier. The person identifier is determined by the person identification unit 161 when the person information record is generated.

The facial expression is information about the person's facial expression, such as a smiling face, a sad face, an angry face, or an expressionless face. It is estimated by the feature estimation unit 162 from moving images of the person captured by the camera 20.

The number of person utterances is the number of times the person has spoken to the dialogue device 50. It is incremented by 1, for example by the voice processing unit 164, each time the microphone 30 picks up the voice of an utterance by the person.

The line of sight is information about the person's gaze. It is estimated by the feature estimation unit 162 from moving images of the person captured by the camera 20. The line of sight may be expressed, for example, as one of three values: "dialogue device", indicating that the person is looking at the dialogue device 50; "other than dialogue device", indicating that the person is looking elsewhere; and "eyes closed", indicating that the person's eyes are closed.

The staying time is the time the person has stayed near the dialogue device 50; measurement starts when the sensor 10 detects the person.

The number of device utterances is the number of times the dialogue device 50 has spoken to the person; it is incremented by 1 each time an utterance is output to the person from the speaker 40. The number of face turns is the number of times the dialogue device 50 has turned its face toward the person; it is incremented by 1 each time it is decided that the dialogue device 50 will turn its face toward the person. The utterance time is the cumulative time the dialogue device 50 has spent speaking to the person; it is measured each time the dialogue device 50 speaks to the person.

The language is the language of the utterances output to the person; it may be specified by the person. The priority is an index that determines whether or not to direct the dialogue device 50 toward the person. It is calculated for each person at a predetermined timing; the higher the priority, the more readily the dialogue device 50 turns its face or body toward that person.
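The person information record and its counting rules map naturally onto a small record type with one update method per event. The sketch below is illustrative only: the field names, default values, the `Gaze` enum, and the `UserNNN` identifier scheme are assumptions modelled on FIG. 3, not a disclosed data format. The usage at the end rebuilds the "User001" row from individual events; splitting the 25 s utterance time into 10 + 8 + 7 s is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Gaze(Enum):
    DEVICE = "dialogue device"
    OTHER = "other than dialogue device"
    CLOSED = "eyes closed"


_ids = count(1)


def new_person_id() -> str:
    # Hypothetical scheme yielding "User001", "User002", ...; only uniqueness matters.
    return f"User{next(_ids):03d}"


@dataclass
class PersonInfoRecord:
    person_id: str
    expression: str = "expressionless"
    person_utterances: int = 0
    gaze: Gaze = Gaze.OTHER
    stay_time_s: float = 0.0
    device_utterances: int = 0
    face_turns: int = 0
    utterance_time_s: float = 0.0
    language: str = "Japanese"
    priority: float = 0.0

    def on_person_utterance(self) -> None:
        self.person_utterances += 1  # +1 each time microphone 30 picks up the person

    def on_device_utterance(self, duration_s: float) -> None:
        self.device_utterances += 1          # +1 per utterance output from speaker 40
        self.utterance_time_s += duration_s  # cumulative speaking time

    def on_face_turn_decided(self) -> None:
        self.face_turns += 1  # +1 each time facing this person is decided


# Usage: rebuild the "User001" row of FIG. 3 from individual events.
user001 = PersonInfoRecord(new_person_id(), expression="smile", gaze=Gaze.DEVICE, stay_time_s=18.0)
for _ in range(2):
    user001.on_person_utterance()
for seconds in (10.0, 8.0, 7.0):
    user001.on_device_utterance(seconds)
for _ in range(2):
    user001.on_face_turn_decided()
user001.priority = 15  # value taken from FIG. 3
```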

In the example shown in FIG. 3, the person information record in the top row of the person information table has these values: person identifier "User001", facial expression "smile", person utterance count "2", line of sight "dialogue device", stay time "18 seconds", dialogue device utterance count "3", face-turn count "2", utterance time "25 seconds", language "Japanese", and priority "15". The top record therefore holds information about the person identified by "User001". According to this record, the person is smiling, has spoken to the dialogue device 50 twice, and is looking at the dialogue device. The person has been near the dialogue device 50 for 18 seconds. The dialogue device 50 has spoken to the person three times, for a total of 25 seconds, and has turned its face toward the person twice. The dialogue device 50 speaks to this person in Japanese. The person's priority is 15. The person information table shown in FIG. 3 is only one specific example, and the table may be configured differently. For example, the person information table may have a column for the time during which the dialogue device 50 has faced the person. A person information record may also be generated per group rather than per individual. A group is a set of several people, such as family members or friends.
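As an illustration only, the person information record of FIG. 3 could be modeled as the following Python dataclass, together with the counter updates described for the dialogue device utterance count, face-turn count and utterance time. The class name, field names, default values and the two update methods are assumptions made for this sketch and are not part of the disclosure. The values in `user001` are those of the top row of FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    """One row of the person information table (all names are hypothetical)."""

    person_id: str
    expression: str = "expressionless"
    person_utterances: int = 0        # times the person has spoken to the device
    gaze: str = "other than the dialogue device"
    stay_seconds: float = 0.0         # time spent near the dialogue device
    device_utterances: int = 0        # times the device has spoken to the person
    face_turns: int = 0               # times the device has turned its face to the person
    utterance_seconds: float = 0.0    # cumulative time the device has spoken to the person
    language: str = "Japanese"
    priority: float = 0.0

    def record_device_utterance(self, duration_s: float) -> None:
        # Each utterance adds 1 to the count and its duration to the cumulative time.
        self.device_utterances += 1
        self.utterance_seconds += duration_s

    def record_face_turn(self) -> None:
        # Incremented each time the device is determined to turn its face to this person.
        self.face_turns += 1


# Top row of the person information table in FIG. 3.
user001 = PersonRecord(
    person_id="User001", expression="smile", person_utterances=2,
    gaze="dialogue device", stay_seconds=18, device_utterances=3,
    face_turns=2, utterance_seconds=25, language="Japanese", priority=15,
)
```

A per-group record, which the text also allows, could reuse the same shape, with the identifier naming the group instead of an individual.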

The score map storage unit 103 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. It stores the information used to calculate each person's priority, for example an utterance content score table, an utterance history score table and a person state score table. Each score table is described later.

The utterance content storage unit 104 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. It stores the utterance content to be spoken by the dialogue device 50, and each utterance content is associated with a predetermined character string. The predetermined character string is, for example, a character string representing what a person has said. In place of a predetermined character string, the utterance content storage unit 104 may associate utterance content with the event that the sensor 10 has detected a newly arrived person. The utterance content is recorded in the utterance content storage unit 104 in advance, and it may include utterance content in languages other than Japanese.

The operation content storage unit 105 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. It stores the content of the actions that the dialogue device 50 performs toward a person (hereinafter, "operation content"). The operation content may be, for example, turning the face toward a predetermined person, turning the body toward a predetermined person, or raising the left arm toward a predetermined person. It may be any nonverbal action that the dialogue device 50 can perform. The predetermined person is, for example, the person selected as the target of the action. The operation content storage unit 105 stores operation content in association with utterance content. It may also store operation content in association with the possible values of each column of the person information record. With this configuration, the control device 100 can operate the dialogue device 50 according to the person's state and the utterance content of the dialogue device 50. The operation content is recorded in the operation content storage unit 105 in advance.
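A minimal sketch of the two associations held by the operation content storage unit 105 follows: one from utterance content to an action, and one from a person-information column value to an action. The table contents, the action labels and the rule of checking the utterance first and the person's state second are all hypothetical.

```python
from typing import Optional

# Hypothetical contents of the operation content storage unit 105.
OPERATIONS_BY_UTTERANCE = {
    "Hello!": "turn_face",
    "The exit is over there.": "raise_left_arm",
}
OPERATIONS_BY_STATE = {
    ("facial expression", "smile"): "turn_body",
}


def lookup_operation(utterance: str, person_state: Optional[dict] = None) -> Optional[str]:
    """Return the action tied to the utterance, else one tied to the person's state."""
    if utterance in OPERATIONS_BY_UTTERANCE:
        return OPERATIONS_BY_UTTERANCE[utterance]
    for column, value in (person_state or {}).items():
        operation = OPERATIONS_BY_STATE.get((column, value))
        if operation is not None:
            return operation
    return None  # no stored action for this utterance or state
```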

Returning to FIG. 1, the description of the control system 1 continues. The control unit 106 controls the operation of each unit of the control device 100. It is configured using a processor such as a CPU (Central Processing Unit) and a RAM (Random Access Memory). When the processor executes a specific program, the control unit 106 functions as a person identification unit 161, a feature estimation unit 162, a priority calculation unit 163, a voice processing unit 164, an utterance determination unit 165, an operation determination unit 166 and a device control unit 167.

The person identification unit 161 identifies the people facing the dialogue device 50. Specifically, it acquires from the sensor 10 information about each person the sensor 10 has detected. Based on that information, it determines whether the person has newly arrived near the dialogue device 50. If so, the person identification unit 161 generates person information about the detected person and creates a person information record containing it. It then registers the new person by adding the record to the person information table. The person identification unit 161 may use a known technique such as face recognition to determine whether a person is a new visitor. If the person identification unit 161 can no longer acquire information from the sensor 10 about the person indicated by a person information record, it may delete that record from the person information table. This lets the control device 100 delete the records of people who have left the vicinity of the dialogue device 50.
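The add-and-delete behavior of the person identification unit 161 can be sketched as a set comparison between the identifiers the sensor currently reports and those already in the person information table. This assumes identification (for example by face recognition) has already produced stable identifiers. The function name and the initial record fields are hypothetical.

```python
def sync_person_table(table: dict[str, dict], detected_ids: set[str]) -> tuple[set[str], set[str]]:
    """Add records for newly arrived people and delete records for people who left.

    `table` maps a person identifier to that person's record; `detected_ids`
    are the identifiers the sensor currently reports. Returns (added, removed).
    """
    known = set(table)
    added = detected_ids - known    # new visitors near the dialogue device
    removed = known - detected_ids  # people the sensor no longer reports
    for pid in added:
        table[pid] = {"person_id": pid, "face_turns": 0, "device_utterances": 0}
    for pid in removed:
        del table[pid]
    return added, removed
```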

The feature estimation unit 162 estimates the features of a person detected by the sensor 10 and records them in the person information record. Specifically, it estimates the person's features from the moving image of that person captured by the camera 20. For example, by analyzing the moving image, it estimates whether the person's facial expression is a smile, a sad face, an angry face or expressionless, and records the estimate in the facial expression column. It likewise estimates whether the person's line of sight is "dialogue device 50", "other than the dialogue device 50" or "eyes closed", and records the estimate in the line-of-sight column. The feature estimation unit 162 may use any known technique to estimate the facial expression and the line of sight.
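The estimated features come from two closed sets of labels, and the following sketch makes those sets explicit as enumerations and shows where they are written into a record. The enum names and the dictionary-based record are assumptions. The estimator itself, which the text leaves to any known technique, is outside the sketch.

```python
from enum import Enum


class Expression(Enum):
    SMILE = "smile"
    SAD = "sad face"
    ANGRY = "angry face"
    NEUTRAL = "expressionless"


class Gaze(Enum):
    AT_DEVICE = "dialogue device"
    ELSEWHERE = "other than the dialogue device"
    EYES_CLOSED = "eyes closed"


def record_features(record: dict, expression: Expression, gaze: Gaze) -> dict:
    # Write the estimated labels into the facial expression and line-of-sight columns.
    record["expression"] = expression.value
    record["gaze"] = gaze.value
    return record
```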

The priority calculation unit 163 calculates a priority for each person from two inputs: the information acquired by the sensor 10, the camera 20 and the microphone 30, and the score tables stored in the score map storage unit 103. The specific calculation method is described later. The priority calculation unit 163 records the calculated priorities in the person information table.

The voice processing unit 164 executes voice recognition, which is processing that generates a character string from an audio signal. Through voice recognition, it generates a character string from the audio signal output by the microphone 30 and outputs that string to the utterance determination unit 165. The voice processing unit 164 may use any known method to generate the character string. When the sensor 10 detects multiple people, the voice processing unit 164 uses known sound source separation processing to associate each person with the audio signal of that person's speech.

The utterance determination unit 165 determines what the dialogue device 50 will say. Specifically, it determines the utterance content based on the character string generated by the voice processing unit 164. It acquires the utterance content associated with that character string from the utterance content storage unit 104 and adopts it as the utterance content of the dialogue device 50.
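The lookup performed by the utterance determination unit 165 amounts to reading a key-value store, sketched below. The store contents, the `NEW_VISITOR` key that stands for the "new person detected" event, and the light normalization of the recognized string (trimming, lower-casing and dropping trailing punctuation) are hypothetical. The text only says the utterance content is associated with the character string.

```python
from typing import Optional

NEW_VISITOR = "<new-visitor>"  # hypothetical key for "the sensor detected a new person"

# Hypothetical contents of the utterance content storage unit 104.
UTTERANCE_STORE = {
    "hello": "Hello! Welcome.",
    "where is the exit": "The exit is over there.",
    NEW_VISITOR: "Good morning!",
}


def determine_utterance(recognized: str, store: dict = UTTERANCE_STORE) -> Optional[str]:
    """Return the stored reply for a recognized string, or None if none is stored."""
    key = recognized.strip().lower().rstrip("?.!")
    return store.get(key)
```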

The operation determination unit 166 determines the operation content of the dialogue device 50. Specifically, it determines the direction in which the dialogue device 50 turns its face, which is the direction of one of the detected people. For example, the operation determination unit 166 chooses the direction of the person with the highest of the priorities calculated for each person. In this case, it estimates coordinates indicating that person's position from the information acquired by the sensor 10 and the camera 20. It then determines the angles of the head 510 and the legs 540 so that the face of the dialogue device 50 points toward the estimated coordinates.
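The following sketch shows one way to turn the estimated coordinates into head and leg angles. It assumes a floor frame centered on the dialogue device 50 (x forward, y to the left), a head 510 that can yaw at most 45 degrees either way, and legs 540 that rotate the whole body for any remaining angle. The frame, the limit and this split between head and legs are all hypothetical; the text does not say how the two angles are apportioned.

```python
import math

HEAD_YAW_LIMIT_DEG = 45.0  # hypothetical mechanical limit of the head 510


def _wrap(deg: float) -> float:
    """Wrap an angle to the range [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def face_toward(person_x: float, person_y: float, body_heading_deg: float = 0.0):
    """Return (leg_yaw_deg, head_yaw_deg) that point the face at the person.

    leg_yaw_deg is the extra body rotation the legs 540 must add; head_yaw_deg
    is the head 510 angle relative to the body after that rotation.
    """
    target = math.degrees(math.atan2(person_y, person_x))
    relative = _wrap(target - body_heading_deg)
    head = max(-HEAD_YAW_LIMIT_DEG, min(HEAD_YAW_LIMIT_DEG, relative))
    legs = relative - head  # whatever the head cannot cover, the body turns
    return legs, head
```

Turning the head first keeps small corrections cheap and quiet. A design that always rotates the body to face the person squarely would simply return `(relative, 0.0)`.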

The operation determination unit 166 also determines operation content other than the face direction. For example, it may select the operation content associated with the utterance content of the dialogue device 50, or it may determine the operation content from the information recorded in the person information record. It acquires the determined operation content from the operation content storage unit 105.

The device control unit 167 controls the speaker 40 and the dialogue device 50 based on the determined operation content and utterance content. For example, it generates an audio signal from the determined utterance content and outputs the signal to the speaker 40, which then outputs the speech. When multiple speakers are installed, the device control unit 167 may output audio signals for multiple people at the same time. In that case, it may output a different audio signal to each speaker 40 depending on where the people are located. The device control unit 167 also controls the dialogue device 50 based on the determined operation content.
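One possible way to output a different audio signal per speaker according to where people stand is to route each person's utterance to the nearest speaker, as sketched below. The nearest-speaker rule, the 2-D positions and the identifiers are assumptions; the text only says the signal may differ by speaker depending on the person's position.

```python
import math


def assign_speakers(speakers: dict[str, tuple[float, float]],
                    people: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Map each person to the speaker closest to them (hypothetical routing rule)."""
    return {
        pid: min(speakers, key=lambda sid: math.dist(speakers[sid], pos))
        for pid, pos in people.items()
    }
```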

FIG. 4 shows a specific example of the utterance content score table, which associates utterance content of the dialogue device 50 with scores. When the dialogue device 50 makes a given utterance to a person, the priority calculation unit 163 uses the score associated with that utterance content to calculate the priority. The utterance content score table has one utterance content score record per utterance content, and each record holds an utterance content value and a score value. The utterance content indicates what the dialogue device 50 says to a person, and it is the same as the utterance content stored in the utterance content storage unit 104. The score quantifies which of several people the dialogue device 50 should face, and it is used to calculate the priority.

In the example shown in FIG. 4, the utterance content score record in the top row of the utterance content score table has the utterance content "greeting (good morning, hello)" and the score "5". According to this record, when the dialogue device 50 greets a person with "good morning", "hello" or a similar greeting, a score of 5 is used to calculate that person's priority. The utterance content score table shown in FIG. 4 is only one specific example and may be configured differently. For example, it may have a column of utterance content expressed in another language.
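A minimal sketch of a lookup in the utterance content score table follows. Only the greeting row (score 5) comes from FIG. 4. The set of greeting phrases, the normalization, and the default score of 0 for utterances with no row are assumptions.

```python
# Utterance content score table: only the "greeting" row is taken from FIG. 4.
UTTERANCE_CONTENT_SCORES = {"greeting": 5}
GREETINGS = {"good morning", "hello"}  # phrases treated as the "greeting" category


def utterance_content_score(utterance: str, default: int = 0) -> int:
    """Score of the utterance the dialogue device is about to make (0 if unlisted)."""
    phrase = utterance.strip().lower().rstrip("!.")
    category = "greeting" if phrase in GREETINGS else phrase
    return UTTERANCE_CONTENT_SCORES.get(category, default)
```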

FIG. 5 shows a specific example of the utterance history score table, which associates information about the utterances and actions the dialogue device 50 has directed at a person with scores. The table has utterance history score records, and each record holds a history content, a count/time and a score. The history content names a kind of utterance or action directed by the dialogue device 50 at a person, for example the face-turn count, the dialogue device utterance count or the utterance time. These correspond to the columns of the person information record that concern the utterances and actions of the dialogue device 50. The count/time indicates either the number of times, or the length of time, that the utterance or action named in the history content has been performed. Each combination of history content and count/time is associated with a score.

In the example shown in FIG. 5, the utterance history score record in the top row of the utterance history score table has the history content "number of times the face was turned", the count/time "5 or more" and the score "1". According to this record, when the dialogue device 50 has turned its face toward a person five or more times, a score of 1 is used to calculate that person's priority. The utterance history score table shown in FIG. 5 is only one specific example and may be configured differently.

FIG. 6 shows a specific example of the person state score table, which associates information about a person's state with scores. The table has person state score records, and each record holds a person state, a detail and a score. The person state names a kind of state, for example facial expression, person utterance count, line of sight or stay time. These correspond to the columns of the person information record that concern the person's state. The detail gives a specific value of that state. For example, FIG. 6 lists "smile" and "sad face" as details of the person state "facial expression". The score is the number of points given for that state, and a score is set for each specific value of facial expression, person utterance count, line of sight and stay time.

In the example shown in FIG. 6, the person state score record in the top row of the person state score table has the person state "facial expression", the detail "smile" and the score "3". According to this record, when a person is smiling at the dialogue device 50, a score of 3 is used to calculate that person's priority. The person state score table shown in FIG. 6 is only one specific example and may be configured differently.
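A minimal sketch of a lookup in the person state score table, keyed by (person state, detail), follows. Only the ("facial expression", "smile") row with score 3 comes from FIG. 6. The default of 0 for unlisted pairs is an assumption. Summing the per-state scores into one person state score is also an assumption, because the person state score calculation process is described elsewhere. Numeric states such as stay time would need banded rows, which this sketch omits.

```python
# Person state score table keyed by (person state, detail); one row from FIG. 6.
PERSON_STATE_SCORES = {("facial expression", "smile"): 3}


def person_state_score(states: dict[str, str], table: dict = PERSON_STATE_SCORES) -> int:
    """Sum the scores of a person's (state, detail) pairs; unlisted pairs count as 0."""
    return sum(table.get((state, detail), 0) for state, detail in states.items())
```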

FIG. 7 is a flowchart showing a specific example of the process for generating person information. This process is executed, for example, each time the sensor 10 detects a person. The person identification unit 161 of the control device 100 identifies the person detected by the sensor 10 (step S101). Specifically, it acquires information about the detected person from the sensor 10. Based on that information, it determines whether the person has newly arrived near the dialogue device 50.

If the detected person is a new visitor to the dialogue device 50 (step S102: YES), the person identification unit 161 generates person information about that person (step S103) and records it as a person information record in the person information table. If the detected person is not a new visitor (step S102: NO), the process proceeds to step S104.

The feature estimation unit 162 of the control device 100 estimates the features of the detected person (step S104). Specifically, it estimates the person's facial expression and line of sight from the moving image captured by the camera 20. It then records the estimated facial expression and line of sight in the person information record (step S105).
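Steps S101 to S105 of FIG. 7 can be sketched for a single detection as follows. This assumes identification (S101) has already produced a stable `person_id`. The `estimate_features` callable is a hypothetical stand-in for the camera-based estimator and returns an (expression, gaze) pair.

```python
def on_person_detected(table: dict, person_id: str, estimate_features) -> dict:
    """Run steps S102 to S105 of FIG. 7 for one detected person."""
    if person_id not in table:                                  # S102: new visitor?
        table[person_id] = {"person_id": person_id}             # S103: create record
    expression, gaze = estimate_features(person_id)             # S104: estimate features
    table[person_id].update(expression=expression, gaze=gaze)   # S105: record them
    return table[person_id]
```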

FIG. 8 is a flowchart showing a specific example of the process for determining the operation of the dialogue device 50 (hereinafter, the "operation determination process"). The operation determination process is executed, for example, whenever the priority is to be calculated. Examples of such timings are when a newly arrived person is detected, when the dialogue device 50 finishes an utterance, or at a predetermined interval (for example, once every 5 seconds). At such a timing, the control device 100 executes the operation determination process. First, the priority calculation unit 163 of the control device 100 acquires the person information table from the person information storage unit 102 (step S201). Next, based on the person information table, the priority calculation unit 163 selects the person whose priority is to be calculated (hereinafter, the "target person") (step S202). Specifically, it selects one of the person information records in the person information table and takes the person indicated by that record as the target person. It may, for example, select target persons in order from the top of the person information table. Any selection method may be used, as long as it selects a person whose priority has not yet been calculated in the current operation determination process. The priority calculation unit 163 then acquires the person information record of the target person (step S203).
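The example triggers for the operation determination process can be combined into a single predicate. This is a hypothetical helper: the function name, the parameters and the use of seconds are assumptions. The 5-second default mirrors the example interval in the text.

```python
def should_run_operation_determination(now_s: float, last_run_s: float,
                                       new_person_detected: bool,
                                       utterance_finished: bool,
                                       interval_s: float = 5.0) -> bool:
    """True when any of the example triggers for recalculating priorities fires."""
    return new_person_detected or utterance_finished or (now_s - last_run_s) >= interval_s
```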

Next, the priority calculation unit 163 calculates the target person's utterance history score by executing the utterance history score calculation process (step S204). It then calculates the person state score by executing the person state score calculation process (step S205), and the utterance content score by executing the utterance content score calculation process (step S206). Each of these processes is described later. The utterance content to be output from the speaker 40 is determined while the utterance content score is being calculated. The priority calculation unit 163 then calculates the target person's priority from the utterance history score, the person state score and the utterance content score (step S207). For example, it may calculate the priority by adding the three scores, as in Equation (1). Alternatively, it may weight each of the three scores, as in Equation (2). The weights may be set to any values.
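Equations (1) and (2) are cited in this passage, but their bodies do not survive in this text. Assuming they are the plain sum and the weighted sum that the sentences describe, they would read as follows. Here P is the target person's priority; S_h, S_s and S_c are the utterance history, person state and utterance content scores; and w_h, w_s and w_c are the corresponding weights. These symbol names are ours, not the patent's.

```latex
\begin{align}
P &= S_{h} + S_{s} + S_{c} \tag{1}\\
P &= w_{h} S_{h} + w_{s} S_{s} + w_{c} S_{c} \tag{2}
\end{align}
```

With w_h = w_s = w_c = 1, Equation (2) reduces to Equation (1).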

The priority calculation unit 163 determines whether the priorities of all people have been calculated (step S208). Specifically, if no person remains whose priority has not been calculated in the current operation determination process, it determines that all priorities have been calculated. Otherwise, it determines that they have not.

If not all priorities have been calculated (step S208: NO), the process returns to step S202, where the priority calculation unit 163 selects the next target person. If all priorities have been calculated (step S208: YES), the operation determination unit 166 determines the person toward whom the action will be directed (step S209). Specifically, it selects the person with the highest calculated priority as the person the dialogue device 50 will face. If several people share the highest priority, the operation determination unit 166 selects one of them, and any method may be used to do so. For example, based on the person information records, it may select the person toward whom the dialogue device 50 has turned its face the fewest times. It may also select the person toward whom the dialogue device 50 has been oriented for the shortest time.
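The loop over steps S202 to S208 and the selection in step S209 can be sketched as follows. The tie-break uses the fewest face turns, which is one of the options the text names. The `compute_priority` callable is a hypothetical stand-in for steps S204 to S207, and the function assumes at least one person is present.

```python
def select_target(records: list[dict], compute_priority) -> dict:
    """Score every person (S202 to S208), then pick the one to face (S209).

    Highest priority wins; among equal priorities, the person with the fewest
    face turns wins. `records` must be non-empty.
    """
    for record in records:
        record["priority"] = compute_priority(record)
    return max(records, key=lambda r: (r["priority"], -r["face_turns"]))
```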

The operation determination unit 166 determines the operation content of the dialogue device 50 toward the selected person (step S210). For example, it determines the direction in which the dialogue device 50 turns its face. To do this, it estimates coordinates indicating the person's position from the information acquired by the sensor 10 and the camera 20. It then determines the angles of the head 510 and the legs 540 so that the face of the dialogue device 50 points toward the estimated coordinates. The operation determination unit 166 also acquires from the operation content storage unit 105 the operation content associated with the utterance content for the selected person, and adopts it as the operation content of the dialogue device 50. The device control unit 167 controls the speaker 40 or the dialogue device 50 based on the determined operation content and utterance content (step S211).

FIG. 9 is a flowchart showing a specific example of the utterance history score calculation process. The priority calculation unit 163 acquires the score for the face-turn count from the utterance history score table (step S241). Specifically, it reads the face-turn count from the person information record acquired in step S203. It then acquires the utterance history score table from the score map storage unit 103 and looks up the score associated with that face-turn count. For example, if the face-turn count in the person information record is 3, the priority calculation unit 163 acquires a score of 3 for the face-turn count.

The priority calculation unit 163 acquires, from the utterance history score table, the score for the number of utterances the dialogue device has made to the person (step S242). Specifically, the priority calculation unit 163 reads the number of dialogue device utterances from the person information record acquired in step S203, acquires the utterance history score table from the score map storage unit 103, and looks up the score associated with that number. For example, when the number of dialogue device utterances in the person information record is 0, the priority calculation unit 163 acquires 5 from the utterance history score table as the score for the number of dialogue device utterances.

The priority calculation unit 163 acquires the utterance time score from the utterance history score table (step S243). Specifically, the priority calculation unit 163 reads the utterance time from the person information record acquired in step S203, acquires the utterance history score table from the score map storage unit 103, and looks up the score associated with the utterance time. For example, when the utterance time in the person information record is 25 seconds, the priority calculation unit 163 acquires 1 from the utterance history score table as the utterance time score.
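Because utterance time is continuous, a score table for step S243 presumably maps time ranges rather than exact values. A minimal sketch with Python's `bisect` module follows; the bin edges and scores are hypothetical, chosen only so that 25 seconds yields 1 as in the example.

```python
from bisect import bisect_right

# Hypothetical bins for utterance time in seconds:
# [0, 5) -> 5, [5, 10) -> 4, [10, 15) -> 3, [15, 20) -> 2, 20 and above -> 1.
UTTERANCE_TIME_EDGES = [5.0, 10.0, 15.0, 20.0]
UTTERANCE_TIME_SCORES = [5, 4, 3, 2, 1]


def binned_score(edges, scores, value):
    """Return the score of the bin containing value; edges must be sorted ascending."""
    if len(scores) != len(edges) + 1:
        raise ValueError("need exactly one more score than bin edges")
    return scores[bisect_right(edges, value)]


print(binned_score(UTTERANCE_TIME_EDGES, UTTERANCE_TIME_SCORES, 25.0))  # step S243 example
```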

The priority calculation unit 163 calculates the utterance history score (step S244). Specifically, the priority calculation unit 163 may calculate the utterance history score by applying a predetermined operation to the scores acquired in steps S241 to S243, for example by adding them, or by giving each score a predetermined weight. Any value may be specified for the weights.
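Both combination options in step S244 (plain addition and weighting) fit one small function, sketched here with the example scores from steps S241 to S243. The weights in the second call are arbitrary illustrative values.

```python
def weighted_sum(scores, weights=None):
    """Combine component scores; with no weights this is a plain sum."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")
    return sum(w * s for w, s in zip(weights, scores))


# Scores from the examples in steps S241 to S243: 3, 5, and 1.
print(weighted_sum([3, 5, 1]))
print(weighted_sum([3, 5, 1], [2.0, 1.0, 0.5]))
```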

FIG. 10 is a flowchart showing a specific example of the human condition score calculation process. The priority calculation unit 163 acquires the facial expression score from the human condition score table (step S251). Specifically, the priority calculation unit 163 reads the facial expression from the person information record acquired in step S203, acquires the human condition score table from the score map storage unit 103, and looks up the score associated with the facial expression. For example, when the facial expression in the person information record is a smile, the priority calculation unit 163 acquires 3 from the human condition score table as the facial expression score.

The priority calculation unit 163 acquires, from the human condition score table, the score for the number of utterances made by the person (step S252). Specifically, the priority calculation unit 163 reads the person's utterance count from the person information record acquired in step S203, acquires the human condition score table from the score map storage unit 103, and looks up the score associated with that count. For example, when the person's utterance count in the person information record is 0, the priority calculation unit 163 acquires 5 from the human condition score table as the score for the person's utterance count.

The priority calculation unit 163 acquires the gaze score from the human condition score table (step S253). Specifically, the priority calculation unit 163 reads the gaze value from the person information record acquired in step S203, acquires the human condition score table from the score map storage unit 103, and looks up the score associated with the gaze. For example, when the person information record shows that the person's gaze is directed somewhere other than the dialogue device, the priority calculation unit 163 acquires 5 from the human condition score table as the gaze score.

The priority calculation unit 163 acquires the staying time score from the human condition score table (step S254). Specifically, the priority calculation unit 163 reads the staying time from the person information record acquired in step S203, acquires the human condition score table from the score map storage unit 103, and looks up the score associated with the staying time. For example, when the staying time in the person information record is 12 seconds, the priority calculation unit 163 acquires 5 from the human condition score table as the staying time score.

Finally, the priority calculation unit 163 calculates the human condition score. Specifically, the priority calculation unit 163 may calculate the human condition score by applying a predetermined operation to the scores acquired in steps S251 to S254, for example by adding them, or by giving each score a predetermined weight. Any value may be specified for the weights.
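The four lookups of FIG. 10 and the final combination can be sketched together. Every table below is hypothetical, chosen only so that the examples in steps S251 to S254 hold (smile gives 3, zero utterances gives 5, gaze away from the device gives 5, and 12 seconds of staying time gives 5). The `PersonRecord` class is an assumed stand-in for the person information record, and the unit default weights reduce the final step to the additive option.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical human condition score tables.
EXPRESSION_SCORES = {"smile": 3, "neutral": 2}
OTHER_EXPRESSION_SCORE = 1
PERSON_UTTERANCE_SCORES = [5, 4, 3, 2, 1]  # index = utterance count, capped at the end
GAZE_SCORES = {"away": 5, "device": 1}     # gaze away from the device scores higher
STAY_EDGES = [3.0, 6.0, 9.0]               # staying-time bin edges in seconds
STAY_SCORES = [2, 3, 4, 5]                 # 9 s and longer -> 5


@dataclass
class PersonRecord:
    """Hypothetical subset of a person information record."""
    expression: str
    utterance_count: int
    gaze: str  # "away" or "device"
    stay_seconds: float


def human_condition_score(rec, weights=(1.0, 1.0, 1.0, 1.0)):
    parts = (
        EXPRESSION_SCORES.get(rec.expression, OTHER_EXPRESSION_SCORE),  # S251
        PERSON_UTTERANCE_SCORES[min(rec.utterance_count,
                                    len(PERSON_UTTERANCE_SCORES) - 1)],  # S252
        GAZE_SCORES[rec.gaze],                                           # S253
        STAY_SCORES[bisect_right(STAY_EDGES, rec.stay_seconds)],         # S254
    )
    # Final step: weighted sum of the four component scores.
    return sum(w * p for w, p in zip(weights, parts))


print(human_condition_score(PersonRecord("smile", 0, "away", 12.0)))
```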

FIG. 11 is a flowchart showing a specific example of the utterance content score calculation process. The voice processing unit 164 of the control device 100 determines whether a voice signal has been acquired from the microphone 30 (step S261). If a voice signal has been acquired (step S261: YES), the voice processing unit 164 executes speech recognition, generating a character string from the voice signal. When the voice processing unit 164 acquires voice signals from several people, it generates a character string for each person; in this case, it associates each person with a voice signal using known sound source separation processing. If no voice signal has been acquired (step S261: NO), the process proceeds to step S263.

The utterance determination unit 165 determines the utterance content of the dialogue device 50 (step S263). Specifically, the utterance determination unit 165 determines the utterance content based on the generated character string: it acquires the utterance content associated with the character string from the utterance content storage unit 104 and determines the acquired content as the utterance content of the dialogue device 50.
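Step S263 amounts to a keyed lookup from the recognized string to a stored response. In the minimal sketch below, the table contents, the lower-casing normalization, and the fallback response used when nothing was recognized (the step S261: NO path) or nothing matched are all hypothetical.

```python
# Hypothetical contents of the utterance content storage unit 104.
UTTERANCE_CONTENT = {
    "good morning": "Good morning! Thanks for stopping by.",
    "hello": "Hello! How can I help you?",
}
DEFAULT_UTTERANCE = "Hi there!"  # used when nothing was recognized or matched


def decide_utterance(recognized, table=UTTERANCE_CONTENT, default=DEFAULT_UTTERANCE):
    """Return the response associated with the recognized string (step S263)."""
    if not recognized:
        return default
    key = recognized.strip().lower()
    return table.get(key, default)


print(decide_utterance("Good morning"))
```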

The priority calculation unit 163 calculates the utterance content score (step S264). Specifically, the priority calculation unit 163 acquires the utterance content score table from the score map storage unit 103 and looks up the score associated with the utterance content. For example, when the utterance content is "good morning", the priority calculation unit 163 acquires 5 from the utterance content score table as the utterance content score. The priority calculation unit 163 may also calculate the utterance content score by giving it a predetermined weight. Any value may be specified for the weight.
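A sketch of step S264 follows, with a hypothetical utterance content score table in which "good morning" maps to 5 as in the example. The default score of 0 for content not in the table is an assumption.

```python
# Hypothetical utterance content score table.
UTTERANCE_CONTENT_SCORES = {"good morning": 5, "hello": 4}


def utterance_content_score(content, weight=1.0, default=0):
    """Weighted score for the utterance content; unknown content gets `default`."""
    return weight * UTTERANCE_CONTENT_SCORES.get(content.strip().lower(), default)


print(utterance_content_score("good morning"))
print(utterance_content_score("good morning", weight=2.0))
```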

In the control system 1 configured as described above, the operation determination unit 166 of the control device 100 determines toward which of the people detected by the sensor 10 the face of the dialogue device 50 is to be turned. Specifically, the operation determination unit 166 makes this decision based on the history of dialogue between each person and the dialogue device 50. The device control unit 167 then controls the dialogue device 50 so that it faces the selected person. This makes it easier to show which person the dialogue device 50 is talking to.

FIG. 12 is a diagram showing a specific example of the operation of the dialogue device 50. FIG. 12 shows regions 200, 200a, and 200b, which represent successive points in time; time elapses in the order region 200, region 200a, region 200b. Each region includes persons 60, 60a, and 60b, to whom the dialogue device 50 is speaking at the same time. The priority calculation unit 163 recalculates the priorities repeatedly as time passes.

In region 200, the priorities of persons 60, 60a, and 60b are 15, 10, and 5, respectively, so the dialogue device 50 turns its face toward person 60 and speaks. In region 200a, the priorities are 10, 5, and 15, so the dialogue device 50 turns its face toward person 60b and speaks. In region 200b, the priorities are 10, 15, and 10, so the dialogue device 50 turns its face toward person 60a and speaks. Because the priority calculation unit 163 recalculates the priorities periodically, the dialogue device 50 turns its face to each of the people in turn when speaking, without favoring any particular person. Each person can therefore recognize when they are the one being addressed.
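The selection in FIG. 12 reduces to picking the person with the highest priority at each recalculation. The sketch below replays the three regions; the tie-breaking rule (the first person listed wins) is an assumption, since the text does not say how ties are resolved.

```python
def select_target(priorities):
    """Return the person with the highest priority; ties go to the first one listed."""
    return max(priorities, key=priorities.get)


# Priorities of persons 60, 60a, and 60b in regions 200, 200a, and 200b (FIG. 12).
timeline = [
    {"60": 15, "60a": 10, "60b": 5},
    {"60": 10, "60a": 5, "60b": 15},
    {"60": 10, "60a": 15, "60b": 10},
]
targets = [select_target(p) for p in timeline]
print(targets)  # ['60', '60b', '60a']
```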

In the above-described embodiment, the priority calculation unit 163 calculates the priority based on the utterance history score, the human condition score, and the utterance content score, but the configuration is not limited to this. For example, the priority calculation unit 163 may calculate the priority based on any one or more of these three scores. In that case, the priority calculation unit 163 may be configured not to calculate the scores that are not used.
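One way to honor "not calculating unused scores" is to pass each score as a callable and skip the ones that are disabled, so they are never evaluated. The sketch below assumes the enabled scores are simply summed; the text only says the priority is calculated "based on" them, so the sum is an illustrative choice.

```python
from typing import Callable, Optional

ScoreFn = Callable[[], float]


def priority(history: Optional[ScoreFn] = None,
             condition: Optional[ScoreFn] = None,
             content: Optional[ScoreFn] = None) -> float:
    """Sum the enabled scores; disabled (None) scores are never computed."""
    enabled = [fn for fn in (history, condition, content) if fn is not None]
    if not enabled:
        raise ValueError("at least one score must be enabled")
    return float(sum(fn() for fn in enabled))


# Utterance history and utterance content enabled, human condition disabled.
print(priority(history=lambda: 9, content=lambda: 5))
```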

In the above-described embodiment, the dialogue device 50 of the control system 1 was described as a robot, but it is not limited to a robot. Instead of a robot, the dialogue device 50 may be a display device such as a display, or a drive device such as an actuator. FIG. 13 is a diagram showing specific examples in which the control system 1 includes such a predetermined device in place of the robot. FIG. 13A shows an example in which the control system 1 includes a display device instead of the robot: the control system 1 includes a dialogue device 50a (display device) in place of the dialogue device 50. The dialogue device 50a is a display device such as a touch panel or a display, and displays a human image 51. The human image 51 is an image depicting a person, either the whole body or part of it, such as a bust shot. The human image 51 may be a live-action image or CG (computer graphics). The human image 51 moves according to the operation content determined by the device control unit 167. The dialogue device 50a may display multiple human images, or images of non-human subjects such as animals, objects, or plants.

FIG. 13B shows an example in which the control system 1 includes an actuator instead of the robot: the control system 1 includes a dialogue device 50b (actuator) in place of the dialogue device 50. The dialogue device 50b contains a motor, transmission gears, and the like, and an object 52 such as a book, food, or a tool can be placed on it. The dialogue device 50b operates according to the operation content determined by the control device 100; specifically, it performs predetermined motions such as rotating or vibrating up and down, and the object 52 placed on it rotates or vibrates accordingly. By operating the dialogue device 50b while outputting sound from the speaker 40, the control system 1 can make the object 52 behave like a living thing. Even when the object 52 is the one speaking, people can tell which person it is talking to.

In the above-described embodiment, the control system 1 was described as controlling a single dialogue device 50, but the number is not limited to one. For example, the control system 1 may be configured to control two or more dialogue devices 50. In that case, the operation determination unit 166 may select as many target persons as there are dialogue devices 50, in descending order of priority.
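Selecting as many targets as there are devices, in descending order of priority, is a top-k selection. This sketch uses `heapq.nlargest` and pairs devices with targets in list order; the device names and the pairing rule are hypothetical. When there are fewer people than devices, the surplus devices are left unassigned.

```python
import heapq


def assign_targets(priorities, devices):
    """Pair each device with one of the len(devices) highest-priority people."""
    top = heapq.nlargest(len(devices), priorities, key=priorities.get)
    return dict(zip(devices, top))


print(assign_targets({"60": 15, "60a": 10, "60b": 5}, ["robot_1", "robot_2"]))
```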

In the above-described embodiment, the control system 1 determines toward which person the dialogue device 50 is turned by calculating priorities. Alternatively, the control system 1 may be configured to determine toward which person the face of the dialogue device 50 is turned based on the history of dialogue between the people and the dialogue device 50 and a predetermined estimator. The dialogue history may be any information indicating the history between a person and the dialogue device 50, such as a person information record. The estimator is a learning model generated by machine learning on a plurality of pieces of training data, and has the function of estimating a priority from the history of dialogue between a person and the dialogue device 50. In this case, the priority calculation unit 163 estimates the priority based on the person information records stored in the person information storage unit 102 and the estimator. The estimator may be generated, for example, by the priority calculation unit 163, or may be stored in the control device 100 in advance. The training data is, for example, data that associates the history of dialogue between a person and the dialogue device 50 with a priority, and is used to generate the estimator. The machine learning may be any known method, such as SGD (stochastic gradient descent), random forest, linear regression, decision tree, or CNN (convolutional neural network).
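As one concrete instance of the listed options, the sketch below fits a linear regression estimator with plain SGD in pure Python. The feature choice (face turns, device utterances, utterance time in units of 10 seconds) and the linear rule used to generate the training priorities are invented for illustration; real training data would come from recorded dialogue histories.

```python
import random


def train_linear_sgd(samples, targets, lr=0.01, epochs=500, seed=0):
    """Fit priority ~ w . x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = samples[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) + b - targets[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]  # gradient step on weights
            b -= lr * err                                     # gradient step on bias
    return w, b


def predict(model, x):
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def invented_rule(x):
    """Hypothetical ground truth used only to generate training priorities."""
    return 10.0 - 1.0 * x[0] - 1.5 * x[1] + 2.0 * x[2]


# Features: (face turns, device utterances, utterance time in 10 s units).
X = [[f, d, t] for f in range(5) for d in range(5) for t in (0.0, 1.0, 2.0)]
y = [invented_rule(x) for x in X]
model = train_linear_sgd(X, y)
print(round(predict(model, [1, 2, 1.5]), 3))
```

A learned estimator replaces the hand-tuned score tables; a nonlinear model such as a random forest could be swapped in without changing how its output is used as a priority.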

The control device 100 may be implemented using a plurality of information processing devices communicably connected via a network. In this case, the functional units of the control device 100 may be distributed across the plurality of information processing devices. For example, the person identification unit 161 and the feature estimation unit 162 may be implemented on one information processing device, and the priority calculation unit 163, the voice processing unit 164, the utterance determination unit 165, the operation determination unit 166, and the device control unit 167 on another.

The control device 100 in the above-described embodiment may be implemented by a computer. In that case, a program for implementing its functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into a computer system and executed. The term "computer system" as used here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" may further include something that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and something that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may implement only some of the functions described above, may implement those functions in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA (field programmable gate array).

Although the embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment and includes designs and the like within a scope that does not depart from the gist of the present invention.

1 ... control system, 10 ... sensor, 20 ... camera, 30 ... microphone, 40 ... speaker, 50 ... dialogue device, 100 ... control device, 101 ... communication unit, 102 ... person information storage unit, 103 ... score map storage unit, 104 ... utterance content storage unit, 105 ... operation content storage unit, 106 ... control unit, 161 ... person identification unit, 162 ... feature estimation unit, 163 ... priority calculation unit, 164 ... voice processing unit, 165 ... utterance determination unit, 166 ... operation determination unit, 167 ... device control unit

Claims

A control system comprising:
an acquisition unit that acquires, for each person, information about a plurality of people facing a predetermined dialogue device that talks to people;
an operation determination unit that determines toward which of the plurality of people the dialogue device is to be directed, based on the acquired information and history information on the dialogue between the people and the dialogue device stored in a predetermined storage device;
an utterance determination unit that determines the utterance content of the dialogue device based on a voice uttered by a person; and
a device control unit that controls the operation of the dialogue device so that the dialogue device faces the determined direction, and controls a speaker provided near the dialogue device so that the speaker outputs the utterance content.

The control system according to claim 1, wherein the operation determination unit calculates, based on the acquired information and the history information, a priority indicating whether the dialogue device should be directed toward each person, determines to direct the dialogue device toward the person with the highest calculated priority, and determines the operation content.

The control system according to claim 2, further comprising a priority calculation unit that calculates one or more of an utterance content score quantifying the utterance content based on the history information and the utterance content, an utterance history score quantifying the history of utterances by the dialogue device to the person as indicated by the history information, and a human condition score quantifying the state of the person as indicated by the history information, and that calculates the priority based on the calculated scores,
wherein the operation determination unit determines the direction in which the dialogue device is directed and the operation content based on the priority calculated by the priority calculation unit.

The control system according to claim 2, wherein the operation determination unit determines toward which of the plurality of people the dialogue device is to be directed, based on an estimator generated by machine learning on training data in which the history information and the priority are associated with each other.

A control device comprising:
an acquisition unit that acquires, for each person, information about a plurality of people facing a predetermined dialogue device that talks to people;
an operation determination unit that determines toward which of the plurality of people the dialogue device is to be directed, based on the acquired information and history information on the dialogue between the people and the dialogue device stored in a predetermined storage device;
an utterance determination unit that determines the utterance content of the dialogue device based on a voice uttered by a person; and
a device control unit that controls the operation of the dialogue device so that the dialogue device faces the determined direction, and controls a speaker provided near the dialogue device so that the speaker outputs the utterance content.

A control method comprising:
an acquisition step in which a control device acquires, for each person, information about a plurality of people facing a predetermined dialogue device that talks to people;
an operation determination step in which the control device determines toward which of the plurality of people the dialogue device is to be directed, based on the acquired information and history information on the dialogue between the people and the dialogue device stored in a predetermined storage device;
an utterance determination step in which the control device determines the utterance content of the dialogue device based on a voice uttered by a person; and
a device control step in which the control device controls the operation of the dialogue device so that the dialogue device faces the determined direction, and controls a speaker provided near the dialogue device so that the speaker outputs the utterance content.

A computer program for causing a computer to function as the control system according to any one of claims 1 to 4.