JP2020086774A

JP2020086774A - Apparatus, method and program for controlling scenario

Info

Publication number: JP2020086774A
Application number: JP2018218430A
Authority: JP
Inventors: 充裕後藤; Mitsuhiro Goto; 純史布引; Ayafumi Nunobiki; 成宗松村; Narimune Matsumura; 昭博柏原; Akihiro Kashiwabara
Original assignee: Nippon Telegraph and Telephone Corp; University of Electro Communications NUC
Current assignee: Nippon Telegraph and Telephone Corp; University of Electro Communications NUC
Priority date: 2018-11-21
Filing date: 2018-11-21
Publication date: 2020-06-04
Anticipated expiration: 2038-11-21
Also published as: JP7153256B2

Abstract

To provide an apparatus for controlling a scenario adapted to assist in effective content presentation for a variety of audiences. SOLUTION: An apparatus for controlling a scenario according to a first embodiment of the present invention comprises an acquiring section that acquires information representative of a situation of an audience viewing a content, an extracting section that extracts a feature amount of the audience from an image, an estimating section that estimates a viewing state of the audience as any one of a plurality of states including a first state based on the feature amount of the audience and takes the estimated state as a current state, a determining section that if the current state is not the first state, determines any one different from the current state among the plurality of states as a destination-of-transition state, and a selecting section that selects any one out of at least one available correction contents for a presentation scenario of the content, associated with a combination of the current state and the destination-of-transition state. SELECTED DRAWING: Figure 1

Description

The present invention relates, for example, to the control of a scenario that describes the presentation mode of content.

In recent years, content presentation techniques that combine digital signage with a communication robot have become known. For example, having the robot speak an explanation or make gestures in synchronization with video content, such as slides shown on a display, can help reduce the staffing needed for various information services.

In such a content presentation technique, a presentation scenario (hereinafter simply called a scenario) is created in advance. The scenario describes the presentation mode of the content, which generally consists of the slides shown on the display, the lines (utterance content) that the robot speaks while each slide is displayed, and the non-verbal actions, such as gestures, that the robot performs. That is, while a slide is displayed, the robot speaks the lines and performs the non-verbal actions described in the scenario. Ideally, a scenario is written so that the key points and details of the slides reach the audience. However, audience attributes such as age, gender, knowledge, and preferences vary widely, so it is not easy to create a scenario that achieves effective content presentation for every audience.

Non-Patent Document 1 proposes digital signage that displays advertising content that reacts interactively to a person's position and distance. Non-Patent Document 2 proposes detecting the state of consciousness of a driver while driving and, based on that information, presenting an alarm that the driver can easily understand.

Non-Patent Document 1: Chen et al., "Digital Signage Interactively Responding to Human Situation", 76th National Convention of the Information Processing Society of Japan, 2014.
Non-Patent Document 2: Hatsuo Yamazaki et al., "Development of driver status monitor and examination of warning presentation method of driving support system", IEEJ Trans. IA, Vol. 125, No. 11, 2005.

An object of the present invention is to support effective content presentation to diverse audiences.

A scenario control device according to a first aspect of the present invention comprises: an acquisition unit that acquires information representing the state of an audience viewing content; an extraction unit that extracts a feature amount of the audience from that information; an estimation unit that, based on the feature amount of the audience, estimates the viewing state of the audience as one of a plurality of states including a first state and takes the estimated state as the current state; a determination unit that, when the current state is not the first state, determines one of the plurality of states that differs from the current state as a transition destination state; and a selection unit that selects one of at least one available correction content for the presentation scenario of the content, the correction content being associated with the combination of the current state and the transition destination state.

That is, this scenario control device estimates the viewing state of the audience based on an image of the audience and, in order to move the audience out of the estimated viewing state, selects correction content for the scenario that describes the presentation mode of the content. Therefore, even when content is presented to an audience for whom the scenario prepared in advance is not suitable, this scenario control device can select correction content suited to that audience.

In the scenario control device according to the first aspect, the feature amount of the audience may include a first feature amount indicating the audience's degree of interest in the content and a second feature amount indicating the audience's degree of concentration on the content, and the plurality of states may include a second state, a third state, and a fourth state in addition to the first state. In this case, the estimation unit may estimate the viewing state of the audience as the first state when the first feature amount is equal to or greater than a first threshold and the second feature amount is equal to or greater than a second threshold; as the second state when the first feature amount is equal to or greater than the first threshold and the second feature amount is less than the second threshold; as the third state when the first feature amount is less than the first threshold and the second feature amount is equal to or greater than the second threshold; and as the fourth state when the first feature amount is less than the first threshold and the second feature amount is less than the second threshold.

This scenario control device (hereinafter called the scenario control device according to the second aspect of the present invention) classifies the plurality of states along two axes, the degree of interest and the degree of concentration. It can therefore narrow down which state element needs to be raised in order to improve the viewing state of the audience, and select appropriate correction content.
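The two-threshold classification of the second aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ViewingState` and `estimate_state`, the parameter names, and the numeric scale of the feature amounts are all assumptions introduced here.

```python
from enum import Enum


class ViewingState(Enum):
    """Hypothetical labels for the four viewing states of the second aspect."""
    STATE_1 = 1  # interest >= first threshold, concentration >= second threshold
    STATE_2 = 2  # interest >= first threshold, concentration <  second threshold
    STATE_3 = 3  # interest <  first threshold, concentration >= second threshold
    STATE_4 = 4  # interest <  first threshold, concentration <  second threshold


def estimate_state(interest, concentration, first_threshold, second_threshold):
    """Map the first and second feature amounts to one of the four states."""
    high_interest = interest >= first_threshold
    high_concentration = concentration >= second_threshold
    if high_interest and high_concentration:
        return ViewingState.STATE_1
    if high_interest:
        return ViewingState.STATE_2
    if high_concentration:
        return ViewingState.STATE_3
    return ViewingState.STATE_4
```

Because the comparisons are "equal to or greater than", a feature amount that sits exactly on its threshold counts as high.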

In the scenario control device according to the second aspect, the determination unit may determine the second state as the transition destination state when the current state is the fourth state and the progress of the presentation scenario is less than a third threshold, and may determine the third state as the transition destination state when the current state is the fourth state and the progress of the presentation scenario is equal to or greater than the third threshold.

According to this scenario control device, state transitions that draw out the audience's interest are prioritized until the progress of the scenario reaches the third threshold, and state transitions that draw out the audience's concentration are prioritized after the progress reaches the third threshold.

In the scenario control device according to the second aspect, the determination unit may determine the second state as the transition destination state when the current state is the fourth state and the total length of the presentation scenario is less than a fourth threshold. According to this scenario control device, when the scenario is short, state transitions that draw out the audience's interest are prioritized regardless of the progress of the scenario.
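The two rules for leaving the fourth state (progress against the third threshold, total length against the fourth threshold) can be combined in a small sketch. The function name, the parameter names, and the units (progress as a fraction of the scenario, length in seconds) are assumptions made for illustration; the text does not fix them. Destinations for the second and third states are not covered here.

```python
def destination_from_state_4(progress, total_length,
                             third_threshold, fourth_threshold):
    """Return the transition destination state (2 or 3) for a current state of 4.

    progress and total_length use hypothetical units chosen by the caller,
    for example a 0..1 fraction and seconds.
    """
    # A short scenario always aims at the interest-raising state 2,
    # regardless of how far the scenario has progressed.
    if total_length < fourth_threshold:
        return 2
    # Early in the scenario, raise interest first (state 2).
    if progress < third_threshold:
        return 2
    # Once progress reaches the third threshold, raise concentration (state 3).
    return 3
```

Checking the length rule first reflects the statement that short scenarios prioritize interest regardless of progress.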

In the scenario control device according to the first or second aspect, the selection unit may select, from among the at least one available correction content associated with the combination of the current state and the transition destination state, the one with the smallest cumulative selection count.

According to this scenario control device, the various correction contents are selected evenly, which makes it possible to find a scenario to which the audience responds well and to move the audience's viewing state to a state defined as good. In addition, because the scenario is not corrected in a uniform way, it is also possible to limit the loss of effectiveness that occurs as the audience becomes accustomed to the presentation mode of the content.
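Selecting the correction content with the smallest cumulative selection count can be sketched as below. The dictionary layouts for `rules` and `history`, the function name, and the tie-breaking behavior (first candidate in list order) are assumptions for illustration only; which correction contents belong to which state pair is likewise hypothetical.

```python
def select_correction(current_state, destination_state, rules, history):
    """Pick the least-often-selected correction content for a state pair.

    rules:   dict mapping (current_state, destination_state) to a list of
             correction content names (hypothetical layout).
    history: dict mapping a correction content name to its cumulative
             selection count; updated in place.
    """
    candidates = rules.get((current_state, destination_state), [])
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so list order breaks ties.
    chosen = min(candidates, key=lambda name: history.get(name, 0))
    history[chosen] = history.get(chosen, 0) + 1
    return chosen


# Hypothetical rule table: the pairing of contents to state pairs is illustrative.
rules = {(4, 2): ["use of sound effects", "pointing action"]}
history = {}
first = select_correction(4, 2, rules, history)
second = select_correction(4, 2, rules, history)
```

With an empty history, repeated calls cycle through the candidates, which matches the stated goal of selecting correction contents evenly.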

The scenario control device according to the first or second aspect may further comprise a correction unit that corrects the presentation scenario of the content based on the selected correction content. According to this scenario control device, even when content is presented to an audience for whom the scenario prepared in advance is not suitable, the scenario can be corrected dynamically to suit that audience.

A scenario control method according to a third aspect of the present invention is a scenario control method executed by a computer, comprising: acquiring information representing the state of an audience viewing content; extracting a feature amount of the audience from that information; estimating, based on the feature amount of the audience, the viewing state of the audience as one of a plurality of states including a first state and taking the estimated state as the current state; determining, when the current state is not the first state, one of the plurality of states that differs from the current state as a transition destination state; and selecting one of at least one available correction content for the presentation scenario of the content, the correction content being associated with the combination of the current state and the transition destination state.

That is, this scenario control method estimates the viewing state of the audience based on an image of the audience and, in order to move the audience out of the estimated viewing state, selects correction content for the scenario that describes the presentation mode of the content. Therefore, even when content is presented to an audience for whom the scenario prepared in advance is not suitable, this scenario control method can select correction content suited to that audience.

A scenario control program according to a fourth aspect of the present invention comprises computer-readable instructions for causing a computer to function as the scenario control device according to the first or second aspect. With this scenario control program, the scenario control device according to the first or second aspect can be realized in software.

According to the present invention, effective content presentation to diverse audiences can be supported.

FIG. 1 is a block diagram illustrating a content presentation system that includes the scenario control device according to the embodiment.
FIG. 2 is an explanatory diagram of the content presentation mode of the content presentation system of FIG. 1.
FIG. 3 is an explanatory diagram of a scenario that describes the content presentation mode of the content presentation system of FIG. 1.
FIG. 4 is an explanatory diagram of the audience feature extraction processing performed by the audience feature extraction unit of FIG. 1.
FIG. 5 is an explanatory diagram of the audience states estimated by the audience state estimation unit of FIG. 1.
FIG. 6 is a diagram showing an example of the transition destination state candidates determined when the current state is estimated to be state 4 of FIG. 5.
FIG. 7 is a diagram showing another example of the transition destination state candidates determined when the current state is estimated to be state 4 of FIG. 5.
FIG. 8 is a diagram illustrating the state transition rule table stored in the state transition rule storage unit of FIG. 1.
FIG. 9 is a diagram illustrating correction contents for realizing state transitions that improve each state other than state 1 of FIG. 5.
FIG. 10 is an explanatory diagram of "use of sound effects", an example of the correction contents listed in FIG. 9.
FIG. 11 is an explanatory diagram of "repeating the same explanation", an example of the correction contents listed in FIG. 9.
FIG. 12 is an explanatory diagram of the "pointing action", an example of the correction contents listed in FIG. 9.
FIG. 13 is an explanatory diagram of the "eye contact action toward the audience by gaze control", an example of the correction contents listed in FIG. 9.
FIG. 14 is an explanatory diagram of "guiding attention to the slide by gaze control", an example of the correction contents listed in FIG. 9.
FIG. 15 is a diagram illustrating the correction content rule table stored in the correction content rule storage unit of FIG. 1.
FIG. 16 is a diagram illustrating the correction history table stored in the correction history storage unit of FIG. 1.
FIG. 17 is a flowchart illustrating the operation of the scenario control device of FIG. 1.
FIG. 18 is a flowchart illustrating the details of step S310 of FIG. 17.
FIG. 19 is a flowchart illustrating the details of step S320 of FIG. 17.
FIG. 20 is a diagram showing a modification of the presenter.
FIG. 21 is a diagram showing another modification of the presenter.

Embodiments are described below with reference to the drawings. In the following, elements that are the same as or similar to elements already described are given the same or similar reference numerals, and duplicate descriptions are basically omitted.

(Embodiment)
The scenario control device according to the embodiment can be incorporated into, for example, a content presentation system that combines digital signage with a communication robot. As described later, such a content presentation system is only one example. For example, the digital signage need not be realized by a physical display; it may be realized by a virtual display provided in a VR (Virtual Reality), AR (Augmented Reality), or MR (Mixed Reality) space (hereinafter simply called a virtual space). Likewise, the communication robot may be replaced with a pointing device or with a virtual agent that exists in a virtual space.

As illustrated in FIG. 1, such a content presentation system may include the scenario control device 100 according to the embodiment, a camera 10, a display 20, a robot 30, and a presentation control device 200.

The display 20 and the robot 30 are responsible for presenting the content. Specifically, as illustrated in FIG. 2, the display 20 shows video content such as slides, and the robot 30 explains the video content shown on the display 20 to the audience using speech and non-verbal actions such as gestures.

The camera 10 is installed so as to capture the audience during content presentation, in particular the area around the audience members' faces. The following description assumes that the camera 10 is hardware separate from the robot 30, for example a web camera, but a camera mounted on the display 20 or the robot 30 may be used instead. Besides a visible-light camera that acquires ordinary image data, a depth camera combined with an infrared sensor may be used to obtain feature amounts of the audience's face regions (coordinates of feature points such as the eyes and nose) and feature amounts of skeleton data (coordinates of joints such as the shoulders and neck), from which the face orientation and point of attention of the audience are obtained. Alternatively, the gaze direction of the audience may be acquired with a gaze measurement camera or the like. In short, any information that represents the state of the audience during content presentation can be used, not only image data; the following description, however, assumes that image data is used.

The presentation control device 200 controls the display content of the display 20 and the utterance content and non-verbal actions of the robot 30 in accordance with the scenario. FIG. 1 is merely an example; some or all of these control targets may be controlled by separate control devices.

As illustrated in FIG. 3, the scenario describes elements of the video content together with the utterance content and non-verbal action presented to explain each element. In a presentation, an element is a slide or an animation (part) set on a slide; for a video, it could be a scene, for example. Following the scenario of FIG. 3, the presentation control device 200 can, for example, show animation 1-1 of slide 1 on the display 20 and, in the meantime, have the robot speak utterance content 1-1 (for example, "I will now explain ××") and perform non-verbal action 1-1 (for example, a pointing action that indicates part or all of the display 20). It can then show the next animation 1-2 of slide 1 on the display 20 and, in the meantime, have the robot speak utterance content 1-2 and perform non-verbal action 1-2. Likewise, following the scenario of FIG. 3, the presentation control device 200 can show slide 2 on the display 20 and, in the meantime, have the robot speak utterance content 2 and perform non-verbal action 2.
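One way to picture such a scenario is as an ordered list of steps, each pairing a content element with an utterance and a non-verbal action. The sketch below is illustrative only: the class `ScenarioStep`, the function `run_scenario`, and the callback parameters `show`, `speak`, and `act` are hypothetical names, and the actual scenario format of FIG. 3 is not specified in this text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScenarioStep:
    element: str           # slide or animation to show on the display
    utterance: str         # utterance content for the robot
    nonverbal_action: str  # non-verbal action for the robot


# Placeholder entries mirroring the three steps described for FIG. 3.
scenario: List[ScenarioStep] = [
    ScenarioStep("slide 1 / animation 1-1", "utterance content 1-1", "non-verbal action 1-1"),
    ScenarioStep("slide 1 / animation 1-2", "utterance content 1-2", "non-verbal action 1-2"),
    ScenarioStep("slide 2", "utterance content 2", "non-verbal action 2"),
]


def run_scenario(steps: List[ScenarioStep],
                 show: Callable[[str], None],
                 speak: Callable[[str], None],
                 act: Callable[[str], None]) -> None:
    """Present each step in order: show the element, then speak and act."""
    for step in steps:
        show(step.element)
        speak(step.utterance)
        act(step.nonverbal_action)


# Usage with stand-in callbacks that only record what would be presented.
log = []
run_scenario(scenario,
             show=lambda e: log.append(("show", e)),
             speak=lambda u: log.append(("speak", u)),
             act=lambda a: log.append(("act", a)))
```

Keeping each step as a separate record makes it straightforward to replace the utterance or non-verbal action of a single step when the scenario is corrected.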

The scenario control device 100 estimates the viewing state of the audience based on the image of the audience captured by the camera 10 and dynamically corrects the scenario executed by the presentation control device 200 so as to move the estimated viewing state to a state defined as better. As a result, as described later, the presentation mode of the content, such as the utterance content and non-verbal actions of the robot 30, changes adaptively in response to the audience's reactions. Therefore, even when content is presented to an audience for whom the scenario prepared in advance is not suitable, the scenario control device 100 can correct the scenario to suit that audience and convey the key points and details of the content to that audience effectively.

Next, an example hardware configuration of the scenario control device 100 is described. The presentation control device 200 may adopt the same hardware configuration as the scenario control device 100.

The scenario control device 100 may be, for example, a computer. In this case, the scenario control device 100 includes a processor, for example a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit), that performs information processing such as the dynamic correction of the scenario. The scenario control device 100 further includes a memory that temporarily stores the program executed by the processor to realize this processing and the data used by the processor.

The scenario control device 100 can further use a communication I/F (interface) for connecting to an external device, such as the presentation control device 200, via a network, for example. The communication I/F may be built into the scenario control device 100 or attached to it externally.

The scenario control device 100 can further use an auxiliary storage device for accumulating data. The auxiliary storage device may be built into the scenario control device 100 or attached to it externally. The auxiliary storage device is preferably a non-volatile storage medium such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or a flash memory. Alternatively, the auxiliary storage device may be a file server connected to the scenario control device 100 via a network.

The scenario control device 100 can further use an input I/F for receiving external input data such as image data. The input I/F may be built into the scenario control device 100 or attached to it externally.

The scenario control device 100 and the presentation control device 200 may be separate devices, as illustrated in FIG. 1, or may be a single device. When they are separate devices, they do not need to be placed close to each other. For example, the scenario control device 100 may be a cloud server connected, via a network such as the Internet, to one or more presentation control devices 200 that act as clients and are located in various places across the country. That is, the content presentation system may be realized with a cloud-based interaction control technology such as R-env: Renmai (registered trademark).

Next, an example functional configuration of the scenario control device 100 is described. As illustrated in FIG. 1, the scenario control device 100 includes an image acquisition unit 101, an audience feature extraction unit 102, an audience state estimation unit 103, a transition destination determination unit 104, a state transition rule storage unit 105, a correction content selection unit 106, a correction content rule storage unit 107, a correction history storage unit 108, and a scenario correction unit 109.

The image acquisition unit 101 acquires an image (data) obtained by capturing, with the camera 10, the audience viewing the content. The image acquisition unit 101 sends the acquired image to the audience feature extraction unit 102. The image may be a moving image or a still image. However, as described later, the audience feature extraction unit 102 needs time-series images to extract the feature amounts, so in the case of still images a plurality of them is required. The image acquisition unit 101 may correspond to, for example, the communication I/F and/or the input I/F described above.

The audience feature extraction unit 102 receives the image from the image acquisition unit 101 and extracts the feature amount of the audience from the image. Specifically, based on the orientation, size, movement, and the like of the audience's face regions in the image, the audience feature extraction unit 102 extracts feature amounts that indicate the audience's interest in the content, degree of concentration on the content, and so on. The audience feature extraction unit 102 sends the extracted feature amounts to the audience state estimation unit 103. The audience feature extraction unit 102 may correspond to, for example, the processor described above.

For example, as described below, the audience feature extraction unit 102 may extract a two-dimensional feature amount consisting of a first feature amount indicating the audience's degree of interest in the content and a second feature amount indicating the audience's degree of concentration on the content. The audience feature extraction unit 102 may instead extract a feature amount with one dimension or with three or more dimensions.

The first feature amount can be calculated based on the size of the face regions of audience members whose faces are forward-facing, for example, those who are facing the display 20 and/or the robot 30. An audience member whose face region is forward-facing may have been interested in the content, at least at the moment the image was captured. The size of a face region depends on the actual size of the person's face, but it depends strongly on the distance from the audience member to the camera 10 (assumed to be placed near the display 20 and the robot 30). This distance becomes smaller when an audience member, drawn by the content, moves closer to view it more easily. Moreover, even if the distance from the audience to the camera 10 stays the same, the sum of the face region sizes grows as the audience grows. In this way, the size of the forward-facing face regions can indicate the audience's degree of interest in the content.

Specifically, the audience feature extraction unit 102 detects the face areas included in the image. The audience feature extraction unit 102 may detect as many face areas as possible, but an upper limit may be placed on the number of detections, or face areas smaller than a predetermined area may be ignored. The audience feature extraction unit 102 then calculates the orientation of each detected face area. Here, the orientation of the i-th face area is denoted θ_i, where i is an arbitrary integer.

Next, the audience feature extraction unit 102 corrects the calculated face-area orientations as necessary. For example, as shown in FIG. 4, the point toward which the audience is expected to turn their faces (hereinafter referred to as the reference point) may not coincide with the position of the camera 10. In that case, the audience feature extraction unit 102 can obtain the corrected orientation by subtracting the angular difference θ_d between the camera position and the reference point from the orientation of each face area. If the corrected orientation of the i-th face area is denoted θ'_i, then θ'_i = θ_i − θ_d. When the camera position and the reference point are the same, θ_d = 0.

The audience feature extraction unit 102 determines whether each face area faces forward according to whether its corrected orientation falls within a predetermined range. For example, the audience feature extraction unit 102 determines that the i-th face area faces forward when φ_1 ≦ θ'_i ≦ φ_2 is satisfied. Here, φ_1 and φ_2 may be thresholds determined from the limits of the face orientations at which the display 20 and the robot 30 are visible from the assumed audience position, and they satisfy φ_1 < φ_2. The forward-facing determination need not rely only on image-based face orientation. It may also be made from skeleton data acquired with a depth camera, using the coordinate positions of both shoulder joints or the orientation of the spine joint, or from the audience's gaze direction acquired with a gaze-measurement camera.

The audience feature extraction unit 102 calculates the area of each face area determined to face forward, and obtains the maximum and the sum of the calculated areas. The audience feature extraction unit 102 then obtains an area ratio by dividing this sum by this maximum. The audience feature extraction unit 102 repeats the image acquisition and the calculation of the area ratio over a fixed period of time, and may calculate the sum of the area ratios over that period as the first feature amount. When the forward-facing determination is made from the coordinate positions of both shoulder joints, the orientation of the spine joint, or the audience's gaze direction as described above, a headcount ratio may be calculated as the first feature amount in place of the area ratio. Here, the headcount ratio is obtained by dividing the number of audience members determined to face forward by the total number of audience members.
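The orientation correction, forward-facing test, and area-ratio accumulation described above can be sketched as follows. This is only an illustrative sketch: the `Face` type, the function names, the default thresholds of ±30 degrees, and the choice to return 0 for an image with no forward-facing face are assumptions and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class Face:
    theta: float  # detected orientation θ_i, in degrees (assumed unit)
    area: float   # face-area size, e.g. in pixels (assumed unit)


def area_ratio(faces, theta_d=0.0, phi1=-30.0, phi2=30.0):
    """Area ratio for one image: sum of forward-facing areas / largest such area."""
    # A face is forward-facing when φ_1 <= θ_i - θ_d <= φ_2.
    forward = [f.area for f in faces if phi1 <= f.theta - theta_d <= phi2]
    if not forward:
        return 0.0  # assumption: no forward-facing face contributes nothing
    return sum(forward) / max(forward)


def first_feature(frames, **kwargs):
    """First feature amount: sum of per-image area ratios over the time window."""
    return sum(area_ratio(faces, **kwargs) for faces in frames)


frames = [
    [Face(0.0, 100.0), Face(10.0, 50.0), Face(80.0, 200.0)],
    [Face(5.0, 100.0)],
]
print(first_feature(frames))
```

Because each image's sum is divided by its largest face, the ratio behaves roughly like a count of forward-facing people weighted by how close each one is relative to the nearest person.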

The second feature amount may be calculated based on the motion of the audience's face areas. For example, audience members who are gazing at the video content shown on the display 20 (whose face areas show little shake) and audience members who are nodding are expected to be concentrating on the content. On the other hand, audience members who are shaking their heads from side to side or moving their faces excessively are expected to be more concerned with their surroundings than with the content, or simply to be distracted.

具体的には、聴衆特徴抽出部１０２は、一定時間に亘る時系列画像からそれぞれの聴衆の顔領域を追跡する。そして、聴衆特徴抽出部１０２は、それぞれの聴衆の顔領域の動きに基づいて、当該聴衆の顔の動作を認識する。聴衆特徴抽出部１０２は、例えばジェスチャ認識器を利用して顔の動作を認識してもよい。このジェスチャ認識器は、例えば、大量の学習用の顔領域の動きデータおよびその動作ラベルを用いた教師付き学習により作成された学習モデルを含み得る。なお、画像ベースで顔領域の動作を取得するだけではなく、デプスカメラを用いて取得した骨格データの各座標から動作を認識しても良い。 Specifically, the audience feature extraction unit 102 tracks the face area of each audience from the time-series images over a certain period of time. Then, the audience feature extraction unit 102 recognizes the movement of the face of the audience based on the movement of the face area of each audience. The audience feature extraction unit 102 may recognize a facial action using, for example, a gesture recognizer. This gesture recognizer may include, for example, a learning model created by supervised learning using a large amount of face area motion data for learning and its motion labels. It should be noted that not only the motion of the face area is acquired based on the image, but the motion may be recognized from each coordinate of the skeleton data acquired using the depth camera.

A degree of concentration is assigned in advance to each motion label, which is the recognition result of a facial motion. For example, a high degree of concentration may be assigned to "gazing", "nodding", and the like, and a low degree of concentration may be assigned to "shaking the head from side to side", "large movement", and the like. The degree of concentration may be multi-valued, or it may be a binary value meaning "concentrated" or "distracted".

聴衆特徴抽出部１０２は、全聴衆に亘る認識結果（動作ラベル）をヒストグラム化し、最頻値となる認識結果を求める。そして、聴衆特徴抽出部１０２は、この最頻値に割り当てられた集中度を第２の特徴量として抽出し得る。なお、複数の最頻値が存在する場合には、聴衆特徴抽出部１０２は、これら最頻値に割り当てられた集中度の最小値またはその他の統計量を第２の特徴量として抽出し得る。 The audience feature extraction unit 102 creates a histogram of the recognition results (action labels) over the entire audience, and obtains the recognition result that is the most frequent value. Then, the audience feature extraction unit 102 can extract the degree of concentration assigned to this mode value as the second feature amount. When there are a plurality of mode values, the audience feature extraction unit 102 can extract the minimum value of the degree of concentration assigned to these mode values or another statistic as the second feature amount.

或いは、聴衆特徴抽出部１０２は、各聴衆についての認識結果に割り当てられた集中度を当該聴衆の集中度として抽出し、この集中度の平均などの統計量を第２の特徴量としてもよい。 Alternatively, the audience feature extraction unit 102 may extract the degree of concentration assigned to the recognition result for each audience as the degree of concentration of the audience, and use a statistic such as the average of the degree of concentration as the second feature amount.
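Both ways of deriving the second feature amount (the mode of the motion labels, with ties resolved by the minimum concentration, and the per-person average) can be sketched as below. The label names and the numeric concentration values in `CONCENTRATION` are hypothetical; the description only states that labels such as gazing and nodding receive high values and head shaking and large movements receive low ones.

```python
from collections import Counter

# Hypothetical label-to-concentration table (binary here; may be multi-valued).
CONCENTRATION = {
    "gaze": 1.0,
    "nod": 1.0,
    "head_shake": 0.0,
    "large_motion": 0.0,
}


def second_feature_mode(labels):
    """Concentration of the most frequent label; ties take the minimum.

    Assumes `labels` holds one recognized label per audience member and is
    not empty.
    """
    counts = Counter(labels)
    top = max(counts.values())
    modes = [label for label, count in counts.items() if count == top]
    return min(CONCENTRATION[label] for label in modes)


def second_feature_mean(labels):
    """Alternative: average of the per-person concentration values."""
    return sum(CONCENTRATION[label] for label in labels) / len(labels)
```

Taking the minimum on ties is the conservative option named in the text; another statistic could be substituted in `second_feature_mode` without changing the rest.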

聴衆状態推定部１０３は、聴衆特徴抽出部１０２から聴衆の特徴量を受け取り、これに基づいて、聴衆の視聴状態を複数の状態のいずれか１つとして推定する。推定された視聴状態は、以降の説明において現在状態と称する。聴衆状態推定部１０３は、現在状態を示す値、例えば状態ＩＤを遷移先決定部１０４へ送る。聴衆状態推定部１０３は、例えば前述のプロセッサに相当し得る。 The audience state estimation unit 103 receives the audience feature amount from the audience feature extraction unit 102, and estimates the audience viewing state as one of the plurality of states based on the feature amount. The estimated viewing state is referred to as a current state in the description below. The audience state estimation unit 103 sends a value indicating the current state, for example, a state ID, to the transition destination determination unit 104. The audience state estimation unit 103 can correspond to, for example, the above-described processor.

The plurality of states can be defined in various ways, but assuming the two-dimensional feature amount described above, the four states shown in FIG. 5, for example, can be defined. State 1 is a state in which the audience's interest in and concentration on the content are both high. State 2 is a state in which the audience's interest in the content is high but its concentration is low. State 3 is a state in which the audience's interest in the content is low but its concentration is high. State 4 is a state in which the audience's interest in and concentration on the content are both low. In the example of FIG. 5, the first feature amount and the second feature amount are each divided into two ranges for simplicity, but one or both may be divided into three or more ranges.

According to the example of FIG. 5, the audience state estimation unit 103 estimates the current state as follows. When the first feature amount is equal to or greater than a first threshold and the second feature amount is equal to or greater than a second threshold, the current state is estimated to be state 1. When the first feature amount is equal to or greater than the first threshold and the second feature amount is less than the second threshold, the current state is estimated to be state 2. When the first feature amount is less than the first threshold and the second feature amount is equal to or greater than the second threshold, the current state is estimated to be state 3. When the first feature amount is less than the first threshold and the second feature amount is less than the second threshold, the current state is estimated to be state 4.
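The four-way threshold comparison of FIG. 5 can be written as a small function. This is a minimal sketch; the function name and argument names are assumptions, and the threshold values are left to the caller because the description does not give them.

```python
def estimate_state(f1, f2, th1, th2):
    """Map the two feature amounts to a state ID (1 to 4) as in FIG. 5.

    f1: first feature amount (interest), th1: first threshold
    f2: second feature amount (concentration), th2: second threshold
    """
    interested = f1 >= th1
    concentrating = f2 >= th2
    if interested and concentrating:
        return 1  # high interest, high concentration
    if interested:
        return 2  # high interest, low concentration
    if concentrating:
        return 3  # low interest, high concentration
    return 4      # low interest, low concentration
```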

In the example of FIG. 5, state 1 is defined as the most favorable state and state 4 as the least favorable state. That is, if the current state is estimated to be state 1, the current scenario is expected to be presenting the content to the audience effectively, so no correction of the scenario is needed. On the other hand, if the current state is estimated to be anything other than state 1, there is room to improve the audience's interest in and/or concentration on the content, so the scenario control device 100 corrects the scenario with the aim of reaching state 1.

遷移先決定部１０４は、聴衆状態推定部１０３から聴衆の視聴状態（現在状態）を示す値を受け取る。遷移先決定部１０４は、まず、現在状態が状態遷移をする必要ない状態、例えば図５の状態１であるか否かを判定する。現在状態が状態遷移をする必要のある状態である場合には、遷移先決定部１０４は、現在状態よりも良好と定義される１つを遷移先状態と決定する。遷移先決定部１０４は、現在状態および遷移先状態を示す値、例えば状態ＩＤを修正内容選択部１０６へ送る。遷移先決定部１０４は、例えば前述のプロセッサに相当し得る。 The transition destination determination unit 104 receives a value indicating the viewing state (current state) of the audience from the audience state estimation unit 103. The transition destination determination unit 104 first determines whether or not the current state is a state that does not require state transition, for example, state 1 in FIG. When the current state is a state that needs to undergo state transition, the transition destination determination unit 104 determines one that is defined as better than the current state as the transition destination state. The transition destination determination unit 104 sends a value indicating the current state and the transition destination state, for example, a state ID, to the correction content selection unit 106. The transition destination determination unit 104 can correspond to, for example, the above-described processor.

According to the example of FIG. 5, state 1 is the state that requires no state transition, and states 2 to 4 are states that require a state transition. When the current state is state 2 or state 3, the transition destination determination unit 104 may determine state 1 as the transition destination state. On the other hand, when the current state is state 4, two routes can be selected: the transition from state 4 to state 2 and then from state 2 to state 1, illustrated in FIG. 6, and the transition from state 4 to state 3 and then from state 3 to state 1, illustrated in FIG. 7. Which route is prioritized may be fixed or random, but it may also be determined strategically, for example as described below.

For example, presentations, which are one example of video content, vary in structure. In certain presentations, however, the introduction in the first half contains slides with striking or familiar topics meant to draw the audience's interest, and the conclusion in the second half contains slides with the message that the presentation ultimately wants to convey to the audience. Therefore, for example, by correcting the scenario with priority on drawing interest in the presentation during the first half, and with priority on focusing attention on the presentation during the second half, the robot 30 can be made to produce utterances and/or non-verbal actions that match the structure of the slides, skillfully drawing out the audience's interest and concentration.

Accordingly, when there are multiple candidates for the transition destination state (in the example of FIG. 5, when the current state is state 4), the transition destination determination unit 104 may refer to the progress of the scenario. The progress can be derived, for example, by dividing the execution position of the scenario by the total length of the scenario. The execution position of the scenario represents the position of the element of the video content that is being played. It may be, for example, the number of the slide being played; identification information such as the number of an element, for example an animation set on a slide, that is being played; identification information such as the number of the scene being played; the current playback time of the video content; or the elapsed time since presentation of the content started. It may be notified by the presentation control device 200 (its execution position notification unit 202). The total length of the scenario may be, for example, the total number of slides, the total playback time of the video content, or the time from the start to the end of presentation of the content. The transition destination determination unit 104 may determine state 2 as the transition destination state when the progress of the scenario is less than a threshold (hereinafter referred to as the policy change threshold for convenience), and may determine state 3 as the transition destination state when the progress of the scenario is equal to or greater than the policy change threshold. The policy change threshold is, for example, 1/2, and may be described as at least part of the state transition rules stored in the state transition rule storage unit 105.
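The progress-based choice of transition destination can be sketched as follows. The function names and the use of `None` to signal "no transition needed" are assumptions made for illustration; the state IDs and the default policy change threshold of 1/2 follow the example of FIG. 5 and the text above.

```python
def scenario_progress(position, total_length):
    """Progress = execution position / total length (e.g. slide 3 of 10)."""
    return position / total_length


def decide_destination(current_state, progress, policy_change_threshold=0.5):
    """Return the transition destination state, or None when no transition is needed."""
    if current_state == 1:
        return None  # already the most favorable state
    if current_state in (2, 3):
        return 1     # only one better state is reachable
    # State 4: raise interest first (state 2) early in the scenario,
    # raise concentration first (state 3) later on.
    return 2 if progress < policy_change_threshold else 3
```

For instance, under these assumptions a state-4 audience at slide 3 of 10 would be steered toward state 2, and the same audience at slide 8 of 10 toward state 3.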

When the total length of the scenario falls within a range defined as short, which state transition to prioritize may be fixed. For example, the policy change threshold may be set to 1 to prioritize the state transition that draws out the audience's interest, or to 0 to prioritize the state transition that draws out the audience's concentration. Suppose that a scenario is judged to be short when its total length is less than a certain scenario length threshold, and that in that case the state transition that draws out the audience's interest is prioritized regardless of the progress of the scenario. The policy change threshold can then be derived as follows.
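The equation itself is not reproduced in this text. One form consistent with the surrounding description is sketched below, under the assumption that the example value 1/2 given earlier is used whenever the scenario is not short; the published equation may differ.

```latex
Th_P =
\begin{cases}
1 & (L < TH_L) \\
\dfrac{1}{2} & (L \geq TH_L)
\end{cases}
```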

In the above formula, Th_P denotes the policy change threshold, L the total length of the scenario, and TH_L the scenario length threshold.

状態遷移ルール記憶部１０５は、状態遷移ルールを、例えば図８に示される状態遷移ルールテーブルの形式で保存する。状態遷移ルール記憶部１０５に保存された状態遷移ルールは、遷移先決定部１０４によって必要に応じて読み出される。状態遷移ルール記憶部１０５は、例えば前述のメモリおよび／または補助記憶装置に相当し得る。 The state transition rule storage unit 105 stores the state transition rule in the state transition rule table format shown in FIG. 8, for example. The state transition rule stored in the state transition rule storage unit 105 is read by the transition destination determination unit 104 as needed. The state transition rule storage unit 105 can correspond to, for example, the above-mentioned memory and/or auxiliary storage device.

The correction content selection unit 106 receives values indicating the current state and the transition destination state from the transition destination determination unit 104, and is also notified by the presentation control device 200 (its scenario notification unit 203) of the scenario currently being executed and the scenario to be executed after it. The correction content selection unit 106 refers to the correction content rules stored in the correction content rule storage unit 107 and reads out at least one scenario correction content associated with the combination of the current state and the transition destination state. Here, a correction content rule describes a scenario correction content that can be used for a combination of a current state and a transition destination state, together with its correction target. The scenario correction contents that are read out are the candidates for the scenario correction content to be applied. The correction content selection unit 106 selects one of the candidates and sends to the scenario correction unit 109 a value indicating the selected scenario correction content, a value indicating the scenario position to be corrected, and (the part of) the scenario specified by that scenario position. The correction content selection unit 106 can correspond to, for example, the above-described processor.

Here, the effective approach for realizing a state transition, that is, the scenario correction content, may differ for each combination of current state and transition destination state, that is, for each target state transition. As illustrated in FIG. 9, for the transition from state 4 to state 2 it may be effective to correct the scenario so that the robot 30 performs "use of sound effects/LED", "repeating the same explanation", a "beckoning action", and so on, whereas for the transition from state 4 to state 3 it may be effective to correct the scenario so that the robot 30 performs "guiding attention to the slide by gaze control", a "pointing action", and so on. Moreover, because audience attributes are diverse, not every such scenario correction content is always effective for realizing the target state transition. The correction content selection unit 106 therefore selects scenario correction contents by trial and error to find a scenario to which the audience responds well, and to shift the audience's viewing state to a state defined as favorable.

"Use of sound effects/LED (Light-Emitting Diode)" in FIG. 9 may mean, for example as shown in FIG. 10, outputting a sound effect and/or lighting an LED when the robot 30 speaks at the target scenario position. This can be expected to stimulate the audience's hearing and/or vision and attract their attention. The LED may be mounted on the robot 30, for example.

Here, the target scenario position means the scenario position at which the scenario correction is made. It may be, for example, the element of the video content that is currently being presented or an element presented after it (typically the next one). Specifically, the target scenario position may be the slide being played or a later slide, the animation being played or a later animation set on the slide being played, or the scene being played or a later scene. The target scenario position may also be determined depending on the scenario correction content. For example, if a non-verbal action or the utterance content is changed for the element of the video content that is being played, the behavior of the robot 30 changes in the middle of an explanation, which may make the audience feel that something is off. Therefore, when a correction content that changes a non-verbal action or the utterance content is selected, the element played next after the element being played may be set as the target scenario position. On the other hand, when the correction content is, for example, "repeating the same explanation" described below, the target scenario position may be the element of the video content being played or a later element.

"Repeating the same explanation" in FIG. 9 may mean, for example as shown in FIG. 11, repeating the content of the scenario at the target scenario position, for example by duplicating it and inserting the copy immediately before or after that scenario position. Utterances such as "Was that a little hard to follow?" or "Let me say that once more" may additionally be added to the repeated part of the scenario. This can be expected to emphasize the content at the target scenario position and attract the audience's attention.
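The duplication described above can be sketched on a toy scenario representation. The representation of a scenario as a list of dictionaries with `slide` and `speech` keys, and the function name, are hypothetical; the description does not specify the scenario data format.

```python
def repeat_explanation(scenario, index, preface=None):
    """Return a new scenario with the element at `index` duplicated right after it.

    `scenario` is assumed to be a list of dicts with a "speech" entry.
    If `preface` is given, it is prepended to the repeated utterance.
    The input list is left unmodified.
    """
    original = scenario[index]
    repeated = dict(original)  # shallow copy of the element to repeat
    if preface:
        repeated["speech"] = f"{preface} {original['speech']}"
    return scenario[: index + 1] + [repeated] + scenario[index + 1 :]


scenario = [
    {"slide": 1, "speech": "Welcome to the exhibition."},
    {"slide": 2, "speech": "Here is today's schedule."},
]
corrected = repeat_explanation(scenario, 0, preface="Let me say that once more.")
```

Inserting the copy after the original corresponds to the "immediately after" variant; inserting before it would only change the slice boundaries.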

図９における「手招き動作」とは、ロボット３０に手招きをさせることであり得る。これにより、ディスプレイ２０前に既に居る聴衆の注意を引きつける効果に加えて新たな聴衆を周囲から呼び込む効果も期待できる。 The “beckoning action” in FIG. 9 may be causing the robot 30 to beckon. As a result, in addition to the effect of attracting the attention of the audience who is already in front of the display 20, an effect of attracting a new audience from the surroundings can be expected.

The "illustrative action" in FIG. 9 may mean, for example, moving the hands and/or arms of the robot 30 so that they enclose a notable part of the video content, or so that the left and right hands are each placed beside the parts to be compared. This can be expected to convey clearly to the audience the points to be emphasized in the video content and to make the audience aware of the key points.

The "pointing action" in FIG. 9 may mean, for example as shown in FIG. 12, moving the hand and/or arm of the robot 30 so that a finger or hand points at a notable part of the video content. This can be expected to convey clearly to the audience the points to be emphasized in the video content and to make the audience aware of the key points.

"Emphasizing key points with paralanguage" in FIG. 9 may mean, for example, changing not the content of the robot 30's utterance at the target scenario position but the way it is spoken: raising the volume, changing the pitch, adding intonation, or lengthening pauses. This emphasizes the utterance content of the robot 30 at the target scenario position and can be expected to encourage the audience to listen closely to what the robot 30 says.

"Guiding attention to the slide by gaze control" in FIG. 9 may mean, for example as shown in FIG. 14, moving the head of the robot 30 so that it looks at the display 20. This can be expected to lead the audience to follow suit and look closely at the display 20.

The "eye contact action toward the audience by gaze control" in FIG. 9 may mean, for example as shown in FIG. 13, moving the head of the robot 30 so that it looks at the audience. The robot 30 then speaks as if addressing the audience directly, which can be expected to encourage the audience to listen closely to what the robot 30 says.

The scenario correction contents in FIG. 9 are merely examples, and a scenario correction content illustrated as usable for realizing one state transition may also be made usable for realizing a different state transition.

To select an effective scenario correction content, the correction content selection unit 106 may refer, for example, to the correction history stored in the correction history storage unit 108. For each correction rule, the correction history records the cumulative number of times the scenario correction content contained in that rule has been selected. The cumulative selection count may be reset, for example, each time the presented content changes, or each time one presentation ends even if the content is the same. In addition to the cumulative count, the correction history may record additional information about each correction, for example the time at which the correction was made, the scenario content before the correction, and statistics of the first and second feature amounts before and after the correction (for example, the average rate of change of each feature amount from before to after the correction). The correction content selection unit 106 may select, for example, the scenario correction content with the smallest cumulative selection count. The correction content selection unit 106 may also use the additional information described above to take into account which correction contents were selected at the same time of day, in the same time period (such as morning or afternoon), and/or on the same day of the week, as well as the content of the pre-correction scenario, so that the same scenario correction content is not selected repeatedly for the same pre-correction scenario at the same time of day, time period, and/or day of the week. In this way the various scenario correction contents are selected evenly, which makes it possible to find a scenario to which the audience responds well and to shift the audience's viewing state to a state defined as favorable. Because the scenario is also corrected in a non-uniform manner, this suppresses the loss of effectiveness that would occur if the audience became accustomed to the utterance content and/or non-verbal actions of the robot 30.
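Selecting the candidate with the smallest cumulative selection count, and incrementing that count afterward, can be sketched as follows. The rule IDs, the dictionary-based history, and the tie-breaking by candidate order are assumptions made for illustration.

```python
def select_correction(candidate_ids, history):
    """Pick the candidate rule ID with the smallest cumulative selection count.

    `history` maps rule ID -> cumulative count; missing IDs count as 0.
    Ties go to the candidate listed first (an arbitrary choice in this sketch).
    """
    return min(candidate_ids, key=lambda rule_id: history.get(rule_id, 0))


def record_selection(rule_id, history):
    """Increment the cumulative selection count after a correction is applied."""
    history[rule_id] = history.get(rule_id, 0) + 1


history = {"R1": 2, "R2": 0, "R3": 1}
chosen = select_correction(["R1", "R2", "R3"], history)
record_selection(chosen, history)
```

Repeating the select-then-record cycle spreads selections evenly across the candidates, which is the behavior the paragraph above relies on to avoid the audience growing used to any one correction.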

The correction content selection unit 106 may select the scenario correction content based on the content of the scenario being executed. For example, the correction content selection unit 106 may avoid selecting a scenario correction content whose correction target is a non-verbal action that matches or resembles a non-verbal action in the scenario being executed. Not selecting such a scenario correction content can be expected to keep the non-verbal actions of the robot 30 from becoming monotonous. The scenario to be corrected may also be analyzed so that the correction yields a non-verbal action appropriate to the utterance content.

The correction content rule storage unit 107 stores correction content rules, for example in the form of the correction content rule table shown in FIG. 15. In the table of FIG. 15, each entry associates an ID identifying the correction content rule, the combination of current state and transition destination state for which the rule can be used, and the details of the rule, namely the scenario correction content and its correction target. The correction content rules stored in the correction content rule storage unit 107 are read by the correction content selection unit 106 as needed. The correction content rule storage unit 107 may correspond, for example, to the memory and/or auxiliary storage device described above.

The correction history storage unit 108 stores the correction history, for example in the form of the correction history table shown in FIG. 16. In the table of FIG. 16, an ID identifying each correction content rule (more precisely, its scenario correction content) is associated with the cumulative selection count of that scenario correction content. The ID field of the correction history table in FIG. 16 may be shared with the ID field of the correction content rule table in FIG. 15. The correction history is read by the correction content selection unit 106 as needed and is updated (incremented) by the scenario correction unit 109 each time a scenario is corrected. The correction history storage unit 108 may correspond, for example, to the memory and/or auxiliary storage device described above.
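
Purely as an illustration, the two tables might be held in memory as follows. This is a minimal sketch: the field names, the sample rows, and the helper `rules_for` are assumptions made for the example and are not taken from FIG. 15 or FIG. 16.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionRule:
    rule_id: int        # ID shared with the correction history table
    current_state: int  # current state for which the rule can be used
    target_state: int   # transition destination state
    target: str         # correction target: "utterance", "nonverbal", or "all"
    content: str        # scenario correction content


# Hypothetical rows for illustration only (not the contents of FIG. 15).
RULES = [
    CorrectionRule(1, 2, 1, "nonverbal", "beckoning action"),
    CorrectionRule(2, 2, 1, "utterance", "sound effect"),
    CorrectionRule(3, 3, 1, "all", "repeat the explanation"),
]

# Correction history: rule ID -> cumulative selection count.
history = {rule.rule_id: 0 for rule in RULES}


def rules_for(current_state, target_state):
    """Return the rules usable for a (current, destination) state pair."""
    return [
        rule for rule in RULES
        if rule.current_state == current_state
        and rule.target_state == target_state
    ]
```

Keying the history by the same ID as the rule table reflects the statement that the ID field may be shared between the two tables.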

The scenario correction unit 109 receives from the correction content selection unit 106 a value (for example, an ID) indicating the scenario correction content selected by that unit, and further receives a value indicating the scenario position to be corrected and the scenario identified by that position (which may also be called the pre-correction scenario). The scenario correction unit 109 corrects the pre-correction scenario based on the scenario correction content and the scenario position. It sends the corrected scenario to the presentation control device 200 via, for example, a transmission means not shown (such as a communication I/F). For example, the corrected scenario may be sent to the scenario execution unit 201 of the presentation control device 200 so that the device operates according to it, or it may be sent to the scenario storage unit 205 and recorded there before the presentation control device 200 is operated. The scenario correction unit 109 also updates the correction history stored in the correction history storage unit 108; more specifically, it increments the cumulative selection count of the scenario correction content it applied. The scenario correction unit 109 may correspond, for example, to the processor described above.

Specifically, when the correction target of the scenario correction content is the utterance content, the scenario correction unit 109 may add the utterance content indicated by the scenario correction content (an additional line, or a speech method such as using sound effects or emphasizing key points through paralanguage) to the utterance content described in the scenario at the target scenario position. With an additional line, an utterance such as "Was that hard to follow?" or "Let me explain once more" is added before the utterance content of the pre-correction scenario. With sound effects, an effect such as "ta-da" or "ping" is played at the same time as the utterance content of the pre-correction scenario is spoken. With paralinguistic emphasis, the volume is raised when the utterance content of the pre-correction scenario is spoken, or a pause of fixed length is inserted before speaking.

When the correction target of the scenario correction content is a non-verbal action, the scenario correction unit 109 may replace the non-verbal action described in the scenario at the target scenario position with the non-verbal action indicated by the scenario correction content (for example, a beckoning action, an illustrative gesture, a pointing action, directing attention to the slide through gaze control, or making eye contact with the audience through gaze control), or may add the indicated non-verbal action to the one described in the scenario. For example, when the scenario correction content is a non-verbal action that draws the audience's interest, such as beckoning, it may be added before the non-verbal action described in the scenario rather than replacing it. When the correction target of the scenario correction content is "all", the scenario correction unit 109 may duplicate the scenario at the target scenario position and insert the copy immediately before or after that position, for example in order to repeat the same explanation. So that a line signalling the repetition, such as "Let me go over that once more." or "Maybe that was a little difficult.", is added before the utterance content of the duplicated scenario, the scenario correction content may specify in detail both the relationship between the target scenario and the new scenario defined in the correction content (the correction operation) and that new scenario itself, defined separately from the target scenario (the line signalling repetition and any non-verbal action accompanying it).

Next, an example of the functional configuration of the presentation control device 200 is described. As illustrated in FIG. 1, the presentation control device 200 includes a scenario execution unit 201, an execution position notification unit 202, a scenario notification unit 203, a presentation control unit 204, and a scenario storage unit 205.

The scenario execution unit 201 reads a scenario from the scenario storage unit 205 and controls its execution. It sequentially interprets the elements of the video content described in the scenario (for example, a slide or an animation set on a slide) together with the utterance content and non-verbal action of the robot 30 while each element is presented. From these it obtains video data to be shown on the display 20, utterance content data for the robot 30 (either text data that can be processed by TTS (Text-to-Speech), optionally with data indicating the speech method, or the audio data itself), motion control data for controlling the non-verbal actions of the robot 30, and so on, and sends them to the presentation control unit 204. The scenario execution unit 201 also sends a value indicating the execution position of the scenario to the execution position notification unit 202, and sends the scenario being executed to the scenario notification unit 203. When the scenario stored in the scenario storage unit 205 has been corrected by the scenario control device 100 (specifically, its scenario correction unit 109), the scenario execution unit 201 operates according to the corrected scenario. The scenario execution unit 201 may correspond, for example, to the processor described above.

The execution position notification unit 202 receives a value indicating the execution position of the scenario from the scenario execution unit 201 and notifies the scenario control device 100 (specifically, its transition destination determination unit 104) of that position. The execution position notification unit 202 may correspond to the communication I/F described above.

The scenario notification unit 203 receives the scenario being executed from the scenario execution unit 201 and notifies the scenario control device 100 (specifically, its correction content selection unit 106) of it. The scenario notification unit 203 may correspond to the communication I/F described above.

The presentation control unit 204 receives the video data for the display 20, the utterance content data for the robot 30, the motion control data for the robot 30, and so on from the scenario execution unit 201. It supplies the video data to the display 20 at the appropriate times, and supplies the utterance content data and/or motion control data to the robot 30 at the appropriate times. The presentation control unit 204 may correspond, for example, to the processor and communication I/F described above.

The scenario storage unit 205 stores scenarios. The scenarios stored in the scenario storage unit 205 are read by the scenario execution unit 201 as needed. The scenario storage unit 205 may correspond, for example, to the memory and/or auxiliary storage device described above.

Next, an example of the operation of the scenario control device 100 is described with reference to FIGS. 17 to 19. The operation illustrated in FIG. 17 is performed repeatedly while content is being presented; it may be performed at regular intervals, for example once per slide, or at irregular intervals.

First, the image acquisition unit 101 acquires image data capturing the audience from the camera 10 (step S301). The audience feature extraction unit 102 extracts feature amounts of the audience, for example the first and second feature amounts described above, from the image data acquired in step S301 (step S302).

Based on the feature amounts extracted in step S302, the audience state estimation unit 103 estimates the audience's viewing state as one of a plurality of states, for example states 1 to 4 described above (step S303).
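
The estimation in step S303 can be pictured with a minimal sketch. It follows the threshold comparisons stated in the claims for the first to third states, and additionally assumes that the fourth state is the remaining case in which both feature amounts are below their thresholds. The function name, the normalized value range, and the default thresholds of 0.5 are hypothetical.

```python
def estimate_state(interest, concentration, t1=0.5, t2=0.5):
    """Map two audience feature amounts to a viewing state (1 to 4).

    interest:      first feature amount (interest in the content)
    concentration: second feature amount (concentration on the content)
    t1, t2:        first and second thresholds (assumed values)
    """
    if interest >= t1 and concentration >= t2:
        return 1  # both at or above threshold
    if interest >= t1:
        return 2  # interest high, concentration low
    if concentration >= t2:
        return 3  # interest low, concentration high
    return 4      # assumed: both below threshold
```

For example, `estimate_state(0.8, 0.2)` returns 2 under these assumed thresholds.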

The transition destination determination unit 104 determines whether the current state estimated in step S303 is a state that requires a state transition, for example state 2 or state 3 described above (step S304). If so, the process proceeds to step S305; otherwise (for example, when the current state is state 1), the process ends.

In step S305, the transition destination determination unit 104 determines whether there is exactly one candidate for the transition destination state. For example, when the current state is state 2 or state 3 described above, the only candidate is state 1. When the current state is state 4 described above, the candidates are state 2 and state 3. If there is exactly one candidate, the process proceeds to step S307; otherwise it proceeds to step S306.

In step S306, the transition destination determination unit 104 refers to the state transition rule stored in the state transition rule storage unit 105 and to the scenario execution position notified by the execution position notification unit 202. The process then proceeds to step S307.

In step S307, the transition destination determination unit 104 determines the transition destination state. Specifically, if step S306 was not performed, there is only one candidate, so the transition destination determination unit 104 adopts that candidate as the transition destination state. If step S306 was performed, the unit applies the state transition rule referred to in step S306 to the scenario progress derived from the execution position also referred to in step S306, and thereby determines one of the candidates as the transition destination state. For example, it may choose state 2 when the scenario progress is below the policy change threshold described above, and state 3 when the scenario progress is at or above that threshold.
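
Steps S304 to S307 can be sketched as a small function. This is only an illustration: it assumes that scenario progress is expressed as a fraction between 0.0 and 1.0 and that the policy change threshold is 0.5, neither of which is stated in the text, and the function name is hypothetical.

```python
def decide_transition_destination(current_state, progress, policy_threshold=0.5):
    """Return the transition destination state, or None if no transition is needed.

    current_state:    viewing state estimated in step S303 (1 to 4)
    progress:         scenario progress as a fraction (assumed representation)
    policy_threshold: policy change threshold (assumed value)
    """
    if current_state == 1:
        return None  # S304: no state transition required
    if current_state in (2, 3):
        return 1     # S305: the only candidate is state 1
    if current_state == 4:
        # S306/S307: two candidates, chosen by scenario progress
        return 2 if progress < policy_threshold else 3
    raise ValueError(f"unknown state: {current_state}")
```

With these assumptions, an audience in state 4 early in the scenario is steered toward state 2, and later in the scenario toward state 3.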

The correction content selection unit 106 selects one of the at least one scenario correction content associated with the combination of the current state estimated in step S303 and the transition destination state determined in step S307 (step S310). A detailed example of step S310 is described later with reference to FIG. 18.

The scenario correction unit 109 corrects the pre-correction scenario, which is identified by the target scenario position determined by the correction content selection unit 106 (for example, the utterance content and/or non-verbal action associated with the slide to be displayed next), according to the scenario correction content selected in step S310 (step S320), and the process ends. A detailed example of step S320 is described later with reference to FIG. 19.

A detailed example of step S310 in FIG. 17 is described below with reference to FIG. 18. The process of FIG. 18 starts from step S311.
In step S311, the correction content selection unit 106 refers to the correction content rules stored in the correction content rule storage unit 107 and obtains the at least one scenario correction content associated with the combination of the current state estimated in step S303 and the transition destination state determined in step S307.

The correction content selection unit 106 determines whether exactly one scenario correction content was obtained in step S311 (step S312). If so, the process proceeds to step S316; otherwise it proceeds to step S313.

In step S313, the correction content selection unit 106 refers to the correction history stored in the correction history storage unit 108 and obtains the cumulative selection count of each scenario correction content obtained in step S311. The correction content selection unit 106 then discards every scenario correction content whose cumulative selection count obtained in step S313 is not the smallest (step S314).

The correction content selection unit 106 determines whether exactly one scenario correction content remains after step S314 (step S315). If so, the process proceeds to step S316; otherwise it proceeds to step S317.

In step S316, the correction content selection unit 106 selects the only scenario correction content remaining at that point, and the process ends. In step S317, the correction content selection unit 106 randomly selects one of the scenario correction contents remaining at that point, and the process ends.
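
The selection flow of FIG. 18 (steps S312 to S317) can be sketched as follows. The sketch assumes that the candidates obtained in step S311 are given as a list of rule IDs and that the correction history is a dictionary from rule ID to cumulative selection count; the function name and the optional `rng` parameter are hypothetical.

```python
import random


def select_correction(candidates, history, rng=random):
    """Pick one rule ID, preferring the least-selected candidates.

    candidates: rule IDs obtained in step S311 (assumed non-empty list)
    history:    dict mapping rule ID -> cumulative selection count
    rng:        object with a choice() method (the random module by default)
    """
    if not candidates:
        raise ValueError("no scenario correction content available")
    if len(candidates) == 1:                                # S312 -> S316
        return candidates[0]
    counts = {c: history.get(c, 0) for c in candidates}    # S313
    smallest = min(counts.values())
    # S314: discard candidates whose count is not the smallest
    remaining = [c for c in candidates if counts[c] == smallest]
    if len(remaining) == 1:                                 # S315 -> S316
        return remaining[0]
    return rng.choice(remaining)                            # S317
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-break reproducible, which may be convenient when testing.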

A detailed example of step S320 in FIG. 17 is described below with reference to FIG. 19. The process of FIG. 19 starts from step S321.
In step S321, the scenario correction unit 109 obtains the correction target of the scenario correction content selected in step S310, and determines whether that correction target is a non-verbal action, the utterance content, or "all".

If the correction target is determined in step S321 to be a non-verbal action, the process proceeds to step S323. In step S323, the scenario correction unit 109 either replaces the non-verbal action described in the target portion of the pre-correction scenario with the non-verbal action indicated by the scenario correction content selected in step S310, or adds the indicated non-verbal action to the one described in that portion. For example, suppose a "pointing action" is described as the non-verbal action in the target portion of the pre-correction scenario and the non-verbal action indicated by the scenario correction content is "eye contact with the audience through gaze control". When that portion is executed, the robot 30 then makes eye contact instead of performing the originally planned pointing action. If, in step S323, the non-verbal action described in the target portion of the pre-correction scenario matches or resembles the one indicated by the scenario correction content selected in step S310, the selected scenario correction content may be excluded and the process may return to step S310.

If the correction target is determined in step S321 to be the utterance content, the process proceeds to step S324. In step S324, the scenario correction unit 109 adds the utterance content indicated by the scenario correction content selected in step S310, for example a speech method or an additional line, to the utterance content described in the target portion of the pre-correction scenario. For example, suppose "I will now explain XX" is described as the utterance content in the target portion of the pre-correction scenario and the utterance content indicated by the scenario correction content is "emphasize key points through paralanguage (increased volume)". When that portion is executed, the robot 30 then speaks the originally planned utterance "I will now explain XX" at, for example, a louder volume than usual.

If the correction target is determined in step S321 to be "all", the process proceeds to step S325. In step S325, the scenario correction unit 109 duplicates the target portion of the pre-correction scenario and adds the copy, after changing part of it if necessary. This makes it possible to repeat the same explanation.

In step S326, the scenario correction unit 109 updates the correction history stored in the correction history storage unit 108 so as to increment the cumulative selection count of the scenario correction content selected in step S310, and the process ends.
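
The flow of FIG. 19 (steps S321 to S326) might be sketched as below. The sketch assumes a scenario is a list of segments, each a dictionary with an "utterance" string and a "nonverbal" list, and it covers only additional lines for the utterance case (speech methods such as volume changes are omitted). All names, including the `mode` parameter, are hypothetical.

```python
import copy


def apply_correction(scenario, position, rule_id, target, content,
                     history, mode="replace"):
    """Correct scenario[position] in place and update the history."""
    segment = scenario[position]
    if target == "nonverbal":                       # S323
        if mode == "replace":
            segment["nonverbal"] = [content]
        else:
            # add before the action already described
            segment["nonverbal"].insert(0, content)
    elif target == "utterance":                     # S324
        # prepend an additional line to the original utterance
        segment["utterance"] = content + " " + segment["utterance"]
    elif target == "all":                           # S325
        repeat = copy.deepcopy(segment)
        repeat["utterance"] = content + " " + repeat["utterance"]
        scenario.insert(position + 1, repeat)       # insert right after
    else:
        raise ValueError(f"unknown correction target: {target}")
    history[rule_id] = history.get(rule_id, 0) + 1  # S326
    return scenario
```

Here the duplicated segment is inserted immediately after the target position; inserting it immediately before would be an equally valid reading of the description.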

As described above, the scenario control device according to the embodiment estimates the viewing state of an audience from images of the audience viewing content, and dynamically corrects the scenario describing how the content is presented so as to move the estimated viewing state toward a state defined as better. That is, the scenario control device adaptively changes the utterance content, non-verbal actions, and so on of the content presenter, such as a communication robot, depending on the audience's reactions. Consequently, even when content is presented to an audience for whom the scenario prepared in advance is unsuitable, the scenario can be corrected to suit that audience. In short, the scenario control device makes it possible to support content presentation so that the key points and details of the content are conveyed effectively regardless of the attributes of the audience.

(Modification)
FIG. 1 described a content presentation system that combines digital signage with a communication robot. However, the scenario control device according to the embodiment is applicable without being limited to any particular content presentation technique.

For example, the content presenter is not limited to a physical robot existing in real space, such as the robot 30. As illustrated in FIG. 20, the presenter may be a virtual agent 31 displayed on the display 20 either superimposed on the video content 21 or separately from it. In this case, the utterance content of the virtual agent 31 may be realized as audio output together with the video content or as text displayed together with the video content, and the non-verbal actions of the virtual agent 31 may be realized as gesture video. Such a content presentation technique may be applicable to, for example, car navigation systems. As a further modification, the video content 21 and the virtual agent 31 may be displayed in a virtual space.

Alternatively, the presenter may be the pointing device 32 illustrated in FIG. 21. The pointing device 32 in FIG. 21 is, for example, a combination of a laser pointer and a robot arm that holds the laser pointer and can move the laser spot to any position on the display 20, but a pointing device serving as the presenter may be realized by different hardware.

The presenter may also be incapable of either speech or non-verbal action. For example, the presenter may be able to speak but unable to perform non-verbal actions, or conversely able to perform non-verbal actions but unable to speak. Furthermore, the presenter may be capable of both speech and non-verbal action while only one of them is controllable.

The embodiments described above are merely specific examples intended to aid understanding of the concept of the present invention, and are not intended to limit its scope. Various components may be added to, deleted from, or changed in the embodiments without departing from the gist of the present invention.

Several functional units have been described in the embodiments above, but these are only examples of how each functional unit may be implemented. For example, a plurality of functional units described as being implemented in one device may be implemented across a plurality of separate devices, and conversely, functional units described as being implemented across a plurality of separate devices may be implemented in one device.

The various functional units described in the above embodiments may be realized using circuits. A circuit may be a dedicated circuit that realizes a specific function, or a general-purpose circuit such as a processor.

At least part of the processing of the above embodiments can also be realized by using, for example, a processor in a general-purpose computer as the basic hardware. A program that realizes the processing may be provided stored on a computer-readable recording medium. The program is stored on the recording medium as a file in installable or executable form. Examples of the recording medium include a magnetic disk, an optical disc (CD-ROM, CD-R, DVD, etc.), a magneto-optical disk (MO, etc.), and a semiconductor memory. Any recording medium may be used as long as it can store the program and can be read by a computer. The program may also be stored on a computer (server) connected to a network such as the Internet and downloaded to a computer (client) via the network.

10 ... Camera
20 ... Display
21 ... Video content
30 ... Robot
31 ... Virtual agent
32 ... Pointing device
100 ... Scenario control device
101 ... Image acquisition unit
102 ... Audience feature extraction unit
103 ... Audience state estimation unit
104 ... Transition destination determination unit
105 ... State transition rule storage unit
106 ... Correction content selection unit
107 ... Correction content rule storage unit
108 ... Correction history storage unit
109 ... Scenario correction unit
200 ... Presentation control device
201 ... Scenario execution unit
202 ... Execution position notification unit
203 ... Scenario notification unit
204 ... Presentation control unit

Claims

A scenario control device comprising:
an acquisition unit that acquires information representing the state of an audience viewing content;
an extraction unit that extracts a feature amount of the audience from the information representing the state of the audience;
an estimation unit that estimates, based on the feature amount of the audience, the viewing state of the audience as one of a plurality of states including a first state, and sets the estimated state as a current state;
a determination unit that, when the current state is not the first state, determines one of the plurality of states different from the current state as a transition destination state; and
a selection unit that selects one of at least one available correction content for a presentation scenario of the content, the correction content being associated with the combination of the current state and the transition destination state.

The scenario control device according to claim 1, wherein
the feature amount of the audience includes a first feature amount indicating the degree of interest of the audience in the content and a second feature amount indicating the degree of concentration of the audience on the content,
the plurality of states include a second state, a third state, and a fourth state in addition to the first state,
the estimation unit estimates the viewing state of the audience to be the first state when the first feature amount is equal to or greater than a first threshold and the second feature amount is equal to or greater than a second threshold,
the estimation unit estimates the viewing state of the audience to be the second state when the first feature amount is equal to or greater than the first threshold and the second feature amount is less than the second threshold,
the estimation unit estimates the viewing state of the audience to be the third state when the first feature amount is less than the first threshold and the second feature amount is equal to or greater than the second threshold, and
the estimation unit estimates the viewing state of the audience to be the fourth state when the first feature amount is less than the first threshold and the second feature amount is less than the second threshold.
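
The four-way classification recited in this claim can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the names `ViewingState` and `estimate_viewing_state` are hypothetical, and the claim does not give numeric threshold values.

```python
from enum import Enum


class ViewingState(Enum):
    # Hypothetical labels; the claim only numbers the states.
    FIRST = 1   # interest >= first threshold, concentration >= second threshold
    SECOND = 2  # interest >= first threshold, concentration <  second threshold
    THIRD = 3   # interest <  first threshold, concentration >= second threshold
    FOURTH = 4  # interest <  first threshold, concentration <  second threshold


def estimate_viewing_state(interest, concentration,
                           first_threshold, second_threshold):
    """Classify the audience by comparing each feature amount to its threshold."""
    interested = interest >= first_threshold
    concentrated = concentration >= second_threshold
    if interested and concentrated:
        return ViewingState.FIRST
    if interested:
        return ViewingState.SECOND
    if concentrated:
        return ViewingState.THIRD
    return ViewingState.FOURTH
```

Because both comparisons are "equal to or greater than", a feature amount exactly at its threshold counts as meeting it.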

The scenario control device according to claim 2, wherein
the determination unit determines the second state as the transition destination state when the current state is the fourth state and the progress of the presentation scenario is less than a third threshold, and
the determination unit determines the third state as the transition destination state when the current state is the fourth state and the progress of the presentation scenario is equal to or greater than the third threshold.
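
The progress-based rule for an audience in the fourth state might look like the sketch below. The function name `determine_destination_from_fourth`, the string state labels, and the representation of progress as a fraction between 0 and 1 are assumptions made for illustration.

```python
def determine_destination_from_fourth(progress, third_threshold):
    """Return the transition destination for an audience in the fourth state.

    `progress` is assumed to be the fraction (0.0 to 1.0) of the presentation
    scenario that has already been executed; the claim does not fix its unit.
    """
    if progress < third_threshold:
        # Early in the scenario: aim for the second state.
        return "second"
    # At or beyond the threshold: aim for the third state.
    return "third"
```

With a hypothetical threshold of 0.5, `determine_destination_from_fourth(0.2, 0.5)` yields `"second"`, while a progress of 0.5 or more yields `"third"`.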

The scenario control device according to claim 2, wherein the determination unit determines the second state as the transition destination state when the current state is the fourth state and the total length of the presentation scenario is less than a fourth threshold.

The scenario control device according to any one of claims 1 to 4, wherein the selection unit selects, from among the at least one available correction content associated with the combination of the current state and the transition destination state, the one having the smallest cumulative selection count.
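
Selecting the least-used correction content can be sketched as below. The function name `select_correction` is hypothetical, a `collections.Counter` stands in for the cumulative selection counts, and breaking ties by list order is an assumption that the claim does not address.

```python
from collections import Counter


def select_correction(candidates, selection_counts):
    """Pick the candidate with the smallest cumulative selection count.

    `candidates` lists the correction contents available for the current
    (current state, transition destination state) pair. `selection_counts`
    is a Counter, so a candidate never chosen before counts as zero.
    """
    if not candidates:
        raise ValueError("no correction content is available")
    # min() returns the first minimal element, so ties follow list order.
    chosen = min(candidates, key=lambda c: selection_counts[c])
    selection_counts[chosen] += 1  # record the selection for next time
    return chosen
```

Calling the function repeatedly with the same two candidates and one shared counter alternates between them, which spreads selections across the available correction contents.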

The scenario control device according to claim 1, further comprising a correction unit that corrects a presentation scenario of the content based on the selected correction content.

A scenario control method executed by a computer, comprising:
acquiring information representing the state of an audience viewing content;
extracting a feature amount of the audience from the information representing the state of the audience;
estimating, based on the feature amount of the audience, the viewing state of the audience as one of a plurality of states including a first state, and setting the estimated state as a current state;
determining, when the current state is not the first state, one of the plurality of states different from the current state as a transition destination state; and
selecting one of at least one available correction content for a presentation scenario of the content, the correction content being associated with the combination of the current state and the transition destination state.
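
The order of the method steps can be sketched as a single function that chains caller-supplied callables. All names here (`control_scenario_step` and its parameters) are hypothetical. The sketch also assumes that no correction content is selected when the current state is the first state, which the claim implies but does not state outright.

```python
def control_scenario_step(observation, extract, estimate, determine, select,
                          first_state="first"):
    """Run one pass of the claimed steps on already-acquired information.

    `extract`, `estimate`, `determine`, and `select` are placeholders for the
    extraction, estimation, determination, and selection steps.
    """
    features = extract(observation)       # feature amount of the audience
    current = estimate(features)          # current viewing state
    if current == first_state:
        return None                       # assumed: nothing to correct
    destination = determine(current)      # a state different from `current`
    return select(current, destination)   # correction content for the pair
```

Passing the steps in as functions keeps the sketch independent of how each step is realized, for example threshold rules for estimation or a lookup table for selection.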

A scenario control program comprising computer-readable instructions for causing a computer to function as the scenario control device according to any one of claims 1 to 6.