JP2020115202A - Information processing apparatus, presentation system, and information processing program

Info

Publication number: JP2020115202A
Application number: JP2019069945A
Authority: JP
Inventors: 一希笠井; Kazuki KASAI; 慎江上; Shin Egami
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2019-01-17
Filing date: 2019-04-01
Publication date: 2020-07-30
Anticipated expiration: 2039-04-01
Also published as: WO2020148919A1; JP7111042B2

Abstract

To achieve an information processing apparatus that can support learning of an operation related to serving customers performed by an operation learner.

SOLUTION: An information processing apparatus (100) comprises: a face information acquisition unit (111) that acquires face information including information on at least part of the face of an operation learner who learns to serve customers; a voice information acquisition unit (114) that acquires voice information including information on a speech uttered by the operation learner; a difference information derivation unit (118) that derives first difference information indicating the difference between first reference information and the state of the at least part of the face indicated by the face information acquired by the face information acquisition unit (111), and second difference information indicating the difference between second reference information and the voice information acquired by the voice information acquisition unit (114); and a presentation information creation unit (119) that creates presentation information according to at least any one of the first difference information and second difference information.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, a presentation system, and an information processing program.

In recent years, systems have been developed to help workers learn their tasks. For example, Patent Literature 1 discloses a work education system that, based on image information, displays differences in line of sight or work motion between a model worker and a learning worker.

Patent Literature 1: JP 2018-180090 A

In customer service work and similar tasks, however, the conventional technique described above, which uses only image information, cannot adequately support the learning of work content.

An object of one aspect of the present invention is to realize an information processing apparatus that can support a work learner in learning customer service work.

In order to solve the above problem, an information processing apparatus according to one aspect of the present invention includes: a face information acquisition unit that acquires face information including information on at least a part of the face of a work learner who is learning customer service work; a voice information acquisition unit that acquires voice information including information on utterances by the work learner; a difference information derivation unit that derives first difference information indicating a difference between first reference information and the state of the at least part of the face indicated by the face information acquired by the face information acquisition unit, and second difference information indicating a difference between second reference information and the voice information acquired by the voice information acquisition unit; and a presentation information generation unit that generates presentation information according to at least one of the first difference information and the second difference information.

According to one aspect of the present invention, it is possible to realize an information processing apparatus that can support a work learner in learning customer service work.

FIG. 1 is a block diagram showing the components of a presentation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of an image displayed by the display of a head-mounted display according to an embodiment of the present invention.
FIG. 3 is a table showing an example of a learner's visual target, utterance, action, and degree of concentration identified by a visual target identification unit or a decision-making information identification unit according to an embodiment of the present invention.
FIG. 4 is a table showing an example of data on a veteran customer service worker stored in advance in a storage unit according to an embodiment of the present invention.
FIG. 5 is a table showing an example of difference information data derived by a difference information derivation unit according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the processing of a presentation system according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of presentation information generated by a presentation information generation unit according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of presentation information generated by a presentation information generation unit according to an embodiment of the present invention.
FIG. 9 is a block diagram showing the components of a presentation system according to an embodiment of the present invention.
FIG. 10 is a flowchart showing the processing of a presentation system according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example of information on a visual target object.

An embodiment of the present invention is described in detail below. Where a configuration in a given item (embodiment) below is the same as one described in another item, its description may be omitted. For convenience of description, members having the same functions as members shown in other items are given the same reference numerals, and their description is omitted as appropriate.

[Embodiment 1]
1. Presentation System
The presentation system 800 according to the present embodiment provides learning support to a work learner who is learning customer service work. One form of such a learning support device is an experiential learning device using VR or the like. The presentation system 800 presents to the learner, as presentation information, the difference (also called "difference information") between information on the work learner (for example, a newly hired part-time worker; hereinafter sometimes simply called the learner) and pre-stored information on a model worker (for example, a veteran customer service worker; this information is also called "reference information"). This makes it possible to support independent, experiential learning by the work learner. The following description takes as an example a case in which the presentation system 800 is used while a learner working at a restaurant or similar establishment that requires customer service learns customer service work.

The learner's information and the reference information include, for example, the object being viewed and the degree of concentration on it, the utterance content and tone of voice, and actions.

FIG. 1 is a block diagram showing the components of a presentation system according to an embodiment of the present invention. As shown in FIG. 1, the presentation system 800 according to the present embodiment includes an information processing apparatus 100, a head-mounted display 200, a wearable interface 300, an information management server 400, and a display unit 500. With this configuration, the presentation system 800 can provide learning support to the learner.

<Head-mounted display>
As shown in FIG. 1, the head-mounted display 200 includes a camera 210 (also called an "imaging unit") that captures at least a part of the learner's face, a microphone 220 (also called a "sound collection unit") that collects the learner's speech, a speaker 240 and a display 270 (each of which may be called a "presentation unit"), a motion sensor 230, a control unit 250, and a communication unit 260.

The control unit 250 of the head-mounted display 200 performs overall control of each unit of the head-mounted display 200. Each function of the control unit 250, and all functions included in the head-mounted display 200, may be realized, for example, by a CPU executing a program stored in a storage unit (not shown) or the like.

Under the control of the control unit 250, the communication unit 260 communicates with the information processing apparatus 100. The form of communication is not particularly limited; it may be Bluetooth (registered trademark), WiFi (registered trademark), or another short-range wireless communication technology, or some other form of communication.

The camera 210 acquires an image of the learner's face as face information. The camera 210 may acquire an image of at least a part of the learner's body together with the face image. The microphone 220 acquires the learner's voice as voice information. The motion sensor 230 is, for example, an acceleration sensor, and acquires the learner's movement as motion information.

The speaker 240 outputs sound input to the head-mounted display 200 via the communication unit 260. The display 270 outputs images input to the head-mounted display 200 via the communication unit 260. The image output by the display 270 may show a virtual space; the image shown in FIG. 2 is one example. FIG. 2 is a diagram showing an example of an image displayed by the display of the head-mounted display according to an embodiment of the present invention. The image 200A shown in FIG. 2 is a view of the inside of a restaurant from the viewpoint of the learner, who is a member of the staff, and shows a customer, the learner's own hands, other staff members, a table, and so on.

FIG. 3 is a table showing an example of the learner's visual target, utterance (utterance content and tone of voice), action, and degree of concentration (blinking, pupil state, gaze dwell time, and facial expression) identified by the visual target identification unit or the decision-making information identification unit.

<Wearable interface>
As shown in FIG. 1, the presentation system 800 includes a wearable interface 300. In one example, the wearable interface 300 includes a motion sensor 310 and a button 320 and inputs operations performed by the learner to the information processing apparatus 100. Examples of such operations include raising or lowering the motion sensor 310 and pressing the button 320.

In one example, the presentation system 800 can include more detailed information, such as the learner's hand movements, in the learner's motion information acquired by the motion information acquisition unit 115 of the information processing apparatus 100. Examples of the learner's hand movements in the present embodiment include picking up a glass of water or placing it on a table, and taking a meal order.

Information such as the learner's hand movements input via the wearable interface 300 can also be reflected, as the learner's hand movements and the like, in the image output by the display 270 of the head-mounted display 200.

Known sensors and buttons can be used as the motion sensor 310 and the button 320. The motion sensor 310 is, for example, an acceleration sensor.

<Information processing apparatus>
As shown in FIG. 1, the information processing apparatus 100 according to the present embodiment includes a control unit 110, a storage unit 140, and a communication unit 130.

The control unit 110 of the information processing apparatus 100 according to the present embodiment includes: a face information acquisition unit 111 that acquires face information including information on at least a part of the work learner's face; a voice information acquisition unit 114 that acquires voice information including information on utterances by the work learner; a difference information derivation unit 118 that derives difference information (also called "first difference information") indicating the difference between the state of the at least part of the face indicated by the face information acquired by the face information acquisition unit 111 and information stored in the storage unit 140 (also called "first reference information"), and difference information (also called "second difference information") indicating the difference between the voice information acquired by the voice information acquisition unit 114 and information stored in the storage unit 140 (also called "second reference information"); and a presentation information generation unit 119 that generates presentation information according to the difference information.
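
The flow from the two kinds of difference information to presentation information can be pictured with a small Python sketch. It is only an illustration under assumptions: the dictionary keys, the function names `derive_difference` and `create_presentation`, the sample values, and the message format are hypothetical and do not come from the specification, which does not prescribe how a difference is computed.

```python
def derive_difference(observed, reference):
    """Return {item: (observed value, reference value)} for every item that differs.

    Both arguments are plain dicts; the keys are hypothetical item names.
    """
    return {
        item: (observed.get(item), ref_value)
        for item, ref_value in reference.items()
        if observed.get(item) != ref_value
    }


def create_presentation(first_difference, second_difference):
    """Build text lines from the face-related and voice-related differences."""
    lines = []
    for label, difference in (("face", first_difference), ("voice", second_difference)):
        for item, (observed, reference) in difference.items():
            lines.append(f"[{label}] {item}: learner={observed!r}, reference={reference!r}")
    return lines


# Hypothetical sample data for one time point.
face_state = {"visual_target": "customer's feet", "pupil_state": "open"}
first_reference = {"visual_target": "customer's face", "pupil_state": "open"}
voice_info = {"utterance": "Welcome", "tone": "low"}
second_reference = {"utterance": "Welcome", "tone": "high"}

first_difference = derive_difference(face_state, first_reference)
second_difference = derive_difference(voice_info, second_reference)
presentation = create_presentation(first_difference, second_difference)
```

In this sketch an empty dictionary means "no difference", so presentation information can still be built when only one of the two kinds of difference information is non-empty.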

As an example, the information processing apparatus 100 is implemented in a terminal device (for example, a smartphone, a tablet, a personal computer, or a television receiver) that can connect to a local network or a global network.

[Control unit]
As shown in FIG. 1, the control unit 110 of the information processing apparatus 100 according to the present embodiment includes a face information acquisition unit 111, a state detection unit 112, a visual target identification unit 113, a voice information acquisition unit 114, a motion information acquisition unit 115, a decision-making information identification unit 116, a learner information acquisition unit 117, a difference information derivation unit 118, and a presentation information generation unit 119. The control unit 110 performs overall control of each unit of the information processing apparatus 100. Each function of the control unit 110, and all functions included in the information processing apparatus 100, may be realized, for example, by a CPU executing a program stored in a storage unit (not shown) or the like.

(Face information acquisition unit)
The face information acquisition unit 111 acquires, at each time point, face information including information on at least a part of the learner's face. The face information acquisition unit 111 can acquire the learner's face information from, for example, an image acquired from the camera 210 of the head-mounted display 200. The learner's face information is information indicating facial feature amounts. Facial feature amounts are, for example, position information indicating the position, shape information indicating the shape, and size information indicating the size of each part of the face (for example, the eyes, nose, mouth, and eyebrows).

Eye information is particularly useful because it allows the object the learner is looking at to be identified. Examples of eye information include the end points of the inner and outer corners of the eye and the edges of the iris and pupil. The face information acquisition unit 111 may also apply correction processing such as noise reduction and edge enhancement, as appropriate, to the image acquired from the imaging unit. The face information acquisition unit 111 transmits the extracted face information to the state detection unit 112.

(State detection unit)
The state detection unit 112 detects the state of the learner from the learner's face information acquired by the face information acquisition unit 111. For example, the state detection unit 112 detects the object the learner is viewing at each time point ("visual target" in FIG. 3), the number of blinks per minute at each time point ("blinking" in FIG. 3), whether the pupil is open or closed at each time point ("pupil state" in FIG. 3), the time for which the visual target is viewed ("gaze dwell time" in FIG. 3), the facial expression ("facial expression" in FIG. 3), and the like.

The state detection unit 112 detects the learner's state based on the face information extracted by the face information acquisition unit 111. After detecting the learner's state, the state detection unit 112 transmits the detection result to the visual target identification unit 113.

As an example, the state detection unit 112 refers to the facial feature amounts acquired by the face information acquisition unit 111, namely position information indicating the position, shape information indicating the shape, and size information indicating the size of each part of the face (for example, the eyes, nose, mouth, cheeks, and eyebrows), and detects, as the learner's state, at least one of, for example, the learner's visual target, pupil state, number of blinks, eyebrow movement, cheek movement, eyelid movement, lip movement, and jaw movement.

By including the state detection unit in this way, the information processing apparatus can suitably detect the learner's state.

The method of detecting the visual target is not particularly limited. One example is to provide the information processing apparatus 100 with a point light source (not shown) and to capture, with the imaging unit, the corneal reflection image of light from the point light source for a predetermined time, thereby detecting the learner's visual target. The type of point light source is not particularly limited and may be visible light or infrared light; using an infrared LED, for example, makes it possible to detect the visual target without causing the learner discomfort.

Besides the visual target, the learner's state includes the degree of concentration. In general, a person who is concentrating blinks infrequently and at stable intervals, and the pupils tend to dilate, so the learner's degree of concentration can be evaluated by detecting the number of blinks and the state (size) of the pupil. For example, if blinks are detected over a predetermined time and occur at stable intervals within that time, the learner is likely to be gazing at some object. Likewise, if the pupil size is detected over a predetermined time and the pupil is enlarged for a large part of that time, the learner is likely to be gazing at some object.
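
The blink-based part of this evaluation can be sketched as follows. This is a minimal illustration, not the method of the specification: the function names, the use of the coefficient of variation as a measure of "stable intervals", and both threshold values are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def blink_statistics(blink_times_s, window_s):
    """Return (blinks per minute, coefficient of variation of the blink intervals)."""
    rate_per_min = len(blink_times_s) / window_s * 60.0
    intervals = [later - earlier for earlier, later in zip(blink_times_s, blink_times_s[1:])]
    if len(intervals) < 2 or mean(intervals) == 0:
        return rate_per_min, 0.0  # too few blinks to judge regularity
    return rate_per_min, pstdev(intervals) / mean(intervals)


def blinking_suggests_concentration(blink_times_s, window_s,
                                    max_rate_per_min=15.0, max_variation=0.3):
    """Heuristic: few blinks at stable intervals suggest that the learner is concentrating.

    The two thresholds are illustrative assumptions, not values from the source.
    """
    rate_per_min, variation = blink_statistics(blink_times_s, window_s)
    return rate_per_min <= max_rate_per_min and variation <= max_variation
```

For example, blink timestamps of 5, 15, 25, 35, 45, and 55 seconds in a 60-second window give a low rate with perfectly regular intervals, so the heuristic reports concentration, whereas bursts of blinks separated by long pauses do not.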

The method of detecting the number of blinks is not particularly limited. One example is to irradiate the learner's eye with infrared light and detect the difference in the amount of reflected infrared light between when the eye is open and when it is closed.
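
A blink counter based on this reflected-light difference might look like the sketch below. It assumes a sampled reflectance signal, a known open-eye baseline, and a fixed minimum difference; the function name `count_blinks` and all parameters are hypothetical.

```python
def count_blinks(reflectance, open_baseline, min_delta):
    """Count eye closures in a sequence of reflected infrared light readings.

    A sample counts as "eye closed" when it differs from the open-eye baseline
    by at least min_delta; each run of consecutive closed samples is one blink.
    """
    blinks = 0
    eye_closed = False
    for value in reflectance:
        closed_now = abs(value - open_baseline) >= min_delta
        if closed_now and not eye_closed:
            blinks += 1  # rising edge: the eye has just closed
        eye_closed = closed_now
    return blinks
```

Counting only the transition from open to closed keeps a long closure from being counted once per sample.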

The method of detecting the pupil state is not particularly limited. One example is to detect the circular pupil in an image of the eye using the Hough transform. A threshold may be set for the pupil size, with the pupil evaluated as "open" when its size is equal to or greater than the threshold and "closed" when its size is less than the threshold.
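
The threshold rule can be written down directly. The sketch below covers only the classification step and assumes that a pupil diameter in pixels has already been measured for each frame by some circle detector (for example a Hough-transform-based one); the function names and the pixel unit are assumptions.

```python
def classify_pupil(diameter_px, threshold_px):
    """Return "open" when the diameter is at or above the threshold, otherwise "closed"."""
    return "open" if diameter_px >= threshold_px else "closed"


def open_ratio(diameters_px, threshold_px):
    """Fraction of frames in which the pupil is classified as "open"."""
    if not diameters_px:
        return 0.0
    states = [classify_pupil(d, threshold_px) for d in diameters_px]
    return states.count("open") / len(states)
```

A ratio close to 1 over a predetermined time would correspond to the pupil being enlarged for most of that time.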

The degree of concentration can also be evaluated by detecting that the visual target does not change for a predetermined time or longer. For example, the learner's visual target can be detected using the technique described above, and the time for which the learner views the object ("gaze dwell time" in FIG. 3) can be measured.
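
Measuring how long the learner keeps looking at one object can be sketched as a run-length computation over time-stamped gaze samples. The sample format `(timestamp in seconds, target name)` and the function name `dwell_runs` are assumptions made for this illustration.

```python
def dwell_runs(samples):
    """Group time-ordered (timestamp_s, target) samples into runs on the same target.

    Returns a list of (target, start_time_s, duration_s). A run ends at the
    timestamp of the first sample that shows a different target.
    """
    runs = []
    if not samples:
        return runs
    start_t, current = samples[0]
    last_t = start_t
    for t, target in samples[1:]:
        if target != current:
            runs.append((current, start_t, t - start_t))
            start_t, current = t, target
        last_t = t
    runs.append((current, start_t, last_t - start_t))
    return runs


def fixated_targets(samples, min_dwell_s):
    """Targets viewed without change for at least min_dwell_s seconds."""
    return [target for target, _, duration in dwell_runs(samples) if duration >= min_dwell_s]
```

With samples taken every 0.5 s, a learner who looks at the customer from 0.0 s to 1.5 s and then at the table until 2.0 s yields one 1.5 s run and one 0.5 s run; with `min_dwell_s=1.0` only the customer counts as fixated.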

The state detection unit 112 need only detect at least one of the learner's visual target, pupil state, number of blinks, eyebrow movement, eyelid movement, cheek movement, nose movement, lip movement, and jaw movement, but it is preferable to combine them. By combining detection methods in this way, the state detection unit 112 can suitably evaluate the learner's degree of concentration while the learner is viewing a given object.

Besides the state of the eyes, the learner's state may be the state of other parts of the face, for example: eyebrow movements such as raising the inner or outer part of the eyebrow; eyelid movements such as raising the upper eyelid or tensing the eyelids; nose movements such as wrinkling the nose; lip movements such as raising the upper lip or pursing the lips; cheek movements such as raising the cheeks; and jaw movements such as lowering the jaw. The states of multiple parts of the face may be combined as the learner's state.

The state detection unit 112 may calculate the learner's degree of concentration by judging the learner's mood from the learner's facial expression. The learner's mood can be judged, for example, by referring to the facial feature amounts as described above, detecting as the learner's state at least one of the learner's visual target, pupil state, number of blinks, eyebrow movement, cheek movement, eyelid movement, lip movement, and jaw movement, and basing the judgment on the detected state of each part of the learner's face.

As an example of judging the learner's degree of concentration from the facial expression, when the eyebrows move upward, the learner can be judged to be concentrating because he or she is looking more intently at the visual target. Likewise, when the cheeks move upward while the learner is looking at a person's face, the learner can be judged to be forming an expression toward that person and therefore to be concentrating on the visual target.

When the learner's facial expression is to be detected, the state detection unit 112 can also calculate information on the learner's facial expression by machine learning. The specific configuration of the learning process for acquiring information on facial expressions does not limit the present embodiment; for example, any of the following machine learning techniques, or a combination of them, can be used.

・Support vector machine (SVM)
・Clustering
・Inductive logic programming (ILP)
・Genetic programming (GP)
・Bayesian network (BN)
・Neural network (NN)

When a neural network is used, the data may be preprocessed in advance for input to the neural network. Such preprocessing can use, in addition to arranging the data into one-dimensional or multi-dimensional arrays, techniques such as data augmentation.

When a neural network is used, a convolutional neural network (CNN) that includes convolution processing may be used. More specifically, one or more of the layers in the neural network may be convolutional layers that perform a convolution operation, applying a filter operation (multiply-accumulate operation) to the input data supplied to the layer. When the filter operation is performed, processing such as padding may be used in combination, and an appropriately chosen stride width may be adopted.
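
To make the filter operation, padding, and stride concrete, here is a minimal pure-Python sketch of a single-channel convolution as it is commonly computed in CNN layers (the kernel is slid over the input without being flipped). Zero padding and the function name `conv2d` are assumptions of this example; the specification does not fix any of these details.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide the kernel over a zero-padded 2D input and multiply-accumulate at each position."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])

    # Zero padding: surround the input with `padding` rows and columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]

    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    output = []
    for oi in range(out_h):
        row = []
        for oj in range(out_w):
            acc = 0.0
            for ki in range(kh):
                for kj in range(kw):
                    acc += padded[oi * stride + ki][oj * stride + kj] * kernel[ki][kj]
            row.append(acc)
        output.append(row)
    return output
```

The stride sets how far the kernel moves between output positions, and the padding controls how much the output shrinks relative to the input.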

A multi-layer or ultra-multi-layer neural network with tens to thousands of layers may also be used as the neural network.

(Visual target identification unit)
The visual target identification unit 113 identifies the object the learner is viewing by referring to the face information acquired by the face information acquisition unit 111 or the detection result acquired from the state detection unit 112. The object being viewed (the visual target) may be a person or a thing.

The visual target identification unit 113 also identifies the work learner's degree of concentration based on the detection result acquired from the state detection unit 112.

By including the visual target identification unit 113 in this way, the information processing apparatus 100 can accurately determine the object the learner is viewing and the learner's degree of concentration.

視認対象特定部１１３の具体的な処理について説明する。視認対象特定部１１３は、状態検出部１１２から取得した検出結果から、学習者の視線の先の位置座標を判定することで、視認対象を特定する。例えば、視認対象特定部１１３は、図３に示すように、学習者の視線の先の位置座標が客の足元の座標内にあると特定する。また、視線の情報に加えて、（状態検出部）で記載したように、瞳孔の状態、瞬きの回数、眉の動き、瞼の動き、頬の動き、鼻の動き、唇の動き及び顎の動きの検出結果を参照することで、学習者がどのオブジェクトを集中して視認しているかをさらに好適に特定することができる。 The specific processing of the visual recognition target identification unit 113 is as follows. The unit identifies the viewed target by determining, from the detection result acquired from the state detection unit 112, the position coordinates of the point at which the learner's line of sight is directed. For example, as shown in FIG. 3, the unit determines that these position coordinates lie within the coordinates of the customer's feet. In addition to the line-of-sight information, as described under (State detection unit), the unit can refer to the detection results for the pupil state, the number of blinks, eyebrow movement, eyelid movement, cheek movement, nose movement, lip movement, and jaw movement, and thereby identify more reliably which object the learner is concentrating on.
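The coordinate check described above can be sketched as a simple point-in-region test. The region names, the rectangular bounding boxes, and the coordinate values below are hypothetical; the specification does not state how object regions are represented.

```python
from typing import Dict, Optional, Tuple

# Hypothetical object regions in display coordinates: (x_min, y_min, x_max, y_max).
Region = Tuple[float, float, float, float]
REGIONS: Dict[str, Region] = {
    "customer's eyes": (400, 100, 520, 160),
    "customer's feet": (380, 600, 540, 700),
    "customer's table": (100, 400, 360, 560),
}


def identify_viewed_target(gaze_x: float, gaze_y: float,
                           regions: Dict[str, Region] = REGIONS) -> Optional[str]:
    """Return the name of the first region containing the gaze point, or None."""
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= gaze_x <= x1 and y0 <= gaze_y <= y1:
            return name
    return None


target = identify_viewed_target(450, 650)
```

In this sketch a gaze point outside every region yields `None`; a real system would also need to handle overlapping regions and noisy gaze estimates.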

視認対象特定部１１３は、学習者がどのオブジェクトを集中して視認しているかの特定を、（状態検出部）に記載の機械学習的手法を用いて特定してもよい。 The visual recognition target identification unit 113 may identify which object the learner is concentrating and visually recognizing by using the machine learning method described in (state detection unit).

視認対象特定部１１３は、特定した結果を差異情報導出部１１８へ送信する。 The visual recognition target identification unit 113 transmits the identified result to the difference information derivation unit 118.

（音声情報取得部） (Voice information acquisition unit)
音声情報取得部１１４は、各時点における学習者による発話の情報を含む音声情報を取得する。音声情報取得部１１４は、例えば、ヘッドマウントディスプレイ２００のマイクから学習者の音声情報を取得することができる。学習者の音声情報として、例えば、声の周波数、声の大きさ等を取得する。音声情報の特定には、公知の音声認識技術を用いることができる。 The voice information acquisition unit 114 acquires voice information including information about the learner's utterances at each time point. The voice information acquisition unit 114 can acquire the learner's voice information from, for example, the microphone of the head mounted display 200. As the learner's voice information, it acquires, for example, the voice frequency and the voice volume. A known voice recognition technique can be used to identify the voice information.

音声情報取得部１１４が取得した音声情報は、意思決定情報特定部１１６に入力される。 The voice information acquired by the voice information acquisition unit 114 is input to the decision information specifying unit 116.

（動き情報取得部） (Motion information acquisition unit)
動き情報取得部１１５は、各時点における作業学習者の体の少なくとも一部の動きを示す動き情報を取得する。動き情報取得部１１５は、例えば、図１に示すように、ヘッドマウントディスプレイ２００動きセンサ２３０、又は装着型インターフェース３００の動きセンサ３１０又はボタン３２０から取得する。また、ヘッドマウントディスプレイ２００のカメラ２１０が学習の身体を撮影した撮影画像から取得する。 The motion information acquisition unit 115 acquires motion information indicating the motion of at least a part of the learner's body at each time point. As shown in FIG. 1, the motion information acquisition unit 115 acquires the motion information from, for example, the motion sensor 230 of the head mounted display 200, or the motion sensor 310 or the button 320 of the wearable interface 300. It may also acquire the motion information from an image of the learner's body captured by the camera 210 of the head mounted display 200.

動き情報取得部１１５が取得した動き情報は、意思決定情報特定部１１６に入力される。 The motion information acquired by the motion information acquisition unit 115 is input to the decision making information specifying unit 116.

（意思決定情報特定部） (Decision-making information identification unit)
意思決定情報特定部１１６は、音声情報取得部１１４が取得した音声情報及び動き情報取得部１１５が取得した動き情報をもとに、各時点において学習者の意思を示す意思決定情報を特定する。本実施形態において、意思決定情報には、一例として、学習者の発話内容、声のトーン及び行動の内容が含まれる。 Based on the voice information acquired by the voice information acquisition unit 114 and the motion information acquired by the motion information acquisition unit 115, the decision-making information identification unit 116 identifies decision-making information indicating the learner's intention at each time point. In the present embodiment, the decision-making information includes, for example, the content of the learner's utterance, the tone of voice, and the content of the action.

一例として、意思決定情報特定部１１６は、学習者が発した発話内容及び声のトーンを特定する。例えば、図３に示すように、意思決定情報特定部１１６は、発話内容を、「いらっしゃいませ」又は「ご注文は何に致しますか」のように発話内容を示すテキストとして特定し、声のトーンを、一例として高中低のいずれかとして特定する。 As an example, the decision-making information identification unit 116 identifies the content and tone of the learner's utterance. For example, as shown in FIG. 3, the unit identifies the utterance content as text, such as "Welcome" or "What would you like to order?", and identifies the tone of voice as, for example, one of high, medium, or low.

また、意思決定情報特定部１１６は、動き情報取得部１１５が取得した動き情報を参照して、各時点において学習者が行った動作を特定する。例えば、意思決定情報特定部１１６は、図３に示すように、動作を「おじぎ」又は「水を置く」等のように特定する。 Further, the decision making information specifying unit 116 refers to the motion information acquired by the motion information acquiring unit 115 and specifies the motion performed by the learner at each time point. For example, as shown in FIG. 3, the decision-making information identification unit 116 identifies an action such as "bow" or "put water".

なお、意思決定情報特定部１１６は、（状態検出部）に記載の機械学習的手法を用いて意思決定情報を特定してもよい。 Note that the decision making information specifying unit 116 may specify the decision making information by using the machine learning method described in (State detecting unit).

意思決定情報特定部１１６が特定した意思決定情報は、差異情報導出部１１８に入力される。 The decision information specified by the decision information specifying unit 116 is input to the difference information deriving unit 118.

（差異情報導出部） (Difference information derivation unit)
差異情報導出部１１８は、視認対象特定部１１３から取得した視認対象、意思決定情報特定部１１６から取得した意思決定情報、及び集中度を参照して、記憶部１４０に記憶されている情報との差異を差異情報として導出する。差異情報の導出の一例を図３〜５を用いて説明する。 The difference information derivation unit 118 refers to the viewed target acquired from the visual recognition target identification unit 113, the decision-making information acquired from the decision-making information identification unit 116, and the degree of concentration, and derives their differences from the information stored in the storage unit 140 as difference information. An example of deriving the difference information is described with reference to FIGS. 3 to 5.

図３は、視認対象特定部１１３及び意思決定情報特定部１１６が特定した学習者のデータである。図３は、学習者のデータの一例として、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度を示している。 FIG. 3 shows the learner's data identified by the visual recognition target identification unit 113 and the decision-making information identification unit 116. As an example of the learner's data, FIG. 3 shows the timing at which an object was viewed, the viewed target, the utterance content, the tone of voice, the action, and the degree of concentration.

図４は、記憶部１４０に予め記憶されているベテラン接客者のデータの一例を示す表である。図４は、ベテラン接客者のデータの一例として、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度を示している。図４における「第１の参照情報」、「第２の参照情報」及び「第３の参照情報」は、それぞれ、ベテラン接客者の視認対象、発話及び動作に対応する。 FIG. 4 is a table showing an example of the data of a veteran attendant (an experienced customer service worker) stored in advance in the storage unit 140. As an example of the veteran attendant's data, FIG. 4 shows the timing at which an object was viewed, the viewed target, the utterance content, the tone of voice, the action, and the degree of concentration. The "first reference information", "second reference information", and "third reference information" in FIG. 4 correspond to the veteran attendant's viewed target, utterance, and action, respectively.

情報処理装置１００は、一例において、学習者又はベテラン接客者がオブジェクトを視認したタイミングから所定時間内に行われる一連の意思決定（発話及び動作等）を一つのまとまり（図３〜５における「イベント」）として扱う。図３〜５に示すように、一例において、イベントには、タイミング、視認対象、発話、動作、及び集中度が含まれ、それぞれのイベントはＩＤによって識別可能である。図３〜５に示すように、学習者におけるイベントと、ベテラン接客者におけるイベントとは、一例としてイベントＩＤを用いて対応させることができる。本明細書においては、例えば、図３に示すように、学習者のデータにおけるイベントＩＤ１におけるデータは、00;00;03のタイミングにおいて視認した視認対象及びオブジェクトを視認してから一定時間内に行われる意思決定を含む。 In one example, the information processing apparatus 100 treats a series of decisions (utterances, actions, and the like) made within a predetermined time from the moment the learner or the veteran attendant views an object as a single unit (an "event" in FIGS. 3 to 5). As shown in FIGS. 3 to 5, an event includes, in one example, a timing, a viewed target, an utterance, an action, and a degree of concentration, and each event can be identified by an ID. The learner's events and the veteran attendant's events can be associated with each other using, for example, the event ID. In this specification, for example, as shown in FIG. 3, the data for event ID 1 in the learner's data includes the target viewed at timing 00;00;03 and the decisions made within a fixed time after the object was viewed.
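The grouping of decisions into events can be sketched as follows. The 5-second window, the tuple layouts, and the sample records are assumptions for illustration; the specification only says that decisions within a "predetermined time" of the viewing are treated as one event.

```python
def group_into_events(viewings, decisions, window_s=5.0):
    """Group decisions made within `window_s` seconds of each viewing into one event.

    viewings:  list of (time_s, viewed_target)
    decisions: list of (time_s, kind, content), kind being "utterance" or "action"
    """
    events = []
    for event_id, (t_view, target) in enumerate(sorted(viewings), start=1):
        # A decision belongs to the event if it falls inside the window after the viewing.
        members = [d for d in decisions if t_view <= d[0] < t_view + window_s]
        events.append({
            "id": event_id,
            "timing": t_view,
            "target": target,
            "decisions": members,
        })
    return events


viewings = [(3.0, "customer's feet"), (40.0, "customer's table")]
decisions = [
    (3.5, "utterance", "Welcome"),
    (4.0, "action", "bow"),
    (41.0, "utterance", "What would you like to order?"),
    (42.0, "action", "put water"),
    (60.0, "action", "walk away"),  # outside every window, so not part of any event
]
events = group_into_events(viewings, decisions)
```

Events are numbered in time order here so that a learner's event and the reference event with the same ID can be compared.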

また、図３及び４における「タイミング」のデータは、一例として、オブジェクトを視認した時点を示すデータであるが、発話のタイミング又は動作のタイミング、ならびにオブジェクトを視認した時点と意思決定（発話及び動作）が終了した時点との間の中間の時点を等であってもよく、また、これらを複数組み合わせたものをデータとしてもよい（図示せず）。さらに、一例において、差異情報導出部１１８は、視認対象特定部１１３と意思決定情報特定部１１６とからそれぞれ取得したタイミングの情報を参照することによって、視認から意思決定（発話又は動作）までの経過時間を算出し、データとして含めてもよい（図示せず）。 The "timing" data in FIGS. 3 and 4 is, as an example, data indicating the time at which an object was viewed. It may instead be the timing of the utterance or of the action, or an intermediate time between the time the object was viewed and the time the decision (utterance and action) ended, and a combination of several of these may also be used as the data (not shown). Further, in one example, the difference information derivation unit 118 may calculate the elapsed time from viewing to the decision (utterance or action) by referring to the timing information acquired from the visual recognition target identification unit 113 and from the decision-making information identification unit 116, and include it in the data (not shown).

差異情報導出部１１８は、図３及び４にそれぞれ示した学習者のデータ及びベテラン接客者のデータに基づいて、イベントＩＤごとに差異情報を導出する。図５は、差異情報導出部１１８が導出した差異情報のデータの一例を示す表である。図５において、差異情報は、それぞれのイベントＩＤごとに導出された情報であり、対応するイベントＩＤにおける、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度の差異を示す。 The difference information derivation unit 118 derives difference information for each event ID based on the learner's data and the veteran attendant's data shown in FIGS. 3 and 4, respectively. FIG. 5 is a table showing an example of the difference information derived by the difference information derivation unit 118. In FIG. 5, the difference information is derived for each event ID and indicates, for the corresponding event ID, the differences in the timing at which the object was viewed, the viewed target, the utterance content, the tone of voice, the action, and the degree of concentration.

図５における「第１の差異情報」、「第２の差異情報」及び「第３の差異情報」は、それぞれ、学習者とベテランとにおける視認対象、発話及び動作の差異に対応する。 The “first difference information”, the “second difference information”, and the “third difference information” in FIG. 5 correspond to the differences in the visual recognition target, the utterance, and the action between the learner and the veteran, respectively.

差異情報は、図３に示す学習者のデータと図４に示すベテラン接客者のデータとの「ずれ」の情報（表中の「＋／−」又は「＋／−」を用いた（学習者−ベテラン接客者）の値によって示される定量的な情報）、又はベテラン接客者のデータに一致するかどうかの情報（表中の「○／×」によって示される定性的な情報）である。一例において、図５に示したように、学習者のデータとベテラン接客者とのデータが一致しない場合は、差異情報導出部１１８が導出する差異情報は、ベテラン接客者のデータ（表中の「正解」）を含むものであってもよい。 The difference information is either information on the "deviation" between the learner's data shown in FIG. 3 and the veteran attendant's data shown in FIG. 4 (quantitative information indicated in the table by "+/−", or by a signed value of (learner − veteran attendant)), or information on whether the learner's data matches the veteran attendant's data (qualitative information indicated in the table by "○/×"). In one example, as shown in FIG. 5, when the learner's data does not match the veteran attendant's data, the difference information derived by the difference information derivation unit 118 may include the veteran attendant's data (the "correct answer" in the table).
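The two kinds of difference information can be sketched with a single helper. The function name, the dictionary layout, and the decision to treat every numeric pair as quantitative are assumptions for illustration; ordinal items such as the tone of voice (high, medium, low) are not handled here.

```python
def derive_difference(learner_value, veteran_value):
    """Return difference information for one item of one event."""
    numeric = (int, float)
    if isinstance(learner_value, numeric) and isinstance(veteran_value, numeric):
        # Quantitative item: signed deviation computed as (learner - veteran attendant).
        deviation = round(learner_value - veteran_value, 3)
        return {"difference": f"{deviation:+g}"}
    if learner_value == veteran_value:
        # Qualitative item that matches the reference data.
        return {"difference": "○"}
    # Qualitative mismatch: include the reference value as the correct answer.
    return {"difference": "×", "correct": veteran_value}


blink = derive_difference(5, 10)           # blinks per minute
dwell = derive_difference(0.0, 0.8)        # gaze dwell time in seconds
target = derive_difference("customer's feet", "customer's eyes")
action = derive_difference("bow", "bow")
```

The sample values mirror the blink count, gaze dwell time, viewed target, and action items discussed for FIGS. 3 to 5.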

一例において、学習者のデータを示す図３中のイベントＩＤ１によれば、タイミング「00；00；03」において、視認対象特定部１１３は、学習者の視認対象が「客の足元」であることを特定している。また、図４中のイベントＩＤ１に示す、記憶部１４０に予め記憶されている情報であるベテラン接客者の情報によれば、タイミング「00；00；03」におけるベテラン接客者の視認対象は「客の目」である。差異情報導出部１１８は、両データに基づいて、学習者の視認対象はベテラン接客者と一致しないと判断し、図５のイベントＩＤ１に示すように、視認対象の項目に関する差異情報として「×」を導出する。さらに、差異情報導出部１１８は、ベテラン接客者の視認対象である「客の目」を正解として導出する。 In one example, according to event ID 1 in FIG. 3, which shows the learner's data, the visual recognition target identification unit 113 identifies that the learner's viewed target at timing "00;00;03" is the "customer's feet". According to the veteran attendant's information stored in advance in the storage unit 140 and shown as event ID 1 in FIG. 4, the veteran attendant's viewed target at timing "00;00;03" is the "customer's eyes". Based on both sets of data, the difference information derivation unit 118 determines that the learner's viewed target does not match the veteran attendant's and, as shown for event ID 1 in FIG. 5, derives "×" as the difference information for the viewed target item. The difference information derivation unit 118 further derives the "customer's eyes", the veteran attendant's viewed target, as the correct answer.

また、図３のイベントＩＤ１に示すように、タイミング「00；00；03」において、意思決定情報特定部１１６は、学習者が「いらっしゃいませ」と発話し、「おじぎ」の動作を行っていることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、ベテラン接客者は、タイミング「00；00；03」において「いらっしゃいませ」と発話し、「おじぎ」の動作を行っている。差異情報導出部１１８は、両データに基づいて、学習者の発話内容及び動作はベテラン接客者と一致すると判断し、図５のイベントＩＤ１に示すように、発話内容の項目と動作の項目に関する差異情報として「○」を導出する。 As shown for event ID 1 in FIG. 3, the decision-making information identification unit 116 identifies that, at timing "00;00;03", the learner utters "Welcome" and performs the action "bow". As shown for event ID 1 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant utters "Welcome" and performs the action "bow" at timing "00;00;03". Based on both sets of data, the difference information derivation unit 118 determines that the learner's utterance content and action match the veteran attendant's and, as shown for event ID 1 in FIG. 5, derives "○" as the difference information for the utterance content item and the action item.

そして、図３のイベントＩＤ１に示すように、タイミング「00；00；03」において、例えば、意思決定情報特定部１１６は、学習者の声のトーンが「中」であることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；03」におけるベテラン接客者の声のトーンは「高」である。差異情報導出部１１８は、両データに基づいて、学習者の声のトーンはベテラン接客者とより低いと判断し、図５のイベントＩＤ１に示すように、声のトーンの項目に関する差異情報として「−」を導出する。 As shown for event ID 1 in FIG. 3, the decision-making information identification unit 116 identifies, for example, that the learner's tone of voice at timing "00;00;03" is "medium". As shown for event ID 1 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant's tone of voice at timing "00;00;03" is "high". Based on both sets of data, the difference information derivation unit 118 determines that the learner's tone of voice is lower than the veteran attendant's and, as shown for event ID 1 in FIG. 5, derives "−" as the difference information for the tone of voice item.

さらに、タイミング「00；00；03」において、視認対象特定部１１３は、図３のイベントＩＤ１に示すように、学習者の瞬き（回／分）が「５」であることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；03」におけるベテラン接客者の瞬きは「１０」である。差異情報導出部１１８は、両データに基づいて、学習者の瞬きはベテラン接客者より５少ないと判断し、図５のイベントＩＤ１に示すように、瞬きの項目に関する差異情報として「−５」を導出する。 Further, as shown for event ID 1 in FIG. 3, the visual recognition target identification unit 113 identifies that the learner's blink rate (blinks per minute) at timing "00;00;03" is "5". As shown for event ID 1 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant's blink rate at timing "00;00;03" is "10". Based on both sets of data, the difference information derivation unit 118 determines that the learner's blink rate is 5 lower than the veteran attendant's and, as shown for event ID 1 in FIG. 5, derives "−5" as the difference information for the blink item.

タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の瞳孔の状態が「閉」であることを特定している。また、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の瞬きは「開」である。差異情報導出部１１８は、両データに基づいて、学習者の瞳孔の状態はベテラン接客者と一致しないと判断し、図５のイベントＩＤ２に示すように、瞳孔の状態の項目に関する差異情報として「×」を導出する。 As shown for event ID 2 in FIG. 3, the visual recognition target identification unit 113 identifies that the learner's pupil state at timing "00;00;05" is "closed". As shown for event ID 2 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant's pupil state at timing "00;00;05" is "open" (the original text reads "blink" here, but the item being compared is the pupil state). Based on both sets of data, the difference information derivation unit 118 determines that the learner's pupil state does not match the veteran attendant's and, as shown for event ID 2 in FIG. 5, derives "×" as the difference information for the pupil state item.

タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の視線停留時間（秒）が「０．０」であることを特定している。また、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の視線停留時間は「０．８」である。差異情報導出部１１８は、両データに基づいて、学習者の視線停留時間はベテラン接客者より０．８少ないと判断し、図５のイベントＩＤ２に示すように、視線停留時間の項目に関する差異情報として「−０．８」を導出する。 As shown for event ID 2 in FIG. 3, the visual recognition target identification unit 113 identifies that the learner's gaze dwell time (in seconds) at timing "00;00;05" is "0.0". As shown for event ID 2 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant's gaze dwell time at timing "00;00;05" is "0.8". Based on both sets of data, the difference information derivation unit 118 determines that the learner's gaze dwell time is 0.8 shorter than the veteran attendant's and, as shown for event ID 2 in FIG. 5, derives "−0.8" as the difference information for the gaze dwell time item.

また、タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の表情について、眉の動き及び頬の動きが検出されていないことを、「−」として特定している。一方、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の表情は、眉の動きが「−」、頬の動きが「上方向」である。この場合、差異情報導出部１１８は、両データに基づいて、学習者の眉の動きはベテラン接客者と一致していると判断し、図５のイベントＩＤ２に示すように、眉の動きの項目に関する差異情報として「○」を導出する。また、差異情報導出部１１８は、両データに基づいて、学習者の頬の動きはベテラン接客者と一致しないと判断し、図５のイベントＩＤ２に示すように、頬の動きの項目に関する差異情報として「×」を導出する。 Further, as shown for event ID 2 in FIG. 3, the visual recognition target identification unit 113 records "−" for the learner's facial expression at timing "00;00;05" to indicate that no eyebrow movement and no cheek movement were detected. On the other hand, as shown for event ID 2 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant's facial expression at timing "00;00;05" has an eyebrow movement of "−" and a cheek movement of "upward". In this case, based on both sets of data, the difference information derivation unit 118 determines that the learner's eyebrow movement matches the veteran attendant's and, as shown for event ID 2 in FIG. 5, derives "○" as the difference information for the eyebrow movement item. The difference information derivation unit 118 also determines that the learner's cheek movement does not match the veteran attendant's and, as shown for event ID 2 in FIG. 5, derives "×" as the difference information for the cheek movement item.

また、視認対象特定部１１３及び意思決定情報特定部１１６によれば、学習者は、図３のイベントＩＤ３に示すように、タイミング「00;00;40」において、「客のテーブル」を視認し、「ご注文は何に致しますか」と発話し、「水を置く」動作を行っている。一方、図４のイベントＩＤ３に示すように、記憶部１４０に予め記憶されている情報によれば、「客のテーブル」を視認し、「ご注文は何に致しますか」と発話し、「水を置く」動作を行うタイミングは「00;00;33」である。差異情報導出部１１８は、両データに基づいて、学習者の一連のデータのまとまりは、ベテラン接客者より00;00;07遅いと判断し、図５のイベントＩＤ３に示すように、タイミングの項目に関する差異情報として「＋00;00;07」を導出する。 According to the visual recognition target identification unit 113 and the decision-making information identification unit 116, as shown for event ID 3 in FIG. 3, the learner views the "customer's table", utters "What would you like to order?", and performs the action "put water" at timing "00;00;40". On the other hand, as shown for event ID 3 in FIG. 4, according to the information stored in advance in the storage unit 140, the timing at which the veteran attendant views the "customer's table", utters "What would you like to order?", and performs the action "put water" is "00;00;33". Based on both sets of data, the difference information derivation unit 118 determines that the learner's event is 00;00;07 later than the veteran attendant's and, as shown for event ID 3 in FIG. 5, derives "+00;00;07" as the difference information for the timing item.
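The timing difference above can be reproduced with a short sketch. It assumes that the three fields of a timing such as "00;00;40" are hours, minutes, and seconds separated by ASCII semicolons; the specification does not define the fields, and the function names are hypothetical.

```python
def timecode_to_seconds(timecode: str) -> int:
    """Convert 'HH;MM;SS' (assumed field meaning) to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(";"))
    return hours * 3600 + minutes * 60 + seconds


def timing_difference(learner_tc: str, veteran_tc: str) -> str:
    """Return the signed (learner - veteran attendant) difference as a timecode."""
    delta = timecode_to_seconds(learner_tc) - timecode_to_seconds(veteran_tc)
    sign = "+" if delta >= 0 else "-"
    delta = abs(delta)
    return f"{sign}{delta // 3600:02d};{delta % 3600 // 60:02d};{delta % 60:02d}"


late_by = timing_difference("00;00;40", "00;00;33")
```

A positive result means the learner acted later than the reference, and a negative result means earlier.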

なお、図４のイベントＩＤ４に示すように、記憶部１４０に予め記憶されている情報によれば、ベテラン接客者は、タイミング「00;00;47」において「自分の手元」を視認し、「ご注文を確認いたします」と発話し、「注文内容を復唱」という動作を行っている。その一方で、図３に示すように、視認対象特定部１１３及び意思決定情報特定部１１６はいずれもタイミング「00;00;47」において学習者の視認対象及び意思決定情報を特定しておらず、また、タイミングによらず、「自分の手元」の視認、「ご注文を確認いたします」という発話及び「注文内容を復唱」という動作のいずれも特定されていない。このような場合、差異情報導出部１１８は、両データに基づいて、学習者のデータ中にベテラン接客者のデータに対応するデータが存在しないと判断し、図５の６００に示すように、差異情報として「データ無」、「×」及び正解となる情報を導出する。 As shown for event ID 4 in FIG. 4, according to the information stored in advance in the storage unit 140, the veteran attendant views "own hands", utters "I will confirm your order", and performs the action "repeat the order" at timing "00;00;47". On the other hand, as shown in FIG. 3, neither the visual recognition target identification unit 113 nor the decision-making information identification unit 116 has identified a viewed target or decision-making information for the learner at timing "00;00;47", and, regardless of timing, none of the viewing of "own hands", the utterance "I will confirm your order", or the action "repeat the order" has been identified. In such a case, the difference information derivation unit 118 determines from both sets of data that the learner's data contains no data corresponding to the veteran attendant's data and, as shown at 600 in FIG. 5, derives "no data", "×", and the correct-answer information as the difference information.
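The handling of a reference event with no learner counterpart can be sketched as below. Which item receives "no data" and which receives "×" is an assumption, since FIG. 5 is not reproduced here; the data layout and the function name are likewise hypothetical.

```python
def derive_event_differences(learner_events, veteran_events):
    """Map each veteran event ID to per-item difference information.

    Both arguments are dicts of the form {event_id: {item_name: value}}.
    """
    result = {}
    for event_id, veteran in veteran_events.items():
        learner = learner_events.get(event_id)
        items = {}
        for item, correct in veteran.items():
            if learner is None:
                # No corresponding learner event (assumed marking): the timing is
                # reported as "no data" and every other item as a mismatch.
                mark = "no data" if item == "timing" else "×"
                items[item] = {"difference": mark, "correct": correct}
            elif learner.get(item) == correct:
                items[item] = {"difference": "○"}
            else:
                items[item] = {"difference": "×", "correct": correct}
        result[event_id] = items
    return result


veteran_events = {
    4: {
        "timing": "00;00;47",
        "target": "own hands",
        "utterance": "I will confirm your order",
        "action": "repeat the order",
    }
}
missing = derive_event_differences({}, veteran_events)
```

Iterating over the reference events, rather than the learner's, is what makes an omitted step visible in the result.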

図５の表から、学習者は、客の目を見ておらず、注文を取るタイミングが遅い傾向にあることが判定される。また、図５の表から、学習者は、客に対して注文の確認をしていないことが判定される。 From the table of FIG. 5, it is determined that the learner does not look at the eyes of the customer and tends to take the order later. Further, from the table of FIG. 5, it is determined that the learner has not confirmed the order with the customer.

差異情報導出部１１８は、（状態検出部）に記載の機械学習的手法を用いて差異情報を導出してもよい。 The difference information derivation unit 118 may derive the difference information using the machine learning method described in (State detection unit).

差異情報導出部１１８で導出された差異情報は、提示情報生成部１１９に入力される。 The difference information derived by the difference information deriving unit 118 is input to the presentation information generating unit 119.

（学習者情報取得部） (Learner information acquisition unit)
学習者情報取得部１１７は、例えば、情報管理サーバ４００が管理する学習者の情報を学習者情報として取得する。一例として、学習者の情報は、学習者の属性を示す属性情報と、他の学習者と対象学習者とを識別するための学習者識別情報を含む。学習者の属性情報とは、例えば、学習者の年齢、性別、職歴及び本提示システム８００を用いたこれまでの学習期間等である。また、学習者識別情報とは、例えば、学習者のＩＤ又は学習者のメールアドレス等である。 The learner information acquisition unit 117 acquires, for example, information about the learner managed by the information management server 400 as learner information. As an example, the learner information includes attribute information indicating the learner's attributes and learner identification information for distinguishing the target learner from other learners. The learner's attribute information is, for example, the learner's age, sex, work history, and the length of time the learner has so far spent learning with the presentation system 800. The learner identification information is, for example, the learner's ID or the learner's email address.

このように、学習者の情報が、属性情報と学習者識別情報とを含むことで、後述する提示情報生成部１１９は、個々の学習者の属性に応じて、各学習者にとって効果的な提示情報を提示することができる。詳細は（提示情報生成部）の項目で述べる。 In this way, since the learner information includes the attribute information and the learner identification information, the presentation information generation unit 119, which will be described later, makes an effective presentation for each learner according to the attribute of each learner. Information can be presented. Details will be described in the item of (presentation information generation unit).

学習者情報取得部１１７は、情報管理サーバ４００から取得した学習者情報を提示情報生成部１１９に入力する。 The learner information acquisition unit 117 inputs the learner information acquired from the information management server 400 to the presentation information generation unit 119.

（提示情報生成部） (Presentation information generation unit)
提示情報生成部１１９は、差異情報導出部１１８が導出した差異情報（視認対象特定部１１３が顔情報から特定する視認対象及び集中度、意思決定情報特定部１１６が音声情報から特定する発話及び動き情報から特定する動作に基づいて導出される）に基づいて提示情報を生成する。一例において、提示情報生成部１１９が生成する提示情報は画像である。 The presentation information generation unit 119 generates presentation information based on the difference information derived by the difference information derivation unit 118 (which is derived from the viewed target and degree of concentration that the visual recognition target identification unit 113 identifies from the face information, and from the utterance and the action that the decision-making information identification unit 116 identifies from the voice information and the motion information, respectively). In one example, the presentation information generated by the presentation information generation unit 119 is an image.

提示情報生成部１１９は、学習者が視認する対象が、ベテラン接客者が視認対象と異なる場合に、ベテラン接客者が視認する対象を強調表示した提示情報を生成する。図８は、提示情報生成部１１９が生成した提示情報の一例を示す図である。図８に示す画像２００Ｃは、学習者が視認すべき対象を強調表示した画像であり、図２に示した画像２００Ａに記号として矢印ポインタ２００Ｃ１が加えられている。図８においては、矢印ポインタ２００Ｃ１は矢印の形をしているが、任意の形状及び色であってよい。 When the target viewed by the learner differs from the target viewed by the veteran attendant, the presentation information generation unit 119 generates presentation information in which the target viewed by the veteran attendant is highlighted. FIG. 8 is a diagram showing an example of the presentation information generated by the presentation information generation unit 119. The image 200C shown in FIG. 8 is an image in which the target the learner should view is highlighted; an arrow pointer 200C1 is added as a symbol to the image 200A shown in FIG. 2. In FIG. 8 the pointer 200C1 is arrow-shaped, but it may have any shape and color.

また、視認対象の強調表示の例としては、他にも、視認対象を枠囲みする、視認対象の周辺の背景色を変更する、視認対象を他のイメージよりも大きく表示する、及び視認対象の画像をポップアップウィンドウとして表示する等が挙げられる。 Other examples of highlighting the viewed target include enclosing it in a frame, changing the background color around it, displaying it larger than other images, and displaying an image of it in a pop-up window.

提示情報生成部１１９が生成する提示情報は常に強調表示を含んでいてもよいし、例えば、差異情報導出部１１８が導出した差異情報に応じて、適宜、強調の有無及び度合いを変更してもよい。 The presentation information generated by the presentation information generation unit 119 may always include highlighting, or the presence and degree of highlighting may be changed as appropriate according to, for example, the difference information derived by the difference information derivation unit 118.

また、提示情報生成部１１９は、差異情報導出部１１８が導出した差異情報に応じた点数を算出し、算出した点数を提示情報に含ませてもよい。 The presentation information generation unit 119 may calculate a score according to the difference information derived by the difference information derivation unit 118, and may include the calculated score in the presentation information.

図７は、提示情報生成部１１９が生成した提示情報の一例を示す図である。図７に示す画像２００Ｂは、テキストとして、指示ウィンドウ２００Ｂ１及び点数ウィンドウ２００Ｂ２を含んでいる。 FIG. 7 is a diagram showing an example of the presentation information generated by the presentation information generation unit 119. The image 200B shown in FIG. 7 includes an instruction window 200B1 and a score window 200B2 as text.

画像２００Ｂにおける指示ウィンドウ２００Ｂ１は、学習者が行うべき発話内容及び動作のガイド示すテキストを表示するウィンドウである。ガイドの内容は、記憶部１４０に予め記憶されているベテラン接客者のデータに基づくものである。テキストは、図７に示したように、学習者が行うべき発話内容及び動作の全てを指示するものであってもよいし、例えば、「客がメニューから顔を挙げました、どうしますか？」等のように学習者にヒントを与えるものであってもよい。 The instruction window 200B1 in the image 200B is a window that displays text giving guidance on the utterances and actions the learner should perform. The content of the guidance is based on the veteran attendant's data stored in advance in the storage unit 140. As shown in FIG. 7, the text may spell out all of the utterances and actions the learner should perform, or it may give the learner a hint, such as "The customer has looked up from the menu. What will you do?"

画像２００Ｂにおける点数ウィンドウ２００Ｂ２は、学習者の習熟度を示す点数を表示するウィンドウである。点数の算出方法としては、例えば、（ｉ）一致又は不一致で判断した差異情報と、ずれの程度を判断した差異情報とで加点又は減点の方法を分けて、不一致の項目については所定の点数を減点し、ずれている項目についてはずれの程度に応じて減点する点数を変化させる方法、及び（ｉｉ）ずれの程度を判断した差異情報だけでなく、一致又は不一致で判断した差異情報についても、予め定めた範囲を逸脱していなければ（例えば、ベテラン接客者の視認対象が「客の目」であるのに対して学習者の視認対象が「客の鼻」であった、ベテラン接客者の発話内容が「ご注文は何に致しますか」であるのに対して、学習者の発話内容が「ご注文をお伺いいたします」であった等）、程度に応じて減点する点数を変化させる方法等が挙げられるが、特に限られない。 The score window 200B2 in the image 200B is a window that displays a score indicating the learner's proficiency. Methods of calculating the score include, for example: (i) applying different addition or deduction rules to difference information judged as match or mismatch and to difference information judged by degree of deviation, deducting a predetermined number of points for each mismatched item and varying the deduction for each deviating item according to the degree of deviation; and (ii) varying the deduction according to degree not only for difference information judged by degree of deviation but also for difference information judged as match or mismatch, provided the learner's data does not depart from a predetermined range (for example, the veteran attendant's viewed target is the "customer's eyes" while the learner's is the "customer's nose", or the veteran attendant's utterance is "What would you like to order?" while the learner's is "May I take your order?"). The calculation method is not limited to these.
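Method (i) can be sketched as a deduction-based score. The starting score of 100, the 10-point mismatch penalty, the 2-points-per-unit deviation penalty, and the per-item cap are all assumed values; the specification gives no concrete numbers.

```python
def proficiency_score(differences, base=100.0, mismatch_penalty=10.0,
                      penalty_per_unit=2.0, max_item_penalty=10.0):
    """Deduct points per item: a fixed amount for "×", a scaled amount for deviations.

    differences: list whose entries are "○", "×", or a signed numeric deviation.
    """
    score = base
    for diff in differences:
        if diff == "×":
            # Qualitative mismatch: fixed deduction.
            score -= mismatch_penalty
        elif isinstance(diff, (int, float)):
            # Quantitative deviation: deduction grows with its magnitude, up to a cap.
            score -= min(abs(diff) * penalty_per_unit, max_item_penalty)
        # "○" (match) deducts nothing.
    return max(score, 0.0)


# Viewed target mismatch, matching utterance, blink deviation of -5, dwell deviation of -0.8.
score = proficiency_score(["×", "○", -5, -0.8])
```

Method (ii) would additionally grade qualitative mismatches by how close they are to the reference, which requires a similarity measure that the specification leaves open.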

一例において、提示情報生成部１１９が生成する提示情報は音声を含む（図示せず）。音声としては、声であっても効果音であってもよい。一例において、声は、指示ウィンドウ２００Ｂ１に表示されるテキストを読み上げるものである。また、効果音の例としては、差異情報の内容に応じて正解又は不正解を示す音（「ピンポン」又は「ブー」等）という音、及び視認対象の強調表示に伴って学習者に注意を促す音等が挙げられる。 In one example, the presentation information generated by the presentation information generation unit 119 includes audio (not shown). The audio may be a voice or a sound effect. In one example, the voice reads aloud the text displayed in the instruction window 200B1. Examples of sound effects include a sound indicating a correct or incorrect answer according to the difference information (such as a chime or a buzzer), and a sound that draws the learner's attention when the viewed target is highlighted.

提示情報生成部１１９は、提示情報を生成するにあたって、学習者情報取得部１１７が取得した学習者情報を参照する。学習者情報を参照することによって、提示情報生成部１１９は、個々の学習者の属性に応じて、各学習者が効果的に学習できるような提示情報を提示することができる。 The presentation information generation unit 119 refers to the learner information acquired by the learner information acquisition unit 117 when generating the presentation information. By referring to the learner information, the presentation information generation unit 119 can present the presentation information such that each learner can effectively learn according to the attribute of each learner.

例えば、本実施形態における学習者が接客業を全く経験したことがない場合には、提示情報生成部１１９は、視認対象をより強調表示した提示情報を生成したり、より具体的な内容のガイドを含む提示情報を生成したりする。また、例えば、学習者のこれまでの学習期間が所定期間より長い場合には、提示情報生成部１１９は、より厳しい条件で算出した学習者の習熟度を示す点数を含む提示情報を生成することができる。さらに、提示情報生成部１１９は、一例において、学習者が好きなアニメのキャラクター等の音声を含む提示情報を生成することができる。 For example, when the learner in the present embodiment has no experience at all in customer service work, the presentation information generation unit 119 generates presentation information in which the viewed target is highlighted more strongly, or presentation information that includes more specific guidance. Also, for example, when the learner's learning period to date is longer than a predetermined period, the presentation information generation unit 119 can generate presentation information that includes a proficiency score calculated under stricter conditions. Further, in one example, the presentation information generation unit 119 can generate presentation information that includes the voice of, for example, an anime character the learner likes.

The presentation information generation unit 119 transmits the generated presentation information to the head mounted display 200. The presentation information generation unit 119 also transmits the presentation information to the display unit 500.

<Display unit>
As shown in FIG. 1, the presentation system 800 includes a display unit 500. The display unit 500 displays the presentation information generated by the presentation information generation unit 119, mainly for an instructor or the like who teaches the learner. The image displayed by the display unit 500 may include the table of learner data shown in FIG. 3 and the table of difference information shown in FIG. 5, and may further include the table of veteran service worker data shown in FIG. 4. The image displayed by the display unit 500 may also include part of the image displayed on the display 270 of the head mounted display 200.

2. Processing Example of Presentation System
FIG. 6 is a flowchart showing the processing of the presentation system 800. The processing of the presentation system 800 is described below with reference to FIG. 6.

First, use of the presentation system is started (step S100), and the processing begins. The processing then proceeds to step S102 (hereinafter, the word "step" is omitted).

In S102, the learner information acquisition unit 117 in the control unit 110 of the information processing apparatus 100 acquires the learner information (learner information acquisition step). The details of the processing are as described in (Learner information acquisition unit). The learner information acquisition unit 117 transmits the learner information to the presentation information generation unit 119, and the processing proceeds to S104.

In S104, the face information acquisition unit 111 acquires a face image from the camera 210 of the learner's head mounted display 200 and extracts each part of the face from the face image (face information acquisition step). The details of the processing are as described in (Face information acquisition unit). The face information acquisition unit 111 transmits the extraction result to the state detection unit 112, and the processing proceeds to S106.

In S106, the state detection unit 112 detects the learner's visual recognition target, blinking, pupil state, and the state of each part of the face (state detection step). The details of the processing are as described in (State detection unit). The detection result is transmitted to the visual recognition target specifying unit 113, and the processing proceeds to S108.

In S108, the visual recognition target specifying unit 113 specifies the visual recognition target that the learner is viewing (visual recognition target specifying step). The details of the processing are as described in (Visual recognition target specifying unit). At this time, the visual recognition target specifying unit 113 may specify, together with the visual recognition target, the learner's degree of concentration while viewing it. The visual recognition target specifying unit 113 transmits the specified result to the difference information derivation unit 118, and the processing proceeds to S110.

In S110, at least one of the voice information acquisition unit 114 and the motion information acquisition unit 115 acquires the learner's decision-making information from, respectively, the microphone 220 and the motion sensor 310 or the like provided in the head mounted display 200 or the like (decision-making information acquisition step). The details of the processing are as described in (Voice information acquisition unit) and (Motion information acquisition unit). The processing proceeds to S112.

In S112, at least one of the voice information acquisition unit 114 and the motion information acquisition unit 115 determines whether decision-making information is present by checking whether voice information, motion information, or the like (hereinafter sometimes referred to as "decision-making information") has been acquired. If decision-making information is present, the voice information acquisition unit 114 and the motion information acquisition unit 115 transmit the acquired decision-making information to the decision-making information specifying unit 116, and the processing proceeds to S113. If no decision-making information is present, the processing proceeds to S114.

In S113, the decision-making information specifying unit 116 specifies decision-making information such as the learner's utterance content, tone of voice, and motion (decision-making information specifying step). The details of the processing are as described in (Decision-making information specifying unit). The decision-making information specifying unit 116 transmits the specified result to the difference information derivation unit 118, and the processing proceeds to S114.

In S114, the difference information derivation unit 118 acquires reference information, such as the veteran service worker's visual recognition targets and decision-making information, stored in advance in the storage unit 140 (reference information acquisition step). The processing proceeds to S116.

In S116, the difference information derivation unit 118 derives difference information with reference to the visual recognition target specified by the visual recognition target specifying unit 113, the decision-making information specified by the decision-making information specifying unit 116, and the reference information acquired from the storage unit 140 (difference information derivation step). The details of the processing are as described in (Difference information derivation unit). The difference information derivation unit 118 transmits the derived difference information to the presentation information generation unit 119, and the processing proceeds to S118.
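As a rough illustration of the comparison performed in S116, the sketch below derives a small difference record from one learner observation and the corresponding reference observation. The `Observation` fields and the keys of the returned dictionary are hypothetical; the source describes the compared items (visual recognition target, utterance, motion, timing) but not a concrete data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    # Hypothetical record of one moment of service work.
    time_s: float
    gaze_target: str
    utterance: Optional[str] = None  # None when no voice information was acquired
    action: Optional[str] = None     # None when no motion information was acquired


def derive_difference(learner: Observation, reference: Observation) -> dict:
    """Compare a learner observation with the reference observation."""
    return {
        "timing_gap_s": learner.time_s - reference.time_s,
        "gaze_match": learner.gaze_target == reference.gaze_target,
        "utterance_match": learner.utterance == reference.utterance,
        "action_match": learner.action == reference.action,
    }


learner = Observation(12.5, "own hands", "May I take your order?")
reference = Observation(12.0, "customer's eyes", "May I take your order?", "point forward")
difference = derive_difference(learner, reference)
```

In this example the learner speaks the same phrase as the reference but half a second later, looks at a different target, and performs no recorded motion, so only `utterance_match` is true.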

In S118, the presentation information generation unit 119 generates presentation information with reference to the difference information derived by the difference information derivation unit 118 and the learner information acquired from the learner information acquisition unit 117 (presentation information generation step). The details of the processing are as described in (Presentation information generation unit). The control unit 110 of the information processing apparatus 100 causes the communication unit 130 to transmit the presentation information generated by the presentation information generation unit 119 to the head mounted display 200 and the display unit 500, and the processing proceeds to S120.

In S120, the control unit 250 of the head mounted display 200 causes the display 270 or the like to display the presentation information generated by the presentation information generation unit 119 (presentation information display step). The processing then proceeds to S122 and ends. At this time, the control unit 250 of the head mounted display 200 may cause the speaker 240 or the like to output the presentation information generated by the presentation information generation unit 119 as audio (not shown).
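The order of the steps in FIG. 6, including the branch at S112, can be summarized in a short sketch. The function name `visited_steps` and the one-line step descriptions are illustrative assumptions; only the step labels and their order are taken from the description above.

```python
# Short descriptions of the steps of FIG. 6 (wording is illustrative).
STEP_DESCRIPTIONS = {
    "S102": "acquire learner information",
    "S104": "acquire face image and extract face parts",
    "S106": "detect gaze, blink, pupil, and face-part states",
    "S108": "specify the visual recognition target",
    "S110": "acquire decision-making information",
    "S112": "check whether decision-making information is present",
    "S113": "specify utterance content, tone of voice, and motion",
    "S114": "acquire reference information from storage",
    "S116": "derive difference information",
    "S118": "generate presentation information",
    "S120": "display presentation information",
}


def visited_steps(decision_info_present: bool) -> list:
    """Return the step labels of FIG. 6 in the order they are visited."""
    steps = ["S100", "S102", "S104", "S106", "S108", "S110", "S112"]
    if decision_info_present:
        steps.append("S113")  # skipped when no voice or motion information exists
    steps += ["S114", "S116", "S118", "S120", "S122"]
    return steps
```

Calling `visited_steps(False)` yields the same sequence without S113, which matches the branch from S112 directly to S114.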

Note that not all of the presentation information, such as the instruction window 200B1 and the score window 200B2 in FIG. 7 and the arrow pointer 200C1 in FIG. 8, needs to be displayed in real time during the customer service training in progress. For example, presentation information generated during the current training session may be displayed during the next or a later training session. Alternatively, the information processing apparatus 100 may have a function of recording the video displayed on the display 270 or the like in the storage unit 140, and each item of presentation information may be displayed when that video is played back later. The same applies to Embodiment 2 described below.

[Embodiment 2]
1. Presentation system
The presentation system 800a according to the present embodiment has functions equivalent to those of the presentation system 800 shown in FIG. 1 and, in addition, presents to the learner presentation information according to the result of applying determination rules, described later, to the difference information described above. This makes it possible to more suitably support independent, experiential learning by the work learner. In the following, as in Embodiment 1, the description takes as an example a case where the presentation system 800a is used in a situation where a learner working at a restaurant or the like that requires customer service is learning customer service work.

FIG. 9 is a block diagram showing the components of a presentation system according to an embodiment of the present invention. As shown in FIG. 9, the presentation system 800a according to the present embodiment differs from the presentation system 800 shown in FIG. 1 in that the control unit 110a further includes a difference analysis unit 120 that analyzes the difference information. The presentation information generation unit 119a according to the present embodiment generates presentation information with reference to the determination result input from the difference analysis unit 120, in addition to the difference information input from the difference information derivation unit 118. The storage unit 140a according to the present embodiment has a determination rule database 141, which is a database of determination rules. As another aspect, the determination rule database 141 may be stored separately in a memory or the like (not shown) instead of being included in the storage unit 140a.

(Difference analysis unit)
The difference analysis unit 120 performs determination processing in which a determination rule read from the determination rule database 141 is applied to at least one of the first to third difference information derived by the difference information derivation unit 118.

If, of the difference information shown in FIGS. 3 to 5, all of the information on the "deviation" between the learner's data and the veteran service worker's data, and all of the information on whether the learner's data matches the veteran service worker's data, were presented to the learner as presentation information, the learner would instead find it hard to tell which data to focus on. Accordingly, when the difference between the two sets of data is within a predetermined threshold, or when a predetermined criterion is satisfied, it may be preferable not to present that difference to the learner. The determination rules contained in the determination rule database 141 may be understood as a data set that defines such a threshold or criterion for each difference.

For example, suppose that at a certain timing the veteran service worker's visual recognition target is the customer's eyes, and the determination rule specifies a predetermined range that includes the customer's eyes. In that case, even if the learner's visual recognition target at that timing is not exactly the customer's eyes, the difference analysis unit 120 may regard it as acceptable as long as it falls within the predetermined range, and may determine that the difference in the visual recognition target is not to be presented to the learner. On the other hand, if the learner's visual recognition target in this case is, for example, the learner's own hands, the difference analysis unit 120 may determine that the difference in the visual recognition target is to be presented to the learner. The predetermined range within which a difference is not presented to the learner may consist of a plurality of non-contiguous ranges.
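One way to picture this kind of rule is to treat the allowed range as a set of rectangles in the camera image and to present the difference only when the gaze point falls outside all of them. The sketch below is an assumption-laden illustration: the source does not say how the range is represented, and the names `within_allowed_regions` and `present_gaze_difference` and the coordinate values are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def within_allowed_regions(gaze: Point, regions: Iterable[Region]) -> bool:
    """Return True if the gaze point lies inside any allowed rectangle."""
    x, y = gaze
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in regions)


def present_gaze_difference(gaze: Point, regions: Iterable[Region]) -> bool:
    """Return True if the gaze difference should be presented to the learner."""
    return not within_allowed_regions(gaze, regions)


# Two non-contiguous allowed rectangles (hypothetical image coordinates),
# for example one around the customer's eyes and one slightly below.
allowed = [(40.0, 20.0, 60.0, 30.0), (45.0, 35.0, 55.0, 40.0)]
```

Because the allowed range is a list, it naturally covers the case of several ranges that are not contiguous.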

As another example, suppose that at a certain timing the veteran service worker's utterance is "What would you like to order?". Even if the learner's utterance is "May I take your order?", the difference analysis unit 120 may regard it as conveying the same content and determine that the difference in utterance content is not to be presented to the learner. On the other hand, if the learner's utterance in this case is, for example, "Let me confirm your order", the difference analysis unit 120 may determine that the difference in utterance content is to be presented to the learner. As in this example, the difference analysis unit 120 may determine that two utterances convey the same content if a predetermined portion of the utterance content matches. The determination rule database 141 may also contain a data set indicating the learner utterances that are acceptable for a given utterance of the veteran service worker. The rules are not limited to the utterance content itself; a threshold for whether to present a difference to the learner may also be set for the pitch of the tone of voice.
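The data set of acceptable utterances and the tone threshold mentioned above might look like the following sketch. Everything concrete here is assumed for illustration: the English phrasings, the `ALLOWED_UTTERANCES` table, the function names, and the 30 Hz pitch threshold do not come from the source.

```python
def normalize(text: str) -> str:
    """Lower-case the text and collapse whitespace before comparison."""
    return " ".join(text.casefold().split())


# Hypothetical rule data: reference utterance -> acceptable learner utterances.
ALLOWED_UTTERANCES = {
    "what would you like to order?": {
        "what would you like to order?",
        "may i take your order?",
    },
}


def present_utterance_difference(reference: str, learner: str,
                                 allowed: dict = ALLOWED_UTTERANCES) -> bool:
    """Return True if the utterance difference should be presented."""
    ref = normalize(reference)
    acceptable = allowed.get(ref, {ref})  # default: only an exact match is accepted
    return normalize(learner) not in acceptable


def present_tone_difference(reference_pitch_hz: float, learner_pitch_hz: float,
                            threshold_hz: float = 30.0) -> bool:
    """Return True if the pitch gap exceeds the (assumed) threshold."""
    return abs(learner_pitch_hz - reference_pitch_hz) > threshold_hz
```

A lookup table is the simplest reading of "a data set indicating acceptable utterances"; matching on a predetermined portion of the utterance would need a different comparison than the exact set membership used here.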

As another example, suppose that at a certain timing the veteran service worker's action is "pointing forward". Even if the learner's action is "stating the table number", the difference analysis unit 120 may determine that the difference in action is not to be presented to the learner. On the other hand, if the learner's action in this case is, for example, "asking for the order", the difference analysis unit 120 may determine that the difference in action is to be presented to the learner. The determination rule database 141 may also contain a data set indicating the learner actions that are acceptable for a given action of the veteran service worker.

The rules are not limited to the examples described above. For example, the determination rule database 141 may contain a threshold for the difference in timing between corresponding actions of the veteran service worker and the learner, and the difference analysis unit 120 may determine that a timing difference exceeding the threshold is to be presented to the learner.
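A timing rule of this kind could be sketched as below. The function name `timing_differences_to_present`, the action labels, and the 2-second threshold are hypothetical and serve only to make the thresholding concrete.

```python
def timing_differences_to_present(learner_times: dict, reference_times: dict,
                                  threshold_s: float) -> dict:
    """Return {action: gap in seconds} for gaps whose magnitude exceeds the threshold."""
    flagged = {}
    for action, reference_t in reference_times.items():
        if action not in learner_times:
            continue  # a missing action is a different kind of difference
        gap = learner_times[action] - reference_t
        if abs(gap) > threshold_s:
            flagged[action] = gap
    return flagged


reference_times = {"greet": 1.0, "take_order": 10.0}
learner_times = {"greet": 1.5, "take_order": 14.0}
flagged = timing_differences_to_present(learner_times, reference_times, threshold_s=2.0)
```

Here the greeting is half a second late and stays below the threshold, while the order is taken four seconds late and would be presented.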

The difference analysis unit 120 also generates information indicating which visual recognition target object the veteran service worker and the learner are viewing at each time, and information on the time for which each visual recognition target object was viewed within a predetermined time. Examples of these items of information are described later.

2. Processing Example of Presentation System
FIG. 10 is a flowchart showing the processing of the presentation system 800a. The processing of the presentation system 800a is described below with reference to FIG. 10.

In the processing based on the flowchart of FIG. 10, S100 to S116 are executed in the same manner as in Embodiment 1. After the processing in S116 is performed, the processing proceeds to S117.

In S117, the difference analysis unit 120 performs determination processing in which a determination rule read from the determination rule database 141 is applied to at least one of the first to third difference information derived by the difference information derivation unit 118. The details of the processing are as described in (Difference analysis unit). The difference analysis unit 120 also generates information indicating which visual recognition target object the veteran service worker and the learner are viewing at each time, and information on the time for which each visual recognition target object was viewed within a predetermined time. The difference analysis unit 120 transmits the determination result of the determination processing and these items of information to the presentation information generation unit 119a, and the processing then proceeds to S118'.

In S118', the presentation information generation unit 119a generates presentation information with reference to the determination result of the difference analysis unit 120. In other words, the presentation information generation unit 119a generates presentation information for presenting the difference information that the difference analysis unit 120 has determined should be presented to the learner. The presentation information generation unit 119a may also include in the presentation information the information on visual recognition target objects that is input from the difference analysis unit 120 and illustrated in FIG. 11. FIG. 11(A) shows which visual recognition target object the learner was viewing at each time. In FIG. 11, O1 to O5 each denote a visual recognition target object; for example, O1 corresponds to "the customer's eyes". FIG. 11(B) shows information on the time for which each visual recognition target object was viewed within a predetermined time, namely the proportion of time the learner spent viewing each visual recognition target object. The presentation information generation unit 119a may also generate the information illustrated in FIG. 11 for the visual recognition target objects viewed by the veteran service worker. The presentation information generation unit 119a may also perform the same processing as the presentation information generation unit 119 in S118 of Embodiment 1. The processing proceeds to S120'.
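The proportion-of-time view described for FIG. 11(B) can be computed from a per-time record of viewed objects like the one described for FIG. 11(A). The sketch below assumes that the gaze record is sampled at a fixed interval and stored as a list of object labels; that representation and the name `dwell_ratios` are assumptions, not details given in the source.

```python
from collections import Counter


def dwell_ratios(timeline: list) -> dict:
    """Return the fraction of samples spent on each visual recognition target object.

    `timeline` holds one object label (e.g. "O1") per fixed-interval sample.
    """
    if not timeline:
        return {}
    counts = Counter(timeline)
    total = len(timeline)
    return {target: n / total for target, n in counts.items()}


learner_timeline = ["O1", "O1", "O2", "O1"]
veteran_timeline = ["O1", "O1", "O1", "O1"]
learner_ratios = dwell_ratios(learner_timeline)
veteran_ratios = dwell_ratios(veteran_timeline)
```

Placing `learner_ratios` next to `veteran_ratios` gives the kind of side-by-side comparison that lets a learner see which object deserves more attention.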

In S120', the control unit 250 of the head mounted display 200 causes the display 270 or the like to display the presentation information generated by the presentation information generation unit 119a (presentation information display step). The processing then proceeds to S122 and ends. At this time, the control unit 250 of the head mounted display 200 may cause the speaker 240 or the like to output the presentation information generated by the presentation information generation unit 119a as audio (not shown). The presentation information displayed on the display 270 or the like need not all be shown on a single screen; for example, the displayed information may be switchable in response to a user instruction. In addition, when the presentation information generation unit 119a has, in S118', also generated the information illustrated in FIG. 11 for the visual recognition target objects viewed by the veteran service worker, the control unit 250 may cause the display 270 or the like to display the information illustrated in FIG. 11 for the veteran service worker and for the learner so that the two can be compared. This allows the learner to easily grasp, by comparison with the veteran service worker's data, which visual recognition target objects the learner should pay more attention to.

[Example of implementation by software]
The control blocks of the information processing apparatus 100 (100a) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the information processing apparatus 100 includes a computer that executes the instructions of a program, which is software that realizes each function. This computer includes, for example, at least one processor (control device) and at least one computer-readable recording medium storing the program. The object of the present invention is achieved when, in the computer, the processor reads the program from the recording medium and executes it. As the processor, for example, a CPU (Central Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" can be used, for example a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may be supplied to the computer via any transmission medium capable of transmitting the program (such as a communication network or a broadcast wave). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The information processing apparatus according to each aspect of the present invention may be realized by a computer. In that case, a control program for the information processing apparatus that realizes the information processing apparatus on the computer by causing the computer to operate as each unit (software element) of the information processing apparatus, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

100, 100a Information processing apparatus
110, 110a Control unit
111 Face information acquisition unit
112 State detection unit
113 Visual recognition target specifying unit
114 Voice information acquisition unit
115 Motion information acquisition unit
116 Decision-making information specifying unit
117 Learner information acquisition unit
118 Difference information derivation unit
119, 119a Presentation information generation unit
120 Difference analysis unit
130 Communication unit
140, 140a Storage unit
141 Determination rule database
200 Head mounted display
210 Camera
220 Microphone
230 Motion sensor
240 Speaker
250 Control unit
260 Communication unit
270 Display
300 Wearable interface
310 Motion sensor
320 Button
400 Information management server
500 Display unit

Claims

An information processing apparatus comprising:
a face information acquisition unit that acquires face information including information on at least part of the face of a work learner who learns customer service work;
a voice information acquisition unit that acquires voice information including information on speech uttered by the work learner;
a difference information derivation unit that derives first difference information indicating a difference between first reference information and the state of the at least part of the face indicated by the face information acquired by the face information acquisition unit, and second difference information indicating a difference between second reference information and the voice information acquired by the voice information acquisition unit; and
a presentation information generation unit that generates presentation information according to at least one of the first difference information and the second difference information.

The information processing apparatus according to claim 1, further comprising a visual recognition target specifying unit that specifies, with reference to the face information, a target visually recognized by the work learner,
wherein the difference information derivation unit derives the first difference information with reference to the visual recognition target specified by the visual recognition target specifying unit.

The information processing apparatus according to claim 2, wherein the visual recognition target specifying unit further specifies the degree of concentration of the work learner on the target visually recognized by the work learner, and
the difference information derivation unit derives the first difference information with reference to the degree of concentration specified by the visual recognition target specifying unit.

The information processing apparatus according to claim 2 or 3, wherein, when the target visually recognized by the work learner differs from the target indicated by the first reference information, the presentation information generation unit generates presentation information in which the target indicated by the first reference information is highlighted.

The information processing apparatus according to any one of claims 1 to 4, wherein the presentation information generation unit generates presentation information according to a difference in voice indicated by the second difference information.

The information processing apparatus according to any one of claims 1 to 5, wherein the presentation information generation unit calculates a score according to the first difference information and the second difference information and includes the calculated score in the presentation information.

The information processing apparatus according to any one of claims 1 to 6, further comprising a motion information acquisition unit that acquires motion information indicating the motion of at least part of the body of the work learner,
wherein the difference information derivation unit derives third difference information indicating a difference between the motion information and third reference information, and
the presentation information generated by the presentation information generation unit includes information according to the third difference information.

The information processing apparatus according to any one of claims 1 to 7, further comprising a difference analysis unit that performs determination processing in which a determination rule is applied to at least one of the first difference information, the second difference information, and the third difference information,
wherein the presentation information generation unit generates the presentation information with reference to a determination result of the difference analysis unit.

The information processing apparatus according to claim 8, wherein the difference analysis unit generates information indicating which visual recognition target object is being visually recognized at each time, and
the presentation information generation unit includes, in the presentation information, the information indicating which visual recognition target object is being visually recognized at each time.

The information processing apparatus according to claim 8 or 9, wherein the difference analysis unit generates information on the time for which each visual recognition target object was visually recognized within a predetermined time, and
the presentation information generation unit includes, in the presentation information, the information on the time for which each visual recognition target object was visually recognized within the predetermined time.

A presentation system comprising the information processing apparatus according to claim 1 and a head mounted display,
wherein the head mounted display comprises:
an imaging unit that images at least part of the face of the work learner;
a sound collecting unit that collects speech uttered by the work learner; and
a presentation unit that presents the presentation information.

A program for causing a computer to function as the information processing apparatus according to claim 1.