JP7111042B2

JP7111042B2 - Information processing device, presentation system, and information processing program

Info

Publication number: JP7111042B2
Application number: JP2019069945A
Authority: JP
Inventors: 一希笠井; 慎江上
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2019-01-17
Filing date: 2019-04-01
Publication date: 2022-08-02
Anticipated expiration: 2039-04-01
Also published as: WO2020148919A1; JP2020115202A

Description

The present invention relates to an information processing device, an information processing method, a presentation system, and an information processing program.

In recent years, systems have been developed to help workers learn their work. For example, Patent Literature 1 discloses a work training system that displays, based on image information, differences in line of sight or work motion between a model worker and a learning worker.

Patent Literature 1: Japanese Unexamined Patent Application Publication No. 2018-180090

In customer service work and the like, however, the above conventional technique, which uses only image information, cannot adequately support learning of the work.

An object of one aspect of the present invention is to realize an information processing device capable of supporting a work learner in learning customer service work.

To solve the above problem, an information processing device according to one aspect of the present invention includes: a face information acquisition unit that acquires face information including information on at least part of the face of a work learner who is learning customer service work; a voice information acquisition unit that acquires voice information including information on utterances by the work learner; a difference information derivation unit that derives first difference information indicating a difference between the state of at least part of the face indicated by the face information acquired by the face information acquisition unit and first reference information, and second difference information indicating a difference between the voice information acquired by the voice information acquisition unit and second reference information; and a presentation information generation unit that generates presentation information according to at least one of the first difference information and the second difference information.

According to one aspect of the present invention, an information processing device capable of supporting a work learner in learning customer service work can be realized.

A block diagram showing the components of a presentation system according to an embodiment of the present invention.
A diagram showing an example of an image displayed by the display of a head-mounted display according to an embodiment of the present invention.
A table showing an example of a learner's visual target, utterances, actions, and degree of concentration identified by the visual target identification unit or the decision-making information identification unit according to an embodiment of the present invention.
A table showing an example of data on a veteran customer service worker stored in advance in the storage unit according to an embodiment of the present invention.
A table showing an example of difference information data derived by the difference information derivation unit according to an embodiment of the present invention.
A flowchart showing the processing of the presentation system according to an embodiment of the present invention.
A diagram showing an example of presentation information generated by the presentation information generation unit according to an embodiment of the present invention.
A diagram showing an example of presentation information generated by the presentation information generation unit according to an embodiment of the present invention.
A block diagram showing the components of a presentation system according to an embodiment of the present invention.
A flowchart showing the processing of the presentation system according to an embodiment of the present invention.
A diagram showing an example of information about visual target objects.

An embodiment of the present invention is described in detail below. Where a configuration in a particular item (embodiment) below is the same as a configuration described in another item, its description may be omitted. For convenience of explanation, members having the same function as members shown in each item are given the same reference numerals, and their description is omitted as appropriate.

[Embodiment 1]
1. Presentation System
The presentation system 800 according to this embodiment provides learning support to a work learner who is learning customer service work. One form of such a learning support device is an experiential learning device using VR or the like. The presentation system 800 presents to the learner, as presentation information, the difference (also called "difference information") between information on the work learner (for example, a newly hired part-time worker; hereinafter sometimes simply called the learner) and pre-stored information on a model worker (for example, a veteran customer service worker), the latter also being called "reference information". This makes it possible to support self-directed, experiential learning by the work learner. The following description takes as an example a case in which the presentation system 800 is used by a learner who works at a restaurant or other establishment requiring customer service and who is learning customer service work.

The learner's information and the reference information include, for example, the target being viewed and the degree of concentration on that target, the content of utterances and the tone of voice, and actions.

FIG. 1 is a block diagram showing the components of a presentation system according to an embodiment of the present invention. As shown in FIG. 1, the presentation system 800 according to this embodiment includes an information processing device 100, a head-mounted display 200, a wearable interface 300, an information management server 400, and a display unit 500. With this configuration, the presentation system 800 can provide learning support to the learner.

<Head-mounted display>
As shown in FIG. 1, the head-mounted display 200 includes a camera 210 (also called an "imaging unit") that captures at least part of the learner's face, a microphone 220 (also called a "sound collection unit") that picks up the learner's speech, a speaker 240 and a display 270 (either of which may be called a "presentation unit"), a motion sensor 230, a control unit 250, and a communication unit 260.

The control unit 250 of the head-mounted display 200 performs overall control of each unit of the head-mounted display 200. Each function of the control unit 250, and all functions included in the head-mounted display 200, may be realized, for example, by a CPU executing a program stored in a storage unit (not shown) or the like.

The communication unit 260 communicates with the information processing device 100 under the control of the control unit 250. The form of communication is not particularly limited; it may be communication by Bluetooth (registered trademark), WiFi (registered trademark), or another short-range wireless communication technology, or it may be some other form of communication.

The camera 210 acquires an image of the learner's face as face information. The camera 210 may acquire an image of at least part of the learner's body at the same time as the face image. The microphone 220 acquires the learner's voice as voice information. The motion sensor 230 is, for example, an acceleration sensor, and acquires the learner's movement as motion information.

The speaker 240 outputs sound input to the head-mounted display 200 via the communication unit 260. The display 270 outputs images input to the head-mounted display 200 via the communication unit 260. The image output by the display 270 may show a virtual space; the image shown in FIG. 2 is one example of what the display 270 shows. FIG. 2 is a diagram showing an example of an image displayed by the display of the head-mounted display according to an embodiment of the present invention. The image 200A shown in FIG. 2 shows the inside of a restaurant as seen from the viewpoint of the learner, who is a member of staff; it shows customers, the learner's own hands, other staff, tables, and so on.

FIG. 3 is a table showing an example of the learner's visual target, utterances (utterance content and tone of voice), actions, and degree of concentration (blinking, pupil state, gaze dwell time, and facial expression) identified by the visual target identification unit or the decision-making information identification unit.

<Wearable interface>
As shown in FIG. 1, the presentation system 800 includes the wearable interface 300. In one example, the wearable interface 300 includes a motion sensor 310 and a button 320, and inputs operations performed by the learner to the information processing device 100. Examples of such operations include raising or lowering the motion sensor 310 and pressing the button 320.

In one example, the presentation system 800 can include more detailed information about the learner's hand movements and the like in the learner's motion information acquired by the motion information acquisition unit 115 of the information processing device 100. Examples of the learner's hand movements in this embodiment include picking up a cup of water or placing it on a table, and taking a meal order.

Information such as the learner's hand movements input via the wearable interface 300 can also be reflected, as the learner's hand movements and the like, in the image output by the display 270 of the head-mounted display 200.

Known sensors and buttons can be used as the motion sensor 310 and the button 320; the motion sensor 310 is, for example, an acceleration sensor.

<Information processing device>
As shown in FIG. 1, the information processing device 100 in this embodiment includes a control unit 110, a storage unit 140, and a communication unit 130.

The control unit 110 of the information processing device 100 in this embodiment includes: a face information acquisition unit 111 that acquires face information including information on at least part of the work learner's face; a voice information acquisition unit 114 that acquires voice information including information on utterances by the work learner; a difference information derivation unit 118 that derives difference information (also called "first difference information") indicating the difference between the state of at least part of the face indicated by the face information acquired by the face information acquisition unit 111 and information stored in the storage unit 140 (also called "first reference information"), and difference information (also called "second difference information") indicating the difference between the voice information acquired by the voice information acquisition unit 114 and information stored in the storage unit 140 (also called "second reference information"); and a presentation information generation unit 119 that generates presentation information according to the difference information.
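The flow from acquired information to presentation information can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: the `Observation` fields, the dictionary layout of the difference information, and the message wording are all hypothetical, and a simple inequality test stands in for whatever comparison the difference information derivation unit 118 actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """State at one time point. Field names are illustrative assumptions."""
    visual_target: str  # stands in for the state of part of the face
    utterance: str      # stands in for the voice information


def derive_differences(learner: Observation, reference: Observation):
    """Return (first_difference, second_difference); None means no difference."""
    first: Optional[dict] = None
    if learner.visual_target != reference.visual_target:
        first = {"learner": learner.visual_target,
                 "reference": reference.visual_target}
    second: Optional[dict] = None
    if learner.utterance != reference.utterance:
        second = {"learner": learner.utterance,
                  "reference": reference.utterance}
    return first, second


def generate_presentation(first, second):
    """Build presentation messages from whichever differences exist."""
    messages = []
    if first is not None:
        messages.append(f"Visual target: you looked at {first['learner']}; "
                        f"the reference is {first['reference']}.")
    if second is not None:
        messages.append(f"Utterance: you said {second['learner']!r}; "
                        f"the reference is {second['reference']!r}.")
    return messages
```

Returning `None` for a missing difference lets the presentation step respond to either difference alone or to both, which mirrors the "at least one of" wording used for the presentation information.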

As an example, the information processing device 100 is implemented in a terminal device (for example, a smartphone, tablet, personal computer, or television receiver) that can connect to a local network or a global network.

[Control unit]
As shown in FIG. 1, the control unit 110 of the information processing device 100 in this embodiment includes the face information acquisition unit 111, a state detection unit 112, a visual target identification unit 113, the voice information acquisition unit 114, a motion information acquisition unit 115, a decision-making information identification unit 116, a learner information acquisition unit 117, the difference information derivation unit 118, and the presentation information generation unit 119. The control unit 110 performs overall control of each unit of the information processing device 100. Each function of the control unit 110, and all functions included in the information processing device 100, may be realized, for example, by a CPU executing a program stored in a storage unit (not shown) or the like.

(Face information acquisition unit)
The face information acquisition unit 111 acquires, at each time point, face information including information on at least part of the learner's face. It can acquire the learner's face information, for example, from images obtained from the camera 210 of the head-mounted display 200. The learner's face information is information indicating facial feature quantities. Facial feature quantities are, for example, position information indicating the position, shape information indicating the shape, and size information indicating the size of each part of the face (for example, the eyes, nose, mouth, and eyebrows).

Eye information is particularly useful because the target the learner is viewing can be identified from it. Eye information includes, for example, the end points of the inner and outer corners of the eye and the edges of the iris and pupil. The face information acquisition unit 111 may also apply correction processing such as noise reduction and edge enhancement, as appropriate, to the images acquired from the imaging unit. The face information acquisition unit 111 sends the extracted face information to the state detection unit 112.

(State detection unit)
The state detection unit 112 detects the learner's state from the learner's face information acquired by the face information acquisition unit 111. For example, the state detection unit 112 detects the object the learner is viewing at each time point ("visual target" in FIG. 3), the number of blinks per minute at each time point ("blinking" in FIG. 3), whether the pupil is open or closed at each time point ("pupil state" in FIG. 3), the time spent viewing the visual target ("gaze dwell time" in FIG. 3), or the facial expression ("facial expression" in FIG. 3).

The state detection unit 112 detects the learner's state based on the face information extracted by the face information acquisition unit 111. After detecting the learner's state, it sends the detection result to the visual target identification unit 113.

As an example, the state detection unit 112 refers to the facial feature quantities acquired by the face information acquisition unit 111, namely the position information indicating the position, the shape information indicating the shape, and the size information indicating the size of each part of the face (for example, the eyes, nose, mouth, cheeks, and eyebrows), and detects, as the learner's state, at least one of, for example, the learner's visual target, pupil state, number of blinks, eyebrow movement, cheek movement, eyelid movement, lip movement, and jaw movement.

By including the state detection unit in this way, the information processing device can suitably detect the learner's state.

The method of detecting the visual target is not particularly limited. One example is to provide the information processing device 100 with a point light source (not shown) and to detect the learner's visual target by having the imaging unit capture, for a predetermined time, the corneal reflection of the light from the point light source. The type of point light source is not particularly limited, and may be visible light or infrared light; using an infrared LED, for example, allows the visual target to be detected without causing the learner discomfort.

Besides the visual target, the learner's state can include the degree of concentration. In general, when concentrating, a person blinks infrequently and at stable intervals and the pupils tend to dilate, so the learner's degree of concentration can be evaluated by detecting the number of blinks and the state (size) of the pupil. For example, if blinks are detected for a predetermined time and occur at stable intervals within that time, the learner is likely to be gazing at some target. Likewise, if pupil size is detected for a predetermined time and the pupil is enlarged for a large part of that time, the learner is likely to be gazing at some target.
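As a rough illustration of this evaluation, the sketch below checks whether blinks within a time window are infrequent and evenly spaced, and what fraction of pupil-size samples are at or above a threshold. All thresholds (`max_rate_per_min`, `max_cv`, `min_dilated_fraction`) are hypothetical values chosen for illustration, and the coefficient of variation of the blink intervals is only one possible measure of "stable intervals"; the text does not specify either.

```python
from statistics import mean, pstdev


def blinks_are_infrequent_and_regular(blink_times_s, window_s,
                                      max_rate_per_min=20.0, max_cv=0.5):
    """True when blinks in the window are infrequent and evenly spaced.

    blink_times_s: increasing blink timestamps in seconds.
    max_rate_per_min and max_cv are illustrative thresholds (assumptions).
    """
    intervals = [b - a for a, b in zip(blink_times_s, blink_times_s[1:])]
    if len(intervals) < 2 or mean(intervals) <= 0:
        return False  # too few blinks to judge regularity
    rate_per_min = len(blink_times_s) * 60.0 / window_s
    cv = pstdev(intervals) / mean(intervals)  # spread relative to the mean
    return rate_per_min <= max_rate_per_min and cv <= max_cv


def dilated_fraction(pupil_sizes, size_threshold):
    """Fraction of samples in which the pupil is at or above the threshold."""
    if not pupil_sizes:
        return 0.0
    dilated = sum(1 for size in pupil_sizes if size >= size_threshold)
    return dilated / len(pupil_sizes)


def likely_gazing(blink_times_s, window_s, pupil_sizes, size_threshold,
                  min_dilated_fraction=0.6):
    """Combine both cues; requiring both is a design choice of this sketch."""
    return (blinks_are_infrequent_and_regular(blink_times_s, window_s)
            and dilated_fraction(pupil_sizes, size_threshold)
            >= min_dilated_fraction)
```

Requiring both cues is stricter than the text, which treats each cue on its own as a sign that the learner is probably gazing at a target; a caller could equally use either function alone.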

The method of detecting the number of blinks is not particularly limited. One example is to irradiate the learner's eye with infrared light and detect the difference in the amount of reflected infrared light between when the eye is open and when it is closed.
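One way to turn such a reflectance signal into a blink count is sketched below. It assumes that the reflected infrared amount has already been sampled at a fixed rate and that an open-eye baseline is known; `open_baseline` and `min_delta` are hypothetical parameters. The sketch counts a blink each time the signal moves away from the baseline by at least `min_delta`, without assuming whether a closed eye reflects more or less light.

```python
def count_blinks(reflectance, open_baseline, min_delta):
    """Count eye closures in a sequence of reflected-infrared samples.

    A sample is treated as "eye closed" when it differs from the open-eye
    baseline by at least min_delta (both values are assumptions here).
    """
    blinks = 0
    eye_closed = False
    for value in reflectance:
        now_closed = abs(value - open_baseline) >= min_delta
        if now_closed and not eye_closed:
            blinks += 1  # a new closure has started
        eye_closed = now_closed
    return blinks


def blinks_per_minute(reflectance, sample_rate_hz, open_baseline, min_delta):
    """Convert the blink count into a per-minute rate."""
    if not reflectance:
        raise ValueError("reflectance must contain at least one sample")
    duration_min = len(reflectance) / sample_rate_hz / 60.0
    return count_blinks(reflectance, open_baseline, min_delta) / duration_min
```

Counting only the transition into the closed state means that a closure spanning several consecutive samples is counted once.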

The method of detecting the pupil state is not particularly limited. One example is to detect the circular pupil in an eye image using the Hough transform. A threshold may be set for pupil size, with the pupil evaluated as "open" when its size is at or above the threshold and "closed" when its size is below the threshold.

As a further evaluation of the degree of concentration, it is also possible to detect that the visual target has not changed for a predetermined time or longer. For example, the techniques described above can be used to detect the learner's visual target and measure the time the learner spends viewing an object ("gaze dwell time" in FIG. 3).
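Measuring the gaze dwell time can be sketched as follows, assuming the visual target has already been identified for each timestamped sample. The sample format, the function names, and the `min_duration_s` parameter are assumptions made for illustration.

```python
def dwell_times(samples):
    """Group consecutive samples with the same target into runs.

    samples: list of (timestamp_s, target) pairs sorted by time.
    Returns a list of (target, start_s, duration_s) tuples.
    """
    runs = []
    if not samples:
        return runs
    start_t, current = samples[0]
    last_t = start_t
    for t, target in samples[1:]:
        if target != current:
            # The previous run ends when a different target is first seen.
            runs.append((current, start_t, t - start_t))
            start_t, current = t, target
        last_t = t
    runs.append((current, start_t, last_t - start_t))
    return runs


def sustained_targets(samples, min_duration_s):
    """Targets that were viewed without change for at least min_duration_s."""
    return [target for target, _, duration in dwell_times(samples)
            if duration >= min_duration_s]
```

For example, with samples taken every 0.5 s, a learner who looks at the customer's face for three samples and then at the table yields a 1.5 s run for the face followed by a shorter run for the table.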

The state detection unit 112 need only detect at least one of the learner's visual target, pupil state, number of blinks, eyebrow movement, eyelid movement, cheek movement, nose movement, lip movement, and jaw movement, but combining them is preferable. By combining detection methods in this way, the state detection unit 112 can suitably evaluate the learner's degree of concentration while the learner is viewing a given object.

States other than those of the eyes include the states of other parts of the face: eyebrow movements such as raising the inner or outer part of the eyebrow; eyelid movements such as raising the upper eyelid or tensing the eyelids; nose movements such as wrinkling the nose; lip movements such as raising the upper lip or pursing the lips; cheek movements such as raising the cheeks; and jaw movements such as lowering the jaw. The states of several parts of the face may be combined as the learner's state.

The state detection unit 112 may calculate the learner's degree of concentration by judging the learner's mood from the learner's facial expression. The learner's mood can be judged, for example, by referring to the facial feature quantities as described above, detecting as the learner's state at least one of, for example, the learner's visual target, pupil state, number of blinks, eyebrow movement, cheek movement, eyelid movement, lip movement, and jaw movement, and basing the judgment on the detected state of each part of the learner's face.

As for judging the learner's degree of concentration from facial expression: for example, when the eyebrows move upward, the learner can be judged to be concentrating because he or she is looking more intently at the visual target. Similarly, when the cheeks move upward while the learner is viewing a person's face, the learner can be judged to be concentrating on the visual target, on the basis that he or she is forming an expression for the other person.

When detecting the learner's facial expression, the state detection unit 112 can also calculate information about the expression by machine learning. The specific configuration of the learning process for acquiring information about facial expressions does not limit this embodiment; for example, any one of the following machine learning techniques, or a combination of them, can be used.

・Support vector machine (SVM)
・Clustering
・Inductive logic programming (ILP)
・Genetic programming (GP)
・Bayesian network (BN)
・Neural network (NN)

When a neural network is used, the data should be processed in advance for input to the neural network. In addition to arranging the data into a one-dimensional or multi-dimensional array, such processing can use techniques such as data augmentation.
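The two kinds of preprocessing mentioned here can be illustrated with a very small sketch, assuming the input is a grayscale image held as a list of pixel rows. Horizontal flipping is used only as one common example of augmentation; the text does not say which augmentation is intended, and the function names are hypothetical.

```python
def flatten(image):
    """Arrange a 2-D image (list of rows) into a one-dimensional list."""
    return [pixel for row in image for pixel in row]


def horizontal_flip(image):
    """Mirror each row; one simple, commonly used augmentation."""
    return [list(reversed(row)) for row in image]


def augment(image):
    """Return the original image together with its mirrored copy."""
    return [image, horizontal_flip(image)]
```

Each augmented image can then be flattened, or kept as a 2-D array, depending on the input layout the network expects.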

When a neural network is used, it may be a convolutional neural network (CNN) that includes convolution processing. More specifically, one or more of the layers in the neural network may be convolution layers that perform a convolution operation, applying a filter operation (product-sum operation) to the input data fed to the layer. The filter operation may be combined with processing such as padding, and may use an appropriately chosen stride width.
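The filter operation with padding and stride can be written out in plain Python as below. This is a minimal single-channel sketch for illustration only: it uses zero padding and applies the kernel without flipping it, as CNN layers commonly do, and the function name and argument layout are assumptions rather than anything specified in the text.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel filter operation (product-sum) with zero padding.

    image and kernel are lists of rows; the kernel is applied unflipped.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Surround the image with `padding` rows and columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    output = []
    for oi in range(out_h):
        row = []
        for oj in range(out_w):
            acc = 0.0
            for ki in range(kh):
                for kj in range(kw):
                    # Product-sum over the window selected by the stride.
                    acc += padded[oi * stride + ki][oj * stride + kj] * kernel[ki][kj]
            row.append(acc)
        output.append(row)
    return output
```

Padding enlarges the input so that border pixels are covered by the filter, while a larger stride reduces the output size by moving the filter window in bigger steps.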

The neural network may also be a multilayer or very deep neural network with tens to thousands of layers.

(Visual target identification unit)
The visual target identification unit 113 identifies the target the learner is viewing by referring to the face information acquired by the face information acquisition unit 111 or the detection result obtained from the state detection unit 112. The target being viewed (the visual target) may be a person or an object.

The visual target identification unit 113 also identifies the work learner's degree of concentration based on the detection result obtained from the state detection unit 112.

By including the visual target identification unit 113 in this way, the information processing device 100 can accurately determine the target the learner is viewing and the degree of concentration.

The specific processing of the visual target identification unit 113 is as follows. The visual target identification unit 113 identifies the visual target by determining, from the detection result obtained from the state detection unit 112, the position coordinates of the point at which the learner's line of sight is directed. For example, as shown in FIG. 3, it identifies that these position coordinates lie within the coordinates of the customer's feet. In addition to the line-of-sight information, referring to the detection results for pupil state, number of blinks, and eyebrow, eyelid, cheek, nose, lip, and jaw movements, as described in the (State detection unit) section, makes it possible to identify more suitably which object the learner is concentrating on.
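The coordinate check described above can be sketched as a lookup of the gaze point against rectangular regions. This assumes each object in the displayed scene has a known axis-aligned bounding box in the same coordinate system as the gaze point; the `Region` type, the region names, and the first-match rule for overlapping regions are all assumptions of this sketch.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    """Axis-aligned bounding box of one object (hypothetical layout)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def identify_target(gaze_x: float, gaze_y: float,
                    regions: List[Region]) -> Optional[str]:
    """Return the first region containing the gaze point, or None.

    Regions are assumed to be listed in priority order, so an object
    drawn in front of another should appear earlier in the list.
    """
    for region in regions:
        if (region.x_min <= gaze_x <= region.x_max
                and region.y_min <= gaze_y <= region.y_max):
            return region.name
    return None
```

A usage example: with a region for the customer's feet spanning x from 100 to 200 and y from 400 to 480, a gaze point at (150, 450) is identified as the customer's feet, and a point outside every region yields `None`.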

The visual target identification unit 113 may identify which object the learner is concentrating on using the machine learning techniques described in the (State detection unit) section.

The visual target identification unit 113 sends the identification result to the difference information derivation unit 118.

(Voice information acquisition unit)
The voice information acquisition unit 114 acquires voice information including information on the learner's utterances at each time point. It can acquire the learner's voice information, for example, from the microphone 220 of the head-mounted display 200. As the learner's voice information, it acquires, for example, the frequency and loudness of the voice. Known speech recognition techniques can be used to identify the voice information.

音声情報取得部１１４が取得した音声情報は、意思決定情報特定部１１６に入力される。 The voice information acquired by the voice information acquisition unit 114 is input to the decision making information identification unit 116 .

（動き情報取得部）
動き情報取得部１１５は、各時点における作業学習者の体の少なくとも一部の動きを示す動き情報を取得する。動き情報取得部１１５は、例えば、図１に示すように、ヘッドマウントディスプレイ２００動きセンサ２３０、又は装着型インターフェース３００の動きセンサ３１０又はボタン３２０から取得する。また、ヘッドマウントディスプレイ２００のカメラ２１０が学習の身体を撮影した撮影画像から取得する。 (Motion information acquisition unit)
The motion information acquisition unit 115 acquires motion information indicating the motion of at least part of the work learner's body at each point in time. As shown in FIG. 1, the motion information acquisition unit 115 acquires the motion information from, for example, the motion sensor 230 of the head mounted display 200, or the motion sensor 310 or button 320 of the wearable interface 300. The motion information may also be acquired from an image of the learner's body captured by the camera 210 of the head mounted display 200.

動き情報取得部１１５が取得した動き情報は、意思決定情報特定部１１６に入力される。 The motion information acquired by the motion information acquiring unit 115 is input to the decision making information specifying unit 116 .

（意思決定情報特定部）
意思決定情報特定部１１６は、音声情報取得部１１４が取得した音声情報及び動き情報取得部１１５が取得した動き情報をもとに、各時点において学習者の意思を示す意思決定情報を特定する。本実施形態において、意思決定情報には、一例として、学習者の発話内容、声のトーン及び行動の内容が含まれる。 (Decision-making information identification unit)
The decision-making information specifying unit 116 specifies decision-making information indicating the intention of the learner at each time based on the voice information acquired by the voice information acquisition unit 114 and the motion information acquired by the motion information acquisition unit 115 . In the present embodiment, the decision-making information includes, for example, the learner's utterance content, voice tone, and action content.

一例として、意思決定情報特定部１１６は、学習者が発した発話内容及び声のトーンを特定する。例えば、図３に示すように、意思決定情報特定部１１６は、発話内容を、「いらっしゃいませ」又は「ご注文は何に致しますか」のように発話内容を示すテキストとして特定し、声のトーンを、一例として高中低のいずれかとして特定する。 As an example, the decision-making information identifying unit 116 identifies the content and the tone of voice of the learner's utterances. For example, as shown in FIG. 3, the decision-making information identifying unit 116 identifies the utterance content as text, such as "Welcome" or "What would you like to order?", and identifies the tone of voice as, for example, high, medium, or low.
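One way to picture the three-level tone classification is a simple threshold on a measured voice frequency. The sketch below is an assumption for illustration only: the specification does not state how the tone is determined, and the function name and threshold values are hypothetical.

```python
def classify_tone(pitch_hz: float,
                  low_max: float = 150.0,
                  high_min: float = 250.0) -> str:
    """Map a voice frequency in Hz to "low", "medium", or "high".

    The two thresholds are placeholder values, not taken from the specification.
    """
    if pitch_hz < low_max:
        return "low"
    if pitch_hz >= high_min:
        return "high"
    return "medium"


print(classify_tone(200.0))
```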

また、意思決定情報特定部１１６は、動き情報取得部１１５が取得した動き情報を参照して、各時点において学習者が行った動作を特定する。例えば、意思決定情報特定部１１６は、図３に示すように、動作を「おじぎ」又は「水を置く」等のように特定する。 Also, the decision-making information specifying unit 116 refers to the motion information acquired by the motion information acquiring unit 115 to specify the motion performed by the learner at each time point. For example, as shown in FIG. 3, the decision-making information specifying unit 116 specifies an action such as "bow" or "put water on".

なお、意思決定情報特定部１１６は、（状態検出部）に記載の機械学習的手法を用いて意思決定情報を特定してもよい。 Note that the decision-making information specifying unit 116 may specify the decision-making information using the machine learning method described in (state detection unit).

意思決定情報特定部１１６が特定した意思決定情報は、差異情報導出部１１８に入力される。 The decision making information specified by the decision making information specifying unit 116 is input to the difference information deriving unit 118 .

（差異情報導出部）
差異情報導出部１１８は、視認対象特定部１１３から取得した視認対象、意思決定情報特定部１１６から取得した意思決定情報、及び集中度を参照して、記憶部１４０に記憶されている情報との差異を差異情報として導出する。差異情報の導出の一例を図３～５を用いて説明する。 (Difference information derivation unit)
The difference information derivation unit 118 refers to the visual recognition target acquired from the visual recognition target identification unit 113, the decision-making information acquired from the decision-making information identification unit 116, and the degree of concentration, and derives the differences between these and the information stored in the storage unit 140 as difference information. An example of the derivation of difference information will be described with reference to FIGS. 3 to 5.

図３は、視認対象特定部１１３及び意思決定情報特定部１１６が特定した学習者のデータである。図３は、学習者のデータの一例として、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度を示している。 FIG. 3 shows learner data specified by the visual recognition target specifying unit 113 and the decision-making information specifying unit 116 . FIG. 3 shows, as an example of the learner's data, the timing at which an object is visually recognized, the object to be visually recognized, the utterance content, the tone of voice, the action, and the degree of concentration.

図４は、記憶部１４０に予め記憶されているベテラン接客者のデータの一例を示す表である。図４は、ベテラン接客者のデータの一例として、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度を示している。図４における「第１の参照情報」、「第２の参照情報」及び「第３の参照情報」は、それぞれ、ベテラン接客者の視認対象、発話及び動作に対応する。 FIG. 4 is a table showing an example of data of a veteran customer service person stored in advance in the storage unit 140. As an example of that data, FIG. 4 shows the timing at which an object was viewed, the visual recognition target, the utterance content, the tone of voice, the action, and the degree of concentration. The "first reference information", "second reference information", and "third reference information" in FIG. 4 correspond to the veteran customer service person's visual recognition target, utterance, and action, respectively.

情報処理装置１００は、一例において、学習者又はベテラン接客者がオブジェクトを視認したタイミングから所定時間内に行われる一連の意思決定（発話及び動作等）を一つのまとまり（図３～５における「イベント」）として扱う。図３～５に示すように、一例において、イベントには、タイミング、視認対象、発話、動作、及び集中度が含まれ、それぞれのイベントはＩＤによって識別可能である。図３～５に示すように、学習者におけるイベントと、ベテラン接客者におけるイベントとは、一例としてイベントＩＤを用いて対応させることができる。本明細書においては、例えば、図３に示すように、学習者のデータにおけるイベントＩＤ１におけるデータは、00;00;03のタイミングにおいて視認した視認対象及びオブジェクトを視認してから一定時間内に行われる意思決定を含む。 In one example, the information processing apparatus 100 treats a series of decisions (utterances, actions, and the like) made within a predetermined time from the timing at which the learner or the veteran customer service person viewed an object as a single unit (an "event" in FIGS. 3 to 5). As shown in FIGS. 3 to 5, in one example, an event includes a timing, a visual recognition target, an utterance, an action, and a degree of concentration, and each event is identifiable by an ID. As shown in FIGS. 3 to 5, an event for the learner and an event for the veteran customer service person can be associated with each other using, for example, the event ID. In this specification, for example, as shown in FIG. 3, the data for event ID 1 in the learner's data includes the visual recognition target viewed at timing 00;00;03 and the decisions made within a certain time after the object was viewed.
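The grouping of decisions into events can be sketched as follows. This is a minimal illustration under stated assumptions: times are plain seconds, the window length of 5 seconds is a placeholder for the unspecified "predetermined time", and the `Event` class and `group_events` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    event_id: int
    timing: float                     # time (s) at which the object was viewed
    target: str                       # visual recognition target
    decisions: List[Tuple[str, str]] = field(default_factory=list)


def group_events(viewings, decisions, window: float = 5.0) -> List[Event]:
    """Attach each decision to every viewing it follows within `window` seconds.

    viewings:  iterable of (time_s, target)
    decisions: iterable of (time_s, kind, content), kind being e.g. "utterance"
    """
    events = []
    for event_id, (t_view, target) in enumerate(sorted(viewings), start=1):
        event = Event(event_id, t_view, target)
        for t_dec, kind, content in decisions:
            if t_view <= t_dec <= t_view + window:
                event.decisions.append((kind, content))
        events.append(event)
    return events


viewings = [(3.0, "customer's feet"), (40.0, "customer's table")]
decisions = [
    (3.5, "utterance", "Welcome"),
    (4.0, "action", "bow"),
    (41.0, "utterance", "What would you like to order?"),
]
events = group_events(viewings, decisions)
print(events[0])
```

If two viewings fall within one window of each other, this sketch assigns a decision to both events; how such overlaps should be resolved is not stated in the text.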

また、図３及び４における「タイミング」のデータは、一例として、オブジェクトを視認した時点を示すデータであるが、発話のタイミング又は動作のタイミング、ならびにオブジェクトを視認した時点と意思決定（発話及び動作）が終了した時点との間の中間の時点を等であってもよく、また、これらを複数組み合わせたものをデータとしてもよい（図示せず）。さらに、一例において、差異情報導出部１１８は、視認対象特定部１１３と意思決定情報特定部１１６とからそれぞれ取得したタイミングの情報を参照することによって、視認から意思決定（発話又は動作）までの経過時間を算出し、データとして含めてもよい（図示せず）。 The "timing" data in FIGS. 3 and 4 is, as an example, data indicating the point in time at which the object was viewed. It may instead be the timing of the utterance, the timing of the action, or an intermediate point between the time the object was viewed and the time the decision (utterance and action) was completed, and a combination of two or more of these may also be used as the data (not shown). Furthermore, in one example, the difference information derivation unit 118 may refer to the timing information acquired from the visual recognition target identification unit 113 and from the decision-making information identification unit 116 to calculate the elapsed time from viewing to the decision (utterance or action), and may include that elapsed time in the data (not shown).

差異情報導出部１１８は、図３及び４にそれぞれ示した学習者のデータ及びベテラン接客者のデータに基づいて、イベントＩＤごとに差異情報を導出する。図５は、差異情報導出部１１８が導出した差異情報のデータの一例を示す表である。図５において、差異情報は、それぞれのイベントＩＤごとに導出された情報であり、対応するイベントＩＤにおける、オブジェクトを視認したタイミング、視認対象、発話内容、声のトーン、動作及び集中度の差異を示す。 The difference information derivation unit 118 derives difference information for each event ID based on the learner's data and the veteran customer service person's data shown in FIGS. 3 and 4, respectively. FIG. 5 is a table showing an example of the difference information derived by the difference information derivation unit 118. In FIG. 5, the difference information is derived for each event ID and indicates, for the corresponding event ID, the differences in the timing at which the object was viewed, the visual recognition target, the utterance content, the tone of voice, the action, and the degree of concentration.

図５における「第１の差異情報」、「第２の差異情報」及び「第３の差異情報」は、それぞれ、学習者とベテランとにおける視認対象、発話及び動作の差異に対応する。 “First difference information”, “second difference information”, and “third difference information” in FIG. 5 respectively correspond to the differences in visual recognition target, utterance, and action between the learner and the veteran.

差異情報は、図３に示す学習者のデータと図４に示すベテラン接客者のデータとの「ずれ」の情報（表中の「＋／－」又は「＋／－」を用いた（学習者－ベテラン接客者）の値によって示される定量的な情報）、又はベテラン接客者のデータに一致するかどうかの情報（表中の「○／×」によって示される定性的な情報）である。一例において、図５に示したように、学習者のデータとベテラン接客者とのデータが一致しない場合は、差異情報導出部１１８が導出する差異情報は、ベテラン接客者のデータ（表中の「正解」）を含むものであってもよい。 The difference information is either information on the "deviation" between the learner's data shown in FIG. 3 and the veteran customer service person's data shown in FIG. 4 (quantitative information indicated in the table by "+/-" or by a signed value of (learner minus veteran customer service person)), or information on whether the learner's data matches the veteran customer service person's data (qualitative information indicated in the table by "○/×"). In one example, as shown in FIG. 5, when the learner's data and the veteran customer service person's data do not match, the difference information derived by the difference information derivation unit 118 may include the veteran customer service person's data (the "correct answer" in the table).

一例において、学習者のデータを示す図３中のイベントＩＤ１によれば、タイミング「00；00；03」において、視認対象特定部１１３は、学習者の視認対象が「客の足元」であることを特定している。また、図４中のイベントＩＤ１に示す、記憶部１４０に予め記憶されている情報であるベテラン接客者の情報によれば、タイミング「00；00；03」におけるベテラン接客者の視認対象は「客の目」である。差異情報導出部１１８は、両データに基づいて、学習者の視認対象はベテラン接客者と一致しないと判断し、図５のイベントＩＤ１に示すように、視認対象の項目に関する差異情報として「×」を導出する。さらに、差異情報導出部１１８は、ベテラン接客者の視認対象である「客の目」を正解として導出する。 In one example, according to event ID 1 in FIG. 3, which shows the learner's data, the visual recognition target identification unit 113 identifies that the learner's visual recognition target at timing "00;00;03" is "the customer's feet". According to the veteran customer service person's information stored in advance in the storage unit 140 and shown as event ID 1 in FIG. 4, the veteran customer service person's visual recognition target at timing "00;00;03" is "the customer's eyes". Based on both sets of data, the difference information derivation unit 118 determines that the learner's visual recognition target does not match that of the veteran customer service person and, as shown for event ID 1 in FIG. 5, derives "×" as the difference information for the visual recognition target item. The difference information derivation unit 118 further derives "the customer's eyes", the veteran customer service person's visual recognition target, as the correct answer.

また、図３のイベントＩＤ１に示すように、タイミング「00；00；03」において、意思決定情報特定部１１６は、学習者が「いらっしゃいませ」と発話し、「おじぎ」の動作を行っていることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、ベテラン接客者は、タイミング「00；00；03」において「いらっしゃいませ」と発話し、「おじぎ」の動作を行っている。差異情報導出部１１８は、両データに基づいて、学習者の発話内容及び動作はベテラン接客者と一致すると判断し、図５のイベントＩＤ１に示すように、発話内容の項目と動作の項目に関する差異情報として「○」を導出する。 Further, as shown for event ID 1 in FIG. 3, the decision-making information specifying unit 116 identifies that, at timing "00;00;03", the learner utters "Welcome" and performs a "bow" action. According to the information stored in advance in the storage unit 140, as shown for event ID 1 in FIG. 4, the veteran customer service person also utters "Welcome" and performs a "bow" action at timing "00;00;03". Based on both sets of data, the difference information derivation unit 118 determines that the learner's utterance content and action match those of the veteran customer service person and, as shown for event ID 1 in FIG. 5, derives "○" as the difference information for the utterance content item and the action item.

そして、図３のイベントＩＤ１に示すように、タイミング「00；00；03」において、例えば、意思決定情報特定部１１６は、学習者の声のトーンが「中」であることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；03」におけるベテラン接客者の声のトーンは「高」である。差異情報導出部１１８は、両データに基づいて、学習者の声のトーンはベテラン接客者とより低いと判断し、図５のイベントＩＤ１に示すように、声のトーンの項目に関する差異情報として「－」を導出する。 Then, as shown for event ID 1 in FIG. 3, the decision-making information identifying unit 116 identifies, for example, that the learner's tone of voice at timing "00;00;03" is "medium". According to the information stored in advance in the storage unit 140, as shown for event ID 1 in FIG. 4, the veteran customer service person's tone of voice at timing "00;00;03" is "high". Based on both sets of data, the difference information derivation unit 118 determines that the learner's tone of voice is lower than that of the veteran customer service person and, as shown for event ID 1 in FIG. 5, derives "-" as the difference information for the tone of voice item.

さらに、タイミング「00；00；03」において、視認対象特定部１１３は、図３のイベントＩＤ１に示すように、学習者の瞬き（回／分）が「５」であることを特定している。また、図４のイベントＩＤ１に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；03」におけるベテラン接客者の瞬きは「１０」である。差異情報導出部１１８は、両データに基づいて、学習者の瞬きはベテラン接客者より５少ないと判断し、図５のイベントＩＤ１に示すように、瞬きの項目に関する差異情報として「－５」を導出する。 Furthermore, as shown for event ID 1 in FIG. 3, the visual recognition target specifying unit 113 identifies that the learner's blink rate (blinks per minute) at timing "00;00;03" is "5". According to the information stored in advance in the storage unit 140, as shown for event ID 1 in FIG. 4, the veteran customer service person's blink rate at timing "00;00;03" is "10". Based on both sets of data, the difference information derivation unit 118 determines that the learner blinks 5 fewer times per minute than the veteran customer service person and, as shown for event ID 1 in FIG. 5, derives "-5" as the difference information for the blink item.

タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の瞳孔の状態が「閉」であることを特定している。また、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の瞬きは「開」である。差異情報導出部１１８は、両データに基づいて、学習者の瞳孔の状態はベテラン接客者と一致しないと判断し、図５のイベントＩＤ２に示すように、瞳孔の状態の項目に関する差異情報として「×」を導出する。 As shown for event ID 2 in FIG. 3, the visual recognition target specifying unit 113 identifies that the state of the learner's pupil at timing "00;00;05" is "closed". According to the information stored in advance in the storage unit 140, as shown for event ID 2 in FIG. 4, the state of the veteran customer service person's pupil at timing "00;00;05" is "open" (the original text says "blink" here, but the comparison that follows concerns the pupil state). Based on both sets of data, the difference information derivation unit 118 determines that the learner's pupil state does not match that of the veteran customer service person and, as shown for event ID 2 in FIG. 5, derives "×" as the difference information for the pupil state item.

タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の視線停留時間（秒）が「０．０」であることを特定している。また、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の視線停留時間は「０．８」である。差異情報導出部１１８は、両データに基づいて、学習者の視線停留時間はベテラン接客者より０．８少ないと判断し、図５のイベントＩＤ２に示すように、視線停留時間の項目に関する差異情報として「－０．８」を導出する。 As shown for event ID 2 in FIG. 3, the visual recognition target identification unit 113 identifies that the learner's gaze dwell time (in seconds) at timing "00;00;05" is "0.0". According to the information stored in advance in the storage unit 140, as shown for event ID 2 in FIG. 4, the veteran customer service person's gaze dwell time at timing "00;00;05" is "0.8". Based on both sets of data, the difference information derivation unit 118 determines that the learner's gaze dwell time is 0.8 seconds shorter than that of the veteran customer service person and, as shown for event ID 2 in FIG. 5, derives "-0.8" as the difference information for the gaze dwell time item.

また、タイミング「00；00；05」において、視認対象特定部１１３は、図３のイベントＩＤ２に示すように、学習者の表情について、眉の動き及び頬の動きが検出されていないことを、「－」として特定している。一方、図４のイベントＩＤ２に示すように、記憶部１４０に予め記憶されている情報によれば、タイミング「00；00；05」におけるベテラン接客者の表情は、眉の動きが「－」、頬の動きが「上方向」である。この場合、差異情報導出部１１８は、両データに基づいて、学習者の眉の動きはベテラン接客者と一致していると判断し、図５のイベントＩＤ２に示すように、眉の動きの項目に関する差異情報として「○」を導出する。また、差異情報導出部１１８は、両データに基づいて、学習者の頬の動きはベテラン接客者と一致しないと判断し、図５のイベントＩＤ２に示すように、頬の動きの項目に関する差異情報として「×」を導出する。 Also, as shown for event ID 2 in FIG. 3, the visual recognition target specifying unit 113 records "-" for the learner's facial expression at timing "00;00;05", indicating that no eyebrow movement and no cheek movement were detected. On the other hand, according to the information stored in advance in the storage unit 140, as shown for event ID 2 in FIG. 4, the veteran customer service person's facial expression at timing "00;00;05" has an eyebrow movement of "-" and a cheek movement of "upward". In this case, based on both sets of data, the difference information derivation unit 118 determines that the learner's eyebrow movement matches that of the veteran customer service person and, as shown for event ID 2 in FIG. 5, derives "○" as the difference information for the eyebrow movement item. The difference information derivation unit 118 also determines that the learner's cheek movement does not match that of the veteran customer service person and, as shown for event ID 2 in FIG. 5, derives "×" as the difference information for the cheek movement item.

また、視認対象特定部１１３及び意思決定情報特定部１１６によれば、学習者は、図３のイベントＩＤ３に示すように、タイミング「00;00;40」において、「客のテーブル」を視認し、「ご注文は何に致しますか」と発話し、「水を置く」動作を行っている。一方、図４のイベントＩＤ３に示すように、記憶部１４０に予め記憶されている情報によれば、「客のテーブル」を視認し、「ご注文は何に致しますか」と発話し、「水を置く」動作を行うタイミングは「00;00;33」である。差異情報導出部１１８は、両データに基づいて、学習者の一連のデータのまとまりは、ベテラン接客者より00;00;07遅いと判断し、図５のイベントＩＤ３に示すように、タイミングの項目に関する差異情報として「＋00;00;07」を導出する。 Further, according to the visual recognition target identification unit 113 and the decision-making information identification unit 116, as shown for event ID 3 in FIG. 3, the learner views "the customer's table", utters "What would you like to order?", and performs the action "put down water" at timing "00;00;40". On the other hand, according to the information stored in advance in the storage unit 140, as shown for event ID 3 in FIG. 4, the timing at which the veteran customer service person views "the customer's table", utters "What would you like to order?", and performs the action "put down water" is "00;00;33". Based on both sets of data, the difference information derivation unit 118 determines that the learner's group of data is 00;00;07 later than that of the veteran customer service person and, as shown for event ID 3 in FIG. 5, derives "+00;00;07" as the difference information for the timing item.
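The timing difference for event ID 3 can be reproduced with a short sketch. It assumes that the three fields of a timing value such as "00;00;40" are hours, minutes, and seconds, which the text does not state; the function names are hypothetical.

```python
def parse_timecode(timecode: str) -> int:
    """Convert "hh;mm;ss" (assumed field meaning) to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(";"))
    return hours * 3600 + minutes * 60 + seconds


def format_offset(seconds: int) -> str:
    """Format a signed number of seconds as "+hh;mm;ss" or "-hh;mm;ss"."""
    sign = "+" if seconds >= 0 else "-"
    magnitude = abs(seconds)
    return (f"{sign}{magnitude // 3600:02d};"
            f"{(magnitude % 3600) // 60:02d};{magnitude % 60:02d}")


def timing_difference(learner: str, veteran: str) -> str:
    """Learner timing minus veteran timing, as a signed timecode."""
    return format_offset(parse_timecode(learner) - parse_timecode(veteran))


print(timing_difference("00;00;40", "00;00;33"))  # +00;00;07
```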

なお、図４のイベントＩＤ４に示すように、記憶部１４０に予め記憶されている情報によれば、ベテラン接客者は、タイミング「00;00;47」において「自分の手元」を視認し、「ご注文を確認いたします」と発話し、「注文内容を復唱」という動作を行っている。その一方で、図３に示すように、視認対象特定部１１３及び意思決定情報特定部１１６はいずれもタイミング「00;00;47」において学習者の視認対象及び意思決定情報を特定しておらず、また、タイミングによらず、「自分の手元」の視認、「ご注文を確認いたします」という発話及び「注文内容を復唱」という動作のいずれも特定されていない。このような場合、差異情報導出部１１８は、両データに基づいて、学習者のデータ中にベテラン接客者のデータに対応するデータが存在しないと判断し、図５の６００に示すように、差異情報として「データ無」、「×」及び正解となる情報を導出する。 According to the information stored in advance in the storage unit 140, as shown for event ID 4 in FIG. 4, the veteran customer service person views "own hands", utters "I will confirm your order", and performs the action "repeat the order back" at timing "00;00;47". On the other hand, as shown in FIG. 3, neither the visual recognition target identification unit 113 nor the decision-making information identification unit 116 identifies a visual recognition target or decision-making information for the learner at timing "00;00;47", and, regardless of timing, none of the viewing of "own hands", the utterance "I will confirm your order", or the action "repeat the order back" is identified. In such a case, based on both sets of data, the difference information derivation unit 118 determines that the learner's data contains no data corresponding to the veteran customer service person's data and, as indicated by 600 in FIG. 5, derives "no data", "×", and the correct-answer information as the difference information.
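The per-event derivation walked through above can be summarized in a minimal sketch. The dictionary layout, the function name, and the ASCII marks "O" and "X" (standing in for ○ and ×) are assumptions for illustration; treating an equal tone as "O" is also an assumption, since the text only gives an example of a lower tone. The values are those described for event IDs 1 and 4.

```python
TONE_RANK = {"low": 0, "medium": 1, "high": 2}
QUALITATIVE_ITEMS = ("target", "utterance", "action")


def derive_difference(learner, veteran):
    """Compare one learner event with the veteran event of the same ID.

    `learner` is None when the learner has no corresponding event.
    Qualitative items map to (mark, correct_answer_or_None).
    """
    if learner is None:
        diff = {"status": "no data"}
        for item in QUALITATIVE_ITEMS:
            diff[item] = ("X", veteran[item])
        return diff

    diff = {"status": "ok"}
    for item in QUALITATIVE_ITEMS:
        if learner[item] == veteran[item]:
            diff[item] = ("O", None)
        else:
            diff[item] = ("X", veteran[item])   # include the correct answer
    delta = TONE_RANK[learner["tone"]] - TONE_RANK[veteran["tone"]]
    diff["tone"] = "+" if delta > 0 else "-" if delta < 0 else "O"
    diff["blinks"] = learner["blinks"] - veteran["blinks"]  # learner - veteran
    return diff


LEARNER_1 = {"target": "customer's feet", "utterance": "Welcome",
             "action": "bow", "tone": "medium", "blinks": 5}
VETERAN_1 = {"target": "customer's eyes", "utterance": "Welcome",
             "action": "bow", "tone": "high", "blinks": 10}
VETERAN_4 = {"target": "own hands", "utterance": "I will confirm your order",
             "action": "repeat the order back"}

print(derive_difference(LEARNER_1, VETERAN_1))
print(derive_difference(None, VETERAN_4))
```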

図５の表から、学習者は、客の目を見ておらず、注文を取るタイミングが遅い傾向にあることが判定される。また、図５の表から、学習者は、客に対して注文の確認をしていないことが判定される。 From the table in FIG. 5, it is determined that the learner does not look the customer in the eye and tends to be late in taking orders. It is also determined from the table in FIG. 5 that the learner does not confirm the order with the customer.

差異情報導出部１１８は、（状態検出部）に記載の機械学習的手法を用いて差異情報を導出してもよい。 The difference information derivation unit 118 may derive the difference information using the machine learning method described in (state detection unit).

差異情報導出部１１８で導出された差異情報は、提示情報生成部１１９に入力される。 The difference information derived by the difference information derivation unit 118 is input to the presentation information generation unit 119 .

（学習者情報取得部）
学習者情報取得部１１７は、例えば、情報管理サーバ４００が管理する学習者の情報を学習者情報として取得する。一例として、学習者の情報は、学習者の属性を示す属性情報と、他の学習者と対象学習者とを識別するための学習者識別情報を含む。学習者の属性情報とは、例えば、学習者の年齢、性別、職歴及び本提示システム８００を用いたこれまでの学習期間等である。また、学習者識別情報とは、例えば、学習者のＩＤ又は学習者のメールアドレス等である。 (Learner information acquisition unit)
The learner information acquisition unit 117 acquires, for example, information on the learner managed by the information management server 400 as learner information. As an example, the learner information includes attribute information indicating the learner's attributes and learner identification information for distinguishing the target learner from other learners. The attribute information includes, for example, the learner's age, sex, work history, and the period for which the learner has so far used the presentation system 800 for learning. The learner identification information is, for example, the learner's ID or e-mail address.

このように、学習者の情報が、属性情報と学習者識別情報とを含むことで、後述する提示情報生成部１１９は、個々の学習者の属性に応じて、各学習者にとって効果的な提示情報を提示することができる。詳細は（提示情報生成部）の項目で述べる。 Because the learner information includes the attribute information and the learner identification information in this way, the presentation information generation unit 119, described later, can present presentation information that is effective for each learner according to that learner's attributes. Details are given in the (Presentation information generation unit) section.

学習者情報取得部１１７は、情報管理サーバ４００から取得した学習者情報を提示情報生成部１１９に入力する。 The learner information acquisition unit 117 inputs the learner information acquired from the information management server 400 to the presentation information generation unit 119 .

（提示情報生成部）
提示情報生成部１１９は、差異情報導出部１１８が導出した差異情報（視認対象特定部１１３が顔情報から特定する視認対象及び集中度、意思決定情報特定部１１６が音声情報から特定する発話及び動き情報から特定する動作に基づいて導出される）に基づいて提示情報を生成する。一例において、提示情報生成部１１９が生成する提示情報は画像である。 (Presentation information generator)
The presentation information generation unit 119 generates presentation information based on the difference information derived by the difference information derivation unit 118. This difference information is derived from the visual recognition target and degree of concentration that the visual recognition target identification unit 113 identifies from the face information, and from the utterance and the action that the decision-making information identification unit 116 identifies from the voice information and the motion information, respectively. In one example, the presentation information generated by the presentation information generation unit 119 is an image.

提示情報生成部１１９は、学習者が視認する対象が、ベテラン接客者が視認対象と異なる場合に、ベテラン接客者が視認する対象を強調表示した提示情報を生成する。図８は、提示情報生成部１１９が生成した提示情報の一例を示す図である。図８に示す画像２００Ｃは、学習者が視認すべき対象を強調表示した画像であり、図２に示した画像２００Ａに記号として矢印ポインタ２００Ｃ１が加えられている。図８においては、矢印ポインタ２００Ｃ１は矢印の形をしているが、任意の形状及び色であってよい。 When the target viewed by the learner differs from the target viewed by the veteran customer service person, the presentation information generation unit 119 generates presentation information in which the target viewed by the veteran customer service person is highlighted. FIG. 8 is a diagram showing an example of the presentation information generated by the presentation information generation unit 119. The image 200C shown in FIG. 8 is an image in which the target that the learner should view is highlighted, and is the image 200A shown in FIG. 2 with an arrow pointer 200C1 added as a symbol. In FIG. 8, the arrow pointer 200C1 has the shape of an arrow, but it may have any shape and color.

また、視認対象の強調表示の例としては、他にも、視認対象を枠囲みする、視認対象の周辺の背景色を変更する、視認対象を他のイメージよりも大きく表示する、及び視認対象の画像をポップアップウィンドウとして表示する等が挙げられる。 Other examples of highlighting the visual target include framing the visual target, changing the background color around the visual target, displaying the visual target larger than other images, and displaying the visual target. For example, an image is displayed as a pop-up window.

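提示情報生成部１１９が生成する提示情報は常に強調表示を含んでいてもよいし、例えば、差異情報導出部１１８が導出した差異情報に応じて、適宜、強調の有無及び度合いを変更してもよい。 The presentation information generated by the presentation information generation unit 119 may always include highlighting, or the presence and degree of highlighting may be changed as appropriate according to, for example, the difference information derived by the difference information derivation unit 118.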

また、提示情報生成部１１９は、差異情報導出部１１８が導出した差異情報に応じた点数を算出し、算出した点数を提示情報に含ませてもよい。 Also, the presentation information generation unit 119 may calculate a score according to the difference information derived by the difference information derivation unit 118 and include the calculated score in the presentation information.

図７は、提示情報生成部１１９が生成した提示情報の一例を示す図である。図７に示す画像２００Ｂは、テキストとして、指示ウィンドウ２００Ｂ１及び点数ウィンドウ２００Ｂ２を含んでいる。 FIG. 7 is a diagram showing an example of the presentation information generated by the presentation information generation unit 119. The image 200B shown in FIG. 7 includes, as text, an instruction window 200B1 and a score window 200B2.

画像２００Ｂにおける指示ウィンドウ２００Ｂ１は、学習者が行うべき発話内容及び動作のガイド示すテキストを表示するウィンドウである。ガイドの内容は、記憶部１４０に予め記憶されているベテラン接客者のデータに基づくものである。テキストは、図７に示したように、学習者が行うべき発話内容及び動作の全てを指示するものであってもよいし、例えば、「客がメニューから顔を挙げました、どうしますか？」等のように学習者にヒントを与えるものであってもよい。 The instruction window 200B1 in the image 200B is a window that displays text giving guidance on the utterances and actions that the learner should perform. The content of the guidance is based on the veteran customer service person's data stored in advance in the storage unit 140. As shown in FIG. 7, the text may instruct the learner on all of the utterances and actions to be performed, or it may give the learner a hint, for example, "The customer has looked up from the menu. What will you do?"

画像２００Ｂにおける点数ウィンドウ２００Ｂ２は、学習者の習熟度を示す点数を表示するウィンドウである。点数の算出方法としては、例えば、（ｉ）一致又は不一致で判断した差異情報と、ずれの程度を判断した差異情報とで加点又は減点の方法を分けて、不一致の項目については所定の点数を減点し、ずれている項目についてはずれの程度に応じて減点する点数を変化させる方法、及び（ｉｉ）ずれの程度を判断した差異情報だけでなく、一致又は不一致で判断した差異情報についても、予め定めた範囲を逸脱していなければ（例えば、ベテラン接客者の視認対象が「客の目」であるのに対して学習者の視認対象が「客の鼻」であった、ベテラン接客者の発話内容が「ご注文は何に致しますか」であるのに対して、学習者の発話内容が「ご注文をお伺いいたします」であった等）、程度に応じて減点する点数を変化させる方法等が挙げられるが、特に限られない。 The score window 200B2 in the image 200B is a window that displays a score indicating the learner's proficiency level. Examples of methods of calculating the score include, without particular limitation: (i) a method in which points are added or deducted differently for difference information judged as a match or mismatch and for difference information judged by degree of deviation, with a predetermined number of points deducted for each mismatched item and the number of points deducted for each deviating item varied according to the degree of deviation; and (ii) a method in which the number of points deducted is varied according to degree not only for difference information judged by degree of deviation but also for difference information judged as a match or mismatch, provided that the learner's data does not fall outside a predetermined range (for example, the veteran customer service person's visual recognition target is "the customer's eyes" while the learner's is "the customer's nose", or the veteran customer service person's utterance is "What would you like to order?" while the learner's is "I would like to ask for your order").
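Calculation method (i) can be illustrated with a minimal sketch. The base score of 100, the fixed mismatch penalty, the per-item weights, and the function name are all hypothetical values chosen for the example; the specification does not give concrete numbers.

```python
def compute_score(matches, deviations, weights,
                  base: float = 100.0,
                  mismatch_penalty: float = 10.0) -> float:
    """Score per method (i): fixed deduction for each mismatched item,
    deduction proportional to the size of each quantitative deviation.

    matches:    item name -> True if the learner matched the veteran
    deviations: item name -> signed deviation (learner minus veteran)
    weights:    item name -> points deducted per unit of deviation
    """
    total = base
    for matched in matches.values():
        if not matched:
            total -= mismatch_penalty
    for name, deviation in deviations.items():
        total -= abs(deviation) * weights.get(name, 1.0)
    return max(total, 0.0)  # never report a negative score


matches = {"target": False, "utterance": True, "action": True}
deviations = {"blinks": -5, "timing_s": 7}
weights = {"blinks": 1.0, "timing_s": 2.0}

print(compute_score(matches, deviations, weights))  # 100 - 10 - 5 - 14 = 71.0
```

Method (ii) would replace the fixed mismatch penalty with a graded one for near misses, which requires a similarity measure that the text leaves open.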

一例において、提示情報生成部１１９が生成する提示情報は音声を含む（図示せず）。音声としては、声であっても効果音であってもよい。一例において、声は、指示ウィンドウ２００Ｂ１に表示されるテキストを読み上げるものである。また、効果音の例としては、差異情報の内容に応じて正解又は不正解を示す音（「ピンポン」又は「ブー」等）という音、及び視認対象の強調表示に伴って学習者に注意を促す音等が挙げられる。 In one example, the presentation information generated by the presentation information generation unit 119 includes audio (not shown). The audio may be a voice or a sound effect. In one example, the voice reads aloud the text displayed in the instruction window 200B1. Examples of sound effects include a sound indicating a correct or incorrect answer according to the content of the difference information (such as a chime or a buzzer), and a sound that accompanies the highlighting of the visual recognition target to draw the learner's attention.

提示情報生成部１１９は、提示情報を生成するにあたって、学習者情報取得部１１７が取得した学習者情報を参照する。学習者情報を参照することによって、提示情報生成部１１９は、個々の学習者の属性に応じて、各学習者が効果的に学習できるような提示情報を提示することができる。 The presentation information generation unit 119 refers to the learner information acquired by the learner information acquisition unit 117 when generating the presentation information. By referring to the learner information, the presentation information generation unit 119 can present presentation information that enables each learner to learn effectively according to the attribute of each learner.

例えば、本実施形態における学習者が接客業を全く経験したことがない場合には、提示情報生成部１１９は、視認対象をより強調表示した提示情報を生成したり、より具体的な内容のガイドを含む提示情報を生成したりする。また、例えば、学習者のこれまでの学習期間が所定期間より長い場合には、提示情報生成部１１９は、より厳しい条件で算出した学習者の習熟度を示す点数を含む提示情報を生成することができる。さらに、提示情報生成部１１９は、一例において、学習者が好きなアニメのキャラクター等の音声を含む提示情報を生成することができる。 For example, when the learner in the present embodiment has no experience at all in customer service work, the presentation information generation unit 119 generates presentation information in which the visual recognition target is highlighted more strongly, or presentation information that includes more specific guidance. Also, for example, when the learner's learning period so far is longer than a predetermined period, the presentation information generation unit 119 can generate presentation information that includes a proficiency score calculated under stricter conditions. Furthermore, in one example, the presentation information generation unit 119 can generate presentation information that includes the voice of, for example, the learner's favorite anime character.
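The attribute-dependent behavior described above could be organized as a small settings lookup. The sketch below is only one possible arrangement: the `PresentationSettings` fields, the function name, and the 90-day threshold standing in for the "predetermined period" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresentationSettings:
    highlight_level: int   # 1 = normal emphasis, 2 = stronger emphasis
    guide_detail: str      # "hint" or "full"
    strict_scoring: bool   # score under stricter conditions


def settings_for(has_service_experience: bool,
                 learning_days: int,
                 long_period_days: int = 90) -> PresentationSettings:
    """Choose presentation settings from two learner attributes."""
    novice = not has_service_experience
    return PresentationSettings(
        highlight_level=2 if novice else 1,
        guide_detail="full" if novice else "hint",
        strict_scoring=learning_days > long_period_days,
    )


print(settings_for(has_service_experience=False, learning_days=10))
```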

提示情報生成部１１９は、生成した提示情報をヘッドマウントディスプレイ２００へ送信する。また、提示情報生成部１１９は、提示情報を表示部５００へ送信する。 The presentation information generation unit 119 transmits the generated presentation information to the head mounted display 200 . In addition, presentation information generating section 119 transmits the presentation information to display section 500 .

＜表示部＞
図１に示すように、提示システム８００は、表示部５００を備えている。表示部５００は、提示情報生成部１１９が生成した提示情報を、主に学習者を指導する指導者等向けに表示する。表示部５００が表示する画像は、図３に示した学習者のデータを示す表、及び図５に示した差異情報を示す表を含んでいてよく、さらに、図４に示したベテラン接客者のデータを示す表を含んでいてもよい。また、表示部５００が表示する画像は、ヘッドマウントディスプレイ２００のディスプレイ２７０が表示する画像の一部を含んでいてもよい。 <Display unit>
As shown in FIG. 1, the presentation system 800 includes a display unit 500. The display unit 500 displays the presentation information generated by the presentation information generation unit 119, mainly for an instructor or other person who guides the learner. The image displayed by the display unit 500 may include the table of the learner's data shown in FIG. 3 and the table of difference information shown in FIG. 5, and may further include the table of the veteran customer service person's data shown in FIG. 4. The image displayed by the display unit 500 may also include part of the image displayed on the display 270 of the head mounted display 200.

２．提示システムの処理例
図６は、提示システム８００の処理を示すフローチャートである。図６に基づいて提示システム８００の処理について説明する。 2. Processing Example of Presentation System FIG. 6 is a flow chart showing processing of the presentation system 800 . Processing of the presentation system 800 will be described based on FIG.

まず提示システムの使用を開始（ステップＳ１００）し、処理を開始する。ステップＳ（以下、「ステップ」は省略する）１０２に進む。 First, use of the presentation system is started (step S100) to start processing. The process proceeds to step S (hereinafter, “step” is omitted) 102 .

Ｓ１０２では、情報処理装置１００の制御部１１０における学習者情報取得部１１７が、学習者情報を取得する（学習者情報取得ステップ）。処理の詳細は、（学習者情報取得部）に記載の通りである。学習者情報取得部１１７は、学習者情報を提示情報生成部１１９に送信し、Ｓ１０４に進む。 In S102, the learner information acquisition unit 117 in the control unit 110 of the information processing apparatus 100 acquires learner information (learner information acquisition step). The details of the processing are as described in (Learner Information Acquisition Unit). The learner information acquisition unit 117 transmits the learner information to the presentation information generation unit 119, and proceeds to S104.

Ｓ１０４では、顔情報取得部１１１が、学習者のヘッドマウントディスプレイ２００におけるカメラ２１０から取得し、顔画像から顔の各部位を抽出する（顔情報取得ステップ）。処理の詳細は、（顔情報取得部）に記載の通りである。顔情報取得部１１１は、抽出した結果を状態検出部１１２に送信し、Ｓ１０６に進む。 In S104, the face information acquisition unit 111 acquires a face image from the camera 210 of the learner's head mounted display 200 and extracts each part of the face from the face image (face information acquisition step). The details of the processing are as described in (Face information acquisition unit). The face information acquisition unit 111 transmits the extraction result to the state detection unit 112, and the process proceeds to S106.

Ｓ１０６では、状態検出部１１２が、学習者の視認対象、瞬き、瞳孔の状態、及び顔の各部位の状態を検出する（状態検出ステップ）。処理の詳細は（状態検出部）に記載の通りである。検出結果を視認対象特定部１１３に送信し、Ｓ１０８に進む。 In S106, the state detection unit 112 detects the learner's visual recognition target, the blink state, the state of the pupil, and the state of each part of the face (state detection step). The details of the processing are as described in (state detector). The detection result is transmitted to the visual recognition target identification unit 113, and the process proceeds to S108.

In S108, the visual target identification unit 113 identifies the target that the learner is visually recognizing (visual target identification step). The details of the processing are as described in (Visual target identification unit). At this time, the visual target identification unit 113 may, together with identifying the visual target, also identify the learner's degree of concentration while viewing it. The visual target identification unit 113 transmits the identified result to the difference information derivation unit 118, and the process proceeds to S110.

In S110, at least one of the voice information acquisition unit 114 and the motion information acquisition unit 115 acquires the learner's decision-making information from the microphone 220 of the head-mounted display 200 or the like and from the motion sensor 310 or the like, respectively (decision-making information acquisition step). The details of the processing are as described in (Voice information acquisition unit) and (Motion information acquisition unit). The process proceeds to S112.

In S112, at least one of the voice information acquisition unit 114 and the motion information acquisition unit 115 determines whether decision-making information is present by checking whether voice information, motion information, or the like (hereinafter sometimes referred to as "decision-making information") has been acquired. If decision-making information is present, the voice information acquisition unit 114 and the motion information acquisition unit 115 transmit the acquired decision-making information to the decision-making information identification unit 116, and the process proceeds to S113. If no decision-making information is present, the process proceeds to S114.

In S113, the decision-making information identification unit 116 identifies decision-making information such as the content of the learner's utterance, the tone of voice, and the learner's action (decision-making information identification step). The details of the processing are as described in (Decision-making information identification unit). The decision-making information identification unit 116 transmits the identified result to the difference information derivation unit 118, and the process proceeds to S114.
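The branch in S112 can be illustrated with a minimal Python sketch. The container `DecisionInput`, its fields, and the function names are hypothetical and do not appear in the specification; the sketch only assumes that decision-making information counts as present when either the microphone or the motion sensor yielded data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionInput:
    """Raw input gathered in S110 (hypothetical container)."""
    voice: Optional[str] = None   # e.g. an utterance picked up by the microphone
    motion: Optional[str] = None  # e.g. a gesture label from the motion sensor


def has_decision_info(raw: DecisionInput) -> bool:
    """S112: information is present if either source produced data."""
    return raw.voice is not None or raw.motion is not None


def next_step(raw: DecisionInput) -> str:
    """Return the label of the step that follows S112."""
    # With decision-making information, S113 identifies its content first;
    # without it, the flow skips directly to S114.
    return "S113" if has_decision_info(raw) else "S114"
```

In this sketch, S113 is skipped entirely when neither source produced input, which matches the flow in which S114 is reached either directly from S112 or after S113.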

In S114, the difference information derivation unit 118 acquires reference information, such as the visual target and decision-making information of a veteran service worker, stored in advance in the storage unit 140 (reference information acquisition step). The process proceeds to S116.

In S116, the difference information derivation unit 118 derives difference information by referring to the visual target identified by the visual target identification unit 113, the decision-making information identified by the decision-making information identification unit 116, and the reference information acquired from the storage unit 140 (difference information derivation step). The details of the processing are as described in (Difference information derivation unit). The difference information derivation unit 118 transmits the derived difference information to the presentation information generation unit 119, and the process proceeds to S118.

In S118, the presentation information generation unit 119 generates presentation information by referring to the difference information derived by the difference information derivation unit 118 and the learner information acquired from the learner information acquisition unit 117 (presentation information generation step). The details of the processing are as described in (Presentation information generation unit). The control unit 110 of the information processing apparatus 100 causes the communication unit 130 to transmit the presentation information generated by the presentation information generation unit 119 to the head-mounted display 200 and the display unit 500, and the process proceeds to S120.

In S120, the control unit 250 of the head-mounted display 200 causes the display 270 or the like to display the presentation information generated by the presentation information generation unit 119 (presentation information display step). The process then proceeds to S122 and ends. At this time, the control unit 250 of the head-mounted display 200 may also cause the speaker 240 or the like to output the presentation information generated by the presentation information generation unit 119 as audio (not shown).

Note that not all of the presentation information, such as the instruction window 200B1 and the score window 200B2 in FIG. 7 and the arrow pointer 200C1 in FIG. 8, needs to be displayed in real time during the customer service training session in progress. For example, presentation information generated in the current training session may be displayed in the next or a later training session. Alternatively, the information processing apparatus 100 may have a function of recording the video displayed on the display 270 or the like in the storage unit 140, and each piece of presentation information may be displayed when that video is played back later. The same applies to Embodiment 2 described below.

[Embodiment 2]
1. Presentation System
The presentation system 800a according to the present embodiment has functions equivalent to those of the presentation system 800 shown in FIG. 1 and, in addition, presents to the learner presentation information that depends on the result of applying determination rules (described later) to the difference information described above. This makes it possible to more suitably support independent, experiential learning by the work learner. In the following, as in Embodiment 1, the description takes as an example a case in which the presentation system 800a is used by a learner who works at a restaurant or similar establishment that requires customer service and who is learning customer service work.

FIG. 9 is a block diagram showing the components of a presentation system according to one embodiment of the present invention. As shown in FIG. 9, the presentation system 800a according to the present embodiment differs from the presentation system 800 shown in FIG. 1 in that the control unit 110a further includes a difference analysis unit 120 that analyzes the difference information. The presentation information generation unit 119a according to the present embodiment generates presentation information by referring to the determination result input from the difference analysis unit 120 in addition to the difference information input from the difference information derivation unit 118. The storage unit 140a according to the present embodiment has a determination rule database 141, which is a database of determination rules. As another aspect, the determination rule database 141 may be stored separately in a memory or the like (not shown) instead of being included in the storage unit 140.

(Difference analysis unit)
The difference analysis unit 120 performs determination processing by applying a determination rule read from the determination rule database 141 to at least one of the first to third difference information derived by the difference information derivation unit 118.

Suppose that, of the difference information shown in FIGS. 3 to 5, all of the information on the deviation between the learner's data and the veteran service worker's data, and all of the information on whether the learner's data matches the veteran service worker's data, were presented to the learner as presentation information. The learner would then find it harder, not easier, to tell which data to focus on. Accordingly, when the difference between the two sets of data is within a predetermined threshold, or when a predetermined criterion is satisfied, it may be preferable not to present that difference to the learner. The determination rules contained in the determination rule database 141 may be understood as a data set that defines such a threshold or criterion for each difference.

For example, suppose that at a certain timing the veteran service worker's visual target is the customer's eyes and the determination rule specifies a predetermined range that includes the customer's eyes. In that case, even if the learner's visual target at that timing is not exactly the customer's eyes, the difference analysis unit 120 may treat it as acceptable as long as it falls within the predetermined range, and may determine that the difference in visual target is not to be presented to the learner. On the other hand, if the learner's visual target in the same situation is, for example, the learner's own hands, the difference analysis unit 120 may determine that the difference in visual target is to be presented to the learner. The predetermined range within which differences are tolerated and not presented to the learner may consist of a plurality of discontinuous ranges.
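The visual-target rule described above might be sketched as follows. This is only an illustration under stated assumptions: the specification does not say how the predetermined range is represented, so the sketch assumes axis-aligned rectangles in normalized image coordinates, and the names `Rect`, `within_any`, and `should_present_gaze_difference` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation of one tolerated region:
# (x_min, y_min, x_max, y_max) in normalized image coordinates.
Rect = Tuple[float, float, float, float]
Point = Tuple[float, float]


def within_any(point: Point, allowed: List[Rect]) -> bool:
    """True if the gaze point lies inside at least one tolerated region."""
    x, y = point
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in allowed)


def should_present_gaze_difference(learner_gaze: Point, allowed: List[Rect]) -> bool:
    """Present the difference only when the gaze is outside every tolerated region."""
    return not within_any(learner_gaze, allowed)


# Two discontinuous tolerated regions, e.g. around the customer's face
# and around a second acceptable area (values are made up).
ALLOWED_REGIONS: List[Rect] = [(0.4, 0.2, 0.6, 0.4), (0.0, 0.7, 0.2, 0.9)]
```

Representing the tolerated range as a list makes the case of a plurality of discontinuous ranges fall out naturally: a gaze point is acceptable if it lies in any one of them.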

As another example, suppose that at a certain timing the veteran service worker's utterance is "What would you like to order?" Even if the learner's utterance is "May I take your order?", the difference analysis unit 120 may regard the two as expressing the same content and determine that the difference in utterance content is not to be presented to the learner. On the other hand, if the learner's utterance in the same situation is, for example, "Let me confirm your order," the difference analysis unit 120 may determine that the difference in utterance content is to be presented to the learner. As in this example, the difference analysis unit 120 may determine that two utterances express the same content if a predetermined portion of the utterance content matches. The determination rule database 141 may also include a data set indicating which learner utterances are acceptable for a given utterance by the veteran service worker. Furthermore, the rules are not limited to the utterance content itself; a threshold for whether to present a difference to the learner may also be set for the pitch of the tone of voice.
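As a rough sketch of the utterance rule, the data set of acceptable learner utterances can be modeled as a lookup table, and the tone-of-voice rule as a simple threshold. The table contents, the English renderings of the utterances, the pitch unit (Hz), the default threshold, and all function names are assumptions made for illustration only.

```python
from typing import Dict, Set


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations do not matter."""
    return " ".join(text.lower().split())


# Hypothetical rule data: reference utterance -> acceptable learner utterances.
ALLOWED_UTTERANCES: Dict[str, Set[str]] = {
    "what would you like to order?": {
        "what would you like to order?",
        "may i take your order?",
    },
}


def should_present_utterance_difference(reference: str, learner: str) -> bool:
    """Present the difference unless the learner's utterance is an accepted variant."""
    ref = normalize(reference)
    allowed = ALLOWED_UTTERANCES.get(ref, {ref})  # default: exact match only
    return normalize(learner) not in allowed


def should_present_pitch_difference(reference_hz: float, learner_hz: float,
                                    threshold_hz: float = 30.0) -> bool:
    """Present the tone difference only when it exceeds the (assumed) threshold."""
    return abs(learner_hz - reference_hz) > threshold_hz
```

A table lookup is the simplest reading of the data set of acceptable utterances; matching on a predetermined portion of the utterance would need a rule that says which portion counts, which the specification leaves open.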

As another example, suppose that at a certain timing the veteran service worker's action is "point ahead." Even if the learner's action is "state the table number," the difference analysis unit 120 may determine that the difference in action is not to be presented to the learner. On the other hand, if the learner's action in the same situation is, for example, "ask for the order," the difference analysis unit 120 may determine that the difference in action is to be presented to the learner. The determination rule database 141 may also include a data set indicating which learner actions are acceptable for a given action by the veteran service worker.

The rules are not limited to the examples above. For example, the determination rule database 141 may include a threshold for the difference in timing of each action between the learner and the veteran service worker, and the difference analysis unit 120 may determine that a timing difference exceeding that threshold is to be presented to the learner.
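The timing rule can be sketched in a few lines. The sketch assumes that each action is identified by a label and that its timing is a number of seconds from the start of the session; these representations, like the function name `timing_differences_to_present`, are illustrative assumptions rather than anything stated in the specification.

```python
from typing import Dict


def timing_differences_to_present(reference: Dict[str, float],
                                  learner: Dict[str, float],
                                  threshold_s: float) -> Dict[str, float]:
    """Return the signed delay (learner minus reference, in seconds) for each
    action whose timing difference exceeds the threshold."""
    to_present: Dict[str, float] = {}
    for action, ref_time in reference.items():
        if action not in learner:
            continue  # a missing action is a different kind of difference
        delta = learner[action] - ref_time
        if abs(delta) > threshold_s:
            to_present[action] = delta
    return to_present
```

A positive value means the learner acted later than the veteran service worker, and a negative value means earlier; differences at or below the threshold are dropped and would not be presented.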

The difference analysis unit 120 also generates information indicating which visual target object the veteran service worker and the learner are each viewing at each time, and information on how long each visual target object was viewed within a predetermined period. Examples of these kinds of information are described later.

2. Processing Example of Presentation System
FIG. 10 is a flowchart showing the processing of the presentation system 800a. The processing of the presentation system 800a is described below with reference to FIG. 10.

In the processing based on the flowchart of FIG. 10, S100 to S116 are performed in the same manner as in Embodiment 1. After the processing in S116, the process proceeds to S117.

In S117, the difference analysis unit 120 performs determination processing by applying a determination rule read from the determination rule database 141 to at least one of the first to third difference information derived by the difference information derivation unit 118. The details of the processing are as described in (Difference analysis unit). The difference analysis unit 120 also generates information indicating which visual target object the veteran service worker and the learner are each viewing at each time, and information on how long each visual target object was viewed within a predetermined period. The difference analysis unit 120 transmits the determination result of the determination processing and the above information to the presentation information generation unit 119a, and the process then proceeds to S118'.

In S118', the presentation information generation unit 119a generates presentation information by referring to the determination result of the difference analysis unit 120. In other words, the presentation information generation unit 119a generates presentation information for presenting the difference information that the difference analysis unit 120 has determined is to be presented to the learner. The presentation information generation unit 119a may also include in the presentation information the information on visual target objects input from the difference analysis unit 120, as illustrated in FIG. 11. FIG. 11(A) shows which visual target object the learner was viewing at each time. In FIG. 11, O1 to O5 each denote a visual target object; for example, O1 corresponds to "the customer's eyes." FIG. 11(B) shows information on how long each visual target object was viewed within a predetermined period, expressed as the proportion of time the learner spent viewing each visual target object. The presentation information generation unit 119a may also generate the information illustrated in FIG. 11 for the visual target objects that the veteran service worker was viewing. The presentation information generation unit 119a may additionally perform the same processing as the presentation information generation unit 119 in S118 of Embodiment 1. The process proceeds to S120'.
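The relationship between the two kinds of information in FIG. 11 can be sketched as follows: given a per-time record of the viewed object, as in FIG. 11(A), the proportions in FIG. 11(B) follow by counting. The sketch assumes that the timeline is a list with one object label per uniformly spaced sample; that sampling scheme and the function name `dwell_ratios` are assumptions, not details from the specification.

```python
from collections import Counter
from typing import Dict, List


def dwell_ratios(timeline: List[str]) -> Dict[str, float]:
    """Proportion of samples spent on each visual target object.

    `timeline` holds one label (e.g. "O1") per uniformly spaced time sample,
    so sample counts are proportional to viewing time.
    """
    total = len(timeline)
    if total == 0:
        return {}
    counts = Counter(timeline)
    return {target: count / total for target, count in counts.items()}
```

Computing the same ratios for the veteran service worker's timeline would give the two data sets needed for a side-by-side comparison.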

In S120', the control unit 250 of the head-mounted display 200 causes the display 270 or the like to display the presentation information generated by the presentation information generation unit 119a (presentation information display step). The process then proceeds to S122 and ends. At this time, the control unit 250 of the head-mounted display 200 may also cause the speaker 240 or the like to output the presentation information generated by the presentation information generation unit 119a as audio (not shown). The presentation information displayed on the display 270 or the like need not all appear on a single screen; for example, the displayed information may be switchable in response to a user instruction. In addition, if in S118' the presentation information generation unit 119a has also generated the information illustrated in FIG. 11 for the visual target objects that the veteran service worker was viewing, the control unit 250 may cause the display 270 or the like to display the FIG. 11 information for the veteran service worker and for the learner in a manner that allows comparison. This allows the learner to easily identify, by comparison with the veteran service worker's data, the visual target objects to which the learner should pay more attention.

[Example of Implementation by Software]
The control blocks of the information processing apparatus 100 (100a) may be implemented by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be implemented by software.

In the latter case, the information processing apparatus 100 includes a computer that executes the instructions of a program, which is software that implements each function. This computer includes, for example, at least one processor (control device) and at least one computer-readable recording medium storing the program. The object of the present invention is achieved when the processor in the computer reads the program from the recording medium and executes it. As the processor, for example, a CPU (Central Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" can be used, for example, a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium capable of transmitting it (a communication network, broadcast waves, or the like). Note that one aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The information processing apparatus according to each aspect of the present invention may be implemented by a computer. In that case, a control program for the information processing apparatus that implements the information processing apparatus on a computer by causing the computer to operate as each unit (software element) of the information processing apparatus, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

100, 100a Information processing apparatus
110, 110a Control unit
111 Face information acquisition unit
112 State detection unit
113 Visual target identification unit
114 Voice information acquisition unit
115 Motion information acquisition unit
116 Decision-making information identification unit
117 Learner information acquisition unit
118 Difference information derivation unit
119, 119a Presentation information generation unit
120 Difference analysis unit
130 Communication unit
140, 140a Storage unit
141 Determination rule database
200 Head-mounted display
210 Camera
220 Microphone
230 Motion sensor
240 Speaker
250 Control unit
260 Communication unit
270 Display
300 Wearable interface
310 Motion sensor
320 Button
400 Information management server
500 Display unit

Claims

1. An information processing apparatus comprising:
a face information acquisition unit that acquires face information including information on at least part of the face of a work learner who learns customer service work;
a voice information acquisition unit that acquires voice information including information on an utterance by the work learner;
a difference information derivation unit that derives first difference information indicating a difference between the state of at least part of the face indicated by the face information acquired by the face information acquisition unit and first reference information including information on the state of at least part of the face of a model worker, and second difference information indicating a difference between the voice information acquired by the voice information acquisition unit and second reference information including information on the voice of the model worker; and
a presentation information generation unit that generates presentation information corresponding to at least one of the first difference information and the second difference information.

2. The information processing apparatus according to claim 1, further comprising a visual target identification unit that identifies a target visually recognized by the work learner by referring to the face information,
wherein the difference information derivation unit derives the first difference information by referring to the visual target identified by the visual target identification unit.

3. The information processing apparatus according to claim 2, wherein the visual target identification unit further identifies the degree of concentration of the work learner on the target visually recognized by the work learner, and
the difference information derivation unit derives the first difference information by referring to the degree of concentration identified by the visual target identification unit.

4. The information processing apparatus according to claim 2, wherein, when the target visually recognized by the work learner differs from the target indicated by the first reference information, the presentation information generation unit generates presentation information in which the target indicated by the first reference information is highlighted.

5. The information processing apparatus according to any one of claims 1 to 4, wherein the presentation information generation unit generates presentation information according to the difference in voice indicated by the second difference information.

6. The information processing apparatus according to any one of claims 1 to 5, wherein the presentation information generation unit calculates a score according to the first difference information and the second difference information and includes the calculated score in the presentation information.

7. The information processing apparatus according to any one of claims 1 to 6, further comprising a motion information acquisition unit that acquires motion information indicating motion of at least part of the body of the work learner,
wherein the difference information derivation unit derives third difference information indicating a difference between the motion information and third reference information, and
the presentation information generated by the presentation information generation unit includes information corresponding to the third difference information.

8. The information processing apparatus according to any one of claims 1 to 7, further comprising a difference analysis unit that performs determination processing by applying a determination rule to at least one of the first difference information, the second difference information, and the third difference information,
wherein the presentation information generation unit generates the presentation information by referring to the determination result of the difference analysis unit.

9. The information processing apparatus according to claim 8, wherein the difference analysis unit generates information indicating which visual target object is being viewed at each time, and
the presentation information generation unit includes, in the presentation information, the information indicating which visual target object is being viewed at each time.

10. The information processing apparatus according to claim 8, wherein the difference analysis unit generates information on how long each visual target object was viewed within a predetermined period, and
the presentation information generation unit includes, in the presentation information, the information on how long each visual target object was viewed within the predetermined period.

11. A presentation system comprising the information processing apparatus according to any one of claims 1 to 10 and a head-mounted display,
wherein the head-mounted display comprises:
an imaging unit that captures at least part of the face of the work learner;
a sound collecting unit that collects the uttered voice of the work learner; and
a presentation unit that presents the presentation information.

12. A program for causing a computer to function as the information processing apparatus according to claim 1.