JP2018049480A

JP2018049480A - Information processing apparatus, evaluation system, and program

Info

Publication number: JP2018049480A
Application number: JP2016184834A
Authority: JP
Inventors: Atsushi Ito (伊藤 篤); Yuzuru Suzuki (鈴木 譲); Yoshiyuki Kono (河野 功幸); Kosuke Maruyama (丸山 耕輔)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2016-09-21
Filing date: 2016-09-21
Publication date: 2018-03-29

Abstract

PROBLEM TO BE SOLVED: To achieve accurate evaluation by extracting, for analysis, behavior defined as non-verbal information from images obtained by photographing participants. SOLUTION: An information processing apparatus includes: a motion detection unit 230 which identifies parts of a human body shown in video data and detects the motion of each identified part; a non-verbal information extraction unit 240 which extracts behavior defined as an evaluation target for predetermined evaluation items, based on the motion of the body parts detected by the motion detection unit 230; and a reaction evaluation unit 250 which evaluates each evaluation item based on the behavior extracted by the non-verbal information extraction unit 240 and on evaluation criteria predetermined for each evaluation item. SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, an evaluation system, and a program.

Attempts have been made to photograph participants in lectures, meetings, and similar gatherings and to analyze and evaluate them through image analysis. Patent Document 1 discloses a face analysis apparatus that performs analysis of a lecture. It includes a student image storage unit that stores student images, which are moving images capturing the faces of one or more students during a lecture; a face analysis unit that recognizes the students' faces in the student images and analyzes the recognized faces; and an output unit that outputs information related to the analysis result of the face analysis unit.

Patent Document 1: JP 2013-61906 A

Depending on the situation, participants in lectures and gatherings convey non-verbal information through various reactions, such as gestures and changes in body orientation, in addition to facial expressions. Therefore, capturing the non-verbal information conveyed by participants, including body movements and not only facial features, allows more accurate analysis and evaluation. On the other hand, if the movement of a body part of interest is itself evaluated, meaningless movements that the participant did not make as non-verbal information are also included in the evaluation, which may reduce its accuracy.

An object of the present invention is to achieve highly accurate evaluation by extracting and analyzing behavior defined as non-verbal information from images obtained by photographing participants.

An information processing apparatus according to claim 1 of the present invention comprises:
a motion detection unit that identifies a part of a human body shown in video data and detects a motion of the identified part;
a behavior extraction unit that extracts, based on the motion of the part of the human body detected by the motion detection unit, behavior defined as an evaluation target for predetermined evaluation items; and
an evaluation unit that evaluates each evaluation item based on the behavior extracted by the behavior extraction unit and on evaluation criteria predetermined for each evaluation item.
The information processing apparatus according to claim 2 is the information processing apparatus according to claim 1, wherein the evaluation unit specifies, based on at least one of the type, the frequency of occurrence, and the duration of behavior defined as an evaluation target, the degree of evaluation for the evaluation item in which that behavior is an evaluation target.
The information processing apparatus according to claim 3 is the information processing apparatus according to claim 1 or claim 2, wherein, for a specific evaluation item, behavior belonging to a first category and behavior belonging to a second category, which lead to opposing evaluations, are defined, and the evaluation unit evaluates the specific evaluation item based on an evaluation derived from occurrences of behavior in the first category and an evaluation derived from occurrences of behavior in the second category.
The information processing apparatus according to claim 4 is the information processing apparatus according to any one of claims 1 to 3, wherein the evaluation unit evaluates the same evaluation item continuously and generates time-series data indicating changes in the evaluation result over time.
An evaluation system according to claim 5 comprises:
acquisition means for acquiring video data;
behavior evaluation means for analyzing the video data acquired by the acquisition means and evaluating the behavior of a person shown in the video; and
output means for outputting the evaluation result of the behavior evaluation means,
wherein the behavior evaluation means includes:
a motion detection unit that identifies a part of a human body shown in the video data and detects a motion of the identified part;
a behavior extraction unit that extracts, based on the motion of the part of the human body detected by the motion detection unit, behavior defined as an evaluation target for predetermined evaluation items; and
an evaluation unit that evaluates each evaluation item based on the behavior extracted by the behavior extraction unit and on evaluation criteria predetermined for each evaluation item.
The evaluation system according to claim 6 is the evaluation system according to claim 5, wherein, for a specific evaluation item, behavior belonging to a first category and behavior belonging to a second category, which lead to opposing evaluations, are defined, and the evaluation unit of the behavior evaluation means evaluates the specific evaluation item based on an evaluation derived from occurrences of behavior in the first category and an evaluation derived from occurrences of behavior in the second category; and
the output means includes a display unit that displays images, and outputs to the display unit an image that presents the evaluation derived from occurrences of behavior in the first category side by side with the evaluation derived from occurrences of behavior in the second category.
The evaluation system according to claim 7 is the evaluation system according to claim 5 or claim 6, wherein the output means includes a display unit that displays images, and outputs to the display unit an image that shows the results of evaluations performed continuously on the same evaluation item by the evaluation unit of the behavior evaluation means as a graph indicating changes in the evaluation result over time.
A program according to claim 8 causes a computer to function as:
motion detection means for identifying a part of a human body shown in video data and detecting a motion of the identified part;
behavior extraction means for extracting, based on the motion of the part of the human body detected by the motion detection means, behavior defined as an evaluation target for predetermined evaluation items; and
evaluation means for evaluating each evaluation item based on the behavior extracted by the behavior extraction means and on evaluation criteria predetermined for each evaluation item.
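Claims 3 and 4 can be illustrated together. For one evaluation item, behaviors are split into a first category that counts toward the item and a second category that counts against it, and the item is re-evaluated over successive time windows to produce time-series data. The minimal sketch below assumes hypothetical category sets (nodding and writing versus yawning and dozing for "concentration"), a fixed window length, and simple occurrence counts; the claims do not fix any of these choices or any formula.

```python
from collections import Counter

# Hypothetical category definitions for one evaluation item ("concentration").
FIRST_CATEGORY = {"nod", "writing"}   # behaviors taken as evidence for the item
SECOND_CATEGORY = {"yawn", "dozing"}  # behaviors taken as evidence against it


def windowed_evaluation(events, window, end_time):
    """Evaluate one item repeatedly over consecutive time windows.

    events: list of (time_in_seconds, behavior_kind) from behavior extraction.
    Returns a list of (window_start, first_count, second_count, net) tuples,
    one per window of `window` seconds covering [0, end_time).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    series = []
    start = 0
    while start < end_time:
        counts = Counter(kind for t, kind in events if start <= t < start + window)
        first = sum(counts[k] for k in FIRST_CATEGORY)
        second = sum(counts[k] for k in SECOND_CATEGORY)
        series.append((start, first, second, first - second))
        start += window
    return series


events = [(5, "nod"), (12, "nod"), (40, "writing"), (70, "yawn"),
          (75, "yawn"), (80, "dozing"), (95, "nod")]
for row in windowed_evaluation(events, window=60, end_time=120):
    print(row)
# (0, 3, 0, 3)
# (60, 1, 3, -2)
```

The two category counts are kept separate rather than reduced to the net value only. This is what would allow a display to contrast them in the way claim 6 describes.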

According to the invention of claim 1, highly accurate evaluation based on non-verbal information can be performed, compared with a configuration that evaluates the movement itself of a body part captured as video.
According to the invention of claim 2, more accurate evaluation can be performed than with a configuration that makes only a binary evaluation of whether a specific motion has been performed.
According to the invention of claim 3, highly accurate evaluation combining a plurality of evaluation items can be performed, compared with a configuration that makes only a binary evaluation of whether a specific motion has been performed.
According to the invention of claim 4, changes in the evaluation over time can be identified and highly accurate evaluation can be performed, compared with a configuration that only evaluates whether a specific motion has been performed.
According to the invention of claim 5, highly accurate evaluation based on non-verbal information can be performed, compared with a configuration that evaluates the movement itself of a body part captured as video by the acquisition means.
According to the invention of claim 6, highly accurate evaluation combining a plurality of evaluation items can be performed, compared with a configuration that makes only a binary evaluation of whether a specific motion has been performed.
According to the invention of claim 7, changes in the evaluation over time can be identified and highly accurate evaluation can be performed, compared with a configuration that only evaluates whether a specific motion has been performed.
According to the invention of claim 8, a computer executing the program of the present invention can perform highly accurate evaluation based on non-verbal information, compared with a configuration that evaluates the movement itself of a body part captured as video.

FIG. 1 is a diagram showing a configuration example of a non-verbal information evaluation system to which the present embodiment is applied.
FIG. 2 is a diagram showing an example hardware configuration of the information processing apparatus.
FIG. 3 is a diagram showing the functional configuration of the information processing apparatus.
FIG. 4 is a diagram showing an example hardware configuration of the terminal device.
FIG. 5 is a diagram showing the functional configuration of the terminal device.
FIG. 6 is a diagram for explaining a technique for identifying a region related to a human body using inter-frame feature values. FIG. 6(A) shows a person sitting sideways on a chair in one frame of a video, and FIG. 6(B) shows the same person leaning forward in another frame of the video.
FIG. 7 is a diagram showing an example of an image of evaluation subjects acquired by a video camera in a first application scenario.
FIG. 8 is a diagram showing an example of an image of evaluation subjects acquired by a video camera in a second application scenario.
FIG. 9 is a diagram showing an example of an output image of evaluation results.
FIG. 10 is a diagram showing another example of an output image of evaluation results.

<Configuration of the non-verbal information evaluation system to which the present embodiment is applied>
FIG. 1 is a diagram showing a configuration example of the non-verbal information evaluation system to which the present embodiment is applied. As shown in FIG. 1, the non-verbal information evaluation system 10 according to the present embodiment includes a video camera 100 as a video acquisition device, an information processing apparatus 200 as a video analysis device, and a terminal device 300 as an output device that outputs the analysis results of the information processing apparatus 200. The video camera 100 and the information processing apparatus 200, and the information processing apparatus 200 and the terminal device 300, are each connected via a network 20.

The network 20 is not particularly limited as long as it enables information communication between the video camera 100 and the information processing apparatus 200 and between the information processing apparatus 200 and the terminal device 300. It may be, for example, the Internet or a LAN (Local Area Network). The communication lines used may be wired or wireless. The network 20 connecting the video camera 100 and the information processing apparatus 200 and the network 20 connecting the information processing apparatus 200 and the terminal device 300 may be the same network or different networks. Although not illustrated, the network 20 is provided as appropriate with relay devices, such as gateways and hubs, for connecting networks and communication lines.

The non-verbal information evaluation system 10 of the present embodiment analyzes video of a person who is an evaluation target, or of the persons making up an evaluation target, extracts non-verbal information such as the person's movements and facial expressions, and evaluates the evaluation target based on the extracted non-verbal information. The system can be used, for example, to evaluate the state of participants in places and situations where many people gather, such as classes, lectures, events, and entertainment facilities, or to evaluate the state of an individual in situations where the target individual is fixed, such as an interview. The evaluation target, evaluation items, evaluation content, and so on are set according to what the non-verbal information evaluation system 10 is applied to and the situation in which it is used. For example, the evaluation target may be an individual person or a set of persons (a group, team, etc.). Hereinafter, a person who is an evaluation target, or a person making up a set that is an evaluation target, is referred to as an "evaluation subject". Evaluation items are set as, for example, whether the person or set of persons being evaluated is concentrating on something or is actively engaged, and the evaluation content is, for example, a judgment of the degree to which the target matches such an evaluation item. These evaluations will be described later with specific application examples.
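When the evaluation target is a set of persons rather than an individual, per-subject results must somehow be combined. The source does not say how. The minimal sketch below assumes that each subject has a degree between 0 and 1 per evaluation item and that the group value is the plain average over the subjects who have a value. The `Subject` class and the `group_degree` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Subject:
    """One evaluation subject and its degree (0 to 1) for each evaluation item."""
    name: str
    degrees: dict = field(default_factory=dict)


def group_degree(subjects, item):
    """Average one evaluation item over the subjects that have a value for it."""
    values = [s.degrees[item] for s in subjects if item in s.degrees]
    if not values:
        raise ValueError(f"no subject has a value for {item!r}")
    return mean(values)


team = [
    Subject("A", {"concentration": 0.75, "activity": 0.5}),
    Subject("B", {"concentration": 0.25}),
]
print(group_degree(team, "concentration"))  # 0.5
print(group_degree(team, "activity"))       # 0.5
```

A weighted mean, a median, or the share of subjects above a threshold would be equally plausible aggregation rules. The averaging used here is simply the easiest one to read.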

In the system shown in FIG. 1, the video camera 100 is an example of video data acquisition means. Depending on what the evaluation of the present embodiment is applied to, it is installed in a classroom, lecture hall, event hall, entertainment facility, or the like, and photographs the evaluation subjects. In the present embodiment, the video of the evaluation subjects captured by the video camera 100 is analyzed, and non-verbal information such as movements and facial expressions is extracted. Accordingly, the type and number of video cameras 100 are chosen so that the subjects' movements and facial expressions can be identified. This choice depends on the makeup of the evaluation subjects (an individual or a set of persons, etc.), the installation location, the extent of the imaging range, and so on. For example, a telephoto camera is used to photograph an individual in a large space, and a wide-angle camera is used to photograph multiple persons spread over a wide area. Multiple cameras may also be installed facing various directions to photograph various parts of the subjects' bodies. Alternatively, a high-resolution camera may photograph a wide area to obtain images of multiple persons, and the obtained image may then be enlarged so that an individual's image becomes the analysis target. In the present embodiment, the video camera 100 also has a function of transmitting the captured video as digital data to the information processing apparatus 200 via the network 20.
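One option mentioned above is to crop a single person out of a wide high-resolution shot and enlarge the crop for analysis. The sketch below illustrates this under stated assumptions. Frames are modeled as plain nested lists of pixel values, the enlargement is nearest-neighbour replication by an integer factor, and `crop_and_enlarge` is a hypothetical helper. A real system would normally rely on an image-processing library instead.

```python
def crop_and_enlarge(frame, top, left, height, width, scale):
    """Cut a height x width window out of `frame` (a list of pixel rows) and
    enlarge it by an integer `scale` using nearest-neighbour replication."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("crop window lies outside the frame")
    crop = [row[left:left + width] for row in frame[top:top + height]]
    enlarged = []
    for row in crop:
        wide = [px for px in row for _ in range(scale)]  # repeat pixels horizontally
        enlarged.extend(list(wide) for _ in range(scale))  # repeat rows vertically
    return enlarged


# A 4 x 6 "frame" whose pixel value encodes its position (row * 10 + col).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
person = crop_and_enlarge(frame, top=1, left=2, height=2, width=2, scale=2)
for row in person:
    print(row)
# [12, 12, 13, 13]
# [12, 12, 13, 13]
# [22, 22, 23, 23]
# [22, 22, 23, 23]
```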

The information processing apparatus 200 is an example of behavior evaluation means. It is a computer (server) that analyzes the video captured by the video camera 100, extracts non-verbal information about the evaluation subjects, and evaluates that information. The information processing apparatus 200 may be a single computer or multiple computers connected to the network 20. In the latter case, the functions of the information processing apparatus 200 of the present embodiment described below are realized by distributed processing across those computers.

FIG. 2 is a diagram showing an example hardware configuration of the information processing apparatus 200. As shown in FIG. 2, the information processing apparatus 200 includes a CPU (Central Processing Unit) 201 serving as control and computation means, a RAM 202, a ROM 203, an external storage device 204, and a network interface 205. The CPU 201 performs various control and arithmetic processing by executing programs stored in the ROM 203. The RAM 202 is used as working memory for the control and arithmetic processing by the CPU 201. The ROM 203 stores the programs executed by the CPU 201 and various data used in control. The external storage device 204 is realized by, for example, a magnetic disk device or a readable and writable non-volatile semiconductor memory. It stores programs that are loaded into the RAM 202 and executed by the CPU 201, as well as the results of arithmetic processing by the CPU 201. The network interface 205 connects to the network 20 and exchanges data with the video camera 100 and the terminal device 300. The configuration shown in FIG. 2 is merely one example of a hardware configuration for implementing the information processing apparatus 200 on a computer. The specific configuration of the information processing apparatus 200 is not limited to it, as long as the functions described below can be realized.

FIG. 3 is a diagram showing the functional configuration of the information processing apparatus 200. As shown in FIG. 3, the information processing apparatus 200 includes a video data acquisition unit 210, a region identification unit 220, a motion detection unit 230, a non-verbal information extraction unit 240, a reaction evaluation unit 250, and an output unit 260.

The video data acquisition unit 210 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program and controlling the network interface 205. The video data acquisition unit 210 receives video data from the video camera 100 via the network 20. The received video data is stored, for example, in the RAM 202 or the external storage device 204 shown in FIG. 2.

The region identification unit 220 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program. The region identification unit 220 analyzes the video acquired by the video data acquisition unit 210. It identifies the regions showing those parts of the evaluation subject from which the downstream non-verbal information extraction unit 240 will extract non-verbal information. Specifically, it identifies regions showing the whole human body; the head, torso, arms, hands, fingers, and so on; the face, eyes, mouth, nose, ears, and so on of the head; the upper and lower body; and other feature points of the body. Hereinafter, the whole human body and its individual portions are referred to collectively as parts or body parts, without distinction. The unit may identify all predetermined parts, or only the parts used in the extraction by the downstream non-verbal information extraction unit 240 and the evaluation by the reaction evaluation unit 250.
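The brief description of FIG. 6 mentions identifying human-body regions from inter-frame feature values, but this section does not detail the technique. As a stand-in chosen here as an assumption, the sketch below uses the simplest such feature, the absolute grayscale difference between two frames. Pixels whose change exceeds a threshold are treated as belonging to a moving body, and their bounding box is returned. The threshold value and the `changed_region` name are hypothetical.

```python
def changed_region(prev, curr, threshold=30):
    """Return (top, left, bottom, right), inclusive, of the pixels whose
    grayscale value changed by more than `threshold` between two equally
    sized frames, or None if nothing changed enough."""
    hits = [
        (r, c)
        for r, (row_p, row_c) in enumerate(zip(prev, curr))
        for c, (p, q) in enumerate(zip(row_p, row_c))
        if abs(p - q) > threshold
    ]
    if not hits:
        return None
    rs = [r for r, _ in hits]
    cs = [c for _, c in hits]
    return min(rs), min(cs), max(rs), max(cs)


# Static background of 0s; in the second frame a "person" (value 200)
# has leaned into rows 1-2, columns 3-4.
prev = [[0] * 6 for _ in range(4)]
curr = [row[:] for row in prev]
for r in (1, 2):
    for c in (3, 4):
        curr[r][c] = 200
print(changed_region(prev, curr))  # (1, 3, 2, 4)
```

Frame differencing only finds regions that change. A person sitting perfectly still would not be detected, which is one reason practical systems combine it with appearance-based detectors.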

The motion detection unit 230 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program. Based on the identification result of the region identification unit 220, the motion detection unit 230 identifies the body part shown in each region and detects the motion of each identified part. Specifically, it detects motions such as head movements, face orientation, movements of facial components (eyes, mouth, etc.), arm and leg movements, body orientation, and movement of the whole body (walking around, etc.). The unit may detect all predetermined motions of the predetermined parts, or only the motions of the parts used in the extraction by the downstream non-verbal information extraction unit 240 and the evaluation by the reaction evaluation unit 250.
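In the simplest case, detecting the motion of each identified part reduces to tracking one representative point per part from frame to frame. The sketch below assumes a hypothetical input format: one dictionary of 2-D keypoints per frame, keyed by part name, such as a pose estimator might output. For each part, it reports the frames in which the part moved farther than a pixel threshold since the previous frame. The format, the threshold, and `moving_frames` are illustrative assumptions.

```python
from math import hypot


def moving_frames(keypoints, min_shift=5.0):
    """keypoints: list of {part_name: (x, y)}, one dict per frame.

    Returns {part_name: [indices of frames in which the part moved more than
    `min_shift` pixels relative to the previous frame]}.
    """
    result = {}
    for i in range(1, len(keypoints)):
        prev, curr = keypoints[i - 1], keypoints[i]
        for part in curr.keys() & prev.keys():  # parts visible in both frames
            (x0, y0), (x1, y1) = prev[part], curr[part]
            if hypot(x1 - x0, y1 - y0) > min_shift:
                result.setdefault(part, []).append(i)
    return result


frames = [
    {"head": (100, 50), "right_hand": (140, 160)},
    {"head": (100, 58), "right_hand": (141, 160)},  # head drops 8 px
    {"head": (100, 50), "right_hand": (141, 120)},  # head back up, hand raised
]
print(moving_frames(frames))
```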

The non-verbal information extraction unit 240 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program. Based on the movements of the parts detected by the motion detection unit 230, the non-verbal information extraction unit 240 extracts, from among the evaluation subject's behaviors, those that the reaction evaluation unit 250 uses to evaluate each evaluation item (non-verbal information). In other words, the non-verbal information extraction unit 240 is a behavior extraction unit that extracts behavior defined as non-verbal information conveyed by the evaluation subject. Specifically, it extracts, for example: nodding; turning the face toward a specific direction or changing its orientation; changes in facial expression; moving the mouth to speak; yawning; movements made while dozing; winking; raising a hand; writing; typing on a keyboard; turning around; and jiggling a leg.
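As an example of turning raw motion into a behavior defined as non-verbal information, a nod can be approximated as follows. The head drops by at least some amount and then returns close to its starting height within a short window. The thresholds, the frame-count window, and the `count_nods` function below are assumptions for illustration, not the method of the source. Image y-coordinates are assumed to grow downward, so a drop is an increase in y.

```python
def count_nods(head_y, min_drop=6, max_frames=10, tolerance=2):
    """Count nods in a sequence of head y-coordinates (pixels, y grows downward).

    A nod is counted when, starting from a baseline frame, the head goes down
    by at least `min_drop` pixels and then comes back to within `tolerance`
    pixels of the baseline no more than `max_frames` frames later.
    """
    nods = 0
    i = 0
    while i < len(head_y):
        base = head_y[i]
        went_down = False
        found_at = None
        for j in range(i + 1, min(i + 1 + max_frames, len(head_y))):
            if head_y[j] - base >= min_drop:
                went_down = True
            elif went_down and abs(head_y[j] - base) <= tolerance:
                found_at = j
                break
        if found_at is None:
            i += 1  # no nod starts here; try the next frame as baseline
        else:
            nods += 1
            i = found_at  # the return point becomes the next baseline
    return nods


track = [50, 50, 58, 59, 51, 50, 50, 57, 50, 50]
print(count_nods(track))  # 2
```

Requiring a return to the baseline is what separates a nod from simply looking down at a desk. This mirrors the source's point that raw movement alone should not be evaluated.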

The reaction evaluation unit 250 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program. The reaction evaluation unit 250 evaluates the evaluation subject's reaction by applying evaluation criteria, predetermined for each evaluation item, to the non-verbal behaviors extracted for that item by the non-verbal information extraction unit 240. The evaluation content is set according to the scene (class, lecture, event, etc.) to which the non-verbal information evaluation system 10 of the present embodiment is applied. Specifically, it evaluates, for example, degree of concentration, degree of activity, degree of progress, proactiveness, and responsiveness.
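A sketch of per-item evaluation criteria, in the spirit of claim 2, is given below. The degree for an item depends on the type, frequency, and duration of the extracted behaviors. The criteria table, its weights, and the scoring rule are invented for illustration and are not taken from the source. Each behavior kind has a signed weight per item, and each occurrence contributes that weight times a fixed per-occurrence term plus a per-second term.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    kind: str        # behavior label from the extraction step, e.g. "nod"
    duration: float  # seconds the behavior lasted


# Hypothetical evaluation criteria: for each evaluation item, a signed weight
# per behavior kind. Kinds not listed are not evaluation targets for that item.
CRITERIA = {
    "concentration": {"nod": 1.0, "writing": 0.5, "yawn": -1.0, "dozing": -2.0},
    "activity": {"raise_hand": 2.0, "speaking": 1.0},
}


def evaluate(behaviors, item, per_occurrence=1.0, per_second=0.5):
    """Score one evaluation item from the type (weight), frequency (one term
    per occurrence), and duration (per-second term) of extracted behaviors."""
    weights = CRITERIA[item]
    score = 0.0
    for b in behaviors:
        weight = weights.get(b.kind)
        if weight is None:
            continue  # not an evaluation target for this item
        score += weight * (per_occurrence + per_second * b.duration)
    return score


observed = [Behavior("nod", 2), Behavior("writing", 10),
            Behavior("yawn", 2), Behavior("raise_hand", 2)]
print(evaluate(observed, "concentration"))  # 3.0
print(evaluate(observed, "activity"))       # 4.0
```

Keeping the criteria in a table separate from the scoring code reflects the source's statement that evaluation content is set per application scene. A different scene would swap the table, not the function.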

The output unit 260 is realized, for example, by the CPU 201 of the computer shown in FIG. 2 executing a program and controlling the network interface 205. The output unit 260 transmits information on the evaluation results of the reaction evaluation unit 250 to the terminal device 300 via the network 20.
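The source does not specify how evaluation results are encoded for transmission to the terminal device 300. One plausible choice, assumed here, is a JSON document that carries, for each evaluation item, a time series of degrees. The field names (`subject`, `items`, `t`, `degree`) and both helper functions are hypothetical. The decoder shows what the receiving side might do.

```python
import json


def encode_results(subject_id, item_series):
    """item_series: {evaluation_item: [(time_s, degree), ...]}.
    Returns a UTF-8 JSON payload suitable for sending over the network."""
    payload = {
        "subject": subject_id,
        "items": {
            item: [{"t": t, "degree": d} for t, d in series]
            for item, series in item_series.items()
        },
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_results(data):
    """Inverse of encode_results, as the terminal side might apply it."""
    payload = json.loads(data.decode("utf-8"))
    return payload["subject"], {
        item: [(p["t"], p["degree"]) for p in points]
        for item, points in payload["items"].items()
    }


blob = encode_results("class-3A", {"concentration": [(0, 0.5), (60, 0.75)]})
print(decode_results(blob))
# ('class-3A', {'concentration': [(0, 0.5), (60, 0.75)]})
```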

The terminal device 300 is an example of output means. It is an information terminal (client) that outputs the evaluation results of the information processing apparatus 200. As the terminal device 300, a device equipped with image display means as its output means is used, such as a personal computer, a tablet terminal, or a smartphone.

FIG. 4 is a diagram showing an example hardware configuration of the terminal device 300. As shown in FIG. 4, the terminal device 300 includes a CPU 301, a RAM 302, a ROM 303, a display device 304, an input device 305, and a network interface 306. The CPU 301 performs various control and arithmetic processing by executing programs stored in the ROM 303. The RAM 302 is used as working memory for the control and arithmetic processing by the CPU 301. The ROM 303 stores the programs executed by the CPU 301 and various data used in control. The display device 304 is, for example, a liquid crystal display, and displays images under the control of the CPU 301. The input device 305 is realized by an input device such as a keyboard, mouse, or touch sensor, and accepts input operations from the operator. For example, when the terminal device 300 is a tablet terminal or a smartphone, a touch panel combining a liquid crystal display and a touch sensor functions as both the display device 304 and the input device 305. The network interface 306 connects to the network 20 and exchanges data with the information processing apparatus 200. The configuration shown in FIG. 4 is merely one example of a hardware configuration for implementing the terminal device 300 on a computer. The specific configuration of the terminal device 300 is not limited to it, as long as the functions described below can be realized.

FIG. 5 illustrates the functional configuration of the terminal device 300. As illustrated in FIG. 5, the terminal device 300 of the present embodiment includes an evaluation result acquisition unit 310, a display image generation unit 320, a display control unit 330, and an operation reception unit 340.

The evaluation result acquisition unit 310 is realized, for example in the computer shown in FIG. 4, by the CPU 301 executing a program and controlling the network interface 306. The evaluation result acquisition unit 310 receives evaluation result data from the information processing apparatus 200 via the network 20. The received evaluation result data is stored, for example, in the RAM 302 shown in FIG. 4.

The display image generation unit 320 is realized, for example in the computer shown in FIG. 4, by the CPU 301 executing a program. The display image generation unit 320 generates an output image showing the evaluation results on the basis of the evaluation result data acquired by the evaluation result acquisition unit 310. The layout and display style of the generated output image can be set according to the evaluation items, the content of the evaluation, and so on. The output image is described in detail later.

The display control unit 330 is realized, for example in the computer shown in FIG. 4, by the CPU 301 executing a program. The display control unit 330 causes the output image generated by the display image generation unit 320 to be displayed, for example, on the display device 304 of the computer shown in FIG. 4. The display control unit 330 also accepts commands relating to the display on the display device 304 and, based on the accepted commands, performs control such as switching the display.

The operation reception unit 340 is realized, for example in the computer shown in FIG. 4, by the CPU 301 executing a program. The operation reception unit 340 accepts input operations performed by the operator with the input device 305. The display control unit 330 then controls the display of the output image and other content on the display device 304 in accordance with the operations accepted by the operation reception unit 340.

<Processing by the region identification unit>
The processing performed by the region identification unit 220 of the information processing apparatus 200 is described below. From the moving image captured by the video camera 100, the region identification unit 220 identifies the body parts involved in the movements of the persons appearing in it. Various existing image analysis techniques may be used for this identification. For example, faces and smiles may be identified with existing methods such as those implemented in digital cameras. Regions in which body parts appear may also be identified from parts (regions) of specific shapes in the moving image and from the arrangement of several such parts. As a further example, identification may be based on inter-frame features. Specifically, inter-frame features are computed from the differences between two or more consecutive frames of the moving image data. Examples of inter-frame features include color boundaries (edges), the amount of color change, and the direction and amount of movement of the regions delimited by these. The inter-frame features are accumulated over a preset period and then classified and merged according to the distance or similarity between the features of each frame. In this way, regions that change in concert across the moving image are found, and the regions in which body parts appear are identified.
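The inter-frame approach can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the embodiment's actual algorithm: frames are assumed to be small grayscale images given as lists of rows, the only inter-frame feature used is a thresholded absolute intensity difference (edges and movement direction are omitted), and grouping changed pixels into 4-connected regions stands in for the "classify and merge" step. The function names and the threshold value are hypothetical.

```python
from collections import deque

Frame = list[list[int]]  # assumed format: grayscale image as rows of pixel intensities


def accumulate_motion(frames: list[Frame], diff_threshold: int = 30) -> list[list[int]]:
    """Count, per pixel, how many consecutive frame pairs differ by more than diff_threshold."""
    height, width = len(frames[0]), len(frames[0][0])
    counts = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                if abs(curr[y][x] - prev[y][x]) > diff_threshold:
                    counts[y][x] += 1
    return counts


def moving_regions(counts: list[list[int]], min_count: int = 1) -> list[tuple[int, int, int, int]]:
    """Group pixels that changed at least min_count times into 4-connected regions.

    Returns one bounding box (x0, y0, x1, y1) per region, in raster-scan order.
    """
    height, width = len(counts), len(counts[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or counts[y][x] < min_count:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill over moving pixels
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and counts[ny][nx] >= min_count):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

For example, if a bright 2x2 block appears at rows and columns 1 to 2 between two otherwise black 5x5 frames, `moving_regions(accumulate_motion([f0, f1]))` returns `[(1, 1, 2, 2)]`. A practical system would use richer features (color, edge orientation, motion vectors) and a real clustering step.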

FIG. 6 illustrates a method of identifying a region related to a human body using inter-frame features. FIG. 6(A) shows one frame of the moving image in which a person sits sideways on a chair, and FIG. 6(B) shows another frame in which the same person has leaned forward. In the example of FIG. 6, the region identification unit 220 first identifies ranges of similar color in FIG. 6(A) on the basis of the color boundaries and amounts of change. It then compares the frame of FIG. 6(A) with the frame of FIG. 6(B) and, from the direction and amount of movement of each corresponding color range, recognizes that several color ranges move together within the region 221 enclosed by the broken-line frame. It identifies this region 221 as the region in which the upper body of the person appears. As FIGS. 6(A) and 6(B) show, the position and size of the region 221 change as the color ranges making up the upper body move. Although two frames are compared here, the region in which the human body appears may instead be identified from inter-frame features, such as changes in the color ranges, accumulated by comparing three or more frames.
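The grouping of color ranges that move together, as in FIG. 6, might be sketched as below. The sketch assumes that color ranges have already been found in each frame and matched across frames by a hypothetical `label`; it then keeps ranges whose displacement is non-trivial (so a static chair is ignored) and returns the bounding box of the largest set whose displacement vectors agree within a tolerance. The tolerance values and names are illustrative assumptions, not taken from the source.

```python
import math
from dataclasses import dataclass
from typing import Optional

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class ColorRange:
    label: str  # hypothetical ID used to match the same color range across frames
    box: Box    # bounding box of the range in one frame


def _center(box: Box) -> tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def co_moving_region(before: list[ColorRange], after: list[ColorRange],
                     tol: float = 5.0, min_motion: float = 1.0) -> Optional[Box]:
    """Bounding box (in `after`) of the largest group of color ranges that moved together."""
    previous = {r.label: r for r in before}
    moves = []  # (color range in `after`, displacement vector)
    for r in after:
        if r.label not in previous:
            continue
        (ax, ay), (bx, by) = _center(previous[r.label].box), _center(r.box)
        d = (bx - ax, by - ay)
        if math.hypot(*d) >= min_motion:  # ignore static background
            moves.append((r, d))
    best: list[ColorRange] = []
    for _, ref in moves:  # try each displacement as the reference motion
        group = [r for r, d in moves if math.dist(d, ref) <= tol]
        if len(group) > len(best):
            best = group
    if len(best) < 2:  # a single moving range is not "moving in concert"
        return None
    return (min(r.box[0] for r in best), min(r.box[1] for r in best),
            max(r.box[2] for r in best), max(r.box[3] for r in best))
```

With a head range shifted 5 pixels right, a torso range shifted 4 pixels right, and an unchanged chair range, only the head and torso are grouped, and their combined box plays the role of the region 221.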

<Processing by the motion detection unit>
The processing performed by the motion detection unit 230 is described below. The motion detection unit 230 analyzes each region identified by the region identification unit 220 as showing a body part, determines which specific part appears there, and detects the movement of each identified part. Various existing image analysis techniques may be used for this motion detection. The detected movements are those that each identified part can physically make: for example, closing the eyes or opening the mouth, shifts in gaze, turning the face up, down, left, or right, bending and extending the elbows or swinging the arms, bending and extending the fingers or opening and closing the hands, bending at the waist or twisting the body, bending and extending the knees or swinging the legs, and moving the whole body by walking. These movements are merely examples; the movements that the non-linguistic information evaluation system 10 of the present embodiment can detect are not limited to them. In the present embodiment, the motion detection unit 230 may detect the movements of every part identified as a region by the region identification unit 220, or it may limit detection to the movements needed to identify the behaviors extracted by the downstream non-linguistic information extraction unit 240. For example, if the non-linguistic information extraction unit 240 extracts only nodding, it suffices to detect head movements that turn the face up and down.
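Restricting detection to what the downstream extraction needs, and labeling vertical head movement for the nodding case, could look like the sketch below. It assumes that a head pitch angle per frame (in degrees, positive meaning the face is tilted down) is already available from some existing face analysis method, which is not shown. The part mapping and the `min_step` threshold are hypothetical.

```python
# Hypothetical mapping: body parts whose movement each downstream behavior needs.
PARTS_FOR_BEHAVIOR = {
    "nod": {"head"},
    "yawn": {"mouth"},
    "raise_hand": {"arm"},
    "dozing": {"head", "eyes"},
}


def parts_to_track(behaviors: set[str]) -> set[str]:
    """Union of the parts required by the behaviors the extraction stage will look for."""
    parts: set[str] = set()
    for behavior in behaviors:
        parts |= PARTS_FOR_BEHAVIOR[behavior]
    return parts


def head_vertical_motion(pitch_deg: list[float], min_step: float = 2.0) -> list[str]:
    """Label each frame-to-frame change in head pitch as "down", "up", or "still"."""
    labels = []
    for prev, curr in zip(pitch_deg, pitch_deg[1:]):
        delta = curr - prev
        if delta > min_step:
            labels.append("down")
        elif delta < -min_step:
            labels.append("up")
        else:
            labels.append("still")
    return labels
```

If only nodding is to be extracted, `parts_to_track({"nod"})` yields `{"head"}`, so only the head's vertical motion needs to be computed.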

<Processing by the non-linguistic information extraction unit>
The processing performed by the non-linguistic information extraction unit 240 is described below. On the basis of the movements of parts detected by the motion detection unit 230, the non-linguistic information extraction unit 240 extracts, as non-linguistic information, meaningful behaviors that the evaluation subject performs consciously or unconsciously. For example, it extracts nodding from movements that turn the face up and down, speaking or yawning from movements of the mouth, and raising a hand from movements that raise an arm. The extraction is not based solely on the movement of the part detected by the motion detection unit 230; it also takes into account information such as the movement of the same part before and after the detected movement, the movements of surrounding parts and of other persons, and the scene or context (background) in which the movement was detected. For example, when the face turns up and down repeatedly within a specific time, the movement is extracted as nodding. In contrast, when the face turns upward, stays there for some time, and then moves back down to its original position, the movement is extracted as looking up to think. When the face turns downward and stays there for some time, the movement is extracted as a behavior indicating that the person is dozing. These behaviors and contextual factors are merely examples; the behaviors that the non-linguistic information evaluation system 10 of the present embodiment can extract as non-linguistic information, and the information it can take into account, are not limited to them.
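The three head-pitch rules above can be sketched as a small rule-based classifier. This is an illustration under assumptions, not the embodiment's method: the input is a per-frame head pitch in degrees (positive meaning the face is down) sampled at a known frame rate, all thresholds and durations are hypothetical, and a single brief dip that returns to neutral is treated as a nod (a stricter rule could require several dips within the time limit).

```python
from itertools import groupby


def _state(pitch_deg: float, threshold: float) -> str:
    # Assumed convention: positive pitch = face tilted down, negative = tilted up.
    if pitch_deg > threshold:
        return "down"
    if pitch_deg < -threshold:
        return "up"
    return "neutral"


def extract_behaviors(pitch_deg: list[float], fps: float = 10.0, threshold: float = 10.0,
                      nod_max_s: float = 1.0, think_min_s: float = 2.0,
                      doze_min_s: float = 5.0) -> list[str]:
    """Rule-based extraction of nodding, looking up to think, and dozing from head pitch."""
    # Collapse per-frame states into runs of (state, duration in seconds).
    runs = [(state, len(list(group)) / fps)
            for state, group in groupby(_state(p, threshold) for p in pitch_deg)]
    behaviors = []
    for i, (state, duration) in enumerate(runs):
        returns_to_neutral = i + 1 < len(runs) and runs[i + 1][0] == "neutral"
        if state == "down" and duration >= doze_min_s:
            behaviors.append("dozing")               # face kept down for a long time
        elif state == "down" and duration <= nod_max_s and returns_to_neutral:
            behaviors.append("nod")                  # brief dip that comes back up
        elif state == "up" and duration >= think_min_s and returns_to_neutral:
            behaviors.append("looking_up_to_think")  # held upward, then back
    return behaviors
```

Context such as what other people are doing, or the phase of the lecture, is not modeled here; in practice it would be added as further conditions on each rule.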

<Processing by the reaction evaluation unit>
The processing performed by the reaction evaluation unit 250 is described below. On the basis of the non-linguistic information extracted by the non-linguistic information extraction unit 240, the reaction evaluation unit 250 evaluates the reactions of the evaluation subjects in a way suited to the scene in which the non-linguistic information evaluation system 10 is used. For example, when evaluating students' reactions to a lecture, the degree of concentration on the lecture is an evaluation item. In a participatory class, the evaluation items include each student's degree of concentration and proactiveness and the activity level of the class as a whole. Alternatively, the system may simply evaluate whether a reaction is positive or negative with respect to the purpose of the scene in which it is used. The form of the evaluation result can be chosen according to the evaluation item, the purpose of the evaluation, and so on. For example, the evaluation may be binary, such as whether or not a student spoke during class, or multi-valued, rating in stages the degree to which an evaluation item such as concentration or proactiveness is achieved. The reaction evaluation unit 250 may also evaluate continuously over the period during which a lecture or class takes place, generating time-series evaluation information that changes over time.
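The difference between a binary evaluation and a staged, time-series evaluation can be illustrated with the sketch below. It assumes that occurrences of a reaction behavior are given as timestamps in seconds from the start of the session; the one-minute window and the stage thresholds are hypothetical choices, not values from the source.

```python
import math


def spoke_during_class(speech_times_s: list[float]) -> bool:
    """Binary evaluation: did the student speak at least once?"""
    return len(speech_times_s) > 0


def windowed_levels(event_times_s: list[float], total_s: float, window_s: float = 60.0,
                    thresholds: tuple[int, ...] = (1, 3, 5)) -> list[int]:
    """Multi-valued, time-series evaluation.

    Splits the session into fixed windows, counts the behavior's occurrences in each,
    and maps each count to a stage 0..len(thresholds) (the number of thresholds reached).
    """
    counts = [0] * math.ceil(total_s / window_s)
    for t in event_times_s:
        if 0 <= t < total_s:
            counts[int(t // window_s)] += 1
    return [sum(count >= th for th in thresholds) for count in counts]
```

For a three-minute session with three occurrences in the first minute, one in the second, and five in the third, `windowed_levels(...)` returns the stages `[2, 1, 3]`, which could be plotted as a time series.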

When a multi-valued evaluation is performed, the types of behavior defined (and extracted) as the non-linguistic information used for each evaluation item of the reaction evaluation unit 250 (hereinafter, reaction behaviors), and the ways in which those reaction behaviors appear, are set for each evaluation item. In other words, the same reaction behavior can be evaluated differently depending on how it appears. For example, a specific reaction behavior extracted as non-linguistic information is evaluated differently when it is performed once than when it is repeated several times or continues for a certain time or longer.
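One way to make "the same behavior, evaluated differently by how it appears" concrete is to classify the appearance mode first and score the mode. In this sketch the occurrences of one reaction behavior are assumed to be (start, end) intervals in seconds; the mode names, the repetition count, the sustained-duration threshold, and the score table are all hypothetical.

```python
# Hypothetical scores per appearance mode for a single reaction behavior.
MODE_SCORE = {"absent": 0, "single": 1, "occasional": 2, "repeated": 3, "sustained": 4}


def appearance_mode(intervals: list[tuple[float, float]], repeat_min: int = 3,
                    sustain_min_s: float = 10.0) -> str:
    """Classify how one reaction behavior appeared, given its (start_s, end_s) occurrences."""
    if not intervals:
        return "absent"
    if any(end - start >= sustain_min_s for start, end in intervals):
        return "sustained"       # continued for a certain time or longer
    if len(intervals) >= repeat_min:
        return "repeated"        # repeated several times
    return "single" if len(intervals) == 1 else "occasional"
```

A single one-second occurrence is therefore scored `MODE_SCORE["single"]`, while one lasting twelve seconds is scored `MODE_SCORE["sustained"]`.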

Furthermore, in a multi-valued evaluation, the degree of evaluation may be determined from, for example, the type, frequency of appearance, and duration of the reaction behaviors for the evaluation item. As an example, consider a case in which nodding and writing are defined as the reaction behaviors used to evaluate the degree of concentration, and nodding is taken to indicate a higher degree of concentration than writing. In terms of behavior type, the appearance of nodding is then evaluated as a higher degree of concentration (a higher degree of evaluation) than the appearance of writing. In terms of frequency, more nods within a given time are evaluated as a higher degree of concentration than fewer nods. In terms of duration, a longer period of writing within a given time is evaluated as a higher degree of concentration than a shorter one. These evaluation items and methods are merely examples; the evaluation items and methods available to the non-linguistic information evaluation system 10 of the present embodiment are not limited to them.
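The three orderings in this example (type, frequency, duration) can all be satisfied by a simple weighted score, sketched below under assumed weights. Each nod is worth a hypothetical 2 points, while writing contributes at most 1 point per window in proportion to its duration, so any nod outranks any amount of writing, more nods rank higher, and longer writing ranks higher.

```python
NOD_POINTS = 2.0          # hypothetical points per nod
WRITING_MAX_POINTS = 1.0  # hypothetical cap for writing within one window


def concentration(nod_count: int, writing_s: float, window_s: float = 60.0) -> float:
    """Degree of concentration within one window of window_s seconds."""
    writing_fraction = min(max(writing_s, 0.0), window_s) / window_s  # clip to the window
    return NOD_POINTS * nod_count + WRITING_MAX_POINTS * writing_fraction
```

Capping the writing contribution below the value of a single nod is what encodes "nodding indicates higher concentration than writing"; other weightings are equally possible.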

<Application examples>
The extraction of non-linguistic information and the evaluation of reactions are now described further with reference to specific application scenes. The first application scene is one, such as a lecture or a talk, in which the speaker and the audience (listeners) are clearly separated and almost only the speaker talks. In this scene (case), information flows predominantly in one direction, from the speaker to the audience. Here, the attendees are the evaluation subjects.

FIG. 7 shows an example image of the evaluation subjects captured by the video camera 100 in the first application scene. In the example of FIG. 7, attendees seated in four rows of four, all facing the same direction, are photographed from the speaker's side. Each attendee therefore faces the video camera 100 as a rule. In this example, each of the 16 attendees is an evaluation subject.

In the information processing apparatus 200, when the moving image data acquisition unit 210 receives the moving image data captured by the video camera 100, the region identification unit 220 identifies, in the acquired moving image, the regions in which each attendee (evaluation subject) appears. Here, regions of parts such as the upper body, face, eyes, mouth, and head are identified, together with the orientation of the face. The motion detection unit 230 then detects the movement of each part of each attendee on the basis of the regions identified by the region identification unit 220.

Next, on the basis of the per-part movements of each attendee detected by the motion detection unit 230, the non-linguistic information extraction unit 240 extracts specific behaviors of each attendee as reaction behaviors representing non-linguistic information. For example, nodding, yawning, closing the eyes, looking down, and laughing are extracted as reaction behaviors.

Next, the reaction evaluation unit 250 evaluates the reaction behaviors extracted as non-linguistic information by the non-linguistic information extraction unit 240. In one evaluation method, for a given evaluation item, behaviors belonging to a first category and behaviors belonging to a second category, which lead to opposing evaluations, are defined in advance; the evaluation based on the appearance of first-category behaviors and the evaluation based on the appearance of second-category behaviors are then combined into the evaluation result for that item. As an example, consider defining positive and negative reaction behaviors. Nodding and laughing, for instance, are evaluated as positive reactions, and the more frequently or the longer each of them appears, the higher the evaluation. Yawning, closing the eyes, and looking down, on the other hand, are evaluated as negative reactions, and the more frequently or the longer each of them appears, the lower the evaluation.
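Combining the two opposing categories into one result might look like the sketch below. The category membership follows the examples in the text; the weighting (each occurrence contributes 1 plus its duration in seconds, so both frequency and duration count) and the use of a simple difference as the combined result are assumptions.

```python
POSITIVE = {"nod", "laugh"}                          # first category (examples from the text)
NEGATIVE = {"yawn", "eyes_closed", "looking_down"}   # second category (examples from the text)


def integrate(events: list[tuple[str, float]]) -> tuple[float, float, float]:
    """events: (behavior, duration_s) pairs.

    Returns (positive score, negative score, combined score). Each occurrence adds
    1 + duration to its category (an assumed weighting); unknown behaviors are ignored.
    """
    pos = sum(1.0 + float(d) for b, d in events if b in POSITIVE)
    neg = sum(1.0 + float(d) for b, d in events if b in NEGATIVE)
    return pos, neg, pos - neg
```

For a one-second nod, a two-second laugh, and a three-second yawn, the positive score is 5.0, the negative score is 4.0, and the combined score is 1.0.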

A behavior defined as a positive reaction behavior is always evaluated as a positive reaction whenever it appears. For example, if nodding is defined as a positive reaction behavior as above, any nod by an evaluation subject is evaluated as a positive reaction. The degree of the positive evaluation then varies with the manner of the nod: nodding repeatedly, or with a large, slow movement, is evaluated as highly positive, whereas a small, light nod is evaluated as positive but to a low degree. Conversely, a behavior defined as a negative reaction behavior is always evaluated as a negative reaction whenever it appears. For example, if closing the eyes is defined as a negative reaction behavior as above, any closing of the eyes by an evaluation subject is evaluated as a negative reaction, and the degree of the negative evaluation varies with the manner in which the eyes are closed: eyes kept closed for a long time are evaluated as highly negative, whereas eyes closed only briefly are evaluated as negative but to a low degree. In addition to the binary evaluation of positive (high) or negative (low), an intermediate evaluation may be added for cases in which neither positive nor negative reaction behaviors appear very often.
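The rule that the category of a behavior is fixed while only its degree varies, plus the optional intermediate verdict, can be sketched as follows. The minimum number of reaction behaviors needed before a positive or negative verdict is given, and the three-second cut-off for a "long" eye closure, are hypothetical parameters.

```python
def verdict(pos_count: int, neg_count: int, min_total: int = 3) -> str:
    """Three-level verdict: positive, negative, or intermediate."""
    if pos_count + neg_count < min_total:
        return "intermediate"  # too few reaction behaviors either way
    if pos_count > neg_count:
        return "positive"
    if neg_count > pos_count:
        return "negative"
    return "intermediate"


def eyes_closed_degree(closed_s: float, long_s: float = 3.0) -> tuple[str, str]:
    """Closing the eyes is always negative; only the degree depends on the duration."""
    return ("negative", "high" if closed_s >= long_s else "low")
```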

Besides the above example, reaction behaviors for evaluating proactiveness and reaction behaviors for evaluating passivity may be defined, and the subject may be evaluated as proactive (high), passive (low), or intermediate according to the frequency and duration of the corresponding reaction behaviors. Likewise, meaningful and meaningless behaviors may be defined, and the evaluation may be high, low, or intermediate according to the frequency and duration of the meaningful behaviors and of the meaningless behaviors.

Furthermore, instead of the binary evaluations described above, the evaluation may measure how strongly a particular characteristic is present. For example, reaction behaviors for evaluating the degree of understanding are defined; if none of them appears, the degree of understanding is 0 (zero), and the more frequently or the longer they appear, the higher the degree of understanding. Similarly, reaction behaviors are defined for evaluating the activity level, and the activity level is evaluated as higher the more frequently or the longer they appear. Reaction behaviors are likewise defined for evaluating the degree of facilitation, the degree of concentration, and the degree of calmness, and each degree is evaluated as higher the more frequently or the longer the corresponding reaction behaviors appear. When an evaluation subject is evaluated on several evaluation items, a single behavior representing non-linguistic information may be defined as a reaction behavior for more than one of those items.
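Degree-style evaluation, including one behavior counting toward several items, can be sketched with a simple mapping. The items and their behavior sets below are hypothetical; the degree of each item is just the total number of occurrences of its behaviors, so it is 0 when none appeared. Durations could be summed in the same way.

```python
# Hypothetical definitions: evaluation item -> reaction behaviors counted for it.
# "nod" is deliberately shared by two items, as the text allows.
ITEM_BEHAVIORS = {
    "understanding": {"nod"},
    "concentration": {"nod", "writing"},
    "activity": {"speaking", "gesture"},
}


def characteristic_degrees(occurrences: dict[str, int]) -> dict[str, int]:
    """Degree per item = total occurrences of its behaviors; 0 when none appeared."""
    return {item: sum(occurrences.get(b, 0) for b in behaviors)
            for item, behaviors in ITEM_BEHAVIORS.items()}
```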

Next, an application scene different from the above example is described. The second application scene is one, such as a meeting, in which several participants speak with one another. In this scene (case), information flows in both directions. Here, each participant is an evaluation subject.

FIG. 8 shows an example image of the evaluation subjects captured by the video camera 100 in the second application scene. In the example of FIG. 8, five participants are photographed seated around a single table. The participants therefore face one another regardless of the position of the video camera 100. In this example, each of the five participants is an evaluation subject.

In the information processing apparatus 200, when the moving image data acquisition unit 210 receives the moving image data captured by the video camera 100, the region identification unit 220 identifies, in the acquired moving image, the regions in which each participant (evaluation subject) appears. Here, regions of parts such as the upper body, face, eyes, mouth, head, torso, arms, hands, and legs are identified, together with the orientation of the face. The motion detection unit 230 then detects the movement of each part of each participant on the basis of the regions identified by the region identification unit 220.

Next, on the basis of the per-part movements of each participant detected by the motion detection unit 230, the non-linguistic information extraction unit 240 extracts specific behaviors of each participant as reaction behaviors representing non-linguistic information. For example, nodding, speaking, body gestures, hand gestures, exchanging glances, writing, typing on a keyboard, jiggling a leg, yawning, and closing the eyes are extracted as reaction behaviors. Speaking may also be detected when the mouth opens and closes over a certain period of time.
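Detecting speech from mouth movement could be approximated as below. This reads "opens and closes over a certain period" as "completes several open-to-closed cycles within a sliding time window", which is an interpretation rather than the source's definition; the per-frame mouth-open flags, the frame rate, the window length, and the cycle count are all assumptions.

```python
def detect_speaking(mouth_open: list[bool], fps: float = 10.0,
                    window_s: float = 2.0, min_cycles: int = 3) -> bool:
    """True if the mouth completes at least min_cycles open-to-closed cycles
    within some window of window_s seconds."""
    # Times (s) at which an open mouth closes again.
    closes = [i / fps for i in range(1, len(mouth_open))
              if mouth_open[i - 1] and not mouth_open[i]]
    start = 0
    for end, t in enumerate(closes):  # sliding window over the close events
        while t - closes[start] > window_s:
            start += 1
        if end - start + 1 >= min_cycles:
            return True
    return False
```

Under this rule, a single long opening (for example a yawn) or slow, widely spaced openings are not counted as speaking, while rapid repeated opening and closing is.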

Next, the reaction evaluation unit 250 evaluates the reaction behaviors extracted as non-linguistic information by the non-linguistic information extraction unit 240. As an example, consider defining reaction behaviors indicating that a participant is participating actively and reaction behaviors indicating that a participant is participating passively. Nodding, speaking, body gestures, hand gestures, exchanging glances, writing, and typing on a keyboard, for instance, are evaluated as reaction behaviors indicating active participation, and the more frequently or the longer each of them appears, the higher the evaluation. Yawning, closing the eyes, and jiggling a leg, on the other hand, are evaluated as reaction behaviors indicating passive participation, and the more frequently or the longer each of them appears, the lower the evaluation. As with the evaluation in the first application scene described above, an intermediate evaluation may be added to the binary evaluation of active (high) or passive (low).

Besides the above example, positive and negative reaction behaviors may be defined, and the evaluation may be positive (high), negative (low), or intermediate according to the frequency and duration of the corresponding reaction behaviors. Likewise, meaningful and meaningless behaviors may be defined, and the evaluation may be high, low, or intermediate according to the frequency and duration of the meaningful behaviors and of the meaningless behaviors.

Furthermore, as with the evaluation in the first application scene described above, the evaluation may measure how strongly a particular characteristic is present. That is, reaction behaviors are defined for evaluating the degree of understanding, the activity level, the degree of facilitation, the degree of concentration, the degree of calmness, and so on; if none of the corresponding reaction behaviors appears, the degree of the evaluated characteristic is 0 (zero), and the more frequently or the longer they appear, the higher the degree. When an evaluation subject is evaluated on several evaluation items, a single behavior representing non-linguistic information may be defined as a reaction behavior for more than one of those items.

As described above, the evaluation items, the evaluation content, the target behaviors used for each evaluation item, and the evaluation method (for example, how each target behavior is evaluated) are set individually according to the specific scene in which the non-linguistic information evaluation system 10 of the present embodiment is applied. The above examples describe methods of evaluating individual participants in lectures and classes (first application scene) and in meetings (second application scene). In addition, the degree of concentration, activity, proactiveness, and the like of the scene itself, such as a lecture, class, or meeting, may be evaluated based on the evaluation of all participants. The evaluation of all participants may use, for example, a cumulative value or a representative value (mean, median, etc.) of the individual participants' evaluation values. The scene itself may be evaluated in various ways according to the purpose of the evaluation, for example, by rating highly a lecture in which many participants are highly concentrated, or a meeting in which many participants are highly active.
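The aggregation options named above (cumulative value, mean, median, and "many highly concentrated participants") can be sketched with the Python standard library. The function names and the threshold are hypothetical; only the aggregation choices come from the text.

```python
from statistics import mean, median
from typing import Sequence


def scene_score(scores: Sequence[float], method: str = "mean") -> float:
    """Aggregate individual participants' evaluation values into one scene value."""
    if not scores:
        raise ValueError("no participant scores")
    if method == "sum":      # cumulative value
        return float(sum(scores))
    if method == "mean":     # representative value
        return float(mean(scores))
    if method == "median":   # representative value, less sensitive to outliers
        return float(median(scores))
    raise ValueError(f"unknown method: {method}")


def share_at_least(scores: Sequence[float], threshold: float) -> float:
    """Fraction of participants whose value reaches threshold (e.g. highly concentrated)."""
    if not scores:
        raise ValueError("no participant scores")
    return sum(s >= threshold for s in scores) / len(scores)
```

With values `[1, 2, 3, 10]`, the mean (4.0) is pulled up by one participant while the median (2.5) is not, which is one reason to choose between representative values according to the evaluation purpose.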

<Example output of evaluation results>
An example of how the terminal device 300 outputs evaluation results is described below. The result of the evaluation performed by the reaction evaluation unit 250 in the information processing apparatus 200 is sent to the terminal device 300 by the output unit 260. In the terminal device 300, the evaluation result acquisition unit 310 receives the evaluation result data transmitted from the information processing apparatus 200. The display image generation unit 320 generates an output image that visually presents the evaluation results based on the acquired data. The display control unit 330 causes the display device 304 to display the generated output image.
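The flow from the output unit 260 to the display control unit 330 can be sketched as four small functions. The JSON encoding, the per-time-step label lists, and the text-based "image" are assumptions for illustration; the document does not specify the data format or the rendering.

```python
import json
from typing import Callable, Dict, List

# Hypothetical symbols for the three evaluation levels in the text output.
SYMBOLS = {"positive": "+", "intermediate": ".", "negative": "-"}


def send_results(results: Dict[str, List[str]]) -> str:
    """Output unit 260: serialise the evaluation results (JSON is an assumption)."""
    return json.dumps(results)


def acquire_results(payload: str) -> Dict[str, List[str]]:
    """Evaluation result acquisition unit 310: decode the received data."""
    return json.loads(payload)


def generate_display_image(results: Dict[str, List[str]]) -> str:
    """Display image generation unit 320: one text row per participant, one symbol per time step."""
    rows = []
    for participant in sorted(results):
        levels = results[participant]
        rows.append(participant + ": " + "".join(SYMBOLS[level] for level in levels))
    return "\n".join(rows)


def control_display(image: str, show: Callable[[str], None] = print) -> None:
    """Display control unit 330: hand the image to the display device (stdout here)."""
    show(image)
```

Usage: `control_display(generate_display_image(acquire_results(send_results(results))))` prints one row per participant, such as `A: +.`.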

FIG. 9 is a diagram illustrating an example of an output image of the evaluation results. In the example shown in FIG. 9, the evaluations of participants A, B, and C during a 50-minute lecture are shown in time series. Referring to FIG. 9, participant A, for example, showed positive reaction behavior for about the first 15 minutes of the lecture, was in an intermediate state from about 15 to 35 minutes in which little positive or negative reaction behavior occurred, and then showed positive reaction behavior again until the end of the lecture (50 minutes). Participant B showed positive reaction behavior from the start of the lecture until about 20 minutes, was in an intermediate state from about 20 to 30 minutes in which little positive or negative reaction behavior occurred, showed positive reaction behavior again from about 30 to 40 minutes, and spoke from about 40 minutes until the end of the lecture (50 minutes). Participant C showed positive reaction behavior from the start of the lecture until about 10 minutes, was in an intermediate state from about 10 to 30 minutes in which little positive or negative reaction behavior occurred, showed negative reaction behavior from about 30 to 40 minutes, and then returned to an intermediate state from about 40 minutes until the end of the lecture (50 minutes).
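A time-series view like FIG. 9 can be derived from per-interval evaluation labels by merging consecutive identical labels into segments. The sketch below assumes, hypothetically, one label per 5-minute interval; it uses `itertools.groupby`, which groups consecutive equal elements.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def to_segments(labels: Sequence[str], step_min: int = 5) -> List[Tuple[int, int, str]]:
    """Merge consecutive identical per-step labels into (start_min, end_min, label) segments."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run) * step_min
        segments.append((start, start + length, label))
        start += length
    return segments
```

Approximating participant A from the description above as `["positive"] * 3 + ["intermediate"] * 4 + ["positive"] * 3` yields the segments (0, 15, positive), (15, 35, intermediate), and (35, 50, positive).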

Arranging each participant's evaluation results in time series in this way allows the state of the lecture as a whole over time to be inferred from features common to the participants. In the example shown in FIG. 9, every participant shows positive reaction behavior immediately after the lecture starts, but positive reaction behavior gradually decreases once a certain time has elapsed. This is presumed to result, for example, from the participants' concentration and attention declining over time. As more time passes and the end of the lecture approaches, the participants again show positive reaction behavior. This is presumed to be because, for example, the approaching end of the lecture leads the participants to direct their concentration and attention back to the lecture.
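One simple way to look for features common to all participants is to compute, for each time step, the share of participants evaluated as positive. This is an illustrative choice, not a method stated in the document; the label names and the assumption that all timelines share the same time steps are hypothetical.

```python
from typing import Dict, List, Sequence


def positive_share_per_step(timelines: Dict[str, Sequence[str]],
                            positive: str = "positive") -> List[float]:
    """For each time step, return the fraction of participants labelled positive."""
    if not timelines:
        raise ValueError("no participants")
    lengths = {len(t) for t in timelines.values()}
    if len(lengths) != 1:
        raise ValueError("all timelines must cover the same time steps")
    steps = lengths.pop()
    n = len(timelines)
    return [sum(t[i] == positive for t in timelines.values()) / n for i in range(steps)]
```

A sequence that starts high, dips in the middle, and recovers toward the end would correspond to the pattern inferred from FIG. 9.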

FIG. 10 is a diagram illustrating another example of an output image of the evaluation results. The example in FIG. 10 shows, for participant A, the evaluation target, how positive and negative reaction behaviors appeared over time. Whereas the example in FIG. 9 shows, as time-series information, the participant's overall evaluation obtained by integrating the evaluations based on that participant's individual reaction behaviors, the example in FIG. 10 shows the evaluations of the individual reaction behaviors themselves as time-series information. The upper part of the display image shown in FIG. 10 shows the strength of positive reaction behavior, and the lower part shows the strength of negative reaction behavior. Here, the strength of a positive reaction behavior indicates the degree of evaluation determined by the manner in which the action constituting that reaction behavior is performed. For example, if nodding repeatedly, or nodding with a large, slow movement, is evaluated as highly positive, the corresponding point is plotted toward the stronger side (upward) in the upper part of FIG. 10. Conversely, if a light nod is evaluated as only weakly positive, the point is plotted toward the weaker side (downward) in the upper part of FIG. 10. Similarly, when the manner of an action is evaluated as strongly negative, the point is plotted toward the stronger side (upward) in the lower part of FIG. 10, and when it is evaluated as weakly negative, the point is plotted toward the weaker side (downward).
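The mapping from the manner of a nod to a positive strength can be sketched as below. The formula is purely hypothetical: it averages three capped terms for repetition, size, and slowness, which reproduces the ordering described above (repeated or large, slow nods score higher than a light nod) without claiming to be the embodiment's actual criterion.

```python
def nod_strength(repetitions: int, amplitude_deg: float, period_s: float,
                 full_repetitions: int = 3, full_amplitude_deg: float = 20.0,
                 full_period_s: float = 1.0) -> float:
    """Hypothetical positive strength of a nod in [0, 1].

    Averages three terms, each capped at 1: how often the nod is repeated,
    how large it is, and how slow it is (a longer period means slower).
    """
    if repetitions < 1 or amplitude_deg < 0 or period_s < 0:
        raise ValueError("invalid nod measurements")
    rep_term = min(repetitions / full_repetitions, 1.0)
    size_term = min(amplitude_deg / full_amplitude_deg, 1.0)
    slow_term = min(period_s / full_period_s, 1.0)
    return (rep_term + size_term + slow_term) / 3
```

Three large, slow nods (`nod_strength(3, 25.0, 1.5)`) reach the maximum of 1.0, while a single light, quick nod (`nod_strength(1, 5.0, 0.5)`) scores about 0.36 and would be plotted lower in the upper graph.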

In the example shown in FIG. 10, comparing the upper and lower graphs shows that when positive reaction behavior is strong, negative reaction behavior is also strong (the portions where the graphs form peaks). That is, during these periods, participant A, the evaluation target, performs both actions extracted as strong positive reaction behavior and actions extracted as strong negative reaction behavior, which suggests that participant A was highly engaged and participating actively in the lecture.
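Finding the periods where both graphs peak together can be sketched as a comparison of two aligned strength series against a threshold. The threshold and the assumption that both series are sampled at the same time steps are hypothetical.

```python
from typing import List, Sequence


def co_active_steps(positive: Sequence[float], negative: Sequence[float],
                    threshold: float) -> List[int]:
    """Return the time-step indices where positive and negative strengths both reach threshold."""
    if len(positive) != len(negative):
        raise ValueError("series must be aligned")
    return [i for i, (p, n) in enumerate(zip(positive, negative))
            if p >= threshold and n >= threshold]
```

Steps returned by this function are candidates for the "highly engaged" periods described above; steps where only one series is high are excluded.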

<Other configuration examples>
The non-linguistic information evaluation system 10 according to the present embodiment has been described above, but the specific configuration of the present embodiment is not limited to the above. For example, in the above configuration, the information processing apparatus 200 processes the moving image acquired by the video camera 100, and the terminal device 300, serving as output means, displays the resulting evaluation. Alternatively, the information processing apparatus 200 may also serve as the output means. That is, without separating the information processing apparatus 200 from the terminal device 300, the information processing apparatus 200 itself may include a display device such as a liquid crystal display and display the evaluation results. Also, in the above embodiment, images of the evaluation target person are acquired by shooting with the video camera 100; however, the information processing apparatus 200 may instead analyze and evaluate separately prepared image data, for example, image data shot beforehand and stored in a storage device such as a magnetic disk drive.
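Decoupling the analysis from the image source makes the alternative above straightforward: the same analysis can consume frames from a live camera or from storage. The sketch assumes, hypothetically, that stored frames are saved as individual files whose names sort in time order, and that `detect` stands in for the detection and evaluation steps.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def stored_frames(directory: str) -> Iterator[bytes]:
    """Yield frames saved beforehand as individual files, in file-name order (hypothetical layout)."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            yield path.read_bytes()


def analyse(frames: Iterable[bytes], detect: Callable[[bytes], str]) -> List[str]:
    """Run the same per-frame analysis whether frames come from a camera or from storage."""
    return [detect(frame) for frame in frames]
```

A live source only needs to provide another iterable of frames; `analyse` does not change.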

DESCRIPTION OF SYMBOLS 10 ... Non-linguistic information evaluation system, 20 ... Network, 100 ... Video camera, 200 ... Information processing apparatus, 201 ... CPU, 202 ... RAM, 203 ... ROM, 204 ... External storage device, 205 ... Network interface, 210 ... Moving image data acquisition unit, 220 ... Region identification unit, 230 ... Motion detection unit, 240 ... Non-linguistic information extraction unit, 250 ... Reaction evaluation unit, 260 ... Output unit, 300 ... Terminal device, 301 ... CPU, 302 ... RAM, 303 ... ROM, 304 ... Display device, 305 ... Input device, 306 ... Network interface, 310 ... Evaluation result acquisition unit, 320 ... Display image generation unit, 330 ... Display control unit, 340 ... Operation receiving unit

Claims

An information processing apparatus comprising:
a motion detection unit that identifies a part of a human body shown in video data and detects a motion of the identified part;
a behavior extraction unit that extracts a behavior defined as an evaluation target in a predetermined evaluation item based on the motion of the part of the human body detected by the motion detection unit; and
an evaluation unit that performs evaluation for each evaluation item based on the behavior extracted by the behavior extraction unit and an evaluation criterion predetermined for each evaluation item.
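The three units in the first claim can be illustrated as a small pipeline. This sketch starts from motions that are assumed to be already detected (actual body-part identification and motion detection from video are not shown); the motion names, the rule table, and the criterion weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Motion:
    part: str    # identified body part, e.g. "head"
    kind: str    # detected motion of that part
    t: float     # time stamp in seconds


# Hypothetical rules: (body part, motion) -> behaviour defined as an evaluation target.
BEHAVIOR_RULES: Dict[Tuple[str, str], str] = {
    ("head", "down_up"): "nod",
    ("mouth", "open_wide_long"): "yawn",
}

# Hypothetical evaluation criteria: weight of each behaviour per evaluation item.
CRITERIA: Dict[str, Dict[str, float]] = {
    "engagement": {"nod": 1.0, "yawn": -1.0},
}


def extract_behaviors(motions: Iterable[Motion]) -> List[str]:
    """Behaviour extraction: keep only motions that map to a defined behaviour."""
    return [BEHAVIOR_RULES[(m.part, m.kind)] for m in motions
            if (m.part, m.kind) in BEHAVIOR_RULES]


def evaluate(behaviors: Iterable[str]) -> Dict[str, float]:
    """Evaluation: score every item using its predetermined criterion."""
    behaviors = list(behaviors)
    return {item: sum(weights.get(b, 0.0) for b in behaviors)
            for item, weights in CRITERIA.items()}
```

Motions with no matching rule (for example a hand wave here) are ignored by the extraction step, so only behaviors defined as evaluation targets reach the evaluation.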

The information processing apparatus according to claim 1, wherein the evaluation unit specifies a degree of evaluation in the evaluation item for which the behavior is an evaluation target, based on at least one of the type, the appearance frequency, and the duration of the behavior defined as the evaluation target.

The information processing apparatus, wherein, for a specific evaluation item, behaviors corresponding to a first classification and behaviors corresponding to a second classification are defined as yielding conflicting evaluations, and the evaluation unit performs the evaluation for the specific evaluation item based on an evaluation based on the appearance of behaviors corresponding to the first classification and an evaluation based on the appearance of behaviors corresponding to the second classification.

The information processing apparatus according to claim 1, wherein the evaluation unit continuously evaluates the same evaluation item and generates time-series data indicating the change in the evaluation result over time.

An evaluation system comprising:
acquisition means for acquiring video data;
behavior evaluation means for analyzing the video data acquired by the acquisition means and evaluating the behavior of a person shown in the video; and
output means for outputting an evaluation result from the behavior evaluation means,
wherein the behavior evaluation means includes:
a motion detection unit that identifies a part of a human body shown in the video data and detects a motion of the identified part;
a behavior extraction unit that extracts a behavior defined as an evaluation target in a predetermined evaluation item based on the motion of the part of the human body detected by the motion detection unit; and
an evaluation unit that performs evaluation for each evaluation item based on the behavior extracted by the behavior extraction unit and an evaluation criterion predetermined for each evaluation item.

The evaluation system according to claim 5, wherein, for a specific evaluation item, behaviors corresponding to a first classification and behaviors corresponding to a second classification are defined as yielding conflicting evaluations, and the evaluation unit of the behavior evaluation means performs the evaluation for the specific evaluation item based on an evaluation based on the appearance of behaviors corresponding to the first classification and an evaluation based on the appearance of behaviors corresponding to the second classification; and
the output means includes a display unit that displays images, and causes the display unit to output a display contrasting the evaluation based on the appearance of behaviors corresponding to the first classification with the evaluation based on the appearance of behaviors corresponding to the second classification.

The evaluation system according to claim 5 or 6, wherein the output means includes a display unit that displays images, and causes the display unit to display an image containing a graph that shows the change over time in the results of the evaluation performed continuously on the same evaluation item by the evaluation unit of the behavior evaluation means.

A program causing a computer to function as:
motion detection means for identifying a part of a human body shown in video data and detecting a motion of the identified part;
behavior extraction means for extracting a behavior defined as an evaluation target in a predetermined evaluation item based on the motion of the part of the human body detected by the motion detection means; and
evaluation means for performing evaluation for each evaluation item based on the behavior extracted by the behavior extraction means and an evaluation criterion predetermined for each evaluation item.