JP7100938B1

JP7100938B1 - Video analysis program

Info

Publication number: JP7100938B1
Application number: JP2022518706A
Authority: JP
Inventors: 渉三神谷
Original assignee: Imbesideyou Inc
Current assignee: Imbesideyou Inc
Priority date: 2021-03-22
Filing date: 2021-03-22
Publication date: 2022-07-14
Anticipated expiration: 2041-03-22
Also published as: WO2022201271A1; JP2022146949A; JPWO2022201271A1

Abstract

[Problem] To objectively evaluate communication in situations where online communication is the primary mode, such as meetings and lectures, so that such communication can be conducted more efficiently. [Solution] A system of the present disclosure includes: a moving image acquisition unit that acquires a moving image obtained by photographing a participant during an online session; an analysis unit that analyzes changes in the participant's biological reactions based on the moving image acquired by the moving image acquisition unit; a word analysis unit that analyzes words contained in the moving image and converts them into predetermined document vectors; and a plot generation unit that plots the converted document vectors, together with the words, in a predetermined dimension. [Selected drawing] FIG. 1

Description

The present invention relates to a moving image analysis system that, in an environment where an online session is held among a plurality of participants, analyzes the participants' reactions based on moving images obtained by photographing the participants, regardless of whether the participants are displayed on the screen during the online session.

A technique for analyzing the emotions that others experience in response to a speaker's remarks is known (see, for example, Patent Document 1). A technique for analyzing changes in a subject's facial expression in time series over a long period and estimating the emotions held during that period is also known (see, for example, Patent Document 2). Techniques for identifying the element that most influenced a change in emotion are also known (see, for example, Patent Documents 3 to 5). In addition, a technique for comparing a subject's usual facial expression with the current facial expression and issuing an alert when the expression is gloomy is known (see, for example, Patent Document 6). Techniques for comparing a subject's normal (expressionless) face with the current facial expression to determine the degree of the subject's emotion are also known (see, for example, Patent Documents 7 to 9). Furthermore, techniques for analyzing the emotions of an organization and the atmosphere within a group as felt by an individual are known (see, for example, Patent Documents 10 and 11).

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-58625
Patent Document 2: Japanese Unexamined Patent Publication No. 2016-149063
Patent Document 3: Japanese Unexamined Patent Publication No. 2020-86559
Patent Document 4: Japanese Unexamined Patent Publication No. 2000-76421
Patent Document 5: Japanese Unexamined Patent Publication No. 2017-201499
Patent Document 6: Japanese Unexamined Patent Publication No. 2018-112831
Patent Document 7: Japanese Unexamined Patent Publication No. 2011-154665
Patent Document 8: Japanese Unexamined Patent Publication No. 2012-8949
Patent Document 9: Japanese Unexamined Patent Publication No. 2013-300
Patent Document 10: Japanese Unexamined Patent Publication No. 2011-186521
Patent Document 11: International Publication No. WO15/174426

All of the techniques described above are merely auxiliary functions for situations in which communication takes place mainly in real space. In other words, they were not conceived for a situation in which, following the recent digital transformation (DX) of business operations and the worldwide spread of infectious disease, communication for work, classes, and the like is conducted mainly online.

An object of the present invention is to objectively evaluate communication in situations where online communication is the primary mode, such as meetings and lectures, so that such communication can be conducted more efficiently.

According to the present invention, there is provided a moving image analysis system that, in an environment where an online session is held among a plurality of participants, analyzes the participants' reactions based on moving images obtained by photographing the participants, regardless of whether the participants are displayed on the screen during the online session, the system comprising:
a moving image acquisition unit that acquires a moving image obtained by photographing the participant during the online session;
an analysis unit that analyzes changes in the participant's biological reactions based on the moving image acquired by the moving image acquisition unit;
a word analysis unit that analyzes words contained in the moving image and converts them into a predetermined document vector; and
a plot generation unit that plots the converted document vector, together with the words, in a predetermined dimension.
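The word analysis unit and the plot generation unit described above can be illustrated by the following minimal sketch. The disclosure does not specify how the document vector is formed or how the predetermined dimension is chosen, so everything here is an assumption: each transcript segment is represented by a term-frequency vector, the predetermined dimension is taken to be two, and the projection is a deterministic random projection keyed on a CRC of each term. The names `document_vector`, `axis_direction`, and `plot_points` are hypothetical.

```python
import math
import zlib
from collections import Counter


def document_vector(words):
    """Term-frequency vector for one transcript segment (assumed representation)."""
    return Counter(words)


def axis_direction(term):
    """Deterministic pseudo-random unit direction in 2-D for a vocabulary term.

    A CRC is used instead of hash() so the direction is stable across runs.
    """
    angle = (zlib.crc32(term.encode("utf-8")) % 3600) / 3600 * 2 * math.pi
    return math.cos(angle), math.sin(angle)


def plot_points(segments):
    """Project each segment's vector to 2-D and label it with its most frequent word.

    Returns a list of (label, x, y) tuples; empty segments are skipped.
    """
    points = []
    for words in segments:
        if not words:
            continue
        vec = document_vector(words)
        x = sum(count * axis_direction(term)[0] for term, count in vec.items())
        y = sum(count * axis_direction(term)[1] for term, count in vec.items())
        label = vec.most_common(1)[0][0]
        points.append((label, x, y))
    return points
```

The resulting `(label, x, y)` tuples could be handed to any plotting front end. A learned embedding followed by a dimensionality reduction step would be an equally plausible reading of the text.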

According to the present disclosure, analyzing and evaluating the moving images of a video session makes it possible to evaluate the session objectively, in particular with respect to its content.

In particular, according to the present invention, in a situation where online communication is the primary mode, the communication that took place can be evaluated objectively so that communication can be conducted more efficiently.

A diagram showing the overall system according to an embodiment of the present invention.
An example of a functional block diagram of an evaluation terminal according to an embodiment of the present invention.
A diagram showing functional configuration example 1 of the evaluation terminal according to an embodiment of the present invention.
A diagram showing functional configuration example 2 of the evaluation terminal according to an embodiment of the present invention.
A diagram showing functional configuration example 3 of the evaluation terminal according to an embodiment of the present invention.
A screen display example according to functional configuration example 3 of FIG. 6.
Another screen display example according to functional configuration example 3 of FIG. 6.
A diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to an embodiment of the present invention.
A diagram showing another configuration of functional configuration example 3 of the evaluation terminal according to an embodiment of the present invention.
A functional block diagram of the system according to an embodiment of the present invention.
A conceptual image of a plot generated by the system according to an embodiment of the present invention.
A diagram showing a conceptual image of the processing of the system according to an embodiment of the present invention.
A conceptual image of a plot generated by the system according to an embodiment of the present invention.

The contents of the embodiments of the present disclosure are listed and described below. The present disclosure has the following configurations.
[Item 1]
A moving image analysis system that, in an environment where an online session is held among a plurality of participants, analyzes the participants' reactions based on moving images obtained by photographing the participants, regardless of whether the participants are displayed on the screen during the online session, the system comprising:
a moving image acquisition unit that acquires a moving image obtained by photographing the participant during the online session;
an analysis unit that analyzes changes in the participant's biological reactions based on the moving image acquired by the moving image acquisition unit;
a word analysis unit that analyzes words contained in the moving image and converts them into a predetermined document vector; and
a plot generation unit that plots the converted document vector, together with the words, in a predetermined dimension.
[Item 2]
The moving image analysis system according to Item 1,
further comprising a playback unit that, when the plot is operated, plays back the moving image containing the corresponding word, starting from the frame containing that word.
[Item 3]
The moving image analysis system according to Item 1 or Item 2,
further comprising an evaluation unit that compares the document vector contained in advance information prepared before the start of the online session with the document vector of the moving image of the online session, and evaluates the degree of overlap.
[Item 4]
A moving image analysis device having the configuration of the moving image analysis system according to any one of Items 1 to 3.
[Item 5]
A moving image analysis program that causes a moving image analysis device to function as the configuration of the moving image analysis system according to any one of Items 1 to 3.
[Item 6]
A moving image analysis method that executes, as steps, the configuration of the moving image analysis system according to any one of Items 1 to 3.
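The overlap evaluation of Item 3 can be sketched as follows. The disclosure does not name a similarity measure, so this example assumes that both the advance information (for example, an agenda) and the session transcript are reduced to term-frequency document vectors and that the degree of overlap is their cosine similarity. The function name `cosine_overlap` is hypothetical.

```python
import math
from collections import Counter


def cosine_overlap(prior_words, session_words):
    """Cosine similarity between two term-frequency vectors (assumed overlap measure).

    Returns a value in [0, 1]; 0.0 is returned if either input is empty.
    """
    prior = Counter(prior_words)
    session = Counter(session_words)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(prior[t] * session[t] for t in prior.keys() & session.keys())
    norm = math.sqrt(sum(v * v for v in prior.values())) * math.sqrt(
        sum(v * v for v in session.values())
    )
    return dot / norm if norm else 0.0
```

A value near 1 would suggest that the session stayed close to the prepared material, and a value near 0 that it drifted to other topics.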

Preferred embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted.

<Basic function>
The video session evaluation system of the present embodiment operates in an environment where a video session (hereinafter referred to as an online session, whether one-way or two-way) is held among a plurality of people. It analyzes and evaluates emotions of an analysis target person that are specific to that person and differ from those of the other participants. Here, an emotion is a feeling that arises in response to one's own or another person's words and actions, such as pleasure or displeasure and the degree thereof.

An online session is, for example, an online meeting, an online class, or an online chat. Terminals installed at a plurality of locations are connected to a server via a communication network such as the Internet, and moving images can be exchanged among the terminals through the server. The moving images handled in an online session include face images and voices of the users of the terminals. They also include images such as materials shared and viewed by a plurality of users. On the screen of each terminal, it is possible to switch between the face image and the material image so that only one is displayed, or to divide the display area so that both are displayed simultaneously. It is also possible to display the image of one person in full screen, or to display the images of some or all of the users in divided small screens.

Any one or more of the users participating in the online session through the terminals can be designated as analysis target persons. For example, the leader, facilitator, or administrator of the online session (hereinafter collectively referred to as the organizer) designates any of the users as an analysis target person. The organizer of an online session is, for example, the instructor of an online class, the chair or facilitator of an online meeting, or the coach of a session intended for coaching. The organizer is usually one of the users participating in the online session, but may be a separate person who does not participate in it. Alternatively, all participants may be analyzed without designating an analysis target person.

When a video session is established among a plurality of terminals, the video session evaluation system according to the present embodiment displays at least the moving image acquired from the video session. The displayed moving image is acquired by the terminal, and at least the face image contained in the moving image is identified for each predetermined frame unit. An evaluation value for the identified face image is then calculated, and the evaluation value is shared as necessary. In particular, in the present embodiment, the acquired moving image is stored on the terminal, analyzed and evaluated on the terminal, and the result is provided to the user of that terminal. Therefore, even a video session containing personal information or confidential information can be analyzed and evaluated without providing the video itself to an external evaluation organization or the like. In addition, by providing only the evaluation result (evaluation value) to an external terminal as necessary, the result can be visualized or subjected to cross analysis and the like.

As shown in FIG. 1, the video session evaluation system according to the present embodiment includes: user terminals 10 and 20, each having at least an input unit such as a camera unit and a microphone unit, a display unit such as a display, and an output unit such as a speaker; a video session service terminal 30 that provides a two-way video session to the user terminals 10 and 20; and an evaluation terminal 40 that performs part of the evaluation of the video session.

<Hardware configuration example>
Each functional block, functional unit, and functional module described below can be implemented by any of hardware provided in a computer, a DSP (Digital Signal Processor), or software. When implemented by software, for example, it is in practice configured with the CPU, RAM, ROM, and the like of a computer, and is realized by the operation of a program stored in a recording medium such as RAM, ROM, a hard disk, or a semiconductor memory. The series of processes performed by the system and the terminals described in this specification may be realized using software, hardware, or a combination of software and hardware. A computer program for realizing each function of the information sharing support device 10 according to the present embodiment can be created and installed on a PC or the like. A computer-readable recording medium storing such a computer program can also be provided. The recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, or a flash memory. The computer program may also be distributed, for example, via a network without using a recording medium.

The evaluation terminal according to the present embodiment acquires a moving image from the video session service terminal, identifies at least the face image contained in the moving image for each predetermined frame unit, and calculates an evaluation value for the face image (details are described later).

<Video acquisition method>
As shown in FIG. 3, the video session service provided by the video session service terminal (hereinafter sometimes simply referred to as "the service") enables the user terminals 10 and 20 to communicate bidirectionally by image and voice. The service displays, on the display of a user terminal, the moving image acquired by the camera unit of the other party's user terminal, and outputs from the speaker the voice acquired by the microphone unit of the other party's user terminal. The service is also configured so that either or both user terminals can record the moving image and voice (collectively referred to as "moving images, etc.") in a storage unit on at least one of the user terminals. The recorded moving image information Vs (hereinafter referred to as "recorded information") is cached on the user terminal that started the recording and is recorded only locally on one of the user terminals. If necessary, the user can view the recorded information personally or share it with others within the scope of use of the service.

<Functional configuration example 1>
FIG. 4 is a block diagram showing a configuration example according to the present embodiment. As shown in FIG. 4, the video session evaluation system of the present embodiment is realized as a functional configuration of the user terminal 10. That is, the user terminal 10 includes, as its functions, a moving image acquisition unit 11, a biological reaction analysis unit 12, a peculiarity determination unit 13, a related event identification unit 14, a clustering unit 15, and an analysis result notification unit 16.

The moving image acquisition unit 11 acquires, from each terminal, the moving image obtained by photographing a plurality of people (a plurality of users) with the camera of each terminal during the online session. It does not matter whether the moving image acquired from each terminal is set to be displayed on the screen of each terminal. That is, the moving image acquisition unit 11 acquires moving images from each terminal, including both moving images that are being displayed on each terminal and moving images that are not being displayed.

The biological reaction analysis unit 12 analyzes changes in the biological reactions of each of the plurality of people based on the moving images acquired by the moving image acquisition unit 11 (whether or not they are being displayed on the screen). In the present embodiment, the biological reaction analysis unit 12 separates the moving image acquired by the moving image acquisition unit 11 into a set of images (a collection of frame images) and audio, and analyzes changes in biological reactions from each.

For example, the biological reaction analysis unit 12 analyzes the user's face image using the frame images separated from the moving image acquired by the moving image acquisition unit 11, and thereby analyzes changes in biological reactions relating to at least one of facial expression, line of sight, pulse, and facial movement. The biological reaction analysis unit 12 also analyzes the audio separated from the moving image acquired by the moving image acquisition unit 11, and thereby analyzes changes in biological reactions relating to at least one of the content of the user's speech and the user's voice quality.

When a person's emotions change, the change appears as changes in biological reactions such as facial expression, line of sight, pulse, facial movement, speech content, and voice quality. In the present embodiment, changes in the user's emotions are analyzed by analyzing changes in the user's biological reactions. The emotion analyzed in the present embodiment is, as one example, the degree of pleasure or displeasure. In the present embodiment, the biological reaction analysis unit 12 quantifies changes in biological reactions according to a predetermined standard, and thereby calculates a biological reaction index value that reflects the content of those changes.

Changes in facial expression are analyzed, for example, as follows. For each frame image, the face region is identified in the frame image, and the facial expression of the identified face is classified into one of a plurality of categories according to an image analysis model trained in advance by machine learning. Then, based on the classification results, the unit analyzes whether a positive or a negative change in facial expression has occurred between consecutive frame images and how large the change is, and outputs a facial expression change index value according to the analysis result.
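The step from per-frame classification results to a facial expression change index can be sketched as below. The classification model itself is outside the scope of this sketch; the code assumes that each frame has already been assigned an expression label. The label set, the `VALENCE` scores, and the function name `expression_change_indices` are hypothetical and are not taken from the disclosure.

```python
# Hypothetical mapping from expression category to a valence score.
VALENCE = {
    "happy": 1.0,
    "surprised": 0.5,
    "neutral": 0.0,
    "sad": -1.0,
    "angry": -1.0,
}


def expression_change_indices(frame_labels):
    """Return one change value per pair of consecutive frames.

    The sign indicates a positive or negative change in expression and the
    absolute value indicates its size (an assumed scoring scheme).
    """
    scores = [VALENCE[label] for label in frame_labels]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]
```

For instance, a sequence of neutral, happy, sad frames yields one positive change followed by a larger negative change.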

Changes in line of sight are analyzed, for example, as follows. For each frame image, the eye region is identified in the frame image, and the orientation of both eyes is analyzed to determine where the user is looking. For example, the unit analyzes whether the user is looking at the face of the speaker being displayed, at the shared material being displayed, or outside the screen. The unit may also analyze whether the movement of the line of sight is large or small and whether it is frequent or infrequent. Changes in line of sight are also related to the user's degree of concentration. The biological reaction analysis unit 12 outputs a line-of-sight change index value according to the analysis result of the change in line of sight.
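The classification of where the user is looking can be illustrated as follows. This sketch assumes that an upstream step has already converted the eye orientation into a gaze point in normalized screen coordinates (0 to 1 on each axis) and that the screen layout is known as a list of rectangles. The `Region` type, the region names, and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle on the screen in normalized coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def classify_gaze(x, y, regions):
    """Return the name of the region the gaze point falls in.

    Points outside the unit square are treated as looking off screen; if
    regions overlap, the first matching region wins.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return "off_screen"
    for region in regions:
        if region.contains(x, y):
            return region.name
    return "elsewhere_on_screen"


def gaze_shift_rate(targets):
    """Fraction of consecutive frame pairs in which the gaze target changed."""
    if len(targets) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(targets, targets[1:]) if a != b)
    return changes / (len(targets) - 1)
```

The shift rate is one simple way to express whether line-of-sight movement is frequent or infrequent; how it feeds into the index value is left open here.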

Changes in pulse are analyzed, for example, as follows. For each frame image, the face region is identified in the frame image. Then, using a trained image analysis model that captures the numerical value of the facial color information (the G component of RGB), changes in the G color of the face surface are analyzed. Arranging the results along the time axis forms a waveform representing the change in color information, and the pulse is identified from this waveform. A person's pulse quickens when the person is tense and slows when the person is calm. The biological reaction analysis unit 12 outputs a pulse change index value according to the analysis result of the change in pulse.
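Identifying the pulse from the G-color waveform can be sketched as a search for the dominant periodic component. This is one possible reading and not necessarily the method intended by the disclosure. The sketch assumes that the mean G value of the face region has already been extracted for every frame, and it scans candidate heart rates in a plausible range with a direct Fourier sum. The function name `estimate_pulse_bpm` and the default range of 42 to 180 beats per minute are assumptions.

```python
import math


def estimate_pulse_bpm(green_means, fps, min_bpm=42, max_bpm=180):
    """Return the candidate heart rate (in bpm) with the most spectral power.

    green_means: per-frame mean G value of the face region (assumed input).
    fps: frame rate of the moving image.
    """
    n = len(green_means)
    if n == 0:
        raise ValueError("green_means must not be empty")
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the constant skin tone

    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq = bpm / 60.0  # beats per second
        re = sum(v * math.cos(2 * math.pi * freq * i / fps)
                 for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * freq * i / fps)
                 for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

In practice the waveform is noisy, so filtering and a longer observation window would normally be needed before the estimate is usable.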

Changes in facial movement are analyzed, for example, as follows. For each frame image, the face region is identified in the frame image, and the orientation of the face is analyzed to determine where the user is looking. For example, the unit analyzes whether the user is looking at the face of the speaker being displayed, at the shared material being displayed, or outside the screen. The unit may also analyze whether the movement of the face is large or small and whether it is frequent or infrequent. The movement of the face and the movement of the line of sight may be analyzed together. For example, the unit may analyze whether the user is looking straight at the face of the speaker being displayed, looking at it with upturned or downcast eyes, or looking at it from an angle. The biological reaction analysis unit 12 outputs a face orientation change index value according to the analysis result of the change in face orientation.

The content of speech is analyzed, for example, as follows. The biological reaction analysis unit 12 converts the audio for a specified period (for example, about 30 to 150 seconds) into a character string by known speech recognition processing, and performs morphological analysis on the character string to remove words that are unnecessary for representing the conversation, such as particles and articles. The remaining words are then vectorized, and the unit analyzes whether a positive or a negative emotional change has occurred and how large the change is, and outputs a speech content index value according to the analysis result.
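The steps after speech recognition can be sketched as below. Speech recognition and morphological analysis are assumed to have been performed already, so the input is a list of tokens. The stop word list, the `POLARITY` lexicon, the averaging rule, and the function name `speech_content_index` are all hypothetical; the disclosure does not state how the vectorized words are scored.

```python
from collections import Counter

# Hypothetical stop words standing in for particles, articles, and the like.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "was", "and"}

# Hypothetical polarity lexicon: positive values for positive words.
POLARITY = {"great": 1.0, "good": 0.5, "unclear": -0.5, "bad": -1.0}


def speech_content_index(tokens):
    """Average polarity of the content words in one time window.

    Returns a value in [-1, 1]; 0.0 if no content words remain.
    """
    content = [t.lower() for t in tokens if t.lower() not in STOPWORDS]
    vector = Counter(content)  # bag-of-words vector of the remaining words
    total = sum(vector.values())
    if total == 0:
        return 0.0
    weighted = sum(POLARITY.get(term, 0.0) * count for term, count in vector.items())
    return weighted / total
```

Comparing the index between successive windows would then indicate whether the change is positive or negative and how large it is.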

Voice quality is analyzed, for example, as follows. The biological reaction analysis unit 12 identifies the acoustic features of the audio by performing known voice analysis processing on the audio for a specified period (for example, about 30 to 150 seconds). Then, based on the acoustic features, the unit analyzes whether a positive or a negative change in voice quality has occurred and how large the change is, and outputs a voice quality change index value according to the analysis result.

The biological reaction analysis unit 12 calculates the biological reaction index value using at least one of the facial expression change index value, the line-of-sight change index value, the pulse change index value, the face orientation change index value, the speech content index value, and the voice quality change index value calculated as described above. For example, the biological reaction index value is calculated by a weighted calculation over the facial expression change index value, the line-of-sight change index value, the pulse change index value, the face orientation change index value, the speech content index value, and the voice quality change index value.
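The weighted calculation can be illustrated as a normalized weighted average. The disclosure does not give the weights or the normalization, so both are assumptions, as are the index names used as dictionary keys and the function name `biological_reaction_index`. Because at least one index value is sufficient, the sketch uses only the index values that are actually supplied.

```python
def biological_reaction_index(indices, weights):
    """Weighted average of the available index values.

    indices: mapping from index name to value, e.g. {"expression": 0.8}.
    weights: mapping from index name to a non-negative weight (assumed values).
    Index names without a weight are ignored.
    """
    used = {name: value for name, value in indices.items() if name in weights}
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted index values were supplied")
    return sum(weights[name] * value for name, value in used.items()) / total_weight
```

For example, with weights of 3 for expression and 1 for pulse, an expression index of 1.0 and a pulse index of 0.0 combine to 0.75.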

The peculiarity determination unit 13 determines whether the change in biological reaction analyzed for the analysis target person is peculiar compared with the changes in biological reaction analyzed for persons other than the analysis target person. In the present embodiment, the peculiarity determination unit 13 makes this determination based on the biological reaction index values calculated by the biological reaction analysis unit 12 for each of the plurality of users.

For example, the peculiarity determination unit 13 calculates the variance of the biological reaction index values calculated by the biological reaction analysis unit 12 for each of the plurality of persons, and determines whether the change in biological reaction analyzed for the analysis target person is peculiar compared with the others by comparing the index value calculated for the analysis target person against that variance.
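
One way to read the variance-based comparison is as an outlier test. The sketch below assumes the target's index value is compared with the mean and variance of the other participants, and that a deviation of more than `k` standard deviations counts as peculiar; the threshold `k` and the function name are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict


def is_peculiar(indices: Dict[str, float], target: str, k: float = 2.0) -> bool:
    """Compare the target's index value with the spread of the other participants."""
    others = [v for name, v in indices.items() if name != target]
    if len(others) < 2:
        raise ValueError("need at least two other participants")
    mu = mean(others)
    var = pvariance(others, mu)
    deviation_sq = (indices[target] - mu) ** 2
    if var == 0:
        # All others reacted identically: any difference stands out.
        return deviation_sq > 0
    # Equivalent to |value - mean| > k * standard deviation.
    return deviation_sq > (k ** 2) * var
```

Excluding the target from the statistics avoids the case where a single strong outlier inflates the variance enough to hide itself in a small group.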

The change in biological reaction analyzed for the analysis target person can be peculiar compared with the others in the following three patterns. In the first, the others show no particularly large change in biological reaction, but the analysis target person shows a relatively large change. In the second, the analysis target person shows no particularly large change, but the others show a relatively large change. In the third, both the analysis target person and the others show relatively large changes, but the content of the change differs between the analysis target person and the others.
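
The three patterns can be expressed as a small classifier. In this sketch, "large" is modeled as an absolute index value at or above a hypothetical threshold, and "different content" is modeled as opposite sign; both are illustrative assumptions rather than definitions from this document.

```python
from enum import Enum
from typing import Optional


class Pattern(Enum):
    ONLY_SUBJECT_CHANGED = 1      # first pattern
    ONLY_OTHERS_CHANGED = 2       # second pattern
    BOTH_CHANGED_DIFFERENTLY = 3  # third pattern


def classify_pattern(subject: float, others: float,
                     threshold: float = 0.5) -> Optional[Pattern]:
    """Return the matching pattern, or None when the reaction is not peculiar."""
    subject_large = abs(subject) >= threshold
    others_large = abs(others) >= threshold
    if subject_large and not others_large:
        return Pattern.ONLY_SUBJECT_CHANGED
    if others_large and not subject_large:
        return Pattern.ONLY_OTHERS_CHANGED
    if subject_large and others_large and (subject > 0) != (others > 0):
        return Pattern.BOTH_CHANGED_DIFFERENTLY
    return None
```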

The related event identification unit 14 identifies an event occurring with respect to at least one of the analysis target person, the others, and the environment at the time when a change in biological reaction determined to be peculiar by the peculiarity determination unit 13 occurs. For example, the related event identification unit 14 identifies from the moving image the analysis target person's own words and actions at the time when the peculiar change in biological reaction occurred. It also identifies from the moving image the words and actions of the others at that time, and it identifies from the moving image the environment at that time. The environment is, for example, shared material being displayed on the screen or something that appears in the background of the analysis target person.

The clustering unit 15 analyzes the degree of correlation between a change in biological reaction determined to be peculiar by the peculiarity determination unit 13 (for example, one or a combination of gaze, pulse, facial movement, speech content, and voice quality) and the event occurring when that peculiar change occurred (the event identified by the related event identification unit 14). When the correlation is determined to be at or above a certain level, it clusters the analysis target person or the event based on the result of the correlation analysis.

For example, when the peculiar change in biological reaction corresponds to a negative emotional change and the event occurring at that time is also a negative event, a correlation at or above the certain level is detected. The clustering unit 15 clusters the analysis target person or the event into one of a plurality of pre-segmented classifications according to the content of the event, the degree of negativity, the magnitude of the correlation, and so on.

Similarly, when the peculiar change in biological reaction corresponds to a positive emotional change and the event occurring at that time is also a positive event, a correlation at or above the certain level is detected. The clustering unit 15 clusters the analysis target person or the event into one of a plurality of pre-segmented classifications according to the content of the event, the degree of positivity, the magnitude of the correlation, and so on.
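
The correlation check and the assignment to a pre-segmented classification might look like the sketch below. It assumes that reactions and events are each scored on a signed scale over the same time windows, uses Pearson correlation as the correlation measure, and invents the segment labels and thresholds; none of these specifics come from this document.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def assign_segment(reaction: Sequence[float], event_valence: Sequence[float],
                   min_corr: float = 0.7) -> Optional[str]:
    """Return a hypothetical segment label, or None if the correlation is too weak."""
    r = pearson(reaction, event_valence)
    if r < min_corr:
        return None
    polarity = "negative" if sum(event_valence) < 0 else "positive"
    strength = "strong" if r >= 0.9 else "moderate"
    return f"{polarity}/{strength}"
```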

The analysis result notification unit 16 notifies the person who designated the analysis target person (the analysis target person or the organizer of the online session) of at least one of the change in biological reaction determined to be peculiar by the peculiarity determination unit 13, the event identified by the related event identification unit 14, and the classification assigned by the clustering unit 15.

For example, the analysis result notification unit 16 notifies the analysis target person of his or her own words and actions as the event occurring when a peculiar change in biological reaction different from that of the others occurred for the analysis target person (any of the three patterns described above; the same applies hereinafter). This allows the analysis target person to understand that he or she has feelings different from those of the others when behaving in a certain way. At this time, the peculiar change in biological reaction identified for the analysis target person may also be notified to the analysis target person. In addition, the changes in biological reaction of the others used for comparison may also be notified to the analysis target person.

For example, suppose the analysis target person says or does something without particular awareness and with his or her usual feelings, or says or does something deliberately and with a certain feeling. If the feeling that the others received from those words and actions differs from the feeling that the analysis target person held at the time, the analysis target person is notified of his or her own words and actions at that time. This makes it possible to discover words and actions that, contrary to one's own awareness, are well received or poorly received by others.

The analysis result notification unit 16 also notifies the organizer of the online session of the event occurring when a peculiar change in biological reaction different from that of the others occurred for the analysis target person, together with that peculiar change. This allows the organizer of the online session to know, as a phenomenon particular to the designated analysis target person, what kind of event influences what kind of emotional change. The organizer can then take appropriate action for the analysis target person according to what has been learned.

The analysis result notification unit 16 also notifies the organizer of the online session of the clustering result for the analysis target person or for the event occurring when a peculiar change in biological reaction different from that of the others occurred for the analysis target person. Depending on which classification the designated analysis target person was clustered into, the organizer of the online session can grasp behavioral tendencies particular to the analysis target person and predict behaviors or states that may occur in the future. The organizer can then take appropriate action for the analysis target person.

The above embodiment describes an example in which a biological reaction index value is calculated by quantifying the change in biological reaction according to a predetermined standard, and whether the change analyzed for the analysis target person is peculiar compared with the others is determined based on the index values calculated for each of the plurality of persons. The invention is not limited to this example. For example, the following approach may be used.

That is, the biological reaction analysis unit 12 analyzes the gaze movement of each of the plurality of persons and generates a heat map indicating the gaze direction. The peculiarity determination unit 13 compares the heat map generated for the analysis target person with the heat maps generated for the others, and thereby determines whether the change in biological reaction analyzed for the analysis target person is peculiar compared with the changes analyzed for the others.
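
The heat map comparison can be sketched with a coarse grid. This example assumes gaze points are given as normalized screen coordinates in the range 0 to 1, and it uses histogram intersection as the similarity measure; the grid size, the measure, and the function names are illustrative assumptions.

```python
from typing import Iterable, List, Tuple

Grid = List[List[float]]


def gaze_heatmap(points: Iterable[Tuple[float, float]],
                 rows: int = 3, cols: int = 3) -> Grid:
    """Bin normalized (x, y) gaze points into a grid whose cells sum to 1."""
    grid = [[0.0] * cols for _ in range(rows)]
    count = 0
    for x, y in points:
        r = min(int(y * rows), rows - 1)
        c = min(int(x * cols), cols - 1)
        grid[r][c] += 1.0
        count += 1
    if count:
        grid = [[cell / count for cell in row] for row in grid]
    return grid


def heatmap_overlap(a: Grid, b: Grid) -> float:
    """Histogram intersection: 1.0 for identical maps, 0.0 for disjoint ones."""
    return sum(min(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

A low overlap between the analysis target person's map and the maps of the others would then be a candidate signal for a peculiar reaction.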

Thus, in the present embodiment, the moving image of the video session is stored in the local storage of the user terminal 10, and the analysis described above is performed on the user terminal 10. Although performance may depend on the machine specifications of the user terminal 10, the moving image can be analyzed without providing its information to the outside.

<Functional configuration example 2>
As shown in FIG. 5, the video session evaluation system of the present embodiment may include, as its functional configuration, a moving image acquisition unit 11, a biological reaction analysis unit 12, and a reaction information presentation unit 13a.

The reaction information presentation unit 13a presents information indicating the changes in biological reaction analyzed by the biological reaction analysis unit 12a, including those of participants who are not displayed on the screen. For example, the reaction information presentation unit 13a presents this information to the leader, facilitator, or administrator of the online session (hereinafter collectively referred to as the organizer). The organizer of an online session is, for example, the instructor of an online class, the chair or facilitator of an online meeting, or the coach of a coaching session. The organizer is usually one of the users participating in the online session, but may be a separate person who does not participate in it.

In this way, in an environment where an online session is held by a plurality of people, the organizer of the online session can also grasp the state of participants who are not displayed on the screen.

<Functional configuration example 3>
FIG. 6 is a block diagram showing a configuration example according to the present embodiment. In the functional configuration of the video session evaluation system shown in FIG. 6, functions similar to those of the first embodiment described above are given the same reference numerals, and their description may be omitted.

The system according to the present embodiment includes a camera unit that acquires the video of a video session, a microphone unit that acquires the audio, an analysis unit that analyzes and evaluates the moving image, an object generation unit that generates display objects (described later) based on information obtained by evaluating the acquired moving image, and a display unit that displays both the moving image of the video session and the display objects while the video session is running.

As described above, the analysis unit includes the moving image acquisition unit 11, the biological reaction analysis unit 12, the peculiarity determination unit 13, the related event identification unit 14, the clustering unit 15, and the analysis result notification unit 16. The function of each element is as described above.

As shown in FIG. 7, based on the result of the analysis unit analyzing the moving image acquired from the video session, the object generation unit superimposes on the moving image, as needed, an object 50 indicating the recognized face region and information 100 indicating the analyzed and evaluated content. When the faces of several people appear in the moving image, the object 50 may identify and display the faces of all of them.

Even when the camera function of the video session is turned off on the other party's terminal (that is, turned off in software within the video session application, not physically blocked by covering the camera or the like), if the other party's camera is recognizing the other party's face, the object 50 or the object 100 may be displayed at the position of that face. This allows both parties to confirm that the other is in front of the terminal even when the camera function is off. In this case, for example, the video session application may hide the information acquired from the camera while displaying only the object 50 or the object 100 corresponding to the face recognized by the analysis unit. Alternatively, the video information acquired from the video session and the information obtained through recognition by the analysis unit may be placed on separate display layers, with the layer for the former hidden.

When there are multiple areas that display moving images, the object 50 and the object 100 may be displayed in all of the areas or in only some of them. For example, as shown in FIG. 8, they may be displayed only on the guest's moving image.

The embodiments of the invention described in basic configuration examples 1 to 3 above may be realized as a single device, or may be realized in part or in whole by a plurality of devices connected via a network (for example, cloud servers). For example, the control unit 110 and the storage 130 of each terminal 10 may be realized by different servers connected to each other via a network. That is, the system includes user terminals 10 and 20, a video session service terminal 30 that provides a bidirectional video session to the user terminals 10 and 20, and an evaluation terminal 40 that evaluates the video session, and the following combinations of configurations are conceivable.
(1) All processing on the user terminal only
As shown in FIG. 9, when the processing by the analysis unit is performed on the terminal that is running the video session, the analysis and evaluation results can be obtained at the same time as the video session (in real time), although a certain level of processing capacity is required.
(2) Processing split between the user terminal and the evaluation terminal
As shown in FIG. 10, the analysis unit may be provided on an evaluation terminal connected via a network or the like. In this case, the moving image acquired by the user terminal is shared with the evaluation terminal either during the video session or afterward. After the analysis unit on the evaluation terminal analyzes and evaluates it, the information of the object 50 and the object 100 is shared with the user terminal, either together with the moving image data or separately (that is, information including at least the analysis data is shared), and is displayed on the display unit.

The following system is realized using each of functional configuration examples 1 to 3 described above, or a combination of them.

<Embodiment>
The moving image analysis system according to the embodiment of the present invention (hereinafter simply referred to as the "system") analyzes the reactions of participants in an environment where an online session is held by a plurality of participants, based on moving images obtained by photographing all of the participants or only specific participants. The analysis may be performed regardless of whether a participant is displayed on the screen during the online session.

As shown in FIG. 10, the system according to the present embodiment includes a moving image acquisition unit, an analysis unit, a text extraction unit, a word analysis unit, and a plot generation unit. The moving image acquisition unit acquires the moving image obtained by photographing a participant during the online session. The analysis unit analyzes changes in the participant's biological reaction based on the moving image acquired by the moving image acquisition unit (see also FIGS. 3 to 5 and related figures).

As shown in FIG. 11, the word analysis unit analyzes the words contained in the moving image and converts them into predetermined document vectors. The plot generation unit plots the converted document vectors, together with the corresponding words, in a predetermined dimension. The illustrated example is an analysis of a "lecture" held between an instructor and a student. As illustrated, words used in the moving image that relate to academic examinations, namely "course" (進路), "pass" (合格), and "on the day" (当日), are plotted near the vertical axis, and words used in the moving image that relate to the examination of a specific subject (mathematics), namely "mathematics" (数学), "equation" (方程式), and "solution" (解), are plotted near the horizontal axis.
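
As a rough illustration of plotting vectors together with their words, the sketch below projects each word's vector onto two fixed axis directions and returns labelled points. The input vectors, the axis directions, and the function names are hypothetical; the document does not say how the vectors are produced or how the plotted dimensions are chosen, and a real system might instead use a dimensionality reduction method.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def plot_points(word_vectors: Dict[str, Sequence[float]],
                x_axis: Sequence[float],
                y_axis: Sequence[float]) -> List[Tuple[str, float, float]]:
    """Project each word vector onto two assumed axes and keep the word as label."""
    return [(word, dot(vec, x_axis), dot(vec, y_axis))
            for word, vec in word_vectors.items()]
```

With an x axis aligned to subject-specific vocabulary and a y axis aligned to examination vocabulary, a word such as "mathematics" would land near the horizontal axis and "pass" near the vertical axis, similar to the layout described for FIG. 11.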

As shown in FIG. 12, when, for example, the point for "course" is selected in the plot shown in FIG. 11, the moving images containing the word "course" are extracted and played back (either sequentially or after the user chooses among them). Each moving image is played back from the frame containing the word.
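
Playback from the frame containing a selected word suggests an index from words to positions in the recordings. The sketch below assumes each recognized utterance carries a video identifier and a starting frame number; the `Utterance` type and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Utterance(NamedTuple):
    video_id: str
    start_frame: int
    words: Tuple[str, ...]


def build_word_index(utterances: Iterable[Utterance]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each word to the (video_id, start_frame) positions where it occurs."""
    index: Dict[str, List[Tuple[str, int]]] = {}
    for u in utterances:
        for word in set(u.words):
            index.setdefault(word, []).append((u.video_id, u.start_frame))
    for hits in index.values():
        hits.sort()  # order by video, then by frame, for sequential playback
    return index


def playback_points(index: Dict[str, List[Tuple[str, int]]],
                    word: str) -> List[Tuple[str, int]]:
    """Return where playback should start when the plotted word is selected."""
    return index.get(word, [])
```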

As shown in FIG. 13, the system may further include an evaluation unit that compares the document vectors contained in prior information prepared before the start of the online session (prior vectors) with the document vectors of the moving image obtained in the online session (posterior vectors) and evaluates their degree of overlap. In the figure, the solid black points represent the prior vectors and the hatched points represent the posterior vectors.
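
The degree of overlap between prior and posterior vectors could be measured in many ways. The sketch below assumes simple term-frequency vectors built from word lists and uses cosine similarity as the overlap score; both choices, and the function name, are assumptions for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Iterable


def cosine_overlap(prior_words: Iterable[str], session_words: Iterable[str]) -> float:
    """Cosine similarity of term-frequency vectors: 1.0 is full overlap, 0.0 is none."""
    a, b = Counter(prior_words), Counter(session_words)
    shared = a.keys() & b.keys()
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A high score would indicate that the session covered the topics planned in the prior material, while a low score would indicate that the discussion drifted away from them.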

The processes described in this specification with reference to flowcharts need not be executed in the order shown. Some processing steps may be executed in parallel. Additional processing steps may be adopted, and some processing steps may be omitted.

The embodiments described above may be combined as appropriate. The effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure may achieve, together with or instead of the above effects, other effects that are apparent to those skilled in the art from the description in this specification.

10, 20 User terminal
30 Video session service terminal
40 Evaluation terminal

Claims

A moving image analysis system that, in an environment where an online session is held by a plurality of participants, analyzes the reactions of the participants based on moving images obtained by photographing the participants, regardless of whether the participants are displayed on the screen during the online session, the system comprising:
a moving image acquisition unit that acquires a moving image obtained by photographing the participant during the online session;
an analysis unit that analyzes changes in the biological reaction of the participant based on the moving image acquired by the moving image acquisition unit;
a word analysis unit that analyzes words contained in the moving image and converts them into predetermined document vectors;
a plot generation unit that plots the converted document vectors in a predetermined dimension together with the words; and
an evaluation unit that compares document vectors contained in prior information prepared before the start of the online session with the document vectors of the moving image of the online session and evaluates their degree of overlap.

The moving image analysis system according to claim 1, further comprising:
a reproduction unit that, when the plot is operated, plays back the moving image containing the corresponding word from the frame containing that word.