JP2020057111A

JP2020057111A - Facial expression determination system, program and facial expression determination method

Info

Publication number: JP2020057111A
Application number: JP2018186029A
Authority: JP
Inventors: Eiji Irie; Kenichi Yatani; Koichi Emura; Yoshiaki Iwata
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2018-09-28
Filing date: 2018-09-28
Publication date: 2020-04-09

Abstract

To provide a facial expression determination system, a program, and a facial expression determination method capable of reducing erroneous detection. SOLUTION: A facial expression determination system 1 includes a detection unit 20 and a processing unit 30. The detection unit 20 performs a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The processing unit 30 performs a determination process of determining, based on the input video in the target period, at least whether or not a micro-expression appears. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a facial expression determination system, a program, and a facial expression determination method. More specifically, the present disclosure relates to a facial expression determination system, a program, and a facial expression determination method for determining a facial expression from an input video.

Conventionally, there has been an image processing apparatus (facial expression determination system) that determines a facial expression in an image input by an image input unit (see, for example, Patent Document 1).

Patent Document 1: JP 2005-56388 A

In such an image processing apparatus, it is desirable to reduce cases in which facial movements other than micro-expressions (for example, blinking, twitching, eye movement, and movement of the entire face) are erroneously detected as micro-expressions.

An object of the present disclosure is to provide a facial expression determination system, a program, and a facial expression determination method capable of reducing erroneous detection.

A facial expression determination system according to one aspect of the present disclosure includes a detection unit and a processing unit. The detection unit performs a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The processing unit performs a determination process of determining, based on the input video in the target period, at least whether or not a micro-expression appears.

A program according to one aspect of the present disclosure causes a computer system to execute a detection process and a determination process. In the detection process, a target period in which a change occurs in the face portion of an input video including a face is detected from the input video. In the determination process, it is determined, based on the input video in the target period, at least whether or not a micro-expression appears.

A facial expression determination method according to one aspect of the present disclosure includes a detection process and a determination process. In the detection process, a target period in which a change occurs in the face portion of an input video including a face is detected from the input video. In the determination process, it is determined, based on the input video in the target period, at least whether or not a micro-expression appears.

According to the present disclosure, it is possible to provide a facial expression determination system, a program, and a facial expression determination method capable of reducing erroneous detection.

FIG. 1 is a block diagram of a facial expression determination system according to an embodiment of the present disclosure.
FIG. 2 is an explanatory diagram illustrating the detection operation of the detection unit of the same facial expression determination system.
FIG. 3 is a time chart illustrating the operation of the detection unit of the same facial expression determination system.
FIG. 4 is a flowchart illustrating the operation of the same facial expression determination system.
FIG. 5 is a block diagram of a facial expression determination system according to Modification 1 of the embodiment of the present disclosure.

The embodiment described below is merely one of various embodiments of the present disclosure. Embodiments of the present disclosure are not limited to the following embodiment and may include others. In addition, the following embodiment can be modified in various ways according to design and the like, as long as the modifications do not depart from the technical idea of the present disclosure.

(Embodiment)
(1) Overview
The facial expression determination system 1 of the present embodiment includes a detection unit 20 and a processing unit 30. The detection unit 20 performs a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The processing unit 30 determines, based on the input video in the target period, at least whether or not a micro-expression appears.

Here, the "input video" includes a plurality of images obtained by capturing the face of the target person in time series. A "change occurring in the face portion" is not limited to a change caused by the appearance of a micro-expression or a macro-expression; it may also include changes caused by blinking, twitching, eye movement, movement of the entire face, and the like. A "micro-expression" is a facial movement that appears momentarily and then disappears, driven by a suppressed emotion. A micro-expression lasts, for example, 1 second or less, and typically about 0.2 to 0.5 seconds. Accordingly, changes in the face portion caused by blinking, twitching, eye movement, movement of the entire face, and the like are not micro-expressions. Note that micro-expressions may include slight facial movements arising from emotion (subtle expressions), and a subtle expression may last 1 second or longer. The definition of a micro-expression is not limited to the above and may change with future research on micro-expressions. There are multiple types of micro-expression, that is, multiple types of emotion that appear as micro-expressions, such as "anger", "disgust", "fear", "sadness", "contempt", "joy", and "surprise". The parts of the face that change (eyes, eyebrows, lips, cheeks, etc.) and the manner of change (amount, direction, duration, etc.) differ depending on the type of micro-expression.

As described above, in the facial expression determination system 1 of the present embodiment, the processing unit 30 determines whether or not a micro-expression appears based on the input video in the target period, that is, the period in which the detection unit 20 has detected a change in the face portion. This reduces the possibility of erroneously detecting a micro-expression when none has appeared, so a facial expression determination system capable of reducing erroneous detection can be provided.

(2) Details
The facial expression determination system 1 of the present embodiment is applied to uses such as customer service, theme parks, medical care, nursing care, detecting audience reactions to advertisements, interviews, monitoring the state of drivers of automobiles and the like, security, and monitoring people passing through gates (for example, building entrances, immigration gates, or checkpoints). In each of these uses, the facial expression determination system 1 determines at least whether or not a micro-expression appears on the face of the target person, based on input video from a camera that captures that person's face. For example, when the facial expression determination system 1 is used in a clothing store, the camera is placed where it can capture the faces of customers choosing products on the sales floor. The facial expression determination system 1 determines the micro-expressions that appear on a customer's face while the customer is choosing products and notifies a sales clerk of the result. The clerk can then respond appropriately, for example by recommending products the customer is likely to prefer, while taking the customer's feelings into account.

Hereinafter, the facial expression determination system 1 according to the present embodiment will be described in detail with reference to the drawings.

(2.1) Configuration
The facial expression determination system 1 of the present embodiment includes an input unit 10, a detection unit 20, a processing unit 30, an output unit 40, and a storage unit 50.

The input unit 10 receives input video from the camera 2. The camera 2 includes an image sensor such as a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor. The camera 2 is installed at a position where the face of the target person can be captured from the front. The camera 2 captures the imaging area at a time interval short enough to capture micro-expressions, for example at a frame rate of about 20 to 100 FPS (Frames Per Second), and outputs the image data of each frame to the input unit 10. The frame rate of the camera 2 may be fixed, or may be changed as appropriate according to the movement of the target person, the imaging conditions, and the like. The input unit 10 outputs the image data received from the camera 2 to the detection unit 20. In the present embodiment, the input video is input to the input unit 10 directly from the camera 2 that captures the face of the target person, but it may instead be input from a server or the like that stores the video of the camera 2. The video may also be input from a server that stores video captured by another device.

The detection unit 20 is mainly composed of, for example, a microcontroller having one or more processors and one or more memories. Each function of the detection unit 20 is realized by the processor of the microcontroller executing a program recorded in the memory of the microcontroller. The program may be recorded in the memory in advance, provided through a telecommunication line such as the Internet, or provided recorded on a non-transitory recording medium such as a memory card.

The detection unit 20 detects, from the plurality of input images captured by the camera 2 in time series, a target period in which the face portion is changing, that is, a target period in which a micro-expression is likely to be appearing. The detection unit 20 outputs the detection result for the target period to the processing unit 30.

Specifically, the detection unit 20 performs a face detection process on the input image of each frame received from the camera 2 and extracts a partial image G1 of the face portion (see FIG. 2). The detection unit 20 divides the partial image G1 of the face portion into a plurality of parts in each of the vertical and horizontal directions (for example, six parts in FIG. 2) to create a plurality of pixel blocks B1.

For the input image of each frame, the detection unit 20 obtains a histogram of LBP (Local Binary Pattern) features for each of the plurality of pixel blocks B1 and stores the computed histograms in the storage unit 50. Although the detection unit 20 here obtains histograms of LBP features, histograms of other features may be used. For example, HOG (Histogram of Oriented Gradients) features may be used by obtaining the distribution (histogram) of luminance gradient directions.
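As a non-limiting illustration, the block-wise histogram computation described above might be sketched as follows. The sketch assumes a basic 8-neighbour LBP operator, a grayscale face image given as a list of pixel rows, and a 6 x 6 block grid (one reading of the "six" in FIG. 2). The function names `lbp_code` and `block_histograms` are hypothetical and do not appear in the disclosure; pure Python is used only to keep the example self-contained.

```python
def lbp_code(img, y, x):
    """Return the 8-neighbour LBP code of the pixel at (y, x).

    img is a grayscale image given as a list of rows (assumed layout).
    """
    center = img[y][x]
    # Neighbours are visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def block_histograms(face, rows=6, cols=6):
    """Split the face image into rows x cols pixel blocks and return one
    normalised 256-bin LBP histogram per block (border pixels are skipped).
    """
    height, width = len(face), len(face[0])
    hists = [[0.0] * 256 for _ in range(rows * cols)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Index of the block that contains pixel (y, x).
            block = (y * rows // height) * cols + (x * cols // width)
            hists[block][lbp_code(face, y, x)] += 1.0
    for hist in hists:
        total = sum(hist)
        if total > 0:
            for i in range(256):
                hist[i] /= total
    return hists
```

Normalising each histogram makes blocks of different sizes comparable; whether the embodiment normalises its histograms is not stated in the text and is an assumption of this sketch.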

In addition, based on the per-frame results stored in the storage unit 50, the detection unit 20 calculates, for each of the plurality of pixel blocks B1, the chi-square distance between the histogram at a given frame (hereinafter also called the frame of interest) and the average of the histogram at the frame N frames earlier and the histogram at the frame N frames later. Here, N is an integer of 1 or more; in the present embodiment N = 11, for example, but the value of N can be changed as appropriate. The duration of N frames is set to about half the duration of a micro-expression (its maximum, minimum, mean, or median). The detection unit 20 then averages the chi-square distances of a predetermined number (for example, 12) of the pixel blocks B1, taken in descending order of chi-square distance. Next, the detection unit 20 obtains a value C1 (Contrasted Difference Vector) by subtracting, from the averaged chi-square distance at the frame of interest, the mean of the averaged chi-square distance at the frame N frames earlier and the averaged chi-square distance at the frame N frames later. FIG. 3 shows the result of calculating the value C1 for each frame. The detection unit 20 takes as the center frame the frame at which the value C1 is equal to or greater than a threshold L1 and is at its maximum, takes the frame N frames before the center frame as the start frame of the target period, and takes the frame N frames after the center frame as the end frame of the target period. The detection unit 20 then detects the period from the start frame to the end frame as the target period in which a change occurs in the face portion of the input video. Here, each frame of the input video is assigned a frame number that identifies the frame, and the detection unit 20 outputs the frame numbers of the start frame and the end frame to the processing unit 30 as the detection result for the target period. Note that the detection unit 20 may instead output time information on the target period to the processing unit 30 as the detection result.
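The target-period detection described above can be illustrated by the following minimal sketch. It assumes the common chi-square form sum((a - b)^2 / (a + b)) without a factor of 1/2 (the text does not give the exact formula) and handles only a single peak of C1, as in the description. The names `chi_square`, `frame_feature`, `contrasted_difference`, and `target_period` are hypothetical.

```python
def chi_square(p, q):
    """Chi-square distance between two histograms of equal length
    (assumed form: sum of (a - b)^2 / (a + b), empty bins skipped)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def frame_feature(hists, i, n, top_m):
    """Mean of the top_m largest per-block chi-square distances between
    frame i and the average of frames i - n and i + n.

    hists[f][b] is the histogram of pixel block b in frame f.
    """
    dists = []
    for cur, prev, nxt in zip(hists[i], hists[i - n], hists[i + n]):
        avg = [(a + b) / 2 for a, b in zip(prev, nxt)]
        dists.append(chi_square(cur, avg))
    top = sorted(dists, reverse=True)[:top_m]
    return sum(top) / len(top)


def contrasted_difference(features, n):
    """Value C1 per frame: the feature at frame i minus the mean of the
    features at frames i - n and i + n. Returns {frame index: C1}."""
    return {i: features[i] - (features[i - n] + features[i + n]) / 2
            for i in range(n, len(features) - n)}


def target_period(c1, n, threshold):
    """Return (start, end) frame indices centred on the frame where C1 is
    at its maximum and at least the threshold, or None if no frame qualifies."""
    above = {i: v for i, v in c1.items() if v >= threshold}
    if not above:
        return None
    center = max(above, key=above.get)
    return center - n, center + n
```

In use, `frame_feature` would be evaluated for every frame that has neighbours N frames away on both sides, the resulting list passed to `contrasted_difference`, and the output compared against the threshold L1 in `target_period`. A practical system would presumably repeat the peak search to find several target periods; that extension is not shown here.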

Note that the algorithm by which the detection unit 20 detects the target period is not limited to the one described above. For example, the detection unit 20 may obtain the amount of change in the inter-frame difference of the pixel value (gray level) of each pixel and detect the target period from that amount of change.
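One possible reading of this pixel-difference alternative is sketched below. The disclosure does not specify how the difference is aggregated or thresholded, so the mean absolute difference per frame pair and the fixed threshold are assumptions made purely for illustration, and the names `mean_abs_diff` and `changed_spans` are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute gray-level difference between two equally sized frames,
    each given as a list of pixel rows."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def changed_spans(frames, threshold):
    """Return (start, end) index pairs of runs of consecutive frames whose
    difference from the preceding frame is at least the threshold."""
    spans = []
    start = None
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(frames) - 1))
    return spans
```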

The processing unit 30 is mainly composed of, for example, a microcontroller having one or more processors and one or more memories. Each function of the processing unit 30 is realized by the processor of the microcontroller executing a program recorded in the memory of the microcontroller. The program may be recorded in the memory in advance, provided through a telecommunication line such as the Internet, or provided recorded on a non-transitory recording medium such as a memory card.

The processing unit 30 at least determines, based on the input video in the target period detected by the detection unit 20, whether or not a micro-expression appears. As described above, there are multiple types of micro-expression. When the processing unit 30 of the present embodiment determines, based on the input video in the target period, that a micro-expression appears, it performs a determination process that further determines which of the multiple types appeared in the target period. While "(1) Overview" gave seven types of micro-expression as an example, the following embodiment describes a case in which micro-expressions are classified into three types: "Positive", "Negative", and "Surprise". That is, the processing unit 30 determines whether or not the change in the face during the target period is a micro-expression and, if it is, determines which of "Positive", "Negative", and "Surprise" it corresponds to. When the processing unit 30 classifies changes in the face other than micro-expressions as "non-expression", it may determine which of "Positive", "Negative", "Surprise", and "non-expression" the change in the face during the target period corresponds to.

In the present embodiment, the processing unit 30 includes a recognition unit 31 and a determination unit 32. Based on the input video in the target period, the recognition unit 31 obtains, for each of the multiple types, confidence information representing the confidence that the change in the face portion during the target period corresponds to that type. The determination unit 32 performs a determination process of determining at least whether or not a micro-expression appears, based on the per-type confidence information obtained by the recognition unit 31.

The functions of the recognition unit 31 and the determination unit 32 are described below.

In addition to the per-type confidence information, the recognition unit 31 of the present embodiment also obtains confidence information representing the confidence that the change in the face portion in the input video is not a micro-expression. The recognition unit 31 (processing unit 30) of the present embodiment performs the determination process using a trained model created by machine learning.

Here, the trained model used in the inference phase of machine learning is created, for example, by the processing unit 30 performing machine learning using training data. For the learning phase, the training data consists of face videos in which each of the three types of micro-expression ("Positive", "Negative", and "Surprise") appears, each paired with its ground-truth label. In the present embodiment, the training data also includes face videos in which the face portion changes due to factors other than emotion, that is, face videos that should be judged as "non-expression", each paired with its ground-truth label.

For the face videos in which each of the three types of micro-expression appears and the face videos in which the face portion changes due to factors other than emotion, the facial expression determination system 1 calculates LBP-TOP (Three Orthogonal Planes) features, which extend the LBP feature to three dimensions. It then creates a trained model by performing machine learning with each feature paired with its ground-truth label. The facial expression determination system 1 stores the trained model in the storage unit 50.
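For orientation, LBP-TOP is generally understood as computing LBP histograms on three orthogonal families of planes of a video volume (XY, XT, and YT) and concatenating them. The following simplified sketch illustrates that idea under stated assumptions: it uses a basic 8-neighbour operator, accumulates codes over every plane of each orientation rather than only over voxels interior in all three dimensions, and processes the whole face region as one volume. The embodiment's exact variant is not specified, and the names `lbp_counts` and `lbp_top` are hypothetical.

```python
def lbp_counts(plane):
    """Counts of 8-neighbour LBP codes over the interior of one 2-D plane,
    given as a list of rows."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    counts = [0.0] * 256
    for r in range(1, len(plane) - 1):
        for c in range(1, len(plane[0]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if plane[r + dr][c + dc] >= plane[r][c]:
                    code |= 1 << bit
            counts[code] += 1.0
    return counts


def lbp_top(volume):
    """Return the concatenated, per-orientation normalised LBP histograms
    (XY, XT, YT; 3 x 256 values) of a video volume indexed volume[t][y][x]."""
    frames, height, width = len(volume), len(volume[0]), len(volume[0][0])
    plane_sets = [
        # XY planes: one per frame (purely spatial texture).
        [volume[t] for t in range(frames)],
        # XT planes: one per image row (horizontal motion over time).
        [[volume[t][y] for t in range(frames)] for y in range(height)],
        # YT planes: one per image column (vertical motion over time).
        [[[volume[t][y][x] for y in range(height)] for t in range(frames)]
         for x in range(width)],
    ]
    feature = []
    for planes in plane_sets:
        total = [0.0] * 256
        for plane in planes:
            for i, v in enumerate(lbp_counts(plane)):
                total[i] += v
        s = sum(total)
        feature.extend(v / s if s else 0.0 for v in total)
    return feature
```

The resulting 768-value vector is the kind of input that a classifier would be trained on; in practice the feature is often computed per pixel block and the block features concatenated, which this sketch omits.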

As the machine learning algorithm used by the facial expression determination system 1, a support vector machine (SVM), which is a supervised learning algorithm, can be used, for example. The training data need only include at least the face videos in which each of the three types of micro-expression appears; face videos in which the face portion changes due to factors other than emotion are not essential. The training data may also include face videos in which the face portion changed due to factors other than emotion and which the facial expression determination system 1 erroneously judged to be micro-expressions. Note that a trained model created by machine learning in a system other than the facial expression determination system 1 may be incorporated into the processing unit 30.

In the inference phase of machine learning, the recognition unit 31 uses the trained model to perform a recognition process on the input video in the target period (specifically, on the partial video of the face portion of the input video). That is, the recognition unit 31 calculates an LBP-TOP feature from the input video in each target period. Based on the LBP-TOP feature calculated for each target period, the recognition unit 31 uses the trained model to obtain confidence information (a score) for each of the multiple types of micro-expression, together with confidence information indicating the confidence that the movement was caused by a factor other than emotion. Here, the recognition unit 31 classifies facial movements other than micro-expressions as "non-expression". Table 1 shows the confidence information obtained by the recognition unit 31 for each of five target periods T1 to T5 detected by the detection unit 20. For each of the target periods T1 to T5, the recognition unit 31 obtains confidence information representing the confidence that the facial movement in that period corresponds to each of the three types of micro-expression and to non-expression. In each of the target periods T1 to T5, each of these confidence values is a number no greater than 1, and they sum to 1.

Note that it is not essential for the recognition unit 31 to obtain confidence information indicating the confidence that the change in the face portion during the target period is caused by a factor other than emotion (that is, the confidence that it corresponds to "non-expression"). The recognition unit 31 may obtain only the confidence information indicating the confidence that the change in the face portion during the target period corresponds to each of the multiple types of micro-expression.

Although the processing unit 30 determines which of the three types of micro-expression the change in the face portion during the target period corresponds to, the types of micro-expression are not limited to these three. There may be one type or multiple types of micro-expression; for example, the seven types defined by P. Ekman ("anger", "disgust", "fear", "sadness", "contempt", "joy", and "surprise") may be used.

The determination unit 32 at least determines whether or not a micro-expression appears in each target period, based on the confidence information obtained for each of the multiple types of micro-expression and for non-expression. When the determination unit 32 of the present embodiment determines that a micro-expression appears in a target period, it determines which type of micro-expression appeared, based on the per-type confidence information obtained by the recognition unit 31.

For each of the target periods T1 to T5, the determination unit 32 determines that the facial movement in that period is the one whose confidence value is the largest among "Positive", "Negative", "Surprise", and "non-expression" (see Table 1). For example, if the confidence value for "non-expression" is the largest, the determination unit 32 determines that no micro-expression appeared in the target period. If the confidence value for one of "Positive", "Negative", and "Surprise" is the largest, the determination unit 32 determines that the micro-expression with the largest confidence value appeared. Even when the confidence value for one of "Positive", "Negative", and "Surprise" is the largest in a target period, the determination unit 32 may determine that the movement is "not a micro-expression" if that value is equal to or less than a predetermined reference value.

Although the recognition unit 31 obtains confidence information representing the confidence that the facial movement in each target period corresponds to each of "Positive", "Negative", "Surprise", and "non-expression", it need not obtain the confidence information for "non-expression". In that case, if the largest of the confidence values for "Positive", "Negative", and "Surprise" in a target period is equal to or less than a predetermined reference value, the determination unit 32 may determine that the movement is "non-expression", that is, not a micro-expression.
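The decision rule of the determination unit 32 described above might be sketched as follows. The confidence values in the usage note are made-up illustrative numbers, not the values of Table 1, and the names `judge`, `NON_EXPRESSION`, and `reference` are hypothetical.

```python
NON_EXPRESSION = "non-expression"


def judge(scores, reference=None):
    """Return the label with the largest confidence value.

    scores maps each label ("Positive", "Negative", "Surprise" and,
    optionally, "non-expression") to its confidence value. If reference is
    given, a winning micro-expression whose confidence is equal to or less
    than reference is treated as non-expression.
    """
    label = max(scores, key=scores.get)
    if label == NON_EXPRESSION:
        return NON_EXPRESSION
    if reference is not None and scores[label] <= reference:
        return NON_EXPRESSION
    return label
```

For example, `judge({"Positive": 0.6, "Negative": 0.2, "Surprise": 0.1, "non-expression": 0.1})` returns "Positive", while `judge({"Positive": 0.4, "Negative": 0.3, "Surprise": 0.3}, reference=0.5)` returns "non-expression" because no type exceeds the reference value. The same function covers both the four-class case and the case in which the "non-expression" confidence is not computed.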

なお、本実施形態では、検出部２０と処理部３０とは互いに別々のマイクロコントローラで実現されているが、１つのマイクロコントローラで実現されてもよい。 In the present embodiment, the detection unit 20 and the processing unit 30 are realized by separate microcontrollers, but may be realized by one microcontroller.

なお、本実施形態では、ＬＢＰ−ＴＯＰ特徴量とＳＶＭを用いた機械学習による教師有り学習アルゴリズムを説明したが、機械学習の方法は別の方法を用いても良い。例えば、ディープラーニング（深層学習）等でもよい。例えば、ディープラーニングのアルゴリズムは、３次元カーネルを用いたネットワークを用いて学習する方法でもよい。また、ディープラーニングのアルゴリズムは、フレームごとの顔部分の部分画像に対して２次元カーネルを用いたネットワークから出力される特徴量をリカレントニューラルネットワークにより学習する方法でもよい。 In the present embodiment, the supervised learning algorithm based on the machine learning using the LBP-TOP feature amount and the SVM has been described, but another machine learning method may be used. For example, deep learning (deep learning) may be used. For example, the deep learning algorithm may be a method of learning using a network using a three-dimensional kernel. Further, the algorithm of deep learning may be a method of learning a feature amount output from a network using a two-dimensional kernel using a recurrent neural network for a partial image of a face portion for each frame.

出力部４０は、各対象期間において判定部３２が判定した判定結果を報知部３に出力する。報知部３は、例えば表情判定システム１のユーザ（例えば接客業であれば接客を行う店員）が装着した透過型のヘッドマウントディスプレイである。出力部４０は、報知部３に対して処理部３０の判定結果を無線送信する。報知部３は、出力部４０から無線送信された判定結果を受信すると、例えば、ユーザの眼前に配置される透過型スクリーンに判定結果を表示する。処理部３０の判定結果が複数種類の微表情のいずれかであれば、報知部３は、判定された微表情に対応する感情の名称を表示する。処理部３０の判定結果が「非表情」であれば、報知部３は、例えば「微表情ではない」のようなメッセージを表示する。したがって、表情判定システム１のユーザは、報知部３に表示された判定結果をもとに、判定対象の人（例えば客）の顔に表れた微表情を把握しながら接客を行うことができ、適切な接客対応を行うことができる。なお、報知部３は、例えばユーザの耳に装着されるヘッドホンを備えてもよく、処理部３０の判定結果をヘッドホンから音声等で出力してもよい。また、処理部３０の判定結果が「非表情」である場合に、報知部３は、「微表情ではない」のようなメッセージを表示しなくてもよい。 The output unit 40 outputs the result determined by the determination unit 32 for each target period to the notification unit 3. The notification unit 3 is, for example, a transmissive head-mounted display worn by a user of the facial expression determination system 1 (for example, a clerk serving customers in a customer service setting). The output unit 40 wirelessly transmits the determination result of the processing unit 30 to the notification unit 3. On receiving the determination result, the notification unit 3 displays it, for example, on a transmissive screen placed in front of the user's eyes. If the determination result of the processing unit 30 is one of the plural types of micro-expression, the notification unit 3 displays the name of the emotion corresponding to the determined micro-expression. If the determination result is “non-expression”, the notification unit 3 displays a message such as “not a micro-expression”. The user of the facial expression determination system 1 can therefore serve customers while grasping, from the determination result displayed on the notification unit 3, the micro-expressions that appear on the face of the person being assessed (for example, a customer), and can respond appropriately. The notification unit 3 may include, for example, headphones worn on the user's ears and may output the determination result of the processing unit 30 from the headphones as audio or the like. When the determination result of the processing unit 30 is “non-expression”, the notification unit 3 need not display a message such as “not a micro-expression”.

なお、処理部３０の判定結果が「非表情」であれば、出力部４０は「非表情」の判定結果を報知部３に出力しなくてもよい。この場合、対象期間における顔の動きが微表情のいずれかの種類に該当すると判定された場合のみ、報知部３から判定結果が報知される。 If the determination result of the processing unit 30 is “non-expression”, the output unit 40 need not output the determination result of “non-expression” to the notification unit 3. In this case, the notification unit 3 notifies the determination result only when it is determined that the movement of the face in the target period corresponds to any type of the fine expression.

記憶部５０は、例えば、ＥＥＰＲＯＭ（Electrically Erasable Programmable Read-Only Memory）等の電気的に書換え可能な不揮発性メモリ、及びＲＡＭ（Random Access Memory）等の揮発性メモリ等を備える。記憶部５０は、ハードディスクドライブ等の外部記憶装置を備えてもよい。記憶部５０は、機械学習により作成された学習済みモデル等を記憶する。また、記憶部５０は、カメラ２から入力部１０に入力された入力映像を記憶してもよいし、検出部２０及び処理部３０の処理途中のデータ及び処理結果等を記憶してもよい。 The storage unit 50 includes, for example, an electrically rewritable nonvolatile memory such as an EEPROM (Electrically Erasable Programmable Read-Only Memory) and a volatile memory such as a RAM (Random Access Memory). The storage unit 50 may include an external storage device such as a hard disk drive. The storage unit 50 stores a learned model or the like created by machine learning. The storage unit 50 may store the input video input from the camera 2 to the input unit 10, or may store data in the course of processing by the detection unit 20 and the processing unit 30, processing results, and the like.

（２．２）動作 (2.2) Operation
以下、本実施形態の表情判定システム１の動作を図４に基づいて説明する。 Hereinafter, the operation of the facial expression determination system 1 of the present embodiment will be described with reference to FIG. 4.

入力部１０は、フレーム毎にカメラ２から入力映像が入力されると、カメラ２からの入力映像に対して顔検出処理を実行し（Ｓ１）、顔の部分の部分画像Ｇ１を検出部２０に出力する。なお、検出部２０が、入力映像から人の顔を検出できなければ、Ｓ２以降の処理は行わず、次フレームの入力映像が入力されると顔検出処理を再び実行する。 Each time an input video frame is input from the camera 2, the input unit 10 performs face detection processing on the input video from the camera 2 (S1) and outputs a partial image G1 of the face portion to the detection unit 20. If the detection unit 20 cannot detect a human face in the input video, the processing from S2 onward is not performed, and the face detection processing is executed again when the input video of the next frame is input.

検出部２０は、各フレームで顔の部分の部分画像Ｇ１が入力されると、部分画像Ｇ１に変化が発生している対象期間、つまり顔の部分に動きがある対象期間を検出する検出処理を行う（Ｓ２）。 When the partial image G1 of the face portion is input for each frame, the detection unit 20 performs detection processing (S2) to detect a target period in which a change occurs in the partial image G1, that is, a target period in which the face portion is moving.

ステップＳ２の検出処理の結果、検出部２０が対象期間を検出しなければ（Ｓ３：Ｎｏ）、ステップＳ１に戻り、カメラ２から次フレームの入力映像が入力されると、検出部２０が顔検出処理を再び行う。 As a result of the detection processing in step S2, if the detection unit 20 does not detect the target period (S3: No), the process returns to step S1, and when the input image of the next frame is input from the camera 2, the detection unit 20 performs face detection. Perform the process again.

ステップＳ２の検出処理の結果、検出部２０が対象期間を検出すると（Ｓ３：Ｙｅｓ）、認識部３１が、対象期間の入力映像（部分画像Ｇ１）について、学習済みモデルを用いて、３種類の微表情及び非表情のそれぞれの確度情報を求める認識処理を行う（Ｓ４）。 If the detection unit 20 detects a target period as a result of the detection processing in step S2 (S3: Yes), the recognition unit 31 performs recognition processing (S4) on the input video (partial images G1) of the target period, using the trained model to obtain certainty information for each of the three types of micro-expression and for non-expression.

認識部３１が確度情報を求めると、判定部３２が、認識部３１が求めた複数の種類ごとの確度情報に基づいて、少なくとも微表情が表れているか否かを判定する判定処理を行う（Ｓ５）。例えば、判定部３２は、「Positive」、「Negative」、「Surprise」及び「非表情」のうち、確度情報の値が最大のものが当該対象期間に表れた顔の動きに該当すると判定する。 Once the recognition unit 31 has obtained the certainty information, the determination unit 32 performs determination processing (S5) to determine at least whether a micro-expression appears, based on the certainty information that the recognition unit 31 obtained for each of the plural types. For example, the determination unit 32 determines that, among “Positive”, “Negative”, “Surprise”, and “non-expression”, the category with the largest certainty value corresponds to the facial movement that appeared in the target period.

そして、出力部４０が、判定部３２の判定結果を報知部３に出力する出力処理を行うと（Ｓ６）、報知部３は判定部３２の判定結果をユーザに報知する。したがって、ユーザは、報知部３による報知内容に基づいて、判定対象の人の顔に微表情が表れたか否か、微表情が表れたのであればその種類を把握できる。 Then, when the output unit 40 performs an output process of outputting the determination result of the determination unit 32 to the notification unit 3 (S6), the notification unit 3 notifies the user of the determination result of the determination unit 32. Therefore, based on the notification content of the notification unit 3, the user can grasp whether or not the facial expression has appeared on the face of the person to be determined and, if the facial expression has appeared, the type thereof.

表情判定システム１は、カメラ２から１フレームの入力映像が入力されるごとに、ステップＳ１〜Ｓ６の処理を繰り返し行っており、入力映像に映っている人の顔の表情を判定する。 The facial expression determination system 1 repeats the processing of steps S1 to S6 every time an input video of one frame is input from the camera 2, and determines the facial expression of a person's face reflected in the input video.
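The flow of steps S1 to S6 can be summarized as a per-frame loop. The sketch below is only a structural illustration: the function `run_steps` and its six callable parameters are hypothetical stand-ins for the input unit 10, detection unit 20, recognition unit 31, determination unit 32, and output unit 40, and each stage is assumed to return `None` when it finds nothing.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def run_steps(
    frames: Iterable[Any],
    detect_face: Callable[[Any], Optional[Any]],
    detect_period: Callable[[Any], Optional[Any]],
    recognize: Callable[[Any], Dict[str, float]],
    decide: Callable[[Dict[str, float]], str],
    notify: Callable[[str], None],
) -> None:
    """Run steps S1 to S6 once for every input frame."""
    for frame in frames:
        face = detect_face(frame)        # S1: face detection
        if face is None:
            continue                     # no face: wait for the next frame
        period = detect_period(face)     # S2: target-period detection
        if period is None:               # S3: No
            continue
        certainty = recognize(period)    # S4: recognition processing
        result = decide(certainty)       # S5: determination processing
        notify(result)                   # S6: output to the notification unit
```

In a real system `detect_period` would keep state across frames, since a target period spans several frames; the stateless signature here is a simplification.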

なお、いくつかのステップは同時動作してもよい。例えば、あるフレームにおけるＳ１からＳ６までの処理と、別のフレームにおけるＳ１からＳ６までの処理が同時動作してもよい。また、あるフレームにおけるＳ１からＳ３までの処理と、別のフレームにおけるＳ４からＳ６までの動作がオーバーラップしてもよい。これにより、Ｓ４の処理に多くの時間がかかっても、リアルタイム処理性能を高めることができる。 Note that some steps may be performed simultaneously. For example, the processing from S1 to S6 in one frame and the processing from S1 to S6 in another frame may operate simultaneously. Further, the processing from S1 to S3 in a certain frame and the operations from S4 to S6 in another frame may overlap. As a result, the real-time processing performance can be improved even if much time is required for the processing in S4.
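One way to overlap S1 to S3 for a new frame with S4 to S6 for an earlier one is a producer and consumer pair connected by a queue. The following sketch assumes two hypothetical callables, `front_stage` (S1 to S3, returning `None` when no target period is found) and `back_stage` (S4 to S6); it shows the structure only and is not taken from the disclosure.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Optional

_SENTINEL = object()  # marks the end of the stream


def run_overlapped(
    frames: Iterable[Any],
    front_stage: Callable[[Any], Optional[Any]],
    back_stage: Callable[[Any], Any],
) -> List[Any]:
    """Run S1-S3 in the calling thread and S4-S6 in a worker thread."""
    pending: "queue.Queue[Any]" = queue.Queue()
    results: List[Any] = []

    def worker() -> None:
        while True:
            item = pending.get()
            if item is _SENTINEL:
                break
            results.append(back_stage(item))  # S4 to S6

    thread = threading.Thread(target=worker)
    thread.start()
    for frame in frames:
        period = front_stage(frame)           # S1 to S3
        if period is not None:
            pending.put(period)               # hand over without waiting for S4
    pending.put(_SENTINEL)
    thread.join()
    return results
```

Because a single worker consumes the queue in FIFO order, results keep the order of the detected target periods even when S4 is slow.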

（３）変形例 (3) Modifications
上記実施形態は、本開示の様々な実施形態の一つに過ぎない。上記実施形態は、本開示の目的を達成できれば、設計等に応じて種々の変更が可能である。また、表情判定システム１と同様の機能は、表情判定方法、コンピュータプログラム、又はプログラムを記録した非一時的な記録媒体等で具現化されてもよい。一態様に係る表情判定方法は、検出処理と、判定処理とを有する。検出処理では、顔を含む入力映像から、入力映像のうち顔の部分に変化が発生している対象期間を検出する。判定処理では、対象期間での入力映像に基づいて、少なくとも微表情が表れているか否かを判定する。一態様に係る（コンピュータ）プログラムは、コンピュータシステムに、検出処理と、判定処理と、を実行させるためのプログラムである。 The above embodiment is merely one of various embodiments of the present disclosure. It can be modified in various ways according to design and the like, as long as the object of the present disclosure can be achieved. Functions equivalent to those of the facial expression determination system 1 may also be embodied as a facial expression determination method, a computer program, a non-transitory recording medium on which the program is recorded, or the like. A facial expression determination method according to one aspect includes detection processing and determination processing. In the detection processing, a target period in which a change occurs in the face portion of an input video including a face is detected from the input video. In the determination processing, whether at least a micro-expression appears is determined based on the input video in the target period. A (computer) program according to one aspect is a program for causing a computer system to execute the detection processing and the determination processing.

以下、上記の実施形態の変形例を列挙する。以下に説明する変形例は、適宜組み合わせて適用可能である。 Hereinafter, modified examples of the above embodiment will be listed. The modifications described below can be applied in appropriate combinations.

本開示における表情判定システム１は、コンピュータシステムを含んでいる。コンピュータシステムは、ハードウェアとしてのプロセッサ及びメモリを主構成とする。コンピュータシステムのメモリに記録されたプログラムをプロセッサが実行することによって、本開示における表情判定システム１としての機能が実現される。プログラムは、コンピュータシステムのメモリに予め記録されてもよく、電気通信回線を通じて提供されてもよく、コンピュータシステムで読み取り可能なメモリカード、光学ディスク、ハードディスクドライブ等の非一時的記録媒体に記録されて提供されてもよい。コンピュータシステムのプロセッサは、半導体集積回路（ＩＣ）又は大規模集積回路（ＬＳＩ）を含む１ないし複数の電子回路で構成される。ここでいうＩＣ又はＬＳＩ等の集積回路は、集積の度合いによって呼び方が異なっており、システムＬＳＩ、ＶＬＳＩ（Very Large Scale Integration）、又はＵＬＳＩ（Ultra Large Scale Integration）と呼ばれる集積回路を含む。さらに、ＬＳＩの製造後にプログラムされる、ＦＰＧＡ（Field-Programmable Gate Array）、又はＬＳＩ内部の接合関係の再構成若しくはＬＳＩ内部の回路区画の再構成が可能な論理デバイスについても、プロセッサとして採用することができる。複数の電子回路は、１つのチップに集約されていてもよいし、複数のチップに分散して設けられていてもよい。複数のチップは、１つの装置に集約されていてもよいし、複数の装置に分散して設けられていてもよい。ここでいうコンピュータシステムは、１以上のプロセッサ及び１以上のメモリを有するマイクロコントローラを含む。したがって、マイクロコントローラについても、半導体集積回路又は大規模集積回路を含む１ないし複数の電子回路で構成される。 The facial expression determination system 1 according to the present disclosure includes a computer system. The computer system mainly has a processor and a memory as hardware. When the processor executes the program recorded in the memory of the computer system, the function as the facial expression determination system 1 in the present disclosure is realized. The program may be pre-recorded in the memory of the computer system, may be provided through an electric communication line, or may be recorded in a non-transitory recording medium such as a memory card, an optical disk, or a hard disk drive readable by the computer system. May be provided. A processor of a computer system is composed of one or more electronic circuits including a semiconductor integrated circuit (IC) or a large-scale integrated circuit (LSI). An integrated circuit such as an IC or an LSI referred to here differs depending on the degree of integration, and includes an integrated circuit called a system LSI, a VLSI (Very Large Scale Integration), or a ULSI (Ultra Large Scale Integration). 
Furthermore, an FPGA (Field-Programmable Gate Array) that is programmed after the LSI is manufactured, or a logic device in which the connection relationships or the circuit sections inside the LSI can be reconfigured, can also be adopted as the processor. The plurality of electronic circuits may be integrated on one chip or distributed over a plurality of chips. The plurality of chips may be integrated in one device or distributed over a plurality of devices. The computer system referred to here includes a microcontroller having one or more processors and one or more memories. Accordingly, the microcontroller is also composed of one or more electronic circuits including a semiconductor integrated circuit or a large-scale integrated circuit.

また、表情判定システム１における複数の機能が、１つの筐体内に集約されていることは表情判定システム１に必須の構成ではなく、表情判定システム１の構成要素は、複数の筐体に分散して設けられていてもよい。例えば、表情判定システム１の検出部２０と処理部３０とがそれぞれ別々のシステムに備えられていてもよい。さらに、表情判定システム１の少なくとも一部の機能、例えば、検出部２０及び処理部３０の一部の機能がクラウド（クラウドコンピューティング）等によって実現されてもよい。 It is not an essential configuration of the expression determination system 1 that the plurality of functions in the expression determination system 1 are integrated in one housing, and the components of the expression determination system 1 are distributed to the plurality of housings. May be provided. For example, the detection unit 20 and the processing unit 30 of the facial expression determination system 1 may be provided in separate systems. Furthermore, at least a part of the functions of the facial expression determination system 1, for example, a part of the functions of the detection unit 20 and the processing unit 30 may be realized by a cloud (cloud computing) or the like.

上記の実施形態では、表情判定システム１が判定結果を報知部３に出力しているが、表情判定システム１が報知部３を備えていてもよい。また、表情判定システム１は、カメラ２からの入力映像に基づいて判定処理を行っているが、表情判定システム１がカメラ２を備えていてもよい。 In the above embodiment, the facial expression determination system 1 outputs the determination result to the notification unit 3, but the facial expression determination system 1 may include the notification unit 3. Although the expression determination system 1 performs the determination processing based on the input video from the camera 2, the expression determination system 1 may include the camera 2.

上記の実施形態において、測定データなどの２値の比較において、「以上」としているところは「より大きい」であってもよい。つまり、２値の比較において、２値が等しい場合を含むか否かは、基準値等の設定次第で任意に変更できるので、「以上」か「より以上」かに技術上の差異はない。同様に、「以下」としているところは「未満」であってもよい。 In the above embodiment, where a comparison between two values such as measurement data is described as “greater than or equal to”, it may instead be “greater than”. In other words, whether a comparison between two values includes the case where they are equal can be changed arbitrarily through the setting of the reference value and the like, so there is no technical difference between “greater than or equal to” and “greater than”. Similarly, “less than or equal to” may be “less than”.

（３．１）変形例１ (3.1) Modification 1
図５に示すように、変形例１の表情判定システム１は受付部６０を更に備える点で、上記実施形態と相違する。受付部６０は、検出部２０の検出処理及び処理部３０の判定処理のうち少なくとも一方の対象処理の処理内容を設定するための設定情報を受け付ける。受付部６０が受け付けた設定情報に基づいて対象処理の処理内容が設定される。ここにおいて、処理部３０の判定処理は、認識部３１の認識処理と判定部３２の判定処理とを含む。 As shown in FIG. 5, the facial expression determination system 1 of Modification 1 differs from the above embodiment in that it further includes a receiving unit 60. The receiving unit 60 receives setting information for setting the processing content of a target process, which is at least one of the detection processing of the detection unit 20 and the determination processing of the processing unit 30. The processing content of the target process is set based on the setting information received by the receiving unit 60. Here, the determination processing of the processing unit 30 includes the recognition processing of the recognition unit 31 and the determination processing of the determination unit 32.

受付部６０は、例えば、キーボード、マウス、タッチパネル、又は音声入力装置等のＨＭＩ（Human Machine Interface）を備える。例えば、ユーザは、ＨＭＩを用いて、プルダウンメニューで提示された複数の設定情報の中から所望の設定情報を選択したり、テキストボックスに設定情報を入力したりすることで、設定情報を入力する。受付部６０は、ユーザがＨＭＩを用いて入力した設定情報を受け付ける。受付部６０が設定情報を受け付けると、検出部２０及び処理部３０の少なくとも一方が、受付部６０が受け付けた設定情報に基づいて対象処理の処理内容を設定する。 The receiving unit 60 includes, for example, an HMI (Human Machine Interface) such as a keyboard, a mouse, a touch panel, or a voice input device. For example, the user inputs setting information by selecting desired setting information from a plurality of setting information presented in a pull-down menu or inputting setting information in a text box using an HMI. . The receiving unit 60 receives setting information input by the user using the HMI. When the receiving unit 60 receives the setting information, at least one of the detecting unit 20 and the processing unit 30 sets the processing content of the target process based on the setting information received by the receiving unit 60.

ここで、設定情報により設定される検出処理の処理内容は、例えば、検出部２０が検出処理を行う入力映像の種類、つまりカメラ２の種類と、検出部２０の検出アルゴリズムとの少なくとも１つを含む。カメラ２の種類は、例えば、２種類のＲＧＢカメラ（例えば高性能のＲＧＢカメラと低性能のＲＧＢカメラ）及び赤外線（ＩＲ）カメラのいずれかである。検出部２０の検出アルゴリズムは、例えば高性能のＲＧＢカメラの入力映像に適合した検出アルゴリズムＸ１と、低性能のＲＧＢカメラの入力映像に適合した検出アルゴリズムＸ２と、赤外線（ＩＲ）カメラの入力映像に適合した検出アルゴリズムＸ３とを含む。例えば、表情判定システム１では、周囲が明るい場合は高性能又は低性能のＲＧＢカメラに適合した検出アルゴリズムＸ１，Ｘ２を使用し、周囲が暗い場合は赤外線カメラに適合した検出アルゴリズムＸ３を使用すればよい。 Here, the processing content of the detection processing set by the setting information includes, for example, at least one of the type of input video on which the detection unit 20 performs the detection processing, that is, the type of the camera 2, and the detection algorithm of the detection unit 20. The camera 2 is, for example, one of two types of RGB camera (for example, a high-performance RGB camera and a low-performance RGB camera) or an infrared (IR) camera. The detection algorithms of the detection unit 20 include, for example, a detection algorithm X1 suited to the input video of a high-performance RGB camera, a detection algorithm X2 suited to the input video of a low-performance RGB camera, and a detection algorithm X3 suited to the input video of an infrared (IR) camera. For example, the facial expression determination system 1 may use detection algorithm X1 or X2, suited to the high-performance or low-performance RGB camera, when the surroundings are bright, and detection algorithm X3, suited to the infrared camera, when the surroundings are dark.

設定情報により設定される判定処理の処理内容は、認識部３１により認識される微表情の種類の数（例えば多、中、少）と、要求される適合率（例えば高又は低）と、認識部３１の認識アルゴリズムと、判定部３２の判定アルゴリズムとの少なくとも１つを含む。認識部３１の認識アルゴリズムは、例えば３種類の認識アルゴリズムＹ１〜Ｙ３を含む。認識アルゴリズムＹ１は、微表情であるか否かの確度情報のみを求めるアルゴリズムである。認識アルゴリズムＹ２は、「Positive」、「Negative」、「Surprise」及び「非表情」のそれぞれの確度情報を求めるアルゴリズムである。認識アルゴリズムＹ３は、P.Ekmanが定義した「怒り」、「嫌悪」、「恐怖」、「悲しみ」、「軽蔑」、「喜び」及び「驚き」の７種類と「非表情」のそれぞれの確度情報を求めるアルゴリズムである。判定部３２の判定アルゴリズムは、例えば２種類の判定アルゴリズムＺ１、Ｚ２を含む。判定アルゴリズムＺ１は、微表情であるか否かを判定する基準値を第１基準値に設定するアルゴリズムである。判定アルゴリズムＺ２は、微表情であるか否かを判定する基準値を第１基準値よりも大きい第２基準値に設定するアルゴリズムである。 The processing content of the determination processing set by the setting information includes at least one of the number of types of micro-expression recognized by the recognition unit 31 (for example, many, medium, or few), the required precision (for example, high or low), the recognition algorithm of the recognition unit 31, and the determination algorithm of the determination unit 32. The recognition algorithms of the recognition unit 31 include, for example, three recognition algorithms Y1 to Y3. Recognition algorithm Y1 obtains only certainty information on whether or not a movement is a micro-expression. Recognition algorithm Y2 obtains certainty information for each of “Positive”, “Negative”, “Surprise”, and “non-expression”. Recognition algorithm Y3 obtains certainty information for each of the seven types defined by P. Ekman (“anger”, “disgust”, “fear”, “sadness”, “contempt”, “joy”, and “surprise”) and for “non-expression”. The determination algorithms of the determination unit 32 include, for example, two determination algorithms Z1 and Z2. Determination algorithm Z1 sets the reference value used to determine whether a movement is a micro-expression to a first reference value. Determination algorithm Z2 sets that reference value to a second reference value that is larger than the first reference value.
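The selectable recognition and determination algorithms can be pictured as two lookup tables. In the sketch below, the labels for Y2 and Y3 follow the text, while the single label used for Y1, the numeric reference values 0.5 and 0.8, and the helper `is_micro_expression` are assumptions chosen only so that the second reference value is larger than the first.

```python
from typing import Dict, List

# Categories scored by each recognition algorithm.
RECOGNITION_LABELS: Dict[str, List[str]] = {
    "Y1": ["micro-expression"],  # only "is it a micro-expression?" (label name assumed)
    "Y2": ["Positive", "Negative", "Surprise", "non-expression"],
    "Y3": ["anger", "disgust", "fear", "sadness", "contempt",
           "joy", "surprise", "non-expression"],
}

# Reference values per determination algorithm (hypothetical numbers; the
# source only states that the second value is larger than the first).
REFERENCE_VALUES: Dict[str, float] = {"Z1": 0.5, "Z2": 0.8}


def is_micro_expression(certainty: float, determination_algorithm: str) -> bool:
    """A value equal to or below the reference value is not a micro-expression."""
    return certainty > REFERENCE_VALUES[determination_algorithm]
```

With these numbers a certainty of 0.6 passes under Z1 but not under Z2, which illustrates how the larger reference value makes the determination stricter.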

変形例１の表情判定システム１では、検出部２０及び処理部３０の少なくとも一方が、受付部６０が受け付けた設定情報に基づいて対象処理の処理内容を設定している。したがって、表情判定システム１のユーザは、対象処理の処理内容を所望の処理内容に設定することができる。なお、設定情報は、検出部２０の検出処理、認識部３１の認識処理、及び判定部３２の判定処理のうち全ての処理内容に関する設定情報を含んでいなくてもよい。設定情報は、検出部２０の検出処理、認識部３１の認識処理、及び判定部３２の判定処理のうち少なくとも一部の対象処理に関する設定情報を含んでいればよい。 In the facial expression determination system 1 of the first modification, at least one of the detection unit 20 and the processing unit 30 sets the processing content of the target process based on the setting information received by the reception unit 60. Therefore, the user of the facial expression determination system 1 can set the processing contents of the target processing to desired processing contents. Note that the setting information does not need to include the setting information regarding all the processing contents of the detection processing of the detection unit 20, the recognition processing of the recognition unit 31, and the determination processing of the determination unit 32. The setting information only needs to include setting information relating to at least a part of target processing among the detection processing of the detection unit 20, the recognition processing of the recognition unit 31, and the determination processing of the determination unit 32.

また、変形例１において、受付部６０が、設定情報として、表情判定システム１の用途を表す用途情報を受け付けた場合、対象処理の処理内容が用途に応じた処理内容に設定されてもよい。 Further, in the first modification, when the receiving unit 60 receives, as setting information, application information indicating a use of the facial expression determination system 1, the processing content of the target process may be set to the processing content according to the use.

この場合、記憶部５０には、表情判定システム１が適用される複数の用途を示す用途情報に対応付けて、検出部２０の検出処理及び処理部３０の判定処理の処理内容がそれぞれ記憶されている。表２は、複数の用途のそれぞれでの処理内容の一例である。なお、表２に示した用途ごとの処理内容は一例であり、用途の種類及び用途ごとの処理内容は適宜変更が可能である。 In this case, the storage unit 50 stores the processing contents of the detection processing of the detection unit 20 and of the determination processing of the processing unit 30 in association with use information indicating the plurality of uses to which the facial expression determination system 1 is applied. Table 2 shows an example of the processing content for each of the uses. The processing content for each use shown in Table 2 is only an example, and the types of use and the processing content for each use can be changed as appropriate.

検出部２０及び処理部３０は、受付部６０が受け付けた用途情報をもとに、検出部２０の検出処理及び処理部３０の判定処理の処理内容を記憶部５０から読み出して、検出部２０の検出処理及び処理部３０の判定処理の処理内容を設定する。これにより、表情判定システム１のユーザがＨＭＩを用いて用途情報を入力すると、表情判定システム１は、検出部２０の検出処理及び処理部３０の判定処理の処理内容を用途情報に対応して予め設定されている処理内容に設定する。したがって、ユーザは用途情報を入力するだけで、表情判定システム１の処理内容が用途に応じた処理内容が設定されるので、ユーザの設定の手間を低減できる。 The detection unit 20 and the processing unit 30 read the processing content of the detection process of the detection unit 20 and the determination process of the processing unit 30 from the storage unit 50 based on the use information received by the reception unit 60, and The processing contents of the detection processing and the determination processing of the processing unit 30 are set. Thereby, when the user of the facial expression determination system 1 inputs the usage information using the HMI, the facial expression determination system 1 sets the processing contents of the detection processing of the detection unit 20 and the determination processing of the processing unit 30 in advance in correspondence with the usage information. Set to the set processing content. Therefore, the user simply inputs the application information, and the processing content of the facial expression determination system 1 is set to the processing content according to the application, so that the user's setting labor can be reduced.

また、受付部６０が、設定情報として、用途情報を受け付けた後に、対象処理の処理内容を変更する変更情報を受け付けた場合、対象処理の処理内容が、用途に応じた処理内容から変更情報に応じて変更されてもよい。 When the receiving unit 60 receives change information for changing the processing content of the target process after receiving the usage information as the setting information, the processing content of the target process is changed from the processing content corresponding to the usage to the change information. It may be changed accordingly.

表情判定システム１のユーザが、ＨＭＩを用いて用途情報を入力した後に変更情報を入力すると、検出部２０及び処理部３０は、変更情報に対応した対象処理の処理内容を変更情報に基づいて変更する。これにより、検出部２０の検出処理及び処理部３０の判定処理の処理内容が、用途に応じて予め設定された処理内容から変更されるので、用途に応じて予め設定された処理内容から処理内容を微調整できる。 When the user of the facial expression determination system 1 inputs change information after inputting the use information via the HMI, the detection unit 20 and the processing unit 30 change the processing content of the target process to which the change information applies, based on that change information. The processing contents of the detection processing of the detection unit 20 and the determination processing of the processing unit 30 are thereby changed from the contents preset for the use, so the processing content can be fine-tuned starting from the preset for that use.

例えば、表情判定システム１のユーザは、表情判定システム１の用途が同じ場合でも、表情判定システム１の利用目的に応じて処理内容を変更することができる。例えば表情判定システム１の用途が介護の場合、被介護者の何らかの感情変化をとらえることを目的として、初期設定では、認識部３１が認識アルゴリズムＹ１を使用するように設定されているが、介護内容に不満を感じている利用者を見つけるような目的では認識アルゴリズムが認識アルゴリズムＹ２に変更される。認識部３１は、認識アルゴリズムＹ２を使用することによって、微表情の種類を認識できるので、表情判定システム１のユーザは利用者の感情を把握できる。 For example, even when the use of the facial expression determination system 1 is the same, its user can change the processing content according to the purpose for which the system is used. For example, when the system is used for nursing care, the recognition unit 31 is set by default to use recognition algorithm Y1 in order to capture any change in the care recipient's emotions, but the recognition algorithm is changed to Y2 when the purpose is to find care recipients who are dissatisfied with the care they receive. By using recognition algorithm Y2, the recognition unit 31 can recognize the type of micro-expression, so the user of the facial expression determination system 1 can grasp the care recipient's emotions.
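The preset-then-adjust behaviour can be sketched as a lookup followed by an override. Only the nursing-care default of Y1 for the recognition algorithm comes from the text; the class `ProcessingContent`, the function `configure`, the second use, and every other preset value are hypothetical, because the contents of Table 2 are not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ProcessingContent:
    detection_algorithm: str      # X1, X2 or X3
    recognition_algorithm: str    # Y1, Y2 or Y3
    determination_algorithm: str  # Z1 or Z2


# Presets keyed by use information (illustrative values, not Table 2).
PRESETS: Dict[str, ProcessingContent] = {
    "nursing care": ProcessingContent("X1", "Y1", "Z1"),
    "customer service": ProcessingContent("X1", "Y2", "Z1"),
}


def configure(use: str, **changes: str) -> ProcessingContent:
    """Load the preset for a use, then apply any change information."""
    return replace(PRESETS[use], **changes)
```

For instance, `configure("nursing care", recognition_algorithm="Y2")` keeps the preset detection and determination algorithms and switches only the recognition algorithm.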

また、表情判定システム１のユーザは、表情判定システム１の用途が同じでも、時間帯に応じて表情判定システム１の処理内容を変更することができる。例えば、表情判定システム１のユーザは、昼間はＲＧＢカメラ用の検出アルゴリズムＸ１又はＸ２を使用し、夜間は赤外線カメラ用の検出アルゴリズムＸ３を使用するように処理内容を設定できる。これにより、表情判定システム１のユーザは、用途に応じて予め設定された処理内容を、時間帯に応じて微調整することができる。 Further, the user of the facial expression determination system 1 can change the processing content of the facial expression determination system 1 according to the time zone, even if the purpose of the facial expression determination system 1 is the same. For example, the user of the facial expression determination system 1 can set the processing content so that the detection algorithm X1 or X2 for the RGB camera is used in the daytime and the detection algorithm X3 for the infrared camera is used in the nighttime. Thereby, the user of the facial expression determination system 1 can finely adjust the processing content preset according to the purpose according to the time zone.
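A time-of-day rule of this kind could look like the sketch below. The function name and the 06:00 to 18:00 daytime window are assumptions made for illustration; the source does not specify when daytime begins or ends.

```python
from datetime import time


def select_detection_algorithm(
    now: time,
    day_start: time = time(6, 0),    # assumed start of daytime
    day_end: time = time(18, 0),     # assumed end of daytime
    rgb_algorithm: str = "X1",       # X1 or X2, depending on the RGB camera
) -> str:
    """Use an RGB-camera algorithm by day and the infrared algorithm X3 by night."""
    if day_start <= now < day_end:
        return rgb_algorithm
    return "X3"
```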

また、検出部２０及び処理部３０は、許容される処理量及び処理能力などに応じて、検出アルゴリズム、認識アルゴリズム及び判定アルゴリズムのうちの少なくとも１つを変更してもよい。例えば、処理部３０は、処理部３０の処理量及び処理能力が所定の許容値よりも低ければ認識アルゴリズムを認識アルゴリズムＹ２に設定し、処理量及び処理能力が許容値よりも高ければ認識アルゴリズムを認識アルゴリズムＹ３に設定する。 Further, the detection unit 20 and the processing unit 30 may change at least one of a detection algorithm, a recognition algorithm, and a determination algorithm according to an allowable processing amount and a processing capability. For example, the processing unit 30 sets the recognition algorithm to the recognition algorithm Y2 if the processing amount and the processing capability of the processing unit 30 are lower than the predetermined allowable values, and sets the recognition algorithm to the recognition algorithm if the processing amount and the processing capability are higher than the allowable values. Set to the recognition algorithm Y3.

（３．２）その他の変形例 (3.2) Other Modifications
カメラ２は、判定対象の人の顔を正面から撮影可能な位置に設置されているとしたが、判定対象の人の顔を斜めから撮影した映像を入力しても良い。その場合、カメラ２で、人の顔を斜め方向から撮影した映像を正面方向から撮影した映像に変換する前処理を入れても良い。これにより、カメラ２に対して正面を向いていない人の顔の微表情も検出・認識できる。 The camera 2 has been described as installed at a position from which the face of the person being assessed can be captured from the front, but video capturing the face at an oblique angle may also be input. In that case, preprocessing may be added that converts video of a face captured obliquely by the camera 2 into video as captured from the front. This makes it possible to detect and recognize micro-expressions on the face of a person who is not facing the camera 2 directly.

上記実施形態の表情判定システム１では、カメラ２から入力部１０にフレーム毎の入力映像が入力されており、検出部２０は、入力部１０に入力されるフレーム毎の入力映像に基づいて検出処理を行っているが、複数フレームに１回の割合で検出処理を行ってもよい。 In the facial expression determination system 1 of the above embodiment, an input video for each frame is input from the camera 2 to the input unit 10, and the detection unit 20 performs a detection process based on the input video for each frame input to the input unit 10. However, the detection process may be performed once in a plurality of frames.

また、検出部２０は、入力部１０に入力される入力映像から顔の部分の部分映像を検出しているが、例えば表情判定システム１の用途等の条件に応じて、入力映像において顔の部分を検出する範囲が予め設定されていてもよい。例えば、表情判定システム１の用途に応じて、入力映像において判定対象の人の顔が映る範囲が決まっている場合、検出部２０は、入力映像において顔検出を行う範囲を用途に応じて決定してもよい。 The detection unit 20 detects a partial video of the face portion from the input video input to the input unit 10, but the range of the input video in which the face portion is searched for may be set in advance according to conditions such as the use of the facial expression determination system 1. For example, when the use of the facial expression determination system 1 fixes the range of the input video in which the face of the person being assessed appears, the detection unit 20 may determine the range for face detection in the input video according to that use.

上記実施形態において、検出部２０はクラウドによって実現されてもよい。すなわち、表情判定システム１は、入力部１０に入力されたカメラ２の入力映像をクラウド上に送信し、クラウドによって顔を検出する検出処理と対象期間を検出する検出処理との少なくとも一方を行えばよい。これにより、表情判定システム１の処理負荷を低減でき、またクラウド上のサービスで検出処理を実現できる。 In the above embodiment, the detection unit 20 may be realized in the cloud. That is, the facial expression determination system 1 may transmit the input video of the camera 2 received by the input unit 10 to the cloud, and at least one of the detection processing that detects a face and the detection processing that detects a target period may be performed in the cloud. This reduces the processing load on the facial expression determination system 1 and allows the detection processing to be provided by a cloud service.

また、顔に表れる表情の記述法として、ＦＡＣＳ（Facial Action Coding System）がある。ＦＡＣＳでは、複数の動作単位（ＡＵ：Action Unit）を要素にして顔面動作を記述する。上記実施形態において、検出部２０は、顔の表情の基本要素であるＡＵを定量化し、ＡＵの値が所定の判定値を超える期間を対象期間として検出するアルゴリズムを使用してもよい。例えば、表情判定システム１は、瞬きに対応するＡＵ４５を定量化し、ＡＵ４５の値が判定値を超える期間を対象期間外として判定するアルゴリズムを使用してもよい。 Also, there is a FACS (Facial Action Coding System) as a description method of a facial expression appearing on a face. In the FACS, a facial action is described using a plurality of action units (AU: Action Unit) as elements. In the above embodiment, the detection unit 20 may use an algorithm that quantifies AU that is a basic element of the facial expression and detects a period in which the value of AU exceeds a predetermined determination value as a target period. For example, the facial expression determination system 1 may use an algorithm that quantifies the AU 45 corresponding to the blink and determines that a period in which the value of the AU 45 exceeds the determination value is outside the target period.
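The AU-based detection can be sketched as thresholding a per-frame AU intensity series and then excluding frames in which the blink unit AU45 is active. The sketch assumes that per-frame AU values are already available from some estimator; the function name, the list-based inputs, and the threshold values in the usage note are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def detect_target_periods(
    au_values: Sequence[float],
    judgment_value: float,
    au45_values: Optional[Sequence[float]] = None,
    au45_judgment_value: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) pairs where the AU value exceeds the
    judgment value, skipping frames where AU45 (blink) exceeds its own value."""
    periods: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(au_values):
        active = value > judgment_value
        if active and au45_values is not None and au45_values[i] > au45_judgment_value:
            active = False  # blink frame: treated as outside the target period
        if active and start is None:
            start = i
        elif not active and start is not None:
            periods.append((start, i - 1))
            start = None
    if start is not None:
        periods.append((start, len(au_values) - 1))
    return periods
```

For the series `[0.0, 0.6, 0.7, 0.2, 0.8, 0.9, 0.9, 0.1]` with a judgment value of 0.5, the function returns the frame ranges (1, 2) and (4, 6); a blink at frame 5 would split the second range into (4, 4) and (6, 6).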

In the above embodiment, the detection unit 20 detects the target period in which the face portion changes based on the input video; however, it may detect the target period based not only on the input video but also on biological information of the person to be determined (heart rate, cardiac potential, brain waves, etc.) detected by various sensors. Because the detection unit 20 then detects the target period based on both the input video and the biological information, the detection accuracy of the target period improves.

In the above embodiment, the detection unit 20 outputs the detection result of the target period; in addition to this result, it may obtain confidence information (a score) indicating the confidence that a micro-expression is occurring and output it to the processing unit 30. When confidence information is input from the detection unit 20, the processing unit 30 may reflect it in the confidence information calculated by the recognition unit 31. For example, the recognition unit 31 may add the confidence information input from the detection unit 20 to, or subtract it from, the confidence information obtained for each type of micro-expression, thereby obtaining the confidence information for each type of micro-expression.

The detection unit 20 may also output information on the detection algorithm used for the detection process to the processing unit 30. The processing unit 30 may superimpose, on the confidence information obtained for each type of micro-expression, a bias value set for each type of micro-expression according to the detection algorithm used by the detection unit 20.
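
The two score adjustments described above (reflecting the detection unit's score and superimposing a per-algorithm bias) can be sketched in Python as follows. The algorithm names, micro-expression type names, and bias values are hypothetical placeholders, and simple addition is only one way of "reflecting" the detector score.

```python
from typing import Dict, Optional

# Hypothetical bias values per detection algorithm and micro-expression type.
BIAS_BY_ALGORITHM: Dict[str, Dict[str, float]] = {
    "au_threshold": {"happiness": 0.05, "surprise": -0.05},
    "frame_difference": {"happiness": 0.0, "surprise": 0.1},
}


def adjust_scores(
    type_scores: Dict[str, float],
    detector_score: float = 0.0,
    algorithm: Optional[str] = None,
) -> Dict[str, float]:
    """Add the detector's score and an algorithm-specific bias to each type score."""
    bias = BIAS_BY_ALGORITHM.get(algorithm, {}) if algorithm else {}
    return {
        kind: score + detector_score + bias.get(kind, 0.0)
        for kind, score in type_scores.items()
    }
```

Passing a negative `detector_score` corresponds to the subtraction case mentioned in the text.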

The detection unit 20 may also output at least one of the shooting information and the shooting environment of the camera 2 to the processing unit 30. The shooting information of the camera 2 is, for example, information on the input video of the camera 2, such as its resolution, white balance, and frame rate. The shooting environment of the camera 2 is information on the place where the camera 2 is installed or on the shooting range of the camera 2. If the facial expression determination system 1 is used in in-vehicle equipment, the shooting environment is, for example, inside or outside the vehicle cabin; if it is used in a theme park, the shooting environment is, for example, indoors or outdoors. Based on at least one of the shooting information and the shooting environment input from the detection unit 20, the processing unit 30 can select, from among a plurality of learned models, the learned model that the recognition unit 31 uses for the recognition process.

The detection unit 20 may also output, to the processing unit 30, position information of the shooting location input from the camera 2 or the like (for example, information obtained using GPS (Global Positioning System)). Based on the position information input from the detection unit 20, the recognition unit 31 can use a learned model suited to the region containing the shooting location (for example, Japan, North America, or Europe); for example, it can perform the recognition process using a learned model created in that region.
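
Selecting a learned model from the shooting environment and the region, as described in the preceding paragraphs, might be sketched as a lookup with fallbacks. The registry keys and model names below are hypothetical, and the fallback order (environment first, then region) is an assumption rather than something the disclosure prescribes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: (shooting environment, region) -> learned model name.
MODELS: Dict[Tuple[Optional[str], Optional[str]], str] = {
    ("vehicle_interior", "japan"): "model_car_jp",
    ("vehicle_interior", None): "model_car_generic",
    (None, "japan"): "model_generic_jp",
    (None, None): "model_default",
}


def select_model(environment: Optional[str] = None, region: Optional[str] = None) -> str:
    """Return the most specific registered model, falling back to the default."""
    candidates = (
        (environment, region),   # exact match
        (environment, None),     # environment only
        (None, region),          # region only
        (None, None),            # default model
    )
    for key in candidates:
        if key in MODELS:
            return MODELS[key]
    raise LookupError("no default model registered")
```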

In the above embodiment, the detection unit 20 may also output quantified values of the AUs, which are the basic elements of facial expressions, to the processing unit 30. For example, when the learned model has been created by machine learning using quantified AU values in addition to input video as training data, the recognition unit 31 may perform the recognition process using the input video and the quantified AU values as input data.

The recognition unit 31 may also be implemented in the cloud. That is, the facial expression determination system 1 may transmit the input video in the target period to the cloud and have the cloud perform the recognition process. This reduces the processing load on the facial expression determination system 1 and allows the recognition process to be realized using a cloud service.

The recognition unit 31 may also change the content of the recognition process (for example, the recognition algorithm, the network configuration for deep learning, or the type of learned model used for the recognition process) according to the spare capacity of the processor of the processing unit 30 and of the network bandwidth. By doing so, the recognition unit 31 can change the load of the recognition process and reduce the impact of the recognition process on other processing performed by the facial expression determination system 1.
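
One way to picture this load-dependent switching is a small selection function. The profile names, the two headroom measures, and the thresholds below are all assumptions introduced for illustration; the disclosure only states that the processing content may change with spare processor and network capacity.

```python
def choose_recognition_profile(cpu_headroom: float, bandwidth_headroom_mbps: float) -> str:
    """Pick a recognition profile from hypothetical headroom measurements.

    cpu_headroom is the idle fraction of the processor (0.0 to 1.0);
    bandwidth_headroom_mbps is the unused network bandwidth in Mbit/s.
    """
    if cpu_headroom >= 0.5:
        return "local_full"    # enough processor capacity for the larger model
    if bandwidth_headroom_mbps >= 10.0:
        return "cloud"         # processor is busy but the network is free: offload
    return "local_light"       # otherwise fall back to a lighter model
```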

The learned model used by the recognition unit 31 may be one trained by machine learning using biological information (heart rate, cardiac potential, brain waves, etc.) in addition to input video as training data. Performing the recognition process with such a learned model can improve the accuracy of the recognition process.

The recognition unit 31 may also be configured to perform the recognition process in two stages. For example, in the first-stage recognition process, the recognition unit 31 recognizes, based on the input video in the target period, whether or not a micro-expression is present; when the first stage recognizes a micro-expression, the second-stage recognition process obtains confidence information for each type of micro-expression. When the recognition unit 31 is configured to perform the recognition process in two stages, one of the first-stage and second-stage recognition processes may be performed in the cloud. For example, if the recognition unit 31 is configured to perform the second-stage recognition process in the cloud, it can recognize the type of micro-expression using a cloud service that is only capable of recognizing micro-expression types.
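
The two-stage arrangement can be sketched with two interchangeable callables, so that either stage could run locally or be a thin wrapper around a cloud service. The function and parameter names are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def two_stage_recognize(
    clip: Any,
    is_micro_expression: Callable[[Any], bool],          # stage 1: present or not
    score_types: Callable[[Any], Dict[str, float]],      # stage 2: per-type confidence
) -> Optional[Dict[str, float]]:
    """Run stage 2 only when stage 1 recognizes a micro-expression."""
    if not is_micro_expression(clip):
        return None               # stage 2 (possibly a cloud call) is skipped
    return score_types(clip)
```

Because stage 2 is invoked only after a positive stage 1 result, a remote second stage receives no clips that the first stage has already rejected.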

The algorithm used by the recognition unit 31 for the recognition process may be one that recognizes the emotion of the person to be determined by identifying the point on Russell's circumplex model to which the facial movement in the target period corresponds. This allows the recognition unit 31 to recognize emotions based on the person's pleasure, displeasure, arousal level, and so on.
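
Russell's circumplex model places emotions on a plane spanned by a pleasure-displeasure (valence) axis and an arousal axis. As a rough sketch, assuming an upstream step has already mapped the facial movement to a (valence, arousal) point, a coarse emotion label can be read from the quadrant. The four labels are illustrative groupings, not categories taken from the disclosure.

```python
import math
from typing import Tuple

# Illustrative quadrant labels, counter-clockwise from the positive valence axis.
QUADRANT_LABELS = ["excited/happy", "tense/distressed", "sad/bored", "calm/relaxed"]


def circumplex_quadrant(valence: float, arousal: float) -> Tuple[str, float]:
    """Return a coarse label and the angle (degrees) of a point on the circumplex."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    index = int(angle // 90) % 4   # the extra "% 4" guards against an angle rounding to 360.0
    return QUADRANT_LABELS[index], angle
```

A finer-grained variant could map the angle to more sectors or use the distance from the origin as an intensity measure.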

In addition to the confidence information obtained by the recognition process, the recognition unit 31 may output to the determination unit 32 information on the content of the recognition process (for example, the recognition algorithm, the network configuration for deep learning, the type of learned model, or the performance of the learned model). The determination unit 32 can then perform the determination process using a reference value set according to the content of the recognition process, which improves the determination accuracy.

The determination unit 32 may also store determination results in the storage unit 50 and, when micro-expressions appear on the faces of multiple people within a relatively short time, perform the determination process by integrating the determination results from the immediately preceding several frames. For example, when the facial expression determination system 1 is used for signage, the determination unit 32 can integrate the emotions that appeared on the faces of multiple people viewing an advertisement or the like, and thereby determine what emotion those people feel on average.

The determination unit 32 may also perform the determination process by integrating the recognition results of the recognition unit 31 within a predetermined number of frames. For example, when the facial expression determination system 1 is used to examine the reactions of an audience watching a play or the like, it can determine what micro-expressions appear on the audience's faces on average within a predetermined time after an event that changes the audience's emotions (for example, an actor's line or action).
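
Integrating results across several people or several frames can be sketched as averaging the per-type confidence values and taking the type with the highest mean. Averaging is only one possible integration rule, and the function name and data layout are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def integrate_results(
    results: List[Dict[str, float]],
) -> Tuple[Optional[str], Dict[str, float]]:
    """Average per-type scores over people or frames and return the dominant type."""
    if not results:
        return None, {}
    totals: Dict[str, float] = defaultdict(float)
    for scores in results:
        for kind, score in scores.items():
            totals[kind] += score
    # A type missing from one result contributes 0 to that result's share.
    means = {kind: total / len(results) for kind, total in totals.items()}
    dominant = max(means, key=means.get)
    return dominant, means
```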

In the above embodiment, the output unit 40 may also output, to an external system such as the notification unit 3, the determination result of the determination unit 32 (either that no micro-expression is present or, if one is present, its type) together with the corresponding confidence information. The output unit 40 may also output, to an external system such as the notification unit 3, the confidence information for each type of micro-expression obtained by the recognition unit 31 and the confidence information indicating that the change is not a micro-expression. In this case, the external system can combine the determination result and confidence information output from the facial expression determination system 1 with the recognition results of other systems that recognize the emotion of the person to be determined. The external system can therefore determine that person's emotion with higher accuracy.

The output unit 40 may output not only the single most likely result among the determination results of the determination unit 32 but also multiple likely results. For example, it may output the M results with the highest confidence (M being an integer of 2 or more). This enables application to more complex emotion estimation in combination with other emotion recognition devices, such as multimodal processing.
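
Outputting the M most likely results amounts to sorting the candidates by confidence and keeping the first M. A short Python sketch, with a hypothetical function name and data layout:

```python
from typing import Dict, List, Tuple


def top_m(scores: Dict[str, float], m: int = 2) -> List[Tuple[str, float]]:
    """Return the m (type, confidence) pairs with the highest confidence."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:m]
```

For example, `top_m({"happiness": 0.6, "surprise": 0.3, "anger": 0.1}, m=2)` yields the happiness and surprise entries in that order.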

(Summary)
As described above, the facial expression determination system (1) according to the first aspect includes a detection unit (20) and a processing unit (30). The detection unit (20) performs a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The processing unit (30) performs a determination process of determining, based on the input video in the target period, whether at least a micro-expression appears.

According to this aspect, the processing unit (30) determines whether a micro-expression appears based on the input video in the target period, that is, the period in which the detection unit (20) has detected a change in the face portion. This reduces the possibility of erroneously detecting a micro-expression when none is present, so a facial expression determination system (1) capable of reducing erroneous detection can be provided.
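
The division of labor in the first aspect can be sketched as a pipeline in which the determination process sees only the frames inside detected target periods. Both callables and the frame representation are hypothetical stand-ins for the detection unit and the processing unit.

```python
from typing import Any, Callable, List, Sequence, Tuple

Period = Tuple[int, int]  # (start, end) frame indices, end exclusive


def run_pipeline(
    frames: Sequence[Any],
    detect_periods: Callable[[Sequence[Any]], List[Period]],   # detection process
    determine: Callable[[Sequence[Any]], Any],                 # determination process
) -> List[Tuple[Period, Any]]:
    """Apply the determination process only to frames inside each target period."""
    results = []
    for start, end in detect_periods(frames):
        results.append(((start, end), determine(frames[start:end])))
    return results
```

Frames outside every target period never reach `determine`, which is the mechanism by which this structure avoids judging unrelated footage.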

In the facial expression determination system (1) according to the second aspect, in the first aspect, there are a plurality of types of micro-expression. When the processing unit (30) determines in the determination process that a micro-expression appears, it further determines, from among the plurality of types, the type of the micro-expression that appeared in the target period.

According to this aspect, it is possible to determine not only whether a micro-expression appears but also its type.

In the facial expression determination system (1) according to the third aspect, in the first or second aspect, the processing unit (30) includes a recognition unit (31) and a determination unit (32). Based on the input video in the target period, the recognition unit (31) obtains, for each of the plurality of types, confidence information representing the confidence that the change in the face portion in the input video in the target period corresponds to that type. The determination unit (32) determines whether at least a micro-expression appears based on the confidence information for each of the plurality of types obtained by the recognition unit (31).

According to this aspect, a facial expression determination system (1) capable of reducing erroneous detection can be provided.

In the facial expression determination system (1) according to the fourth aspect, in the third aspect, the recognition unit (31) further obtains, in the determination process, in addition to the confidence information for each of the plurality of types, confidence information representing the confidence that the change in the face portion in the input video is not a micro-expression.
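
How the determination unit (32) might combine the per-type confidence information with the "not a micro-expression" confidence can be sketched as follows. The decision rule (the best type must reach a reference value and exceed the "not a micro-expression" score) and the default reference value are assumptions; this summary does not fix a specific rule.

```python
from typing import Dict, Optional


def determine(
    type_scores: Dict[str, float],
    not_micro_score: float,
    reference: float = 0.5,   # hypothetical reference value
) -> Optional[str]:
    """Return the determined micro-expression type, or None if none is judged present."""
    if not type_scores:
        return None
    best_type = max(type_scores, key=type_scores.get)
    best = type_scores[best_type]
    if best >= reference and best > not_micro_score:
        return best_type
    return None
```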

In the facial expression determination system (1) according to the fifth aspect, in any one of the first to fourth aspects, the processing unit (30) performs the determination process using a learned model created by machine learning.

The facial expression determination system (1) according to the sixth aspect, in any one of the first to fifth aspects, further includes a reception unit (60) that receives setting information for setting the processing content of a target process, the target process being at least one of the detection process and the determination process. The processing content of the target process is set based on the setting information received by the reception unit (60).

According to this aspect, a facial expression determination system (1) in which the processing content of the target process can be changed can be provided.

In the facial expression determination system (1) according to the seventh aspect, in the sixth aspect, when the reception unit (60) receives, as the setting information, use information indicating the use of the facial expression determination system (1), the processing content of the target process is set to processing content suited to that use.

According to this aspect, a facial expression determination system (1) in which the processing content of the target process can be changed to processing content suited to the use can be provided.

In the facial expression determination system (1) according to the eighth aspect, in the seventh aspect, when the reception unit (60) receives change information after receiving the use information as the setting information, the processing content of the target process is changed, according to the change information, from the processing content suited to the use. The change information is information for changing the processing content of the target process.
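
The sixth to eighth aspects can be pictured as a use-specific preset that later change information overrides. The preset names, setting keys, and values below are hypothetical examples, not settings taken from the disclosure.

```python
from typing import Any, Dict, Optional

# Hypothetical presets of processing content, keyed by use.
PRESETS: Dict[str, Dict[str, Any]] = {
    "in_vehicle": {"detector": "au_threshold", "model": "model_car", "max_faces": 1},
    "signage": {"detector": "frame_difference", "model": "model_crowd", "max_faces": 10},
}


def build_settings(use: str, changes: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from the preset for the given use, then apply change information."""
    settings = dict(PRESETS[use])     # copy so the preset itself is not modified
    settings.update(changes or {})    # change information overrides preset values
    return settings
```

Copying the preset before updating keeps later calls for the same use unaffected by earlier change information.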

According to this aspect, a facial expression determination system (1) in which the processing content of the target process can be changed from the processing content suited to the use can be provided.

The program according to the ninth aspect causes a computer system to execute a detection process and a determination process. The detection process detects, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The determination process determines, based on the input video in the target period, whether at least a micro-expression appears.

According to this aspect, the determination process determines whether a micro-expression appears based on the input video in the target period, that is, the period in which the detection process has detected a change in the face portion. This reduces the possibility of erroneously detecting a micro-expression when none is present, so a program capable of reducing erroneous detection can be provided.

The facial expression determination method according to the tenth aspect includes a detection process and a determination process. The detection process detects, from an input video including a face, a target period in which a change occurs in the face portion of the input video. The determination process determines, based on the input video in the target period, whether at least a micro-expression appears.

According to this aspect, the determination process determines whether a micro-expression appears based on the input video in the target period, that is, the period in which the detection process has detected a change in the face portion. This reduces the possibility of erroneously detecting a micro-expression when none is present, so a facial expression determination method capable of reducing erroneous detection can be provided.

Beyond the above aspects, the various configurations (including modifications) of the facial expression determination system (1) according to the above embodiment can be embodied as a facial expression determination method, a (computer) program, a non-transitory recording medium on which the program is recorded, or the like.

The configurations according to the second to eighth aspects are not essential to the facial expression determination system (1) and may be omitted as appropriate.

Reference Signs List
1 Facial expression determination system
20 Detection unit
30 Processing unit
31 Recognition unit
32 Determination unit
60 Reception unit

Claims

A facial expression determination system comprising:
a detection unit that performs a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video; and
a processing unit that performs a determination process of determining, based on the input video in the target period, whether at least a micro-expression appears.

The facial expression determination system according to claim 1, wherein
the micro-expression has a plurality of types, and
the processing unit, when it determines in the determination process that the micro-expression appears, further determines, from among the plurality of types, the type of the micro-expression that appeared in the target period.

The facial expression determination system according to claim 1, wherein
the processing unit includes a recognition unit and a determination unit,
the recognition unit obtains, for each of the plurality of types and based on the input video in the target period, confidence information representing the confidence that the change in the face portion in the input video in the target period corresponds to that type, and
the determination unit determines whether at least the micro-expression appears based on the confidence information for each of the plurality of types obtained by the recognition unit.

The facial expression determination system according to claim 3, wherein
the recognition unit further obtains, in the determination process, in addition to the confidence information for each of the plurality of types, confidence information representing the confidence that the change in the face portion in the input video is not the micro-expression.

The facial expression determination system according to claim 1, wherein
the processing unit performs the determination process using a learned model created by machine learning.

The facial expression determination system according to claim 1, further comprising
a reception unit that receives setting information for setting the processing content of a target process, the target process being at least one of the detection process and the determination process, wherein
the processing content of the target process is set based on the setting information received by the reception unit.

The facial expression determination system according to claim 6, wherein
when the reception unit receives, as the setting information, use information indicating a use of the facial expression determination system, the processing content of the target process is set to processing content suited to the use.

The facial expression determination system according to claim 7, wherein
when the reception unit receives, after receiving the use information as the setting information, change information for changing the processing content of the target process, the processing content of the target process is changed, according to the change information, from the processing content suited to the use.

A program that causes a computer system to execute:
a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video; and
a determination process of determining, based on the input video in the target period, whether at least a micro-expression appears.

A facial expression determination method comprising:
a detection process of detecting, from an input video including a face, a target period in which a change occurs in the face portion of the input video; and
a determination process of determining, based on the input video in the target period, whether at least a micro-expression appears.