JP2022046419A

JP2022046419A - Image processing apparatus, image processing method, and machine readable storage medium

Info

Publication number: JP2022046419A
Application number: JP2021130032A
Authority: JP
Inventors: シ・ズチアン; Ziqiang Shi; リヌ・ホォイビヌ; Hui Bin Lin; リィウ・リィウ; Liu Liu; リィウ・ルゥジエ; Rujie Liu; ミイ・シャオユウ; Xiaoyu Mi; 健太郎村瀬; Kentaro Murase
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-09-10
Filing date: 2021-08-06
Publication date: 2022-03-23
Also published as: CN114170643A

Abstract

To provide an image processing apparatus, an image processing method, and a machine readable storage medium. SOLUTION: An image processing apparatus includes: an information acquisition section that divides an input image into a plurality of areas and acquires information on face action units in the plurality of areas; an action unit characteristic extraction section that extracts action unit characteristics from the areas of the face action units based on the acquired information on the face action units; a face global characteristic extraction section that extracts face global characteristics based on the acquired information on the face action units; and a classification section that classifies the face action units based on the action unit characteristics and the face global characteristics. SELECTED DRAWING: Figure 1

Description

The present invention relates to the technical field of image processing, and more particularly to an image processing apparatus, an image processing method, and a machine-readable storage medium for recognizing facial micro-expressions.

The face always carries a great deal of information about a person's mental and emotional state, so the mental state can be quantified based on the recognition of facial micro-expressions. Such recognition has many uses in real life. For example, labor productivity can be improved by attending to employees' psychology and improving their motivation; customers' purchasing motivation can be improved by estimating their satisfaction (digital marketing); and a driver's driving condition can be estimated.

Therefore, it is very important to develop an apparatus that can effectively recognize facial micro-expressions.

An object of the present invention is to provide an image processing apparatus, an image processing method, and a machine-readable storage medium that can, at least, recognize facial micro-expressions based on detecting the appearance of action units (AUs).

According to one aspect of the present invention, an image processing apparatus is provided, which includes:
an information acquisition unit that divides an input image into a plurality of areas and acquires information about face action units in the plurality of areas;
an action unit feature extraction unit that extracts action unit features from the areas of the face action units based on the acquired information about the face action units;
a face global feature extraction unit that extracts face global features based on the acquired information about the face action units; and
a classification unit that classifies the face action units based on both the action unit features and the face global features.

According to another aspect of the present invention, an image processing method is provided, which includes:
dividing an input image into a plurality of areas and acquiring information about face action units in the plurality of areas;
extracting action unit features from the areas of the face action units based on the acquired information about the face action units;
extracting face global features based on the acquired information about the face action units; and
classifying the face action units based on both the action unit features and the face global features.

According to another aspect of the present invention, a machine-readable storage medium is provided that carries a program product containing machine-readable instruction code. When the instruction code is read and executed by a computer, it can cause the computer to execute the image processing method according to the present invention.

According to the image processing apparatus, the image processing method, and the machine-readable storage medium of the present invention, a micro-expression can be recognized by detecting the appearance of the facial action unit corresponding to each local region of the face.

FIG. 1 is a block diagram of the configuration of an image processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of the configuration of the information acquisition unit in the image processing apparatus according to the embodiment of the present invention.
FIG. 3 is a block diagram of the configuration of the action unit feature extraction unit in the image processing apparatus according to the embodiment of the present invention.
FIG. 4 is a block diagram of the configuration of the face global feature extraction unit in the image processing apparatus according to the embodiment of the present invention.
FIG. 5 is a block diagram of the configuration of an image processing apparatus according to another embodiment of the present invention.
FIG. 6 is a diagram showing how the action unit area determination unit in the image processing apparatus according to the other embodiment of the present invention determines a new area.
FIG. 7 is a flowchart of an image processing method according to an embodiment of the present invention.
FIG. 8 is an exemplary block diagram of the configuration of a general-purpose personal computer that can implement the image processing apparatus and method according to the embodiments of the present invention.

Hereinafter, preferred embodiments for carrying out the present invention will be described in detail with reference to the accompanying drawings. These embodiments are merely examples and do not limit the present invention.

How the image processing apparatus according to an embodiment of the present invention recognizes facial micro-expressions is described below with reference to FIG. 1.

FIG. 1 is a block diagram of the configuration of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 1, the image processing apparatus 100 according to the embodiment may include an information acquisition unit 110, an action unit feature extraction unit 120, a face global feature extraction unit 130, and a classification unit 140. Here, the input image is assumed to contain only one face. The facial micro-expression recognition task is to detect the appearance of every facial action unit the user is interested in. Whether a facial action unit appears indicates whether the facial muscles in the facial region corresponding to that action unit are moving.

First, the information acquisition unit 110 can divide the input image into a plurality of areas and acquire information about the face action units in those areas. The information acquisition unit 110 can then provide the acquired information to the action unit feature extraction unit 120 and the face global feature extraction unit 130.

Next, the action unit feature extraction unit 120 can extract action unit features from the areas of the face action units based on the information about the face action units provided by the information acquisition unit 110, and can provide the extracted action unit features to the classification unit 140.

Likewise, the face global feature extraction unit 130 can extract face global features based on the information about the face action units provided by the information acquisition unit 110, and can provide the extracted face global features to the classification unit 140.

Finally, the classification unit 140 can classify the face action units based on both the action unit features provided by the action unit feature extraction unit 120 and the face global features provided by the face global feature extraction unit 130.

Therefore, the image processing apparatus 100 according to the embodiment of the present invention can provide an effective end-to-end apparatus that recognizes facial micro-expressions by detecting the appearance of the facial action unit corresponding to each local region of the face. Moreover, because the image processing apparatus 100 can use both the action unit features and the face global features, it can obtain better detection results.
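The data flow among units 110 to 140 described above can be sketched as follows. This is only an illustrative outline, not the patented implementation: the function name `run_pipeline` and the four callables are hypothetical stand-ins for the units in FIG. 1, and their internals are deliberately left open.

```python
from typing import Any, Callable


def run_pipeline(
    image: Any,
    acquire_info: Callable[[Any], Any],            # stand-in for unit 110
    extract_au_features: Callable[[Any], Any],     # stand-in for unit 120
    extract_global_features: Callable[[Any], Any], # stand-in for unit 130
    classify: Callable[[Any, Any], Any],           # stand-in for unit 140
) -> Any:
    """Wire the four units together in the order given in the description."""
    # Unit 110: divide the image into areas and acquire action unit information.
    info = acquire_info(image)
    # Units 120 and 130 both consume the same acquired information.
    au_features = extract_au_features(info)
    global_features = extract_global_features(info)
    # Unit 140: classify using both kinds of features.
    return classify(au_features, global_features)
```

The point of the sketch is that the two feature extractors are parallel branches fed by the same information, and the classifier sees both outputs.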

The configurations of the information acquisition unit 110, the action unit feature extraction unit 120, and the face global feature extraction unit 130 in the image processing apparatus 100 according to the embodiment of the present invention are described below with reference to FIGS. 2 to 4.

FIG. 2 is a block diagram of the configuration of the information acquisition unit 110 in the image processing apparatus 100 according to the embodiment of the present invention. In FIG. 2, the information acquisition unit 110 is shown, as an example, as an information acquisition unit 200.

The information acquisition unit 200 has a plurality of convolution layers and is used to obtain information about the face action units in a plurality of regions of an input human face image. Preferably, the information about a face action unit may include information indicating the position of the face action unit in the image. Preferably, the information acquisition unit 200 can acquire the information about the face action units by converting the image into a feature map in a low-dimensional space.

As shown in FIG. 2, the input image is provided to a two-dimensional convolution (Conv) block 210. By performing a convolution operation on the input image, the Conv block 210 can learn features of the input image and abstract it. The image processed by the Conv block 210 is then input to a division block 220, which can divide the input image into n regions.

Each of the n regions is then processed by passing through its own set of a batch normalization (BatchNorm) block 230-1 to 230-n, a parametric rectified linear unit (PReLU) block 240-1 to 240-n, and a Conv block 250-1 to 250-n. Here, i in the BatchNorm block 230-i, the PReLU block 240-i, and the Conv block 250-i shown in FIG. 2 is an integer larger than 1 and smaller than n.
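For readers unfamiliar with the two element-wise operations named above, a minimal sketch of their standard textbook definitions follows. The patent does not give parameter values; `gamma`, `beta`, `eps`, and `slope` below are illustrative assumptions, and the functions operate on plain lists rather than on real feature maps.

```python
import math
from typing import List


def batch_norm(values: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize values to zero mean and unit variance, then scale and shift.

    gamma and beta are learned in a real network; the defaults here are
    illustrative assumptions.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def prelu(x: float, slope: float) -> float:
    """Parametric ReLU: identity for positive inputs, learned slope otherwise."""
    return x if x > 0 else slope * x
```

Unlike a plain ReLU, PReLU keeps a small, learnable response for negative inputs instead of zeroing them.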

The processing results output by the n sets of BatchNorm, PReLU, and Conv blocks are provided to a stitching block 260, which can stitch the results for the n regions together. For example, the stitching block 260 can arrange the n results in the same layout that the division block 220 used when dividing the n regions. The result produced by the stitching block 260 is then provided to a superposition block 270. The superposition block 270 superimposes the image processed by the Conv block 210 and the result produced by the stitching block 260, and passes the superimposed result in turn to a BatchNorm block 280, a PReLU block 290, and a max pooling (Maxpooling) block 295 for further processing.

Finally, the information acquisition unit 200 provides the resulting feature map as its output to the action unit feature extraction unit 120 and the face global feature extraction unit 130 shown in FIG. 1.

For example, the information acquisition unit 200 may be a uniform block learning module. Such a module can divide the input image into, for example, 8 × 8 regions, so that n is 64. Each of the 64 regions is processed by passing through its own set of BatchNorm, PReLU, and Conv blocks. The output feature maps of the 64 sets are then stitched together, the superposition operation described above is performed, and the result is supplied in turn to the BatchNorm block 280, the PReLU block 290, and the Maxpooling block 295. The final output feature map is likewise divided into 8 × 8 units. In this case, the output feature map is also referred to as a uniform feature map.
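The divide, process, stitch, and superimpose sequence of FIG. 2 can be sketched on a single-channel feature map stored as nested lists. This is a rough illustration under stated assumptions: superposition is taken to be element-wise addition (the text only says the two results are superimposed), `process_region` is a hypothetical stand-in for each region's BatchNorm, PReLU, and Conv set, and all function names are invented for this sketch.

```python
from typing import Callable, List

Grid = List[List[float]]


def split_into_grid(fmap: Grid, rows: int, cols: int) -> List[List[Grid]]:
    """Divide a feature map into rows x cols equally sized blocks."""
    height, width = len(fmap), len(fmap[0])
    if height % rows or width % cols:
        raise ValueError("feature map size must be divisible by the grid size")
    bh, bw = height // rows, width // cols
    return [
        [[line[c * bw:(c + 1) * bw] for line in fmap[r * bh:(r + 1) * bh]]
         for c in range(cols)]
        for r in range(rows)
    ]


def stitch_grid(blocks: List[List[Grid]]) -> Grid:
    """Reassemble blocks in the same layout used when dividing."""
    out: Grid = []
    for block_row in blocks:
        for y in range(len(block_row[0])):
            line: List[float] = []
            for block in block_row:
                line.extend(block[y])
            out.append(line)
    return out


def superimpose(a: Grid, b: Grid) -> Grid:
    """Element-wise addition (an assumption about what 'superimpose' means)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def uniform_block(fmap: Grid, rows: int, cols: int,
                  process_region: Callable[[int, int, Grid], Grid]) -> Grid:
    """Divide, process each region on its own, stitch, then superimpose."""
    blocks = split_into_grid(fmap, rows, cols)
    processed = [[process_region(r, c, blocks[r][c]) for c in range(cols)]
                 for r in range(rows)]
    return superimpose(fmap, stitch_grid(processed))
```

With `rows = cols = 8` this yields the 64 regions of the example, and `process_region` receives the grid coordinates so that each region can have its own parameters. The sketch assumes every processed region keeps its original size so that stitching restores the original layout.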

FIG. 3 is a block diagram of the configuration of the action unit feature extraction unit 120 in the image processing apparatus 100 according to the embodiment of the present invention. In FIG. 3, the action unit feature extraction unit 120 is shown, as an example, as an action unit feature extraction unit 300.

The action unit feature extraction unit 300 can extract action unit features from the areas of the face action units based on the information about the face action units provided by the information acquisition unit. The action unit feature extraction unit 300 may consist of several convolution layers that operate only on the action unit areas. Preferably, the action unit feature extraction unit 300 can use the plurality of convolution layers to extract action unit features separately for the area of each face action unit. An action unit feature may be, for example, the final feature map of an AU region of the human face.

For example, as shown in FIG. 3, the action unit feature extraction unit 300 is provided with feature maps of m AU regions of a human face image. More specifically, each of the m AU-region feature maps is input to its own set of a Conv block 310-1 to 310-m, a BatchNorm block 320-1 to 320-m, and a PReLU block 330-1 to 330-m, and the result of that processing is input to the corresponding Maxpooling block 370-1 to 370-m. After processing by the Maxpooling blocks 370-1 to 370-m, the final feature maps of the m AU regions of the human face are obtained. Preferably, m is 12; that is, the input is the initial feature maps of 12 AU regions of the human face, and the output is the final feature maps of those 12 AU regions.

Note that, for each AU region, a plurality of sets of Conv, BatchNorm, and PReLU blocks can be used to process the input AU-region feature map. Refining the action unit features further in this way can help improve the accuracy of the subsequent classification by the classification unit 140. For example, FIG. 3 adds one more set of Conv blocks 340-1 to 340-m, BatchNorm blocks 350-1 to 350-m, and PReLU blocks 360-1 to 360-m.
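The per-region structure of FIG. 3, in which every AU region has its own branch followed by max pooling, can be sketched as below. The 2×2 window with stride 2 is an assumption (the patent does not state the pooling size), and `branches` is a hypothetical list of callables standing in for each region's Conv, BatchNorm, and PReLU sets.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def max_pool_2x2(fmap: Grid) -> Grid:
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    height, width = len(fmap), len(fmap[0])
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, width - 1, 2)]
        for y in range(0, height - 1, 2)
    ]


def extract_au_features(au_maps: Sequence[Grid],
                        branches: Sequence[Callable[[Grid], Grid]]) -> List[Grid]:
    """Run each AU-region map through its own branch, then max-pool it."""
    if len(au_maps) != len(branches):
        raise ValueError("one branch is required per AU region")
    return [max_pool_2x2(branch(fmap)) for fmap, branch in zip(au_maps, branches)]
```

Because no branch is shared, each AU region's features are extracted separately, as the description requires; with m = 12 the lists would each hold 12 entries.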

FIG. 4 is a block diagram of the configuration of the face global feature extraction unit 130 in the image processing apparatus 100 according to the embodiment of the present invention. In FIG. 4, the face global feature extraction unit 130 is shown, as an example, as a face global feature extraction unit 400.

The face global feature extraction unit 400 in FIG. 4 can extract face global features based on the information about the face action units provided by the information acquisition unit. Face global feature learning consists of several convolution layers that span the entire face. Preferably, the face global feature extraction unit 400 can use the plurality of convolution layers to extract the face global features based on the feature map provided by the information acquisition unit.

As shown in FIG. 4, the face global feature extraction unit 400 is provided with a feature map of a human face image, for example, a uniform feature map. Specifically, the feature map is input to a Conv block 410, a BatchNorm block 420, and a PReLU block 430, and the result of that processing is input to a Maxpooling block 470. After processing by the Maxpooling block 470, a global feature map of the human face is obtained.

Note that a plurality of sets of Conv, BatchNorm, and PReLU blocks can be used to process the input feature map. Refining the face global features further in this way can help improve the accuracy of the subsequent classification by the classification unit 140. For example, FIG. 4 adds one more set of a Conv block 440, a BatchNorm block 450, and a PReLU block 460.

Next, the operation of the classification unit 140 in FIG. 1 is described. The classification unit 140 can detect the appearance of action units based on a combination of the action unit features and the face global features. For example, the classification unit 140 may be a fully connected neural network. Alternatively, the classification unit 140 may consist of a small number of linear layers with a softmax output, whose output is the probability of appearance of every action unit. In other words, the classification unit 140 can classify the face action units by analyzing the probability of appearance of all of them. Preferably, the classification unit 140 analyzes the probability of appearance of all face action units using linear layers with a softmax output.
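A linear layer with a softmax output, as named above, can be sketched as follows. Several details are assumptions made for illustration: the two feature sets are combined by simple concatenation of flattened vectors, a single linear layer is used, and one softmax spans all action units (the text does not say whether the softmax is shared or applied per action unit). The weights and bias would be learned in practice; here they are plain arguments.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(features: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One fully connected layer: weights has one row per output."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def classify(au_features: Sequence[float], global_features: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> List[float]:
    """Concatenate both feature vectors (an assumption) and apply softmax."""
    features = list(au_features) + list(global_features)
    return softmax(linear(features, weights, bias))
```

Each row of `weights` must have as many entries as the concatenated feature vector, and the returned list has one probability per row.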

The image processing apparatus 100 according to the embodiment of the present invention is used in two stages: a training stage and an evaluation stage. In the training stage, the network is trained using face images labeled with action unit occurrences. In the evaluation stage, the trained network is used to detect actions in test face images, and its performance is evaluated. For example, in the training stage, face images, feature information about facial feature parts (for example, the eyebrows and the corners of the mouth), linkage annotations for the face action units, and the like are input to the image processing apparatus shown in FIG. 1 or in FIG. 5 (described later), and after processing by the apparatus, predicted probabilities of the different face action units are obtained.

An image processing apparatus 500 according to another embodiment of the present invention is described below with reference to FIGS. 5 and 6.

FIG. 5 is a block diagram of the configuration of the image processing apparatus 500 according to another embodiment of the present invention, and FIG. 6 is a diagram showing how an action unit area determination unit 550 in the image processing apparatus 500 determines a new area.

As shown in FIG. 5, the image processing apparatus 500 according to another embodiment of the present invention may further include an action unit area determination unit 550. The information acquisition unit 510, the action unit feature extraction unit 520, the face global feature extraction unit 530, and the classification unit 540 shown in FIG. 5 correspond to the information acquisition unit 110, the action unit feature extraction unit 120, the face global feature extraction unit 130, and the classification unit 140 shown in FIG. 1, respectively. The descriptions of those units given with reference to FIGS. 1 to 4 therefore apply equally to the information acquisition unit 510, the action unit feature extraction unit 520, the face global feature extraction unit 530, and the classification unit 540 of this embodiment, and a detailed description is omitted here.

Specifically, the information acquisition unit 510 can divide the input image into a plurality of areas and acquire information about the face action units in those areas. The information acquisition unit 510 can then provide the acquired information to the action unit area determination unit 550 and the face global feature extraction unit 530.

The action unit area determination unit 550 can re-determine a new area for each face action unit based on the information about the face action units (for example, a feature map) provided by the information acquisition unit 510. It can then provide the determined new areas and information about them to the action unit feature extraction unit 520.

The action unit feature extraction unit 520 can extract action unit features from the new areas of the face action units based on the new areas and the related information provided by the action unit area determination unit 550, and can provide the extracted action unit features to the classification unit 540.

The face global feature extraction unit 530 can extract face global features based on the information about the face action units provided by the information acquisition unit 510, and can provide the extracted face global features to the classification unit 540.

The classification unit 540 can classify the face action units based on both the action unit features provided by the action unit feature extraction unit 520 and the face global features provided by the face global feature extraction unit 530.

This embodiment therefore differs from the embodiment shown in FIG. 1 in two respects: the information acquisition unit 510 provides the acquired information about the face action units in the plurality of areas to the action unit area determination unit 550 and the face global feature extraction unit 530, and the action unit feature extraction unit 520 extracts the action unit features from the new areas determined by the action unit area determination unit 550.

Preferably, before the action unit feature extraction unit 520 extracts the action unit features, the action unit area determination unit 550 determines a new area for each face action unit such that the new area includes only the face region of the image, and the action unit feature extraction unit 520 then extracts the action unit features based on the determined new areas.

Preferably, the action unit area determination unit 550 determines the new area for each face action unit as follows: it constructs a three-dimensional (3D) face model based on the face action units in the image, and then uses the 3D face model to correct the areas of the face action units so that each corrected area includes only the face region of the image.

For example, the action unit area determination unit 550 first constructs a 3D face model from the original image using a 3DMM (3D morphable model), and then determines a new area for each action unit based on that 3D face model. This means that a specific 3D face model is constructed by using the pose and the landmarks as parameters of an existing 3DMM. There is only one 3DMM, so the pose and landmarks must be supplied in order to build a different 3D model for each particular image.

As shown in FIG. 6, the figure on the left shows the area of a face action unit before correction, which includes parts of the image that do not belong to the face region, such as hair and background. The figure on the right shows the area of the face action unit after correction, which includes only the face region of the image. FIG. 6 shows the action unit area determination unit 550 correcting the area of a face action unit related to the outer corner of the eye. The action unit area determination unit 550 can likewise correct the areas of face action units related to other parts of the face.
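The effect of the correction shown in FIG. 6 can be illustrated with a binary face mask. The sketch below does not build a 3D face model. It assumes that a mask marking face pixels is already available (in the described apparatus it would be derived from the 3D face model, which is outside this sketch) and simply keeps the pixels of a rectangular AU area that lie on the face. The function name and the `(top, left, bottom, right)` region format are hypothetical.

```python
from typing import List, Sequence, Tuple


def restrict_region_to_face(region: Tuple[int, int, int, int],
                            face_mask: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return the (row, column) pixels of a rectangular AU area that lie on the face.

    region is (top, left, bottom, right) with bottom and right exclusive.
    face_mask holds 1 for face pixels and 0 for hair, background, and so on;
    how the mask is derived from the 3D face model is not covered here.
    """
    top, left, bottom, right = region
    return [(y, x)
            for y in range(top, bottom)
            for x in range(left, right)
            if face_mask[y][x]]
```

Pixels belonging to hair or background drop out of the AU area, which corresponds to the difference between the left and right panels of FIG. 6.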

When an image is actually captured, the camera angle varies, so in the captured image of a human face the face may be turned to one side rather than facing forward. In such a case, using the image processing device 500 with the action unit area determination unit 550 to correct the areas of the face action units removes the influence of the shooting angle and provides a more effective estimate of the action unit features.

Therefore, the image processing device 500 according to the embodiment of the present invention can provide an effective end-to-end device that automatically performs facial micro-expression recognition that is robust to changes in pose (face orientation). The image processing device can recognize facial micro-expressions by detecting the occurrence of the face action unit corresponding to each local area of the face, regardless of the pose of the head.

The image processing method according to the embodiment of the present invention is described below with reference to FIG. 7.

As shown in FIG. 7, the image processing method according to the embodiment of the present invention starts at step S110. In step S110, the input image is divided into a plurality of areas, and information about the face action units in the plurality of areas is acquired.

Next, in step S120, action unit features are extracted from the areas of the face action units based on the acquired information about the face action units.

Next, in step S130, face global features are extracted based on the acquired information about the face action units.

Next, in step S140, the face action units are classified based on both the action unit features and the face global features. The process then ends.
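The flow of steps S110 to S140 can be sketched as a chain of four functions. The sketch below is a toy stand-in written under explicit assumptions: the area split, the mean-intensity "features", and the averaging "classifier" are placeholders chosen only to make the data flow visible, and all function and key names are hypothetical rather than taken from the embodiment.

```python
from typing import Dict, List

Image = List[List[float]]


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def acquire_info(image: Image) -> Dict[str, Image]:
    """S110: divide the image into areas, one per (hypothetical) action unit."""
    half = len(image) // 2
    return {"AU_upper": image[:half], "AU_lower": image[half:]}


def extract_au_features(areas: Dict[str, Image]) -> Dict[str, float]:
    """S120: one feature per action unit area (placeholder: mean intensity)."""
    return {
        name: mean([v for row in area for v in row])
        for name, area in areas.items()
    }


def extract_global_feature(areas: Dict[str, Image]) -> float:
    """S130: one feature for the whole face (placeholder: mean over all areas)."""
    return mean([v for area in areas.values() for row in area for v in row])


def classify(au_features: Dict[str, float], global_feature: float) -> Dict[str, float]:
    """S140: score each action unit from both its own and the global feature."""
    return {name: (f + global_feature) / 2 for name, f in au_features.items()}


def run_pipeline(image: Image) -> Dict[str, float]:
    areas = acquire_info(image)
    return classify(extract_au_features(areas), extract_global_feature(areas))


scores = run_pipeline([[0.0, 0.0], [1.0, 1.0]])
print(scores)
```

The point of the structure is that S120 and S130 both consume the output of S110, and S140 consumes both of their outputs, so neither the local nor the global feature is used alone.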

Therefore, according to the image processing method of the embodiment of the present invention, facial micro-expression recognition can be performed by detecting the occurrence of the face action unit corresponding to each local area of the face.

In an embodiment of the present invention, the method further includes, before extracting the action unit features, a step of determining a new area for each face action unit, the new area including only the face area in the image. The action unit features are then extracted based on the determined new areas.

In an embodiment of the present invention, the step of determining a new area for each face action unit includes constructing a three-dimensional (3D) face model based on the face action units in the image, and using the 3D face model to correct the areas of the face action units so that each corrected area includes only the face area in the image.

Therefore, according to the image processing method of the embodiment of the present invention, facial micro-expression recognition that is robust to changes in pose (face orientation) can be performed automatically, regardless of the pose of the head.

In an embodiment of the present invention, the information about a face action unit includes information indicating the position of the face action unit in the image.

In an embodiment of the present invention, acquiring the information about the face action units includes converting the image into a feature map in a low-dimensional space.

In an embodiment of the present invention, extracting the action unit features includes extracting the action unit features separately for the area of each face action unit using a plurality of convolution layers.

In an embodiment of the present invention, extracting the face global features includes extracting the face global features from the feature map using a plurality of convolution layers.

In an embodiment of the present invention, the plurality of convolution layers include at least a batch normalization unit, a parametric rectified linear unit (PReLU), a two-dimensional convolution unit, and a max pooling unit.
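The four units named above can each be written in a few lines. The sketch below is illustrative only: the text does not state the order of the units, the kernel sizes, or any parameter values, so the ordering in `conv_block` (convolution, batch normalization, PReLU, max pooling), the single-channel input, and the use of fixed inference-time statistics `mean` and `var` are all assumptions, and every function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[i][j] * x[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(len(x[0]) - kw + 1)
        ]
        for r in range(len(x) - kh + 1)
    ]


def batch_norm(x: Matrix, mean: float, var: float,
               gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> Matrix:
    """Inference-mode batch normalization with given running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in row] for row in x]


def prelu(x: Matrix, slope: float = 0.25) -> Matrix:
    """PReLU: identity for non-negative inputs, learned slope for negative ones."""
    return [[v if v >= 0 else slope * v for v in row] for row in x]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    rows = range(0, len(x) - size + 1, size)
    cols = range(0, len(x[0]) - size + 1, size)
    return [
        [max(x[r + i][c + j] for i in range(size) for j in range(size)) for c in cols]
        for r in rows
    ]


def conv_block(x: Matrix, kernel: Matrix, mean: float, var: float) -> Matrix:
    """One assumed ordering of the four units."""
    return max_pool2d(prelu(batch_norm(conv2d(x, kernel), mean, var)))


# A 5x5 input and a 2x2 kernel give a 4x4 map, which pooling reduces to 2x2.
feature = conv_block(
    [[float(r * 5 + c) for c in range(5)] for r in range(5)],
    kernel=[[1.0, 0.0], [0.0, -1.0]],
    mean=0.0,
    var=1.0,
)
print(len(feature), len(feature[0]))
```

In practice a deep learning framework would supply these units; writing them out by hand is only meant to show what each one does to a feature map.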

In an embodiment of the present invention, classifying the face action units includes analyzing the probability of occurrence of all the face action units.

In an embodiment of the present invention, a linear layer with a softmax output is used to analyze the probability of occurrence of all the face action units.
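A linear layer followed by softmax can be sketched as below. The text does not say how the softmax is arranged, so this sketch makes an assumption: each face action unit gets its own two-way softmax over the classes "absent" and "present", applied to the concatenation of that unit's feature vector and the face global feature vector. The weights, the feature sizes, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def linear(features: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def au_presence_probability(au_feature: Vector, global_feature: Vector,
                            weights: List[Vector], bias: Vector) -> float:
    """Probability that one action unit occurs, from local and global features."""
    fused = au_feature + global_feature  # list concatenation, not addition
    absent, present = softmax(linear(fused, weights, bias))
    return present


# With all-zero weights the layer has no evidence either way.
p = au_presence_probability([1.0], [2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
print(p)
```

A two-way softmax per unit lets several action units be active in the same image; a single softmax across all units would instead force them to compete, which is why the arrangement is stated here as an assumption rather than read off the text.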

The various specific ways of implementing the above steps of the image processing method according to the embodiment of the present invention have already been described in detail, so their detailed description is omitted here.

Obviously, each operation of the image processing method of the present invention can be implemented by a computer-executable program stored in various machine-readable storage media.

The object of the present invention may also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or apparatus, and a computer or central processing unit (CPU) in the system or apparatus reads out and executes the program code. In this case, as long as the system or apparatus is capable of executing a program, the implementation of the present invention is not limited to a program, and the program may take any form, such as an object-oriented program, a program executed by an interpreter, or a script program supplied to an operating system.

These machine-readable storage media may include, but are not limited to, various memories and storage units, semiconductor devices, disk units such as optical, magnetic, and magneto-optical disks, and other media suitable for storing information.

The computer can also implement the technical solution of the present invention by connecting to a corresponding website on the Internet, downloading and installing the computer program code according to the present invention, and then executing the program.

FIG. 8 is a structural diagram of a hardware configuration (general-purpose machine) 1300 capable of implementing the information processing method and apparatus according to the embodiment of the present invention.

The general-purpose machine 1300 may be, for example, a computer system. The general-purpose machine 1300 is merely an example and does not limit the scope of application or the functions of the method and apparatus according to the present invention. Nor does the general-purpose machine 1300 depend on any module or component of the above method and apparatus, or on any combination thereof.

In FIG. 8, the central processing unit (CPU) 1301 performs various processes in accordance with a program stored in the ROM 1302 or a program loaded from the storage unit 1308 into the RAM 1303. The RAM 1303 also stores, as needed, data required when the CPU 1301 performs the various processes. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another via the bus 1304. The input/output interface 1305 is also connected to the bus 1304.

The following components are also connected to the input/output interface 1305: an input unit 1306 including a keyboard and the like; an output unit 1307 including a display such as a liquid crystal display (LCD), a speaker, and the like; a storage unit 1308 including a hard disk and the like; and a communication unit 1309 including a network interface card such as a LAN card, a modem, and the like. The communication unit 1309 performs communication processing via a network such as the Internet or a LAN. A drive 1310 may also be connected to the input/output interface 1305 as needed. A removable medium 1311, such as a semiconductor memory, is mounted in the drive 1310 as needed, so that a computer program read from it can be installed in the storage unit 1308.

The present invention further provides a program product containing machine-readable instruction code. When read and executed by a machine, the instruction code can carry out the method according to the embodiments of the present invention described above. Accordingly, the various storage media that carry such a program product, such as magnetic disks (including floppy disks (registered trademark)), optical disks (including CD-ROMs and DVDs), magneto-optical disks (including MD (registered trademark)), and semiconductor memories, are also included in the present invention.

The above storage media may include, but are not limited to, magnetic disks, optical disks, magneto-optical disks, semiconductor memories, and the like.

Each operation (process) in the above method can also be implemented in the form of a computer-executable program stored in various machine-readable storage media.

With regard to the above embodiments, the following appendices are further disclosed.

(Appendix 1)
An image processing apparatus comprising:
an information acquisition unit that divides an input image into a plurality of areas and acquires information about face action units in the plurality of areas;
an action unit feature extraction unit that extracts action unit features from the areas of the face action units based on the acquired information about the face action units;
a face global feature extraction unit that extracts face global features based on the acquired information about the face action units; and
a classification unit that classifies the face action units based on both the action unit features and the face global features.

(Appendix 2)
The image processing apparatus according to Appendix 1, further comprising
an action unit area determination unit that determines a new area for each face action unit before the action unit feature extraction unit extracts the action unit features, wherein
the new area includes only the face area in the image, and
the action unit feature extraction unit extracts the action unit features based on the determined new areas.

(Appendix 3)
The image processing apparatus according to Appendix 2, wherein
the action unit area determination unit determines a new area for each face action unit by constructing a three-dimensional (3D) face model based on the face action units in the image and using the 3D face model to correct the areas of the face action units so that each corrected area includes only the face area in the image.

(Appendix 4)
The image processing apparatus according to Appendix 1, wherein
the information about a face action unit includes information indicating the position of the face action unit in the image.

(Appendix 5)
The image processing apparatus according to Appendix 1, wherein
the information acquisition unit obtains the information about the face action units by converting the image into a feature map in a low-dimensional space.

(Appendix 6)
The image processing apparatus according to Appendix 1, wherein
the action unit feature extraction unit extracts the action unit features separately for the area of each face action unit using a plurality of convolution layers.

(Appendix 7)
The image processing apparatus according to Appendix 5, wherein
the face global feature extraction unit extracts the face global features from the feature map using a plurality of convolution layers.

(Appendix 8)
The image processing apparatus according to Appendix 1, wherein
the classification unit classifies the face action units by analyzing the probability of occurrence of all the face action units.

(Appendix 9)
The image processing apparatus according to Appendix 1, wherein
the classification unit analyzes the probability of occurrence of all the face action units using a linear layer with a softmax output.

(Appendix 10)
An image processing method comprising:
dividing an input image into a plurality of areas and acquiring information about face action units in the plurality of areas;
extracting action unit features from the areas of the face action units based on the acquired information about the face action units;
extracting face global features based on the acquired information about the face action units; and
classifying the face action units based on both the action unit features and the face global features.

(Appendix 11)
The method according to Appendix 10, further comprising,
before extracting the action unit features,
determining a new area for each face action unit, wherein
the new area includes only the face area in the image, and
the action unit features are extracted based on the determined new areas.

(Appendix 12)
The method according to Appendix 11, wherein
determining a new area for each face action unit includes
constructing a three-dimensional (3D) face model based on the face action units in the image, and using the 3D face model to correct the areas of the face action units so that each corrected area includes only the face area in the image.

(Appendix 13)
The method according to Appendix 10, wherein
the information about a face action unit includes information indicating the position of the face action unit in the image.

(Appendix 14)
The method according to Appendix 10, wherein
acquiring the information about the face action units includes converting the image into a feature map in a low-dimensional space.

(Appendix 15)
The method according to Appendix 10, wherein
extracting the action unit features includes extracting the action unit features separately for the area of each face action unit using a plurality of convolution layers.

(Appendix 16)
The method according to Appendix 14, wherein
extracting the face global features includes extracting the face global features from the feature map using a plurality of convolution layers.

(Appendix 17)
The method according to Appendix 15 or 16, wherein
the plurality of convolution layers include at least a batch normalization unit, a parametric rectified linear unit (PReLU), a two-dimensional convolution unit, and a max pooling unit.

(Appendix 18)
The method according to Appendix 10, wherein
classifying the face action units includes analyzing the probability of occurrence of all the face action units.

(Appendix 19)
The method according to Appendix 18, wherein
the probability of occurrence of all the face action units is analyzed using a linear layer with a softmax output.

(Appendix 20)
A machine-readable storage medium
storing a program product that contains machine-readable instruction code, wherein the instruction code, when read and executed by a computer, causes the computer to execute the image processing method according to any one of Appendices 10 to 19.

The preferred embodiment of the present invention has been described above, but the present invention is not limited to this embodiment, and any modification that does not depart from the gist of the present invention falls within the technical scope of the present invention.

Claims

An image processing apparatus comprising:
an information acquisition unit that divides an input image into a plurality of areas and acquires information about face action units in the plurality of areas;
an action unit feature extraction unit that extracts action unit features from the areas of the face action units based on the acquired information about the face action units;
a face global feature extraction unit that extracts face global features based on the acquired information about the face action units; and
a classification unit that classifies the face action units based on both the action unit features and the face global features.

The image processing apparatus according to claim 1, further comprising
an action unit area determination unit that determines a new area for each face action unit before the action unit feature extraction unit extracts the action unit features, wherein
the new area includes only the face area in the image, and
the action unit feature extraction unit extracts the action unit features based on the determined new areas.

The image processing apparatus according to claim 2, wherein
the action unit area determination unit determines a new area for each face action unit by constructing a three-dimensional (3D) face model based on the face action units in the image and using the 3D face model to correct the areas of the face action units so that each corrected area includes only the face area in the image.

The image processing apparatus according to claim 1, wherein
the information about a face action unit includes information indicating the position of the face action unit in the image.

The image processing apparatus according to claim 1, wherein
the information acquisition unit obtains the information about the face action units by converting the image into a feature map in a low-dimensional space.

The image processing apparatus according to claim 1, wherein
the action unit feature extraction unit extracts the action unit features separately for the area of each face action unit using a plurality of convolution layers.

The image processing apparatus according to claim 5, wherein
the face global feature extraction unit extracts the face global features from the feature map using a plurality of convolution layers.

The image processing apparatus according to claim 1, wherein
the classification unit classifies the face action units by analyzing the probability of occurrence of all the face action units.

An image processing method comprising:
dividing an input image into a plurality of areas and acquiring information about face action units in the plurality of areas;
extracting action unit features from the areas of the face action units based on the acquired information about the face action units;
extracting face global features based on the acquired information about the face action units; and
classifying the face action units based on both the action unit features and the face global features.

A program for causing a computer to execute the image processing method according to claim 9.