JP2010055395A

JP2010055395A - Image processing apparatus and method, program, and storage medium

Info

Publication number: JP2010055395A
Application number: JP2008219984A
Authority: JP
Inventors: Katsuhiko Mori (森克彦); Yuji Kaneda (金田雄司)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-08-28
Filing date: 2008-08-28
Publication date: 2010-03-11

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of segmenting a face region in an image and the accuracy of detecting the position of a feature part. SOLUTION: An image processing apparatus includes: a detection parameter setting part 105 for respectively setting parameters to be used to detect face feature parts in a setting mode for setting reference data about a face and in a determination mode for determining the face by using the reference data; a face feature part position detecting part 104 for detecting the positions of face feature parts by using the parameters set in the respective modes; a feature value calculating part 110 for calculating feature values on the basis of the positions of the feature parts detected in the respective modes; a reference data setting part 113 for setting the reference data on the basis of the feature value calculated in the setting mode; and a facial expression determining part 112 for determining the face by using the feature value calculated in the determination mode and the reference data. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a technique suited to making determinations about a face in an input image, in particular to determining its facial expression.

Conventionally, in the fields of image recognition and speech recognition, it has been known to detect a recognition target in an image that contains both the target and a background by executing a recognition algorithm specialized for that target, either in software or in hardware using a dedicated processor.

In particular, many techniques for detecting a face as a specific recognition target have been disclosed, as introduced, for example, in the survey of Non-Patent Document 1.

In recent years, the technique introduced in Non-Patent Document 2 has attracted attention. Further, as shown in Non-Patent Document 3, techniques have been developed that detect not only frontal faces but also faces turned to the side.

Non-Patent Documents 4 and 5 disclose techniques for detecting facial feature parts, such as the eyes and mouth, in face images.

Techniques for recognizing facial expressions in images include those introduced in Non-Patent Documents 6 and 7.

Techniques have also been developed that recognize a facial expression by detecting changes corresponding to the Action Units of FACS (Facial Action Coding System), a known method for objectively describing facial movements.

Patent Document 1 discloses a facial expression recognition technique.

Patent Document 1: JP 2005-56388 A
Non-Patent Document 1: Iwai et al., "Face detection and face recognition by image processing," IPSJ SIG Technical Report 2005-CVIM-149, pp. 343-368, May 2005.
Non-Patent Document 2: P. Viola, M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Conference on Computer Vision and Pattern Recognition, 2001.
Non-Patent Document 3: C. Huang, H. Ai, Y. Li, S. Lao (Tsinghua Univ., China; Omron Corp., Japan), "Learning Sparse Features in Granular Space for Multi-View Face Detection," 7th Int'l Conf. Automatic Face & Gesture Recognition, pp. 401-406.
Non-Patent Document 4: G. M. Beumer, Q. Tao, A. M. Bazen, R. N. J. Veldhuis, "A landmark paper in face recognition," 7th Int'l Conf. Automatic Face & Gesture Recognition.
Non-Patent Document 5: T. F. Cootes, G. J. Edwards, C. J. Taylor, "Active appearance models," Proc. of the 5th European Conference on Computer Vision (Ed. by H. B. Neumann), Vol. 2, Springer, pp. 484-498 (1998).
Non-Patent Document 6: G. Donato, T. J. Sejnowski, et al., "Classifying Facial Actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 10, Oct. 1999.
Non-Patent Document 7: Y. Tian, T. Kanade, J. F. Cohn, "Recognizing Action Units for Facial Expression Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, Feb. 2001.

As with Non-Patent Documents 2 and 3, face detection accuracy, in the sense of whether a face in the image is successfully detected, has improved. However, little work has addressed the positional accuracy of the detection result, that is, whether the detected face coordinates always fall at the same location within the face region (for example, the center of the face).

Detection of facial feature parts likewise involves a certain amount of error in the detected positions.

Among techniques for recognizing facial expressions in images, the method of Non-Patent Document 6 presupposes that partial regions of the face are accurately cropped from frame images by visual inspection. In the method of Non-Patent Document 7, coarse positioning of the face pattern is automated, but positioning of the feature parts still requires fine manual adjustment by visual inspection.

As described above, the accuracy of cropping the face region from an input image and the accuracy of detecting the positions of the feature parts used for expression recognition both affect the accuracy of facial expression recognition.

An object of the present invention is to provide an image processing apparatus and method that are less susceptible to the accuracy of cropping a face region in an image and to the accuracy of detecting feature part positions.

To achieve the above object, an image processing apparatus according to the present invention comprises: parameter setting means for setting, according to each mode, the parameters used to detect facial feature parts in a setting mode, in which reference data relating to a face is set, and in a determination mode, in which a determination is made on a face using the reference data; detection means for detecting, in each of the setting mode and the determination mode, the positions of the facial feature parts of a face present in the image input in that mode, using the parameters set for that mode; calculation means for calculating, in each of the setting mode and the determination mode, a feature amount based on the feature part positions detected in that mode; reference data setting means for setting, in the setting mode, reference data based on the feature amount calculated in the setting mode; and determination means for making, in the determination mode, a determination on the face present in the image input in the determination mode, using the feature amount calculated in the determination mode and the reference data set in the setting mode.

According to another aspect of the present invention, an image processing method comprises the steps of: setting, in a setting mode for setting reference data relating to a face, a first parameter used to detect facial feature parts; detecting, in the setting mode, the positions of the feature parts of a face present in the image input in the setting mode, using the first parameter; calculating, in the setting mode, a feature amount based on the feature part positions detected in the setting mode; setting, in the setting mode, reference data based on the feature amount calculated in the setting mode; setting, in a determination mode in which a determination is made on a face using the reference data, a second parameter, independent of the first parameter, as the parameter used to detect facial feature parts; detecting, in the determination mode, the positions of the feature parts of a face present in the image input in the determination mode, using the second parameter; calculating, in the determination mode, a feature amount based on the feature part positions detected in the determination mode; and making, in the determination mode, a determination on the face present in the image input in the determination mode, using the feature amount calculated in the determination mode and the reference data set in the setting mode.

According to the present invention, setting the parameters used to detect facial feature parts (for example, the detection area) separately for each operation mode makes it possible to improve the detection accuracy of the facial feature parts.

Further, in an aspect in which the facial feature part positions detected in the setting mode are used as initial values when correcting the positions detected in the determination mode, facial expression recognition that is less susceptible to the accuracy of region cropping and of facial feature part detection can be realized.

The present invention is described in detail below according to preferred embodiments, with reference to the accompanying drawings.

This embodiment describes facial expression recognition processing that determines whether the facial expression in an input image is a predetermined expression. A smile is used as the example of the predetermined expression.

FIG. 1 shows the functional configuration of the image processing apparatus according to this embodiment. The apparatus includes a control unit 100, a face detection unit 101, a target face selection unit 102, a normalized face image creation unit 103, a facial feature part position detection unit 104, a detection parameter setting unit 105, a detection parameter holding unit 106, an operation mode holding unit 107, an operation mode setting unit 108, a feature amount evaluation unit 109, a feature amount calculation unit 110, an expression degree calculation unit 111, a facial expression determination unit 112, a reference data setting unit 113, a reference data holding unit 114, a facial feature part position correction unit 115, a switching unit 116, and a correction result holding unit 117. Each unit is described below.

The control unit 100 performs processing to control the image processing apparatus as a whole. It is connected to the face detection unit 101, the target face selection unit 102, the normalized face image creation unit 103, the facial feature part position detection unit 104, the detection parameter setting unit 105, the detection parameter holding unit 106, the operation mode holding unit 107, and the operation mode setting unit 108, and also to the feature amount evaluation unit 109, the feature amount calculation unit 110, the expression degree calculation unit 111, the facial expression determination unit 112, the reference data setting unit 113, the reference data holding unit 114, the facial feature part position correction unit 115, the switching unit 116, and the correction result holding unit 117. The control unit 100 controls each of these units so that it operates at the appropriate timing.

The face detection unit 101 detects face areas (regions of face images contained in the input image). Specifically, it obtains the number of face areas in the input image and the coordinate position and size of each face area.

The target face selection unit 102 selects, from the face area information obtained by the face detection unit 101, the face on which facial expression recognition is to be performed. The selection method is not limited; for example, the face may be chosen based on its size or its position in the input image.

The normalized face image creation unit 103 uses the face area information of the face selected by the target face selection unit 102 to create a normalized face image in which the size and rotation (orientation) of the face are normalized. The creation method is described later.

The facial feature part position detection unit 104 detects the facial feature parts used for facial expression recognition in the normalized face image created by the normalized face image creation unit 103. The detection method is described later.

The detection parameter setting unit 105 sets the parameters, for example the detection area, that the facial feature part position detection unit 104 uses to detect facial feature parts. The setting is made by selecting, according to the operation mode held in the operation mode holding unit 107 (described later), from among the parameters held in the detection parameter holding unit 106. The detection parameter holding unit 106 holds the detection parameters set for each operation mode.

The operation mode holding unit 107 holds information on the operation mode set by the operation mode setting unit 108. When the feature amount evaluation unit 109 (described later) produces a predetermined evaluation result, the operation mode setting unit 108 sets a new operation mode in the operation mode holding unit 107 so as to change the operation mode.

The feature amount evaluation unit 109 evaluates the feature amount calculated by the feature amount calculation unit 110 (described later). The evaluation method is described later. When the evaluation result is a predetermined result, the feature amount evaluation unit 109, via the control unit 100, causes the reference data setting unit 113 (described later) to set that feature amount as reference data and causes the operation mode setting unit 108 to set a predetermined operation mode.

The feature amount calculation unit 110 calculates a feature amount from the facial feature part positions detected by the facial feature part position detection unit 104, or from the corrected positions produced by the facial feature part position correction unit 115 (described later). The calculation method is described later.

The expression degree calculation unit 111 calculates an expression degree based on the feature amount calculated by the feature amount calculation unit 110. The calculation method is described later.

The facial expression determination unit 112 evaluates the expression degree calculated by the expression degree calculation unit 111 and determines the expression of the face, in the input image, that was selected by the target face selection unit 102. The determination method is described later.

When the feature amount evaluation unit 109 produces a predetermined evaluation result, the reference data setting unit 113 sets the feature amount calculated by the feature amount calculation unit 110 in the reference data holding unit 114 as reference data.

The facial feature part position correction unit 115 corrects the facial feature part positions detected by the facial feature part position detection unit 104. The correction method is described later.

Based on the operation mode information held in the operation mode holding unit 107, the switching unit 116 switches the output destination of the facial feature part positions detected by the facial feature part position detection unit 104 between the feature amount calculation unit 110 and the facial feature part position correction unit 115.

The correction result holding unit 117 holds the corrected positions and correction amounts of the facial feature parts obtained by the facial feature part position correction unit 115.

Next, the operation modes are described. This embodiment has two operation modes: a reference data search mode and a facial expression determination mode. An outline of the processing in each mode is given here; details are given later.

The reference data search mode is a setting mode in which the reference data used by the expression determination processing of the facial expression determination mode is searched for and set. More specifically, when the feature amount calculated by the feature amount calculation unit 110 satisfies a predetermined condition, that feature amount, or a value obtained from several past feature amounts including it, is used as the reference data.

The facial expression determination mode is a determination mode in which an expression degree, indicating the degree of the expression, is calculated from the change between the reference data set in the reference data search mode and the feature amount obtained from the current image to be processed, and the expression is determined based on that expression degree.

FIG. 2 is a flowchart of the operation mode setting procedure. First, in step S201, the reference data search mode is set as the initial operation mode at the start of processing. In step S202, the feature amount evaluation unit 109 evaluates whether the feature amount satisfies a predetermined condition. If it does, the operation mode is changed to the facial expression determination mode in step S203, and facial expression determination is performed.
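The S201 to S203 mode flow can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the patent's implementation: the evaluation criterion of unit 109 is described later in the patent, so here it is a hypothetical `condition` predicate supplied by the caller, and the reference data is simply the single feature amount that first satisfies it (the patent also allows a value derived from several past feature amounts). The names `Mode` and `ModeController` are hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    REFERENCE_DATA_SEARCH = auto()     # setting mode
    EXPRESSION_DETERMINATION = auto()  # determination mode


class ModeController:
    """Sketch of the S201-S203 flow; names and criterion are hypothetical."""

    def __init__(self, condition):
        # condition: hypothetical predicate standing in for the check made by
        # the feature amount evaluation unit 109 (its criterion is not given here).
        self.condition = condition
        self.mode = Mode.REFERENCE_DATA_SEARCH  # S201: initial operation mode
        self.reference = None                   # reference data (cf. unit 114)

    def process(self, feature):
        """Handle one feature amount and return the operation mode afterwards."""
        if self.mode is Mode.REFERENCE_DATA_SEARCH and self.condition(feature):  # S202
            self.reference = feature                   # cf. unit 113 storing reference data
            self.mode = Mode.EXPRESSION_DETERMINATION  # S203: switch modes
        return self.mode
```

Once the mode has switched, later feature amounts no longer overwrite the reference data; they would instead be compared against it in the determination mode.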

Next, the main processing for determining the facial expression in the input image, carried out through the operations of the above units, is described: the processing in the reference data search mode with reference to FIG. 3, and the processing in the facial expression determination mode with reference to FIG. 4.

FIG. 3 is a flowchart of the processing procedure in the reference data search mode. In step S301, an image to be processed is input. The image may be, for example, a single still image extracted from a moving image file or a frame captured by an imaging system (not shown); its source is not particularly limited.

In step S302, the face detection unit 101 detects faces in the input image. As noted in the background art, various face detection methods have been proposed; for example, the method using rectangular filters introduced in Non-Patent Document 2 can be used.

In step S303, the target face selection unit 102 selects, from the faces detected in step S302, the face on which facial expression recognition is to be performed. If only one face is detected, that face is selected. If several faces are detected, then, for example, the face closest to the center of the input image among faces of at least a predetermined size is selected.
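The example selection rule of step S303 is concrete enough to sketch. The `Face` record, the use of a single `size` value per face, and the choice to return `None` when no face reaches the minimum size are assumptions made for illustration; the patent leaves these details open.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    cx: float    # face center x in the input image (pixels)
    cy: float    # face center y in the input image (pixels)
    size: float  # face size, e.g. width of the detected region (pixels)


def select_target_face(faces, image_w, image_h, min_size):
    """Pick the face to recognize, following the example rule of step S303."""
    if not faces:
        return None
    if len(faces) == 1:
        return faces[0]  # a single detected face is always selected
    candidates = [f for f in faces if f.size >= min_size]
    if not candidates:
        return None      # assumption: no face is large enough
    icx, icy = image_w / 2, image_h / 2
    # Among sufficiently large faces, choose the one nearest the image center.
    return min(candidates, key=lambda f: math.hypot(f.cx - icx, f.cy - icy))
```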

In step S304, the normalized face image creation unit 103 creates a normalized face image from the image of the face area selected in step S303. FIG. 5 schematically illustrates this process.

FIG. 5A shows the face area in the input image before normalization, and FIG. 5B shows the face area after normalization. In FIG. 5A, 501 denotes the face area, 502 the center coordinates of the detected face, 503 the right eye coordinates of the detected face, and 504 the left eye coordinates of the detected face. In FIG. 5B, 511 denotes the normalized face area. As the figure shows, the face area 501 in the input image is rotated in-plane, and its size also differs from that of the normalized face area 511 shown in FIG. 5B.

In FIG. 5B, 512 denotes the coordinates, on the normalized image, of the position 502 used as the center of the face area when creating the normalized face image, and 513 and 514 denote the coordinates, on the normalized image, of the right and left eye positions 503 and 504 used to determine the size of the face area.

The normalization corrects the in-plane rotation, about the face center coordinates 502, so that the straight line connecting the right eye coordinates 503 and the left eye coordinates 504 becomes horizontal, and scales the face so that the segment connecting them has a predetermined length. This normalization is performed by a geometric transformation known as an affine transformation.
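The rotation and scaling described above form a similarity transform, a special case of the affine transformation. The sketch below computes a 2x3 matrix that maps the face center 502 to a chosen point in the normalized image (cf. 512), makes the eye line horizontal, and scales the eye distance to a predetermined length. The output center and eye distance are hypothetical parameters, and it is assumed that the subject's right eye appears on the left side of the image. A matrix in this form could, for instance, be passed to OpenCV's `cv2.warpAffine` to resample the image.

```python
import math


def normalization_matrix(center, right_eye, left_eye, out_center, eye_dist):
    """Return a 2x3 matrix [[a, b, tx], [c, d, ty]] mapping input-image
    coordinates to normalized-face coordinates (rotation + scale + shift)."""
    dx = left_eye[0] - right_eye[0]   # assumes right eye is on the image's left
    dy = left_eye[1] - right_eye[1]
    theta = math.atan2(dy, dx)        # in-plane tilt of the eye line
    s = eye_dist / math.hypot(dx, dy) # scale to the predetermined eye distance
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a, b = s * cos_t, s * sin_t       # s * R(-theta), first row
    c, d = -s * sin_t, s * cos_t      # s * R(-theta), second row
    # Translate so that the face center lands on out_center.
    tx = out_center[0] - (a * center[0] + b * center[1])
    ty = out_center[1] - (c * center[0] + d * center[1])
    return [[a, b, tx], [c, d, ty]]


def apply(m, p):
    """Map a point (x, y) with a 2x3 affine matrix."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Because the eye vector is rotated by minus its own angle, both eyes end up on the same row, exactly `eye_dist` apart.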

The right eye coordinates 503 and left eye coordinates 504 used in this normalization can be detected, for example, with the techniques of Non-Patent Documents 4 and 5 described above. Luminance values may also be normalized in this process, for example by histogram equalization, which is explained in Non-Patent Document 7.
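For reference, standard histogram equalization of 8-bit gray values can be sketched as follows. This is the textbook CDF-remapping form, shown on a flat list of pixel values for simplicity; the patent does not specify which variant it uses.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram equalization for a flat list of integer gray values."""
    if not pixels:
        return []
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # only one gray level present: nothing to spread
        return list(pixels)
    # Map each level so that the occupied range stretches over 0..levels-1.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[v] for v in pixels]
```

For example, `equalize_histogram([50, 50, 51, 52])` spreads the three closely spaced gray levels over the full range, giving `[0, 0, 128, 255]`.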

In step S305, it is determined whether the detection parameter used by the facial feature part position detection unit 104 (for example, the detection area) has been set. If it has not, the process proceeds to step S306; if it has, the process proceeds to step S307.

In step S306, the detection parameter setting unit 105 sets the detection parameter used by the facial feature part position detection unit 104. It selects, from the detection parameters held in the detection parameter holding unit 106, the parameter corresponding to the current operation mode held in the operation mode holding unit 107, and sets it.

In this embodiment, information specifying a detection area is used as the detection parameter. The detection area is specified by a mask, hereinafter called a detection area mask.

FIG. 6 shows examples of detection area masks used to detect the end points of the mouth as facial feature parts. In FIG. 6A, 601 denotes the detection area mask used in the reference data search mode; in FIG. 6B, 602 denotes the detection area mask used in the facial expression determination mode. In FIG. 6, 603 and 604 denote detection areas, and 605 and 606 denote non-detection areas.

These detection area masks are placed, for each facial feature part to be detected, at predetermined positions in the normalized image shown in FIG. 5B. In the facial feature part position detection processing described later, feature parts are detected only within the detection areas 603 and 604 of the detection area masks 601 and 602.
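The effect of a detection area mask can be illustrated by restricting a search to the mask's detection area. The detector itself is described later in the patent, so `response` here is a hypothetical per-pixel score map over the normalized image; the position of a feature part is taken, as an assumption, to be the highest-scoring pixel inside the detection area. Cells marked `False` play the role of the non-detection areas 605 and 606.

```python
def detect_in_mask(response, mask, origin):
    """Return the (x, y) position in the normalized image with the highest
    score, considering only cells where the mask is True.

    response: 2D list indexed [y][x] (hypothetical detector score map)
    mask:     2D list of bools; True = detection area, False = non-detection
    origin:   (x0, y0) of the mask's top-left corner in the normalized image
    """
    x0, y0 = origin
    best, best_pos = float("-inf"), None
    for my, row in enumerate(mask):
        for mx, inside in enumerate(row):
            if not inside:
                continue  # skip non-detection area
            x, y = x0 + mx, y0 + my
            if response[y][x] > best:
                best, best_pos = response[y][x], (x, y)
    return best_pos
```

A strong response that falls outside the detection area, such as one caused by a neighboring structure, is simply ignored.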

FIG. 8 shows an example of the regions in the normalized image where detection area masks are placed: 801 is the region for the right eye outer-corner detection mask, 802 the region for the left eye outer-corner detection mask, 803 the region for the right mouth end point detection mask, and 804 the region for the left mouth end point detection mask.

Taking the mouth end point detection masks of FIG. 6 as an example, the detection areas shown in FIG. 6 are set independently for each operation mode in the regions 803 and 804 for the right and left mouth end point detection masks. That is, in the reference data search mode, the detection area mask 601 is selected and its detection area 603 is set as the first parameter; in the facial expression determination mode, the detection area mask 602 is selected and its detection area 604 is set as the second parameter.
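Because each operation mode has its own entry, the first and second parameters can be held independently and looked up by mode, as unit 105 does with unit 106. The toy 3x3 masks, the feature-part key `"right_mouth_end"`, and the table layout below are hypothetical; real masks would be built from data as described for detection areas 603 and 604.

```python
from enum import Enum


class Mode(Enum):
    REFERENCE_DATA_SEARCH = "reference_data_search"
    EXPRESSION_DETERMINATION = "expression_determination"


# Hypothetical toy masks (True = detection area, False = non-detection area).
NARROW = [[False, False, False],
          [False, True,  False],
          [False, False, False]]
WIDE = [[False, True, False],
        [True,  True, True],
        [False, True, False]]

# (mode, feature part) -> detection area mask, cf. detection parameter holding unit 106.
DETECTION_PARAMETERS = {
    (Mode.REFERENCE_DATA_SEARCH, "right_mouth_end"): NARROW,   # first parameter
    (Mode.EXPRESSION_DETERMINATION, "right_mouth_end"): WIDE,  # second parameter
}


def select_detection_mask(mode, part):
    """Select the mask for the current operation mode (cf. step S306)."""
    return DETECTION_PARAMETERS[(mode, part)]


def area(mask):
    """Number of cells in the detection area."""
    return sum(cell for row in mask for cell in row)
```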

As described above, the normalized image is created with reference to the detected face position and the detected positions of both eyes. The positions of the mouth end points in the normalized image therefore spread over a certain range because of individual differences, facial expressions, and the like, and also because of errors in the detected face and eye positions. For this reason, the mouth end point detection areas, such as the detection area 603 of the detection area mask 601 and the detection area 604 of the detection area mask 602 in FIG. 6, each cover a range of some extent.

In the facial expression recognition process of this embodiment, an expression degree indicating the intensity of the facial expression is calculated from the change between the reference data set in the reference data search mode and the feature amounts obtained from the current image, and the facial expression is determined on the basis of that expression degree.

In other words, the facial expression is determined by examining the change from a state in which the face is expressionless or shows little change in expression. Therefore, in the reference data search mode, the feature amounts to be set as reference data are those obtained from a face that is expressionless or shows little change in expression, whereas in the facial expression determination mode, the change from such a state is what is examined. For this reason, detection area 603 of mask 601, used in the reference data search mode (FIG. 6), is narrower than detection area 604 of mask 602, used in the facial expression determination mode.

To create detection areas 603 and 604, for example, an image database A containing faces that are expressionless or show little change in expression and an image database B containing faces with the expression to be detected are prepared. To determine detection area 603, the face detection and normalization processes described above are applied to each image in database A, the frequency distribution of the position of the target feature part in the resulting normalized face images is examined, and the region in which the frequency at each position is equal to or greater than a threshold is taken as detection area 603. To determine detection area 604, the same procedure is applied to the images in both databases A and B, and the region in which the frequency at each position is equal to or greater than a threshold is taken as detection area 604. The detection area masks are thus obtained and prepared in advance.
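The frequency-threshold procedure above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes landmark positions have already been measured as integer (x, y) pixel coordinates in normalized face images, and the function name, image size, sample positions, and threshold are hypothetical. A practical version would probably smooth the counts before thresholding.

```python
from collections import Counter


def build_detection_mask(positions, width, height, threshold):
    """Return a height x width boolean mask (indexed mask[y][x]) that is True
    where the feature part appeared at least `threshold` times.

    positions: (x, y) integer coordinates of one feature part (for example,
    the right mouth end point) measured in normalized face images.
    """
    counts = Counter((x, y) for x, y in positions
                     if 0 <= x < width and 0 <= y < height)
    return [[counts[(x, y)] >= threshold for x in range(width)]
            for y in range(height)]


# Hypothetical measurements of one feature part.
db_a = [(5, 5), (5, 5), (6, 5)]  # expressionless / little change
db_b = [(8, 5), (8, 5)]          # faces with the target expression

mask_search = build_detection_mask(db_a, 10, 10, threshold=2)        # like area 603
mask_judge = build_detection_mask(db_a + db_b, 10, 10, threshold=2)  # like area 604
```

With the same threshold, adding database B can only raise the counts, so in this sketch the determination-mode area always contains the search-mode area, consistent with area 603 being narrower than area 604.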

A detection area mask is prepared for each feature part detected in the facial feature part position detection process described later. That is, when the eye corners are detected in addition to the mouth end points, a detection area mask suited to each feature part is used.

In addition to the detection area masks, the parameters used for the facial feature part position detection described later are also set. For example, if the detection method is based on Non-Patent Document 2, these parameters are the group of rectangular filters called weak classifiers; if a template matching method is used, they are the templates.

These weak classifiers and templates are likewise obtained in advance, separately for the reference data search mode and the facial expression determination mode, from image database A (faces that are expressionless or show little change in expression) and image database B (faces with the expression to be detected).

Specifically, weak classifiers trained to detect the facial feature parts in the images of database A are used in the reference data search mode, and weak classifiers trained to detect the facial feature parts in the images of databases A and B are used in the facial expression determination mode.

Likewise, for templates, a template generated for each facial feature part to be detected from the average image of the images in database A is used in the reference data search mode, and a template generated for each feature part from the average image of the images in databases A and B is used in the facial expression determination mode.

In step S307, the positions of the facial feature parts are detected by the facial feature part position detection unit 104. This detection can use the method of Non-Patent Document 4, described as the eye detection method in step S304. Alternatively, a template matching method can be used. In normalized image creation step S304, the eye positions were detected for the face in the input image, whereas in step S307 the detection is performed on the normalized image created in step S304. The variation of the face is therefore generally smaller than for the face processed in step S304, so the facial feature parts can also be detected with templates generated, for each feature part to be detected, from the average image of a plurality of images. Template matching is a known technique and is described, for example, in Non-Patent Document 7.
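Average-image templates and template matching restricted to a detection area mask could look like the sketch below. It assumes grayscale images stored as lists of rows, uses the sum of squared differences (SSD) as the matching score, and treats the mask as the set of allowed top-left template positions; the function names and these choices are illustrative assumptions, not taken from the patent or from Non-Patent Document 7.

```python
def average_template(patches):
    """Pixel-wise mean of equally sized grayscale patches (lists of rows)."""
    n = len(patches)
    h, w = len(patches[0]), len(patches[0][0])
    return [[sum(p[y][x] for p in patches) / n for x in range(w)]
            for y in range(h)]


def match_template(image, template, mask):
    """Return the top-left (x, y) with the smallest SSD among positions where
    mask[y][x] is True, or None if the mask allows no position."""
    th, tw = len(template), len(template[0])
    best_ssd, best_pos = None, None
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            if not mask[y][x]:
                continue  # outside the detection area for this mode
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (x, y)
    return best_pos


# Toy 6x6 normalized image with a bright 2x2 feature at x=3..4, y=2..3.
image = [[0] * 6 for _ in range(6)]
for y, x in [(2, 3), (2, 4), (3, 3), (3, 4)]:
    image[y][x] = 10

template = average_template([[[9, 9], [9, 9]], [[11, 11], [11, 11]]])
full_mask = [[True] * 6 for _ in range(6)]
left_mask = [[x < 2 for x in range(6)] for _ in range(6)]
```

Restricting the search with a mode-specific mask (here `left_mask`) simply excludes candidate positions, which is how a narrower detection area can prevent matches far from the expected location.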

FIG. 7 shows examples of the facial feature parts detected in step S307: 701 is the outer corner of the right eye, 702 the outer corner of the left eye, 703 the right mouth end point, and 704 the left mouth end point.

In step S308, feature amounts are calculated from the facial feature part positions detected in step S307. FIG. 9 shows the feature amounts calculated in step S308: 901 is the distance between the right eye corner and the right mouth end point, 902 is the distance between the left eye corner and the left mouth end point, and 903 is the distance between the two mouth end points. Each distance is calculated from the coordinates of the facial feature parts in the normalized image.
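The three distances of FIG. 9 can be computed as in the sketch below (Python 3.8+ for `math.dist`). The dictionary keys and coordinate values are hypothetical. The usage also illustrates why the embodiment evaluates distances rather than raw positions: shifting every feature part by the same offset, as an error in the detected face center would, leaves the distances unchanged.

```python
import math


def compute_features(pts):
    """Return the FIG. 9 distances (901, 902, 903) from normalized-image
    coordinates of the eye corners and mouth end points."""
    return (
        math.dist(pts["right_eye_corner"], pts["right_mouth_end"]),  # 901
        math.dist(pts["left_eye_corner"], pts["left_mouth_end"]),    # 902
        math.dist(pts["right_mouth_end"], pts["left_mouth_end"]),    # 903
    )


pts = {"right_eye_corner": (20, 30), "left_eye_corner": (80, 30),
       "right_mouth_end": (35, 70), "left_mouth_end": (65, 70)}
features = compute_features(pts)

# Shift every point by the same offset, e.g. a face-center detection error.
shifted = {k: (x + 5, y - 3) for k, (x, y) in pts.items()}
shifted_features = compute_features(shifted)
```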

In step S309, the feature amount evaluation unit 108 evaluates the feature amounts obtained in step S308. This evaluation checks whether the face in the current input image is expressionless or shows little change in expression, and whether any facial feature part detected in step S307 has been erroneously detected. The check could also be performed directly on the positions detected in step S307; in this embodiment, however, the distances between the detected facial feature part positions are used as the feature amounts.

This is because using distances cancels out the detection error in the face center coordinates detected in step S302. If the face detection position contains an error, the distribution of the facial feature part positions for an expressionless or nearly expressionless face becomes wider than the true distribution. Because this error is not necessarily small compared with the displacement of a feature part caused by an expression such as a smile, the evaluation uses the distance-based feature amounts to reduce its influence.

The feature amount evaluation is performed, for example, by determining whether face images in which all feature amounts fall within predetermined ranges make up at least a predetermined proportion of a sequence of consecutive images. This evaluation technique is disclosed in Japanese Patent Application No. 2007-160680, previously filed by the present applicant.
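The ratio-based check can be sketched as below. The exact criteria are in Japanese Patent Application No. 2007-160680 and are not reproduced here; this sketch only assumes per-feature lower and upper bounds and a minimum ratio, all with hypothetical values.

```python
def is_valid_reference(feature_seq, lower, upper, min_ratio):
    """True if at least `min_ratio` of the frames in `feature_seq` have every
    feature inside [lower_j, upper_j]."""
    if not feature_seq:
        return False
    inside = sum(
        all(lo <= f <= hi for f, lo, hi in zip(feats, lower, upper))
        for feats in feature_seq)
    return inside / len(feature_seq) >= min_ratio


# Hypothetical (901, 902, 903) distances from three consecutive frames.
seq = [(42.0, 42.0, 30.0), (43.0, 42.0, 31.0), (60.0, 42.0, 45.0)]
lower, upper = (35.0, 35.0, 25.0), (50.0, 50.0, 35.0)
```

Here two of the three frames lie inside the bounds, so a required ratio of 0.6 is met while 0.8 is not; the third frame could correspond to a misdetected mouth end point.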

If the evaluation shows that the feature amounts are unsuitable as reference data, the process returns to step S301 and the reference data search process is performed on a new input image. If they are suitable, the process proceeds to step S310.

In step S310, the reference data setting unit 113 sets the feature amounts calculated in step S308 in the reference data holding unit 114 as reference data. The feature amounts calculated in step S308 may be set as they are. Alternatively, if the feature amount evaluation in step S309 was performed by determining whether face images in which all feature amounts fall within predetermined ranges make up a predetermined proportion of the consecutive images, the average of each feature amount over the consecutive images within the predetermined ranges may be used as reference data.
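The averaging alternative might look like this, assuming that "within the predetermined ranges" means the frames whose features all lie inside the bounds used in step S309; the bounds and values are hypothetical.

```python
def reference_from_sequence(feature_seq, lower, upper):
    """Average each feature over the frames whose features all lie in range."""
    inside = [f for f in feature_seq
              if all(lo <= v <= hi for v, lo, hi in zip(f, lower, upper))]
    if not inside:
        raise ValueError("no frame satisfies the range condition")
    return tuple(sum(col) / len(inside) for col in zip(*inside))


seq = [(42.0, 42.0, 30.0), (43.0, 42.0, 31.0), (60.0, 42.0, 45.0)]
reference = reference_from_sequence(seq, (35.0, 35.0, 25.0), (50.0, 50.0, 35.0))
```

Averaging over several accepted frames rather than taking a single frame reduces frame-to-frame detection noise in the reference data.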

In step S311, the operation mode setting unit 108 changes the operation mode from the reference data search mode to the facial expression determination mode and stores this information in the operation mode holding unit 107.

Thus, in the reference data search mode, the feature amount calculation unit 110, expression degree calculation unit 111, facial expression determination unit 112, facial feature part position correction unit 115, and correction result holding unit 117 in FIG. 1 do not operate.

Next, the operation in the facial expression determination mode will be described. FIG. 4 is a flowchart showing the processing procedure in the facial expression determination mode.

In step S401, the detection parameter setting unit 105 sets the parameters used to detect the facial feature part positions in step S406, described later. Because the operation mode held in the operation mode holding unit 108 is the facial expression determination mode, the detection parameter setting unit 105 selects and sets, from the detection parameters held in the detection parameter holding unit 106, those used in the facial expression determination mode.

In step S402, an image to be processed is input, and in step S403, faces in the input image are detected.

In step S404, the face for which a normalized face image is to be created is selected from the faces detected in step S403. In step S405, a normalized face image is created from the image of the face selected in step S404.

In step S406, the positions of the facial feature parts are detected. The detection parameters used here are those for facial feature part position detection in the facial expression determination mode, set in step S401. For example, the detection area mask for the mouth end points is mask 602 for the facial expression determination mode, shown in FIG. 6(B), and the detection range is detection area 604.

In step S407, the facial feature part positions detected in facial feature part position detection step S406 are corrected. This correction is performed by the facial feature part position correction unit 115, and the correction values are held in the correction result holding unit 117. Because the current operation mode is the facial expression determination mode, the switching unit 116 in FIG. 1 has been switched so that the position information of the facial feature parts detected by the facial feature part position detection unit 104 is output to the facial feature part position correction unit 115.

As described in the background art, the face detection position and the eye detection positions contain some error. As a result, the face size and rotation estimated when creating the normalized image can differ from the true face size and rotation, and the normalized images created for consecutive images can differ from one another. FIG. 10 shows an example. FIG. 10(A) shows an image in which the eye detection result positions 1001 are correct, and FIG. 10(B) shows a case in which the eye detection result positions 1002 are erroneously detected inside the correct positions. FIG. 10(C) shows the normalized face image created from the eye detection result in FIG. 10(A), and FIG. 10(D) shows the normalized face image created from the eye detection result in FIG. 10(B).

Comparing FIG. 10(C) and FIG. 10(D), the face in the normalized image of FIG. 10(D) is larger, and the eye corners and mouth end points have moved outward relative to their positions in FIG. 10(C). Suppose, for example, that in a sequence of images a normalized image is created for the image at time T from correct face and eye detection results, as in FIG. 10(A), and the eye corners and mouth end points are detected; then, for the image at time T+1, a normalized image is created from an erroneous eye detection result, as in FIG. 10(B), and the eye corners and mouth end points are detected again. Although the facial expression in the input images has not changed, the facial feature part positions detected in step S406 will then be evaluated as if the mouth had widened and the distance between the eye corners and the mouth end points had increased.
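The effect illustrated in FIG. 10 can be reproduced with a simple eye-based normalization. This sketch assumes the normalized image is obtained by a similarity transform (scale, rotation, translation) that maps the two detected eye positions to fixed positions; the fixed coordinates and sample points are hypothetical, and the embodiment's own normalization (FIG. 5) also uses the face detection result. Detecting the eyes slightly inside their true positions enlarges the face, which mimics the spurious widening of the mouth described above.

```python
def normalize_point(p, right_eye, left_eye,
                    norm_right=(30.0, 40.0), norm_left=(70.0, 40.0)):
    """Map image point p into normalized-face coordinates using the similarity
    transform that sends the detected eyes to fixed normalized positions.
    Points are treated as complex numbers: w = a * z + b."""
    e1, e2 = complex(*right_eye), complex(*left_eye)
    n1, n2 = complex(*norm_right), complex(*norm_left)
    a = (n2 - n1) / (e2 - e1)  # combined scale and rotation
    b = n1 - a * e1            # translation
    w = a * complex(*p) + b
    return (w.real, w.imag)


def mouth_width(right_eye, left_eye, mouth_r=(110, 160), mouth_l=(150, 160)):
    """Distance 903 in the normalized image for fixed input mouth end points."""
    xr, yr = normalize_point(mouth_r, right_eye, left_eye)
    xl, yl = normalize_point(mouth_l, right_eye, left_eye)
    return ((xl - xr) ** 2 + (yl - yr) ** 2) ** 0.5


width_ok = mouth_width((100, 100), (160, 100))   # correct eye detection
width_bad = mouth_width((105, 100), (155, 100))  # eyes detected too far inside
```

The input mouth end points are identical in both calls; only the eye detection differs, yet the normalized mouth width grows from about 26.7 to 32 pixels.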

Such erroneous evaluation of change caused by detection errors degrades the accuracy of facial expression determination. Therefore, the feature point positions are corrected so that the detected facial feature part positions do not change abruptly; that is, even when such a change is detected, its influence is suppressed so that the feature points change smoothly in the normalized image.

This correction uses a process generally called robust estimation. Put simply, the estimate is influenced by observations near the previous value but becomes insensitive to observations beyond a certain range. Let $x_t$ be the position of a facial feature part detected in the normalized image at time $t$; the correction value $\theta_t$ at that time is updated according to Equations 1 and 2. The previous correction result $\theta_{t-1}$ and correction amount $\partial E_{t-1}/\partial\theta$ are held in the correction result holding unit 117, and the newly obtained correction result $\theta_t$ and correction amount $\partial E_t/\partial\theta$ are also stored in the correction result holding unit 117. For the initial value, Equation 3 is used instead of Equation 2.

Here, $\rho(\cdot)$ is a positive, finite-valued even function, $\alpha$ is the memory rate, and $\eta$ is a constant representing the adaptation rate. $\varepsilon$ is a function representing the error between the observed value and the estimated value, usually expressed as

$$\varepsilon_t = x_t - \theta_{t-1} \qquad \text{(Equation 4)}$$

The function of Equation 5 is used as $\rho(\cdot)$. It is chosen with the following purpose in mind: when the normalized image is incorrect, the detected facial feature part positions deviate greatly from those obtained when the normalized image is correct, so the estimate should respond only weakly to results that deviate greatly from the previous result while quickly following results close to it.

This robust estimation depends heavily on the initial value: if a position obtained from an erroneous facial feature part detection is mistakenly set as reference data, it is difficult to correct. The reference data are therefore determined as described for the parameter setting in step S306 of the reference data search mode, that is, using detection parameters, such as a detection area mask with a limited range, obtained from an image database of faces that are expressionless or show little change in expression. In addition, the feature amounts are evaluated in step S309. By combining a highly accurate initial value with robust estimation in this way, facial expression recognition that is less susceptible to face detection errors is achieved.
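Because Equations 1 to 3 and 5 are not reproduced in this text, the following is only an illustrative stand-in, not the patent's formulas. It assumes a common online M-estimator form: the error $\varepsilon_t = x_t - \theta_{t-1}$ of Equation 4 is passed through $\psi = \rho'$, the result is accumulated with memory rate $\alpha$, and $\theta$ is moved by $\eta$ times the accumulated term. For $\rho$ it uses the Geman-McClure function $\rho(\varepsilon) = |\varepsilon|^2/(\sigma^2 + |\varepsilon|^2)$, which is even and bounded, so its gradient is nearly linear for small errors and vanishes for large ones. The class name and the values of $\alpha$, $\eta$, and $\sigma$ are hypothetical; the tracker is initialized from a reference-mode position, as the text recommends.

```python
class RobustPointTracker:
    """Online robust estimate theta of one feature point (x, y) in the
    normalized image. Hypothetical stand-in for Equations 1-3 and 5."""

    def __init__(self, initial, alpha=0.5, eta=1.0, sigma=2.0):
        self.theta = tuple(float(v) for v in initial)  # e.g. reference-mode position
        self.alpha = alpha           # memory rate (assumed value)
        self.eta = eta               # adaptation rate (assumed value)
        self.sigma2 = sigma * sigma  # scale of the Geman-McClure rho
        self.grad = None             # accumulated correction term

    def _psi(self, eps):
        # Gradient of rho(e) = |e|^2 / (sigma^2 + |e|^2): close to linear
        # for small errors, decaying toward zero for large ones.
        r2 = eps[0] ** 2 + eps[1] ** 2
        k = 2.0 * self.sigma2 / (self.sigma2 + r2) ** 2
        return (k * eps[0], k * eps[1])

    def update(self, x):
        eps = (x[0] - self.theta[0], x[1] - self.theta[1])  # Equation 4
        psi = self._psi(eps)
        if self.grad is None:
            self.grad = psi  # first frame: no memory term yet
        else:
            self.grad = (self.alpha * self.grad[0] + psi[0],
                         self.alpha * self.grad[1] + psi[1])
        self.theta = (self.theta[0] + self.eta * self.grad[0],
                      self.theta[1] + self.eta * self.grad[1])
        return self.theta


# Usage: start from a (hypothetical) reference-mode position.
tracker = RobustPointTracker((50.0, 70.0))
for obs in [(50, 70), (50, 70), (65, 70), (50, 70)]:  # one 15-pixel outlier
    outlier_theta = tracker.update(obs)

drift = RobustPointTracker((50.0, 70.0))
for _ in range(40):                                  # a genuine 1-pixel change
    drift_theta = drift.update((51, 70))
```

The 15-pixel jump, of the kind a wrong normalized image would produce, moves the estimate by only about 0.002 pixels, while a sustained 1-pixel change is followed within a few frames.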

In step S408, the feature amounts shown in FIG. 9 are calculated. In step S409, the expression degree calculation unit 111 calculates the expression degree using the feature amounts calculated in step S408 and the reference data held in the reference data holding unit 114 in the reference data search mode.

As shown in FIG. 11, the expression degree calculation process first obtains feature amount ratios from the feature amounts and the reference data in step S1101, and then calculates the expression degree from the feature amount ratios in step S1102.

In step S1101, the feature amount ratios are obtained by Equation 6, where $C_j$ is the feature amount ratio, $F_j$ the feature amount, and $R_j$ the reference data. The subscript $j$ is the index of each feature shown in FIG. 9.

$$C_j = \frac{F_j}{R_j} \qquad \text{(Equation 6)}$$

In step S1102, as shown in Equation 7, the expression degree $S$ is obtained as the sum of products of each feature amount ratio and its coefficient, where $W_j$ is the coefficient for each feature amount ratio.

$$S = \sum_j W_j C_j \qquad \text{(Equation 7)}$$
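Equations 6 and 7 translate directly into code. The feature, reference, and weight values below are hypothetical.

```python
def expression_degree(features, reference, weights):
    """Expression degree S = sum_j W_j * C_j with C_j = F_j / R_j."""
    ratios = [f / r for f, r in zip(features, reference)]  # Equation 6
    return sum(w * c for w, c in zip(weights, ratios))     # Equation 7


reference = (40.0, 40.0, 30.0)  # R_j from the reference data search mode
features = (44.0, 40.0, 36.0)   # F_j from the current frame
weights = (1.0, 1.0, 2.0)       # W_j, e.g. obtained by LDA
S = expression_degree(features, reference, weights)
```

Using ratios rather than raw distances makes the score relative to the person's own neutral face, so a naturally wide mouth is not mistaken for a smile.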
The method for obtaining the coefficients $W_j$ is not particularly limited; for example, linear discriminant analysis (LDA) can be used, as briefly described below. LDA is described in detail in Non-Patent Document 8.

Let the feature amount ratios arranged as a vector be the feature amount ratio vector

$$C = (C_1, C_2, \ldots, C_n)^t$$

where $t$ denotes transposition. As shown in FIG. 9, the dimension of this vector is 3 in this embodiment. LDA can be regarded as a method of discrimination that projects a feature vector in an $n$-dimensional space onto a one-dimensional space. This projection can be written as

$$y = W^t C \qquad \text{(Equation 8)}$$

The coefficient vector $W$ applied to the feature amount ratio vector $C$ is the discrimination parameter, that is, the coefficient vector to be obtained here, and $y$ is the output value of the discriminator, which is used as the expression degree $S$. $W$ is obtained by Equation 9:

$$W = S_w^{-1}(M_1 - M_2) \qquad \text{(Equation 9)}$$

In Equation 9, $M_i$ is the sample mean and $S_w$ is the scatter matrix. The sample mean $M_i$ is obtained by Equation 10, and the scatter matrix $S_w$ by Equation 11.

Let class 1 be the class of the facial expression to be determined (smiling faces in this embodiment) and class 2 the class of other facial expressions. The sample mean $M_i$ of each class is then given by Equation 10, where $n_i$ is the number of samples in class $i$.

The scatter matrix $S_w$ is obtained by Equation 11.

Specifically, to obtain $W$, a large number of images of the facial expression to be determined (smiling faces in this embodiment) and of faces with other expressions are prepared in advance, and the feature amount ratios $C_j$ are obtained for each image relative to the feature amounts of the selected expressionless face. The sample means $M_i$ of the smiling and other expressions are then obtained from Equation 10, and the scatter matrix $S_w$ from Equation 11. Finally, $W$ is obtained with Equation 9, which gives the coefficients of Equation 7.
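The procedure for obtaining $W$ can be sketched in plain Python. Since Equations 10 and 11 are not reproduced in this text, the sketch assumes the standard Fisher forms: $M_i$ is the per-class mean of the ratio vectors and $S_w$ is the pooled within-class scatter $\sum_i \sum_{C \in \text{class } i} (C - M_i)(C - M_i)^t$. The synthetic training vectors are hypothetical, chosen so that only the third ratio (mouth width, 903) separates the classes.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lda_weights(class1, class2):
    """Fisher LDA direction W = Sw^-1 (M1 - M2) (Equation 9)."""
    n = len(class1[0])

    def mean(vs):
        return [sum(v[i] for v in vs) / len(vs) for i in range(n)]

    m1, m2 = mean(class1), mean(class2)
    sw = [[0.0] * n for _ in range(n)]  # pooled within-class scatter
    for vs, m in ((class1, m1), (class2, m2)):
        for v in vs:
            d = [v[i] - m[i] for i in range(n)]
            for i in range(n):
                for j in range(n):
                    sw[i][j] += d[i] * d[j]
    return solve(sw, [m1[i] - m2[i] for i in range(n)])


def axis_samples(center, delta=0.1):
    """Synthetic ratio vectors: center +/- delta along each axis."""
    out = []
    for i in range(len(center)):
        for s in (delta, -delta):
            v = list(center)
            v[i] += s
            out.append(tuple(v))
    return out


smile = axis_samples((1.0, 1.0, 1.2))  # class 1: target expression
other = axis_samples((1.0, 1.0, 1.0))  # class 2: other expressions
W = lda_weights(smile, other)
```

For real data, `numpy.linalg.solve` or an existing LDA implementation would be the usual choice; the hand-written solver only keeps the sketch dependency-free.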

In step S410, the facial expression determination unit 112 compares the expression degree obtained in step S409 with a threshold to determine whether the input image shows the desired facial expression. If it does not, the process returns to step S402 and a new input image is processed. This threshold can be obtained together with the coefficients $W_j$ of Equation 7 used in the expression degree calculation: the smile degree $S$ is calculated with the obtained coefficients $W_j$ for all face images prepared in advance; a frequency distribution of the smile degree is created for the face images belonging to the desired expression and for those belonging to other expressions; and the threshold candidate is varied over these distributions, with the candidate that yields the desired detection accuracy set as the threshold.
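One way to turn "the desired detection accuracy" into a concrete rule is sketched below. It assumes that a larger $S$ indicates the target expression and interprets the accuracy requirement as a maximum false-acceptance rate on the non-target faces; the patent does not specify the criterion, and the scores are hypothetical.

```python
def choose_threshold(pos_scores, neg_scores, max_false_rate):
    """Smallest candidate threshold whose false-acceptance rate on non-target
    faces is <= max_false_rate. Returns (threshold, hit rate on targets)."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    for t in candidates:
        far = sum(s >= t for s in neg_scores) / len(neg_scores)
        if far <= max_false_rate:
            hit = sum(s >= t for s in pos_scores) / len(pos_scores)
            return t, hit
    return float("inf"), 0.0


pos = [3.0, 3.5, 4.0, 4.5]  # smile degree S of smiling faces
neg = [1.0, 2.0, 2.5, 3.2]  # smile degree S of other expressions
```

Tightening the false-acceptance limit from 0.25 to 0 raises the threshold from 3.0 to 3.5 and lowers the hit rate from 1.0 to 0.75, which is the trade-off the frequency-distribution search navigates.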

FIG. 12 shows the hardware configuration of a computer that implements the image processing apparatus of this embodiment. The CPU 121 performs various processes by executing programs and controls each part of the apparatus. The ROM 122 stores programs and various fixed data. The RAM 123 temporarily stores various data and programs and provides a work area for the CPU 121. The CPU 121 implements the processing described above with reference to the flowcharts by executing programs in the ROM 122 or programs loaded into the RAM 123 from the HDD 125 or the like.

The input unit 124 includes a keyboard, a mouse, and the like, and accepts data and instructions from the user. The HDD 125 is a hard disk drive that stores programs, dictionary data, image data, and the like in a nonvolatile manner. The display unit 126 displays input data, processing results, and the like. A CD-ROM drive or the like may be provided instead of the HDD 125.

The I/F 127 is connected to an imaging device; it receives images captured by the imaging device and sends instructions to it. The bus 128 connects the parts of the apparatus.

As described above, according to this embodiment, when determining a set type of facial expression, the facial feature part positions used to calculate the feature amounts for that determination are detected with high accuracy. To this end, parameters such as the detection area, used to detect the facial feature parts needed for determining the set type of expression, are set according to the operation mode. In addition, the detected facial feature part positions are corrected to reduce the influence of errors made when creating the normalized image. Together, these measures achieve highly accurate facial expression recognition.

The object of the present invention can also be achieved when a computer (or CPU) of the apparatus reads, from a computer-readable storage medium, the program code of software that implements the functions of the above embodiment and executes it.

In this case, the program code read from the computer-readable storage medium itself implements the functions of the above embodiment, and the computer-readable storage medium storing the program code constitutes the present invention.

Various media can be used as the computer-readable storage medium for supplying the program code, for example a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, DVD-R, magnetic tape, nonvolatile memory card, or ROM.

Furthermore, the functions of the above embodiment are not only implemented by the computer executing the read program code. The invention also covers the case in which an OS (operating system) running on the computer performs part or all of the actual processing according to the instructions of the program code, and the functions of the above embodiment are implemented by that processing.

FIG. 1 is a diagram showing the functional configuration of the image processing apparatus of the embodiment.
FIG. 2 is a flowchart showing the procedure for setting the operation mode.
FIG. 3 is a flowchart showing the operation in the reference data search mode.
FIG. 4 is a flowchart showing the operation in the facial expression determination mode.
FIG. 5 is a diagram explaining the generation of a normalized face image.
FIG. 6 is a diagram explaining examples of detection area masks.
FIG. 7 is a diagram explaining examples of facial feature parts.
FIG. 8 is a diagram explaining examples of facial feature part detection regions.
FIG. 9 is a diagram showing examples of feature amounts.
FIG. 10 is a diagram showing the relationship between facial feature part detection errors and normalized face images.
FIG. 11 is a flowchart showing the expression degree calculation process.
FIG. 12 is a diagram showing the hardware configuration of a computer that implements the image processing apparatus of the embodiment.

Claims

An image processing apparatus comprising:
a parameter setting unit that sets, for each of a setting mode for setting reference data relating to a face and a determination mode for determining a face using the reference data, the parameters used to detect facial feature parts in that mode;
a detection unit that, in each of the setting mode and the determination mode, detects the positions of the facial feature parts present in the image input in that mode, using the parameters set for that mode;
a calculation unit that, in each of the setting mode and the determination mode, calculates a feature amount based on the positions of the feature parts detected in that mode;
a reference data setting unit that, in the setting mode, sets the reference data based on the feature amount calculated in the setting mode; and
a determination unit that, in the determination mode, determines the face present in the image input in the determination mode, using the feature amount calculated in the determination mode and the reference data set in the setting mode.
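The two-mode structure of claim 1 can be illustrated with a minimal sketch. The claims do not specify the detector, the feature amounts, or the decision rule, so everything below is an assumption for illustration only: `ExpressionJudge`, `DetectionParams`, `feature_amount`, the distance-based features, and the "mouth widened by more than 10 %" smile rule are all hypothetical. The detector is passed in as a callable so the sketch stays self-contained.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class DetectionParams:
    # Hypothetical per-mode parameter: half-size (pixels) of the region
    # searched around each expected feature part location.
    search_half_size: int


def feature_amount(points: Dict[str, Point]) -> Dict[str, float]:
    """Hypothetical feature amounts: distances between detected feature parts."""
    def dist(a: str, b: str) -> float:
        (x1, y1), (x2, y2) = points[a], points[b]
        return math.hypot(x2 - x1, y2 - y1)

    return {
        "mouth_width": dist("mouth_left", "mouth_right"),
        "eye_to_mouth": dist("left_eye", "mouth_left"),
    }


# A detector takes an image and the parameters of the current mode and
# returns named feature part positions (supplied by the caller here).
Detector = Callable[[object, DetectionParams], Dict[str, Point]]


class ExpressionJudge:
    def __init__(self, detect: Detector, setting: DetectionParams,
                 determination: DetectionParams) -> None:
        self.detect = detect
        # Detection parameters are held separately for each mode.
        self.params = {"setting": setting, "determination": determination}
        self.reference: Optional[Dict[str, float]] = None

    def run_setting_mode(self, image: object) -> Dict[str, float]:
        points = self.detect(image, self.params["setting"])
        # Reference data = feature amounts of the face seen in the setting mode.
        self.reference = feature_amount(points)
        return self.reference

    def run_determination_mode(self, image: object, threshold: float = 0.1):
        if self.reference is None:
            raise RuntimeError("reference data not set; run the setting mode first")
        points = self.detect(image, self.params["determination"])
        current = feature_amount(points)
        # Relative change of each feature amount with respect to the reference.
        change = {k: (current[k] - self.reference[k]) / self.reference[k]
                  for k in current}
        # Hypothetical rule: a sufficiently widened mouth is judged as a smile.
        return change["mouth_width"] > threshold, change
```

Comparing relative changes against reference data taken from the same person, rather than absolute distances, is one plausible reading of why the claims separate a setting mode from a determination mode; the patent itself should be consulted for the actual criterion.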

The image processing apparatus according to claim 1, wherein the parameters include a parameter that specifies the region in which the facial feature parts are detected.

The image processing apparatus according to claim 1, wherein the parameter setting unit sets the region in which the facial feature parts are detected to be wider in the determination mode than in the setting mode.
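A wider detection region in the determination mode can be sketched as follows. The square window, the half-sizes in `HALF_SIZE`, and the "strongest response wins" search over a precomputed response map are assumptions made for illustration; the claim states only that the region is wider in the determination mode.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive

# Hypothetical half-sizes: narrow in the setting mode, wider in the
# determination mode, where an expression can move feature parts farther
# from their expected positions.
HALF_SIZE = {"setting": 2, "determination": 6}


def search_region(center: Tuple[int, int], half_size: int,
                  width: int, height: int) -> Rect:
    """Square region around an expected feature location, clipped to the image."""
    cx, cy = center
    return (max(0, cx - half_size), max(0, cy - half_size),
            min(width, cx + half_size + 1), min(height, cy + half_size + 1))


def locate_in_region(response: List[List[float]], region: Rect) -> Tuple[int, int]:
    """Position (x, y) of the strongest detector response inside the region."""
    x0, y0, x1, y1 = region
    best_xy = (x0, y0)
    best = response[y0][x0]
    for y in range(y0, y1):
        for x in range(x0, x1):
            if response[y][x] > best:
                best, best_xy = response[y][x], (x, y)
    return best_xy
```

With an expected location of (10, 10) and a true response peak at (15, 10), the narrow setting-mode window (8 to 12 in x) misses the peak, while the wider determination-mode window (4 to 16 in x) finds it.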

The image processing apparatus according to claim 1, wherein the determination unit determines a facial expression.

The image processing apparatus according to claim 4, wherein the parameters for each mode are created based on face data of the facial expression targeted in that mode.

The image processing apparatus according to claim 1, further comprising a correction unit that corrects the positions of the facial feature parts detected in the determination mode based on the positions of the facial feature parts detected in the setting mode.

The image processing apparatus according to claim 6, wherein the correction unit uses the positions of the facial feature parts detected in the setting mode as the initial values of the correction values.

The image processing apparatus according to claim 1, further comprising a mode setting unit that initially sets the setting mode and switches to the determination mode based on an evaluation of the feature amount calculated in the setting mode.
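The mode switching described above can be sketched as a small state machine. The claim does not say how the feature amount is evaluated; the stability test used here (the last `window` values all lie within a relative `tolerance` of their mean) and the class name `ModeController` are hypothetical choices for illustration.

```python
from collections import deque
from statistics import mean
from typing import Deque, Optional


class ModeController:
    """Hypothetical sketch: start in the setting mode and switch to the
    determination mode once the setting-mode feature amounts look stable."""

    def __init__(self, window: int = 5, tolerance: float = 0.05) -> None:
        self.mode = "setting"  # initial mode
        self.tolerance = tolerance
        self.recent: Deque[float] = deque(maxlen=window)
        self.reference: Optional[float] = None

    def update(self, feature: float) -> str:
        if self.mode == "setting":
            self.recent.append(feature)
            if len(self.recent) == self.recent.maxlen:
                avg = mean(list(self.recent))
                # Evaluation: every recent value lies within +/- tolerance of the mean.
                if avg != 0 and all(abs(v - avg) <= self.tolerance * abs(avg)
                                    for v in self.recent):
                    self.reference = avg  # adopt as reference data
                    self.mode = "determination"
        return self.mode
```

For example, with `window=3`, the sequence 30, 40, 30, 30, 30 stays in the setting mode until the last three values agree, then switches and keeps 30.0 as the reference.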

The image processing apparatus according to claim 1, wherein the detection unit includes a unit that detects a face present in the input image and a unit that normalizes the size or orientation of the detected face.
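Normalizing the size and orientation of a detected face is often done with a similarity transform that maps the two detected eye centers onto fixed positions in the normalized face image. The claim does not name the method, so the eye-based transform, the canonical positions `CANON_LEFT`/`CANON_RIGHT`, and the function names below are assumptions for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Similarity = Tuple[float, float, float, float]  # (scale, angle, tx, ty)

# Hypothetical canonical eye positions in the normalized face image.
CANON_LEFT: Point = (30.0, 40.0)
CANON_RIGHT: Point = (70.0, 40.0)


def similarity_from_eyes(left: Point, right: Point) -> Similarity:
    """Scale, rotation and translation that map the detected eyes onto the canonical ones."""
    dx, dy = right[0] - left[0], right[1] - left[1]
    cdx, cdy = CANON_RIGHT[0] - CANON_LEFT[0], CANON_RIGHT[1] - CANON_LEFT[1]
    scale = math.hypot(cdx, cdy) / math.hypot(dx, dy)         # size normalization
    angle = math.atan2(cdy, cdx) - math.atan2(dy, dx)         # orientation normalization
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so that the left eye lands exactly on CANON_LEFT.
    tx = CANON_LEFT[0] - (c * left[0] - s * left[1])
    ty = CANON_LEFT[1] - (s * left[0] + c * left[1])
    return scale, angle, tx, ty


def apply_similarity(t: Similarity, p: Point) -> Point:
    """Map a point from the input image into the normalized face image."""
    scale, angle, tx, ty = t
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)
```

A face tilted by 45 degrees with eyes at (110, 60) and (150, 100) is rotated upright and scaled by 1/sqrt(2), so its eyes land on (30, 40) and (70, 40).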

An image processing method comprising:
a step of setting, in a setting mode for setting reference data relating to a face, a first parameter used to detect facial feature parts;
a step of detecting, in the setting mode, the positions of the facial feature parts present in the image input in the setting mode, using the first parameter;
a step of calculating, in the setting mode, a feature amount based on the positions of the feature parts detected in the setting mode;
a step of setting, in the setting mode, the reference data based on the feature amount calculated in the setting mode;
a step of setting a second parameter, independent of the first parameter, as the parameter used to detect facial feature parts in a determination mode for determining a face using the reference data;
a step of detecting, in the determination mode, the positions of the facial feature parts present in the image input in the determination mode, using the second parameter;
a step of calculating, in the determination mode, a feature amount based on the positions of the feature parts detected in the determination mode; and
a step of determining, in the determination mode, the face present in the image input in the determination mode, using the feature amount calculated in the determination mode and the reference data set in the setting mode.

A program for causing a computer to execute the image processing method according to claim 10.

A computer-readable storage medium storing the program according to claim 11.