JP2011113464A

JP2011113464A - Apparatus and method for attribute identification and program

Info

Publication number: JP2011113464A
Application number: JP2009271599A
Authority: JP
Inventors: Shingo Ando (安藤慎吾); Akira Suzuki (鈴木章); Hideki Koike (小池秀樹)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-11-30
Filing date: 2009-11-30
Publication date: 2011-06-09
Anticipated expiration: 2029-11-30
Also published as: JP5025713B2

Abstract

PROBLEM TO BE SOLVED: To achieve attribute identification that is fast and robust to the face direction of a subject, and to accurately identify the subjective age group of the subject.

SOLUTION: An attribute identification apparatus 1 includes: a learning data acquisition unit 11 for acquiring learning data; a learning face area detection unit 12 for outputting face cut-out image data from the learning data; a face direction attribute classifier generation unit 13 for generating an attribute classifier for each face direction of a subject; a face direction attribute classifier storage unit 14 for storing the attribute classifiers; a target image data acquisition unit 21 for acquiring target image data; a recognition face area detection unit 22 for outputting face cut-out image data from the target image data; a face direction estimation unit 23 for outputting an attribute identification face direction parameter; a recognition face area re-detection unit 24 for outputting face cut-out image data from the target image data; and an attribute identification unit 25 for selecting an attribute classifier from the plurality of stored attribute classifiers and obtaining an identification result for the subject of the target image.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an attribute identification device, an attribute identification method, and a program. In particular, it relates to a device, method, and program for identifying attributes of a subject in a captured image.

In recent years, techniques for detecting a person's face in an image or video and identifying the person's gender and age group (hereinafter also referred to as “attribute identification” or “attribute estimation”) have been studied. For example, there is a method that uses feature extraction by 2DPCA together with a classifier combining a GMM and an SVM (see, for example, Non-Patent Document 1).

The technique disclosed in Non-Patent Document 1 can be considered effective when the direction in which the face of the person subject to attribute identification is pointing relative to the imaging device (lens) (hereinafter referred to as the “face direction”) is frontal, that is, when the person is imaged from the front.

(Non-Patent Document 1) “A study of methods for integrating multiple classifiers for gender and age group estimation from face images,” Teruhide Hayashida, Kazuya Ueki, Tetsunori Kobayashi, IEICE Technical Report PRMU2005-96, October 2005.

However, the technique disclosed in Non-Patent Document 1 has the problem that the accuracy of attribute identification drops markedly when the person is imaged not only from the front but from various directions. Because a human face is a three-dimensional structure, the brightness pattern in the image varies greatly with face direction even for the same person.

A simple way to address this problem would be to train a single classifier on face images from various directions in addition to frontal face images. However, when face images from various directions including the front are learned together, the pattern variation is far larger than when only frontal face images are learned, so a drop in discrimination performance is to be expected. In addition, depending on the classifier, the processing time can become extremely long.

Among the kinds of attribute identification, identifying the age group (hereinafter also referred to as “age group identification” or “age group estimation”) is generally far more difficult than identifying gender (hereinafter also referred to as “gender identification” or “gender estimation”). This is because, owing to individual differences, makeup, and the like, the age that other people subjectively perceive (hereinafter referred to as the “subjective age”) often differs considerably from the actual age. Accordingly, when training a classifier for age group identification, a better identification rate is expected from using a teacher signal determined from evaluations by other people. Taking such evaluations into account is also often more useful in practice. However, evaluations by other people reflect individual differences among the evaluators, so when the evaluations are tallied the distribution is spread out, and a single age group class cannot always be determined. In other words, it is difficult to set an appropriate teacher signal, and as a result the subjective age group cannot be identified accurately in age group identification.

The present invention has been made in view of these circumstances. One object is to provide a technique for attribute identification that is robust to the face direction of the subject and fast to process. Another object is to provide a technique that can accurately identify the subjective age group in age group identification of a subject.

To solve the above problems, an attribute identification device according to one embodiment of the present invention comprises: a learning data acquisition unit that acquires, as learning data, face image data captured from various directions, a learning face direction parameter indicating the direction in which the face of the subject of the face image data is pointing, and attribute data of the subject; a learning face area detection unit that detects the face area of the subject in the face image data acquired by the learning data acquisition unit and outputs face cut-out image data obtained by cutting out that area; a face direction attribute classifier generation unit that generates, for each direction in which a subject's face is pointing, an attribute classifier that identifies an attribute of the subject, based on a plurality of pieces of the face cut-out image data output by the learning face area detection unit that share the same learning face direction parameter and on the attribute data of each of those pieces; a face direction attribute classifier storage unit that stores the attribute classifiers generated by the face direction attribute classifier generation unit; a target image data acquisition unit that acquires target image data, which is the target of subject attribute identification; a recognition face area detection unit that detects the face area of the subject in the target image data acquired by the target image data acquisition unit and outputs face cut-out image data; a face direction estimation unit that estimates, based on the face cut-out image data output by the recognition face area detection unit, the direction in which the face of the subject of the target image is pointing and outputs an attribute identification face direction parameter indicating that direction; a recognition face area re-detection unit that, based on the attribute identification face direction parameter output by the face direction estimation unit, detects the face area of the subject in the target image data again and outputs face cut-out image data obtained by cutting out that area; and an attribute identification unit that, based on the attribute identification face direction parameter output by the face direction estimation unit, selects one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage unit, inputs the face cut-out image data output by the recognition face area re-detection unit to the selected one or more attribute classifiers, and obtains an identification result for the subject of the target image.

In the above attribute identification device, the face direction estimation unit may output, as the attribute identification face direction parameter, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is pointing, and the attribute identification unit may place each of the plurality of attribute classifiers stored in the face direction attribute classifier storage unit in the two-dimensional space formed by the yaw angle and the pitch angle, select the single attribute classifier nearest, in Euclidean distance, to the yaw angle and pitch angle output by the face direction estimation unit, and obtain an identification result for the subject of the target image.
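The nearest-classifier selection described above can be illustrated with a minimal sketch. It is not taken from the disclosure: the function name `select_nearest`, the dictionary layout, and the angle values in degrees are all assumptions made for illustration.

```python
import math

def select_nearest(directions, yaw, pitch):
    """Return the stored (yaw, pitch) key closest to the estimated face direction."""
    # Euclidean distance in the 2-D yaw/pitch space.
    return min(directions, key=lambda d: math.hypot(d[0] - yaw, d[1] - pitch))

# Hypothetical store: face direction in degrees -> stand-in for a trained classifier.
classifiers = {
    (0, 0): "front",
    (30, 0): "right30",
    (-30, 0): "left30",
    (0, 20): "up20",
}

# Estimated direction of the target face (assumed values).
nearest = select_nearest(classifiers, yaw=22.0, pitch=5.0)
selected = classifiers[nearest]
```

Only one classifier is evaluated per target image in this variant, which keeps recognition cost independent of the number of stored face directions apart from the distance search.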

In the above attribute identification device, the face direction estimation unit may output, as the attribute identification face direction parameter, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is pointing, and the attribute identification unit may place each of the plurality of attribute classifiers stored in the face direction attribute classifier storage unit in the two-dimensional space formed by the yaw angle and the pitch angle, select two or more attribute classifiers near, in Euclidean distance, the yaw angle and pitch angle output by the face direction estimation unit, and obtain an identification result for the subject of the target image using an average weighted by those distances.
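A minimal sketch of the distance-weighted variant follows. The text states only that an average weighted by distance is used; the inverse-distance weighting, the `eps` guard against division by zero, the choice of `k`, and the idea that each classifier returns a numeric score are assumptions made here for illustration.

```python
import math

def weighted_identify(classifiers, face_crop, yaw, pitch, k=2, eps=1e-6):
    """Average the scores of the k nearest classifiers, weighted by inverse distance."""
    def dist(d):
        return math.hypot(d[0] - yaw, d[1] - pitch)

    nearest = sorted(classifiers, key=dist)[:k]
    # Assumed weighting: closer face directions contribute more.
    weights = [1.0 / (dist(d) + eps) for d in nearest]
    scores = [classifiers[d](face_crop) for d in nearest]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical classifiers returning a fixed numeric score regardless of input.
classifiers = {
    (0, 0): lambda crop: 0.0,
    (30, 0): lambda crop: 1.0,
    (-30, 0): lambda crop: 5.0,
}

midway = weighted_identify(classifiers, face_crop=None, yaw=15.0, pitch=0.0)
on_grid = weighted_identify(classifiers, face_crop=None, yaw=30.0, pitch=0.0)
```

When the estimated direction lies exactly between two stored directions the two scores are averaged equally; when it coincides with a stored direction, that classifier dominates the result.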

In the above attribute identification device, the learning data acquisition unit may further acquire, as learning data, aggregated data that tallies the proportions of the subjective ages assigned to the subject of the face image data, obtained by presenting the face image data to a large number of people in advance, and the face direction attribute classifier generation unit may judge, based on a predetermined threshold, whether each subjective age group indicated by the aggregated data is correct or incorrect and, when a plurality of subjective age groups are judged correct, generate the attribute classifier by passing to it, as a teacher signal, either an internally divided value weighted by the proportion of evaluations in each subjective age group judged correct or an internally divided value that treats those proportions as equal.
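The teacher-signal rule above can be sketched as follows. This is only one reading under stated assumptions: age groups are represented by their class indices 0, 1, 2, ..., a group counts as correct when its proportion is at least the threshold, and the threshold value 0.3 is arbitrary. None of these specifics are given in this passage.

```python
def teacher_signal(ratios, threshold=0.3, weighted=True):
    """Derive a scalar teacher signal from per-age-group rating proportions.

    ratios[i] is the proportion of raters who chose age group i (assumed encoding).
    Returns None when no age group reaches the threshold.
    """
    # Age groups whose proportion reaches the threshold are treated as correct.
    correct = [(i, r) for i, r in enumerate(ratios) if r >= threshold]
    if not correct:
        return None
    if weighted:
        # Internal division weighted by the proportion of each correct group.
        total = sum(r for _, r in correct)
        return sum(i * r for i, r in correct) / total
    # Internal division treating the correct groups as equally weighted.
    return sum(i for i, _ in correct) / len(correct)

# Hypothetical tally: 60% of raters chose group 1, 30% group 2, 10% group 3.
ratios = [0.0, 0.6, 0.3, 0.1]
signal_weighted = teacher_signal(ratios, weighted=True)
signal_equal = teacher_signal(ratios, weighted=False)
signal_single = teacher_signal([0.1, 0.8, 0.1])
```

With a single correct age group both variants reduce to that group's index, so the rule only changes the teacher signal for images on which raters disagree.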

To solve the above problems, an attribute identification method according to another embodiment of the present invention comprises: learning data acquisition means for acquiring, as learning data, face image data captured from various directions, a learning face direction parameter indicating the direction in which the face of the subject of the face image data is pointing, and attribute data of the subject; learning face area detection means for detecting the face area of the subject in the face image data acquired by the learning data acquisition means and outputting face cut-out image data obtained by cutting out that area; face direction attribute classifier generation means for generating, for each direction in which a subject's face is pointing, an attribute classifier that identifies an attribute of the subject, based on a plurality of pieces of the face cut-out image data output by the learning face area detection means that share the same learning face direction parameter and on the attribute data of each of those pieces; face direction attribute classifier storage means for storing the attribute classifiers generated by the face direction attribute classifier generation means; target image data acquisition means for acquiring target image data, which is the target of subject attribute identification; recognition face area detection means for detecting the face area of the subject in the target image data acquired by the target image data acquisition means and outputting face cut-out image data; face direction estimation means for estimating, based on the face cut-out image data output by the recognition face area detection means, the direction in which the face of the subject of the target image is pointing and outputting an attribute identification face direction parameter indicating that direction; recognition face area re-detection means for detecting, based on the attribute identification face direction parameter output by the face direction estimation means, the face area of the subject in the target image data again and outputting face cut-out image data obtained by cutting out that area; and attribute identification means for selecting, based on the attribute identification face direction parameter output by the face direction estimation means, one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage means, inputting the face cut-out image data output by the recognition face area re-detection means to the selected one or more attribute classifiers, and obtaining an identification result for the subject of the target image.

In the above attribute identification method, the face direction estimation means may output, as the attribute identification face direction parameter, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is pointing, and the attribute identification means may place each of the plurality of attribute classifiers stored in the face direction attribute classifier storage means in the two-dimensional space formed by the yaw angle and the pitch angle, select the single attribute classifier nearest, in Euclidean distance, to the yaw angle and pitch angle output by the face direction estimation means, and obtain an identification result for the subject of the target image.

In the above attribute identification method, the face direction estimation means may output, as the attribute identification face direction parameter, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is pointing, and the attribute identification means may place each of the plurality of attribute classifiers stored in the face direction attribute classifier storage means in the two-dimensional space formed by the yaw angle and the pitch angle, select two or more attribute classifiers near, in Euclidean distance, the yaw angle and pitch angle output by the face direction estimation means, and obtain an identification result for the subject of the target image using an average weighted by those distances.

In the above attribute identification method, the learning data acquisition means may further acquire, as learning data, aggregated data that tallies the proportions of the subjective ages assigned to the subject of the face image data, obtained by presenting the face image data to a large number of people in advance, and the face direction attribute classifier generation means may judge, based on a predetermined threshold, whether each subjective age group indicated by the aggregated data is correct or incorrect and, when a plurality of subjective age groups are judged correct, generate the attribute classifier by passing to it, as a teacher signal, either an internally divided value weighted by the proportion of evaluations in each subjective age group judged correct or an internally divided value that treats those proportions as equal.

To solve the above problems, a program according to another embodiment of the present invention causes a computer that controls an attribute identification device for identifying attributes of a subject to execute: a learning data acquisition step of acquiring, as learning data, face image data captured from various directions, a learning face direction parameter indicating the direction in which the face of the subject of the face image data is pointing, and attribute data of the subject; a learning face area detection step of detecting the face area of the subject in the face image data acquired in the learning data acquisition step and outputting face cut-out image data obtained by cutting out that area; a face direction attribute classifier generation step of generating, for each direction in which a subject's face is pointing, an attribute classifier that identifies an attribute of the subject, based on a plurality of pieces of the face cut-out image data output in the learning face area detection step that share the same learning face direction parameter and on the attribute data of each of those pieces, and storing the attribute classifier in a storage unit; a target image data acquisition step of acquiring target image data, which is the target of subject attribute identification; a recognition face area detection step of detecting the face area of the subject in the target image data acquired in the target image data acquisition step and outputting face cut-out image data; a face direction estimation step of estimating, based on the face cut-out image data output in the recognition face area detection step, the direction in which the face of the subject of the target image is pointing and outputting an attribute identification face direction parameter indicating that direction; a recognition face area re-detection step of detecting, based on the attribute identification face direction parameter output in the face direction estimation step, the face area of the subject in the target image data again and outputting face cut-out image data obtained by cutting out that area; and an attribute identification step of selecting, based on the attribute identification face direction parameter output in the face direction estimation step, one or more attribute classifiers from among the plurality of attribute classifiers stored in the storage unit, inputting the face cut-out image data output in the recognition face area re-detection step to the selected one or more attribute classifiers, and obtaining an identification result for the subject of the target image.

According to the present invention, attribute identification that is robust to the face direction of the subject and fast to process can be realized. In addition, the subjective age group can be identified accurately in age group identification of a subject.

FIG. 1 is a functional block diagram showing an example of the configuration of an attribute identification device 1 according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram illustrating the operation of the recognition face area re-detection unit 24.
FIG. 3 is an explanatory diagram illustrating age group identification in the attribute identification device 1.
FIG. 4 is an explanatory diagram illustrating age group identification in the attribute identification device 1.
FIG. 5 is a flowchart showing an example of the operation of the attribute identification device 1.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an example of the configuration of an attribute identification device 1 according to an embodiment of the present invention. FIG. 2 is an explanatory diagram illustrating the operation of the recognition face area re-detection unit 24.

As shown in FIG. 1, the attribute identification device 1 includes a learning processing unit 10 and a recognition processing unit 20. The learning processing unit 10 includes a learning data acquisition unit 11, a learning face area detection unit 12, a face direction attribute classifier generation unit 13, and a face direction attribute classifier storage unit 14. The recognition processing unit 20 includes a target image data acquisition unit 21, a recognition face area detection unit 22, a face direction estimation unit 23, a recognition face area re-detection unit 24, an attribute identification unit 25, and a result output unit 26.

The learning data acquisition unit 11 acquires, as learning data, face image data captured from various directions, learning face direction parameters indicating the face direction of the subject of the face image data, and attribute data of the subject.

That is, the learning data acquisition unit 11 acquires, as learning data, a plurality of pieces of face image data captured from various directions by an imaging device (for example, a digital camera). The learning data acquisition unit 11 supplies the face image data to the learning face area detection unit 12.

The learning data acquisition unit 11 also acquires learning face direction parameters (for example, a yaw angle value and a pitch angle value) as learning data. The learning data acquisition unit 11 supplies each learning face direction parameter to the face direction attribute classifier generation unit 13 in a form that makes clear which face image data it belongs to, that is, in a form that preserves the correspondence between face image data and learning face direction parameters. For example, the learning data acquisition unit 11 supplies the learning face direction parameter to the face direction attribute classifier generation unit 13 in association with identification information that identifies the face image data. The learning face direction parameter is the face direction set when each piece of face image data was captured, entered manually.

The learning data acquisition unit 11 also acquires, as learning data, attribute data of the subject of each piece of face image data. The learning data acquisition unit 11 supplies the attribute data to the face direction attribute classifier generation unit 13 in a form that makes clear which face image data it belongs to, that is, in a form that preserves the correspondence between face image data and attribute data. For example, the learning data acquisition unit 11 supplies the attribute data to the face direction attribute classifier generation unit 13 in association with identification information that identifies the face image data.

When the learning processing unit 10 generates an attribute classifier that identifies the subjective age group, the learning data acquisition unit 11 acquires, as learning data, aggregated data on subjective age (data that tallies the proportions of the subjective ages assigned to the subject of the face image data, obtained by presenting the face image data to a large number of people in advance). The learning data acquisition unit 11 supplies the aggregated data to the face direction attribute classifier generation unit 13 in a form that makes clear which face image data it belongs to, that is, in a form that preserves the correspondence between face image data and aggregated data. For example, the learning data acquisition unit 11 supplies the aggregated data to the face direction attribute classifier generation unit 13 in association with identification information that identifies the face image data. A specific example of generating a classifier that identifies the subjective age group is described later.

The learning face area detection unit 12 acquires the face image data from the learning data acquisition unit 11 and detects a face area in the face image data. For example, the learning face area detection unit 12 may detect the face area at high speed using a statistical method such as probabilistic increment sign correlation (see, for example, Reference 1).

(Reference 1)
“Probabilistic increment sign correlation suitable for image matching of objects with individual differences,” Takeshi Mita, Toshimitsu Kaneko, Osamu Hori, IEICE Transactions D-II, Vol. J88-D-II, No. 8, pp. 1614-1623, 2005.

Having detected the face area, the learning face area detection unit 12 supplies the image obtained by cutting the face area out of the face image data (hereinafter referred to as “face cut-out image data”) to the face direction attribute classifier generation unit 13 in a form that makes clear which face image data it was cut from, that is, in a form that preserves the correspondence between face image data and face cut-out image data. For example, the learning face area detection unit 12 supplies the face cut-out image data to the face direction attribute classifier generation unit 13 in association with identification information that identifies the source face image data. The processing time constraints on the learning face area detection unit 12 of the learning processing unit 10 are less strict than those on the recognition face area detection unit 22 of the recognition processing unit 20. This is because the recognition face area detection unit 22 detects face areas at recognition time, whereas the learning face area detection unit 12 detects face areas when the attribute classifiers are trained.

The face direction attribute classifier generation unit 13 acquires learning data (learning face direction parameters and attribute data) from the learning data acquisition unit 11. The face direction attribute classifier generation unit 13 also acquires face cut-out image data from the learning face area detection unit 12. Note that for each learning face direction parameter, each piece of attribute data, and each piece of face cut-out image data, it is possible to identify which face image data the information relates to. In other words, the learning face direction parameters, the attribute data, and the face cut-out image data are associated with one another.

When the learning processing unit 10 generates an attribute classifier that identifies a subjective age group, the face direction attribute classifier generation unit 13 also acquires learning data (aggregate data) from the learning data acquisition unit 11. For each piece of aggregate data, it is likewise possible to identify which face image data the information relates to. In other words, the learning face direction parameters, the attribute data, the aggregate data, and the face cut-out image data are associated with one another.

Having acquired data from the learning data acquisition unit 11 and the learning face area detection unit 12, the face direction attribute classifier generation unit 13 generates an attribute classifier for each face direction. For example, the face direction attribute classifier generation unit 13 generates, for each face direction, an attribute classifier such as a gender classifier that can distinguish male from female.

Specifically, the face direction attribute classifier generation unit 13 generates, for each face direction of the subject, an attribute classifier that identifies an attribute of the subject, based on a plurality of pieces of face cut-out image data that were output by the learning face area detection unit 12 and share the same learning face direction parameter, together with the attribute data of each of those pieces of face cut-out image data. More specifically, the face direction attribute classifier generation unit 13 treats the face cut-out image data whose learning face direction parameter values match (that is, face cut-out image data with matching face directions) as one group, and trains an attribute classifier using the attributes in each group (the attributes given by the attribute data corresponding to the face cut-out image data in that group) as the teacher signal. By performing this processing for all face directions, the face direction attribute classifier generation unit 13 generates an attribute classifier for each face direction. Based on the learning face direction parameters, the face direction attribute classifier generation unit 13 also attaches to each attribute classifier a tag indicating information on the face direction (for example, a combination of yaw angle and pitch angle).
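The grouping step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the sample tuple layout, the function names, and the toy `majority_trainer` (a stand-in for a real learner such as a support vector machine) are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_face_direction(samples):
    """Group (face_crop, yaw, pitch, label) tuples by their (yaw, pitch) pair."""
    groups = defaultdict(list)
    for face_crop, yaw, pitch, label in samples:
        groups[(yaw, pitch)].append((face_crop, label))
    return dict(groups)


def build_classifiers(samples, train):
    """Train one classifier per face direction.

    The dictionary key (yaw, pitch) plays the role of the face direction tag.
    """
    grouped = group_by_face_direction(samples)
    return {direction: train(items) for direction, items in grouped.items()}


def majority_trainer(items):
    """Hypothetical stand-in trainer: always answers with the majority label (+1 or -1)."""
    total = sum(label for _, label in items)
    decision = 1 if total >= 0 else -1
    return lambda face_crop: decision


# Labels: +1 and -1 stand for the two classes of a two-class attribute.
samples = [
    ("img0", 0, 0, 1),
    ("img1", 0, 0, 1),
    ("img2", 30, 0, -1),
    ("img3", 30, 0, -1),
    ("img4", 0, 0, -1),
]
classifiers = build_classifiers(samples, majority_trainer)
```

Keying the stored classifiers by their (yaw, pitch) pair keeps the tag and the classifier together, which makes the later lookup by estimated face direction straightforward.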

The face direction attribute classifier generation unit 13 stores all the generated attribute classifiers in the face direction attribute classifier storage unit 14. The face direction attribute classifier generation unit 13 also manages the generation results and determines whether attribute classifiers have been generated for all face directions. The attribute classifiers generated by the face direction attribute classifier generation unit 13 may be of any type that can discriminate between two classes. Typical examples are support vector machines and feed-forward neural networks.

When generating a classifier that identifies a subjective age group, the face direction attribute classifier generation unit 13 judges, based on a predetermined threshold, whether each subjective age group (each class) indicated by the aggregate data is correct or incorrect. When a plurality of subjective age groups are judged correct, it passes to the attribute classifier, as the teacher signal, either an internal division value weighted according to the evaluation frequency of the subjective age groups judged correct (the proportion of other people who rated the subject as belonging to each class judged correct) or an internal division value that treats those proportions as equal, and thereby generates the attribute classifier.

The face direction attribute classifier storage unit 14 is a memory or an HDD and records all the attribute classifiers generated by the face direction attribute classifier generation unit 13. The attribute classifiers stored in the face direction attribute classifier storage unit 14 are supplied to the attribute identification unit 25 in response to requests from the attribute identification unit 25.

The target image data acquisition unit 21 acquires target image data that is to be subjected to attribute identification. The target image data acquisition unit 21 supplies the target image data to the recognition face area detection unit 22 and the recognition face area redetection unit 24.

The recognition face area detection unit 22 acquires the target image data from the target image data acquisition unit 21 and detects a face area in the target image data. For example, the recognition face area detection unit 22 detects the face area using the same method as the learning face area detection unit 12. Having detected the face area, the recognition face area detection unit 22 supplies face cut-out image data, obtained by cutting out the face area from the target image data, to the face direction estimation unit 23. Note that the recognition face area detection unit 22 of the recognition processing unit 20 is subject to stricter processing time constraints than the learning face area detection unit 12 of the learning processing unit 10. This is because the learning face area detection unit 12 detects face areas when the attribute classifiers are trained, whereas the recognition face area detection unit 22 detects face areas at recognition time. It is therefore preferable to detect the face area at high speed using a statistical method such as probabilistic increment sign correlation.

The face direction estimation unit 23 acquires the face cut-out image data from the recognition face area detection unit 22. Based on the face cut-out image data output by the recognition face area detection unit 22, the face direction estimation unit 23 estimates the face direction of the subject of the target image and outputs attribute identification face direction parameters indicating that face direction. The attribute identification face direction parameters relate to, for example, the yaw angle, the pitch angle, the roll angle, and a scale value. The scale value is a quantitative measure of the size of the face (for example, the number of dots) relative to the rectangular frame detected as the face area; it may be expressed, for example, as a value relative to a certain reference value.

More specifically, the face direction estimation unit 23 detects the brightness pattern of the face area from the face cut-out image data and estimates the attribute identification face direction parameters based on the detected brightness pattern. For example, the face direction estimation unit 23 estimates the attribute identification face direction parameters with high accuracy using a parameter estimation method that combines principal component analysis and support vector regression (see, for example, Reference 2). Using the method of Reference 2 enables continuous face direction estimation, including face directions that were not learned.
(Reference 2)
“Pose Estimation Method for 3D Objects Using Support Vector Regression,” Shingo Ando, Yoshinori Kusachi, Akira Suzuki, Kenichi Arakawa, IEICE Transactions D-II, Vol. J89-D, No. 8, pp. 1840-1847, 2006.

Having estimated the face direction, the face direction estimation unit 23 supplies the attribute identification face direction parameters relating to the roll angle and the scale value to the recognition face area redetection unit 24, and supplies the attribute identification face direction parameters relating to the yaw angle and the pitch angle to the attribute identification unit 25.

The recognition face area redetection unit 24 acquires the target image data from the target image data acquisition unit 21. It also acquires the attribute identification face direction parameters relating to the roll angle and the scale value from the face direction estimation unit 23. Based on these parameters (roll angle and scale value), the recognition face area redetection unit 24 detects the subject's face area in the target image data again and outputs the resulting face cut-out image data. Specifically, as shown in FIG. 2, the recognition face area redetection unit 24 cuts out the face area from the target image data so that it is normalized to a roll angle of 0° and a scale value of 1, and outputs the result as face cut-out image data. In other words, the recognition face area redetection unit 24 cuts out the face area from the target image data a second time in order to correct slight deviations in rotation and size. The recognition face area redetection unit 24 supplies the face cut-out image data to the attribute identification unit 25.
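One way to picture the normalization to a roll angle of 0° and a scale value of 1 is as an affine transform about the face center. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the face center is assumed to be known from the first detection, and the rotation sign convention (counterclockwise positive in an x-right, y-up frame) is an assumption that would need to be matched to the actual image coordinate system.

```python
import math


def normalization_matrix(center_x, center_y, roll_deg, scale):
    """Return a 2x3 affine matrix that undoes the estimated roll and scale.

    The transform rotates by -roll_deg about the face center and rescales
    by 1/scale, so a face estimated at (roll_deg, scale) maps to (0 deg, 1).
    """
    theta = math.radians(-roll_deg)
    s = 1.0 / scale
    a = s * math.cos(theta)
    b = s * math.sin(theta)
    # p' = s * R(theta) * (p - c) + c, expanded into matrix plus translation.
    tx = center_x - (a * center_x - b * center_y)
    ty = center_y - (b * center_x + a * center_y)
    return [[a, -b, tx], [b, a, ty]]


def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


# A face estimated at twice the reference size with no roll is shrunk by half.
halved = apply_affine(normalization_matrix(10, 20, 0, 2), 12, 20)
```

The face center is a fixed point of the transform, so only the rotation and size around it change.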

The attribute identification unit 25 acquires the attribute identification face direction parameters relating to the yaw angle and the pitch angle from the face direction estimation unit 23. It also acquires the face cut-out image data from the recognition face area redetection unit 24. Based on the attribute identification face direction parameters (yaw angle and pitch angle) output by the face direction estimation unit 23, the attribute identification unit 25 then selects one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage unit 14.

Various criteria are conceivable for how the attribute identification unit 25 selects attribute classifiers. In the present embodiment, the attribute identification unit 25 selects attribute classifiers according to either selection criterion 1 or selection criterion 2.
(Selection criterion 1)
Assuming that the plurality of attribute classifiers stored in the face direction attribute classifier storage unit 14 are arranged in a grid on a two-dimensional space of yaw angle and pitch angle, the single nearest attribute classifier by Euclidean distance is selected. In other words, when each of the stored attribute classifiers is placed in the two-dimensional space formed by yaw angle and pitch angle, the one attribute classifier nearest, by Euclidean distance, to the yaw angle and pitch angle output by the face direction estimation unit 23 is selected.
(Selection criterion 2)
Assuming that the plurality of attribute classifiers stored in the face direction attribute classifier storage unit 14 are arranged in a grid on a two-dimensional space of yaw angle and pitch angle, the four neighboring attribute classifiers by Euclidean distance are selected. In other words, when each of the stored attribute classifiers is placed in the two-dimensional space formed by yaw angle and pitch angle, two or more attribute classifiers near, by Euclidean distance, to the yaw angle and pitch angle output by the face direction estimation unit 23 are selected.
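The two selection criteria can be sketched as a distance-based lookup over the classifier tags. This is an illustrative sketch only: the grid spacing, the function names, and the representation of each classifier by its (yaw, pitch) tag are assumptions. Note that picking the four nearest tags purely by Euclidean distance does not always return the four corners of the grid cell enclosing the query, particularly on non-square grids.

```python
import math


def nearest_classifier(tags, yaw, pitch):
    """Selection criterion 1: the tag of the single nearest classifier."""
    return min(tags, key=lambda t: math.hypot(t[0] - yaw, t[1] - pitch))


def four_neighbors(tags, yaw, pitch):
    """Selection criterion 2: the tags of the four nearest classifiers."""
    ranked = sorted(tags, key=lambda t: math.hypot(t[0] - yaw, t[1] - pitch))
    return ranked[:4]


# Hypothetical grid of stored classifiers: yaw every 30 degrees, pitch every 15.
tags = [(y, p) for y in (-30, 0, 30) for p in (-15, 0, 15)]

single = nearest_classifier(tags, 10, 10)
neighbors = four_neighbors(tags, 10, 10)
```

For the query (10, 10) on this grid, the four selected tags happen to be the corners of the enclosing cell, which is the situation the interpolation in selection criterion 2 relies on.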

When one attribute classifier has been selected according to selection criterion 1, the attribute identification unit 25 inputs the face cut-out image data acquired from the recognition face area redetection unit 24 to the selected attribute classifier and obtains an identification result. The attribute identification unit 25 then supplies the identification result to the result output unit 26.

When four attribute classifiers have been selected according to selection criterion 2, the attribute identification unit 25 inputs the face cut-out image data acquired from the recognition face area redetection unit 24 to each of the four selected attribute classifiers and obtains an identification result from each. The attribute identification unit 25 then calculates a final identification result from these individual results (for example, it calculates the final identification result for the subject of the target image using a distance-weighted average) and supplies the final identification result to the result output unit 26.
For example, a support vector machine or similar classifier normally applies the sign function as its last step and outputs either 1 or −1. In the present embodiment, however, the value obtained from each classifier before the sign function is applied is treated as a (provisional) identification result, and these provisional results are combined by a weighted average using a method similar to the bilinear interpolation used for enlarging digital images. The sign function is then applied, and the result is supplied to the result output unit 26 as the (final) identification result. More advanced methods can also be used, such as methods that take a weighted average over five or more neighbors, as bicubic interpolation does, or spline interpolation; many variations are possible.
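The bilinear-style blending of the four provisional outputs can be sketched as below. The cell bounds, the raw output values, and the function name are hypothetical; the choice to map an exact zero to +1 is also an assumption, since the text does not say how a tie is resolved.

```python
def blend_decision(yaw, pitch, yaw0, yaw1, pitch0, pitch1, raw):
    """Blend four pre-sign classifier outputs by bilinear weights.

    raw maps each corner tag (yaw_i, pitch_j) of the enclosing grid cell
    to that classifier's output before the sign function is applied.
    Returns the blended value and the final label (+1 or -1).
    """
    tx = (yaw - yaw0) / (yaw1 - yaw0)
    ty = (pitch - pitch0) / (pitch1 - pitch0)
    value = (
        (1 - tx) * (1 - ty) * raw[(yaw0, pitch0)]
        + tx * (1 - ty) * raw[(yaw1, pitch0)]
        + (1 - tx) * ty * raw[(yaw0, pitch1)]
        + tx * ty * raw[(yaw1, pitch1)]
    )
    # Sign function applied last; zero is mapped to +1 here by assumption.
    label = 1 if value >= 0 else -1
    return value, label


# Hypothetical pre-sign outputs of the four neighboring classifiers.
raw_outputs = {(0, 0): 0.9, (30, 0): -0.6, (0, 15): 0.3, (30, 15): -0.3}
value, label = blend_decision(10, 10, 0, 30, 0, 15, raw_outputs)
```

Because the weights shrink with distance along each axis, a query that coincides with a grid point reduces to that single classifier's output.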

The result output unit 26 acquires the identification result from the attribute identification unit 25 and outputs it.

A specific example of generating attribute classifiers that identify a subjective age group, for use in age group identification, is described below with reference to FIGS. 3 and 4. FIGS. 3 and 4 are explanatory diagrams for explaining the concept of the subjective age group. As shown in FIG. 3(a), when the subjective age groups assigned by many people to a single face image are tallied, the ratings are likely to be spread over a plurality of classes (groups). Therefore, first, only the classes whose share of the subjective age group ratings exceeds a% of the total (a is a parameter value determined in advance) are treated as correct. In the example shown in FIG. 3(b), the 20 to 34 class and the 35 to 49 class are treated as correct, while the 19 and under class and the 50 and over class are treated as incorrect.
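The correct/incorrect judgment against the threshold a can be sketched as a simple filter. The rating shares and the threshold value of 20% below are illustrative assumptions chosen to reproduce the outcome described for FIG. 3(b); the function name and class labels are likewise hypothetical.

```python
def correct_classes(frequencies, a):
    """Return the class labels whose share of the ratings exceeds threshold a.

    frequencies maps each subjective age class to the fraction of raters who
    chose it; a is given as a fraction (0.2 means 20%).
    """
    return [label for label, share in frequencies.items() if share > a]


# Assumed rating shares for one face image, ordered from youngest to oldest.
ratings = {"19 and under": 0.1, "20-34": 0.6, "35-49": 0.3, "50 and over": 0.0}
judged_correct = correct_classes(ratings, a=0.2)
```

With these numbers the two middle classes are judged correct and the two outer classes are judged incorrect.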

There are various ways to divide the subjective age groups into classes (for example, the division shown in FIG. 3, or a division by decade), but the division is usually set so that there are three or more subjective age group classes. When four subjective age group classes are set (19 and under, 20 to 34, 35 to 49, and 50 and over), then, as shown for example in FIG. 4(a), three attribute classifiers are trained: attribute classifier 1, which distinguishes 19 and under from 20 and over; attribute classifier 2, which distinguishes 34 and under from 35 and over; and attribute classifier 3, which distinguishes 49 and under from 50 and over. Analyzing the identification results of attribute classifiers 1, 2, and 3 then covers the four subjective age group classes. Each attribute classifier discriminates between two age ranges (for example, attribute classifier 2 in FIG. 4(a) discriminates between 34 and under and 35 and over) and outputs 1 or −1, so the whole can be built from a combination of the simplest kind of classifier.

The difficulty is how to handle cases in which two or more classes are correct (for example, a face image that yields a result like FIG. 3(b)). One way to handle this is to give, as the teacher signal passed to the attribute classifier for training, the internal division value calculated according to the following equation (1).

$$\text{internal division value} = \frac{O_S \times P_S + O_B \times P_B}{P_S + P_B} \qquad (1)$$
Here, O_S is the output value when a given attribute classifier X judges the subject to belong to the younger age range S, O_B is the output value when attribute classifier X judges the subject to belong to the older age range B, P_S is the proportion (frequency) of other people who rated the subject as belonging to the oldest subjective age group included in age range S, and P_B is the proportion (frequency) of other people who rated the subject as belonging to the youngest subjective age group included in age range B.

Specifically, for attribute classifier 2, as shown in FIG. 4(a), the output value when the subject is judged to belong to the younger age range S (34 and under) is O_S = −1, and the output value when the subject is judged to belong to the older age range B (35 and over) is O_B = 1. As shown in FIG. 3(a), the frequency of other people who rated the subject as belonging to the oldest subjective age group included in age range S (the 20 to 34 class) is P_S = 0.6, and the frequency of other people who rated the subject as belonging to the youngest subjective age group included in age range B (the 35 to 49 class) is P_B = 0.3. Therefore, according to equation (1), the internal division value (teacher signal) for attribute classifier 2 is calculated as (−1 × 0.6 + 1 × 0.3) ÷ (0.6 + 0.3) = −0.333, as shown in FIG. 4(b).

Similarly, for attribute classifier 1, as shown in FIG. 4(a), the output value when the subject is judged to belong to the younger age range S (19 and under) is O_S = −1, and the output value when the subject is judged to belong to the older age range B (20 and over) is O_B = 1. As shown in FIG. 3(a), the frequency of other people who rated the subject as belonging to the oldest subjective age group included in age range S (the 19 and under class) is P_S = 0.1, and the frequency of other people who rated the subject as belonging to the youngest subjective age group included in age range B (the 20 to 34 class) is P_B = 0.6. Therefore, according to equation (1), the internal division value (teacher signal) for attribute classifier 1 is calculated as (−1 × 0.1 + 1 × 0.6) ÷ (0.1 + 0.6) = 0.714, as shown in FIG. 4(b).

Similarly, for attribute classifier 3, as shown in FIG. 4(a), the output value when the subject is judged to belong to the younger age range S (49 and under) is O_S = −1, and the output value when the subject is judged to belong to the older age range B (50 and over) is O_B = 1. As shown in FIG. 3(a), the frequency of other people who rated the subject as belonging to the oldest subjective age group included in age range S (the 35 to 49 class) is P_S = 0.3, and the frequency of other people who rated the subject as belonging to the youngest subjective age group included in age range B (the 50 and over class) is P_B = 0. Therefore, according to equation (1), the internal division value (teacher signal) for attribute classifier 3 is calculated as (−1 × 0.3 + 1 × 0) ÷ (0.3 + 0) = −1, as shown in FIG. 4(b).
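The three worked examples above follow directly from equation (1) and can be reproduced with a few lines of code. This is a minimal sketch: the function name and the list layout of the frequencies are assumptions, and the case P_S + P_B = 0, which equation (1) leaves undefined, is not handled.

```python
def internal_division(p_s, p_b, o_s=-1.0, o_b=1.0):
    """Teacher signal from equation (1): frequency-weighted mix of O_S and O_B."""
    return (o_s * p_s + o_b * p_b) / (p_s + p_b)


# Rating frequencies used in the worked examples, youngest class first:
# 19 and under, 20-34, 35-49, 50 and over.
freq = [0.1, 0.6, 0.3, 0.0]

# Classifier k separates class k (younger side) from class k+1 (older side),
# so P_S and P_B are the frequencies of the two classes next to its boundary.
teacher_signals = [internal_division(freq[i], freq[i + 1]) for i in range(3)]
```

Here `teacher_signals[0]`, `teacher_signals[1]`, and `teacher_signals[2]` correspond to attribute classifiers 1, 2, and 3 respectively.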

More simply, the frequencies (proportions) may be treated as equal and the internal division value calculated according to the following equation (2). That is, P_S = P_B = 0.5 may be used in equation (1).

$$\text{internal division value} = \frac{O_S \times 0.5 + O_B \times 0.5}{0.5 + 0.5} \qquad (2)$$

According to equation (2), the internal division value (teacher signal) for attribute classifier 2, for example, is calculated as (−1 × 0.5 + 1 × 0.5) ÷ (0.5 + 0.5) = 0.

Basic experiments and the like have shown that, if the parameter a is set appropriately, cases in which two or more classes are correct without being adjacent to each other are rare. Cases in which the correct classes are not adjacent are therefore ignored. If data in which the correct classes are not adjacent does appear, that data is excluded from the learning data. In addition, because the possibility that the attribute classifiers output contradictory results (for example, a result of both 19 and under and 35 and over) cannot be ruled out, it is advisable to set in advance a rule for handling contradictory results (for example, a rule such as “always give priority to the younger class”).
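Combining the outputs of the three boundary classifiers into one of the four age classes, including the example rule of giving priority to the younger class when the outputs contradict one another, might look like the sketch below. The function names and class labels are hypothetical, and the cascade shown is just one way to realize that example rule.

```python
AGE_CLASSES = ["19 and under", "20-34", "35-49", "50 and over"]


def is_consistent(outputs):
    """True when the outputs form one of the four non-contradictory patterns.

    Each output is -1 (younger side of the boundary) or +1 (older side).
    Consistent patterns never switch from -1 back to +1 at an older boundary.
    """
    return all(a >= b for a, b in zip(outputs, outputs[1:]))


def decode_age_class(outputs):
    """Map the outputs of classifiers 1, 2, 3 to an age class.

    Scanning from the youngest boundary and stopping at the first -1 gives
    priority to the younger class whenever the outputs contradict each other.
    """
    for index, output in enumerate(outputs):
        if output < 0:
            return AGE_CLASSES[index]
    return AGE_CLASSES[-1]


# Contradictory example from the text: "19 and under" and "35 and over" at once.
contradictory = (-1, 1, 1)
resolved = decode_age_class(contradictory)
```

For consistent outputs the cascade simply returns the class between the last +1 boundary and the first −1 boundary.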

Next, the operation of the pattern recognition method 1 is described with reference to FIG. 5. The flowchart in FIG. 5(a) shows the operation flow of the learning processing unit 10. The flowchart in FIG. 5(b) shows the operation flow of the recognition processing unit 20. The flowchart in FIG. 5(c) shows the operation flow when the face direction attribute classifier generation unit 13 creates a teacher signal for the subjective age group.

In FIG. 5(a), the learning data acquisition unit 11 acquires learning data (face image data, learning face direction parameters, attribute data, and aggregate data) (step S11). The learning data acquisition unit 11 supplies the face image data to the learning face area detection unit 12, and supplies the learning face direction parameters and the attribute data to the face direction attribute classifier generation unit 13.

Next, the learning face area detection unit 12 detects a face area in the face image data (step S12). The learning face area detection unit 12 supplies the face cut-out image data to the face direction attribute classifier generation unit 13.

Next, the face direction attribute classifier generation unit 13 generates an attribute classifier (step S13). Specifically, the face direction attribute classifier generation unit 13 executes the flowchart of FIG. 5(c) and generates the attribute classifier using the teacher signal. The face direction attribute classifier generation unit 13 stores the generated attribute classifier in the face direction attribute classifier storage unit 14 (step S14).

Next, the face direction attribute classifier generation unit 13 determines whether attribute classifiers have been generated for all face directions (step S15). If it determines that attribute classifiers have not yet been generated for all face directions (step S15: No), the process returns to step S11. If it determines that attribute classifiers have been generated for all face directions (step S15: Yes), the flowchart shown in FIG. 5(a) ends.

In FIG. 5(b), the target image data acquisition unit 21 acquires target image data (step S21). The target image data acquisition unit 21 supplies the target image data to the recognition face area detection unit 22 and the recognition face area redetection unit 24.

Next, the recognition face area detection unit 22 detects a face area in the target image data (step S22). The recognition face area detection unit 22 supplies the face cut-out image data to the face direction estimation unit 23.

Next, the face direction estimation unit 23 estimates the face direction in the target image based on the face cut-out image data output by the recognition face area detection unit 22 (step S23). The face direction estimation unit 23 supplies the attribute identification face direction parameters indicating the face direction of the subject of the target image (roll angle and scale value) to the recognition face area redetection unit 24, and supplies the attribute identification face direction parameters indicating the face direction of the subject of the target image (yaw angle and pitch angle) to the attribute identification unit 25.

Next, the recognition face area redetection unit 24 cuts out the face area from the target image data again, based on the attribute identification face direction parameters (roll angle and scale value) output by the face direction estimation unit 23 (step S24). The recognition face area redetection unit 24 supplies the face cut-out image data to the attribute identification unit 25.

Next, based on the attribute identification face direction parameters (yaw angle and pitch angle) output by the face direction estimation unit 23, the attribute identification unit 25 selects one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage unit 14 (step S25). The attribute identification unit 25 then inputs the face cut-out image data acquired from the recognition face area redetection unit 24 to the selected attribute classifier or classifiers and obtains an identification result (step S26). The result output unit 26 acquires the identification result from the attribute identification unit 25 and outputs it (step S27). The flowchart shown in FIG. 5(b) then ends.
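Steps S25 and S26 can be sketched as a nearest-neighbor lookup in yaw-pitch space, using the Euclidean distance named in the claims. The sketch below is illustrative only: the function names `select_classifiers` and `fuse_scores`, the representation of each stored classifier by a single (yaw, pitch) pair, and the inverse-distance weighting are assumptions, since the text specifies only a weighted average based on the distance.

```python
import math


def select_classifiers(yaw, pitch, classifier_poses, k=1):
    """Return the k nearest classifiers as (index, distance) pairs.

    classifier_poses is a list of (yaw, pitch) pairs, one per stored
    attribute classifier (an assumed representation).
    """
    dists = [
        (i, math.hypot(yaw - c_yaw, pitch - c_pitch))
        for i, (c_yaw, c_pitch) in enumerate(classifier_poses)
    ]
    dists.sort(key=lambda pair: pair[1])
    return dists[:k]


def fuse_scores(selected, scores, eps=1e-6):
    """Combine classifier outputs with a distance-based weighted average.

    Inverse-distance weights are an assumption; eps avoids division by
    zero when the estimated pose coincides with a classifier's pose.
    """
    weights = [1.0 / (dist + eps) for _, dist in selected]
    total = sum(weights)
    return sum(w * scores[i] for w, (i, _) in zip(weights, selected)) / total
```

With `k=1` this reduces to picking the single nearest classifier; with `k` of two or more, the outputs of nearby classifiers are blended so that the result changes smoothly as the face turns between the trained directions.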

In FIG. 5(c), the face direction attribute classifier generation unit 13 acquires learning data (aggregated data) from the learning data acquisition unit 11. Specifically, the face direction attribute classifier generation unit 13 acquires subjective age group frequency data from the learning data acquisition unit 11, for example as shown in FIG. 3(a) (step S31).

Next, as shown in FIG. 3(b), the face direction attribute classifier generation unit 13 determines whether each class is correct or incorrect based on a preset threshold a (step S32). If the subjective age group frequency data is spread over several of the classes determined to be correct, the face direction attribute classifier generation unit 13 calculates, as described above, an internal division value weighted by the frequencies of the adjacent correct classes and creates the teacher signal from it (step S33) (see, for example, FIG. 4(b)).

Although FIG. 3(b) shows a case in which the correct answer spans two classes, the internal division value can be calculated just as easily when the correct answer spans three or more classes: it is obtained from the sum of the frequency ratios of the correct classes corresponding to the teacher signal "-1" and the sum of the frequency ratios of the correct classes corresponding to the teacher signal "1". As noted above, the internal division value may also be calculated with all frequency ratios treated as equal. If the subjective age group frequency data shows no such spread among the correct classes, 1 or -1 is output to each attribute classifier as the teacher signal in the usual way. Classes determined to be incorrect are ignored.
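The teacher signal calculation of steps S32 and S33 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name `teacher_signal`, the `boundary` parameter (age classes with an index below it are assumed to correspond to teacher signal -1 for a given classifier, the rest to 1), and the use of "greater than or equal to" for the threshold comparison are all hypothetical.

```python
def teacher_signal(freqs, boundary, threshold, weighted=True):
    """Compute a teacher signal in [-1, 1] for one binary attribute classifier.

    freqs     -- proportion of raters who chose each age class, ordered by age
    boundary  -- assumed split: classes with index < boundary map to -1,
                 classes with index >= boundary map to +1
    threshold -- classes whose proportion reaches this value count as correct
                 (the >= comparison is an assumption)
    weighted  -- True: weight by frequency ratio; False: treat all correct
                 classes as equal
    Returns None when no class is judged correct.
    """
    correct = [i for i, f in enumerate(freqs) if f >= threshold]
    if not correct:
        return None
    # Incorrect classes are ignored: only correct classes contribute weight.
    weight = (lambda i: freqs[i]) if weighted else (lambda i: 1.0)
    neg = sum(weight(i) for i in correct if i < boundary)
    pos = sum(weight(i) for i in correct if i >= boundary)
    # Internal division point between -1 and +1 with weights neg and pos.
    return (pos - neg) / (pos + neg)
```

For example, with proportions `[0.0, 0.1, 0.6, 0.3, 0.0]` and a threshold of 0.2, classes 2 and 3 are correct. A classifier whose boundary lies between them receives a teacher signal of about -0.33 in the weighted case and 0 in the equal-weight case, while a classifier whose boundary leaves both correct classes on one side receives exactly 1 or -1.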

The face direction attribute classifier generation unit 13 determines whether teacher signals have been created for all persons (step S34). If the face direction attribute classifier generation unit 13 determines that teacher signals have not been created for all persons (step S34: No), the process returns to step S31. If, on the other hand, the face direction attribute classifier generation unit 13 determines that attribute classifiers have been generated for all face directions (step S34: Yes), the flowchart shown in FIG. 5(c) ends.

As described above, according to the present embodiment, the yaw angle, pitch angle, roll angle, and scale value indicating the pose of the face are estimated after face detection. Based on the result, the face area is cut out again from the input image, and the one or more most appropriate classifiers are selected and their results integrated using a weighted average or the like. This makes it possible to realize attribute identification that is robust to the face direction of the subject and fast in processing.
In addition, because an appropriate teacher signal is calculated and set, the subjective age group can be identified accurately. Specifically, in age group identification, correct and incorrect classes are determined by applying a threshold to the frequency distribution of subjective age groups tallied in advance. The classifier is then given, as the teacher signal, either an internal division value weighted by the frequency ratios of the correct classes (that is, weighted by the proportion of other people who judged the subject to belong to each class determined to be correct) or an internal division value with all frequency ratios treated as equal. This allows the subjective age group to be identified accurately.

Note that the various processes described above for the attribute identification device 1 according to an embodiment of the present invention may be performed by recording a program for executing each process of the attribute identification device 1 on a computer-readable recording medium and causing a computer system to read and execute the program recorded on that medium. The "computer system" here may include an OS and hardware such as peripheral devices. If a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

Furthermore, the "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment and also includes designs and the like within a scope that does not depart from the gist of the present invention.

Description of symbols: 1 attribute identification device, 10 learning processing unit, 11 learning data acquisition unit, 12 learning face area detection unit, 13 face direction attribute classifier generation unit, 14 face direction attribute classifier storage unit, 20 recognition processing unit, 21 target image data acquisition unit, 22 recognition face area detection unit, 23 face direction estimation unit, 24 recognition face area redetection unit, 25 attribute identification unit, 26 result output unit

Claims

An attribute identification device comprising:
a learning data acquisition unit that acquires, as learning data, face image data captured from various directions, learning face direction parameters indicating the direction in which the face of the subject of the face image data is facing, and attribute data of the subject;
a learning face area detection unit that detects the face area of the subject in the face image data acquired by the learning data acquisition unit and outputs face cut-out image data obtained by cutting out that area;
a face direction attribute classifier generation unit that generates, for each direction in which the face of a subject is facing, an attribute classifier for identifying the attribute of the subject, based on a plurality of pieces of the face cut-out image data output by the learning face area detection unit that have the same learning face direction parameter and on the attribute data of each of those pieces of face cut-out image data;
a face direction attribute classifier storage unit that stores the attribute classifiers generated by the face direction attribute classifier generation unit;
a target image data acquisition unit that acquires target image data in which a subject is to be identified;
a recognition face area detection unit that detects the face area of the subject in the target image data acquired by the target image data acquisition unit and outputs face cut-out image data;
a face direction estimation unit that estimates, based on the face cut-out image data output by the recognition face area detection unit, the direction in which the face of the subject of the target image is facing, and outputs attribute identification face direction parameters indicating that direction;
a recognition face area redetection unit that outputs, based on the attribute identification face direction parameters output by the face direction estimation unit, face cut-out image data obtained by detecting the face area of the subject in the target image data again; and
an attribute identification unit that selects, based on the attribute identification face direction parameters output by the face direction estimation unit, one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage unit, inputs the face cut-out image data output by the recognition face area redetection unit to the selected attribute classifier or classifiers, and acquires an identification result relating to the subject of the target image.

The attribute identification device according to claim 1, wherein
the face direction estimation unit outputs, as the attribute identification face direction parameters, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is facing, and
the attribute identification unit selects, based on Euclidean distance, one neighboring attribute classifier when the plurality of attribute classifiers stored in the face direction attribute classifier storage unit are each arranged in the two-dimensional space formed by the yaw angle and the pitch angle output by the face direction estimation unit, and acquires an identification result relating to the subject of the target image.

The attribute identification device according to claim 1, wherein
the face direction estimation unit outputs, as the attribute identification face direction parameters, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is facing, and
the attribute identification unit selects, based on Euclidean distance, two or more neighboring attribute classifiers when the plurality of attribute classifiers stored in the face direction attribute classifier storage unit are each arranged in the two-dimensional space formed by the yaw angle and the pitch angle output by the face direction estimation unit, and acquires an identification result relating to the subject of the target image using a weighted average based on the distance.

The attribute identification device according to any one of the above claims, wherein
the learning data acquisition unit further acquires, as learning data, aggregated data obtained by tallying the proportions of the subjective age groups assigned to the subject of the face image data when the face image data was presented in advance to a large number of persons, and
the face direction attribute classifier generation unit determines, based on a predetermined threshold, whether each subjective age group indicated by the aggregated data is correct or incorrect and, when a plurality of subjective age groups are determined to be correct, generates the attribute classifier by passing to the attribute classifier, as a teacher signal, an internal division value weighted by the proportion of evaluations in each subjective age group determined to be correct, or an internal division value in which the proportions of evaluations are treated as equal.

An attribute identification method comprising:
learning data acquisition means for acquiring, as learning data, face image data captured from various directions, learning face direction parameters indicating the direction in which the face of the subject of the face image data is facing, and attribute data of the subject;
learning face area detection means for detecting the face area of the subject in the face image data acquired by the learning data acquisition means and outputting face cut-out image data obtained by cutting out that area;
face direction attribute classifier generating means for generating, for each direction in which the face of a subject is facing, an attribute classifier for identifying the attribute of the subject, based on a plurality of pieces of the face cut-out image data output by the learning face area detection means that have the same learning face direction parameter and on the attribute data of each of those pieces of face cut-out image data;
a face direction attribute classifier storage unit for storing the attribute classifiers generated by the face direction attribute classifier generating means;
target image data acquisition means for acquiring target image data in which a subject is to be identified;
recognition face area detection means for detecting the face area of the subject in the target image data acquired by the target image data acquisition means and outputting face cut-out image data;
face direction estimating means for estimating, based on the face cut-out image data output by the recognition face area detection means, the direction in which the face of the subject of the target image is facing, and outputting attribute identification face direction parameters indicating that direction;
recognition face area redetection means for outputting, based on the attribute identification face direction parameters output by the face direction estimating means, face cut-out image data obtained by detecting the face area of the subject in the target image data again; and
attribute identification means for selecting, based on the attribute identification face direction parameters output by the face direction estimating means, one or more attribute classifiers from among the plurality of attribute classifiers stored in the face direction attribute classifier storage unit, inputting the face cut-out image data output by the recognition face area redetection means to the selected attribute classifier or classifiers, and acquiring an identification result relating to the subject of the target image.

The attribute identification method according to claim 5, wherein
the face direction estimating means outputs, as the attribute identification face direction parameters, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is facing, and
the attribute identification means selects, based on Euclidean distance, one neighboring attribute classifier when the plurality of attribute classifiers stored in the face direction attribute classifier storage unit are each arranged in the two-dimensional space formed by the yaw angle and the pitch angle output by the face direction estimating means, and acquires an identification result relating to the subject of the target image.

The attribute identification method according to claim 5, wherein
the face direction estimating means outputs, as the attribute identification face direction parameters, a yaw angle and a pitch angle indicating the direction in which the face of the subject of the target image is facing, and
the attribute identification means selects, based on Euclidean distance, two or more neighboring attribute classifiers when the plurality of attribute classifiers stored in the face direction attribute classifier storage unit are each arranged in the two-dimensional space formed by the yaw angle and the pitch angle output by the face direction estimating means, and acquires an identification result relating to the subject of the target image using a weighted average based on the distance.

The attribute identification method according to any one of the above claims, wherein
the learning data acquisition means further acquires, as learning data, aggregated data obtained by tallying the proportions of the subjective age groups assigned to the subject of the face image data when the face image data was presented in advance to a large number of persons, and
the face direction attribute classifier generating means determines, based on a predetermined threshold, whether each subjective age group indicated by the aggregated data is correct or incorrect and, when a plurality of subjective age groups are determined to be correct, generates the attribute classifier by passing to the attribute classifier, as a teacher signal, an internal division value weighted by the proportion of evaluations in each subjective age group determined to be correct, or an internal division value in which the proportions of evaluations are treated as equal.

A program that causes a computer controlling an attribute identification device that identifies an attribute of a subject to execute:
a learning data acquisition step of acquiring, as learning data, face image data captured from various directions, learning face direction parameters indicating the direction in which the face of the subject of the face image data is facing, and attribute data of the subject;
a learning face area detection step of detecting the face area of the subject in the face image data acquired in the learning data acquisition step and outputting face cut-out image data obtained by cutting out that area;
a step of generating, for each direction in which the face of a subject is facing, an attribute classifier for identifying the attribute of the subject, based on a plurality of pieces of the face cut-out image data output in the learning face area detection step that have the same learning face direction parameter and on the attribute data of each of those pieces of face cut-out image data, and storing the attribute classifiers for each face direction in a storage unit;
a target image data acquisition step of acquiring target image data in which a subject is to be identified;
a recognition face area detection step of detecting the face area of the subject in the target image data acquired in the target image data acquisition step and outputting face cut-out image data;
a face direction estimation step of estimating, based on the face cut-out image data output in the recognition face area detection step, the direction in which the face of the subject of the target image is facing, and outputting attribute identification face direction parameters indicating that direction;
a recognition face area redetection step of outputting, based on the attribute identification face direction parameters output in the face direction estimation step, face cut-out image data obtained by detecting the face area of the subject in the target image data again; and
an attribute identification step of selecting, based on the attribute identification face direction parameters output in the face direction estimation step, one or more attribute classifiers from among the plurality of attribute classifiers stored in the storage unit, inputting the face cut-out image data output in the recognition face area redetection step to the selected attribute classifier or classifiers, and acquiring an identification result relating to the subject of the target image.