JP2013058060A

JP2013058060A - Person attribute estimation device, person attribute estimation method and program

Info

Publication number: JP2013058060A
Application number: JP2011195639A
Authority: JP
Inventors: Yasuhisa Matsuba; 靖寿松葉; Satoshi Tabata; 聡田端; Daishi Sei; 大志瀬井; Daisuke Fukutomi; 大介福富; Kazumasa Koizumi; 和真小泉
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2011-09-08
Filing date: 2011-09-08
Publication date: 2013-03-28

Abstract

PROBLEM TO BE SOLVED: To provide a person attribute estimation device that can estimate a person attribute from a feature quantity related to wrinkle shape and that is robust against changes in the photographic environment. SOLUTION: A person attribute estimation device 1 includes: face detection means 110 for detecting a face region from an image of a camera 2 photographing a person browsing an advertising medium 2a; region of interest setting means 111 for specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region and, from the specified positions, setting in the face region a first region of interest, which is a region where wrinkles are likely to form, as a region of interest from which a feature quantity used for estimating the person attribute is obtained; feature quantity extraction means 112 for extracting a gradient histogram from an edge image in the first region of interest as the feature quantity of the face region; and person attribute estimation means 120 for referring to a learning device 130 to specify the person attribute category to which the feature quantity extracted by the feature quantity extraction means 112 belongs.

Description

The present invention relates to a technique for estimating the person attributes (age and gender) of a person who has viewed an advertising medium, for use in measuring the advertising effect of that medium.

Digital signage, which displays advertisements on displays such as liquid crystal displays and projectors, has begun to be installed in a wide range of locations. Digital signage makes it possible not only to provide rich content using video and audio but also to deliver advertisements efficiently according to where the signage is installed, so the digital signage market is expected to grow.

Because digital signage is used as an advertising medium, its advertising effect must be measured objectively, in the same way that audience ratings are measured for television and circulation figures for newspapers and magazines. To measure this effect, as disclosed in Patent Document 1, a device is installed that analyzes video from a camera photographing people viewing the digital signage and counts the number of people who viewed it as an index of its advertising effect.

Counting the number of viewers for each person attribute category requires estimating the person attributes (age and gender) of the people appearing in the camera video. As an invention for estimating the attributes of a person captured in video, Patent Document 2, for example, discloses a device that automatically estimates attributes such as the gender and age group of a target person from a face image of that person captured from an arbitrary face direction.

The device disclosed in Patent Document 2 creates a feature vector from the face image in a captured image of a person and uses the feature vector to estimate a plurality of attribute-similar persons whose gender and age group are similar. It then estimates the gender of the person in the image from the genders of those attribute-similar persons, narrows the attribute-similar persons to the estimated gender, and estimates the person's gender.

Patent Document 3 describes estimating person attributes (gender and age) by extracting feature quantities, such as the color shading of all or part of a feature part and the number of wrinkles, from characteristic parts of the face such as both ends of the mouth, the outer and inner corners of the eyes, and the tip of the nose, and then computing a score against reference feature quantities prepared in advance.

Furthermore, Non-Patent Document 1 proposes a method that uses feature quantities effective for gender and age estimation and independent of how the face appears, such as skin texture related to wrinkles and spots and hue information in the lip and cheek regions. By using a multilayer neural network with function approximation capability as the estimator, the method not only classifies gender but also outputs the estimated age as a continuous value, with application to various fields in mind.

JP 2011-70629 A
JP 2003-242486 A
JP 2008-282089 A

Hironori Takimoto and three others, "Gender and age estimation from face images unaffected by pose variation", IEEJ Transactions C, Vol. 127, No. 7, 2007, pp. 1022-1029

The method disclosed in Non-Patent Document 1 is certainly useful, but it leaves room for improvement. For example, its skin texture feature defines the mean and deviation of edge strength in regions such as the forehead, the outer corners of the eyes, and the nasolabial folds as the wrinkle feature quantity. However, wrinkle shape (the direction and depth of wrinkles) changes with age and gender, so using a feature quantity related to wrinkle shape as the wrinkle feature quantity can be expected to improve the accuracy of person attribute estimation. In addition, the hue feature of Non-Patent Document 1 does not take changes in the shooting environment into account, so making the feature robust against such changes can also be expected to improve estimation accuracy.

The present invention has been made in view of these problems. Its object is to provide a person attribute estimation device, a person attribute estimation method, and a program that can apply a feature quantity related to wrinkle shape to person attribute estimation and that are, in addition, robust against changes in the shooting environment.

A first invention that solves the above problem is a person attribute estimation device comprising: face detection means for detecting a face region from an image of a camera set up to photograph a person; region of interest setting means for specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region and, from the specified positions, setting in the face region a first region of interest, which is a region where wrinkles are likely to form, as a region of interest from which a feature quantity used for person attribute estimation is obtained; feature quantity extraction means for extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and person attribute estimation means for referring to a learning device, which is the result of classifying the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted by the feature quantity extraction means.

In the first invention, the gradient histogram takes the edge direction on the horizontal axis and the edge strength on the vertical axis, and expresses the accumulated edge strength as a histogram. A gradient histogram generated from the edge image in a region of interest therefore characterizes the shape of the objects in that region. Because the region of interest here is the first region of the face, where wrinkles are likely to form, the gradient histogram generated by the feature quantity extraction means of the feature quantity extraction layer represents the wrinkle shape in the image within the first region of interest, and a feature quantity related to wrinkle shape can thus be applied to person attribute estimation.
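The gradient histogram described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `gradient_histogram`, the use of central differences as the edge filter, the 8 bins, and the folding of directions into 0 to 180 degrees are all assumptions made for the example.

```python
import math

def gradient_histogram(gray, bins=8):
    """Accumulate edge strength per direction bin over a grayscale region.

    gray: list of rows of pixel intensities (hypothetical input format).
    Returns `bins` accumulated strengths covering directions in [0, 180).
    """
    height, width = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences stand in for the edge filter (assumption).
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            strength = math.hypot(gx, gy)
            if strength == 0:
                continue
            # Unsigned gradient direction, folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against rounding up to exactly 180.
            hist[int(angle // (180.0 / bins)) % bins] += strength
    return hist
```

A region crossed mainly by wrinkles of one orientation concentrates its accumulated strength in one bin, whereas smooth skin yields a nearly empty histogram, which is why such a histogram can serve as a descriptor of wrinkle shape.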

A second invention that solves the above problem is the person attribute estimation device of the first invention, wherein the region of interest setting means sets in the face region, in addition to the first region of interest, a second region of interest, which is a region used as the reference color of the skin, and a third region of interest corresponding to the lips, and wherein the color distance between the second region of interest and the third region of interest is extracted as the feature quantity in addition to the gradient histogram.

According to the second invention, the color distance between the second region of interest and the third region of interest is a relative value, so it does not vary even when the shooting conditions vary. The person attribute estimation device of this embodiment is therefore robust against variations in shooting conditions.
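The following sketch illustrates why a relative color distance is less sensitive to shooting conditions than an absolute color. It is only an illustration under stated assumptions: the color space, the use of region means, the Euclidean metric, and the helper names `mean_color` and `color_distance` are not specified in the text above, and the lighting change is modeled as a uniform additive shift, which is a simplification.

```python
import math

def mean_color(pixels):
    """Average a list of 3-component color tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def color_distance(skin_pixels, lip_pixels):
    """Euclidean distance between the mean colors of two regions."""
    return math.dist(mean_color(skin_pixels), mean_color(lip_pixels))

def shift(pixels, offset):
    """Model a lighting change as the same additive offset on every channel."""
    return [tuple(c + offset for c in p) for p in pixels]

# Hypothetical pixel samples from the skin reference region and the lip region.
skin = [(200, 150, 130), (204, 154, 134)]
lips = [(180, 90, 100), (184, 94, 104)]

before = color_distance(skin, lips)
after = color_distance(shift(skin, 20), shift(lips, 20))
```

Under this additive model `before` and `after` are equal, because the offset cancels when the two mean colors are subtracted. Real lighting changes are not purely additive, so the invariance holds only approximately in practice.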

A third invention that solves the above problem is the person attribute estimation device of the first or second invention, further comprising learning device generation means for extracting the feature quantity from a plurality of learning face images prepared for each desired person attribute category and creating the learning device by machine learning.

According to the third invention, the learning device used for person attribute estimation can be created. A suitable method for creating the learning device is an SVM that uses nonlinear separation by means of a kernel function.
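To make the idea of nonlinear separation with a kernel function concrete, the sketch below evaluates the decision function of a kernel SVM, f(x) = Σ coef_i · K(sv_i, x) + b. Training is omitted, and the RBF kernel, the value of `gamma`, the support vectors, and the coefficients are hypothetical values chosen for illustration, not parameters from the patent.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Signed score; its sign selects one of two attribute categories."""
    score = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return score + bias

# Hypothetical model with an XOR-like layout, which no straight line separates.
support_vectors = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
dual_coefs = [1.0, 1.0, -1.0, -1.0]  # alpha_i * y_i for each support vector
bias = 0.0

scores = [svm_decision(x, support_vectors, dual_coefs, bias)
          for x in support_vectors]
```

The first two scores are positive and the last two negative, so the kernel decision function separates a point layout that a linear classifier cannot. A two-class function like this would need to be combined, for example one-versus-rest, to cover several age and gender categories; that combination is likewise an assumption.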

A fourth invention is a person attribute estimation method in which a computer executes processing comprising: a face detection step of detecting a face region from an image of a camera set up to photograph a person; a region of interest setting step of specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region and, from the specified positions, setting in the face region a first region of interest, which is a region where wrinkles are likely to form, as a region of interest from which a feature quantity used for person attribute estimation is obtained; a feature quantity extraction step of extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and a person attribute estimation step of referring to a learning device, which is the result of classifying the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted in the feature quantity extraction step.

A fifth invention that solves the above problem is the person attribute estimation method of the fourth invention, wherein, in the region of interest setting step, a second region of interest, which is a region used as the reference color of the skin, and a third region of interest corresponding to the lips are set in the face region in addition to the first region of interest, and, in the feature quantity extraction step, the color distance between the second region of interest and the third region of interest is extracted as the feature quantity in addition to the gradient histogram.

A sixth invention is a computer program for causing a computer to execute processing comprising: a face detection step of detecting a face region from an image of a camera set up to photograph a person; a region of interest setting step of specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region and, from the specified positions, setting in the face region a first region of interest, which is a region where wrinkles are likely to form, as a region of interest from which a feature quantity used for person attribute estimation is obtained; a feature quantity extraction step of extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and a person attribute estimation step of referring to a learning device, which is the result of classifying the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted in the feature quantity extraction step.

A seventh invention that solves the above problem is the computer program of the sixth invention, wherein, in the region of interest setting step, a second region of interest, which is a region used as the reference color of the skin, and a third region of interest corresponding to the lips are set in the face region in addition to the first region of interest, and, in the feature quantity extraction step, the color distance between the second region of interest and the third region of interest is extracted as the feature quantity in addition to the gradient histogram.

As described above, the present invention can provide a person attribute estimation device, a person attribute estimation method, and a program that can estimate person attributes using a feature quantity related to wrinkle shape and that are, in addition, robust against changes in the shooting environment.

A diagram illustrating a system configuration in which the person attribute estimation device is deployed.
A diagram illustrating the hardware of the person attribute estimation device.
A diagram illustrating the layer structure of the application implemented in the person attribute estimation device.
A diagram illustrating the operation of the face detection means of the feature quantity extraction layer.
A supplementary diagram illustrating the operation of the face detection means of the feature quantity extraction layer.
A diagram illustrating an example of the target regions.
A diagram illustrating the operation by which the feature quantity extraction means extracts the gradient histogram.
A diagram illustrating the gradient histogram extracted by the feature quantity extraction means.
A diagram illustrating the operation by which the feature quantity extraction means extracts the color distance between the second region of interest and the third region of interest.
A diagram illustrating the effect of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The embodiment described below is only one embodiment of the present invention; the invention is not limited to it, and various modifications and changes are possible.

FIG. 1 illustrates a system configuration in which the person attribute estimation device 1 according to this embodiment is deployed. The system illustrated in FIG. 1 comprises at least an advertising medium 2a that displays advertisements (digital signage in FIG. 1), a camera 2 that photographs a person 3 viewing the advertising medium 2a, and the person attribute estimation device 1, which estimates the person attributes (age and gender) of the person 3 appearing in the video captured by the camera 2.

In this embodiment, the advertising medium 2a is a medium that presents advertisements to the general public in a public place. FIG. 1 shows the advertising medium 2a as digital signage that displays advertisements on a liquid crystal display, but the advertising medium 2a may also be a show window, an advertising signboard, a particular product display shelf in a store, or the like.

FIG. 2 illustrates the hardware of the person attribute estimation device 1. As shown in FIG. 2, the person attribute estimation device 1, which is implemented on a computer, has as hardware a CPU (Central Processing Unit) 1a connected to a chipset 1b that incorporates a memory interface, a graphics board, and the like. A memory 1c, a display 1d, a keyboard/mouse 1e, a LAN interface 1f, a hard disk 1g, and a camera interface 1h are connected to the chipset 1b.

A computer program for operating the person attribute estimation device 1 is installed on its hard disk 1g. In addition to an operating system that provides basic functions and an image recognition library that runs on that operating system (for example, OpenCV), this program includes an application that uses the image recognition library to provide the function of estimating person attributes.

FIG. 3 illustrates the layer structure of the application implemented in the person attribute estimation device 1. As shown in FIG. 3, the application that uses the image recognition library to estimate person attributes consists of the following layers: an interface layer 10, to which images captured by the camera 2 are input; a feature quantity extraction layer 11, which analyzes the images input to the interface layer 10 and extracts from them the feature quantities used for person attribute estimation; a person attribute learning layer 13, which holds a learning device 130 generated in advance from feature quantities obtained from learning face images; and a person attribute estimation layer 12, which refers to the learning device 130 of the person attribute learning layer 13 and estimates the person attributes from the feature quantities extracted by the feature quantity extraction layer 11.

The layers of the application implemented in the person attribute estimation device 1 are described in more detail below. As shown in FIG. 3, the interface layer 10 includes image input means 100, which is implemented using the camera interface 1h. The images output to the feature quantity extraction layer 11 may be every frame of the video captured by the camera 2, or they may be frames taken at fixed intervals (for example, one frame per minute).

Next, the feature quantity extraction layer 11 is described. As shown in FIG. 3, the feature quantity extraction layer 11 includes: face detection means 110, which detects rectangular face regions in the image input from the camera 2; region of interest setting means 111, which, for each face region detected by the face detection means 110, detects the target regions from which feature quantities are extracted; and feature quantity extraction means 112, which extracts a wrinkle feature quantity and a lip feature quantity from the target regions detected by the region of interest setting means 111 as the feature quantities of the face region detected by the face detection means 110.

Each means included in the feature quantity extraction layer 11 is described below. FIG. 4 illustrates the operation of the face detection means 110 of the feature quantity extraction layer 11, and FIG. 5 is a supplementary diagram illustrating that operation.

In this embodiment, when the face detection means 110 of the feature quantity extraction layer 11 detects a face region in an image input from the interface layer 10, it first removes the background from the image input to the image input means 100 of the interface layer 10 (S1). One way to remove the background is to prepare a background image in advance and, for each pixel position of the input image, set the pixel to "0" (black) if its difference from the pixel at the same position in the background image is below a threshold and to "1" (white) if the difference is at or above the threshold.
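The background removal of step S1 can be sketched as a per-pixel threshold on the difference from the background image. The function name `remove_background`, the nested-list grayscale image format, and the threshold value of 30 are assumptions made for this example.

```python
def remove_background(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    A pixel becomes 0 (black) when |frame - background| < threshold and
    1 (white) when the difference is at or above the threshold.
    """
    return [
        [1 if abs(f - b) >= threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Hypothetical 2 x 3 grayscale images.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 12, 200], [39, 40, 190]]
mask = remove_background(frame, background)
```

In this example only the pixels whose difference reaches the threshold are marked as foreground; the pixel with value 39 (difference 29) stays background, while the pixel with value 40 (difference 30) becomes foreground.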

After removing the background from the image input from the interface layer 10, the face detection means 110 of the feature quantity extraction layer 11 extracts the region that remains after background removal as a blob 4a, as shown in FIG. 5(a), and further extracts the region enclosed by a rectangle whose size depends on the blob 4a as a blob region 4b (S2).
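Step S2 can be sketched as connected-component labeling followed by a bounding rectangle per component. This is an assumed realization: the text does not say how blobs are found or how the rectangle is sized, so the 4-connectivity, the tight bounding box, and the name `extract_blobs` are illustrative choices.

```python
from collections import deque

def extract_blobs(mask):
    """Find 4-connected foreground regions of a binary mask.

    Returns a list of (pixels, (x0, y0, x1, y1)) pairs, where the box
    corners are inclusive and tightly enclose the blob (assumption).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for start_y in range(height):
        for start_x in range(width):
            if mask[start_y][start_x] != 1 or seen[start_y][start_x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[start_y][start_x] = True
            queue, pixels = deque([(start_x, start_y)]), []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            blobs.append((pixels, (min(xs), min(ys), max(xs), max(ys))))
    return blobs
```

Restricting the later face detection to these rectangles reduces the search area to the parts of the frame where something other than the background is present.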

After extracting the blob 4a and the blob region 4b, the face detection means 110 of the feature quantity extraction layer 11 applies the blob region 4b to the image before background removal, performs face detection on the image inside the blob region 4b, and detects face region candidates (S3). For the face detection processing, a method using Haar-like features, such as the one disclosed in Non-Patent Document 1 below, may be used.
<Non-Patent Document 1> Paul Viola and Michael J. Jones: "Rapid Object Detection Using a Boosted Cascade of Simple Features", Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 511-518 (2001)

As shown in FIG. 5(b), when the blob region 4b is applied to the image before background removal and face detection is performed on the image inside the blob region 4b, the results include multiple face regions 4c detected from the same face as well as face regions 4d in which something other than a face was falsely detected as a face. In this embodiment, after performing face detection on the image inside the extracted blob region 4b, the face detection means 110 of the feature quantity extraction layer 11 therefore carries out the following processing.

First, as shown in FIG. 5(c), it determines whether candidate face regions 4c overlap, for example by checking for two face regions whether the distance between the center of one and the center of the other is less than or equal to the sum of the center-to-edge distance of one and the center-to-edge distance of the other. It then merges the face regions 4c presumed to have been detected from the same face into a single face region 4e, for example by selecting the largest of the face regions judged to overlap (S4).

Next, as shown in FIG. 5(d), it refers to the proportion of the blob 4a contained in each face region (here, face regions 4e and 4d) and deletes the candidate face regions 4d in which something other than a face was falsely detected (S5).

Finally, it takes the remaining face region (here, face region 4e) as the face detection result, normalizes it to a predetermined size (S6), and ends the processing.
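Steps S4 and S5 can be sketched as follows. The overlap test follows the center-distance criterion given in the text; everything else is an assumption for illustration: face regions are modeled as squares `(cx, cy, half_size)`, merging is done greedily from the largest box down, and the minimum blob ratio of 0.5 is a hypothetical threshold.

```python
import math

def overlaps(a, b):
    """Center distance <= sum of the two center-to-edge distances."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]

def merge_overlapping(boxes):
    """S4: visit boxes from largest to smallest and drop any box that
    overlaps one already kept, so the largest of each cluster survives."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[2], reverse=True):
        if not any(overlaps(box, k) for k in kept):
            kept.append(box)
    return kept

def blob_ratio(box, mask):
    """Fraction of the box area (clipped to the image) that is foreground."""
    cx, cy, half = box
    total = inside = 0
    for y in range(max(0, cy - half), min(len(mask), cy + half + 1)):
        for x in range(max(0, cx - half), min(len(mask[0]), cx + half + 1)):
            total += 1
            inside += mask[y][x]
    return inside / total if total else 0.0

def drop_false_positives(boxes, mask, min_ratio=0.5):
    """S5: discard candidates that contain too little of the blob."""
    return [b for b in boxes if blob_ratio(b, mask) >= min_ratio]
```

For example, with candidates `(10, 10, 5)`, `(12, 11, 4)`, and `(40, 40, 5)`, the second overlaps the first and is smaller, so `merge_overlapping` keeps only the first and third. A candidate lying mostly over background pixels of the mask from step S1 is then removed by `drop_false_positives`.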

Next, the region of interest setting means 111 of the feature quantity extraction layer 11 is described. In this embodiment, the region of interest setting means 111 fits a face model created in advance to the face region detected by the face detection means 110 and thereby specifies the positions of the facial organs (eyes, nose, lips, and so on) in the face region. Using those positions, it sets in the face region the regions of interest from which the feature quantities used for person attribute estimation are obtained: a first region of interest, which is a region of the face where wrinkles are likely to form; a second region of interest, which is a region of the face used as the reference color of the skin; and a third region of interest, which is the lip region. As the method for fitting the face model to the face region, an ASM (Active Shape Model), which is disclosed in Non-Patent Document 2 below and fits the face model to the face region by an affine transformation, may be used.
<Non-Patent Document 2> T.F. Cootes, D. Cooper, C.J. Taylor and J. Graham, "Active Shape Models - Their Training and Application", Computer Vision and Image Understanding, Vol. 61, No. 1, Jan. 1995, pp. 38-59.

FIG. 6 is a diagram illustrating an example of how the regions of interest are set. In FIG. 6, a total of five regions, namely a forehead region 5a, left and right under-eye regions 5b and 5c, and left and right nasolabial fold regions 5d and 5e, are set as the first region of interest, which is the region of the face where wrinkles are likely to occur. A nose region 5f is set as the second region of interest, which is the region of the face used as the skin reference color, and a lip region 5g is set as the third region of interest, which is the lip region. The region-of-interest setting means 111 of the feature quantity extraction layer 11 obtains these regions of interest in the face region detected by the face detection means 110 from the relative positional relationship between the regions of interest shown in FIG. 6 and the facial organs (eyes, nose, lips, etc.).
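
As a rough illustration of deriving regions of interest from facial organ positions, the following Python sketch places seven rectangles relative to eye, nose, and mouth landmarks. The landmark names, the `Rect` type, and every offset ratio are hypothetical; the specification states only that the regions are obtained from their relative positional relationship to the facial organs.

```python
from typing import Dict, NamedTuple, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def set_regions_of_interest(lm: Dict[str, Point]) -> Dict[str, Rect]:
    """Place regions 5a-5g relative to landmarks (all ratios are illustrative guesses)."""
    lx, ly = lm["left_eye"]
    rx, ry = lm["right_eye"]
    nx, ny = lm["nose"]
    mx, my = lm["mouth"]
    d = abs(rx - lx)              # inter-eye distance, used as the scale unit
    eye_y = (ly + ry) / 2.0
    x_min = min(lx, rx)
    return {
        "5a_forehead": Rect(x_min, eye_y - 0.9 * d, d, 0.4 * d),
        "5b_under_eye_left": Rect(lx - 0.25 * d, ly + 0.15 * d, 0.5 * d, 0.25 * d),
        "5c_under_eye_right": Rect(rx - 0.25 * d, ry + 0.15 * d, 0.5 * d, 0.25 * d),
        "5d_nasolabial_left": Rect(nx - 0.6 * d, ny, 0.3 * d, my - ny),
        "5e_nasolabial_right": Rect(nx + 0.3 * d, ny, 0.3 * d, my - ny),
        "5f_nose": Rect(nx - 0.2 * d, ny - 0.3 * d, 0.4 * d, 0.3 * d),
        "5g_lips": Rect(mx - 0.4 * d, my - 0.15 * d, 0.8 * d, 0.3 * d),
    }
```

Because every rectangle is expressed in units of the inter-eye distance, the same ratios scale with the size of the detected face.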

As described above, in the present embodiment, a face model created in advance is fitted to the face region, the positions of the facial organs in the face region are detected, and the regions of interest are set in the face region using those positions. The regions of interest can therefore be detected accurately not only for a frontal face but also for an oblique face, so the person attribute estimation device 1 of the present embodiment is robust to the orientation of the photographed face.

Next, the feature quantity extraction means 112 of the feature quantity extraction layer 11 will be described. In Non-Patent Document 1, the mean and deviation of the edge strength in each region, such as the forehead, the outer corners of the eyes, and the nasolabial folds, are defined as wrinkle feature quantities. The present embodiment instead focuses on the fact that wrinkle shape (the direction and depth of wrinkles) changes with age and gender: the feature quantity extraction means 112 extracts a gradient histogram, obtained from the edge images in the first region of interest detected by the region-of-interest setting means 111, as a feature quantity representing the wrinkle shape (the direction and depth of wrinkles) in that region.

Furthermore, focusing on the fact that the redness of the lips differs with age and gender, the feature quantity extraction means 112 of the feature quantity extraction layer 11 in the present embodiment extracts the color distance between the second region of interest and the third region of interest detected by the region-of-interest setting means 111 as a feature quantity representing the redness of the lips. Because this color distance is a relative value, it does not vary even when the shooting conditions vary, so the person attribute estimation device 1 of the present embodiment is robust to variations in shooting conditions.

FIG. 7 is a diagram explaining the operation by which the feature quantity extraction means 112 of the feature quantity extraction layer 11 extracts the gradient histogram. This procedure is executed for each face region detected by the face detection means 110.

When extracting the gradient histogram, the feature quantity extraction means 112 of the feature quantity extraction layer 11 first calculates X-direction and Y-direction edge images of the image in the first region of interest, which is the target region of the face where wrinkles are likely to occur (S10). A Sobel filter, which is one kind of spatial filter, may be used to calculate these edge images; to obtain the X-direction and Y-direction edge images, the operators shown in Table 1 may be used as the Sobel filter operators.
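
Table 1 is not reproduced in this text. The sketch below therefore assumes the standard 3x3 Sobel kernels and shows, in plain Python, how X-direction and Y-direction edge images of a grayscale region of interest could be computed. The function names and the zero-valued border handling are choices made for this illustration only.

```python
from typing import List, Tuple

Image = List[List[float]]

# Standard 3x3 Sobel kernels (an assumption; Table 1 is not reproduced here).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(img: Image, kernel: List[List[int]]) -> Image:
    """Correlate img with a 3x3 kernel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = float(sum(
                kernel[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            ))
    return out


def sobel_edges(roi: Image) -> Tuple[Image, Image]:
    """Return the X-direction and Y-direction edge images of a grayscale ROI."""
    return filter3x3(roi, SOBEL_X), filter3x3(roi, SOBEL_Y)
```

For a region containing a purely vertical intensity step, the X-direction image responds at the step while the Y-direction image stays at zero.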

Having calculated the X-direction and Y-direction edge images of the image in the first region of interest, which is the target region of the face where wrinkles are likely to occur, the feature quantity extraction means 112 of the feature quantity extraction layer 11 uses Equation 1 to calculate the gradient strength and gradient direction for each pixel of the X-direction and Y-direction edge images (S11).
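
Equation 1 itself is not reproduced in this text. A common definition, assumed here rather than taken from the specification, computes the gradient strength m and the gradient direction θ at pixel (x, y) from the X-direction and Y-direction edge images f_x and f_y as follows:

```latex
m(x, y) = \sqrt{f_x(x, y)^2 + f_y(x, y)^2}, \qquad
\theta(x, y) = \operatorname{atan2}\bigl(f_y(x, y),\, f_x(x, y)\bigr)
```

Using the two-argument arctangent distinguishes all four quadrants, so the direction can be mapped onto a full 0 to 360 degree range.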

Having calculated the gradient strength and gradient direction for each pixel of the X-direction and Y-direction edge images, the feature quantity extraction means 112 of the feature quantity extraction layer 11 generates from them a gradient histogram with the gradient direction on the X axis and the gradient strength on the Y axis (S12), and ends the procedure.

FIG. 8 is a diagram explaining the gradient histogram extracted by the feature quantity extraction means 112 of the feature quantity extraction layer 11. The gradient histogram could treat every direction (0 to 360 degrees) individually, but in the present embodiment the feature quantity extraction means 112 generates a 36-direction gradient histogram by dividing the full range (0 to 360 degrees) into 10-degree steps.
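
The 36-direction histogram can be sketched as follows. This is a minimal illustration that assumes each pixel adds its gradient strength to the bin of its gradient direction (one reading of "direction on the X axis, strength on the Y axis") and that direction is computed with `atan2`; the function name and signature are hypothetical.

```python
import math
from typing import List


def gradient_histogram(ex: List[List[float]], ey: List[List[float]],
                       bins: int = 36) -> List[float]:
    """Accumulate gradient strength into direction bins covering 0-360 degrees."""
    width = 360.0 / bins                  # 10 degrees per bin when bins == 36
    hist = [0.0] * bins
    for row_x, row_y in zip(ex, ey):
        for fx, fy in zip(row_x, row_y):
            strength = math.hypot(fx, fy)
            if strength == 0.0:
                continue                  # flat pixels carry no direction
            angle = math.degrees(math.atan2(fy, fx)) % 360.0
            hist[int(angle // width) % bins] += strength
    return hist
```

The final `% bins` guards against a direction that rounds to exactly 360 degrees, which would otherwise index one past the last bin.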

A gradient histogram generated from the X-direction and Y-direction edge images of the image in a target region represents the shape of the object in that region. Here the target region is the first region of interest, where wrinkles are likely to occur, so the gradient histogram generated by the feature quantity extraction means 112 of the feature quantity extraction layer 11 represents the wrinkle shape in the image in the first region of interest.

Next, the operation by which the feature quantity extraction means 112 of the feature quantity extraction layer 11 extracts the color distance between the second region of interest and the third region of interest will be described. FIG. 9 is a diagram explaining this operation. The procedure is executed for each face region detected by the face detection means 110.

When extracting the lip feature quantity, the feature quantity extraction means 112 of the feature quantity extraction layer 11 obtains color histograms, in color spaces that contain a large redness component, from the image in the reference skin region (the second region of interest) and from the image in the third region of interest (S20). It then extracts the difference between these color histograms as the color distance (S21) and ends the procedure. In the present embodiment, the color spaces containing a large redness component are the three color spaces H (hue), S (saturation), and R (red), and the Bhattacharyya distance is calculated according to Equation 2 as the difference between the color histograms; the Mahalanobis distance may be calculated instead.
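
Equation 2 is not reproduced in this text. The sketch below assumes one common form of the Bhattacharyya distance for normalized histograms, sqrt(1 - Σ sqrt(p_i q_i)); other texts define it as the negative logarithm of the same sum. The bin count, value range, and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def color_histogram(values: Sequence[float], bins: int = 16,
                    max_value: float = 256.0) -> List[float]:
    """Normalized histogram of one color component (e.g. H, S, or R) over a region."""
    hist = [0.0] * bins
    for v in values:
        hist[min(int(v * bins / max_value), bins - 1)] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """sqrt(1 - sum(sqrt(p_i * q_i))) for two normalized histograms of equal length."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))
```

Calling `bhattacharyya_distance` once per component (H, S, and R) on the nose-region and lip-region histograms would yield the three color distances described above. With this form the distance is 0 for identical histograms and 1 for histograms that share no occupied bin.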

In this way, as the feature quantities of a face image, the feature quantity extraction means 112 of the feature quantity extraction layer 11 creates a 36-direction gradient histogram for each of the five first regions of interest and extracts the color distance (here, the Bhattacharyya distance) between the second region of interest and the third region of interest for each of the three color spaces H (hue), S (saturation), and R (red). The feature quantity extracted from one face region is therefore a 183-dimensional vector (5 × 36 + 3 = 183).
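
The arithmetic behind the 183 dimensions (5 regions × 36 directions + 3 color distances) can be made concrete with a short sketch. The function name and the ordering of the concatenated parts are assumptions made for illustration; the specification does not state an ordering.

```python
from typing import List, Sequence


def build_feature_vector(wrinkle_histograms: Sequence[Sequence[float]],
                         color_distances: Sequence[float]) -> List[float]:
    """Concatenate 5 histograms of 36 bins and 3 color distances into 183 values."""
    if len(wrinkle_histograms) != 5 or any(len(h) != 36 for h in wrinkle_histograms):
        raise ValueError("expected 5 gradient histograms of 36 bins each")
    if len(color_distances) != 3:
        raise ValueError("expected 3 color distances (H, S, R)")
    vector: List[float] = []
    for hist in wrinkle_histograms:
        vector.extend(hist)           # 5 * 36 = 180 wrinkle-shape values
    vector.extend(color_distances)    # + 3 lip-redness values = 183
    return vector
```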

Next, the learning device 130 of the person attribute learning layer 13 will be described. The person attribute learning layer 13 is provided with the learning device 130, which is used when estimating person attributes (here, age and gender) from the feature quantities (here, the gradient histograms and color distances) obtained from the feature quantity extraction layer 11, and with learning device generation means 131, which creates the learning device 130 by machine learning.

The learning device 130 of the person attribute learning layer 13 is generated in advance by the learning device generation means 131, before person attributes are estimated. In the present embodiment, the learning device generation means 131 of the person attribute learning layer 13 generates the learning device 130 using an SVM (Support Vector Machine).

When the learning device 130 is generated using an SVM (Support Vector Machine), a plurality of learning face images is prepared in advance for each person attribute category (here, eight categories in total: the four age brackets 0 to 19, 20 to 34, 35 to 49, and 50 and over, for each of the two genders). The 183-dimensional feature quantity described above is extracted from each learning face image, and an SVM that uses non-linear separation with a kernel function separates the feature quantities extracted from the learning face images into the eight person attribute categories. The result of this separation is used as the learning device 130.
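
To train any multi-class classifier on these categories, each learning face image needs a class label. The following sketch encodes the eight categories (two genders × four age brackets) as integers 0 to 7. The label ordering, the gender strings, and the function names are hypothetical, and the SVM training itself is left to whatever SVM implementation is used.

```python
from typing import Tuple

GENDERS = ("male", "female")
AGE_BRACKETS = ((0, 19), (20, 34), (35, 49), (50, None))  # None means no upper bound


def category_index(gender: str, age: int) -> int:
    """Map (gender, age) to one of 8 class labels: 4 age brackets per gender."""
    g = GENDERS.index(gender)
    for a, (low, high) in enumerate(AGE_BRACKETS):
        if age >= low and (high is None or age <= high):
            return g * len(AGE_BRACKETS) + a
    raise ValueError("age must be non-negative")


def category_name(index: int) -> Tuple[str, str]:
    """Inverse mapping, returning (gender, age bracket description)."""
    g, a = divmod(index, len(AGE_BRACKETS))
    low, high = AGE_BRACKETS[a]
    bracket = f"{low}+" if high is None else f"{low}-{high}"
    return GENDERS[g], bracket
```

A classifier trained on pairs of 183-dimensional vectors and these labels would return a label that `category_name` turns back into a gender and age bracket.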

Next, the person attribute estimation layer 12 will be described. The person attribute estimation means 120 of the person attribute estimation layer 12 refers to the learning device 130 of the person attribute learning layer 13 and estimates the person attributes (here, age and gender) from the feature quantities (here, the wrinkle feature quantity and the lip feature quantity) obtained from the feature quantity extraction layer 11. Specifically, when estimating the person attributes, the person attribute estimation means 120 refers to the learning device 130 of the person attribute learning layer 13, identifies the person attribute category corresponding to the region of the vector space to which the 183-dimensional feature quantity obtained from the feature quantity extraction layer 11 belongs, and specifies that category as the person attribute of the face region from which the feature quantity was extracted.

Finally, the effects of the present invention will be described. FIG. 10 is a diagram explaining the effects of the present invention: FIG. 10(a) shows the age estimation results and FIG. 10(b) shows the gender estimation results. The percentages in FIG. 10 are identification rates; for example, No. 2 in FIG. 10(a) indicates that women aged 20 to 34 were correctly estimated at rates of 87%, 82%, ... at the respective angles.

In the age estimation results of FIG. 10, the identification rate for the young group (19 years old and under) is very poor. This is thought to be because the number of learning face images for the young group used when generating the learning device 130 was small, at 15 people. Generating the learning device 130 from more learning face images would be expected to improve the identification rate.

DESCRIPTION OF SYMBOLS
1 Person attribute estimation device
10 Interface layer
100 Image input means
11 Feature quantity extraction layer
110 Face detection means
111 Region-of-interest setting means
112 Feature quantity extraction means
12 Person attribute estimation layer
120 Person attribute estimation means
13 Person attribute learning layer
130 Learning device
131 Learning device generation means
2 Camera
2a Advertising medium
3 Person

Claims

A person attribute estimation device comprising: face detection means for detecting a face region from an image from a camera installed so as to photograph a person; region-of-interest setting means for specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region, and for setting in the face region, from the specified positions of the facial organs, a first region of interest, which is a region where wrinkles are likely to occur, as a region of interest for obtaining a feature quantity used for estimating person attributes; feature quantity extraction means for extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and person attribute estimation means for referring to a learning device, which is the result of separating the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted by the feature quantity extraction means.

The person attribute estimation device according to claim 1, wherein the region-of-interest setting means sets in the face region, in addition to the first region of interest, a second region of interest that is a region used as the skin reference color and a third region of interest corresponding to the lips, and the feature quantity extraction means extracts, as the feature quantity, a color distance between the second region of interest and the third region of interest in addition to the gradient histogram.

The person attribute estimation device according to claim 1 or claim 2, further comprising learning device generation means for extracting the feature quantity from a plurality of learning face images prepared for each desired person attribute category and creating the learning device by machine learning.

A person attribute estimation method in which a computer executes processing comprising: a face detection step of detecting a face region from an image from a camera installed so as to photograph a person; a region-of-interest setting step of specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region, and setting in the face region, from the specified positions of the facial organs, a first region of interest, which is a region where wrinkles are likely to occur, as a region of interest for obtaining a feature quantity used for estimating person attributes; a feature quantity extraction step of extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and a person attribute estimation step of referring to a learning device, which is the result of separating the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted in the feature quantity extraction step.

In the region of interest setting step, in addition to the first region of interest, a second region of interest that is a region used as a skin reference color and a third region of interest corresponding to the lips are set as the face region, and the feature 6. The person attribute according to claim 5, wherein, in the quantity extraction step, a color distance between the second region of interest and the third region of interest is extracted as the feature amount in addition to the gradient histogram. Estimation method.

A computer program for causing a computer to execute processing comprising: a face detection step of detecting a face region from an image from a camera installed so as to photograph a person; a region-of-interest setting step of specifying the positions of facial organs in the face region by fitting a face model created in advance to the face region, and setting in the face region, from the specified positions of the facial organs, a first region of interest, which is a region where wrinkles are likely to occur, as a region of interest for obtaining a feature quantity used for estimating person attributes; a feature quantity extraction step of extracting, as the feature quantity, a gradient histogram from an edge image in the first region of interest; and a person attribute estimation step of referring to a learning device, which is the result of separating the feature quantities extracted from a plurality of learning face images into person attribute categories, and specifying the person attribute category corresponding to the face region using the feature quantity extracted in the feature quantity extraction step.

In the region of interest setting step, in addition to the first region of interest, a second region of interest that is a region used as a skin reference color and a third region of interest corresponding to the lips are set as the face region, and the feature The quantity extraction step includes extracting a color distance between the second region of interest and the third region of interest as the feature amount in addition to the gradient histogram. Computer programs listed in