JP2012053813A

JP2012053813A - Person attribute estimation device, person attribute estimation method and program

Info

Publication number: JP2012053813A
Application number: JP2010197473A
Authority: JP
Inventors: Yasuhisa Matsuba; 靖寿松葉; Satoshi Tabata; 聡田端; Kazumasa Koizumi; 和真小泉; Tetsutaro Ono; 徹太郎小野
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2010-09-03
Filing date: 2010-09-03
Publication date: 2012-03-15

Abstract

PROBLEM TO BE SOLVED: To provide a person attribute estimation device and the like capable of accurately estimating person attributes from a face image.
SOLUTION: A wrinkle feature amount, a spot feature amount, and a lip feature amount in a face area are calculated as the feature amounts for estimating person attributes. The spot feature amount is calculated for each block from the difference between the image after mask processing with a skin mask and a blurred image obtained by further smoothing that masked image. The wrinkle feature amount is calculated for each block as the intensity and direction of wrinkles in the image after mask processing with the skin mask. The lip feature amount is calculated, for a lip area extracted using the skin mask, as a color comparison value with respect to a reference color in the original face area. Using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated accurately, with improved robustness against environmental variation.

Description

The present invention relates to a person attribute estimation device and the like for estimating attributes of a person such as sex and age.

In recent years, information distribution by digital signage has attracted attention as an advertising method, alongside conventional media such as outdoor signboards, television, newspapers, and magazines.
Digital signage is a medium that distributes information on display equipment in locations such as outdoor spaces, storefronts, and transportation facilities. It makes it possible to provide expressive content such as video and audio, and it is expected to make information distribution more efficient through flexible program creation, timely presentation of information over a network, and narrowing down of the target audience.
The attributes (age and sex) of the people viewing an advertisement are also frequently used as an index for measuring the effect of the advertisement. For example, the attributes of advertisement viewers can be applied to marketing indicators, or the content shown on the digital signage can be switched in a timely manner according to the acquired attributes of the viewers, enabling content distribution that benefits both the information provider and the viewers.

As a conventional person attribute estimation method, the method shown in Non-Patent Document 1 has been proposed. Because spots and wrinkles that appear on facial skin after adolescence are regarded as important indicators of age, this method, which estimates sex and age from a face image, uses skin texture features and hue features as the feature amounts for estimation. Specifically, large wrinkles are detected in the face image with a Sobel filter and fine wrinkles with a Gabor jet, and the mean and deviation of the edge strengths of these wrinkles are defined as the wrinkle feature. For spots, a blurred image is generated by applying a mean filter to the original image, and the contours of facial parts and the wrinkles are removed by thresholding from the difference image between the original image and the blurred image, leaving the spots. As hue features, for the color information in the lip and cheek regions, the mean and deviation of the hue and saturation among the stimulus values of the modified HSV color system, and of the redness component a* of the L*a*b* color system, are defined as feature amounts.

Non-Patent Document 1: Hironori Takimoto and three others, "Gender and age estimation from face images unaffected by pose variation", IEEJ Transactions C (電学論C), Vol. 127, No. 7, 2007, pp. 1022-1029

However, because the method of Non-Patent Document 1 extracts spots from the difference between the original image and the blurred image, the contours of the face, eyes, nose, and mouth may also be extracted as skin texture (spots and wrinkles), which leaves room for improvement. In addition, the hue features described above are extracted only from the lip and cheek regions, and the averages in each color space (L*a*b*, HSV) are used as feature amounts, which raises concerns about robustness to environmental variation (such as the effects of lighting).

The present invention has been made in view of these problems, and its object is to provide a person attribute estimation device, a person attribute estimation method, and a program capable of accurately estimating the attributes of a person from a face image even when there is some environmental variation.

To solve the problem described above, a first invention is a person attribute estimation device comprising: face detection means for detecting the face region of a person from input image data; skin mask generation means for generating a skin mask indicating the skin region within the face region; spot feature amount calculation means for extracting spots on the skin using the difference between a mask-processed image, in which the skin portion of the original face region is extracted using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount; wrinkle feature amount calculation means for calculating the intensity and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion of the original face region is extracted using the skin mask; lip feature amount calculation means for calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask; attribute learning means for creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input person images for learning and on attribute information about those person images; and estimation means for calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image and estimating the person attributes of that person image based on the calculated feature amounts and the person attribute learning data.

According to the first invention, face regions are detected in a plurality of person images for learning, and the spot feature amount calculation means, the wrinkle feature amount calculation means, and the lip feature amount calculation means calculate the spot feature amount, the wrinkle feature amount, and the lip feature amount, respectively, from the mask-processed image obtained by mask processing with the skin mask, to create person attribute learning data. The spot feature amount, the wrinkle feature amount, and the lip feature amount are also calculated for an input person image, and the person attributes of that image are estimated based on the calculated feature amounts and the person attribute learning data. The spot feature amount is calculated from the difference between the image after mask processing with the skin mask and the blurred image obtained by further smoothing it, so a spot feature amount that is little affected by the facial contour, eyes, nose, mouth, glasses, and the like can be obtained. The wrinkle feature amount takes both the intensity and the direction of wrinkles into account, so a wrinkle feature amount that reflects age and sex differences can be calculated accurately. The lip feature amount is calculated, for the lip region extracted using the skin mask, as a color comparison value with respect to a reference color in the original face region, so feature extraction that is highly robust to environmental variation in the image is possible. Using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated accurately.

In the first invention, the wrinkle feature amount calculation means preferably comprises gradient calculation means for calculating a gradient intensity and a gradient direction for each pixel of the mask-processed image, and gradient histogram generation means for generating, for each block into which the face region is divided, a gradient histogram indicating the cumulative gradient intensity for each gradient direction based on the gradient intensity and gradient direction calculated by the gradient calculation means, and preferably uses the maximum gradient intensity of the gradient histogram of each block as the wrinkle feature amount.

The present invention focuses on the fact that the direction in which wrinkles appear tends to differ with age, and obtains the wrinkle feature amount from the intensity and direction of wrinkles as described above. Specifically, the wrinkle feature amount is defined as the maximum gradient intensity (wrinkle intensity) of the gradient histogram. The gradient histogram is obtained from the gradient intensity (wrinkle intensity) and gradient direction (wrinkle direction) of each pixel of the mask-processed image. That is, in the present invention, the wrinkle intensity (gradient intensity) is accumulated per wrinkle direction (gradient direction) within each block, and the largest accumulated value is taken as the wrinkle feature amount of that block. Using the wrinkle feature amount obtained in this way for age estimation and the like contributes substantially to improving the accuracy of age estimation compared with conventional methods.
In conventional methods, on the other hand, for example the method described in Non-Patent Document 1, the wrinkle feature amount is defined as the mean and deviation of the edge strength within several wrinkle feature extraction regions (forehead, outer corners of the eyes, nasolabial folds, both ends of the lip region, and tip of the chin) determined by ARSM (the feature point determination method proposed in Non-Patent Document 1). The conventional method therefore differs from the present invention in that it does not consider the direction of wrinkles and does not use a gradient histogram.
In addition, for hue features, the conventional method defines as feature amounts the mean and deviation of the hue and saturation among the stimulus values of the modified HSV color system, and of the redness component a* of the L*a*b* color system, for the color information in the lip and cheek regions. In contrast, the present invention calculates the lip feature amount as a color comparison value with respect to a reference color in the original face region, which is determined separately for each image captured under various shooting environments, so feature extraction that is highly robust to environmental variation in the image is possible. Using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated more accurately than before.
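The per-block wrinkle feature described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `wrinkle_features`, the central-difference gradient, the block size, the number of direction bins, and the rule of skipping pixels next to the mask boundary are all assumptions made for the example. The image is a grayscale 2D list and the mask is a 2D list of 0/1 values of the same size.

```python
import math

def wrinkle_features(image, mask, block_size=4, n_bins=8):
    """Return one value per block: the largest bin of that block's gradient histogram."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            histogram = [0.0] * n_bins  # accumulated gradient strength per direction
            for y in range(top, min(top + block_size, height)):
                for x in range(left, min(left + block_size, width)):
                    # Assumption: use only pixels whose four neighbours exist and are
                    # skin, so the mask boundary itself does not register as a wrinkle.
                    if not (0 < y < height - 1 and 0 < x < width - 1):
                        continue
                    if not (mask[y][x] and mask[y][x - 1] and mask[y][x + 1]
                            and mask[y - 1][x] and mask[y + 1][x]):
                        continue
                    gx = image[y][x + 1] - image[y][x - 1]
                    gy = image[y + 1][x] - image[y - 1][x]
                    strength = math.hypot(gx, gy)
                    # Unsigned direction in [0, pi), quantised into n_bins bins.
                    direction = math.atan2(gy, gx) % math.pi
                    bin_index = min(int(direction / math.pi * n_bins), n_bins - 1)
                    histogram[bin_index] += strength
            # Wrinkle feature of the block: the strongest accumulated direction.
            features.append(max(histogram))
    return features
```

Taking the maximum bin keeps only the dominant wrinkle direction of each block, which is what distinguishes this feature from a plain mean of edge strengths.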

In the first invention, the spot feature amount calculation means preferably generates the blurred image by applying an MTM filter to the mask-processed image, calculates, for each block into which the face region is divided, the average of the difference between the mask-processed image and the blurred image, and uses that average as the spot feature amount.
Using the MTM filter makes it possible to exclude large spots, moles, and the like, whose pixel values differ greatly from their surroundings, from the blurring process and thus from consideration in the spot feature amount. As a result, feature amounts can be extracted only for spots that correlate strongly with age and the like.
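As a rough illustration of the spot feature, the sketch below blurs the masked image with a textbook modified trimmed mean (MTM) filter and averages the difference per block. Whether this textbook variant matches the filter used by the invention, and how it treats large spots and moles, is an assumption; the names `mtm_filter` and `spot_features`, the window radius, the threshold `q`, and the use of the absolute difference are likewise hypothetical choices for the example.

```python
from statistics import median_low

def mtm_filter(image, mask, radius=1, q=10):
    """Textbook MTM: average the skin pixels in the window lying within q of its median."""
    height, width = len(image), len(image[0])
    blurred = [list(row) for row in image]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # non-skin pixels are left untouched
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))
                      if mask[j][i]]
            centre = median_low(window)  # always an actual sample, so `kept` is never empty
            kept = [v for v in window if abs(v - centre) <= q]
            blurred[y][x] = sum(kept) / len(kept)
    return blurred

def spot_features(image, mask, block_size=4, radius=1, q=10):
    """Return, per block, the mean absolute difference between image and MTM-blurred image."""
    blurred = mtm_filter(image, mask, radius, q)
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            diffs = [abs(image[y][x] - blurred[y][x])
                     for y in range(top, min(top + block_size, height))
                     for x in range(left, min(left + block_size, width))
                     if mask[y][x]]
            # Blocks without any skin pixel get a neutral value of 0.0 (assumption).
            features.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return features
```

Restricting both the window and the block average to skin pixels keeps the facial contour, eyes, and mouth out of the difference, which is the point of applying the skin mask first.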

In the first invention, the lip feature amount calculation means preferably extracts the lip region from the face region using an inverted mask obtained by inverting the skin mask, generates histograms over the target color space of hue H, saturation S, and redness R for the lip region and for a reference region within the original face region, and calculates the distance between these histograms as the lip feature amount.
Because the lip feature amount is obtained from the difference (histogram distance) from the reference region of the original face region, feature extraction that is robust to environmental variation such as illumination intensity is possible.
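The histogram comparison can be sketched as below. The passage does not state which histogram distance is used or how redness R is defined, so this example assumes an L1 distance between normalised histograms and treats R as the red channel of RGB; `channel_histogram`, `hsr_channels`, and `lip_feature` are hypothetical names. Pixels are given as lists of `(r, g, b)` tuples with components in the range 0 to 1, already selected by the inverted mask (lip region) or taken from the reference region.

```python
import colorsys

def channel_histogram(values, n_bins=8):
    """Normalised histogram of values in [0, 1]."""
    histogram = [0.0] * n_bins
    for value in values:
        histogram[min(int(value * n_bins), n_bins - 1)] += 1.0
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram

def hsr_channels(pixels):
    """Split RGB pixels into hue, saturation, and redness lists (redness = red channel, assumed)."""
    hues, saturations, reds = [], [], []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r, g, b)
        hues.append(h)
        saturations.append(s)
        reds.append(r)
    return hues, saturations, reds

def lip_feature(lip_pixels, reference_pixels, n_bins=8):
    """Return one histogram distance per channel (H, S, R) between lip and reference region."""
    feature = []
    for lip_channel, ref_channel in zip(hsr_channels(lip_pixels),
                                        hsr_channels(reference_pixels)):
        lip_hist = channel_histogram(lip_channel, n_bins)
        ref_hist = channel_histogram(ref_channel, n_bins)
        # L1 distance between the two normalised histograms (assumed metric).
        feature.append(sum(abs(a - b) for a, b in zip(lip_hist, ref_hist)))
    return feature
```

Because both histograms come from the same image, a global change in lighting shifts them together, so the distance changes less than the absolute lip color would.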

A second invention is a person attribute estimation device comprising: face detection means for detecting the face region of a person from input image data; skin mask generation means for generating a skin mask indicating the skin region within the face region; spot feature amount calculation means for extracting spots on the skin using the difference between a mask-processed image, in which the skin portion of the original face region is extracted using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount; wrinkle feature amount calculation means for calculating the intensity and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion of the original face region is extracted using the skin mask; lip feature amount calculation means for calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask; memory means for storing person attribute learning data created by a predetermined learning algorithm based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of person images for learning and on attribute information given for those person images; and estimation means for calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image and estimating the person attributes of that person image based on the calculated feature amounts and the person attribute learning data.

According to the second invention, person attribute learning data created by a predetermined learning algorithm, based on the wrinkle, spot, and lip feature amounts of a plurality of person images for learning and on the attribute information given for those images, is stored in the memory means in advance; the spot feature amount, the wrinkle feature amount, and the lip feature amount are calculated for an input person image; and the person attributes of that image are estimated based on the calculated feature amounts and the person attribute learning data. The spot feature amount is calculated from the difference between the image after mask processing with the skin mask and the blurred image obtained by further smoothing it, so a spot feature amount that is little affected by the facial contour, eyes, nose, mouth, glasses, and the like can be obtained. The wrinkle feature amount takes both the intensity and the direction of wrinkles into account, so a wrinkle feature amount that reflects age and sex differences can be calculated accurately. The lip feature amount is calculated, for the lip region extracted using the skin mask, as a color comparison value with respect to a reference color in the original face region, so feature extraction that is highly robust to environmental variation in the image is possible. Based on person attribute learning data created in advance from the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated accurately for the input person image.
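The split between stored learning data and estimation can be illustrated with a deliberately simple stand-in. The passage only says that a predetermined learning algorithm is used, so the nearest-neighbour lookup below is purely an assumption chosen for brevity, and `build_learning_data` and `estimate_attributes` are hypothetical names. Each sample carries its spot, wrinkle, and lip feature amounts plus the attribute information given for the person image.

```python
import math

def build_learning_data(samples):
    """Turn (spot, wrinkle, lip, attributes) samples into stored (vector, attributes) pairs."""
    return [(list(spot) + list(wrinkle) + list(lip), attributes)
            for spot, wrinkle, lip, attributes in samples]

def estimate_attributes(learning_data, spot, wrinkle, lip):
    """Return the attributes of the stored sample closest to the query (1-nearest neighbour)."""
    query = list(spot) + list(wrinkle) + list(lip)
    _vector, attributes = min(learning_data,
                              key=lambda entry: math.dist(entry[0], query))
    return attributes
```

In practice the three feature groups have very different scales, so some normalisation before the distance computation would likely be needed; it is omitted here to keep the sketch short.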

In the second invention, it is desirable to further comprise imaging means for capturing an image, and person image input means for inputting the image captured by the imaging means as the target of person attribute estimation by the estimation means.
This allows images captured by the imaging means to be input as targets of person attribute estimation. For example, if viewers of digital signage are captured by the imaging means and the images of the persons contained in the captured image are input as person attribute estimation targets, the estimation results of the person attribute estimation device of the present invention can be used to measure the effect of the digital signage.

A third invention is a person attribute estimation method in which a computer executes processing comprising: a face detection step of detecting the face region of a person from input image data; a skin mask generation step of generating a skin mask indicating the skin region within the face region; a spot feature amount calculation step of extracting spots on the skin using the difference between a mask-processed image, in which the skin portion of the original face region is extracted using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount; a wrinkle feature amount calculation step of calculating the intensity and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask; a lip feature amount calculation step of calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask; an attribute learning step of creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input person images for learning and on attribute information about those person images; and an estimation step of calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image and estimating the person attributes of that person image based on the calculated feature amounts and the person attribute learning data.

According to the third invention, face regions are detected in a plurality of person images for learning, and the wrinkle feature amount, the spot feature amount, and the lip feature amount are each calculated from the mask-processed image obtained by mask processing with the skin mask, to create person attribute learning data. The spot feature amount, the wrinkle feature amount, and the lip feature amount are also calculated for an input person image, and the person attributes of that image are estimated based on the calculated feature amounts and the person attribute learning data. The spot feature amount is calculated from the difference between the image after mask processing with the skin mask and the blurred image obtained by further smoothing it, so a spot feature amount that is little affected by the facial contour, eyes, nose, mouth, glasses, and the like can be obtained. The wrinkle feature amount takes both the intensity and the direction of wrinkles into account, so a wrinkle feature amount that reflects age and sex differences can be calculated accurately. The lip feature amount is calculated, for the lip region extracted using the skin mask, as a color comparison value with respect to a reference color in the original face region, so feature extraction that is highly robust to environmental variation in the image is possible. Using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated accurately.

In the third invention, the wrinkle feature amount calculation step preferably includes a gradient calculation step of calculating a gradient intensity and a gradient direction for each pixel of the mask-processed image, and a gradient histogram generation step of generating, for each block into which the face region is divided, a gradient histogram indicating the cumulative gradient intensity for each gradient direction based on the calculated gradient intensity and gradient direction, and preferably uses the maximum gradient intensity of the gradient histogram of each block as the wrinkle feature amount.
In this way, the person attribute estimation method of the present invention focuses on the fact that the direction of wrinkles tends to differ with age, and defines the wrinkle feature amount as the maximum gradient intensity of the gradient histogram. The gradient histogram is obtained from the gradient intensity and gradient direction of each pixel of the mask-processed image. That is, in the present invention, the wrinkle intensity (gradient intensity) is accumulated per wrinkle direction (gradient direction) within each block, and the largest accumulated value is taken as the wrinkle feature amount of that block. Using the wrinkle feature amount obtained in this way for age estimation and the like contributes substantially to improving the accuracy of age estimation compared with conventional methods.

A fourth invention is a program, written in a computer-readable format, for causing a computer to execute processing comprising: a face detection step of detecting the face region of a person from input image data; a skin mask generation step of generating a skin mask indicating the skin region within the face region; a spot feature amount calculation step of extracting spots on the skin using the difference between a mask-processed image, in which the skin portion of the original face region is extracted using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount; a wrinkle feature amount calculation step of calculating the intensity and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask; a lip feature amount calculation step of calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask; an attribute learning step of creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input person images for learning and on attribute information about those person images; and an estimation step of calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image and estimating the person attributes of that person image based on the calculated feature amounts and the person attribute learning data.

In the fourth invention, the wrinkle feature amount calculation step preferably includes a gradient calculation step of calculating a gradient intensity and a gradient direction for each pixel of the mask-processed image, and a gradient histogram generation step of generating, for each block into which the face region is divided, a gradient histogram indicating the cumulative gradient intensity for each gradient direction based on the calculated gradient intensity and gradient direction, and preferably uses the maximum gradient intensity of the gradient histogram of each block as the wrinkle feature amount.

The fourth invention makes it possible to cause a computer to function as the person attribute estimation device of the first invention.

According to the present invention, it is possible to provide a person attribute estimation device, a person attribute estimation method, and a program capable of accurately estimating the attributes of a person from a face image even when there is some environmental variation.

Overall configuration diagram of the person attribute estimation system 1 according to the present invention
Functional block diagram of the person attribute estimation system 1
Block diagram showing the internal configuration of the person attribute estimation device (computer) 2
Flowchart explaining the flow of the person attribute estimation processing
Diagram explaining detection of the face region and creation of the skin mask
Diagram explaining the spot feature amount extraction processing
Example of a spot image 65
Example of division of the face region
Diagram explaining the wrinkle feature amount extraction processing
Diagram explaining the gradient histogram 69
Diagram explaining the lip feature amount extraction processing
Example of the feature amount table 81

Preferred embodiments of the present invention are described in detail below with reference to the drawings.
First, the configuration of the person attribute estimation device according to the present invention is described with reference to FIGS. 1 to 3.

FIG. 1 shows the hardware configuration of the person attribute estimation system 1 that uses the person attribute estimation device according to the present invention, FIG. 2 is a functional block diagram of the person attribute estimation system 1, and FIG. 3 is a block diagram showing the internal configuration of a computer that functions as the person attribute estimation device 2.

As shown in FIG. 1, in the present embodiment the person attribute estimation system 1 comprises a computer that functions as the person attribute estimation device 2, an imaging device 3, and a display device 4.

The display device 4 is a display device used for digital signage (electronic signboards) and the like, and may be a display unit, a projection device such as a projector, or another display medium.

The imaging device 3 is a camera or the like that captures images of persons 10a, 10b, 10c, 10d, ... viewing the display device 4, and outputs the captured person images to the person attribute estimation device 2.

The person attribute estimation device 2 is configured by, for example, a computer, and has the attribute learning function shown in FIG. 2(a) and the attribute estimation function shown in FIG. 2(b).
The attribute learning function generates the person attribute learning data 10 on the basis of person images for learning and the attribute information (age, sex, etc.) of the persons in them. As shown in FIG. 2(a), it comprises a feature extraction unit 8 and an attribute learning unit 9.
The attribute estimation function determines the attributes of an input person image to be processed from that image and the person attribute learning data 10 obtained by the attribute learning function of FIG. 2(a), and outputs them as an attribute determination result. It comprises the feature extraction unit 8 and an attribute estimation unit 11.

The following embodiment describes a person attribute estimation device 2 that has both the attribute learning function and the attribute estimation function described above. However, the present invention also covers a person attribute estimation device that has no attribute learning function of its own and instead comprises a memory (RAM, storage unit) that loads and stores person attribute learning data created by another computer or the like, together with the attribute estimation function described above (the feature extraction unit 8 and the attribute estimation unit 11).

The feature extraction unit 8 detects the face region from an input person image, creates a skin mask, and extracts the spot feature amount, the wrinkle feature amount, and the lip feature amount. The extraction of each feature amount by the feature extraction unit 8 is described in detail later.

The attribute learning unit 9 generates the person attribute learning data 10 on the basis of the wrinkle feature amount, spot feature amount, and lip feature amount of each person image extracted by the feature extraction unit 8 and the attributes (age, sex, etc.) of those persons.

The attribute estimation unit 11 estimates the attributes (age, sex, etc.) of a person image input from the imaging device 3 on the basis of the wrinkle feature amount, spot feature amount, and lip feature amount extracted from that image by the feature extraction unit 8 and of the person attribute learning data 10.
In the present embodiment, an SVM (Support Vector Machine) is preferably used for attribute learning and attribute estimation. Using an SVM makes it possible to set a non-linear boundary for identifying attributes from high-dimensional feature vectors (support vectors) whose components are the feature amounts described above. Because the SVM is a well-known technique, a detailed description is omitted; see, for example, Koji Tsuda, "What is a support vector machine," Journal of the IEICE, 83, pp. 460-466 (2000). The learning algorithm is not limited to the SVM, and other algorithms may be applied.

FIG. 3 is a block diagram showing the internal configuration of the person attribute estimation device 2. As shown in FIG. 3, the person attribute estimation device 2 is configured with a control unit 21, a storage unit 22, a media input/output unit 23, an input unit 24, a display unit 25, a communication unit 26, a peripheral device I/F unit 27, and the like connected via a bus 28. The imaging device 3 is connected to the peripheral device I/F unit 27.

The control unit 21 comprises a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
The CPU loads programs stored in the storage unit 22, the ROM, a recording medium, or the like into a work memory area in the RAM and executes them, thereby driving and controlling each unit connected via the bus 28. The ROM permanently holds the computer's boot program, programs such as the BIOS, data, and the like. The RAM temporarily holds loaded programs and data and provides a work area that the control unit 21 uses for various kinds of processing.

In the present embodiment, the control unit 21 (CPU) loads and executes the attribute estimation processing program stored in the storage unit 22. The attribute estimation processing is described later (see FIG. 4).

The storage unit 22 is an HDD (hard disk drive) and stores the programs executed by the control unit 21, the data needed to run those programs, an OS (operating system), and the like. The control unit 21 reads this program code as needed and transfers it to the RAM, where the CPU reads and executes it.

The media input/output unit 23 is a media input/output device such as an HD (hard disk) drive, a floppy (registered trademark) disk drive, a memory card drive, a PD drive, a CD drive, a DVD drive, or an MO drive, and performs data input and output.

The input unit 24 is an input device such as a keyboard, a pointing device such as a mouse, or a numeric keypad, and outputs the input data to the control unit 21.
The display unit 25 comprises a display device such as a liquid crystal panel or a CRT monitor and a logic circuit (video adapter or the like) that performs display processing in cooperation with the display device, and displays on the display device the display information input under the control of the control unit 21.
The communication unit 26 has a communication control device, a communication port, and the like. It is a communication interface that mediates communication with a network and performs communication control.

The peripheral device I/F (interface) unit 27 is a port for connecting peripheral devices to the person attribute estimation device 2, which exchanges data with those devices through it. The peripheral device I/F unit 27 is implemented with USB, IEEE 1394, RS-232C, or the like. The imaging device 3 is connected to the peripheral device I/F unit 27. Images captured by the imaging device 3 are input to the control unit 21 as targets of person attribute estimation. In other words, the peripheral device I/F unit 27 functions as person image input means.
The bus 28 is a path that mediates the exchange of control signals, data signals, and the like between the units.

Next, the operation of the person attribute estimation device 2 is described with reference to FIGS. 4 to 12.
The control unit 21 of the person attribute estimation device 2 reads the attribute estimation processing program from the storage unit 22 and executes processing according to the following procedure.

First, the attribute learning function is described.
As shown in FIGS. 4 and 5, image data for learning is input to the person attribute estimation device 2 (step S1). The input image data is desirably an image in which the whole face is photographed from the front, as in an ID photo.

The control unit 21 detects a person's face region 61 from the input image 6 (step S2). When the image data is moving image data, a face region 61 is detected in each frame, and the processing below is performed on each detected face region 61.

Various face detection methods exist; for example, a method using Haar-like features may be used. Object detection with Haar-like features performs fast classification based on the light and dark patterns of the object to be detected, and is used for detecting faces in still images and moving images. It achieves particularly high detection accuracy for frontal, hatless portraits such as ID photos. The result of face detection with Haar-like features is expressed as a region that includes local facial information such as the eyes, nose, and mouth, and information such as the center and size of the face. The principle of object detection with Haar-like features and examples of its application to face detection are introduced in Non-Patent Documents 2 and 3 below.

<Non-Patent Document 2> Paul Viola and Michael J. Jones: "Rapid Object Detection Using a Boosted Cascade of Simple Features", Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.511-518, (2001)
<Non-Patent Document 3> Rainer Lienhart and Jochen Maydt: "An Extended Set of Haar-like Features for Rapid Object Detection", Proceedings of the 2002 IEEE International Conference on Image Processing, Vol.1, pp.900-903, (2002)

Methods that use Haar-like features to locate a face in an image and obtain its size and position are already available as an image processing library (the OpenCV library developed and published by Intel), and the face region 61 can be obtained using them.

Once the face region 61 has been identified in this way, the control unit 21 generates a skin mask 62 (step S3). In generating the skin mask, the control unit 21 takes as a reference color the average color of the pixels in, for example, the nose area of the face region 61 (frame region 61a of the face region 61), calculates the Mahalanobis distance between the color of each pixel in the face region 61 and the reference color, and extracts locations at or above a certain threshold as the skin region. A binarized image in which the pixels of the extracted skin region are "1" and all other pixels are "0" serves as the skin mask 62. The image obtained by masking the original face region 61 detected in step S2 with the skin mask 62 is the mask-processed image 63. The method of extracting a person's skin region and the method of creating the skin mask are disclosed in, for example, Japanese Unexamined Patent Application Publication No. 2009-59009 by the same applicant as the present application, so a detailed description is omitted.
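The skin mask step can be sketched roughly as follows. This is a minimal illustration, not the method of the cited publication: it assumes a diagonal covariance (per-channel variance) for the Mahalanobis distance, and it assumes that pixels whose distance from the reference color is within the threshold are the ones kept as skin, which is one reading of how the threshold comparison is applied. The function name `skin_mask`, the `ref_box` argument, and the default threshold are hypothetical.

```python
from statistics import mean, pvariance


def skin_mask(face, ref_box, threshold=3.0):
    """Return a binary mask (1 = skin) for a face region.

    face: 2-D list of (r, g, b) tuples.
    ref_box: (top, left, bottom, right) of the reference area, end-exclusive.
    Assumption: diagonal covariance, and "close to the reference color" = skin.
    """
    top, left, bottom, right = ref_box
    ref = [face[y][x] for y in range(top, bottom) for x in range(left, right)]
    # Reference color: per-channel mean of the reference area (e.g. the nose).
    mu = [mean(p[c] for p in ref) for c in range(3)]
    # Per-channel variance; the floor avoids division by zero on flat areas.
    var = [max(pvariance([p[c] for p in ref]), 1e-6) for c in range(3)]

    def distance(pixel):
        return sum((pixel[c] - mu[c]) ** 2 / var[c] for c in range(3)) ** 0.5

    return [[1 if distance(p) <= threshold else 0 for p in row] for row in face]
```

Multiplying a grayscale copy of the face region by this mask, pixel by pixel, would give an image corresponding to the mask-processed image 63.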

Next, the control unit 21 extracts the spot feature amount using the mask-processed image 63 (step S4). The mask-processed image 63 to be processed is a grayscale image.
In the spot feature amount calculation of step S4, the control unit 21 extracts a spot image 65 of the skin area from the difference between the mask-processed image 63, in which the skin area of the original face region 61 has been extracted with the skin mask 62, and a blurred image 64 obtained by further smoothing the mask-processed image 63, and then calculates the spot feature amount.

First, the control unit 21 applies a filter to smooth the mask-processed image 63. An MTM filter is preferably used for this filtering (step S41).
The MTM filter is a kind of averaging filter. When the difference between the pixel of interest and a neighboring pixel is small, the two are regarded as strongly correlated and the neighboring pixel is included in the calculation; conversely, when the difference is large, the neighboring pixel is excluded. The filter effect is thereby concentrated on small-amplitude signals. The principle of the MTM filter is described in detail in Japanese Unexamined Patent Application Publication No. 2010-66943 by the same applicant as the present application, so the description is omitted here.
Using the MTM filter excludes large spots, moles, and the like, whose pixel values differ greatly from their surroundings, from the blurring, and therefore removes them from consideration in the spot feature amount.
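The selective averaging described above can be illustrated with a simplified filter. This is only a sketch in the spirit of that description, not the MTM filter of the cited publication: each output pixel is the mean of those neighbors whose value differs from the center pixel by at most a threshold. The name `selective_mean_filter` and the parameters `radius` and `max_diff` are assumptions made for this example.

```python
def selective_mean_filter(img, radius=1, max_diff=20):
    """Average each pixel with only those neighbors that are close in value.

    img: 2-D list of grayscale values.
    Neighbors differing from the center by more than max_diff are skipped,
    so strong features (e.g. moles) are left essentially unblurred.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            picked = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
                if abs(img[j][i] - center) <= max_diff
            ]
            # The center pixel always qualifies, so picked is never empty.
            out[y][x] = sum(picked) / len(picked)
    return out
```

With these settings a faint blemish (a small deviation from its surroundings) is pulled toward the local mean, while a dark mole keeps its value and leaves the surrounding skin untouched.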

FIG. 6 shows examples of the mask-processed image 63 and the blurred image 64 after MTM filtering. Only part of the face region is shown in FIG. 6. As FIG. 6 shows, the mole near the nose is not blurred, whereas the dark discoloration from the cheek to the chin is blurred and smoothed.

After the MTM filtering, the control unit 21 takes the difference between the mask-processed image 63 and the MTM-filtered image 64 (step S42). Taking this difference extracts only the spots (darkened areas of skin, fine unevenness, and the like) from the face region, as shown in FIG. 7. Moles, which are unrelated to aging, are not extracted as spots. The image from which the spots have been extracted is hereinafter called the spot image 65. In FIG. 7, areas with larger pixel values, displayed whiter, indicate stronger spots.

The control unit 21 divides the spot image 65 obtained in step S42 into five blocks vertically and five horizontally, as shown for example in FIG. 8, and takes the average pixel value of each block as a spot feature amount (step S43). Spot feature amounts for 25 blocks are obtained in this way. The control unit 21 saves the spot feature amount extracted from each block in the RAM. The number of blocks is not limited to 5 × 5 and may be changed to any number. However, because the locations where spots appear are features that indicate sex and age, it is desirable to choose a number of divisions that is not too fine yet still allows facial parts such as the cheeks, forehead, and nose to be identified.
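Steps S42 and S43 can be sketched as below. The text does not state how the sign of the difference is handled, so taking the absolute difference is an assumption here; the function names `spot_image` and `block_means` are likewise hypothetical.

```python
def spot_image(masked, blurred):
    """Pixel-wise absolute difference between the masked and blurred images.

    Assumption: the absolute value is used so that larger values mean
    stronger spots regardless of sign.
    """
    return [[abs(m - b) for m, b in zip(m_row, b_row)]
            for m_row, b_row in zip(masked, blurred)]


def block_means(img, blocks=5):
    """Split img into blocks x blocks tiles and return each tile's mean.

    The result is a flat list in row-major order (25 values for blocks=5).
    """
    h, w = len(img), len(img[0])
    feats = []
    for by in range(blocks):
        y0, y1 = by * h // blocks, (by + 1) * h // blocks
        for bx in range(blocks):
            x0, x1 = bx * w // blocks, (bx + 1) * w // blocks
            vals = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty tiles can occur if the image is smaller than the grid.
            feats.append(sum(vals) / len(vals) if vals else 0.0)
    return feats
```

Calling `block_means(spot_image(masked, blurred))` on a face-sized image would yield the 25 per-block spot feature amounts.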

The control unit 21 also extracts a wrinkle feature amount that takes the strength and direction of wrinkles into account, using the mask-processed image 63 (step S5).
In the wrinkle feature amount calculation of step S5, as shown in FIG. 9, the control unit 21 first calculates the gradient strength and gradient direction of each pixel using the mask-processed image 63. The gradient strength indicates the strength of a wrinkle, and the gradient direction indicates its direction. The mask-processed image 63 to be processed is a grayscale image.
The control unit 21 applies a gradient extraction filter to each pixel of the mask-processed image 63 (step S51). A Sobel filter is suitable as the gradient extraction filter.

The Sobel filter is a filter used for edge detection in image processing. The nine pixel values in the 3 × 3 neighborhood centered on the pixel of interest are each multiplied by the corresponding Sobel filter coefficient shown in FIG. 9, and the products are summed for the X direction (horizontal) or the Y direction (vertical) to obtain the gradient in the X direction (Δx) and the gradient in the Y direction (Δy). The gradient strength can then be calculated using equation (1), and the gradient direction using equation (2).
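Equations (1) and (2) are not reproduced in this text. The sketch below assumes the conventional definitions, strength = sqrt(Δx² + Δy²) and direction = atan2(Δy, Δx), and the conventional Sobel coefficients, which are assumed to match FIG. 9. The function name `sobel_gradients` is hypothetical.

```python
import math

# Conventional 3x3 Sobel kernels (assumed to correspond to FIG. 9).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_gradients(img):
    """Return (strength, direction) maps for a 2-D grayscale image.

    Direction is in degrees in [0, 360). Border pixels are left at 0
    because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    strength = [[0.0] * w for _ in range(h)]
    direction = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            dy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            strength[y][x] = math.hypot(dx, dy)  # assumed form of equation (1)
            direction[y][x] = math.degrees(math.atan2(dy, dx)) % 360.0  # assumed form of equation (2)
    return strength, direction
```

On an image with a vertical step edge the direction comes out as 0 degrees, and on a horizontal step edge as 90 degrees, under the sign convention chosen here.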

On the basis of the gradient strength and gradient direction of each pixel, the control unit 21 creates a gradient histogram 69 (see FIG. 10) with gradient direction on the X axis and gradient strength on the Y axis. As with the spot feature amount, a gradient histogram 69 is created for each of the blocks obtained by dividing the face region into five blocks vertically and five horizontally (step S52).

In FIG. 10, the horizontal axis of the gradient histogram 69 indicates the gradient direction. It is desirable to set the bin width of the gradient histogram to 10 degrees and to accumulate the gradient strength in each of the resulting 36 bins (360 degrees). This extracts the direction of the wrinkles that are most strongly or most numerously etched in each block of the face region. Because the gradient strength also represents the amount of wrinkling itself, it is well suited to age estimation. The strength values of the gradient histogram 69 tend to be small overall for younger people and large overall for older people.
The control unit 21 calculates the maximum strength value of the gradient histogram 69 for each block and uses it as the wrinkle feature amount (step S53). The control unit 21 stores the calculated wrinkle feature amounts in the RAM.
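Steps S52 and S53 for a single block might look like the following sketch. It takes the per-pixel strength and direction maps of one block as input; the function name `wrinkle_feature` and its return shape are assumptions for illustration.

```python
def wrinkle_feature(strength, direction, bin_width=10):
    """Accumulate gradient strength by direction and return (max_bin, histogram).

    strength, direction: 2-D lists for one block; direction is in degrees.
    With bin_width=10 the histogram has 36 bins covering 360 degrees.
    """
    n_bins = 360 // bin_width
    hist = [0.0] * n_bins
    for s_row, d_row in zip(strength, direction):
        for s, d in zip(s_row, d_row):
            # The trailing modulo guards against a float rounding up to 360.
            hist[int(d % 360 // bin_width) % n_bins] += s
    return max(hist), hist
```

Running this on each of the 5 × 5 blocks would give the 25 wrinkle feature amounts; the index of the largest bin indicates the dominant direction in that block.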

In the wrinkle feature amount extraction as well, the number of blocks is not limited to 5 × 5 and may be changed to any number. However, because the locations where wrinkles appear are features that indicate sex and age, it is desirable to choose a number of divisions that is not too fine yet still allows facial parts such as the cheeks, forehead, and nose to be identified.

The control unit 21 also extracts the lip feature amount from the detected face region (step S6). Because the lip feature amount is a color feature, the calculation in step S6 is performed on a color image. As shown in FIG. 11, the control unit 21 first generates an inverted skin mask 70 by inverting the skin mask 62 (a binary image) (step S61), and obtains an inverted-skin-mask result 71 by masking the original face region image (color) 61 with the inverted skin mask 70. A lip region 71a is extracted from the inverted-skin-mask result 71 (step S62). The control unit 21 then evaluates the redness of the lip region 71a (step S63).

In the redness evaluation of step S63, the control unit 21 obtains histograms for the skin reference region 61a in the original face region image 61 and for the lip region 71a extracted in step S62. The histograms are computed for H (hue), S (saturation), and R (the red channel of RGB).
The skin reference region 61a is suitably, for example, the nose area of the original face region image 61. The control unit 21 compares the corresponding color histograms and extracts, as the feature amount, the redness of the lip region 71a relative to the color in the reference region 61a. The Bhattacharyya distance, for example, is suitable for comparing the histograms, although other distances (for example, the Mahalanobis distance) may also be used.
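The histogram comparison can be sketched as follows for a single channel. The text does not say which variant of the Bhattacharyya distance is used; this example assumes the bounded form sqrt(1 − BC), where BC is the Bhattacharyya coefficient of two normalized histograms. The function names, the bin count, and the value range are hypothetical.

```python
import math


def normalized_histogram(values, bins=16, upper=256):
    """Histogram of values in [0, upper), normalized to sum to 1."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins / upper), bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else [0.0] * bins


def bhattacharyya_distance(p, q):
    """Distance in [0, 1] between two normalized histograms.

    Assumption: the sqrt(1 - BC) variant; 0 means identical distributions,
    1 means no overlap.
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))
```

Computing this distance between the reference-region and lip-region histograms for each of H, S, and R would give three values, matching the three lip feature amounts described in the text.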

In the lip feature amount extraction of the present embodiment, the redness of the lip region is thus determined relative to a reference color in the original face region, and that reference color is determined separately for each image captured under its own shooting conditions. This yields a lip feature amount (a color feature amount) that is less dependent on variations in the shooting environment.

The control unit 21 stores the lip feature amount extracted in step S6 in the RAM.

FIG. 12 shows an example of the feature amount table 81, which collects the feature amounts calculated in the processing above.
As shown in FIG. 12, a plurality of feature amounts are extracted from each person image for learning (No. 001, No. 002, ...) and collected in the feature amount table 81. The feature amounts include the spot feature amounts, the wrinkle feature amounts, and the lip feature amounts.

For example, if in the spot feature amount extraction described above (step S4) the strength level of the MTM filter is varied over three levels and a feature amount is obtained for each divided region (5 × 5) at each level, 5 × 5 × 3 = 75 spot feature amounts are extracted. The wrinkle feature amount extraction described above (step S5) also yields a feature amount for each divided region (5 × 5), so 5 × 5 = 25 wrinkle feature amounts are extracted. In the lip feature amount extraction described above (step S6), a feature amount is obtained for each of H (hue), S (saturation), and R (red), so three lip feature amounts are extracted. In this case a total of 103 feature amounts are extracted, giving a 103-dimensional feature vector. The attributes (age, sex) of these learning person images are known, and a feature amount table 81 is compiled for each attribute.
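The dimension count in this example (75 + 25 + 3 = 103) can be checked with a small sketch that concatenates the three groups of feature amounts into one vector. The function name `build_feature_vector` and the ordering of the components are assumptions; the text does not specify an ordering.

```python
def build_feature_vector(spot_by_level, wrinkle, lip):
    """Concatenate feature amounts into a single flat vector.

    spot_by_level: one list of per-block spot values per filter strength level.
    wrinkle: per-block wrinkle values.
    lip: lip values for H, S, and R.
    """
    vector = [v for level in spot_by_level for v in level]
    vector += list(wrinkle)
    vector += list(lip)
    return vector


# Three filter levels x 25 blocks, 25 wrinkle values, 3 lip values.
example = build_feature_vector([[0.0] * 25 for _ in range(3)], [0.0] * 25, [0.0] * 3)
```

Here `len(example)` is 103, matching the dimensionality stated above.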

That is, the control unit 21 aggregates the feature amounts extracted from the person images for each attribute category and obtains the person attribute learning data 10 using an SVM (Support Vector Machine) (step S7). The SVM enables pattern recognition based on statistical learning theory: in the feature space obtained by learning, it takes the feature vectors that lie closest to the other class (the support vectors) as a reference and places the decision boundary where the Euclidean distance to them is greatest (linear discrimination by a separating hyperplane). Because the SVM is a well-known technique, a detailed description is omitted.

The control unit 21 generates the person attribute learning data 10 and stores it in the storage unit 22. The more person images are used for learning, the more accurate the learning data 10 becomes.

Once the person attribute learning data 10 has been generated, attribute estimation can be performed on images to be processed.
In the attribute estimation function, as shown in FIG. 2(b), when an arbitrary person image is input to the person attribute estimation device 2, the control unit 21 detects the face region 61 from the input person image 6, calculates the spot feature amount, wrinkle feature amount, and lip feature amount from the detected face region 61 as described above, estimates the person attributes of the input person image on the basis of the calculated feature amounts and the person attribute learning data 10, and outputs the estimation result.

For example, if images of digital signage viewers captured by the imaging device 3 are used as the person images to be processed, the person attribute estimation results can be tallied and output as the attributes of the signage viewers and used as a measure of advertising effectiveness. The control unit 21 of the computer 2 may also use the estimation results to switch, in real time, the content shown on the digital signage display medium (display device 4).

When the image input from the imaging device 3 is a moving image, the person attributes may be estimated frame by frame and, for example, the mode of the per-frame estimation results taken as the person attribute estimation result (provided that the moving image contains only one and the same person). The person attribute categories may be, for example, eight categories formed by setting four age bands (0-19, 20-34, 35-59, and 60 and over) for each sex, a division widely used in measuring advertising effectiveness, or other categories suited to the application.
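The per-frame aggregation and the eight-category scheme can be sketched as follows. The label format, the function names, and the tie-breaking behavior (the label seen first wins) are assumptions made for this illustration.

```python
from collections import Counter

# Age bands from the text; None marks the open-ended "60 and over" band.
AGE_BANDS = ((0, 19), (20, 34), (35, 59), (60, None))


def attribute_class(sex, age):
    """Map a sex label and an age to one of the 8 categories, e.g. 'F:20-34'."""
    for low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            band = f"{low}+" if high is None else f"{low}-{high}"
            return f"{sex}:{band}"
    raise ValueError("age must be non-negative")


def aggregate_frames(per_frame_labels):
    """Return the most frequent per-frame label (the mode)."""
    return Counter(per_frame_labels).most_common(1)[0][0]
```

For a clip in which two of three frames are classified as "F:20-34", `aggregate_frames` returns that label; as the text notes, this only makes sense when the clip shows a single person.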

As described above, in the person attribute estimation system 1 of the present embodiment, the person attribute estimation device 2 detects a person's face region 61 from an input learning person image, calculates the spot feature amount, the wrinkle feature amount, and the lip feature amount from the face region 61, and creates the person attribute learning data 10 based on these feature amounts and the known attributes (age, sex) of the person. Once the person attribute learning data 10 has been created, the person attribute estimation system 1 calculates the spot feature amount, the wrinkle feature amount, and the lip feature amount for a person image input from the imaging device 3, estimates the person attribute of that person image based on the calculated feature amounts and the person attribute learning data 10, and outputs the result.

In this method, the spot feature amount is calculated using the difference between the mask-processed image 63 obtained with the skin mask 62 and the blurred image 64 obtained by further smoothing it, so a spot feature amount that is little affected by the facial contour, eyes, nose, mouth, glasses, and the like can be obtained. The wrinkle feature amount is obtained taking both the strength and the direction of wrinkles into account, so a wrinkle feature amount that reflects age and sex differences can be calculated accurately. The lip feature amount is calculated, for the lip region 71a extracted using the skin mask 62, as a color comparison value with respect to a reference color in the original face region, so feature extraction that is highly robust to environmental variation in the image is possible. By using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated accurately.

When the wrinkle feature amount is calculated, a gradient histogram 69 indicating the cumulative gradient strength for each gradient direction is generated for each block obtained by dividing the face region, and the maximum gradient strength in the gradient histogram 69 of each block is taken as the wrinkle feature amount. The state of the wrinkles (direction and strength) and the amount of wrinkles in each block can therefore be obtained, which contributes greatly to improving the accuracy of age estimation. In addition, because wrinkle features are extracted block by block, they can be extracted without confusing, for example, areas where many wrinkles appear with areas where few appear, or relatively large wrinkles such as nasolabial folds and forehead wrinkles with relatively fine wrinkles such as those at the corners of the eyes.
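The block-wise gradient histogram can be sketched as below. This is an illustrative sketch under stated assumptions rather than the embodiment's code: the image is taken to be an already mask-processed grey-level image given as a list of rows, the gradient is computed with 3x3 Sobel kernels, and the number of direction bins (`n_bins`), the folding of directions into the range 0 to 180 degrees, and the function name are all choices made here.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def wrinkle_features(img, block, n_bins=8):
    """Return, for each block, the largest bin of its gradient histogram.

    img    : 2-D list of grey levels (assumed already skin-masked)
    block  : block edge length in pixels
    n_bins : number of gradient-direction bins (assumption)
    """
    h, w = len(img), len(img[0])
    rows, cols = h // block, w // block
    # One direction histogram per block, accumulating gradient strength.
    hists = [[[0.0] * n_bins for _ in range(cols)] for _ in range(rows)]
    for y in range(1, h - 1):          # skip the 1-pixel image border
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    v = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * v
                    gy += SOBEL_Y[j][i] * v
            strength = math.hypot(gx, gy)
            by, bx = y // block, x // block
            if strength == 0.0 or by >= rows or bx >= cols:
                continue
            # Fold the direction into [0, pi) so opposite gradients share a bin.
            theta = math.atan2(gy, gx) % math.pi
            k = min(int(theta / math.pi * n_bins), n_bins - 1)
            hists[by][bx][k] += strength
    return [[max(hist) for hist in row] for row in hists]
```

A block whose gradients mostly share one direction produces one dominant bin and hence a large feature value, while a flat block yields zero. Keeping only the maximum bin follows the description above; which bin attained the maximum could additionally be recorded if the direction itself is needed.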

When the spot feature amount is calculated, the blurred image 64 is generated by applying an MTM filter to the mask-processed image 63, and the average value of the difference between the mask-processed image 63 and the blurred image 64 is calculated for each block obtained by dividing the face region and taken as the spot feature amount. Using the MTM filter makes it possible to exclude large spots, moles, and the like, which correlate only weakly with age, from the blurring and thus from the spot feature amount. As a result, feature amounts can be extracted only for spots that correlate strongly with age and similar attributes.
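The following sketch shows one way the spot feature could be computed. It assumes that "MTM filter" means the modified trimmed mean filter (average only those window pixels whose value lies within a threshold `q` of the window median); the window radius, the threshold, the use of the absolute difference, and the function names are assumptions made for illustration, and handling of non-skin pixels is omitted.

```python
from statistics import median_low

def mtm_blur(img, radius=1, q=10):
    """Modified trimmed mean filter on a 2-D list of grey levels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - radius), min(h, y + radius + 1))
                   for i in range(max(0, x - radius), min(w, x + radius + 1))]
            # median_low returns an actual sample, so `kept` is never empty.
            m = median_low(win)
            kept = [v for v in win if abs(v - m) <= q]
            out[y][x] = sum(kept) / len(kept)
    return out

def spot_features(img, block, radius=1, q=10):
    """Mean absolute difference between image and MTM blur, per block."""
    blurred = mtm_blur(img, radius, q)
    rows, cols = len(img) // block, len(img[0]) // block
    feats = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            diffs = [abs(img[y][x] - blurred[y][x])
                     for y in range(by * block, (by + 1) * block)
                     for x in range(bx * block, (bx + 1) * block)]
            row.append(sum(diffs) / len(diffs))
        feats.append(row)
    return feats
```

With this filter, an isolated dark pixel is trimmed out of the local mean and so shows up in the difference image, whereas a blot larger than the window dominates the median, is reproduced almost unchanged in the blurred image, and contributes little to the feature. This matches the behaviour the paragraph above attributes to the MTM filter.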

When the lip feature amount is calculated, the lip region is extracted using the inverted mask 70 obtained by inverting the skin mask 62, histograms with hue H, saturation S, and redness R as the target color space are generated for each of the lip region and a reference region in the original face region, and the distance between these histograms is calculated as the lip feature amount. Feature extraction that is highly robust to environmental variation such as illumination intensity is therefore possible.
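A rough sketch of the histogram comparison is given below. Several details are assumptions, since this passage does not fix them: redness R is taken to be the red channel of RGB, hue and saturation come from the standard HSV conversion in Python's `colorsys` module, each histogram has `n_bins` equal-width bins, and the distance is the L1 distance between normalised histograms. The function names are hypothetical.

```python
import colorsys

def channel_histograms(pixels, n_bins=16):
    """Normalised H, S and R histograms of a region.

    pixels : iterable of (r, g, b) tuples with components in 0..255
    """
    hists = {"H": [0.0] * n_bins, "S": [0.0] * n_bins, "R": [0.0] * n_bins}
    n = 0
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        for name, v in (("H", h), ("S", s), ("R", r / 255)):
            k = min(int(v * n_bins), n_bins - 1)
            hists[name][k] += 1.0
        n += 1
    if n == 0:
        raise ValueError("empty region")
    return {name: [c / n for c in hist] for name, hist in hists.items()}

def lip_feature(lip_pixels, ref_pixels, n_bins=16):
    """L1 distances between lip and reference histograms for H, S and R."""
    lip = channel_histograms(lip_pixels, n_bins)
    ref = channel_histograms(ref_pixels, n_bins)
    return [sum(abs(a - b) for a, b in zip(lip[name], ref[name]))
            for name in ("H", "S", "R")]
```

Because the lip colors are compared with a reference region taken from the same image, a change in illumination that shifts both regions tends to shift both histograms together, which is the intuition behind the robustness claimed above. Other histogram distances could be substituted without changing the overall structure.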

Furthermore, if the person attributes estimated from these feature amounts are used to obtain the attributes of advertisement viewers, advertising effectiveness can be measured accurately.

Compared with the conventional method, the present invention calculates the spot feature amount using the difference between the mask-processed image obtained with the skin mask and the blurred image obtained by further smoothing it, so a spot feature amount that is little affected by the facial contour, eyes, nose, mouth, glasses, and the like can be obtained.

In the conventional method, large wrinkles are detected in the face image using a Sobel filter, fine wrinkles are detected using Gabor jets, and the mean and deviation of the edge strengths of these wrinkles are defined as the wrinkle feature. In the present invention, by contrast, the maximum gradient strength in the gradient histogram of each block is defined as the wrinkle feature amount. The gradient histogram is obtained from the gradient strength and gradient direction of each pixel of the mask-processed image. That is, in the present invention, the largest of the wrinkle strengths (gradient strengths) accumulated per gradient direction (wrinkle direction) in each block is taken as the wrinkle feature amount of that block. Since the direction in which wrinkles appear tends to differ with age, the present invention, which also takes the direction of wrinkles into account for age estimation, can contribute greatly to improving age estimation accuracy compared with the conventional method.

As for the color feature, the conventional method defines as feature amounts the mean and deviation of the color information in the lip and cheek regions, specifically the hue and saturation of the modified HSV color system and the redness component a* of the L*a*b* color system. In the present invention, on the other hand, the lip feature amount is calculated as a color comparison value with respect to a reference color in the original face region, which is determined for each image captured under various shooting conditions, so feature extraction that is highly robust to environmental variation in the image is possible. By using the spot, wrinkle, and lip feature amounts calculated in this way, person attributes such as age and sex can be estimated more accurately than with the conventional method.

In the present embodiment, an SVM is used for attribute learning and attribute estimation, but other learning methods may be used. The face region detection method is likewise arbitrary.

The embodiment described above uses the person attribute estimation system of the present invention to measure the advertising effectiveness of digital signage, but the system may also be used for other purposes. It is clear that those skilled in the art can conceive of various changes and modifications within the scope of the technical idea disclosed in the present application, and these are naturally understood to fall within the technical scope of the present invention.

1 ... Person attribute estimation system
2 ... Computer (person attribute estimation device)
3 ... Imaging device
4 ... Display device
10a, 10b, 10c, 10d ... Persons
6 ... Input image
61 ... Face region
61a ... Reference region
62 ... Skin mask
63 ... Mask-processed image
64 ... Blurred image
65 ... Spot image
67 ... Sobel-filtered image
68 ... Sobel-filtered image divided into blocks
69 ... Gradient histogram
70 ... Inverted mask image
71 ... Inverted mask processing result
71a ... Lip region
81 ... Feature amount table
8 ... Feature extraction unit
9 ... Attribute learning unit
10 ... Person attribute learning data
11 ... Attribute estimation unit

Claims

A person attribute estimation device comprising:
face detection means for detecting a person's face region from input image data;
skin mask generation means for generating a skin mask indicating a skin region of the face region;
spot feature amount calculation means for extracting spots in the skin portion using the difference between a mask-processed image, in which the skin portion is extracted from the original face region using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and for calculating a spot feature amount;
wrinkle feature amount calculation means for calculating the strength and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask;
lip feature amount calculation means for calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask;
attribute learning means for creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input learning person images, and on attribute information about those person images; and
estimation means for calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image, and for estimating a person attribute of the person image based on the calculated feature amounts and the person attribute learning data.

The person attribute estimation device according to claim 1, wherein the wrinkle feature amount calculation means includes:
gradient calculation means for calculating a gradient strength and a gradient direction for each pixel of the mask-processed image; and
gradient histogram generation means for generating, for each block obtained by dividing the face region, a gradient histogram indicating the cumulative gradient strength for each gradient direction, based on the gradient strength and gradient direction calculated by the gradient calculation means,
and wherein the maximum value of the gradient strength in the gradient histogram of each block is used as the wrinkle feature amount.

The person attribute estimation device according to claim 1, wherein the spot feature amount calculation means generates the blurred image by applying an MTM filter to the mask-processed image, calculates, for each block obtained by dividing the face region, the average value of the difference between the mask-processed image and the blurred image, and uses the average value as the spot feature amount.

The person attribute estimation device according to claim 1, wherein the lip feature amount calculation means extracts a lip region from the face region using an inverted mask obtained by inverting the skin mask, generates, for each of the lip region and a reference region in the original face region, histograms with hue H, saturation S, and redness R as the target color space, and calculates the distance between these histograms as the lip feature amount.

A person attribute estimation device comprising:
face detection means for detecting a person's face region from input image data;
skin mask generation means for generating a skin mask indicating a skin region of the face region;
spot feature amount calculation means for extracting spots in the skin portion using the difference between a mask-processed image, in which the skin portion is extracted from the original face region using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and for calculating a spot feature amount;
wrinkle feature amount calculation means for calculating the strength and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask;
lip feature amount calculation means for calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask;
storage means for storing person attribute learning data created by a predetermined learning algorithm based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of learning person images, and on attribute information given to those person images; and
estimation means for calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image, and for estimating a person attribute of the person image based on the calculated feature amounts and the person attribute learning data.

The person attribute estimation device according to claim 1, further comprising:
imaging means for capturing an image; and
person image input means for inputting the image captured by the imaging means to the estimation means as a target of person attribute estimation.

A person attribute estimation method in which a computer executes processing comprising:
a face detection step of detecting a person's face region from input image data;
a skin mask generation step of generating a skin mask indicating a skin region of the face region;
a spot feature amount calculation step of extracting spots in the skin portion using the difference between a mask-processed image, in which the skin portion is extracted from the original face region using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount;
a wrinkle feature amount calculation step of calculating the strength and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask;
a lip feature amount calculation step of calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask;
an attribute learning step of creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input learning person images, and on attribute information about those person images; and
an estimation step of calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image, and estimating a person attribute of the person image based on the calculated feature amounts and the person attribute learning data.

The person attribute estimation method according to claim 7, wherein the wrinkle feature amount calculation step includes:
a gradient calculation step of calculating a gradient strength and a gradient direction for each pixel of the mask-processed image; and
a gradient histogram generation step of generating, for each block obtained by dividing the face region, a gradient histogram indicating the cumulative gradient strength for each gradient direction, based on the calculated gradient strength and gradient direction,
and wherein the maximum value of the gradient strength in the gradient histogram of each block is used as the wrinkle feature amount.

A program written in a computer-readable format for causing a computer to execute processing comprising:
a face detection step of detecting a person's face region from input image data;
a skin mask generation step of generating a skin mask indicating a skin region of the face region;
a spot feature amount calculation step of extracting spots in the skin portion using the difference between a mask-processed image, in which the skin portion is extracted from the original face region using the skin mask, and a blurred image obtained by further smoothing the mask-processed image, and calculating a spot feature amount;
a wrinkle feature amount calculation step of calculating the strength and direction of wrinkles as a wrinkle feature amount from the mask-processed image in which the skin portion is extracted from the original face region using the skin mask;
a lip feature amount calculation step of calculating, as a lip feature amount, a color comparison value with respect to a reference color in the original face region for a lip region extracted using the skin mask;
an attribute learning step of creating person attribute learning data based on the wrinkle feature amount, the spot feature amount, and the lip feature amount calculated from a plurality of input learning person images, and on attribute information about those person images; and
an estimation step of calculating the spot feature amount, the wrinkle feature amount, and the lip feature amount for an input person image, and estimating a person attribute of the person image based on the calculated feature amounts and the person attribute learning data.

The program according to claim 9, wherein the wrinkle feature amount calculation step includes:
a gradient calculation step of calculating a gradient strength and a gradient direction for each pixel of the mask-processed image; and
a gradient histogram generation step of generating, for each block obtained by dividing the face region, a gradient histogram indicating the cumulative gradient strength for each gradient direction, based on the calculated gradient strength and gradient direction,
and wherein the maximum value of the gradient strength in the gradient histogram of each block is used as the wrinkle feature amount.