JP2006133824A

JP2006133824A - Method and apparatus for image processing, and program

Info

Publication number: JP2006133824A
Application number: JP2004318706A
Authority: JP
Inventors: Toshihiko Kako; 俊彦加來
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2004-11-02
Filing date: 2004-11-02
Publication date: 2006-05-25

Abstract

PROBLEM TO BE SOLVED: To provide an image processing method, etc. capable of accurately detecting a chin position in a face photograph image. SOLUTION: An eye detection portion 30 detects each position of both eyes in a face photograph image S0. A chin area estimation portion 70a estimates a chin area constituted of a chin portion and a neighboring area of the chin, based on each position of both eyes. Also, a chin position detection portion 90 detects the chin position in the chin area in which non-skin color pixels are eliminated by a non-skin color pixel elimination portion 80. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an image processing method and apparatus for acquiring the position of the chin from a face photograph image, and to a program therefor.

When applying for a passport or a driver's license, or when preparing a resume, applicants are often required to submit a photograph of their face that conforms to a predetermined output standard (hereinafter, an ID photo). For this purpose, automatic ID photo machines have long been in use: the machine has a booth in which the user sits on a chair, photographs the user, and automatically produces an ID photo sheet on which the user's face photograph image is recorded. Because such machines are large, the places where they can be installed are limited, so a user who needs an ID photo must find and travel to a machine, which is inconvenient.

To address this problem, a method such as that described in Patent Document 1 has been proposed. While a face photograph image (an image showing a face) to be used for the ID photo is displayed on a display device such as a monitor, an operator indicates the top-of-head position and the position of the tip of the chin (hereinafter, the chin position) in the displayed image. A computer then determines the scaling factor and the position of the face from the two indicated positions and the output standard of the ID photo, scales the image, and trims the scaled face photograph image so that the face is placed at the predetermined position in the ID photo, thereby forming the ID photo image. With this method, users can order ID photos at DPE shops (photo processing shops), which are far more numerous than automatic ID photo machines. They can also bring in photographic film or a recording medium holding a favorite photograph, for example one in which they look good, and have the ID photo made from it.

However, this technique requires the operator to perform the tedious operation of indicating both the top-of-head position and the chin position on the displayed face photograph image, which places a heavy burden on the operator, especially when ID photos must be produced for many users. In addition, because the chin and the neck below it have the same skin color, it is difficult for the operator to indicate the chin position quickly and accurately, particularly when the face occupies a small area of the displayed image or when the image resolution is low. As a result, ID photos cannot be produced properly and quickly.

The chin is also difficult to detect by image processing: unlike other facial parts such as the eyes and mouth, the chin has the same skin color as the neck below it, and the edge formed between the chin and the neck is relatively weak.

Patent Documents 2 and 3 propose methods that detect the eyes and mouth in a face and estimate the chin position from the positions of the eyes and mouth in the face photograph image, on the assumption that the ratio of the chin-to-mouth distance to the eye-to-mouth distance is substantially constant.

Because the positions of the eyes and mouth can be detected from a face photograph image more accurately than the chin position, detecting the eyes and mouth by the methods proposed in Patent Documents 2 and 3 and estimating the chin position from them makes it possible to create an ID photo from a face photograph image without requiring the operator to indicate the tip of the chin.

Similarly, because an operator can indicate the positions of the eyes and mouth more properly and quickly than the chin position, having the operator indicate the eye and mouth positions and estimating the chin position from them also makes it possible to create an ID photo from a face photograph image without requiring the operator to indicate the tip of the chin.
JP-A-11-341272
JP-A-2004-5384
JP-A-2004-96486

However, the positional relationship between facial parts is not necessarily constant. The position of the chin relative to other parts in particular varies between individuals from birth, since some people have long chins and others short ones, and it also varies with race, sex, age, and so on. Factors such as whether the person is heavy or thin, or expressionless or smiling, also prevent the relationship between the chin and other parts from being constant.

The methods proposed in Patent Documents 2 and 3 avoid having the operator indicate the chin position by estimating it from the positions of the eyes and mouth, which are easier both to detect and for an operator to indicate. They therefore yield an average chin position but not an accurate one. As a result, processing that depends on the chin position, such as trimming, cannot be performed properly, and a good ID photo cannot be obtained.

In view of the above circumstances, an object of the present invention is to provide an image processing method and apparatus capable of accurately detecting the position of the chin in a face photograph image, and a program therefor.

The first image processing method of the present invention is characterized by detecting the approximate position and size of a face from a face photograph image,
estimating, based on the detected position and size of the face, a chin range consisting of the chin of the face and a region near the chin, and
detecting the position of the chin within the estimated range.

In the present invention, the "chin range" is a range consisting of the chin and a region near the chin; it is a range in which the chin is highly likely to be present and which is as small as possible. When estimating the chin range, it is desirable to exclude, as far as possible, facial parts other than the chin and areas such as clothing covering the neck.

The second image processing method of the present invention is characterized by acquiring the positions of both eyes in a face photograph image,
estimating, based on the acquired positions of both eyes, a chin range consisting of the chin in the face photograph image and a region near the chin, and
detecting the position of the chin within the estimated range.

The first image processing apparatus of the present invention is characterized by comprising face detection means for acquiring the approximate position and size of a face from a face photograph image,
chin range estimation means for estimating, based on the detected position and size of the face, a chin range consisting of the chin of the face and a region near the chin, and
chin position detection means for detecting the position of the chin within the estimated range.

The second image processing apparatus of the present invention is characterized by comprising eye position acquisition means for acquiring the position of each of the two eyes in a face photograph image,
chin range estimation means for estimating, based on the acquired positions of the two eyes, a chin range consisting of the chin in the face photograph image and a region near the chin, and
chin position detection means for detecting the position of the chin within the estimated range.

Here, the eye position acquisition means may be an input means that lets an operator specify the positions of both eyes, but to reduce the operator's burden it is preferably an eye detection means that detects the position of each eye from the face photograph image.

Preferably, the image processing apparatus of the present invention further comprises mouth position acquisition means for acquiring the position of the mouth in the face,
and the chin range estimation means estimates the chin range below the position of the mouth.

Preferably, the image processing apparatus of the present invention further comprises non-skin color pixel removal means for detecting and removing pixels other than skin color pixels from among the pixels in the chin range estimated by the chin range estimation means,
and the chin position detection means detects the position of the chin in the chin range from which the pixels other than skin color pixels have been removed.

Preferably, the chin position detection means in the image processing apparatus of the present invention comprises projection means for projecting the pixel values of the pixels in the chin range onto an axis along the vertical direction of the face to obtain projection information for each height along the axis,
edge strength acquisition means for obtaining an edge strength for each height based on the projection information, and
chin position determination means for determining, as the chin position, the position of the pixel corresponding to the height having the largest edge strength.
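The projection, edge strength, and maximum selection steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how the projection is aggregated or how edge strength is computed, so the sketch assumes a per-row mean of the remaining pixels and the absolute difference between adjacent rows. The function name `detect_chin_row` and the use of `None` for removed (non-skin color) pixels are hypothetical.

```python
from typing import List, Optional, Sequence


def detect_chin_row(region: Sequence[Sequence[Optional[float]]]) -> int:
    """Return the row index (height) with the largest edge strength.

    `region` is the chin range as rows of pixel values; removed
    (non-skin color) pixels are represented by None (an assumption).
    """
    # Projection: one value per height, here the mean of the kept pixels.
    profile: List[Optional[float]] = []
    for row in region:
        kept = [v for v in row if v is not None]
        profile.append(sum(kept) / len(kept) if kept else None)

    # Edge strength per height: absolute difference to the row above
    # (an assumed definition; the source does not specify one).
    best_row, best_strength = -1, -1.0
    for y in range(1, len(profile)):
        above, here = profile[y - 1], profile[y]
        if above is None or here is None:
            continue  # no usable pixels at one of the two heights
        strength = abs(here - above)
        if strength > best_strength:
            best_row, best_strength = y, strength

    if best_row < 0:
        raise ValueError("chin range contains no adjacent usable rows")
    return best_row
```

For a region whose rows change from bright chin pixels to darker neck pixels, the function returns the first row of the darker band.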

The image processing method of the present invention may be provided as a program for causing a computer to execute it.

According to the first image processing method and apparatus of the present invention, the approximate position and size of a face are detected from a face photograph image, and a chin range consisting of the chin and a region near the chin is estimated from the detected position and size. The chin position is then detected within the estimated chin range. Even when the approximate position and size of the face are known, the chin position cannot be estimated accurately because it differs between individuals, but an approximate chin range can be estimated. By detecting the chin position within the range estimated in this way, an accurate chin position can be obtained despite individual differences.

According to the second image processing method and apparatus of the present invention, the positions of both eyes in a face photograph image are acquired, and the chin range is estimated from the acquired eye positions. Because the chin position is then detected within the estimated range, the same effect as that of the first image processing method and apparatus can be obtained.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of an image processing system A according to a first embodiment of the present invention. The image processing system A detects the chin position from a face photograph image (hereinafter simply, a photographic image) S0. The chin position detection process is realized by executing, on a computer (for example, a personal computer), a processing program read into an auxiliary storage device. The processing program is stored on an information storage medium such as a CD-ROM, or distributed via a network such as the Internet, and installed on the computer.

As illustrated, the image processing system A of this embodiment comprises: an image input unit 10 that inputs the photographic image S0; a face detection unit 20 that detects the approximate position and size of the face in the photographic image input by the image input unit 10 and obtains an image of the face portion (hereinafter, a face image) S1; an eye detection unit 30 that detects the position of each eye from the face image S1; a database 40a that stores reference data E1 and E2, described later, used by the face detection unit 20 and the eye detection unit 30; a skin color pixel detection unit 50 that detects skin color pixels from the face image S1; a mouth detection unit 60 that detects the position of the mouth based on the detection result of the skin color pixel detection unit 50; a chin range estimation unit 70a that estimates the chin range based on the positions of both eyes detected by the eye detection unit 30 and the position of the mouth detected by the mouth detection unit 60; a non-skin color pixel removal unit 80 that, based on the detection result of the skin color pixel detection unit 50, removes the pixels that are not skin color pixels from the chin range estimated by the chin range estimation unit 70a; a chin position detection unit 90 that detects the chin position within the chin range from which the non-skin color pixels have been removed; and an output unit 100 that outputs information indicating the chin position obtained by the chin position detection unit 90 for use in processing such as trimming.

The image input unit 10 inputs the photographic image S0 to be processed into the image processing system A of this embodiment. It may be, for example, a receiving unit that receives a photographic image S0 transmitted over a network, a reading unit that reads the photographic image S0 from a recording medium such as a CD-ROM, or a scanner that obtains the photographic image S0 by photoelectrically reading an image printed on a print medium such as paper or photographic print paper.

FIG. 2 is a block diagram showing the configuration of the face detection unit 20 in the image processing system A shown in FIG. 1. The face detection unit 20 detects the approximate position and size of the face in the photographic image S0 and extracts the image of the region indicated by that position and size from the photographic image S0 to obtain the face image S1. As shown in FIG. 2, it comprises a first feature amount calculation unit 22 that calculates a feature amount C0 from the photographic image S0, and a face detection execution unit 24 that performs face detection using the feature amount C0 and the reference data E1 stored in the database 40a. The reference data E1 stored in the database 40a and the components of the face detection unit 20 are described in detail below.

The first feature amount calculation unit 22 of the face detection unit 20 calculates, from the photographic image S0, the feature amount C0 used for face identification. Specifically, it calculates a gradient vector (that is, the direction and magnitude of the change in density at each pixel of the photographic image S0) as the feature amount C0. The calculation of the gradient vector is as follows. First, the first feature amount calculation unit 22 filters the photographic image S0 with the horizontal edge detection filter shown in FIG. 5(a) to detect horizontal edges in the photographic image S0. It also filters the photographic image S0 with the vertical edge detection filter shown in FIG. 5(b) to detect vertical edges in the photographic image S0. Then, as shown in FIG. 6, it calculates the gradient vector K at each pixel from the horizontal edge magnitude H and the vertical edge magnitude V at that pixel.

For a human face such as that shown in FIG. 7(a), the gradient vectors K calculated in this way point, as shown in FIG. 7(b), toward the centers of the eyes and mouth in dark areas such as the eyes and mouth, and point outward from the position of the nose in bright areas such as the nose. Because the change in density is larger at the eyes than at the mouth, the gradient vectors K are larger at the eyes than at the mouth.

The direction and magnitude of this gradient vector K are used as the feature amount C0. The direction of the gradient vector K is a value from 0 to 359 degrees measured from a predetermined direction (for example, the x direction in FIG. 6).
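A rough sketch of computing the gradient vector K at one pixel from H and V is given below. The filter kernels of FIG. 5 are not reproduced in this text, so the sketch assumes simple central differences for both filters; the name `gradient_vector`, the treatment of the image as a list of rows of density values, and the choice of "up" as the positive vertical direction are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def gradient_vector(density: Sequence[Sequence[float]], x: int, y: int) -> Tuple[int, float]:
    """Return (direction in degrees 0..359, magnitude) at interior pixel (x, y)."""
    # Horizontal edge magnitude H: central difference along x (assumed kernel).
    h = density[y][x + 1] - density[y][x - 1]
    # Vertical edge magnitude V: central difference along y, with "up"
    # taken as positive (assumed sign convention).
    v = density[y - 1][x] - density[y + 1][x]

    magnitude = math.hypot(h, v)
    # Angle measured from the x direction, wrapped into 0..359 degrees.
    direction = int(round(math.degrees(math.atan2(v, h)))) % 360
    return direction, magnitude
```

With these conventions, a density that increases to the right gives a direction of 0 degrees, and one that increases upward gives 90 degrees.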

The magnitude of the gradient vector K is normalized. The normalization is performed by obtaining a histogram of the magnitudes of the gradient vectors K over all pixels of the photographic image S0 and correcting the magnitudes by smoothing the histogram so that the magnitudes are distributed uniformly over the values each pixel of the photographic image S0 can take (0 to 255 for 8 bits). For example, when the magnitudes of the gradient vectors K are small and the histogram is skewed toward small magnitudes as shown in FIG. 8(a), the magnitudes are normalized so that they span the full range of 0 to 255 and the histogram is distributed as shown in FIG. 8(b). To reduce the amount of computation, it is preferable to divide the distribution range of the histogram of the gradient vectors K into, for example, five sections as shown in FIG. 8(c), and to normalize so that the five frequency sections span the five equal divisions of the range 0 to 255 as shown in FIG. 8(d).
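The normalization described above resembles ordinary histogram equalization, and a minimal sketch under that assumption follows. The function name `equalize_magnitudes` is hypothetical, magnitudes are assumed to be integers, and the five-section speed-up of FIG. 8(c) and 8(d) is not implemented here.

```python
from collections import Counter
from typing import List, Sequence


def equalize_magnitudes(mags: Sequence[int], levels: int = 256) -> List[int]:
    """Spread gradient magnitudes over 0..levels-1 using their cumulative histogram."""
    if not mags:
        return []
    counts = Counter(mags)

    # Cumulative distribution: number of pixels with magnitude <= value.
    cdf = {}
    running = 0
    for value in sorted(counts):
        running += counts[value]
        cdf[value] = running

    cdf_min = min(cdf.values())
    span = len(mags) - cdf_min
    if span == 0:
        # All magnitudes are identical; nothing to spread.
        return [0 for _ in mags]

    # Standard equalization mapping, rounded to the nearest level.
    return [round((cdf[m] - cdf_min) / span * (levels - 1)) for m in mags]
```

Magnitudes clustered at the low end, as in FIG. 8(a), are stretched so that the smallest maps to 0 and the largest to 255.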

The reference data E1 stored in the database 40a defines, for each of a plurality of types of pixel groups, each consisting of a combination of pixels selected from the sample images described later, an identification condition for the combination of the feature amounts C0 at the pixels making up that pixel group.

The combinations of the feature amounts C0 at the pixels of each pixel group and the identification conditions in the reference data E1 are determined in advance by learning from a sample image group consisting of a plurality of sample images known to be faces and a plurality of sample images known not to be faces.

In this embodiment, the sample images known to be faces that are used to generate the reference data E1 have a size of 30 × 30 pixels. As shown in FIG. 9, for each face image, the distance between the centers of the eyes is set to 10, 9, and 11 pixels, and for each of these distances the upright face is rotated in the plane in 3-degree steps within a range of ±15 degrees (that is, the rotation angles are -15, -12, -9, -6, -3, 0, 3, 6, 9, 12, and 15 degrees). Thus 3 × 11 = 33 sample images are prepared for each face image. FIG. 9 shows only the sample images rotated by -15, 0, and +15 degrees. The center of rotation is the intersection of the diagonals of the sample image. In the sample images with a 10-pixel distance between the eye centers, the eye center positions are all the same. These eye center positions are denoted (x1, y1) and (x2, y2) in coordinates whose origin is the upper left corner of the sample image. The vertical positions of the eyes in the drawing (that is, y1 and y2) are the same in all sample images.
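The count of 33 variants per face follows directly from the three eye distances and eleven rotation angles. As a small sketch (the names `EYE_DISTANCES`, `ROTATIONS`, and `sample_variants` are hypothetical), the variants could be enumerated like this:

```python
from itertools import product
from typing import List, Tuple

# Distances between the eye centers, in pixels, as listed in the text.
EYE_DISTANCES = (9, 10, 11)
# In-plane rotation angles: -15 to +15 degrees in 3-degree steps.
ROTATIONS = tuple(range(-15, 16, 3))


def sample_variants() -> List[Tuple[int, int]]:
    """Return every (eye distance, rotation angle) pair for one face image."""
    return list(product(EYE_DISTANCES, ROTATIONS))
```

Each pair would correspond to one 30 × 30 sample image derived from the same face.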

As the sample images known not to be faces, arbitrary images having a size of 30 × 30 pixels are used.

Suppose that learning were performed using, as sample images known to be faces, only images in which the distance between the eye centers is 10 pixels and the in-plane rotation angle is 0 degrees (that is, the face is upright). In that case, the only faces identified as faces by referring to the reference data E1 would be those with an eye distance of exactly 10 pixels and no rotation at all. Since the size of a face that may be contained in the photographic image S0 is not fixed, when identifying whether a face is present the photographic image S0 is scaled, as described later, so that a face whose size matches the sample image size can be located. However, to bring the distance between the eye centers to exactly 10 pixels, identification would have to be performed while scaling the photographic image S0 in fine steps, for example by a factor of 1.1 per step, and the amount of computation would become enormous.

Furthermore, a face that may be contained in the photographic image S0 does not necessarily have an in-plane rotation angle of 0 degrees as shown in FIG. 11(a); it may be rotated as shown in FIGS. 11(b) and 11(c). If learning were performed using only sample images with a 10-pixel distance between the eye centers and a rotation angle of 0 degrees, rotated faces such as those shown in FIGS. 11(b) and 11(c) could not be identified even though they are faces.

For this reason, in this embodiment, the sample images known to be faces are, as shown in FIG. 9, images in which the distance between the eye centers is 9, 10, or 11 pixels and in which, for each distance, the face is rotated in the plane in 3-degree steps within a range of ±15 degrees, so that the learning of the reference data E1 has some tolerance. As a result, when the face detection execution unit 24 described later performs identification, the photographic image S0 need only be scaled in steps of a factor of 11/9, which reduces the computation time compared with scaling in steps of, for example, 1.1. Rotated faces such as those shown in FIGS. 11(b) and 11(c) can also be identified.
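To illustrate why a coarser step reduces computation, the sketch below lists the image sizes that would be scanned when the photographic image is shrunk repeatedly by the step factor until it is smaller than the 30-pixel sample size. The function name `pyramid_sizes`, the choice to start at the original size and only shrink, and the stopping rule are assumptions for illustration; the source only states the step factors.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, step: float = 11 / 9,
                  window: int = 30) -> List[Tuple[int, int]]:
    """Return the (width, height) of each scaled copy of the image to scan."""
    sizes = []
    scale = 1.0
    # Keep shrinking while a window-sized sample still fits in the image.
    while min(width, height) * scale >= window:
        sizes.append((round(width * scale), round(height * scale)))
        scale /= step
    return sizes
```

For a 90 × 60 image, a step of 11/9 yields about half as many scaled copies as a step of 1.1, since the classifier tolerates eye distances from 9 to 11 pixels at each scale.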

An example of a method for learning the sample image group is described below with reference to the flowchart of FIG. 12.

The creation of one classifier is described with reference to FIG. 13. As shown in the sample images on the left side of FIG. 13, the pixels making up the pixel group for this classifier are, on the plurality of sample images known to be faces, a pixel P1 at the center of the right eye, a pixel P2 on the right cheek, a pixel P3 on the forehead, and a pixel P4 on the left cheek. The combination of the feature amounts C0 at the pixels P1 to P4 is obtained for every sample image known to be a face, and a histogram of these combinations is created. The feature amount C0 represents the direction and magnitude of the gradient vector K. Since the direction can take 360 values (0 to 359) and the magnitude 256 values (0 to 255), using them as they are would give 360 × 256 possibilities per pixel, and therefore (360 × 256)⁴ combinations for the four pixels, which would require an enormous number of samples, time, and memory for learning and detection. In this embodiment, therefore, the direction of the gradient vector is quantized into four values: 0 to 44 and 315 to 359 (rightward, value 0), 45 to 134 (upward, value 1), 135 to 224 (leftward, value 2), and 225 to 314 (downward, value 3). The magnitude of the gradient vector is quantized into three values (0 to 2). The combination value is then calculated using the following expressions.

組み合わせの値＝０（勾配ベクトルの大きさ＝０の場合）
組み合わせの値＝（（勾配ベクトルの方向＋１）×勾配ベクトルの大きさ（勾配ベクトルの大きさ＞０の場合）
これにより、組み合わせ数が９⁴通りとなるため、特徴量Ｃ０のデータ数を低減できる。 Combination value = 0 (when gradient vector size = 0)
Combination value = ((gradient vector direction + 1) × gradient vector magnitude (gradient vector magnitude> 0)
Thus, since the number of combinations is nine patterns ^4, can reduce the number of data of the characteristic amounts C0.
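The quantization and the combination value described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, the formulas are assumed to operate on the quantized direction and magnitude, and the magnitude thresholds `low` and `high` are assumptions because the text does not state how the magnitude is divided into three levels.

```python
def quantize_direction(angle_deg):
    """Map a gradient direction in degrees (0-359) to one of four values."""
    a = int(angle_deg) % 360
    if a <= 44 or a >= 315:
        return 0  # rightward
    if a <= 134:
        return 1  # upward
    if a <= 224:
        return 2  # leftward
    return 3      # downward


def quantize_magnitude(mag, low=85, high=170):
    """Map a gradient magnitude (0-255) to 0, 1 or 2.

    The thresholds `low` and `high` are assumptions; the text only says
    that the magnitude is reduced to three values.
    """
    if mag < low:
        return 0
    if mag < high:
        return 1
    return 2


def combination_value(angle_deg, mag):
    """Combination value following the two formulas given in the text."""
    m = quantize_magnitude(mag)
    if m == 0:
        return 0
    return (quantize_direction(angle_deg) + 1) * m
```

Under this reading each pixel has nine quantized states (one zero-magnitude state plus 4 directions × 2 nonzero magnitudes), which agrees with the 9^4 figure for four pixels. Note that the product form assigns the same value to some different states (direction 1 with magnitude 1 and direction 0 with magnitude 2 both give 2), so an implementation might key its histogram on the (direction, magnitude) pair instead; the text does not specify this.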

Similarly, a histogram is created for a plurality of sample images known not to be faces. For these non-face sample images, the pixels at the positions corresponding to the pixels P1 to P4 on the face sample images are used. The histogram used as the classifier, shown on the far right of FIG. 13, is obtained by taking the logarithm of the ratio of the frequency values of these two histograms. The value on the vertical axis of this classifier histogram is hereinafter referred to as an identification point. With this classifier, an image whose feature amounts C0 correspond to a positive identification point is likely to be a face, and the likelihood increases with the absolute value of the identification point. Conversely, an image whose feature amounts C0 correspond to a negative identification point is likely not to be a face, and again the likelihood increases with the absolute value of the identification point. In step S2, a plurality of classifiers in this histogram form are created for the combinations of the feature amounts C0 at the pixels constituting each of the plural types of pixel groups that may be used for identification.
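A histogram-form classifier of this kind can be sketched as a table that maps each observed combination to the logarithm of the ratio of its frequency in face samples to its frequency in non-face samples. The function name `build_classifier`, the normalization of each histogram by its sample count, and the small constant `eps` used for empty bins are all assumptions made for the illustration; the text does not describe how empty bins are handled.

```python
import math
from collections import Counter


def build_classifier(face_keys, nonface_keys, eps=1e-6):
    """Return {key: identification point}.

    Each key is the tuple of combination values at the pixels of one
    pixel group (for example P1 to P4). The identification point is
    log(frequency among faces / frequency among non-faces).
    `eps` avoids log(0) for combinations seen in only one group.
    """
    face_hist = Counter(face_keys)
    nonface_hist = Counter(nonface_keys)
    n_face = len(face_keys)
    n_nonface = len(nonface_keys)
    points = {}
    for key in set(face_hist) | set(nonface_hist):
        p_face = face_hist[key] / n_face + eps
        p_nonface = nonface_hist[key] / n_nonface + eps
        points[key] = math.log(p_face / p_nonface)
    return points


# Toy data: one combination is common in faces, the other in non-faces.
faces = [(1, 2, 3, 4)] * 3 + [(0, 0, 0, 0)]
nonfaces = [(0, 0, 0, 0)] * 3 + [(1, 2, 3, 4)]
points = build_classifier(faces, nonfaces)
```

In this toy example the combination that is more frequent among faces receives a positive identification point and the other a negative one, which is the behavior the paragraph describes.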

Next, the classifier most effective for identifying whether an image is a face is selected from the classifiers created in step S2. The selection takes the weight of each sample image into account. In this example, the weighted correct answer rates of the classifiers are compared, and the classifier with the highest weighted correct answer rate is selected (S3). In the first pass through step S3, every sample image has the same weight of 1, so the classifier that correctly identifies the largest number of sample images as face or non-face is simply selected as the most effective. In the second and later passes through step S3, after the weights have been updated in step S5 described below, sample images with weights equal to 1, greater than 1, and less than 1 are mixed, and a sample image with a weight greater than 1 counts for correspondingly more in the evaluation of the correct answer rate than a sample image with a weight of 1. As a result, from the second pass onward, more emphasis is placed on correctly identifying sample images with large weights than those with small weights.
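The selection in step S3 can be illustrated with a short sketch. Here each classifier is represented only by its face/non-face votes on the sample images; the function names and this data layout are assumptions made for the example.

```python
def weighted_correct_rate(predictions, labels, weights):
    """Share of the total weight carried by correctly identified samples."""
    correct = sum(w for p, y, w in zip(predictions, labels, weights) if p == y)
    return correct / sum(weights)


def select_best(all_predictions, labels, weights, excluded=()):
    """Return (index, rate) of the classifier with the highest weighted rate."""
    best_index, best_rate = None, -1.0
    for i, predictions in enumerate(all_predictions):
        if i in excluded:
            continue
        rate = weighted_correct_rate(predictions, labels, weights)
        if rate > best_rate:
            best_index, best_rate = i, rate
    return best_index, best_rate


labels = [True, True, False, False]          # True means "face"
votes = [
    [True, True, True, False],               # classifier 0: wrong on sample 2
    [False, False, False, False],            # classifier 1: wrong on samples 0 and 1
]
first = select_best(votes, labels, [1, 1, 1, 1])
later = select_best(votes, labels, [1, 1, 5, 1])
```

With equal weights classifier 0 wins because it gets more samples right; once sample 2 carries a weight of 5, classifier 1 wins because it is the one that identifies that sample correctly.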

Next, it is checked whether the correct answer rate of the combination of classifiers selected so far exceeds a predetermined threshold (S4). This rate is the proportion of sample images for which the result of identifying face or non-face using the selected classifiers in combination matches the actual answer. Either the sample image group with the current weights or the sample image group with equalized weights may be used to evaluate it. If the threshold is exceeded, the classifiers selected so far can identify whether an image is a face with sufficiently high probability, and the learning ends. If the rate is at or below the threshold, the process advances to step S6 to select an additional classifier to be used in combination with those selected so far.

In step S6, the classifier selected in the most recent step S3 is excluded so that it is not selected again.

Next, the weights of the sample images that the classifier selected in the most recent step S3 could not correctly identify as face or non-face are increased, and the weights of the sample images that it identified correctly are decreased (S5). The weights are adjusted in this way so that, in selecting the next classifier, importance is placed on the images that the already selected classifiers could not identify correctly. A classifier that can correctly identify those images is then selected, which increases the effect of combining classifiers.

The process then returns to step S3, and the next most effective classifier is selected based on the weighted correct answer rate as described above.

Steps S3 to S6 are repeated, and classifiers corresponding to the combinations of the feature amounts C0 at the pixels constituting specific pixel groups are selected as classifiers suitable for identifying whether a face is included. When the correct answer rate checked in step S4 exceeds the threshold, the types of classifiers and the identification conditions used for identifying whether a face is included are fixed (S7), and the learning of the reference data E1 ends.
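The loop through steps S3 to S7 can be sketched as below. This is a simplified illustration under several assumptions that are not in the text: each classifier is represented by its identification points on the sample images, the combination is evaluated with equal weights by the sign of the summed points, and the threshold and the weight factors `up` and `down` are arbitrary illustrative values.

```python
def learn(all_scores, labels, threshold=0.95, up=2.0, down=0.5):
    """all_scores[i][j]: identification point of classifier i on sample j.
    labels[j]: True if sample j is a face. Returns selected classifier indices."""
    weights = [1.0] * len(labels)
    excluded, selected = set(), []
    while len(excluded) < len(all_scores):
        # S3: choose the classifier with the highest weighted correct answer rate.
        best, best_rate = None, -1.0
        for i, scores in enumerate(all_scores):
            if i in excluded:
                continue
            correct = sum(w for s, y, w in zip(scores, labels, weights)
                          if (s > 0) == y)
            rate = correct / sum(weights)
            if rate > best_rate:
                best, best_rate = i, rate
        selected.append(best)
        # S4: correct answer rate of the combination selected so far.
        totals = [sum(all_scores[i][j] for i in selected)
                  for j in range(len(labels))]
        combined = sum((t > 0) == y for t, y in zip(totals, labels)) / len(labels)
        if combined > threshold:
            break  # S7: the selected classifiers are fixed
        # S6: exclude the classifier just selected.
        excluded.add(best)
        # S5: raise the weights of misidentified samples, lower the others.
        weights = [w * (down if (s > 0) == y else up)
                   for s, y, w in zip(all_scores[best], labels, weights)]
    return selected


labels = [True, True, False, False]
scores = [
    [1.0, 1.0, 0.5, -1.0],    # wrong on sample 2
    [0.2, -0.2, -2.0, -0.2],  # wrong on sample 1
    [-1.0, -1.0, 1.0, 1.0],   # wrong everywhere
]
chosen = learn(scores, labels)
```

In this toy run classifier 0 is selected first, its error on sample 2 raises that sample's weight, and classifier 1 is then selected because it handles sample 2 correctly; together they identify every sample, so the loop stops.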

When the above learning method is adopted, the classifier is not limited to the histogram form described above and may take any form that provides a criterion for distinguishing face images from non-face images using the combination of the feature amounts C0 at the pixels constituting a specific pixel group; for example, it may be binary data, a threshold, or a function. Even in histogram form, a histogram showing the distribution of the difference values between the two histograms shown in the center of FIG. 13 may be used instead.

The learning method is also not limited to the above; other machine learning techniques such as neural networks may be used.

The face detection execution unit 24 refers to the identification conditions that the reference data E1 has learned for all combinations of the feature amounts C0 at the pixels constituting the plural types of pixel groups, obtains an identification point for the combination of the feature amounts C0 at the pixels constituting each pixel group, and detects a face by combining all the identification points. At this time, the direction of the gradient vector K serving as the feature amount C0 is quantized into four values and its magnitude into three values. In this embodiment, all the identification points are added, and a face is detected according to the sign and magnitude of the sum. For example, if the sum of the identification points is positive, the image is judged to be a face, and if it is negative, the image is judged not to be a face.
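The decision rule (add all identification points and look at the sign of the sum) can be sketched as follows. The data layout, the name `is_face`, and the choice to score an unseen combination as 0 are assumptions made for the illustration.

```python
def is_face(classifiers, window_features):
    """classifiers: list of (pixel_group_id, {combination: identification point}).
    window_features: {pixel_group_id: combination observed in the window}.
    Returns (decision, total); combinations not in a table contribute 0."""
    total = sum(points.get(window_features[group], 0.0)
                for group, points in classifiers)
    return total > 0, total


classifiers = [
    ("group1", {(1, 2, 3, 4): 1.5, (0, 0, 0, 0): -2.0}),
    ("group2", {(0, 0, 0, 0): -0.4}),
]
decision, total = is_face(classifiers, {"group1": (1, 2, 3, 4),
                                        "group2": (0, 0, 0, 0)})
```

Here the positive point from the first pixel group outweighs the negative point from the second, so the window is judged to be a face.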

The size of the photographic image S0, unlike the 30 × 30 pixel sample images, may vary. In addition, when a face is included, its rotation angle in the image plane is not necessarily 0 degrees. Therefore, as shown in FIG. 14, the face detection execution unit 24 enlarges or reduces the photographic image S0 stepwise until its vertical or horizontal size reaches 30 pixels, while also rotating it stepwise through 360 degrees in the plane (FIG. 14 shows the case of reduction). At each stage, a mask M of 30 × 30 pixels is set on the scaled photographic image S0 and moved one pixel at a time over it, and the unit identifies whether the image inside the mask is a face image (that is, whether the sum of the identification points obtained for the image inside the mask is positive or negative). This identification is performed for the photographic image S0 at every stage of scaling and rotation. From the photographic image S0 at the size and rotation angle that yielded the highest positive sum of identification points, the 30 × 30 pixel region corresponding to the position of the identified mask M is detected as the face region, and the image of this region is extracted from the photographic image S0 as the face image S1.

Because the sample images used to learn the reference data E1 have distances of 9, 10, and 11 pixels between the centers of the two eyes, the scaling factor used when enlarging or reducing the photographic image S0 may be 11/9. Likewise, because the sample images have faces rotated within a range of ±15 degrees in the plane, the photographic image S0 may be rotated through 360 degrees in steps of 30 degrees.
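The search schedule implied by these figures can be sketched as below: rotation in 30-degree steps gives 12 orientations, and each reduction step divides the image size by 11/9. The function names are hypothetical, and the sketch covers only the reduction case, stopping once the shorter side would fall below the 30-pixel mask; both choices are assumptions for the illustration.

```python
def rotation_steps(step_deg=30):
    """Rotation angles that cover 360 degrees in steps of step_deg."""
    return list(range(0, 360, step_deg))


def scale_steps(width, height, mask=30, ratio=11 / 9):
    """Image sizes visited while reducing by `ratio` per step,
    as long as the shorter side is still at least `mask` pixels."""
    sizes = []
    w, h = float(width), float(height)
    while min(w, h) >= mask:
        sizes.append((round(w), round(h)))
        w, h = w / ratio, h / ratio
    return sizes


angles = rotation_steps()
sizes = scale_steps(90, 60)
```

For a 90 × 60 image this yields four sizes, from 90 × 60 down to 49 × 33, each of which would be scanned with the 30 × 30 mask at all 12 rotation angles.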

The first feature amount calculation unit 22 calculates the feature amount C0 at each stage of the deformation of the photographic image S0, that is, at each stage of scaling and rotation.

In this way, the face detection unit 20 detects the approximate position and size of the face from the photographic image S0 and obtains the face image S1.

FIG. 3 is a block diagram showing the configuration of the eye detection unit 30. The eye detection unit 30 detects the positions of both eyes from the face image S1 obtained by the face detection unit 20. As illustrated, it comprises a second feature amount calculation unit 32 that calculates the feature amount C0 from the face image S1, and an eye detection execution unit 34 that detects the eye positions based on the feature amount C0 and the reference data E2 stored in the database 40a.

In this embodiment, the eye position identified by the eye detection execution unit 34 is the center position between the outer corner and the inner corner of the eye (indicated by × in FIG. 4). For an eye looking straight ahead, as shown in FIG. 4(a), this coincides with the center of the pupil. For an eye looking to the right, as shown in FIG. 4(b), it is not the center of the pupil but a position off the pupil center or in the white of the eye.

The second feature amount calculation unit 32 is the same as the first feature amount calculation unit 22 of the face detection unit 20 shown in FIG. 2, except that it calculates the feature amount C0 from the face image S1 rather than from the photographic image S0. Its detailed description is therefore omitted here.

The second reference data E2 stored in the database 40a, like the first reference data E1, defines, for each of plural types of pixel groups each consisting of a combination of a plurality of pixels selected from the sample images described below, identification conditions for the combination of the feature amounts C0 at the pixels constituting the pixel group.

The second reference data E2 is learned using sample images in which, as shown in FIG. 9, the distance between the centers of the two eyes is 9.7, 10, or 10.3 pixels and, at each distance, the face is rotated stepwise in 1-degree increments within a range of ±3 degrees in the plane. The learning tolerance is therefore smaller than for the first reference data E1, and the eye positions can be detected accurately. The learning for obtaining the second reference data E2 is the same as that for the first reference data E1 except for the sample image group used, so its detailed description is omitted here.

On the face image S1 obtained by the face detection unit 20, the eye detection execution unit 34 refers to the identification conditions that the second reference data E2 has learned for all combinations of the feature amounts C0 at the pixels constituting the plural types of pixel groups, obtains an identification point for the combination of the feature amounts C0 at the pixels constituting each pixel group, and identifies the positions of the eyes in the face by combining all the identification points. At this time, the direction of the gradient vector K serving as the feature amount C0 is quantized into four values and its magnitude into three values.

The eye detection execution unit 34 enlarges or reduces the face image S1 obtained by the face detection unit 20 stepwise while rotating it stepwise through 360 degrees in the plane. At each stage, it sets a mask M of 30 × 30 pixels on the scaled face image and moves the mask M one pixel at a time over the scaled face while detecting the eye positions in the image inside the mask.

Because the sample images used to learn the second reference data E2 have distances of 9.7, 10, and 10.3 pixels between the centers of the two eyes, the scaling factor used when enlarging or reducing the face image S1 may be 10.3/9.7. Likewise, because the sample images have faces rotated within a range of ±3 degrees in the plane, the face image may be rotated through 360 degrees in steps of 6 degrees.

The second feature amount calculation unit 32 calculates the feature amount C0 at each stage of the deformation of the face image S1, that is, at each stage of scaling and rotation.

In this embodiment, all the identification points are added at every stage of the deformation of the face image S1. In the image inside the 30 × 30 pixel mask M at the deformation stage with the largest sum, a coordinate system with its origin at the upper left corner is set, the positions corresponding to the eye-position coordinates (x1, y1) and (x2, y2) in the sample images are found, and the positions in the undeformed face image S1 that correspond to these positions are detected as the eye positions.

In this way, the eye detection unit 30 detects the position of each eye from the face image S1 obtained by the face detection unit 20.

The skin color pixel detection unit 50 detects skin color pixels from the face image S1 obtained by the face detection unit 20, and any existing method may be used for this detection. For example, letting R, G, and B denote the R, G, and B values, a skin color range may be set in advance in the two-dimensional plane whose coordinate axes are r and g, where r = R/(R + G + B) and g = G/(R + G + B), and pixels whose color falls within this range may be detected as skin color pixels.
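The chromaticity test described here can be sketched in a few lines. The rectangular range given by `r_range` and `g_range` is a placeholder chosen only for illustration; the text says a skin color range is set in advance but gives no values, and the function names are hypothetical.

```python
def is_skin(R, G, B, r_range=(0.35, 0.55), g_range=(0.25, 0.37)):
    """Test whether (r, g) = (R/(R+G+B), G/(R+G+B)) lies in a preset range.

    The default ranges are assumptions, not values from the source.
    """
    total = R + G + B
    if total == 0:
        return False  # black pixel: chromaticity is undefined
    r, g = R / total, G / total
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]


def skin_mask(image):
    """image: rows of (R, G, B) tuples. Returns rows of booleans."""
    return [[is_skin(*pixel) for pixel in row] for row in image]


mask = skin_mask([[(200, 150, 120), (0, 0, 255)],
                  [(0, 0, 0), (210, 160, 130)]])
```

Because r and g are normalized by R + G + B, the test depends on the color balance of a pixel rather than on its overall brightness.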

The mouth detection unit 60 detects the position of the mouth in the vertical direction of the face (the direction perpendicular to the direction in which the eyes are aligned) from the face image S1, based on the detection result of the skin color pixel detection unit 50. Any existing method may be used for this detection. For example, the redness intensity is first calculated for each pixel of the face image S1 that was not detected as a skin color pixel by the skin color pixel detection unit 50 (hereinafter referred to as a non-skin color pixel). The redness intensities of the non-skin color pixels are then accumulated along the horizontal direction of the face (the direction perpendicular to the vertical direction) to obtain a cumulative value for each height in the vertical direction of the face. The height with the highest cumulative value is detected as the vertical position of the mouth.
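A minimal sketch of this row-wise accumulation follows. It assumes the face is upright so that image rows correspond to heights of the face, and it uses an illustrative redness measure (how far R exceeds the mean of G and B); the text does not define the redness intensity, so both the measure and the function names are assumptions.

```python
def redness(R, G, B):
    """Assumed redness measure: excess of red over the mean of green and blue."""
    return max(0.0, R - (G + B) / 2)


def mouth_row(image, skin):
    """image: rows of (R, G, B); skin: rows of booleans (True = skin color).
    Returns the index of the row with the largest accumulated redness
    over its non-skin color pixels."""
    sums = [sum(redness(*pixel) for pixel, is_skin in zip(row, skin_row)
                if not is_skin)
            for row, skin_row in zip(image, skin)]
    return max(range(len(sums)), key=sums.__getitem__)


image = [
    [(200, 150, 120), (200, 150, 120)],  # cheek
    [(180, 40, 50), (190, 50, 60)],      # lips
    [(30, 30, 30), (200, 150, 120)],     # shadow and skin
]
skin = [[True, True], [False, False], [False, True]]
row = mouth_row(image, skin)
```

In this toy image the lip row accumulates the most redness among the non-skin color pixels, so it is returned as the mouth height.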

The jaw range estimation unit 70a estimates the jaw range based on the positions of both eyes detected by the eye detection unit 30 and the position of the mouth detected by the mouth detection unit 60. The jaw range is estimated so that the jaw position can be detected accurately and quickly, and it is desirable to estimate as the jaw range a region that contains the jaw while excluding, as far as possible, other parts of the face and items such as clothing around the neck. An estimation method that determines only the upper limit of the jaw range (its uppermost position in the vertical direction of the face), for example by taking the region below the center of the face or below the eyes as the jaw range, can exclude parts such as the top of the head and the eyes. To exclude clothing around the neck as far as possible, however, it is preferable to also determine the lower limit of the jaw range (its lowermost position in the vertical direction of the face). Furthermore, because the jaw angles on the left and right sides of the lower face form edges, it is preferable to exclude them from the jaw range as well in order to detect the jaw position more accurately. The jaw range estimation unit 70a is described below with reference to FIG. 15.

The jaw range estimation unit 70a in the image processing system A of this embodiment estimates the jaw range as follows.

1. First, the distance D between the eyes is obtained from the positions of both eyes (indicated by points A1 and A2 in FIG. 15), and a temporary jaw position is estimated based on the eye positions and the distance D.

The size of the human face differs from person to person, but apart from exceptional cases, the size of the face (its width and height) is correlated with the distance between the eyes, and so is the distance from the eyes to the jaw (the vertical distance from the center position between the eyes to the jaw). Focusing on this point, the jaw range estimation unit 70a of this embodiment first estimates the temporary jaw position by estimating the distance L from the eyes to the jaw according to the following equation (1).

L = u × D  (1)
where L: estimated distance from the eyes to the jaw
D: distance between the eyes
u: 2.170

The coefficient u was obtained using a large number of sample face photograph images.

On the center line between the eyes, the jaw range estimation unit 70a estimates as the temporary jaw position the point below the eyes (point C in the figure, as an example) whose distance from the center position between the eyes (point A in the figure) is the distance L obtained according to equation (1).

2. Next, the jaw range estimation unit 70a estimates a frame indicating the jaw range centered on the temporary jaw position C. Specifically, the distance D between the eyes is first taken as the width of the frame. Then, on the center line between the eyes, a point that lies below the position corresponding to the mouth height detected by the mouth detection unit 60 (point B in the figure), between point B and the temporary jaw position C and closer to point B, is taken as the center of the upper side of the frame. In this embodiment, this is the point B1 in the figure, whose distance from point B is 1/5 of the distance d between point B and the temporary jaw position C. That is, the jaw range estimation unit 70a estimates as the jaw range a frame (the hatched portion in the figure) centered on the temporary jaw position C, with a height of (2 × d)/5 and a width of D.

In this way, the jaw range estimation unit 70a estimates the jaw range and outputs information indicating its position and size to the non-skin color pixel removal unit 80.
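Steps 1 and 2 can be sketched geometrically as follows. The function name, the argument `mouth_dist` (distance from point A to point B along the center line), the image coordinate convention (y grows downward), and the sorting of the eyes by x coordinate are assumptions made for the illustration. Taking the construction literally (upper side through B1, frame centered on C) gives a height of 2 × (4d/5) = 8d/5 rather than the (2 × d)/5 stated in the text, so the sketch returns the height implied by the construction and leaves `top_fraction` adjustable.

```python
import math


def estimate_jaw_range(eye1, eye2, mouth_dist, u=2.170, top_fraction=0.2):
    """eye1, eye2: (x, y) eye positions A1 and A2 in image coordinates.
    mouth_dist: distance from the eye midpoint A to the mouth point B.
    Returns the temporary jaw position C, the point B1, and the frame size."""
    (x1, y1), (x2, y2) = sorted([eye1, eye2])   # left-most eye first
    D = math.hypot(x2 - x1, y2 - y1)            # distance between the eyes
    A = ((x1 + x2) / 2, (y1 + y2) / 2)          # center between the eyes
    ex, ey = (x2 - x1) / D, (y2 - y1) / D       # unit vector along the eye line
    nx, ny = -ey, ex                            # unit vector down the center line
    L = u * D                                   # equation (1)
    C = (A[0] + nx * L, A[1] + ny * L)          # temporary jaw position
    d = L - mouth_dist                          # distance from B to C
    b1_dist = mouth_dist + top_fraction * d     # distance from A to B1
    B1 = (A[0] + nx * b1_dist, A[1] + ny * b1_dist)
    height = 2 * (L - b1_dist)                  # frame centered on C, top at B1
    return {"D": D, "L": L, "C": C, "B1": B1, "width": D, "height": height}


frame = estimate_jaw_range((40, 50), (60, 50), mouth_dist=23.4)
```

For eyes 20 pixels apart on a horizontal line, L is 43.4 pixels, so C lies 43.4 pixels below the midpoint of the eyes and the frame is 20 pixels wide.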

The non-skin color pixel removal unit 80 removes the pixels that are not skin color pixels from the pixels within the jaw range obtained by the jaw range estimation unit 70a, so that the jaw position can be detected more accurately. Specifically, among the pixels within the jaw range, it removes those other than the pixels detected as skin color pixels by the skin color pixel detection unit 50, that is, the non-skin color pixels.

The jaw position detection unit 90 detects the jaw position within the jaw range from which the non-skin color pixels have been removed by the non-skin color pixel removal unit 80. FIG. 16 is a block diagram showing the configuration of the jaw position detection unit 90. As illustrated, the jaw position detection unit 90 comprises a projection processing unit 92, an edge strength detection unit 94, and a jaw position determination unit 96.

In the following description of the jaw position detection unit 90, the axis along the direction in which the eyes are aligned is the x-axis, and the axis perpendicular to it is the y-axis.

First, the projection processing unit 92 projects onto the y-axis the pixel values of the pixels within the jaw range estimated by the jaw range estimation unit 70a (the hatched portion in FIG. 15), from which the non-skin color pixels have already been removed. That is, the pixel values (for example, luminance values) of the pixels at the same height on the y-axis are accumulated to obtain a cumulative pixel value for each height on the y-axis.

The edge strength detection unit 94 detects the edge strength at each height on the y-axis using the cumulative pixel values. Specifically, for example, a differential filter such as that shown in FIG. 17 is applied to the cumulative pixel values along the y-axis to detect the edge strength at each height.

The jaw position determination unit 96 takes the height with the largest edge strength among those detected by the edge strength detection unit 94 as the height of the jaw, and detects the position on the center line between the eyes corresponding to this height as the jaw position.
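The projection, edge strength, and position decision can be sketched together as below. The kernel `(-1, 0, 1)` stands in for the differential filter of FIG. 17, which is not reproduced here, and taking the absolute value of the filter response as the edge strength is likewise an assumption; the function name and data layout are hypothetical.

```python
def detect_jaw_row(luma, skin, kernel=(-1, 0, 1)):
    """luma: rows of luminance values inside the jaw range.
    skin: rows of booleans; False marks removed non-skin color pixels.
    Returns (row index with the strongest edge, profile, strengths)."""
    # Projection onto the y-axis: accumulate the remaining pixels per row.
    profile = [sum(v for v, keep in zip(row, keep_row) if keep)
               for row, keep_row in zip(luma, skin)]
    # Differential filter along y; border rows are left at zero strength.
    half = len(kernel) // 2
    strengths = [0.0] * len(profile)
    for y in range(half, len(profile) - half):
        response = sum(k * profile[y + i - half] for i, k in enumerate(kernel))
        strengths[y] = abs(response)
    best = max(range(len(profile)), key=strengths.__getitem__)
    return best, profile, strengths


luma = [[200, 200, 200],
        [200, 200, 200],
        [200, 200, 200],
        [120, 120, 120],   # darker band under the jaw line
        [150, 150, 150]]
skin = [[True] * 3 for _ in luma]
best, profile, strengths = detect_jaw_row(luma, skin)
```

Accumulating along each row before differentiating pools the weak contrast between jaw and neck over the whole width of the range, which is why the row-wise profile can show an edge that a per-pixel edge detector would miss.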

出力部１００は、顎位置検出部９０により検出された顎の位置を示す情報を出力し、顎の位置を必要とするトリミング処理などの処理に供する。 The output unit 100 outputs information indicating the position of the jaw detected by the jaw position detection unit 90, and uses the information for a trimming process that requires the position of the jaw.

図１８は、図１に示す実施形態の画像処理システムＡの処理を示すフローチャートである。図示のように、本発明形態の画像処理システムＡにおいて、画像入力部１０により入力された顔写真画像Ｓ０に対して、まず、顔検出部２０により顔のおおよその位置および大きさを得るための顔検出が行われる（Ｓ１０、Ｓ１２）。顔検出部２０により得られた顔画像Ｓ１に対して、さらに目検出部３０により両目の位置が検出される（Ｓ１４）。目の位置の検出と並行して、肌色画素検出部５０により顔画像Ｓ１における肌色画素が検出される（Ｓ１６）。そして、口検出部６０により、肌色画素検出部５０の検出結果を利用して、顔画像Ｓ１における口の位置が検出される（Ｓ１８）。目検出部３０により得られた両目の位置、および口検出部６０により得られた口の位置に基づいて、顎範囲が顎範囲推定部７０ａにより推定される（Ｓ２０）。顎位置検出部９０は、非肌色画素除去部８０により非肌色画素を除去した顎範囲内において、顎の位置を検出する（Ｓ２５、Ｓ３０）。顎位置検出部９０により検出された顎の位置を示す情報は、出力部１００により出力される（Ｓ４０）ことをもって、画像処理システムＡの処理が終了する。 FIG. 18 is a flowchart showing processing of the image processing system A according to the embodiment shown in FIG. As shown in the figure, in the image processing system A according to the embodiment of the present invention, for the face photo image S0 input by the image input unit 10, first, the face detection unit 20 obtains the approximate position and size of the face. Face detection is performed (S10, S12). The position of both eyes is further detected by the eye detection unit 30 from the face image S1 obtained by the face detection unit 20 (S14). In parallel with the detection of the eye position, the skin color pixel detection unit 50 detects the skin color pixels in the face image S1 (S16). The mouth detection unit 60 detects the position of the mouth in the face image S1 using the detection result of the skin color pixel detection unit 50 (S18). Based on the position of both eyes obtained by the eye detection unit 30 and the position of the mouth obtained by the mouth detection unit 60, the jaw range is estimated by the jaw range estimation unit 70a (S20). The chin position detection unit 90 detects the position of the chin within the chin range from which the non-skin color pixels are removed by the non-skin color pixel removal unit 80 (S25, S30). Information indicating the position of the jaw detected by the jaw position detection unit 90 is output by the output unit 100 (S40), and the processing of the image processing system A ends.

このように、本実施形態の画像処理システムＡによれば、顔写真画像から両目の位置を夫々検出し、検出された両目の位置に基づいて顎範囲を推定し、推定された顎範囲内において顎の位置を検出するようにしているので、両目の位置などから顎の位置を推定する従来の顎位置の取得方法より、精確な顎の位置を得ることができる。さらに、口の位置も検出して、顎範囲を口より下の領域に設定しているので、口の影響による顎の位置の検出精度の低下を防ぐことができる。 As described above, according to the image processing system A of the present embodiment, the positions of both eyes are detected from the face photograph image, the jaw range is estimated based on the detected positions of both eyes, and the estimated jaw range is within the estimated jaw range. Since the position of the jaw is detected, an accurate jaw position can be obtained by a conventional jaw position acquisition method that estimates the position of the jaw from the positions of both eyes. Furthermore, since the position of the mouth is also detected and the jaw range is set to a region below the mouth, it is possible to prevent a decrease in detection accuracy of the position of the jaw due to the influence of the mouth.

The chin and the neck have substantially the same skin color, so the edge formed between them is weak and the chin position is difficult to detect with ordinary edge detection methods. The chin position detection unit 90 of the present embodiment instead projects the pixel values of the pixels in the chin range onto the y-axis to obtain cumulative pixel values, uses these cumulative values to obtain an edge strength for each height along the y-axis, and determines the chin position from the obtained edge strengths, so the chin position can be detected reliably.
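A minimal sketch of this projection approach follows. The function names are hypothetical, the chin range is assumed to be a list of pixel rows holding scalar values (for example luminance), and the absolute difference between adjacent rows is used as a stand-in for the edge strength filter of the embodiment, which may differ. Choosing the height with the largest edge strength follows the rule stated in the claims.

```python
from typing import List, Sequence


def project_rows(region: Sequence[Sequence[float]]) -> List[float]:
    """Project pixel values onto the y-axis: one cumulative value per row."""
    return [sum(row) for row in region]


def edge_strengths(profile: Sequence[float]) -> List[float]:
    """Edge strength between adjacent heights.

    Assumption: a plain absolute difference replaces the embodiment's filter.
    """
    return [abs(profile[i + 1] - profile[i]) for i in range(len(profile) - 1)]


def detect_chin_row(region: Sequence[Sequence[float]]) -> int:
    """Return index i such that the transition from row i to row i + 1 is strongest."""
    strengths = edge_strengths(project_rows(region))
    if not strengths:
        raise ValueError("the chin range needs at least two rows")
    return max(range(len(strengths)), key=strengths.__getitem__)
```

Summing along each row accumulates small per-pixel differences between chin and neck, so a transition that is weak at any single pixel can still stand out in the projected profile.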

FIG. 19 is a block diagram showing the configuration of an image processing system B according to a second embodiment of the present invention. As illustrated, the image processing system B of the present embodiment comprises: an image input unit 10 that inputs a photographic image S0; a face detection unit 20 that detects the approximate position and size of the face in the photographic image input by the image input unit 10 and obtains a face image S1; a database 40b that stores reference data E1 used by the face detection unit 20; a skin color pixel detection unit 50 that detects skin color pixels in the face image S1; a mouth detection unit 60 that detects the position of the mouth based on the detection result of the skin color pixel detection unit 50; a chin range estimation unit 70b that estimates the chin range based on the face image S1 obtained by the face detection unit 20 and the position of the mouth detected by the mouth detection unit 60; a non-skin color pixel removal unit 80 that, based on the detection result of the skin color pixel detection unit 50, removes the pixels that are not skin color pixels from the chin range estimated by the chin range estimation unit 70b; a chin position detection unit 90 that detects the position of the chin within the chin range from which the non-skin color pixels have been removed; and an output unit 100 that outputs information indicating the chin position obtained by the chin position detection unit 90 for use in processing such as trimming.

Components of the image processing system B of the present embodiment that are the same as the corresponding components of the image processing system A shown in FIG. 1 are given the same reference numerals, and their detailed description is omitted.

The chin range estimation unit 70b of the image processing system B estimates the chin range based on the face image S1 obtained by the face detection unit 20 and the position of the mouth obtained by the mouth detection unit 60. Specifically, it estimates a predetermined range below the mouth as the chin range. For example, as shown in FIG. 20, the chin range is a frame that shares the center line of the face image S1, whose bottom edge coincides with the bottom edge of the face image S1, whose width is 1/2 of the width of the face image S1, and whose height is 9/10 of the distance d from the mouth position B to the bottom edge of the face image S1.
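The frame geometry of this example can be written out as a short calculation. The sketch below assumes coordinates local to the face image S1, with the origin at its top-left corner and y increasing downward, so that the bottom edge lies at y equal to the face image height. The `Frame` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    left: float
    top: float
    width: float
    height: float


def estimate_chin_range_b(face_width: float, face_height: float, mouth_y: float) -> Frame:
    """Chin range frame of system B, in coordinates local to the face image S1."""
    if not 0 <= mouth_y < face_height:
        raise ValueError("the mouth must lie inside the face image")
    d = face_height - mouth_y          # distance from mouth position B to the bottom edge
    width = face_width / 2             # 1/2 of the face image width
    height = 9 * d / 10                # 9/10 of the distance d
    left = (face_width - width) / 2    # same vertical center line as the face image
    top = face_height - height         # bottom edge coincides with the face image
    return Frame(left=left, top=top, width=width, height=height)
```

For a face image 200 pixels wide and 240 pixels high with the mouth at y = 180, d is 60, so the frame is 100 pixels wide and 54 pixels high, spanning x from 50 to 150 and y from 186 to 240.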

FIG. 21 is a flowchart showing the processing of the image processing system B of the present embodiment. As illustrated, the image input unit 10 first inputs the face photograph image S0, and the face detection unit 20 performs face detection on it to obtain the approximate position and size of the face (S50, S52). The skin color pixel detection unit 50 detects the skin color pixels in the face image S1 obtained by the face detection unit 20 (S54), and the mouth detection unit 60 uses that detection result to detect the position of the mouth in the face image S1 (S56). The chin range estimation unit 70b estimates the chin range based on the face image S1 obtained by the face detection unit 20 and the position of the mouth obtained by the mouth detection unit 60 (S58). The chin position detection unit 90 detects the position of the chin within the chin range from which the non-skin color pixel removal unit 80 has removed the non-skin color pixels (S60, S65). The output unit 100 outputs information indicating the chin position detected by the chin position detection unit 90 (S70), and the processing of the image processing system B ends.

As described above, the image processing system B of the present embodiment uses the bottom edge of the face image S1 as the bottom edge of the chin range frame. Its chin range is therefore taller than the chin range estimated by the image processing system A of the embodiment shown in FIG. 1, and is more likely to include clothing around the neck. Nevertheless, because the chin position is obtained by detection within a range that contains the chin and excludes the other parts of the face, a more accurate chin position can be obtained than with the conventional method, which estimates the chin position from the positions of both eyes and the like. In addition, the non-skin color pixel removal unit 80 removes the non-skin color pixels from the chin range, which prevents the influence of clothing and the like and further improves detection accuracy.
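One way the removal of non-skin color pixels could feed into the projection is sketched below. This passage does not say how removed pixels are treated during projection, so the choices here are assumptions: removed pixels are marked `None`, and each row is summarized by the mean of its remaining skin pixels so that rows partly covered by clothing are not penalized for having fewer pixels. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def remove_non_skin(
    region: Sequence[Sequence[float]],
    skin_mask: Sequence[Sequence[bool]],
) -> List[List[Optional[float]]]:
    """Keep pixels flagged as skin color; mark all others as removed (None)."""
    if len(region) != len(skin_mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(region, skin_mask)
    ):
        raise ValueError("region and skin_mask must have the same shape")
    return [
        [pixel if is_skin else None for pixel, is_skin in zip(row, mask_row)]
        for row, mask_row in zip(region, skin_mask)
    ]


def row_means(masked: Sequence[Sequence[Optional[float]]]) -> List[Optional[float]]:
    """Mean of the remaining pixels per row; None for rows with no skin pixels.

    Assumption: a mean is used instead of a raw sum to offset removed pixels.
    """
    means: List[Optional[float]] = []
    for row in masked:
        kept = [pixel for pixel in row if pixel is not None]
        means.append(sum(kept) / len(kept) if kept else None)
    return means
```

With this treatment a dark collar inside the chin range is dropped before the row profile is computed, so it cannot create a strong false edge below the chin.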

Preferred embodiments of the image processing method and apparatus of the present invention and the program therefor have been described above. However, the image processing method, apparatus, and program of the present invention are not limited to these embodiments, and various additions, deletions, and modifications can be made without departing from the gist of the present invention.

For example, the image processing system A of the embodiment shown in FIG. 1 detects the positions of the eyes, but the user may instead be asked to specify them. The same naturally applies to the position of the mouth.

The image processing system A of the embodiment shown in FIG. 1 estimates the temporary position of the chin according to formula (1) from the positions of both eyes and the distance between them. Alternatively, as in the methods described in Patent Document 2 or 3, the temporary position of the chin may be estimated from the position of the mouth in addition to the positions of both eyes.

The eye detection unit 30 in the image processing system A of the embodiment shown in FIG. 1 detects the center position of each eye as the eye position, but the position of the pupil, the outer corner of the eye, the inner corner of the eye, or the like may of course be used instead.

Likewise, face detection and mouth detection are not limited to the methods used in the above embodiments. Any method may be used as long as it can detect the approximate position and size of the face and the position of the mouth.

FIG. 1 is a block diagram showing the configuration of an image processing system A according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the face detection unit 20.
FIG. 3 is a block diagram showing the configuration of the eye detection unit 30.
FIG. 4 is a diagram for explaining the center position of an eye.
FIG. 5(a) is a diagram showing a horizontal edge detection filter, and FIG. 5(b) is a diagram showing a vertical edge detection filter.
FIG. 6 is a diagram for explaining the calculation of a gradient vector.
FIG. 7(a) is a diagram showing a person's face, and FIG. 7(b) is a diagram showing the gradient vectors near the eyes and mouth of the face shown in FIG. 7(a).
FIG. 8(a) is a diagram showing a histogram of gradient vector magnitudes before normalization, FIG. 8(b) is a diagram showing a histogram of gradient vector magnitudes after normalization, FIG. 8(c) is a diagram showing a histogram of gradient vector magnitudes quantized to five levels, and FIG. 8(d) is a diagram showing a histogram of gradient vector magnitudes quantized to five levels after normalization.
FIG. 9 is a diagram showing examples of sample images known to be faces that are used for learning the first reference data.
FIG. 10 is a diagram showing examples of sample images known to be faces that are used for learning the second reference data.
FIG. 11 is a diagram for explaining the rotation of a face.
FIG. 12 is a flowchart showing the method of learning the reference data.
FIG. 13 is a diagram showing the method of deriving a classifier.
FIG. 14 is a diagram for explaining the stepwise deformation of an identification target image.
FIG. 15 is a diagram for explaining the chin range estimation unit 70a.
FIG. 16 is a block diagram showing the configuration of the chin position detection unit 90.
FIG. 17 is a diagram showing an example of a filter for obtaining edge strength.
FIG. 18 is a flowchart showing the processing of the image processing system A of the embodiment shown in FIG. 1.
FIG. 19 is a block diagram showing the configuration of an image processing system B according to a second embodiment of the present invention.
FIG. 20 is a diagram for explaining the chin range estimation unit 70b.
FIG. 21 is a flowchart showing the processing of the image processing system B of the embodiment shown in FIG. 19.

Explanation of symbols

10 Image input unit
20 Face detection unit
22 First feature amount calculation unit
24 Face detection execution unit
30 Eye detection unit
32 Second feature amount calculation unit
34 Eye detection execution unit
40a, 40b Database
50 Skin color pixel detection unit
60 Mouth detection unit
70a, 70b Chin range estimation unit
80 Non-skin color pixel removal unit
90 Chin position detection unit
92 Projection processing unit
94 Edge strength detection unit
96 Chin position determination unit
100 Output unit
S0 Face photograph image
S1 Face image
E1, E2 Reference data

Claims

1. An image processing method comprising:
detecting the approximate position and size of a face from a face photograph image;
estimating, based on the detected position and size of the face, a chin range consisting of the chin in the face and a region near the chin; and
detecting the position of the chin within the estimated chin range.

2. An image processing method comprising:
acquiring the positions of both eyes in a face photograph image;
estimating, based on the acquired positions of both eyes, a chin range consisting of the chin in the face photograph image and a region near the chin; and
detecting the position of the chin within the estimated chin range.

3. An image processing apparatus comprising:
face detection means for detecting the approximate position and size of a face from a face photograph image;
chin range estimation means for estimating, based on the detected position and size of the face, a chin range consisting of the chin in the face and a region near the chin; and
chin position detection means for detecting the position of the chin within the estimated chin range.

4. An image processing apparatus comprising:
eye position acquisition means for acquiring the positions of both eyes in a face photograph image;
chin range estimation means for estimating, based on the acquired positions of both eyes, a chin range consisting of the chin in the face photograph image and a region near the chin; and
chin position detection means for detecting the position of the chin within the estimated chin range.

5. The image processing apparatus according to claim 4, wherein the eye position acquisition means is eye position detection means for detecting the positions of the eyes from the face photograph image.

6. The image processing apparatus according to claim 3, further comprising mouth position acquisition means for acquiring the position of the mouth in the face,
wherein the chin range estimation means estimates the chin range below the position of the mouth.

7. The image processing apparatus according to any one of claims 3 to 6, further comprising non-skin color pixel removal means for detecting and removing pixels other than skin color from among the pixels in the chin range estimated by the chin range estimation means,
wherein the chin position detection means detects the position of the chin within the chin range from which the pixels other than skin color have been removed.

8. The image processing apparatus according to any one of claims 3 to 7, wherein the chin position detection means comprises:
projection means for projecting the pixel value of each pixel in the chin range onto an axis along the vertical direction of the face to obtain projection information for each height along the axis;
edge strength acquisition means for obtaining an edge strength for each height based on the projection information; and
chin position determination means for determining the position of the pixel corresponding to the height having the largest edge strength as the position of the chin.

9. A program for causing a computer to execute:
a process of detecting the approximate position and size of a face from a face photograph image;
a process of estimating, based on the detected position and size of the face, a chin range consisting of the chin in the face and a region near the chin; and
a process of detecting the position of the chin within the estimated chin range.

10. A program for causing a computer to execute:
a process of acquiring the positions of both eyes in a face photograph image;
a process of estimating, based on the acquired positions of both eyes, a chin range consisting of the chin in the face photograph image and a region near the chin; and
a process of detecting the position of the chin within the estimated chin range.