JP2005108195A

JP2005108195A - Object identification unit, method, and program

Info

Publication number: JP2005108195A
Application number: JP2004254430A
Authority: JP
Inventors: Yuanzhong Li; 元中李
Original assignee: Fuji Photo Film Co Ltd
Current assignee: Fujifilm Holdings Corp
Priority date: 2003-09-09
Filing date: 2004-09-01
Publication date: 2005-04-21
Anticipated expiration: 2024-09-01
Also published as: JP4510556B2

Abstract

PROBLEM TO BE SOLVED: To identify, in a comparatively short processing time, whether a predetermined object such as a face is included in an image.

SOLUTION: A feature calculating part 4 calculates a first feature C1, which requires no normalization, from the identification target image S0, and a normalized second feature C2. A first identification part 8 identifies whether a face candidate is included in the identification target image S0 on the basis of the first feature C1 calculated from that image, by referring to first reference data R1 obtained by learning the first features C1 of a large number of face images and of images that are not faces. If a face candidate is included, a second identification part 10 identifies whether the face candidate is a face by referring to second reference data R2 obtained by learning the second features C2 of a large number of face images and of images that are not faces.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an object identification apparatus and method for identifying whether a predetermined object such as a face is included in an image, and to a program for causing a computer to execute the object identification method.

Image data acquired by a digital camera, or obtained by reading an image recorded on film, is reproduced as a hard copy such as a print or as a soft copy on a display. Images represented by such image data often include human faces, and image processing that corrects brightness, gradation, color, sharpness, and the like is applied to the image data so that the face has appropriate brightness and color. To apply such image processing, a face region corresponding to the human face must be detected in the image represented by the image data. For this reason, various methods have been proposed for identifying whether a predetermined object such as a face is included in an image.

For example, the method of Non-Patent Document 1 normalizes luminance values, which are the feature amounts used for detecting a face, and identifies whether a face is included in an image by referring to the learning result of a neural network trained on faces. The method of Non-Patent Document 2 obtains high-frequency components such as edges in an image as the feature amounts used for detecting an object, normalizes these feature amounts, and identifies whether the object is included in the image by referring to the result of learning the feature amounts with the machine learning technique called boosting. Because the methods of Non-Patent Documents 1 and 2 normalize the feature amounts used for detecting an object such as a face, they can accurately identify whether the object is included in an image.

Non-Patent Document 3 describes a method for detecting tumor shadows, one of the characteristic forms of breast cancer in particular. On an X-ray negative film, for example, a tumor shadow has a slightly lower density value than its surroundings, and the gradient vector at any pixel within the tumor shadow points toward the vicinity of the center of the shadow. Using this fact, the method evaluates the distribution of gradient vector orientations in the image and extracts regions where the vectors concentrate on a specific point as tumor shadow candidates. Further, the method of Patent Document 1 learns feature patterns of an object such as a face using Kohonen's self-organization, a neural network technique. Referring to the learning result, it determines whether an object candidate and the feature portions of the object are included in the learned feature patterns, and further whether the positional relationship of the feature portions of the candidate matches that of the feature portions of the object, thereby determining whether the candidate is the object.

Non-Patent Document 1: Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, "Neural Network-Based Face Detection", volume 20, number 1, pages 23-38, January 1998.
Non-Patent Document 2: Rainer Lienhart and Jochen Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", International Conference on Image Processing.
Non-Patent Document 3: Obata et al., "Detection of mass shadow in DR image (iris filter)", IEICE Transactions D-II, Vol. J75-D-II, No. 3, pp. 663-670, March 1992.
Patent Document 1: Japanese Patent Laid-Open No. 5-282457

However, the methods of Non-Patent Documents 1 and 2 normalize the feature amounts used for detecting the object, which increases the amount of computation and lengthens the processing time required for identification. The method of Non-Patent Document 3 only evaluates the distribution of gradient vector orientations, so although it can detect an object with a simple shape such as a tumor shadow, it cannot detect a complex object such as a human face. The method of Patent Document 1 requires a long processing time because there are many items to be determined.

The present invention has been made in view of the above circumstances, and its object is to identify, in a relatively short processing time, whether a predetermined object such as a face is included in an image.

An object identification apparatus according to the present invention comprises:
image input means for receiving input of an image to be identified;
first feature amount calculating means for calculating, from the image to be identified, a first feature amount that is used for identifying a predetermined object and does not require normalization;
first identifying means for identifying whether a predetermined object candidate is included in the image to be identified, by referring to first reference data, which defines in advance the first feature amount and an identification condition corresponding to the first feature amount, on the basis of the first feature amount calculated from the image to be identified;
second feature amount calculating means for calculating, from the predetermined object candidate, a normalized second feature amount used for identifying the predetermined object, when the first identifying means identifies that the predetermined object candidate is included; and
second identifying means for identifying whether the predetermined object candidate is the predetermined object, by referring to second reference data, which defines in advance the second feature amount and an identification condition corresponding to the second feature amount, on the basis of the normalized second feature amount calculated from the predetermined object candidate.

The “predetermined object” is an object that has a substantially constant shape and can be scaled to a substantially constant size. Specifically, a human face, a vehicle, a road sign, and the like can be the predetermined object.

The “feature amount” is a parameter representing a feature of an image. It may represent any feature, such as a gradient vector representing the density gradient at each pixel of the image, color information (hue, saturation) of each pixel, density, texture features, depth information, or features of edges included in the image.

The “first feature amount that does not require normalization” is a feature amount that does not depend on changes in the brightness or contrast of the image. For example, the gradient vector, which represents the direction and magnitude of the density change at each pixel of the image (that is, the density gradient), changes in magnitude according to the density of the pixel and the amount of contrast change in a specific direction as seen from that pixel, but its direction does not change even when its magnitude changes. Likewise, color information such as hue does not itself change even when the density of the image changes. Therefore, the direction of the gradient vector, color information, and the like can be used as the first feature amount.
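The invariance described above can be checked numerically. The following is a minimal sketch, not the implementation of the invention: it assumes a simple central-difference gradient, a brightness and contrast change of the form `gain * p + offset` with a positive gain, and no clipping to the 8-bit range. The names `gradient`, `direction_deg`, and `magnitude` are hypothetical.

```python
import math

def gradient(img, x, y):
    """Central-difference gradient (h, v) at an interior pixel (assumed filter)."""
    h = img[y][x + 1] - img[y][x - 1]
    v = img[y + 1][x] - img[y - 1][x]
    return h, v

def direction_deg(h, v):
    """Gradient direction in degrees, in the range [0, 360)."""
    return math.degrees(math.atan2(v, h)) % 360.0

def magnitude(h, v):
    """Gradient magnitude."""
    return math.hypot(h, v)

img = [
    [10, 20, 40],
    [20, 40, 80],
    [30, 70, 120],
]
# Same scene with contrast doubled and brightness raised by 15 (no clipping).
adjusted = [[2 * p + 15 for p in row] for row in img]

h1, v1 = gradient(img, 1, 1)
h2, v2 = gradient(adjusted, 1, 1)

# The direction is unchanged, while the magnitude scales with the contrast gain.
print(direction_deg(h1, v1), direction_deg(h2, v2))
print(magnitude(h1, v1), magnitude(h2, v2))
```

The offset cancels in each difference and the gain scales both components equally, which is why the angle is preserved while the magnitude is not.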

The “second feature amount” is a feature amount that depends on changes in the brightness or contrast of the image: used as is, it cannot be determined whether the feature amount of one image is large or small compared with feature amounts of the same type in that image or in other images. As methods for normalizing the second feature amount when, for example, the second feature amount is calculated for each pixel, the second feature amount of each pixel in the predetermined object candidate can be normalized using the second feature amounts of all pixels constituting the candidate, or the second feature amount of a target pixel can be normalized using the second feature amounts of a plurality of pixels within a predetermined range that includes the target pixel, among all pixels constituting the candidate.
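The two normalization scopes mentioned above (all pixels of the candidate, or a neighborhood around the target pixel) can be contrasted in a short sketch. The text does not fix the normalization formula at this point, so division by the mean is used here purely as an assumed stand-in; the function names and the `radius` parameter are hypothetical.

```python
def normalize_global(values):
    """Scale every value by the mean over the whole candidate region."""
    flat = [v for row in values for v in row]
    mean = sum(flat) / len(flat)
    if mean == 0:
        return [[0.0 for _ in row] for row in values]
    return [[v / mean for v in row] for row in values]

def normalize_local(values, radius=1):
    """Scale each value by the mean over a window around it.

    The window is (2 * radius + 1) pixels on a side and is clipped at the borders.
    """
    height, width = len(values), len(values[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                values[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            row.append(values[y][x] / mean if mean else 0.0)
        out.append(row)
    return out

# Hypothetical per-pixel magnitudes inside a candidate, and the same at double contrast.
mags = [[1, 2], [3, 6]]
mags_high_contrast = [[2, 4], [6, 12]]
print(normalize_global(mags))
print(normalize_global(mags_high_contrast))
```

Either scope removes a uniform scale factor; the local variant additionally adapts to contrast that varies across the candidate, at the cost of one window scan per pixel.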

The “identification condition” is a condition, using the feature amount as an index, for distinguishing the predetermined object from other objects.

In the object identification apparatus according to the present invention, the first reference data may be obtained by learning in advance, by a machine learning technique such as a neural network or boosting, the first feature amounts included in a large group of sample images consisting of a plurality of sample images known to be the predetermined object and a plurality of sample images known not to be the predetermined object.

Here, when the predetermined object is a face, the first reference data may be obtained by learning the first feature amounts included in a first region of a predetermined range including the left eye and left cheek and a second region of a predetermined range including the right eye and right cheek in sample images known to be the predetermined object, together with the first feature amounts included in the regions corresponding to the first and second regions in sample images known not to be the predetermined object, and
the first feature amount calculating means may calculate the first feature amount from the regions corresponding to the first and second regions in the image to be identified.

Further, the first reference data may be obtained by additionally learning the first feature amounts included in a third region of a predetermined range including both eyes in sample images known to be the predetermined object, together with the first feature amounts included in the region corresponding to the third region in sample images known not to be the predetermined object, and
the first feature amount calculating means may calculate the first feature amount from the regions corresponding to the first to third regions in the image to be identified.

In the object identification apparatus according to the present invention, the second reference data may be obtained by learning in advance, by a machine learning technique, the second feature amounts included in a large group of sample images consisting of a plurality of sample images known to be the predetermined object and a plurality of sample images known not to be the predetermined object.

Here, when the predetermined object is a face, the second reference data may be obtained by learning the second feature amounts included in a first region of a predetermined range including the left eye and left cheek and a second region of a predetermined range including the right eye and right cheek in sample images known to be the predetermined object, together with the second feature amounts included in the regions corresponding to the first and second regions in sample images known not to be the predetermined object, and
the second feature amount calculating means may calculate the second feature amount from the regions corresponding to the first and second regions in the image to be identified.

Further, the second reference data may be obtained by additionally learning the second feature amounts included in a third region of a predetermined range including both eyes in sample images known to be the predetermined object, together with the second feature amounts included in the region corresponding to the third region in sample images known not to be the predetermined object, and
the second feature amount calculating means may calculate the second feature amount from the regions corresponding to the first to third regions in the image to be identified.

In the object identification apparatus according to the present invention, the first feature amount may be the direction of the gradient vector, or color information, at each pixel of the image.

The “gradient vector” represents the direction and magnitude of the density change at each pixel of the image.

In the object identification apparatus according to the present invention, the second feature amount may be the direction and magnitude of the gradient vector at each pixel of the image.

The object identification apparatus according to the present invention may further comprise control means that determines whether the identification result of the first identifying means satisfies a predetermined requirement and, when the determination is affirmative, controls the first and second feature amount calculating means and the first and second identifying means so that only the first feature amount is calculated from the image to be identified and the predetermined object candidate identified by the first identifying means is identified as the predetermined object.

The object identification apparatus according to the present invention may further comprise at least one other identifying means that identifies, on the basis of still another feature amount calculated from the predetermined object candidate, whether the predetermined object identified by the second identifying means as included in the image is truly the predetermined object.

The object identification apparatus according to the present invention may further comprise extraction means for extracting the predetermined object from the image to be identified.

The object identification apparatus according to the present invention may further comprise output means for attaching information representing the position of the predetermined object in the image to be identified to that image and outputting it.

An imaging apparatus according to the present invention, such as a digital camera or a camera-equipped mobile phone, comprises the object identification apparatus according to the present invention.

An object identification method according to the present invention comprises:
receiving input of an image to be identified;
calculating, from the image to be identified, a first feature amount that is used for identifying a predetermined object and does not require normalization;
identifying whether a predetermined object candidate is included in the image to be identified, by referring to first reference data, which defines in advance the first feature amount and an identification condition corresponding to the first feature amount, on the basis of the first feature amount calculated from the image to be identified;
calculating, from the predetermined object candidate, a normalized second feature amount used for identifying the predetermined object, when the predetermined object candidate is identified as being included; and
identifying whether the predetermined object candidate is the predetermined object, by referring to second reference data, which defines in advance the second feature amount and an identification condition corresponding to the second feature amount, on the basis of the normalized second feature amount calculated from the predetermined object candidate.

The object identification method according to the present invention may also be provided as a program for causing a computer to execute it.

According to the present invention, a first feature amount that does not require normalization is calculated from the image to be identified. The first reference data is then referred to on the basis of the first feature amount to identify whether a predetermined object candidate is included in the image (first identification). When a candidate is identified as being included, a normalized second feature amount is calculated from the candidate, and the second reference data is referred to on the basis of the second feature amount to identify whether the candidate is the predetermined object (second identification). Because the first identification uses the first feature amount, which needs no normalization, the amount of computation is not very large even when the entire image to be identified is examined for predetermined object candidates, so whether a candidate is included can be identified relatively quickly. The second identification, on the other hand, uses the normalized second feature amount, so it can accurately identify whether the predetermined object is included, but at the cost of more computation. In the present invention, however, the normalized second feature amount is calculated and the second identification is performed only for the predetermined object candidate portions of the image to be identified, so the amount of computation for normalization is small and the time required for identification is short. Therefore, according to the present invention, whether a predetermined object is included in the image to be identified can be identified at high speed and with high accuracy.
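The control flow of the two identifications can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed apparatus: the `Window` tuple, the `identify` function, and the toy predicates `cheap` and `costly` are all hypothetical, and the predicates merely stand in for the identifications based on the first and second reference data.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation of an image region: (x, y, width, height).
Window = Tuple[int, int, int, int]

def identify(
    windows: Iterable[Window],
    first_stage: Callable[[Window], bool],
    second_stage: Callable[[Window], bool],
) -> List[Window]:
    """Run the cheap first identification everywhere, the costly second one on candidates only."""
    candidates = [w for w in windows if first_stage(w)]
    return [w for w in candidates if second_stage(w)]

# Toy stand-ins that count how often each stage is evaluated.
calls = {"first": 0, "second": 0}

def cheap(w: Window) -> bool:
    calls["first"] += 1
    return w[0] % 2 == 0  # placeholder for the first identification

def costly(w: Window) -> bool:
    calls["second"] += 1
    return w[0] % 4 == 0  # placeholder for the second identification

windows = [(x, 0, 30, 30) for x in range(8)]
faces = identify(windows, cheap, costly)
print(faces, calls)
```

In this toy run the costly stage is evaluated on half of the windows only; in general the saving depends on how many regions the first identification rejects.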

Further, obtaining the first and second reference data by learning in advance with a machine learning technique further improves the identification performance for the predetermined object.

Further, when the predetermined object is a face, using for learning the first and second feature amounts included in the first region including the left eye and left cheek and the second region including the right eye and right cheek of the sample images, and additionally the third region including both eyes, greatly shortens the learning time. The applicant's experiments have confirmed that, when identifying whether the predetermined object is included in the image to be identified, the first and second feature amounts included in the first and second regions, and additionally the third region, contribute greatly to identification performance. Therefore, learning the first and second reference data using the first and second feature amounts included in the first and second regions, and additionally the third region, further improves the performance of identifying whether the predetermined object is included in the image to be identified.

Further, calculating the first and second feature amounts from the regions of the image to be identified that correspond to the first and second regions, and additionally the third region, makes the range over which the feature amounts are calculated smaller than when they are calculated from the entire image, so the computation time can be shortened.

Further, by using the direction of the gradient vector or color information at each pixel of the image as the first feature amount, or the direction and magnitude of the gradient vector at each pixel as the second feature amount, whether the predetermined object is included in the image to be identified can be identified accurately using feature amounts that are relatively easy to calculate from the image.

Further, it is determined whether the identification result of the first identifying means satisfies a predetermined requirement, and when the determination is affirmative, only the first feature amount is calculated, the first identification is performed, and the identified predetermined object candidate is identified as the predetermined object. When the first identification is sufficiently accurate, the calculation of the normalized second feature amount and the second identification can thus be omitted, so whether the predetermined object is included in the image to be identified can be identified even faster.

Further, by extracting the identified predetermined object, the predetermined object can be accurately extracted from the image to be identified.

Further, by attaching information representing the position of the predetermined object in the image to be identified to that image and outputting it, the predetermined object can later be accurately extracted from the image by referring to the attached information.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a schematic block diagram showing the configuration of an object identification apparatus according to a first embodiment of the present invention. As shown in FIG. 1, the object identification apparatus 1 according to the first embodiment comprises: an image input unit 2 that receives input of identification target image data S0 representing an image to be identified; a feature amount calculation unit 4 that calculates first and second feature amounts C1 and C2 from the identification target image S0 represented by the identification target image data S0 (hereinafter, the reference symbol S0 is also used for the image); a memory 6 storing first and second reference data R1 and R2, described later; a first identification unit 8 that identifies whether the identification target image S0 includes a candidate for a human face, the predetermined object, on the basis of the first feature amount C1 calculated by the feature amount calculation unit 4 and the first reference data R1 in the memory 6; a second identification unit 10 that, when the first identification unit 8 identifies that the identification target image S0 includes a face candidate, identifies whether the face candidate is a human face, the predetermined object, on the basis of the second feature amount C2 calculated by the feature amount calculation unit 4 and the second reference data R2 in the memory 6; and an output unit 12 that outputs the identification results of the first and second identification units 8 and 10.

The feature amount calculation unit 4 calculates, from the identification target image S0, the first feature amount C1, which is used for face identification and does not require normalization, and calculates the second feature amount C2 from the image within a face candidate extracted as described later. Specifically, it calculates the direction of the gradient vector of the identification target image S0 as the first feature amount C1, and the gradient vector (that is, its direction and magnitude) of the image within the face candidate as the second feature amount C2. The calculation of the gradient vector is described below. First, the feature amount calculation unit 4 filters the identification target image S0 with the horizontal edge detection filter shown in FIG. 2(a) to detect horizontal edges in the identification target image S0. It likewise filters the identification target image S0 with the vertical edge detection filter shown in FIG. 2(b) to detect vertical edges. Then, as shown in FIG. 3, the gradient vector K at each pixel is calculated from the horizontal edge magnitude H and the vertical edge magnitude V at each pixel of the identification target image S0.
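The per-pixel computation of H, V, and K can be sketched as follows. The filter coefficients of FIG. 2 are not reproduced in this text, so the sketch assumes simple [-1, 0, 1] difference filters; the function names `edge_responses` and `gradient_vectors` are hypothetical, and border pixels are simply left at zero.

```python
def edge_responses(img):
    """Return (H, V), the horizontal and vertical edge responses per pixel.

    Assumes [-1, 0, 1] difference filters; border pixels are left at 0.
    """
    height, width = len(img), len(img[0])
    H = [[0] * width for _ in range(height)]
    V = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            H[y][x] = img[y][x + 1] - img[y][x - 1]
            V[y][x] = img[y + 1][x] - img[y - 1][x]
    return H, V

def gradient_vectors(img):
    """Combine H and V into one gradient vector K = (H, V) per pixel."""
    H, V = edge_responses(img)
    return [list(zip(h_row, v_row)) for h_row, v_row in zip(H, V)]

# A purely horizontal density ramp: K at the center has no vertical component.
ramp = [
    [0, 10, 20],
    [0, 10, 20],
    [0, 10, 20],
]
K = gradient_vectors(ramp)
print(K[1][1])
```

Other edge detection filters would change the scale of H and V but not the way the two responses are combined into K.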

The direction of this gradient vector K is used as the first feature amount C1. Specifically, the first feature amount C1 is a value from 0 to 359 degrees measured relative to a predetermined direction (for example, the x direction in FIG. 3).
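The gradient-vector calculation can be illustrated with a minimal Python sketch. The filter coefficients of FIG. 2 are not reproduced in this text, so the sketch assumes simple central-difference filters [-1, 0, 1]; the function name `gradient_vectors` and the border clamping are likewise assumptions made for illustration only.

```python
import math

def gradient_vectors(img):
    """Return per-pixel (direction_deg, magnitude) for a 2-D list of gray levels.

    Assumption: the horizontal and vertical edge filters are central
    differences [-1, 0, 1]; the actual coefficients of FIG. 2 may differ.
    """
    h, w = len(img), len(img[0])
    out = [[(0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbours at the border so every pixel gets a value.
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            H = img[y][xr] - img[y][xl]   # horizontal edge magnitude
            V = img[yd][x] - img[yu][x]   # vertical edge magnitude
            magnitude = math.hypot(H, V)
            # Direction relative to the x axis, folded into 0..359 degrees.
            # With row index growing downward, 90 means increasing row index.
            direction = int(round(math.degrees(math.atan2(V, H)))) % 360
            out[y][x] = (direction, magnitude)
    return out
```

Only the direction component would serve as the first feature amount C1; the magnitude is needed later for the second feature amount C2.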

For a human face such as that shown in FIG. 4(a), the gradient vectors K calculated in this way point, as shown in FIG. 4(b), toward the centers of the eyes and mouth in dark parts such as the eyes and mouth, and point outward from the position of the nose in bright parts such as the nose. Because the change in density is larger at the eyes than at the mouth, the magnitude of the gradient vector K is larger at the eyes than at the mouth.

The second feature amount C2 is calculated only within the face candidate, and the magnitude of its gradient vector K is normalized. The normalization obtains a histogram of the magnitudes of the gradient vectors K at all pixels in the face candidate and corrects the magnitudes by equalizing the histogram so that they are distributed uniformly over the range of values each pixel in the face candidate can take (0 to 255 for 8 bits). For example, when the magnitudes are small and the histogram is skewed toward small magnitudes as shown in FIG. 5(a), the magnitudes are normalized so that they span the full range from 0 to 255 and the histogram is distributed as shown in FIG. 5(b). To reduce the amount of computation, it is preferable to divide the distribution range of the histogram of the gradient vector K into, for example, five parts as shown in FIG. 5(c), and to normalize so that the five frequency distributions span the five ranges obtained by dividing the values 0 to 255 into five, as shown in FIG. 5(d).
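As a rough illustration of this normalization, the following Python sketch equalizes the gradient magnitudes inside one face candidate by mapping each magnitude to its cumulative frequency. The function name `equalize_magnitudes` and the exact mapping formula are assumptions; the text specifies only that the histogram is flattened over 0 to 255.

```python
from bisect import bisect_right

def equalize_magnitudes(mags, levels=256):
    """Spread gradient magnitudes of one face candidate over 0..levels-1.

    mags is a 2-D list of magnitudes. Each value is replaced by its
    cumulative frequency scaled to the output range (an assumed formula).
    """
    flat = sorted(m for row in mags for m in row)
    n = len(flat)

    def remap(m):
        cdf = bisect_right(flat, m) / n       # fraction of pixels <= m
        return round((levels - 1) * cdf)

    return [[remap(m) for m in row] for row in mags]
```

The coarser five-bin variant mentioned in the text would replace the per-value cumulative frequency with a per-bin mapping, trading accuracy for fewer operations.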

When a photograph is taken, the brightness and direction of the illumination vary with the shooting conditions, so they differ from one identification target image S0 to another. If the gradient vectors K were obtained as-is for such images, the magnitude of the gradient vector at the eye position would differ even for the same face, and it would not be possible to identify accurately whether a face candidate is a face. The magnitude of the gradient vector K could be normalized over the entire identification target image S0, but normalization is computationally expensive and takes time. In this embodiment, therefore, the second feature amount is normalized only for the face candidates identified by the first identification unit 8 rather than for the entire identification target image S0, which reduces the amount of computation and shortens the processing time.

As described later, the feature amount calculation unit 4 calculates the first and second feature amounts C1 and C2 at each stage of the deformation of the identification target image S0 and the face candidate.

The first reference data R1 stored in the memory 6 defines, for each of a plurality of types of pixel groups each consisting of a combination of pixels selected from the sample images described later, identification conditions for the combination of the first feature amounts C1 at the pixels constituting that pixel group. Likewise, the second reference data R2 defines, for each of a plurality of types of pixel groups each consisting of a combination of pixels selected from the sample images, identification conditions for the combination of the second feature amounts C2 at the pixels constituting that pixel group.

The combinations of the first and second feature amounts C1 and C2 at the pixels constituting each pixel group, and the identification conditions, in the first and second reference data R1 and R2 are determined in advance by learning a sample image group consisting of a plurality of sample images known to be faces and a plurality of sample images known not to be faces.

In this embodiment, the sample images known to be faces are 30 × 30 pixels in size and, as shown in FIG. 6, for each face image the distance between the centers of the eyes is 10, 9, and 11 pixels, and the face is rotated in the plane in 3-degree steps within a range of ±15 degrees relative to an upright face (that is, the rotation angles are −15, −12, −9, −6, −3, 0, 3, 6, 9, 12, and 15 degrees). Thus 3 × 11 = 33 sample images are prepared for each face image. With the face upright, the vertical position of the eyes is the same in all sample images. FIG. 6 shows only the sample images rotated by −15, 0, and +15 degrees. The center of rotation is the intersection of the diagonals of the sample image. Arbitrary images of 30 × 30 pixels are used as the sample images known not to be faces.

Suppose that learning were performed using, as sample images known to be faces, only images in which the distance between the centers of the eyes is 10 pixels and the in-plane rotation angle is 0 degrees (that is, the face is upright). The only face candidates or faces identified by referring to the first and second reference data R1 and R2 would then be those with an eye-center distance of 10 pixels and no rotation at all. Because the size of a face that may be included in the identification target image S0 is not fixed, the identification target image S0 is enlarged or reduced, as described later, when identifying whether a face candidate is included or whether a face candidate is a face, so that a face whose size matches the size of the sample images can be identified. However, to bring the eye-center distance to exactly 10 pixels, identification would have to be performed while enlarging or reducing the identification target image S0 in fine steps, for example by a factor of 1.1 per step, and the amount of computation would become enormous.

Moreover, a face that may be included in the identification target image S0 is not necessarily at an in-plane rotation angle of 0 degrees as shown in FIG. 7(a); it may be rotated as shown in FIGS. 7(b) and 7(c). If learning used only sample images with an eye-center distance of 10 pixels and a face rotation angle of 0 degrees, rotated faces such as those in FIGS. 7(b) and 7(c) could not be identified even though they are faces.

For this reason, this embodiment uses, as sample images known to be faces, images in which the eye-center distance is 9, 10, and 11 pixels as shown in FIG. 6 and in which the face is rotated in the plane in 3-degree steps within a range of ±15 degrees at each distance, so that the learning of the first and second reference data R1 and R2 has tolerance. As a result, the identification target image S0 need only be enlarged or reduced stepwise by a factor of 11/9 per step, which reduces the computation time compared with enlarging or reducing it in steps of, for example, a factor of 1.1. Rotated faces such as those shown in FIGS. 7(b) and 7(c) can also be identified.
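The saving from the coarser scale step can be estimated with a short Python sketch. The helper `scale_steps` is hypothetical, and the 600-pixel image size in the usage note is an arbitrary example, not a value from the text.

```python
import math

def scale_steps(size, target=30, ratio=11 / 9):
    """Number of successive reductions by `ratio` needed to bring `size`
    down to `target` pixels or below."""
    if size <= target:
        return 0
    return math.ceil(math.log(size / target) / math.log(ratio))
```

For an image whose shorter side is 600 pixels, `scale_steps(600, ratio=1.1)` gives 32 steps while `scale_steps(600)` gives 15, so roughly half as many scaled images have to be scanned under these assumptions.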

An example of a method of learning the sample image group is described below with reference to the flowchart of FIG. 8. The learning of the second reference data R2 is described here.

The sample image group to be learned consists of a plurality of sample images known to be faces and a plurality of sample images known not to be faces. For each sample image known to be a face, images in which the distance between the centers of the eyes is 9, 10, and 11 pixels and in which the face is rotated in the plane in 3-degree steps within a range of ±15 degrees at each distance are used. Each sample image is assigned a weight, that is, an importance. First, the initial weights of all sample images are set equally to 1 (step S1).

Next, a classifier is created for each of the plurality of types of pixel groups in the sample images (step S2). Each classifier provides a criterion for distinguishing face images from non-face images using the combination of the second feature amounts C2 at the pixels constituting one pixel group. In this embodiment, a histogram of the combinations of the second feature amounts C2 at the pixels constituting one pixel group is used as the classifier.

The creation of one classifier is described with reference to FIG. 9. As shown in the sample image on the left of FIG. 9, the pixels constituting the pixel group for this classifier are, on the plurality of sample images known to be faces, a pixel P1 at the center of the right eye, a pixel P2 on the right cheek, a pixel P3 on the forehead, and a pixel P4 on the left cheek. The combination of the second feature amounts C2 at all of the pixels P1 to P4 is obtained for every sample image known to be a face, and a histogram of those combinations is created. The second feature amount C2 represents the direction and magnitude of the gradient vector K. Because the direction can take 360 values (0 to 359) and the magnitude can take 256 values (0 to 255), using them as-is would give 360 × 256 combinations per pixel over four pixels, that is, (360 × 256)⁴ combinations, which would require a very large number of samples and a great deal of time and memory for learning and detection. In this embodiment, therefore, the direction of the gradient vector is quantized into four values: 0 to 44 and 315 to 359 (rightward, value 0), 45 to 134 (upward, value 1), 135 to 224 (leftward, value 2), and 225 to 314 (downward, value 3). The magnitude of the gradient vector is quantized into three values (values 0 to 2). The value of the combination is then calculated using the following formulas.

Combination value = 0 (when the magnitude of the gradient vector = 0)
Combination value = (direction of the gradient vector + 1) × magnitude of the gradient vector (when the magnitude of the gradient vector > 0)
This reduces the number of combinations to 9⁴, so the amount of data for the second feature amount C2 can be reduced.
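The quantization and the combination value can be sketched in Python as follows. The ternarization thresholds `t1` and `t2` are assumptions, since the text does not give them. Note also that the product formula, read literally, assigns the same value to some different pairs (for example, direction 1 with magnitude 1 and direction 0 with magnitude 2 both give 2); `injective_combination_value` is a hypothetical alternative encoding, not taken from the text, that yields the nine distinct values per pixel that the count 9⁴ implies.

```python
def quantize_direction(deg):
    """0: right (315..359, 0..44), 1: up (45..134),
    2: left (135..224), 3: down (225..314)."""
    return ((deg + 45) % 360) // 90

def quantize_magnitude(mag, t1=85, t2=170):
    """Ternarize a 0..255 magnitude. t1 and t2 are assumed thresholds."""
    return 0 if mag < t1 else (1 if mag < t2 else 2)

def combination_value(direction_q, magnitude_q):
    """Formula as stated in the text (quantized inputs)."""
    if magnitude_q == 0:
        return 0
    return (direction_q + 1) * magnitude_q

def injective_combination_value(direction_q, magnitude_q):
    """Hypothetical encoding giving 9 distinct values: 0, then 1..4 for
    magnitude 1 and 5..8 for magnitude 2."""
    if magnitude_q == 0:
        return 0
    return 4 * (magnitude_q - 1) + direction_q + 1
```

Either way, each pixel contributes a single small integer, and the key of a four-pixel group is the tuple of four such integers.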

Histograms are created in the same way for the plurality of sample images known not to be faces. For these images, the pixels at the positions corresponding to the pixels P1 to P4 on the sample images known to be faces are used (and are likewise denoted P1 to P4). The histogram used as the classifier, shown at the far right of FIG. 9, is obtained by taking the logarithm of the ratio of the frequency values of these two histograms and expressing the result as a histogram. The value on the vertical axis of this classifier histogram is hereinafter called an identification point. With this classifier, an image whose distribution of the second feature amount C2 corresponds to a positive identification point is likely to be a face, and the larger the absolute value of the identification point, the higher that likelihood. Conversely, an image whose distribution of the second feature amount C2 corresponds to a negative identification point is likely not to be a face, and again the likelihood increases with the absolute value of the identification point. In step S2, a plurality of classifiers in this histogram form are created for the combinations of the second feature amounts C2 at the pixels constituting the plurality of types of pixel groups that may be used for identification.
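A histogram classifier of this kind can be sketched in Python as a lookup table from a feature combination to its identification point. The function name `build_classifier` and the smoothing constant `eps`, which avoids taking the logarithm of zero for combinations seen in only one class, are assumptions not found in the text.

```python
import math
from collections import Counter

def build_classifier(face_keys, nonface_keys, eps=1e-3):
    """Return {combination: identification point} for one pixel group.

    face_keys / nonface_keys hold one hashable feature combination per
    sample image (for example a 4-tuple of combination values for P1..P4).
    The point is the log of the ratio of relative frequencies.
    """
    face_hist, nonface_hist = Counter(face_keys), Counter(nonface_keys)
    n_face, n_nonface = len(face_keys), len(nonface_keys)
    points = {}
    for key in set(face_hist) | set(nonface_hist):
        p_face = face_hist[key] / n_face + eps
        p_nonface = nonface_hist[key] / n_nonface + eps
        points[key] = math.log(p_face / p_nonface)
    return points
```

A combination seen mostly in face samples receives a positive point and one seen mostly in non-face samples a negative point, matching the interpretation given above.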

Next, from the classifiers created in step S2, the classifier most effective for identifying whether an image is a face is selected. The selection takes the weight of each sample image into account. In this example, the weighted correct-answer rates of the classifiers are compared, and the classifier with the highest weighted correct-answer rate is selected (step S3). In the first pass through step S3 the weights of all sample images are equal to 1, so the classifier that correctly identifies the largest number of sample images as face or non-face is simply selected as the most effective. In the second pass through step S3, after the weights have been updated in step S5 described later, sample images with a weight of 1, a weight greater than 1, and a weight less than 1 are mixed, and a sample image with a weight greater than 1 counts for more in the evaluation of the correct-answer rate than one with a weight of 1, in proportion to its weight. Thus, in the second and subsequent passes through step S3, more emphasis is placed on correctly identifying sample images with large weights than those with small weights.

Next, it is checked whether the correct-answer rate of the combination of classifiers selected so far exceeds a predetermined threshold (step S4). This rate is the rate at which the result of identifying each sample image as a face image or not, using the classifiers selected so far in combination, agrees with whether the image actually is a face image. Either the sample image group with the current weights or the sample image group with equal weights may be used for this evaluation. If the threshold is exceeded, the classifiers selected so far can identify whether an image is a face with sufficiently high probability, and learning ends. If the rate is at or below the threshold, the process proceeds to step S6 in order to select an additional classifier to be used in combination with those selected so far.

In step S6, the classifier selected in the most recent step S3 is excluded so that it will not be selected again.

Next, the weights of the sample images that the classifier selected in the most recent step S3 failed to identify correctly are increased, and the weights of the sample images it identified correctly are decreased (step S5). The weights are adjusted in this way so that, in selecting the next classifier, images that the already selected classifiers could not identify correctly are given importance and a classifier that can identify them correctly is selected, which makes the combination of classifiers more effective.

The process then returns to step S3, and the next most effective classifier is selected on the basis of the weighted correct-answer rate as described above.

Steps S3 to S6 are repeated, and classifiers corresponding to combinations of the second feature amounts C2 at the pixels constituting specific pixel groups are selected as classifiers suited to identifying whether a face is included. When the correct-answer rate checked in step S4 exceeds the threshold, the types of classifiers and the identification conditions used for identifying whether a face is included are fixed (step S7), and the learning of the second reference data R2 ends.
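The loop of steps S1 to S7 can be summarized in a minimal Python sketch. It assumes each classifier is a function returning a score whose positive sign means "face", that the combination of classifiers is evaluated by the sign of the summed scores, and that weights are multiplied by fixed factors `up` and `down`; the text does not specify the update factors or the threshold, so these are illustrative values.

```python
def train(classifiers, samples, labels, threshold=0.95, up=2.0, down=0.5):
    """Select classifiers by weighted correct-answer rate (steps S1 to S7).

    classifiers: functions sample -> score (assumed already built, step S2).
    labels: +1 for face samples, -1 for non-face samples.
    """
    weights = [1.0] * len(samples)                       # S1: equal weights
    remaining = list(classifiers)
    selected = []
    while remaining:
        # S3: pick the classifier with the highest weighted correct-answer rate.
        def weighted_correct(c):
            return sum(w for w, s, y in zip(weights, samples, labels)
                       if (c(s) > 0) == (y > 0))
        best = max(remaining, key=weighted_correct)
        selected.append(best)
        # S4: correct-answer rate of the combination (equal weights here).
        correct = sum(1 for s, y in zip(samples, labels)
                      if (sum(c(s) for c in selected) > 0) == (y > 0))
        if correct / len(samples) > threshold:
            break                                        # S7: learning ends
        remaining.remove(best)                           # S6: exclude it
        # S5: raise weights of misidentified samples, lower the others.
        weights = [w * (down if (best(s) > 0) == (y > 0) else up)
                   for w, s, y in zip(weights, samples, labels)]
    return selected
```

The procedure resembles boosting, in which each newly selected classifier concentrates on the samples that earlier ones misidentified.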

The first reference data R1 is learned by obtaining the types of classifiers and the identification conditions in the same way.

When the above learning method is adopted, the classifier is not limited to the histogram form described above and may take any form, for example binary data, a threshold, or a function, as long as it provides a criterion for distinguishing face images from non-face images using the combination of the first and second feature amounts C1 and C2 at the pixels constituting a specific pixel group. Even in histogram form, a histogram showing the distribution of the differences between the two histograms shown in the center of FIG. 9 may be used, for example.

The learning method is not limited to the above; other machine learning techniques such as neural networks may be used. The first and second reference data R1 and R2 may also be determined empirically by a skilled engineer.

In the above learning method, the pixels that make up the pixel groups for creating classifiers may be restricted to pixels within a first region A1 including the left eye and left cheek and a second region A2 including the right eye and right cheek in the sample images known to be faces, as shown in FIG. 10. In addition to the first and second regions A1 and A2, pixels within a third region A3 including both eyes, indicated by the broken line in FIG. 10, may also be used.

In this case, the positions of the regions A1, A2, and A3 are the same in all sample images used for learning. That is, in this embodiment the first and second reference data R1 and R2 are learned using sample images deformed so that, as shown in FIG. 6, the eye-center distance is 9, 10, and 11 pixels and the face is rotated in the plane in 3-degree steps within a range of ±15 degrees at each distance, and the positions of the regions A1, A2, and A3 on the deformed sample images are the same as the positions of the regions A1, A2, and A3 set on the sample image with an eye-center distance of 10 pixels and a rotation angle of 0 degrees. For sample images known not to be faces as well, the regions A1, A2, and A3 are set at the same positions as on the sample image known to be a face with an eye-center distance of 10 pixels and a rotation angle of 0 degrees. Consequently, as shown in FIG. 11, the classifiers are created using only the pixels within the regions A1 and A2, and further the region A3, set on all sample images used for learning.

Creating the classifiers using only the pixels within the first to third regions A1 to A3 of the sample images during learning greatly shortens the learning time for the first and second reference data R1 and R2.

The applicant's experiments have confirmed that, when identifying whether a face is included in the identification target image S0, classifiers created using pixels in the first and second regions A1 and A2, and further the third region A3, contribute greatly to improved identification performance. Therefore, by creating classifiers using only the pixels in the first and second regions A1 and A2, and further the third region A3, when learning the first and second reference data R1 and R2, the performance of identifying whether a face is included in the identification target image S0 can be further improved.

The first identification unit 8 refers to the identification conditions learned in the first reference data R1 for all combinations of the first feature amounts C1 at the pixels constituting the plurality of types of pixel groups, obtains an identification point for the combination of the first feature amounts C1 at the pixels constituting each pixel group, and combines all the identification points to identify whether a face candidate is included in the identification target image S0. At this time, the direction of the gradient vector K, which is the first feature amount C1, is quantized, for example into four values, in the same way as when the first reference data R1 was learned. In this embodiment, all the identification points are added and identification is based on the sign of the sum. For example, when the sum of the identification points is positive, it is determined that the identification target image S0 includes a face candidate; when it is negative, it is determined that no face candidate is included. The identification performed by the first identification unit 8 as to whether a face candidate is included in the identification target image S0 is called the first identification.
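Summing identification points can be sketched in Python as a table lookup per pixel group. The data layout (dictionaries keyed by a pixel-group identifier) and the choice to score unseen combinations as 0 are assumptions made for illustration.

```python
def identify(window_features, reference):
    """Sum identification points over all pixel groups and test the sign.

    window_features: {pixel_group_id: feature combination in this window}
    reference: {pixel_group_id: {feature combination: identification point}}
    Returns (is_candidate, total_points).
    """
    total = 0.0
    for group_id, table in reference.items():
        # Combinations never seen during learning score 0 (an assumption).
        total += table.get(window_features.get(group_id), 0.0)
    return total > 0, total
```

For example, with reference data `{"g1": {(0, 1): 1.5}, "g2": {(3,): -1.0}}` and window features `{"g1": (0, 1), "g2": (3,)}`, the total is 0.5 and the window is treated as containing a face candidate.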

The size of the identification target image S0, unlike that of the 30 × 30 pixel sample images, varies. Also, when a face is included, its in-plane rotation angle is not necessarily 0 degrees. Therefore, as shown in FIG. 12, the first identification unit 8 enlarges or reduces the identification target image S0 stepwise until its vertical or horizontal size becomes 30 pixels while rotating it stepwise through 360 degrees in the plane (FIG. 12 shows the case of reduction). At each stage it sets a mask M of 30 × 30 pixels on the scaled identification target image S0 and, while moving the mask M one pixel at a time over the scaled image, identifies whether the image within the mask is a face image, thereby identifying whether a face candidate is included in the identification target image S0.

Because sample images with eye-center distances of 9, 10, and 11 pixels were used for learning when generating the first and second reference data R1 and R2, the scaling factor per step when enlarging or reducing the identification target image S0 and the face candidate may be 11/9. Also, because sample images in which the face is rotated in the plane within a range of ±15 degrees were used for learning, the identification target image S0 and the face candidate need only be rotated through 360 degrees in 30-degree steps.
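The resulting set of deformation stages can be enumerated with a short Python sketch. It covers only the reduction case shown in FIG. 12 and stops when the shorter side would fall below the 30-pixel mask; the function name and this stopping rule are assumptions.

```python
def deformation_stages(width, height, ratio=11 / 9, angle_step=30, mask=30):
    """List (scale, angle) pairs at which the image would be scanned.

    The scale shrinks by `ratio` per step while the shorter side still
    holds a mask of `mask` pixels; each scale is combined with rotations
    in `angle_step` increments over 360 degrees.
    """
    stages = []
    scale = 1.0
    while min(width, height) * scale >= mask:
        for angle in range(0, 360, angle_step):
            stages.append((scale, angle))
        scale /= ratio
    return stages
```

A 60 × 60 image, for instance, yields 4 scales × 12 angles = 48 stages under these assumptions, and the mask is slid over the image at each of them.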

Here, the feature amount calculation unit 4 calculates the first and second feature amounts C1 and C2 at each stage of the deformation, namely the enlargement or reduction and the rotation, of the identification target image S0 and the face candidate.

When the classifiers were created, in learning the first and second reference data R1 and R2, using only the pixels within the first and second regions A1 and A2, and further the third region A3, set on the sample images as described above, the feature amount calculation unit 4 calculates the first and second feature amounts C1 and C2 using only the pixels of the regions in the mask M that correspond to the first and second regions A1 and A2, and further the third region A3.

The identification of whether a face candidate is included is performed on the identification target image S0 at all stages of enlargement or reduction and rotation. If a face candidate is identified even once, the identification target image S0 is identified as including a face candidate, and the 30 × 30 pixel region corresponding to the position of the mask M at which it was identified is extracted as the face candidate from the identification target image S0 at the size and rotation angle of the stage at which it was identified.

The second identification unit 10 deforms the face candidate extracted by the first identification unit 8 by scaling it stepwise and rotating it, in the same manner as the first identification unit 8. At each stage of this deformation, it refers to the identification conditions that the second reference data R2 has learned for all combinations of the second feature amount C2 at the pixels constituting each of a plurality of types of pixel groups, obtains an identification point for the combination of the second feature amount C2 at the pixels constituting each pixel group, and combines all the identification points to identify whether the face candidate is a face. At this time, the direction of the gradient vector K, which is the second feature amount C2, is quantized into four values and its magnitude into three values. In this embodiment, all the identification points are added and the identification is made according to the sign of the sum. For example, if the sum of the identification points is positive, the face candidate is judged to be a face, and if it is negative, the face candidate is judged not to be a face. The identification, performed by the second identification unit 10, of whether a face candidate is a face is referred to as the second identification.
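
The sign-of-sum decision used in the second identification can be sketched as follows. This is a minimal sketch under assumptions: the lookup tables stand in for the identification conditions of the second reference data R2, their dictionary layout and the name `second_identification` are hypothetical, unknown combinations are scored as 0, and a sum of exactly 0 (which the text does not address) is treated as "not a face".

```python
def second_identification(pixel_group_features, score_tables):
    """Decide face / not face from the sum of identification points.

    pixel_group_features: one tuple per pixel group, holding the quantized
        second feature amounts C2 of the pixels in that group.
    score_tables: one dict per pixel group (hypothetical layout), mapping a
        feature combination to its identification point.
    """
    total = 0.0
    for features, table in zip(pixel_group_features, score_tables):
        # Assumption: a combination absent from the table contributes 0.
        total += table.get(features, 0.0)
    # Positive sum -> face; negative (or, by assumption, zero) -> not a face.
    return total > 0
```

For instance, two pixel groups contributing 0.5 and -0.2 give a sum of 0.3, so the candidate would be judged to be a face.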

The output unit 12 outputs an identification result indicating that the identification target image S0 does not contain a face when the first identification unit 8 identifies that the identification target image S0 contains no face candidate, and also when the first identification unit 8 identifies that the identification target image S0 contains a face candidate but the second identification unit 10 identifies that the face candidate is not a face. When, on the other hand, the second identification unit 10 identifies that the face candidate identified by the first identification unit 8 is a face, the output unit 12 extracts the identified face by trimming it from the identification target image S0 and outputs face image data S1 representing the image of the extracted face.

Next, the processing performed in the first embodiment is described. FIG. 13 is a flowchart showing the processing performed in the first embodiment. First, the image input unit 2 accepts input of identification target image data S0 (step S11). A series of image data S0 for a large number of images may be accepted continuously. Next, at each stage of scaling and rotation of the identification target image S0, the feature amount calculation unit 4 calculates the direction of the gradient vector K of the identification target image S0 as the first feature amount C1 (step S12). The first identification unit 8 then reads the first reference data R1 from the memory 6 (step S13) and performs the first identification of whether the identification target image S0 contains a face candidate (step S14).

If step S14 is affirmative, the first identification unit 8 extracts the face candidate from the identification target image S0 (step S15). A plurality of face candidates may be extracted. Next, the feature amount calculation unit 4 calculates the second feature amount C2 from the face candidate at each stage of scaling and rotation of the face candidate (step S16) and normalizes the second feature amount C2 (step S17). The second identification unit 10 then reads the second reference data R2 from the memory 6 (step S18) and performs the second identification of whether the face candidate is a face (step S19).

If step S19 is affirmative, the output unit 12 extracts the identified face from the identification target image S0, outputs face image data S1 representing the image of the extracted face (step S20), and ends the processing.

If step S14 or step S19 is negative, the identification target image S0 is regarded as containing no face, the output unit 12 outputs an identification result to that effect (step S21), and the processing ends.
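
The two-stage flow of steps S12 to S21 can be summarized in a short sketch. The two callables are hypothetical stand-ins for the first and second identification units; the actual feature calculation and reference data are not modeled here.

```python
def identify_faces(image, find_candidates, confirm_face):
    """Run the two-stage flow on one image and return the confirmed faces.

    find_candidates: hypothetical callable for the first identification;
        returns a list of face candidates (empty when step S14 is negative).
    confirm_face: hypothetical callable for the second identification;
        returns True when a candidate is a face (step S19 affirmative).
    """
    candidates = find_candidates(image)
    if not candidates:
        return []  # no face candidate: corresponds to step S21
    # The costlier second identification runs only on the extracted candidates.
    return [c for c in candidates if confirm_face(c)]
```

An empty result corresponds to the "no face" output of step S21. A non-empty result corresponds to the faces that would be trimmed and output in step S20.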

As described above, the first identification unit 8 of the object identification device 1 according to the first embodiment uses the first feature amount C1, namely the direction of the gradient vector K, which requires no normalization. Even when the whole of the identification target image S0 is examined for face candidates, the amount of computation is therefore modest, and whether the identification target image S0 contains a face candidate can be identified at relatively high speed. The second identification unit 10, on the other hand, normalizes the second feature amount C2, namely the direction and magnitude of the gradient vector K, before identifying whether a face candidate is a face, so its accuracy is high but its amount of computation is large. In this embodiment, however, the second feature amount is normalized and the second identification is performed only on the face candidate portion extracted from the identification target image S0, so the computation needed for normalization is reduced and the time required for the identification processing is short. According to this embodiment, therefore, whether the identification target image S0 contains a face can be identified at high speed and with high accuracy.

When the first and second feature amounts C1 and C2 are calculated in the identification target image S0 only from the regions corresponding to the first and second regions A1 and A2, and further the third region A3, of the sample images, the range over which the feature amounts are calculated is smaller than when they are calculated from the whole of the identification target image S0, so the computation time can be shortened.

Next, a second embodiment of the present invention is described. FIG. 14 is a schematic block diagram showing the configuration of an object identification device 1′ according to the second embodiment of the present invention. Components of the second embodiment that are the same as those of the first embodiment are given the same reference numbers, and their detailed description is omitted here. The second embodiment differs from the first embodiment in that, in addition to the image input unit 2, the feature amount calculation unit 4, the memory 6, the first identification unit 8, the second identification unit 10, and the output unit 12 that constitute the object identification device according to the first embodiment, it includes a control unit 14. The control unit 14 determines whether the identification result of the first identification unit 8 satisfies a predetermined requirement and, if this determination is affirmative, controls the feature amount calculation unit 4, the first identification unit 8, and the second identification unit 10 so that only the first feature amount C1 is calculated, the face candidate identified by the first identification unit 8 is identified as a face, and the second identification by the second identification unit 10 is not performed.

When the number of items of identification target image data S0 that have been identified since the start of the first and second identification reaches a predetermined number, the control unit 14 compares the number of times the first identification unit 8 identified that the identification target image S0 contains a face candidate (denoted N1) with the number of times the second identification unit 10 identified that the face candidate is a face (denoted N2). It determines whether the identification result of the first identification unit 8 satisfies the predetermined requirement by determining whether the ratio N2/N1 of the count N2 to the count N1 is at least a predetermined ratio (for example, 0.95). If this determination is affirmative, the accuracy with which the first identification unit 8 identifies face candidates is regarded as very high. For the identification target image data S0 identified thereafter, the control unit 14 controls the feature amount calculation unit 4, the first identification unit 8, and the second identification unit 10 so that the feature amount calculation unit 4 calculates only the first feature amount C1, only the first identification unit 8 identifies whether the identification target image S0 contains a face candidate, and, when a face candidate is identified, that face candidate is taken to be a face, the identification target image S0 is identified as containing a face, and the identification result is output to the output unit 12.

Next, the processing performed in the second embodiment is described. In the second embodiment, the processing performed by the image input unit 2, the feature amount calculation unit 4, the memory 6, the first identification unit 8, the second identification unit 10, and the output unit 12 is the same as in the first embodiment, so only the processing performed by the control unit 14 is described here.

FIG. 15 is a flowchart showing the processing performed in the second embodiment. When the processing for identifying whether the identification target image S0 contains a face starts, the control unit 14 starts its processing and counts the number of times N1 that the first identification unit 8 identifies that the identification target image S0 contains a face candidate (step S31). It also counts the number of times N2 that the second identification unit 10 identifies that the face candidate is a face (step S32).

Next, the control unit 14 determines whether the number of identification target images S0 that have been identified (that is, the identification count) has reached the predetermined number (step S33). If step S33 is negative, the processing returns to step S31, and steps S31 to S33 are repeated until step S33 is affirmative. If step S33 is affirmative, the control unit 14 determines whether the ratio N2/N1 of the count N2 to the count N1 is at least the predetermined ratio (step S34).

If step S34 is affirmative, the control unit 14 controls the feature amount calculation unit 4, the first identification unit 8, and the second identification unit 10 so that only the first feature amount C1 is calculated and only the first identification unit 8 is used to identify whether the identification target image S0 contains a face (step S35), and the processing ends. If step S34 is negative, the control unit 14 controls the feature amount calculation unit 4, the first identification unit 8, and the second identification unit 10 so that the first and second feature amounts C1 and C2 continue to be calculated and both the first and second identification units 8 and 10 are used to identify whether the identification target image S0 contains a face (step S36), and the processing ends.
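
The counting and ratio test of steps S31 to S36 can be sketched as follows. This is an illustrative sketch only: the class and attribute names are hypothetical, the default of 0.95 is the example ratio given in the text, one record per image is assumed, and the handling of N1 = 0 (keeping both stages) is an assumption, since the text does not cover that case.

```python
class ControlUnit:
    """Sketch of the decision made by the control unit 14 (names hypothetical)."""

    def __init__(self, required_count, min_ratio=0.95):
        self.required_count = required_count  # predetermined number of images
        self.min_ratio = min_ratio            # predetermined ratio for N2/N1
        self.processed = 0
        self.n1 = 0  # images in which a face candidate was identified
        self.n2 = 0  # images in which the candidate was confirmed as a face
        self.first_stage_only = None  # None until the decision is made

    def record(self, candidate_found, face_confirmed):
        """Record the outcome for one image (steps S31 to S34)."""
        if self.first_stage_only is not None:
            return  # the decision is made once, then the control flow ends
        self.processed += 1
        if candidate_found:
            self.n1 += 1
            if face_confirmed:
                self.n2 += 1
        if self.processed >= self.required_count:
            # Assumption: with no candidates at all, keep both stages.
            self.first_stage_only = (
                self.n1 > 0 and self.n2 / self.n1 >= self.min_ratio
            )
```

Once `first_stage_only` is `True`, a caller would skip the second feature amount and the second identification for later images (step S35). If it is `False`, both stages remain in use (step S36).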

As described above, the second embodiment determines whether the identification result of the first identification unit 8 satisfies a predetermined requirement and, if this determination is affirmative, calculates only the first feature amount C1 in subsequent processing and, using only the first identification unit 8, identifies the face candidate identified by the first identification unit 8 as a face. When the first identification unit 8 is identifying accurately, the calculation of the normalized second feature amount C2 and the second identification performed by the second identification unit 10 can therefore be omitted, so whether the identification target image S0 contains a face can be identified even faster.

In the first and second embodiments above, the first and second reference data R1 and R2 are stored in the memory 6 within the device 1. However, as long as the feature amount calculation unit 4, the first identification unit 8, and the second identification unit 10 can access the first and second reference data R1 and R2, the reference data may instead be stored in a device separate from the device 1 or on a replaceable medium such as a CD-ROM.

In the first and second embodiments above, the direction of the gradient vector K is used as the first feature amount C1 that requires no normalization. Color information of the identification target image S0, such as hue and saturation, is likewise unchanged when the brightness or contrast of the identification target image S0 changes, so the color information of the identification target image S0 may be used as the first feature amount instead.

In the first and second embodiments above, the object to be identified is a face, and whether the identification target image S0 contains a face is identified. However, the object to be identified may be any object that has a substantially constant shape and whose size can be made uniform when the reference data is learned, such as an automobile or a road sign.

In the first and second embodiments above, the output unit 12 extracts the face from the identification target image S0. Alternatively, face position information representing the position of the face in the identification target image S0 (for example, the coordinates of the four corners of a rectangular region surrounding the identified face) may be attached to the identification target image data S0, and the identification target image data S0 with the face position information attached may be output. The face position information can be attached by writing it into a header or tag of the identification target image data S0, or by writing it into, for example, a text file that has the same file name as the identification target image data S0 but a different extension, so that the identification target image data S0 and the text file are treated as inseparable. When the identification target image S0 is identified as containing no face, identification information representing that result may be attached to the identification target image data S0 and output.
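
The text-file variant described above (same file name, different extension) might look like the following sketch. The `.txt` extension, the "x,y" line format, and all function names are assumptions made for illustration; the text only requires that the file share the image's name and hold the face position information.

```python
from pathlib import Path


def sidecar_path(image_path):
    """Return the path with the same name as the image but a .txt extension."""
    return Path(image_path).with_suffix(".txt")


def format_face_position(corners):
    """Format the four corner coordinates as 'x,y' pairs (assumed format)."""
    return " ".join(f"{x},{y}" for x, y in corners)


def write_face_position(image_path, corners):
    """Write the face position next to the image and return the file path."""
    path = sidecar_path(image_path)
    path.write_text(format_face_position(corners) + "\n", encoding="utf-8")
    return path
```

Keeping the two files under one base name is what lets downstream software pair the image with its face position without modifying the image data itself.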

In the first and second embodiments above, the feature amount calculation unit 4 calculates both the first and second feature amounts C1 and C2. Dedicated feature amount calculation units may instead be provided for calculating the first feature amount C1 and the second feature amount C2 respectively.

In the first and second embodiments above, two identification units are used, namely the first and second identification units 8 and 10. A third identification unit 16 may additionally be provided, as in the object identification device 1″ according to a third embodiment of the present invention shown in FIG. 16.

The third identification unit 16 of the object identification device 1″ according to the third embodiment refers to further reference data (third reference data R3) that has been learned for a feature amount different from the first and second feature amounts C1 and C2 (a third feature amount C3) and, based on the third feature amount C3 calculated from the identification target image S0, identifies whether the face identified by the second identification unit 10 is truly a face. Providing the third identification unit 16 in this way further improves the accuracy of identifying whether the identification target image S0 contains a face. In the third embodiment, a single identification unit, the third identification unit 16, is added to the first and second identification units 8 and 10, but a plurality of further identification units may be added.

In the first to third embodiments above, the object identification device according to the present invention is used as a stand-alone unit. It may instead be provided in an imaging device that acquires image data by photographing, such as a digital camera or a camera-equipped mobile phone. The imaging device can then recognize the face, and further the positions of the eyes, when it performs face detection, red-eye correction, or detection of whether the eyes are closed on the image represented by the image data.

The devices according to the first to third embodiments of the present invention have been described above. A program that causes a computer to function as means corresponding to the image input unit 2, the feature amount calculation unit 4, the memory 6, the first identification unit 8, the second identification unit 10, the output unit 12, the control unit 14, and the third identification unit 16, and to perform the processing for identifying whether the identification target image S0 contains a face, is also an embodiment of the present invention. A computer-readable recording medium on which such a program is recorded is likewise an embodiment of the present invention. In these cases too, the reference data may be contained in the program or on the same recording medium, or may be provided from an external device or a separate medium.

FIG. 1: Schematic block diagram showing the configuration of the object identification device according to the first embodiment of the present invention.
FIG. 2: (a) horizontal edge detection filter; (b) vertical edge detection filter.
FIG. 3: Diagram explaining the calculation of the gradient vector.
FIG. 4: (a) a person's face; (b) gradient vectors near the eyes and mouth of the face shown in (a).
FIG. 5: (a) histogram of gradient vector magnitudes before normalization; (b) histogram of gradient vector magnitudes after normalization; (c) histogram of gradient vector magnitudes quantized into five values; (d) histogram of gradient vector magnitudes quantized into five values after normalization.
FIG. 6: Examples of sample images known to be faces.
FIG. 7: Diagram explaining the rotation of a face.
FIG. 8: Flowchart showing the method of learning the reference data.
FIG. 9: Diagram showing the method of deriving a classifier.
FIG. 10: A sample image on which a first region including the left eye and left cheek, a second region including the right eye and right cheek, and a third region including both eyes have been set.
FIG. 11: Deformed sample images on which the first to third regions have been set.
FIG. 12: Diagram explaining the stepwise deformation of the identification target image.
FIG. 13: Flowchart showing the processing performed in the first embodiment.
FIG. 14: Schematic block diagram showing the configuration of the object identification device according to the second embodiment of the present invention.
FIG. 15: Flowchart showing the processing performed by the control unit of the second embodiment.
FIG. 16: Schematic block diagram showing the configuration of the object identification device according to the third embodiment of the present invention.

Explanation of symbols

1, 1′, 1″ Object identification device
2 Image input unit
4 Feature amount calculation unit
6 Memory
8 First identification unit
10 Second identification unit
12 Output unit
14 Control unit
16 Third identification unit

Claims

An object identification device comprising:
image input means for receiving input of an image to be identified;
first feature amount calculating means for calculating, from the image to be identified, a first feature amount that requires no normalization and is used for identifying a predetermined object;
first identifying means for identifying whether the image to be identified contains a candidate for the predetermined object, based on the first feature amount calculated from the image to be identified, by referring to first reference data that predefines the first feature amount and an identification condition corresponding to the first feature amount;
second feature amount calculating means for calculating from the predetermined object candidate, when the first identifying means identifies that the predetermined object candidate is contained, a normalized second feature amount used for identifying the predetermined object; and
second identifying means for identifying whether the predetermined object candidate is the predetermined object, based on the normalized second feature amount calculated from the predetermined object candidate, by referring to second reference data that predefines the second feature amount and an identification condition corresponding to the second feature amount.

The object identification device according to claim 1, wherein the first reference data is obtained by learning in advance, by a machine learning method, the first feature amounts contained in a sample image group consisting of a plurality of sample images known to be the predetermined object and a plurality of sample images known not to be the predetermined object.

The object identification device according to claim 2, wherein, when the predetermined object is a face, the first reference data is obtained by learning the first feature amounts contained in a first region of a predetermined range including the left eye and left cheek and in a second region of a predetermined range including the right eye and right cheek of the sample images known to be the predetermined object, and the first feature amounts contained in the regions corresponding to the first and second regions of the sample images known not to be the predetermined object, and
the first feature amount calculating means is means for calculating the first feature amount from the regions of the image to be identified that correspond to the first and second regions.

The object identification device according to claim 3, wherein the first reference data is obtained by further learning the first feature amounts contained in a third region of a predetermined range including both eyes of the sample images known to be the predetermined object, and the first feature amounts contained in the region corresponding to the third region of the sample images known not to be the predetermined object, and
the first feature amount calculating means is means for calculating the first feature amount from the regions of the image to be identified that correspond to the first to third regions.

The object identification device according to claim 1, wherein the second reference data is obtained by learning in advance, by a machine learning method, the second feature amounts contained in a sample image group consisting of a plurality of sample images known to be the predetermined object and a plurality of sample images known not to be the predetermined object.

The object identification device according to claim 5, wherein, when the predetermined object is a face, the second reference data is obtained by learning the second feature amounts contained in a first region of a predetermined range including the left eye and left cheek and in a second region of a predetermined range including the right eye and right cheek of the sample images known to be the predetermined object, and the second feature amounts contained in the regions corresponding to the first and second regions of the sample images known not to be the predetermined object, and
the second feature amount calculating means is means for calculating the second feature amount from the regions of the image to be identified that correspond to the first and second regions.

The second reference data is not the predetermined feature and the second feature amount included in the third region of the predetermined range including both eyes in the sample image that is known to be the predetermined target. Obtained by further learning the second feature amount included in a region corresponding to the third region in the known sample image,
7. The second feature quantity calculating means is a means for calculating the second feature quantity from each area corresponding to the first to third areas in the image to be identified. The object identification device described.

The object identification device according to claim 1, wherein the first feature amount is a direction of a gradient vector or color information at each pixel on the image.

The object identification device according to claim 1, wherein the second feature amount is the direction and magnitude of a gradient vector at each pixel of the image.

The object identification device according to claim 1, further comprising control means for determining whether the identification result of the first identifying means satisfies a predetermined requirement and, when the determination is affirmative, controlling the first and second feature amount calculating means and the first and second identifying means so that only the first feature amount is calculated from the image to be identified and the predetermined object candidate identified by the first identifying means is identified as the predetermined object.

The object identification device according to any one of claims 1 to 9, further comprising at least one other identifying means for identifying, based on yet another feature amount calculated from the predetermined object candidate, whether the predetermined object identified by the second identifying means is truly the predetermined object.

The object identification device according to claim 1, further comprising an extraction unit configured to extract the predetermined object from the image to be identified.

12. The object identification device according to claim 1, further comprising an output unit configured to output information indicating the position of the predetermined object in the image to be identified by adding the information to the image to be identified.

An imaging apparatus comprising the object identification device according to claim 1.

An object identification method comprising:
accepting input of an image to be identified;
calculating, from the image to be identified, a first feature amount that does not require normalization and is used for identification of a predetermined object;
identifying whether a predetermined object candidate is included in the image to be identified, based on the first feature amount calculated from the image to be identified, with reference to first reference data that predefines the first feature amount and an identification condition corresponding to the first feature amount;
calculating, from the predetermined object candidate, a normalized second feature amount used for identification of the predetermined object when the predetermined object candidate is identified as being included; and
identifying whether the predetermined object candidate is the predetermined object, based on the normalized second feature amount calculated from the predetermined object candidate, with reference to second reference data that predefines the second feature amount and an identification condition corresponding to the second feature amount.
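
The two-stage flow of the method claim above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: reference data is modelled as a dictionary from a feature value to an identification point, a positive sum of points is taken as the identification condition, and the whole input image is treated as the candidate. The names `ReferenceData`, `score`, and `identify_object` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable

# Assumption: reference data maps a feature value to an identification point.
ReferenceData = Dict[Hashable, float]


def score(features: Iterable[Hashable], reference: ReferenceData) -> float:
    """Sum the identification points of the observed feature values."""
    return sum(reference.get(f, 0.0) for f in features)


def identify_object(
    image,
    calc_first: Callable,
    calc_second: Callable,
    r1: ReferenceData,
    r2: ReferenceData,
) -> bool:
    """Return True if the image is identified as the predetermined object."""
    # Stage 1: feature that needs no normalization, checked against r1.
    c1 = calc_first(image)
    if score(c1, r1) <= 0:
        return False  # no candidate: the second feature is never computed
    # Stage 2: normalized feature, computed only once a candidate exists.
    c2 = calc_second(image)
    return score(c2, r2) > 0
```

Because the second feature is computed only after the first stage reports a candidate, images rejected in the first stage never incur the cost of normalization.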

A program for causing a computer to execute an object identification method, the program comprising:
a procedure for accepting input of an image to be identified;
a procedure for calculating, from the image to be identified, a first feature amount that does not require normalization and is used for identification of a predetermined object;
a procedure for identifying whether a predetermined object candidate is included in the image to be identified, based on the first feature amount calculated from the image to be identified, with reference to first reference data that predefines the first feature amount and an identification condition corresponding to the first feature amount;
a procedure for calculating, from the predetermined object candidate, a normalized second feature amount used for identification of the predetermined object when the predetermined object candidate is identified as being included; and
a procedure for identifying whether the predetermined object candidate is the predetermined object, based on the normalized second feature amount calculated from the predetermined object candidate, with reference to second reference data that predefines the second feature amount and an identification condition corresponding to the second feature amount.