JP5317934B2 - Object detection apparatus and method, and program

Info

Publication number: JP5317934B2
Application number: JP2009267215A
Authority: JP
Inventors: 軼胡
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2009-11-25
Filing date: 2009-11-25
Publication date: 2013-10-16
Anticipated expiration: 2029-11-25
Also published as: JP2011113168A

Abstract

PROBLEM TO BE SOLVED: To detect an object of a specific type, such as a face, from a detection target image accurately and at high speed.

SOLUTION: A face detection target image is converted to multiple resolutions at a predetermined magnification to obtain a plurality of resolution images of different resolutions. Partial images are generated by scanning, over the plurality of resolution images, a plurality of windows whose sizes interpolate the predetermined magnification between resolution images. For each partial image, a feature amount relating to the distribution of its pixel values is calculated, and the feature amount is used to determine whether the partial image is a face. The determination uses a discriminator that judges, from the feature amount of the partial image, whether the partial image is an image of the object. The discriminator has learned, for each window size, the feature amounts relating to the distribution of the face from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an object detection apparatus and method for detecting an object of a specific type, such as a human face, from a detection target image, and to a program for causing a computer to execute the object detection method.

Conventionally, the color distribution of a person's face region in a snapshot taken with a digital camera is examined in order to correct the skin color, and persons are recognized in digital video captured by the digital video camera of a surveillance system. In such cases the face region corresponding to the person's face must be detected in the digital image, and various techniques for detecting faces in digital images have therefore been proposed. Among them, a technique that uses a discriminator module (hereinafter simply called a discriminator) generated by machine learning on sample images is known as a face detection technique with particularly good detection accuracy and robustness.

In this technique, the features of a face are learned from a face sample image group consisting of a plurality of different face sample images and a non-face sample image group consisting of a plurality of different sample images known not to be faces, and a discriminator capable of determining whether a given image is a face image is generated and prepared in advance. Partial images are then cut out one after another from the image in which faces are to be detected (hereinafter called the detection target image), the discriminator determines whether each partial image is a face, and the regions of the partial images determined to be faces are extracted. The faces in the detection target image are detected in this way.

Here, the discriminator is trained on a plurality of feature amounts relating to the distribution of pixel values, extracted from the face sample images and the non-face sample images, and consists of a plurality of weak classifiers, each of which determines for one feature amount whether the image is a face. Whether a partial image is a face is therefore determined by extracting from the partial image the same feature amounts relating to the distribution of pixel values as were used in learning, and evaluating each extracted feature amount with the corresponding weak classifier.
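As a rough illustration of how a discriminator built from weak classifiers might reach a decision, the sketch below sums the scores of several weak classifiers and compares the total with a threshold. The summation rule, the `make_stump` helper, and all numeric values are assumptions chosen for illustration; the description above states only that each weak classifier judges one feature amount.

```python
from typing import Callable, List, Sequence

# A weak classifier maps a feature vector to a score (positive = face-like).
WeakClassifier = Callable[[Sequence[float]], float]


def make_stump(index: int, threshold: float, weight: float) -> WeakClassifier:
    """Hypothetical weak classifier: votes +weight or -weight on one feature."""
    def stump(features: Sequence[float]) -> float:
        return weight if features[index] >= threshold else -weight
    return stump


def discriminator_score(features: Sequence[float],
                        weak_classifiers: List[WeakClassifier]) -> float:
    """Assumed combination rule: sum of the weak classifier scores."""
    return sum(wc(features) for wc in weak_classifiers)


def is_face(features: Sequence[float],
            weak_classifiers: List[WeakClassifier],
            threshold: float = 0.0) -> bool:
    """Accept the partial image when the combined score reaches the threshold."""
    return discriminator_score(features, weak_classifiers) >= threshold


# Illustrative weak classifiers and feature values only.
weak = [make_stump(0, 0.5, 1.0), make_stump(1, 0.2, 0.5), make_stump(2, 0.8, 0.7)]
print(is_face([0.9, 0.1, 0.9], weak))
```

Each stump looks at a single feature amount, which mirrors the one-feature-per-weak-classifier structure described above; a trained discriminator would obtain its thresholds and weights from the sample images rather than from hand-picked constants.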

With this technique, a face of the same size as the face sample images can be detected accurately by cutting out partial images of the same size as the sample images. However, the size of the faces that may appear in a detection target image is not fixed, so a discriminator trained only on face sample images of a single size cannot detect every face in the detection target image. For this reason, a technique has been proposed in which partial images are cut out while the size of the cut-out window is changed in steps, the corresponding feature amounts are extracted from each partial image, and the extracted feature amounts are evaluated by the corresponding weak classifiers to determine whether the partial image is a face (see Non-Patent Document 1).

A technique has also been proposed in which, instead of changing the window size, the detection target image is converted to multiple resolutions to obtain a plurality of resolution images, partial images are cut out from each resolution image with a window of the same size as the sample images, and each partial image is judged to be a face or not (see Patent Document 1). In the technique described in Patent Document 1, as shown in FIG. 20, the detection target image is taken as the base resolution image S1_1. A resolution image S1_2, which is 2^(-1/3) times the size of S1_1, and a resolution image S1_3, which is 2^(-1/3) times the size of S1_2 (2^(-2/3) times the size of the base image S1_1), are generated first. Resolution images are then generated by reducing each of S1_1, S1_2, and S1_3 to 1/2 size, and further resolution images are generated by reducing those reduced images to 1/2 size again. This process is repeated until a predetermined number of resolution images of different resolutions has been generated.

The discriminator described above is generally trained on sample images of fairly uniform quality, so it is essentially designed for images of good quality. Detection target images, on the other hand, can be expected to vary widely in scene brightness and contrast. For example, when the detection target image was taken in a dark place, the brightness of the image can make it difficult to find the dark eye regions and the bright nose region that characterize a face.

Therefore, so that faces can be detected even when the brightness and contrast of detection target images differ, techniques have been proposed that apply normalization as preprocessing to the image to be detected or judged, bringing the image contrast to a fixed level and thereby correcting the illumination. The following three techniques are the main ones proposed for this normalization.

The first technique converts the pixel values of the entire detection target image according to a conversion curve (lookup table) that brings them closer to values representing the logarithm of the luminance of the subject in the image. The second technique converts the pixel values (luminance values) of each partial image cut out from the detection target image so that the degree of variance of the pixel values within that region is brought to a fixed level. The third technique scans a local region of a predetermined size within each partial image cut out from the detection target image and converts the pixel values so that the degree of variance of the pixel values in each local region is brought to a fixed level. Of the three, the third technique provides the strongest illumination correction because it is the least affected by shading, background, and differences in input modality in the detection target image. When the detection target image is converted to multiple resolutions, illumination correction can also be performed by scanning a local region of a predetermined size over each resolution image and converting the pixel values so that the degree of variance in each local region is brought to a fixed level.
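The second and third techniques can be sketched as follows. This is a minimal illustration under stated assumptions, not the correction actually used in the cited documents: it scales deviations from the mean linearly so that the standard deviation reaches a target value, it tiles the image with non-overlapping blocks instead of a sliding local region, and the names `normalize_variance`, `normalize_local`, and `target_std` are hypothetical.

```python
import math
from typing import List


def std_dev(pixels: List[float]) -> float:
    """Population standard deviation of a list of pixel values."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


def normalize_variance(pixels: List[float], target_std: float = 32.0,
                       eps: float = 1e-6) -> List[float]:
    """Second technique (sketch): bring the spread of one region to target_std."""
    mean = sum(pixels) / len(pixels)
    gain = target_std / max(std_dev(pixels), eps)  # flat regions stay flat
    return [mean + (p - mean) * gain for p in pixels]


def normalize_local(image: List[List[float]], block: int = 2,
                    target_std: float = 32.0) -> List[List[float]]:
    """Third technique (sketch): normalize each block-by-block local region."""
    h, w = len(image), len(image[0])
    out = [[float(v) for v in row] for row in image]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            coords = [(y, x)
                      for y in range(y0, min(y0 + block, h))
                      for x in range(x0, min(x0 + block, w))]
            values = normalize_variance([image[y][x] for y, x in coords],
                                        target_std)
            for (y, x), v in zip(coords, values):
                out[y][x] = v
    return out
```

With a fixed block size, the local normalization does the same amount of work per pixel whatever the image size, which is why correcting whole resolution images is simpler than correcting partial images of many different sizes.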

Non-Patent Document 1: Paul Viola and Michael Jones, Rapid object detection using a boosted cascade of features, IEEE CVPR, 2001
Patent Document 1: Japanese Unexamined Patent Publication No. 2007-25766

However, the technique described in Non-Patent Document 1 uses windows of several sizes, so the size of the partial image varies with the window size. When the third technique is used to correct the illumination of partial images of different sizes, the size of the local region has to be changed to match the size of each partial image. The illumination correction computation therefore becomes very complicated and, as a result, takes a long time.

The technique described in Non-Patent Document 1 uses windows of several sizes, but preparing a discriminator made up of weak classifiers trained for every window size is impractical given the training effort involved. In that technique, the scale of the weak classifiers is therefore changed so that the pixel positions from which each weak classifier obtains its feature amount correspond to one another across partial images of different sizes. However, a discriminator made up of rescaled weak classifiers detects faces less accurately. In addition, the scale has to be changed for each partial image size, which makes the computation time-consuming.

With the technique described in Patent Document 1, on the other hand, illumination correction can be applied to each of the resolution images, so it is easier than with the technique described in Non-Patent Document 1. Because a window of a single size is used, there is also no risk of reduced face detection accuracy. However, to improve face detection accuracy, the difference in resolution between resolution images has to be made very small and resolution images at many resolutions have to be obtained, as shown in FIG. 20, which makes the computation time-consuming.

The present invention has been made in view of the above circumstances, and its object is to detect an object of a specific type, such as a face, from a detection target image accurately and at high speed.

An object detection apparatus according to the present invention comprises:
multi-resolution processing means for converting a detection target image, in which an object of a specific type is to be detected, to multiple resolutions at a predetermined magnification to obtain a plurality of resolution images of different resolutions;
partial image generating means for generating partial images by scanning, over the plurality of resolution images, a plurality of windows having sizes that interpolate the predetermined magnification between the resolution images; and
determination means for calculating, from each partial image, a feature amount relating to the distribution of pixel values of the partial image and determining, using the feature amount, whether the partial image is an image of the object,
wherein the determination means includes a discriminator that determines, using the feature amount of the partial image, whether the partial image is an image of the object, the discriminator having learned, for each window size, the feature amounts relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction.

Any magnification that reduces the detection target image can be used as the "predetermined magnification", but a magnification of 1/2^n, which is relatively easy to compute, is preferable.

A "size that interpolates the predetermined magnification" is a size that fills in the difference in magnification between resolution images. "A plurality of windows having sizes that interpolate the predetermined magnification" means, taking a window of a predetermined size as the reference window, a plurality of windows whose sizes lie between the size of the reference window and the size of the reference window reduced by the predetermined magnification. For example, when the predetermined magnification is 1/2^n, the difference in magnification between adjacent resolution images is 1/2, that is, 2^(-1). The plurality of windows can then be the reference window together with windows 2^(-1/4), 2^(-2/4), and 2^(-3/4) times its size, or the reference window together with windows 2^(-1/3) and 2^(-2/3) times its size, and so on.
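The window sizes in this example can be computed as in the sketch below. The reference window size of 32 pixels and the rounding to whole pixels are assumptions made for illustration, and `interpolating_window_sizes` is a hypothetical helper name.

```python
from typing import List


def interpolating_window_sizes(base: int, steps: int) -> List[int]:
    """Sizes base * 2^(-k/steps) for k = 0 .. steps-1, rounded to whole pixels.

    k = 0 is the reference window; k = steps would be the half-size window,
    which is already covered by the next resolution image and is left out.
    """
    return [round(base * 2.0 ** (-k / steps)) for k in range(steps)]


# Assumed 32-pixel reference window.
print(interpolating_window_sizes(32, 4))  # 2^0, 2^(-1/4), 2^(-2/4), 2^(-3/4)
print(interpolating_window_sizes(32, 3))  # 2^0, 2^(-1/3), 2^(-2/3)
```

Scanning window k over the resolution image that has been halved j times corresponds to an effective scale of 2^(-(j + k/steps)) relative to the base image, so a coarse pyramid combined with a few window sizes covers the same scales as a finely stepped pyramid with a single window.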

Thus, in the present invention, the discriminator that uses the feature amount of a partial image to determine whether the partial image is an image of an object of the specific type has learned, for each window size, the feature amounts relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction. Even though the window sizes differ, illumination correction no longer has to be performed separately for partial images of each size, so the amount of computation for object detection is reduced and objects can be detected quickly and accurately.

Further, according to the present invention, the detection target image is converted to multiple resolutions at a predetermined magnification to generate a plurality of resolution images of different resolutions, and the windows that generate the partial images have a plurality of sizes that interpolate the predetermined magnification between resolution images. Unlike the technique described in Patent Document 1, the resolution steps used in multi-resolution conversion therefore do not have to be fine. This reduces the amount of computation needed to convert the detection target image to multiple resolutions, so object detection can be performed at high speed.

Moreover, because there are fewer resolution images than in the technique described in Patent Document 1, the amount of computation needed for illumination correction of the resolution images is also reduced.

In addition, although the resolution steps used in multi-resolution conversion are relatively large, the windows that generate the partial images have a plurality of sizes that interpolate the predetermined magnification between resolution images, so objects of various sizes in the detection target image can be detected accurately.

Furthermore, unlike the technique described in Non-Patent Document 1, no computation is needed to change the scale of the discriminator, so objects can be detected quickly and accurately.

In the object detection apparatus according to the present invention, each of the plurality of windows may have at least one sub-window of a different resolution, obtained by multi-resolution conversion at the predetermined magnification, and
the discriminator may additionally have learned, for each sub-window size, the feature amounts relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the sub-windows and have been subjected to the same illumination correction.

This makes it possible to associate the resolutions of the plurality of resolution images with the resolutions of the plurality of windows and of the sub-windows of each window.

In this case, the object detection apparatus according to the present invention may further comprise control means for controlling the partial image generating means and the determination means so as to:
set a window of a predetermined size among the plurality of windows at a target pixel of the resolution image under detection, and set the sub-windows of the window of the predetermined size, in order of resolution, at the pixels corresponding to the target pixel in the resolution images of lower resolution than the resolution image under detection;
perform a first determination as to whether the partial image generated by a sub-window of relatively low resolution is an image of the object; and
only when the first determination is affirmative, generate partial images with the plurality of windows and/or with sub-windows of the plurality of windows having a resolution higher than the relatively low resolution, and determine whether those partial images are images of the object.

The "target pixel" is the pixel at which detection of an object of the specific type is performed.

To "set the sub-windows of the window of the predetermined size in order of resolution" means to set progressively smaller sub-windows, according to the size of each resolution image, at the pixel corresponding to the target pixel in that resolution image. The window of the highest resolution (that is, the largest size) is set on the resolution image of the highest resolution (that is, the largest size), the sub-window one step smaller than the largest sub-window is set on the next largest resolution image, and so on.

As a result, when the first determination is negative, the window of the predetermined size can be moved to the next target pixel and processing can proceed to the next step. Generating a partial image from a low-resolution resolution image with a low-resolution sub-window, and performing the first determination on it, requires little computation because the partial image has few pixels. The position of the target pixel can therefore be changed efficiently, and whether a partial image is an image of the object can be determined at high speed.
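The early-rejection control described here can be sketched as follows. This is only an illustration: it assumes a predetermined magnification of 1/2 so that the corresponding pixel is found by integer halving, it models each determination as a hypothetical callable returning True or False, and the names `corresponding_pixel` and `coarse_to_fine` do not come from the source.

```python
from typing import Callable, List, Tuple


def corresponding_pixel(x: int, y: int, level: int) -> Tuple[int, int]:
    """Pixel matching (x, y) in the image that has been halved `level` times."""
    return x >> level, y >> level


def coarse_to_fine(stages: List[Callable[[], bool]]) -> Tuple[bool, int]:
    """Run determinations from lowest to highest resolution.

    Stops at the first negative determination, so the more expensive
    high-resolution stages run only for promising target pixels.
    Returns (accepted, number_of_stages_evaluated).
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage():
            return False, evaluated
    return True, evaluated


# Illustrative use: the cheap low-resolution stage rejects this position,
# so the high-resolution stage is never evaluated.
print(coarse_to_fine([lambda: False, lambda: True]))
print(corresponding_pixel(100, 60, 2))
```

Ordering the stages from the smallest sub-window upward means that most background positions are discarded after evaluating only a few pixels.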

Also in this case, the control means of the object detection apparatus according to the present invention may control the partial image generating means and the determination means so as to:
when the determination is affirmative for the partial image generated by the sub-window having the second highest resolution among the sub-windows of the window of the predetermined size, generate a plurality of partial images with the highest-resolution sub-windows of all of the plurality of windows;
perform a second determination as to whether the plurality of partial images are images of the object;
generate a partial image using only the window corresponding to the sub-window that generated the partial image found by the second determination to have the highest likelihood of being an image of the object; and
perform a third determination as to whether that partial image is an image of the object.

As for the "likelihood of being an image of an object of the specific type": the determination means judges whether a partial image is an object of the specific type according to whether the output of the discriminator is equal to or greater than a predetermined threshold, so the discriminator output obtained for each window size in the second determination can be used as this likelihood.
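The selection between window sizes after the second determination might look like the sketch below. It assumes the discriminator outputs are available as a non-empty mapping from window size to score, and that a position is dropped when even the best output is below the threshold; both the data layout and the `select_window` name are hypothetical.

```python
from typing import Dict, Optional


def select_window(scores: Dict[int, float], threshold: float) -> Optional[int]:
    """Pick the window size whose discriminator output is highest.

    `scores` must be non-empty. Returns None when even the best output is
    below the threshold, meaning no third determination is needed.
    """
    best = max(scores, key=lambda size: scores[size])
    return best if scores[best] >= threshold else None


# Illustrative outputs for four window sizes at one target pixel.
second_determination = {32: 0.2, 27: 0.9, 23: 0.5, 19: -0.1}
print(select_window(second_determination, threshold=0.0))
```

Only the selected window size is then evaluated at full resolution in the third determination, instead of evaluating every window size.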

This allows objects to be detected efficiently, accurately, and at high speed.

The object detection apparatus according to the present invention may further comprise illumination correction means for performing the illumination correction on the plurality of resolution images.

This allows objects to be detected accurately.

An object detection method according to the present invention comprises:
converting a detection target image, in which an object of a specific type is to be detected, to multiple resolutions at a predetermined magnification to obtain a plurality of resolution images of different resolutions;
generating partial images by scanning, over the plurality of resolution images, a plurality of windows having sizes that interpolate the predetermined magnification between the resolution images; and
calculating, from each partial image, a feature amount relating to the distribution of pixel values of the partial image and determining, using the feature amount, whether the partial image is an image of the object, by determination means including a discriminator that determines, using the feature amount of the partial image, whether the partial image is an image of the object, the discriminator having learned, for each window size, the feature amounts relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction.

The object detection method according to the present invention may also be provided as a program for causing a computer to execute it.

According to the present invention, the amount of computation for object detection is reduced, and objects can be detected quickly and accurately.

FIG. 1: Block diagram showing the configuration of the face detection system
FIG. 2: Diagram showing the multi-resolution conversion process for a detection target image
FIG. 3: Diagram showing the concept of local normalization processing
FIG. 4: Flowchart showing the processing in the illumination correction unit
FIG. 5: Schematic block diagram showing the configuration of the face detection unit
FIG. 6: Diagram for explaining the generation of windows of multiple sizes
FIG. 7: Block diagram showing the configuration of the discriminator group
FIG. 8: Flowchart showing the overall processing in a discriminator
FIG. 9: Flowchart showing the processing in a weak classifier
FIG. 10: Diagram for explaining the calculation of feature amounts in a weak classifier
FIG. 11: Flowchart showing the processing performed in the face detection system
FIG. 12: Diagram for explaining window settings (part 1)
FIG. 13: Diagram for explaining window settings (part 2)
FIG. 14: Flowchart of the detailed detection processing
FIG. 15: Diagram for explaining normalization of face size and position
FIG. 16: Flowchart showing the discriminator learning method
FIG. 17: Diagram showing the method of deriving the histogram of a weak classifier
FIG. 18: Flowchart showing the processing for detecting a face from a moving image in the present embodiment
FIG. 19: Diagram for explaining moving object detection
FIG. 20: Diagram showing the conventional multi-resolution conversion process

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a schematic block diagram showing the configuration of a face detection system to which the object detection apparatus of the present invention is applied. This face detection system detects faces contained in digital images. As shown in FIG. 1, the face detection system 1 comprises: a multi-resolution conversion unit 10 that converts a detection target image S0, in which faces are to be detected, to multiple resolutions to obtain a resolution image group S1 (= S1_1, S1_2, ..., S1_n) consisting of a plurality of images of different resolutions (hereinafter called resolution images); an illumination correction unit 20 that, as preprocessing intended to improve the accuracy of the face detection processing performed later, applies illumination correction to each resolution image to obtain an illumination-corrected resolution image group S1′ (= S1′_1, S1′_2, ..., S1′_n); and a face detection unit 30 that applies face detection processing to each image of the normalized resolution image group S1′ to detect images S2 representing the faces contained in each resolution image of the group (hereinafter called face images).

The multi-resolution conversion unit 10 converts the resolution (image size) of the detection target image S0 to normalize it to a predetermined resolution, for example a rectangular image of VGA size (640 × 480 pixels), and thereby obtains a normalized detection target image S0′. It then performs further resolution conversion based on the normalized detection target image S0′ to generate a plurality of resolution images with different resolutions, obtaining the resolution image group S1. Such a resolution image group is generated for the following reason. The size of a face included in the detection target image is usually unknown, whereas the size (image size) of the face to be detected is fixed at a certain size because of the way the discriminators described later are generated. To detect faces of different sizes, it is therefore necessary to cut out partial images of the various sizes described later while shifting the position on images of different resolutions, and to determine for each partial image whether it is a face or a non-face.

Specifically, as shown in FIG. 2, the normalized detection target image S0′ is used as the base resolution image S1_1, and the following images are generated: a resolution image S1_2 that is 2^(-1) times the size of the resolution image S1_1; a resolution image S1_3 that is 2^(-1) times the size of the resolution image S1_2 (2^(-2) times the size of the resolution image S1_1); and a resolution image S1_4 that is 2^(-1) times the size of the resolution image S1_3 (2^(-3) times the size of the resolution image S1_1).

Accordingly, when the resolution image S1_1 has a rectangular size of 640 × 480 pixels, the resolution images S1_2, S1_3, ... have rectangular sizes of 320 × 240 pixels, 160 × 120 pixels, 80 × 60 pixels, and so on, so that a plurality of resolution images each reduced by a factor of 2^(-1) from the previous one can be generated. In the present embodiment, the resolution image S1_1 with the highest resolution is referred to as the first-layer resolution image S1_1, and, as the resolution decreases, the subsequent images are referred to as the second-layer resolution image S1_2, the third-layer resolution image S1_3, and so on.
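The layer sizes listed above can be checked with a short sketch. It only computes image dimensions; the function name `pyramid_sizes` and its parameters are illustrative and do not appear in the specification, and the resampling method itself is not stated in this passage.

```python
def pyramid_sizes(width=640, height=480, layers=4):
    """Return the (width, height) of each layer, halving both sides per layer.

    Layer 1 is the normalized detection target image S0' (= S1_1).
    """
    sizes = []
    for _ in range(layers):
        sizes.append((width, height))
        width, height = width // 2, height // 2
    return sizes


print(pyramid_sizes())
# [(640, 480), (320, 240), (160, 120), (80, 60)]
```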

The illumination correction unit 20 performs illumination correction on each image of the resolution image group S1 by applying local normalization processing that suppresses variations in contrast among local regions of the image. That is, while scanning a local region of a predetermined size within each resolution image, the illumination correction unit 20 converts the pixel values so that the degree of variance of the pixel values in the local region is brought to a constant level. Specifically, for each local region in each resolution image, a first luminance gradation conversion is applied to a local region in which the degree of variance of the pixel values representing luminance is equal to or higher than a predetermined level, bringing that degree of variance close to a constant level that is higher than the predetermined level. A second luminance gradation conversion is applied to a local region in which the degree of variance is below the predetermined level, keeping that degree of variance below the constant level.

FIG. 3 is a diagram showing the concept of the local normalization processing, and FIG. 4 is a flowchart showing the processing in the illumination correction unit 20. Equations (1) and (2) are the gradation conversion equations for pixel values used in this local normalization processing.

Here, X is the pixel value of the pixel of interest, X′ is the pixel value of the pixel of interest after conversion, mlocal is the mean of the pixel values in the local region centered on the pixel of interest, Vlocal is the variance of the pixel values in this local region, SDlocal is the standard deviation of the pixel values in this local region, (C1 × C1) is the reference value corresponding to the constant level described above, C2 is the threshold corresponding to the predetermined level described above, and SDc is a predetermined constant. In the present embodiment, luminance is represented with 8-bit gradation, so pixel values range from 0 to 255.

As shown in FIG. 4, the illumination correction unit 20 sets one pixel in each resolution image S1_i (i = 1 to n) as the pixel of interest (step ST1), calculates the variance Vlocal of the pixel values in a local region of a predetermined size, for example 9 × 9 pixels, centered on the pixel of interest (step ST2), and determines whether the variance Vlocal is equal to or greater than the threshold C2 corresponding to the predetermined level (step ST3). If it is determined in step ST3 that the variance Vlocal is equal to or greater than the threshold C2, the first luminance gradation conversion is performed according to equation (1) (step ST4): the more the variance Vlocal exceeds the reference value (C1 × C1) corresponding to the constant level, the more the difference between the pixel value X of the pixel of interest and the mean mlocal is reduced, and the more the variance Vlocal falls below the reference value (C1 × C1), the more that difference is increased.

On the other hand, if it is determined in step ST3 that the variance Vlocal is less than the threshold C2, the second luminance gradation conversion, a linear gradation conversion that does not depend on the variance Vlocal, is performed according to equation (2) (step ST5). It is then determined whether the pixel of interest set in step ST1 is the last pixel (step ST6). If it is determined in step ST6 that the pixel of interest is not the last pixel, the process returns to step ST1, and the next pixel on the same resolution image is set as the pixel of interest. If it is determined in step ST6 that the pixel of interest is the last pixel, the local normalization of that resolution image ends. By repeating steps ST1 to ST6 in this way, a resolution image S1′_i is obtained in which the entire resolution image S1_i has undergone local normalization.
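Steps ST1 to ST6 can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the sketch assumes the forms X′ = (X − mlocal) × (C1 / SDlocal) + 128 for equation (1) and X′ = (X − mlocal) × (C1 / SDc) + 128 for equation (2), which agree with the verbal description above but are an assumption, as is the mid-gray offset of 128. The default values of `c1`, `c2`, and `sd_c`, the clamping of the local region at the image border, and the clipping of the result to 0..255 are likewise illustrative choices rather than values taken from the specification.

```python
import math


def local_normalize(img, c1=40.0, c2=100.0, sd_c=10.0, radius=4, offset=128.0):
    """Local normalization of a grayscale image given as a list of rows.

    radius=4 gives the 9 x 9 local region mentioned in the text.
    c1, c2 (> 0), sd_c and offset are hypothetical parameter values.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):                      # ST1: visit every pixel of interest
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            m_local = sum(vals) / len(vals)
            v_local = sum((p - m_local) ** 2 for p in vals) / len(vals)  # ST2
            if v_local >= c2:               # ST3 -> ST4, assumed equation (1)
                gain = c1 / math.sqrt(v_local)
            else:                           # ST3 -> ST5, assumed equation (2)
                gain = c1 / sd_c
            value = (img[y][x] - m_local) * gain + offset
            out[y][x] = int(min(255, max(0, round(value))))
    return out
```

With these assumed forms, a region whose standard deviation exceeds C1 has its contrast reduced (gain below 1) and a region whose standard deviation is below C1 has its contrast increased, while flat regions below the threshold C2 receive a fixed gain so that noise is not amplified without bound.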

The predetermined level may be varied according to the luminance of all or part of the local region. For example, in the normalization processing described above, in which gradation conversion is performed for each pixel of interest, the threshold C2 may be varied according to the pixel value of the pixel of interest. That is, the threshold C2 corresponding to the predetermined level may be set higher when the luminance of the pixel of interest is relatively high and lower when that luminance is relatively low.

The face detection unit 30 performs face detection processing on each image of the resolution image group S1′ that has undergone illumination correction by the illumination correction unit 20, and detects the face image S2 in each resolution image. FIG. 5 is a schematic block diagram showing the configuration of the face detection unit 30. As shown in FIG. 5, the face detection unit 30 includes: a detection control unit 31 that controls the units described below and mainly performs sequence control of the face detection processing; a resolution image selection unit 32 that sequentially selects, from the resolution image group S1′, the resolution image S1′_i to be subjected to face detection processing, in descending order of size; a window setting unit 33 that, on the resolution image selected by the resolution image selection unit 32, sequentially sets a window for cutting out a partial image B to be judged as a face image or not, while shifting the window position; and a discriminator group 34 that determines whether the cut-out partial image B is a face image.

The detection control unit 31 controls the resolution image selection unit 32 and the window setting unit 33 so that face detection processing, which detects the face image S2, is performed on each image of the resolution image group S1′. For example, it instructs the resolution image selection unit 32 to select a resolution image, instructs the window setting unit 33 on the window setting conditions, and outputs the obtained detection result to the discriminator group 34, as appropriate. The window setting conditions include the range on the image in which the window is set, the movement interval of the window (the coarseness of detection), and the like.

Under the control of the detection control unit 31, the resolution image selection unit 32 sequentially selects, from the resolution image group S1′, the resolution image to be subjected to face detection processing, in descending order of size (that is, from the finest resolution). The face detection method of the present embodiment detects faces in the detection target image S0 by determining, for each partial image B sequentially cut out on each resolution image, whether that partial image B is a face image, and extracting the regions of the partial images B determined to be faces. The resolution image selection unit 32 can therefore be regarded as setting the size of the face to be detected in the detection target image S0, changing it each time, and is equivalent to a unit that sets the size of the face to be detected while changing it from small to large.

Based on the window setting conditions set by the detection control unit 31, the window setting unit 33 sequentially sets windows of various sizes on the resolution image selected by the resolution image selection unit 32 while moving them. In the present embodiment, the windows of various sizes are generated as follows. First, a window of 32 × 32 pixels is prepared. Because the resolution differs by a factor of 1/2 between adjacent images of the resolution image group S1, faces of different sizes cannot be detected accurately using a window of a single size alone.

For this reason, in the present embodiment, windows of a plurality of sizes are generated on the basis of this 32 × 32 pixel window. FIG. 6 is a diagram for explaining the generation of windows of a plurality of sizes. As shown in FIG. 6, in the present embodiment, three windows W2_1, W3_1, and W4_1, whose sizes interpolate the difference in resolution between the images of the resolution image group S1, are generated with the 32 × 32 pixel window W1_1 as the reference. Because the magnification between adjacent resolution images of the resolution image group S1 differs by a factor of 1/2, the window W2_1 is 2^(-1/4) times the size of the window W1_1, the window W3_1 is 2^(-1/4) times the size of the window W2_1 (2^(-2/4) times the size of the window W1_1), and the window W4_1 is 2^(-1/4) times the size of the window W3_1 (2^(-3/4) times the size of the window W1_1). Since the window W1_1 is 32 × 32 pixels, the window W2_1 is 27 × 27 pixels, the window W3_1 is 23 × 23 pixels, and the window W4_1 is 19 × 19 pixels.

Further, in the present embodiment, a plurality of windows are generated by repeating the process of reducing each of the windows W1_1 to W4_1 to 1/2 size and then reducing those reduced windows to 1/2 size again. Specifically, windows W1_2 to W4_2 are generated by reducing each of the windows W1_1 to W4_1 to 2^(-1) times its size, and windows W1_3 to W4_3 are generated by reducing each of the windows W1_1 to W4_1 to 2^(-2) times its size.

Since the window W1_1 is 32 × 32 pixels, the windows W1_2 and W1_3 are 16 × 16 pixels and 8 × 8 pixels, respectively. Since the window W2_1 is 27 × 27 pixels, the windows W2_2 and W2_3 are 13 × 13 pixels and 6 × 6 pixels, respectively. Since the window W3_1 is 23 × 23 pixels, the windows W3_2 and W3_3 are 11 × 11 pixels and 5 × 5 pixels, respectively. Since the window W4_1 is 19 × 19 pixels, the windows W4_2 and W4_3 are 9 × 9 pixels and 4 × 4 pixels, respectively.
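The twelve window sizes above can be reproduced with a short calculation. The specification does not state a rounding rule; the sketch assumes that each first-layer size is rounded to the nearest integer and that each halving discards the fraction, which is the combination that matches every size listed in the text. The function name `window_sizes` is illustrative.

```python
def window_sizes(base=32, groups=4, layers=3):
    """Return side lengths [[Wk_1, Wk_2, Wk_3] for k = 1..4].

    Assumed rounding: nearest integer for 2^(-k/4) scaling,
    integer (floor) division for each successive halving.
    """
    result = []
    for k in range(groups):
        size = round(base * 2 ** (-k / groups))
        group = []
        for _ in range(layers):
            group.append(size)
            size //= 2
        result.append(group)
    return result


print(window_sizes())
# [[32, 16, 8], [27, 13, 6], [23, 11, 5], [19, 9, 4]]
```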

In the following description, the windows W1_1, W2_1, W3_1, and W4_1 are referred to as the first to fourth windows. The first window W1_1 together with the smaller windows generated from it, including W1_2, W1_3, and W1_4, is referred to as the first window group W1. The second window W2_1 together with the smaller windows generated from it, including W2_2, W2_3, and W2_4, is referred to as the second window group W2. The third window W3_1 together with the smaller windows generated from it, including W3_2, W3_3, and W3_4, is referred to as the third window group W3. The fourth window W4_1 together with the smaller windows generated from it, including W4_2, W4_3, and W4_4, is referred to as the fourth window group W4. Within each of the first to fourth window groups W1 to W4, the windows are referred to, in order from the highest resolution (that is, the largest size), as the first-layer window, the second-layer window, and the third-layer window. The setting of the windows and the face determination processing are described later.

The discriminator group 34 is a group of discriminators that determine whether the partial image B is a face image, and is used to detect face images in the resolution images. FIG. 7 is a diagram showing the configuration of the discriminator group 34. As shown in FIG. 7, the discriminator group 34 has a plurality of types of discriminator groups that differ in the size of face they can detect, namely first to twelfth discriminator groups 34_1 to 34_12 corresponding to the twelve window sizes used to generate the partial image B. The first to third discriminator groups 34_1 to 34_3 correspond to the sizes of the windows W1_1 to W1_3, the fourth to sixth discriminator groups 34_4 to 34_6 to the sizes of the windows W2_1 to W2_3, the seventh to ninth discriminator groups 34_7 to 34_9 to the sizes of the windows W3_1 to W3_3, and the tenth to twelfth discriminator groups 34_10 to 34_12 to the sizes of the windows W4_1 to W4_3. The first to third discriminator groups 34_1 to 34_3 are connected in series starting from the third discriminator group 34_3, the fourth to sixth discriminator groups 34_4 to 34_6 in series starting from the sixth discriminator group 34_6, the seventh to ninth discriminator groups 34_7 to 34_9 in series starting from the ninth discriminator group 34_9, and the tenth to twelfth discriminator groups 34_10 to 34_12 in series starting from the twelfth discriminator group 34_12. These four series-connected chains are in turn connected in parallel. Each of the first to twelfth discriminator groups 34_1 to 34_12 discriminates faces of the size corresponding to the size of the partial image B input to it.

In this way, each of the four chains (the first to third discriminator groups 34_1 to 34_3, the fourth to sixth discriminator groups 34_4 to 34_6, the seventh to ninth discriminator groups 34_7 to 34_9, and the tenth to twelfth discriminator groups 34_10 to 34_12) is connected in series starting from the discriminator group that determines whether the partial image B cut out by the small window Wk_3 (k = 1 to 4) is a face image. As described later, this makes it possible to perform the determination sequentially, from the partial image B cut out by the small window Wk_3 (k = 1 to 4) toward the large window Wk_1 (k = 1 to 4).
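The correspondence between windows and discriminator groups, and the order in which each chain is evaluated, can be written compactly. This is only an indexing sketch derived from the numbering in the text; the function names are illustrative.

```python
def discriminator_index(k, j):
    """Index n of discriminator group 34_n for window Wk_j (k = 1..4, j = 1..3)."""
    return 3 * (k - 1) + j


def evaluation_order(k):
    """Discriminator groups of chain k, starting from the smallest window Wk_3."""
    return [discriminator_index(k, j) for j in (3, 2, 1)]


print(discriminator_index(3, 2))  # 8: window W3_2 is handled by group 34_8
print(evaluation_order(3))        # [9, 8, 7]
```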

As shown in FIG. 7, the discriminator group 34 has a cascade structure in which a plurality of weak discriminators WC are linearly coupled. Each weak discriminator calculates at least one feature amount related to the distribution of the pixel values (luminance) of the partial image B and uses this feature amount to determine whether the partial image B is a face image.

The specific processing in the discriminator group 34 is described next. FIG. 8 is a flowchart showing the overall processing in each discriminator group 34_n (n = 1 to 12) included in the discriminator group 34, and FIG. 9 is a flowchart showing the processing performed by each weak discriminator within it.

First, the first weak discriminator WC determines whether the partial image B, cut out on the resolution image S1′_i by the window of the corresponding size, is a face (step SS1). Specifically, as shown in FIG. 10, the first weak discriminator WC applies 4-neighbor pixel averaging to the partial image B of a predetermined size (for example, 32 × 32 pixels) cut out from the resolution image S1′_i, thereby obtaining a reduced image of 16 × 16 pixels and a reduced image of 8 × 8 pixels. The 4-neighbor pixel averaging divides the image into blocks of 2 × 2 pixels and takes the mean of the four pixel values in each block as the pixel value of the single pixel corresponding to that block. Taking two predetermined points set in the planes of these three images as one pair, the weak discriminator calculates the difference in pixel value (luminance) between the two points of each pair in a pair group consisting of a plurality of types of pairs, and uses the combination of these difference values as the feature amount (step SS1-1). The two predetermined points of each pair are, for example, two points aligned vertically or two points aligned horizontally, chosen so as to reflect the light-and-dark characteristics of a face in the image. A score is then calculated by referring to a predetermined score table according to the combination of difference values serving as the feature amount (step SS1-2), and the weak discriminator's own score is added to the score calculated by the immediately preceding weak discriminator to obtain a cumulative score (step SS1-3). Because the first weak discriminator WC1 has no predecessor, it uses its own score as the cumulative score. Whether the partial image is a face is determined by whether this cumulative score is equal to or greater than a predetermined threshold (step SS1-4). If the partial image B is determined to be a face, the process moves on to the determination by the next weak discriminator WC2 (step SS2). If the partial image B is determined to be a non-face, it is immediately judged to be a non-face (step SSB), and the processing ends.
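The feature calculation of step SS1-1 can be sketched as follows. The 2 × 2 block averaging follows the description above. The representation of a pair as `(level, (y1, x1), (y2, x2))`, with both points on the same image level, is an assumption made for illustration; the specification does not give the actual pair positions, and the ones in the usage example are arbitrary.

```python
def block_average(img):
    """4-neighbor pixel average: each 2 x 2 block becomes one pixel."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
            for x in range(w)
        ]
        for y in range(h)
    ]


def pair_differences(patch, pairs):
    """Feature amount: tuple of luminance differences, one per point pair.

    Level 0 is the patch itself, levels 1 and 2 are its two reductions
    (for a 32 x 32 patch: 16 x 16 and 8 x 8).
    """
    levels = [patch]
    for _ in range(2):
        levels.append(block_average(levels[-1]))
    return tuple(
        levels[lv][y1][x1] - levels[lv][y2][x2]
        for lv, (y1, x1), (y2, x2) in pairs
    )


# Usage with a horizontal ramp patch and three arbitrary (hypothetical) pairs.
patch = [[x for x in range(32)] for _ in range(32)]
pairs = [(0, (0, 0), (0, 5)), (1, (0, 0), (0, 1)), (2, (3, 1), (3, 0))]
print(pair_differences(patch, pairs))  # (-5, -2.0, 4.0)
```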

In step SS2, as in step SS1, the second weak discriminator WC calculates, from the partial image, a feature amount of the kind described above that represents the characteristics of the image (step SS2-1), and calculates a score from the feature amount by referring to its score table (step SS2-2). It then adds its own score to the cumulative score calculated by the immediately preceding first weak discriminator WC to update the cumulative score (step SS2-3), and determines whether the partial image B is a face according to whether this cumulative score is equal to or greater than a predetermined threshold (step SS2-4). Here again, if the partial image B is determined to be a face, the process moves on to the determination by the third weak discriminator WC (step SS3); if it is determined to be a non-face, the partial image B is immediately judged to be a non-face (step SSB), and the processing ends. When all N weak discriminators WC constituting the discriminator have determined that the partial image B is a face, the partial image B is finally judged to be a face image (step SSA).
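The cascade of steps SS1 to SSN, with its cumulative score and immediate rejection, can be sketched as below. Each weak discriminator is represented here as a pair `(score_fn, threshold)`, where `score_fn` stands in for the feature calculation and score table lookup; this representation and the names are assumptions for illustration, not the structure used in the specification.

```python
def is_face(patch, weak_discriminators):
    """Run the cascade on one partial image B.

    weak_discriminators: list of (score_fn, threshold) in cascade order.
    Returns False as soon as the cumulative score drops below a stage's
    threshold (step SSB); returns True if all N stages pass (step SSA).
    """
    cumulative = 0.0
    for score_fn, threshold in weak_discriminators:
        cumulative += score_fn(patch)   # steps SSk-1 to SSk-3
        if cumulative < threshold:      # step SSk-4
            return False
    return True


# Usage with two stand-in stages that return fixed scores.
stages = [(lambda p: 1.0, 0.5), (lambda p: 0.5, 1.0)]
print(is_face(None, stages))  # True: cumulative scores 1.0 and 1.5 pass both thresholds
```

Because rejection happens at the first failing stage, most non-face partial images are discarded after only a few weak discriminators have been evaluated, which is what makes scanning many window positions practical.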

In the present embodiment, the detection control unit 31, the resolution image selection unit 32, the window setting unit 33, and the discriminator group 34 together function as the determination means of the present invention.

Next, the flow of processing in the face detection system 1 is described. FIG. 11 is a flowchart showing the flow of processing in the face detection system according to the present embodiment. As shown in FIG. 11, when the detection target image S0 is input to the multi-resolution conversion unit 10 (step ST11), the multi-resolution conversion unit 10 converts the detection target image S0 into multiple resolutions and generates the resolution image group S1 consisting of a plurality of resolution images (step ST12). The illumination correction unit 20 then performs illumination correction on each image of the resolution image group S1 and obtains the illumination-corrected resolution image group S1′ (step ST13).

In the face detection unit 30, the resolution image selection unit 32, on instruction from the detection control unit 31, selects a resolution image S1′_i from the resolution image group S1′ in descending order of image size, that is, in the order S1′_1, S1′_2, ..., S1′_n (step ST14). Next, the detection control unit 31 instructs the window setting unit 33 to set the window at the initial position, that is, at the first pixel of interest on the selected resolution image (step ST15). In the present embodiment, of the first to fourth window groups W1 to W4, the third window group W3, which has an intermediate size, is set on the resolution image first. Consequently, in the present embodiment, faces of 23 × 23 pixels are detected first from the first-layer resolution image S1′_1. The second window group W2 may be set on the selected resolution image instead of the third window group W3.

FIG. 12 is a diagram for explaining the setting of the windows. FIG. 12 shows the window settings when the first-layer resolution image S1′_1 is selected. As shown in FIG. 12, the first-layer window W3_1 of the third window group W3 is set on the first-layer resolution image S1′_1, the second-layer window W3_2 on the second-layer resolution image S1′_2, and the third-layer window W3_3 on the third-layer resolution image S1′_3. Of the three windows set for the selected resolution image, the window setting unit 33 cuts out the partial image B from the resolution image of the layer on which the lowest-layer window (here, the window W3_3) is set (here, the third-layer resolution image S1′_3) (step ST16), and inputs that partial image to the discriminator group 34 (step ST17).

When the selected resolution image is the second-layer resolution image S1′_2, no window is set on the first-layer resolution image S1′_1, as shown in FIG. 13. Instead, the first-layer window W3_1 of the third window group W3 is set on the second-layer resolution image S1′_2, the second-layer window W3_2 on the third-layer resolution image S1′_3, and the third-layer window W3_3 on the fourth-layer resolution image S1′_4. As before, the partial image B is cut out from the resolution image of the layer on which the lowest-layer window W3_3 is set (the fourth-layer resolution image S1′_4), and that partial image B is input to the discriminator group 34.

As the level of the selected resolution image becomes lower, the number of resolution images at levels below the selected one may be 1 or 0, so it may not be possible to set the windows of every level on resolution images. In that case, in step ST16 the partial image is cut out using the lowest-level window among the windows that were actually set. For example, when the selected resolution image is the lowest-level resolution image, only the first-level window can be set, on the selected resolution image itself, so in step ST16 the first-level window serves as the lowest-level window.
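The assignment of window levels to resolution images, including the case where fewer lower-level images remain, can be sketched as follows. This is only an illustrative sketch: the function name, the 1-based numbering, and the default of three window levels per group are assumptions chosen to match the examples in the text, not part of the described apparatus.

```python
def settable_window_levels(selected: int, num_images: int,
                           levels_per_group: int = 3) -> list[tuple[int, int]]:
    """Return (window_level, image_level) pairs that can actually be set.

    Window level k is placed on image level selected + k - 1, and is
    dropped when that image level does not exist (hypothetical helper).
    """
    return [
        (k, selected + k - 1)
        for k in range(1, levels_per_group + 1)
        if selected + k - 1 <= num_images
    ]


# With 5 resolution images: selecting level 2 uses image levels 2, 3 and 4.
pairs_level2 = settable_window_levels(2, 5)
# Selecting the lowest-level image leaves only the first-level window.
pairs_lowest = settable_window_levels(5, 5)
# The window used for cutting out in step ST16 is the last pair in the list.
cut_window_level, cut_image_level = pairs_lowest[-1]
```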

The discriminator group 34 evaluates the input partial image B using, from among the 12 types of discriminators described above, the discriminator group whose window size corresponds to the partial image. The detection control unit 31 acquires the discrimination result R (step ST18) and determines whether the result R indicates that the partial image B is a face (step ST19). If the result R indicates that the partial image B is not a face (No in step ST19), the detection control unit 31 determines whether the partial image B just cut out is the one located at the last pixel of interest, that is, whether it is the last partial image (step ST20). If the partial image B is not the last partial image, the position at which the windows are set is moved to the next pixel of interest (that is, the next position) (step ST21), the process returns to step ST16, and the window setting unit 33 cuts out a new partial image B. In this way, a face image is detected from the resolution image S1′_i while the windows scan the resolution image S1′_i as shown in FIG. 12.

In step ST21, the window position is set by moving the third-level window W3_3 by one pixel on the resolution image on which it is set. As a result, the second-level window W3_2 moves by two pixels on the resolution image on which it is set, and the first-level window W3_1 moves by four pixels on the resolution image on which it is set.
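The relationship between the one-pixel step of the lowest-level window and the resulting steps at the higher levels can be illustrated with a short sketch. It assumes a factor of 2 between adjacent levels, which is inferred from the 1, 2, and 4 pixel steps stated above; the function name and parameters are hypothetical.

```python
def window_strides(num_levels: int, base_stride: int = 1,
                   scale: int = 2) -> dict[int, int]:
    """Per-level scan step in pixels, keyed by 1-based window level.

    The lowest-level window (level == num_levels) moves by base_stride;
    each level above it moves `scale` times as far (assumed factor).
    """
    return {
        level: base_stride * scale ** (num_levels - level)
        for level in range(1, num_levels + 1)
    }


strides = window_strides(3)  # levels 1..3 of one window group
```

Here `strides[3]` corresponds to the step of W3_3, `strides[2]` to W3_2, and `strides[1]` to W3_1.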

If the partial image B is determined to be the last partial image, the detection control unit 31 determines whether the currently selected resolution image S1′_i is the last image to be examined, that is, the last resolution image S1′_n (step ST22). If it is the last resolution image, the detection process ends and the detection result is output (step ST23). Otherwise, the process returns to step ST14, the resolution image selection unit 32 selects the resolution image S1′_i+1, which is one step smaller than the currently selected resolution image S1′_i, and face detection is performed again.

As for the detection result, if no face could be detected in the detection target image S1, a notice to that effect is output. If a face was detected in the detection target image S1, the coordinates, on the detection target image S0, of the position of the partial image in which the face was detected are output.

If, on the other hand, the discrimination result R indicates that the partial image B is a face, the detection control unit 31 performs a more detailed detection process (step ST24). FIG. 14 is a flowchart of the detailed detection process. At this point, the partial image B cut out from the resolution image on which the lowest-level window is set (here, the third-level window W3_3 on the third-level resolution image S1′_3) has been judged to be a face. To refine this judgment, it is determined whether the number of levels above the lowest-level window is greater than 2 (step ST31). If step ST31 is affirmative, the window setting unit 33 cuts out a partial image B from the resolution image on which the window one level above the lowest-level window, that is, the window one size larger, is set (step ST32), and inputs that partial image B to the discriminator group 34 (step ST33).

The discriminator group 34 evaluates the input partial image B using, from among the 12 types of discriminators described above, the discriminator group whose window size corresponds to the partial image. The detection control unit 31 acquires the discrimination result R (step ST34) and determines whether the result R indicates that the partial image B is a face (step ST35). If the result R indicates that the partial image B is not a face, the process proceeds to step ST20. If the result R indicates that the partial image B is a face, the process returns to step ST31, and steps ST31 to ST35 are repeated.

If step ST31 is negative, the detection control unit 31 determines whether the number of levels above the lowest-level window is 0 (step ST36). If step ST36 is affirmative, the process proceeds to step ST20. If step ST36 is negative, the detection control unit 31 determines whether the number of levels above the lowest-level window is 1 (step ST37). If step ST37 is negative, the number of levels above the lowest-level window is 2, so the window setting unit 33 sets the second-level windows W1_2, W2_2, W3_2, and W4_2 of all the window groups W1 to W4 on the resolution image one level above the one on which the third-level window is set (here, the second-level resolution image S1′_2) (step ST38, setting of the second-level windows of all window groups). The second-level windows W1_2, W2_2, W3_2, and W4_2 are set at the position, in the resolution image S1′_i+1 at the level following the currently selected resolution image S1′_i, that corresponds to the current pixel of interest.

Next, the window setting unit 33 cuts out partial images B1_2, P2_2, P3_2, and P4_2 in turn from the four windows thus set (step ST39) and inputs them to the discriminator group 34 (step ST40). The processing of step ST40 may be performed in parallel or sequentially. For each of the input partial images B1_2, P2_2, P3_2, and P4_2, the discriminator group 34 performs discrimination using, from among the 12 types of discriminators described above, the discriminator group whose window size corresponds to that partial image. The detection control unit 31 acquires the cumulative scores K1_2, K2_2, K3_2, and K4_2 of the respective discriminator groups (step ST41) and determines the maximum among them (step ST42).

Next, the window setting unit 33 sets, on the currently selected resolution image S1′_i, the first-level window Wi_1 corresponding to the window that obtained the maximum cumulative score (step ST43). For example, if the cumulative score obtained with window W2_2 is the largest, the first-level window W2_1 is set on the resolution image S1′_i. The window setting unit 33 then cuts out a partial image B from the resolution image S1′_i using the window thus set (step ST44) and inputs that partial image B to the discriminator group 34 (step ST45).
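The selection in steps ST42 and ST43 amounts to taking the window group with the largest cumulative score and switching to that group's first-level window. A minimal sketch, in which the score values, the dictionary layout, and the function name are all made up for illustration:

```python
def best_window_group(cumulative_scores: dict[str, float]) -> str:
    """Return the window group whose second-level window scored highest."""
    return max(cumulative_scores, key=cumulative_scores.get)


# Hypothetical cumulative scores K1_2..K4_2, keyed by window group.
scores = {"W1": 0.8, "W2": 2.4, "W3": 1.1, "W4": -0.3}
group = best_window_group(scores)
first_level_window = f"{group}_1"  # window to set on S1'_i in step ST43
```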

The discriminator group 34 evaluates the input partial image B using, from among the 12 types of discriminators described above, the discriminator group whose window size corresponds to the partial image. The detection control unit 31 acquires the discrimination result R (step ST46), and the process proceeds to step ST20. If, on the other hand, step ST37 is affirmative, the window setting unit 33 cuts out a partial image B from the resolution image on which the window one level above the lowest-level window (here, the first-level window) is set (step ST47), inputs that partial image B to the discriminator group 34 (step ST48), and the process proceeds to step ST18. Through the above processing, face images of various sizes can be detected from the detection target image S0.

Next, the method of training (generating) the discriminators is described. Training is performed for each type of discriminator, that is, for each face size to be discriminated.

The sample image group used for training consists of a plurality of sample images known to be faces (the face sample image group) and a plurality of sample images known not to be faces (the non-face sample image group), normalized to every size in the window groups W1 to W4, that is, to the 12 sizes of the windows included in the window groups W1 to W4.

For the sample images known to be faces, a plurality of deformed variations of each sample image are used. These are obtained by scaling the image vertically and/or horizontally in steps of 0.1 over the range 0.7 to 1.2, and then rotating each scaled image in the image plane in steps of 3 degrees over the range ±15 degrees. The size and position of the face in each face sample image are normalized so that the eyes fall at predetermined positions, and the in-plane rotation and scaling are performed with the eye positions as the reference. For example, for a frontal-face sample image of size d × d, as shown in FIG. 15, the size and position of the face are normalized so that the two eyes are located at the points reached by moving inward by d/4 and downward by d/4 from the top-left and top-right corners of the sample image, respectively, and the in-plane rotation and scaling are performed about the midpoint between the two eyes.
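The set of deformation parameters and the normalized eye positions can be enumerated as in the sketch below. It assumes, for simplicity, that the vertical and horizontal scale factors are changed together (the text also allows scaling only one direction), and it uses image coordinates with x to the right and y downward. The function names are hypothetical.

```python
from itertools import product


def deformation_variations() -> list[tuple[float, int]]:
    """All (scale, rotation_degrees) pairs: 0.7..1.2 by 0.1, -15..15 by 3."""
    scales = [round(0.7 + 0.1 * k, 1) for k in range(6)]
    angles = list(range(-15, 16, 3))
    return list(product(scales, angles))


def eye_positions(d: float) -> tuple[tuple[float, float], tuple[float, float]]:
    """Normalized (x, y) eye positions in a d x d frontal-face sample image."""
    left_side_eye = (d / 4, d / 4)        # d/4 in and d/4 down from top-left
    right_side_eye = (d - d / 4, d / 4)   # d/4 in and d/4 down from top-right
    return left_side_eye, right_side_eye


variations = deformation_variations()
eyes = eye_positions(32)
# Rotation and scaling are applied about the midpoint between the eyes.
midpoint = ((eyes[0][0] + eyes[1][0]) / 2, (eyes[0][1] + eyes[1][1]) / 2)
```

Under the uniform-scaling assumption this gives 6 scale factors times 11 rotation angles per sample image.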

Twelve such face sample image groups are prepared, differing in size according to the window sizes. Discriminators are trained for each type using each of these 12 face sample image groups together with the non-face sample image group, producing 12 types of discriminators. The specific training procedure is described below.

FIG. 16 is a flowchart showing the method of training a discriminator. It is assumed that each sample image in the face sample image group and the non-face sample image group has already undergone illumination correction by the same local normalization process as the illumination correction performed by the illumination correction unit 20.

Each of these sample images is assigned a weight, that is, an importance. First, the initial weights of all sample images are set equally to 1 (step ST51). Next, a weak discriminator is created for each of a plurality of types of pair groups, where each pair group consists of a plurality of pairs and each pair consists of two predetermined points set within the planes of a sample image and its reduced images (step ST52). Each weak discriminator provides a criterion for discriminating between face images and non-face images. It does so using the combination of the differences in pixel value (luminance) between the two points of each pair in one pair group, the pairs being defined on the partial image cut out by the window W and on its reduced images. In this embodiment, a histogram of the combinations of pixel value differences between the two points of each pair in one pair group is used as the basis of the weak discriminator's score table.

The creation of one discriminator is described with reference to FIG. 17. As shown in the sample images on the left side of FIG. 17, the pair group used to create this discriminator consists of the five pairs P1-P2, P1-P3, P4-P5, P4-P6, and P6-P7, defined on the sample images known to be faces as follows. On the sample image itself, P1 is a point at the center of the right eye, P2 is a point on the right cheek, and P3 is a point between the eyebrows. On the 16×16 pixel reduced image obtained by shrinking the sample image with 4-neighbor pixel averaging, P4 is a point at the center of the right eye and P5 is a point on the right cheek. On the 8×8 pixel reduced image obtained by shrinking once more with 4-neighbor pixel averaging, P6 is a point on the forehead and P7 is a point on the mouth. The coordinates of the two points of each pair in the pair group used to create a given discriminator are the same in all sample images. For all sample images known to be faces, the combination of pixel value differences between the two points of each of the five pairs is obtained, and a histogram of these combinations is created. The number of values that a combination of pixel value differences can take depends on the number of luminance levels of the image. With 16-bit gradation, for example, there are 65536 possible values for a single pixel value difference, and the total is the number of levels raised to the power of the number of pairs, that is, 65536 to the fifth power. This would require an enormous number of samples and a great deal of time and memory for training and detection. In this embodiment, therefore, the pixel value differences are quantized by dividing them into bins of a suitable width, reducing them to n values (for example, n = 100).

This reduces the number of combinations of pixel value differences to n to the fifth power, so the amount of data needed to represent the combinations is reduced.
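The quantization and the resulting size of the combination space can be sketched as follows. The text only says that differences are divided by "a suitable width", so the uniform bin width, the function names, and the mixed-radix encoding of the five bins into a single index are assumptions made for illustration.

```python
def quantize_difference(diff: int, max_abs: int, n: int = 100) -> int:
    """Map a pixel difference in [-max_abs, max_abs] to a bin in [0, n - 1].

    Uniform bins are an assumption; the source does not fix the bin width.
    """
    width = (2 * max_abs + 1) / n
    return min(n - 1, int((diff + max_abs) / width))


def combination_index(bins: list[int], n: int = 100) -> int:
    """Encode the quantized differences of all pairs as one table index."""
    index = 0
    for b in bins:
        index = index * n + b
    return index


num_pairs = 5
table_size = 100 ** num_pairs  # n to the fifth power for n = 100
largest_index = combination_index([99] * num_pairs)
```

With n = 100 and five pairs, every combination maps to an index below `table_size`, which is what makes a lookup-table score feasible.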

A histogram is created in the same way for the plurality of sample images known not to be faces. For these sample images, the positions corresponding to the two predetermined points of each pair on the face sample images are used (and are likewise denoted by P1 to P7). The histogram shown at the far right of FIG. 17, which plots the logarithm of the ratio of the frequency values of these two histograms, is used as the basis of the weak discriminator's score table. The value on the vertical axis of this weak discriminator histogram is hereinafter called a discrimination point. According to this weak discriminator, an image whose combination of pixel value differences corresponds to a positive discrimination point is likely to be a face, and the larger the absolute value of the discrimination point, the higher that likelihood. Conversely, an image whose combination of pixel value differences corresponds to a negative discrimination point is likely not to be a face, and again the likelihood increases with the absolute value of the discrimination point. In step ST52, a plurality of weak discriminators in this histogram form are created for the combinations of pixel value differences between the two predetermined points of each pair in the plurality of types of pair groups that may be used for discrimination.
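A score table of this kind can be sketched as the log-ratio of two histograms over combination keys. This is an illustration under stated assumptions, not the patented implementation: the use of relative frequencies, the small constant `eps` that avoids division by zero for unseen combinations, and the function name are all additions for the example.

```python
import math
from collections import Counter


def build_score_table(face_keys: list[int], nonface_keys: list[int],
                      eps: float = 1e-6) -> dict[int, float]:
    """Discrimination point per combination key: log(face freq / non-face freq).

    face_keys / nonface_keys hold one quantized combination key per sample.
    Relative frequencies and eps smoothing are assumptions of this sketch.
    """
    face_hist = Counter(face_keys)
    nonface_hist = Counter(nonface_keys)
    table = {}
    for key in set(face_hist) | set(nonface_hist):
        p_face = face_hist[key] / len(face_keys)
        p_nonface = nonface_hist[key] / len(nonface_keys)
        table[key] = math.log((p_face + eps) / (p_nonface + eps))
    return table


# Toy data: key 1 is common among faces, key 2 among non-faces.
table = build_score_table([1, 1, 1, 2], [2, 2, 2, 1])
```

A positive entry then suggests a face and a negative entry a non-face, with larger magnitudes indicating stronger evidence, as described above.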

Next, from the plurality of weak discriminators created in step ST52, the one most effective for determining whether an image is a face is selected. This selection takes the weight of each sample image into account. In this example, the weighted correct answer rates of the weak discriminators are compared, and the weak discriminator with the highest weighted correct answer rate is selected (step ST53). In the first pass through step ST53, every sample image has the same weight of 1, so the weak discriminator that correctly classifies the largest number of sample images as face or non-face is simply selected as the most effective. In the second pass through step ST53, after the sample image weights have been updated in step ST55 (described below), there is a mixture of sample images with weight 1, weight greater than 1, and weight less than 1. In evaluating the correct answer rate, a sample image with a weight greater than 1 counts for more than a sample image with weight 1, in proportion to its weight. Consequently, from the second pass onward, more emphasis is placed on correctly classifying sample images with large weights than those with small weights.

Next, it is checked whether the correct answer rate of the combination of weak discriminators selected so far exceeds a predetermined threshold (step ST54). This rate is the proportion of sample images for which the result of judging whether the image is a face, using the weak discriminators selected so far in combination (at the training stage the weak discriminators need not be combined linearly), agrees with whether the image actually is a face. Either the sample image group with the current weights or the sample image group with equal weights may be used for this evaluation. If the threshold is exceeded, the weak discriminators selected so far can determine with sufficiently high probability whether an image is a face, so training ends. If the rate is at or below the threshold, the process proceeds to step ST56 in order to select an additional weak discriminator to be used in combination with those selected so far.

In step ST56, the weak discriminator selected in the most recent step ST53 is excluded so that it is not selected again.

Next, the weights of the sample images that the weak discriminator selected in the most recent step ST53 failed to classify correctly are increased, and the weights of the sample images it classified correctly are decreased (step ST55). The weights are adjusted in this way so that, in selecting the next weak discriminator, the images that the already selected weak discriminators could not classify correctly are treated as more important. A weak discriminator that can classify those images correctly is then selected, which makes the combination of weak discriminators more effective.

The process then returns to step ST53, and the next most effective weak discriminator is selected on the basis of the weighted correct answer rate, as described above.

Steps ST53 to ST56 are repeated, and weak discriminators corresponding to the combinations of pixel value differences between the two predetermined points of each pair in particular pair groups are selected as those suited to determining whether an image is a face. Once the correct answer rate checked in step ST54 exceeds the threshold, the types of weak discriminators used for face discrimination and their discrimination conditions are fixed (step ST57), and training ends. The selected weak discriminators are linearly combined in descending order of weighted correct answer rate to form one discriminator. For each weak discriminator, a score table for computing a score from a combination of pixel value differences is generated from its histogram. The histogram itself may also be used as the score table, in which case the discrimination points of the histogram serve directly as scores.
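The loop of steps ST51 to ST57 can be sketched in a boosting style as below. Several details are assumptions, since the text does not fix them: weak discriminators are modeled as functions returning a score (positive meaning face), the combination is judged by the sign of the summed scores, the step ST54 check uses equally weighted samples, and the multiplicative weight factors `up` and `down` are arbitrary placeholders.

```python
from typing import Callable, Sequence

Weak = Callable[[int], float]  # returns a score; positive means "face"


def train_discriminator(candidates: Sequence[Weak], samples: Sequence[int],
                        labels: Sequence[bool], target: float = 0.95,
                        up: float = 2.0, down: float = 0.5) -> list[Weak]:
    weights = [1.0] * len(samples)                      # ST51
    remaining = list(candidates)
    selected: list[Weak] = []
    while remaining:
        def weighted_rate(h: Weak) -> float:            # ST53
            ok = sum(w for w, x, y in zip(weights, samples, labels)
                     if (h(x) > 0) == y)
            return ok / sum(weights)
        best = max(remaining, key=weighted_rate)
        selected.append(best)
        hits = sum((sum(h(x) for h in selected) > 0) == y  # ST54
                   for x, y in zip(samples, labels))
        if hits / len(samples) > target:
            break                                        # ST57
        remaining.remove(best)                           # ST56
        weights = [w * (down if (best(x) > 0) == y else up)  # ST55
                   for w, x, y in zip(weights, samples, labels)]
    return selected


# Toy example: "face" means the sample value is greater than 2.
def h_even(x: int) -> float:
    return 1.0 if x % 2 == 0 else -1.0


def h_gt2(x: int) -> float:
    return 1.0 if x > 2 else -1.0


data = [0, 1, 2, 3, 4, 5]
truth = [x > 2 for x in data]
chosen = train_discriminator([h_even, h_gt2], data, truth)
```

In the toy run, `h_gt2` classifies every sample correctly, so it is selected first and the threshold check ends training immediately.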

By performing training in this way for each face sample image group, the 12 types of discriminators described above are generated.

When the above training method is adopted, the weak discriminator is not limited to the histogram form described above. It may take any form, for example binary data, a threshold, or a function, as long as it provides a criterion for discriminating between face images and non-face images using the combination of pixel value differences between the two predetermined points of each pair in a particular pair group. Even within the histogram form, a histogram showing the distribution of the differences between the two histograms shown in the center of FIG. 17 may be used instead.

The training method is also not limited to the one described above, and other machine learning techniques such as neural networks may be used.

As described above, with the face detection system of this embodiment, there is no need to make the resolution differences between the multi-resolution images fine, as in the method described in Patent Document 1, when converting the detection target image to multiple resolutions. The amount of computation for the multi-resolution conversion is therefore reduced, and faces can be detected at high speed.

In addition, although the resolution differences used when converting the detection target image S0 to multiple resolutions are larger than in the method described in Patent Document 1, the window W that generates partial images has a plurality of sizes that interpolate the magnification between resolution images. Faces of various sizes contained in the detection target image S0 can therefore be detected accurately.

Furthermore, because there are fewer resolution images than in the method described in Patent Document 1, the amount of computation needed to apply illumination correction to each resolution image is also lower than in that method.

According to this embodiment, the discriminator group 34 is trained using a plurality of sample images that have undergone the same illumination correction as that performed by the illumination correction unit 20. Even though the window sizes differ, there is therefore no need to perform illumination correction separately for each partial image of a different size. This reduces the amount of computation for face detection, so faces can be detected at high speed.

Moreover, unlike the method described in Non-Patent Document 1, there is no need to perform calculations that change the scale of the weak discriminators making up the discriminator group 34, so faces can be detected quickly and accurately.

When the face detection system of this embodiment is applied to face detection in a moving image, the following processing becomes possible. FIG. 18 is a flowchart showing the processing for detecting a face from a moving image in this embodiment. The description here assumes that processing uses the frame to be processed in the moving image (the current frame) ft and the frame immediately preceding it (the previous frame) ft-1.

First, the detection control unit 31 determines whether a face was detected in the previous frame ft-1 (step ST61). If no face was detected in the previous frame ft-1, the process proceeds to step ST11 of FIG. 11 so that the same processing as in the flowcharts of FIGS. 11 and 14 is performed with the current frame ft as the detection target image. If a face was detected in the previous frame ft-1, the sum of the absolute values of the differences between corresponding pixels of the previous frame ft-1 and the current frame ft (hereinafter called the difference) is calculated between the resolution images at mutually corresponding levels of the frames ft-1 and ft. This calculation is done in units of windows of the size used to generate the partial image B that was judged to be a face (step ST62). It is then determined, window by window, whether there is an object that has moved between the previous frame ft-1 and the current frame ft, that is, a moving object (step ST63).

FIG. 19 is a diagram for explaining the calculation of the difference. As shown in FIG. 19, when the window used for face detection in the previous frame ft-1 is the window W2_1, the window W2_1 is scanned over the previous frame ft-1 and the current frame ft, and the difference between the windows W2_1(t-1) and W2_1(t) at corresponding positions is calculated to determine whether a moving object exists. Calculating the difference in window units in this way shortens the computation time compared with detecting a moving object by calculating the difference in pixel units.
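The window-unit difference of steps ST62 and ST63 can be illustrated with a small sketch. The Python below is not taken from the patent: the function names `window_sad` and `moving_windows`, the scan step, and the threshold used to decide that a moving object is present are all assumptions made for illustration, and frames are modeled as plain lists of grayscale rows.

```python
def window_sad(prev, curr, top, left, size):
    """Sum of absolute differences between the same window in two frames."""
    total = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            total += abs(curr[y][x] - prev[y][x])
    return total


def moving_windows(prev, curr, size, step, threshold):
    """Scan a size x size window over both frames and return the top-left
    corners of the windows whose difference exceeds the threshold.
    The threshold is a hypothetical parameter, not a value from the source."""
    height, width = len(curr), len(curr[0])
    hits = []
    for top in range(0, height - size + 1, step):
        for left in range(0, width - size + 1, step):
            if window_sad(prev, curr, top, left, size) > threshold:
                hits.append((top, left))
    return hits


# Two 4x4 grayscale frames; only the lower-right 2x2 block changes.
prev_frame = [[10] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
for y, x in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    curr_frame[y][x] = 60

moved = moving_windows(prev_frame, curr_frame, size=2, step=2, threshold=50)
```

In this toy input only the window whose top-left corner is at row 2, column 2 changes, so it is the only one reported. In the scheme described above, the window size would be the one that produced the face in the previous frame; how the threshold is chosen is left open here.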

When calculating the difference between the windows W2_1(t-1) and W2_1(t), the differences of all pixels in the windows may be used. Alternatively, the difference may be calculated using only the pixel positions used to compute the feature amount that is input to a specific weak discriminator among the weak discriminators constituting each discriminator group 34_j (j = 1 to 12) corresponding to the window size. For example, when the feature amount input to a certain weak discriminator of the discriminator group 34_4 corresponding to the window W2_1 consists of the pixel values at the three pixel positions P11 to P13 shown in FIG. 19, a moving object may be detected by calculating the differences at the pixel positions P11 to P13 between the windows W2_1(t-1) and W2_1(t). This greatly reduces the computation required for moving object detection, so moving objects can be detected at high speed.
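The reduced variant, in which only the pixel positions feeding a particular weak discriminator are compared, might look like the following sketch. The name `sparse_sad` and the offsets in `FEATURE_OFFSETS` are hypothetical stand-ins for the positions P11 to P13; the actual positions are defined only in FIG. 19 and are not reproduced here.

```python
def sparse_sad(prev, curr, top, left, offsets):
    """Sum of absolute differences restricted to selected (dy, dx) offsets
    inside the window whose top-left corner is (top, left)."""
    return sum(
        abs(curr[top + dy][left + dx] - prev[top + dy][left + dx])
        for dy, dx in offsets
    )


# Hypothetical offsets standing in for pixel positions P11 to P13.
FEATURE_OFFSETS = [(0, 1), (2, 2), (3, 0)]

prev_frame = [[10] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[2][2] = 40  # change at a sampled position
curr_frame[1][1] = 90  # change at an unsampled position, not counted

diff = sparse_sad(prev_frame, curr_frame, 0, 0, FEATURE_OFFSETS)
```

Only three pixel pairs are read per window instead of all sixteen, which is where the saving comes from. The sketch also shows the trade-off: the change at the unsampled position does not contribute to `diff`.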

If the determination in step ST63 is negative, the detection control unit 31 proceeds to step ST11 of FIG. 11 so that the same processing as in the flowcharts of FIGS. 11 and 14 is performed with the current frame ft as the detection target image. If the determination in step ST63 is affirmative, the detection control unit 31 performs the same detection processing as in FIGS. 11 and 14 only on the region of the moving object in the current frame ft and its neighboring region (step ST64), outputs the detection result (step ST65), changes the detection target frame to the next frame (step ST66), and returns to step ST61.
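The branching of steps ST61 to ST66 can be summarized as a per-frame loop. This is a minimal sketch under stated assumptions, not the patented implementation: `detect_in_video` and the three callables passed to it are hypothetical names, with `detect_full` standing in for the processing of FIGS. 11 and 14, and the sketch assumes that the full-detection branches also output a result and advance to the next frame.

```python
def detect_in_video(frames, detect_full, detect_in_regions, find_moving_regions):
    """Per-frame control flow loosely following steps ST61 to ST66.

    detect_full(frame) -> faces                      whole-frame detection
    detect_in_regions(frame, regions) -> faces       detection near moving objects (ST64)
    find_moving_regions(prev, curr, faces) -> list   window-unit difference (ST62, ST63)
    """
    results = []
    prev_frame, prev_faces = None, []
    for frame in frames:
        if prev_frame is None or not prev_faces:
            # ST61 negative: no face in the previous frame, scan everything.
            faces = detect_full(frame)
        else:
            regions = find_moving_regions(prev_frame, frame, prev_faces)
            if regions:
                # ST63 affirmative: search only around the moving objects.
                faces = detect_in_regions(frame, regions)
            else:
                # ST63 negative: fall back to whole-frame detection.
                faces = detect_full(frame)
        results.append(faces)                   # ST65: output the result
        prev_frame, prev_faces = frame, faces   # ST66: move to the next frame
    return results


# Stub callables that only record which branch was taken.
calls = []

def full(frame):
    calls.append(("full", frame))
    return ["face"]

def local(frame, regions):
    calls.append(("local", frame))
    return ["face"]

def moving(prev, curr, faces):
    return ["region"] if curr == "f1" else []

outputs = detect_in_video(["f0", "f1", "f2"], full, local, moving)
```

With these stubs, the first frame is scanned in full, the second is searched only around the reported region, and the third falls back to a full scan because no motion is reported.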

In this way, for a moving image, moving object detection is performed first to estimate the position of the moving object, and face detection can then be limited to the estimated moving object and its neighboring region, so faces can be detected efficiently. In particular, performing moving object detection in window units reduces the amount of computation and allows moving objects to be detected at high speed.

In the above embodiment, the detection target is a human face, but other objects such as a human hand may be detected instead. In that case, the discriminators may be trained using a group of sample images that contain the object and a group of sample images that do not.

Also, although the illumination correction unit 20 performs illumination correction processing in the above embodiment, illumination correction is not essential to the present invention.

The face detection system according to the embodiment of the present invention has been described above. A program for causing a computer to execute each process in the portion of this face detection system that corresponds to the face detection device of the present invention is also an embodiment of the present invention, as is a computer-readable recording medium on which such a program is recorded.

DESCRIPTION OF SYMBOLS
1 Face detection system
10 Multi-resolution unit
20 Illumination correction unit
30 Face detection unit
31 Detection control unit
32 Resolution image selection unit
33 Window setting unit
34 Discriminator group

Claims

1. An object detection apparatus comprising:
multi-resolution processing means for obtaining a plurality of resolution images having different resolutions by converting a detection target image, from which a specific type of object is to be detected, to multiple resolutions at a predetermined magnification;
partial image generation means for generating partial images by scanning, over the plurality of resolution images, a plurality of windows whose sizes interpolate the predetermined magnification between the resolution images, each window having at least one sub-window of a different resolution obtained by converting the window to multiple resolutions at the predetermined magnification; and
determination means for calculating, based on a partial image, a feature amount relating to the distribution of pixel values of the partial image and determining, using the feature amount, whether the partial image is an image of the object,
wherein the determination means comprises a discriminator that discriminates whether the partial image is an image of the object using the feature amount of the partial image, the discriminator having learned, for each window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction, and having further learned, for each sub-window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the sub-windows and have been subjected to the same illumination correction.

2. The object detection apparatus according to claim 1, further comprising control means for controlling the partial image generation means and the determination means so as to:
set a window of a predetermined size among the plurality of windows at a target pixel of the resolution image subject to detection, and set the sub-windows of the window of the predetermined size, in order of resolution, at the pixels corresponding to the target pixel in resolution images having a lower resolution than the resolution image subject to detection;
perform a first determination as to whether the partial image generated by a relatively low-resolution sub-window is an image of the object; and
only when the first determination is affirmative, generate partial images using the plurality of windows and/or sub-windows of the plurality of windows having a resolution higher than the relatively low resolution, and determine whether those partial images are images of the object.

3. The object detection apparatus according to claim 2, wherein the control means controls the partial image generation means and the determination means so as to:
when the determination on the partial image generated by the sub-window having the second-highest resolution among the sub-windows of the window of the predetermined size is affirmative, generate a plurality of partial images using the highest-resolution sub-windows of all of the plurality of windows;
perform a second determination as to whether the plurality of partial images are images of the object;
generate a partial image using only the window corresponding to the sub-window that generated the partial image found by the second determination to have the highest probability of being an image of the object; and
perform a third determination as to whether that partial image is an image of the object.

4. The object detection apparatus according to any one of claims 1 to 3, further comprising illumination correction means for performing the illumination correction on the plurality of resolution images.

5. An object detection method comprising:
obtaining a plurality of resolution images having different resolutions by converting a detection target image, from which a specific type of object is to be detected, to multiple resolutions at a predetermined magnification;
generating partial images by scanning, over the plurality of resolution images, a plurality of windows whose sizes interpolate the predetermined magnification between the resolution images, each window having at least one sub-window of a different resolution obtained by converting the window to multiple resolutions at the predetermined magnification; and
calculating, based on a partial image, a feature amount relating to the distribution of pixel values of the partial image and determining, using the feature amount, whether the partial image is an image of the object, by determination means comprising a discriminator that discriminates whether the partial image is an image of the object using the feature amount of the partial image, the discriminator having learned, for each window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction, and having further learned, for each sub-window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the sub-windows and have been subjected to the same illumination correction.

6. A program for causing a computer to execute an object detection method comprising:
a procedure for obtaining a plurality of resolution images having different resolutions by converting a detection target image, from which a specific type of object is to be detected, to multiple resolutions at a predetermined magnification;
a procedure for generating partial images by scanning, over the plurality of resolution images, a plurality of windows whose sizes interpolate the predetermined magnification between the resolution images, each window having at least one sub-window of a different resolution obtained by converting the window to multiple resolutions at the predetermined magnification; and
a procedure for calculating, based on a partial image, a feature amount relating to the distribution of pixel values of the partial image and determining, using the feature amount, whether the partial image is an image of the object, by determination means comprising a discriminator that discriminates whether the partial image is an image of the object using the feature amount of the partial image, the discriminator having learned, for each window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the plurality of windows and have been subjected to the same illumination correction, and having further learned, for each sub-window size, a feature amount relating to the distribution of the object from a plurality of sample images that have different sizes corresponding to the sizes of the sub-windows and have been subjected to the same illumination correction.