JP2009026326A

JP2009026326A - Group study device and method

Info

Publication number: JP2009026326A
Application number: JP2008248698A
Authority: JP
Inventors: Kenichi Hidai; 健一日台; Kotaro Sabe; 浩太郎佐部; Kenta Kawamoto; 献太河本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-09-26
Filing date: 2008-09-26
Publication date: 2009-02-05
Anticipated expiration: 2023-11-25
Also published as: JP4553044B2

Abstract

PROBLEM TO BE SOLVED: To provide an object detection apparatus and method, and an ensemble learning apparatus and method, that speed up arithmetic processing during both learning and detection, can detect a target object of arbitrary size, and achieve extremely high discrimination capability when the target object is detected by ensemble learning.

SOLUTION: An object detection apparatus 1 has a scaling unit 3 that reduces a grayscale image input from an image output unit 2 to generate a scaled image; a scanning unit 4 that sequentially scans the scaled image and cuts out window images; and a discriminator 5 that determines whether each window image is the target object. The discriminator 5 comprises a plurality of weak classifiers trained by ensemble learning with boosting, and an adder that takes a weighted majority vote of the weak classifier outputs. Each weak classifier uses the difference between the luminance values of two pixels as its feature and outputs an estimate of whether the window image is the target object. In addition, the discriminator 5 aborts calculation of the estimates for any window image judged not to be the target object, using an abort threshold learned in advance.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an object detection apparatus and method for detecting an object such as a face image in real time, and to an ensemble learning apparatus and method for training that object detection apparatus by ensemble learning.

Many face detection methods have been proposed that use only the grayscale pattern of an image signal, without using motion, to find faces in complex image scenes. For example, the face detector described in Patent Document 1 below uses AdaBoost with filters resembling Haar basis functions as weak classifiers (weak learners). By using an image called an integral image, described later, together with rectangle features, it can compute weak hypotheses at high speed.

FIG. 15 is a schematic diagram showing the rectangle features described in Patent Document 1. As shown in FIG. 15, the technique of Patent Document 1 prepares, for input images 142A to 142D, a plurality of filters (weak hypotheses), each of which computes the sums of the luminance values of adjacent rectangular regions of the same size and outputs the difference between the sum for one rectangular region and the sum for the other. For example, input image 142A shows a filter 154A that subtracts the sum of the luminance values of the shaded rectangular region (rectangular box) 154A-2 from the sum of the luminance values of rectangular region 154A-1. A filter composed of two such rectangular regions is called a two-rectangle feature. Input image 142C shows a filter 154C consisting of three rectangular regions 154C-1 to 154C-3, obtained by dividing one rectangular region into three, that subtracts the sum of the luminance values of the shaded central region 154C-2 from the sum of the luminance values of regions 154C-1 and 154C-3. A filter composed of three such rectangular regions is called a three-rectangle feature. Further, input image 142D shows a filter 154D consisting of four rectangular regions 154D-1 to 154D-4, obtained by dividing one rectangular region vertically and horizontally, that subtracts the sum of the luminance values of the shaded regions 154D-2 and 154D-4 from the sum of the luminance values of regions 154D-1 and 154D-3. A filter composed of four such rectangular regions is called a four-rectangle feature.

As an example, consider determining whether the face image shown in FIG. 16 is a face using rectangle feature 154B shown in FIG. 15. The two-rectangle feature 154B consists of two rectangular regions 154B-1 and 154B-2, obtained by dividing one rectangular region into upper and lower halves, and subtracts the sum of the luminance values of the shaded region 154B-1 from the sum of the luminance values of region 154B-2. Because the eye region of a human face (the object) 138 has lower luminance than the cheek region, the output of rectangle feature 154B makes it possible to estimate, with some probability, whether the input image is a face (a correct or incorrect answer). This is used as one of the weak classifiers in AdaBoost.

At detection time, face regions of various sizes contained in the input image must be detected, so regions of various sizes (hereinafter, search windows) must be cut out and each judged as to whether it is a face. However, an input image of, for example, 320 × 240 pixels contains roughly 50,000 different sizes of face regions (search windows), and performing the computation for all of these window sizes is extremely time-consuming. Patent Document 1 therefore uses an image called an integral image. As shown in FIG. 17, an integral image is an image in which the value of the (x, y)-th pixel 162 of input image 144 is, as expressed by equation (1), the sum of the luminance values of the pixels above and to the left of pixel 162. That is, the value of pixel 162 is the sum of the luminance values of the pixels contained in rectangular region 160 to the upper left of pixel 162. Hereinafter, an image whose pixel values are given by equation (1) is called an integral image.
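Equation (1) is not reproduced in this text. From the description, it would take a form such as the following. The symbols are assumptions introduced here for illustration: S(x', y') denotes the luminance of the input image and I(x, y) the integral image. Whether the sum includes the pixel (x, y) itself is also an assumption.

```latex
I(x, y) = \sum_{x' \le x} \; \sum_{y' \le y} S(x', y') \tag{1}
```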

With this integral image, computations over rectangular regions of arbitrary size can be performed at high speed. As shown in FIG. 18, let 170 be the upper-left rectangular region and let 172, 174, and 176 be the rectangular regions to its right, below it, and to its lower right, respectively, and denote the four vertices of rectangular region 176 by P1 (upper left), P2 (upper right), P3 (lower left), and P4 (lower right). Then P1 is the sum A of the luminance values of region 170 (P1 = A), P2 is A plus the sum B of the luminance values of region 172 (P2 = A + B), P3 is A plus the sum C of the luminance values of region 174 (P3 = A + C), and P4 is A + B + C plus the sum D of the luminance values of region 176 (P4 = A + B + C + D). The sum D for rectangular region 176 can therefore be computed as P4 - (P2 + P3) + P1, so the sum of the luminance values of a rectangular region is obtained quickly by adding and subtracting the pixel values at its four corners. Ordinarily, search windows of different sizes are searched by scaling the input image and cutting out, from each scaled image, a window (search window) of the same size as the learning samples used for training. However, as noted above, scaling the input image so that search windows of every size can be set makes the amount of computation enormous. The technique of Patent Document 1 therefore reduces the amount of computation by using rectangle features together with an integral image, which allows the sum of the luminance values in a rectangular region to be computed at high speed.
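As an illustration only, the following Python sketch builds an integral image and recovers a rectangle sum from its four corner values. The function names and the inclusive-coordinate convention are assumptions made for this example, not part of Patent Document 1. The corner value P1 enters with a plus sign because it is subtracted twice through P2 and P3.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[r][c] over all r <= y, c <= x."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = row_sum + above
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom and columns left..right (inclusive)."""
    def corner(y, x):
        # Positions outside the image contribute nothing.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    p1 = corner(top - 1, left - 1)   # A
    p2 = corner(top - 1, right)      # A + B
    p3 = corner(bottom, left - 1)    # A + C
    p4 = corner(bottom, right)       # A + B + C + D
    return p4 - (p2 + p3) + p1       # D


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
d = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9 = 28
```

Once the integral image exists, each rectangle sum costs four lookups regardless of the rectangle's size.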

Patent Document 1: US Patent Application Publication No. 2002/0102024

However, the face detector described in Patent Document 1 can detect only target objects whose size is an integer multiple of the size of the learning samples used in training. This is because Patent Document 1 does not change the size of the search window by scaling the input image; instead, it converts the input image into an integral image and uses that to detect face regions with different search windows. Because the integral image is discretized in units of pixels, when a 20 × 20 window size is used, for example, a 30 × 30 search window cannot be set, and faces of that window size therefore cannot be detected.

Moreover, to speed up computation, the rectangle features use only differences in luminance between adjacent rectangular regions. Changes in luminance between rectangular regions that are far apart therefore cannot be captured, which limits object detection performance.

Scaling the integral image, for example, would make it possible to search windows of arbitrary size, and differences in luminance between rectangular regions at separated positions could also be used. However, scaling the integral image increases the amount of computation and cancels out the speed-up gained by using the integral image, and including differences in luminance between separated rectangular regions makes the number of filter types enormous, which likewise increases the processing load.

The present invention has been proposed in view of these circumstances. Its object is to provide an object detection apparatus and method, and an ensemble learning apparatus and method, that speed up arithmetic processing during both learning and detection when a target object is detected by ensemble learning, that can detect target objects of arbitrary size, and that have extremely high discrimination capability.

To achieve the above object, an object detection apparatus according to the present invention detects whether a given grayscale image is an object, and is characterized by having: a plurality of weak classification means, each of which calculates an estimate indicating whether the grayscale image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance; and discrimination means for determining whether the grayscale image is the object on the basis of the estimates calculated by at least one of the plurality of weak classification means.

In the present invention, the weak classification means use an extremely simple feature, the difference between the luminance values of pixels at two positions, to make a weak judgment as to whether a given grayscale image is the object to be detected or a non-object, so the detection process can be made faster.
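As a minimal sketch of the feature described here, the following Python function computes the difference between the luminance values of two pixels in a window. The function name, the (row, column) position convention, and the sample values are assumptions made for illustration.

```python
def pixel_difference(window, pos1, pos2):
    """Feature: luminance at pos1 minus luminance at pos2; positions are (row, col)."""
    (y1, x1), (y2, x2) = pos1, pos2
    return window[y1][x1] - window[y2][x2]


window = [[30, 40, 35],
          [200, 210, 190],
          [120, 130, 125]]
feature = pixel_difference(window, (1, 1), (0, 1))  # 210 - 40 = 170
```

The feature needs two memory reads and one subtraction, with no region sums or integral image.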

The discrimination means can calculate a weighted majority value by multiplying each estimate by the reliability of the corresponding weak classification means obtained through the learning and summing the products, and can determine whether the grayscale image is the object on the basis of this weighted majority value. Whether the image is the object can thus be judged from a majority vote that combines the estimates of the plurality of weak classification means.
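The weighted majority vote can be sketched as follows. The function names are hypothetical, and deciding "object" when the vote is positive is an assumption that matches AdaBoost-style estimates of +1 and -1.

```python
def weighted_majority(estimates, reliabilities):
    """Sum of each weak estimate multiplied by the reliability of its weak classifier."""
    return sum(alpha * h for alpha, h in zip(reliabilities, estimates))


def is_object(estimates, reliabilities):
    # Assumption: a positive weighted vote means "object".
    return weighted_majority(estimates, reliabilities) > 0


vote = weighted_majority([1, -1, 1], [0.5, 0.2, 0.4])  # 0.5 - 0.2 + 0.4
```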

Further, the plurality of weak classification means can calculate the estimates sequentially, and the discrimination means can update the weighted majority value each time an estimate is calculated and control, on the basis of the updated weighted majority value, whether to abort calculation of the remaining estimates. Because the weak classifiers compute their estimates one after another and the weighted majority value is evaluated at each step, processing can be aborted without waiting for the results of all the weak classification means, which further speeds up detection.

Furthermore, the discrimination means aborts calculation of the estimates depending on whether the weighted majority value is smaller than an abort threshold. The weak classification means are generated sequentially by ensemble learning using learning samples consisting of a plurality of grayscale images, each labeled as an object or a non-object. The abort threshold can be set, during learning, to the minimum of the weighted majority values that are updated, each time a weak classification means is generated, by adding the reliability-weighted estimates that the newly generated weak classification means calculates for the learning samples that are objects. By learning as the abort threshold the minimum value that a correctly labeled object grayscale image can take, the processing of the weak classification means can be aborted accurately and efficiently.
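The sequential evaluation with an abort threshold might look like the sketch below. The tuple layout of a weak classifier, the function name, and the numeric values are all assumptions chosen for illustration; the thresholds are taken as given here rather than learned.

```python
def detect_with_abort(window, weak_classifiers, abort_thresholds):
    """Return (is_object, vote, number_of_weak_classifiers_evaluated).

    weak_classifiers: list of (pos1, pos2, feature_threshold, reliability).
    abort_thresholds: one previously learned threshold per weak classifier.
    """
    vote = 0.0
    evaluated = 0
    for (pos1, pos2, theta, alpha), abort_at in zip(weak_classifiers, abort_thresholds):
        feature = window[pos1[0]][pos1[1]] - window[pos2[0]][pos2[1]]
        estimate = 1 if feature >= theta else -1
        vote += alpha * estimate          # update the weighted majority value
        evaluated += 1
        if vote < abort_at:               # below anything an object sample reached
            return False, vote, evaluated
    return vote > 0, vote, evaluated


classifiers = [((0, 1), (0, 0), 100, 1.0),
               ((1, 0), (1, 1), 0, 0.8)]
thresholds = [-0.5, -0.2]
accepted = detect_with_abort([[10, 200], [50, 60]], classifiers, thresholds)
rejected = detect_with_abort([[200, 10], [50, 60]], classifiers, thresholds)
```

In this example the second window is rejected after a single weak classifier, which is where the speed-up comes from when most windows are non-objects.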

If the minimum weighted majority value during learning is positive, 0 can be set as the abort threshold. When learning uses an ensemble learning algorithm such as AdaBoost, in which the decision depends on whether the output of the weak classification means is positive or negative, the abort threshold can thus be set to 0 whenever the minimum value is 0 or more.

Further, the weak classification means may output the estimate deterministically, by calculating a binary estimate indicating whether the image is the object according to whether the feature of the grayscale image is greater than or equal to a predetermined threshold, or may output the estimate probabilistically, by calculating as the estimate the probability, based on the feature, that the grayscale image is the object.
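The two kinds of output can be contrasted in a short sketch. The binary form follows the threshold test described here. The probabilistic form uses a lookup table indexed by the quantized feature value, which is purely an assumption for illustration: this passage does not say how the probability is obtained. The table values and bin width are made up.

```python
def binary_estimate(feature, threshold):
    """Deterministic output: +1 (object) if feature >= threshold, otherwise -1."""
    return 1 if feature >= threshold else -1


def probabilistic_estimate(feature, prob_table, bin_width=64, offset=255):
    """Probabilistic output: look up P(object | feature) in a table.

    Assumes 8-bit luminance, so the feature lies in [-255, 255].
    """
    index = min((feature + offset) // bin_width, len(prob_table) - 1)
    return prob_table[index]


table = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # hypothetical learned values
hard = binary_estimate(100, 50)
soft = probabilistic_estimate(100, table)  # (100 + 255) // 64 = 5
```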

An object detection method according to the present invention detects whether a given grayscale image is an object, and is characterized by having: a weak classification step in which a plurality of weak classification means each calculate an estimate indicating whether the grayscale image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance; and a discrimination step of determining whether the grayscale image is the object on the basis of the estimates calculated by at least one of the plurality of weak classifiers.

An ensemble learning apparatus according to the present invention performs ensemble learning using learning samples consisting of a plurality of grayscale images, each labeled as an object or a non-object, and is characterized by having learning means that uses the learning samples to train, by ensemble learning, a plurality of weak classifiers, each of which takes as its feature the difference between the luminance values of two pixels at arbitrary positions and outputs an estimate indicating whether a grayscale image given as input is the object.

In the present invention, weak classifiers that use an extremely simple feature, the difference between the luminance values of two pixels at arbitrary positions in a learning sample, are generated by ensemble learning. When a detection apparatus is built that detects objects by combining the classification results of many such weak classifiers, the detection process can be made extremely fast.

The learning means can have: weak classifier generation means for calculating the feature for each learning sample and generating a weak classifier on the basis of the features; error rate calculation means for calculating, for the weak classifier generated by the weak classifier generation means, the error rate in classifying the learning samples on the basis of the data weight set for each learning sample; reliability calculation means for calculating the reliability of the weak classifier on the basis of the error rate; and data weight calculation means for updating the data weights so that the weights of the learning samples that the weak classifier got wrong increase relatively. The weak classifier generation means generates a new weak classifier when the data weights are updated. Learning thus proceeds by repeating a series of steps: generate a weak classifier, calculate its error rate and reliability, update the data weights, and generate a weak classifier again.
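One round of this loop can be sketched as follows. The passage does not give formulas, so the sketch assumes the standard discrete AdaBoost ones: the error rate is the total weight of misclassified samples, the reliability is half the log-odds of being correct, and weights are multiplied by an exponential factor and renormalized. The function name is hypothetical.

```python
import math


def boosting_round(outputs, labels, weights):
    """One AdaBoost-style round; outputs and labels are +1 or -1, weights sum to 1.

    Assumes 0 < error rate < 1 so that the logarithm is defined.
    """
    # Error rate: total data weight of the misclassified samples.
    error = sum(w for h, y, w in zip(outputs, labels, weights) if h != y)
    # Reliability of this weak classifier (standard AdaBoost form, assumed).
    reliability = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (y * h == -1) gain weight; correct ones lose weight.
    updated = [w * math.exp(-reliability * y * h)
               for h, y, w in zip(outputs, labels, weights)]
    total = sum(updated)
    return error, reliability, [w / total for w in updated]


labels = [1, 1, -1, -1]
outputs = [1, -1, -1, -1]          # the second sample is misclassified
error, reliability, new_weights = boosting_round(outputs, labels, [0.25] * 4)
```

In this example the single misclassified sample ends the round holding half of the total weight, so the next weak classifier is pushed toward getting it right.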

Further, the weak classifier generation means can repeat the feature calculation a plurality of times to calculate a plurality of kinds of features, generate a weak classifier candidate for each feature, calculate for each candidate the error rate in classifying the learning samples on the basis of the data weights set for the learning samples, and adopt the candidate with the smallest error rate as the weak classifier. Each time the data weights are updated, many weak classifier candidates are thus generated, and the one with the smallest error rate is selected to generate (learn) one weak classifier.
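Candidate selection can be sketched as an exhaustive search over a given list of pixel pairs. The function names, the choice of trying every observed feature value as a threshold, and the tiny 1 × 2 sample images are assumptions for illustration; in practice the candidate pairs might be sampled at random.

```python
def weighted_error(samples, labels, weights, pos1, pos2, threshold):
    """Total data weight of the samples misclassified by one candidate."""
    error = 0.0
    for image, label, weight in zip(samples, labels, weights):
        feature = image[pos1[0]][pos1[1]] - image[pos2[0]][pos2[1]]
        estimate = 1 if feature >= threshold else -1
        if estimate != label:
            error += weight
    return error


def select_weak_classifier(samples, labels, weights, candidate_pairs):
    """Return (error, pos1, pos2, threshold) for the lowest-error candidate."""
    best = None
    for pos1, pos2 in candidate_pairs:
        features = {img[pos1[0]][pos1[1]] - img[pos2[0]][pos2[1]] for img in samples}
        for threshold in sorted(features):
            error = weighted_error(samples, labels, weights, pos1, pos2, threshold)
            if best is None or error < best[0]:
                best = (error, pos1, pos2, threshold)
    return best


samples = [[[200, 50]], [[180, 60]], [[40, 190]], [[70, 160]]]
labels = [1, 1, -1, -1]
pairs = [((0, 1), (0, 0)), ((0, 0), (0, 1))]
best = select_weak_classifier(samples, labels, [0.25] * 4, pairs)
```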

Further, the apparatus can have abort threshold storage means that, each time the weak classifier generation means generates a weak classifier, calculates the estimate of that weak classifier for each learning sample that is an object, calculates the weighted majority value by weighting the estimate with the reliability and adding it, and stores the minimum of these values. Learning this minimum as the abort threshold further speeds up detection in a detection apparatus composed of the generated weak classifiers.
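Learning the abort threshold can be sketched as a running update over the object (positive) samples only. The function name is hypothetical. Clamping the threshold at 0 follows the earlier statement that 0 can be used when the minimum is positive; the numeric values are made up.

```python
def update_abort_threshold(running_votes, estimates, reliability):
    """Add one weak classifier's weighted estimates and return the new threshold.

    running_votes: weighted majority value so far for each object sample.
    estimates: the new weak classifier's estimate (+1 or -1) for each object sample.
    """
    new_votes = [vote + reliability * h for vote, h in zip(running_votes, estimates)]
    # Smallest value any object sample reaches, never above 0.
    threshold = min(min(new_votes), 0.0)
    return new_votes, threshold


votes = [0.0, 0.0, 0.0]
votes, first_threshold = update_abort_threshold(votes, [1, 1, -1], 0.8)
votes, second_threshold = update_abort_threshold(votes, [1, -1, 1], 0.5)
```

Each generated weak classifier gets its own threshold, so the stored list grows by one entry per boosting round.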

An ensemble learning method according to the present invention performs ensemble learning using learning samples consisting of a plurality of grayscale images, each labeled as an object or a non-object, and is characterized by having a learning step of using the learning samples to train, by ensemble learning, a plurality of weak classifiers, each of which takes as its feature the difference between the luminance values of two pixels at arbitrary positions and outputs an estimate indicating whether a grayscale image given as input is the object.

An object detection apparatus according to the present invention cuts out a fixed-size window image from a grayscale image and detects whether the window image is an object, and is characterized by having: scale conversion means for generating a scaled image by enlarging or reducing the input grayscale image; window image scanning means for scanning a window of the fixed size over the scaled image and cutting out window images; and object detection means for detecting whether a given window image is the object. The object detection means has: a plurality of weak classification means, each of which calculates an estimate of whether the window image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance; and discrimination means for determining whether the window image is the object on the basis of the estimates calculated by at least one of the plurality of weak classification means.

In the present invention, scaling the grayscale image and cutting out window images makes it possible to detect objects of arbitrary size, and because the weak classification means calculate the estimate indicating whether a window image is the object from an extremely simple feature, the difference between the luminance values of two pixels, detection can be performed at very high speed.
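The scale-then-scan structure can be sketched as follows. The nearest-neighbour resize, the fixed reduction factor, the one-pixel scan step, and all function names are assumptions for illustration, and the sketch only reduces the image, although the text also allows enlargement.

```python
def downscale(image, factor):
    """Nearest-neighbour resize by `factor` (0 < factor <= 1)."""
    src_h, src_w = len(image), len(image[0])
    out_h, out_w = int(src_h * factor), int(src_w * factor)
    return [[image[min(int(y / factor), src_h - 1)][min(int(x / factor), src_w - 1)]
             for x in range(out_w)]
            for y in range(out_h)]


def scan_windows(image, size, step=1):
    """Yield (top, left, window) for every fixed-size window in the image."""
    for top in range(0, len(image) - size + 1, step):
        for left in range(0, len(image[0]) - size + 1, step):
            yield top, left, [row[left:left + size] for row in image[top:top + size]]


def pyramid_windows(image, size, factor=0.8):
    """Yield (scale, top, left, window) over successively reduced images."""
    scale = 1.0
    current = image
    while len(current) >= size and len(current[0]) >= size:
        for top, left, window in scan_windows(current, size):
            yield scale, top, left, window
        scale *= factor
        current = downscale(image, scale)  # always resample from the original


image = [[4 * y + x for x in range(4)] for y in range(4)]
windows = list(pyramid_windows(image, 2, factor=0.5))
```

Because the window size stays fixed while the image shrinks, a window found at scale s corresponds to a region roughly 1/s times larger in the original image.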

An object detection method according to the present invention cuts out a fixed-size window image from a grayscale image and detects whether the window image is an object, and is characterized by having: a scale conversion step of generating a scaled image by enlarging or reducing the input grayscale image; a window image scanning step of scanning a window of the fixed size over the scaled image and cutting out window images; and an object detection step of detecting whether a given window image is the object. The object detection step has: a weak classification step in which a plurality of weak classifiers each calculate an estimate indicating whether the grayscale image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance; and a discrimination step of determining whether the grayscale image is the object on the basis of the estimates calculated by at least one of the plurality of weak classifiers.

The object detection apparatus according to the present invention, which detects whether a given grayscale image is an object, has a plurality of weak classification means, each of which calculates an estimate indicating whether the grayscale image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance, and discrimination means for determining whether the grayscale image is the object on the basis of the estimates calculated by at least one of the plurality of weak classification means. The weak judgment of whether a grayscale image is the object is therefore extremely simple, the detection process becomes extremely fast, and faces can be detected in real time.

The object detection method according to the present invention can detect at high speed whether a given grayscale image is an object.

The ensemble learning apparatus according to the present invention, which performs ensemble learning using learning samples consisting of a plurality of grayscale images, each labeled as an object or a non-object, has learning means that uses the learning samples to train, by ensemble learning, a plurality of weak classifiers, each of which takes as its feature the difference between the luminance values of two pixels at arbitrary positions and outputs an estimate indicating whether a grayscale image given as input is the object. Ensemble learning can therefore generate weak classifiers that use an extremely simple feature, the difference between the luminance values of two pixels at arbitrary positions. This speeds up feature calculation during learning and makes detection extremely fast when an object detection apparatus is built from the many weak classifiers thus generated.

The ensemble learning method according to the present invention performs ensemble learning using learning samples consisting of a plurality of grayscale images, each labeled as an object or a non-object, and can thereby train the weak classifiers that make up an object detection apparatus capable of detecting objects at high speed.

The object detection apparatus according to the present invention, which cuts out a fixed-size window image from a grayscale image and detects whether the window image is an object, has: scale conversion means for generating a scaled image by enlarging or reducing the input grayscale image; window image scanning means for scanning a window of the fixed size over the scaled image and cutting out window images; and object detection means for detecting whether a given window image is the object. The object detection means has a plurality of weak classification means, each of which calculates an estimate of whether the window image is the object on the basis of a feature consisting of the difference between the luminance values of pixels at two positions learned in advance, and discrimination means for determining whether the window image is the object on the basis of the estimates calculated by at least one of the plurality of weak classification means. Objects of arbitrary size can therefore be detected by scaling the grayscale input image and cutting out window images, and because the weak classification means detect whether a window image is the object using an extremely simple feature, the difference between the luminance values of two pixels, detection can be performed at very high speed.

In addition, according to the object detection method of the present invention, a fixed-size window image can be cut out from a grayscale image, and whether or not the window image is an object can be detected at high speed.

Hereinafter, specific embodiments to which the present invention is applied will be described in detail with reference to the drawings. In this embodiment, the present invention is applied to an object detection device that detects an object from an image using ensemble learning (group learning).

A learning machine obtained by group learning consists of a large number of weak hypotheses and a combiner that combines them. Boosting is one example of a combiner that integrates the outputs of the weak hypotheses with fixed weights that do not depend on the input. Boosting uses the learning results of previously generated weak hypotheses to modify the distribution that the learning samples follow, increasing the weights of the learning samples (examples) that tend to be misclassified, and learns a new weak hypothesis based on this distribution. As a result, the weights of learning samples that are often answered incorrectly and are difficult to discriminate rise relatively, and weak discriminators that correctly answer the heavily weighted, that is, hard-to-discriminate, learning samples are selected one after another. In other words, weak hypotheses are generated sequentially during learning, and each weak hypothesis depends on the weak hypotheses generated before it.
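The reweighting described above can be sketched as follows. This is a minimal illustration that assumes the textbook discrete AdaBoost update; the function name `reweight` and the sample values are hypothetical, and the update rule actually used in this embodiment is the one described later in the document.

```python
import math

def reweight(weights, labels, predictions):
    """One discrete-AdaBoost reweighting step (textbook form, assumed here).

    weights: current sample weights, summing to 1
    labels, predictions: +1 (object) or -1 (non-object) per sample
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak hypothesis that was just learned.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)  # reliability of the hypothesis
    # Misclassified samples (y * h == -1) are scaled up, correct ones down.
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, predictions)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, -1, -1, -1]  # the second sample is misclassified
alpha, new_weights = reweight(weights, labels, predictions)
```

After the step, the misclassified sample carries more weight than any correctly classified one, so the next weak hypothesis is pushed toward getting it right.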

When detecting an object, the discrimination results of the many weak hypotheses generated sequentially by learning as described above are used. In the case of AdaBoost, for example, the discrimination results (1 for an object, -1 for a non-object) of all the weak hypotheses generated by this learning (hereinafter referred to as weak discriminators) are supplied to the combiner. The combiner weights each discrimination result by the reliability calculated during learning for the corresponding weak discriminator, adds the weighted results, and outputs the result of this weighted majority vote. Whether or not the input image is an object is decided by evaluating the output value of the combiner.
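As a rough illustration of the combiner, the following sketch sums the +1/-1 outputs weighted by their reliabilities. The name `weighted_vote`, the example values, and the decision rule (object when the sum is positive) are assumptions made for illustration.

```python
def weighted_vote(outputs, reliabilities):
    """Combine +1/-1 weak outputs using their learned reliabilities.

    Returns (score, label); label is +1 when the score is positive
    (an assumed decision rule), otherwise -1.
    """
    score = sum(a * h for a, h in zip(reliabilities, outputs))
    return score, (1 if score > 0 else -1)

# Three weak discriminators: two vote "object", one votes "non-object".
score, label = weighted_vote([1, -1, 1], [0.9, 0.4, 0.3])
```

A single highly reliable weak discriminator can outvote several unreliable ones, which is the point of weighting the vote.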

A weak discriminator uses some feature amount to determine whether an input is an object or a non-object. As described later, the output of a weak discriminator may state deterministically whether or not the input is an object, or may express the likelihood of being an object probabilistically, for example as a probability density. In this embodiment, the object detection process is accelerated by using a group learning device with weak discriminators that determine whether or not an input is an object from an extremely simple feature amount, the difference between the luminance values of two pixels (hereinafter referred to as the inter-pixel difference feature).

(1) Object Detection Device
FIG. 1 is a functional block diagram showing the processing functions of the object detection device in this embodiment. As shown in FIG. 1, the object detection device 1 has an image output unit 2 that outputs a grayscale image (luminance image) as an input image, a scaling unit 3 that scales the input image by enlarging or reducing it, a scanning unit 4 that sequentially scans the scaled input image, for example from the upper left, with a window of a predetermined size, and a discriminator 5 that determines whether each window image scanned by the scanning unit 4 is an object or a non-object. The device outputs the position and size of each object, indicating the region of the target object within the given image (input image). That is, the scaling unit 3 enlarges or reduces the input image to every specified scale and outputs scaled images. The scanning unit 4 sequentially scans each scaled image with a window of the size of the object to be detected and cuts out window images, and the discriminator 5 determines whether or not each window image is a face.

The discriminator 5 refers to the learning result of a group learning machine 6, which performs group learning of the plurality of weak discriminators constituting the discriminator 5, and determines whether the current window image is an object, such as a face image, or a non-object.

When a plurality of objects are detected from the input image, the object detection device 1 outputs a plurality of pieces of region information. Furthermore, when some of these regions overlap, the device can also select, by a method described later, the region evaluated as most likely to be the object.

The image (grayscale image) output from the image output unit 2 first enters the scaling unit 3. The scaling unit 3 reduces the image using bilinear interpolation. In this embodiment, the reduced images are not all generated at the outset. Instead, the required image is output to the scanning unit 4, and the next, smaller reduced image is generated only after that image has been processed. This is repeated.

That is, as shown in FIG. 2, the input image 10A is first output to the scanning unit 4 as it is. After the scanning unit 4 and the discriminator 5 have finished processing the input image 10A, an input image 10B is generated by reducing the size of the input image 10A. After the scanning unit 4 and the discriminator 5 have finished processing the input image 10B, an input image 10C, obtained by further reducing the input image 10B, is output to the scanning unit 4. Reduced images 10D, 10E, and so on are generated in turn in this way, and processing ends when the size of the reduced image becomes smaller than the window size scanned by the scanning unit 4. Once this processing has ended, the image output unit 2 outputs the next input image to the scaling unit 3.

As shown in FIG. 3, the scanning unit 4 sequentially applies a window 11 of the window size S accepted by the subsequent discriminator 5 across the whole of a given image (screen), for example the image 10A, and outputs the image at each position (hereinafter, the cut-out image) to the discriminator 5. The window size S is constant, but because the scaling unit 3 successively reduces the input image and thereby converts it to various scales, target objects of arbitrary size can be detected.
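The interplay of the scaling unit 3 and the scanning unit 4 can be sketched as a lazily evaluated image pyramid with a fixed-size sliding window. This is only an illustrative sketch: the function names, the step size, and the shrink factor are assumptions, and the image is represented as a plain list of rows of luminance values.

```python
def resize_bilinear(img, new_h, new_w):
    """Resize a 2-D list of luminance values with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(new_h):
        y = r * (h - 1) / max(new_h - 1, 1)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for c in range(new_w):
            x = c * (w - 1) / max(new_w - 1, 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def scan_pyramid(img, window=24, step=2, shrink=0.8):
    """Yield (scale, top, left, window_image), one scale at a time.

    Assumes 0 < shrink < 1. Because this is a generator, the next smaller
    image is built only after every window of the current one is consumed.
    """
    scale, current = 1.0, img
    while len(current) >= window and len(current[0]) >= window:
        h, w = len(current), len(current[0])
        for top in range(0, h - window + 1, step):
            for left in range(0, w - window + 1, step):
                yield scale, top, left, [row[left:left + window]
                                         for row in current[top:top + window]]
        new_h, new_w = int(h * shrink), int(w * shrink)
        if new_h < window or new_w < window:
            return  # reduced image would be smaller than the window
        current = resize_bilinear(current, new_h, new_w)
        scale *= shrink

image = [[r * 8 + c for c in range(8)] for r in range(8)]
windows = list(scan_pyramid(image, window=4, step=2, shrink=0.5))
```

A detection at `(top, left)` on a scaled image corresponds to a region of size `window / scale` at `(top / scale, left / scale)` in the original image, which is how a fixed window finds objects of different sizes.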

The discriminator 5 determines whether or not the cut-out image supplied from the preceding stage is a target object such as a face. As shown in FIG. 4, the discriminator 5 has a plurality of weak discriminators 21_n (21_1 to 21_N) obtained by ensemble learning, and an adder 22 that multiplies their outputs by the respective weights W_n (W_1 to W_N) and obtains a weighted majority vote. For an input window image, the weak discriminators 21_n (21_1 to 21_N) of the discriminator 5 sequentially output estimated values indicating whether or not the image is an object, and the adder 22 calculates and outputs the weighted majority vote. Determination means (not shown) determines whether or not the image is an object according to the value of this weighted majority vote.

The group learning machine 6 learns in advance, by group learning using a method described later, the weak discriminators 21_n and the weights by which their outputs (estimated values) are multiplied. Any group learning method can be applied as long as it combines the results of a plurality of discriminators by majority vote. For example, group learning using boosting, such as AdaBoost, which weights the data and performs a weighted majority vote, can be applied.

Each weak discriminator 21_n constituting the discriminator 5 uses the difference between the luminance values of two pixels (the inter-pixel difference feature) as the feature amount for discrimination. For discrimination, it compares a feature amount learned in advance from learning samples, which consist of a plurality of grayscale images labeled as object or non-object, with the feature amount of the window image, and outputs, deterministically or probabilistically, an estimated value for estimating whether or not the window image is an object.

The adder 22 multiplies the estimated value of each weak discriminator 21_n by a weight representing the reliability of that weak discriminator, and outputs the sum of these products (the value of the weighted majority vote). In AdaBoost, the weak discriminators 21_n calculate their estimated values one after another, and the value of the weighted majority vote is updated successively as they do so. These weak discriminators are generated sequentially by the group learning machine 6 through group learning with the learning samples described above, according to an algorithm described later, and they calculate their estimated values in, for example, the order in which they were generated. The weights (reliabilities) of the weighted majority vote are learned in the learning process, described later, that generates the weak discriminators.

When a weak discriminator is required to produce a binary output, as in AdaBoost, the weak discriminator 21_n determines whether or not the input is a target object by dividing the inter-pixel difference feature in two with a threshold. The threshold-based determination may also use a plurality of thresholds. Alternatively, as in Real-AdaBoost, a weak discriminator may probabilistically output a continuous value expressing the degree to which the input is a target object, computed from the inter-pixel difference feature. The quantities these weak discriminators 21_n need for discrimination (such as thresholds) are also learned according to the above algorithm during learning.

Furthermore, an abort threshold is learned during learning so that, in the weighted majority vote, the calculation can be aborted partway, without waiting for the results of all the weak discriminators, when the value obtained so far shows that the input is not a target object. This abort processing greatly reduces the amount of computation in the detection process, because processing can move on to the next window image partway through the calculation without waiting for the results of all the weak discriminators.

In this way, the discriminator 5 functions as determination means that calculates a weighted majority vote as an evaluation value for determining whether or not a window image is an object, and determines whether or not the window image is an object based on that evaluation value. In addition, the weak discriminators generated in advance by learning calculate and output their estimated values in sequence. Each time an estimated value is calculated, it is multiplied by the learned weight for the corresponding weak discriminator and added to update the value of the weighted majority vote. Each time this value (the evaluation value) is updated, the discriminator 5 uses the abort threshold to control whether or not to abort the calculation of further estimated values.
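The evaluation loop with abort control might look like the sketch below. The names are hypothetical, and the use of one abort threshold per weak discriminator is an assumption made for illustration; this passage states only that an abort threshold is learned and checked each time the evaluation value is updated.

```python
def classify_window(window, weak_discriminators, weights, abort_thresholds):
    """Accumulate the weighted majority vote, aborting on a low running score.

    Returns (score, evaluated), where evaluated is the number of weak
    discriminators that were actually run.
    """
    score = 0.0
    steps = zip(weak_discriminators, weights, abort_thresholds)
    for t, (weak, alpha, limit) in enumerate(steps, start=1):
        score += alpha * weak(window)  # update the evaluation value
        if score < limit:
            return score, t  # rejected early as a non-object
    return score, len(weak_discriminators)

# Hypothetical weak discriminators on a flattened 4-pixel "window".
weak_discriminators = [
    lambda w: 1 if w[0] - w[1] > 10 else -1,
    lambda w: 1 if w[2] - w[3] > 0 else -1,
    lambda w: 1 if w[1] - w[2] > 5 else -1,
]
weights = [0.8, 0.5, 0.3]
abort_thresholds = [-0.5, -0.2, -0.1]

rejected = classify_window([50, 60, 10, 20], weak_discriminators,
                           weights, abort_thresholds)
accepted = classify_window([90, 20, 40, 10], weak_discriminators,
                           weights, abort_thresholds)
```

In the first call the running score falls below the threshold after a single weak discriminator, so the other two are never evaluated. Since most windows in an image are non-objects, this is where most of the computation is saved.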

The discriminator 5 is generated by group learning in the group learning machine 6, using learning samples and following a predetermined algorithm. The group learning method in the group learning machine 6 is described first, followed by the method of discriminating objects in an input image using the discriminator 5 obtained by that group learning.

(2) Group Learning Machine
The group learning machine 6, which performs group learning using a boosting algorithm, combines a plurality of weak discriminators as described above and learns so that a strong determination result is ultimately obtained. Each weak discriminator has an extremely simple configuration and, on its own, has low ability to discriminate faces from non-faces, but combining, for example, several hundred to several thousand of them yields high discrimination ability. The group learning machine 6 uses, for example, several thousand sample images, called learning samples, of objects and non-objects (for example face images and non-face images) that have been labeled with the correct answer in advance. It generates a weak discriminator by selecting (learning) one hypothesis from a large number of learning models (combinations of hypotheses) according to a predetermined learning algorithm, and determines how the weak discriminators are to be combined. A weak discriminator has low discrimination performance by itself, but a discriminator with high discrimination ability can be obtained depending on how the weak discriminators are selected and combined. The group learning machine 6 therefore learns how to combine the weak discriminators, that is, which weak discriminators to select and the weights to use when taking the weighted majority vote of their output values.

Next, the learning method of the group learning machine 6 for obtaining a discriminator that combines many appropriate weak discriminators according to a learning algorithm is described. Before that, two kinds of learned data that are characteristic of this embodiment are explained: the inter-pixel difference feature used to construct the weak discriminators, and the abort threshold used to abort detection partway through the discrimination process (detection process).

(3) Configuration of the Weak Discriminators
In the discriminator 5 of this embodiment, each constituent weak discriminator has an extremely simple configuration: it determines whether or not the input is a face from the difference between the luminance values of two pixels selected from among all the pixels of the image input to it (the inter-pixel difference feature). This speeds up the calculation of the weak discriminators' results in the discrimination process. The image input to a weak discriminator is a learning sample in the learning process and a window image cut out from a scaled image in the discrimination process.

FIG. 5 is a schematic diagram showing an image for explaining the inter-pixel difference feature. In this embodiment, the difference between the luminance values of any two pixels in an image 30, for example the difference between the luminance value I_1 of a pixel 31 and the luminance value I_2 of a pixel 32, given by equation (2) below, is defined as the inter-pixel difference feature.
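The equation itself is not reproduced in this text. From the definition just given, and assuming the feature is written d as in the later discussion of the continuous-valued weak discriminator, equation (2) presumably has the following form:

```latex
d = I_1 - I_2 \tag{2}
```

For 8-bit luminance values, d would range from -255 to 255, and computing it costs two memory reads and one subtraction.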

Which inter-pixel difference features are used for face detection determines the capability of the weak discriminators. It is therefore necessary to select, from among the combinations of any two pixels contained in the cut-out image (each combination is also called a filter or a weak hypothesis), the pairs of pixel positions to be used by the weak discriminators.

In AdaBoost, for example, a weak discriminator is required to produce a deterministic output of +1 (target object) or -1 (non-target object). In AdaBoost, therefore, a weak discriminator can be formed by taking the inter-pixel difference feature at an arbitrary pair of pixel positions and dividing it into two classes (+1 or -1) using one or more thresholds.

In the case of a boosting algorithm such as Real-AdaBoost or Gentle Boost, which outputs not such a binary value but a continuous (real) value representing the probability distribution of the learning samples, the weak discriminator outputs the likelihood (probability) that the input image is an object. The output of a weak discriminator may thus be either deterministic or probabilistic. These two types of weak discriminator are described first.

(3-1) Weak Discriminator with Binary Output
A weak discriminator with deterministic output performs two-class discrimination, object or non-object, according to the value of the inter-pixel difference feature. Let I_1 and I_2 be the luminance values of two pixels in the target image region, and let Th be the threshold for determining from the inter-pixel difference feature whether or not the input is an object. The class to which the input belongs can then be decided by whether or not equation (3) below is satisfied.

To construct a weak discriminator, the two pixel positions and the threshold must be determined. The method for doing so is described later. The threshold determination of equation (3) is the simplest case. Threshold determination can also use two thresholds, as in equation (4) or equation (5) below.
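Equations (3) to (5) are not reproduced in this text. Based on the description of FIGS. 6(a) to 6(c), they plausibly take the forms below. The direction of each inequality, and the assignment of the band test to (4) and the out-of-band test to (5), are assumptions inferred from that description.

```latex
I_1 - I_2 > Th \tag{3}
```

```latex
Th_1 < I_1 - I_2 < Th_2 \tag{4}
```

```latex
I_1 - I_2 < Th_1 \quad \text{or} \quad Th_2 < I_1 - I_2 \tag{5}
```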

FIGS. 6(a) to 6(c) are schematic diagrams, with frequency on the vertical axis and the inter-pixel difference feature on the horizontal axis, that show the three discrimination methods of equations (3) to (5) against characteristic cases of the frequency distribution of the data. In FIGS. 6(a) to 6(c), y_i denotes the output of the weak discriminator. The broken line shows the data for all learning samples with y_i = -1 (non-object), and the solid line shows the data for all learning samples with y_i = 1 (object). Taking the frequency of the same inter-pixel difference feature over learning samples consisting of many face images and non-face images gives the histograms shown in FIGS. 6(a) to 6(c).

As shown in FIG. 6(a), when the histogram is such that the broken-line data for non-objects and the solid-line data for objects both follow similar, roughly normal distributions whose peak positions are offset from each other, the boundary between them is taken as the threshold Th, and whether or not the input is an object can be determined by equation (3). In AdaBoost, for example, the output f(x) of a weak discriminator is f(x) = 1 (object) or f(x) = -1 (non-object). FIG. 6(a) shows an example in which the input is determined to be an object, and the weak discriminator outputs f(x) = 1, when the inter-pixel difference feature is larger than the threshold Th.

When the peaks are at similar positions but the widths of the distributions differ, values near the upper and lower limits of the inter-pixel difference feature of the narrower distribution are used as thresholds, and whether or not the input is an object can be determined by equation (4) or equation (5). FIG. 6(b) shows an example in which the narrower distribution is determined to be the object, and FIG. 6(c) shows an example in which the wider distribution, excluding the range of the narrower one, is determined to be the object. In each case the weak discriminator outputs f(x) = 1 for the object.
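The three threshold tests illustrated in FIGS. 6(a) to 6(c) can be sketched as one small function. The function name, the `mode` strings, and the strictness of the inequalities are illustrative assumptions, not details taken from the source.

```python
def weak_output(i1, i2, mode, th1, th2=None):
    """Binary weak discriminator on the inter-pixel difference d = i1 - i2.

    mode "single":  object when d > th1             (case of FIG. 6(a))
    mode "inside":  object when th1 < d < th2       (case of FIG. 6(b))
    mode "outside": object when d < th1 or d > th2  (case of FIG. 6(c))
    Returns +1 for object, -1 for non-object.
    """
    d = i1 - i2
    if mode == "single":
        is_object = d > th1
    elif mode == "inside":
        is_object = th1 < d < th2
    elif mode == "outside":
        is_object = d < th1 or d > th2
    else:
        raise ValueError(f"unknown mode: {mode}")
    return 1 if is_object else -1

print(weak_output(120, 100, "single", 10))        # d = 20
print(weak_output(105, 100, "inside", -10, 10))   # d = 5
print(weak_output(150, 100, "outside", -10, 10))  # d = 50
```

Whichever mode is used, a weak discriminator is fully described by two pixel positions and one or two thresholds, which keeps both storage and evaluation cost small.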

A weak discriminator is constructed by determining an inter-pixel difference feature and its threshold or thresholds, and the inter-pixel difference feature must be selected so that the error rate of the resulting determination is as small as possible, that is, so that the discrimination rate is high. For example, the threshold can be found by fixing two pixel positions, obtaining histograms such as those in FIG. 6 for the labeled learning samples, and searching for the threshold that gives the highest correct-answer rate and the lowest error rate. The two pixel positions can then be chosen as, for example, the pair whose error rate, obtained together with its threshold, is the smallest. In AdaBoost, however, each learning sample carries a weight (data weight) reflecting how difficult it is to discriminate, and the appropriate inter-pixel difference feature (which two pixel positions supply the luminance values used as the feature) is learned so as to minimize the weighted error rate described later.
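For one fixed pair of pixel positions, the threshold search can be sketched as below. This is a brute-force illustration that assumes the single-threshold form with "d > Th means object"; the function name and the sample data are hypothetical. Repeating the search over candidate pixel pairs and keeping the pair with the lowest weighted error would give the selection described above.

```python
def best_threshold(features, labels, weights):
    """Find the threshold Th minimizing the weighted error of 'd > Th'.

    features: inter-pixel differences d, one per learning sample
    labels:   +1 (object) or -1 (non-object)
    weights:  data weights of the learning samples
    Returns (threshold, weighted_error).
    """
    values = sorted(set(features))
    # Candidates: each observed value, plus one below all of them.
    candidates = [values[0] - 1] + values
    best = None
    for th in candidates:
        err = sum(w for d, y, w in zip(features, labels, weights)
                  if (1 if d > th else -1) != y)
        if best is None or err < best[1]:
            best = (th, err)
    return best

features = [-30, -10, 5, 20, 40]
labels = [-1, -1, -1, 1, 1]
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
threshold, error = best_threshold(features, labels, weights)
```

With uniform weights this reduces to minimizing the plain error rate. Under boosting, the same search automatically favors thresholds that get the heavily weighted, hard samples right.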

(3-2) Weak Discriminator with Continuous-Valued Output
As mentioned above, weak discriminators with probabilistic output include those that output continuous values, as in Real-AdaBoost or Gentle Boost. Unlike the case described above, in which the discrimination problem is solved with a fixed value (threshold) and a binary output (f(x) = 1 or -1) is produced, such a weak discriminator outputs the degree to which the input image is an object, for example as a probability density function.

Such a probabilistic output, indicating the degree (probability) to which the input is a target object, can be the function f(x) given by equation (6) below, where the inter-pixel difference feature d is the input, P_p(x) is the probability density function of the objects among the learning samples, and P_n(x) is the probability density function of the non-objects among the learning samples.

FIG. 7(a) shows a characteristic case of the frequency distribution of the data, with probability density on the vertical axis and the inter-pixel difference feature on the horizontal axis. FIG. 7(b) is a graph of the function f(x) for the data distribution of FIG. 7(a), with the value of f(x) on the vertical axis and the inter-pixel difference feature on the horizontal axis. In FIG. 7(a), the broken line shows the probability density of non-target objects and the solid line shows the probability density of target objects. Obtaining f(x) from equation (6) gives the graph shown in FIG. 7(b). In the discrimination process, the weak discriminator outputs the value of f(x) corresponding to the inter-pixel difference feature d of equation (2) obtained from the input window image. This function f(x) indicates the degree of object-likeness and can, for example, take continuous values from -1 to 1, with -1 representing a non-object and 1 an object. For example, a table of inter-pixel difference features d and the corresponding values of f(x) is stored, and f(x) is read from the table and output according to the input. This requires somewhat more storage than the constant thresholds Th or Th_1 and Th_2, but improves discrimination performance.
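The table lookup could be sketched as follows. Equation (6) is not reproduced in this text, so the sketch assumes, purely for illustration, f = P_p - P_n estimated from normalized histograms, which is one form consistent with the stated range of -1 to 1. The bin count, the feature range of -255 to 255 (8-bit luminance), and all names are likewise assumptions.

```python
def build_table(obj_features, non_features, lo=-255, hi=255, bins=16):
    """Build a lookup table of f over [lo, hi], assuming f = P_p - P_n.

    P_p and P_n are estimated as per-bin fractions of the object and
    non-object samples, so every entry lies between -1 and 1.
    Both sample lists are assumed to be non-empty.
    """
    width = (hi - lo + 1) / bins

    def histogram(values):
        counts = [0] * bins
        for d in values:
            counts[min(int((d - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    p_obj = histogram(obj_features)
    p_non = histogram(non_features)
    return [p - n for p, n in zip(p_obj, p_non)]

def lookup(table, d, lo=-255, hi=255):
    """Read the weak discriminator output for feature value d."""
    width = (hi - lo + 1) / len(table)
    return table[min(int((d - lo) / width), len(table) - 1)]

# Hypothetical learning samples for one pixel pair.
table = build_table(obj_features=[40, 50, 60, 45],
                    non_features=[-40, -50, 50, -60])
likely_object = lookup(table, 55)
likely_non_object = lookup(table, -45)
```

At detection time the weak discriminator still costs one subtraction plus one table read, so the speed advantage of the inter-pixel difference feature is preserved.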

Combining these estimation methods (discrimination methods) during ensemble learning can be expected to improve discrimination performance. Using only a single discrimination method, on the other hand, maximizes execution speed.

The weak discriminators used in this embodiment are distinguished by the fact that, because the feature amount they use (the inter-pixel difference feature) is very simple, they can discriminate objects at extremely high speed, as described above. When faces are detected as the object, very good discrimination results are obtained even with the simplest of the above discrimination methods, the threshold determination of equation (3). Which discrimination method makes the weak discriminators work effectively depends on the problem at hand, however, and the threshold setting method and so on should be chosen accordingly. Depending on the problem, the feature amount may also be a difference among the luminance values of more than two pixels rather than two, or a combination of such differences.

(4) Abort threshold
Next, the abort threshold is described. A group learning machine that uses boosting normally determines whether a window image is an object by a weighted majority vote over the outputs of all weak discriminators constituting the discriminator 5, as described above. The weighted majority vote is computed by adding up the discrimination results (estimates) of the weak discriminators one after another. For example, with the weak discriminators indexed by t (= 1, ..., K), the majority-vote weight (reliability) of each weak discriminator denoted α_t, and its output denoted f_t(x), the weighted majority value F(x) in AdaBoost is given by equation (7) below.
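The equation itself is not reproduced in this text. A reconstruction consistent with the definitions above, assuming the standard AdaBoost weighted sum, would be:

```latex
F(x) = \sum_{t=1}^{K} \alpha_t \, f_t(x) \tag{7}
```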

FIG. 8 is a graph showing how the weighted majority value F(x) changes depending on whether the input image is an object, with the number of weak discriminators on the horizontal axis and the value F(x) of equation (7) on the vertical axis. In FIG. 8, the data D1 to D4 drawn with solid lines were obtained by feeding images labeled as objects (learning samples) to the weak discriminators, computing the estimates f(x) one after another, and accumulating the weighted majority value F(x). As D1 to D4 show, when the input image is an object, F(x) becomes positive once a certain number of weak discriminators have been applied.

This embodiment introduces a technique that differs from the usual boosting algorithm. While the results of the weak discriminators are being accumulated, discrimination is stopped for any window image that can clearly be judged not to be an object, even before all weak discriminators have produced their results. The threshold that decides whether to stop is learned in the learning step. Hereinafter, this threshold is called the abort threshold.

With the abort threshold, whenever a window image can be reliably judged to be a non-object without the outputs of all weak discriminators, the computation of the estimates f(x) can be stopped partway. This greatly reduces the amount of computation compared with a weighted majority vote over all weak discriminators.

The abort threshold can be set to the minimum weighted majority value taken by the learning samples labeled as the detection target. In the discrimination step, the results of the weak discriminators for a window image are weighted and output one after another, so the weighted majority value is updated successively. Each time it is updated, that is, each time one weak discriminator outputs its result, the updated value is compared with the abort threshold. If the updated weighted majority value falls below the abort threshold, the window image is judged not to be an object and the computation is aborted, which eliminates wasted computation and further speeds up discrimination.

That is, the abort threshold R_K for the output f_K(x) of the K-th weak discriminator is the minimum weighted majority value obtained over the learning samples x_j (= x_1 to x_J) that are objects, among the learning samples x_i (= x_1 to x_N), and is defined by equation (8) below.
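Equation (8) is not reproduced in this text. Based on the description (the minimum over the positive samples, capped at 0 for AdaBoost), a plausible reconstruction is:

```latex
R_K = \min\!\left( \sum_{t=1}^{K} \alpha_t f_t(x_1),\ \sum_{t=1}^{K} \alpha_t f_t(x_2),\ \ldots,\ \sum_{t=1}^{K} \alpha_t f_t(x_J),\ 0 \right) \tag{8}
```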

As equation (8) shows, if the minimum weighted majority value over the object learning samples x_1 to x_J exceeds 0, the abort threshold R_K is set to 0. Capping the threshold at 0 applies to AdaBoost, which discriminates with 0 as the threshold; this may differ for other group learning methods. For AdaBoost, the abort threshold is set, as shown by the thick line in FIG. 8, to the minimum over all data D1 to D4 obtained when objects are input, and it is set to 0 when the minimum of all data D1 to D4 exceeds 0.

In this embodiment, an abort threshold R_t (R_1 to R_K) is learned each time a weak discriminator is generated. In the discrimination step described later, the weak discriminators output estimates one after another and the weighted majority value is updated successively, as with data D5 for example; as soon as this value falls below the abort threshold, processing by the subsequent weak discriminators is ended. In other words, learning the abort thresholds R_t makes it possible to decide, each time the estimate of a weak discriminator is computed, whether to compute the next one. A window that is clearly not an object can be rejected without waiting for the results of all weak discriminators, and aborting the computation partway speeds up detection.
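The learning and use of the abort thresholds can be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: the function names, the list-based data layout, and the representation of weak discriminators as callables are all hypothetical.

```python
def learn_abort_thresholds(pos_outputs, alphas):
    """Compute R_1..R_K from positive (object) learning samples.

    pos_outputs[j][t] is the output f_t(x_j) of weak discriminator t on
    positive sample j; alphas[t] is the reliability of discriminator t.
    """
    sums = [0.0] * len(pos_outputs)  # running weighted majority per sample
    thresholds = []
    for t, alpha in enumerate(alphas):
        sums = [s + alpha * outs[t] for s, outs in zip(sums, pos_outputs)]
        # Minimum over the positive samples, capped at 0 (AdaBoost case).
        thresholds.append(min(min(sums), 0.0))
    return thresholds

def evaluate(window, weak_discriminators, alphas, thresholds):
    """Return (is_object, number of weak discriminators actually evaluated)."""
    s = 0.0
    stages = zip(weak_discriminators, alphas, thresholds)
    for t, (f, alpha, r) in enumerate(stages, start=1):
        s += alpha * f(window)
        if s < r:  # below the abort threshold: reject without further work
            return False, t
    return s > 0, len(alphas)
```

Because every positive learning sample stays at or above R_t at each stage, aborting never rejects a sample that the full vote over the training positives would have reached.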

(5) Learning method
Next, the learning method of the group learning machine 6 is described. As a premise for a general two-class pattern recognition problem, such as deciding whether given data is a face, images serving as learning samples (training data) that have been labeled by hand in advance are prepared. The learning samples consist of a group of images cut out from regions of the object to be detected and a group of random images cut out from completely unrelated material such as landscape pictures.

A learning algorithm is applied to these learning samples to generate the learning data used at discrimination time. In this embodiment, the learning data used at discrimination time are the following four kinds, including the learning data described above:
(A) Pairs of pixel positions (K pairs)
(B) Thresholds of the weak discriminators (K values)
(C) Weights of the weighted majority vote (reliabilities of the weak discriminators) (K values)
(D) Abort thresholds (K values)

(5-1) Generation of the discriminator
The algorithm for learning the four kinds of learning data (A) to (D) from a large number of learning samples is described below. FIG. 9 is a flowchart of the learning method of the group learning machine 6. The description here follows an algorithm (AdaBoost) that uses a fixed value as the threshold for weak discrimination, but the learning algorithm is not limited to AdaBoost. Any algorithm that performs group learning to combine multiple weak discriminators may be used, for example Real-AdaBoost, which uses a continuous value indicating the likelihood (probability) of the correct answer.

(Step S0) Labeling of learning samples
As described above, learning samples (x_i, y_i) labeled in advance as object or non-object are prepared.
Here,
Learning samples (x_i, y_i): (x_1, y_1), ..., (x_N, y_N)
x_i ∈ X, y_i ∈ {-1, 1}
X: learning sample data
Y: learning sample labels (correct answers)
N: number of learning samples
That is, x_i is a feature vector consisting of all luminance values of a learning sample image. y_i = -1 indicates that the learning sample is labeled as a non-object, and y_i = 1 indicates that it is labeled as an object.

(Step S1) Initialization of data weights
In boosting, the learning samples are given different weights (data weights), and the data weights of samples that are hard to discriminate are made relatively larger. The discrimination results are used to compute the error rate that evaluates a weak discriminator; because each result is multiplied by the data weight, a weak discriminator that misclassifies harder learning samples is rated lower than its actual discrimination rate. The data weights are updated successively by the method described later, but first they are initialized. Initialization makes the weights of all learning samples equal, as defined by equation (9) below.
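Equation (9) is not reproduced in this text. Since all N weights are equal and the data weights are normally normalized to sum to 1, it is presumably:

```latex
D_{1,i} = \frac{1}{N}, \qquad i = 1, \ldots, N \tag{9}
```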

Here, D_{1,i} is the data weight of learning sample x_i (= x_1 to x_N) at iteration t = 1, and N is the number of learning samples.

(Steps S2 to S7) Iterative processing
Next, the discriminator 5 is generated by repeating steps S2 to S7 below. Let the iteration index be t = 1, 2, ..., K. Each iteration learns one weak discriminator, that is, one pair of pixels and the inter-pixel difference feature at those positions. K iterations therefore generate K weak discriminators, which make up the discriminator 5. Typically, several hundred to several thousand iterations generate several hundred to several thousand weak discriminators, but the number of iterations (the number of weak discriminators) K may be set according to the required discrimination performance and the problem (object) to be discriminated.

(Step S2) Learning of a weak discriminator
Step S2 learns (generates) a weak discriminator; the learning method is described later. In this embodiment, one weak discriminator is generated per iteration.

(Step S3) Calculation of the weighted error rate e_t
Next, the weighted error rate of the weak discriminator generated in step S2 is calculated by equation (10) below.

As equation (10) shows, the weighted error rate e_t is the sum of the data weights of only those learning samples that the weak discriminator misclassifies (f_t(x_i) ≠ y_i). As noted above, misclassifying a learning sample with a large data weight D_{t,i} (a sample that is hard to discriminate) increases e_t. The weighted error rate e_t is less than 0.5; the reason is given later.
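The weighted error rate as described here, the sum of the data weights of the misclassified samples, can be sketched in a few lines. The function name and the list-based arguments are assumptions made for illustration.

```python
def weighted_error(predictions, labels, weights):
    """Sum the data weights D_{t,i} of samples with f_t(x_i) != y_i."""
    return sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
```

With weights that sum to 1, a discriminator that errs only on low-weight (easy) samples gets a small e_t, while an error on a single heavily weighted sample can dominate the result.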

(Step S4) Calculation of the weighted majority weight (reliability of the weak discriminator)
Next, based on the weighted error rate e_t of equation (10), the weight of the weighted majority vote (hereinafter, the reliability) α_t is calculated by equation (11) below. This weight is the reliability α_t of the weak discriminator generated at the t-th iteration.

As equation (11) shows, the smaller the weighted error rate e_t, the larger the reliability α_t of the weak discriminator.
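The reliability equation is not reproduced in this text. Assuming the standard discrete AdaBoost form, which has the stated property of growing as e_t shrinks and is positive whenever e_t < 0.5, it would read:

```latex
\alpha_t = \frac{1}{2} \ln\!\left( \frac{1 - e_t}{e_t} \right) \tag{11}
```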

(Step S5) Update of the data weights of the learning samples
Next, using the reliability α_t obtained from equation (11), the data weights D_{t,i} of the learning samples are updated by equation (12) below. The data weights D_{t,i} are normally normalized to sum to 1, and equation (13) below performs this normalization.
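Equations (12) and (13) are not reproduced in this text. The sketch below assumes the standard AdaBoost update, D_{t+1,i} proportional to D_{t,i} exp(-α_t y_i f_t(x_i)), followed by normalization to a sum of 1; the function name and argument layout are hypothetical.

```python
import math

def update_weights(weights, predictions, labels, alpha):
    """Raise weights of misclassified samples, lower the rest, then normalize."""
    # y * p is +1 for a correct binary prediction and -1 for a wrong one.
    raw = [w * math.exp(-alpha * y * p)
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(raw)  # normalization constant
    return [r / total for r in raw]
```

Under this assumed update, and with α_t = 0.5 ln((1 - e_t) / e_t), the misclassified samples together carry a weight of 0.5 after normalization, so the next weak discriminator has to do better on exactly the samples the current one got wrong.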

(Step S6) Calculation of the abort threshold R_t
Next, as described above, the abort threshold R_t used to abort discrimination in the discrimination step is calculated. Following equation (8), R_t is the smallest of the weighted majority values of the object (positive) learning samples x_1 to x_J and 0. As noted above, setting the abort threshold to the minimum value or 0 applies to AdaBoost, which discriminates with 0 as the threshold. In any case, R_t is set to the largest value that still lets at least all positive learning samples pass.

Then, step S7 determines whether boosting has been performed the predetermined number of times (= K). If not, steps S2 to S7 are repeated; once the predetermined number of iterations is complete, learning ends. The iteration ends once enough weak discriminators have been learned to discriminate the detection target adequately in given images such as the learning samples.

(5-2) Generation of a weak discriminator
Next, the learning (generation) method of the weak discriminator in step S2 is described. Generation differs depending on whether the weak discriminator outputs a binary value or a continuous value as the function f(x) of equation (6). For binary output, the processing also differs slightly between discrimination with a single threshold Th, as in equation (2), and discrimination with two thresholds Th_1 and Th_2. Here, the learning (generation) method for a weak discriminator that outputs a binary value with a single threshold Th is described. FIG. 10 is a flowchart of this method.

(Step S11) Selection of pixels
Two pixels are selected arbitrarily from all pixels of the learning sample. For example, with 20 × 20 pixel learning samples there are 400 × 399 ways to select the two pixels, and one of them is chosen. Let the positions of the two pixels be S_1 and S_2 and their luminance values be I_1 and I_2.

(Step S12) Creation of the frequency distribution
Next, for all learning samples, the inter-pixel difference feature d, the difference (I_1 - I_2) between the luminance values of the two pixels selected in step S11, is computed, and a histogram (frequency distribution) like that of FIG. 6A is obtained.

(Step S13) Calculation of the threshold Th_min
Then, from the frequency distribution obtained in step S12, the threshold Th_min that minimizes the weighted error rate e_t of equation (10) (giving e_min) is found.
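The search for Th_min can be sketched as a scan over candidate thresholds. This is an illustrative sketch that works directly on the per-sample differences instead of a binned histogram, and it assumes the polarity "object if d > Th" for the single-threshold test; the function name and the choice of candidate thresholds are assumptions.

```python
def best_threshold(diffs, labels, weights):
    """Return (Th_min, e_min): the threshold with the lowest weighted error."""
    values = sorted(set(diffs))
    # One candidate below all observed differences, then each observed value.
    candidates = [values[0] - 1] + values
    best = None
    for th in candidates:
        # Weighted error of the rule "predict +1 if d > th, else -1".
        e = sum(w for d, y, w in zip(diffs, labels, weights)
                if (1 if d > th else -1) != y)
        if best is None or e < best[1]:
            best = (th, e)
    return best
```

The same scan with the comparison `e > best[1]` would give Th_max and e_max for step S14.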

(Step S14) Calculation of the threshold Th_max
Further, the threshold Th_max that maximizes the weighted error rate e_t of equation (10) (giving e_max) is found, and the threshold is inverted by the method of (14) below. The weak discriminator outputs one of two values, correct or incorrect, according to whether the feature exceeds the single threshold Th. Inverting it therefore turns an error rate above 0.5 into one below 0.5, and vice versa.

(Step S15) Parameter determination
Finally, from e_min and e_max' above, the parameters of the weak discriminator, namely the positions S_1 and S_2 of the two pixels and the threshold Th, are determined:
When e_min < e_max': S_1, S_2, Th_min
When e_min > e_max': S_1' (= S_2), S_2' (= S_1), Th_min
Then, step S16 determines whether the processing has been repeated the predetermined number of times M. If so, the process proceeds to step S17, where the candidate with the smallest error rate e_t among the weak discriminators generated in the M iterations is adopted as the weak discriminator, and then to step S3 of FIG. 9. If the predetermined number has not been reached in step S16, steps S11 to S16 are repeated. Thus, m (= 1, 2, ..., M) iterations are performed to generate one weak discriminator. For convenience, the weighted error rate e_t was described as being calculated in step S3 of FIG. 9, but the error rate e_t of step S3 is obtained automatically when the weak discriminator with the smallest error rate is selected in step S17.

In this embodiment, the data weights D_{t,i} obtained in step S5 of the previous iteration are used to learn the features of multiple weak discriminators, and one weak discriminator is generated by selecting, from these candidates, the one with the smallest error rate of equation (10). In step S2, however, a weak discriminator may instead be generated by selecting an arbitrary pixel position from multiple pixel positions prepared or learned in advance. A weak discriminator may also be generated using learning samples different from those used in the iteration of steps S2 to S7. Further, as in cross-validation or the jack-knife method, samples separate from the learning samples may be prepared to evaluate the generated weak discriminators and discriminator. Cross-validation here means dividing the learning samples equally into I parts, learning with all parts but one, evaluating the learning result with the remaining part, and repeating this I times.

On the other hand, when the weak discriminator has two thresholds Th_1 and Th_2, as in equation (4) or (5), the processing of steps S13 to S15 in FIG. 10 differs slightly. With a single threshold Th, as in equation (3), an error rate greater than 0.5 could be inverted by inverting the threshold. When, as in equation (4), the correct discrimination result is the case where the inter-pixel difference feature is greater than Th_2 and smaller than Th_1, inverting it gives, as in equation (5), the case where the feature is smaller than Th_2 or greater than Th_1. That is, the inverse of equation (4) is equation (5), and the inverse of equation (5) is equation (4).

When the weak discriminator outputs its result using two thresholds Th_1 and Th_2, the frequency distribution of the inter-pixel difference feature is obtained in step S12 of FIG. 10, and the thresholds Th_1 and Th_2 that minimize the error rate e_t are found. Then, as in step S16, it is determined whether the processing has been repeated the predetermined number of times, and afterwards the weak discriminator with the smallest error rate among those generated is adopted.

For a weak discriminator that outputs a continuous value, as in equation (6), two pixels are first selected at random, as in step S11 of FIG. 10. The frequency distribution over all learning samples is then obtained, as in step S12, and the function f(x) of equation (6) is derived from it. The error rate is then calculated according to a predetermined learning algorithm whose weak discriminator output is the degree of being an object (the degree of correctness). This series of steps is repeated a predetermined number of times, and the weak discriminator is generated by selecting the parameters with the smallest error rate (highest correct-answer rate).

In generating any of these weak discriminators, with 20 × 20 pixel learning samples for example, there are roughly 159,000 ways (400 × 399 = 159,600) to select the two pixels, so the above iteration can be run up to about M = 159,000 times and the candidate with the smallest error rate adopted as the weak discriminator. Running the maximum number of iterations, that is, generating every possible weak discriminator and adopting the one with the smallest error rate, yields a high-performance weak discriminator. Alternatively, the iteration may be run fewer times than the maximum, for example several hundred times, and the candidate with the smallest error rate among those tried adopted.
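The size of the candidate space, and the cheaper option of trying only M randomly chosen pixel pairs, can be sketched as follows. The function names, the fixed seed, and the (row, column) representation of pixel positions are assumptions for illustration.

```python
import random

def pair_count(width, height):
    """Number of ordered pairs (S1, S2) of distinct pixels in the window."""
    n = width * height
    return n * (n - 1)

def sample_pairs(width, height, m, seed=0):
    """Draw m candidate pixel pairs; the two pixels of a pair are distinct."""
    rng = random.Random(seed)
    n = width * height
    pairs = []
    for _ in range(m):
        s1, s2 = rng.sample(range(n), 2)  # two different pixel indices
        # Convert flat indices to (row, column) positions.
        pairs.append((divmod(s1, width), divmod(s2, width)))
    return pairs
```

Ordered pairs are counted because swapping S_1 and S_2 flips the sign of the difference I_1 - I_2, which corresponds to the inverted discriminator of step S14.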

(6) Object detection method
Next, the object detection method of the object detection apparatus of FIG. 1 is described. FIG. 11 is a flowchart of the object detection method. At detection time (the discrimination step), the discriminator 5, built from the group of weak discriminators generated as described above, detects the target object in an image according to a predetermined algorithm.

(Step S21) Generation of scaled images
First, the scaling unit 3 of FIG. 1 reduces the grayscale image supplied from the image output unit 2 at a fixed ratio. A grayscale image may be input to the image output unit 2 as the input image, or the image output unit 2 may convert the input image to grayscale. The scaling unit 3 first outputs the image supplied from the image output unit 2 without scale conversion, and from the next timing onward outputs reduced images; all images output by the scaling unit 3 are collectively called scaled images. A new scaled image is generated when face detection over the entire area of the previously output scaled image has finished, and when the scaled image becomes smaller than the window image, processing moves on to the input image of the next frame.

(Step S22)
The scanning unit 4 of FIG. 1 scans the position of the search window vertically and horizontally over the scaled image and outputs window images.
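Steps S21 and S22 together amount to scanning a fixed-size window over a pyramid of progressively reduced images. The sketch below only enumerates image sizes and window positions; the 20-pixel window, the reduction ratio of 4/5, and the one-pixel step are assumptions, since the text only specifies reduction "at a fixed ratio".

```python
def pyramid_sizes(width, height, window=20, num=4, den=5):
    """List scaled-image sizes, starting unscaled, until smaller than the window."""
    sizes = []
    w, h = width, height
    while w >= window and h >= window:
        sizes.append((w, h))
        # Integer arithmetic keeps the reduction by num/den exact.
        w, h = w * num // den, h * num // den
    return sizes

def window_positions(width, height, window=20, step=1):
    """Yield the (left, top) corner of every search window, row by row."""
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield left, top
```

Because the window size is fixed and only the image shrinks, a window on a smaller scaled image corresponds to a larger region of the original input, which is how objects of different sizes are covered.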

(Steps S23, S24) Calculation of the evaluation value s
Then, it is determined whether each window image output by the scanning unit 4 is an object. The discriminator 5 computes, as the evaluation value s, the value obtained by successively adding the weighted estimates f(x) of the weak discriminators for the window image (the updated weighted majority value). Based on s, it determines whether the window image is an object and whether to abort discrimination.

First, when a window image is input, the evaluation value is initialized to s = 0. The first-stage weak discriminator 21_1 of the discriminator 5 calculates the inter-pixel difference feature d_t (step S23). The estimate output by the weak discriminator 21_1 is then reflected in the evaluation value s (step S24).

Here, how the estimate is reflected in the evaluation value s differs between a weak discriminator that outputs a binary estimate according to equations (3) to (5) and one that outputs the function f(x) of equation (6) as the estimate.

First, when equation (2) above is used for the weak discriminator and a binary value is output as the estimated value, the evaluation value s is given by equation (15) below.

When equation (3) above is used for the weak discriminator and a binary value is output as the estimated value, the evaluation value s is given by equation (16) below.

When equation (4) above is used for the weak discriminator and a binary value is output as the estimated value, the evaluation value s is given by equation (17) below.

When equation (5) above is used for the weak discriminator and the function f is output as the estimated value, the evaluation value s is given by equation (18) below.
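The images for equations (15) to (18) are not reproduced in this text. As a rough sketch only, and assuming that a binary weak discriminator outputs +1 or -1 and that \alpha_t denotes the reliability obtained for the t-th weak discriminator during learning (both the symbol and the ±1 coding are assumptions made here, not taken from the missing equations), the sequential update of the evaluation value might take the following form. Only the single-threshold case is shown for the binary type; the function-output type is assumed to add its output directly.

```latex
% Binary weak discriminator with one threshold Th_t (assumed +1/-1 output)
s \leftarrow s + \alpha_t f_t(x), \qquad
f_t(x) =
\begin{cases}
  +1 & \text{if } d_t > Th_t \\
  -1 & \text{otherwise}
\end{cases}

% Weak discriminator that outputs a function value (assumed form)
s \leftarrow s + f_t(d_t)
```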

(Steps S25 and S26) Abort determination
The discriminator 5 then determines whether the (updated) evaluation value s, obtained by one of the methods described above (for example, one of the four), is larger than the abort threshold R_t. If the evaluation value s is larger than the abort threshold R_t, it determines whether the processing has been repeated a predetermined number of times (= K times) (step S26); if not, the processing from step S23 is repeated.

On the other hand, if the processing has been repeated the predetermined number of times (= K times), or if the evaluation value s is smaller than the abort threshold R_t, processing proceeds to step S27, where whether the window image is an object is determined according to whether the obtained evaluation value s is larger than 0. If it is an object, the current window position is stored. It is then determined whether there is a next search window (step S27); if there is, the processing from step S22 is repeated. When the search window has been scanned over all regions, processing proceeds to step S28, where it is determined whether there is a next scaled image; if there is none, processing proceeds to step S29 and the overlapping-region removal is executed. If there is a next scaled image, the processing from step S21 is repeated. The scaling in step S21 ends when the scaled image becomes smaller than the window image.
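The per-window loop of steps S23 to S27 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the tuple layout of a weak discriminator, the +alpha / -alpha contribution of a binary weak discriminator, and the function name `evaluate_window` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout of one learned weak discriminator:
# (pixel S1 as (row, col), pixel S2 as (row, col), threshold, reliability, abort threshold R_t)
WeakDiscriminator = Tuple[Tuple[int, int], Tuple[int, int], float, float, float]


def evaluate_window(window: Sequence[Sequence[int]],
                    discriminators: List[WeakDiscriminator]) -> Tuple[bool, float, int]:
    """Return (is_object, evaluation value s, number of weak discriminators used)."""
    s = 0.0      # evaluation value, initialized to 0 for each window
    used = 0
    for (r1, c1), (r2, c2), threshold, alpha, abort_threshold in discriminators:
        d = window[r1][c1] - window[r2][c2]        # step S23: inter-pixel difference
        s += alpha if d > threshold else -alpha    # step S24: assumed +/- alpha vote
        used += 1
        if s < abort_threshold:                    # step S25: abort when s falls below R_t
            break                                  # (handling of s == R_t is an assumption)
    return s > 0, s, used                          # step S27: object if s > 0
```

With two weak discriminators, a window that fails the first test badly enough is rejected after a single pixel difference, which is where the speed-up for non-object windows comes from.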

(Steps S29 to S31) Removal of overlapping regions
When all scaled images have been processed for one input image, processing moves to step S29. In step S29 and thereafter, if regions determined to be objects overlap in one input image, the mutually overlapping regions are removed. First, it is determined whether there are mutually overlapping regions; if multiple regions were stored in step S26 and they overlap, processing proceeds to step S30. Two mutually overlapping regions are then taken out; of the two, the region with the smaller evaluation value s is regarded as less reliable and is deleted, and the region with the larger evaluation value s is selected (step S29). The processing from step S29 is then repeated. As a result, among regions that were extracted multiple times in overlap, only the one region with the highest evaluation value s is selected. If no two or more object regions overlap, or if no object region exists, processing for the one input image ends and moves on to the next frame.
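The pairwise removal described above can be sketched as follows. This is an illustrative sketch under assumptions: detections are represented as hypothetical `(x, y, size, s)` tuples for square windows, and two regions are treated as overlapping when their squares intersect; the text does not specify the overlap test.

```python
from typing import List, Tuple

# Hypothetical detection record: top-left x, top-left y, side length, evaluation value s
Detection = Tuple[int, int, int, float]


def overlaps(a: Detection, b: Detection) -> bool:
    """True if the two square regions intersect (assumed overlap criterion)."""
    ax, ay, asize, _ = a
    bx, by, bsize, _ = b
    return ax < bx + bsize and bx < ax + asize and ay < by + bsize and by < ay + asize


def remove_overlaps(detections: List[Detection]) -> List[Detection]:
    """Repeatedly take one overlapping pair and delete the region with the smaller s."""
    kept = list(detections)
    changed = True
    while changed:                      # repeat until no overlapping pair remains
        changed = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                if overlaps(kept[i], kept[j]):
                    loser = i if kept[i][3] < kept[j][3] else j
                    del kept[loser]     # the lower evaluation value is less reliable
                    changed = True
                    break
            if changed:
                break
    return kept
```

Regions that do not overlap anything are left untouched, so several separate faces in one image each survive with their own highest-scoring window.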

As described above, the object detection method of this embodiment detects an object using a discriminator obtained by group learning of weak discriminators that discriminate weakly on the basis of inter-pixel difference features. The feature-amount calculation in step S23 is therefore completed merely by reading the luminance values of the two corresponding pixels in the window image and calculating their difference, so face detection can be processed at very high speed and in real time. In addition, each time the evaluation value s is updated by adding the product of the discrimination result (estimated value) obtained from the feature amount and the reliability of the weak discriminator used for the discrimination, it is compared with the abort threshold R_t to determine whether to continue calculating the estimated values of the weak discriminators. When the evaluation value s falls below the abort threshold R_t, the calculation by the weak discriminators is aborted and processing moves on to the next window image, which dramatically reduces wasted computation and makes face detection even faster. That is, when window images are cut out by scanning every region of the input image and of the scaled images obtained by reducing it, the probability that any given window image is an object is small, and most window images are non-objects. Aborting the discrimination of these non-object window images midway makes the discrimination process far more efficient. Conversely, when many objects to be detected are included, a threshold for aborting midway the calculation for window images that are clearly objects may also be provided, by the same technique as for the abort threshold described above. Furthermore, by scaling the input image in the scaling unit, a search window of arbitrary size can be set and an object of arbitrary size can be detected.
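The scanning of a fixed-size window over successively reduced images can be sketched as a generator of window positions. This is a sketch under stated assumptions: the reduction factor `scale_step`, the `stride`, and the function name `scan_windows` are hypothetical parameters chosen for illustration; only the fixed window size and the stopping rule (stop once the scaled image is smaller than the window) come from the description.

```python
from typing import Iterator, Tuple


def scan_windows(width: int, height: int, window: int = 20,
                 scale_step: float = 0.8, stride: int = 1) -> Iterator[Tuple[float, int, int]]:
    """Yield (scale, x, y) for every window position on every scaled image."""
    scale = 1.0
    while True:
        w = int(width * scale)      # size of the scaled image at this level
        h = int(height * scale)
        if w < window or h < window:
            break                   # scaled image is smaller than the window: stop
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                yield scale, x, y   # top-left corner of one window image
        scale *= scale_step         # move to the next, smaller scaled image
```

Because the window stays the same size while the image shrinks, a window at a smaller scale covers a proportionally larger area of the original input image.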

(7) Example
Next, an example of the present invention in which faces were actually detected as the object will be described. The object is not limited to faces. Any object can of course be detected, for example a logotype, a pattern, or an image of an object other than a human face, provided that it has features in a two-dimensional plane and can be discriminated to some extent by the inter-pixel difference feature described above (that is, a weak discriminator can be constructed for it).

FIGS. 12(a) and 12(b) show part of the learning samples of this example. The learning samples consist of the group of face images shown in FIG. 12(a), labeled as objects, and the group of non-face images shown in FIG. 12(b), labeled as non-objects. FIGS. 12(a) and 12(b) show only part of the images used; the learning samples comprise, for example, several thousand face images and several tens of thousands of non-face images. The image size is, for example, 20 × 20 pixels.

In this example, a face discrimination problem using only equation (3) above is learned from these learning samples according to the algorithms shown in FIGS. 9 and 10. The first to sixth weak discriminators generated by this learning are shown in FIGS. 13(a) to 13(f), respectively. They appear to capture facial features well. Qualitatively, the weak discriminator f_1 in FIG. 13(a) indicates that the forehead (S_1) is brighter than the eye (S_2) (threshold: 18.5), and the weak discriminator f_2 in FIG. 13(b) indicates that the cheek (S_1) is brighter than the eye (S_2) (threshold: 17.5). The weak discriminator f_3 in FIG. 13(c) indicates that the forehead (S_1) is brighter than the hair (S_2) (threshold: 26.5), and the weak discriminator f_4 in FIG. 13(d) indicates that the area under the nose (S_1) is brighter than the nostril (S_2) (threshold: 5.5). Further, the weak discriminator f_5 in FIG. 13(e) indicates that the cheek (S_1) is brighter than the hair (S_2) (threshold: 22.5), and the weak discriminator f_6 in FIG. 13(f) indicates that the chin (S_1) is brighter than the lips (S_2) (threshold: 4.5).
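A single-threshold weak discriminator of the kind described here reduces to one subtraction and one comparison. The sketch below is illustrative only: the pixel coordinates are made up (the text gives thresholds but not positions), and the +1 / -1 output coding and the name `weak_discriminate` are assumptions.

```python
from typing import Sequence, Tuple


def weak_discriminate(window: Sequence[Sequence[int]],
                      s1: Tuple[int, int], s2: Tuple[int, int],
                      threshold: float) -> int:
    """Return +1 if pixel S1 is brighter than pixel S2 by more than the threshold, else -1."""
    (r1, c1), (r2, c2) = s1, s2
    d = window[r1][c1] - window[r2][c2]   # inter-pixel difference feature
    return 1 if d > threshold else -1     # assumed binary coding of the estimate


# Toy 3x3 "window": a bright forehead pixel above a darker eye pixel (hypothetical positions).
toy = [[90, 200, 90],
       [120, 90, 90],
       [90, 90, 90]]
forehead, eye = (0, 1), (1, 0)
vote = weak_discriminate(toy, forehead, eye, 18.5)  # threshold of f_1 from the text
```

Here the difference is 200 - 120 = 80, which exceeds 18.5, so this toy window receives a positive vote from the f_1-style test.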

In this example, the first weak discriminator f_1 alone achieves a correct-answer rate of 70% (performance on the learning samples), and using all of the weak discriminators f_1 to f_6 raises the rate to 80%. Combining 40 weak discriminators reached a correct-answer rate of 90%, and combining 765 weak discriminators reached 99%.

FIG. 14 shows face detection results obtained from one input image; (a) and (b) show the results before and after the overlapping regions are removed, respectively. The frames shown in FIG. 14(a) are the detected faces (objects); a plurality of faces (regions) are detected from a single image by the processing of steps S21 to S28 shown in FIG. 11. By applying the overlapping-region removal of steps S29 to S31, these can be detected as a single face. When multiple faces are present in the image, they can be detected simultaneously. As described above, the face detection processing of this example is very fast: even on an ordinary PC or the like, faces can be detected from about 30 input images per second, so face detection from moving images is also possible.

The present invention is not limited to the embodiment described above, and various modifications are of course possible without departing from the gist of the present invention.

A functional block diagram showing the processing functions of the object detection device in an embodiment of the present invention.
A schematic diagram showing images scale-converted by the scaling unit of the object detection device.
A diagram showing how the scanning unit of the object detection device scans the search window.
A schematic diagram showing the configuration of the discriminator in the object detection device.
A schematic diagram showing an image for explaining the inter-pixel difference feature.
(a) to (c) are schematic diagrams, with frequency on the vertical axis and the inter-pixel difference feature on the horizontal axis, showing the three discrimination methods of equations (3) to (5) above, each matched to a characteristic case of the frequency distribution of the data.
(a) is a diagram showing a characteristic case of the frequency distribution of the data, with probability density on the vertical axis and the inter-pixel difference feature on the horizontal axis; (b) is a graph showing the function f(x) for the data distribution of (a), with the value of f(x) on the vertical axis and the inter-pixel difference feature on the horizontal axis.
A graph, with the number of weak discriminators on the horizontal axis and the weighted majority value F(x) on the vertical axis, showing how F(x) changes depending on whether the input image is an object.
A flowchart showing the learning method of the group learning machine for obtaining the discriminator of the object detection device.
A flowchart showing the learning method (generation method) of a weak discriminator that produces a binary output with one threshold Th.
A flowchart showing the object detection method in the object detection device.
(a) and (b) are diagrams showing part of the learning samples used in the example of the present invention: the group of face images labeled as objects and the group of non-face images labeled as non-objects, respectively.
(a) to (f) are diagrams explaining the first to sixth weak discriminators generated by learning in the group learning machine.
A diagram showing face detection results obtained from one input image; (a) and (b) show the results before and after the overlapping regions are removed, respectively.
A schematic diagram showing the rectangle features described in Patent Document 1.
A diagram explaining the method of discriminating a face image using the rectangle features described in Patent Document 1.
A schematic diagram showing the integral image described in Patent Document 1.
A diagram explaining the method of calculating the sum of the luminance values of a rectangular region using the integral image described in Patent Document 1.

Explanation of symbols

1 object detection device, 2 image input unit, 3 scaling unit, 4 scanning unit, 5 discriminator, 6 group learning machine, 21_n weak discriminator, 22 adder, f_1 to f_6 weak discriminators

Claims

In an object detection device for detecting whether a given grayscale image is an object,
A plurality of weak discriminating means for calculating an estimated value indicating whether or not the grayscale image is an object based on a feature amount consisting of a difference between luminance values of pixels at two positions learned in advance;
An object detection apparatus comprising: a determination unit that determines whether the grayscale image is an object based on the estimated value calculated by at least one of the plurality of weak determination units.

The object detection device according to claim 1, wherein the discriminating means calculates a weighted majority value by multiplying each estimated value by the reliability obtained by the learning for the corresponding weak discriminating means and adding the results, and determines whether the grayscale image is an object based on the weighted majority value.

The plurality of weak discrimination means sequentially calculate the estimated value,
The discriminating means sequentially updates the weighted majority value every time the estimated value is calculated, and controls whether or not to stop the calculation of the estimated value based on the updated weighted majority value. The object detection device according to claim 2.

The object detection device according to claim 3, wherein
the discriminating means aborts the calculation of the estimated value depending on whether the weighted majority value is smaller than an abort threshold,
each of the weak discriminating means is sequentially generated by group learning using learning samples consisting of a plurality of grayscale images labeled with the correct answer as to whether each is an object or a non-object, and
the abort threshold is the minimum of the weighted majority values that are updated, each time a weak discriminating means is generated during the learning, by weighting with the reliability and adding the estimated values that the generated weak discriminating means calculates for the learning samples that are objects.

The object detection device according to claim 4, wherein the abort threshold is set to 0 when the minimum of the weighted majority values during learning is positive.

The object detection device according to claim 1, wherein the weak discriminating means calculates a binary estimated value indicating whether or not the grayscale image is an object according to whether or not the feature amount of the grayscale image is equal to or greater than a predetermined threshold value.

The object detection apparatus according to claim 1, wherein the weak determination unit calculates, as the estimated value, a probability that the grayscale image is an object based on the feature amount.

In an object detection method for detecting whether a given grayscale image is an object,
A weak discrimination step of calculating an estimated value indicating whether or not the grayscale image is an object based on a feature amount consisting of a difference between luminance values of pixels at two positions learned in advance by a plurality of weak discriminators;
And a discrimination step of discriminating whether or not the grayscale image is an object based on the estimated value calculated by at least one of a plurality of weak classifiers.

The object detection method according to claim 8, wherein, in the discrimination step, a weighted majority value is calculated by multiplying each estimated value by the reliability obtained by the learning for the corresponding weak classifier and adding the results, and whether the grayscale image is an object is determined based on the weighted majority value.

In the weak discrimination step, the estimated value is sequentially calculated by the plurality of weak discriminators,
In the determining step, each time the estimated value is calculated, the weighted majority value is sequentially updated,
The object detection method according to claim 9, further comprising: an abort control step of controlling whether or not to cancel the calculation of the estimated value based on the updated weighted majority value in the determination step.

A group learning device that performs group learning using learning samples consisting of a plurality of grayscale images labeled with the correct answer as to whether each is an object or a non-object,
the device comprising learning means for group-learning, using the learning samples, a plurality of weak discriminators each of which takes as a feature quantity the difference between the luminance values of two pixels at arbitrary positions and outputs an estimated value indicating whether a grayscale image given as input is an object.

The group learning device according to claim 11, wherein the learning means comprises:
weak discriminator generating means for calculating the feature quantity of each learning sample and generating the weak discriminator based on the feature quantities;
error rate calculating means for calculating, for the weak discriminator generated by the weak discriminator generating means, an error rate in discriminating the learning samples, based on the data weight set for each learning sample;
reliability calculating means for calculating a reliability for the weak discriminator based on the error rate; and
data weight calculating means for updating the data weights so that the weights of learning samples that the weak discriminator answers incorrectly become relatively larger,
and wherein the weak discriminator generating means generates a new weak discriminator when the data weights are updated.

The weak discriminator generating means repeats the process of calculating the feature quantity a plurality of times to calculate a plurality of types of feature quantities, generates a weak discriminator candidate for each feature quantity, calculates for each generated candidate an error rate in discriminating the learning samples based on the data weight set for each learning sample, and uses the candidate with the smallest error rate as the weak discriminator. 13. The group learning device described.

The group learning device according to claim 12, wherein the weak discriminator generating means generates a weak discriminator candidate that determines whether or not the grayscale image is an object according to whether or not the feature amount of the grayscale image is equal to or greater than a predetermined threshold value.

The group learning device according to claim 12, wherein the weak discriminator generating means generates a weak discriminator candidate that outputs, based on the feature amount, a probability that the grayscale image is an object.

Each time the weak discriminator generating means generates the weak discriminator, the weak discriminator calculates the estimated value for each learning sample that is the object, and weights and adds the reliability to the estimated value. The group learning device according to claim 12, further comprising: an abort threshold value storage unit that calculates a weighted majority value and stores the minimum value.

A group learning method for performing group learning using learning samples consisting of a plurality of grayscale images labeled with the correct answer as to whether each is an object or a non-object,
the method comprising a learning step of group-learning, using the learning samples, a plurality of weak discriminators each of which takes as a feature quantity the difference between the luminance values of two pixels at arbitrary positions and outputs an estimated value indicating whether a grayscale image given as input is an object.

The group learning method according to claim 17, wherein the learning step repeats a series of steps comprising: a weak discriminator generation step of calculating the feature amount of each learning sample and generating the weak discriminator based on the feature amounts; an error rate calculation step of calculating, for the weak discriminator generated in the weak discriminator generation step, an error rate in discriminating the learning samples, based on the data weight set for each learning sample; a reliability calculation step of calculating a reliability for the weak discriminator based on the error rate; and a data weight calculation step of updating the data weights so that the weights of learning samples that the weak discriminator answers incorrectly become relatively larger.

The group learning method according to claim 18, wherein, in the weak discriminator generation step, the process of calculating the feature quantity is repeated a plurality of times to calculate a plurality of types of feature quantities, a weak discriminator candidate is generated for each feature quantity, an error rate in discriminating the learning samples is calculated for each generated candidate based on the data weight set for each learning sample, and the candidate with the smallest error rate is used as the weak discriminator.

The group learning method according to claim 18, wherein, in the weak discriminator generation step, a weak discriminator candidate is generated that determines whether or not the grayscale image is an object according to whether or not the feature amount of the grayscale image is equal to or greater than a predetermined threshold value.

The group learning method according to claim 18, wherein, in the weak discriminator generation step, a weak discriminator candidate is generated that outputs, based on the feature amount, a probability that the grayscale image is an object.

Each time the weak discriminator is generated in the weak discriminator generation step, the weak discriminator calculates the estimated value for each learning sample that is the object, and adds the reliability to the estimated value by weighting. The group learning method according to claim 18, further comprising: an abort threshold value storing step of calculating a weighted majority value and storing the minimum value.

In an object detection device that cuts out a fixed-size window image from a grayscale image and detects whether the window image is an object,
Scale conversion means for generating a scale image obtained by enlarging or reducing the size of the input grayscale image;
Window image scanning means for scanning the window of the fixed size from the scale image and cutting out the window image;
Object detection means for detecting whether a given window image is an object or not,
The object detection means includes a plurality of weak discrimination means for calculating an estimated value for estimating whether or not the window image is an object based on a feature amount consisting of a difference between luminance values of pixels at two positions learned in advance. An object detection apparatus comprising: determination means for determining whether or not the window image is an object based on the estimated value calculated by at least one of a plurality of weak determination means.

The object detection device according to claim 23, wherein the discriminating means calculates a weighted majority value by multiplying each estimated value by the reliability obtained by the learning for the corresponding weak discriminating means and adding the results, and determines whether the grayscale image is an object based on the weighted majority value.

The plurality of weak discrimination means sequentially calculate the estimated value,
The discriminating means sequentially updates the weighted majority value every time the estimated value is calculated, and controls whether or not to stop the calculation of the estimated value based on the updated weighted majority value. 25. The object detection device according to claim 24.

In an object detection method for cutting out a fixed-size window image from a grayscale image and detecting whether the window image is an object,
A scale conversion step for generating a scale image obtained by enlarging or reducing the size of the input grayscale image;
A window image scanning step of scanning the fixed-size window from the scale image and cutting out the window image;
An object detection step of detecting whether or not a given window image is an object,
The object detection step includes
A weak discrimination step of calculating an estimated value indicating whether or not the grayscale image is an object based on a feature amount consisting of a difference between luminance values of pixels at two positions learned in advance by a plurality of weak discriminators;
An object detection method comprising: a determination step of determining whether the grayscale image is an object based on the estimated value calculated by at least one of a plurality of weak classifiers.

The object detection method according to claim 26, wherein, in the discrimination step, a weighted majority value is calculated by multiplying each estimated value by the reliability obtained by the learning for the corresponding weak classifier and adding the results, and whether the grayscale image is an object is determined based on the weighted majority value.

In the weak discrimination step, the estimated value is sequentially calculated by the plurality of weak discriminators,
In the determining step, each time the estimated value is calculated, the weighted majority value is sequentially updated,
28. The object detection method according to claim 27, further comprising an abort control step of controlling whether or not to cancel the calculation of the estimated value based on the updated weighted majority value in the discrimination step.