JP2009059047A

JP2009059047A - Device, method and program for detecting object

Info

Publication number: JP2009059047A
Application number: JP2007224002A
Authority: JP
Inventors: Yoko Mitsugi; 洋子三次
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2007-08-30
Filing date: 2007-08-30
Publication date: 2009-03-19

Abstract

PROBLEM TO BE SOLVED: To enable high-speed processing with a small amount of calculation and to detect an object in a digital image with high accuracy.

SOLUTION: An object detection device includes: a first detection unit 60 that detects, from input digital image data, a plurality of image regions likely to contain an object of interest; a grouping processing unit 70 that, where a plurality of image regions have been detected redundantly for a single object, extracts from them the image region most likely to be that object; and a second detection unit 80 that detects the image region of the object with high accuracy from the image regions selected by the grouping processing unit 70.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an object detection device, an object detection method, and an object detection program for detecting an object of interest in a digital image.

In the field of digital image processing, various techniques have been proposed for detecting an object of interest in a digital image.

When these techniques are used to detect, for example, a human face as the object of interest, the approaches include using the shape features of facial parts such as the eyes, nose, mouth, or contour; using the intensity (light-dark) features of the face or color features such as skin color; or combining these.

One such detection technique is called boosting. Boosting combines many classifiers with low detection accuracy (weak classifiers) and weights each weak classifier so as to minimize the error, thereby forming a single high-accuracy classifier that performs the object detection.

A classifier built from many weak classifiers in this way can detect an object at high speed using learning data generated from the intensity features of the object.

One example of a technique that detects objects using boosting is the face detector described in Non-Patent Document 1.

The face detector of Non-Patent Document 1 is composed of a plurality of weak classifiers. Using AdaBoost, a boosting technique, it is trained to extract, from rectangular data of various shapes and sizes that represent intensity patterns such as Haar-like features, the features that express the light and dark of facial luminance effectively and with little error. It thereby detects the face region, which is the object, in a digital image.

FIG. 5 shows the configuration of the face detector described in Non-Patent Document 1.

As shown in FIG. 5, the face detector 100 divides a plurality of weak classifiers into m classifiers (first classifier 100-1 to m-th classifier 100-m) and is configured by connecting these m classifiers, each consisting of a plurality of weak classifiers, in a cascade.

To detect an object in a digital image with the face detector 100, the luminance component (luminance image) of the digital image to be examined is first input, and an integral image is created.

The integral image is generated by, for example, adding each pixel's own luminance value to the sum of the luminance values of the pixel one above it and the pixel one to its left, proceeding in order from the lower left of the image. In the resulting image, the pixel value at any position is the sum of the luminance values of the rectangular region to its lower left.

Once the integral image has been generated, the sum of the luminance values within any desired rectangular region of the image can be computed at high speed merely by adding and subtracting the pixel values at the four corners of that region.
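
The four-corner lookup can be sketched as follows. This is a minimal illustration rather than the implementation of Non-Patent Document 1: it assumes a top-left origin (the text above describes accumulation from the lower left), pads the table with one row and column of zeros, and uses the hypothetical names `integral_image` and `rect_sum`.

```python
def integral_image(lum):
    """Build a summed-area table with one extra zero row and column.

    ii[y][x] holds the sum of lum over rows 0..y-1 and columns 0..x-1
    (top-left origin, an assumption made for this sketch).
    """
    h, w = len(lum), len(lum[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # own value + sum above + sum to the left - doubly counted corner
            ii[y + 1][x + 1] = lum[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of luminance in the rectangle with top-left (x, y), width w, height h.

    Only the four corner entries of the table are read.
    """
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
```

Whatever the size of the rectangle, `rect_sum` costs four table reads, which is what makes evaluating many rectangular features per window affordable.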

Next, window images of 24 × 24 pixels, which are the units to be searched, are cut out of the input luminance image by sequential scanning.

Next, each cut-out window image is input to the detector 100. The n weak classifiers in the first classifier 100-1, from the first weak classifier 100-1(1) to the weak classifier 100-1(n), successively output a binary decision as to whether the window image is a face region or a non-face region, and an evaluation value is calculated by summing these results with weights according to the reliability of the first classifier 100-1.

Next, it is determined whether the calculated evaluation value is higher than a threshold set in advance from the learning data. If it is higher, the window image is judged to be a face region and the data are passed to the next stage, the second classifier 100-2. If it is lower than the threshold, the window image is judged not to be a face region and the determination process is terminated, which speeds up processing.

The second classifier 100-2 then calculates an evaluation value in the same manner as the first classifier 100-1, and when this evaluation value is higher than the set threshold, the data are passed on to the third classifier 100-3.

Window images are cut out sequentially by scanning the input luminance image vertically and horizontally, each is judged as to whether it contains a face region, and a window image that passes through to the last stage, the m-th classifier 100-m, is detected as an image of a face region.
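
The cascade logic described above can be sketched as follows, under stated assumptions: each weak classifier is modeled as a pair of a function returning 0 or 1 and a weight, each stage as a pair of a weak-classifier list and a threshold, and a score exactly equal to the threshold is treated as a rejection (the text does not specify that case). The names `stage_score` and `cascade_accepts` are hypothetical.

```python
def stage_score(weak_classifiers, window):
    """Weighted sum of the binary outputs of one stage's weak classifiers."""
    return sum(weight * predict(window) for predict, weight in weak_classifiers)


def cascade_accepts(stages, window):
    """Return True only if the window passes every stage of the cascade.

    stages is a list of (weak_classifiers, threshold) pairs. Evaluation stops
    at the first stage whose score does not exceed its threshold, so most
    non-face windows cost only the first stage or two.
    """
    for weak_classifiers, threshold in stages:
        if stage_score(weak_classifiers, window) <= threshold:
            return False  # rejected early; later stages are never evaluated
    return True
```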

Because faces appear at various sizes in the digital image to be examined, the digital image is reduced in stages, and the integral image generation and face region detection described above are performed on each reduced image, so that faces are detected with high accuracy.

The above detection method is also described in Patent Document 1.

Patent Document 1 describes a facial expression recognition system and a learning method for facial expression recognition, and includes a description of a face detection method.

In the training of the AdaBoost-based face detector described in Patent Document 1, weak hypotheses (synonymous with the weak classifiers of Non-Patent Document 1) estimated to perform better than the others are selected from among all weak hypotheses, and new weak hypotheses are generated from the selected ones on the basis of statistical properties.

FIG. 6 shows the configuration of the face detector described in Patent Document 1.

The face detector 200 of Patent Document 1 is configured by cascading the selected weak hypotheses (first weak hypothesis 200-1 to n-th weak hypothesis 200-n) in the same way as the weak classifiers of Non-Patent Document 1. When detecting an object in a digital image, however, the face detector 200 judges whether the window image contains a face region every time a single weak hypothesis (= weak classifier) outputs its determination result.

As the determination result, each time a weak hypothesis produces its estimate, the product "estimated discrimination result indicating whether the window image is a face region × reliability of that weak hypothesis" is added to a running total, which serves as the evaluation value.

Then, based on this evaluation value and an abort threshold calculated in advance by learning, the determination process is terminated when the window image is judged to be clearly not a face region, which speeds up processing.
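
The per-hypothesis abort can be sketched as below. The sketch assumes that each weak hypothesis returns an estimate of +1 (face) or -1 (non-face) and carries its own abort threshold; both the data layout and the name `evaluate_with_abort` are illustrative and are not taken from Patent Document 1.

```python
def evaluate_with_abort(weak_hypotheses, window):
    """Accumulate estimate * reliability and abort as soon as the total is too low.

    weak_hypotheses is a list of (estimate_fn, reliability, abort_threshold).
    Returns (accepted, evaluation_value, hypotheses_evaluated).
    """
    total = 0.0
    for used, (estimate, reliability, abort_threshold) in enumerate(weak_hypotheses, 1):
        total += estimate(window) * reliability
        if total < abort_threshold:
            # clearly not a face: stop without evaluating the remaining hypotheses
            return False, total, used
    return True, total, len(weak_hypotheses)
```

Compared with a stage-wise cascade, the check runs after every weak hypothesis, so a hopeless window can be dropped after a single evaluation.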

Processing proceeds in this way, and a window image that passes through to the last weak hypothesis 200-n is detected as an image of a face region.

In the face detectors of Non-Patent Document 1 and Patent Document 1, after discrimination has been completed for all window images in the digital image, if two or more of the regions detected as face regions overlap one another in close proximity, the region with the smaller evaluation value of two mutually overlapping regions is regarded as less likely to be a face region and is deleted, and the region with the larger evaluation value is selected as the image of the face region. Alternatively, a region obtained by averaging them is extracted. This process is repeated until no overlaps remain, and the images finally selected are detected as images of face regions.
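
One common way to realize this overlap removal is a greedy pass over the regions in descending order of evaluation value. The sketch below is that variant, not necessarily the exact pairwise procedure of the cited documents: it assumes square regions given as hypothetical `(x, y, size, value)` tuples and treats any intersection as an overlap.

```python
def overlaps(a, b):
    """True if two square regions (x, y, size, value) intersect."""
    ax, ay, asize, _ = a
    bx, by, bsize, _ = b
    return ax < bx + bsize and bx < ax + asize and ay < by + bsize and by < ay + asize


def remove_overlaps(regions):
    """Keep the highest-valued region of every overlapping set.

    Regions are visited from the highest evaluation value downward; a region
    is kept only if it does not overlap one that has already been kept.
    """
    kept = []
    for region in sorted(regions, key=lambda r: r[3], reverse=True):
        if not any(overlaps(region, k) for k in kept):
            kept.append(region)
    return kept
```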

Patent Document 2 describes a detection device 300 in which, rather than connecting the classifiers in a single line as in Non-Patent Document 1 or Patent Document 1, the nodes, each of which is a classifier, are connected in a network as shown in FIG. 7.

To detect an object in a digital image with the detector 300, a path through a plurality of nodes, such as path 301, is generated; for the input window image, the evaluation values obtained at each node on the path as to whether the image is an object such as a face are accumulated; and an estimate of the identification error for the identification result of the path is calculated.

When the lowest identification error among the identification results of the plurality of paths falls below a preset threshold, the identification process ends and it is determined that the object is present.

Here too, when a face region is determined by the above processing, processing is sped up by terminating it once the identification error falls below the preset threshold. If the error does not fall below the threshold, path generation and evaluation continue, and processing is terminated when the number of paths reaches a preset number.

Non-Patent Document 1: Paul Viola, Michael Jones, "Robust Real-Time Face Detection", International Journal of Computer Vision 57(2), 137-154, 2004
Patent Document 1: JP 2005-44330 A
Patent Document 2: JP 2006-350645 A

However, in the face detector 100 of Non-Patent Document 1, suppose for example that the input digital image is a luminance image 320 pixels wide × 240 pixels high. If the image is reduced in 10 stages by a factor of 4/5 each, an integral image is generated for each stage, and 24 × 24 pixel window images are cut out of each reduced image while scanning with one pixel skipped vertically and horizontally, then the determination process must be performed on nearly 40,000 window images.
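
The figure of nearly 40,000 windows can be checked with a short calculation. The sketch assumes that the 10 stages include the original size, that reduced dimensions are truncated to whole pixels, and that "skipping one pixel" means a scan step of 2; none of these details is stated explicitly in the text, and `count_windows` is a hypothetical helper.

```python
def count_windows(width, height, levels=10, scale=4 / 5, window=24, step=2):
    """Count the window positions over all pyramid levels."""
    total = 0
    for level in range(levels):
        w = int(width * scale ** level)   # truncation is an assumption
        h = int(height * scale ** level)
        if w < window or h < window:
            break  # the window no longer fits in the reduced image
        total += ((w - window) // step + 1) * ((h - window) // step + 1)
    return total
```

Under these assumptions `count_windows(320, 240)` comes to a little under 40,000, in line with the estimate above.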

Moreover, the learning data used for the determination process generally consist of several hundred to several thousand features (6060 in the case of Non-Patent Document 1), so the weak classifiers perform up to as many discrimination operations per window image as there are items of learning data.

When the cut-out window image has intensity features entirely unlike a face, such as a low-contrast plain wall, an early classifier judges it to be "not a face" and processing is terminated. A window image with complex shading, however, often proceeds to the later classifiers, and the amount of calculation required for discrimination can become enormous.

In addition, in the vicinity of a window image judged to be a face region, multiple window images are often judged to be face regions because of the same detection target.

That is, when a single face region is detected, discrimination processing for the full number of items of learning data is also performed on multiple neighboring window images, which further increases the amount of calculation.

The face detector 200 of Patent Document 1 has an abort threshold for each weak hypothesis (= weak classifier), but this only makes the point at which discrimination is aborted somewhat earlier than in Non-Patent Document 1, and the amount of calculation needed to detect the object is still enormous.

In the detector 300 of Patent Document 2, a plurality of classifiers are arranged in a network, but this is inefficient because the same classifiers, such as the classifier (node) placed at the head of the network, are passed through many times during detection.

Furthermore, with the structure of the detector 300 of Patent Document 2, processing cannot be aborted partway along the nodes, so a window image ends up passing through many classifiers before it can be judged whether it is the object, and no significant speedup can be expected.

The present invention has been made in view of the above problems, and its object is to provide an object detection device, an object detection method, and an object detection program that can process at high speed with a small amount of calculation and can detect an object in a digital image with high accuracy.

To achieve the above object, an object detection device (1) of the present invention detects an image region containing an object from input digital image data, and comprises: first detection means (60) for calculating, while scanning the digital image data with a window of a predetermined size, a plurality of first evaluation values each indicating for a window the likelihood that the object is contained, and detecting a plurality of candidate image regions based on the calculated first evaluation values; grouping processing means (70) for, when the plurality of candidate image regions detected by the first detection means (60) include candidate image regions whose image portions at least partly overlap, selecting from the overlapping candidate image regions the candidate image region with the highest evaluation value, and extracting the selected candidate image region together with the candidate image regions whose image portions do not overlap; and second detection means (80) for calculating, for each of the plurality of candidate image regions extracted by the grouping processing means (70), a plurality of second evaluation values indicating the likelihood that the object is contained, and detecting the image region based on the calculated second evaluation values.

In the object detection device (1) of the present invention, the first detection means (60) may calculate the first evaluation values by a boosting method using first learning data comprising a plurality of classifiers relating to the object, and the second detection means (80) may calculate the second evaluation values by a boosting method using second learning data comprising a plurality of classifiers relating to the object.

In the object detection device (1) of the present invention, the number of classifiers in the first learning data may be smaller than the number of classifiers in the second learning data.

The object detection method of the present invention is a method for detecting an image region containing an object from input digital image data, and comprises: a first detection step (60) of calculating, while scanning the digital image data with a window of a predetermined size, a plurality of first evaluation values each indicating for a window the likelihood that the object is contained, and detecting a plurality of candidate image regions based on the calculated first evaluation values; a grouping processing step (70) of, when the plurality of candidate image regions detected in the first detection step (60) include candidate image regions whose image portions at least partly overlap, selecting from the overlapping candidate image regions the candidate image region with the highest evaluation value, and extracting the selected candidate image region together with the candidate image regions whose image portions do not overlap; and a second detection step (80) of calculating, for each of the plurality of candidate image regions extracted in the grouping processing step (70), a plurality of second evaluation values indicating the likelihood that the object is contained, and detecting the image region based on the calculated second evaluation values.

In the object detection method of the present invention, the first detection step (60) may calculate the first evaluation values by a boosting method using first learning data comprising a plurality of classifiers relating to the object, and the second detection step (80) may calculate the second evaluation values by a boosting method using second learning data comprising a plurality of classifiers relating to the object.

In the object detection method of the present invention, the number of classifiers in the first learning data may be smaller than the number of classifiers in the second learning data.

The object detection program of the present invention is a program for causing a computer to execute processing that detects an image region containing an object from input digital image data, and causes the computer to execute: a first detection step (60) of calculating, while scanning the digital image data with a window of a predetermined size, a plurality of first evaluation values each indicating for a window the likelihood that the object is contained, and detecting a plurality of candidate image regions based on the calculated first evaluation values; a grouping processing step (70) of, when the plurality of candidate image regions detected in the first detection step (60) include candidate image regions whose image portions at least partly overlap, selecting from the overlapping candidate image regions the candidate image region with the highest evaluation value, and extracting the selected candidate image region together with the candidate image regions whose image portions do not overlap; and a second detection step (80) of calculating, for each of the plurality of candidate image regions extracted in the grouping processing step (70), a plurality of second evaluation values indicating the likelihood that the object is contained, and detecting the image region based on the calculated second evaluation values.

In the object detection program of the present invention, the computer may be caused to calculate the first evaluation values in the first detection step (60) by a boosting method using first learning data comprising a plurality of classifiers relating to the object, and to calculate the second evaluation values in the second detection step (80) by a boosting method using second learning data comprising a plurality of classifiers relating to the object.

In the object detection program of the present invention, the computer may be caused to execute the program with the number of classifiers in the first learning data smaller than the number of classifiers in the second learning data.

According to the object detection device, object detection method, and object detection program of the present invention, processing can be performed at high speed with a small amount of calculation, and an object can be detected in a digital image with high accuracy.

An object detection device according to an embodiment of the present invention will be described in detail with reference to the drawings.

The object detection device according to this embodiment uses the conventional boosting method. As shown in FIG. 1, the first detection unit 60 performs fast, coarse detection on the image data to be examined to detect candidate regions for the object, the grouping processing unit 70 extracts a candidate region for each object, and the second detection unit 80 then performs high-accuracy detection on the extracted candidate regions. The region position of the object is thereby detected with high accuracy and with a markedly smaller amount of calculation than in the prior art.
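
The overall flow of FIG. 1 can be sketched as a three-step pipeline. This is only an outline under assumptions: the scoring functions, thresholds, and grouping function are passed in as parameters, and all names (`detect`, `coarse_score`, `fine_score`, `group`) are hypothetical rather than taken from the embodiment.

```python
def detect(windows, coarse_score, coarse_threshold, group, fine_score, fine_threshold):
    """Coarse detection, grouping, then accurate detection on the survivors.

    windows: iterable of window descriptors (for example, positions).
    coarse_score / fine_score: functions mapping a window to an evaluation value.
    group: function reducing a list of (window, value) candidates to one
           representative per object.
    """
    # First detection unit 60: cheap scoring of every window, tolerant of false positives
    scored = [(w, coarse_score(w)) for w in windows]
    candidates = [(w, s) for w, s in scored if s > coarse_threshold]

    # Grouping processing unit 70: collapse redundant detections of the same object
    representatives = group(candidates)

    # Second detection unit 80: expensive scoring only on the few representatives
    return [w for w, _ in representatives if fine_score(w) > fine_threshold]
```

The saving comes from the fact that `fine_score` runs once per representative instead of once per window.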

The "fast, coarse detection" performed by the first detection unit 60 means that, in the trade-off between detection speed and detection accuracy, false detections are tolerated and detection speed is given priority over detection accuracy.

<Configuration of the Object Detection Device According to One Embodiment>
A schematic configuration of the object detection device 1 according to this embodiment will be described with reference to FIG. 2.

As shown in FIG. 2, the object detection device 1 includes an input unit 10, an image scale changing unit 20, an integral image generation unit 30, a processing data storage unit 40, a learning data storage unit 50, a first detection unit 60, a grouping processing unit 70, a second detection unit 80, and an output unit 90.

The input unit 10 inputs image data consisting of the luminance component of the digital image data to be searched.

The image scale changing unit 20 changes the scale of the input image data.

The integral image generation unit 30 generates integral image data for each item of rescaled image data.

The processing data storage unit 40 stores the integral image data generated by the integral image generation unit 30, the object candidate window image information calculated by the first detection unit 60 (described later), the object candidate window image information extracted by the grouping processing unit 70, and the object region window image information calculated by the second detection unit 80.

The learning data storage unit 50 stores learning data generated by prior learning, such as a plurality of features characterizing the object together with their sizes and positions, the reliability (weight) of each feature, and the evaluation-value threshold (described later) used to determine whether the object is contained in the image being searched. The features here are, for example, black-and-white binary patterns representing local brightness differences, such as Haar-like features, or intensity patterns at two or more arbitrary points in the window.

The first detection unit 60 includes a first scanning unit 61 and a first discrimination unit 62, and detects candidates for the object region by a boosting method.

The first scanning unit 61 scans the input luminance component image data and sequentially cuts out window images, which are the rectangular images to be searched, while shifting the window vertically and horizontally by a predetermined number of pixels.

The first discrimination unit 62 is composed of a plurality of classifiers (not shown), one for each feature of the learning data stored in the learning data storage unit 50. For each window image cut out in turn by the first scanning unit 61, each classifier calculates the intensity gradient of the region corresponding to its feature using the integral image stored in the processing data storage unit 40, evaluates from that gradient how object-like the window is according to that classifier, and outputs an evaluation value α. The evaluation value α is then weighted according to the reliability of the classifier to give an evaluation value β indicating the degree of object-likeness, and the sum of the evaluation values β over the classifiers is calculated for each window image as the first evaluation value.

The first discrimination unit 62 then compares the calculated first evaluation value with the evaluation-value threshold stored in the learning data storage unit 50. If the first evaluation value exceeds the threshold, the window image is judged to "possibly be the object" and is detected as an object candidate window image, that is, a candidate for the object region. The position of this window image within the image data being searched and its first evaluation value are sent to the processing data storage unit 40 as object candidate window image information.
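
The computation of α, β, and the first evaluation value can be sketched as follows. Several details here are assumptions made for illustration: each feature is modeled as a two-rectangle Haar-like pattern stored in a dictionary, α is taken to be 1 when the bright-minus-dark difference exceeds a per-feature minimum and 0 otherwise, and the rectangle-sum function is passed in as `region_sum` (for example, an integral-image lookup). None of the names comes from the embodiment.

```python
def first_evaluation_value(origin, features, region_sum):
    """Sum of the weighted per-feature evaluations for one window.

    origin: (x, y) of the window's top-left corner in the image.
    features: list of dicts with keys "bright" and "dark" (rectangles given as
              (x, y, w, h) relative to the window), "min_gradient", "weight".
    region_sum: function (x, y, w, h) -> sum of luminance in that rectangle.
    """
    ox, oy = origin
    total = 0.0
    for f in features:
        bx, by, bw, bh = f["bright"]
        dx, dy, dw, dh = f["dark"]
        gradient = region_sum(ox + bx, oy + by, bw, bh) - region_sum(ox + dx, oy + dy, dw, dh)
        alpha = 1 if gradient > f["min_gradient"] else 0  # per-classifier evaluation
        beta = f["weight"] * alpha                         # weighted by reliability
        total += beta
    return total


def candidate_info(origin, features, region_sum, threshold):
    """Return candidate window information, or None if the window is rejected."""
    value = first_evaluation_value(origin, features, region_sum)
    if value > threshold:
        return {"position": origin, "first_evaluation_value": value}
    return None
```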

グルーピング処理部７０は、重複ウィンドウ画像検出部７１と、候補ウィンドウ画像選択部７２とを有する。 The grouping processing unit 70 includes an overlapping window image detection unit 71 and a candidate window image selection unit 72.

重複ウィンドウ画像検出部７１は、輝度成分画像データ内の一の対象物領域に対して、第１判別部６２において重複して検出されている近傍位置の対象物候補ウィンドウ画像を検出する。 The overlapping window image detection unit 71 detects object candidate window images at neighboring positions that the first discriminating unit 62 has detected redundantly for a single object region in the luminance component image data.

候補ウィンドウ画像選択部７２は、重複ウィンドウ画像検出部７１において、一の対象物領域に対して第１判別部６２において重複して検出されている近傍位置の対象物候補ウィンドウ画像が複数個検出されたときには、この対象物候補ウィンドウ画像群の中から最も評価値が高い対象物候補ウィンドウ画像を選択してこの対象物候補ウィンドウ画像の対象物候補ウィンドウ画像情報を処理データ記憶部４０に送出し、処理データ記憶部４０に記憶されている、選択されなかった候補ウィンドウ画像情報の対象物候補ウィンドウ画像情報を削除する。なお、候補ウィンドウ画像選択部７２は、対象物候補ウィンドウ画像群の位置から平均的な位置を算出し、処理データ記憶部４０に送出するようにしてもよい。 When the overlapping window image detection unit 71 finds a plurality of object candidate window images at neighboring positions that the first discriminating unit 62 detected redundantly for a single object region, the candidate window image selection unit 72 selects the object candidate window image with the highest evaluation value from that group, sends its object candidate window image information to the processing data storage unit 40, and deletes from the processing data storage unit 40 the object candidate window image information of the candidates that were not selected. Alternatively, the candidate window image selection unit 72 may calculate an average position from the positions of the object candidate window image group and send that position to the processing data storage unit 40.
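
The grouping step can be illustrated with a short sketch. The text only states that nearby duplicate candidates are reduced to the one with the highest evaluation value; the greedy highest-score-first order, the Chebyshev distance, and the names `Candidate` and `group_candidates` are assumptions made here for illustration, and candidate positions are assumed to be expressed in one common coordinate frame.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    """Position of a candidate window and its first evaluation value (hypothetical type)."""
    x: int
    y: int
    score: float


def group_candidates(candidates: Sequence[Candidate], max_distance: int) -> List[Candidate]:
    """Keep, for each cluster of nearby candidates, only the highest-scoring one.

    Candidates are visited from highest to lowest score; a candidate is dropped
    when it lies within max_distance (Chebyshev distance, an assumption) of a
    candidate that has already been kept.
    """
    kept: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        far_from_all_kept = all(
            max(abs(cand.x - k.x), abs(cand.y - k.y)) > max_distance for k in kept
        )
        if far_from_all_kept:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    found = [Candidate(12, 11, 0.7), Candidate(10, 10, 0.9), Candidate(100, 50, 0.4)]
    for c in group_candidates(found, max_distance=8):
        print(c)
```

The averaging variant mentioned in the text would replace the kept candidate's position with the mean position of its cluster; it is not shown here.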

第２検出部８０は、第２走査部８１と、第２判別部８２とを有し、ブースティング法により対象物領域の候補を検出する。 The second detection unit 80 includes a second scanning unit 81 and a second determination unit 82, and detects a candidate for the object region by a boosting method.

第２走査部８１は、グルーピング処理部７０で抽出された対象物候補ウィンドウ画像の位置情報を処理データ記憶部４０から取得し、取得した位置情報に基づいて輝度成分画像データから対象物候補ウィンドウ画像を切り出す。 The second scanning unit 81 acquires the position information of the object candidate window image extracted by the grouping processing unit 70 from the processing data storage unit 40, and based on the acquired position information, the object candidate window image is obtained from the luminance component image data. Cut out.

第２判別部８２は、学習データ記憶部５０に記憶された学習データの特徴量ごとに設けられた、高い精度で対象物の検出を行うために十分な信頼性を有する数の識別器（図示せず）で構成され、各識別器では、第２走査部８１で切り出されたウィンドウ画像に対し、その特徴量に対応する領域の濃淡の勾配を、処理データ記憶部４０に記憶された積分画像を利用して算出し、その濃淡の勾配からその識別器における対象物らしさを評価し、評価値γを出力する。さらに、その評価値γに各識別器の信頼度に応じた重み付けを加算することにより、対象物らしさの度合いを示す評価値λを求め、各識別器で求められた評価値λを累積した値を第２評価値としてウィンドウ画像ごとに求める。 The second discriminating unit 82 comprises discriminators (not shown), one for each feature of the learning data stored in the learning data storage unit 50, in a number large enough to give the reliability needed to detect the object with high accuracy. For each window image cut out by the second scanning unit 81, each discriminator calculates the intensity gradient of the region corresponding to its feature using the integral image stored in the processing data storage unit 40, evaluates the object-likeness from that gradient, and outputs an evaluation value γ. A weight corresponding to the reliability of each discriminator is then applied to its evaluation value γ to give an evaluation value λ indicating the degree of object-likeness, and the sum of the values λ over all discriminators is obtained for each window image as the second evaluation value.

そして、第２判別部８２は、求めた第２評価値と学習データ記憶部５０に記憶されている評価値の閾値とを比較し、第２評価値がこの閾値を超えていれば当該ウィンドウ画像は「対象物である」と判定して対象物領域ウィンドウ画像として検出し、入力された輝度成分画像データ中のこの対象物領域ウィンドウ画像の位置情報と第２評価値とを対象物領域ウィンドウ画像情報として処理データ記憶部４０に送出する。 The second discriminating unit 82 then compares the obtained second evaluation value with the evaluation-value threshold stored in the learning data storage unit 50. If the second evaluation value exceeds the threshold, the unit determines that the window image "is an object", detects it as an object region window image, and sends the position information of this object region window image within the input luminance component image data, together with the second evaluation value, to the processing data storage unit 40 as object region window image information.

第２判別部８２を構成する識別器の数は使用目的に応じて適宜変更されるが、高い精度で対象物の検出を行うために十分な信頼性を有する数の識別器で構成される。 The number of discriminators constituting the second discriminating unit 82 is appropriately changed according to the purpose of use, but is configured with a number of discriminators having sufficient reliability for detecting an object with high accuracy.

なお、上述した第１判別部６２では、対象物である可能性があるか否かを判定することを目的としているため最終的な検出に必要な高さの精度は必要ない。具体的には、１ウィンドウ画像あたりの平均計算量は、「平均計算量＝使用する特徴量の数の平均×１特徴量あたりの平均計算量」と表すことができ、第１判別部６２の平均計算量が第２判別部８２のものよりも十分に少なければよい。ゆえに、第１判別部６２は、特徴量の数は多くても１個あたりの計算量が少なければよいということになる。 Because the purpose of the first discriminating unit 62 described above is only to determine whether a window image may be an object, it does not need the high accuracy required for final detection. Specifically, the average calculation amount per window image can be expressed as "average calculation amount = average number of features used × average calculation amount per feature", and it suffices that the average calculation amount of the first discriminating unit 62 is sufficiently smaller than that of the second discriminating unit 82. The first discriminating unit 62 may therefore use many features, as long as the calculation amount per feature is small.
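
The cost relation stated above can be written compactly. As a sketch, let $\bar{N}_k$ and $\bar{c}_k$ (symbols introduced here for illustration, not taken from the source) denote the average number of features used and the average calculation amount per feature in discriminating unit $k$, with $k = 1$ for unit 62 and $k = 2$ for unit 82:

```latex
\bar{C}_k = \bar{N}_k \, \bar{c}_k \quad (k = 1, 2), \qquad \bar{C}_1 \ll \bar{C}_2
```

The requirement constrains only the product, which is why a large $\bar{N}_1$ is acceptable when $\bar{c}_1$ is small enough.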

出力部９０は、処理データ記憶部４０に記憶された対象物領域ウィンドウ画像情報を読み出して出力する。 The output unit 90 reads and outputs the object area window image information stored in the processing data storage unit 40.

〈一実施形態による対象物検出装置の動作〉 <Operation of Object Detection Device According to One Embodiment>
本実施形態による対象物検出装置１により、対象物として顔領域を検出するときの動作について図３のフローチャートおよび図４の画面表示図を参照して説明する。 The operation of the object detection device 1 according to the present embodiment when detecting a face region as the object is described with reference to the flowchart of FIG. 3 and the screen display diagrams of FIG. 4.

まず、対象物検出装置１の入力部１０に検索対象のデジタル画像データの輝度成分の画像データａが入力されると（Ｓ１）、画像縮尺変更部２０においてこの画像データａが、予め設定された縮尺ｐ倍に縮小される（Ｓ２）。 First, when the luminance-component image data a of the digital image data to be searched is input to the input unit 10 of the object detection device 1 (S1), the image scale changing unit 20 reduces the image data a by a preset scale factor p (S2).

本実施形態においては、下記式（１）に定義するように、入力された画像データａがｐ倍に縮小される。 In the present embodiment, the input image data a is reduced by a factor of p as defined by the following equation (1).

（ただし、ｑは、０＜ｑ＜１の定数であり、ｒは、０≦ｒ≦Ｔ−１（Ｔ：縮尺変更の繰り返し回数で自然数）の範囲で１ずつ増加する自然数である。） (Here, q is a constant satisfying 0 < q < 1, and r is an integer that increases by 1 over the range 0 ≤ r ≤ T − 1, where T, a natural number, is the number of scale-change repetitions.)

この縮尺変更の繰り替えし回数Ｔは、検出したい顔のサイズの範囲に応じて、予め設定される。 The number of scale-change repetitions T is set in advance according to the range of face sizes to be detected.

初回はｒ＝０であり、入力された画像データａは等倍のままで処理が実行される。 On the first pass r = 0, so the input image data a is processed at its original size.
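
Equation (1) itself is not reproduced in this text. As a sketch only, the schedule of scale factors can be illustrated under the assumption that the equation has the form p = q^r, which is consistent with the stated properties (0 < q < 1, r = 0 giving the original size, and r increasing by 1 up to T − 1). The function name `scale_factors` is hypothetical.

```python
from typing import List


def scale_factors(q: float, T: int) -> List[float]:
    """Return the scale factor p for each pass r = 0 .. T-1, assuming p = q ** r.

    The form p = q ** r is an assumption; the source defines p by an
    equation that is not available here.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must satisfy 0 < q < 1")
    if T < 1:
        raise ValueError("T must be a natural number (>= 1)")
    return [q ** r for r in range(T)]


if __name__ == "__main__":
    # With q = 0.5 and T = 3 the image is processed at full, half, and quarter size.
    print(scale_factors(0.5, 3))
```

Because the window size is fixed, each smaller scale lets the same window cover a proportionally larger face in the original image, which is why T is chosen from the range of face sizes to be detected.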

次に積分画像生成部３０において、この画像データａの積分画像データが生成され、処理データ記憶部４０に記憶される（Ｓ３）。 Next, the integral image generation unit 30 generates integral image data for the image data a and stores it in the processing data storage unit 40 (S3).
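
For readers unfamiliar with integral images, the following minimal sketch shows why they make region features cheap to evaluate: once the table is built, the sum of any rectangle costs four lookups. The function names and the padded-table layout are choices made for this illustration and are not taken from the source.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Sequence[Sequence[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(rect_sum(table, 0, 0, 3, 3))  # whole image
    print(rect_sum(table, 1, 1, 2, 2))  # bottom-right 2 x 2 block
```

An intensity-gradient feature of the kind used by the discriminators can then be formed as the difference between the sums of two adjacent rectangles.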

次に、第１検出部６０の第１走査部６１において、予め設定された大きさの矩形窓が用いられ、画像データａから検索対象の矩形画像であるウィンドウ画像が走査され切り出される。 Next, in the first scanning unit 61 of the first detection unit 60, a rectangular window having a preset size is used, and a window image that is a rectangular image to be searched is scanned and cut out from the image data a.

本実施形態においてこの矩形窓の大きさは幅２４×高さ２４ピクセルであり、初回の走査では例えば、検索対象の画像データ内の初期位置として設定された左下画素（画像データのＸ座標＝０、Ｙ座標＝０）がウィンドウ画像の左下画素となる状態でウィンドウ画像が切り出される（Ｓ４、Ｓ５）。 In this embodiment the rectangular window is 24 pixels wide × 24 pixels high. In the first scan, for example, the window image is cut out so that its lower-left pixel coincides with the lower-left pixel of the image data to be searched (X coordinate = 0, Y coordinate = 0), which is set as the initial position (S4, S5).

次に第１判別部６２において、第１走査部６１で切り出されたウィンドウ画像について、例えば、前述の非特許文献１および特許文献１等に記載されたブースティング法を用いて特徴量が抽出される。 Next, the first discriminating unit 62 extracts features from the window image cut out by the first scanning unit 61 using, for example, the boosting method described in the above-mentioned Non-Patent Document 1, Patent Document 1, and the like.

この第１判別部６２の各識別器では、第１走査部６１で順次切り出されたウィンドウ画像に対し、その特徴量に対応する領域の濃淡の勾配を、処理データ記憶部４０に記憶された積分画像を利用して算出し、その濃淡の勾配からその識別器における対象物らしさを評価し、評価値αを出力する。さらに、その評価値αに各識別器の信頼度に応じた重み付けをすることにより、対象物らしさの度合いを示す評価値βを算出し、各識別器で算出した評価値βを累積した値を第１評価値としてウィンドウ画像ごとに算出する。そして、第１判別部６２において、算出した評価値αに各識別器の信頼度に応じた重み付けが加算され、対象物らしさの度合いを示す評価値βが算出される（Ｓ６）。 For each window image sequentially cut out by the first scanning unit 61, each discriminator of the first discriminating unit 62 calculates the intensity gradient of the region corresponding to its feature using the integral image stored in the processing data storage unit 40, evaluates the object-likeness from that gradient, and outputs an evaluation value α. Each evaluation value α is then weighted according to the reliability of its discriminator to give an evaluation value β indicating the degree of object-likeness, and the sum of the values β over all discriminators is calculated for each window image as the first evaluation value. In step S6, the first discriminating unit 62 thus applies to each calculated evaluation value α a weight corresponding to the reliability of its discriminator and calculates the evaluation value β indicating the degree of object-likeness (S6).

さらに第１判別部６２において、各識別器で算出された評価値が累積された第１評価値がウィンドウ画像ごとに算出され、この第１評価値が学習データ記憶部５０に記憶されている評価値の閾値を超えているか否かが判定される。 Further, in the first discriminating unit 62, a first evaluation value in which the evaluation values calculated by the respective discriminators are accumulated is calculated for each window image, and the first evaluation value is stored in the learning data storage unit 50. It is determined whether a value threshold is exceeded.

判定の結果、第１評価値がこの閾値を超えていれば当該ウィンドウ画像は「顔である可能性がある」と判定されて顔領域の候補となる顔候補ウィンドウ画像として検出され（Ｓ７の「YES」）、検索処理中の画像データ中のこのウィンドウ画像の位置情報と第１評価値とが顔候補ウィンドウ画像情報として処理データ記憶部４０に記憶される（Ｓ８）。 If the determination finds that the first evaluation value exceeds the threshold, the window image is judged to be "possibly a face" and is detected as a face candidate window image, that is, a candidate for a face region ("YES" in S7). The position information of this window image within the image data being searched and the first evaluation value are stored in the processing data storage unit 40 as face candidate window image information (S8).

第１判別部６２を構成する識別器の数は使用目的や検出精度に応じて適宜変更されるが、従来のブースティング法における識別器の数のように高い精度で対象物の検出を行うために十分な信頼性を有する数よりも格段に少ない数で構成され、これら従来の検出器よりも少ない計算量で対象物である可能性があるか否かの基準のゆるい（誤検出を許容した)検出処理が行われる。 The number of discriminators in the first discriminating unit 62 is changed as appropriate for the intended use and the required detection accuracy, but it is far smaller than the number that a conventional boosting-based detector needs in order to be reliable enough for high-accuracy detection. The first discriminating unit 62 therefore performs, with less calculation than such conventional detectors, a loosely thresholded detection (one that tolerates false detections) of whether a window image may be an object.

ステップＳ８の処理が終了するとステップＳ５に戻り、ウィンドウ画像を切り出すための矩形窓が画像データａ内のＸ方向にｍピクセルずつずらされて走査されてウィンドウ画像が切り出されてステップＳ６〜Ｓ８の処理が行われる。このステップＳ５からステップＳ８の繰り返し処理は、検索対象の画像データａの右端までｊ回行われる（Ｓ９、Ｓ１０）。 When the process of step S8 is completed, the process returns to step S5, the rectangular window for cutting out the window image is scanned by shifting by m pixels in the X direction in the image data a, and the window image is cut out, and the processes of steps S6 to S8 are performed. Is done. The repetitive processing from step S5 to step S8 is performed j times to the right end of the search target image data a (S9, S10).

また、このステップＳ９、Ｓ１０の繰り返し処理は、画像データａ内のＹ方向にｎピクセルずつずらされて走査されることにより、検索対象の画像データａの上端までｉ回繰り返される（ｍ，ｎ：１以上の整数）（Ｓ１１、Ｓ１２）。 In addition, the repetition process of steps S9 and S10 is repeated i times up to the upper end of the image data a to be searched by scanning by shifting n pixels at a time in the Y direction in the image data a (m, n: An integer greater than or equal to 1) (S11, S12).
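
The raster scan of steps S5 to S12 can be sketched as a generator of window origins. This is an illustration under assumptions: the function name `window_positions` is hypothetical, the origin is taken as the corner with coordinates (0, 0), and windows are only produced where they fit entirely inside the image.

```python
from typing import Iterator, Tuple


def window_positions(width: int, height: int, win: int = 24,
                     m: int = 1, n: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) origins of win x win windows, stepping m pixels in X and n in Y.

    The inner loop corresponds to the X-direction repetition (S9, S10) and
    the outer loop to the Y-direction repetition (S11, S12).
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be integers of 1 or more")
    for y in range(0, height - win + 1, n):
        for x in range(0, width - win + 1, m):
            yield x, y


if __name__ == "__main__":
    # A 26 x 25 image holds three window positions per row and two rows.
    print(list(window_positions(26, 25)))
```

Larger steps m and n reduce the number of windows roughly by a factor of m × n, at the cost of coarser localization.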

ステップＳ５〜Ｓ１２に処理により検索対象の画像データａ内の顔候補ウィンドウ画像の検索処理が終了すると、ステップＳ２に戻り（Ｓ１３の「NO」）、上記式（１）のｒの値が加算されてｒ＝１とされ、定数ｑの値に基づいて画像データａが縮小され、ステップＳ３〜Ｓ１２の処理が縮尺変更の繰り返し回数として設定されたＴ回繰り返される。 When the search for face candidate window images in the image data a has been completed by the processing of steps S5 to S12, the process returns to step S2 ("NO" in S13), the value of r in equation (1) above is incremented to r = 1, and the image data a is reduced based on the value of the constant q. The processing of steps S3 to S12 is repeated T times, the number set as the number of scale-change repetitions.

第１検出部６０において縮小された画像データからの顔候補ウィンドウ画像の検出処理がＴ回繰り返されると（Ｓ１３の「YES」）、第１検出部６０の処理により処理データ記憶部４０に記憶された顔候補ウィンドウ画像情報がグルーピング処理部７０の重複ウィンドウ画像検出部７１において読み出され、各画像データ内の一の顔候補領域に対して重複して検出されている近傍位置の顔候補ウィンドウ画像が検出される。 When the first detection unit 60 has repeated the detection of face candidate window images from the reduced image data T times ("YES" in S13), the overlapping window image detection unit 71 of the grouping processing unit 70 reads out the face candidate window image information that the first detection unit 60 stored in the processing data storage unit 40, and detects face candidate window images at neighboring positions that were detected redundantly for a single face candidate region in each set of image data.

ここで、各画像データ内の一の顔候補領域に対して重複しているウィンドウ画像であるか否かの判断は、比較するウィンドウ画像どうしが所定の距離内にあるか否かにより決定される。 Here, whether or not the window images overlap with each other in one image candidate area in each image data is determined by whether or not the window images to be compared are within a predetermined distance. .

重複して検出されている顔候補ウィンドウ画像群が検出されたときは（Ｓ１４の「YES」）、この顔候補ウィンドウ画像群のそれぞれの評価値が候補ウィンドウ画像選択部７２において処理データ記憶部４０から読み出され、検出された顔候補ウィンドウ画像群の中で評価値が高い顔候補ウィンドウ画像が選択されるとともに、評価値が低い顔候補ウィンドウ画像の顔候補ウィンドウ画像情報が処理データ記憶部４０から削除され破棄される（Ｓ１５）。 When a group of redundantly detected face candidate window images is found ("YES" in S14), the candidate window image selection unit 72 reads the evaluation value of each face candidate window image in the group from the processing data storage unit 40, selects the face candidate window image with the higher evaluation value from the group, and deletes from the processing data storage unit 40 the face candidate window image information of the face candidate window images with lower evaluation values (S15).

ステップＳ１４およびステップＳ１５の処理は、各画像データ内の一の顔候補領域に対して一の顔候補ウィンドウ画像のみが選択されるまで繰り返される。 The processes of step S14 and step S15 are repeated until only one face candidate window image is selected for one face candidate area in each image data.

重複して検出されている顔候補ウィンドウ画像群から各画像データ内の一の顔候補領域に対して一の顔候補ウィンドウ画像のみが選択されると（Ｓ１４の「NO」）、選択された顔候補ウィンドウ画像の顔候補ウィンドウ画像情報が処理データ記憶部４０に記憶される（Ｓ１６）。 When only one face candidate window image is selected for one face candidate region in each image data from the group of face candidate window images detected in duplicate (“NO” in S14), the selected face Face candidate window image information of the candidate window image is stored in the processing data storage unit 40 (S16).

図４（ａ）に、グルーピング後の抽出された顔候補ウィンドウ画像が検索対象の画像データ上に太枠の矩形領域で表示されたときの一例を示す。同図（ａ）では、第１判別部６２において低い精度で顔候補領域の検出処理が行われているため、実際の顔領域と顔以外の領域を含む複数の領域が顔候補領域として選択されている様子が示されている。 FIG. 4(a) shows an example in which the face candidate window images extracted after grouping are displayed as thick-framed rectangular regions on the image data to be searched. Because the first discriminating unit 62 detects face candidate regions with low accuracy, FIG. 4(a) shows a plurality of regions selected as face candidate regions, including both actual face regions and non-face regions.

次に、第２検出部８０の第２走査部８１においてステップＳ１６で記憶された顔候補ウィンドウ画像情報が処理データ記憶部４０から読み出され（Ｓ１７）、画像データａからこれらの位置情報に対応する顔候補ウィンドウ画像が切り出される（Ｓ１８）。 Next, the second scanning unit 81 of the second detection unit 80 reads from the processing data storage unit 40 the face candidate window image information stored in step S16 (S17), and cuts out from the image data a the face candidate window images corresponding to that position information (S18).

次に第２判別部８２において、第２走査部８１で切り出されたウィンドウ画像について、ステップＳ６と同様に評価値が各識別器で算出されて重み付けがされる（Ｓ１９）。 Next, in the second discriminating unit 82, the evaluation value is calculated by each discriminator and weighted for the window image cut out by the second scanning unit 81 as in step S6 (S19).

この第２判別部８２の各識別器では、第２走査部８１で切り出されたウィンドウ画像に対し、その特徴量に対応する領域の濃淡の勾配を、処理データ記憶部４０に記憶された積分画像を利用して算出し、その濃淡の勾配からその識別器における対象物らしさを評価し、評価値γを出力する。さらに、その評価値γに各識別器の信頼度に応じた重み付けを加算することにより、対象物らしさの度合いを示す評価値λを求め、各識別器で求められた評価値λを累積した値を第２評価値としてウィンドウ画像ごとに求める。 For each window image cut out by the second scanning unit 81, each discriminator of the second discriminating unit 82 calculates the intensity gradient of the region corresponding to its feature using the integral image stored in the processing data storage unit 40, evaluates the object-likeness from that gradient, and outputs an evaluation value γ. A weight corresponding to the reliability of each discriminator is then applied to its evaluation value γ to give an evaluation value λ indicating the degree of object-likeness, and the sum of the values λ over all discriminators is obtained for each window image as the second evaluation value.

この第２判別部８２において、第１判別部６２における処理と同様に各識別器で算出された評価値が累積された第２評価値がウィンドウ画像ごとに算出され、この第２評価値が学習データ記憶部５０に記憶されている評価値の閾値を超えているか否かが判定される（Ｓ２０）。 In the second discriminating unit 82, as in the processing in the first discriminating unit 62, a second evaluation value obtained by accumulating the evaluation values calculated by the respective discriminators is calculated for each window image, and the second evaluation value is learned. It is determined whether the threshold value of the evaluation value stored in the data storage unit 50 is exceeded (S20).

判定の結果、第２評価値がこの閾値を超えていれば当該ウィンドウ画像は「顔である」と判定されて顔領域ウィンドウ画像として検出され（Ｓ２０の「YES」）、画像データａ中のこのウィンドウ画像の位置情報と第２評価値とが顔領域ウィンドウ画像情報として処理データ記憶部４０に記憶される（Ｓ２１）。 If the determination finds that the second evaluation value exceeds the threshold, the window image is judged to be "a face" and is detected as a face region window image ("YES" in S20). The position information of this window image within the image data a and the second evaluation value are stored in the processing data storage unit 40 as face region window image information (S21).

このステップＳ１７〜Ｓ２１の処理はステップＳ１６で記憶されたすべての位置情報に対応する顔候補ウィンドウ画像について行われ（Ｓ２２の「YES」）、すべての位置情報に対応する顔候補ウィンドウ画像への処理が終了すると（Ｓ２２の「NO」）、ステップＳ２１で記憶された顔領域ウィンドウ画像情報の位置情報が出力部９０において読み出され、外部に出力される（Ｓ２３）。 The processes in steps S17 to S21 are performed for the face candidate window images corresponding to all the position information stored in step S16 (“YES” in S22), and the process for the face candidate window images corresponding to all the position information is performed. Is completed (“NO” in S22), the position information of the face area window image information stored in step S21 is read by the output unit 90 and output to the outside (S23).

図４（ｂ）に、出力された顔領域ウィンドウ画像情報の位置情報が、検索対象の画像データ上に太枠の矩形領域で表示されたときの一例を示す。 FIG. 4B shows an example when the position information of the output face area window image information is displayed as a thick rectangular area on the image data to be searched.

ここでは第２判別部８２において、図４（ａ）で選択された顔候補ウィンドウ画像から高い精度で顔候補領域の検出処理が行われているため、実際の顔領域が選択されていることが示されている。 Here, in the second discriminating unit 82, the face candidate area detection process is performed with high accuracy from the face candidate window image selected in FIG. 4A, so that the actual face area is selected. It is shown.

以上説明した本実施形態によれば、第１検出部において少ない計算量で画像データから、誤検出を許容して対象物である可能性がある領域の（基準のゆるい)検出を行い、さらにグルーピング処理部において画像データ内の一の対象物領域に対して重複して検出されている対象物候補ウィンドウ画像を削除することにより、検出対象のウィンドウ画像を絞り込んでから、第２検出部において高い精度で対象物の検出を行うため、従来のブースティング法による検出処理に比べて顕著に少ない計算量で高速に処理が可能であり、且つ高い精度でデジタル画像から対象物を検出することができる。 According to the embodiment described above, the first detection unit performs, with a small amount of calculation, a loosely thresholded detection of regions in the image data that may be an object, tolerating false detections. The grouping processing unit then deletes the object candidate window images that were detected redundantly for a single object region in the image data, narrowing down the window images to be examined, and only then does the second detection unit detect the object with high accuracy. Compared with detection by the conventional boosting method, processing is therefore fast, with a markedly smaller amount of calculation, and the object can still be detected from the digital image with high accuracy.

なお、本実施形態においては、第１判別部６２および第２判別部８２で使用する学習データと閾値とがそれぞれ同値である場合として説明したが、第１判別部６２および第２判別部８２のそれぞれについて、第１および第２学習データ、並びに第１および第２閾値を設定できるようにしておくことが望ましい。 In the present embodiment, the learning data used in the first determination unit 62 and the second determination unit 82 and the threshold value are described as being the same value, but the first determination unit 62 and the second determination unit 82 It is desirable that the first and second learning data and the first and second threshold values can be set for each.

そして、第１および第２学習データとして、一連の十分精度のある数の識別器を有した学習データの最初の幾つかの識別器を第１学習データとし、それ以降最後までの識別器を第２学習データとしてもよいし、第１および第２学習データを同一の特徴量にしてもよいし、第１学習データでは少数で顔らしい位置を特定し易い形状の特徴量を用いたり、例えば、矩形特徴ならば周波数の低い計算量が少なくて済む特徴量を使うなど、別々の特徴量を使用して学習するようにしてもよい。 As for the first and second learning data, several options are possible. The first few discriminators of learning data that has a series of discriminators numerous enough for sufficient accuracy may be used as the first learning data, with the remaining discriminators through to the last used as the second learning data. The first and second learning data may use the same features. Alternatively, the two may be learned with different features: for example, the first learning data may use a small number of features whose shapes make face-like positions easy to locate, or, in the case of rectangular features, low-frequency features that require little calculation.

またさらに、本実施例においては、第１判別部６２および第２判別部８２で、すべての識別器を使用して第1評価値および第2評価値を求めた後に閾値との比較を行っているが、これに限らず、途中の識別器を通ったところまでの累積された評価値を、それに対応する途中段階の閾値学習データとして保持するようにし、その閾値を超えていなければ途中でそのウィンドウ画像に対する判別処理を打ち切るようにすれば、更なる高速化を図ることができる。 Furthermore, in the present embodiment, the first discriminating unit 62 and the second discriminating unit 82 use all the discriminators to obtain the first evaluation value and the second evaluation value, and then compare with the threshold value. However, the present invention is not limited to this, and the accumulated evaluation value up to the point where it passed through the classifier in the middle is retained as threshold learning data corresponding to the middle stage. If the discrimination process for the window image is discontinued, the speed can be further increased.
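
The early-termination variant described above can be sketched as follows. This is an illustration only: the function name, the `(evaluate, weight)` pair representation, and the use of `None` to mark discriminators without an intermediate threshold are assumptions, and the weighting is assumed to be multiplicative.

```python
from typing import Callable, Optional, Sequence, Tuple

# (evaluate, weight): a feature evaluator and its reliability weight (hypothetical layout)
Discriminator = Tuple[Callable[[object], float], float]


def evaluate_with_early_exit(
    window: object,
    discriminators: Sequence[Discriminator],
    stage_thresholds: Sequence[Optional[float]],
    final_threshold: float,
) -> Tuple[bool, float, int]:
    """Accumulate weighted evaluation values, aborting when a stage threshold is not exceeded.

    Returns (accepted, accumulated value, number of discriminators evaluated).
    """
    if len(discriminators) != len(stage_thresholds):
        raise ValueError("one stage threshold (or None) is required per discriminator")
    total = 0.0
    used = 0
    for (evaluate, weight), stage_threshold in zip(discriminators, stage_thresholds):
        total += weight * evaluate(window)
        used += 1
        # Abort as soon as the running total fails an intermediate threshold.
        if stage_threshold is not None and total <= stage_threshold:
            return False, total, used
    return total > final_threshold, total, used


if __name__ == "__main__":
    demo = [(lambda w: 1.0, 1.0), (lambda w: -1.0, 2.0), (lambda w: 1.0, 5.0)]
    print(evaluate_with_early_exit(None, demo, [0.0, 0.0, None], 1.0))
    print(evaluate_with_early_exit(None, demo, [None, None, None], 1.0))
```

In the first call the window is rejected after two of the three discriminators, which is the saving the text refers to; the second call shows that without intermediate thresholds all discriminators are evaluated.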

また、本実施形態の対象物検出装置の機能構成を対象物検出プログラムとしてプログラム化してコンピュータに組み込むことにより、当該コンピュータを対象物検出装置として機能させることもできる。 The functional configuration of the object detection device of this embodiment can also be implemented as an object detection program and installed on a computer, so that the computer functions as the object detection device.

本発明の一実施形態による対象物検出装置１の構成を示すブロック図である。 A block diagram showing the configuration of the object detection device 1 according to one embodiment of the present invention.
本発明の一実施形態による対象物検出装置１の概略構成を示すブロック図である。 A block diagram showing the schematic configuration of the object detection device 1 according to one embodiment of the present invention.
本発明の一実施形態による対象物検出装置１の動作を示すフローチャートである。 A flowchart showing the operation of the object detection device 1 according to one embodiment of the present invention.
本発明の一実施形態による対象物検出装置１の第１検出部６０により検出された対象物候補領域を、検出対象の画像データ上に表示した状態を示す画面表示図（ａ）、および対象物検出装置１の第２検出部８０により検出された対象物領域を、検出対象の画像データ上に表示した状態を示す画面表示図（ｂ）である。 A screen display diagram (a) showing the object candidate regions detected by the first detection unit 60 of the object detection device 1 according to one embodiment of the present invention displayed on the image data to be searched, and a screen display diagram (b) showing the object regions detected by the second detection unit 80 of the object detection device 1 displayed on the image data to be searched.
非特許文献１に記載の従来の顔検出器の構成を示すブロック図である。 A block diagram showing the configuration of the conventional face detector described in Non-Patent Document 1.
特許文献１に記載の従来の顔検出器の構成を示すブロック図である。 A block diagram showing the configuration of the conventional face detector described in Patent Document 1.
特許文献２に記載の従来の顔検出器の構成を示すブロック図である。 A block diagram showing the configuration of the conventional face detector described in Patent Document 2.

Explanation of symbols

１…対象物検出装置 1 … Object detection device
１０…入力部 10 … Input unit
２０…画像縮尺変更部 20 … Image scale changing unit
３０…積分画像生成部 30 … Integral image generation unit
４０…処理データ記憶部 40 … Processing data storage unit
５０…学習データ記憶部 50 … Learning data storage unit
６０…第１検出部 60 … First detection unit
６１…第１走査部 61 … First scanning unit
６２…第１判別部 62 … First discriminating unit
７０…グルーピング処理部 70 … Grouping processing unit
７１…重複ウィンドウ画像検出部 71 … Overlapping window image detection unit
７２…候補ウィンドウ画像選択部 72 … Candidate window image selection unit
８０…第２検出部 80 … Second detection unit
８１…第２走査部 81 … Second scanning unit
８２…第２判別部 82 … Second discriminating unit
９０…出力部 90 … Output unit

Claims

1. An object detection device for detecting an image region including an object from input digital image data, comprising:
first detection means for scanning the digital image data with a window of a predetermined size, calculating for each window a first evaluation value indicating the probability that the object is included, and detecting a plurality of candidate image regions based on the calculated first evaluation values;
grouping processing means for, when the plurality of candidate image regions detected by the first detection means include candidate image regions in which at least part of the image overlaps, selecting from these overlapping candidate image regions the candidate image region having the highest evaluation value, and extracting the selected candidate image region and the candidate image regions in which no part of the image overlaps; and
second detection means for calculating, for each of the plurality of candidate image regions extracted by the grouping processing means, a second evaluation value indicating the probability that the object is included, and detecting the image region based on the calculated second evaluation values.

2. The object detection device according to claim 1, wherein the first detection means calculates the first evaluation value by a boosting method using first learning data comprising a plurality of discriminators related to the object, and the second detection means calculates the second evaluation value by a boosting method using second learning data comprising a plurality of discriminators related to the object.

3. The object detection device according to claim 2, wherein the number of discriminators in the first learning data is smaller than the number of discriminators in the second learning data.

4. An object detection method for detecting an image region including an object from input digital image data, comprising:
a first detection step of scanning the digital image data with a window of a predetermined size, calculating for each window a first evaluation value indicating the probability that the object is included, and detecting a plurality of candidate image regions based on the calculated first evaluation values;
a grouping processing step of, when the plurality of candidate image regions detected in the first detection step include candidate image regions in which at least part of the image overlaps, selecting from these overlapping candidate image regions the candidate image region having the highest evaluation value, and extracting the selected candidate image region and the candidate image regions in which no part of the image overlaps; and
a second detection step of calculating, for each of the plurality of candidate image regions extracted in the grouping processing step, a second evaluation value indicating the probability that the object is included, and detecting the image region based on the calculated second evaluation values.

5. The object detection method according to claim 4, wherein in the first detection step the first evaluation value is calculated by a boosting method using first learning data comprising a plurality of discriminators related to the object, and in the second detection step the second evaluation value is calculated by a boosting method using second learning data comprising a plurality of discriminators related to the object.

6. The object detection method according to claim 5, wherein the number of discriminators in the first learning data is smaller than the number of discriminators in the second learning data.

7. An object detection program for causing a computer to execute processing for detecting an image region including an object from input digital image data, the program causing the computer to execute:
a first detection step of scanning the digital image data with a window of a predetermined size, calculating for each window a first evaluation value indicating the probability that the object is included, and detecting a plurality of candidate image regions based on the calculated first evaluation values;
a grouping processing step of, when the plurality of candidate image regions detected in the first detection step include candidate image regions in which at least part of the image overlaps, selecting from these overlapping candidate image regions the candidate image region having the highest evaluation value, and extracting the selected candidate image region and the candidate image regions in which no part of the image overlaps; and
a second detection step of calculating, for each of the plurality of candidate image regions extracted in the grouping processing step, a second evaluation value indicating the probability that the object is included, and detecting the image region based on the calculated second evaluation values.

8. The object detection program according to claim 7, wherein the program causes the computer to calculate, in the first detection step, the first evaluation value by a boosting method using first learning data comprising a plurality of discriminators related to the object, and to calculate, in the second detection step, the second evaluation value by a boosting method using second learning data comprising a plurality of discriminators related to the object.

9. The object detection program according to claim 8, wherein the program causes the computer to execute the steps with the number of discriminators in the first learning data made smaller than the number of discriminators in the second learning data.