JP2012128623A

JP2012128623A - Object discrimination device, method, and program

Info

Publication number: JP2012128623A
Application number: JP2010278914A
Authority: JP
Inventors: Makoto Yonaha; 誠與那覇
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2010-12-15
Filing date: 2010-12-15
Publication date: 2012-07-05
Anticipated expiration: 2030-12-15
Also published as: JP5649943B2

Abstract

PROBLEM TO BE SOLVED: To avoid duplicate difference calculations in an object discrimination device. SOLUTION: A discriminator 13 includes a plurality of weak discriminators, each of which calculates a difference between at least one pair of two points in an input image and calculates, based on the calculated difference, a score relating to the presence of a detection target object. In the discriminator 13, the weak discriminators are cascade-connected. Difference image generation means 15 sets a shift amount corresponding to the positional relationship between the two points for which a weak discriminator is to calculate a difference, and generates a difference image between the input image and an image obtained by shifting the input image by the set shift amount. At least some of the weak discriminators refer to the difference image to acquire the difference and calculate the score.

Description

The present invention relates to an object discrimination device, method, and program, and more particularly to an object discrimination device, method, and program for determining whether an object to be detected is included in an image.

Various methods have been proposed for using a computer to detect a predetermined target (object), such as a face, in a digital image such as a photograph. One known method is template matching, which has been in use for a relatively long time. In recent years, attention has also turned to techniques that construct a discriminator using a machine learning technique called boosting and use that discriminator to detect the target in an image. Training a discriminator by boosting, and object detection using such a discriminator, are described in, for example, Patent Document 1 and Patent Document 2.

In general, a discriminator generated by boosting has a plurality of weak discriminators, for example several hundred to several thousand. Connecting these weak discriminators in series (cascade connection) forms one discriminator (a strong discriminator). A weak discriminator is generally defined as a classifier that has a slight correlation with the true classification. Each weak discriminator calculates a feature value and obtains a score based on that feature value. The strong discriminator compares the sum of the scores obtained by all cascaded weak discriminators against a predetermined threshold and, when the total score is equal to or greater than the threshold, determines that the detection target object appears in the image being processed.

The feature calculation in a weak discriminator is based on the difference in pixel value between two points (two regions). Each weak discriminator performs its difference calculation using one of a plurality of basic feature types and obtains, from the input image, a score relating to the presence of the detection target. A basic feature type can be defined by the relative positional relationship between two points in the template: for example, the difference between two points aligned horizontally, two points aligned vertically, or two points aligned diagonally. A basic feature type may also hold two-point positional relationships for several pairs (two pairs, three pairs, and so on), in which case the weak discriminator calculates the feature value from the combination. Two pairs require four reference points, and three pairs require six.

For example, the object detection device raster-scans a 32 × 32 pixel template (window) over a 640 × 480 pixel image to be searched, in steps of one pixel or a few pixels, and supplies the partial image cut out at each template position to the strong discriminator. The strong discriminator runs the weak discriminators in order from the first stage and, on reaching the final stage, compares the sum of the weak discriminator scores against a threshold. When the total score is equal to or greater than the threshold, the strong discriminator outputs a result indicating that the detection target object appears at the position of the 32 × 32 pixels cut out by the template.

Patent Document 1: JP 2007-47965 A
Patent Document 2: JP 2007-128127 A

Normally, a strong discriminator makes an early reject decision: at each weak discriminator, the score accumulated up to that stage is compared against a threshold, and when the score is below the threshold, processing ends without running the subsequent weak discriminators. With early rejection (early termination), processing of an image that clearly does not contain the detection target object can end at a relatively early stage among the thousands of serially connected weak discriminators, which is faster than processing all the way to the final-stage weak discriminator. As also described in Patent Documents 1 and 2, the weak discriminators generated by learning are generally combined linearly in descending order of weighted correct answer rate to form one strong discriminator. In other words, the strong discriminator is formed by connecting the learned weak discriminators in series in order of their effectiveness for discrimination.
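
The cascade with and without early rejection can be sketched as follows. This is a minimal illustration rather than the patented implementation: the names `make_weak` and `run_cascade`, the ±1 score mapping, and the per-stage threshold list are all assumptions introduced for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

Window = Sequence[Sequence[int]]
WeakDiscriminator = Callable[[Window], float]


def make_weak(x: int, y: int, dx: int, dy: int) -> WeakDiscriminator:
    """Hypothetical weak discriminator built on one two-point pixel difference."""
    def weak(window: Window) -> float:
        diff = window[y][x] - window[y + dy][x + dx]
        return 1.0 if diff > 0 else -1.0  # assumed score mapping, not from the source
    return weak


def run_cascade(
    window: Window,
    stages: Sequence[WeakDiscriminator],
    final_threshold: float,
    stage_thresholds: Optional[Sequence[float]] = None,
) -> Tuple[bool, float, int]:
    """Return (detected, accumulated score, number of stages evaluated).

    If stage_thresholds is given, processing stops as soon as the running
    score falls below the threshold of the current stage (early reject).
    If it is None, every stage is evaluated.
    """
    total = 0.0
    for i, weak in enumerate(stages):
        total += weak(window)
        if stage_thresholds is not None and total < stage_thresholds[i]:
            return False, total, i + 1
    return total >= final_threshold, total, len(stages)


window = [[9, 1], [1, 9]]
stages = [make_weak(0, 1, 1, 0), make_weak(0, 0, 1, 0), make_weak(0, 0, 0, 1)]
full = run_cascade(window, stages, final_threshold=1.0)
early = run_cascade(window, stages, 1.0, stage_thresholds=[0.0, 0.0, 0.0])
```

Here `full` evaluates all three stages, while `early` stops after the first stage because the running score is already below that stage's threshold. The per-stage comparison is the conditional branch whose cost the following paragraphs discuss.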

In recent years, techniques have been developed for quickly estimating the approximate position and size of a detection target object. Consider using such a technique as preprocessing for the strong discriminator, so that the image of each area extracted by the preprocessing becomes the image processed by the strong discriminator. In that case, most of the extracted areas can be expected to contain the detection target object, so early termination at the initial weak discriminator stages will be rare, and in most cases processing will proceed to near the final stage. Early termination therefore yields little speedup. On the contrary, making the early termination decision (a conditional branch) at every weak discriminator disturbs pipeline processing (causes hazards) and becomes an obstacle to faster processing.

The idea of early termination rests on a large imbalance between the area occupied by detection target objects and the area occupied by background. That is, it relies on the prior knowledge (assumption) that most of the image is background and that detection target objects are few. In a typical processing system, early termination decisions do speed up processing. However, the present inventor has found that when the strong discriminator is applied to regions where an object is likely to exist, the early termination decision becomes an obstacle to speedup, as described above.

Without early termination, the difference calculations must be performed every time in all of the cascaded weak discriminators, of which there may be thousands. When the template is scanned over the image, a pair of pixels (pixels in the image around the portion where the object is estimated to exist) whose difference was calculated by one weak discriminator at a previous template position may have its difference calculated again by a different weak discriminator at another template position. For example, suppose that a weak discriminator A calculates the difference between positions (x, y) and (x+1, y) in template-relative coordinates, and another weak discriminator B calculates the difference between positions (x+3, y+3) and (x+4, y+3). If the template moves by +3 in the x direction and +3 in the y direction, the relative position (x, y) after the move coincides with the relative position (x+3, y+3) at the template position before the move. Although weak discriminator B has already calculated the difference between (x+3, y+3) and (x+4, y+3) at the template position before the move, weak discriminator A must perform the same difference calculation again after the move. Thus, whenever the template is moved, individual weak discriminators have to repeat difference calculations that were already performed, and processing time is wasted by the amount of that repetition.
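
The overlap in this example can be checked numerically. The sketch below is illustrative only: the helper `pair_difference`, the synthetic image, and the choice x = 1, y = 2 are assumptions, and the sign convention (first point minus second point) is not specified by the source.

```python
def pair_difference(image, origin, p, q):
    """Difference between two template-relative points p and q.

    origin is the template's top-left corner in image coordinates;
    all points are (x, y) tuples and the image is indexed as image[y][x].
    """
    ox, oy = origin
    return image[oy + p[1]][ox + p[0]] - image[oy + q[1]][ox + q[0]]


# Synthetic 8 x 8 image whose horizontal differences vary with the column.
image = [[r * 10 + c * c for c in range(8)] for r in range(8)]

x, y = 1, 2  # arbitrary template-relative position used by weak discriminator A

# Weak discriminator B at the template position before the move.
b_before = pair_difference(image, (0, 0), (x + 3, y + 3), (x + 4, y + 3))

# Weak discriminator A after the template has moved by (+3, +3).
a_after = pair_difference(image, (3, 3), (x, y), (x + 1, y))
```

Both calls read the same two image pixels, (4, 5) and (5, 5), so `a_after` equals `b_before`: the second subtraction repeats work that was already done.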

In view of the above, an object of the present invention is to provide an object discrimination device, method, and program that can avoid duplicate difference calculations and speed up processing accordingly.

To achieve the above object, the present invention provides an object discrimination device comprising: a strong discriminator in which a plurality of weak discriminators are cascade-connected, each weak discriminator obtaining a difference between at least one pair of two points in an input image and obtaining, based on the obtained difference, a score relating to the presence of a detection target; and difference image generation means that sets a shift amount corresponding to the positional relationship between the two points of the image for which a weak discriminator is to obtain a difference, and generates a difference image between the input image and an image obtained by shifting the input image by the set shift amount, wherein at least some of the plurality of weak discriminators acquire the difference between the at least one pair of two points by referring to the difference image, and obtain the score.

The strong discriminator may be configured to scan a template over the input image in a predetermined scanning order and to execute the processing of the plurality of weak discriminators at each position of the scanned template.

In the present invention, the difference image generation means may sequentially set shift amounts corresponding to a plurality of two-point positional relationships for which the plurality of weak discriminators are to obtain differences, and generate a plurality of difference images, one for each shift amount.

In the present invention, a configuration may be adopted in which the difference image generation means sequentially sets shift amounts corresponding to all of the two-point positional relationships for which the plurality of weak discriminators are to obtain differences, and each of the plurality of weak discriminators acquires the difference between the at least one pair of two points by referring to the difference image.

The strong discriminator may execute the processing of each weak discriminator from the first stage to the final stage of the cascaded weak discriminators without early termination.

A configuration may be adopted in which each of the plurality of weak discriminators obtains the difference between the at least one pair of two points using one of a plurality of basic feature types for difference calculation, and in which, in the strong discriminator, weak discriminators of the same basic feature type are arranged consecutively.

In the strong discriminator, when there are a plurality of weak discriminators of the same basic feature type, those weak discriminators may be arranged in an order determined by the image positions that each weak discriminator references in its difference calculation.

Alternatively, a configuration may be adopted in which the weak discriminators in the strong discriminator are arranged in an order determined by the image positions that each weak discriminator references in its difference calculation.

The object discrimination device of the present invention may further comprise object candidate point detection means that estimates the position of an object in the image to be processed, cuts out an image of the area around the estimated object position, and supplies the cut-out image to both the strong discriminator and the difference image generation means as the input image.

The object candidate point detection means may include: smoothing processing means that repeatedly convolves an image with a smoothing filter having filter characteristics corresponding to the contour shape of the object, and generates from the frame image a plurality of smoothed images of different scales; difference image generation means that generates a plurality of difference images, each between two of the smoothed images having mutually different scales, while varying the scale; summing means that sums the plurality of difference images to generate a summed image; position estimation means that estimates the position of the object based on pixel values in the summed image; and partial image generation means that cuts out from the frame image an image of the area around the estimated position.

The smoothing processing means may generate a×k smoothed images L(x, y, σ_i) (i = 1 to a×k) with scales from σ_1 to σ_(a×k) (a and k being integers of 2 or more), and the difference image generation means may generate k difference images G(x, y, σ_j) (j = 1 to k) with scales from σ_1 to σ_k, each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_(j×a)) of scale σ_(j×a).
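
Written out as a formula, and assuming that "based on the difference" means a plain signed subtraction (the text does not rule out an absolute value or a normalization), the difference images of this variant are:

```latex
G(x, y, \sigma_j) \;=\; L(x, y, \sigma_j) \;-\; L(x, y, \sigma_{j \times a}),
\qquad j = 1, \dots, k, \qquad a, k \ge 2 .
```

For instance, with a = 2 and k = 3, six smoothed images L(x, y, σ_1) to L(x, y, σ_6) are generated, and the three difference images pair σ_1 with σ_2, σ_2 with σ_4, and σ_3 with σ_6. The largest index used, k × a, matches the number of smoothed images.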

Alternatively, the smoothing processing means may generate r smoothed images L(x, y, σ_i) (i = 1 to r) with scales from σ_1 to σ_r (r being an integer of 3 or more), and the difference image generation means may generate k−p difference images G(x, y, σ_j) (j = 1 to k−p) with scales from σ_1 to σ_(k−p) (p being an integer of 1 or more), each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_(j+p)) of scale σ_(j+p).
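
Under the same assumption of a plain signed subtraction, this alternative can be written as:

```latex
G(x, y, \sigma_j) \;=\; L(x, y, \sigma_j) \;-\; L(x, y, \sigma_{j + p}),
\qquad j = 1, \dots, k - p, \qquad p \ge 1 .
```

The text gives the number of smoothed images as r but indexes the difference images with k. For every L(x, y, σ_(j+p)) to exist, the largest index j + p = k must not exceed r; reading k as equal to r is one interpretation consistent with this, though the source does not state it.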

The present invention also provides an object discrimination method comprising: a step of executing in cascade a plurality of weak discriminations, each of which obtains a score relating to the presence of a detection target based on a difference between at least one pair of two points in an input image; and a step of generating a difference image between the input image and an image obtained by shifting the input image by a shift amount corresponding to the positional relationship between the two points of the image for which a difference is to be obtained in the weak discrimination, wherein, in at least part of the step of executing the plurality of weak discriminations in cascade, the difference between the at least one pair of two points is acquired by referring to the difference image, and the score is obtained.

The present invention further provides a program for causing a computer to execute: a procedure of executing in cascade a plurality of weak discriminations, each of which obtains a score relating to the presence of a detection target based on a difference between at least one pair of two points in an input image; and a procedure of generating a difference image between the input image and an image obtained by shifting the input image by a shift amount corresponding to the positional relationship between the two points of the image for which a difference is to be obtained in the weak discrimination, wherein, in at least part of the procedure of executing the plurality of weak discriminations in cascade, the program causes the computer to acquire the difference between the at least one pair of two points by referring to the difference image and to obtain the score.

The object discrimination device, method, and program of the present invention generate a difference image between the input image and an image obtained by shifting the input image by a shift amount corresponding to the positional relationship between the two points for which a difference is to be obtained in weak discrimination, and in the weak discrimination the difference is acquired by referring to the difference image. When each weak discrimination performs its own difference calculation, the difference at the same position of the input image may be calculated redundantly before and after a template move. In the present invention, the difference can be acquired simply by referring to the difference image, so redundant difference calculations across the plurality of weak discriminations are avoided and processing can be made faster.

FIG. 1 is a block diagram showing an object discrimination device according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the discriminator.
FIG. 3 is a flowchart showing the operation procedure of the object discrimination device.
FIG. 4 is a flowchart showing the procedure of difference image generation.
FIG. 5 is a block diagram showing a configuration example of the object candidate point detection means.
FIG. 6 is a flowchart showing the operation procedure of the object candidate point detection means.
FIGS. 7(a) and 7(b) are diagrams illustrating basic feature types.
FIG. 8 is a block diagram showing a discriminator used in an object discrimination device according to a second embodiment of the present invention.
FIG. 9 is a block diagram showing a discriminator construction device used to construct the discriminator.
FIG. 10(a) is a block diagram showing the discriminator after learning, and FIG. 10(b) is a block diagram showing the discriminator after rearrangement.
FIG. 11(a) is a block diagram showing the order of the weak discriminators of basic feature type 1, and FIG. 11(b) is a diagram showing the image reference positions of each weak discriminator within the template.
FIG. 12 is a block diagram showing a discriminator construction device used to construct the discriminator in a third embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 shows an object discrimination device according to a first embodiment of the present invention. The object discrimination device 10 includes image input means 11, object candidate point detection means 12, a discriminator 13, a lookup table 14, and difference image generation means 15. The function of each unit in the object discrimination device 10 can be realized by, for example, a computer (processor) executing processing according to a predetermined program. The object discrimination device 10 is incorporated in, for example, a camera, and determines whether an object to be detected exists in an image to be captured by the camera.

The image input means 11 inputs an image to be processed, for example an image of 640 × 480 pixels. The image input means 11 may, for example, sequentially input the images (frames) constituting a moving image at a predetermined rate. The object candidate point detection means 12 estimates the approximate position of the detection target object in the processing target image using a predetermined algorithm. The object candidate point detection means 12 also estimates the size of the object. It cuts out from the processing target image an image of the area around the position where an object is estimated to exist, and enlarges or reduces the cut-out image according to the estimated size. The image input means 11 may apply predetermined image processing to the input processing target image, such as noise removal or suppression of luminance fluctuation between frames, and supply the processed image to the object candidate point detection means 12.

The discriminator 13 receives from the object candidate point detection means 12 the image that the latter has cut out around the position where an object is estimated to exist. The discriminator 13 includes a plurality of weak discriminators, each of which obtains from the input image a score relating to the presence of the detection target. The discriminator (strong discriminator) 13 is formed by cascade-connecting the plurality of weak discriminators. The discriminator 13 compares the sum of the scores obtained by the weak discriminators against a threshold and determines whether an object to be detected exists in the input image.

When the input image is larger than the template, the discriminator 13 raster-scans the template over the input image, cuts out from the input image an image of the template size at each position, and supplies the cut-out images to the weak discriminators in sequence. The discriminator 13 executes the processing of the plurality of weak discriminators at each position of the scanned template. The discriminator 13 executes the processing of each weak discriminator from the first stage to the final stage of the cascaded weak discriminators without early termination.
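
The raster scan described above can be sketched as follows. The function names `scan_positions` and `crop`, the left-to-right, top-to-bottom order, and the `step` parameter are assumptions made for illustration; the source only states that the template is raster-scanned.

```python
from typing import Iterator, List, Sequence, Tuple


def scan_positions(
    image_w: int, image_h: int, template_w: int, template_h: int, step: int = 1
) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corner (tx, ty) of every template position.

    Positions are produced row by row, and only positions where the
    template lies completely inside the image are generated.
    """
    for ty in range(0, image_h - template_h + 1, step):
        for tx in range(0, image_w - template_w + 1, step):
            yield tx, ty


def crop(
    image: Sequence[Sequence[int]], tx: int, ty: int, w: int, h: int
) -> List[List[int]]:
    """Cut out the w x h window whose top-left corner is (tx, ty)."""
    return [list(row[tx:tx + w]) for row in image[ty:ty + h]]


# Example: a 4 x 3 input image scanned with a 2 x 2 template.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
positions = list(scan_positions(4, 3, 2, 2))
windows = [crop(image, tx, ty, 2, 2) for tx, ty in positions]
```

Each element of `windows` would then be passed through every stage of the cascade. Because neighbouring windows overlap heavily, the same pixel pairs are visited repeatedly, which is what motivates precomputing the differences.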

FIG. 2 shows the configuration of the discriminator 13. The discriminator 13 includes a plurality of cascade-connected weak discriminators 16. Each weak discriminator 16 obtains a difference between at least one pair of two points in the input image and obtains, based on the obtained difference, a score relating to the presence of the detection target. The discriminator 13 is generated by machine learning, using images of the template size (for example, 32 × 32 pixels) in which the object to be detected is present and images in which it is absent. Each weak discriminator 16 obtains the difference between at least one pair of two points using one of a plurality of basic feature types for difference calculation. Which basic feature type each weak discriminator 16 uses is determined in the learning process.

The difference obtained by a weak discriminator 16 may be the difference between the pixel values at two pixel positions or the difference between the pixel values of two regions. The difference between regions may be the difference between the sums of the pixel values in the regions or the difference between the averages of the pixel values in the regions. Each weak discriminator 16 obtains a score based on the difference, adds that score to the cumulative score of the preceding weak discriminators 16, and passes the result to the next weak discriminator 16. This process continues to the final-stage weak discriminator 16, and the score finally obtained is the score of the discriminator 13 relating to the presence of the detection target object.

Returning to FIG. 1, the difference image generation means 15 generates a difference image between the input image supplied to the discriminator 13 and an image obtained by shifting that input image by a shift amount corresponding to the positional relationship between the two points for which a weak discriminator 16 (FIG. 2) is to obtain a difference. The difference image generation means 15 sequentially sets a plurality of shift amounts corresponding to the two-point positional relationships for which the various weak discriminators 16 are to obtain differences, and generates a plurality of difference images, one for each shift amount. At least some of the cascade-connected weak discriminators 16 acquire the difference by referring to a difference image and obtain a score based on the acquired difference.

The lookup table 14 holds the relationship between the feature space obtained by the difference calculation in a weak discriminator 16 and the score relating to the presence of the object to be detected. The lookup table 14 is generated for each basic feature type, for example when the discriminator 13 is trained. Each weak discriminator 16 refers to the lookup table prepared for its own basic feature type and obtains a score from the difference it has obtained. For example, when a weak discriminator 16 obtains three pairs of differences (six reference points), it takes the first difference value as α, the second as β, and the third as γ, and obtains (α, β, γ) as a point in the feature space. The weak discriminator 16 refers to the array element [α][β][γ] of the lookup table and acquires the value stored in that element as the score.
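
A table lookup of this kind might look like the sketch below. The source indexes the table directly as [α][β][γ]; the quantization of each signed difference into a small number of bins, the 8-bit pixel range, the bin count, and the names `quantize` and `lookup_score` are assumptions added so that the example table stays small.

```python
from typing import Sequence


def quantize(diff: int, levels: int, max_abs: int = 255) -> int:
    """Map a signed difference in [-max_abs, max_abs] to a bin index 0..levels-1.

    Assumed helper: the source does not say how difference values are
    turned into array indices.
    """
    clamped = max(-max_abs, min(max_abs, diff))
    return (clamped + max_abs) * levels // (2 * max_abs + 1)


def lookup_score(lut, diffs: Sequence[int], levels: int):
    """Return lut[a][b][c]... for the quantized difference values."""
    entry = lut
    for d in diffs:
        entry = entry[quantize(d, levels)]
    return entry


LEVELS = 4
# Placeholder table: in a real system the entries would come from training.
lut = [[[a * 100 + b * 10 + c for c in range(LEVELS)]
        for b in range(LEVELS)]
       for a in range(LEVELS)]

score = lookup_score(lut, (-255, 0, 255), LEVELS)  # alpha, beta, gamma
```

One such table would exist per basic feature type, with as many dimensions as the type has point pairs, so a two-pair type would use a two-dimensional table.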

FIG. 3 shows the operating procedure of the object discrimination device 10. The image input means 11 inputs an image to be processed (step A1). The object candidate point detection means 12 generates, as a partial image, an image of the area around a position in the image to be processed where an object is estimated to exist (step A2). A specific configuration example of the object candidate point detection means 12 and its procedure are described later. The object candidate point detection means 12 generates, for example, a partial image that is larger than the template.

The difference image generation means 15 generates difference images from the partial image generated by the object candidate point detection means 12 (step A3). FIG. 4 shows the procedure for generating the difference images. The difference image generation means 15 receives from the object candidate point detection means 12 the partial image it generated, that is, the image of the area around the position where an object is estimated to exist (step B1).

The difference image generation means 15 sets a relative shift amount for the image (step B2). In step B2, the difference image generation means 15 sets, as the shift amount, the spacing between the two points whose difference a weak discriminator 16 is to calculate (the difference pitch). In other words, it sets a shift amount corresponding to the positional relationship between the two points of the image whose difference the weak discriminator 16 is to obtain. The difference image generation means 15 then obtains the difference between each pixel of the image input in step B1 and the corresponding pixel of that image shifted by the amount set in step B2, and generates a difference image whose pixel values are those difference values (step B3).

The difference image generation means 15 determines whether step B2 has set the shift amounts for all of the two-point positional relationships whose differences are to be obtained by the weak discriminators 16 constituting the discriminator 13 (step B4). When it determines that all shift amounts have been set, it ends difference image generation. When it determines that some shift amount has not yet been set, it returns to step B2 and sets one of the remaining shift amounts. By repeating steps B2 to B4, the difference image generation means 15 sequentially sets the shift amounts corresponding to the plurality of two-point positional relationships whose differences the weak discriminators 16 are to obtain, and generates one difference image for each shift amount.
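Steps B1 to B4 can be sketched as below. This is a minimal illustration, assuming the image is a list of rows and a shift amount is an `(dx, dy)` pair. The border handling (pixels with no shifted partner are left as `None`) and the sign convention of the difference are assumptions; the description specifies neither.

```python
def difference_image(img, dx, dy):
    """Return D with D[y][x] = img[y][x] - img[y + dy][x + dx].

    Pixels whose shifted partner falls outside the image are None
    (an assumed border policy).
    """
    h, w = len(img), len(img[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys, xs = y + dy, x + dx
            if 0 <= ys < h and 0 <= xs < w:
                out[y][x] = img[y][x] - img[ys][xs]
    return out

def build_difference_images(img, shifts):
    """Steps B2 to B4: one difference image per distinct shift amount."""
    return {shift: difference_image(img, *shift) for shift in set(shifts)}

img = [[1, 2, 4],
       [3, 5, 9],
       [0, 7, 8]]
# Two weak discriminators share the shift (1, 0), so it is computed once.
diff_images = build_difference_images(img, [(1, 0), (0, 1), (1, 0)])
```

Because the images are keyed by shift amount, weak discriminators that use the same two-point spacing read from the same difference image instead of recomputing the subtraction.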

Returning to FIG. 3, the discriminator 13 sets the initial position of the template (step A4). For example, the discriminator 13 takes the origin (0, 0) of its input image, that is, the partial image generated by the object candidate point detection means 12, as the initial position of the template. The discriminator 13 cuts out an image of the template size from the input image (step A5), supplies the cut-out image to the first stage of the cascade-connected weak discriminators 16, and starts processing by the weak discriminators 16.

The weak discriminator 16 obtains the difference between at least one pair of two points from the difference images generated in step A3 (step A6). In step A6, the weak discriminator 16 refers to, among the plurality of difference images generated in step A3, the one generated with the shift amount that corresponds to the positional relationship between the two points whose difference is to be obtained. The weak discriminator 16 refers to the lookup table 14 using the obtained difference and obtains a score (step A7). The weak discriminator 16 then adds the score obtained in step A7 to the score accumulated up to the preceding weak discriminator 16 (step A8).

The discriminator 13 determines whether processing has reached the final stage of the cascade-connected weak discriminators 16 (step A9). If the final stage has not been reached, processing advances to the next stage (step A10) and returns to step A6, where the weak discriminator 16 of the next stage continues the processing. In the discriminator 13, the weak discriminator 16 of each stage executes steps A6 to A8 in turn. When the discriminator 13 determines that processing has finished through the final-stage weak discriminator 16, it compares the total score accumulated through the final stage with a predetermined threshold (step A11). If the total score is equal to or greater than the threshold, the discriminator 13 determines that the object to be detected appears in the image portion cut out in step A5, outputs the position of the object (step A12), and ends the processing.

If the discriminator 13 determines in step A11 that the total score is less than the threshold, it determines that no object exists at the current template position and moves the template (step A13). After moving the template, it returns to step A5 and cuts out an image from the input image at the new template position. The discriminator 13 supplies the cut-out image to the first stage of the cascade-connected weak discriminators 16 and executes steps A6 to A10 to determine whether the object to be detected exists at the new template position. The discriminator 13 repeats steps A5 to A11 and step A13 until the template reaches the final scan position, thereby determining whether the object appears in the partial image generated by the object candidate point detection means 12.
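The scan loop of steps A4 to A13 might look like the sketch below. It is illustrative only: each weak discriminator is simplified to a single difference read, described by a hypothetical `(shift, (px, py), score_fn)` tuple, and `score_fn` stands in for the lookup table. The raster scan order and the one-pixel step are also assumptions.

```python
def scan(diff_images, img_w, img_h, tmpl_w, tmpl_h, weak_discriminators, threshold):
    """Slide the template and return the first (x, y) whose total score
    reaches the threshold, or None if no position does."""
    for ty in range(img_h - tmpl_h + 1):              # steps A4 and A13
        for tx in range(img_w - tmpl_w + 1):
            total = 0.0
            for shift, (px, py), score_fn in weak_discriminators:
                d = diff_images[shift][ty + py][tx + px]  # step A6
                total += score_fn(d)                      # steps A7 and A8
            if total >= threshold:                        # step A11
                return (tx, ty)                           # step A12
    return None

# Precomputed difference image for shift (1, 0) of the 4x3 image
# [[0,0,0,0],[0,9,1,0],[0,0,0,0]], with border pixels set to 0.
diff_images = {(1, 0): [[0, 0, 0, 0],
                        [-9, 8, 1, 0],
                        [0, 0, 0, 0]]}
stages = [((1, 0), (0, 0), lambda d: 1.0 if d >= 5 else 0.0)]
hit = scan(diff_images, 4, 3, 2, 2, stages, threshold=1.0)
```

Note that the inner loop only indexes into `diff_images`; no pixel subtraction happens while the template moves, which is the duplicated work the difference images are meant to remove.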

In the procedure shown in FIG. 4, the difference image generation means 15 was described as generating difference images for the shift amounts corresponding to all of the two-point positional relationships whose differences are to be obtained by the weak discriminators 16 constituting the discriminator 13, but the procedure is not limited to this. For example, depending on how frequently each two-point positional relationship is used, difference images need not be generated for those that are used infrequently. When no difference image exists for the positional relationship between the two points whose difference it is to obtain, a weak discriminator 16 may calculate the difference value from the input image instead of referring to a difference image.
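One way to picture this variation is sketched below, assuming shift amounts are `(dx, dy)` pairs and difference images are kept in a dict keyed by shift. The `min_count` cut-off is a hypothetical parameter; the description says only that infrequently used positional relationships may be skipped.

```python
from collections import Counter

def frequent_shifts(shifts, min_count):
    """Shifts used at least `min_count` times (hypothetical cut-off)."""
    counts = Counter(shifts)
    return {s for s, n in counts.items() if n >= min_count}

def get_difference(img, diff_images, shift, x, y):
    """Return img[y][x] - img[y + dy][x + dx], reading a precomputed
    difference image when one exists and computing directly otherwise."""
    cached = diff_images.get(shift)
    if cached is not None:
        return cached[y][x]
    dx, dy = shift
    return img[y][x] - img[y + dy][x + dx]

img = [[1, 2, 4],
       [3, 5, 9]]
used = [(1, 0), (1, 0), (0, 1)]
keep = frequent_shifts(used, min_count=2)
diff_images = {(1, 0): [[-1, -2, None], [-2, -4, None]]}
a = get_difference(img, diff_images, (1, 0), 0, 0)  # read from difference image
b = get_difference(img, diff_images, (0, 1), 2, 0)  # computed from the input image
```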

Next, a specific configuration example of the object candidate point detection means 12 is described. FIG. 5 shows a configuration example of the object candidate point detection means 12. The object candidate point detection means 12 has preprocessing means 21, smoothing processing means 22, difference image generation means 23, summing means 24, position estimation means 25, size estimation means 26, and partial image generation means 27. The object candidate point detection means 12 cuts out, as a partial image, the image of the area around a position where a specific pattern in the image, for example a person's head, is estimated to exist. In the following description, the object candidate point detection means 12 is assumed to estimate one position where an object is estimated to exist and to cut out the surrounding image as a partial image.

The preprocessing means 21 has resolution conversion means 51 and motion region extraction means 52. The resolution conversion means 51 reduces the resolution of the frame images constituting a moving image to a predetermined resolution. For example, the resolution conversion means 51 reduces the image resolution to 1/8 in each of the vertical and horizontal directions.

The motion region extraction means 52 extracts motion regions from the frame images constituting the moving image and generates a motion region extraction image. Any technique can be used to extract the motion regions, for example calculating a difference from a background image or a difference between frames. The motion region extraction means 52 generates, as the motion region extraction image, a grayscale image in which regions with more motion are whiter (higher gradation value) and regions with less motion are blacker (lower gradation value), according to the amount of motion extracted. The motion region extraction means 52 may also apply contrast reduction, for example converting the gradation of a 256-level grayscale image according to a predetermined function so as to reduce the number of gradation levels between white and black. Instead of a grayscale image, the motion region extraction means 52 may generate, as the motion region extraction image, a binarized image in which motion regions are white and background regions are black.
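As one concrete instance of "any technique", the sketch below uses the absolute difference between two consecutive frames. It is only an assumed example: the function name, the clamping to 255, and the optional `binary_threshold` are illustrative choices, and the contrast reduction step is omitted.

```python
def motion_image(prev, cur, binary_threshold=None):
    """Grayscale motion image from the absolute inter-frame difference.

    Larger values (whiter) mean more motion. If `binary_threshold` is
    given, the result is binarized to 255 (motion) or 0 (background).
    """
    out = []
    for row_p, row_c in zip(prev, cur):
        row = [min(abs(c - p), 255) for p, c in zip(row_p, row_c)]
        if binary_threshold is not None:
            row = [255 if v >= binary_threshold else 0 for v in row]
        out.append(row)
    return out

prev = [[10, 10], [10, 10]]
cur = [[10, 200], [10, 12]]
gray = motion_image(prev, cur)
binary = motion_image(prev, cur, binary_threshold=50)
```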

The smoothing processing means 22 receives the image P(x, y) preprocessed by the preprocessing means 21, that is, the image whose resolution has been reduced and from which motion regions have been extracted. By repeatedly convolving a smoothing filter with the image, the smoothing processing means 22 generates a plurality of smoothed images L(x, y, σ_i) of different scales.

The smoothing processing means 22 first convolves the smoothing filter with the image P(x, y) to generate the smoothed image L(x, y, σ_1), and then convolves the smoothing filter with L(x, y, σ_1) to generate the smoothed image L(x, y, σ_2) of scale σ_2. The smoothing processing means 22 repeats the convolution in the same way thereafter, generating from the smoothed image L(x, y, σ_q) of any scale σ_q the smoothed image L(x, y, σ_{q+1}) of the next scale σ_{q+1}.

The scale number i in the smoothed image L(x, y, σ_i) corresponds to the number of times the smoothing filter has been convolved. The smoothing processing means 22 generates, for example, a×k smoothed images of different scales, L(x, y, σ_1) to L(x, y, σ_{a×k}), where a and k are each integers of 2 or more. For example, with a = 2 and k = 30, the smoothing processing means 22 generates 2 × 30 = 60 smoothed images L(x, y, σ_1) to L(x, y, σ_60).
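The repeated convolution can be sketched as follows. The 3×3 kernel here is a plain Gaussian-like kernel chosen for illustration, and the clamped border handling is an assumption; neither is taken from the description.

```python
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]  # illustrative 3x3 kernel; coefficients sum to 16

def smooth_once(img):
    """Convolve the 3x3 kernel with img, clamping coordinates at the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += KERNEL[ky][kx] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out

def smoothed_stack(img, count):
    """Return [L_1, ..., L_count], where L_i is img smoothed i times."""
    stack, cur = [], img
    for _ in range(count):
        cur = smooth_once(cur)
        stack.append(cur)
    return stack

impulse = [[0, 0, 0],
           [0, 16, 0],
           [0, 0, 0]]
stack = smoothed_stack(impulse, 4)  # stack[i - 1] corresponds to scale number i
```

Each image in the stack is produced from the previous one, so the scale number equals the number of convolutions applied.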

A Gaussian filter, for example, can be used as the smoothing filter. The smoothing filter consists of, for example, a 3×3 operator whose filter characteristic matches the contour shape of the object to be detected. For example, if the object to be detected by the discriminator 13 (FIG. 1) is a person's head, a filter whose lower coefficients become smaller along the contour of a person's head (an omega shape) is used as the smoothing filter. Using such a smoothing filter yields smoothing that emphasizes regions having the contour shape of a person's head and suppresses other regions.

The shape of the filter is not limited to the omega shape, and other known techniques, such as the one described in Japanese Patent Application Laid-Open No. 2003-248824, can also be applied. For example, when the object to be detected is circular, triangular, or quadrangular, smoothing may be performed with a smoothing filter whose filter characteristic matches that object shape.

The difference image generation means 23 receives the plurality of smoothed images L(x, y, σ_i) generated by the smoothing processing means 22 and generates a plurality of difference images G(x, y, σ_j), each between two smoothed images of different scales, while varying the scale. Here, the maximum scale number j of the difference images G(x, y, σ_j) is smaller than the maximum scale number of the smoothed images L (for example a×k). The difference image generation means 23 generates, for example, difference images between smoothed images separated by a scale interval that depends on the scale number j. Specifically, the difference image generation means 23 can generate the difference image G(x, y, σ_j) using, for example, Equation 1 below.
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j×a}) ... (1)
The difference image may instead hold the absolute values of the differences.

As the definition in Equation 1 shows, the difference image G(x, y, σ_j) is defined as the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j×a}. For example, with a = 2 and k = 30, the difference image generation means 23 generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_2, σ_2 and σ_4, σ_3 and σ_6, ..., σ_30 and σ_60. When the difference images G(x, y, σ_j) are generated according to Equation 1, j takes values from 1 to k. That is, the difference image generation means 23 generates k difference images G(x, y, σ_1) to G(x, y, σ_k).

Alternatively, the difference image generation means 23 may generate, as the difference images, differences between smoothed images separated by a fixed scale interval. For example, the difference image generation means 23 may generate the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j+p} (p is an integer of 1 or more) as the difference image G(x, y, σ_j). Specifically, the difference image generation means 23 may generate the difference image G(x, y, σ_j) using Equation 2 below.
G(x, y, σ_j) = L(x, y, σ_j) − L(x, y, σ_{j+p}) ... (2)
In this case, if the number of smoothed images is r (r is an integer of 3 or more), j takes values from 1 to r−p. That is, the difference image generation means 23 generates r−p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}). Specifically, when r = 60 and p = 30, the difference image generation means 23 generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_31, σ_2 and σ_32, σ_3 and σ_33, ..., σ_30 and σ_60.

The summing means 24 sums the plurality of difference images G(x, y, σ_j) generated by the difference image generation means 23 to generate a combined image AP(x, y). The position estimation means 25 estimates the position of the object from the pixel values of the combined image AP(x, y). For example, the position estimation means 25 finds the position in the combined image AP(x, y) where the pixel value (the sum of the difference values) is largest and estimates that position to be the position of the object.

The size estimation means 26 compares the pixel values of the plurality of difference images G(x, y, σ_j) and estimates the size of the object to be detected from the scale of the difference image having the largest pixel value. For example, the size estimation means 26 estimates the size of the object from the scale of the smaller-scale one of the two smoothed images from which the difference image having the largest pixel value (difference value) was generated. That is, among the plurality of difference images G(x, y, σ_j) generated according to Equation 1 or Equation 2, the size estimation means 26 finds the scale σ_j having the largest difference value and estimates the size of the object from that scale σ_j.

The estimation of the position and size of the object is explained below. The smoothing processing means 22 generates the smoothed images L(x, y, σ_i) using a smoothing filter whose characteristic matches the object shape, so each smoothed image L(x, y, σ_i) is an image in which regions having the specific shape are emphasized and other regions are suppressed. For example, the contour component of the object remains in the smoothed image even after several tens of smoothing passes, but as the scale σ_i increases, the object's region becomes more blurred and spreads out.

The shape and size of the object in the smoothed image L(x, y, σ_i) are assumed to match the shape and size of the object in the input image. To calculate the saliency of the object's shape and size in a smoothed image, a smoothed image of larger scale is set as the background for the smoothed image of a given scale. That is, for the smoothed image L(x, y, σ_j) of scale σ_j, Equation 1 sets the smoothed image L(x, y, σ_{j×a}) of scale σ_{j×a} as the background image, and Equation 2 sets the smoothed image L(x, y, σ_{j+p}) of scale σ_{j+p} as the background. Then, according to Equation 1 or Equation 2, the difference image G(x, y, σ_j) between the smoothed image of scale σ_j and the smoothed image set as the background is calculated as the saliency of the object in the smoothed image L(x, y, σ_j). In this way, the difference image generation means 23 quantifies the saliency of the object, and the position estimation means 25 and the size estimation means 26 estimate the position and size of the object, respectively, from that quantified saliency.

Here, a difference image in which the object has the ideal shape, that is, the shape that best matches the filter characteristic, and in which the background contains no noise has the largest signal among the difference images. In other words, the difference values in the difference image G(x, y, σ_j) are largest when the components of the pixels making up the object in the preprocessed image P(x, y) have spread until they roughly cover the object's region. For example, when the object in the image P(x, y) is a circular region 10 pixels in diameter, the difference values in the difference image G(x, y, σ_10) with j = 10 (L(x, y, σ_10) − L(x, y, σ_{a×10}) in Equation 1, L(x, y, σ_10) − L(x, y, σ_{10+p}) in Equation 2) are larger than the difference values in the other difference images.

On the other hand, how an object actually appears in an image depends on the positional relationship between the camera and the object, on individual differences, and so on, so the contour shape and size of the object are not necessarily ideal. In other words, the contour shape and size of the object vary. The position estimation means 25 therefore estimates the position of the object using the combined image AP(x, y) obtained by summing a plurality of difference images G(x, y, σ_j). This makes it possible to estimate the position of the object while absorbing the object's variation. That is, for objects ranging from small to large and having various contour shape variations, detecting the maximum value in the combined image AP(x, y) obtained by summing the difference images allows the position to be estimated while absorbing that variation.

As described above, the scale number j in Equations 1 and 2 is a parameter corresponding to the size of the object to be detected in the image P(x, y). When the object is small, the maximum value is detected in a difference image G(x, y, σ_j) with a small scale number j; when the object is large, the maximum value is detected in a difference image G(x, y, σ_j) with a large scale number j. Using this property, the size estimation means 26 compares difference values across the plurality of difference images and estimates the size of the object from the scale number of the difference image having the largest difference value, that is, from the number of smoothing repetitions.

The partial image generation means 27 receives the estimated position of the object from the position estimation means 25 and the estimated size of the object from the size estimation means 26. The partial image generation means 27 cuts out from the input image (frame image), as a partial image, the image of the area around the position where the object is estimated to exist. The partial image generation means 27 also enlarges or reduces the cut-out partial image by a magnification that depends on the estimated size. Enlarging or reducing by a magnification that depends on the estimated size absorbs variation in the size of the object.

FIG. 6 shows the operating procedure of the object candidate point detection means 12. The preprocessing means 21 receives a frame image from the image input means 11 (FIG. 1) and preprocesses it (step C1). That is, the resolution conversion means 51 reduces the frame image to a predetermined resolution, and the motion region extraction means 52 extracts motion regions from the reduced-resolution frame image. The preprocessing means 21 inputs the preprocessed image to the smoothing processing means 22, that is, the image P(x, y) whose resolution has been reduced and which has been converted to grayscale so that motion regions are white and background regions are black. Either or both of the resolution conversion and the motion region extraction in the preprocessing means 21 may be omitted. If both are omitted, the frame image is input to the smoothing processing means 22 as is.

The smoothing processing means 22 receives the image P(x, y) and repeatedly convolves the smoothing filter with it to generate a plurality of smoothed images L(x, y, σ_i) of different scales (step C2). The smoothing processing means 22 may instead convolve the smoothing filter with the frame image itself. The difference image generation means 23 calculates the difference between two smoothed images of different scales and generates the difference images G(x, y, σ_j) (step C3). For example, using Equation 1, the difference image generation means 23 generates, from the a×k smoothed images L(x, y, σ_i), the k difference images G(x, y, σ_1) to G(x, y, σ_k) with scale numbers 1 to k. Alternatively, using Equation 2, it generates, from the r smoothed images L(x, y, σ_i), the r−p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}) with scale numbers 1 to r−p.

The summing means 24 sums the plurality of difference images generated by the difference image generation means 23 to generate the combined image AP(x, y) (step C4). For example, the summing means 24 adds together, pixel by pixel, all of the k difference images G(x, y, σ_1) to G(x, y, σ_k) generated by the difference image generation means 23. The position estimation means 25 estimates the position where the object exists from the combined image AP(x, y) (step C5). For example, the position estimation means 25 compares the pixel values (summed differences) at the pixel positions of the combined image AP(x, y) and estimates the pixel position having the largest pixel value to be the position of the object.

The summing means 24 need not sum all of the difference images. For example, the summing means 24 may sum any number of the k difference images, with any scale numbers. The summing means 24 may change the number of difference images used in the addition (the scales of the difference images to be summed) according to, for example, the range of size variation to be absorbed. For example, the range of size variation to be absorbed may be set according to the type of object to be detected, and for a certain object the difference images with small scale numbers, specifically G(x, y, σ_1) and G(x, y, σ_2) with scale numbers 1 and 2, may be excluded so that the difference images G(x, y, σ_3) to G(x, y, σ_k) with scale numbers 3 to k are summed. The summing means 24 may also sum the difference images G(x, y, σ_j) from scale number 1 up to any scale number smaller than k.
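Steps C4 and C5, including summation over a chosen subset of scales, can be sketched as below. The dict-of-images representation and the function names are assumptions for illustration; ties in the maximum are broken arbitrarily here.

```python
def combined_image(diff_images, scales=None):
    """Sum G(x, y, sigma_j) pixel by pixel over the chosen scale numbers
    (all scale numbers when `scales` is None)."""
    chosen = sorted(diff_images) if scales is None else list(scales)
    first = diff_images[chosen[0]]
    h, w = len(first), len(first[0])
    ap = [[0.0] * w for _ in range(h)]
    for j in chosen:
        for y in range(h):
            for x in range(w):
                ap[y][x] += diff_images[j][y][x]
    return ap

def estimate_position(ap):
    """Return (x, y) of the largest pixel value in the combined image."""
    _, y, x = max((v, y, x) for y, row in enumerate(ap) for x, v in enumerate(row))
    return (x, y)

# Toy difference images keyed by scale number.
g = {1: [[0, 1], [2, 0]],
     2: [[0, 3], [1, 0]],
     3: [[5, 0], [0, 0]]}
pos_all = estimate_position(combined_image(g))            # all scales
pos_small = estimate_position(combined_image(g, [1, 2]))  # scales 1 and 2 only
```

In this toy data the estimated position changes when scale 3 is excluded, which shows how the choice of summed scales affects which object sizes contribute to the estimate.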

The size estimation means 26 estimates the size of the object based on the plurality of difference images G(x, y, σ_j) (step C6). For example, the size estimation means 26 compares, across the k difference images, the pixel values (difference values) of the pixels around the object position estimated by the position estimation means 25, and identifies the scale of the difference image that gives the maximum pixel value. Alternatively, the size estimation means 26 may compare the pixel values of all pixels of the difference images, not only those around the estimated object position, and identify the scale of the difference image that gives the maximum pixel value. Because it is known how far a smoothing process spreads (blurs) the content of an image, once the scale giving the maximum difference is identified, the size of the object can be estimated from that scale number. In addition, since the object to be detected varies in size as described above, the size estimation means 26 may estimate the object size as the size estimated from the difference image with the largest difference value, ±α (where α is a predetermined value).
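Step C6 can be sketched as follows. This is only an illustration under stated assumptions: the neighborhood radius, the mapping from scale number to object size (`size_per_scale`), and the sample values are all hypothetical, since the text only says that the spread caused by each smoothing scale is known.

```python
def estimate_scale(diff_images, position, radius=1):
    """Return the 1-based scale number whose difference image holds the
    largest value within `radius` pixels of `position` (x, y)."""
    px, py = position
    best_scale, best_value = None, None
    for scale_number, image in enumerate(diff_images, start=1):
        height, width = len(image), len(image[0])
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                if best_value is None or image[y][x] > best_value:
                    best_scale, best_value = scale_number, image[y][x]
    return best_scale


def size_range(scale_number, size_per_scale, alpha):
    """Return (size - alpha, size + alpha) for the given scale number."""
    center = size_per_scale[scale_number]
    return (center - alpha, center + alpha)


# Hypothetical difference images for scale numbers 1 to 3.
g1 = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]
g2 = [[0, 2, 0], [2, 7, 2], [0, 2, 0]]
g3 = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
scale = estimate_scale([g1, g2, g3], position=(1, 1))
# Hypothetical object size in pixels for each scale number.
sizes = size_range(scale, {1: 8, 2: 16, 3: 32}, alpha=2)
```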

Using the estimated position and size of the object, the partial image generation means 27 generates, as a partial image, an image of the area around the position in the frame image where the object is estimated to exist (step C7). For example, the partial image generation means 27 cuts out from the frame image the area around the position where the object is estimated to exist, and enlarges or reduces the cut-out image according to the estimated size of the object. Enlarging or reducing according to the estimated size makes the size of the object in the partial image match the size of the object in the template used by the discriminator 13. The partial image generation means 27 outputs the generated partial image to the discriminator 13. The discriminator 13 performs a detailed discrimination process on the partial image generated by the partial image generation means 27 to determine whether the detection target object is present.

As a comparative example, consider object position estimation using DoG (Difference of Gaussians) images. Position estimation with DoG images requires the differences between all pairs of smoothed images at adjacent scales, so a large number of difference images must be generated. With the object candidate point detection means 12 shown in FIG. 5, it suffices to generate, as each difference image, the difference between the smoothed image at a given scale and the smoothed image at a scale a predetermined number of scales away, so fewer difference images are generated than in DoG-based position estimation. The position of the object can therefore be estimated efficiently and accurately. Furthermore, the object candidate point detection means 12 configured as shown in FIG. 5 can estimate the size of the object without generating multi-resolution images, and can therefore estimate the size efficiently.

In particular, when the smoothing processing means 22 generates a × k smoothed images L(x, y, σ_1) to L(x, y, σ_{a×k}), and the difference image generation means 23 uses Expression 1 to obtain, as the difference image G(x, y, σ_j), the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{a×j}) of scale σ_{a×j}, the position of the object can be estimated accurately across a wide range of object sizes. The size of the object can also be estimated accurately.

In the above description, the motion region extraction means 52 performs grayscale conversion or binarization such that the motion region (object) is white and the background region is black, but the operation of the motion region extraction means 52 is not limited to this. For example, the motion region extraction means 52 may perform grayscale conversion or binarization such that the motion region is black and the background region is white. In that case, the position estimation means 25 may estimate, as the position of the object, the pixel position with the minimum pixel value in the summed image AP(x, y). Likewise, the size estimation means 26 may estimate the size of the object based on the scale of the difference image that gives the minimum pixel value (difference value) among the plurality of difference images.

The above description also covers an example in which the object candidate point detection means 12 estimates only one position in the moving image where an object is estimated to exist, but the invention is not limited to this. The object candidate point detection means 12 may estimate the presence of a plurality of objects and cut out, as partial images, the areas around each of the plurality of positions where objects are estimated to exist. For example, let M be the number of objects whose positions the object candidate point detection means 12 should estimate. In that case, the position estimation means 25 may sort the pixel values of the summed image AP(x, y) in descending order, take the M pixel positions with the largest values as the positions of the objects, and cut out the area around each position as a partial image. The size estimation means 26 may estimate the size of each object based on the scale of the difference image that gives the maximum pixel value around each of the M estimated object positions.
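The selection of M candidate positions can be sketched as below. The function name and sample image are hypothetical; the sketch follows the text literally by taking the M largest pixel values.

```python
def top_m_positions(summed, m):
    """Return the (x, y) positions of the m largest values in the summed
    image, ordered from the largest value downward."""
    coords = [(x, y) for y in range(len(summed)) for x in range(len(summed[0]))]
    coords.sort(key=lambda p: summed[p[1]][p[0]], reverse=True)
    return coords[:m]


# Hypothetical summed image AP(x, y).
ap = [
    [1, 9, 2],
    [3, 4, 8],
    [7, 0, 5],
]
candidates = top_m_positions(ap, m=3)
```

Taken literally, the M largest values may fall on neighboring pixels of the same object. The text does not describe any suppression of nearby candidates, so none is shown here.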

Next, the effects of the present embodiment will be described. In the present embodiment, before the processing in the discriminator 13, the differences between two points in the positional relationship required by the weak discriminators 16 are generated in advance as a difference image. Instead of computing differences individually, each weak discriminator 16 refers to the previously generated difference image and reads the difference value from it. If each weak discriminator 16 computed its differences individually, then among the plurality of weak discriminators 16 that compute the difference between two points in the same positional relationship, the difference at the same position of the input image would be computed repeatedly before and after the template is moved. In the present embodiment, the weak discriminator 16 obtains the difference value by referring to the difference image, so the difference between two points in the same positional relationship is not recomputed when the template is moved. Because duplicate difference calculations are avoided, the number of difference calculations is reduced and the discrimination process is faster.
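The idea of a precomputed difference image can be sketched as follows. This is a minimal illustration, assuming the difference is taken as the displaced pixel minus the original pixel; the sign convention, the function names, and the handling of border pixels (stored as None) are assumptions, not details from the specification.

```python
def make_difference_image(image, dx, dy):
    """Build D with D[y][x] = image[y + dy][x + dx] - image[y][x].
    Entries whose displaced partner lies outside the image are None."""
    height, width = len(image), len(image[0])
    diff = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x + dx, y + dy
            if 0 <= sx < width and 0 <= sy < height:
                diff[y][x] = image[sy][sx] - image[y][x]
    return diff


def weak_difference(diff, template_origin, offset):
    """Look up the difference for a pair whose first point sits at
    `offset` inside a template placed at `template_origin`."""
    ox, oy = template_origin
    fx, fy = offset
    return diff[oy + fy][ox + fx]


image = [
    [10, 12, 15, 11],
    [9, 14, 13, 10],
    [8, 8, 9, 12],
]
# Displacement of one pixel in the horizontal direction.
diff = make_difference_image(image, dx=1, dy=0)
# Two template placements that refer to the same input position
# read the same precomputed entry instead of subtracting again.
first = weak_difference(diff, template_origin=(1, 0), offset=(0, 1))
second = weak_difference(diff, template_origin=(0, 1), offset=(1, 0))
```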

A specific example follows. FIGS. 7(a) and 7(b) show basic feature types for the difference calculation in the weak discriminators 16. The basic feature types A and B shown in (a) and (b) are each defined as the differences of three pixel pairs. Basic feature type A in (a) and basic feature type B in (b) differ in how the pairs are positioned relative to one another, but in both types each pair consists of two pixels one pixel apart in the horizontal direction. Suppose that, as a result of learning, the discriminator 13 contains 20 weak discriminators 16 that compute differences with basic feature type A shown in (a) and 20 weak discriminators 16 that compute differences with basic feature type B shown in (b).

When no difference image is used and the template size is 32 × 32 pixels, differences are computed at 20 locations for the weak discriminators 16 of basic feature type A and at 20 locations for the weak discriminators 16 of basic feature type B within this 32 × 32 region. Since basic feature types A and B each compute differences for three pairs, (20 + 20) × 3 = 120 difference calculations are performed within the template. If the template is scanned one pixel at a time over a range of 32 pixels in each of the vertical and horizontal directions, 120 difference calculations are performed at each template position, so the total number of difference calculations is 120 × 32 × 32 = 122,880.

When a difference image is used and the template is scanned over the same range, the area that requires difference processing is 64 × 64 pixels (template size of 32 pixels + scanning width of 32 pixels = 64 pixels), so generating the difference image takes 64 × 64 = 4,096 difference calculations. Because basic feature types A and B both compute the difference of pixel pairs one pixel apart in the horizontal direction, the weak discriminators 16 of basic feature type A and those of basic feature type B can refer to a common difference image. With a difference image, 4,096 difference calculations are enough to serve the weak discriminators 16 of both basic feature types, which is far fewer than when each weak discriminator computes its differences individually. In addition, if the difference image is generated by, for example, raster-scanning the position of the pixel of interest, image references are more localized than when each weak discriminator 16 computes its differences individually, so an improved cache hit rate can be expected. Cache hits make the generation of the difference image efficient.
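The counts in this example can be checked with a few lines of arithmetic. The variable names are illustrative only; the numbers are those given in the text.

```python
weak_per_type = 20       # weak discriminators per basic feature type
feature_types = 2        # basic feature types A and B
pairs_per_feature = 3    # pixel pairs per basic feature type
template_size = 32       # template is 32 x 32 pixels
scan_range = 32          # scanned over 32 pixels vertically and horizontally

# Without a difference image: every template position repeats all subtractions.
per_position = weak_per_type * feature_types * pairs_per_feature
without_diff_image = per_position * scan_range * scan_range

# With a difference image: one subtraction per pixel of the covered area.
area_side = template_size + scan_range
with_diff_image = area_side * area_side

ratio = without_diff_image / with_diff_image
```

Under these figures the shared difference image needs one thirtieth of the subtractions.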

The present embodiment also uses the object candidate point detection means 12, so that image portions likely to contain an object are input to the discriminator 13. Because the discriminator 13 processes image portions with a high probability of containing an object, it is preferable to run the plurality of weak discriminators 16 through to the final stage in one pass, without an early termination decision at each weak discriminator 16. Without early termination, no branch decision occurs at each weak discriminator 16, so the processor pipeline is not disturbed. Omitting early termination also keeps the processing time of the discriminator 13 constant.

Next, a second embodiment of the present invention will be described. The configuration of the object discrimination device in the second embodiment is the same as that of the object discrimination device 10 of the first embodiment shown in FIG. 1. In this embodiment, a plurality of weak discriminators 16 (FIG. 2) of the same basic feature type are cascade-connected consecutively in the discriminator 13. In other respects the second embodiment is the same as the first.

FIG. 8 shows the discriminator used in the object discrimination device of the second embodiment. In the discriminator 13a, as shown for example in FIG. 8, the weak discriminators 16 of basic feature type 1, basic feature type 2, and basic feature type 3 are each grouped together and cascade-connected consecutively. In the discriminator 13a, the group of weak discriminators 16 of basic feature type 1 is followed by the group of basic feature type 2, which is followed by the group of basic feature type 3.

FIG. 9 shows a discriminator construction device 30 used to construct the discriminator 13a. The learning result input means 31 receives a plurality of weak discriminators 16 trained by machine learning. The grouping means 32 divides the plurality of weak discriminators 16 obtained by learning into a plurality of groups according to basic feature type, for example one group per basic feature type. The rearrangement means 33 cascade-connects the plurality of weak discriminators so that weak discriminators 16 belonging to the same group are arranged consecutively, thereby constructing the discriminator 13a. The function of each part of the discriminator construction device 30 can be realized, for example, by a computer executing processing according to a predetermined program.

FIG. 10(a) shows the discriminator after learning, and FIG. 10(b) shows the discriminator after rearrangement. In general, the weak discriminators obtained by learning are ordered by descending weighted correct-answer rate, that is, from most to least effective for discrimination. FIG. 10(a) shows a plurality of weak discriminators cascade-connected in that order. As shown in FIG. 10(b), the rearrangement means 33 reorders the trained weak discriminators so that weak discriminators of the same basic feature type are arranged consecutively in the discriminator 13a. As a result of the reordering, a weak discriminator that formed the first stage of the discriminator after learning (FIG. 10(a)) may end up in a middle stage of the rearranged discriminator 13a (FIG. 10(b)), and a weak discriminator that formed a middle stage after learning may end up in the first stage of the rearranged discriminator 13a.
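The regrouping performed by the grouping means 32 and the rearrangement means 33 can be sketched with a stable sort. This is a minimal illustration, assuming each weak discriminator carries a numeric basic feature type and that groups are ordered by that number; the record type and names are hypothetical. Keeping the learned order within each group is a property of this sketch, not something the text requires.

```python
from collections import namedtuple

# Hypothetical record for a trained weak discriminator.
WeakDiscriminator = namedtuple("WeakDiscriminator", ["name", "feature_type"])


def regroup_by_feature_type(learned):
    """Reorder the cascade so that weak discriminators of the same basic
    feature type are consecutive. Python's sort is stable, so the learned
    order is preserved inside each group."""
    return sorted(learned, key=lambda weak: weak.feature_type)


# Learned order: most effective first, feature types interleaved.
learned = [
    WeakDiscriminator("w0", 2),
    WeakDiscriminator("w1", 1),
    WeakDiscriminator("w2", 3),
    WeakDiscriminator("w3", 1),
    WeakDiscriminator("w4", 2),
]
rearranged = regroup_by_feature_type(learned)
names = [weak.name for weak in rearranged]
```

Here `w0`, the first stage after learning, moves to the middle of the rearranged cascade, as the text describes.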

The lookup table 14 shown in FIG. 1 stores a lookup table generated for each basic feature type, and weak discriminators 16 of the same basic feature type obtain their scores by referring to the same lookup table. The processor that executes the processing of the discriminator 13a normally has a cache memory, and the cache memory holds the portion of the lookup table near the location referenced by a weak discriminator 16.

In a general discriminator (strong discriminator) in which the weak discriminators are cascade-connected in order of effectiveness for discrimination, the basic feature type of the weak discriminator at one stage often differs from that of the weak discriminator at the next stage. In that case, even if part of the lookup table referenced by the weak discriminator at one stage is stored in the cache memory, a cache hit can hardly be expected in the processing of the weak discriminator at the next stage. In contrast, when weak discriminators of the same basic feature type are arranged consecutively, the same lookup table is referenced for as long as weak discriminators 16 of that type are processed in succession, so a higher cache hit probability can be expected.

In the present embodiment, the object discrimination device 10 determines whether a detection target object is present in an image using the discriminator 13a, in which weak discriminators 16 of the same basic feature type are arranged consecutively. Compared with a cascade in which weak discriminators 16 of the same basic feature type are not consecutive, lookup table references are more localized and the probability of a cache hit increases. In addition to the effects of the first embodiment, processing is faster to the extent that the cache hits. In particular, in low-power processing systems such as those used mainly in embedded applications, cache hits have a large effect on processing time, and achieving them can shorten processing time considerably.

Next, a third embodiment of the present invention will be described. The configuration of the object discrimination device in this embodiment is the same as that of the object discrimination device 10 of the first embodiment shown in FIG. 1. In this embodiment, in the strong discriminator 13a (FIG. 8), the plurality of weak discriminators 16 of the same basic feature type are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator 16. In other respects the third embodiment is the same as the second.

FIG. 11(a) shows the order of the weak discriminators of basic feature type 1, and FIG. 11(b) shows the image reference position of each weak discriminator within the template. Basic feature type 1 is assumed to be the difference between two pixels adjacent in the horizontal direction (x direction). FIG. 11(b) shows the image reference positions of some of the plurality of weak discriminators 16 whose difference calculation is of basic feature type 1. As shown in FIG. 11(a), the plurality of weak discriminators 16 of basic feature type 1 are cascade-connected in an order that follows the image reference position used in the difference calculation of each weak discriminator 16.

For example, the plurality of weak discriminators 16 of basic feature type 1 are arranged so that the image reference positions used in their difference calculations appear in raster scan order. In the discriminator 13a shown in FIG. 8, the plurality of weak discriminators 16 of basic feature type 2 and those of basic feature type 3 are arranged in the same way as basic feature type 1, so that the image reference positions used in their difference calculations appear in raster scan order.

FIG. 12 shows a discriminator construction device 30a used to construct the discriminator of the present embodiment. The learning result input means 31 receives a plurality of weak discriminators 16 trained by machine learning. The grouping means 32 divides the plurality of weak discriminators 16 obtained by learning into a plurality of groups according to basic feature type, for example one group per basic feature type. The sorting means 34 sorts the weak discriminators 16 belonging to the same group according to the image reference position used in the difference calculation. The rearrangement means 33 cascade-connects the plurality of weak discriminators group by group in the order produced by the sorting means 34, thereby constructing the discriminator.

For example, the sorting means 34 sorts using, as the image reference position of a weak discriminator 16, the reference position closest to the origin (the upper left of the image) among the plurality of positions that the weak discriminator 16 references in its difference calculation. Specifically, when a weak discriminator 16 obtains three differences (six reference points), the sorting means 34 can sort using the reference point closest to the origin among the six as the image reference position for that weak discriminator 16. Alternatively, any one of the six reference points may be used as the image reference position for sorting. The centroid of the plurality of reference points of the weak discriminator 16, for example the centroid of the six reference points, may also be used as the image reference position for sorting.
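The sort performed by the sorting means 34 can be sketched as follows. This is only an illustration under stated assumptions: "closest to the origin" is measured by squared Euclidean distance, raster order is expressed as sorting by (y, x), and the dictionary layout and names are hypothetical.

```python
def representative_point(points, mode="nearest"):
    """Pick one (x, y) position to stand for a weak discriminator:
    the reference point nearest the origin, or the centroid of all points."""
    if mode == "centroid":
        n = len(points)
        return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    return min(points, key=lambda p: p[0] ** 2 + p[1] ** 2)


def sort_group(group, mode="nearest"):
    """Sort the weak discriminators of one group in raster scan order
    (row by row, left to right) of their representative points."""
    def raster_key(weak):
        x, y = representative_point(weak["points"], mode)
        return (y, x)
    return sorted(group, key=raster_key)


# Hypothetical weak discriminators of one basic feature type,
# each referencing a horizontal pixel pair.
group = [
    {"name": "a", "points": [(10, 5), (11, 5)]},
    {"name": "b", "points": [(2, 1), (3, 1)]},
    {"name": "c", "points": [(7, 1), (8, 1)]},
]
order_nearest = [w["name"] for w in sort_group(group)]
order_centroid = [w["name"] for w in sort_group(group, mode="centroid")]
```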

If the weak discriminators are merely grouped by basic feature type, then even within the same basic feature type the image location referenced in the difference calculation at one stage is often far from the image location referenced at the next stage. In that case, even if the image data near the position referenced by the weak discriminator at one stage is stored in the cache memory, the image cache does not hit when the weak discriminator at the next stage computes its difference.

The present embodiment uses a discriminator 13 in which the plurality of weak discriminators 16 are cascade-connected in an order that follows the image reference position used in the difference calculation. When the weak discriminators 16 are arranged in order of image reference location, a later-stage weak discriminator 16 computes its difference from a location close to the one referenced by the preceding weak discriminator 16, so the image cache may hit. References are thus localized not only for the lookup table but also for the image, and the image accesses needed to obtain the differences become efficient.

In the second embodiment, the weak discriminators are grouped by basic feature type and, for every basic feature type, weak discriminators 16 of the same type are cascade-connected consecutively, but the invention is not limited to this. Weak discriminators 16 of the same basic feature type need not be consecutive for every basic feature type. For example, depending on how frequently each basic feature type is used, some basic feature types may be excluded from grouping, and the weak discriminators 16 of the excluded types need not be cascade-connected consecutively.

In the third embodiment, the weak discriminators 16 are first grouped by basic feature type and then arranged according to the image reference position used in the difference calculation, but the invention is not limited to this. For example, the weak discriminators 16 may be arranged according to the image reference position without grouping by basic feature type. That is, the discriminator 13 may be constructed by cascade-connecting the plurality of weak discriminators 16 in an order that follows the image reference position used in the difference calculation of each weak discriminator 16. Even then, an improved cache hit rate for pixel value references can be expected, and processing can be made faster.

In each of the above embodiments, the discriminator 13 does not perform early termination, but the discriminator 13 may perform early termination. For example, several thousand weak discriminators may be divided into blocks of several hundred weak discriminators each, and the early termination decision may be made block by block. In that case, the weak discriminators may be cascade-connected so that, within the same block, weak discriminators of the same basic feature type are arranged consecutively. Alternatively, within each block the weak discriminators may be arranged in an order that follows the image reference location used in the difference calculation. References are then localized within each block, and processing time is shorter than when the weak discriminators within a block are arranged in order of effectiveness for discrimination. It is also possible to train each block with a different pool of basic feature types and build a strong discriminator composed of a plurality of blocks; in that case, the early termination decision may be made only once, at the end of each block.
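Block-wise early termination can be sketched as below. This is a rough illustration, assuming the scores of the weak discriminators are accumulated and that a window is rejected when the accumulated score falls below a per-block threshold; the rejection rule, the thresholds, and all names are assumptions, since the text does not specify the termination criterion.

```python
def run_blocks(blocks, thresholds, score_fn):
    """Evaluate weak discriminators block by block. The accumulated score
    is compared with a threshold only once, at the end of each block.
    Returns (accepted, accumulated_score)."""
    total = 0.0
    for block, threshold in zip(blocks, thresholds):
        for weak in block:
            total += score_fn(weak)  # no branch between weak discriminators
        if total < threshold:        # single early termination check per block
            return False, total
    return True, total


# For illustration each "weak discriminator" is simply its own score.
blocks = [[1.0, 0.5], [-2.0, 0.25], [3.0]]
rejected = run_blocks(blocks, thresholds=[1.0, 0.0, 0.0], score_fn=lambda s: s)
accepted = run_blocks(blocks, thresholds=[1.0, -1.0, 0.0], score_fn=lambda s: s)
```

In the first call the second block ends with an accumulated score of -0.25, which is below its threshold of 0.0, so the third block is never evaluated.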

The present invention has been described above based on its preferred embodiments. However, the object discrimination device, method, and program of the present invention are not limited to the above embodiments, and configurations obtained by applying various modifications and changes to the above embodiments are also included in the scope of the present invention.

10: Object discrimination device
11: Image input means
12: Object candidate point detection means
13: Discriminator (strong discriminator)
14: Lookup table
15: Difference image generation means
16: Weak discriminator
21: Preprocessing means
22: Smoothing processing means
23: Difference image generation means
24: Summing means
25: Position estimation means
26: Size estimation means
27: Partial image generation means
30: Discriminator configuration device
31: Learning result input means
32: Grouping means
33: Rearrangement means
34: Sorting means
51: Resolution conversion means
52: Motion region extraction means

Claims

An object discrimination device comprising:
a strong discriminator in which a plurality of weak discriminators are cascaded, each weak discriminator obtaining a difference between at least one pair of two points in an input image and obtaining, based on the obtained difference, a score relating to the presence of a detection target; and
difference image generation means that sets a shift amount according to the positional relationship between the two points of the image whose difference is to be obtained by a weak discriminator, and generates a difference image between the input image and an image obtained by shifting the input image by the set shift amount,
wherein at least some of the plurality of weak discriminators obtain the score by acquiring the difference between the at least one pair of two points with reference to the difference image.
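For illustration only, the relationship between a shift amount and a difference image can be sketched in Python as below. The function name `difference_image`, the sign convention (pixel minus shifted pixel), and the use of `None` for positions whose partner falls outside the image are assumptions, not details taken from the claim.

```python
def difference_image(image, dx, dy):
    """Return D with D[y][x] = image[y][x] - image[y + dy][x + dx].

    Positions whose shifted partner lies outside the image are set to None
    (an assumed border policy).
    """
    height, width = len(image), len(image[0])
    diff = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                diff[y][x] = image[y][x] - image[yy][xx]
    return diff


image = [
    [1, 2, 4],
    [7, 11, 16],
    [22, 29, 37],
]
# One difference image serves every point pair separated by (dx, dy) = (1, 0).
d = difference_image(image, dx=1, dy=0)
print(d[1][0])  # same value as image[1][0] - image[1][1]
```

A weak discriminator whose two points are separated by the same shift can then read one value from `d` instead of reading two pixels and subtracting them again.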

The object discrimination device according to claim 1, wherein the strong discriminator scans a template over the input image in a predetermined scanning order and executes the processing of the plurality of weak discriminators at each position of the scanned template.

The object discrimination device according to claim 1, wherein the difference image generation means sequentially sets shift amounts corresponding to the positional relationships of a plurality of pairs of two points whose differences are to be obtained by the plurality of weak discriminators, and generates a plurality of difference images, one for each shift amount.

The object discrimination device according to claim 3, wherein the difference image generation means sequentially sets shift amounts corresponding to the positional relationships of all pairs of two points whose differences are to be obtained by the plurality of weak discriminators, and each of the plurality of weak discriminators acquires the difference between the at least one pair of two points with reference to the difference images.

The object discrimination device according to any one of the above claims, wherein the strong discriminator executes the processing of every weak discriminator, from the first stage to the last stage of the cascaded weak discriminators, without performing early termination.

The object discrimination device according to claim 1, wherein each of the plurality of weak discriminators obtains the difference between the at least one pair of two points using one of a plurality of basic feature types relating to the difference calculation, and, in the strong discriminator, weak discriminators of the same basic feature type are arranged consecutively.

The object discrimination device according to claim 6, wherein, when the strong discriminator contains a plurality of weak discriminators of the same basic feature type, those weak discriminators are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator.

The object discrimination device according to claim 1, wherein the weak discriminators are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator.

The object discrimination device according to claim 1, further comprising object candidate point detection means that estimates the position of an object from an image to be processed, cuts out an image around the estimated object position, and supplies the cut-out image as the input image to each of the strong discriminator and the difference image generation means.

The object discrimination device according to claim 9, wherein the object candidate point detection means comprises:
smoothing processing means that repeatedly convolves an image with a smoothing filter having a filter characteristic corresponding to the contour shape of the object, thereby generating from the frame image a plurality of smoothed images of different scales;
difference image generation means that generates, while varying the scale, a plurality of difference images each between two smoothed images of different scales among the plurality of smoothed images;
summing means that sums the plurality of difference images to generate a summed image;
position estimation means that estimates the position of the object based on pixel values in the summed image; and
partial image generation means that cuts out, from the frame image, an image of a region around the estimated position.

The object discrimination device according to claim 10, wherein the smoothing processing means generates $a \times k$ smoothed images $L(x, y, \sigma_i)$ ($i = 1$ to $a \times k$) with scales $\sigma_1$ to $\sigma_{a \times k}$ ($a$ and $k$ are integers of 2 or more), and the difference image generation means generates $k$ difference images $G(x, y, \sigma_j)$ ($j = 1$ to $k$) with scales $\sigma_1$ to $\sigma_k$, each based on the difference between the smoothed image $L(x, y, \sigma_j)$ of scale $\sigma_j$ and the smoothed image $L(x, y, \sigma_{j \times a})$ of scale $\sigma_{j \times a}$.

The object discrimination device according to claim 10, wherein the smoothing processing means generates $r$ smoothed images $L(x, y, \sigma_i)$ ($i = 1$ to $r$) with scales $\sigma_1$ to $\sigma_r$ ($r$ is an integer of 3 or more), and the difference image generation means generates $k - p$ difference images $G(x, y, \sigma_j)$ ($j = 1$ to $k - p$) with scales $\sigma_1$ to $\sigma_{k-p}$ ($p$ is an integer of 1 or more), each based on the difference between the smoothed image $L(x, y, \sigma_j)$ of scale $\sigma_j$ and the smoothed image $L(x, y, \sigma_{j+p})$ of scale $\sigma_{j+p}$.
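The two scale-pairing rules recited above (scale index $j$ paired with $j \times a$, and scale index $j$ paired with $j + p$) can be compared with a short Python sketch. The function names are hypothetical, and the sketch only enumerates index pairs; it performs no smoothing or subtraction.

```python
def multiplicative_pairs(a, k):
    """Index pairs (j, j*a) for j = 1..k; the largest index used is a*k."""
    return [(j, j * a) for j in range(1, k + 1)]


def offset_pairs(k, p):
    """Index pairs (j, j+p) for j = 1..k-p; the largest index used is k."""
    return [(j, j + p) for j in range(1, k - p + 1)]


mult = multiplicative_pairs(a=2, k=3)
offs = offset_pairs(k=5, p=2)
print(mult)
print(offs)
```

With the multiplicative rule, $k$ difference images require $a \times k$ smoothed images; with the offset rule, the pairs never reach beyond index $k$, so fewer smoothed images are needed for a comparable number of difference images.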

An object discrimination method comprising:
a step of performing in cascade a plurality of weak discriminations, each of which obtains a score relating to the presence of a detection target based on a difference between at least one pair of two points in an input image; and
a step of generating a difference image between the input image and an image obtained by shifting the input image by a shift amount according to the positional relationship between the two points of the image whose difference is to be obtained in the weak discrimination,
wherein, in at least part of the step of performing the plurality of weak discriminations in cascade, the score is obtained by acquiring the difference between the at least one pair of two points with reference to the difference image.

A program for causing a computer to execute:
a procedure of performing in cascade a plurality of weak discriminations, each of which obtains a score relating to the presence of a detection target based on a difference between at least one pair of two points in an input image; and
a procedure of generating a difference image between the input image and an image obtained by shifting the input image by a shift amount according to the positional relationship between the two points of the image whose difference is to be obtained in the weak discrimination,
wherein, in at least part of the procedure of performing the plurality of weak discriminations in cascade, the score is obtained by acquiring the difference between the at least one pair of two points with reference to the difference image.