JP5748472B2 - Object discrimination device, method, and program

Info

Publication number: JP5748472B2
Application number: JP2010278913A
Authority: JP
Inventors: 與那覇　誠
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2010-12-15
Filing date: 2010-12-15
Publication date: 2015-07-15
Anticipated expiration: 2030-12-15
Also published as: JP2012128622A

Description

The present invention relates to an object discrimination device, method, and program, and more particularly to an object discrimination device, method, and program for determining whether an image contains an object to be detected.

Various methods have been proposed for using a computer to detect a predetermined target (object), such as a face, in a digital image such as a photograph. One long-established method of detecting an object in an image is template matching. More recently, attention has turned to building a classifier with a machine learning technique called boosting and using that classifier to detect objects in images. Training a classifier by boosting, and object detection using such a classifier, are described in, for example, Patent Document 1 and Patent Document 2.

In general, a classifier generated by boosting has a plurality of weak classifiers, for example several hundred to several thousand. These weak classifiers are connected in series (cascade-connected) to form a single classifier (strong classifier). A weak classifier is generally defined as a classifier that is only slightly correlated with the true classification. Each weak classifier computes a feature value and obtains a score based on it. The strong classifier compares the sum of the scores obtained by all the cascaded weak classifiers against a predetermined threshold and, when the total score is equal to or greater than the threshold, determines that the object to be detected appears in the image being processed.

Feature computation in a weak classifier is based on the difference in pixel value between two points (two regions). Each weak classifier performs a difference calculation using one of a plurality of basic feature types and obtains, from the input image, a score related to the presence of the detection target. A basic feature type can be defined by the relative positional relationship between two points in the template, for example a difference between two points aligned horizontally, vertically, or diagonally. A basic feature type may also hold several such pairs of points (two pairs, three pairs, and so on), in which case the weak classifier computes a feature value from the combination. Two pairs reference four points, and three pairs reference six points.

An object detection device, for example, raster-scans a 32 × 32 pixel template (window) over a 640 × 480 pixel image in steps of one pixel or several pixels, and supplies the partial image cut out at each template position to the strong classifier. The strong classifier runs the weak classifiers (score calculation) in order from the first stage and, on reaching the final stage, thresholds the sum of the weak classifiers' scores. When the total score is equal to or greater than the threshold, the strong classifier outputs that the object to be detected appears at the 32 × 32 pixel position cut out by the template.
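As a rough illustration of the scan described above, the following Python sketch enumerates the top-left corners of a 32 × 32 window raster-scanned over a 640 × 480 image. The function name `raster_windows` and its parameters are hypothetical and do not come from the patent; the sketch only counts window positions and performs no classification.

```python
def raster_windows(width, height, win=32, step=1):
    """Yield (x, y) top-left corners of a win x win window in raster-scan order.

    Hypothetical helper for illustration; `step` is the scan stride in pixels.
    """
    for y in range(0, height - win + 1, step):      # rows, top to bottom
        for x in range(0, width - win + 1, step):   # columns, left to right
            yield x, y


# Number of window positions for the image and template sizes used in the text.
n_positions = sum(1 for _ in raster_windows(640, 480, win=32, step=1))
print(n_positions)
```

With a one-pixel step this comes to 609 × 449 = 273,441 window positions, each of which would be passed to the strong classifier.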

Patent Document 1: JP 2007-47965 A
Patent Document 2: JP 2007-128127 A

Normally, a strong classifier makes an early-reject decision: at each weak classifier the score accumulated up to that stage is compared with a threshold, and if the score is below the threshold, processing ends without running the later weak classifiers. With early rejection (early termination), an image that clearly does not contain the object to be detected is dismissed at a relatively early stage among the thousands of serially connected weak classifiers, which is faster than running through to the final stage. As also described in Patent Documents 1 and 2, the weak classifiers generated by learning are generally combined linearly in descending order of weighted accuracy to form one strong classifier. In other words, the strong classifier is formed by connecting the classifiers generated by learning in series, in order of their effectiveness for discrimination.
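The early-reject decision described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the cited documents: each weak classifier is modeled as a hypothetical pair `(score_fn, reject_threshold)`, and the names `classify_with_early_reject` and `final_threshold` are assumptions.

```python
def classify_with_early_reject(weak_classifiers, window, final_threshold):
    """Run cascaded weak classifiers with a per-stage early-reject test.

    weak_classifiers: list of (score_fn, reject_threshold) pairs (hypothetical model).
    Returns (is_object, stages_evaluated).
    """
    total = 0.0
    for stage, (score_fn, reject_threshold) in enumerate(weak_classifiers, start=1):
        total += score_fn(window)
        # Conditional branch at every stage: stop as soon as the running
        # score falls below this stage's threshold.
        if total < reject_threshold:
            return False, stage
    # All stages ran: threshold the total score.
    return total >= final_threshold, len(weak_classifiers)
```

The per-stage `if` is the conditional branch whose cost becomes relevant when almost no window is rejected early.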

In recent years, techniques have been developed for rapidly estimating the approximate position and size of an object to be detected. Consider using such a technique as preprocessing for the strong classifier, so that the image of each area extracted by the preprocessing becomes the image processed by the strong classifier. Most of the extracted areas can then be expected to contain the object, so early termination at the initial weak-classifier stages will be rare, and in most cases processing will proceed to near the final stage. Early termination therefore yields little speedup. Rather, making the early-termination decision (a conditional branch) at every weak classifier disturbs pipeline processing (hazards) and hinders speedup.

The idea of early termination rests on a large disparity between the proportions of the image occupied by the detection target object and by the background. That is, it presupposes that most of the image is background and that target objects are few. In a general processing system, the early-termination decision speeds up processing. The present inventor has found, however, that the early-termination decision hinders speedup, as described above, particularly when the strong classifier is applied to portions where an object is highly likely to be present. No technique has previously been known for speeding up processing when the cascaded weak classifiers are all run through to the final stage without early termination.

In view of the above, an object of the present invention is to provide an object discrimination device, method, and program that execute the processing in the weak classifiers efficiently and thereby speed up processing.

To achieve the above object, the present invention provides a first object discrimination device comprising a strong classifier in which a plurality of weak classifiers are cascade-connected, each weak classifier performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, wherein weak classifiers having the same basic feature type are arranged consecutively in the strong classifier.

In the first object discrimination device of the present invention, a lookup table for obtaining the score from the value of the difference calculation may be generated for each basic feature type, and each weak classifier may obtain the score by referring to the lookup table based on the value of the difference calculation.

In the first object discrimination device of the present invention, the plurality of weak classifiers may be trained by machine learning, and the strong classifier may be formed by grouping the trained weak classifiers into a plurality of groups according to basic feature type and cascade-connecting them so that weak classifiers belonging to the same group are arranged consecutively.

When the strong classifier has a plurality of weak classifiers of the same basic feature type, those weak classifiers may be arranged in an order that follows the image reference positions used in their difference calculations. In this case, the weak classifiers of the same basic feature type may be arranged so that the image reference positions of their difference calculations appear in raster-scan order.
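One way to picture this arrangement is as a sort. The Python sketch below is illustrative only: the `WeakClassifier` record and its fields are hypothetical, and it assumes that each weak classifier's image reference position can be summarized by a single anchor point (x, y) in the template, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeakClassifier:
    name: str          # label for illustration only
    feature_type: int  # basic feature type identifier
    x: int             # assumed anchor column of the reference position
    y: int             # assumed anchor row of the reference position


def arrange(weak_classifiers):
    """Group by basic feature type, then order each group in raster-scan
    order of the anchor point (row first, then column)."""
    return sorted(weak_classifiers, key=lambda wc: (wc.feature_type, wc.y, wc.x))


learned = [
    WeakClassifier("h1", 2, 5, 0),
    WeakClassifier("h2", 1, 9, 3),
    WeakClassifier("h3", 1, 2, 3),
    WeakClassifier("h4", 2, 0, 0),
    WeakClassifier("h5", 1, 7, 1),
]
print([wc.name for wc in arrange(learned)])
```

For this sample the arranged order is h5, h3, h2 (type 1, rows 1 then 3) followed by h4, h1 (type 2).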

The present invention also provides a second object discrimination device comprising a strong classifier in which a plurality of weak classifiers are cascade-connected, each weak classifier performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, wherein the weak classifiers are arranged in the strong classifier in an order that follows the image reference positions used in their difference calculations.

In the second object discrimination device of the present invention, the plurality of weak classifiers may be arranged so that the image reference positions of their difference calculations appear in raster-scan order.

The first and second object discrimination devices of the present invention may further comprise object candidate point detection means that estimates the position of an object, cuts out from the image to be processed an image of the area around the estimated position, and supplies it to the strong classifier.

The object candidate point detection means may include: smoothing means that repeatedly convolves the image with a smoothing filter having filter characteristics corresponding to the contour shape of the object, generating from the frame image a plurality of smoothed images of different scales; difference image generation means that generates, while varying the scale, a plurality of difference images, each between two of the smoothed images that differ in scale; summing means that adds the difference images together to generate a summed image; position estimation means that estimates the position of the object based on pixel values in the summed image; and partial image generation means that cuts out from the frame image an image of the region around the estimated position.

The smoothing means may generate a × k smoothed images L(x, y, σ_i) (i = 1 to a × k) at scales σ_1 through σ_{a×k} (a and k being integers of 2 or more), and the difference image generation means may generate k difference images G(x, y, σ_j) (j = 1 to k) at scales σ_1 through σ_k, each based on the difference between the smoothed image L(x, y, σ_j) at scale σ_j and the smoothed image L(x, y, σ_{j×a}) at scale σ_{j×a}.
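The smoothing, differencing, and summing steps can be sketched in pure Python as follows. Several points here are assumptions made only for illustration: a 3 × 3 box filter stands in for the smoothing filter matched to the object's contour, scale σ_i is taken to be the result of i filter passes, each difference is taken as L(σ_j) minus L(σ_{j×a}), and the position estimate is the location of the maximum of the summed image. The function names are hypothetical.

```python
def box_blur(img):
    """One pass of a 3x3 box filter with edge clamping (stand-in smoothing filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def estimate_position(img, a=2, k=2):
    """Return (x, y) of the peak of the summed difference image."""
    h, w = len(img), len(img[0])
    # smoothed[i] holds the image after i filter passes, i = 1 .. a*k.
    smoothed, current = {}, img
    for i in range(1, a * k + 1):
        current = box_blur(current)
        smoothed[i] = current
    # Sum the k difference images G_j = L_j - L_{j*a}, j = 1 .. k.
    summed = [[0.0] * w for _ in range(h)]
    for j in range(1, k + 1):
        fine, coarse = smoothed[j], smoothed[j * a]
        for y in range(h):
            for x in range(w):
                summed[y][x] += fine[y][x] - coarse[y][x]
    # Assumed estimation rule: take the location of the largest summed value.
    _, position = max((summed[y][x], (x, y)) for y in range(h) for x in range(w))
    return position


# 9x9 test image with a bright 3x3 blob centred at (4, 4).
image = [[9.0 if 3 <= x <= 5 and 3 <= y <= 5 else 0.0 for x in range(9)] for y in range(9)]
print(estimate_position(image))
```

For the small test image the peak of the summed image falls on the blob centre, (4, 4).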

Alternatively, the smoothing means may generate r smoothed images L(x, y, σ_i) (i = 1 to r) at scales σ_1 through σ_r (r being an integer of 3 or more), and the difference image generation means may generate k−p difference images G(x, y, σ_j) (j = 1 to k−p) at scales σ_1 through σ_{k−p} (p being an integer of 1 or more), each based on the difference between the smoothed image L(x, y, σ_j) at scale σ_j and the smoothed image L(x, y, σ_{j+p}) at scale σ_{j+p}.

The present invention provides a first object discrimination method that executes in cascade a plurality of weak discriminations, each performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, wherein weak discriminations having the same basic feature type are executed consecutively.

The present invention further provides a first program for causing a computer to execute in cascade a plurality of weak discriminations, each performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, the program causing the computer to execute consecutively those weak discriminations that have the same basic feature type.

The present invention also provides a second object discrimination method that executes in cascade a plurality of weak discriminations, each performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, wherein the weak discriminations are executed in an order that follows the image reference positions used in their difference calculations.

The present invention provides a second program for causing a computer to execute in cascade a plurality of weak discriminations, each performing a difference calculation using one of a plurality of basic feature types and obtaining, from an input image, a score related to the presence of a detection target, the program causing the computer to execute the weak discriminations in an order that follows the image reference positions used in their difference calculations.

The first object discrimination device, method, and program of the present invention execute weak discriminations of the same basic feature type consecutively. Doing so localizes references when each weak discrimination obtains its score for the presence of the object to be detected. Because reference processing becomes more efficient, the determination of whether the object is present in the input image is faster.

The second object discrimination device, method, and program of the present invention execute the weak discriminations in an order that follows the image reference positions used in their difference calculations. Doing so localizes the image references. Because reference processing becomes more efficient, the determination of whether the object to be detected is present in the input image is faster.

FIG. 1 is a block diagram showing an object discrimination device according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the classifier.
FIGS. 3(a) to 3(d) are diagrams illustrating basic feature types.
FIG. 4 is a diagram illustrating parameters that can be set for a basic feature type.
FIG. 5 is a block diagram showing a classifier construction device used to construct the classifier.
FIG. 6(a) is a block diagram showing the classifier after learning, and FIG. 6(b) is a block diagram showing the classifier after rearrangement.
FIG. 7 is a block diagram showing a configuration example of the object candidate point detection means.
FIG. 8 is a flowchart showing the operating procedure of the object candidate point detection means.
FIG. 9(a) is a block diagram showing the order of the weak classifiers of basic feature type 1, and FIG. 9(b) is a diagram showing the image reference positions of those weak classifiers within the template.
FIG. 10 is a block diagram showing a classifier construction device used to construct the classifier in a second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows an object discrimination device according to a first embodiment of the present invention. The object discrimination device 10 comprises image input means 11, object candidate point detection means 12, a classifier 13, and a lookup table 14. The function of each unit in the object discrimination device 10 can be realized by a computer (processor) executing processing according to a predetermined program. The object discrimination device 10 is incorporated in, for example, a camera and determines whether an object to be detected is present in an image to be captured by the camera.

The image input means 11 inputs an image to be processed, for example an image of 640 × 480 pixels. The image input means 11 may, for example, sequentially input the images (frames) of a moving image at a predetermined rate. The object candidate point detection means 12 estimates the approximate position of the detection target object in the image to be processed using a predetermined algorithm. The object candidate point detection means 12 also estimates the size of the object. It cuts out from the image to be processed an image of the area around the position where an object is estimated to be present, and enlarges or reduces the cut-out image according to the estimated size. The image input means 11 may apply predetermined image processing to the input image, such as noise removal or suppression of luminance variation between frames, and supply the processed image to the object candidate point detection means 12.

The classifier 13 receives from the object candidate point detection means 12 the image, cut out by that means, of the area around the position where an object is estimated to be present. The classifier 13 includes a plurality of weak classifiers, each of which obtains from the input image a score related to the presence of the detection target. The classifier (strong classifier) 13 is formed by cascade-connecting the weak classifiers. The classifier 13 thresholds the sum of the scores obtained by the weak classifiers and determines whether the object to be detected is present in the input image.

When the input image is larger than the template, for example, the classifier 13 may raster-scan the template over the input image, cut out a template-sized image at each position, and supply the cut-out image to the weak classifiers to obtain the score. When the input image is the same size as the template, the classifier 13 may supply the input image to the weak classifiers as it is to obtain the score.

FIG. 2 shows the configuration of the classifier 13. The classifier 13 includes a plurality of cascade-connected weak classifiers 15. Each weak classifier 15 performs a difference calculation using one of a plurality of basic feature types. The classifier 13 is generated by machine learning from template-sized images, for example 32 × 32 pixels, some containing the object to be detected and some not. Which basic feature type each weak classifier 15 uses for its difference calculation is determined in the learning process.

In the classifier 13, weak classifiers 15 of the same basic feature type are arranged consecutively. In FIG. 2, the weak classifiers 15 of basic feature type 1, basic feature type 2, and basic feature type 3 are each gathered together and cascade-connected consecutively. The group of weak classifiers 15 of basic feature type 1 is followed by the group of basic feature type 2, which is in turn followed by the group of basic feature type 3.

Each weak classifier 15 refers to the input image and computes the difference in pixel value between at least one pair of points in it. The weak classifier 15 may compute the difference between the pixel values at two pixel positions, or between the pixel values of two regions. For a difference between regions, it may take the difference of the sums of the pixel values in each region or the difference of their averages. Each weak classifier 15 obtains a score based on the computed difference, adds it to the cumulative score of the preceding weak classifiers 15, and passes the result to the next weak classifier 15. This continues through the final weak classifier 15, and the score finally obtained is the score of the classifier 13 for the presence of the detection target object.
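The two kinds of region difference and the running score can be sketched as below. This is an illustrative sketch only: regions are represented as hypothetical `(x, y, width, height)` tuples, the image is a list of rows, and weak classifiers are modeled as plain functions returning a score.

```python
def region_sum(image, x, y, w, h):
    """Sum of the pixel values in the w x h region whose top-left corner is (x, y)."""
    return sum(image[yy][xx] for yy in range(y, y + h) for xx in range(x, x + w))


def region_difference(image, region_a, region_b, use_mean=False):
    """Difference between two regions, by sum or (optionally) by mean."""
    value_a = region_sum(image, *region_a)
    value_b = region_sum(image, *region_b)
    if use_mean:
        value_a /= region_a[2] * region_a[3]
        value_b /= region_b[2] * region_b[3]
    return value_a - value_b


def strong_score(weak_classifiers, image):
    """Pass a running total from stage to stage with no early-reject branch."""
    total = 0.0
    for weak in weak_classifiers:
        total += weak(image)
    return total
```

Here `strong_score` runs every stage unconditionally, which corresponds to the case with no early termination.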

FIGS. 3(a) to 3(d) illustrate basic feature types. Each of the illustrated types performs its difference calculation with three pairs of differences (six reference points: differences between three pairs of pixels or between three pairs of regions). In FIGS. 3(a) to 3(d), two points joined by a dotted arrow are the points between which a difference is computed. In the basic feature types of FIGS. 3(a) and 3(c), differences are computed between two points aligned vertically. In those of FIGS. 3(b) and 3(d), differences are computed between two points aligned horizontally.

FIG. 4 illustrates parameters that can be set for a basic feature type. Here, the six points used for the difference calculation in the basic feature type of FIG. 3(a) are called points Pt0 to Pt5. Three parameters can be set for the basic feature type of FIG. 3(a). The first is the distance d between the two points whose difference is computed. With x as the horizontal coordinate and y as the vertical coordinate, point Pt1 is offset from point Pt0 by d in the y direction. Similarly, point Pt3 is offset from point Pt2 by d in the y direction, and point Pt5 is offset from point Pt4 by d in the y direction.

The remaining two parameters concern the arrangement of the three pairs of points: the horizontal spacing Px between pairs and the vertical shift Py between pairs. In FIG. 4, point Pt2 is offset from point Pt0 by Px in the x direction and Py in the y direction. Point Pt4 is likewise offset from point Pt2 by Px in the x direction and Py in the y direction. Once the coordinates of point Pt0 and these three parameters are fixed, the positions in the input image at which a weak classifier 15 of the FIG. 3(a) basic feature type must compute its differences are determined.
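The relationship between Pt0, d, Px, and Py described above can be written out directly. The function name `reference_points` is hypothetical; the geometry follows the description of FIG. 4 for the basic feature type of FIG. 3(a).

```python
def reference_points(x0, y0, d, px, py):
    """Return [Pt0, Pt1, Pt2, Pt3, Pt4, Pt5] as (x, y) tuples.

    (x0, y0) is Pt0, d the vertical distance within a pair, and (px, py)
    the offset from one pair to the next.
    """
    points = []
    for pair in range(3):
        base_x = x0 + pair * px
        base_y = y0 + pair * py
        points.append((base_x, base_y))      # Pt0, Pt2, Pt4
        points.append((base_x, base_y + d))  # Pt1, Pt3, Pt5
    return points


print(reference_points(2, 3, d=4, px=5, py=1))
# [(2, 3), (2, 7), (7, 4), (7, 8), (12, 5), (12, 9)]
```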

図１に戻り、ルックアップテーブル１４は、弱判別器１５（図２）における差分計算で求まる特徴空間と、検出対象のオブジェクトの存在に関するスコアとの関係を保持する。ルックアップテーブル１４は、例えば判別器１３の学習の際に、基本特徴タイプごとに生成される。各弱判別器１５は、差分計算の計算値に基づいて、自身の基本特徴タイプに対して用意されたルックアップテーブルを参照し、計算した差分からスコアを求める。 Returning to FIG. 1, the lookup table 14 holds the relationship between the feature space obtained by the difference calculation in the weak discriminator 15 (FIG. 2) and the score regarding the presence of the object to be detected. The lookup table 14 is generated for each basic feature type, for example, when the discriminator 13 learns. Each weak discriminator 15 refers to a lookup table prepared for its own basic feature type based on the calculated value of the difference calculation, and obtains a score from the calculated difference.

例えば、図３（ａ）に示す基本特徴タイプで差分計算を行う弱判別器１５は、３組の画素間の差分値から特徴空間を求める。弱判別器１５は、例えば図４の点Ｐｔ０と点Ｐｔ１との差分値をα、点Ｐｔ２と点Ｐｔ３との差分値をβ、点Ｐｔ４と点Ｐｔ５との差分値をγとして、（α，β，γ）を特徴空間として求める。弱判別器１５は、ルックアップテーブルの配列要素［α］［β］［γ］を参照し、その配列要素に格納されている値をスコアとして取得する。 For example, the weak discriminator 15 that performs difference calculation with the basic feature type shown in FIG. 3A obtains a feature space from the difference values between three sets of pixels. For example, the weak discriminator 15 assumes that the difference value between the points Pt0 and Pt1 in FIG. 4 is α, the difference value between the points Pt2 and Pt3 is β, the difference value between the points Pt4 and Pt5 is γ, (α, β, γ) is obtained as a feature space. The weak classifier 15 refers to the array element [α] [β] [γ] of the lookup table, and acquires the value stored in the array element as a score.
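A minimal sketch of the table lookup is given below. The specification does not say how the signed difference values are mapped to array indices, so the `quantize` helper, the bin count of 8, and the 8-bit pixel range are all assumptions made for illustration.

```python
def quantize(diff, bins=8, max_abs=255):
    """Map a signed difference in [-max_abs, max_abs] to an index 0..bins-1.

    Assumption: the text only says the table is indexed by the differences.
    """
    return (diff + max_abs) * bins // (2 * max_abs + 1)


def weak_score(image, points, lut, bins=8):
    """Score one weak discriminator.

    image  : 2D list of pixel values, indexed image[y][x]
    points : [Pt0, ..., Pt5] as (x, y) tuples
    lut    : nested list lut[alpha][beta][gamma] holding scores
    """
    def pixel(p):
        x, y = p
        return image[y][x]

    alpha = quantize(pixel(points[0]) - pixel(points[1]), bins)
    beta = quantize(pixel(points[2]) - pixel(points[3]), bins)
    gamma = quantize(pixel(points[4]) - pixel(points[5]), bins)
    return lut[alpha][beta][gamma]
```

A coarser or finer quantization trades table size (bins cubed entries per basic feature type) against resolution in the feature space.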

図５は、判別器１３の構成に用いられる判別器構成装置３０を示す。学習結果入力手段３１は、機械学習を用いて学習された複数の弱判別器１５を入力する。グループ化手段３２は、学習により得られた複数の弱判別器１５を、基本特徴タイプに応じて複数のグループにグループ化する。グループ化手段３２は、複数の弱判別器１５を、例えば基本特徴タイプごとにグループ化する。再配置手段３３は、同じグループに所属する弱判別器１５が連続して並ぶように複数の弱判別器をカスケード接続し、判別器１３を構成する。判別器構成装置３０の各部の機能は、コンピュータが所定のプログラムに従って処理を実行することで実現可能である。 FIG. 5 shows a discriminator constituting apparatus 30 used for the construction of the discriminator 13. The learning result input means 31 inputs a plurality of weak discriminators 15 learned using machine learning. The grouping means 32 groups the plurality of weak discriminators 15 obtained by learning into a plurality of groups according to the basic feature type. The grouping means 32 groups the plurality of weak classifiers 15 for each basic feature type, for example. The rearrangement unit 33 configures the discriminator 13 by cascading a plurality of weak discriminators so that the weak discriminators 15 belonging to the same group are continuously arranged. The function of each part of the discriminator constituting apparatus 30 can be realized by a computer executing processing according to a predetermined program.

FIG. 6A shows the discriminator after learning, and FIG. 6B shows the discriminator after rearrangement. In general, the weak discriminators obtained by learning are ordered by descending weighted correct-answer rate, that is, from most to least effective for discrimination. FIG. 6A shows a state in which the weak discriminators are cascade-connected in that order. As shown in FIG. 6B, the rearrangement unit 33 reorders the learned weak discriminators so that weak discriminators with the same basic feature type are placed consecutively in the discriminator 13. As a result of this reordering, a weak discriminator that formed the first stage of the discriminator after learning (FIG. 6A) may be placed in a middle stage of the rearranged discriminator 13 (FIG. 6B), and a weak discriminator that formed a middle stage after learning may be placed in the first stage of the rearranged discriminator 13.
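The rearrangement can be sketched as a stable sort on the basic feature type. The dictionary representation of a weak discriminator and the choice to order the groups by first appearance are assumptions; the specification only requires that members of the same group end up consecutive.

```python
def regroup_by_feature_type(weak_discriminators):
    """Reorder weak discriminators so equal feature types are consecutive.

    Each weak discriminator is a dict with a "type" key (hypothetical layout).
    Python's sort is stable, so the learned order is kept inside each group.
    """
    first_seen = {}
    for wd in weak_discriminators:
        # Remember the order in which each feature type first appears.
        first_seen.setdefault(wd["type"], len(first_seen))
    return sorted(weak_discriminators, key=lambda wd: first_seen[wd["type"]])
```

Keeping same-type discriminators adjacent means the same lookup table is reused across consecutive stages.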

続いて、オブジェクト候補点検出手段１２の具体的な構成例を説明する。図７は、オブジェクト候補点検出手段１２の構成例を示す。オブジェクト候補点検出手段１２は、前処理手段２１、平滑化処理手段２２、差分画像生成手段２３、合算手段２４、位置推定手段２５、サイズ推定手段２６、及び部分画像生成手段２７を有する。オブジェクト候補点検出手段１２は、動画像内の特定パターン、例えば人物の頭部が存在すると推定される位置の周辺の画像を部分画像として切り出す。以下ではオブジェクト候補点検出手段１２が、オブジェクトが存在すると推定される位置を１つ推定し、その周辺の画像を部分画像として切り出すものとして説明を行う。 Next, a specific configuration example of the object candidate point detection unit 12 will be described. FIG. 7 shows a configuration example of the object candidate point detection means 12. The object candidate point detection unit 12 includes a preprocessing unit 21, a smoothing processing unit 22, a difference image generation unit 23, a summation unit 24, a position estimation unit 25, a size estimation unit 26, and a partial image generation unit 27. The object candidate point detection unit 12 cuts out a specific pattern in the moving image, for example, an image around a position where a human head is estimated to exist as a partial image. In the following description, it is assumed that the object candidate point detection means 12 estimates one position where an object is estimated to exist and cuts out the surrounding image as a partial image.

前処理手段２１は、解像度変換手段５１と動き領域抽出手段５２とを有する。解像度変換手段５１は、動画像を構成するフレーム画像を所定の解像度に低解像度化する。解像度変換手段５１は、例えば画像の解像度を縦横それぞれ１／８倍に変換する。 The preprocessing unit 21 includes a resolution conversion unit 51 and a motion region extraction unit 52. The resolution conversion means 51 lowers the frame image constituting the moving image to a predetermined resolution. The resolution conversion means 51 converts, for example, the resolution of an image to 1/8 times in the vertical and horizontal directions.

The motion region extraction unit 52 extracts motion regions from the frame images constituting the moving image and generates a motion region extraction image. Any method can be used to extract motion regions, such as calculating the difference from a background image or the difference between frames. According to the amount of motion extracted, the motion region extraction unit 52 generates, as the motion region extraction image, a grayscale image in which regions with more motion are whiter (higher gradation value) and regions with less motion are blacker (lower gradation value). The motion region extraction unit 52 may also apply contrast reduction, for example converting the gradations of a 256-level grayscale image according to a predetermined function so that the number of gradations from white to black is reduced. Instead of a grayscale image, the motion region extraction unit 52 may generate a binarized image in which motion regions are white and background regions are black.
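As one possible reading of the inter-frame difference option, the sketch below maps the absolute difference between two frames to a grayscale value, with an optional threshold for the binarized variant. The function name and the `threshold` parameter are illustrative assumptions; the specification leaves the extraction method open.

```python
def motion_region_image(prev_frame, curr_frame, threshold=None):
    """Return a motion image: brighter pixels mean more motion.

    prev_frame, curr_frame : 2D lists of 8-bit gray values, same shape
    threshold              : if given, output is binarized to 0 or 255
    """
    out = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        row = []
        for p, c in zip(prev_row, curr_row):
            diff = min(255, abs(c - p))  # amount of change at this pixel
            if threshold is not None:
                diff = 255 if diff >= threshold else 0
            row.append(diff)
        out.append(row)
    return out
```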

The image P(x, y) preprocessed by the preprocessing unit 21, that is, the image whose resolution has been reduced and from which motion regions have been extracted, is input to the smoothing processing unit 22. The smoothing processing unit 22 generates a plurality of smoothed images L(x, y, σ_i) of different scales by repeatedly convolving a smoothing filter with the image.

The smoothing processing unit 22 first generates a smoothed image L(x, y, σ_1) by convolving the smoothing filter with the image P(x, y), and then generates a smoothed image L(x, y, σ_2) of scale σ_2 by convolving the smoothing filter with L(x, y, σ_1) again. It repeats the convolution in the same manner, generating the smoothed image L(x, y, σ_{q+1}) of the next scale σ_{q+1} from the smoothed image L(x, y, σ_q) of any scale σ_q.

The scale number i of a smoothed image L(x, y, σ_i) corresponds to the number of times the smoothing filter has been convolved. The smoothing processing unit 22 generates, for example, a×k smoothed images L(x, y, σ_1) to L(x, y, σ_{a×k}) of different scales, where a and k are each integers of 2 or more. For example, with a = 2 and k = 30, it generates 2 × 30 = 60 smoothed images L(x, y, σ_1) to L(x, y, σ_60).
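The repeated smoothing can be sketched in plain Python as below. The 3 × 3 binomial kernel and the border handling (edge pixels are replicated) are assumptions chosen for the example; the specification's object-shaped kernel coefficients are not given in this section.

```python
# Assumed 3x3 Gaussian-like kernel; its coefficients sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def smooth_once(img, kernel=KERNEL):
    """Apply the 3x3 kernel once, replicating pixels at the border.

    The kernel is applied without flipping, which equals convolution
    for the symmetric kernel used here.
    """
    h, w = len(img), len(img[0])
    norm = sum(sum(row) for row in kernel)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * img[yy][xx]
            out[y][x] = acc / norm
    return out


def smoothed_stack(img, count):
    """Return [L_1, ..., L_count]; L_i is img smoothed i times."""
    stack, current = [], img
    for _ in range(count):
        current = smooth_once(current)
        stack.append(current)
    return stack
```

With a = 2 and k = 30, `smoothed_stack(P, 60)` would produce the 60 smoothed images, where `stack[i - 1]` corresponds to L(x, y, σ_i).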

A Gaussian filter, for example, can be used as the smoothing filter. The smoothing filter consists of, for example, a 3 × 3 operator whose filter characteristics match the contour shape of the object to be detected. For example, if the object to be detected by the discriminator 13 (FIG. 1) is a person's head, a filter whose lower coefficients become smaller along the contour of a head (an omega shape) is used as the smoothing filter. With such a filter, the smoothing emphasizes regions that have the contour shape of a person's head and suppresses other regions.

なお、フィルタの形状はオメガ形状には限定されず、例えば特開２００３−２４８８２４号公報等に記載されたものなど、他の公知技術を適用することも可能である。例えば検出対象のオブジェクトの形状が円形、三角形、四角形などの場合には、それぞれのオブジェクト形状に合わせたフィルタ特性を有する平滑化フィルタを用いて平滑化処理を施せばよい。 The shape of the filter is not limited to the omega shape, and other known techniques such as those described in Japanese Patent Application Laid-Open No. 2003-248824 can be applied. For example, when the object to be detected has a circular shape, a triangular shape, a quadrangular shape, or the like, the smoothing process may be performed using a smoothing filter having a filter characteristic matched to each object shape.

The difference image generation unit 23 receives the plurality of smoothed images L(x, y, σ_i) generated by the smoothing processing unit 22 and generates a plurality of difference images G(x, y, σ_j), each between two smoothed images of different scales, while changing the scale. Here, the maximum value of the scale number j of the difference images G(x, y, σ_j) is smaller than the maximum scale number i of the smoothed images L (for example, a×k). The difference image generation unit 23 generates, for example, difference images between smoothed images separated by a scale interval that depends on the scale number j. Specifically, it can generate the difference image G(x, y, σ_j) using, for example, Equation 1 below.
$$G(x, y, \sigma_j) = L(x, y, \sigma_j) - L(x, y, \sigma_{j \times a}) \tag{1}$$
The difference image may instead hold the absolute value of the difference.

As Equation 1 shows, the difference image G(x, y, σ_j) is defined as the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j×a}. For example, with a = 2 and k = 30, the difference image generation unit 23 generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_2, σ_2 and σ_4, σ_3 and σ_6, ..., σ_30 and σ_60. When difference images are generated according to Equation 1, j takes the values 1 to k. That is, the difference image generation unit 23 generates k difference images G(x, y, σ_1) to G(x, y, σ_k).
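The pairing of scales under Equation 1 and the per-pixel subtraction can be sketched as follows; the function names are illustrative only.

```python
def scale_pairs_eq1(a, k):
    """Scale-number pairs (j, j*a) for j = 1..k, as in Equation 1."""
    return [(j, j * a) for j in range(1, k + 1)]


def difference_image(fine, coarse, absolute=False):
    """Per-pixel difference fine - coarse of two equally sized 2D lists.

    With absolute=True the absolute value of each difference is stored.
    """
    diff = [[f - c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(fine, coarse)]
    if absolute:
        diff = [[abs(v) for v in row] for row in diff]
    return diff
```

Given a list `stack` in which `stack[i - 1]` holds L(x, y, σ_i), G(x, y, σ_j) would be `difference_image(stack[j - 1], stack[j * a - 1])` for each pair.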

Instead of the above, the difference image generation unit 23 may generate, as the difference images, the differences between smoothed images separated by a fixed scale interval. For example, it may generate the difference between the smoothed image of scale σ_j and the smoothed image of scale σ_{j+p} (p is an integer of 1 or more) as the difference image G(x, y, σ_j). Specifically, it may generate the difference image G(x, y, σ_j) using Equation 2 below.
$$G(x, y, \sigma_j) = L(x, y, \sigma_j) - L(x, y, \sigma_{j + p}) \tag{2}$$
In this case, if the number of smoothed images is r (r is an integer of 3 or more), j takes the values 1 to r−p. That is, the difference image generation unit 23 generates r−p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}). Specifically, with r = 60 and p = 30, it generates 30 difference images G(x, y, σ_1) to G(x, y, σ_30) from the scale pairs σ_1 and σ_31, σ_2 and σ_32, σ_3 and σ_33, ..., σ_30 and σ_60.
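The fixed-interval pairing of Equation 2 can be enumerated in the same way. The function name and the argument check are illustrative assumptions.

```python
def scale_pairs_eq2(r, p):
    """Scale-number pairs (j, j+p) for j = 1..r-p, as in Equation 2.

    r : number of smoothed images
    p : fixed scale interval, 1 <= p < r
    """
    if not 1 <= p < r:
        raise ValueError("p must satisfy 1 <= p < r")
    return [(j, j + p) for j in range(1, r - p + 1)]
```

With r = 60 and p = 30 this yields the 30 pairs (1, 31) through (30, 60); with p = 1 it reduces to differences between adjacent scales.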

The summing unit 24 sums the plurality of difference images G(x, y, σ_j) generated by the difference image generation unit 23 to generate a summed image AP(x, y). The position estimation unit 25 estimates the position of the object from the pixel values of the summed image AP(x, y). For example, it finds the position at which the pixel value (the total of the difference values) of the summed image AP(x, y) is largest and takes that position as the estimated position of the object.
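Summation and position estimation can be sketched as below. Returning the first maximum in raster order when several pixels tie is an assumption; the specification does not address ties.

```python
def sum_images(images):
    """Pixel-wise sum of equally sized 2D lists, giving AP(x, y)."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) for x in range(w)]
            for y in range(h)]


def argmax_position(ap):
    """Return (x, y) of the largest value in the summed image."""
    best_pos, best_val = (0, 0), ap[0][0]
    for y, row in enumerate(ap):
        for x, v in enumerate(row):
            if v > best_val:
                best_pos, best_val = (x, y), v
    return best_pos
```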

The size estimation unit 26 compares the pixel values of the plurality of difference images G(x, y, σ_j) and estimates the size of the object to be detected from the scale of the difference image having the maximum pixel value. For example, it estimates the object size from the scale of the smaller-scale image of the two smoothed images from which the difference image having the maximum pixel value (difference value) was generated. That is, among the difference images G(x, y, σ_j) generated according to Equation 1 or Equation 2, the size estimation unit 26 finds the scale σ_j giving the maximum difference value and estimates the size of the object from that scale σ_j.

The estimation of the position and size of the object is explained below. The smoothing processing unit 22 generates the smoothed images L(x, y, σ_i) using a smoothing filter whose characteristics match the object shape, so each smoothed image L(x, y, σ_i) is an image in which regions of a particular shape are emphasized and other regions are suppressed. For example, the contour component of the object remains in the smoothed image even after several tens of smoothing passes, but as the scale σ_i increases, the object region becomes more blurred and spreads out.

It is assumed that the shape and size of the object in the smoothed image L(x, y, σ_i) match the shape and size of the object in the input image. To calculate the saliency of the object shape and size in the smoothed image L(x, y, σ_i), a smoothed image of a larger scale is set as the background for the smoothed image of a given scale. That is, for the smoothed image L(x, y, σ_j) of scale σ_j, Equation 1 sets the smoothed image L(x, y, σ_{j×a}) of scale σ_{j×a} as the background image, and Equation 2 sets the smoothed image L(x, y, σ_{j+p}) of scale σ_{j+p} as the background image. Then, according to Equation 1 or Equation 2, the difference image G(x, y, σ_j) between the smoothed image of scale σ_j and the smoothed image set as its background is calculated as the saliency of the object in L(x, y, σ_j). In this way, the difference image generation unit 23 quantifies the saliency of the object, and the position estimation unit 25 and the size estimation unit 26 estimate the position and size of the object, respectively, from that quantified saliency.

Here, the difference image in which the object has the ideal shape, that is, the shape that best matches the filter characteristics, and in which the background has no noise, has the largest signal among the difference images. In other words, the difference value in a difference image G(x, y, σ_j) is largest when the components of the pixels making up the object in the preprocessed image P(x, y) have spread until they roughly equal the area of the object. For example, if the object in the image P(x, y) is a circular region 10 pixels in diameter, then among the difference images, the difference value in the j = 10 difference image G(x, y, σ_10) (L(x, y, σ_10) − L(x, y, σ_{a×10}) under Equation 1, L(x, y, σ_10) − L(x, y, σ_{10+p}) under Equation 2) is larger than the difference values in the other difference images.

On the other hand, an object actually captured in an image appears differently depending on the positional relationship between the camera and the object, individual differences, and so on, so its contour shape and size do not necessarily match the ideal shape. In other words, the contour shape and size of the object vary. The position estimation unit 25 therefore estimates the position of the object using the summed image AP(x, y) obtained by summing the plurality of difference images G(x, y, σ_j). This makes it possible to estimate the position while absorbing variation in the object. That is, for objects ranging from small to large and having various contour variations, detecting the maximum value in the summed image AP(x, y), which is the sum of the difference images, allows the position to be estimated while absorbing that variation.

As described above, the scale number j in Equations 1 and 2 is a parameter corresponding to the size of the object to be detected in the image P(x, y). When the object is small, the maximum value is found in a difference image G(x, y, σ_j) with a small scale number j; when the object is large, the maximum value is found in a difference image G(x, y, σ_j) with a large scale number j. Using this property, the size estimation unit 26 compares the difference values across the plurality of difference images and estimates the size of the object from the scale number of the difference image giving the largest difference value, that is, from the number of repetitions of the smoothing process.

部分画像生成手段２７は、位置推定手段２５から推定されたオブジェクトの位置を入力し、サイズ推定手段２６から推定されたオブジェクトのサイズを入力する。部分画像生成手段２７は、入力画像（フレーム画像）からオブジェクトが存在すると推定される位置の周辺の画像を部分画像として切り出す。また部分画像生成手段２７は、切り出した部分画像を、推定されたサイズに応じた倍率で拡大／縮小する。推定されたサイズに応じた倍率で拡大／縮小することで、オブジェクトのサイズの変動を吸収することができる。 The partial image generation unit 27 inputs the position of the object estimated from the position estimation unit 25 and inputs the size of the object estimated from the size estimation unit 26. The partial image generating means 27 cuts out an image around a position where an object is estimated to exist from the input image (frame image) as a partial image. Further, the partial image generation means 27 enlarges / reduces the cut out partial image at a magnification according to the estimated size. By enlarging / reducing by a magnification according to the estimated size, a change in the size of the object can be absorbed.

図８は、オブジェクト候補点検出手段１２の動作手順を示す。前処理手段２１は、画像入力手段１１（図１）からフレーム画像を受け取り、フレーム画像に対して前処理を行う（ステップＳ１）。すなわち、解像度変換手段５１がフレーム画像を所定の解像度にまで低解像度化し、動き領域抽出手段５２が低解像度化されたフレーム画像から動き領域を抽出する。前処理手段２１は、前処理後の画像、すなわち解像度が低解像度化され、動き領域が白で背景領域が黒となるようにグレースケール化された画像Ｐ（ｘ，ｙ）を平滑化処理手段２２に入力する。なお、前処理手段２１における解像度変換及び動き領域抽出の何れか一方、又は双方を省略しても構わない。双方を省略する場合、フレーム画像を平滑化処理手段２２に入力すればよい。 FIG. 8 shows an operation procedure of the object candidate point detection means 12. The preprocessing unit 21 receives the frame image from the image input unit 11 (FIG. 1) and performs preprocessing on the frame image (step S1). That is, the resolution conversion means 51 lowers the frame image to a predetermined resolution, and the motion area extraction means 52 extracts a motion area from the reduced resolution frame image. The pre-processing means 21 smoothes the pre-processed image, that is, the image P (x, y) gray-scaled so that the resolution is reduced and the motion area is white and the background area is black. 22 is input. Note that either one or both of resolution conversion and motion region extraction in the preprocessing unit 21 may be omitted. When both are omitted, the frame image may be input to the smoothing processing means 22.

The smoothing processing unit 22 receives the image P(x, y) and repeatedly convolves the smoothing filter with it to generate a plurality of smoothed images L(x, y, σ_i) of different scales (step S2). The smoothing processing unit 22 may instead convolve the smoothing filter with the frame image itself. The difference image generation unit 23 calculates the difference between two smoothed images of different scales to generate the difference images G(x, y, σ_j) (step S3). For example, using Equation 1, it generates k difference images G(x, y, σ_1) to G(x, y, σ_k) with scale numbers 1 to k from a×k smoothed images L(x, y, σ_i). Alternatively, using Equation 2, it generates r−p difference images G(x, y, σ_1) to G(x, y, σ_{r−p}) with scale numbers 1 to r−p from r smoothed images L(x, y, σ_i).

The summing unit 24 sums the plurality of difference images generated by the difference image generation unit 23 to generate the summed image AP(x, y) (step S4). For example, it adds together the pixel values of all k difference images G(x, y, σ_1) to G(x, y, σ_k) generated by the difference image generation unit 23. The position estimation unit 25 estimates the position at which the object exists from the summed image AP(x, y) (step S5). For example, it compares the pixel values (summed difference values) at the pixel positions of the summed image AP(x, y) and takes the pixel position with the largest pixel value as the estimated position of the object.

Note that the summing unit 24 does not have to sum all of the difference images. For example, it may sum any number of the k difference images, with any scale numbers. The summing unit 24 may change the number of difference images used in the addition (the scales of the difference images to be summed) according to, for example, the range of size variation to be absorbed. For example, the range of size variation to be absorbed may be set according to the type of object to be detected, and for a certain object the difference images with small scale numbers, specifically G(x, y, σ_1) and G(x, y, σ_2) with scale numbers 1 and 2, may be excluded so that the difference images G(x, y, σ_3) to G(x, y, σ_k) with scale numbers 3 to k are summed. The summing unit 24 may also sum the difference images G(x, y, σ_j) from scale number 1 up to any scale number smaller than k.

The size estimation unit 26 estimates the size of the object from the plurality of difference images G(x, y, σ_j) (step S6). For example, it compares, across the k difference images, the pixel values (difference values) of the pixels around the object position estimated by the position estimation unit 25, and identifies the scale of the difference image that gives the largest pixel value. Alternatively, the size estimation unit 26 may compare the pixel values of all pixels of the difference images, not only those around the estimated object position, and identify the scale of the difference image that gives the largest pixel value. How far a feature in the image spreads (blurs) under the smoothing process is known, so once the scale giving the maximum difference is found, the size of the object can be estimated from that scale number. Because the object to be detected varies as described above, the size estimation unit 26 may also estimate the object size as the size estimated from the difference image with the largest difference value ± α (α is a predetermined value).
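The scale selection in step S6 can be sketched as follows. The square neighborhood and its `radius` parameter are assumptions; the specification says only that pixels around the estimated position are compared. Converting the returned scale number to a size in pixels depends on the filter's known spreading behavior and is not shown.

```python
def best_scale(diff_images, pos, radius=1):
    """Return the scale number j whose difference image has the largest
    value within a square neighborhood of pos.

    diff_images : list where diff_images[j - 1] is G(x, y, sigma_j)
    pos         : (x, y) position estimated from the summed image
    radius      : half-width of the neighborhood (assumed parameter)
    """
    x0, y0 = pos
    best_j, best_val = None, None
    for j, g in enumerate(diff_images, start=1):
        h, w = len(g), len(g[0])
        local_max = max(
            g[y][x]
            for y in range(max(0, y0 - radius), min(h, y0 + radius + 1))
            for x in range(max(0, x0 - radius), min(w, x0 + radius + 1))
        )
        if best_val is None or local_max > best_val:
            best_j, best_val = j, local_max
    return best_j
```

Choosing a radius that covers the whole image corresponds to the alternative of comparing all pixels of each difference image.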

部分画像生成手段２７は、推定されたオブジェクトの位置及びサイズを利用して、フレーム画像におけるオブジェクトが存在すると推定される位置の周辺の画像を部分画像として生成する（ステップＳ７）。部分画像生成手段２７は、例えばフレーム画像からオブジェクトが存在すると推定される位置の周辺の画像を切り出し、切り出した画像を、推定されたオブジェクトのサイズに応じて拡大／縮小する。推定されたオブジェクトのサイズに応じて拡大／縮小を行うことで、部分画像におけるオブジェクトの大きさを、判別器１３で使用されるテンプレートにおけるオブジェクトの大きさに適合させることができる。部分画像生成手段２７は、生成した部分画像を判別器１３へ出力する。判別器１３は、部分画像生成手段２７により生成された部分画像に対して、検出対象のオブジェクトの存在に関する詳細な判別処理を実行する。 The partial image generating unit 27 generates an image around the position where the object in the frame image is estimated to exist as a partial image using the estimated position and size of the object (step S7). For example, the partial image generation unit 27 cuts out an image around a position where an object is estimated to exist from a frame image, and enlarges / reduces the cut-out image according to the estimated size of the object. By enlarging / reducing according to the estimated size of the object, the size of the object in the partial image can be adapted to the size of the object in the template used in the discriminator 13. The partial image generating means 27 outputs the generated partial image to the discriminator 13. The discriminator 13 performs detailed discrimination processing on the presence of the detection target object on the partial image generated by the partial image generation unit 27.
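A minimal sketch of step S7 is shown below, using nearest-neighbor sampling so that cropping and scaling happen in one pass. The square window, the `template_size` parameter, and the sampling method are assumptions; the specification states only that the cropped image is enlarged or reduced according to the estimated size.

```python
def crop_and_resize(frame, center, obj_size, template_size):
    """Crop an obj_size x obj_size window around center and resample it
    to template_size x template_size with nearest-neighbor sampling.

    frame  : 2D list indexed frame[y][x]
    center : (x, y) estimated object position in frame coordinates
    """
    cx, cy = center
    h, w = len(frame), len(frame[0])
    step = obj_size / template_size  # source pixels per output pixel
    left, top = cx - obj_size / 2, cy - obj_size / 2
    out = []
    for ty in range(template_size):
        # Sample at the center of each output pixel, clamped to the frame.
        sy = min(max(int(top + (ty + 0.5) * step), 0), h - 1)
        row = []
        for tx in range(template_size):
            sx = min(max(int(left + (tx + 0.5) * step), 0), w - 1)
            row.append(frame[sy][sx])
        out.append(row)
    return out
```

The magnification is `template_size / obj_size`, so a larger estimated object is reduced and a smaller one is enlarged to the same output size.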

As a comparative example, consider estimating the object position using DOG (Difference of Gaussians) images. Position estimation with DOG images requires the differences between all smoothed images of adjacent scales, so many difference images must be generated. With the object candidate point detection unit 12 shown in FIG. 7, it suffices to generate, as each difference image, the difference between the smoothed image of one scale and the smoothed image of a scale a predetermined interval away, so fewer difference images are generated than with DOG-based position estimation. The position of the object can therefore be estimated efficiently and accurately. In addition, the object candidate point detection unit 12 configured as in FIG. 7 can estimate the size of the object without generating multi-resolution images, and so can estimate the size efficiently.

In particular, when the smoothing processing means 22 generates a×k smoothed images L(x, y, σ_1) to L(x, y, σ_{a×k}), and the difference image generating means 23 uses Equation 1 to obtain, as the difference image G(x, y, σ_j), the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{a×j}) of scale σ_{a×j}, the position of the object can be estimated accurately across various variations in object size. The size of the object can also be estimated accurately.
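The scale pairing can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, images are plain lists of rows, and the sign convention G = L(σ_j) − L(σ_{a×j}) is an assumption because Equation 1 is not reproduced in this passage.

```python
def scale_pairs(a, k):
    """1-based index pairs (j, a*j) for j = 1..k, as used for G(x, y, sigma_j)."""
    return [(j, a * j) for j in range(1, k + 1)]


def adjacent_pairs(n):
    """Adjacent-scale pairs (i, i+1) that differencing all n neighbouring scales needs."""
    return [(i, i + 1) for i in range(1, n)]


def difference_images(smoothed, a, k):
    """smoothed[i-1] holds L(x, y, sigma_i); returns the k images G(x, y, sigma_j)."""
    assert len(smoothed) >= a * k
    result = []
    for j, aj in scale_pairs(a, k):
        near, far = smoothed[j - 1], smoothed[aj - 1]
        # Assumed sign convention: L(sigma_j) - L(sigma_{a*j}).
        result.append([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(near, far)])
    return result
```

With a = 2 and k = 3, six smoothed images yield three difference images, whereas differencing every pair of adjacent scales would yield five.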

In the above description, the motion region extraction means 52 was described as performing grayscale conversion or binarization such that the motion region (object) becomes white and the background region becomes black, but the operation of the motion region extraction means 52 is not limited to this. For example, the motion region extraction means 52 may perform grayscale conversion or binarization such that the motion region becomes black and the background region becomes white. In that case, the position estimation means 25 may estimate, as the position of the object, the pixel position at which the pixel value of the summed image AP(x, y) is smallest. Likewise, the size estimation means 26 may estimate the size of the object based on the scale of the difference image that gives the smallest pixel value (difference value) among the plurality of difference images.

In the above description, an example was given in which the object candidate point detection means 12 estimates only one position in the moving image where an object is estimated to exist, but this is not a limitation. The object candidate point detection means 12 may estimate the presence of a plurality of objects and cut out, as partial images, the images around each of the positions where an object is estimated to exist. For example, let M be the number of objects whose positions are to be estimated by the object candidate point detection means 12. In that case, the position estimation means 25 sorts the pixel values of the summed image AP(x, y) in descending order and estimates the top M pixel positions as the positions of the respective objects, and the image around each position is cut out as a partial image. In other words, the M pixel positions with the largest pixel values in the summed image AP(x, y) are taken as the object positions. The size estimation means 26 may estimate the size of each object based on the scale of the difference image that gives the maximum pixel value in the vicinity of each of the M estimated object positions.
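The selection of M candidate positions, and the inverted (minimum) variant for a black-on-white motion image, might be sketched as below. The function names and the `largest` flag are hypothetical, and `best_scale_index` looks only at the candidate pixel itself rather than its neighbourhood, which is a simplification of "in the vicinity of" the estimated position.

```python
def top_m_positions(ap, m, largest=True):
    """Return the (x, y) positions of the m extreme pixels of the summed image ap.

    largest=True picks the biggest values (white object on black background);
    largest=False picks the smallest values (black object on white background).
    """
    coords = [(x, y) for y, row in enumerate(ap) for x in range(len(row))]
    coords.sort(key=lambda p: ap[p[1]][p[0]], reverse=largest)
    return coords[:m]


def best_scale_index(diff_images, x, y, largest=True):
    """Index of the difference image with the extreme value at pixel (x, y)."""
    values = [g[y][x] for g in diff_images]
    pick = max if largest else min
    return values.index(pick(values))
```

Because the description simply takes the M extreme pixels, adjacent pixels belonging to one object can all be selected; suppressing such near-duplicates would be an extension beyond what is described here.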

Next, the effects of this embodiment are described. The lookup table 14 shown in FIG. 1 stores a lookup table generated for each basic feature type, and weak discriminators 15 of the same basic feature type obtain their scores by referring to the same lookup table. The processor that implements the processing of the discriminator 13 is normally provided with a cache memory, and the cache memory holds the portion of the lookup table near the location referenced by a weak discriminator 15.

In a typical discriminator (strong discriminator) in which weak discriminators are cascade-connected in order of their effectiveness for discrimination, the basic feature type of the weak discriminator at one stage often differs from that of the weak discriminator at the next stage. In that case, even if part of the lookup table referenced by the weak discriminator at one stage is stored in the cache memory, a cache hit can hardly be expected during processing by the weak discriminator at the next stage. In contrast, when weak discriminators of the same basic feature type are arranged consecutively, the same lookup table is referenced for as long as weak discriminators 15 of that basic feature type are processed in succession, so the probability of a cache hit can be expected to improve.

In this embodiment, the object discrimination device 10 determines whether the detection target object is present in an image by using the discriminator 13, in which weak discriminators 15 of the same basic feature type are arranged consecutively. Compared with the case where weak discriminators 15 of the same basic feature type are not arranged consecutively, this localizes references and raises the probability of a cache hit. Processing is sped up to the extent that cache hits occur. In particular, in low-power processing systems such as those used mainly in embedded applications, whether the cache hits has a large effect on processing time, and processing time can be shortened considerably by achieving cache hits.
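The reordering idea can be illustrated with a short Python sketch. Each weak discriminator is represented here as a dictionary with a `"type"` field, and the number of lookup-table switches between consecutive stages is used as a crude stand-in for cache behaviour; both the representation and the metric are assumptions for illustration only.

```python
def group_by_feature_type(weak_discriminators):
    """Reorder so that equal basic feature types are consecutive.

    sorted() is stable, so the learned order is preserved inside each type.
    """
    return sorted(weak_discriminators, key=lambda w: w["type"])


def lut_switches(weak_discriminators):
    """Count how often consecutive stages refer to different lookup tables."""
    pairs = zip(weak_discriminators, weak_discriminators[1:])
    return sum(1 for prev, cur in pairs if prev["type"] != cur["type"])
```

For a learned order with types 1, 2, 1, 3, 2, 1 the table changes at every stage (five switches); after grouping, it changes only twice.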

In addition, this embodiment uses the object candidate point detection means 12 and inputs to the discriminator 13 an image portion in which an object is likely to be present. Because the discriminator 13 processes an image portion with a high probability of containing an object, it is preferable to execute the plurality of weak discriminators 15 through to the final stage in a single pass, without making an early-termination decision at each weak discriminator 15. Without early termination, no branch decision occurs at each weak discriminator 15, so the pipeline is not disturbed. Not performing early termination also keeps the processing time of the discriminator 13 constant.
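A minimal Python sketch of scoring without early termination is shown below. It assumes, for illustration only, that each weak discriminator computes a single difference between two 8-bit pixels and that each lookup table has 511 entries indexed by `diff + 255`; the feature types in the actual embodiment may use several pixel pairs, and the names `strong_score` and `contains_object` are hypothetical.

```python
def strong_score(patch, weak_discriminators, luts):
    """Sum the scores of all weak discriminators, with no per-stage exit test."""
    total = 0
    for w in weak_discriminators:
        (x0, y0), (x1, y1) = w["points"]
        diff = patch[y0][x0] - patch[y1][x1]   # in -255..255 for 8-bit pixels
        total += luts[w["type"]][diff + 255]   # one table shared per feature type
    return total


def contains_object(patch, weak_discriminators, luts, threshold):
    """Single threshold test on the total score after the last stage."""
    return strong_score(patch, weak_discriminators, luts) >= threshold
```

The loop body contains no data-dependent branch, so the work per patch is the same regardless of the image content.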

Next, a second embodiment of the present invention is described. The configuration of the object discrimination device in this embodiment is the same as that of the object discrimination device 10 of the first embodiment shown in FIG. 1. In this embodiment, within the discriminator 13, the plurality of weak discriminators 15 (FIG. 2) of the same basic feature type are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator 15. The other points are the same as in the first embodiment.

FIG. 9(a) shows the arrangement order of the weak discriminators for basic feature type 1, and FIG. 9(b) shows the image reference position of each weak discriminator within the template. Basic feature type 1 is assumed to be the difference between two pixels aligned in the horizontal direction (x direction). FIG. 9(b) shows the image reference positions of some of the plurality of weak discriminators 15 that perform difference calculation with basic feature type 1. As shown in FIG. 9(a), the plurality of weak discriminators 15 that perform difference calculation with basic feature type 1 are cascade-connected in an order that follows the image reference position used in the difference calculation of each weak discriminator 15.

For example, the plurality of weak discriminators 15 that perform difference calculation with basic feature type 1 are arranged so that the image reference positions used in their difference calculations appear in raster scan order. Likewise, in the discriminator 13 shown in FIG. 2, the weak discriminators 15 that perform difference calculation with basic feature type 2 and those that perform difference calculation with basic feature type 3 are each arranged, as with basic feature type 1, so that their image reference positions appear in raster scan order.

FIG. 10 shows a discriminator configuration device 30a used to configure the discriminator 13 in this embodiment. The learning result input means 31 inputs a plurality of weak discriminators 15 trained by machine learning. The grouping means 32 groups the plurality of weak discriminators 15 obtained by learning into a plurality of groups according to basic feature type, for example one group per basic feature type. The sorting means 34 sorts the weak discriminators 15 belonging to the same group according to the image reference position used in the difference calculation. The rearrangement means 33 cascade-connects the plurality of weak discriminators group by group, in the order produced by the sorting means 34, to configure the discriminator 13.

The sorting means 34 performs the sort using, for example, the reference position closest to the origin (the upper left of the image), among the plurality of reference positions that a weak discriminator 15 refers to in its difference calculation, as that weak discriminator's image reference position. Specifically, when a weak discriminator 15 performs difference calculation with three pairs of differences (six reference points), as with the basic feature type shown in FIG. 3(a), the sorting means 34 can sort using the point pt0 shown in FIG. 4 as the image reference position of that weak discriminator 15. Alternatively, any one of the points pt1 to pt5 shown in FIG. 4 may be used as the image reference position for sorting. As a further alternative, the centroid of the plurality of reference points of the weak discriminator 15, for example the centroid of the points pt0 to pt6, may be used as the image reference position for sorting.
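The grouping and sorting steps of the discriminator configuration device can be sketched together in Python. The dictionary layout, the function names, and the use of squared Euclidean distance to decide which reference point is "closest to the origin" are assumptions for this example; the description does not specify the distance measure.

```python
def reference_position(points):
    """Pick the reference point closest to the origin (upper left of the image).

    Squared Euclidean distance is an assumed measure; ties fall back to raster order.
    """
    return min(points, key=lambda p: (p[0] ** 2 + p[1] ** 2, p[1], p[0]))


def centroid_position(points):
    """Alternative sort position: centroid of all reference points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def build_cascade(weak_discriminators, position=reference_position):
    """Group by basic feature type, then order each group in raster scan order."""
    def sort_key(w):
        x, y = position(w["points"])
        return (w["type"], y, x)   # raster order: row first, then column
    return sorted(weak_discriminators, key=sort_key)
```

Passing `position=centroid_position` switches to the centroid-based ordering mentioned as an alternative.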

Here, if the weak discriminators were merely grouped by basic feature type, then even among those of the same basic feature type, the image location referenced in the difference calculation at one stage would often be far from the image location referenced at the next stage. In that case, even if the image near the position referenced by the weak discriminator at one stage is stored in the cache memory, the image cache will not hit when the weak discriminator at the next stage performs its difference calculation.

This embodiment uses a discriminator 13 in which the plurality of weak discriminators 15 are cascade-connected in an order that follows the image reference position used in the difference calculation. When the weak discriminators 15 are arranged in order of image reference location, a later-stage weak discriminator 15 performs its difference calculation by referring to a portion close to the location referenced by the preceding weak discriminator 15, so the image cache may hit. References are thus localized not only for the lookup table but also for the image, and image references in the difference calculation can be performed efficiently.
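One simple way to see the effect on image reference locality is to compare how far the linear pixel address jumps between consecutive stages before and after raster ordering. The sketch below is only an illustrative proxy: real cache behaviour depends on line size and image layout, and the function names and row-major addressing are assumptions.

```python
def raster_sorted(ref_positions):
    """Order (x, y) reference positions in raster scan order."""
    return sorted(ref_positions, key=lambda p: (p[1], p[0]))


def address_jumps(ref_positions, width):
    """Total distance between consecutive row-major pixel addresses."""
    addresses = [y * width + x for x, y in ref_positions]
    return sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))
```

For the positions (5, 3), (0, 0), (4, 3), (1, 0) in an image 8 pixels wide, the total jump falls from 84 to 29 once the positions are visited in raster order.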

In the first embodiment, grouping was performed for each basic feature type, and weak discriminators 15 of the same basic feature type were described as being cascade-connected consecutively for all basic feature types, but this is not a limitation. Weak discriminators 15 of the same basic feature type need not be arranged consecutively for every basic feature type. For example, depending on how frequently each basic feature type is used, some basic feature types may be excluded from grouping, and the weak discriminators 15 of the excluded basic feature types need not be cascade-connected consecutively.

In the second embodiment, an example was described in which the weak discriminators 15 are grouped by basic feature type and then arranged according to the image reference position used in the difference calculation, but this is not a limitation. For example, the weak discriminators 15 may be arranged according to the image reference position without being grouped by basic feature type. That is, the discriminator 13 may be configured by cascade-connecting the plurality of weak discriminators 15 in an order that follows the image reference position used in the difference calculation of each weak discriminator 15. Even in that case, an improvement in cache hits when referencing pixel values can be expected, and processing can be sped up.

In each of the above embodiments, the discriminator 13 was described as not performing early termination, but the discriminator 13 may perform early termination. For example, several thousand weak discriminators may be divided into blocks of several hundred weak discriminators each, and the early-termination decision may be made block by block. In that case, the weak discriminators may be cascade-connected so that, within each block, weak discriminators of the same basic feature type are arranged consecutively. Alternatively, within each block, the weak discriminators may be arranged in an order that follows the image reference location used in the difference calculation. References can then be localized in the processing within a block, and processing time can be shortened compared with the case where the weak discriminators within a block are arranged in order of effectiveness for discrimination. It is also possible to train each block with a different population of basic feature types and build a strong discriminator composed of a plurality of blocks; in that case, the early-termination decision may be made just once, at the end of each block.
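Block-wise early termination might look like the following Python sketch. Each weak discriminator is modelled as a callable that maps a patch to a score, and the exit rule (stop when the running total falls below a per-block threshold) is an assumption; the description says only that the early-termination decision is made once per block.

```python
def blockwise_detect(patch, blocks, block_thresholds):
    """Evaluate blocks of weak discriminators with one exit test per block.

    Returns (detected, total_score, number_of_weak_discriminators_evaluated).
    """
    total = 0
    evaluated = 0
    for block, threshold in zip(blocks, block_thresholds):
        # Inside a block every weak discriminator runs unconditionally.
        for weak in block:
            total += weak(patch)
            evaluated += 1
        # Assumed rule: reject as soon as the running total is below the threshold.
        if total < threshold:
            return False, total, evaluated
    return True, total, evaluated
```

Within a block the loop stays branch-free, so the weak discriminators in each block can still be ordered by basic feature type or by reference position.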

The present invention has been described above based on its preferred embodiments, but the object discrimination device, method, and program of the present invention are not limited to the above embodiments, and configurations obtained by applying various modifications and changes to the above embodiments are also included in the scope of the present invention.

10: Object discrimination device
11: Image input means
12: Object candidate point detection means
13: Discriminator (strong discriminator)
14: Lookup table
15: Weak discriminator
21: Preprocessing means
22: Smoothing processing means
23: Difference image generating means
24: Summing means
25: Position estimation means
26: Size estimation means
27: Partial image generating means
30: Discriminator configuration device
31: Learning result input means
32: Grouping means
33: Rearrangement means
34: Sorting means
51: Resolution conversion means
52: Motion region extraction means

Claims

An object discrimination device comprising a strong discriminator in which a plurality of weak discriminators are cascade-connected, each weak discriminator performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target, wherein, in the strong discriminator, weak discriminators having the same basic feature type are arranged consecutively.

The object discrimination device according to claim 1, wherein a lookup table for obtaining the score from the calculated value of the difference calculation is generated for each basic feature type, and each weak discriminator obtains the score by referring to the lookup table based on the calculated value of the difference calculation.

The object discrimination device according to claim 1 or 2, wherein the plurality of weak discriminators are trained using machine learning, the plurality of weak discriminators generated by the learning are grouped into a plurality of groups according to basic feature type, and the strong discriminator is configured by cascade-connecting the plurality of weak discriminators so that weak discriminators belonging to the same group are arranged consecutively.

The object discrimination device according to claim 1, wherein, when the strong discriminator includes a plurality of weak discriminators having the same basic feature type, those weak discriminators are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator.

The object discrimination device, wherein the plurality of weak discriminators having the same basic feature type are arranged so that the image reference positions used in the difference calculation of each weak discriminator appear in raster scan order.

An object discrimination device comprising a strong discriminator in which a plurality of weak discriminators are cascade-connected, each weak discriminator performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target, wherein, in the strong discriminator, the weak discriminators are arranged in an order that follows the image reference position used in the difference calculation of each weak discriminator.

The object discrimination device according to claim 6, wherein the plurality of weak discriminators are arranged so that the image reference positions used in the difference calculation of each weak discriminator appear in raster scan order.

The object discrimination device according to any one of the preceding claims, further comprising object candidate point detection means that estimates the position of an object, cuts out from a processing target image an image around the estimated object position, and supplies the cut-out image to the strong discriminator.

The object discrimination device according to claim 8, wherein the object candidate point detection means comprises:
smoothing processing means that repeatedly convolves an image with a smoothing filter having filter characteristics corresponding to the contour shape of the object, thereby generating from the processing target image a plurality of smoothed images of mutually different scales;
difference image generating means that computes, while varying the scale, the difference image between two smoothed images of mutually different scales among the plurality of smoothed images, thereby generating a plurality of difference images;
summing means that sums the plurality of difference images to generate a summed image;
position estimation means that estimates the position of the object based on the pixel values of the summed image; and
partial image generating means that cuts out, from the processing target image, an image of a region around the estimated position.

The object discrimination device according to claim 9, wherein the smoothing processing means generates a×k smoothed images L(x, y, σ_i) (i = 1 to a×k) of scales σ_1 to σ_{a×k} (a and k each being an integer of 2 or more), and the difference image generating means generates k difference images G(x, y, σ_j) (j = 1 to k) of scales σ_1 to σ_k, each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{j×a}) of scale σ_{j×a}.

The object discrimination device according to claim 9, wherein the smoothing processing means generates r smoothed images L(x, y, σ_i) (i = 1 to r) of scales σ_1 to σ_r (r being an integer of 3 or more), and the difference image generating means generates k−p difference images G(x, y, σ_j) (j = 1 to k−p) of scales σ_1 to σ_{k−p} (p being an integer of 1 or more), each based on the difference between the smoothed image L(x, y, σ_j) of scale σ_j and the smoothed image L(x, y, σ_{j+p}) of scale σ_{j+p}.

An object discrimination method that executes a plurality of weak discriminations in a cascade, each weak discrimination performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target,
wherein, among the plurality of weak discriminations, weak discriminations having the same basic feature type are executed consecutively.

A program for causing a computer to execute a plurality of weak discriminations in a cascade, each weak discrimination performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target,
the program causing the computer to execute consecutively, among the plurality of weak discriminations, weak discriminations having the same basic feature type.

An object discrimination method that executes a plurality of weak discriminations in a cascade, each weak discrimination performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target,
wherein the plurality of weak discriminations are executed in an order that follows the image reference position used in the difference calculation of each weak discrimination.

A program for causing a computer to execute a plurality of weak discriminations in a cascade, each weak discrimination performing a difference calculation of one of a plurality of basic feature types relating to the difference calculation and obtaining, from an input image, a score relating to the presence of a detection target,
the program causing the computer to execute the plurality of weak discriminations in an order that follows the image reference position used in the difference calculation of each weak discrimination.