JP2009140369A - Group learning device and group learning method, object detection device and object detection method, and computer program

Publication number: JP2009140369A
Application number: JP2007317730A
Authority: JP
Inventors: Kenichi Hidai (日台 健一); Kotaro Sabe (佐部 浩太郎)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2007-12-07
Filing date: 2007-12-07
Publication date: 2009-06-25

Abstract

PROBLEM TO BE SOLVED: To suitably train, by group learning using statistical learning such as AdaBoost, a group of weak classifiers composed of filters with a high degree of freedom.
SOLUTION: Each weak classifier is a four-point filter that computes a feature value of a grayscale image as the inner product of a luminance value vector, whose elements are the luminance values of reference pixels at four pixel positions, and a filter coefficient vector of arbitrary real values. A weak classifier is trained by using logistic regression to determine the combination of reference pixel positions and the filter coefficients, as ideal real values, that minimize the discrimination error obtained when a plurality of learning samples are fed to the weak classifier.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a group learning device and group learning method for training, by group learning, an object detection device that detects whether a given image is a desired object such as a face; to an object detection device and object detection method trained by group learning; and to a computer program. In particular, it relates to a group learning device and group learning method that train a classifier composed of a plurality of weak classifiers using statistical learning such as AdaBoost, to an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and to a computer program.

More specifically, the present invention relates to a group learning device and group learning method that use statistical learning such as AdaBoost to train a group of weak classifiers composed of filters with a high degree of freedom, to an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and to a computer program. In particular, it relates to a group learning device and group learning method that find ideal real values to use as the filter coefficients multiplied with the input image (a plurality of reference pixels) and thereby train a group of filters with a high degree of freedom, to an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and to a computer program.

Face recognition technology can be applied widely to man-machine interfaces, from personal authentication systems that place no burden on the user to gender identification. More recently, face recognition has been applied to automating camera work in digital cameras, such as automatic focusing (AF) and automatic exposure adjustment (AE) based on subject detection or subject recognition, automatic angle-of-view setting, and automatic shooting.

A face recognition system consists of, for example, a face detection process that detects the position of a face image and extracts it as a detected face, a facial-part detection process that locates the main facial parts in the detected face, and a face identification process that identifies the detected face (determines who the person is). The face detection process detects the size and position of a face in the input image and extracts it as a detected face. The facial-part detection process finds facial parts such as the eye centers, the inner and outer corners of the eyes, the nose, and the eyebrows in the detected face. After alignment and rotation correction based on the positions of the detected facial parts, the face identification process identifies the detected face (for example, recognizes the person).

FIG. 25 shows an example of a typical image recognition procedure. As shown in the figure, filters such as Haar-like features, Gabor filters, and steerable filters are widely used for preprocessing or feature extraction in image recognition, because classifying extracted features is often more advantageous than classifying the image itself directly.

The filter referred to in this specification is expressed by equation (1) below. The filter is the inner product of a luminance vector made up of the luminance values x_j of the reference pixels at a plurality of pixel positions and a coefficient vector made up of the filter coefficients w_j by which each reference pixel is multiplied. In other words, the filter is defined by the combination of the pixel positions used as reference pixels (M of them in equation (1)) and the filter coefficients applied to (multiplied with) each reference pixel.
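From the definition above, equation (1) can be written as an inner product over the M reference pixels. This is a reconstruction from the surrounding text; the function name f is an assumption introduced here for illustration, while x_j and w_j are the symbols used in the description.

```latex
f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} = \sum_{j=1}^{M} w_j \, x_j
```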

A filter outputs, for example as a positive or negative sign, whether a desired object (such as a subject's face or a smile) was recognized in the given image. A single filter is no more than a "weak classifier" (weak learner) that does slightly better than chance (a weak classifier uses some feature value to decide whether the input is the object or a non-object). In 1996, Freund et al. proposed AdaBoost, a theory stating that a strong classifier can be built by combining a plurality of weak classifiers. Each weak classifier is generated so as to place a weight α on the classifications that the weak classifiers generated before it handle poorly; a reliability is then obtained from how accurate each weak classifier is, and a majority vote is taken on that basis.
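The reliability-weighted majority vote described above can be sketched as follows. This is a minimal illustration, assuming each weak classifier outputs +1 (object) or -1 (non-object) and that its reliability is a non-negative weight; the names `strong_classify`, `weak_outputs`, and `reliabilities` are hypothetical.

```python
def strong_classify(weak_outputs, reliabilities):
    """Combine weak classifier outputs (+1 / -1) by a reliability-weighted vote.

    Returns (label, score); the label is +1 when the weighted sum is
    non-negative and -1 otherwise (the tie rule is an assumption).
    """
    if len(weak_outputs) != len(reliabilities):
        raise ValueError("one reliability is needed per weak classifier")
    score = sum(a * h for a, h in zip(reliabilities, weak_outputs))
    return (1 if score >= 0 else -1), score


# Two weak classifiers vote "object", but the single more reliable one
# votes "non-object" and outweighs them.
label, score = strong_classify([+1, -1, +1], [0.2, 0.9, 0.3])
print(label, score)
```

A plain (unweighted) majority would have answered +1 here; weighting by reliability lets one accurate weak classifier override several inaccurate ones.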

Filters, however, are usually designed in advance, and it is not obvious which of the many available filters is best to use. The filter is therefore usually chosen by trial and error, or by exhaustive search over a group of filters that can be enumerated discretely.

The filter of equation (1) above consists of a combination of pixel positions used as reference pixels and filter coefficients applied to (multiplied with) each reference pixel. Which pixel positions should serve as reference pixels, and which filter coefficient values suit each reference pixel, vary with the purpose, that is, with what is to be recognized in the given image (for example, whether a face, a smile, or some other object is to be discriminated). The combination of reference pixel positions and filter coefficients therefore has to be determined by trial and error.

For example, a group learning device has been proposed that uses as its filter a weak classifier that decides whether the input is the object from a very simple feature value, the difference between the luminance values of two reference pixels (the inter-pixel difference feature) (see, for example, Patent Document 1). A classifier that uses the inter-pixel difference feature as its filter output can be expressed by equation (2) below.

A classifier that uses the inter-pixel difference feature as its filter output is effective in that it runs very fast. As equation (2) shows, however, it is subject to strong constraints: only two reference pixels are used, and the filter coefficients w_1 and w_2 applied to them are fixed at +1 and -1. The filter therefore has very little freedom, and recognition performance is sacrificed.
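The relationship between the general filter and the inter-pixel difference feature can be illustrated with a short sketch. It assumes luminance values are plain numbers; the function names `filter_response` and `pixel_difference` are hypothetical and chosen only for this example.

```python
def filter_response(luminances, coefficients):
    """Inner product of reference-pixel luminances x_j and coefficients w_j."""
    if len(luminances) != len(coefficients):
        raise ValueError("one coefficient is needed per reference pixel")
    return sum(w * x for w, x in zip(coefficients, luminances))


def pixel_difference(x1, x2):
    """Inter-pixel difference feature: the filter with coefficients (+1, -1)."""
    return filter_response([x1, x2], [+1.0, -1.0])


# Two reference pixels with fixed coefficients.
diff = pixel_difference(120.0, 80.0)

# Four reference pixels with arbitrary real coefficients (illustrative values).
general = filter_response([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0])
print(diff, general)
```

The difference feature is the special case M = 2 with the coefficient vector frozen at (+1, -1); allowing more points and real coefficients enlarges the space of filters that can be expressed.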

If the filter coefficients w_j in equation (2) above could be chosen from arbitrary real numbers rather than only ±1, the added freedom of the filter could improve recognition performance. It is not obvious, however, how such real-valued coefficients should be determined. For this reason the pixel difference feature uses fixed coefficients.

The sparse features proposed by Huang et al. attempt to construct effective filters by combining various Haar-like features (filters that add (+) or subtract (-) the average pixel values of rectangular regions in an image) (see, for example, Non-Patent Document 1). This method, however, amounts to searching a space of three-valued coefficients {-1, 0, +1}. In other words, it uses fixed filter coefficients rather than arbitrary real numbers and does not exhaust the freedom that a filter could inherently have.

The sparse feature is defined by equation (3) below. Expressive power can be increased by letting α in that equation be a real number rather than one of the two values {-1, +1}. However, because it is usually unknown which real values of the filter coefficients would benefit the downstream classifier, the search is restricted to discrete values. Put the other way round, the present inventors consider that recognition performance can be improved if ideal real values for the filter coefficients can be found.

In equation (3) above, a pixel that is not referenced can be regarded as having α = 0, so the sparse feature can be described as a filter with three-valued coefficients. In FIG. 26, black squares correspond to α = -1, white squares to α = +1, and gray to α = 0.

If filters that favor the target recognition task could be created automatically, this would not only remove the effort of trial and error but could also be expected to improve recognition performance, reduce the amount of processing, and increase speed, which would be highly beneficial.

Patent Document 1: JP 2005-157679 A
Non-Patent Document 1: Huang, C., Ai, H., Li, Y., and Lao, S. 2006. "Learning Sparse Features in Granular Space for Multi-View Face Detection." In Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (April 10-12, 2006), FGR. IEEE Computer Society, Washington, DC, 401-407.

An object of the present invention is to provide an excellent group learning device and group learning method capable of suitably training, by group learning, an object detection device that detects whether a given image is a desired object such as a face, together with an object detection device and object detection method so trained, and a computer program.

A further object of the present invention is to provide an excellent group learning device and group learning method capable of suitably training, by group learning, a classifier composed of a plurality of weak classifiers using statistical learning such as AdaBoost, together with an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and a computer program.

A further object of the present invention is to provide an excellent group learning device and group learning method capable of suitably training, by group learning, a group of weak classifiers composed of filters with a high degree of freedom using statistical learning such as AdaBoost, together with an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and a computer program.

A further object of the present invention is to provide an excellent group learning device and group learning method capable of finding ideal real values to use as the filter coefficients multiplied with the input image (a plurality of reference pixels) and thereby suitably training a group of filters with a high degree of freedom, together with an object detection device and object detection method that detect an object using the resulting group of weak classifiers, and a computer program.

The present invention has been made in view of the above problems. Its first aspect is a group learning device that uses learning samples, consisting of a plurality of grayscale images each known to be an object or a non-object, to train by group learning a plurality of weak classifiers used to detect whether a given image is the object, wherein:
each weak classifier is a filter that computes a feature value as the inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector of arbitrary real values (where L is an integer of 2 or more); and
the device comprises learning means that learns by boosting, for each weak classifier, the combination of reference pixel positions and the ideal real values to use as filter coefficients.

A second aspect of the present invention is an object detection device that detects whether a given grayscale image is an object, comprising:
a plurality of weak classifiers, each a filter that computes a feature value of the given grayscale image as the inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector of arbitrary real values; and
a classifier that determines whether the given grayscale image is the object on the basis of the feature values computed by at least one of the plurality of weak classifiers.

Filters are widely used for preprocessing or feature extraction in image recognition. A filter is the inner product of a luminance vector made up of the luminance values of reference pixels at a plurality of pixel positions in the input image and a coefficient vector made up of the filter coefficients by which those luminance values are multiplied. A single filter is no more than a weak classifier that does slightly better than chance, but a strong classifier can be built by combining a plurality of weak classifiers.

Conventional filters use only fixed filter coefficients, so they do not fully exploit the freedom available to a filter and sacrifice recognition performance. If ideal real values for the filter coefficients can be found, recognition performance can be improved.

According to the present invention, by contrast, filter learning is reduced to a linear two-class discrimination problem, so that ideal real values for the filter coefficients can be found and a group of filters with a high degree of freedom can be suitably learned.

The learning means feeds each weak classifier a plurality of learning samples, which are grayscale images labeled with one of two classes (object or non-object), and learns in advance, by boosting, the feature values of objects and of non-objects.

Specifically, the learning means trains a weak classifier by determining the parameters of a multipoint filter, including the combination of the L reference pixel positions and the filter coefficients, so as to minimize the discrimination error obtained when a plurality of learning samples are fed to the weak classifier. Such a minimization problem can be solved with an algorithm such as logistic regression or a support vector machine. Logistic regression and support vector machines are themselves well known in the art.
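As an illustration of how real-valued filter coefficients might be obtained for a fixed set of reference pixels, the sketch below fits a weighted logistic regression by plain gradient descent. It is a minimal example under stated assumptions, not the procedure of the specification: the bias term, the learning rate, the step count, the scaling of luminance to [0, 1], and the names `fit_filter_coefficients` and `classify` are all hypothetical.

```python
import math


def fit_filter_coefficients(vectors, labels, data_weights, steps=500, lr=1.0):
    """Fit real-valued filter coefficients by weighted logistic regression.

    vectors      -- luminance vectors of the L reference pixels, scaled to [0, 1]
    labels       -- +1 (object) or -1 (non-object) for each sample
    data_weights -- one non-negative weight per sample
    Returns (coefficients, bias); the bias term is an assumption of this sketch.
    """
    dim = len(vectors[0])
    coeffs, bias = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y, d in zip(vectors, labels, data_weights):
            margin = y * (sum(w * v for w, v in zip(coeffs, x)) + bias)
            # Gradient of d * log(1 + exp(-margin)) with respect to the score;
            # the margin is clamped only to keep exp() from overflowing.
            g = -d * y / (1.0 + math.exp(min(margin, 500.0)))
            grad_w = [gw + g * v for gw, v in zip(grad_w, x)]
            grad_b += g
        coeffs = [w - lr * gw for w, gw in zip(coeffs, grad_w)]
        bias -= lr * grad_b
    return coeffs, bias


def classify(x, coeffs, bias):
    """Sign of the filter output plus bias."""
    score = sum(w * v for w, v in zip(coeffs, x)) + bias
    return 1 if score >= 0 else -1


# Toy data: "objects" are bright at the first reference pixel and dark at the second.
vectors = [[0.9, 0.1, 0.5, 0.5], [0.8, 0.2, 0.4, 0.6],
           [0.1, 0.9, 0.5, 0.5], [0.2, 0.8, 0.6, 0.4]]
labels = [1, 1, -1, -1]
data_weights = [0.25, 0.25, 0.25, 0.25]

coeffs, bias = fit_filter_coefficients(vectors, labels, data_weights)
predictions = [classify(x, coeffs, bias) for x in vectors]
print(coeffs, bias, predictions)
```

Because the per-sample weights enter the loss directly, samples that earlier weak classifiers got wrong pull the coefficients harder, which is what lets this fit slot into a boosting loop.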

In more detail, the learning means first initializes the data weights D_{1,i} of the learning samples for the first weak classifier to be generated. It then repeats the following four steps as many times as needed to train the required number of weak classifiers: weak classifier learning, which trains a weak classifier using logistic regression or a support vector machine; weighted error calculation, which feeds the plurality of learning samples to the trained t-th weak classifier and sums the data weights D_{t,i} of the misclassified samples to obtain the weighted error; weighted-majority-vote weight calculation, which computes the weight α_t of the weak classifier used when the feature values computed by the weak classifiers are combined by weighted majority vote at detection time; and data weight update, which computes the data weights D_{t+1,i} of the learning samples from the majority-vote weight α_t and the trained weak classifier's classification result for each sample.
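The four repeated steps can be sketched as a boosting loop. The passage above does not give formulas for α_t or for the data weight update, so this sketch assumes the standard discrete AdaBoost rules (α_t = ½ ln((1 − ε_t)/ε_t) and an exponential reweighting followed by normalization) and uniform initial weights. A threshold stump stands in for the multipoint filter to keep the example short; `adaboost`, `train_stump`, `make_stump`, and `predict` are hypothetical names.

```python
import math


def make_stump(j, thr, sign):
    """Stand-in weak classifier: thresholds a single value of the sample."""
    return lambda x: sign if x[j] >= thr else -sign


def train_stump(samples, labels, data_w):
    """Pick the stump with the smallest weighted error (stand-in weak learner)."""
    best_err, best_h = None, None
    for j in range(len(samples[0])):
        for thr in sorted({x[j] for x in samples}):
            for sign in (+1, -1):
                h = make_stump(j, thr, sign)
                err = sum(d for d, x, y in zip(data_w, samples, labels) if h(x) != y)
                if best_err is None or err < best_err:
                    best_err, best_h = err, h
    return best_h


def adaboost(samples, labels, train_weak, rounds):
    n = len(samples)
    data_w = [1.0 / n] * n                       # D_{1,i}: uniform start (assumed)
    ensemble = []
    for _ in range(rounds):
        h = train_weak(samples, labels, data_w)  # weak classifier learning
        preds = [h(x) for x in samples]
        # Weighted error: sum of the data weights of misclassified samples.
        err = sum(d for d, p, y in zip(data_w, preds, labels) if p != y)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)  # majority-vote weight (assumed rule)
        # Data weight update from alpha, the label, and the classification result.
        data_w = [d * math.exp(-alpha * y * p) for d, p, y in zip(data_w, preds, labels)]
        total = sum(data_w)
        data_w = [d / total for d in data_w]
        ensemble.append((alpha, h))
    return ensemble, data_w


def predict(ensemble, x):
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
labels = [1, 1, -1, 1, -1, -1]

_, weights_after_one = adaboost(samples, labels, train_stump, rounds=1)
ensemble, _ = adaboost(samples, labels, train_stump, rounds=3)
print(weights_after_one)
print([predict(ensemble, x) for x in samples])
```

No single stump separates this toy data, but three boosted stumps do. After the first round the one misclassified sample carries half of the total data weight, which is how the next weak classifier is steered toward it.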

In the weak classifier learning step, the following is repeated a number of times: a combination of reference pixel positions for multipoint filter t is chosen at random; the filter coefficients of multipoint filter t that minimize a cost function based on the weighted error are determined using logistic regression or a support vector machine; and the weighted error of multipoint filter t with those coefficients is computed. The multipoint filter t whose combination of reference pixel positions and filter coefficients gives the smallest weighted error over these repetitions is then adopted as the t-th weak classifier.
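The random search over reference pixel combinations can be sketched as follows. This is a minimal illustration under assumptions: images are flat lists of luminances in [0, 1], coefficients are fitted by a small gradient-descent logistic regression with an appended bias, and the names `fit_coefficients`, `weighted_error`, and `search_weak_classifier`, along with the trial count and seed, are hypothetical.

```python
import math
import random


def fit_coefficients(vectors, labels, data_w, steps=300, lr=1.0):
    """Weighted logistic regression; the last entry of w is a bias (assumed)."""
    w = [0.0] * (len(vectors[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y, d in zip(vectors, labels, data_w):
            xb = list(x) + [1.0]
            margin = y * sum(wj * xj for wj, xj in zip(w, xb))
            g = -d * y / (1.0 + math.exp(min(margin, 500.0)))
            grad = [gj + g * xj for gj, xj in zip(grad, xb)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def weighted_error(vectors, labels, data_w, w):
    """Sum of the data weights of the samples the filter misclassifies."""
    err = 0.0
    for x, y, d in zip(vectors, labels, data_w):
        score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
        if (1 if score >= 0 else -1) != y:
            err += d
    return err


def search_weak_classifier(images, labels, data_w, num_points=4, trials=30, seed=0):
    """Try random reference-pixel combinations and keep the lowest weighted error."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        positions = rng.sample(range(len(images[0])), num_points)
        vectors = [[img[p] for p in positions] for img in images]
        w = fit_coefficients(vectors, labels, data_w)
        err = weighted_error(vectors, labels, data_w, w)
        if best is None or err < best[0]:
            best = (err, positions, w)
    return best


# Toy 3x3 "images" (flattened): only the centre pixel (index 4) carries the class.
def toy_image(centre):
    img = [0.5] * 9
    img[4] = centre
    return img


images = [toy_image(0.9), toy_image(0.8), toy_image(0.1), toy_image(0.2)]
labels = [1, 1, -1, -1]
data_w = [0.25] * 4

best_err, best_positions, best_w = search_weak_classifier(images, labels, data_w)
print(best_err, best_positions)
```

Separating the discrete choice (which pixels) from the continuous one (which coefficients) is the key design point: the pixel combination is sampled, while the coefficients for each sampled combination are solved for rather than guessed.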

According to the present invention, filters that favor the target recognition task can therefore be created automatically. This not only removes the effort of trial and error but can also be expected to improve recognition performance, reduce the amount of processing, and increase speed in an object detection device that uses the resulting group of weak classifiers, which is highly beneficial.

Input images, such as the learning samples handled when training the weak classifiers and the images supplied at detection time, are cropped to a predetermined window size, for example 20×20 pixels. Each weak classifier may place its L reference pixels in a straight line within the window and hold their positions as two pieces of information, {start position q, step width s}. Holding the pixel position information in as compact a form as possible reduces the number of memory accesses at run time, during both learning and detection, and speeds up processing.
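One way to read the {start position q, step width s} representation is sketched below. The passage does not say how q and s are encoded, so this example assumes both are offsets into the window flattened in row-major order; `reference_positions` and `to_xy` are hypothetical names.

```python
def reference_positions(q, s, num_points=4, width=20, height=20):
    """Expand {start q, step s} into flat pixel indices q, q+s, q+2s, ...

    Assumes a row-major flattened window; with width 20, s=1 walks along a
    row, s=20 down a column, and s=21 along a diagonal.
    """
    positions = [q + k * s for k in range(num_points)]
    if min(positions) < 0 or max(positions) >= width * height:
        raise ValueError("reference pixels fall outside the window")
    return positions


def to_xy(index, width=20):
    """Convert a flat index back to (x, y) window coordinates."""
    return index % width, index // width


diagonal = reference_positions(q=0, s=21)
print(diagonal, [to_xy(p) for p in diagonal])
```

Storing two integers instead of L coordinate pairs is what makes the representation compact. Under this flat-index assumption a caller would also need to reject steps that wrap across a row edge, which the sketch does not check.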

The device may further comprise multilayer image generation means that takes a 1×1-layer input image, such as a learning sample or a grayscale image under test, and generates a 2×2-layer image by 2×2 blurring, in which the luminance values of each block of 2×2 adjacent pixels are averaged into one pixel, and a 4×4-layer image by 4×4 blurring, in which the luminance values of each block of 4×4 adjacent pixels of the original image are averaged into one pixel. Each weak classifier is then trained in advance as a multipoint filter for one of the layers, using the multilayered learning samples. In this case, at detection time, the feature value can be computed on the corresponding layer of the given grayscale image after it has been multilayered by the multilayer image generation means.
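The multilayer generation can be sketched as block averaging. This is a minimal illustration, assuming images are lists of rows of luminance values, that each non-overlapping block becomes one output pixel, and that partial blocks at the edges are dropped; `block_average` and `make_layers` are hypothetical names.

```python
def block_average(image, block):
    """Average each non-overlapping block x block tile into one pixel."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] for dy in range(block) for dx in range(block))
            / (block * block)
            for x in range(0, width - block + 1, block)
        ]
        for y in range(0, height - block + 1, block)
    ]


def make_layers(image):
    """Return the 1x1 (original), 2x2, and 4x4 layers of a grayscale image."""
    return {"1x1": image, "2x2": block_average(image, 2), "4x4": block_average(image, 4)}


# 4x4 toy image with luminances 0..15 in row-major order.
image = [[float(4 * y + x) for x in range(4)] for y in range(4)]
layers = make_layers(image)
print(layers["2x2"], layers["4x4"])
```

A single pixel of the 4×4 layer summarizes sixteen original pixels, so a four-point filter on that layer covers a wide area while still reading only four values.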

In the 2×2 and 4×4 layers, the multipoint filter has L reference pixels that are enlarged to 2×2 and 4×4 pixels, respectively, relative to the original 1×1 pixels. Referring to a pixel of a blurred image corresponds to referring to a wide area of the original 1×1-layer image at once.

Depending on the position of the facial part, recognition performance may be higher with a 1×1-layer multipoint filter, which refers only to a narrow, fine-grained area, or conversely with a 2×2-layer or 4×4-layer multipoint filter, which refers to a wide, coarse-grained area. Referring to many pixels increases the amount of computation, but multilayering the image keeps the number of reference pixels as small as possible.

At detection time, an abort process can also be introduced: the running value of the weighted majority vote over the feature values computed by the weak classifiers is compared with an abort threshold, and depending on the result the given grayscale image is judged not to be the object and the computation is aborted.

The abort threshold can be computed from the minimum value that the weighted majority vote takes over the learning samples that show the object. By performing this abort threshold calculation alongside the repeated weak classifier learning, weighted error calculation, weight calculation, and data weight update, an abort threshold can be learned for each weak classifier.

Each time a weak classifier outputs its feature value and the weighted majority vote is updated, the vote is compared with the abort threshold. As soon as it falls below the threshold, the window image is judged not to be the object and the computation is aborted, which eliminates wasted computation and speeds up discrimination.
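The abort mechanism can be sketched in two parts: learning one threshold per weak classifier from the positive learning samples, and using those thresholds to stop early at detection time. This is an illustration under assumptions: weak classifier outputs are +1 or -1, the threshold is taken as the plain minimum of the running vote over the positive samples, and `learn_abort_thresholds` and `detect_with_abort` are hypothetical names.

```python
def learn_abort_thresholds(positive_outputs, alphas):
    """Per-classifier abort thresholds from positive (object) learning samples.

    positive_outputs[i][t] is the output (+1 / -1) of weak classifier t on
    positive sample i. The threshold after classifier t is the smallest
    running weighted vote seen on any positive sample.
    """
    running = [0.0] * len(positive_outputs)
    thresholds = []
    for t, alpha in enumerate(alphas):
        running = [r + alpha * outs[t] for r, outs in zip(running, positive_outputs)]
        thresholds.append(min(running))
    return thresholds


def detect_with_abort(outputs, alphas, thresholds):
    """Weighted vote with early exit.

    Returns (is_object, number of weak classifiers evaluated).
    """
    score = 0.0
    for t, (h, alpha) in enumerate(zip(outputs, alphas)):
        score += alpha * h
        if score < thresholds[t]:
            return False, t + 1          # aborted: judged not to be the object
    return score >= 0, len(alphas)


alphas = [1.0, 0.5, 0.25]
positive_outputs = [[1, 1, 1], [1, -1, 1]]
thresholds = learn_abort_thresholds(positive_outputs, alphas)

rejected = detect_with_abort([-1, 1, 1], alphas, thresholds)
accepted = detect_with_abort([1, -1, 1], alphas, thresholds)
print(thresholds, rejected, accepted)
```

Because no positive learning sample ever fell below its threshold, aborting there cannot reject any of them, and most non-object windows are discarded after only a few weak classifiers.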

また、本発明の第３の側面は、正解（対象物であるか非対象物であるか）が既知である複数の濃淡画像からなる学習サンプルを使用して、与えられた画像が対象物であるか否かを検出する対象物の検出に用いる複数の弱判別器を集団学習するための処理をコンピュータ上で実行するようにコンピュータ可読形式で記述されたコンピュータ・プログラムであって、
各弱判別器は、Ｌ点の画素位置の参照画素の輝度値を要素とする輝度値ベクトルと任意の実数値からなるフィルタ係数ベクトルとの内積計算により特徴量を算出する多点フィルタであり（但し、Ｌは２以上の整数）、各学習サンプルｉは、弱判別器ｔ毎に判別の難易度を反映した重みを表すデータ重みＤ_t,iを持ち（但し、ｉは学習サンプルを識別する通し番号で、ｔは弱判別器を識別する通し番号）、
前記コンピュータ・プログラムは前記コンピュータを、
最初に生成する弱判別器にとっての各学習サンプルのデータ重みＤ_1,iを初期化する初期化手段と、
ＬｏｇｉｓｔｉｃＲｅｇｒｅｓｓｉｏｎ又はサポート・ベクタ・マシンを用いてにより、ｔ番目の弱判別器を学習する弱判別器学習手段と、
前記弱判別器学習手段が学習したｔ番目の弱判別器に対して複数の学習サンプルを投入したときの、誤判別した各学習サンプルのデータ重みＤ_t,iを加算して重み付けエラーを算出する重み付けエラー算出手段と、
対象物の検出時に各弱判別器が算出した特徴量を重み付け多数決する際に用いられる、ｔ番目の弱判別器の信頼度に相当する重みα_tを算出する重み付け多数決の重み算出手段と、
各学習サンプルのデータ重みＤ_t+1,iを、重み付け多数決の重みα_tと、各学習サンプルを正解付けするラベルと、ｔ番目の弱判別器による各学習サンプルについての判別結果に基づいて算出するデータ重み更新手段と、
として機能させ、
前記の弱判別器学習手段、重み付けエラー算出手段、重み算出手段、及び、データ重み更新手段による処理を必要回数だけ繰り返して実施して、必要数の弱判別器を学習する、
A third aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer, processing for group learning of a plurality of weak classifiers used for object detection, that is, for detecting whether or not a given image is a target object, using learning samples consisting of a plurality of grayscale images each labeled with the correct answer (object or non-object), wherein
each weak classifier is a multi-point filter that calculates a feature value by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector of arbitrary real values (where L is an integer of 2 or more), and each learning sample i has, for each weak classifier t, a data weight D_{t,i} reflecting how difficult that sample is to discriminate (where i is a serial number identifying the learning sample and t is a serial number identifying the weak classifier),
the computer program causing the computer to function as:
initialization means for initializing the data weight D_{1,i} of each learning sample for the weak classifier to be generated first;
weak classifier learning means for learning the t-th weak classifier using logistic regression or a support vector machine;
weighted error calculating means for calculating a weighted error by inputting the plurality of learning samples to the t-th weak classifier learned by the weak classifier learning means and summing the data weights D_{t,i} of the misclassified learning samples;
weighted-majority weight calculating means for calculating a weight α_t corresponding to the reliability of the t-th weak classifier, the weight being used when a weighted majority vote is taken over the feature values calculated by the weak classifiers at object detection time; and
data weight updating means for calculating the data weight D_{t+1,i} of each learning sample based on the weighted-majority weight α_t, the correct-answer label of each learning sample, and the discrimination result of the t-th weak classifier for each learning sample,
and wherein the weak classifier learning means, the weighted error calculating means, the weighted-majority weight calculating means, and the data weight updating means repeat their processing until the required number of weak classifiers has been learned.
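The initialization, weighted-error, weight-calculation, and data-weight-update steps listed above can be sketched as one boosting round. This is a minimal illustration that assumes the standard discrete AdaBoost formulas (uniform initial weights, α_t = ½·ln((1 − ε_t)/ε_t), exponential re-weighting followed by normalization); this passage does not spell the formulas out, and the function names are hypothetical.

```python
import numpy as np

def init_data_weights(n_samples):
    """Initial data weights D_{1,i}. Assumption: uniform, 1/N for every sample."""
    return np.full(n_samples, 1.0 / n_samples)

def boosting_round(D, y, h):
    """One round for the t-th weak classifier.

    D : data weights D_{t,i}, summing to 1
    y : correct-answer labels, +1 (object) or -1 (non-object)
    h : outputs of the t-th weak classifier on the samples, +1 or -1
    Returns (weighted error, alpha_t, D_{t+1}).
    """
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    # Weighted error: sum of the data weights of the misclassified samples.
    eps = float(D[h != y].sum())
    # Keep the logarithm finite if the classifier is perfect or always wrong.
    eps_safe = min(max(eps, 1e-10), 1.0 - 1e-10)
    # Reliability of this weak classifier (standard AdaBoost form, assumed).
    alpha = 0.5 * np.log((1.0 - eps_safe) / eps_safe)
    # Misclassified samples (y*h = -1) gain weight, correct ones lose weight.
    D_next = D * np.exp(-alpha * y * h)
    D_next /= D_next.sum()
    return eps, alpha, D_next
```

With four equally weighted samples and one misclassification, the misclassified sample ends up carrying half of the total weight in the next round, which is what forces the next weak classifier to concentrate on it.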

A fourth aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer, processing for detecting whether or not a given grayscale image is a target object, the computer program causing the computer to function as:
a plurality of weak discrimination means, each consisting of a multi-point filter that calculates a feature value of the given grayscale image by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector of arbitrary real values (where L is an integer of 2 or more); and
discrimination means for determining whether or not the given grayscale image is a target object based on the feature value calculated by at least one of the plurality of weak discrimination means.

The computer programs according to the third and fourth aspects of the present invention each define a computer program written in a computer-readable format so as to realize predetermined processing on a computer. In other words, installing the computer programs according to the third and fourth aspects on a computer produces a cooperative effect on the computer, yielding the same operation and effects as the group learning apparatus according to the first aspect of the present invention and the object detection apparatus according to the fourth aspect, respectively.

According to the present invention, it is possible to provide an excellent group learning apparatus and group learning method that can suitably perform group learning of an object detection apparatus for detecting whether or not a given image is a desired object such as a face, as well as a group-learned object detection apparatus and object detection method, and a computer program.

Further, according to the present invention, it is possible to provide an excellent group learning apparatus and group learning method that can suitably perform group learning of a classifier composed of a plurality of weak classifiers using statistical learning such as AdaBoost, an object detection apparatus and object detection method that detect an object using the group of weak classifiers obtained by the group learning, and a computer program.

Further, according to the present invention, it is possible to provide an excellent group learning apparatus and group learning method that can suitably perform group learning, using statistical learning such as AdaBoost, of a group of weak classifiers composed of filters with a high degree of freedom, an object detection apparatus and object detection method that detect an object using the group of weak classifiers obtained by the group learning, and a computer program.

Further, according to the present invention, it is possible to provide an excellent group learning apparatus and group learning method that can suitably perform group learning of a group of filters with a high degree of freedom by finding ideal real values to use as the filter coefficients that multiply the input image (a plurality of reference pixels), an object detection apparatus and object detection method that detect an object using the group of weak classifiers obtained by the group learning, and a computer program.

Further, by configuring an object detection apparatus from the group of weak classifiers obtained by the group learning apparatus according to the present invention, image recognition can be performed efficiently and at high speed with a small amount of data, and higher recognition performance can be expected than with a group of weak classifiers that compute inter-pixel difference features.

Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described below and the accompanying drawings.

Embodiments of the present invention are described in detail below with reference to the drawings.

In object detection apparatuses that detect an object from an image, filters are widely used as preprocessing for discrimination or for feature extraction. Such filters can be obtained through ensemble learning (group learning).

A learning machine obtained by group learning consists of many weak hypotheses and a combiner that combines them. Boosting is one example of a combiner that integrates the outputs of the weak hypotheses with fixed weights that do not depend on the input. Boosting uses the learning results of previously generated weak hypotheses to reshape the distribution that the learning samples follow, so that the weights of samples that tend to be misclassified increase, and learns a new weak hypothesis based on this distribution. As a result, the weights of learning samples that are often answered incorrectly and are difficult to discriminate rise relatively, and weak classifiers that correctly answer samples with large weights, that is, samples that are difficult to discriminate, are selected one after another. Weak hypotheses are generated sequentially during learning, and a weak hypothesis generated later depends on those generated before it.

At detection time, the discrimination results of the many weak classifiers generated sequentially by learning are used. In the AdaBoost algorithm, for example, the discrimination results of all weak classifiers generated by learning (1 for an object, −1 for a non-object) are supplied to the combiner. The combiner weights each result by the reliability calculated for the corresponding weak classifier during learning, sums them, and outputs the result of this weighted majority vote. Evaluating this output value determines whether or not the input image is the target object.

A weak classifier is a filter that uses some feature value to determine whether its input is an object or a non-object. In the following, the filter computes the inner product of a luminance vector, consisting of the luminance values of reference pixels at a plurality of pixel positions in the input image, and a coefficient vector, consisting of the filter coefficients that multiply each reference pixel (see expression (1) above). The output of a weak classifier may be deterministic, stating whether or not the input is an object, or probabilistic, expressing the likelihood of being an object as, for example, a probability density.

Patent Document 1 (cited above) proposes a method that speeds up object detection at the expense of recognition performance by using a group learning apparatus with extremely simple weak classifiers, which determine whether the input is an object from the difference between the luminance values of two pixels, that is, an inter-pixel difference feature. In contrast, the present embodiment reduces filter learning to a linear two-class discrimination problem to find ideal real values for the filter coefficients, and improves recognition performance by using a group of filters with a high degree of freedom. The construction of the weak classifiers, that is, the filters, is described in detail later.

Device configuration:
FIG. 1 schematically shows the functional configuration of the object detection apparatus. The illustrated object detection apparatus 1 consists of an image input unit 2, a scaling unit 3, a scanning unit 4, and a discriminator 5.

The image input unit 2 inputs a grayscale image (luminance image) captured by, for example, a digital camera. The scaling unit 3 outputs scaled images obtained by enlarging or reducing the input image to every specified scale. For each scaled image, the scanning unit 4 scans a window the size of the object to be detected horizontally, line by line, for example from the top line downward, and cuts out window images. The discriminator 5 then determines whether each window image scanned by the scanning unit 4 is an object such as a face or a non-object, and when it detects an object, outputs the position and size of that region as the detection result.

Here, the group learning machine 6 performs group learning of the plurality of weak classifiers that constitute the discriminator 5. The discriminator 5 refers to the learning result of the group learning machine 6 to determine whether the current window image is an object, such as a face image, or a non-object. The group learning machine 6 may be a component within the object detection apparatus 1 or an independent external device.

When a plurality of objects are detected in the input image, the object detection apparatus 1 outputs a plurality of pieces of region information. Furthermore, when some of these regions overlap, it can also select the region rated most likely to be the object.

The image (grayscale image) input to the image input unit 2 is first supplied to the scaling unit 3. The scaling unit 3 reduces the image using bilinear interpolation. Rather than generating all the reduced images up front, it outputs the image currently needed to the scanning unit 4 and, once processing of that image is complete, generates the next, smaller reduced image, repeating this process. FIG. 2 shows how the scaling unit 3 generates reduced images. As shown in the figure, the input image 10A is first output to the scanning unit 4 as is; after the scanning unit 4 and the discriminator 5 finish processing it, the input image 10B is generated by reducing the size of the input image 10A. Then, after the scanning unit 4 and the discriminator 5 finish processing the input image 10B, the input image 10C, obtained by further reducing the input image 10B, is output to the scanning unit 4, and reduced images 10D, 10E, and so on are generated in turn. Processing ends when the reduced image becomes smaller than the window size scanned by the scanning unit 4. Only after this processing ends does the image input unit 2 output the next input image to the scaling unit 3.

FIG. 3 shows how the scanning unit 4 scans a window S of a predetermined window size over the input image. The window size is the size accepted by the downstream discriminator 5 (that is, a size suited to discriminating the object), for example 20 × 20 pixels. The scanning unit 4 applies the window S successively over the whole of the input image 10A and outputs the image at each scan position (hereinafter, the "cut-out image") to the discriminator 5. Although the size of the window S is fixed, the scaling unit 3 successively reduces the input image to various image sizes as shown in FIG. 2, so target objects of any size can be detected.
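The interplay of the scaling unit 3 and the scanning unit 4 described above can be sketched as a generator that scans one scale at a time and only then produces the next, smaller image. This is an illustrative sketch, not the apparatus itself: the reduction factor of 0.8, the scan step of one pixel, and the function names are assumptions, and the bilinear reduction is a simple hand-written version.

```python
import numpy as np

WINDOW = 20  # window size given in the text (20 x 20 pixels)

def shrink_bilinear(img, factor):
    """Reduce a 2-D grayscale image by `factor` (< 1) with bilinear interpolation."""
    h, w = img.shape
    nh, nw = int(h * factor), int(w * factor)
    ys = np.linspace(0, h - 1, nh)          # sample rows in source coordinates
    xs = np.linspace(0, w - 1, nw)          # sample columns in source coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]                 # vertical interpolation fraction
    fx = (xs - x0)[None, :]                 # horizontal interpolation fraction
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bottom = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy

def scan_pyramid(img, factor=0.8, step=1):
    """Yield (scale, top, left, window) for every window position at every scale.

    The next reduced image is created only after the current one has been
    fully scanned, and scanning stops once the image is smaller than the window.
    """
    current = np.asarray(img, dtype=float)
    scale = 1.0
    while min(current.shape) >= WINDOW:
        h, w = current.shape
        for top in range(0, h - WINDOW + 1, step):
            for left in range(0, w - WINDOW + 1, step):
                yield scale, top, left, current[top:top + WINDOW, left:left + WINDOW]
        current = shrink_bilinear(current, factor)
        scale *= factor
```

A detection at `(scale, top, left)` corresponds to a region of side `WINDOW / scale` in the original image, which is how a fixed window can find objects of different sizes.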

The discriminator 5 determines whether or not the cut-out image supplied by the scanning unit 4 is a target object such as a face. FIG. 4 shows the configuration of the discriminator 5. The discriminator 5 consists of a plurality of weak classifiers 21_t (21_1 to 21_K) and an adder 22 that multiplies their outputs by the respective weights α_t (α_1 to α_K) and computes a weighted majority vote (where t is a serial number identifying a weak classifier and K is the total number of weak classifiers). For an input window image, each weak classifier 21_t (21_1 to 21_K) of the discriminator 5 sequentially outputs an estimate of whether or not the image is an object, and the adder 22 computes and outputs the weighted majority vote. The discriminator 5 can determine whether or not the cut-out image is an object according to the value of this weighted majority vote.

Each weak classifier 21_t (21_1 to 21_K) constituting the discriminator 5 is a filter that calculates a feature value for discrimination by computing the inner product of a luminance value vector, consisting of the luminance values of reference pixels at a plurality of pixel positions in the input cut-out image, and a coefficient vector, consisting of the filter coefficients that multiply each reference pixel; it can be obtained by ensemble learning. In the present embodiment, filter learning is reduced to a linear two-class discrimination problem to find ideal real values for the filter coefficients, and recognition performance is improved by using a group of filters with a high degree of freedom.

The group learning machine 6 learns in advance, by group learning, the weak classifiers 21_t (21_1 to 21_K) and the weights that multiply their outputs (estimates). Any group learning method in which the results of a plurality of classifiers are combined by majority vote can be applied. For example, group learning based on boosting, such as AdaBoost, which weights the data and takes a weighted majority vote, is applicable.

During learning, a plurality of learning samples consisting of grayscale images that have been sorted, that is, labeled, into the two classes of object and non-object are fed to the weak classifiers 21_t (21_1 to 21_K) so that the feature values of objects and non-objects are learned in advance. At discrimination time, the feature value calculated for each window image supplied in turn from the scanning unit 4 is compared with the previously learned feature values of objects and non-objects, and an estimate of whether or not the window image is an object is output deterministically or probabilistically.

The adder 22 multiplies the feature value calculated by each weak classifier 21_t (21_1 to 21_K) by the weight α_t (α_1 to α_K), which is the reliability of that weak classifier, and outputs the sum (the weighted majority vote value). In AdaBoost, the weak classifiers 21_t (21_1 to 21_K) calculate their estimates one after another, and the weighted majority vote value is updated accordingly. These weak classifiers 21_t (21_1 to 21_K) are generated sequentially by the group learning machine 6 through group learning on the learning samples, and calculate their feature values in, for example, the order in which they were generated. The weights α_t (α_1 to α_K) of the weighted majority vote (the reliabilities) are learned in the learning process that generates each weak classifier 21_t (21_1 to 21_K).

When a weak classifier 21_t (21_1 to 21_K) must produce a binary output, as in AdaBoost, it determines whether the input is the target object by splitting the calculated feature value in two with a single threshold. Of course, a plurality of thresholds may be used instead. Alternatively, as in Real-AdaBoost, a weak classifier 21_t (21_1 to 21_K) may probabilistically output a continuous value expressing the degree to which the input is the target object, derived from the calculated feature value. The quantities these weak classifiers need for discrimination (such as thresholds) are also learned during training according to an algorithm such as AdaBoost.

When the adder 22 takes the weighted majority vote, it need not wait for the results of all weak classifiers 21_t (21_1 to 21_K); depending on the value partway through the calculation, it may judge that the input is not the target object and abort the calculation. This abort process greatly reduces the amount of computation in detection, because processing can move on to the next window image partway through, without waiting for the results of all weak classifiers. The abort threshold can be learned during training.

In this way, the discriminator 5 calculates the weighted majority vote value as an evaluation value s for determining whether or not the window image is an object, and makes the determination based on that evaluation value. Furthermore, each time one of the weak classifiers 21_t (21_1 to 21_K) calculates and outputs its feature value, the discriminator 5 multiplies that feature value by the learned weight for that weak classifier and adds it in, updating the weighted majority vote value. Each time the weighted majority vote value (evaluation value) is updated, the abort threshold can be used to control whether or not to abort calculation of the estimates.
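The running weighted majority vote with an abort check can be sketched as follows. The decision rule (object when the final evaluation value s is positive) and the use of one abort threshold per weak classifier are assumptions made for illustration; the text only states that an abort threshold is learned. The function and parameter names are hypothetical.

```python
def classify_window(window, weak_classifiers, alphas, abort_thresholds=None):
    """Evaluate weak classifiers in generation order and update the vote value s.

    weak_classifiers : callables mapping a window image to +1 or -1
    alphas           : reliabilities alpha_t learned for each weak classifier
    abort_thresholds : optional per-step thresholds; if s drops below the
                       threshold after step t, the window is rejected early
    Returns (is_object, s, number_of_weak_classifiers_evaluated).
    """
    s = 0.0
    n_used = 0
    for t, (h_t, alpha_t) in enumerate(zip(weak_classifiers, alphas)):
        s += alpha_t * h_t(window)  # update the weighted majority vote value
        n_used = t + 1
        if abort_thresholds is not None and s < abort_thresholds[t]:
            return False, s, n_used  # abort: judged not to be the object
    return s > 0.0, s, n_used
```

Because most windows in a typical image are non-objects, rejecting them after only a few weak classifiers is where the bulk of the saving in computation would come from.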

Group learning of weak classifiers:
As described above, the group learning machine 6 generates the discriminator 5 by performing group learning with learning samples according to a predetermined group learning algorithm. The following first describes the configuration of the weak classifiers 21 used in the discriminator 5, and then the group learning method used by the group learning machine 6.

The group learning machine 6 uses learning samples consisting of grayscale images that have been labeled in advance as object or non-object (for example, face image or non-face image). It generates a weak classifier by selecting (learning) one hypothesis from a large number of learning models (combinations of hypotheses) according to a predetermined learning algorithm, and determines how these weak classifiers are to be combined.

Learning normally uses thousands or tens of thousands of learning samples. FIG. 5 shows example face and non-face sample images. In this specification, x denotes the 400-dimensional luminance value vector, and y denotes the class label distinguishing the two classes, face and non-face.

The group learning machine 6 performs group learning using a boosting algorithm, combining a plurality of weak classifiers so that a strong discrimination result is obtained overall. Each individual weak classifier has low discrimination ability (slightly better than random), but by selecting, for example, hundreds to thousands of weak classifiers and combining them appropriately, a classifier with high discrimination ability can be built. The group learning machine 6 learns how to combine the weak classifiers, that is, the selection of the weak classifiers 21_t (21_1 to 21_K) and the weights α_t (α_1 to α_K) used when their outputs are combined by weighted majority vote.

In the present embodiment, filter learning is reduced to a linear two-class discrimination problem to find ideal real values for the filter coefficients, so that weak classifiers consisting of filters with a high degree of freedom are learned. Using such a group of weak classifiers improves recognition performance.
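To illustrate what reducing filter learning to a linear two-class problem can look like, the sketch below fits real-valued coefficients and a bias by logistic regression with plain gradient descent. It is a sketch under assumptions, not the embodiment's procedure: weighting the loss by boosting data weights, scaling luminance to [0, 1], the learning rate, the iteration count, and the function name are all choices made here for illustration, and the search over reference pixel positions is not shown.

```python
import numpy as np

def fit_filter_coefficients(X, y, D, lr=0.1, n_iter=500):
    """Fit coefficients w and bias b of a linear filter by logistic regression.

    X : (n_samples, L) luminance values at the L reference pixels
        (assumed scaled to [0, 1])
    y : labels, +1 (object) or -1 (non-object)
    D : per-sample data weights (assumed to weight the loss)
    Minimises sum_i D_i * log(1 + exp(-y_i * (w . x_i + b))).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    # Append a constant 1 so the bias is learned as one more coefficient.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        margin = y * (Xa @ w)
        # 1 / (1 + exp(margin)), written with tanh for numerical stability.
        sig_neg = 0.5 * (1.0 - np.tanh(0.5 * margin))
        grad = -(D * y * sig_neg) @ Xa
        w -= lr * grad
    return w[:-1], w[-1]
```

A support vector machine could be substituted for the logistic loss without changing the shape of the result: a real-valued coefficient vector plus a bias.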

The weak classifier disclosed in Patent Document 1 (cited above) is, so to speak, a two-point filter that calculates the difference between the luminance values of two pixels, that is, an inter-pixel difference feature. FIG. 6 shows how the inter-pixel difference feature is obtained. A two-point filter consists of the luminance values {I_1, I_2} at the two pixel positions 31_1 and 31_2 used as reference pixels and the fixed filter coefficients {+1, −1}; in this case, the group learning machine 6 learns how to select and combine a large number of two-point filters.

In contrast, the present embodiment uses filters with three or more reference pixels as weak classifiers. Increasing the number of reference pixels improves recognition performance but increases the amount of computation. The following description takes as an example the case where a four-point filter with four reference pixels is used as the weak classifier. Configuring a four-point filter requires determining, as parameters, the combination of positions of the four reference pixels and the real-valued filter coefficients that multiply the luminance value of each reference pixel. The group learning machine 6 learns how to select and combine a large number of four-point filters, finding filter coefficients consisting of ideal real values by reducing the problem to a linear two-class discrimination problem, and thereby learns a group of filters with a high degree of freedom.

FIG. 7 shows a configuration example of a four-point filter. In the figure, four reference pixels are arranged in a straight line on an input image of 20 × 20 pixels (in the illustrated example, a face image serving as a learning sample). To represent the positions of its four reference pixels, the illustrated filter needs only two pieces of information, {start position q, step width s}, which has the advantage of keeping the stored data compact. The positions p_j of the four reference pixels (where j is the serial number of the reference pixel, an integer from 1 to 4) are given by expression (4) below.

As can be seen from FIG. 7, for an input image with 20 pixels per line and an arbitrary start position q, a step width s of 1 gives a four-point filter of four pixels in a horizontal row, a step width of 20 gives four pixels in a vertical column, a step width of 21 gives four pixels on a diagonal descending to the right, and a step width of 19 gives four pixels on a diagonal ascending to the right.
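The step widths listed above are consistent with reference positions of the form p_j = q + (j − 1)·s on a raster-ordered image. The sketch below assumes that form, together with 0-based raster indexing (index = row × 20 + column); expression (4) itself is not reproduced in this text, so treat both as assumptions, and the helper names as hypothetical.

```python
WIDTH = 20  # pixels per line of the 20 x 20 input image

def reference_positions(q, s, n_points=4):
    """Raster indices of the reference pixels: q, q + s, q + 2s, ..."""
    return [q + j * s for j in range(n_points)]

def to_row_col(p, width=WIDTH):
    """Convert a raster index to (row, column)."""
    return divmod(p, width)

# s = 1: same row; s = 20: same column; s = 21 and s = 19: the two diagonals.
for s in (1, 20, 21, 19):
    print(s, [to_row_col(p) for p in reference_positions(q=3, s=s)])
```

Storing only (q, s) per filter, rather than four separate positions, is what keeps the filter description compact.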

Further, as shown in FIGS. 8A and 8B, applying hierarchical blurring to the input image makes it possible to construct four-point filters that reference a wide area. In FIG. 8A, blurring is performed by averaging the luminance values of each adjacent 2 × 2 pixel block into one pixel. In FIG. 8B, blurring is performed by averaging the luminance values of each adjacent 4 × 4 pixel block into one pixel. In these cases, the four-point filter consists of four enlarged reference pixels, each 2 × 2 or 4 × 4 original pixels in size, arranged in a straight line. Referencing a pixel of the blurred image is equivalent to referencing a wide area of the original 1 × 1 layer image at once.

Depending on the position of the facial part, recognition performance may be higher with a four-point filter on the 1 × 1 layer, which references only a narrow area at fine granularity, or conversely with a four-point filter on the 2 × 2 or 4 × 4 layer, which references a wide area at coarse granularity.

In the present embodiment, as shown in FIG. 9, the blurred 2 × 2 layer and 4 × 4 layer images are concatenated to the original 1 × 1 layer input image and the result is treated as a single input image (that is, the image is made multi-layered), so that the appropriate layer can be selected simply by specifying the start position of the four-point filter.
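The block-averaging blur and the multi-layer concatenation can be sketched as below. The layout used here, in which each layer is flattened in raster order and the three are joined end to end, is an assumption; the actual arrangement is the one shown in FIG. 9, which this text does not describe in detail. Function names are hypothetical.

```python
import numpy as np

def block_average(img, k):
    """Average each non-overlapping k x k block into one pixel.

    Assumes both image dimensions are multiples of k (true for 20 x 20 with
    k = 2 or k = 4).
    """
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def multilayer_vector(img):
    """Concatenate the 1x1, 2x2 and 4x4 layers into a single vector.

    Assumed layout: each layer flattened row by row, layers placed one after
    another. For a 20 x 20 image this gives 400 + 100 + 25 = 525 elements.
    """
    img = np.asarray(img, dtype=float)
    layers = [img, block_average(img, 2), block_average(img, 4)]
    return np.concatenate([layer.ravel() for layer in layers])
```

Under this layout a start position of 400 or more would land in the 2 × 2 layer and 500 or more in the 4 × 4 layer, which is one way a start position alone could select the layer; note that the line width, and hence the meaning of a given step width, would then differ per layer.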

The processing module that makes the input image multi-layered can, for example, be placed downstream of the scanning unit 4 or incorporated into the scanning unit 4.

The four-point filter differs from the pixel difference feature described in Patent Document 1 not only in having more reference pixels, but also in that the filter coefficients multiplied (by inner product) with the luminance value vector x, consisting of the luminance values of the reference pixels, are not fixed values such as {+1, −1} but ideal real values found by group learning. This makes it possible to construct a group of filters with a high degree of freedom.

Given an L-dimensional luminance value vector x and filter reference pixel positions p_j, the filter f is expressed by equation (5) below. Here, the reference pixel positions p_j are represented by {start position q_j, step width s_j} (see the earlier description and equation (4)). When the filter f is a four-point filter, the number of dimensions is L = 4.

In equation (5) above, w_j is the filter coefficient by which each element of the luminance value vector x is multiplied, and b is the bias. The filter coefficients are not fixed values such as {+1, −1} but ideal real values. Filter coefficients consisting of ideal real values are obtained by reducing the problem to a linear two-class discrimination problem.

The weak discriminator h using the filter f of equation (5) above is given by equation (6) below, and outputs the discrimination result as a class label y in {+1, −1} that separates the two classes. (The inner product computed by the filter f is positive when the object is detected in the input image, for example for a face image, and negative when it is not, for example for a non-face image.)

By introducing a virtual fifth reference pixel p₅ as in equation (7) below, and treating the bias b on the right-hand side of equation (5) above as the term for the luminance value x_p5 of that virtual reference pixel, the bias b can be expressed as one of the filter coefficients, w₅. Equation (5) then simplifies to the vector inner product form of equation (8).
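
As an illustration of the filter and weak discriminator just described, the following is a minimal sketch. Since equations (5) to (8) are not reproduced in this text, it assumes the usual form: the filter output is the inner product of the coefficients with the reference pixel luminance values, and the virtual fifth pixel has the constant luminance 1 so that the last coefficient w₅ plays the role of the bias. The function names and the handling of an output of exactly 0 are hypothetical.

```python
def filter_response(pixels, positions, weights):
    """Inner product of filter coefficients and reference pixel luminances.

    pixels    : flat list of luminance values
    positions : indices of the reference pixels (four for a four-point filter)
    weights   : one coefficient per reference pixel, plus a final bias term
    """
    # Assumption: the virtual extra pixel has constant luminance 1.0,
    # so the last weight acts as the bias b.
    x = [pixels[p] for p in positions] + [1.0]
    return sum(w * v for w, v in zip(weights, x))


def weak_discriminator(pixels, positions, weights):
    """Return the class label +1 (object) or -1 (non-object).

    An output of exactly 0 is mapped to -1 here, which is an arbitrary choice.
    """
    return 1 if filter_response(pixels, positions, weights) > 0 else -1
```

For example, with `pixels = [10, 20, 30, 40, 50]`, `positions = [0, 1, 2, 3]` and `weights = [0.5, 0.5, 0, 0, -10]`, the response is 5 + 10 − 10 = 5 and the label is +1.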

Suppose that N pairs of a luminance value vector (feature vector) x and a class label y, which separates the two-class discrimination results, are given as in equation (9) below. The luminance value vectors x⁽¹⁾, …, x^(N) are vectors whose elements are the luminance values of the four reference pixels, one for each of a total of N image samples extracted from the many face-image and non-face-image learning samples (see FIG. 5). The class label y is +1 for a luminance value vector taken from a face image sample and −1 for one taken from a non-face image sample.

The discrimination error e_lr (= negative log-likelihood) in this case is expressed by equation (10) below. The group learning machine 6 performs learning that selects and combines a plurality of weak discriminators h so as to minimize this discrimination error e_lr.

Obtaining the parameters {w_j, b} of the weak discriminator h that minimize the above discrimination error e_lr, that is, learning the filter, is achieved by reducing the task to a linear two-class discrimination problem. Specifically, Logistic Regression can be used.

To obtain the parameters {w_j, b} of the weak discriminator h that minimize the discrimination errors e_l2SVM and e_l1SVM expressed by equations (11) and (12) below, linear Support Vector Machines (L2-SVM and L1-SVM, respectively) can be used.
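
To make the three discrimination errors concrete, the following sketch shows the per-sample losses commonly associated with Logistic Regression, L1-SVM and L2-SVM. Equations (10) to (12) are not reproduced in this text, so these standard forms (without any regularization term) are an assumption, and the function names are hypothetical. The margin is y multiplied by the filter output w·x.

```python
import math


def logistic_loss(margin):
    """Negative log-likelihood term log(1 + exp(-margin)), computed stably."""
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def l1_svm_loss(margin):
    """Hinge loss max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


def l2_svm_loss(margin):
    """Squared hinge loss max(0, 1 - margin)^2."""
    return max(0.0, 1.0 - margin) ** 2


def total_loss(loss, samples, weights):
    """Sum a per-sample loss over (x, y) pairs, with margin = y * (w . x)."""
    return sum(loss(y * sum(w * v for w, v in zip(weights, x)))
               for x, y in samples)
```

All three losses decrease as the margin grows, which is one way to see why the resulting discriminators tend to behave similarly.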

The Logistic Regression and Support Vector Machine techniques themselves are well known in the art. For details of Logistic Regression, see, for example, Lin, C., Weng, R. C., and Keerthi, S. S. 2007. Trust region Newton methods for large-scale logistic regression. In Proceedings of the 24th International Conference on Machine Learning (Corvallis, Oregon, June 20-24, 2007). Z. Ghahramani, Ed. ICML '07, vol. 227. ACM, New York, NY, 561-568. For details of linear SVMs, see, for example, Joachims, T. 2006. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia, PA, USA, August 20-23, 2006). KDD '06. ACM, New York, NY, 217-226.

Note that Logistic Regression and SVM differ only in the cost function above, and they are known to give very similar performance (for example, the paper by Lin et al. cited above states that Logistic Regression is a technique similar to SVM). Both techniques are well studied in the art, and very fast learning algorithms have become available in recent years. It should be fully appreciated that, whichever technique is used, ideal real values can be learned as the filter coefficients w_j of the weak discriminators, so that the discriminator 5 can be built from a group of weak discriminators with a high degree of freedom and recognition performance can be improved.

The method of learning the weak discriminators will now be described in more detail. FIG. 10 shows boosting (Adaboost), which forms the overall framework of the learning. It is assumed that N (for example, tens of thousands of) learning samples X and their labels (correct answers) are known. The learning samples consist of many sample images of faces and non-faces (see FIG. 5).

In the AdaBoost algorithm, a data weight D reflecting the difficulty of discrimination is attached to each learning sample, and the parameters of the weak discriminator are learned so that the feature amount calculated by the four-point filter serving as the weak discriminator minimizes the weighted error. The parameters here are those needed to define the filter: the combination of reference pixel positions {p₁, p₂, p₃, p₄}, the filter coefficients {w₁, w₂, w₃, w₄}, and the bias w₅ (expressed as the filter coefficient of the virtual fifth reference pixel).

Let D_t,i denote the data weight of the i-th learning sample for the t-th weak discriminator 21_t. At the initial stage the difficulty of discriminating each learning sample is unknown, so the data weight D_1,i of each learning sample for the first weak discriminator 21₁ generated by learning is set, as shown in equation (14) below, to the uniform value obtained by dividing 1 by the number of learning samples N.

Then the process consisting of (a) learning a weak discriminator, (b) calculating the weighted error e_t, (c) calculating the weight for the weighted majority vote, (d) calculating the data weights, and (e) calculating the abort threshold is repeated as many times as the number K of weak discriminators to be generated, yielding K (for example, several thousand) weak discriminators.

In (a), the parameters of the t-th weak discriminator 21_t are learned by Logistic Regression. The details of this learning method are described later.

The weighted error e_t calculated in (b) is the error obtained when all learning samples are discriminated by the t-th weak discriminator 21_t learned in (a). That is, it is the sum of the data weights D_t,i of the learning samples that were misclassified, whether a face image sample was wrongly judged to be a non-face or a non-face image sample was wrongly judged to be a face. The weighted error e_t is calculated using equation (15) below.

Next, in (c), the weight α_t to be given to the weak discriminator 21_t when the combiner 22 in the discriminator 5 takes a weighted majority vote of the weak discriminator outputs is obtained. The weight α_t corresponds to the reliability of the weak discriminator 21_t and is calculated by equation (16) below.

Next, in (d), the data weight D_t+1,i, which indicates the difficulty of discriminating each learning sample x⁽ⁱ⁾ for the subsequent weak discriminator 21_t+1, is calculated as shown in equation (17) below from the weighted-majority weight α_t, the label y⁽ⁱ⁾ of each learning sample x⁽ⁱ⁾, and the discrimination result h(x⁽ⁱ⁾) for each learning sample x⁽ⁱ⁾.
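
Steps (b) to (d) can be sketched as follows. Equations (15) to (17) are not reproduced in this text, so the sketch assumes the standard AdaBoost forms: the weighted error is the sum of the data weights of misclassified samples, the reliability is α = ½ ln((1 − e) / e), and each data weight is multiplied by exp(−α·y·h(x)) and then normalized. The function name and the small clamp on the error are hypothetical additions.

```python
import math


def adaboost_round(data_weights, labels, predictions):
    """One pass of steps (b)-(d) for a single learned weak discriminator.

    data_weights : current weights D_t,i (assumed to sum to 1)
    labels       : correct labels y_i in {+1, -1}
    predictions  : weak discriminator outputs h(x_i) in {+1, -1}
    """
    # (b) weighted error: total weight of the misclassified samples
    error = sum(d for d, y, h in zip(data_weights, labels, predictions)
                if y != h)
    # Clamp so that alpha stays finite (a practical guard, not from the source)
    eps = 1e-10
    error = min(max(error, eps), 1.0 - eps)
    # (c) weight of this weak discriminator in the weighted majority vote
    alpha = 0.5 * math.log((1.0 - error) / error)
    # (d) raise the weight of misclassified samples, lower the others
    updated = [d * math.exp(-alpha * y * h)
               for d, y, h in zip(data_weights, labels, predictions)]
    total = sum(updated)
    return error, alpha, [d / total for d in updated]
```

With four uniformly weighted samples of which one is misclassified, the error is 0.25, and after the update the misclassified sample carries half of the total weight.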

In the present embodiment, when the adder 22 takes the weighted majority vote, it does not wait for the results of all the weak discriminators 21_t (21₁ to 21_K). Depending on the intermediate value, it judges that the window is not the target object and aborts the calculation. In (e), the abort threshold for terminating the discrimination process midway is calculated. For example, among the labeled learning samples, the minimum value that the weighted majority vote can take for the learning samples representing the detection target can be used as the abort threshold. In the discrimination process, as the feature amounts of the window image are calculated in turn by the weak discriminators, they are added with weights as shown in equation (18) below, and the weighted majority value F is updated successively.

Each time one weak discriminator outputs its feature amount and the weighted majority value is updated, the value is compared with the abort threshold. As soon as the weighted majority value falls below the abort threshold, the window image is judged not to be the object and the calculation is aborted, which eliminates wasted computation and speeds up discrimination. The abort threshold R_K for the output f_K(x) of the K-th weak discriminator is the minimum of the weighted majority values obtained with the learning samples x_j (= x₁ to x_J) that are the object, among the learning samples x_i (= x₁ to x_N), and is defined as in equation (19) below. In any case, the abort threshold R_K is set to the largest value that still lets at least all positive learning samples pass.
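
The abort threshold described above can be sketched as a running minimum over the positive learning samples. Equation (19) is not reproduced in this text, so the plain minimum used here is an assumption based on the description, and the function name and data layout are hypothetical.

```python
def abort_thresholds(alphas, positive_outputs):
    """Compute one abort threshold per weak discriminator.

    alphas           : weighted-majority weights, in learned order
    positive_outputs : positive_outputs[j][t] is the output of weak
                       discriminator t on positive learning sample j
    """
    running = [0.0] * len(positive_outputs)  # weighted majority value per sample
    thresholds = []
    for t, alpha in enumerate(alphas):
        for j, outputs in enumerate(positive_outputs):
            running[j] += alpha * outputs[t]
        # Smallest value any positive sample has reached after stage t
        thresholds.append(min(running))
    return thresholds
```

Because each threshold is the smallest value reached by any positive learning sample at that stage, no positive learning sample falls below it.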

Each time the feature amount calculated by a weak discriminator is multiplied by that discriminator's reliability and added, updating the weighted majority value (the evaluation value s), the evaluation value s is compared with the abort threshold R_t to decide whether to continue computing the remaining weak discriminators. Even when window images are cut out by scanning all areas of the input image and of its reduced scaled images, only a small fraction of those window images are the object, and most are non-objects. Therefore, by aborting the computation of the subsequent weak discriminators when the evaluation value s falls below the abort threshold R_t and moving on to the next window image, wasted computation is drastically reduced and object detection is sped up. Note, however, that aborting the weighted majority vote midway and calculating the abort threshold in (e) are not part of the original Adaboost algorithm.

Next, the learning of the weak discriminator in (a) is described in detail. The weak discriminator of equation (6) above is defined by the reference pixel positions {p₁, p₂, p₃, p₄}, the filter coefficients {w₁, w₂, w₃, w₄}, and the bias w₅. Of these, the reference pixel positions are searched over M trials (for example, tens of thousands), and for each set of reference pixel positions the filter coefficients and bias are obtained using Logistic Regression.

FIG. 11 shows the algorithm for learning a weak discriminator.

First, in step 1(a), a set of filter reference pixel positions p_j is chosen at random, and the filter coefficients and bias w_j that minimize the cost function are obtained using Logistic Regression. Then, in step 1(b), the weighted error e_m of the resulting four-point filter, that is, the weak discriminator, is calculated using equation (15) above (where m is an integer from 1 to M).

Step 1, in which the filter coefficients, bias, and weighted error are calculated by Logistic Regression (or a linear SVM) for a randomly chosen set of reference pixel positions, is repeated M times. Then, in step 2, the set of reference pixel positions, filter coefficients, and bias that give the smallest weighted error are adopted as the parameters of the weak discriminator.

When the parameters w_j (where j is an integer from 1 to 5) are obtained in step 1(b), it is common to determine them so as to minimize the weighted error of equation (15) above. In the present embodiment, however, a weighted cost function multiplied by the data weights D_t,i is minimized, as shown in equations (20) to (22) below. It should be understood that even if the cost function is not exactly identical to equation (15), Logistic Regression normally works well as long as the cost function is based on the quality of discrimination.
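
The random search over reference pixel positions combined with a data-weighted Logistic Regression can be sketched as below. Several points are assumptions rather than the method of this document: equations (20) to (22) are not reproduced here, so the weighted cost is taken to be the sum of D_i · log(1 + exp(−y_i · w·x_i)); plain gradient descent stands in for the trust region Newton method of the cited paper; the four positions are taken as start + step · j; luminance values are assumed to be scaled to the range 0 to 1; and all function names are hypothetical.

```python
import math
import random


def fit_weighted_logistic(xs, ys, data_weights, steps=200, lr=0.1):
    """Minimise sum_i D_i * log(1 + exp(-y_i * w.x_i)) by gradient descent."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y, d in zip(xs, ys, data_weights):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # d/dw of D*log(1+exp(-margin)) is -D*y*x / (1 + exp(margin));
            # the margin is capped only to avoid overflow in exp
            coeff = -d * y / (1.0 + math.exp(min(margin, 50.0)))
            for k, xi in enumerate(x):
                grad[k] += coeff * xi
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def weighted_error(xs, ys, data_weights, w):
    """Sum of the data weights of the samples that w misclassifies."""
    err = 0.0
    for x, y, d in zip(xs, ys, data_weights):
        h = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
        if h != y:
            err += d
    return err


def learn_weak_discriminator(images, ys, data_weights, trials=20, seed=0):
    """Try random position sets and keep the one with the smallest error."""
    rng = random.Random(seed)
    num_pixels = len(images[0])
    best = None
    for _ in range(trials):
        step = rng.randint(1, (num_pixels - 1) // 3)
        start = rng.randint(0, num_pixels - 1 - 3 * step)
        positions = [start + step * j for j in range(4)]
        # Append a constant 1.0 so the fifth weight acts as the bias
        xs = [[img[p] for p in positions] + [1.0] for img in images]
        w = fit_weighted_logistic(xs, ys, data_weights)
        err = weighted_error(xs, ys, data_weights, w)
        if best is None or err < best[0]:
            best = (err, positions, w)
    return best
```

The returned tuple holds the weighted error, the chosen reference pixel positions, and the five learned weights (four filter coefficients and the bias).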

FIG. 12 shows, in flowchart form, the processing procedure described so far for learning weak discriminators by Adaboost.

At the initial stage the difficulty of discriminating each learning sample is unknown, so the data weight of each learning sample is first initialized using equation (14) above (step S1).

Next, a weak discriminator is learned with weights in the separately defined processing step S2 (step S2).

Then, for the learned weak discriminator, parameters such as the weighted error, the weight for the weighted majority vote, and the data weights are updated (step S3). The abort threshold of the weak discriminator may be updated at the same time.

By repeating steps S2 and S3 K times (step S4), K weak discriminators are learned. The linear sum of the K weak discriminators thus obtained constitutes the discriminator 5 (step S5).

Next, the procedure for learning one weak discriminator in step S2 is described.

First, a set of reference pixel positions is chosen at random (step S11), and the filter coefficients and bias are learned using Logistic Regression (or a linear SVM) (step S12).

Next, the weighted error of the weak discriminator composed of the obtained filter coefficients and bias is calculated (step S13). If this weighted error is the smallest obtained so far, the weak discriminator parameters, consisting of the set of reference pixel positions, the filter coefficients, and the bias, are retained (step S14).

The learning of the filter coefficients and bias for a random set of reference pixel positions in steps S11 to S14 is executed M times (step S15). The parameters of the weak discriminator with the smallest weighted error, that is, the smallest cost function, are then returned (step S16).

FIG. 13 shows examples of weak discriminators, that is, filters, learned by the processing procedure shown in FIG. 12. As learning samples, multi-layered images (see FIG. 9) were used, in which the blurred 2 × 2 layer and 4 × 4 layer images are concatenated to the original 1 × 1 layer input image. The filters are four-point filters for frontal face detection, and the figure shows the first fifteen filters selected. In general, in boosting, weak discriminators selected earlier have higher reliability, and therefore larger weights in the weighted majority vote.

In FIG. 13, the four squares indicate the positions of the filter, and their shading from white to black indicates the values of the filter coefficients. It can be seen that the first filter, at the upper left, responds most strongly to images in which the luminance is dark at the eye positions and bright at the cheek positions.

Object detection:
Next, a method for discriminating an object from an input image in the object detection device shown in FIG. 1 is described. In the detection, that is, discrimination process, the target object is detected in the input image using the discriminator 5, which uses the group of weak discriminators (filters) generated as described above.

FIG. 14 shows, in flowchart form, the processing procedure for detecting an object in an image using the group of weak discriminators.

First, the scaling unit 3 reduces the grayscale image supplied from the image input unit 2 at a fixed ratio (step S21) and outputs the scaled image to the scanning unit 4. A grayscale image may be input to the image input unit 2, or the image input unit 2 may convert the input image into a grayscale image.

Here, a scaled image is generated when face detection over the entire area of the previously output scaled image has finished, and processing moves on to the input image of the next frame when the scaled image becomes smaller than the window image.

Next, the scanning unit 4 scans the position of the search window vertically and horizontally over the scaled image and outputs window images (step S22).
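
The interplay of scaling (step S21) and scanning (step S22) can be sketched as a generator of window positions. The window size, reduction ratio and scan stride below are placeholder values, not values from this document, and whether the unscaled image itself is scanned first is also an assumption. The function name is hypothetical.

```python
def scan_windows(width, height, window=24, ratio=0.8, stride=2):
    """Yield (scale, x, y) for each search-window position on each scaled image.

    Scaling stops once the scaled image is smaller than the window.
    """
    scale = 1.0
    w, h = width, height
    while w >= window and h >= window:
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                yield scale, x, y
        # Reduce the image by a fixed ratio for the next pass
        scale *= ratio
        w, h = int(width * scale), int(height * scale)
```

A position (x, y) found at a given scale corresponds to roughly (x / scale, y / scale) in the original image, which is how a fixed-size window ends up detecting objects of different sizes.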

Next, the discriminator 5 determines whether the window image output from the scanning unit 4 is the object. Specifically, the weak discriminators 21_t, each consisting of a four-point filter, sequentially calculate the feature amount f(x) for the window image (step S23), and the combiner 22 successively adds them with weights and calculates the updated weighted majority value as the evaluation value s (step S24). When a new window image is input to the discriminator 5, the evaluation value is first initialized to s = 0.

Next, based on this evaluation value s, it is determined whether the window image is the object and whether discrimination should be aborted (step S25).

Next, the discriminator 5 determines whether the obtained evaluation value s is greater than the abort threshold R_t of the weak discriminator 21_t (step S25). If the evaluation value s is greater than the abort threshold R_t (Yes in step S25), it is determined whether the processing of steps S23 and S24 has been repeated the predetermined number of times (= K times), that is, whether the feature amount calculation and the weighted majority update have been completed for all weak discriminators (step S26). If the predetermined number of repetitions has not yet been reached (No in step S26), the process returns to step S23, and the feature amount calculation and weighted majority update are repeated for the next weak discriminator 21_t+1.

On the other hand, when the processing of steps S23 and S24 has been repeated the predetermined number of times (= K times) (Yes in step S26), the process proceeds to step S27. When the evaluation value s is less than or equal to the abort threshold R_t, the feature amount calculation and weighted majority update by the subsequent weak discriminators are aborted, step S26 is skipped, and the process proceeds to step S27. Aborting the computation of the subsequent weak discriminators when the evaluation value s falls below the abort threshold R_t and moving on to the next window image drastically reduces wasted computation and speeds up object detection.
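
Steps S23 to S27 for a single window image can be sketched as follows. The sketch makes two assumptions: the value added at each stage is the ±1 output of the weak discriminator multiplied by its weight (equation (18) is not reproduced in this text), and the abort test follows the flowchart wording, continuing only while s is greater than the threshold. The function name and the `stages` layout are hypothetical.

```python
def classify_window(pixels, stages):
    """Evaluate weak discriminators in order, aborting early when possible.

    pixels : flat list of luminance values of the window image
    stages : list of (positions, weights, alpha, threshold) in learned order,
             where the last entry of weights is the bias
    Returns (is_object, evaluation value s, number of stages evaluated).
    """
    s = 0.0  # evaluation value, reset for every new window image
    used = 0
    for positions, weights, alpha, threshold in stages:
        x = [pixels[p] for p in positions] + [1.0]
        f = sum(w * v for w, v in zip(weights, x))   # step S23
        s += alpha * (1 if f > 0 else -1)            # step S24
        used += 1
        if s <= threshold:                           # step S25: abort
            break
    return s > 0, s, used                            # step S27: sign check
```

For a typical non-object window the loop ends after only a few stages, which is where the speed-up comes from.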

In step S27, whether the window image contains the object is determined according to whether the obtained evaluation value s is greater than 0 (that is, by its sign). If the window image contains the object, the current window position is stored. It is then determined whether there is a next search window (step S27).

If there is a next search window (Yes in step S27), the process returns to step S22, the scanning unit 4 cuts out the next search window, and the processing of steps S23 to S26 is repeated.

When the search window has been scanned over all areas of the input image (No in step S27), the process proceeds to step S28, where it is determined whether there is a next scaled image from the scaling unit 3.

If there is a next scaled image (Yes in step S28), the process returns to step S21 and the same processing is repeated for the next scaled image input from the scaling unit 3. The scaling in step S21 ends when the scaled image becomes smaller than the window image.

When all scaled images have been processed for one input image (No in step S28), the process moves on to removing overlapping areas: if areas judged to be the target object overlap one another in the input image, the overlapping areas are removed. To do this, it is first determined whether any areas overlap one another (step S29). If a plurality of areas were stored in step S26 and they overlap, the process proceeds to step S30. Two mutually overlapping areas are taken, the one with the smaller evaluation value s is regarded as less reliable and deleted, and the one with the larger evaluation value s is selected (step S29). The processing from step S29 is then repeated. As a result, among areas extracted redundantly multiple times, only the single area with the highest evaluation value s is selected. If no two or more object areas overlap, or if no object area exists, processing of the input image ends and the process moves on to the next frame.
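
The overlap removal can be sketched as a loop that repeatedly finds one overlapping pair and deletes the area with the smaller evaluation value. Representing each area as a square `(x, y, size, score)` tuple, the order in which pairs are examined, and the function names are all assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two square areas (x, y, size, score) share any interior area."""
    ax, ay, asz, _ = a
    bx, by, bsz, _ = b
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz


def remove_overlaps(regions):
    """Delete the lower-scoring area of each overlapping pair until none remain."""
    regions = list(regions)
    while True:
        pair = next(((i, j)
                     for i in range(len(regions))
                     for j in range(i + 1, len(regions))
                     if overlaps(regions[i], regions[j])), None)
        if pair is None:
            return regions
        i, j = pair
        # The area with the smaller evaluation value is regarded as less reliable
        del regions[i if regions[i][3] < regions[j][3] else j]
```

Note that when three or more areas overlap in a chain, the result of such a pairwise procedure can depend on the order in which pairs are taken.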

According to the object detection method of the present embodiment, the object is detected using a group of weak discriminators (filters) with a high degree of freedom, in which each filter coefficient is an ideal real value, so recognition performance improves.

In addition, by incorporating abort processing into the process of updating the weighted majority value with the group of weak discriminators, wasted computation can be drastically reduced and object detection sped up. Abort processing is used when, even though window images are cut out by scanning all areas of the input image and its reduced scaled images, the probability that any given window image is the object is considered small. Conversely, when many objects to be detected are included, a threshold may also be provided, in the same manner as the abort threshold described above, for aborting midway the computation of window images that are clearly the object. Furthermore, by scaling the input image in the scaling unit 3, a search window of any size can effectively be set and objects of any size can be detected.

FIG. 15 shows face areas detected as objects in an image taken with a digital camera, using the group of weak discriminators learned by Adaboost. It should be fully appreciated that, by scaling the input image and scanning it with a window image of a predetermined size, multiple face images scattered across the input image and varying in size due to distance and the like are detected successfully.

Filter variations:
So far, the weak discriminator has been described as a four-point filter (see FIG. 7) that calculates the feature amount by multiplying the luminance value of each of four reference pixels by a filter coefficient. However, within the gist of the present invention the number of reference pixels constituting the filter is not limited to four, and the shape of the filter can vary. For example, a multi-point filter referencing a total of 16 pixels in a 4 × 4 rectangle, as shown in FIG. 16, can be considered. In that figure, the reference pixels come in three sizes because of the multi-layering of the image (described above). For a four-point filter, the two pieces of information {start position q, step width s} suffice to represent the positions of the four reference pixels (described above). For a 16-point filter of 4 × 4 pixels, the three pieces of information {start position q, step width s, window width w} suffice, and the pixel position p_i,j is expressed by the following equation.

As in Patent Document 1, a two-point filter that references two arbitrary pixels is also conceivable. Its pixel positions can simply be held as the two positions themselves. FIG. 17 shows a configuration example of the two-point filter. In the figure, the reference pixels appear in three sizes because of the multi-layering of the image (described above). The invention described in Patent Document 1 uses filter coefficients fixed to {+1, −1} (see equation (2) above). In the present invention, by contrast, a filter with a high degree of freedom, that is, a weak classifier that multiplies the luminance value of each reference pixel by an ideal real-valued filter coefficient, is learned, and detection performance improves when objects are detected with a group of such weak classifiers.
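The difference between fixed {+1, −1} coefficients and real-valued coefficients can be sketched as follows. This is a minimal illustration, assuming a flat list of luminance values and a hypothetical helper `filter_feature`; the coefficient values are made up for the example.

```python
def filter_feature(pixels, positions, coeffs):
    """Feature amount as the inner product of reference-pixel luminances
    and filter coefficients."""
    return sum(c * pixels[p] for p, c in zip(positions, coeffs))


pixels = [10, 20, 30, 40]  # luminance values of a tiny example window

# Pixel-difference feature: coefficients fixed to {+1, -1}.
difference = filter_feature(pixels, [0, 3], [1, -1])

# Two-point filter with real-valued coefficients (illustrative values only).
weighted = filter_feature(pixels, [0, 3], [0.5, -0.25])
```

The pixel-difference feature is the special case of the inner product in which the coefficient vector is (+1, −1).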

Various filters other than those described above are also conceivable, but the following two points are key.

(1) Because the amount of calculation grows with the number of reference pixels, the number of reference pixels is kept as small as possible.
(2) The position information of the reference pixels is held in as compact a representation as possible. This reduces the number of memory accesses at run time and leads to higher speed.

The learned classifier consists of the variables shown in the following table. Representing and holding these in as small a data size as possible reduces the number of memory accesses at run time and leads to higher speed. The table also gives example bit counts needed to hold each variable.

Method of creating a multi-layered image:
As already described, in this embodiment the image is multi-layered (see FIG. 9) so that a wide area of the 1 × 1 layer original image can be referenced at once. A processing module that multi-layers the input image can, for example, be placed downstream of the scanning unit 4 or incorporated into the scanning unit 4. A method of creating the multi-layered image is described here.

One method of multi-layering the original image takes the 20 × 20 pixels of the input image, creates a 10 × 10 pixel low-resolution image by averaging 2 × 2 pixel blocks, then creates a 5 × 5 pixel low-resolution image by averaging 2 × 2 pixel blocks of the 10 × 10 image (4 × 4 pixel blocks in terms of the 20 × 20 input image), and uses the 525 pixels of these three layers combined as the input image (see FIG. 18A). Another method shifts 2 × 2, 4 × 4, and 8 × 8 averaging filters one pixel at a time to generate 19 × 19, 17 × 17, and 13 × 13 low-resolution images, respectively, and uses the 1219 pixels of all of these combined with the 20 × 20 original as the input image (see FIG. 18B). In the example shown in FIG. 19, each pixel of a low-resolution image is the average of a square region of the original image. Selecting a low-resolution pixel therefore processes several original pixels at once, which improves efficiency. It also becomes possible to compare regions of different sizes.
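The pixel counts quoted for the two methods can be checked with a short calculation. The sketch below uses hypothetical helper names and assumes that the 1219-pixel total of the second method includes the 20 × 20 original layer, which is the only reading under which the arithmetic works out.

```python
def block_average_sizes(side, levels):
    """Layer side lengths when each level halves the previous one
    (non-overlapping 2 x 2 block averages)."""
    sizes = [side]
    for _ in range(levels):
        sizes.append(sizes[-1] // 2)
    return sizes


def sliding_average_sizes(side, kernels):
    """Layer side lengths when a k x k averaging filter is slid one pixel
    at a time over the original (output side is side - k + 1)."""
    return [side] + [side - k + 1 for k in kernels]


def total_pixels(sizes):
    """Total number of pixels over all square layers."""
    return sum(s * s for s in sizes)


method_a = total_pixels(block_average_sizes(20, 2))           # 400 + 100 + 25
method_b = total_pixels(sliding_average_sizes(20, [2, 4, 8]))  # 400 + 361 + 289 + 169
```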

When actual recognition processing is to run at high speed, the time taken to generate the multi-layer image becomes a problem. Generating the multi-scale image from the input image every time averages the same pixels many times over, which is inefficient. To eliminate this redundancy, the multi-layer conversion could be applied just once to the entire input frame, and the image of the window actually being recognized could then be cut out from each scale. This approach is effective when memory is plentiful.

The present inventors propose a method of generating a multi-layer image that keeps memory usage as low as possible, eliminates the redundant processing in multi-layer conversion, and also reduces the number of memory copies.

First, the data layout is changed to the point-sequential (pixel-interleaved) format shown in FIG. 20. That is, the first pixel of each scale comes first (four pixels for the four layers in the illustrated example), followed by the next adjacent pixel taken from each layer, and so on. Accordingly, if the original window is 20 × 20 pixels, a window of 80 × 20 pixels is prepared. To generate this image, the original 20 × 20 pixels are first copied from the image area into this processing buffer at every fourth position.
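A minimal sketch of such a point-sequential layout follows. The index formula and the function names are assumptions made for illustration: the buffer is a flat list in which the values of all layers at one pixel position are adjacent, so the original layer occupies every fourth slot.

```python
def interleaved_index(x, y, layer, win_width=20, num_scale=4):
    """Flat index of pixel (x, y) of a given layer in a point-sequential buffer.

    Assumed layout: row-major over pixels, with the num_scale layer values
    of each pixel position stored next to each other.
    """
    return (y * win_width + x) * num_scale + layer


def copy_original(window, num_scale=4):
    """Copy the original window into layer 0 of a new interleaved buffer,
    leaving the slots for the other layers at 0.0."""
    win_height = len(window)
    win_width = len(window[0])
    buf = [0.0] * (win_width * win_height * num_scale)
    for y in range(win_height):
        for x in range(win_width):
            buf[interleaved_index(x, y, 0, win_width, num_scale)] = window[y][x]
    return buf


window = [[float(20 * y + x) for x in range(20)] for y in range(20)]
buf = copy_original(window)  # 80 x 20 = 1600 slots for a 20 x 20 window
```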

Next, the pixels of each lower layer are generated by averaging four pixels of the layer immediately above. Note that a second-layer pixel is generated from an adjacent 2 × 2 pixel group of the layer one level above, a third-layer pixel from a 2 × 2 group of second-layer pixels taken with one pixel skipped, and a fourth-layer pixel from a 2 × 2 group of third-layer pixels taken with two pixels skipped.
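The idea of building each layer from four pixels of the layer above can be sketched as below. The offsets of 1, 2, and 4 pixels are an assumption chosen so that the outputs equal 2 × 2, 4 × 4, and 8 × 8 sliding averages of the original; they are not a literal reading of the skip counts given in the text, and the helper names are hypothetical.

```python
def next_layer(prev, offset):
    """Average four pixels of the previous layer that lie `offset` apart."""
    height = len(prev) - offset
    width = len(prev[0]) - offset
    return [[(prev[y][x] + prev[y][x + offset]
              + prev[y + offset][x] + prev[y + offset][x + offset]) / 4.0
             for x in range(width)] for y in range(height)]


def build_layers(img, offsets=(1, 2, 4)):
    """Return [original, layer 2, layer 3, layer 4], reusing each result."""
    layers = [img]
    for offset in offsets:
        layers.append(next_layer(layers[-1], offset))
    return layers


def box_mean(img, x, y, k):
    """Direct k x k mean, used only to check the hierarchical result."""
    return sum(img[y + dy][x + dx] for dy in range(k) for dx in range(k)) / float(k * k)


img = [[(3 * x + 7 * y) % 11 for x in range(20)] for y in range(20)]
layers = build_layers(img)
```

Each pixel of the fourth layer costs four additions instead of the 64 a direct 8 × 8 mean would need, which is the saving that reusing the layer above provides.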

Left as is, the above processing would occur every time the window moves. Therefore, a region of size (win_width × (image_height − 1) + image_width) × num_scale, as shown on the right side of FIGS. 21A to 21E, is prepared in advance as the processing buffer. At the very start of processing an input image, the leftmost 20 × image_height pixels of the input image are copied into the strip buffer at every fourth position, as described above. Then a multi-layer image for the 20 × image_height strip is created in one burst by the above procedure. This yields a multi-layer image for one vertical strip of the input image, and recognition processing is performed on it while shifting the start address of the processing window from top to bottom.

Because the data is point-sequential, simply moving the start address keeps the positional relationship of the pixels within the window in the state shown in FIG. 20. Therefore, no new copying of the input image or scale conversion is needed while one column is being processed.

Next, a method of generating the image for the next column after processing of one column has finished is described. The window has moved by only one pixel (or by the number of skipped pixels when skip processing is used), so most of the target pixels remain in the strip buffer. Therefore, only the newly needed one column of pixels (or the skipped columns) is copied into the strip, starting at the (win_width × num_scale + 1)-th position and at intervals of win_width × num_scale. The lower-layer pixels corresponding to the filled pixels are then generated by four-pixel averages of pixels in the strip buffer.

Recognition processing can then proceed in the same way, one column after another, by designating the second pixel of the strip buffer as the head.

When processing the i-th column, the pixels for one column are filled in starting from the (win_width × num_scale + i − 1)-th pixel of the strip buffer, and the head of the processing buffer is designated as the (i − 1)-th pixel. This makes it possible to process the entire frame efficiently with a small buffer.
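The strip-buffer scheme can be sketched for a single layer as follows. This is a simplified illustration under stated assumptions: num_scale is taken as 1, indices are 0-based, and the function names are hypothetical. It shows why a buffer of win_width × (image_height − 1) + image_width entries is enough and why only one new column has to be written per step.

```python
def strip_buffer_size(win_width, image_width, image_height, num_scale=1):
    """Buffer size as stated in the text."""
    return (win_width * (image_height - 1) + image_width) * num_scale


def init_strip(image, win_width):
    """Copy the leftmost win_width columns, row by row, into a flat buffer."""
    image_height, image_width = len(image), len(image[0])
    buf = [0] * strip_buffer_size(win_width, image_width, image_height)
    for r in range(image_height):
        for c in range(win_width):
            buf[r * win_width + c] = image[r][c]
    return buf


def advance_strip(buf, image, win_width, col):
    """Write the one new image column needed when the strip starts at `col`.

    The writes are win_width apart and overwrite pixels of a column that has
    already been processed.
    """
    for r in range(len(image)):
        buf[col + r * win_width + win_width - 1] = image[r][col + win_width - 1]


def strip_pixel(buf, win_width, col, r, c):
    """Pixel (r, c) of the strip whose head is buffer position `col`."""
    return buf[col + r * win_width + c]


image = [[10 * r + c for c in range(6)] for r in range(5)]
buf = init_strip(image, 3)
for col in range(1, 4):
    advance_strip(buf, image, 3, col)
```

After each call to `advance_strip`, reading the strip from head position `col` returns the image columns `col` to `col + win_width − 1` with no other copying.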

In learning with boosting, a stump classifier as shown in FIG. 22 is often used. Learning algorithms that take the confidence of the weak hypothesis (WL) into account, such as GentleBoost and RealAdaBoost, have also been proposed; these use a regression stump, as shown in FIG. 23, whose output confidence differs depending on whether the decision is positive or negative.
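The two kinds of stump can be sketched as below. The function names, the threshold convention (strictly greater than), and the example values are assumptions for illustration only.

```python
def stump_classifier(feature, threshold):
    """Plain stump: outputs only the sign of the decision (+1 or -1)."""
    return 1 if feature > threshold else -1


def regression_stump(feature, threshold, conf_below, conf_above):
    """Regression stump: outputs a real-valued confidence that differs
    on each side of the threshold."""
    return conf_above if feature > threshold else conf_below


hard = stump_classifier(12.0, 5.0)
soft = regression_stump(12.0, 5.0, conf_below=-0.8, conf_above=0.3)
```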

A common weak classifier that outputs a confidence uses a lookup table as shown in FIG. 24: the feature range is divided into equal bins, and the confidence to output for each bin is held in the table. From the histogram shown in FIG. 24, the probability of a positive or negative decision at each feature value can be output at a finer granularity.

However, in an image recognition task using pixel differences as disclosed in Patent Document 1, the difference values are often distributed around 0, as in the histogram shown in FIG. 24, because nearby pixels tend to take similar values. Consequently, dividing the feature space evenly, as a piece-wise function does, overlooks differences near 0, where the information is most concentrated. With this in mind, a multi-threshold regression stump, which extends the regression stump (whose break point can be set freely) by increasing the number of thresholds, was proposed and used as the weak classifier for pixel-difference features.
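A multi-threshold regression stump can be sketched as a piecewise-constant function with freely placed thresholds. The function name, the thresholds, and the confidences below are made-up values for illustration; the thresholds are placed more densely near 0, where pixel-difference features concentrate.

```python
from bisect import bisect_right


def multi_threshold_regression_stump(feature, thresholds, confidences):
    """Return the confidence of the interval that contains `feature`.

    `thresholds` must be sorted in ascending order, and `confidences` must
    have exactly one more entry than `thresholds`.
    """
    return confidences[bisect_right(thresholds, feature)]


thresholds = [-8, -2, 2, 8]                # unevenly spaced, denser near 0
confidences = [-0.9, -0.3, 0.0, 0.4, 0.8]  # one value per interval

near_zero = multi_threshold_regression_stump(0, thresholds, confidences)
large = multi_threshold_regression_stump(5, thresholds, confidences)
```

A lookup table with equal bins is the special case in which the thresholds are evenly spaced.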

As described above, pixels that average multiple pixels are calculated and prepared in advance. According to region size, three images are prepared, consisting of pixels averaged over 2 × 2 regions, over 4 × 4 regions, and over 8 × 8 regions, and together with the original image these are called the multi-layer image. Once the multi-layer image has been calculated, both fine patterns and coarse, wide-area patterns can be handled when learning weak classifiers and when detecting objects with the group of weak classifiers, and the total number of discriminations can be reduced.

The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention.

The face / non-face discrimination problem handled by the present invention can be used as a face "detection" technique by scanning an image. The present invention is likewise applicable to patterns other than faces, for example the detection of logo marks.

When the present invention handles faces, smile detection can be performed by first detecting a face and then discriminating "smile / non-smile" for the face region. The present invention can of course be applied in the same way to discriminating expressions and states other than smiles.

Alternatively, the present invention can also be applied to tasks such as race discrimination. Here, race spans multiple classes, for example "yellow race / white / black", but the task can be realized by converting it into two-class problems such as "yellow race / non-yellow race", "white / non-white", and "black / non-black". The same applies to other discriminations that span three or more classes.
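The conversion of a multi-class problem into several two-class problems is commonly known as one-vs-rest. A minimal sketch with generic class names (hypothetical placeholders, as are the function names) is:

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multi-class label list as +1 (the chosen class) or -1 (rest)."""
    return [1 if label == positive_class else -1 for label in labels]


def one_vs_rest_problems(labels):
    """Build one two-class labeling per class found in `labels`."""
    return {cls: one_vs_rest_labels(labels, cls) for cls in sorted(set(labels))}


problems = one_vs_rest_problems(["A", "B", "C", "A"])
```

Each of the resulting labelings can then be learned as an ordinary object / non-object problem.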

The present invention can be used for detecting, discriminating, and identifying a wide variety of images. Examples are listed below.

(1) Logo mark detection (company logos, road signs, etc.)
(2) Discrimination of facial expressions such as smiles
(3) Discrimination of the open or closed state of the eyes or mouth
(4) Gender discrimination
(5) Adult / child discrimination
(6) Race discrimination
(7) Personal identification (discriminating whether or not a face is a specific individual)
(8) Discrimination of the presence or absence of glasses
(9) Position detection of face parts (facial organs such as the eyes, nose, and mouth)
(10) Character recognition
(11) Car detection and car model discrimination

In short, the present invention has been disclosed by way of example, and the contents of this specification should not be interpreted restrictively. The claims should be consulted to determine the gist of the present invention.

FIG. 1 is a diagram schematically showing the functional configuration of the object detection device.
FIG. 2 is a diagram showing how the scaling unit 3 generates reduced images.
FIG. 3 is a diagram showing how the scanning unit 4 scans a window of a predetermined size over the input image.
FIG. 4 is a diagram showing the configuration of the discriminator 5.
FIG. 5 is a diagram showing sample face images and non-face images.
FIG. 6 is a diagram showing how the inter-pixel difference feature is obtained.
FIG. 7 is a diagram showing a configuration example of a four-point filter.
FIG. 8A is a diagram showing a linear four-point filter placed on an image blurred by averaging the luminance values of each 2 × 2 pixel block into one pixel.
FIG. 8B is a diagram showing a linear four-point filter placed on an image blurred by averaging the luminance values of each 4 × 4 pixel block into one pixel.
FIG. 9 is a diagram showing an image multi-layered by concatenating the blurred 2 × 2 layer and 4 × 4 layer images with the original 1 × 1 layer input image.
FIG. 10 is a diagram showing boosting (Adaboost), the overall framework of the learning.
FIG. 11 is a diagram showing an algorithm for learning a weak classifier using Logistic Regression.
FIG. 12 is a flowchart showing a processing procedure for learning weak classifiers by Adaboost.
FIG. 13 is a diagram showing examples of weak classifiers, that is, filters, learned by the method described above.
FIG. 14 is a flowchart showing a processing procedure for detecting an object in an image using a group of weak classifiers learned by Adaboost.
FIG. 15 is a diagram showing face regions detected as objects in an image captured by a digital camera, using a group of weak classifiers learned by Adaboost.
FIG. 16 is a diagram showing the configuration of a multipoint filter that references a total of 16 pixels arranged as a 4 × 4 rectangle.
FIG. 17 is a diagram showing a configuration example of a two-point filter learned by applying the present invention.
FIG. 18A is a diagram for explaining a method of creating a multi-layer image.
FIG. 18B is a diagram for explaining a method of creating a multi-layer image.
FIG. 19 is a diagram showing a configuration example of a multi-layer image.
FIG. 20 is a diagram for explaining a method of changing the data layout to the point-sequential format.
FIG. 21A is a diagram for explaining processing of an input image using the processing buffer.
FIG. 21B is a diagram for explaining processing of an input image using the processing buffer.
FIG. 21C is a diagram for explaining processing of an input image using the processing buffer.
FIG. 21D is a diagram for explaining processing of an input image using the processing buffer.
FIG. 21E is a diagram for explaining processing of an input image using the processing buffer.
FIG. 22 is a diagram showing a configuration example of a stump classifier.
FIG. 23 is a diagram showing a configuration example of a regression stump.
FIG. 24 is a diagram showing how the feature amount is divided into equal bins and the confidence to output for each bin is held in a table.
FIG. 25 is a diagram showing an example of an image recognition procedure.
FIG. 26 is a diagram for explaining the Sparse feature proposed by Huang et al.

Explanation of symbols

1 ... Object detection device
2 ... Image input unit
3 ... Scaling unit
4 ... Scanning unit
5 ... Discriminator
6 ... Group learning machine
21 ... Weak classifier
22 ... Adder

Claims

A group learning device for collectively learning a plurality of weak classifiers used in object detection that detects whether a given image is an object, using learning samples consisting of a plurality of grayscale images for which the correct answer (object or non-object) is known,
Each weak classifier is a multipoint filter that calculates a feature amount by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more),
Learning means is provided for learning by boosting, for each weak classifier, a combination of reference pixel positions and ideal real values to be used as filter coefficients.
A group learning device characterized by that.

The learning means obtains the ideal real values to be used as filter coefficients by reducing the learning of a multipoint filter to a linear two-class discrimination problem.
The group learning apparatus according to claim 1.

The learning means inputs, to each weak classifier, a plurality of learning samples consisting of grayscale images that have been classified into two classes, object or non-object, that is, labeled, and learns the feature amounts of objects and non-objects by boosting.
The group learning apparatus according to claim 1.

The input images (the learning samples as well as the given image) have a predetermined window size,
Each of the plurality of weak classifiers arranges its L reference pixels in a straight line within the window and holds their pixel positions as two pieces of information, {start position q, step width s}.
The group learning apparatus according to claim 1.

Multi-layer image generating means is further provided that generates a 2 × 2 layer image by applying, to the original 1 × 1 layer input image, a 2 × 2 blurring process that averages the luminance values of each adjacent 2 × 2 pixel block into one pixel, and generates a 4 × 4 layer image by applying a 4 × 4 blurring process that averages the luminance values of each adjacent 4 × 4 pixel block of the original learning sample into one pixel,
The learning means learns the weak classifiers using the learning samples of each layer produced by the multi-layer image generating means.
The group learning apparatus according to claim 1.

The learning means learns each weak classifier by determining the parameters of the multipoint filter, consisting of a combination of the pixel positions of the L reference pixels and the filter coefficients, so as to minimize the discrimination error obtained when the plurality of learning samples are input to the weak classifier.
The group learning apparatus according to claim 1.

The learning means uses Logistic Regression or a support vector machine to determine the parameters of the multipoint filter, consisting of a combination of the pixel positions of the L reference pixels and the filter coefficients, so as to minimize the discrimination error.
The group learning apparatus according to claim 6.

Each learning sample i has, for each weak classifier t, a data weight D_{t,i} reflecting its difficulty of discrimination (where i is a serial number identifying the learning sample and t is a serial number identifying the weak classifier),
The learning means includes:
Initialization means for initializing the data weight D_{1,i} of each learning sample for the weak classifier to be generated first;
Weak classifier learning means for learning the t-th weak classifier using Logistic Regression or a support vector machine;
Weighted error calculation means for calculating a weighted error by adding the data weights D_{t,i} of the learning samples that are misclassified when the plurality of learning samples are input to the t-th weak classifier learned by the weak classifier learning means;
Weighted majority weight calculation means for calculating a weight α_t, corresponding to the reliability of the t-th weak classifier, that is used in the weighted majority decision over the feature amounts calculated by the weak classifiers when an object is detected; and
Data weight update means for calculating the data weight D_{t+1,i} of each learning sample based on the weighted majority weight α_t, the correct label of each learning sample, and the discrimination result of the t-th weak classifier for each learning sample,
The processing by the weak classifier learning means, the weighted error calculation means, the weighted majority weight calculation means, and the data weight update means is repeated as many times as needed to learn the required number of weak classifiers.
The group learning apparatus according to claim 6.
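For orientation, one round of the weight bookkeeping recited above can be sketched with the standard discrete AdaBoost formulas. These formulas are an assumption for illustration and are not quoted from this document: the weighted error is the sum of the data weights of the misclassified samples, α_t = ½ ln((1 − ε_t)/ε_t), and the data weights are multiplied by exp(−α_t y_i h_t(x_i)) and renormalized. The function names are hypothetical.

```python
import math


def weighted_error(weights, labels, predictions):
    """Sum of the data weights of the misclassified samples."""
    return sum(w for w, y, h in zip(weights, labels, predictions) if y != h)


def classifier_weight(error):
    """Weighted-majority weight alpha_t (assumes 0 < error < 1)."""
    return 0.5 * math.log((1.0 - error) / error)


def update_weights(weights, labels, predictions, alpha):
    """Raise the weights of misclassified samples, lower the others,
    then renormalize so the weights sum to 1."""
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, predictions)]
    total = sum(raw)
    return [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]  # D_{1,i}: uniform initialization
labels = [1, 1, -1, -1]             # correct labels
predictions = [1, -1, -1, -1]       # outputs of the t-th weak classifier
error = weighted_error(weights, labels, predictions)
alpha = classifier_weight(error)
new_weights = update_weights(weights, labels, predictions, alpha)
```

In this example the single misclassified sample ends up with half of the total weight, so the next weak classifier is pushed toward getting it right.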

The weak classifier learning means:
Repeats a plurality of times a process of randomly determining a combination of reference pixel positions of a multipoint filter t, determining, using Logistic Regression or a support vector machine, the filter coefficients of the multipoint filter t that minimize a cost function consisting of the weighted error obtained by adding the data weights D_{t,i} of the learning samples that are misclassified when the plurality of learning samples are input to the multipoint filter t, and obtaining the weighted error of the multipoint filter t having the determined filter coefficients,
And outputs, as the t-th weak classifier, the multipoint filter t consisting of the combination of reference pixel positions and filter coefficients that gave the smallest weighted error in the repeated process;
The group learning apparatus according to claim 8.

At the time of object detection, a truncation process is introduced that determines, from an intermediate value of the weighted majority calculation over the feature amounts calculated by the weak classifiers, that the given grayscale image is not an object and terminates the calculation,
Truncation threshold calculation means is further provided for calculating a truncation threshold based on the minimum value that the weighted majority value of the discrimination results can take for the learning samples that are objects;
When the processing by the weak classifier learning means, the weighted error calculation means, the weighted majority weight calculation means, and the data weight update means is repeated as many times as needed, the processing by the truncation threshold calculation means is also performed, so that a truncation threshold is calculated each time a weak classifier is learned,
The group learning apparatus according to claim 8.
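The truncation idea can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold after each weak classifier is taken as the minimum partial weighted sum over the positive (object) learning samples, a window is rejected as soon as its partial sum falls below that threshold, and a window that survives every stage is accepted if its final sum is positive. The function names and the final decision rule are hypothetical.

```python
def truncation_thresholds(positive_outputs, alphas):
    """positive_outputs[i][t] is the output of weak classifier t on positive
    learning sample i. Returns one threshold per weak classifier."""
    partial = [0.0] * len(positive_outputs)
    thresholds = []
    for t, alpha in enumerate(alphas):
        partial = [p + alpha * out[t] for p, out in zip(partial, positive_outputs)]
        thresholds.append(min(partial))
    return thresholds


def detect(outputs, alphas, thresholds):
    """Return (is_object, number of weak classifiers evaluated)."""
    total = 0.0
    for t, (out, alpha) in enumerate(zip(outputs, alphas)):
        total += alpha * out
        if total < thresholds[t]:
            return False, t + 1  # truncated: cannot be an object
    return total > 0, len(alphas)


alphas = [1.0, 0.5, 0.25]
thresholds = truncation_thresholds([[1, 1, -1], [1, -1, 1]], alphas)
rejected = detect([-1, 1, 1], alphas, thresholds)
accepted = detect([1, 1, 1], alphas, thresholds)
```

In this example the first window is rejected after a single weak classifier, which is where the speed-up of truncation comes from.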

A group learning method for collectively learning a plurality of weak classifiers used in object detection that detects whether a given image is an object, using learning samples consisting of a plurality of grayscale images for which the correct answer (object or non-object) is known,
Each weak classifier is a multipoint filter that calculates a feature amount by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more), and each learning sample i has, for each weak classifier t, a data weight D_{t,i} reflecting its difficulty of discrimination (where i is a serial number identifying the learning sample and t is a serial number identifying the weak classifier),
An initialization step of initializing the data weight D_{1,i} of each learning sample for the weak classifier to be generated first;
A weak classifier learning step of learning the t-th weak classifier using Logistic Regression or a support vector machine;
A weighted error calculation step of calculating a weighted error by adding the data weights D_{t,i} of the learning samples that are misclassified when the plurality of learning samples are input to the t-th weak classifier learned in the weak classifier learning step;
A weighted majority weight calculation step of calculating a weight α_t, corresponding to the reliability of the t-th weak classifier, that is used in the weighted majority decision over the feature amounts calculated by the weak classifiers when an object is detected; and
A data weight update step of calculating the data weight D_{t+1,i} of each learning sample based on the weighted majority weight α_t, the correct label of each learning sample, and the discrimination result of the t-th weak classifier for each learning sample,
Are provided,
The weak classifier learning step, the weighted error calculation step, the weighted majority weight calculation step, and the data weight update step are repeated as many times as needed to learn the required number of weak classifiers.
A group learning method characterized by this.

A computer program written in a computer-readable format so as to execute, on a computer, processing for collectively learning a plurality of weak classifiers used in object detection that detects whether a given image is an object, using learning samples consisting of a plurality of grayscale images for which the correct answer (object or non-object) is known,
Each weak classifier is a multipoint filter that calculates a feature amount by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more), and each learning sample i has, for each weak classifier t, a data weight D_{t,i} reflecting its difficulty of discrimination (where i is a serial number identifying the learning sample and t is a serial number identifying the weak classifier),
The computer program causes the computer to function as:
Initialization means for initializing the data weight D_{1,i} of each learning sample for the weak classifier to be generated first;
Weak classifier learning means for learning the t-th weak classifier using Logistic Regression or a support vector machine;
Weighted error calculation means for calculating a weighted error by adding the data weights D_{t,i} of the learning samples that are misclassified when the plurality of learning samples are input to the t-th weak classifier learned by the weak classifier learning means;
Weighted majority weight calculation means for calculating a weight α_t, corresponding to the reliability of the t-th weak classifier, that is used in the weighted majority decision over the feature amounts calculated by the weak classifiers when an object is detected; and
Data weight update means for calculating the data weight D_{t+1,i} of each learning sample based on the weighted majority weight α_t, the correct label of each learning sample, and the discrimination result of the t-th weak classifier for each learning sample,
And
The processing by the weak classifier learning means, the weighted error calculation means, the weighted majority weight calculation means, and the data weight update means is repeated as many times as needed to learn the required number of weak classifiers.
A computer program characterized by the above.

An object detection device for detecting whether a given grayscale image is an object,
A plurality of weak classifiers, each consisting of a multipoint filter that calculates a feature amount of the given grayscale image by an inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more),
A discriminator that discriminates whether the given grayscale image is an object based on the feature amount calculated by at least one of the plurality of weak classifiers;
An object detection apparatus comprising:
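
As a minimal sketch of the multi-point filter described in this claim, the feature value is the inner product of the luminance values sampled at L reference pixel positions with a real-valued coefficient vector. The function name `multipoint_feature`, the (row, column) position format, and the sample image and coefficients are assumptions made for illustration only.

```python
def multipoint_feature(image, positions, coeffs):
    """Inner product of the luminance values at `positions` with `coeffs`.

    image     : grayscale image as a list of rows of luminance values
    positions : list of (row, col) reference pixel positions (L entries)
    coeffs    : list of L arbitrary real filter coefficients
    """
    if len(positions) != len(coeffs):
        raise ValueError("positions and coeffs must have the same length")
    if len(positions) < 2:
        raise ValueError("L must be an integer of 2 or more")
    return sum(c * image[row][col] for c, (row, col) in zip(coeffs, positions))

# Hypothetical 4 x 4 grayscale window and a four-point filter.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
coeffs = [0.5, -1.0, 0.25, 1.0]
feature = multipoint_feature(image, positions, coeffs)  # 5 - 20 + 12.5 + 60 = 57.5
```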

The discriminator calculates a weighted majority value by multiplying the feature value calculated by each weak classifier by a weight representing the reliability of that weak classifier and summing the products, and determines, based on the weighted majority value, whether or not the given grayscale image is an object.
The object detection apparatus according to claim 13.
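
The weighted majority value can be sketched as a reliability-weighted sum of the weak classifiers' feature values. The decision threshold of 0 used below is an assumption (the claim only says the decision is based on the weighted majority value), and the function names and sample numbers are hypothetical.

```python
def weighted_majority(features, alphas):
    """Sum of each weak classifier's feature value times its reliability weight."""
    if len(features) != len(alphas):
        raise ValueError("one weight is required per weak classifier")
    return sum(a * f for a, f in zip(alphas, features))

def is_object(features, alphas, threshold=0.0):
    """Decide 'object' when the weighted majority value exceeds the threshold
    (a threshold of 0 is assumed here for illustration)."""
    return weighted_majority(features, alphas) > threshold

# Hypothetical outputs of three weak classifiers and their weights.
features = [1.0, -1.0, 1.0]
alphas = [0.5, 0.25, 0.125]
score = weighted_majority(features, alphas)  # 0.5 - 0.25 + 0.125 = 0.375
```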

Each of the plurality of weak classifiers uses, as its filter coefficients, ideal real values obtained by reducing the learning of the multi-point filter to a linear two-class classification problem.
The object detection apparatus according to claim 13.

Each of the plurality of weak classifiers is a multi-point filter that has been learned in advance by boosting, using as input a plurality of learning samples consisting of grayscale images that have each been classified, that is, labeled, into one of two classes, object or non-object,
The object detection apparatus according to claim 13.

The input image has a predetermined window size,
Each of the plurality of weak classifiers arranges the L reference pixels on a straight line within the window and holds the pixel positions as two pieces of information, {start position q, step width s}.
The object detection apparatus according to claim 13.
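
Holding only {start position q, step width s} is enough to recover all L reference pixel positions, since the pixels lie on a straight line. The sketch below assumes that q and s are (row, column) pairs and that the window is square; the claim does not specify the encoding, and the function name `line_positions` and the window size of 24 are hypothetical.

```python
def line_positions(q, s, num_points, window_size):
    """Expand {start position q, step width s} into num_points pixel positions.

    q, s        : (row, col) start position and per-step offset (assumed encoding)
    num_points  : L, the number of reference pixels
    window_size : side length of the (assumed square) window
    """
    if num_points < 2:
        raise ValueError("L must be an integer of 2 or more")
    positions = [(q[0] + k * s[0], q[1] + k * s[1]) for k in range(num_points)]
    for row, col in positions:
        if not (0 <= row < window_size and 0 <= col < window_size):
            raise ValueError("reference pixel falls outside the window")
    return positions

# Four reference pixels on a horizontal line, four pixels apart.
horizontal = line_positions(q=(2, 3), s=(0, 4), num_points=4, window_size=24)
# Three reference pixels on a diagonal line.
diagonal = line_positions(q=(0, 0), s=(1, 1), num_points=3, window_size=24)
```

Compared with storing L separate coordinates, this representation keeps the stored parameters per weak classifier constant regardless of L.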

Further comprising a multi-layer image generating means for generating a 2 × 2-layer image by performing 2 × 2 blur processing that averages the luminance values of the 1 × 1-layer input image over each adjacent 2 × 2-pixel block into one pixel, and for generating a 4 × 4-layer image by performing 4 × 4 blur processing that averages the luminance values of the 2 × 2-layer image over each adjacent 2 × 2-pixel block into one pixel,
Each of the plurality of weak classifiers is a multi-point filter for one of the layers that has been learned in advance using the multi-layered learning samples of that layer, and calculates the feature value for the corresponding layer of the grayscale image multi-layered by the multi-layer image generating means,
The object detection apparatus according to claim 13.
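
The multi-layer image generation can be sketched as two successive block-averaging passes. The sketch assumes non-overlapping 2 × 2 blocks, so each pass halves the width and height, and assumes even image dimensions; the claim text does not settle either point. The function names and the sample image are hypothetical.

```python
def blur_2x2(image):
    """Average each non-overlapping 2 x 2 pixel block into a single pixel."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def build_layers(image):
    """Return the 1 x 1, 2 x 2 and 4 x 4 layers of a grayscale image."""
    layer_2x2 = blur_2x2(image)      # 2 x 2 blur of the input image
    layer_4x4 = blur_2x2(layer_2x2)  # 4 x 4 blur = 2 x 2 blur of the 2 x 2 layer
    return {"1x1": image, "2x2": layer_2x2, "4x4": layer_4x4}

# Hypothetical 4 x 4 input image with luminance values 0..15.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
layers = build_layers(image)
```

Each weak classifier then reads its reference pixels from whichever layer it was trained on.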

Each of the plurality of weak classifiers has been learned in advance by determining the parameters of its multi-point filter, consisting of the combination of the L pixel positions of the reference pixels and the filter coefficients, so as to minimize the discrimination error obtained when a plurality of learning samples are input,
The object detection apparatus according to claim 13.

The parameters of the multi-point filter are determined so as to minimize the discrimination error of the weak discriminator using Logistic Regression or a support vector machine.
The object detection device according to claim 19.

For each weak classifier, an abort threshold is learned based on the minimum value that the weighted majority value of the discrimination results can take for the learning samples representing the object,
The discriminator compares the intermediate value of the weighted majority calculation over the feature values calculated by the weak classifiers with the abort threshold and, depending on the comparison result, determines that the given grayscale image is not an object and aborts the calculation,
The object detection apparatus according to claim 13.
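
The abort mechanism can be sketched in two parts: learning one threshold per weak classifier from the object (positive) learning samples, and stopping the weighted majority calculation as soon as the running value drops below that threshold. The sketch assumes the threshold is exactly the minimum running value over the positive samples, with no margin, and that the final decision threshold is 0; the function names and sample numbers are hypothetical.

```python
def learn_abort_thresholds(positive_features, alphas):
    """positive_features[i][t] is the feature value of weak classifier t on
    positive learning sample i. Returns one abort threshold per weak classifier:
    the minimum running weighted sum reached by any positive sample."""
    running = [0.0] * len(positive_features)
    thresholds = []
    for t, alpha in enumerate(alphas):
        running = [r + alpha * feats[t] for r, feats in zip(running, positive_features)]
        thresholds.append(min(running))
    return thresholds

def detect_with_abort(features, alphas, thresholds, final_threshold=0.0):
    """Accumulate the weighted majority value and abort once it falls below the
    abort threshold. Returns (is_object, number_of_weak_classifiers_evaluated)."""
    total = 0.0
    for count, (f, alpha, limit) in enumerate(zip(features, alphas, thresholds), start=1):
        total += alpha * f
        if total < limit:
            return False, count  # no positive sample ever scored this low
    return total > final_threshold, len(alphas)

# Hypothetical weights and positive-sample features for three weak classifiers.
alphas = [1.0, 0.5, 0.25]
positives = [[1, 1, -1], [1, -1, 1]]
thresholds = learn_abort_thresholds(positives, alphas)
rejected = detect_with_abort([-1, 1, 1], alphas, thresholds)
accepted = detect_with_abort([1, -1, 1], alphas, thresholds)
```

In a real detector the feature values would be computed lazily inside the loop, so an early abort also skips the remaining filter evaluations.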

An object detection method for detecting whether or not a given grayscale image is an object, comprising:
A plurality of weak discrimination steps, each calculating a feature value of the given grayscale image as the inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more); and
A discrimination step of discriminating whether or not the given grayscale image is an object based on the feature value calculated in at least one of the plurality of weak discrimination steps.

A computer program written in a computer-readable format so as to execute on a computer a process for detecting whether or not a given grayscale image is an object, the computer program causing the computer to function as:
A plurality of weak discrimination means, each consisting of a multi-point filter that calculates a feature value of the given grayscale image as the inner product of a luminance value vector, whose elements are the luminance values of reference pixels at L pixel positions, and a filter coefficient vector consisting of arbitrary real values (where L is an integer of 2 or more); and
Discriminating means for discriminating whether or not the given grayscale image is an object based on the feature value calculated by at least one of the plurality of weak discrimination means.