JP2006318341A

JP2006318341A - Detection object image determination device, method, and program

Info

Publication number: JP2006318341A
Application number: JP2005142226A
Authority: JP
Inventors: Hideto Takeuchi; 英人竹内
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2005-05-16
Filing date: 2005-05-16
Publication date: 2006-11-24

Abstract

PROBLEM TO BE SOLVED: To provide a method for efficiently detecting and determining a detection target image such as a person image portion.
SOLUTION: A detection target image determination device determines whether a given grayscale image is a detection target image. The device includes a plurality of weak discrimination means, one for each of a plurality of pixel pairs, each pair consisting of two adjacent or nearby pixel positions in the grayscale image that are determined in advance by learning. Each weak discrimination means obtains the difference between the luminance values of the two pixels of its pair as a feature amount and, based on that feature amount, calculates an estimated value indicating whether the pixel pair lies on a contour portion of the detection target image. A determination means determines whether the given grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discrimination means.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a detection target image determination device, method, and program for detecting a detection target image, such as a person image, in a given grayscale image.

Conventional methods by which a computer automatically finds a person image portion in an image include methods that treat a moving portion as the person image portion, and methods that detect the face or head, which are representative parts of a person image, from skin color or facial parts and treat that detection as a detected person image portion.

An example of the former is a technique that detects a changed portion as a person image portion by taking the difference between a background image created in advance and the input image (see, for example, Patent Document 1 (Japanese Patent Laid-Open No. H11-311682)).

An example of the latter is a technique that models the face as an ellipse and performs face detection by detecting an ellipse in a skin-color region (see, for example, Patent Document 2 (Japanese Patent Laid-Open No. H11-185026)).

In addition, Patent Document 3 (Japanese Patent Laid-Open No. 2001-222719), for example, discloses a technique that detects a head by modeling the head as a circle, creating a template, and detecting the circular shape by voting.

Patent Document 4 (Japanese Patent Laid-Open No. 2004-178229) discloses a method that, like Patent Document 3, detects a head by approximating the head with a shape model enclosed by an outwardly convex boundary curve and detecting that shape by voting.

The patent documents cited above are as follows.
Patent Document 1: Japanese Patent Laid-Open No. H11-311682
Patent Document 2: Japanese Patent Laid-Open No. H11-185026
Patent Document 3: Japanese Patent Laid-Open No. 2001-222719
Patent Document 4: Japanese Patent Laid-Open No. 2004-178229

However, a technique that looks for changed portions, as in the invention of Patent Document 1, treats any region whose brightness has changed greatly as a person image portion. It therefore wrongly detects a person when the lighting environment changes or when an object such as a chair is moved.

In the case of the invention of Patent Document 2, it is difficult to detect the skin color of a person image portion stably under varied lighting environments. Ellipse fitting is effective for a face seen from the front, but is difficult to apply when the person is facing obliquely. There is also the problem that the head cannot be detected when the person is facing away from the camera.

In the invention of Patent Document 3, the template used for voting during detection consists of concentric circles, so many votes do not contribute to detection. This wastes processing time and also creates false voting peaks, which cause false detections.

Furthermore, the invention of Patent Document 4 creates its template on the assumption that the head has an outwardly convex shape. Depending on the shooting direction and other conditions, that assumption does not hold and the person image portion cannot be detected.

An object of the present invention is to provide a device and method that mitigate the above problems and can efficiently detect and determine a detection target image such as a person image portion.

To solve the above problems, the invention of claim 1 provides
a detection target image determination device that determines whether a given grayscale image is a detection target image, comprising:
a plurality of weak discrimination means, one provided for each of a plurality of pixel pairs, each pair consisting of two adjacent or nearby pixel positions that are determined in advance by learning from among the pixels constituting the grayscale image, each weak discrimination means obtaining the difference between the luminance values of the two pixels of its pair as a feature amount and calculating, based on the feature amount, an estimated value indicating whether the pixel pair is a contour portion of the detection target image; and
determination means for determining whether the given grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discrimination means.

In the invention of claim 1, the difference between the luminance values of two adjacent or nearby pixels is used as the feature amount for determining a detection target image. This feature amount corresponds to object contour information in a broad sense. Therefore, according to the invention of claim 1, contour or edge portions with a coarse luminance change, such as a person's head and shoulders, can be detected preferentially, and it can be determined efficiently whether a given grayscale image is a detection target image such as a person image (human-shaped image).
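As a non-limiting illustration, the feature amount described above can be sketched in Python as follows. The names `pair_feature` and `weak_estimate`, the toy image, and the simple threshold rule are assumptions introduced only for this sketch; the specification states that an estimated value is calculated from the luminance difference but does not prescribe this particular rule.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int]  # (row, column) inside the image or window


def pair_feature(image: Sequence[Sequence[int]], p1: Pixel, p2: Pixel) -> int:
    """Luminance difference between two pixel positions (the feature amount)."""
    (r1, c1), (r2, c2) = p1, p2
    return image[r1][c1] - image[r2][c2]


def weak_estimate(feature: int, threshold: int) -> int:
    """Hypothetical weak rule: +1 if the difference exceeds the threshold, else -1."""
    return 1 if feature > threshold else -1


# Toy 3x4 grayscale image with a dark-to-bright vertical edge between columns 1 and 2.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

edge = pair_feature(image, (1, 2), (1, 1))  # pair straddling the edge
flat = pair_feature(image, (1, 1), (1, 0))  # pair inside a flat region
print(edge, weak_estimate(edge, threshold=50))  # 190 1
print(flat, weak_estimate(flat, threshold=50))  # 0 -1
```

A pair that straddles a contour yields a large difference, while a pair in a uniform region yields a difference near zero, which is why the feature behaves like coarse contour information.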

The invention of claim 3 provides
a detection target image determination device that detects and determines a detection target image in a grayscale image, comprising:
image reduction means for reducing the grayscale image to generate a plurality of images of different sizes;
scanning means for scanning each of the plurality of reduced images of different sizes from the image reduction means in units of a fixed-size window;
a plurality of weak discrimination means, one provided for each of a plurality of pixel pairs, each pair consisting of two adjacent or nearby pixel positions that are determined in advance by learning from among the pixels constituting the window-unit grayscale image obtained from the scanning means, each weak discrimination means obtaining the difference between the luminance values of the two pixels of its pair as a feature amount and calculating, based on the feature amount, an estimated value indicating whether the pixel pair is a contour portion of the detection target image; and
determination means for determining whether the window-unit grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discrimination means.

In the invention of claim 3, when a detection target image is contained in a grayscale image, the same detection and determination processing as in claim 1 is performed in units of a predetermined fixed-size window. The detection target image is likely to appear in the grayscale image at various sizes. The invention of claim 3 therefore reduces the grayscale image to generate reduced images of various sizes, scans each reduced image in units of the fixed-size window, and determines whether each window-unit image is a detection target image.

Therefore, according to the invention of claim 3, detection target images contained in a grayscale image at various sizes can be detected and determined efficiently.

According to the present invention, contour or edge portions with a coarse luminance change, such as a person's head and shoulders, can be detected preferentially, and it can be determined efficiently whether a given grayscale image is a detection target image such as a person image (human-shaped image). In addition, detection target images contained in a grayscale image at various sizes can be detected and determined efficiently.

Embodiments of the detection target image determination device and method according to the present invention are described below with reference to the drawings. The embodiment described below uses ensemble learning (also called group learning) to detect and determine a detection target image in an input image. The description covers the processing of still images, but the same processing is possible for moving images. For moving images, a plurality of the detection target image determination devices described below may be provided and operated in parallel.

FIG. 1 is a block diagram showing a configuration example of an object detection system that includes the detection target image determination device of the embodiment. The system consists of an input image supply device unit 1, the detection target image determination device unit 2 of the embodiment, and a result output device unit 3.

The input image supply device unit 1 outputs a grayscale image to the detection target image determination device unit 2 as its input image. The input image supply device unit 1 has functions such as scanning an image recorded on recording paper and outputting it as grayscale image data, and capturing image data input through an input terminal and outputting it as grayscale image data. When the image data of the input image is not grayscale image data, the unit also has a function of converting it into grayscale image data.

The detection target image determination device unit 2 includes a scaling unit 21, a scanning unit 22, a determination unit 23, and a processing control unit 20. For a given image (input image), it outputs information on the detection target image position, which indicates the region of the detection target image, and on the size of the detection target image.

The result output device unit 3 receives the information on the detection target image position and size from the detection target image determination device unit 2 and notifies the user of the position and size of the detection target image.

In this example, the scaling unit 21, scanning unit 22, determination unit 23, and processing control unit 20 of the detection target image determination device unit 2 are functional blocks, and the detection target image determination device unit 2 is implemented by a computer. That is, in this example these units are software functional means realized by executing a program stored in the memory of the computer.

Alternatively, each of the scaling unit 21, scanning unit 22, determination unit 23, and processing control unit 20 may be implemented in hardware, for example with a DSP (Digital Signal Processor). In that case, the processing control unit 20 can be a microcomputer or similar component that controls the whole of the detection target image determination device unit 2.

Based on instructions from the processing control unit 20, the scaling unit 21 reduces or enlarges the input image to every one of a plurality of preset sizes and outputs the resulting scaled images to the scanning unit 22.

The input image from the input image supply device unit 1 may have a fixed size, but in this example its size is not necessarily constant. If the input image is at least as large as the largest of the preset sizes, the scaling unit 21 can obtain scaled images of all preset sizes by reduction alone. If the input image is smaller than the largest preset size, however, enlargement is also required. Image reduction is performed by, for example, reduction using bilinear interpolation.
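For reference, a minimal pure-Python sketch of resizing by bilinear interpolation is given below. The function name `resize_bilinear` and the corner-aligned coordinate mapping are assumptions made for illustration; the specification names bilinear interpolation only as one example of a reduction method and does not fix these details.

```python
from typing import List


def resize_bilinear(src: List[List[float]], new_h: int, new_w: int) -> List[List[float]]:
    """Resize a grayscale image by bilinear interpolation (corner-aligned mapping)."""
    src_h, src_w = len(src), len(src[0])
    out = []
    for y in range(new_h):
        # Map the output row to a fractional source row.
        fy = y * (src_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, src_h - 1)
        wy = fy - y0
        row = []
        for x in range(new_w):
            # Map the output column to a fractional source column.
            fx = x * (src_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, src_w - 1)
            wx = fx - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


ramp = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
print(resize_bilinear(ramp, 2, 2))  # [[0.0, 20.0], [0.0, 20.0]]
```

The same function also enlarges an image when the requested size is larger than the source, which corresponds to the enlargement case mentioned above.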

For the scaled image of each size, the scanning unit 22 scans sequentially in units of a window whose size matches the object to be detected, cuts out the window image at each scanning position, and supplies each cut-out window image to the determination unit 23.

The determination unit 23 determines whether each window image is a detection target image and outputs the result to the processing control unit 20. For each window determined to be a detection target image, the processing control unit 20 outputs the size of the scaled image and the window image position (window position) in that scaled image to the result output device unit 3.

When a plurality of detection target images are detected in the input image, the processing control unit 20 of the detection target image determination device unit 2 outputs a plurality of pieces of region information to the result output device unit 3. Furthermore, when some of those regions overlap one another, it can also select, by a method described later, the region with the highest evaluation as a detection target image.
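The selection among overlapping regions is described later in the specification and is not reproduced here. Purely as a hypothetical sketch, one simple way to keep the highest-scoring region among mutually overlapping ones is a greedy pass in descending score order. The box format `(left, top, width, height)`, the score values, and the function names are all assumptions made for this illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), an assumed format


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def select_best(detections: List[Tuple[Box, float]]) -> List[Tuple[Box, float]]:
    """Greedy selection: visit regions by descending score, drop any that overlap a kept one."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if not any(overlaps(box, k[0]) for k in kept):
            kept.append((box, score))
    return kept


detections = [((0, 0, 24, 24), 0.5), ((4, 4, 24, 24), 0.9), ((100, 100, 24, 24), 0.3)]
print(select_best(detections))
# [((4, 4, 24, 24), 0.9), ((100, 100, 24, 24), 0.3)]
```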

As described above, the scaling unit 21 generates scaled images of a plurality of preset sizes and outputs them to the scanning unit 22. In this embodiment, the scaling unit 21 does not first generate all of the scaled images and then output them to the scanning unit 22. Instead, it generates a scaled image of one size and outputs it to the scanning unit 22; after the scanning and determination processing for that scaled image has finished, it generates the scaled image of the next size and passes it to the scanning unit 22. This is repeated until scaled images of all sizes have been output.

The detection target image determination processing, including this repetition, is executed under the control of the processing control unit 20. An outline of the control operation performed by the processing control unit 20 is described below with reference to the flowchart of FIG. 2.

First, the processing control unit 20 takes the input image from the input image supply device unit 1 into the detection target image determination device unit 2 (step S1). The processing control unit 20 then instructs the scaling unit 21 to generate a scaled image, in this example the one with the largest image size (step S2). Based on this instruction, the scaling unit 21 generates a scaled image of the instructed size, for example the scaled image 10A in FIG. 3, and outputs it to the scanning unit 22. If the scaled image 10A with the largest image size has the same size as the input image, the scaling unit 21 outputs the input image unchanged to the scanning unit 22 as the scaled image 10A.

Next, based on, for example, a notification from the scaling unit 21 that generation of the scaled image 10A is complete, the processing control unit 20 instructs the scanning unit 22 to receive the scaled image 10A, scan it with the window, and cut out window images (step S3). Following this instruction, the scanning unit 22 scans the received scaled image with the window and cuts out window images.

For this purpose, the scanning unit 22 prepares a window WD of fixed size, for example 24 × 24 pixels, as shown in FIG. 4. As FIG. 4 shows, the window WD is moved across the scaled image in the horizontal direction N pixels (N ≥ 1) at a time; when a horizontal pass is finished, the window is moved M pixels (M ≥ 1) in the vertical direction and the horizontal pass is repeated. In other words, in this example the scanning unit 22 performs a so-called raster scan.

At each window scanning position on the scaled image, the scanning unit 22 cuts out the image of the region enclosed by the window WD and outputs it to the determination unit 23 as a window image.
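The raster scan and window cut-out described above can be sketched as follows. The function name `scan_windows` and the default step values are assumptions for illustration; the specification requires only N ≥ 1 and M ≥ 1 and gives 24 × 24 pixels as an example window size.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]


def scan_windows(image: Image, win: int = 24, n: int = 2, m: int = 2
                 ) -> Iterator[Tuple[int, int, Image]]:
    """Yield (left, top, window_image) in raster order.

    The window moves n pixels at a time horizontally; after each horizontal
    pass it moves m pixels vertically and the pass is repeated.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - win + 1, m):
        for left in range(0, width - win + 1, n):
            window = [row[left:left + win] for row in image[top:top + win]]
            yield left, top, window


# Toy 4x5 image scanned with a 3x3 window, N = 2, M = 1.
toy = [[r * 10 + c for c in range(5)] for r in range(4)]
positions = [(left, top) for left, top, _ in scan_windows(toy, win=3, n=2, m=1)]
print(positions)  # [(0, 0), (2, 0), (0, 1), (2, 1)]
```

An image smaller than the window yields no positions, because both ranges are then empty.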

Setting N and M to one pixel allows very fine image scanning, but it increases the number of window images to be processed and therefore lowers the processing speed. Setting N and M too large makes the scan coarse and lowers the reliability of the determination result. In this example, the values of N and M are therefore set in consideration of both processing speed and the reliability of the determination result.
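The trade-off can be made concrete by counting window positions. The sketch below assumes a 320 × 240 scaled image, which is an illustrative size not taken from the specification, and uses the 24 × 24 window of the example above.

```python
def window_count(img_w: int, img_h: int, n: int, m: int, win: int = 24) -> int:
    """Number of window positions for horizontal step n and vertical step m."""
    if img_w < win or img_h < win:
        return 0
    return ((img_w - win) // n + 1) * ((img_h - win) // m + 1)


print(window_count(320, 240, 1, 1))  # 64449
print(window_count(320, 240, 4, 4))  # 4125
```

Under these assumptions, moving from a one-pixel step to a four-pixel step reduces the number of window images per scaled image by a factor of roughly sixteen, at the cost of a coarser scan.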

The window WD need not have equal vertical and horizontal sizes as in this example, nor need it be rectangular. Depending on the processing, the window WD may have a more complex shape, such as a diamond or a shape specified freehand.

In the example above, the window WD is raster-scanned at a constant pixel interval, but the interval need not be constant. For example, when the input is a series of moving images, scanning may use a narrow interval near positions where a detection target image was previously detected and a wide interval elsewhere. The scan may also proceed in the vertical direction first, with the scan position then shifted in the horizontal direction. Instead of a raster scan, the scan may, for example, spiral from the periphery toward the center.

Next, each time the scanning unit 22 reports that one scan step is complete, the processing control unit 20 instructs the determination unit 23 to receive the cut-out window image and determine whether it is a detection target image (step S4).

When the processing control unit 20 receives the determination result for the window image (information indicating whether it is a detection target image) from the determination unit 23, it temporarily holds that result together with the size of the current scaled image and the window position, and then determines whether all scanning of the scaled image by the window WD is complete (step S5). In this example, the processing control unit 20 knows the image size of the scaled image being processed and therefore also knows the number of window positions to be scanned in it, so in step S5 it can determine whether all scanning of the scaled image by the window WD is complete.

If it is determined in step S5 that scanning of the scaled image by the window WD is not yet complete, the processing control unit 20 returns to step S3 and instructs the scanning unit 22 to move the window WD to the next scanning position, cut out the window image, and output it to the determination unit 23. The processing control unit 20 then repeats steps S3, S4, and S5.

In this way, the processing control unit 20 controls the scanning unit 22 and the determination unit 23 so that steps S3 to S5 are repeated for every window scanning position in one scaled image.

If it is determined in step S5 that all scanning of the scaled image by the window WD is complete, the processing control unit 20 determines whether the detection and determination processing has finished for all scaled images (step S6). If it has not, the processing control unit 20 returns to step S2 and instructs the scaling unit 21 to generate the scaled image of the next image size, for example the image 10B in FIG. 3, and output it to the scanning unit 22. The processing control unit 20 then repeats the processing from step S2 onward.

The example of FIG. 3 shows the image size being reduced successively, starting from the largest image 10A, in the order image 10A → image 10B → image 10C → image 10D → image 10E. The scaling unit 21 generates the scaled image of each size by, for example, multiplying image 10A by 0.875 to generate image 10B, multiplying image 10B by 0.875 to generate image 10C, multiplying image 10C by 0.875 to generate image 10D, and so on.
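The sequence of scaled-image sizes produced by repeated multiplication by 0.875 can be sketched as follows. The function name `pyramid_sizes`, the rounding to whole pixels, the 64 × 48 starting size, and the rule of stopping once the image no longer holds a 24 × 24 window are assumptions for illustration; the specification states only that the set of sizes is preset.

```python
from typing import List, Tuple


def pyramid_sizes(width: int, height: int, factor: float = 0.875,
                  window: int = 24) -> List[Tuple[int, int]]:
    """List (width, height) for each scaled image, largest first.

    Each level is the previous level multiplied by `factor`. Levels that can
    no longer contain one window are omitted (an assumed stopping rule).
    """
    sizes = []
    w, h = float(width), float(height)
    while round(w) >= window and round(h) >= window:
        sizes.append((round(w), round(h)))
        w *= factor
        h *= factor
    return sizes


sizes = pyramid_sizes(64, 48)
print(sizes[:3])  # [(64, 48), (56, 42), (49, 37)]
```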

In this example, the size of the window WD is constant, and the determination unit 23 determines whether each window image of this fixed size is a detection target image. Because the scaling unit 21 generates scaled images of each image size as described above, the input image is converted into various sizes, which makes it possible to judge detection target images of arbitrary size.

The same result could be obtained by changing the size of the window WD instead of the size of the input image, but the determination unit 23 would then have to judge window images of various sizes, which is undesirable.

Also, as described later, the determination unit 23 in this example uses a determination method that reflects the results of ensemble learning. If window images came in various sizes, learning would have to be performed for each window size and determination would have to use the learning result for each size, so the amount of processing would become enormous. With this embodiment, only image size reduction (and in some cases enlargement) is needed and a single type of window image suffices, so the overall configuration is simplified.

When the determination processing has finished for the scaled images of all image sizes, the processing proceeds from step S6 to step S7. The processing control unit 20 refers to the temporarily held determination results from the determination unit 23 (which include the scaled image size and the window position) and outputs to the result output device unit 3 the scaled image size and window position of each window determined to be a detection target image.
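The control flow of steps S1 to S7 can be summarized in a short sketch. The function `detect` and its parameters are hypothetical: the scaler and the window classifier are passed in as callables so that the loop structure stands on its own, and only positive results are held, whereas the text above holds every determination result.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Detection = Tuple[Tuple[int, int], Tuple[int, int]]  # ((scaled_w, scaled_h), (left, top))


def detect(image: Image,                                  # S1: input image taken in
           sizes: Sequence[Tuple[int, int]],
           scale: Callable[[Image, int, int], Image],     # scale(image, width, height)
           is_target: Callable[[Image], bool],
           win: int = 24, step: int = 2) -> List[Detection]:
    held: List[Detection] = []
    for w, h in sizes:                                    # S2, repeated via S6
        scaled = scale(image, w, h)
        for top in range(0, h - win + 1, step):           # S3, repeated via S5
            for left in range(0, w - win + 1, step):
                window = [row[left:left + win] for row in scaled[top:top + win]]
                if is_target(window):                     # S4
                    held.append(((w, h), (left, top)))
    return held                                           # S7: report held results


# Toy run: one 4x4 level, identity scaler, "all pixels equal 1" as the classifier.
image = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
hits = detect(image, sizes=[(4, 4)], scale=lambda img, w, h: img,
              is_target=lambda win: all(v == 1 for row in win for v in row),
              win=2, step=1)
print(hits)  # [((4, 4), (1, 1))]
```

Generating one scaled image per outer iteration mirrors the embodiment, in which the next size is produced only after scanning and determination for the current size have finished.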

The determination result output by the result output device unit 3 is not limited to the scaled image size and window position described above. The image of the detected window region may be displayed together with that result or on its own.

In the description above, the determination results are output from the detection target image determination device unit 2 to the result output device unit 3 after the determination processing for one input image has finished. Alternatively, the result may be output each time a detection target image is detected. In addition, when determined regions overlap at output time, processing for removing the duplication may be added to the detection target image determination device unit 2 so that the results are output without overlapping portions.

[Configuration Example of Determination Unit 23]
The determination unit 23 in this embodiment uses ensemble learning (group learning) to determine whether its input image (a window image) is a detection target image. The determination unit 23 constitutes an embodiment of the invention of claim 1.

A learning machine obtained by ensemble learning consists of many weak hypotheses and a combiner that combines them. Boosting is one example of a combiner that integrates the outputs of the weak hypotheses with fixed weights that do not depend on the input. Boosting uses the learning results of the previously generated weak hypotheses to reshape the distribution that the learning samples follow, so that the weights of the learning samples (examples) that were misclassified increase, and then learns a new weak hypothesis based on this distribution.

As a result, the weights of learning samples that are frequently misclassified, and are therefore hard to discriminate as detection target images, rise relatively, and weak classifiers that correctly classify the heavily weighted (that is, hard-to-discriminate) learning samples are selected one after another. In other words, weak hypotheses are generated sequentially during learning, and a weak hypothesis generated later depends on those generated before it.
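
The reweighting described above can be illustrated with one round of textbook discrete AdaBoost. This is a sketch only: the function name `adaboost_round`, the formula for `alpha`, and the sample data are assumptions taken from the standard algorithm, not from this specification, whose own learning procedure is described later.

```python
import math

def adaboost_round(weights, labels, predictions):
    """One textbook discrete AdaBoost update (assumed form, not the patent's).

    weights: current sample weights summing to 1
    labels, predictions: +1 / -1 per sample
    Returns (alpha, new_weights), where alpha is the classifier reliability.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (y * h == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical data: four samples, the second one is misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
```

After the update the misclassified sample carries more weight than the others, so the next weak classifier is pushed toward getting it right.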

When detecting a detection target image, the discrimination results of the many weak hypotheses generated sequentially by learning as described above are used. In AdaBoost, for example, all discrimination results of the weak hypotheses generated by this learning (hereinafter called weak classifiers), which are 1 for a detection target image and -1 for a non-detection target image, are supplied to the combiner. The combiner weights each discrimination result by the reliability calculated during learning for the corresponding weak classifier, adds the weighted results, and outputs the outcome of this weighted majority vote. Whether the input image is a detection target image is determined by evaluating the output value of the combiner.
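
As a minimal sketch of the weighted majority vote just described, the following sums the +1/-1 outputs weighted by their reliabilities. The function name, the example values, and the choice of 0 as the decision boundary are assumptions made for illustration.

```python
def weighted_majority(votes, weights):
    """Return (score, label) for +1/-1 votes weighted by reliability.

    The decision boundary at 0 is an assumption for this sketch.
    """
    if len(votes) != len(weights):
        raise ValueError("one weight per weak classifier is required")
    score = sum(w * v for v, w in zip(votes, weights))
    return score, (1 if score > 0 else -1)

votes = [1, -1, 1]             # hypothetical weak-classifier outputs
weights = [0.5, 0.25, 0.75]    # hypothetical reliabilities from training
score, label = weighted_majority(votes, weights)
```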

A weak classifier uses some feature quantity to determine whether an image is a detection target image or a non-detection target image. As described later, the weak classifier may output deterministically whether the image is a detection target image, or it may output the likelihood of being a detection target image probabilistically, for example as a probability density.

In this embodiment, detection is made faster by using an ensemble learning device whose weak classifiers discriminate detection target images with a very simple feature quantity, namely the difference between the luminance values of two pixels (hereinafter called the inter-pixel difference feature). Moreover, rather than using the inter-pixel difference features between all pairs of pixels in the window image, this embodiment preferentially detects contour or edge portions that show a coarse luminance change, such as the head or shoulders of a person in the detection target image, which makes the detection determination more efficient.

That is, instead of using the differences between all pairs of pixels in the window image, this embodiment uses, as shown in FIG. 5, only the difference between the luminance values I1 and I2 of two pixels P1 and P2 that are adjacent or close to each other and lie on a contour or edge portion of the detection target image. This feature quantity used for determination is hereinafter called the constrained inter-pixel difference feature.

FIG. 6 is a block diagram showing a configuration example of the determination unit 23 of this embodiment. The determination unit 23 has a plurality of weak classifiers 20t (t = 1 to T) obtained by the ensemble learning described later; a plurality of coefficient multipliers 21t (t = 1 to T) that multiply the outputs of the weak classifiers 20t by the weighting coefficients Wt (t = 1 to T); an adder 220 that receives the weighted outputs from the coefficient multipliers 21t and computes the weighted majority vote; and a determination output unit 230 that determines, according to the weighted majority value from the adder 220, whether the image is a detection target image. The determination output of the determination output unit 230 is supplied to the processing control unit 20.

To obtain the weak classifiers 20t and the weighting coefficients Wt, the processing control unit 20 in this embodiment is provided with an ensemble learning unit 24 configured as a processing function unit. In this example, the ensemble learning unit 24 obtains the weak classifiers 20t and the weighting coefficients Wt by ensemble learning. Any concrete ensemble learning method may be applied as long as it can combine the results of a plurality of classifiers by majority vote. For example, ensemble learning based on boosting, such as the AdaBoost described above, which weights the data and takes a weighted majority vote, can be applied.

As described above, each weak classifier 20t uses the constrained inter-pixel difference feature as its feature quantity for discrimination. For discrimination, it compares the feature quantity learned in advance from learning samples, which consist of a plurality of grayscale images labeled beforehand as detection target images or non-detection target images, with the feature quantity of the input window image, and outputs, deterministically or probabilistically, an estimate for judging whether the window image is a detection target image.

In AdaBoost, the weak classifiers 20t calculate their estimates one after another, and the value of the weighting coefficient Wt is updated sequentially along with them. The weak classifiers 20t are generated sequentially by the ensemble learning unit 24 through ensemble learning with the learning samples described above, following the algorithm described later, and they calculate the estimates in, for example, the order in which they were generated. The weighting coefficient Wt (reliability) of the weighted majority vote is learned in the learning process, described later, that generates the weak classifiers 20t.

When the weak classifier 20t must produce a binary output, as in AdaBoost, it discriminates whether the image is a detection target image by splitting the constrained inter-pixel difference feature in two with a threshold. More than one threshold may be used for this.

Alternatively, the weak classifier 20t may, as in Real-AdaBoost, probabilistically output a continuous value that expresses the degree to which the constrained inter-pixel difference feature indicates the target object. The quantities that the weak classifiers 20t need for discrimination, such as the thresholds, are also learned during learning according to the algorithm mentioned above.

Furthermore, in this embodiment, the weighted majority processing does not wait for the results of all weak classifiers 20t. Depending on the value obtained part-way through, the window image is judged not to be a detection target image and the calculation is aborted. The abort threshold used for this is also learned during learning. Because processing can move on to the next window image without waiting for all weak classifiers 20t, this abort processing greatly reduces the amount of computation in detection.

Thus, the determination unit 23 calculates the weighted majority value as an evaluation value for determining whether the window image is a detection target image, and functions as determination means that makes this determination based on the evaluation value. As described above, the determination unit 23 is an embodiment of the detection target determination device of claim 1.

In addition, in the determination unit 23, the weak classifiers 20t generated in advance by learning calculate and output their estimates one after another. Each time an estimate is calculated, it is multiplied by the weighting coefficient Wt learned for that weak classifier 20t and added to update the weighted majority value. Each time the weighted majority value (evaluation value) is updated, the determination unit 23 can use the abort threshold to decide whether to stop calculating further estimates.
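
The update-and-check loop described above might look like the following sketch. The function name, the use of callables for the weak classifiers, and the use of one abort threshold per step are assumptions; this section only states that the abort threshold is learned.

```python
def evaluate_with_abort(window, classifiers, weights, abort_thresholds):
    """Accumulate W_t * f_t(window), aborting when the sum drops too low.

    classifiers: callables returning the estimate f_t for the window
    abort_thresholds: one hypothetical threshold per step
    Returns (score, number_of_classifiers_evaluated, aborted).
    """
    score = 0.0
    steps = zip(classifiers, weights, abort_thresholds)
    for t, (classifier, weight, threshold) in enumerate(steps, start=1):
        score += weight * classifier(window)
        if score < threshold:
            # Clearly not a detection target: skip the remaining classifiers.
            return score, t, True
    return score, len(classifiers), False

# Hypothetical classifiers that ignore the window and vote -1, -1, +1.
classifiers = [lambda w: -1, lambda w: -1, lambda w: 1]
result = evaluate_with_abort(None, classifiers, [0.5, 0.5, 0.25], [-0.75] * 3)
```

In this example the running sum reaches -1.0 after the second classifier, so the third classifier is never evaluated.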

The determination unit 23 is generated by the ensemble learning unit 24 through ensemble learning with the learning samples according to a predetermined algorithm. The ensemble learning method of the ensemble learning unit 24 is described first, followed by the method of discriminating detection target images in an input image with the determination unit 23 obtained by that learning.

[Ensemble Learning Unit 24]
The ensemble learning unit 24, which performs ensemble learning with a boosting algorithm, learns so that combining a plurality of weak classifiers as described above yields a strong determination result.

Each weak classifier has a very simple configuration and, on its own, has low ability to discriminate whether an image is a detection target image, for example a face or a non-face. Combining, for example, several hundred to several thousand weak classifiers gives high discrimination ability.

The ensemble learning unit 24 uses, for example, several thousand sample images called learning samples, consisting of detection target images and non-detection target images (for example, face images and non-face images) that have been labeled with the correct answer in advance. It generates a weak classifier by selecting (learning) one hypothesis from many learning models (combinations of hypotheses) according to a predetermined learning algorithm, and determines how the generated weak classifiers are combined.

As described above, a weak classifier by itself has low discrimination performance, but a classifier with high discrimination ability can be obtained through the way the weak classifiers are selected and combined. The ensemble learning unit 24 therefore learns how to combine the weak classifiers, that is, which weak classifiers to select and the weighting coefficients Wt used when their output values are combined by weighted majority vote.

Next, the learning method by which the ensemble learning unit 24 obtains the determination unit 23, which combines many appropriate weak classifiers according to the learning algorithm, is described. Before that, two items of the learning data are explained: the constrained inter-pixel difference feature, which is the feature quantity in this embodiment and from which the weak classifiers are built, and the abort threshold used to stop detection part-way through the discrimination process (detection process).

[Configuration of Weak Classifiers]
In the determination unit 23 of this embodiment, each of the weak classifiers 20t has a very simple configuration: it discriminates whether the image is a detection target image, for example a face, from the difference between the luminance values of two adjacent or nearby pixels (the constrained inter-pixel difference feature) selected from among all pixels of the image input to the weak classifiers 20t. This speeds up the calculation of the weak classifiers' discrimination results in the discrimination process. The image input to the weak classifiers 20t is a learning sample in the learning process and a window image cut out of a scaled image in the discrimination process.

As described above and shown in FIG. 5, this embodiment defines the constrained inter-pixel difference feature as the difference between the luminance values of any two adjacent or nearby pixels; in the example of FIG. 5, it is the difference between the luminance value I1 of pixel P1 and the luminance value I2 of pixel P2, as in (Formula a):
Constrained inter-pixel difference feature: d = I1 - I2 ... (Formula a)
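
A minimal sketch of computing the feature of (Formula a) on a grayscale window stored as a list of rows follows. The function names, the (row, column) coordinate convention, and the proximity test with `max_dist=2` are assumptions; the specification does not define here how close "nearby" pixels must be.

```python
def pixel_difference(image, p1, p2):
    """Formula a: d = I1 - I2 for pixel positions p1 and p2 as (row, col)."""
    (r1, c1), (r2, c2) = p1, p2
    return image[r1][c1] - image[r2][c2]

def is_near(p1, p2, max_dist=2):
    """Hypothetical proximity test: Chebyshev distance of 1..max_dist."""
    dist = max(abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
    return 0 < dist <= max_dist

# Hypothetical 3x3 window with a vertical edge between columns 1 and 2.
window = [
    [10, 10, 200],
    [10, 10, 200],
    [10, 10, 200],
]
d = pixel_difference(window, (1, 2), (1, 1))
```

A pair that straddles the edge gives a large difference, while a pair inside the flat region gives 0.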

Which constrained inter-pixel difference feature is used for detection determines the ability of the weak classifier. It is therefore necessary to select the pair of pixel positions used by each weak classifier from among the combinations of any two adjacent or nearby pixels (also called filters or weak hypotheses) contained in the image cut out by the window WD.

AdaBoost, for example, requires each weak classifier to produce a deterministic output of +1 (detection target image) or -1 (non-detection target image). In AdaBoost, a weak classifier can therefore be formed by taking the constrained inter-pixel difference feature at some pair of adjacent or nearby pixel positions and dividing it into two classes (+1 or -1) with one or more thresholds.

In boosting algorithms such as Real-AdaBoost or GentleBoost, which output a continuous (real) value reflecting the probability distribution of the learning samples rather than such a binary value, the weak classifier outputs the likelihood (probability) that the input image is the target. The output of a weak classifier may thus be either deterministic or probabilistic. These two kinds of weak classifier are described first.

<Weak Classifier with Binary Output>
A weak classifier with deterministic binary output performs two-class discrimination, detection target image or not, according to the value of the constrained inter-pixel difference feature. Let I1 and I2 be the luminance values of two adjacent or nearby pixels in the target image region (window image), and let Th be the threshold for discriminating detection target images by the constrained inter-pixel difference feature. The class is then decided by whether the following condition is satisfied:
I1 - I2 > Th ... (Formula b)

To configure a weak classifier, the two adjacent or nearby pixel positions and the threshold must be determined; how they are determined is described later. The threshold test of (Formula b) is the simplest case. Two thresholds may also be used, as in (Formula c) or (Formula d):
Th1 > I1 - I2 > Th2 ... (Formula c)
I1 - I2 > Th1 and Th2 > I1 - I2 ... (Formula d)
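
The three binary rules can be sketched as follows. The single-threshold function follows (Formula b) directly. The two-threshold functions are written as an "inside a band" test and an "outside a band" test with hypothetical bounds `lo < hi`; this parameterization is an assumption chosen for illustration and does not reproduce the exact threshold ordering of (Formula c) and (Formula d).

```python
def classify_single(d, th):
    """Formula b: +1 when d > th, otherwise -1."""
    return 1 if d > th else -1

def classify_inside(d, lo, hi):
    """Assumed band rule: +1 when d lies strictly between lo and hi."""
    return 1 if lo < d < hi else -1

def classify_outside(d, lo, hi):
    """Assumed complement rule: +1 when d lies outside [lo, hi]."""
    return 1 if (d < lo or d > hi) else -1

# Hypothetical feature values and thresholds.
single = classify_single(30, 20)
inside = classify_inside(0, -10, 10)
outside = classify_outside(50, -10, 10)
```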

FIGS. 7(A), 7(B), and 7(C) are schematic diagrams, with frequency on the vertical axis and the constrained inter-pixel difference feature on the horizontal axis, that illustrate the three discrimination methods of (Formula b), (Formula c), and (Formula d), respectively, for characteristic cases of the frequency distributions of detection target image data and non-detection target image data.

In FIGS. 7(A), 7(B), and 7(C), yi denotes the output of the weak classifier. The broken curves show the frequency distribution of all learning samples with yi = -1 (non-detection target images), and the solid curves show that of all learning samples with yi = 1 (detection target images).

When the detection target image is, for example, a face image, taking the frequency of the same constrained inter-pixel difference feature over learning samples consisting of many face images and non-face images yields histogram distributions such as those shown in FIGS. 7(A), 7(B), and 7(C).

As shown in FIG. 7(A), when the histograms for non-detection target images (broken line) and detection target images (solid line) both resemble normal distribution curves of similar shape but with shifted peak positions, the value of the constrained inter-pixel difference feature at the boundary between the two curves is taken as the threshold Th, and (Formula b) can be used to discriminate detection target images.

In AdaBoost, for example, with f(x) denoting the output of a weak classifier, f(x) = 1 when the input window image is judged to be a detection target image and f(x) = -1 when it is judged to be a non-detection target image. FIG. 7(A) shows an example in which the image is judged to be a detection target image, and the weak classifier outputs f(x) = 1, when the constrained inter-pixel difference feature is larger than the threshold Th.

As shown in FIG. 7(B) or 7(C), when the peaks for non-detection target images (broken line) and detection target images (solid line) lie at about the same position but the widths of the histograms differ, the values Th1 and Th2, near the lower and upper limits respectively of the constrained inter-pixel difference feature of the narrower distribution, are used as thresholds, and (Formula c) or (Formula d) can be used to discriminate detection target images.

FIG. 7(B) shows an example in which the narrower distribution is judged to be the detection target image. FIG. 7(C) shows an example in which the wider distribution, excluding the range of the narrower one, is judged to be the detection target image and the weak classifier outputs f(x) = 1.

A weak classifier is configured by determining a constrained inter-pixel difference feature and its threshold. A constrained inter-pixel difference feature must be selected for which the error rate of the resulting judgment is as small as possible, that is, for which the discrimination rate is high.

For example, the threshold can be found by fixing two adjacent or nearby pixel positions, computing the histograms of FIG. 7 over the labeled learning samples, and searching for the threshold that gives the highest correct-answer rate, that is, the smallest error rate. The two adjacent or nearby pixel positions can then be chosen as, for example, the pair whose error rate, obtained together with its threshold, is the smallest.

In AdaBoost, however, each learning sample carries a weight (data weight) that reflects how difficult it is to discriminate, and the appropriate constrained inter-pixel difference feature (that is, which two adjacent or nearby pixel positions supply the luminance values for the feature) is learned so as to minimize the weighted error rate described later.
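
The threshold search can be sketched for the single-threshold rule of (Formula b) as an exhaustive scan that minimizes the weighted error. The function name, the candidate set (one value below the minimum plus every observed feature value), and the sample data are assumptions made for illustration.

```python
def best_threshold(features, labels, weights):
    """Return (threshold, weighted_error) for the rule d > th -> +1.

    features: difference values d for one fixed pixel pair
    labels: +1 / -1 per learning sample
    weights: data weights per learning sample
    """
    observed = sorted(set(features))
    candidates = [observed[0] - 1] + observed
    best = None
    for th in candidates:
        # Weighted error: total weight of samples the rule gets wrong.
        err = sum(w for d, y, w in zip(features, labels, weights)
                  if (1 if d > th else -1) != y)
        if best is None or err < best[1]:
            best = (th, err)
    return best

# Hypothetical feature values and labels for five learning samples.
features = [-40, -5, 10, 35, 60]
labels = [-1, -1, 1, -1, 1]
uniform = best_threshold(features, labels, [0.2] * 5)
skewed = best_threshold(features, labels, [0.1, 0.1, 0.1, 0.6, 0.1])
```

With uniform weights the scan settles on a threshold of -5. Once the fourth sample carries most of the weight, the threshold moves to 35 so that this sample is classified correctly. Repeating the search over candidate pixel pairs and keeping the pair with the lowest weighted error would give the feature selection described above.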

<Weak Classifier with Continuous-Valued Output>
As mentioned above, weak classifiers with probabilistic output include those that output continuous values, as in Real-AdaBoost and GentleBoost. Instead of solving the discrimination problem with a fixed value (threshold) and producing a binary output (f(x) = 1 or -1), such a weak classifier outputs the degree to which the input image is a detection target image, for example as a probability density function.

Such a probabilistic output indicating the degree (probability) of being a detection target image can, when the constrained inter-pixel difference feature d is the input, be given by the function f(x) shown in (Formula e) of FIG. A, where Pp(x) is the probability density function of the detection target images among the learning samples and Pn(x) is that of the non-detection target images.

FIG. 8(A) shows a characteristic case of the data frequency distribution, with probability density on the vertical axis and the constrained inter-pixel difference feature on the horizontal axis. FIG. 8(B) shows the behavior of the function f(x) of (Formula e) for the data distribution of FIG. 8(A), with the value of f(x) on the vertical axis and the constrained inter-pixel difference feature on the horizontal axis.

In FIG. 8(A), the broken line shows the probability density for non-detection target images and the solid line shows that for detection target images. Computing the function f(x) from (Formula e) yields the graph shown in FIG. 8(B).

In this case, in the discrimination process, the weak classifier outputs the value of the function f(x) corresponding to the constrained inter-pixel difference feature d of (Formula a) obtained from the input window image. The function f(x) expresses how likely the image is to be a detection target image; for example, with -1 representing a non-detection target image and 1 a detection target image, it can take continuous values from -1 to 1.

For example, a table of constrained inter-pixel difference features d and the corresponding values of f(x) is stored, and the value of f(x) is read from the table and output according to the input. This requires somewhat more storage than holding the constant thresholds Th or Th1 and Th2, but improves discrimination performance.
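
The table lookup can be sketched as below, assuming 8-bit luminance so that d ranges over -255 to 255. The function names and the linear `example_curve` are placeholders; a real table would hold the learned values of f(x), whose defining (Formula e) appears only in FIG. A.

```python
def build_table(f, max_value=255):
    """Precompute f(d) for every difference d in [-max_value, max_value]."""
    return [f(d) for d in range(-max_value, max_value + 1)]

def lookup(table, d, max_value=255):
    """Read the stored output for difference d (index shifted by max_value)."""
    return table[d + max_value]

def example_curve(d):
    """Placeholder confidence curve clamped to [-1, 1]; not a learned f(x)."""
    return max(-1.0, min(1.0, d / 100.0))

table = build_table(example_curve)
confidence = lookup(table, 50)
```

The table costs 511 entries per weak classifier in this sketch, compared with one or two numbers for a threshold rule, which matches the storage trade-off noted above.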

Using these estimation (discrimination) methods in combination during ensemble learning can be expected to improve discrimination performance, while using only a single method makes it possible to get the most out of execution speed.

The weak classifiers used in this embodiment are distinguished by the fact that, because the feature quantity they use (the constrained inter-pixel difference feature) is very simple, they can discriminate detection target images extremely quickly, as described above.

When the detection target is, for example, a face, very good discrimination results are obtained even with the simplest of the above methods, the threshold test of (Formula b) applied to the constrained inter-pixel difference feature. Which discrimination method makes the weak classifiers work effectively depends on the problem at hand, and the threshold-setting method and the like may be chosen accordingly.

Depending on the problem, the feature quantity may be, instead of the difference between the luminance values of two adjacent or nearby pixels, the differences between the luminance values of more than two adjacent or nearby pixels, or a combination of such feature quantities.

<Abort Threshold>
Next, the abort threshold is described. In an ensemble learning machine using boosting, whether a window image is a detection target image is normally determined, as described above, by the weighted majority vote of the outputs of all weak classifiers constituting the determination unit 23. The weighted majority vote is calculated by successively adding up the discrimination results (estimates) of the weak classifiers. For example, with Wt denoting, as before, the majority-vote weight (reliability) of each weak classifier 20t and ft(x) denoting the output of each weak classifier, the weighted majority value F(x) in AdaBoost can be obtained by (Formula f) of FIG. A.

Fig. 9 is a graph in which the horizontal axis is the number of weak classifiers constituting the determination unit 23 and the vertical axis is the weighted majority value F(x) of (Formula f); it shows how F(x) changes depending on whether the input image is a detection target image.

In Fig. 9, data D1 to D4, drawn with broken lines, were obtained by feeding images labeled as detection target images (learning samples) to the weak classifiers, computing the estimates f(x) one after another, and successively accumulating the weighted majority value F(x). As data D1 to D4 show, when the input is a detection target image, F(x) becomes positive once a certain number of weak classifiers have been evaluated.

In this embodiment, a technique that differs from the ordinary boosting algorithm is introduced. In the course of successively adding up the weak classifiers' results, discrimination is stopped for any window image that can clearly be judged not to be a detection target image, even before the results of all weak classifiers have been obtained. The threshold that decides whether to stop is learned in the learning process. Hereinafter, this threshold is called the abort threshold.

With this abort threshold, whenever the determination unit 23 can reliably conclude that a window image is a non-detection target image without the outputs of all the weak classifiers, it can stop computing the estimates f(x) partway through. This greatly reduces the amount of computation compared with performing the weighted majority vote over all weak classifiers for every window image.

As the abort threshold, the minimum value taken by the weighted majority value over those labeled learning samples that represent detection target images can be used.

In the discrimination process, the weak classifiers' results for a window image are weighted and output one after another; that is, the weighted majority value is updated successively. Each time it is updated, that is, each time one weak classifier outputs its result, the updated value is compared with the abort threshold. If the updated weighted majority value falls below the abort threshold, the window image is judged not to be a detection target image and the computation is aborted. This eliminates wasted computation and further speeds up discrimination.

That is, the abort threshold R_K for the output f_K(x) of the K-th weak classifier is the minimum weighted majority value obtained when the learning samples x_j (= x_1 to x_J; j = 1 to J) that are detection target images, among the learning samples x_i (= x_1 to x_N; i = 1 to N), are used, and is defined as in (Formula g) in Fig. A.

As (Formula g) shows, when the minimum weighted majority value over the detection-target learning samples x_1 to x_J exceeds 0, the abort threshold R_K is set to 0. The threshold is kept from exceeding 0 because AdaBoost discriminates using 0 as its threshold; this may differ for other ensemble learning methods.

In the case of AdaBoost, as shown by the thick line in Fig. 9, the abort threshold is set to the minimum value taken over all the data D1 to D4 obtained when detection target images are input; where that minimum exceeds 0, the abort threshold is set to 0.
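
The rule of (Formula g) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the implementation of the embodiment: `positive_scores` is a hypothetical table holding, for each positive (detection target) learning sample, the running weighted majority value after each weak classifier, and the cap at 0 reflects the AdaBoost case described above.

```python
def learn_abort_thresholds(positive_scores):
    """Return one abort threshold per weak classifier stage.

    positive_scores[j][k] is the running weighted majority value of
    positive sample j after the first k + 1 weak classifiers
    (a hypothetical layout chosen for this sketch).
    """
    num_stages = len(positive_scores[0])
    thresholds = []
    for k in range(num_stages):
        lowest = min(row[k] for row in positive_scores)
        # Never let the threshold exceed 0 (AdaBoost decides at 0).
        thresholds.append(min(0.0, lowest))
    return thresholds
```

With thresholds chosen this way, no positive learning sample is ever rejected early, which matches the requirement that all positive samples pass.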

In this embodiment, the abort threshold R_t is learned each time a weak classifier is generated. In the discrimination process described later, the weak classifiers output their estimates one after another and the weighted majority value is updated successively, as with data D5 for example; as soon as this value falls below the abort threshold, the remaining weak classifiers are not evaluated and discrimination ends.

In other words, learning the abort threshold R_t makes it possible to decide, each time a weak classifier's estimate is computed, whether to evaluate the next weak classifier. When the image is clearly not a detection target image, it can be judged a non-detection target image without waiting for the results of all weak classifiers, and aborting the computation partway speeds up detection.
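
The early-abort evaluation described above might look like the following Python sketch. The names are hypothetical: each entry of `classifiers` is assumed to be a tuple of a weak classifier function returning +1 or -1, its reliability W_t, and its abort threshold R_t, and the final decision at 0 assumes the AdaBoost case.

```python
def classify_window(window, classifiers):
    """Accumulate the weighted majority value, aborting early if possible.

    classifiers: list of (weak_fn, weight, abort_threshold) tuples
    (a hypothetical representation for this sketch).
    Returns (is_target, score, num_evaluated).
    """
    score = 0.0
    used = 0
    for weak_fn, weight, abort_threshold in classifiers:
        score += weight * weak_fn(window)
        used += 1
        if score < abort_threshold:
            # Clearly not a detection target: skip the remaining classifiers.
            return False, score, used
    return score > 0, score, used
```

Most windows in a typical image are background, so the loop usually ends after a few weak classifiers; only promising windows pay for the full vote.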

[Learning method]
Next, the learning method used by the ensemble learning machine unit 24 is described. As a prerequisite for a general two-class pattern recognition problem, such as deciding whether given data is a face, images (training data) that have been labeled by hand in advance are prepared as learning samples. The learning samples consist of a group of images cut out from regions of the detection target image to be detected (the detection target image group) and a group of random images cut out from, for example, landscape pictures entirely unrelated to the detection target image (the non-detection target image group).

A learning algorithm is applied to these learning samples to generate the learned data used at discrimination time. In this embodiment, the learned data used at discrimination time are the following four kinds, which include the learned data described above:
(A) the pairs of two adjacent or nearby pixel positions (T pairs)
(B) the weak classifier thresholds (T values)
(C) the weights of the weighted majority vote (the weak classifier reliabilities) (T values)
(D) the abort thresholds (T values)

Next, the algorithm that learns the four kinds of learned data (A) to (D) from a large number of learning samples is described.

Fig. 10 is a flowchart of the learning method used by the ensemble learning machine unit 24. Learning is described here for an algorithm (AdaBoost) that uses a fixed value as the threshold for weak discrimination. However, the learning algorithm is not limited to AdaBoost; any algorithm that performs ensemble learning to combine multiple weak classifiers may be used, such as Real AdaBoost, which uses a continuous value indicating the likelihood (probability) of a correct answer.

(Preparation) Labeling of learning samples
Before the processing flow of Fig. 10, learning samples (x_i, y_i) labeled in advance as detection target images or non-detection target images are prepared, as described above.

Here,
Learning samples (x_i, y_i): (x_1, y_1), ..., (x_N, y_N)
x_i ∈ X, y_i ∈ {-1, 1}
X: learning sample data
Y: learning sample labels (correct answers)
N: number of learning samples

That is, x_i is a feature vector consisting of all the luminance values of a learning sample image. y_i = -1 indicates that the learning sample is labeled as a non-detection target image, and y_i = 1 indicates that it is labeled as a detection target image.

(Step S11) Initialization of data weights
In boosting, the learning samples are given different weights (data weights), and the data weights of samples that are hard to discriminate are made relatively larger. The discrimination results are used to compute the error rate by which a weak classifier is evaluated; because each result is multiplied by its data weight, a weak classifier that misclassifies the harder learning samples is rated lower than its raw discrimination rate. The data weights are updated successively by the method described later, but first they are initialized. Initialization makes the weights of all learning samples equal, as defined by (Formula 1) in Fig. B.

Here, the data weight D_{1,i} in (Formula 1) is the data weight of learning sample x_i (= x_1 to x_N) at iteration t = 1, and N is the number of learning samples.

(Steps S12 to S17) Iteration
Next, the determination unit 23 is generated by repeating steps S12 to S17 below. The iteration index is t = 1, 2, ..., T. Each iteration learns one weak classifier, that is, one pair of adjacent or nearby pixels and the constrained inter-pixel difference feature for that pair. T iterations therefore generate T weak classifiers, which together form the determination unit 23.

Typically, several hundred to several thousand iterations generate several hundred to several thousand weak classifiers, but the number of iterations (the number of weak classifiers) may be set as appropriate for the required discrimination performance and the problem (the detection target image).

(Step S12) Learning of a weak classifier
In step S12, a weak classifier is learned (generated); the learning method is described later. In this embodiment, one weak classifier is generated per iteration by that method.

(Step S13) Calculation of the weighted error rate e_t
Next, the weighted error rate e_t of the weak classifier generated in step S12 is calculated by (Formula 2) in Fig. B.

As (Formula 2) shows, the weighted error rate e_t is the sum of the data weights of only those learning samples that the weak classifier misclassifies (f_t(x_i) ≠ y_i). As noted above, misclassifying a learning sample with a large data weight D_{t,i} (one that is hard to discriminate) increases e_t. The weighted error rate e_t is less than 0.5, for a reason explained later.
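
The weighted error rate as described in the text can be sketched in Python as follows. The function and argument names are hypothetical; the sketch assumes the data weights already sum to 1.

```python
def weighted_error(predictions, labels, weights):
    """Sum the data weights of the misclassified learning samples.

    predictions and labels hold +1 or -1; weights are the data
    weights D_{t,i}, assumed here to sum to 1.
    """
    return sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
```

Because only the weights of the misclassified samples are added, an error on a heavily weighted (hard) sample costs more than an error on an easy one.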

(Step S14) Calculation of the weighted majority weight (weak classifier reliability)
Next, based on the weighted error rate e_t of (Formula 2), the weighted majority weight W_t is calculated by (Formula 3) in Fig. B. W_t represents the reliability of the weak classifier generated at iteration t. Hereinafter, W_t is called the reliability W_t.

As (Formula 3) shows, the smaller the weighted error rate e_t, the greater the reliability W_t of the weak classifier.
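
(Formula 3) appears only in Fig. B and is not reproduced in the text. The Python sketch below therefore assumes the standard discrete AdaBoost form, W_t = (1/2) ln((1 - e_t) / e_t), which has the stated property that a smaller error rate gives a larger reliability; the formula in the figure may differ.

```python
import math

def reliability(error_rate):
    """Reliability W_t under the assumed standard AdaBoost formula.

    Requires 0 < error_rate < 1; the text states that e_t stays below 0.5,
    in which case the result is positive.
    """
    return 0.5 * math.log((1.0 - error_rate) / error_rate)
```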

(Step S15) Update of the learning sample data weights
Next, using the reliability W_t obtained from (Formula 3), the data weights D_{t,i} of the learning samples are updated by (Formula 4) in Fig. B. The data weights D_{t,i} are normally normalized so that they sum to 1; (Formula 5) in Fig. B performs this normalization.
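
As a rough illustration of steps S11 and S15, the Python sketch below initializes equal data weights and then applies one update followed by normalization. (Formula 4) and (Formula 5) are given only in Fig. B, so the update rule used here, D_{t+1,i} proportional to D_{t,i} exp(-W_t y_i f_t(x_i)), is the standard AdaBoost form and is an assumption; all function names are hypothetical.

```python
import math

def initial_weights(num_samples):
    """Equal data weights summing to 1 (assumed reading of Formula 1)."""
    return [1.0 / num_samples] * num_samples

def update_weights(weights, labels, predictions, reliability):
    """Raise the weights of misclassified samples, then normalize.

    Assumed standard AdaBoost update; the formula in Fig. B may differ.
    """
    raw = [w * math.exp(-reliability * y * p)
           for w, y, p in zip(weights, labels, predictions)]
    total = sum(raw)
    return [r / total for r in raw]
```

Under this rule, the samples the current weak classifier got wrong carry half of the total weight after the update, so the next weak classifier is pushed toward them.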

(Step S16) Calculation of the abort threshold R_t
Next, as described above, the abort threshold R_t used to abort discrimination at the stage of each weak classifier 20t in the discrimination process is calculated. In accordance with (Formula g) in Fig. A, R_t is the smallest among the weighted majority values of the detection-target (positive) learning samples x_1 to x_J and 0. As noted above, taking the smaller of the minimum value and 0 applies to AdaBoost, which discriminates using 0 as its threshold. In any case, R_t is set to the largest value that still lets at least all positive learning samples pass.

(Step S17) Iteration
In step S17, it is determined whether boosting has been performed the predetermined number of times (= T). If not, processing returns from step S17 to step S12, and steps S12 to S17 are repeated. If the predetermined number of learning iterations has been completed, the learning process of Fig. 10 ends. In this embodiment, learning ends once enough weak classifiers have been learned to adequately discriminate the detection target image from given images such as learning samples.

[Generation of weak classifiers]
Next, the learning (generation) method for the weak classifier in step S12 is described. Generation differs depending on whether the weak classifier produces a binary output or outputs a continuous value as the function f(x) of (Formula e) in Fig. A. Even for binary output, the processing differs slightly between discrimination with a single threshold Th, as in (Formula b), and discrimination with two thresholds Th1 and Th2, as in (Formula c) and (Formula d).

Here, the learning (generation) method for a weak classifier that produces a binary output with a single threshold Th is described. Fig. 11 is a flowchart of this method and corresponds to the procedure for determining the weak classifier's parameters.

(Step S21) Selection of pixels
Here, two adjacent or nearby pixels are selected from among all the pixels of a learning sample. For example, when 24 × 24 pixel learning samples are used, one pair is selected at random from the pairs of adjacent or nearby pixels, not from all possible two-pixel pairs.

In this embodiment, such pairs of adjacent or nearby pixels are used because the difference feature, that is, the edge strength, serves as the feature for discrimination. As shown in Fig. 5, with the two selected pixels denoted P1 and P2 and their luminance values I1 and I2, the two pixels P1 and P2 used to obtain the difference feature are chosen within the window WD so as to satisfy, for example, (Formula 6) in Fig. 18. In (Formula 6), (x1, y1) is the position of pixel P1, (x2, y2) is the position of pixel P2, and θ is a threshold.

This selection yields two adjacent or nearby pixels, such as a pixel and one of its 4-neighbors (the pixels directly above, below, left, and right) or one of its 8-neighbors (the 4-neighbors plus the four diagonal pixels).

In step S21, a group of M pairs of adjacent or nearby pixels is selected in advance from all the two-pixel pairs of the learning sample, and one pair is then selected at random from that group. Random selection is used to improve the effectiveness of learning.
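
The candidate pairs of step S21 could be enumerated as in the Python sketch below. (Formula 6) is given only in Fig. 18, so the constraint used here, that both coordinate offsets are at most `max_offset` (1 gives the 8-neighborhood), is an assumption made for illustration, and the function name is hypothetical.

```python
import random

def neighbor_pairs(width, height, max_offset=1):
    """List unordered pixel pairs whose x and y offsets are both
    at most max_offset (assumed stand-in for Formula 6)."""
    pairs = []
    for y1 in range(height):
        for x1 in range(width):
            for dy in range(-max_offset, max_offset + 1):
                for dx in range(-max_offset, max_offset + 1):
                    x2, y2 = x1 + dx, y1 + dy
                    inside = 0 <= x2 < width and 0 <= y2 < height
                    # Keep each unordered pair once; this also skips (0, 0).
                    if inside and (y2, x2) > (y1, x1):
                        pairs.append(((x1, y1), (x2, y2)))
    return pairs

# Pick one pair at random from the M candidates of a 24 x 24 window.
candidates = neighbor_pairs(24, 24)
chosen = random.choice(candidates)
```

Under this assumption a 24 × 24 window has 2162 candidate pairs, far fewer than the 165,600 unordered pairs of arbitrary pixels, which is what keeps the search per weak classifier small.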

Fig. 12 shows example two-pixel combinations used for detecting, for instance, the head-and-shoulders silhouette of a person. In this example the window size is 24 × 24 pixels (simplified to 12 × 12 pixels in the figure), and the black dots mark the two pixels used by each selected weak classifier. Fig. 12 shows the adjacent or nearby pixel pairs used by the top eight weak classifiers; actual ensemble learning uses many more weak classifiers.

For reference, Fig. 13 shows an example of the edge shape of the head and shoulders of a person to be detected. Fig. 13 was obtained by applying edge detection to multiple head images and averaging the results. No edge detection is performed beforehand in the actual learning process, so the figure is for reference only; nevertheless, accumulating the adjacent or nearby pixel pairs selected as described above produces a shape much like this average image.

(Step S22) Creation of the frequency distribution
Next, for all learning samples, the constrained inter-pixel difference feature d is obtained as the difference (I1 - I2) between the luminance values of the two adjacent or nearby pixels selected in step S21, and a histogram (frequency distribution) such as that shown in Fig. 7(A) is obtained.

(Step S23) Calculation of the threshold Th_min
Next, from the frequency distribution obtained in step S22, the threshold Th_min that gives the weighted error rate e_t of (Formula 2) in Fig. 17 its minimum value e_min is found.

(Step S24) Calculation of the threshold Th_max
Next, from the frequency distribution obtained in step S22, the threshold Th_max that gives the weighted error rate e_t of (Formula 2) in Fig. 17 its maximum value e_max is found, and the threshold is inverted by the method of (Formula 7) in Fig. 18 (the error rate after inversion is written e_max' below). The weak classifier outputs one of two values, correct or incorrect, according to whether the feature exceeds the single threshold Th. Therefore, when the weighted error rate e_t exceeds 0.5, inverting the threshold brings the weighted error rate below 0.5.

(Step S25) Parameter determination
Next, from e_min and e_max' above, the parameters of the weak classifier are determined, namely the positions of the two adjacent or nearby pixels P1 and P2 and the threshold Th:
When e_min < e_max': P1, P2, Th_min
When e_min > e_max': P1' (= P2), P2' (= P1), Th_min
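
The idea behind steps S22 to S25 can be sketched in Python as below. This is a simplified stand-in, not the procedure of Fig. 11: it scans candidate thresholds directly instead of building a histogram, it assumes the weak classifier outputs +1 when d > Th, and it records inversion as a `polarity` flag rather than by swapping P1 and P2. It also assumes the data weights sum to 1, so that an inverted rule has error 1 - e.

```python
def best_stump(diffs, labels, weights):
    """Return (error, threshold, polarity) with the smallest weighted error.

    diffs: difference feature d = I1 - I2 for each learning sample.
    polarity +1 predicts +1 when d > threshold; polarity -1 is the
    inverted rule. Weights are assumed to sum to 1.
    """
    values = sorted(set(diffs))
    # Half-step thresholds avoid ties with integer luminance differences.
    thresholds = [values[0] - 0.5] + [v + 0.5 for v in values]
    best = None
    for th in thresholds:
        err = sum(w for d, y, w in zip(diffs, labels, weights)
                  if (1 if d > th else -1) != y)
        for e, polarity in ((err, 1), (1.0 - err, -1)):
            if best is None or e < best[0]:
                best = (e, th, polarity)
    return best
```

A threshold with a very high error is as useful as one with a very low error once it is inverted, which is why both the minimum and the maximum of the error are examined.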

(Step S26) Iteration
In step S26, it is determined whether steps S21 to S25 have been repeated for all M pairs of adjacent or nearby pixels of the learning sample. If they have not yet been repeated for all M pairs, processing returns to step S21 and steps S21 to S26 are repeated. Thus, generating one weak classifier involves m (= 1, 2, ..., M) iterations.

(Step S27) Selection of the weak classifier
When step S26 determines that steps S21 to S25 have been repeated for all M pairs, processing proceeds to step S27. Of the weak classifier candidates generated in the M iterations, the parameters of the candidate with the smallest error rate e_t are adopted as the parameters of the final weak classifier. The processing of Fig. 11 then ends, and processing proceeds to step S13 of Fig. 10.

Here, in step S27, the weak classifier discriminates according to (Formula 8) in Fig. 18. In (Formula 8), k_1 and k_2 are the positions of the two pixels (whose luminance values are I1 and I2), x is the luminance values of the window image, and θ_t is the threshold.
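
A weak classifier of this kind can be sketched in a few lines of Python. (Formula 8) itself is in Fig. 18, so the rule below, output +1 when the luminance difference between the two positions exceeds the threshold and -1 otherwise, is an assumed reading; the function name and the `window[y][x]` layout are hypothetical.

```python
def weak_classify(window, k1, k2, theta):
    """Binary weak classifier on one pixel pair.

    window: 2D list of luminance values indexed as window[y][x].
    k1, k2: (x, y) positions of the two pixels.
    Returns +1 if the luminance difference exceeds theta, else -1.
    """
    (x1, y1), (x2, y2) = k1, k2
    difference = window[y1][x1] - window[y2][x2]
    return 1 if difference > theta else -1
```

Each evaluation costs two memory reads, one subtraction, and one comparison, which is the source of the speed advantage the text attributes to this feature.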

For convenience, the weighted error rate e_t was described as being calculated in step S13 of Fig. 10; in fact, the error rate e_t of step S13 is obtained automatically at the moment step S27 selects the weak classifier with the smallest error rate e_t.

In this embodiment, one weak classifier is generated by learning the features of multiple weak classifiers (weak classifier candidates) using the data weights D_{t,i} obtained in step S15 of the previous iteration and selecting the candidate with the smallest weighted error rate e_t of (Formula 2) in Fig. 17. Alternatively, in step S12, a weak classifier may be generated by selecting an arbitrary pixel position from multiple pixel positions prepared or learned in advance.

A weak classifier may also be generated using learning samples different from those used in the iteration of steps S12 to S17. In addition, as in evaluation by cross-validation or the jackknife method, samples separate from the learning samples may be prepared to evaluate the generated weak classifiers and the determination unit 23.

Cross-validation divides the learning samples evenly into L parts, trains on all but one of them, and evaluates the result on the remaining one; this is repeated L times to evaluate the learning result.
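
The L-fold split described above can be illustrated with the following Python sketch. The function name is hypothetical, and the strided split is just one simple way to obtain parts of nearly equal size.

```python
def cross_validation_folds(samples, num_folds):
    """Yield (training, held_out) lists, one pair per fold.

    Each sample is held out exactly once across the num_folds rounds.
    """
    folds = [samples[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        training = [s for i, fold in enumerate(folds)
                    if i != held_out for s in fold]
        yield training, folds[held_out]
```

In use, the learner would be trained on `training` and scored on `held_out` in each round, and the L scores combined into one evaluation.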

The above applies when the weak classifier has a single threshold Th. When the weak classifier has two thresholds Th1 and Th2, as in (Formula c) or (Formula d), the processing of steps S23 to S25 in Fig. 11 differs slightly.

With a single threshold Th, as in (Formula b), a weighted error rate greater than 0.5 could be inverted by inverting the threshold. With (Formula c), where the discrimination result is correct when the constrained inter-pixel difference feature is greater than the threshold Th2 and less than the threshold Th1, inversion gives (Formula d), where the result is correct when the feature is less than Th2 or greater than Th1. That is, the inverse of (Formula c) is (Formula d), and the inverse of (Formula d) is (Formula c).

When the weak classifier outputs its result using two thresholds Th1 and Th2, the frequency distribution of the constrained inter-pixel difference feature is obtained in step S22 of Fig. 11, and the thresholds Th1 and Th2 that minimize the weighted error rate e_t are found. Then, after step S26 determines that the processing has been repeated the predetermined M times, step S27 adopts the generated weak classifier with the smallest weighted error rate e_t.

For a weak classifier that outputs a continuous value rather than a binary value, as in (Formula e), two pixels are first selected at random from the group of adjacent or nearby pixel pairs, as in step S21 of Fig. 11. The frequency distribution over all learning samples is then obtained as in step S22, and the function f(x) of (Formula e) is derived from it.

The error rate is then calculated according to a predetermined learning algorithm in which the weak classifier outputs the degree to which the input is a detection target image (the degree of correctness). This series of steps is repeated the predetermined M times, and the weak classifier is generated by selecting the parameters with the smallest error rate (the highest correct-answer rate).

If the processing is repeated the maximum possible number of times as described above, that is, if the maximum number of weak discriminators that can be generated is generated and the one with the smallest error rate is adopted, a high-performance weak discriminator is obtained. However, the processing may instead be repeated fewer than the maximum number of times, for example several hundred times, and the candidate with the smallest error rate among those may be adopted.

In the above description, the window WD is rectangular, for example 24 × 24 pixels as shown in FIG. 14(A). As noted earlier, however, the shape and size of the window WD are not limited to this; for example, the window may be shaped to match the contour features of the detection target image. When the human figure formed by a person's head and shoulders is the detection target, as described above, the black-filled area in FIG. 14(B) may be treated as a mask area that is not used, and adjacent or neighboring pixel pairs may be extracted only from the remaining valid area and used for learning.
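One way to enumerate candidate pixel pairs from the valid area only is sketched below. The mask representation (a list of rows of booleans, True for usable pixels), the `max_offset` parameter that defines "adjacent or neighboring", and the function name are all assumptions for illustration.

```python
def neighbor_pairs(mask, max_offset=1):
    """List pairs ((y1, x1), (y2, x2)) of unmasked pixels that lie within
    max_offset of each other in both directions. Each pair is listed once."""
    h, w = len(mask), len(mask[0])
    pairs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # pixel lies in the masked (unused) area
            for dy in range(0, max_offset + 1):
                for dx in range(-max_offset, max_offset + 1):
                    if dy == 0 and dx <= 0:
                        continue  # skip the pixel itself and mirrored pairs
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < h and 0 <= x2 < w and mask[y2][x2]:
                        pairs.append(((y, x), (y2, x2)))
    return pairs


# A 2 x 2 window with and without one masked pixel.
full = neighbor_pairs([[True, True], [True, True]])
masked = neighbor_pairs([[True, False], [True, True]])
```

During learning, the random selection of two pixels would then draw from this list rather than from all pixel pairs in the window.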

[Detection determination method for the detection target image]
Next, an embodiment of the detection target image detection method in the detection target image detection apparatus shown in FIG. 1 will be described. FIG. 15 is a flowchart showing this embodiment, and it describes the processing in more detail than the flowchart shown in FIG. 2.

At detection time (the discrimination step), the determination unit 23, which uses the weak discriminators 201 to 20T generated as described above, detects detection target images in the input image from the input image supply device unit according to a predetermined algorithm. The example of FIG. 15 assumes that the input image from the input image supply device unit is the scaled image of the largest size.

(Step S31) Scaled image generation
First, in the detection target image determination device unit 2 shown in FIG. 1, the scaling unit 21 reduces the grayscale image (input image) supplied from the input image supply device unit 1 at a constant ratio. The input image supply device unit 1 may pass an image that was input as a grayscale image directly to the detection target image determination device unit 2, or it may convert the input image to a grayscale image and then output it to the detection target image determination device unit 2.

The scaling unit 21 initially outputs the image supplied from the input image supply device unit 1 without scale conversion, and from the next timing onward outputs reduced scaled images. The next scaled image is generated when the detection determination for the detection target image has been completed over the entire area of the previously output scaled image.

In this example, when the scaled image becomes smaller than the window image, the detection determination processing for that input image is regarded as complete, and processing moves on to the next input image (for a moving image, the next frame). Note that FIG. 15 shows the detection determination processing for a single input image; the processing of FIG. 15 is performed for each input image.
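The sequence of scaled image sizes can be sketched as follows. The reduction ratio, the rounding by truncation, and the function name are assumptions; the text specifies only a constant ratio, an unscaled first image, and termination once the scaled image is smaller than the window.

```python
def pyramid_sizes(width, height, window=24, ratio=0.8):
    """Return the (width, height) of each scaled image, starting with the
    unscaled input and stopping once either side is smaller than the window."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be between 0 and 1 for a reduction")
    sizes = []
    w, h = width, height
    while w >= window and h >= window:
        sizes.append((w, h))
        w, h = int(w * ratio), int(h * ratio)  # constant-ratio reduction
    return sizes


sizes = pyramid_sizes(48, 48, window=24, ratio=0.5)
```

With these hypothetical values the 48 × 48 input and its 24 × 24 reduction are processed, and the 12 × 12 reduction ends the loop.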

(Step S32)
The scanning unit 22 receives the scaled image from the scaling unit 21, scans the position of the window WD vertically and horizontally over that scaled image, and outputs the window image at each scanning position to the determination unit 23.
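The scanning of step S32 can be sketched as a generator over window positions. The image representation (a list of rows of luminance values), the `step` parameter, and the function name are assumptions for illustration.

```python
def scan_windows(image, window=24, step=1):
    """Yield (top, left, window_image) for each window position, scanning
    left to right and then top to bottom."""
    h, w = len(image), len(image[0])
    for top in range(0, h - window + 1, step):
        for left in range(0, w - window + 1, step):
            rows = image[top:top + window]
            yield top, left, [row[left:left + window] for row in rows]


# A tiny 3 x 4 "image" scanned with a 2 x 2 window.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
windows = list(scan_windows(image, window=2))
```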

(Steps S33, S34) Calculation of the evaluation value s
The determination unit 23 determines whether the window image output from the scanning unit 22 is a detection target image. For each window image, the determination unit 23 successively performs weighted addition of the estimated values f_t(x) of the weak discriminators 20t (201 to 20T) described above, and takes the running weighted sum (the updated value of the weighted majority decision) as the evaluation value s. Based on this evaluation value s, it is determined whether the window image is a detection target image and whether discrimination should be aborted.

In practice, the determination unit 23 outputs the evaluation value s to the processing control unit 20, and the processing control unit 20 determines whether the window image is a detection target image and whether to abort discrimination.

When a window image is input, the determination unit 23 first initializes the evaluation value to s = 0. The first-stage weak discriminator 201 of the determination unit 23 calculates the constrained inter-pixel difference feature quantity d (step S33). The estimated value output by the weak discriminator 201 is then reflected in the evaluation value s (step S34).

Here, the way the estimated value is reflected in the evaluation value s differs between a weak discriminator that outputs a binary estimated value according to (Formula b), (Formula c), or (Formula d) above and a weak discriminator that outputs the function f(x) shown in (Formula e) as its estimated value.

First, when (Formula b) is used for the weak discriminator 20t and a binary value is output as the estimated value, the evaluation value s is given by (Formula 9) in FIG. 18.

When (Formula c) is used for the weak discriminator 20t and a binary value is output as the estimated value, the evaluation value s is given by (Formula 10) in FIG. 18.

When (Formula d) is used for the weak discriminator 20t and a binary value is output as the estimated value, the evaluation value s is given by (Formula 11) in FIG. 18.

When (Formula e) is used for the weak discriminator 20t and the function f is output as the estimated value, the evaluation value s is given by (Formula 12) in FIG. 18.
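The four update rules can be sketched as a single function. (Formula 9) to (Formula 12) appear only in FIG. 18 and are not reproduced in the text, so everything here is an assumed form: the binary cases add the weak discriminator's reliability (called `alpha` here, a hypothetical name) multiplied by a +1/-1 vote, and the continuous case adds f(d) directly. The direction of the single-threshold comparison is also an assumption.

```python
def update_score(s, d, kind, alpha=1.0, th1=0.0, th2=0.0, f=None):
    """Return the evaluation value s after reflecting one weak discriminator.

    kind "b": one threshold th1            (assumed: vote +1 if d > th1)
    kind "c": inside the band th2 < d < th1
    kind "d": outside the band
    kind "e": continuous output f(d)       (assumed: added without extra weight)
    """
    if kind == "b":
        vote = 1 if d > th1 else -1
    elif kind == "c":
        vote = 1 if th2 < d < th1 else -1
    elif kind == "d":
        vote = 1 if (d < th2 or d > th1) else -1
    elif kind == "e":
        return s + f(d)
    else:
        raise ValueError(f"unknown weak discriminator kind: {kind}")
    return s + alpha * vote


s = 0.0
s = update_score(s, d=5, kind="b", alpha=0.5, th1=3)            # vote +1
s = update_score(s, d=5, kind="c", alpha=0.25, th1=4, th2=0)    # vote -1
s = update_score(s, d=5, kind="d", alpha=0.25, th1=4, th2=0)    # vote +1
s_e = update_score(0.0, d=5, kind="e", f=lambda d: d / 10)
```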

(Steps S35, S36, S37, S38) Detection determination and abort determination
The determination unit 23 (or the processing control unit 20) then determines whether the evaluation value s obtained (updated) by one of the methods described above, for example one of the four, is larger than the abort threshold R_t (step S35). If the evaluation value s is determined in step S35 to be larger than the abort threshold R_t, it is determined whether the processing has been repeated the predetermined number of times (= T times) (step S36). If it has not been repeated T times, the process returns to step S33, and the processing from step S33 to step S36 is repeated.

If it is determined in step S36 that the processing has been repeated the predetermined number of times (= T times), the determination unit 23 determines whether the window image is a detection target image according to whether the resulting evaluation value s is larger than 0. If the window image is determined to be a detection target image, the processing control unit 20 stores the current scaled image size and window position (step S37). After step S37, the process proceeds to step S38.

The process also proceeds to step S38 when the evaluation value s is determined in step S35 to be smaller than the abort threshold R_t. In step S38, the processing control unit 20 determines whether there is a next search window; if there is, the process returns to step S32 and the processing from step S32 is repeated.
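Steps S33 to S37 for a single window can be sketched as the loop below. The tuple layout of a weak discriminator, the single-threshold rule, the name `alpha` for the reliability, and the treatment of s equal to R_t as an abort (because step S35 continues only when s is larger than R_t) are assumptions for illustration.

```python
def evaluate_window(window, weak_discriminators, abort_thresholds):
    """Return (is_target, s, rounds_used) for one window image.

    Each weak discriminator is a hypothetical tuple (p1, p2, th, alpha):
    two pixel positions (y, x), a threshold, and a reliability weight.
    """
    s = 0.0
    rounds = 0
    for (p1, p2, th, alpha), r_t in zip(weak_discriminators, abort_thresholds):
        rounds += 1
        d = window[p1[0]][p1[1]] - window[p2[0]][p2[1]]  # step S33: luminance difference
        s += alpha if d > th else -alpha                 # step S34: update evaluation value
        if not s > r_t:                                  # step S35: abort unless s > R_t
            return False, s, rounds
    return s > 0, s, rounds                              # after T rounds: sign of s decides


window = [[10, 0],
          [0, 0]]
discriminators = [((0, 0), (0, 1), 5, 1.0),
                  ((0, 1), (0, 0), 5, 0.5)]
accepted = evaluate_window(window, discriminators, [-2.0, -2.0])
aborted = evaluate_window(window, discriminators, [2.0, 2.0])
```

With the loose abort thresholds both weak discriminators are evaluated; with the strict ones the window is rejected after the first round.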

If it is determined in step S38 that there is no next search window, the processing control unit 20 proceeds to step S39 and determines whether there is a next scaled image; if there is, the process returns to step S31 and the processing from step S31 is repeated. As described above, the scaling processing of step S31 ends when the scaled image becomes smaller than the window image.

(Steps S40 to S42) Removal of overlapping areas
When it is determined in step S39 that there is no next scaled image, the window image areas determined to be detection target images are checked for overlap, and if overlapping areas exist, they are removed.

That is, when it is determined in step S39 that all scaled images for one input image have been processed, it is determined whether any of the detected detection target image areas (window image areas) overlap one another (step S40).

If it is determined in step S40 that overlapping areas exist, two mutually overlapping window areas are taken out (step S41). Of these two, the area with the smaller evaluation value s is regarded as less reliable and is deleted, and the area with the larger evaluation value s is selected as the true detection target image area (step S42).

The process then returns from step S42 to step S40, and the processing from step S40 to step S42 is repeated until no overlapping areas remain. When it is determined in step S40 that no overlapping areas remain, the processing routine of FIG. 15 ends. As a result, even if multiple overlapping areas are detected, only the single area with the highest evaluation value s is selected.
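The loop of steps S40 to S42 can be sketched as follows. The rectangle representation (left, top, right, bottom, score), the assumption that all detections have already been mapped into the coordinates of the original input image, and the tie-breaking rule for equal scores are not taken from the text.

```python
def overlaps(a, b):
    """True when two rectangles (left, top, right, bottom, score) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def remove_overlaps(detections):
    """Repeatedly take one overlapping pair and delete the lower-scoring
    member until no two remaining detections overlap."""
    dets = list(detections)
    while True:
        pair = next(((i, j)
                     for i in range(len(dets))
                     for j in range(i + 1, len(dets))
                     if overlaps(dets[i], dets[j])), None)   # steps S40, S41
        if pair is None:
            return dets
        i, j = pair
        # Step S42: drop the area with the smaller evaluation value
        # (on a tie, the later one is dropped; this is an arbitrary choice).
        del dets[i if dets[i][4] < dets[j][4] else j]


detections = [(0, 0, 24, 24, 1.5),
              (10, 10, 34, 34, 3.0),
              (100, 100, 124, 124, 0.2)]
kept = remove_overlaps(detections)
```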

As described above, the detection target image detection method of this embodiment detects the detection target image using a determination unit whose weak discriminators, each of which discriminates weakly on the basis of the constrained inter-pixel difference feature quantity, have been trained by ensemble learning. The feature calculation in step S33 therefore consists only of reading the luminance values of the two corresponding pixels in the window image and computing their difference. Because detection can be performed very quickly, real-time detection of human figures and the like is possible.

In addition, each time the evaluation value s is updated by adding the product of the discrimination result (estimated value) obtained from the constrained inter-pixel difference feature quantity and the reliability of the weak discriminator used, it is compared with the abort threshold R_t to decide whether to continue computing the estimated values of the weak discriminators. When the evaluation value s falls below the abort threshold R_t, the computation of the weak discriminators is aborted and processing moves on to the next window image. This drastically reduces wasted computation and makes detection even faster.

That is, when window images are cut out by scanning all areas of the input image and of the scaled images obtained by reducing it, only a small fraction of those window images are detection target images; most are non-target images. Aborting the discrimination of these non-target window images partway through makes the discrimination step far more efficient.

Conversely, when the input contains many detection target images, a threshold may also be provided, in the same manner as the abort threshold described above, to abort the computation partway through for window images that are clearly detection target images. Furthermore, because the scaling unit scales the input image to various sizes, using a fixed-size window is in effect equivalent to setting search windows of arbitrary size, so detection target images of any size can be detected.

When, for example, a face is the detection target image and the eyes, nose, mouth, and so on are to be detected and discriminated in detail, this can be done by computing the inter-pixel difference feature quantity for every combination of two pixels in the window WD. However, the amount of computation grows accordingly, because the difference feature quantities for all pixel pairs must be computed.

In contrast, this embodiment does not compute the difference feature quantities for all pixel pairs; it uses the constrained inter-pixel difference feature quantity described above, which uses only adjacent or neighboring pixel pairs. This is particularly suitable and effective for detecting human figures, where details such as the eyes, nose, and mouth need not be detected. The contour from a person's head to the shoulders generally has a characteristic Ω shape, and it looks like an Ω regardless of the viewing direction, so detection and discrimination can be performed more efficiently and effectively than by using the feature quantities between all pixel pairs.

That is, according to this embodiment, the detection target object can be detected from the similarity of its rough contour rather than from a detailed pattern comparison such as conventional image matching, so detection can absorb differences between individuals, viewing directions, and the like.

As mentioned above, the contour from a person's head to the shoulders generally has a characteristic Ω shape. However, representing that Ω shape with one specific shape template and using it to detect people is difficult because of individual differences in features such as body shape and hairstyle. The method of this embodiment, by contrast, uses the constrained inter-pixel difference feature quantity, which represents contour shape, as the discrimination feature, and is therefore well suited to detecting Ω-shaped figures such as people.

When the Ω shape is used as the feature, for example, detection does not depend on facial expression, on whether glasses or a mask are worn, or on the orientation of the face. Detection is also possible from directions that are impossible for face detection, such as from behind the head.

Furthermore, the ensemble-learning detection determination device of this embodiment, which uses the constrained inter-pixel difference feature quantity, is a determination device that combines many weak discriminators, so it detects more robustly against individual differences in contour shape than a method consisting of a single discriminator. In addition, because the template is in effect built by learning from actual human figures, detection is more robust than with approaches that use an arbitrarily defined specific shape template.

FIG. 1 is a functional block diagram showing the processing functions in an embodiment of the detection target image detection apparatus according to this invention.
FIG. 2 is a flowchart for explaining the detection determination processing operation for the detection target image in the embodiment of FIG. 1.
FIG. 3 is a schematic diagram showing images scale-converted by the scaling unit 21 in the embodiment of FIG. 1.
FIG. 4 is a diagram showing how the scanning unit 22 in the embodiment of FIG. 1 scans the search window.
FIG. 5 is a schematic diagram showing an image for explaining the constrained inter-pixel difference feature quantity.
FIG. 6 is a diagram showing a configuration example of the determination unit 23 in the embodiment of FIG. 1.
FIG. 7 is a diagram for explaining the relationship between a detection determination method for the detection target image in the embodiment of FIG. 1 and the threshold.
FIG. 8 is a diagram for explaining the relationship between another detection determination method for the detection target image in the embodiment of FIG. 1 and the thresholds.
FIG. 9 is a diagram showing how the weighted majority value F(x) changes depending on whether the input image is a detection target image, with the number of weak discriminators on the horizontal axis and the weighted majority value F(x) on the vertical axis.
FIG. 10 is a flowchart showing an example of the learning method of the ensemble learning machine for obtaining the plurality of weak discriminators constituting the determination unit 23 in the embodiment of FIG. 1.
FIG. 11 is a flowchart showing an example of the learning method (generation method) of a weak discriminator that produces a binary output with one threshold Th.
FIG. 12 is a diagram showing examples of the pixel pairs used by the weak discriminators generated by the learning method of FIG. 11.
FIG. 13 is a diagram for explaining an example of the learning samples used in the embodiment of this invention.
FIG. 14 is a diagram for explaining another example of the window used by the scanning unit 22 in the embodiment of this invention.
FIG. 15 is a flowchart showing an example of the detection target image detection method in the embodiment of this invention.
FIG. 16 is a diagram showing formulas used in the description of the embodiment of this invention.
FIG. 17 is a diagram showing formulas used in the description of the embodiment of this invention.
FIG. 18 is a diagram showing formulas used in the description of the embodiment of this invention.

Explanation of symbols

1: input image providing device unit, 2: detection target image detection device unit, 3: result output device unit, 20: processing control unit, 21: scaling unit, 22: scanning unit, 23: determination unit, 24: ensemble learning machine unit, 201 to 20T: weak discriminators, 210: adder

Claims

A detection target image determination device that determines whether a given grayscale image is a detection target image, comprising:
a plurality of weak discriminating means, each provided for one of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the grayscale image, each obtaining the difference between the luminance values of the two pixels of its pair as a feature quantity and calculating, based on the feature quantity, an estimated value indicating whether the pair of pixels is a contour portion of the detection target image; and
determination means for determining whether the given grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discriminating means.

The detection target image determination device according to claim 1, further comprising:
weighting means for multiplying the estimated value from each of the plurality of weak discriminating means by a weighting coefficient obtained by the learning,
wherein the determination means determines whether the given grayscale image is the detection target image based on the weighted estimated values from the weighting means.

A detection target image determination device that detects and determines a detection target image in a grayscale image, comprising:
image reduction means for reducing the grayscale image to generate a plurality of reduced images of different sizes;
scanning means for scanning each of the plurality of reduced images of different sizes from the image reduction means in units of a fixed-size window;
a plurality of weak discriminating means, each provided for one of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the window-unit grayscale image obtained from the scanning means, each obtaining the difference between the luminance values of the two pixels of its pair as a feature quantity and calculating, based on the feature quantity, an estimated value indicating whether the pair of pixels is a contour portion of the detection target image; and
determination means for determining whether the window-unit grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discriminating means.

The detection target image determination device according to claim 3, further comprising:
weighting means for multiplying the estimated value from each of the plurality of weak discriminating means by a weighting coefficient obtained by the learning,
wherein the determination means determines whether the given grayscale image is the detection target image based on the weighted estimated values from the weighting means.

A detection target image determination method for determining whether a given grayscale image is a detection target image, comprising:
a weak discrimination step, performed for each of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the grayscale image, of obtaining the difference between the luminance values of the two pixels of the pair as a feature quantity and calculating, for each of the plurality of pairs, an estimated value indicating, based on the feature quantity, whether the pair of pixels is a contour portion of the detection target image; and
a determination step of determining whether the given grayscale image is the detection target image based on the plurality of estimated values calculated in the weak discrimination step.

The detection target image determination method according to claim 5, further comprising:
a weighting step of multiplying each of the plurality of estimated values calculated in the weak discrimination step by a weighting coefficient obtained by the learning,
wherein, in the determination step, it is determined whether the window-unit grayscale image is the detection target image based on the estimated values weighted in the weighting step.

A detection target image determination method for detecting and determining a detection target image in a grayscale image, comprising:
an image reduction step of reducing the grayscale image to generate a plurality of reduced images of different sizes;
a scanning step of scanning each of the reduced images of different sizes generated in the image reduction step in units of a fixed-size window;
a weak discrimination step, performed for each of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the window-unit grayscale image obtained in the scanning step, of obtaining the difference between the luminance values of the two pixels of the pair as a feature quantity and calculating, for each of the plurality of pairs, an estimated value indicating, based on the feature quantity, whether the pair of pixels is a contour portion of the detection target image; and
a determination step of determining whether the window-unit grayscale image is the detection target image based on the plurality of estimated values calculated in the weak discrimination step.

The detection target image determination method according to claim 7, further comprising:
a weighting step of multiplying each of the plurality of estimated values calculated in the weak discrimination step by a weighting coefficient obtained by the learning,
wherein, in the determination step, it is determined whether the window-unit grayscale image is the detection target image based on the estimated values weighted in the weighting step.

A detection target image determination program that, in order to determine whether a given grayscale image is a detection target image, functions as:
a plurality of weak discriminating means, each provided for one of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the grayscale image, each obtaining the difference between the luminance values of the two pixels of its pair as a feature quantity and calculating, based on the feature quantity, an estimated value indicating whether the pair of pixels is a contour portion of the detection target image; and
determination means for determining whether the given grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discriminating means.

A detection target image determination program that, in order to detect and determine a detection target image in a grayscale image, functions as:
image reduction means for reducing the grayscale image to generate a plurality of reduced images of different sizes;
scanning means for scanning each of the plurality of reduced images of different sizes from the image reduction means in units of a fixed-size window;
a plurality of weak discriminating means, each provided for one of a plurality of pairs of pixels at two adjacent or neighboring positions determined in advance by learning from among the pixels constituting the window-unit grayscale image obtained from the scanning means, each obtaining the difference between the luminance values of the two pixels of its pair as a feature quantity and calculating, based on the feature quantity, an estimated value indicating whether the pair of pixels is a contour portion of the detection target image; and
determination means for determining whether the window-unit grayscale image is the detection target image based on the estimated values calculated by the plurality of weak discriminating means.