JP5335554B2

JP5335554B2 - Image processing apparatus and image processing method

Info

Publication number: JP5335554B2
Application number: JP2009121320A
Authority: JP
Inventors: 直嗣佐川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-05-19
Filing date: 2009-05-19
Publication date: 2013-11-06
Anticipated expiration: 2029-05-19
Also published as: JP2010271792A

Description

The present invention relates to an image processing apparatus, an image processing method, and the like suitable for detecting a specific subject, or a part of a subject, from still images and moving images in digital imaging equipment, image processing software, and the like.

Techniques for automatically detecting a specific subject pattern in an image can be applied to various fields such as image retrieval, object detection, object recognition, and object tracking. One example of such a technique is proposed in Non-Patent Document 1. In this technique, a small rectangular region (hereinafter referred to as a detection window) is first extracted from the input image, and it is determined whether the detection window contains a face. For this determination, the detection window is passed through a detector configured by connecting a plurality of strong classifiers in a cascade. If every strong classifier determines that the window is a face, the detector outputs that the detection window contains a face; otherwise it outputs that the detection window contains no face. Each strong classifier includes a plurality of weak classifiers and the like.

However, to actually detect faces or persons with high accuracy using the conventional technique described in Non-Patent Document 1, hundreds to thousands of weak classifiers are required, so the number of discrimination operations performed by the weak classifiers becomes very large. As a result, the processing time becomes long.

Meanwhile, to cut out (extract) the detection windows that are input to the detector, the input image is first resized to a reference image, and the reference image is then converted into a luminance image or the like. Reduced images are generated in several steps from the size of the luminance image or the like, and detection windows are cut out by raster scanning so that the detection window scans every region of these reduced images. The number of detection windows input to the detector is therefore enormous.

The reference image is resized in several steps to generate reduced images, and the detection window is raster-scanned over these reduced images, for the following reason. Subjects appear at various sizes in the reference image, but the detection window that is cut out has one specific size. Consequently, a subject that occupies a region larger than the detection window in the reference image cannot be detected. Reduced images are therefore generated by reducing the reference image in several steps, and the detection window is raster-scanned over all of these reduced images so that subjects of various sizes can be detected.

Thus, in the technique described in Non-Patent Document 1, discrimination processing must be performed on an enormous number of detection windows cut out from the input image, using a detector composed of hundreds to thousands of weak classifiers, so the processing cost (processing time) is very high.

One conceivable way to shorten the processing time of such a technique is simply to thin out pixels when raster scanning the detection window. With this method, however, positions at which a characteristic pattern would have been detected may be skipped, so detection omissions increase and detection accuracy falls.

To address these problems, Patent Document 1 describes a technique in which, in return for thinning out pixels in the raster scan of the detection window, subject patterns whose positions are shifted are learned in addition to the normal subject patterns. In this technique, all of these learning results are held as dictionary data.

However, the positional shift caused by pixel thinning can occur in at least four directions (up, down, left, and right), so learning must be performed for each of these shifts. As a result, the dictionary becomes larger than a normal dictionary, which creates a different problem.

Patent Document 1: JP 2007-58722 A

Non-Patent Document 1: P. Viola and M. Jones, "Robust Real-time Object Detection", Second International Workshop on Statistical and Computational Theories of Vision, July 13, 2001

An object of the present invention is to provide an image processing apparatus, an image processing method, and the like that can detect a subject with high accuracy and in a short time while suppressing an increase in dictionary size.

As a result of intensive studies to solve the above problems, the present inventor has arrived at the aspects of the invention described below.

An image processing apparatus according to the present invention includes: dictionary storage means that stores dictionary information including information on the position and shape of a local region used to determine a predetermined subject; position control means for controlling the position of a detection window in an image based on the dictionary information; deciding means for deciding a local region in the detection window based on the dictionary information; local feature amount calculation means for calculating a feature amount in the decided local region; and determination means for determining, based on the feature amount and the dictionary information, whether the predetermined subject is included in the detection window. The deciding means decides the position of the local region for which the feature amount is to be calculated such that the horizontal scanning interval becomes wider as the horizontal width of the local region becomes wider, or such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

An image processing method according to the present invention includes: a position control step of controlling the position of a detection window in an image based on dictionary information that is stored in dictionary storage means and includes information on the position and shape of a local region used to determine a predetermined subject; a deciding step of deciding a local region in the detection window based on the dictionary information; a local feature amount calculation step of calculating a feature amount in the decided local region; and a determination step of determining, based on the feature amount and the dictionary information, whether the predetermined subject is included in the detection window. In the deciding step, the position of the local region for which the feature amount is to be calculated is decided such that the horizontal scanning interval becomes wider as the horizontal width of the local region becomes wider, or such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

According to the present invention, when the detection window is scanned to determine whether a subject is present, the local region for which the feature amount is calculated is decided according to its shape, so the determination can be made with high accuracy and in a short time.

FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the cascade detector used in the first embodiment.
FIG. 3 is a diagram showing the configuration of the strong classifier used in the first embodiment.
FIG. 4 is a diagram showing the process of extracting a detection window (partial region) of a predetermined size from a reduced image.
FIG. 5 is a flowchart showing the operation of the image processing apparatus according to the first embodiment.
FIG. 6 is a flowchart showing the person determination processing of step S505.
FIG. 7 is a diagram showing a specific example of the relationship between strong classifiers and weak classifiers.
FIG. 8 is a diagram showing an example of the data structure.
FIG. 9 is a diagram showing the basic principle of the subject likelihood calculation processing in the embodiments of the present invention.
FIG. 10 is a diagram showing the basic principle of the subject likelihood calculation processing in the embodiments of the present invention.
FIG. 11 is a flowchart showing the subject likelihood calculation processing of step S603 in the first embodiment.
FIG. 12 is a flowchart showing the local feature amount calculation processing of step S1104.
FIG. 13 is a diagram showing the relationship between the gradient strength and gradient direction of the luminance value at a pixel of interest (u, v).
FIG. 14 is a diagram showing an example of a gradient direction histogram.
FIG. 15 is a diagram showing an example of a block region.
FIG. 16 is a flowchart showing the subject likelihood calculation processing of step S603 in a second embodiment.
FIG. 17 is a diagram showing an example of the correspondence table for a detection window of 60 × 120.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

(First Embodiment)
First, a first embodiment of the present invention will be described. FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to the first embodiment of the present invention.

As shown in FIG. 1, this image processing apparatus includes an image input unit 101, an image storage unit 102, a reduced image generation unit 103, a reduced image setting unit 104, a local feature amount calculation unit 105, a position control unit 106, a dictionary storage unit 107, a determination unit 108, and a determination result storage unit 109. A control unit (not shown) that controls the operation of these units is also provided.

The image input unit 101 inputs the image that is the target of detection. The image storage unit 102 stores the image input by the image input unit 101 (the input image). The reduced image generation unit 103 generates reduced images in several steps from the input image. The reduced image setting unit 104 selects one of these reduced images. The dictionary storage unit 107 stores dictionary data (dictionary information) learned in advance. The position control unit 106 controls the position of the detection window in the reduced image based on the dictionary data. The local feature amount calculation unit 105 calculates the feature amount of a local region in the detection window (the local feature amount). The determination unit 108 determines, based on the local feature amount, whether a person is present in the detection window. The determination result storage unit 109 stores the position of the person, which is the output of the determination unit 108.

Next, the operation by which the image processing apparatus configured as described above detects the position of a person, who is the subject, in an input image will be described. FIG. 5 is a flowchart showing the operation of the image processing apparatus according to the first embodiment.

First, in step S501, an image is input through the image input unit 101, and the image storage unit 102 reads it in. The image data read in is, for example, two-dimensional array data composed of 8-bit pixels and consists of three planes: R, G, and B.

Next, in step S502, the reduced image generation unit 103 generates image data by reducing the image data by predetermined scale factors. This is because, in this embodiment, detection is performed sequentially on image data of several sizes in order to handle persons of various sizes. For example, reduction is applied successively to produce a plurality of images whose scale factors differ from one another by about 1.2 times, for use in the subsequent detection processing.
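The reduction in step S502 can be sketched as a list of image sizes. This is a minimal sketch, not the patented implementation: the function name `pyramid_sizes`, the rounding, and the rule of stopping once the image is smaller than a 60 × 120 detection window are assumptions made for illustration. Only the factor of about 1.2 comes from the text.

```python
def pyramid_sizes(width, height, scale=1.2, min_width=60, min_height=120):
    """Return (width, height) of each reduced image, largest first.

    Assumption: reduction stops when the image can no longer hold one
    detection window of min_width x min_height pixels.
    """
    sizes = []
    level = 0
    while True:
        factor = scale ** level          # 1.0, 1.2, 1.44, ...
        w = round(width / factor)
        h = round(height / factor)
        if w < min_width or h < min_height:
            break
        sizes.append((w, h))
        level += 1
    return sizes


if __name__ == "__main__":
    for size in pyramid_sizes(120, 240):
        print(size)
```

Each additional level lets the fixed-size window cover a person about 1.2 times larger in the original image.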

Then, in step S503, the reduced image setting unit 104 selects one of the reduced images of different sizes generated by the reduced image generation unit 103.

Subsequently, in step S504, the position control unit 106 extracts a detection window (partial region) of a predetermined size from the reduced image. This extraction is described with reference to FIG. 4. First, the input image 401 is resized to a reference image 402, and the reference image 402 is then converted into an image of the predetermined format used in the later determination processing (step S505), for example a luminance image 403. Reduced images 404 are generated in several steps from the size of the luminance image 403, and detection windows (partial regions) 405 are cut out by raster scanning so that the detection window scans every region of these reduced images 404. Therefore, when a detection window is cut out of a heavily reduced image and used for person discrimination, a person who is large relative to the image is detected.
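The raster scan of the detection window can be sketched as a generator of window positions. This is an illustrative sketch only: `window_positions` and its parameters are hypothetical names, and the 60 × 120 default window size is taken from the example size mentioned in the figure descriptions.

```python
def window_positions(image_w, image_h, win_w=60, win_h=120, step=1):
    """Yield the upper-left (x, y) of every detection window in raster order.

    Rows are scanned top to bottom, and each row left to right.
    """
    for y in range(0, image_h - win_h + 1, step):
        for x in range(0, image_w - win_w + 1, step):
            yield x, y


if __name__ == "__main__":
    # Number of windows in a single 320 x 240 reduced image at 1-pixel steps.
    print(sum(1 for _ in window_positions(320, 240)))
```

With a one-pixel step, a single 320 × 240 reduced image already yields 261 × 121 = 31,581 windows, which illustrates why the number of windows passed to the detector is so large.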

Next, in step S505, the local feature amount calculation unit 105 and the determination unit 108 determine whether a person is included in the detection window. Details of this determination processing are described later.

Then, in step S506, the control unit determines whether the detection window has scanned every position in the image. If every position has been scanned, the process proceeds to step S507. Otherwise, the process returns to step S504, and steps S504 to S506 are repeated until all positions have been scanned.

In step S507, it is determined whether processing has been completed for all of the reduced images generated by the reduced image generation unit 103 in step S502. If so, the process proceeds to step S508. Otherwise, the process returns to step S503, and steps S503 to S507 are repeated for the next reduced image.

In step S508, the control unit outputs the positions of the detected persons to the determination result storage unit 109, which stores them.
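The control flow of steps S503 to S508 can be sketched as two nested loops. This is a simplified sketch under stated assumptions: `detect` and the callback `contains_person` are hypothetical names standing in for the processing of step S505, and recording the pyramid level together with the position is an assumption made so that results from different reduced images can be told apart.

```python
def detect(level_sizes, contains_person, win_w=60, win_h=120):
    """Scan every reduced image and collect positions judged to contain a person.

    level_sizes: list of (width, height), one per reduced image.
    contains_person: callable (level, x, y) -> bool, standing in for S505.
    """
    results = []
    for level, (w, h) in enumerate(level_sizes):    # S503 / S507: next reduced image
        for y in range(h - win_h + 1):              # S504 / S506: next window position
            for x in range(w - win_w + 1):
                if contains_person(level, x, y):    # S505: person determination
                    results.append((level, x, y))
    return results                                  # S508: stored as the result
```

For example, `detect([(61, 120), (60, 120)], lambda level, x, y: x == 0)` visits three windows in total and reports the two whose x coordinate is 0.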

[Person Determination Processing (Step S505)]
Next, the person determination processing of step S505 is described in detail. This processing uses the cascade detector shown in FIG. 2. FIG. 2 shows the configuration of the cascade detector used in the first embodiment. As shown in FIG. 2, the cascade detector is configured by cascade-connecting N strong classifiers 20-1 to 20-N. Each strong classifier has the configuration shown in FIG. 3. FIG. 3 shows the configuration of the strong classifier used in the first embodiment. As shown in FIG. 3, a strong classifier includes M weak classifiers 30-1 to 30-M, an adder 311, and a threshold processing unit 312. The weak classifiers 30-1 to 30-M each output 0 or 1. The adder 311 multiplies the signal output from each of the weak classifiers 30-1 to 30-M by the weight preset for that weak classifier, adds the products together, and outputs the sum. The threshold processing unit 312 outputs a discrimination result based on the value output from the adder 311 and a preset threshold. The number M of weak classifiers in a strong classifier need not be uniform and may differ from one strong classifier to another.
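The strong classifier of FIG. 3 can be sketched as a weighted sum followed by a threshold test. This is a minimal sketch: `strong_classify` is a hypothetical name, the weights and threshold in the example are made-up values, and the use of "greater than or equal to" follows the Sn ≧ Tn condition stated later for step S606.

```python
def strong_classify(weak_outputs, weights, threshold):
    """Combine weak classifier outputs as in FIG. 3.

    weak_outputs: 0 or 1 from each weak classifier 30-1 .. 30-M.
    weights: the weight preset for each weak classifier.
    """
    # Adder 311: weighted sum of the weak classifier outputs.
    total = sum(w * h for w, h in zip(weights, weak_outputs))
    # Threshold processing unit 312: compare with the preset threshold.
    return total >= threshold


if __name__ == "__main__":
    print(strong_classify([1, 0, 1], [0.5, 0.3, 0.4], 0.8))
    print(strong_classify([0, 1, 0], [0.5, 0.3, 0.4], 0.8))
```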

In step S505, the following processing is executed. FIG. 6 is a flowchart showing the person determination processing of step S505.

First, in step S601, the local feature amount calculation unit 105 initializes the strong classifier number n to n = 1. The number n is a natural number from 1 to N.

Next, in step S602, the local feature amount calculation unit 105 initializes the weak classifier number t in the n-th strong classifier to t = 1, and initializes the variable Sn, which holds the sum of the subject likelihoods of the weak classifiers, to Sn = 0. The number t is a natural number from 1 to M.

Next, in step S603, the local feature amount calculation unit 105 calculates the subject likelihood Lnt(x, y) of the t-th weak classifier hnt(x, y) in the n-th strong classifier. The relationship between strong and weak classifiers is described here with a concrete example. FIG. 7 shows a specific example of this relationship. In this example, strong classifier 1 includes three weak classifiers h11 to h13, and strong classifier 2 includes four weak classifiers h21 to h24. As shown in FIG. 7, local regions of various positions, sizes, and shapes are set within the detection window, and the weak classifiers calculate the subject likelihoods of these local regions. Hereinafter, the subject likelihoods calculated by weak classifiers h11 to h13 and h21 to h24 are denoted L11 to L13 and L21 to L24. Details of the subject likelihood calculation are described later. Strong classifiers 1 and 2, weak classifiers h11 to h13 and h21 to h24, and the information associated with them are learned in advance and stored in the dictionary storage unit 107 in a data structure such as that shown in FIG. 8. This data consists of the number of strong classifiers 801 and the data 802 for each strong classifier. The data 802 for each strong classifier consists of the number of weak classifiers 803, the data 804 for each weak classifier, and a discrimination threshold 805. The data 804 for each weak classifier includes, as indicated by 806, local region information and a likelihood conversion LUT (lookup table). The local region information consists, as indicated by 807, of the X and Y coordinates of the upper-left corner of the region, its width, and its height.
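The dictionary layout of FIG. 8 can be sketched with nested data classes. This is a sketch under assumptions: the class and field names are hypothetical, the LUT is represented as a plain list because its indexing is not specified here, and each weak classifier is given a single rectangle even though combinations of rectangles are also permitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalRegion:
    """Local region information (807)."""
    x: int        # X coordinate of the upper-left corner
    y: int        # Y coordinate of the upper-left corner
    width: int
    height: int


@dataclass
class WeakClassifier:
    """Data for one weak classifier (804, 806)."""
    region: LocalRegion
    lut: List[float]          # likelihood conversion lookup table


@dataclass
class StrongClassifier:
    """Data for one strong classifier (802)."""
    weak: List[WeakClassifier]    # len(weak) is the weak classifier count (803)
    threshold: float              # discrimination threshold (805)


@dataclass
class Dictionary:
    """Whole dictionary; len(strong) is the strong classifier count (801)."""
    strong: List[StrongClassifier]
```

Storing the counts 801 and 803 implicitly as list lengths is a convenience of this sketch; a serialized dictionary would presumably store them explicitly, as FIG. 8 is described.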

The local region corresponding to a weak classifier need not be rectangular. For example, as shown in FIG. 7, a local region 701 or 702 formed by combining a plurality of rectangular regions may be used. The number of weak classifiers included in a strong classifier is not limited and may be one, or two or more.

After step S603, in step S604, the local feature amount calculation unit 105 uses the adder 311 to add the subject likelihood Lnt(x, y) obtained in step S603 to the sum Sn.

Next, in step S605, the local feature amount calculation unit 105 determines whether the weak classifier hnt(x, y) currently under consideration is the last weak classifier (the M-th weak classifier) in the n-th strong classifier. If it is not the last weak classifier, the process returns to step S603, and the processing up to step S605 is repeated for the next weak classifier.

If it is the last weak classifier, the process proceeds to step S606, where the determination unit 108 uses the threshold processing unit 312 to compare the sum Sn with the threshold Tn of the n-th strong classifier. The threshold Tn is obtained in advance by learning and stored in the dictionary storage unit 107, from which it is read. In the example shown in FIG. 7, the sum S1 of strong classifier 1 is S1 = L11 + L12 + L13, and this sum S1 is compared with the threshold T1 of strong classifier 1. Likewise, the sum S2 of strong classifier 2 is S2 = L21 + L22 + L23 + L24, and this sum S2 is compared with the threshold T2 of strong classifier 2.

If Sn < Tn, the determination unit 108 determines that the detection window currently under consideration contains no person, ends the processing shown in the flowchart of FIG. 6, and returns to step S505 in the flowchart of FIG. 5. If Sn ≧ Tn, the determination unit 108 determines that the detection window currently under consideration is a pattern that satisfies the condition of the n-th strong classifier, and the process proceeds to step S607.

In step S607, the determination unit 108 determines whether the strong classifier currently under consideration is the last strong classifier (the N-th strong classifier). If it is not the last strong classifier, the process returns to step S602 and moves on to the discrimination processing of the next strong classifier. If it is the last strong classifier, the determination unit 108 determines that the detection window currently under consideration contains a person, and the process proceeds to step S608, where the position (x, y) of the detection window is stored in the determination result storage unit 109 as a person detection result.
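Steps S601 to S608 amount to a cascade with early rejection, which can be sketched as follows. This is an illustrative sketch: `cascade_contains_person` and the `likelihood` callback are hypothetical names, and the callback stands in for the subject likelihood calculation of step S603 at a fixed window position.

```python
def cascade_contains_person(stages, likelihood):
    """Evaluate the cascade for one detection window.

    stages: list of (number_of_weak_classifiers, threshold_Tn), in cascade order.
    likelihood: callable (n, t) -> L_nt, the subject likelihood of weak
                classifier t in strong classifier n (both numbered from 1).
    """
    for n, (num_weak, threshold) in enumerate(stages, start=1):  # S601, S607
        s_n = 0.0                                                # S602
        for t in range(1, num_weak + 1):                         # S605 loop
            s_n += likelihood(n, t)                              # S603, S604
        if s_n < threshold:                                      # S606
            return False     # rejected: later strong classifiers are skipped
    return True              # passed every strong classifier (S608)
```

Using the layout of FIG. 7 (three weak classifiers, then four) with a constant likelihood of 0.5, thresholds of 1.0 and 2.0 are both met, whereas a first threshold of 2.0 rejects the window after only three likelihood evaluations. The likelihood and threshold values here are invented for the example.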

[Subject Likelihood Calculation Processing (Step S603)]
Next, the subject likelihood calculation processing of step S603 is described in detail. In conventional subject detection processing, the subject likelihood is calculated at every position of the raster-scanned detection window. Such processing requires a very large number of subject likelihood calculations. The calculation cost can be lowered by thinning out the scanned pixels, but simply thinning out the scanned pixels causes many detection omissions.

In contrast, in this embodiment, attention is paid to the shape of the local region corresponding to each weak classifier when the subject likelihood is calculated, and the scan step is determined according to that shape, so the subject likelihood is calculated at high speed without loss of accuracy. FIGS. 9 and 10 show the basic principle of the subject likelihood calculation processing in the embodiments of the present invention.

As shown in FIG. 9, consider a horizontally wide local region 901 and a local region 903 located two pixels away from it in the horizontal direction. The area in which the two regions overlap is large relative to the number of skipped pixels. Consequently, when the cumulative feature amount within each local region (for example, a luminance histogram or the HOG (Histograms of Oriented Gradients) feature) is represented as in graphs 902 and 904, the difference between the two features is small. The HOG feature is described in, for example, Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection", IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 886-893, 2005. The HOG feature uses as its feature amount a histogram based on the strength and angle of the pixel gradients in a local region, and it is particularly effective for person detection.

Similarly, as shown in FIG. 10, consider a vertically tall local region 1001 and a local region 1003 located two pixels away from it in the vertical direction. Here too, the overlapping area is large relative to the number of skipped pixels. Consequently, when the cumulative feature amount within each local region is represented as in graphs 1002 and 1004, the difference between the two features is again small.

Therefore, when discrimination is performed using a cumulative feature amount within the local region, thinning out pixels in the horizontal step of the weak classifier's raster scan does not greatly affect accuracy, provided that the local region being discriminated is sufficiently wide. Similarly, thinning out pixels in the vertical step of the weak classifier's raster scan does not greatly affect accuracy, provided that the local region being discriminated is sufficiently tall.
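The overlap argument can be checked with simple arithmetic. In this sketch, `overlap_fraction` is a hypothetical helper and the region extents of 40 and 4 pixels are illustrative values that do not come from the source; only the two-pixel shift matches the figures.

```python
def overlap_fraction(extent, shift):
    """Fraction of a region that still overlaps after shifting it by `shift`
    pixels along the axis on which it measures `extent` pixels."""
    return max(extent - shift, 0) / extent


if __name__ == "__main__":
    # A wide region keeps almost all of its pixels after a 2-pixel shift...
    print(overlap_fraction(40, 2))
    # ...while a narrow region loses half of them.
    print(overlap_fraction(4, 2))
```

A region 40 pixels wide shares 95% of its pixels with the copy shifted by two pixels, so a cumulative feature such as a histogram changes little, whereas a region 4 pixels wide shares only 50%.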

Accordingly, step S603 performs the following processing. In the present embodiment, when the local region of a weak classifier is wide, the raster scan is performed with pixels thinned out in the horizontal direction (that is, with a wider horizontal scanning interval). FIG. 11 is a flowchart showing the subject likelihood calculation process of step S603 in the first embodiment.

First, in step S1101, the local feature amount calculation unit 105 acquires from the dictionary storage unit 107, via the position control unit 106, the width of the t-th weak classifier hnt(x, y) of the n-th strong classifier for the detection window (x, y). When the weak classifiers obtained by prior learning are stored as a dictionary, the upper-left X coordinate, upper-left Y coordinate, width Wl, and height Hl are stored as local region information, as in information 807 of FIG. 8; of these, the width Wl is acquired.

Next, in step S1102, the local feature amount calculation unit 105 determines whether the acquired width Wl is larger than a predetermined threshold thw. If it is larger, the local region is judged to be one for which pixel thinning is performed, and the process proceeds to step S1103. Otherwise, the subject likelihood is calculated at every pixel, and the process proceeds to step S1104.

In step S1103, the local feature amount calculation unit 105 determines whether the local region subject to pixel thinning is at a thinned position. In the present embodiment the pixel thinning number is 3, so the unit determines whether the horizontal coordinate x of the detection window position (x, y) satisfies the following expression (1).
x % 3 != 0 ... (1)
Here, "%" denotes the remainder operation.

When expression (1) is satisfied, the weak classifier hnt(x, y) is at a thinned position, so the subject likelihood calculation is skipped and the process proceeds to step S1107.

In step S1107, the local feature amount calculation unit 105 obtains the subject likelihood lnt by referring to the value held in step S1106.

On the other hand, when expression (1) is not satisfied, the weak classifier hnt(x, y) is at a position where the subject likelihood is to be calculated, so the process proceeds to step S1104 and the local feature amount calculation unit 105 calculates the local feature amount Unt. Details of the calculation in step S1104 are described later.

Next, in step S1105, the local feature amount calculation unit 105 obtains the subject likelihood lnt by converting the local feature amount Unt using the following expression (2).
lnt = fnt(Unt) ... (2)

Here, the function fnt is a correspondence table representing the relationship between the local feature amount and the subject likelihood for the t-th weak classifier hnt of the n-th strong classifier. The local feature amount calculation unit 105 refers to this table to obtain the subject likelihood lnt from the local feature amount Unt.

In step S1106, the subject likelihood lnt is held in a memory provided in the local feature amount calculation unit 105, so that in subsequent detection windows where the weak classifier hnt is at a thinned position this value can be referenced (step S1107) instead of calculating the subject likelihood.

Thereafter, in step S1108, the subject likelihood lnt is assigned to the subject likelihood Lnt(x, y) of the weak classifier hnt, and the process returns to step S603 in FIG. 6.

As an example of the above processing, suppose the detection window scans horizontally as (0, y) → (1, y) → (2, y) → (3, y). At detection window (0, y), the subject likelihood lnt of the weak classifier hnt is calculated and held. At detection windows (1, y) and (2, y), the weak classifier hnt is at a thinned position, so the calculation is skipped and lnt is assigned to both Lnt(1, y) and Lnt(2, y). When the window moves to (3, y), the subject likelihood is calculated again and the value of lnt is updated.
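The flow of steps S1102 to S1108 for one weak classifier along one scan row can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the function name `scan_row`, the callback `compute_likelihood`, and the value of `WIDTH_THRESHOLD` are assumptions, and the guard for a row that does not start at a computed position is an addition not described in the text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

THINNING_STEP = 3      # pixel thinning number used in the embodiment
WIDTH_THRESHOLD = 20   # hypothetical value for the threshold thw


def scan_row(
    xs: Iterable[int],
    y: int,
    region_width: int,
    compute_likelihood: Callable[[int, int], float],
) -> Tuple[List[float], List[int]]:
    """Return the likelihood per window position and the x values actually computed."""
    held: Optional[float] = None                      # value held in step S1106
    likelihoods: List[float] = []
    computed_at: List[int] = []
    for x in xs:
        wide = region_width > WIDTH_THRESHOLD         # step S1102
        thinned = wide and x % THINNING_STEP != 0     # step S1103, expression (1)
        if thinned and held is not None:
            value = held                              # step S1107: reuse the held value
        else:
            value = compute_likelihood(x, y)          # steps S1104 and S1105
            held = value                              # step S1106
            computed_at.append(x)
        likelihoods.append(value)                     # step S1108
    return likelihoods, computed_at
```

Called with `xs = range(4)` and a region wider than the threshold, the sketch computes the likelihood only at x = 0 and x = 3 and reuses the held value at x = 1 and x = 2, matching the scan described above.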

In this example, whether to thin pixels horizontally is decided by referring to the width of the local region. Alternatively, whether to thin pixels vertically may be decided by referring to the height of the local region, and the two may be combined. That is, the vertical scanning interval may be made wider as the local region becomes taller.

[Local Feature Amount Calculation Process (Step S1104)]
Next, the local feature amount calculation process of step S1104 is described in detail. In the present embodiment, a large effect is obtained by using, as the local feature amount, a cumulative feature amount of pixel features. Examples include feature amounts that accumulate pixel features in a local region, such as the pixel values themselves, luminance values, or edge strengths, as in HOG features and Haar features. The present embodiment uses HOG features, which are particularly effective for person detection, but other feature amounts may also be used.

Step S1104 performs the following processing. FIG. 12 is a flowchart showing the local feature amount calculation process of step S1104.

First, in step S1201, the local feature amount calculation unit 105 calculates gradient information. The gradient information consists of two items: the gradient strength and the gradient direction of a pixel feature between adjacent pixels. The luminance value is the typical pixel feature, but any other quantity representing the color information of a pixel may be used. The description here uses the luminance value, and the luminance value at pixel (u, v) is denoted I(u, v).

FIG. 13 is a diagram showing the relationship between the gradient strength and the gradient direction of the luminance value at a pixel of interest (u, v). The gradient strength of the luminance at pixel (u, v) is given by expression (3), and the gradient direction by expression (4).

From expressions (3) and (4), the local feature amount calculation unit 105 calculates the value m(u, v) indicating the gradient strength 1301 and the value θ(u, v) indicating the gradient direction 1302.
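Expressions (3) and (4) themselves do not appear in this text. The following is a minimal sketch of step S1201 that assumes the central-difference form commonly used for HOG features, with the direction folded into the range from 0 to 180 degrees; the function name `gradient` and the `image[v][u]` indexing are illustrative assumptions, and the patent's own expressions may differ in detail.

```python
import math
from typing import Sequence, Tuple


def gradient(image: Sequence[Sequence[float]], u: int, v: int) -> Tuple[float, float]:
    """Return (m, theta) at interior pixel (u, v), where image[v][u] plays the role of I(u, v).

    Assumed form: central differences, Euclidean strength, unsigned direction in degrees.
    """
    fu = image[v][u + 1] - image[v][u - 1]   # horizontal luminance difference
    fv = image[v + 1][u] - image[v - 1][u]   # vertical luminance difference
    m = math.hypot(fu, fv)                   # gradient strength m(u, v)
    # atan2 gives a signed angle; taking it modulo 180 makes the direction unsigned.
    theta = math.degrees(math.atan2(fv, fu)) % 180.0
    return m, theta
```

For a purely horizontal luminance ramp this returns a direction of 0 degrees, and for a purely vertical ramp a direction of 90 degrees.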

Next, in step S1202, the local feature amount calculation unit 105 generates a gradient direction histogram. FIG. 14 shows an example. Each pixel in the local region 1402 of the detection window 1401 in FIG. 14 has, as a result of step S1201, the gradient strength and gradient direction information indicated by arrows 1403. In step S1202, a histogram weighted by gradient strength is generated for each gradient direction on the basis of this information. Dividing the gradient direction range from 0 to 180 degrees into nine bins yields, for example, histogram 1404. Hereinafter, the local region 1402 is called a cell region. As a result of step S1202, each cell region in the detection window 1401 has a 9-dimensional feature vector generated as a gradient direction histogram. The feature vector Fi of one cell is given by expression (5).
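Step S1202 can be sketched as a strength-weighted histogram over nine direction bins of 20 degrees each. This is an illustrative sketch under stated assumptions: the function name `orientation_histogram` is hypothetical, and each gradient is assigned to a single bin with no interpolation between neighboring bins, a simplification the text does not specify either way.

```python
from typing import Iterable, List, Tuple


def orientation_histogram(
    gradients: Iterable[Tuple[float, float]], bins: int = 9
) -> List[float]:
    """Build a cell's feature vector from (strength, direction in degrees) pairs."""
    histogram = [0.0] * bins
    bin_width = 180.0 / bins
    for strength, direction in gradients:
        # Fold the direction into [0, 180) and pick its bin; clamp against rounding.
        index = min(int((direction % 180.0) // bin_width), bins - 1)
        histogram[index] += strength   # weight the vote by gradient strength
    return histogram
```

Each cell region would contribute one such 9-element list, corresponding to the feature vector Fi.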

Thereafter, in step S1203, the local feature amount calculation unit 105 normalizes the cell regions within a block region composed of a plurality of cell regions. In the present embodiment, a region containing 2 × 2 cell regions is set as a block region, and the 9-dimensional feature vectors of the cell regions are normalized over this block region. FIG. 15 shows an example of a block region. Each cell in the block region 1501 has a 9-dimensional vector; denoting these vectors F1, F2, F3, and F4 for the respective cell regions, the block region contains a 36-dimensional vector. This is the block feature vector Vk, which is given by expression (5).

The local feature amount calculation unit 105 then normalizes the block feature vector Vk according to expression (6) in order to reduce the effect of illumination variation on pattern matching.

The block feature vector Vk obtained in this way is the HOG feature of the local region. That is, each weak classifier performs pattern matching using a 36-dimensional feature vector.
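Step S1203 can be sketched as concatenating the four cell vectors and normalizing the result. Expression (6) does not appear in this text, so the sketch assumes L2 normalization with a small constant to avoid division by zero, one of the schemes discussed by Dalal and Triggs; the function name `block_feature` and the parameter `eps` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def block_feature(cells: Sequence[Sequence[float]], eps: float = 1e-6) -> List[float]:
    """Concatenate cell vectors (F1..F4) into Vk and normalize it.

    Assumed normalization: Vk / sqrt(|Vk|^2 + eps^2).
    """
    vk = [value for cell in cells for value in cell]        # 4 cells x 9 bins = 36 values
    norm = math.sqrt(sum(value * value for value in vk) + eps * eps)
    return [value / norm for value in vk]
```

Because every component is divided by the same norm, scaling all gradient strengths by a common factor (for example, a change in overall brightness contrast) leaves the normalized vector nearly unchanged.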

Although the cell region here consists of 6 × 6 pixels, the number of pixels in a cell region is arbitrary, and the aspect ratio of a cell region is not limited to 1:1. Likewise, although the block region here consists of 2 × 2 cell regions, the number of cell regions in a block region is arbitrary. By forming block regions from any number of cell regions of various sizes and aspect ratios, the features of multiple kinds of local regions differing in shape and size can be expressed.

In the first embodiment described above, when a person region is extracted as the subject in an image, pixels are thinned out in the raster scan if the width of the local region corresponding to a classifier is at least a specific size. This makes high-speed discrimination possible without degrading discrimination performance.

(Second Embodiment)
Next, a second embodiment of the present invention is described. The second embodiment differs from the first mainly in the subject likelihood calculation process of step S603; the rest of the configuration is the same as in the first embodiment. FIG. 16 is a flowchart showing the subject likelihood calculation process of step S603 in the second embodiment.

First, as in the first embodiment, in step S1101 the local feature amount calculation unit 105 acquires from the dictionary storage unit 107, via the position control unit 106, the width of the t-th weak classifier hnt(x, y) of the n-th strong classifier for the detection window (x, y).

Next, in step S1601, the local feature amount calculation unit 105 determines the thinning pixel number m on the basis of the width Wl of the weak classifier hnt. A correspondence table between weak classifier width and thinning pixel number is prepared in advance, and m is determined by referring to it. FIG. 17 shows an example of the correspondence table for a 60 × 120 detection window.

Thereafter, in step S1602, the local feature amount calculation unit 105 determines, from the horizontal image coordinate x of the detection window currently of interest and the thinning pixel number m, whether to calculate the subject likelihood of the weak classifier hnt for the detection window (x, y). Specifically, it determines whether the following expression (7) is satisfied.
x % m != 0 ... (7)

When expression (7) is satisfied, the weak classifier hnt(x, y) is at a thinned position, so the subject likelihood calculation is skipped and the process proceeds to step S1107.

On the other hand, when expression (7) is not satisfied, the weak classifier hnt(x, y) is at a position where the subject likelihood is to be calculated, so the process proceeds to step S1104 and the local feature amount calculation unit 105 calculates the local feature amount Unt.

After step S1107 and after step S1104, the processing of steps S1105, S1106, and S1108 is performed as in the first embodiment.
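Steps S1601 and S1602 can be sketched as a table lookup followed by the remainder test of expression (7). The contents of FIG. 17 are not reproduced in this text, so the table rows, `DEFAULT_STEP`, and the function names below are hypothetical values chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical correspondence table: (maximum width Wl, thinning pixel number m).
# The actual values of FIG. 17 are not given in the text.
WIDTH_TO_STEP: List[Tuple[int, int]] = [(10, 1), (20, 2), (40, 3)]
DEFAULT_STEP = 4   # hypothetical m for widths beyond the last table row


def thinning_step(width: int) -> int:
    """Step S1601: look up the thinning pixel number m for a weak classifier width."""
    for max_width, m in WIDTH_TO_STEP:
        if width <= max_width:
            return m
    return DEFAULT_STEP


def should_compute(x: int, width: int) -> bool:
    """Step S1602: compute the likelihood only where x % m != 0 does not hold."""
    return x % thinning_step(width) == 0
```

With m = 1 the remainder is always zero, so narrow local regions are evaluated at every window position, while wider regions are evaluated at progressively sparser positions.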

In the second embodiment described above, when a person region is extracted as the subject in an image, the thinning pixel number for the raster scan is determined according to the width of the local region corresponding to a weak classifier, and pixels are thinned out by that number. As in the first embodiment, this makes high-speed discrimination possible without degrading discrimination performance.

Although the first and second embodiments extract a person region in an image as the subject, objects other than persons, such as faces and animals, may also be the extraction target.

The processing of the embodiments described above can also be realized by supplying a system or apparatus with a recording medium on which the program code of software implementing each function is recorded. The functions of the embodiments are then realized when a computer (or CPU or MPU) of the system or apparatus reads out and executes the program code stored on the recording medium. In this case, the program code read from the recording medium itself realizes the functions of the embodiments, and the storage medium storing the program code (a computer-readable recording medium) constitutes the present invention.

The invention also covers the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing in accordance with instructions of the program code read by the computer.

101: image input unit, 102: image storage unit, 103: reduced image generation unit, 104: reduced image setting unit, 105: local feature amount calculation unit, 106: position control unit, 107: dictionary storage unit, 108: determination unit, 109: determination result storage unit

Claims

1. An image processing apparatus comprising:
dictionary storage means for storing dictionary information including information on the position and shape of a local region for determining a predetermined subject;
position control means for controlling the position of a detection window in an image based on the dictionary information;
determining means for determining a local region in the detection window based on the dictionary information;
local feature amount calculating means for calculating a feature amount in the determined local region; and
determination means for determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein the determining means determines the position of the local region for which the feature amount is calculated such that the horizontal scanning interval becomes wider as the horizontal width of the local region becomes wider.

2. The image processing apparatus according to claim 1, wherein the determining means determines the position of the local region for which the feature amount is calculated such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

3. An image processing apparatus comprising:
dictionary storage means for storing dictionary information including information on the position and shape of a local region for determining a predetermined subject;
position control means for controlling the position of a detection window in an image based on the dictionary information;
determining means for determining a local region in the detection window based on the dictionary information;
local feature amount calculating means for calculating a feature amount in the determined local region; and
determination means for determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein the determining means determines the position of the local region for which the feature amount is calculated such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the feature amount is a feature amount obtained by accumulating pixel features of the local region.

5. An image processing method comprising:
a position control step of controlling the position of a detection window in an image based on dictionary information that is stored in dictionary storage means and includes information on the position and shape of a local region for determining a predetermined subject;
a determining step of determining a local region in the detection window based on the dictionary information;
a local feature amount calculating step of calculating a feature amount in the determined local region; and
a determination step of determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein, in the determining step, the position of the local region for which the feature amount is calculated is determined such that the horizontal scanning interval becomes wider as the horizontal width of the local region becomes wider.

6. An image processing method comprising:
a position control step of controlling the position of a detection window in an image based on dictionary information that is stored in dictionary storage means and includes information on the position and shape of a local region for determining a predetermined subject;
a determining step of determining a local region in the detection window based on the dictionary information;
a local feature amount calculating step of calculating a feature amount in the determined local region; and
a determination step of determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein, in the determining step, the position of the local region for which the feature amount is calculated is determined such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

7. A program for causing a computer to execute:
a position control step of controlling the position of a detection window in an image based on dictionary information that is stored in dictionary storage means and includes information on the position and shape of a local region for determining a predetermined subject;
a determining step of determining a local region in the detection window based on the dictionary information;
a local feature amount calculating step of calculating a feature amount in the determined local region; and
a determination step of determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein, in the determining step, the position of the local region for which the feature amount is calculated is determined such that the horizontal scanning interval becomes wider as the horizontal width of the local region becomes wider.

8. A program for causing a computer to execute:
a position control step of controlling the position of a detection window in an image based on dictionary information that is stored in dictionary storage means and includes information on the position and shape of a local region for determining a predetermined subject;
a determining step of determining a local region in the detection window based on the dictionary information;
a local feature amount calculating step of calculating a feature amount in the determined local region; and
a determination step of determining whether or not the predetermined subject is included in the detection window based on the feature amount and the dictionary information,
wherein, in the determining step, the position of the local region for which the feature amount is calculated is determined such that the vertical scanning interval becomes wider as the vertical width of the local region becomes wider.

9. A computer-readable recording medium on which the program according to claim 7 or 8 is recorded.