JP4561380B2

JP4561380B2 - Detection apparatus, detection method, and detection program

Info

Publication number: JP4561380B2
Application number: JP2005015375A
Authority: JP
Inventors: 大作保理江; 雄介中野
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2005-01-24
Filing date: 2005-01-24
Publication date: 2010-10-13
Anticipated expiration: 2025-01-24
Also published as: JP2006202184A

Description

The present invention relates to a detection technique for detecting an object present in a target image.

In recent years, many techniques have been proposed for detecting a subject (for example, a human body or an object) present in a target image. Simple methods such as skin-color detection and edge detection exist, but when advanced processing such as authentication or state detection is performed after the subject has been detected, a detection method with high positional accuracy, such as template matching, is effective.

To shorten the detection processing time in template matching, there is a technique that uses multi-resolution images (see Patent Document 1). In this technique, a low-resolution image obtained by reducing the detection target image at a predetermined reduction ratio is used for coarse detection to narrow down the position, and a more accurate detection is then performed on the high-resolution image based on the narrowed-down position information. This achieves a fast, fine-grained search.

Patent Document 1: JP 2000-134638 A

With this technique, however, only a subject of the same size as the template used for detection can be detected. Consequently, to detect subjects that appear at multiple sizes in the target image, a separate set of multi-resolution images must be created for each size to be detected, and the processing time this requires prevents a sufficient speed-up.

The present invention has been made in view of the above problem, and its object is to provide a technique that enables efficient detection of subjects of multiple sizes present in a target image.

To solve the above problem, the invention of claim 1 is a detection apparatus that detects an object present in a target image, comprising: hierarchical image group creation means for creating a hierarchical image group that includes a first image group, which is used for detecting objects of multiple sizes in the target image and has a plurality of images whose magnifications relative to the target image differ from one another, and a second image group, which is used for detecting each of the objects of multiple sizes in the target image at high speed and has a plurality of images corresponding to the plurality of images of the first image group each reduced at a predetermined magnification; and object detection means for detecting the object using the hierarchical image group, wherein the first image group and the second image group are created with some of their images shared, such that the magnifications of those images relative to the target image are identical between the two groups.

The invention of claim 2 is the detection apparatus according to claim 1, wherein the hierarchical image group creation means creates a plurality of reference images whose magnifications relative to the target image differ from one another and creates a plurality of reduced images by further reducing each of the reference images at a predetermined magnification, thereby creating the hierarchical image group including the plurality of reference images and the plurality of reduced images, and some of the reduced images are created as shared images belonging to both the first image group and the second image group.

The invention of claim 3 is the detection apparatus according to claim 2, wherein the hierarchical image group creation means comprises: determination means for determining a first reduction ratio α and a second reduction ratio β so as to satisfy the relation β = α^N (N is an integer of 2 or more); means for creating a first reference image based on the target image and sequentially reducing the first reference image at an inter-image reduction ratio equal to the first reduction ratio α to create (N-1) intermediate-layer reference images, thereby creating the plurality of reference images including the first reference image and the (N-1) intermediate-layer reference images; and means for creating a plurality of reduced images by further reducing each of the reference images at the second reduction ratio β.
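The relation β = α^N in claim 3 can be written out to show why the reference images and the reduced images fall on one common scale sequence. The symbols s_0 (magnification of the first reference image relative to the target image) and s_i are notation introduced here for illustration only and do not appear in the original.

```latex
% Reference images: the first reference image plus (N-1) intermediate layers
s_i = s_0\,\alpha^{i}, \qquad i = 0, 1, \dots, N-1
% Reduced images: each reference image reduced by \beta = \alpha^{N}
s_i\,\beta = s_0\,\alpha^{i}\,\alpha^{N} = s_0\,\alpha^{i+N}
% Together: one geometric sequence with common ratio \alpha
\{\, s_0\,\alpha^{j} \;:\; j = 0, 1, \dots, 2N-1 \,\}
```

Under this reading, every image in the combined set differs from its neighbor by the same ratio α, which is what allows a reduced image to serve both as a coarse version of one reference image and as a size-detection layer in its own right.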

The invention of claim 4 is the detection apparatus according to claim 3, wherein the hierarchical image group creation means further comprises means for creating at least one re-reduced image by further reducing one of the plurality of reduced images, or each of several of them, at the second reduction ratio β.

The invention of claim 5 is the detection apparatus according to claim 3 or claim 4, wherein the second reduction ratio β is 1/K (K is an integer of 2 or more).

The invention of claim 6 is the detection apparatus according to any one of claims 1 to 5, wherein the object detection means detects the objects of multiple sizes by repeating an operation of detecting, through comparison processing using a template and an image of interest selected from the first image group, an object of the size to be detected in that image of interest, and, in the operation of detecting the object of the size corresponding to the image of interest, performs detection using the image of interest and an image of the second image group obtained by reducing the image of interest at the predetermined magnification, in order starting from the lower-resolution image, thereby efficiently detecting the object of the size to be detected in the image of interest.

The invention of claim 7 is a detection method for detecting an object present in a target image, comprising: a hierarchical image group creation step of creating a hierarchical image group that includes a first image group, which is used for detecting objects of multiple sizes in the target image and has a plurality of images whose magnifications relative to the target image differ from one another, and a second image group, which is used for detecting each of the objects of multiple sizes in the target image at high speed and has a plurality of images corresponding to the plurality of images of the first image group each reduced at a predetermined magnification; and an object detection step of detecting the object using the hierarchical image group, wherein the first image group and the second image group are created with some of their images shared, such that the magnifications of those images relative to the target image are identical between the two groups.

The invention of claim 8 is the detection method according to claim 7, wherein the hierarchical image group creation step creates a plurality of reference images whose magnifications relative to the target image differ from one another and creates a plurality of reduced images by further reducing each of the reference images at a predetermined magnification, thereby creating the hierarchical image group including the plurality of reference images and the plurality of reduced images, and some of the reduced images are created as shared images belonging to both the first image group and the second image group.

The invention of claim 9 is the detection method according to claim 8, wherein the hierarchical image group creation step comprises: a determination step of determining a first reduction ratio α and a second reduction ratio β so as to satisfy the relation β = α^N (N is an integer of 2 or more); a step of creating a first reference image based on the target image and sequentially reducing the first reference image at an inter-image reduction ratio equal to the first reduction ratio α to create (N-1) intermediate-layer reference images, thereby creating the plurality of reference images including the first reference image and the (N-1) intermediate-layer reference images; and a step of creating a plurality of reduced images by further reducing each of the reference images at the second reduction ratio β.

The invention of claim 10 is the detection method according to claim 9, wherein the hierarchical image group creation step further comprises a step of creating at least one re-reduced image by further reducing one of the plurality of reduced images, or each of several of them, at the second reduction ratio β.

The invention of claim 11 is the detection method according to claim 9 or claim 10, wherein the second reduction ratio β is 1/K (K is an integer of 2 or more).

The invention of claim 12 is the detection method according to any one of claims 7 to 11, wherein the object detection step detects the objects of multiple sizes by repeating an operation of detecting, through comparison processing using a template and an image of interest selected from the first image group, an object of the size to be detected in that image of interest, and, in the operation of detecting the object of the size corresponding to the image of interest, performs detection using the image of interest and an image of the second image group obtained by reducing the image of interest at the predetermined magnification, in order starting from the lower-resolution image, thereby efficiently detecting the object of the size to be detected in the image of interest.

The invention of claim 13 is a program that detects an object present in a target image by causing a computer to execute: a hierarchical image group creation step of creating a hierarchical image group that includes a first image group, which is used for detecting objects of multiple sizes in the target image and has a plurality of images whose magnifications relative to the target image differ from one another, and a second image group, which is used for detecting each of the objects of multiple sizes in the target image at high speed and has a plurality of images corresponding to the plurality of images of the first image group each reduced at a predetermined magnification; and an object detection step of detecting the object using the hierarchical image group, wherein the first image group and the second image group are created with some of their images shared, such that the magnifications of those images relative to the target image are identical between the two groups.

The invention of claim 14 is the program according to claim 13, wherein the hierarchical image group creation step creates a plurality of reference images whose magnifications relative to the target image differ from one another and creates a plurality of reduced images by further reducing each of the reference images at a predetermined magnification, thereby creating the hierarchical image group including the plurality of reference images and the plurality of reduced images, and some of the reduced images are created as shared images belonging to both the first image group and the second image group.

The invention of claim 15 is the program according to claim 14, wherein the hierarchical image group creation step causes the computer to execute: a determination step of determining a first reduction ratio α and a second reduction ratio β so as to satisfy the relation β = α^N (N is an integer of 2 or more); a step of creating a first reference image based on the target image and sequentially reducing the first reference image at an inter-image reduction ratio equal to the first reduction ratio α to create (N-1) intermediate-layer reference images, thereby creating the plurality of reference images including the first reference image and the (N-1) intermediate-layer reference images; and a step of creating a plurality of reduced images by further reducing each of the reference images at the second reduction ratio β.

The invention of claim 16 is the program according to claim 15, wherein the hierarchical image group creation step further causes the computer to execute a step of creating at least one re-reduced image by further reducing one of the plurality of reduced images, or each of several of them, at the second reduction ratio β.

The invention of claim 17 is the program according to claim 15 or claim 16, wherein the second reduction ratio β is 1/K (K is an integer of 2 or more).

The invention of claim 18 is the program according to any one of claims 13 to 17, wherein the object detection step causes the computer to detect the objects of multiple sizes by repeating an operation of detecting, through comparison processing using a template and an image of interest selected from the first image group, an object of the size corresponding to that image of interest, and, in the operation of detecting the object of the size to be detected in the image of interest, to perform detection using the image of interest and an image of the second image group obtained by reducing the image of interest at the predetermined magnification, in order starting from the lower-resolution image, thereby efficiently detecting the object of the size to be detected in the image of interest.

According to the inventions of claims 1 to 18, a hierarchical image group is created in which the image group used for detecting objects of multiple sizes in the target image and the image group used for detecting objects in the target image at high speed are partly shared, so objects of multiple sizes present in the target image can be detected efficiently.

Embodiments of the present invention are described below with reference to the drawings.

<Overview>
FIG. 1 is a diagram showing an outline of a detection apparatus (image processing apparatus) 1 according to an embodiment of the present invention. As shown in FIG. 1, the detection apparatus 1 is configured as a computer system (hereinafter also simply called the "computer"). Specifically, the computer (detection apparatus) includes a CPU 2, a storage unit 3, a media drive 4, a display unit 5 such as a liquid crystal display, an input unit 6 including a keyboard 6a and a mouse 6b serving as a pointing device, and a communication unit 7 such as a network card. The storage unit 3 has a plurality of storage media, specifically a hard disk drive (HDD) 3a and a RAM (semiconductor memory) 3b that operates faster than the HDD 3a. The media drive 4 can read information recorded on a portable recording medium 9 such as a CD-ROM, DVD (Digital Versatile Disk), flexible disk, or memory card. Information need not be supplied to the computer via the recording medium 9; it may also be supplied via a network such as a LAN or the Internet.

The detection apparatus 1 loads a software program (hereinafter also simply called the "program"), which has been read from the recording medium 9 or the like and stored in advance in the storage unit 3, into memory (RAM or the like) and executes it using the CPU 2 and other components, thereby implementing the various functions described below.

FIG. 2 is a block diagram showing the functions of the detection apparatus 1. As shown in FIG. 2, the detection apparatus 1 includes an information setting unit 11, a pyramid image creation unit 12, a search window determination unit 13, and a matching processing unit 14.

The information setting unit 11 has a function of setting the reference information used for detection to information suited to the detection.

The pyramid image creation unit 12 has a function of creating a hierarchical image group (hereinafter also called the "integrated pyramid image") in which two image groups are partly shared. The first is an image group used for detecting objects of multiple sizes present in the target image to be searched (hereinafter also called the "original image"); this group is also called the "multi-size detection pyramid image". The second is an image group that enables each of the objects of multiple sizes present in the target image to be detected at high speed; this group is also called the "high-speed detection pyramid image". In the following, the "multi-size detection pyramid image", the "high-speed detection pyramid image", and the "integrated pyramid image" are each also simply called a "pyramid image".

The search window determination unit 13 has a function of determining, on a given image in the hierarchical image group created by the pyramid image creation unit 12, the search window region over which the information used for detection is raster-scanned.

The matching processing unit 14 has a function of performing collation and search processing within the search window region determined by the search window determination unit 13 to detect the detection target object. Specifically, the matching processing unit 14 detects a target object present in the target image by performing comparison processing using a template. Comparison processing using a template includes (1) processing that compares the image used as the template with the target image pixel by pixel (also called pattern matching processing or direct comparison processing) and (2) processing that compares feature values extracted from the template image and from the target image (also called indirect comparison processing).
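The direct (pixel-wise) comparison in item (1) can be sketched as a raster scan that scores every template position inside a search window. The following is a minimal illustration, assuming grayscale images stored as nested lists and using the sum of absolute differences as the score. The source does not specify the similarity measure, and the names `sad` and `raster_scan_match` are hypothetical.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the image patch
    whose top-left corner is at (top, left). Lower means more similar."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def raster_scan_match(image, template, window=None):
    """Raster-scan the template over the search window and return
    (score, top, left) for the best-scoring position.

    window = (top0, left0, top1, left1) gives inclusive bounds on the
    template's top-left corner; None means the whole image.
    """
    th, tw = len(template), len(template[0])
    max_top, max_left = len(image) - th, len(image[0]) - tw
    if window is None:
        window = (0, 0, max_top, max_left)
    top0, left0, top1, left1 = window
    best = None
    for top in range(max(0, top0), min(max_top, top1) + 1):
        for left in range(max(0, left0), min(max_left, left1) + 1):
            score = sad(image, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best
```

A practical detector would also compare the best score against a threshold before reporting a detection; that threshold is omitted here because the source gives no value for it.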

FIG. 3 is a diagram showing an image PI in which several persons appear as subjects. FIG. 4 is a diagram showing an operation screen displayed while the program is running.

The detection apparatus 1 with the functions described above detects objects of multiple sizes present in the target image at high speed. For example, when human faces in the image PI shown in FIG. 3 are the detection target, the detection apparatus 1 can detect not only faces of a single size but also faces of multiple sizes within the range (from the smallest face to the largest face) specified by the user in the window W1 of the operation screen Q shown in FIG. 4.

An overview of the detection processing executed in the detection apparatus 1 is given here.

FIG. 5 is a diagram schematically showing the detection processing executed in the detection apparatus 1. FIG. 6 is a schematic diagram showing the principle of detection processing that uses a pyramid image to detect objects of multiple sizes present in the original image FE1 (PI). FIG. 7 is a schematic diagram showing the principle of detection processing that detects a given object present in an image at high speed. FIG. 8 is a schematic diagram showing an example of detection processing that combines the two detection processes shown in FIGS. 6 and 7; it serves as a comparative example for the detection processing shown in FIG. 5. The relationship between FIG. 5 and FIG. 8 is described in detail later.

As shown in FIG. 5, the detection apparatus 1 performs object detection by template matching, a detection technique with high positional accuracy. An overview of each function of the detection apparatus 1 follows.

In the detection apparatus 1, the information setting unit 11 creates detection information IF (IF1) consisting of a detection template TE1, a reduced template TE2 obtained by reducing the detection template TE1 at the reduction ratio β, and a re-reduced template TE3 obtained by further reducing the reduced template TE2 at the reduction ratio β. The pyramid image creation unit 12 creates a hierarchical image group MP (MP1) from the target image FE1 using reduction ratios γ, α, and β, which are described later. The search window determination unit 13 determines a search window region in an image of a given layer of the hierarchical image group MP1 created by the pyramid image creation unit 12; this region is used when the detection target object is searched for by raster scanning. The matching processing unit 14 then performs collation and search within the determined search window region to detect the detection target object.

Specifically, as shown in FIG. 5, when an object of the size corresponding to the detection template TE1 is to be detected in the second-layer image FE2, the detection apparatus 1 first executes the detection processing between the re-reduced template TE3 and the sixth-layer image FE6. If the object is detected, detection processing is performed at a higher resolution, that is, between the reduced template TE2 and the fourth-layer image FE4, based on the detected position. If the object is detected again, detection processing is performed between the detection template TE1 and the second-layer image FE2 based on that position. In this way, detection on a low-resolution image quickly establishes whether the detection target object is present and roughly where it is, and the subsequent detection on higher-resolution images pins down its position more accurately.

The detection principles underlying the detection method of the detection apparatus 1 are described here with reference to FIGS. 6 and 7.

As noted above, FIG. 6 is a schematic diagram showing the principle of detection processing that uses a pyramid image to detect objects of multiple sizes. FIG. 6 shows a detection method in which a multi-size detection pyramid image PE1 is created by reducing the original image FE1, and search or collation (detection processing) is performed with the detection template TE1 on each of the created hierarchical images, so that objects of multiple sizes present in the original image FE1 can be detected with the single detection template TE1.

The reason objects of multiple sizes present in the original image FE1 can be detected with the single detection template TE1 is as follows. Detection processing between the original image FE1 and the detection template TE1 detects an object in the original image FE1 whose size corresponds to the detection template TE1. Detection processing between the image FE2, obtained by reducing the original image FE1, and the detection template TE1 detects an object in the image FE2 whose size corresponds to the detection template TE1. Converted back to the original image FE1, an object of the size detected between the image FE2 and the detection template TE1 is larger than the size corresponding to the detection template TE1. In other words, detection processing between the reduced image FE2 and the detection template TE1 detects an object larger than the one whose size corresponds to the detection template TE1. Because the size of the object detected in this way is determined by the reduction ratio between the original image FE1 and the reduced image, changing the reduction ratio makes it possible to detect objects of multiple sizes with the single detection template TE1.
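The size relationship described above reduces to a single division: an object that matches the template in an image reduced to magnification m has size (template size) / m in the original image. A minimal sketch follows, in which the function name `detectable_sizes` and the numeric values are illustrative assumptions rather than values from the source.

```python
def detectable_sizes(template_size, magnifications):
    """Object sizes, measured in original-image pixels, that a fixed-size
    template detects when matched against images reduced to the given
    magnifications (1.0 is the original image, 0.5 is half size)."""
    return [template_size / m for m in magnifications]


# Hypothetical example: a 20-pixel template and three pyramid layers.
sizes = detectable_sizes(20, [1.0, 0.5, 0.25])
```

With these assumed values, the three layers cover objects of 20, 40, and 80 pixels in the original image, so a smaller reduction step between layers gives a finer coverage of sizes.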

FIG. 7 is a schematic diagram illustrating the principle of the detection process that, as described above, detects an object of a predetermined size at high speed. Specifically, as shown in FIG. 7, the detection template TE1 and the detection target image KE are first each reduced at the reduction ratio β to create the detection information IF1 (specifically, the templates TE1, TE2, and TE3) and the high-speed detection pyramid image PS1 (specifically, the images KE, KS2, and KS3). The first detection process is then executed using the low-resolution image KS3 and the template TE3 to identify the approximate position of the object. Based on that approximate position, more accurate detection is performed using the higher-resolution images (KS2, KE) and the templates TE2 and TE1 that correspond to those resolutions. This makes it possible to detect an object in the image KE at high speed.
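A minimal sketch of this coarse-to-fine idea follows. It rests on several assumptions that are not in the source: images are plain lists of rows of grayscale values, β = 1/2 is implemented by 2×2 block averaging, the matching score is a sum of absolute differences, and the helper names (`shrink2`, `sad`, `best_match`, `coarse_to_fine`) are hypothetical. Image and template sides are assumed to be divisible by 2 at every level.

```python
def shrink2(img):
    """Halve an image (list of rows) by averaging each 2x2 block."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def sad(img, tpl, top, left):
    """Sum of absolute differences between tpl and the window at (top, left)."""
    return sum(abs(img[top + y][left + x] - tpl[y][x])
               for y in range(len(tpl)) for x in range(len(tpl[0])))


def best_match(img, tpl, candidates):
    """Return the candidate (top, left) with the lowest score, ignoring
    candidates whose window would fall outside the image."""
    valid = [(t, l) for t, l in candidates
             if 0 <= t <= len(img) - len(tpl) and 0 <= l <= len(img[0]) - len(tpl[0])]
    return min(valid, key=lambda p: sad(img, tpl, *p))


def coarse_to_fine(img, tpl, levels=2):
    """Search exhaustively only on the coarsest level, then refine the
    position on each finer level within a 3x3 neighbourhood."""
    imgs, tpls = [img], [tpl]
    for _ in range(levels):
        imgs.append(shrink2(imgs[-1]))
        tpls.append(shrink2(tpls[-1]))

    coarse_img, coarse_tpl = imgs[-1], tpls[-1]
    everywhere = [(t, l)
                  for t in range(len(coarse_img) - len(coarse_tpl) + 1)
                  for l in range(len(coarse_img[0]) - len(coarse_tpl[0]) + 1)]
    pos = best_match(coarse_img, coarse_tpl, everywhere)

    for level in range(levels - 1, -1, -1):
        t0, l0 = pos[0] * 2, pos[1] * 2  # map the position to the finer level
        nearby = [(t0 + dt, l0 + dl) for dt in (-1, 0, 1) for dl in (-1, 0, 1)]
        pos = best_match(imgs[level], tpls[level], nearby)
    return pos
```

The exhaustive scan touches only the smallest image; each finer level evaluates at most nine windows, which is where the time saving comes from.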

Combining the two detection processes described above, that is, applying the high-speed detection method of FIG. 7 to each layer image of the pyramid image PE1 of FIG. 6, makes it possible to detect objects of multiple sizes in the original image FE1 at high speed. FIG. 8 shows a detection process according to a comparative example that uses this detection principle.

In contrast, the detection process of FIG. 5 described above achieves a further speedup while using the detection principle of FIG. 8. Specifically, the detection method executed by the detection apparatus 1 shown in FIG. 5 creates a pyramid image MP1 in which the multi-size pyramid image PE1 and the high-speed pyramid image PS1 of FIG. 8 are partially shared, and uses it for the detection process. This makes pyramid image creation more efficient than in the detection method of FIG. 8. In other words, the detection apparatus 1 can detect objects of multiple sizes in an image at high speed, and the more efficient creation of the pyramid image used for detection further shortens the detection processing time.

<Operation>
The functions of the detection apparatus 1 described above are explained in detail below. The case where the detection target object is a face is described here, but the idea of the present invention can be applied whenever detection is performed assuming the size of an object on the target image, and the detection target is not limited to a face.

FIG. 9 is a flowchart showing the overall operation of the detection apparatus 1.

First, after the user issues a detection start instruction by operating the input unit 6 or the like (step S21), the detection information IF is set from the template of the detection target (hereinafter also referred to as the "detection template"), which is the reference information used for detection (step S22), and the layer image group MP is created from the target image (step S23).

Next, based on the information set in step S22, the region of the search window that raster-scans a predetermined image in the layer image group MP is determined (step S24), and a matching (collation) process is performed on that search window (step S25). It is then determined whether the entire region of the predetermined image has been searched (step S26). If the search is determined to be complete, the detection process ends (step S27); if not, steps S24 to S26 are repeated.

The processes executed in steps S21 to S25 are described in detail below. FIG. 10 is a flowchart showing the information setting process. FIG. 11 is a flowchart showing the pyramid image creation process. FIG. 12 is a flowchart showing the search window determination process. FIG. 13 is a flowchart showing the matching process.

<Start of face detection process>
Before starting the face detection process, the user specifies the sizes of the faces to be detected by the detection apparatus 1. Specifically, the user operates the input unit 6 or the like on the operation screen Q shown in FIG. 4 to specify the face sizes. For example, when the value "1" is entered in field R1 of the small window W1 in the operation screen Q and the value "6" is entered in field R2, the face size corresponding to each value is displayed above the respective field. Field R1 specifies the minimum size MN to be detected, and field R2 specifies the maximum size MX to be detected. The user compares the size of the faces appearing in the image PI with the candidate face sizes and decides which face sizes to detect. The decision is confirmed by selecting and pressing the execution button BT with the input unit 6 or the like while the values are entered in the fields. When the execution button BT is pressed with "1" in field R1 and "6" in field R2, faces of six sizes (SZ1 to SZ6), from the size SZ1 corresponding to "1" to the size SZ6 corresponding to "6", are designated as detection targets (number of sizes M = 6). The detection process is thereby started, and the procedure moves to the information setting process (step S22).

<Information setting process>
FIG. 14 is a diagram schematically illustrating the information setting process.

In this embodiment, as in FIG. 7, the comparison process for detecting a face of each size uses reduced images at two stages. For this purpose, in addition to the detection template TP1, two templates TP2 and TP3 are created by successively reducing the template TP1 at the reduction ratio β.

Specifically, as shown in FIG. 10, when the information setting process starts (step S31), the detection information IF (IF2) is set from the face detection template TP1, which is the reference information for detection held in the storage unit 3 (step S32). In detail, a reduced face template TP2 is created by reducing the face detection template TP1 at the reduction ratio β = 1/2, and a re-reduced template TP3 is then created by reducing the reduced face template TP2 at the reduction ratio β = 1/2.

The reduction ratio β is not limited to 1/2 and may take various values, but it is preferably a value satisfying β = 1/K (K is an integer of 2 or more). In the present application, the "reduction ratio" of an image is expressed as the magnification per side of the image. Depending on the processing time constraints and the required image quality, the image can be reduced using an existing interpolation method such as bilinear interpolation, the nearest neighbor method, or bicubic convolution.

Furthermore, this embodiment illustrates a case where the comparison process using a template does not compare the re-reduced template TP3 directly with the reduced image, but instead compares a feature amount extracted from the re-reduced template TP3 with a feature amount extracted from the reduced image. For this purpose, in step S32, a feature amount CH that quantifies the ratio of the skin-color region to the entire region is extracted from the re-reduced template TP3. The comparison is not limited to this approach: it may instead be performed by a pattern matching process that compares the re-reduced template TP3 and the reduced image directly, pixel by pixel, in which case the extraction of the feature amount CH is unnecessary.
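As a rough illustration of such a feature amount, the sketch below computes the fraction of pixels classified as skin-colored. The source does not say how skin color is decided, so the RGB rule in `is_skin` is purely a placeholder assumption, as are the function names and the pixel representation (rows of `(r, g, b)` tuples).

```python
def is_skin(r, g, b):
    """Placeholder skin-color test on 8-bit RGB values.
    Hypothetical rule for illustration only; the source specifies none."""
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b
            and r - min(g, b) > 15)


def skin_ratio(pixels):
    """Ratio of skin-colored pixels to all pixels in a region.
    `pixels` is a list of rows, each row a list of (r, g, b) tuples."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return 0.0
    skin = sum(1 for row in pixels for p in row if is_skin(*p))
    return skin / total
```

A candidate window could then be compared with the template by the difference between the two ratios, which costs far less than a pixel-by-pixel comparison.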

The templates and the feature amount CH created or extracted in this way are loaded into the RAM 3b as the detection information IF (IF2) and used in the matching process described later.

The number of template reductions S performed with the face detection template TP1 as the starting point is not limited to the number above (that is, two) and can be changed to an appropriate number in view of the detection processing time, the detection accuracy, and so on. The feature amount CH also need not be extracted from the re-reduced template TP3; a feature amount may be extracted from the reduced template TP2 or the face detection template TP1 and used in the matching process. Furthermore, although the feature amount CH is illustrated here as the quantified ratio of the skin-color region to the entire region of the re-reduced template TP3, it is not limited to this and may be a value derived from other features of the template. Information obtained by parameterizing the template using principal component analysis (PCA) or the like may also be used as the feature amount.

When the detection information IF (IF2) has been set, the information setting process ends (step S33), and the procedure moves to the pyramid image creation process (step S23).

<Pyramid image creation process>
FIG. 15 is a diagram illustrating reference image creation in the pyramid image creation process. FIG. 16 is a diagram illustrating reduced image creation in the pyramid image creation process. FIG. 17 is a diagram of the completed pyramid image. FIG. 18 is a diagram illustrating the image types in the pyramid image. FIG. 19 is a diagram illustrating the reduction ratios between the images in the pyramid image.

As shown in FIG. 11, when the pyramid image creation process starts (step S41), the number of layers L to be created (hereinafter also referred to as the "total number of layers") is determined (step S42). Specifically, it is given by L = M + S × N, derived from the number M of face sizes to be detected, the number of template reductions S, and the number of reference images N described later. This embodiment is described for the case where M = 6, S = 2, and N = M − 2 = 4, so the total number of layers created is L = 6 + 2 × 4 = 14.

Next, the first-layer image F1, which is used to detect the minimum face size specified at the start of the detection process (step S21), is created (step S43). Specifically, it is created from the target image F0 (original image) using a predetermined magnification γ (see FIG. 15). The magnification γ is determined by the ratio between the face size LA1 of the face detection template TP1 and the minimum face size MN specified by the user. For example, when the specified size MN is twice the size LA1 of the template TP1, γ = 1/2; when the specified size MN equals the size LA1 of the template TP1, γ = 1, that is, unit magnification.
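The two examples above are consistent with γ = LA1 / MN, which the following sketch assumes. The function name and the pixel values in the usage are hypothetical.

```python
def first_layer_magnification(template_face_size: float,
                              min_face_size: float) -> float:
    """Magnification gamma applied to the target image F0 to obtain the
    first-layer image F1, assuming gamma = LA1 / MN."""
    if template_face_size <= 0 or min_face_size <= 0:
        raise ValueError("sizes must be positive")
    return template_face_size / min_face_size


# Hypothetical sizes in pixels: a 20-pixel template face and a 40-pixel
# minimum face give gamma = 1/2; equal sizes give gamma = 1.
print(first_layer_magnification(20, 40))
print(first_layer_magnification(20, 20))
```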

Once the first-layer image F1 has been created in step S43, the intermediate-layer reference image creation step (step S44) creates (N − 1), that is, three, intermediate-layer reference images G2 (see FIG. 15). Specifically, the second-layer image F2 is first created by reducing the first-layer image F1 at the reduction ratio α. The third-layer image F3 is then created from the second-layer image F2 at the reduction ratio α, and images up to the Nth layer are created in sequence in the same way. With N = 4, reference images are therefore created at the reduction ratio α up to the fourth-layer image F4. The first-layer image F1 created in step S43 and the intermediate-layer reference images G2 (images F2 to F4) created in step S44 together constitute the reference images G3 (see FIG. 18). As described next, these reference images G3 serve as the basis for creating the reduced images. The images F1 to F4 that make up the reference images G3 have mutually different magnifications relative to the target image F0: γ, γ × α, γ × α², and γ × α³ times the target image F0, respectively.

When the intermediate-layer reference image creation step (step S44) ends, the procedure moves to the reduced image creation step (step S45). In step S45, the image F5 of the (N + 1)th layer, that is, the fifth layer, is first created from the first-layer image F1. Specifically, the fifth-layer image F5 is created by reducing the first-layer image F1 at the reduction ratio β (see FIG. 16). The number of reference images N, the reduction ratio α, and the reduction ratio β satisfy the relationship β = α^N. In this embodiment (N = 4, β = 1/2), the reduction ratio α is therefore determined automatically by this relationship as α ≈ 1/1.19. Although N = 4 in this embodiment, N may be any integer that is at least 2 and less than the number M of face sizes to be detected.
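The relationship β = α^N can be checked numerically. The sketch below, with hypothetical function names, derives α from β and N and computes the nominal magnification of each layer relative to F0 under the assumption that α is exact; it shows that reducing F1 by β lands exactly where a fifth reduction step by α would.

```python
def alpha_from_beta(beta: float, n: int) -> float:
    """Per-layer reduction ratio alpha satisfying beta = alpha ** n."""
    return beta ** (1.0 / n)


def layer_magnification(k: int, gamma: float, beta: float, n: int) -> float:
    """Nominal magnification of layer k (1-based) relative to F0,
    assuming an exact alpha. Layer k is reached by (k - 1) // n
    reductions by beta and (k - 1) % n reductions by alpha."""
    alpha = alpha_from_beta(beta, n)
    by_beta, by_alpha = divmod(k - 1, n)
    return gamma * (beta ** by_beta) * (alpha ** by_alpha)


alpha = alpha_from_beta(0.5, 4)
print(1 / alpha)  # about 1.1892, i.e. alpha is about 1/1.19
print([round(layer_magnification(k, 1.0, 0.5, 4), 4) for k in range(1, 10)])
```

With γ = 1, layers 1, 5, and 9 come out at 1, 1/2, and 1/4, so every layer keeps the same ratio α to its neighbor.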

When the creation of the reduced image in step S45 is finished, the total number of layers L determined in step S42 is compared with the number of layers Fk created so far (step S46). If this comparison shows that layer images for all L layers have been created, the pyramid image creation process ends (step S47); if the number of created layers Fk is less than the total number of layers L, 1 is added to Fk (step S48). For example, when Fk = 5, the procedure moves to step S48 and Fk becomes 6.

The reduced image creation (step S45) is then executed again for the new upper layer, the sixth layer. The sixth-layer image F6 is created from the second-layer image F2 by reducing it at the reduction ratio β (see FIG. 16). In the same way, the seventh-layer image F7 is created from the third-layer image F3 and the eighth-layer image F8 from the fourth-layer image F4, in sequence. The images of the ninth and subsequent layers are likewise each created from the image four layers below, and steps S45 to S48 are repeated until the number of layers reaches Fk = 14 (see FIG. 17).

As shown in FIG. 17, when the images for all L layers, that is, up to the 14th-layer image F14, have been created by this repetition, the pyramid image creation process ends (step S47), and the procedure moves to the search window determination process (step S24).

The pyramid image creation process (step S23) described above can be summarized with reference to FIG. 18 as follows. In step S43, the first-layer image F1, which corresponds to the reference image G1, is created. In step S44, the intermediate-layer reference images G2 are created from the reference image G1 (image F1) at the reduction ratio α. Then, in the repetition of steps S45 to S48, the reduced images GM1 (images F5 to F8) are created from the reference images G3 (images F1 to F4) at the reduction ratio β, the re-reduced images GM2 (images F9 to F12) are created from the reduced images GM1 at the reduction ratio β, and the re-re-reduced images GM3 (images F13 and F14) are created from the re-reduced images GM2.
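The creation order summarized above can be expressed as a small planning function. This is a sketch under the stated rules only (layer 1 from F0 by γ, layers 2 to N from the previous layer by α, every later layer from the layer N below it by β); the function name and the returned representation are assumptions.

```python
def pyramid_plan(m: int, s: int, n: int) -> dict:
    """Map each layer number (1-based) to (source layer, ratio name).
    Source layer 0 stands for the target image F0."""
    total_layers = m + s * n  # L = M + S x N
    plan = {}
    for k in range(1, total_layers + 1):
        if k == 1:
            plan[k] = (0, "gamma")      # F1 from F0
        elif k <= n:
            plan[k] = (k - 1, "alpha")  # intermediate reference images
        else:
            plan[k] = (k - n, "beta")   # reduced images, N layers below
    return plan


for layer, (source, ratio) in pyramid_plan(6, 2, 4).items():
    print(f"F{layer} <- F{source} x {ratio}")
```

For M = 6, S = 2, N = 4 this lists 14 layers, of which only F2 to F4 use the non-integer ratio α.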

Because every image created by this repetition is created with the reduction ratio β = 1/K (K is an integer of 2 or more), pyramid image creation is faster than it would be if values such as the reduction ratio α were used.

In the pyramid image MP, the degree of reduction increases toward the upper layers (in other words, the magnification relative to the original image F0 decreases), so the resolution becomes lower toward the upper layers. In this embodiment, because the total number of layers is L = 14, the re-re-reduced images GM3 consist of two images: the 13th-layer image F13 and the 14th-layer image F14, created from the ninth-layer image F9 and the tenth-layer image F10 of the re-reduced images GM2. If the total number of layers were L = 13 (M = 5, S = 2, N = 4), the re-re-reduced images GM3 would consist only of the 13th-layer image F13, created from the ninth-layer image F9 of the re-reduced images GM2.

In FIG. 18, the multi-size detection pyramid image H1, which is used to detect objects of multiple sizes in the target image, consists of six images F1 to F6 whose magnifications relative to the target image differ from one another. The high-speed detection pyramid image H2, which is used to detect each of the objects of multiple sizes in the target image at high speed, consists of the ten images F5 to F14. In detail, the high-speed detection pyramid image H2 contains six images F5 to F10, which correspond to the six images F1 to F6 of the multi-size detection pyramid image H1 each reduced at the predetermined magnification β, and six images F9 to F14, which correspond to the six images F5 to F10 each further reduced at the predetermined magnification β.

Because the multi-size detection pyramid image H1 and the high-speed detection pyramid image H2 are created by the method described above, some of their images (specifically, the images F5 and F6) have the same magnification relative to the target image in both pyramid images, and these images are created once and shared. This shortens the time required for pyramid image creation, as discussed further below.

In the pyramid image creation process (step S23) with N = 4 and β = 1/2, the relationship β = α^N strictly gives an irrational reduction ratio α = 1/1.1892..., but the embodiment described above adopts the rational value α = 1/1.19 as a value satisfying β = α^N. As a result, α^N does not exactly equal the reduction ratio β. This discrepancy is nevertheless within the allowable error range and causes no problem in the pyramid image creation process. As shown in FIG. 19, even if the uniform reduction ratio α = 1/1.19 is used to create the intermediate-layer reference images G2, the fifth-layer image F5 is in any case created from the first-layer image F1 at the reduction ratio β, and the images of the sixth and subsequent layers are likewise all created at the reduction ratio β, so the pyramid image creation is not affected.
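The size of this discrepancy is easy to quantify. The short calculation below uses only the values given above (N = 4, β = 1/2, α = 1/1.19); the variable names are illustrative.

```python
N = 4
BETA = 0.5
ALPHA_ROUNDED = 1 / 1.19

# Exact alpha from beta = alpha ** N, and the beta implied by the rounded alpha.
alpha_exact = BETA ** (1 / N)
beta_from_rounded = ALPHA_ROUNDED ** N
relative_error = abs(beta_from_rounded - BETA) / BETA

print(f"1 / alpha_exact   = {1 / alpha_exact:.4f}")
print(f"alpha_rounded ** N = {beta_from_rounded:.5f}")
print(f"relative error     = {relative_error:.3%}")
```

The rounded α reproduces β to within about 0.3 percent, and because layers 5 and above are derived by β rather than by repeated α steps, this error does not accumulate beyond the fourth layer.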

<Search window determination process>
Before the search window determination process (step S24) is described, the search method used in the detection apparatus 1 is explained in detail with reference to FIGS. 20, 21, and 22.

FIG. 20 is a diagram illustrating the correspondence between the detection information IF (IF2) and the pyramid image MP (MP2) when a face of the minimum size is detected. FIG. 21 is a diagram illustrating the attention layer in the pyramid image for each size to be detected. FIG. 22 is a diagram illustrating the layers of the pyramid image that each attention layer uses for detection.

As shown in FIG. 20, when a face of the minimum size specified by the user (size SZ1) is detected, three comparisons are made: the first-layer image F1 against the face detection template TP1, the fifth-layer image F5 against the reduced face template TP2, and the ninth-layer image F9 against the feature amount CH extracted from the re-reduced template TP3.

The layer that serves as the reference in this comparison is called the attention layer. The attention layer for detecting the size SZ1 is thus the first layer. As shown in FIG. 21, when a face of size SZ2 is detected, the attention layer is the second layer; in that case the layer compared with the reduced face template TP2 is the sixth layer, and the layer compared with the feature amount CH is the tenth layer. Likewise, when faces of sizes SZ3 to SZ6 are detected, the third to sixth layers are the respective attention layers. This correspondence is shown in detail in FIG. 22. For example, when a face of size SZ5 is detected, the attention layer (marked ◎ in FIG. 22) is the fifth layer, the layer compared with the reduced face template TP2 (marked ○) is the ninth layer, and the layer compared with the feature amount CH (marked △) is the 13th layer.
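The correspondence of FIG. 22 follows a simple pattern: the layers used for size SZi are i, i + N, and i + 2N. A sketch with a hypothetical function name, generalized to S template reductions:

```python
def layers_for_size(size_index: int, n: int = 4, s: int = 2) -> list:
    """Layers used to detect size SZ<size_index>: the attention layer,
    followed by the layers compared against each further-reduced template
    (or the feature amount extracted from it)."""
    return [size_index + j * n for j in range(s + 1)]


for i in range(1, 7):
    print(f"SZ{i}: layers {layers_for_size(i)}")
```

Each list is ordered from the attention layer (finest) to the coarsest layer, so a detector would walk it in reverse during the coarse-to-fine search.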

As shown in FIG. 22, when the attention layers are the first-layer image F1 to the sixth-layer image F6, the detection process requires 14 layers of the pyramid image, from the first-layer image F1 to the 14th-layer image F14. This matches the total number of pyramid images calculated in step S42.

Now suppose that the detection method of the comparative example in FIG. 8, which creates the multi-size detection pyramid image PE1 and the high-speed detection pyramid image PS1 separately, were used to detect six face sizes at high speed as in this embodiment. In that case, the multi-size detection pyramid image requires six layers of images for the attention layers, and the high-speed detection pyramid image requires two layers for each attention layer, that is, 6 × 2 = 12 layers of images. The total number Ln of pyramid images required for detection is therefore 12 + 6 = 18 layers. In general, for the detection method of the comparative example in FIG. 8, the total number Ln of pyramid images required for detection is given by Ln = M + M × S, where M is the number of face sizes to be detected and S is the number of template reductions.

The number L of pyramid images created by the detection apparatus 1 is smaller than in the detection method of FIG. 8 because the detection apparatus 1 creates a pyramid image MP (MP2) in which the multi-size detection pyramid image and the high-speed detection pyramid image are partially shared, which makes pyramid image creation more efficient. More specifically, as indicated by the fifth and sixth layers C1 and the ninth and tenth layers C2 in FIG. 22, the detection apparatus 1 creates the pyramid image so that an image used once in the detection process can be reused when a face of another size is detected, which reduces the total number of images to be created. In this case, (number of reference images N) < (number M of face sizes to be detected) holds, so L = M + S × N is smaller than Ln = M + M × S; that is, (number of images L created in this embodiment) < (number of images Ln created in the comparative example of FIG. 8).
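The saving can be verified by counting the distinct layers actually needed. The sketch below assumes, as in FIG. 22, that size SZi uses layers i, i + N, ..., i + S × N; the function names are hypothetical.

```python
def shared_pyramid_layers(m: int, s: int, n: int) -> set:
    """Distinct layers needed when images are shared between sizes."""
    used = set()
    for i in range(1, m + 1):
        used.update(i + j * n for j in range(s + 1))
    return used


def separate_pyramid_count(m: int, s: int) -> int:
    """Images needed when each size gets its own reduced images:
    Ln = M + M x S."""
    return m + m * s


m, s, n = 6, 2, 4
print(len(shared_pyramid_layers(m, s, n)))  # shared pyramid (this embodiment)
print(separate_pyramid_count(m, s))         # separate pyramids (FIG. 8)
```

For M = 6, S = 2, N = 4 the shared pyramid needs 14 images against 18 for separate pyramids; layers 5, 6, 9, and 10 are each used for two different sizes.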

The fifth-layer image F5 and the sixth-layer image F6 belong to both pyramid images (image groups), namely the multi-size detection pyramid image and the high-speed detection pyramid image, and each has the same magnification relative to the target image F0 in both groups. The pyramid image MP (MP2) is therefore a pyramid image in which the multi-size detection pyramid image and the high-speed detection pyramid image are partly shared.

As described above, the detection device 1 can detect objects of multiple sizes in an image at high speed. In addition, by using the pyramid images efficiently, it reduces the number of images to be created and shortens the time required to create them, which speeds up the detection process further.

The detection device 1 creates the fifth-layer image F5, which corresponds to a layer of interest, by reducing the first-layer image F1 at the reduction rate β, and likewise creates the sixth-layer image F6 by reducing the second-layer image F2 at the reduction rate β. Creating these images with a reduction rate β = 1/K (K is an integer of 2 or more) avoids the complex interpolation needed with a reduction rate such as α = 1/1.19, so the processing time for creating reduced images can be shortened.
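One way to see why β = 1/K is cheap is the following minimal sketch, which reduces a grayscale image by an integer factor K by averaging each K × K block. Block averaging is an assumption chosen for illustration; the text only states that complex interpolation is unnecessary. The function name is hypothetical.

```python
def reduce_by_integer_factor(image: list[list[float]], k: int) -> list[list[float]]:
    """Reduce a 2-D grayscale image by 1/k by averaging each k x k block.

    Every output pixel maps to a whole block of input pixels, so no
    fractional sampling positions (and no interpolation weights) are needed.
    Rows and columns that do not fill a whole block are dropped.
    """
    out_height = len(image) // k
    out_width = len(image[0]) // k
    reduced = []
    for by in range(out_height):
        row = []
        for bx in range(out_width):
            block_sum = sum(image[by * k + dy][bx * k + dx]
                            for dy in range(k) for dx in range(k))
            row.append(block_sum / (k * k))
        reduced.append(row)
    return reduced


f1 = [[1, 3, 5, 7],
      [1, 3, 5, 7],
      [2, 2, 6, 6],
      [4, 4, 8, 8]]
f5 = reduce_by_integer_factor(f1, 2)  # a reduction with beta = 1/2
print(f5)
```

With a non-integer ratio such as 1/1.19, each output pixel would instead fall between input pixels and require weighted interpolation.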

In contrast, the detection method of the comparative example of FIG. 8 does not make effective use of the reduction rate β when creating the images for the layers of interest.

In other words, the detection device 1 can shorten the processing time required to create the fifth-layer image F5 and the sixth-layer image F6, which speeds up the detection process.

Next, the order of comparison detection is described. Comparison detection is first performed between the most reduced images, that is, between the ninth-layer image F9 and the re-reduced template TP3 (CP1). More specifically, the feature amount (skin-color frequency) extracted from the image F9 is compared with the feature amount CH (skin-color frequency) extracted from the re-reduced template TP3. This first comparison detection CP1 is performed over the entire area of the image F9. If it detects a face of the size to be detected, comparison detection is performed between the fifth-layer image F5 and the reduced face template TP2 on the basis of the detected position (CP2). If the second comparison detection CP2 also detects the face, comparison detection is performed between the first-layer image F1 and the face detection template TP1 on the basis of the detected position (CP3). If the third comparison detection CP3 also detects the face, the result is output that a face of the size to be detected is present in the target image.

As described above, performing comparison detection first over the entire reduced image to identify the approximate position of the object to be detected shortens the processing time required for detection.

Next, the search window determination process is described with reference to FIGS. 20, 23, and 24.

FIG. 23 is a diagram showing the detection procedure when the layer of interest is the first layer. FIG. 24 is a diagram showing the shift process of the search window. In FIG. 24, the upper-left corner of the image F9 is the reference position P(1, 1) and the lower-right corner is the position P(Xm, Ym). The position of each search window is represented by the coordinates of its upper-left corner.

As shown in FIG. 12, when the search window determination process starts (step S51), the search window in which the matching process described later is executed is determined (step S52).

Like FIG. 20, FIG. 23 illustrates the detection method when the layer of interest is the first layer. As described above, the process CP1 executed first is performed between the most reduced images, that is, between the ninth-layer image F9 and the feature amount CH extracted from the re-reduced template TP3. More specifically, a search window IW with an area equal in size to the re-reduced template TP3 is set at the upper-left corner of the image F9. When the setting of the search window IW is finished (step S53), the process moves to the matching process (step S25) described later, in which matching using the feature amount CH is performed in the set search window IW. After the matching process ends, the search window determination process starts again (step S51), and the location shifted by one pixel in the x-axis direction from the previous matching position is set as the new search window IW (step S52). Steps S51 to S53 are repeated until the search window IW has been shifted over, and has covered, the whole area of the ninth-layer image F9.

The shift process of the search window IW is described in more detail with reference to FIG. 24. The search window IW is shifted one pixel at a time in the x-axis direction from the initial position P(1, 1) at the upper-left corner of the ninth-layer image F9 to the upper-right corner position P(Xm, 1) (see FIG. 24(a)). When the search window IW reaches the upper-right corner position P(Xm, 1), it is set at the position P(1, 2), shifted by one pixel in the y-axis direction from the initial position P(1, 1). Scanning then proceeds in the same way in the x direction from the position P(1, 2), and the search window IW finally moves to the position P(Xm, Ym). That is, the search window IW is scanned with the x direction as the main scanning direction and the y direction as the sub-scanning direction. The shift width is not limited to one pixel and may be changed to an appropriate value in view of the detection processing time, the detection accuracy, and so on. Increasing the shift width makes detection coarser but shortens the detection processing time.
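The scan order described above can be sketched as a small generator. The function and parameter names are hypothetical; positions are 1-based as in FIG. 24, and the last positions in x and y are passed in by the caller rather than derived from the window size.

```python
from typing import Iterator


def scan_positions(x_last: int, y_last: int, shift: int = 1) -> Iterator[tuple[int, int]]:
    """Yield search window positions in raster order.

    x is the main scanning direction and y the sub-scanning direction:
    the window moves along x from 1 to x_last, then y advances by `shift`
    and the scan restarts at x = 1.
    """
    for y in range(1, y_last + 1, shift):
        for x in range(1, x_last + 1, shift):
            yield (x, y)


print(list(scan_positions(3, 2)))           # one-pixel shift
print(list(scan_positions(4, 4, shift=2)))  # coarser but faster scan
```

A larger `shift` visits fewer positions, which matches the trade-off between detection accuracy and processing time noted above.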

<Matching process>
The matching process (step S25) is described below with reference to FIGS. 25 and 26.

FIG. 25 is a diagram showing, in one dimension, how the window shifts in the low-resolution image and in the high-resolution image.

FIG. 26 is a diagram showing, in two dimensions, how the window shifts in the high-resolution image.

As shown in FIG. 13, when the matching process starts (step S61), skin-color area determination is executed in the search window IW (step S62). Skin-color area determination corresponds to the first comparison detection CP1 described above (see FIG. 23): the feature amount CH, that is, the ratio of skin-color pixels to the whole area of the re-reduced template TP3, is compared with the ratio of skin-color pixels in the area occupied by the search window IW. For example, the red (R), green (G), and blue (B) primary-color information of the search window IW is converted into the hue (H), saturation (S), and value (V) color system, each pixel is judged by whether its hue (H) falls within a preset range indicating skin color, and the number of skin-color pixels in the search window IW is counted. A skin-color area is judged to have been detected if that number is at or above a threshold determined from the number of skin-color pixels of the feature amount CH.
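The hue-based test can be sketched with Python's standard `colorsys` module. The hue range and the factor that turns the template's skin-pixel count into a threshold are assumptions made for illustration; the text only says that both are set in advance. The function names are hypothetical.

```python
import colorsys

# Assumed hue range for skin color, as fractions of the hue circle (0.0 to 1.0).
SKIN_HUE_RANGE = (0.0, 0.14)


def skin_pixel_count(pixels: list[tuple[int, int, int]],
                     hue_range: tuple[float, float] = SKIN_HUE_RANGE) -> int:
    """Count pixels whose hue lies in the skin-color range.

    `pixels` holds 8-bit (R, G, B) triples. Only hue is tested, as in the
    description, so achromatic pixels (hue reported as 0.0) also pass.
    """
    low, high = hue_range
    count = 0
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if low <= hue <= high:
            count += 1
    return count


def has_skin_area(window_pixels: list[tuple[int, int, int]],
                  template_skin_count: int, factor: float = 0.8) -> bool:
    """Return True if the window has at least `factor` times the template's skin pixels.

    `factor` is an assumed way of deriving the threshold from the feature amount CH.
    """
    return skin_pixel_count(window_pixels) >= factor * template_skin_count


window = [(220, 170, 140)] * 3 + [(0, 0, 255)]
print(skin_pixel_count(window), has_skin_area(window, template_skin_count=3))
```

A practical detector would usually also constrain saturation and value; the sketch keeps to the hue-only test described in the text.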

If no skin-color area is detected in the skin-color area determination (step S62) (step S63), the matching process ends (step S64) and the process moves via step S26 to determining the next search window IW (see FIG. 9). If a skin-color area is detected (step S63), the comparison detection CP2 between the fifth-layer image F5 and the reduced face template TP2 described above is performed on the basis of the detected position (step S65).

If step S65 does not detect a face of the size to be detected (step S66), the matching process ends (step S67) and the process moves via step S26 to determining the next search window IW (see FIG. 9). If a face of the size to be detected is detected (step S66), comparison detection CP3 by template matching is performed between the first-layer image F1 and the face detection template TP1 on the basis of the detected position (step S68).

Because the first comparison detection CP1 performs skin-color area determination using the feature amount CH, template matching can be skipped for search windows IW that contain no skin-color area, and the detection process can be performed at high speed. That is, the areas in which a face is likely to be present can be narrowed down from the whole image by comparison processing on a low-resolution image. In other words, areas judged to contain no face by the skin-color comparison can be excluded in advance from the comparison processing on high-resolution images.

When step S68 ends, the matching process ends (step S69) and the process moves to step S26. If a face of the size to be detected is detected in step S68, the detection information (for example, the detected position or size) is stored in the RAM 3b as output information and is output after all processing has finished (step S27).

The matching process is described in detail below using a specific example.

For example, suppose that in step S62 a skin-color area is detected at the position P(v, w) in the ninth-layer image F9 in FIG. 23. Comparison detection CP2 by template matching with the reduced face template TP2 is then performed on the basis of the position P(v, w) in the fifth-layer image F5. Specifically, as shown in FIG. 25, in the comparison detection CP1 the search window IW shifts to the position Pv−2, the position Pv−1, and then the position Pv. If a skin-color area is detected in the skin-color area determination at the position Pv (step S62), the process moves to the comparison detection CP2 between the fifth-layer image F5 and the reduced face template TP2 (step S65), and template matching is executed at two points, the position Pv and the position Pv+0.5.

Here, the image F9 used in the comparison detection CP1 is the image F5 used in the comparison detection CP2 scaled by 1/2. A shift of one pixel in the image F5 therefore corresponds to 0.5 pixel in the image F9; equivalently, the one-pixel shift of the search window IW in the image F9 corresponds to two pixels in the image F5.

Therefore, if the shift width of the template matching performed in the comparison detection CP2 is one pixel, the positions at which template matching is executed in the comparison detection CP2 are, in one dimension, the two points shown in FIG. 25. In the two dimensions actually used, they are the positions corresponding to the four points P(v, w), P(v+0.5, w), P(v, w+0.5), and P(v+0.5, w+0.5) in the image F9. FIG. 26 shows the shifts among these four points and corresponds to an enlarged view of the area PR2 in FIG. 23. In the image F5, template matching is performed at the position PA, which corresponds to P(v, w) in the image F9, and then at the position PB, shifted by one pixel in the x-axis direction from the position PA (corresponding to P(v+0.5, w) in the image F9) (see FIG. 26(a)). Template matching is then performed at the position PC, shifted by one pixel in the y-axis direction from the position PA (corresponding to P(v, w+0.5) in the image F9), and then at the position PD, shifted by one pixel in the x-axis direction from the position PC (corresponding to P(v+0.5, w+0.5) in the image F9) (see FIG. 26(b)).
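The mapping from one detected position in the low-resolution image to the positions examined in the next, higher-resolution image can be sketched as follows. The function name is hypothetical, and the sketch assumes 0-based pixel coordinates so that a position (v, w) maps to (2v, 2w) when the resolution doubles; the text itself uses 1-based positions.

```python
def refine_positions(v: int, w: int, scale: int = 2, shift: int = 1) -> list[tuple[int, int]]:
    """Return the positions to test in the higher-resolution image.

    (v, w) is a detected position in the lower-resolution image, `scale` is
    the resolution ratio between the two images (2 for F9 -> F5), and
    `shift` is the shift width used in the higher-resolution image.
    The order is PA, PB (x shift), PC (y shift), PD.
    """
    base_x, base_y = v * scale, w * scale
    return [(base_x + dx, base_y + dy)
            for dy in range(0, scale, shift)
            for dx in range(0, scale, shift)]


positions_in_f5 = refine_positions(3, 5)
print(positions_in_f5)
# The same positions expressed in F9 coordinates are half-pixel steps.
print([(x / 2, y / 2) for x, y in positions_in_f5])
```

Dividing the returned coordinates by the scale gives the half-pixel positions P(v, w), P(v+0.5, w), P(v, w+0.5), and P(v+0.5, w+0.5) named in the text.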

Template matching calculates a correlation value between the images being compared (for example, the reduced face template TP2 and the comparison area of the fifth-layer image F5) and compares it with a preset threshold to decide whether a face of the size to be detected is present. For example, when the detection template has 100 × 100 pixels, the absolute difference of the pixel signal (here, color-difference information) is computed for each of the 10,000 pairs of corresponding pixels, and the mean of the resulting 10,000 absolute differences can be used as the correlation value. This correlation value is small when corresponding pixels show the same part of the subject and large when they show different parts.
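The correlation value described above is a mean absolute difference, which can be sketched as follows. The function names are hypothetical, and treating "at or below the threshold" as a match is an assumption that follows from a smaller value meaning greater similarity.

```python
def mean_abs_difference(template: list[list[float]], region: list[list[float]]) -> float:
    """Mean of the absolute differences between corresponding pixel values.

    `template` and `region` are 2-D arrays of the same shape holding one
    pixel signal per pixel (for example, a color-difference component).
    """
    total = 0.0
    count = 0
    for template_row, region_row in zip(template, region):
        for t, r in zip(template_row, region_row):
            total += abs(t - r)
            count += 1
    return total / count


def is_match(template: list[list[float]], region: list[list[float]],
             threshold: float) -> bool:
    """Assumed decision rule: a small correlation value indicates a match."""
    return mean_abs_difference(template, region) <= threshold


template = [[10, 20], [30, 40]]
similar = [[12, 18], [30, 44]]
different = [[90, 80], [70, 60]]
print(mean_abs_difference(template, similar), mean_abs_difference(template, different))
```

For a 100 × 100 template the same loop simply averages 10,000 differences.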

Although the shift width of the template matching performed on the image F5 is one pixel here, it is not limited to this. In general, let sa be the shift width adopted in a reduced image AA, and let sa2 be that shift width converted to the image BB from which the reduced image AA was derived. If the shift width sb used in the image BB is chosen so that it does not exceed sa2, detection in the image BB is at least as precise as the detection performed in the reduced image AA. That is, if the shift width of the search window IW in the image F9 is one pixel, the shift width in the image F5 may be set to two pixels or less.

The above has described in detail the matching performed in step S65 on the basis of the position at which a skin-color area was detected in step S62. The same processing is performed when step S65 in turn detects a face of the size to be detected. That is, in step S68, comparison detection CP3 by template matching between the first-layer image F1 and the face detection template TP1 is executed at the four points obtained from the position detected in step S65. For example, if the face is detected in step S65 at the position PW2 (PC) in the fifth-layer image F5, comparison detection CP3 by template matching is performed at the four points shown in the area PR1 in FIG. 23, which are obtained from the position PW1 in the first-layer image F1 corresponding to the position PW2.

As described above, in the operation of detecting an object of the size to be detected in the image of a given layer of interest (hereinafter simply called the image of interest), that is, in the operation of detecting an object of one particular size, the object is detected by comparison processing using templates, applied to the image of interest and the images obtained by reducing it at the predetermined magnification β (images in the high-speed detection pyramid image H2), taken in order starting from the lowest-resolution image. For example, to detect a face of the minimum size SZ1, comparison processing using the low-resolution image F9 and the template TP3 (comparison detection CP1) is performed first, then comparison processing using the medium-resolution image F5 and the template TP2 (comparison detection CP2), and finally comparison processing using the high-resolution image F1 and the template TP1 (comparison detection CP3). This determines whether a face of the minimum size SZ1 is present in the target image and, if so, where it is. To detect a face of the larger size SZ2, comparison processing using the low-resolution image F10 and the template TP3 (comparison detection CP1) is performed first, then comparison processing using the medium-resolution image F6 and the template TP2 (comparison detection CP2), and finally comparison processing using the high-resolution image F2 and the template TP1 (comparison detection CP3). This detects, among other things, the position of a face of the size SZ2. The same applies to the other sizes SZ3 to SZ6.
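The three-stage, low-to-high-resolution flow for one face size can be sketched as a generic cascade. Everything here is hypothetical scaffolding: each stage is passed in as a predicate standing in for CP1, CP2, or CP3, coordinates are 0-based, and each finer image is assumed to have twice the resolution of the previous one.

```python
from typing import Callable, Iterable

Position = tuple[int, int]
Stage = Callable[[int, int], bool]


def detect_at_size(coarse_positions: Iterable[Position],
                   stages: list[Stage], scale: int = 2) -> list[Position]:
    """Run comparison stages from the lowest to the highest resolution.

    `stages[0]` is applied at every coarse position. Each later stage is
    applied only around positions that passed the previous stage, mapped
    into the finer image. Returns hits in the finest image.
    """
    candidates = list(coarse_positions)
    hits: list[Position] = []
    for level, stage in enumerate(stages):
        hits = [(x, y) for x, y in candidates if stage(x, y)]
        if level < len(stages) - 1:
            # Map each hit to the scale x scale positions it covers in the finer image.
            candidates = [(x * scale + dx, y * scale + dy)
                          for x, y in hits
                          for dy in range(scale) for dx in range(scale)]
    return hits


# Toy stand-ins for CP1 to CP3: a single "face" at (13, 9) in the finest image.
cp1 = lambda x, y: (x, y) == (3, 2)
cp2 = lambda x, y: (x, y) == (6, 4)
cp3 = lambda x, y: (x, y) == (13, 9)
grid = [(x, y) for y in range(4) for x in range(5)]
print(detect_at_size(grid, [cp1, cp2, cp3]))
```

Only the coarsest stage scans the full grid; the finer stages examine four positions per surviving hit, which is where the speed-up comes from. Repeating the call with the images and templates for each size covers SZ1 to SZ6.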

In this way, an object of the size to be detected in an image of interest is detected by comparison processing using templates and the image of interest selected from the multi-size detection pyramid image H1 (see FIG. 18).

By repeating this operation for a plurality of images of interest, objects of a plurality of sizes (six in the above example, SZ1 to SZ6) are detected.

<Others>
An embodiment of the present invention has been described above, but the present invention is not limited to what is described above.

For example, in the above embodiment, skin-color area determination is performed in the comparison detection CP1 and template matching is performed in the comparison detections CP2 and CP3, but the invention is not limited to this. Specifically, template matching may be executed in all of the comparison detections CP1 to CP3, and a determination using a feature amount, such as skin-color area determination, may be executed before the template matching in each of the comparison detections CP1 to CP3. Executing such a simple determination before template matching makes it possible to skip matching for search windows that contain no skin-color area, so the detection process can be executed at high speed.

In the above embodiment, the reduction rate α is set to a value that satisfies the relationship β = α^N, but it is not limited to this. Specifically, the reduction rate α is the value used to create the reference image GM3, and as long as the reduction rate β is constant, α does not affect creation of the pyramid image. The reduction rate α may therefore be set to any value, including one that does not satisfy the relationship β = α^N.
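As a worked example of the relationship β = α^N, the sketch below solves for α given β and N. The values β = 1/2 and N = 4 are assumptions inferred from the layer numbering used in the description (F5 is F1 reduced by β, and F9 is half the size of F5); the function name is hypothetical.

```python
def first_reduction_rate(beta: float, n: int) -> float:
    """Solve beta = alpha ** n for alpha, i.e. alpha = beta ** (1 / n)."""
    return beta ** (1.0 / n)


beta = 1 / 2   # second reduction rate, 1/K with K = 2
n = 4          # assumed number of reference images per beta step
alpha = first_reduction_rate(beta, n)

# Magnification of each reference layer relative to the first-layer image F1.
magnifications = [alpha ** i for i in range(n + 1)]
print(round(1 / alpha, 2), [round(m, 3) for m in magnifications])
```

With these values, 1/α rounds to 1.19, which agrees with the example rate α = 1/1.19 given earlier in the description, and the fifth entry of `magnifications` equals β.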

Furthermore, in the above embodiment, the user specifies the face size on the operation screen Q shown in FIG. 4 by selecting from preset face sizes, but the invention is not limited to this. Specifically, because the reduction rate α can be set to any value as described above, if the predetermined magnification γ used to create the first-layer image F1 from the target image F0 (original image) can also be set to any value, the user can freely set the face size itself. That is, when the reference image GM3 is created, the predetermined magnification γ and the reduction rate α may be determined so that an image of the layer of interest for detecting the face size specified by the user can be created.

FIG. 1 is a diagram showing an overview of the detection device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the functions of the detection device.
FIG. 3 is a diagram showing an image with several people as subjects.
FIG. 4 is a diagram showing the operation screen displayed while the program runs.
FIG. 5 is a diagram schematically showing an example of the detection process executed in the detection device.
FIG. 6 is a schematic diagram of a detection process that uses a pyramid image to detect objects of a plurality of sizes in an original image.
FIG. 7 is a schematic diagram of a detection process that detects a predetermined object in an image at high speed.
FIG. 8 is a schematic diagram of a detection process according to a comparative example.
FIG. 9 is a flowchart showing the overall operation of the detection device.
FIG. 10 is a flowchart showing the information setting process.
FIG. 11 is a flowchart showing the pyramid image creation process.
FIG. 12 is a flowchart showing the search window determination process.
FIG. 13 is a flowchart showing the matching process.
FIG. 14 is a diagram schematically showing the information setting process.
FIG. 15 is a diagram showing reference image creation in the pyramid image creation process.
FIG. 16 is a diagram showing reduced image creation in the pyramid image creation process.
FIG. 17 is a diagram of the completed pyramid image.
FIG. 18 is a diagram showing the image types in the pyramid image.
FIG. 19 is a diagram showing the reduction ratios between the images in the pyramid image.
FIG. 20 is a diagram showing the correspondence between the detection information and the pyramid image when the smallest face size is detected.
FIG. 21 is a diagram showing the layers of interest in the pyramid image that correspond to each size to be detected.
FIG. 22 is a diagram showing the layers of the pyramid image that each layer of interest uses for detection.
FIG. 23 is a diagram showing the detection procedure when the layer of interest is the first layer.
FIG. 24 is a diagram showing the shift process of the search window.
FIG. 25 is a diagram showing, in one dimension, how the window shifts in the low-resolution image and in the high-resolution image.
FIG. 26 is a diagram showing, in two dimensions, how the window shifts in the high-resolution image.

Explanation of symbols

1 Detection device
3 Storage unit
3a Hard disk drive (HDD)
3b RAM
5 Display unit
6 Input unit
6a Keyboard
6b Mouse
11 Information setting unit
12 Pyramid image creation unit
13 Search window determination unit
14 Matching processing unit

Claims

A detection device for detecting an object present in a target image, comprising:
hierarchical image group creating means for creating a hierarchical image group that includes a first image group, which is used to detect objects of a plurality of sizes in the target image and has a plurality of images with mutually different magnifications relative to the target image, and a second image group, which has a plurality of images each corresponding to one of the plurality of images in the first image group reduced at a predetermined magnification; and
object detection means for detecting the object using the hierarchical image group,
wherein the hierarchical image group is created so that the first image group and the second image group share some of their images, the shared images having the same magnification relative to the target image in both image groups.

The detection device according to claim 1,
wherein the hierarchical image group creating means creates a plurality of reference images having mutually different magnifications relative to the target image and a plurality of reduced images obtained by further reducing each of the plurality of reference images at the predetermined magnification, thereby creating a hierarchical image group that includes the plurality of reference images and the plurality of reduced images, and
some of the plurality of reduced images are created as shared images belonging to both the first image group and the second image group.

The detection device according to claim 2,
wherein the hierarchical image group creating means includes:
determining means for determining a first reduction rate α and a second reduction rate β so as to satisfy the relationship β = α^N (N is an integer of 2 or more);
means for creating the plurality of reference images, including a first reference image and (N−1) intermediate layer reference images, by creating the first reference image based on the target image and creating the (N−1) intermediate layer reference images by sequentially reducing the first reference image at an inter-image reduction rate equal to the first reduction rate α; and
means for creating the plurality of reduced images by further reducing each of the plurality of reference images at the second reduction rate β.
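
The relationship between the two reduction rates can be illustrated numerically. The following Python sketch is not part of the claims. It assumes that the first reference image has magnification 1 relative to the target image, and all function names are hypothetical. It derives α from a chosen β and N and lists the magnifications of the reference images and the reduced images.

```python
import math


def determine_rates(beta, n):
    """Return (alpha, beta) such that beta == alpha ** n, with n >= 2."""
    if n < 2:
        raise ValueError("N must be an integer of 2 or more")
    if not 0.0 < beta < 1.0:
        raise ValueError("a reduction rate must lie strictly between 0 and 1")
    alpha = beta ** (1.0 / n)
    return alpha, beta


def reference_magnifications(alpha, n):
    """Magnifications of the first reference image and the (N-1) intermediate layers.

    Assumption: the first reference image has magnification 1.
    """
    return [alpha ** k for k in range(n)]


def reduced_magnifications(references, beta):
    """Magnifications after reducing every reference image once at rate beta."""
    return [m * beta for m in references]


alpha, beta = determine_rates(beta=0.5, n=4)
references = reference_magnifications(alpha, n=4)
reduced = reduced_magnifications(references, beta)
```

Because β = α^N, the first reduced image has magnification α^N, which is exactly one α step below the last intermediate layer reference image (α^(N−1)). The reference and reduced images therefore form a single sequence with a constant ratio α between neighbors.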

The detection device according to claim 3,
wherein the hierarchical image group creating means further includes
means for creating at least one re-reduced image by further reducing one of, or each of, the plurality of reduced images at the second reduction rate β.
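
How re-reduction leads to shared images can be seen by tracking each image's magnification as an exponent of α: an image with exponent k has magnification α^k, and one reduction at β = α^N adds N to the exponent. The Python sketch below is an illustration only. The assignment of images to the two groups (first group: every image that is reduced again; second group: each first-group image reduced once at β) is an assumption made for this example, and the function names are hypothetical.

```python
def hierarchy_exponents(n, rounds):
    """Exponents of alpha for each layer of the hierarchy.

    Layer 0 holds the N reference images; layer r holds the images obtained
    by r successive reductions at beta = alpha ** n.
    """
    return [[k + r * n for k in range(n)] for r in range(rounds + 1)]


def split_groups(layers):
    """Split the layers into the two image groups (assumed membership)."""
    n = len(layers[0])
    # First group: every image that has a further-reduced counterpart.
    first_group = {k for layer in layers[:-1] for k in layer}
    # Second group: each first-group image reduced once at beta (exponent + n).
    second_group = {k + n for k in first_group}
    return first_group, second_group


layers = hierarchy_exponents(n=3, rounds=2)
first, second = split_groups(layers)
shared = sorted(first & second)      # images that belong to both groups
total_images = len(first | second)   # images that actually have to be created
```

With N = 3 and two rounds of reduction, the two groups contain six images each, but only nine distinct images are needed because the three once-reduced images serve in both groups.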

The detection device according to claim 3 or 4,
wherein the second reduction rate β is 1/K (K is an integer of 2 or more).

The detection device according to any one of claims 1 to 5,
wherein the object detection means
detects the objects of the plurality of sizes by repeating an operation of detecting an object of the size corresponding to a selected image, the selected image being an image selected from the first image group, through a comparison process using the selected image and a template, and
in the operation of detecting the object of the size corresponding to the selected image, detects the object in order starting from the lower-resolution image among the selected image and the image in the second image group obtained by reducing the selected image at the predetermined magnification, thereby efficiently detecting the object of that size.
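
The coarse-to-fine order described in this claim can be sketched as follows. This is a minimal Python illustration under several assumptions that are not stated in the claim: the comparison process is a sum of absolute differences (SAD), the predetermined magnification is 1/2, the template is reduced at the same rate for the low-resolution pass, and the high-resolution pass searches only within ±1 pixel of the scaled-up coarse position. All function names are hypothetical.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and an image window."""
    return sum(
        abs(image[top + y][left + x] - template[y][x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def best_match(image, template, rows, cols):
    """Return (score, top, left) with the lowest SAD among candidate positions.

    Candidates whose window would leave the image are skipped; None is
    returned when no candidate is valid.
    """
    max_top = len(image) - len(template)
    max_left = len(image[0]) - len(template[0])
    best = None
    for top in rows:
        for left in cols:
            if 0 <= top <= max_top and 0 <= left <= max_left:
                score = sad(image, template, top, left)
                if best is None or score < best[0]:
                    best = (score, top, left)
    return best


def halve(image):
    """Reduce an image at rate 1/2 by averaging 2 x 2 blocks."""
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
            for x in range(len(image[0]) // 2)
        ]
        for y in range(len(image) // 2)
    ]


def coarse_to_fine(selected_image, template):
    """Search the low-resolution image first, then refine at full resolution."""
    small_image = halve(selected_image)
    small_template = halve(template)
    # Coarse pass: exhaustive search over the low-resolution image.
    _, coarse_top, coarse_left = best_match(
        small_image, small_template,
        range(len(small_image)), range(len(small_image[0])),
    )
    # Fine pass: only positions near the scaled-up coarse result.
    rows = range(2 * coarse_top - 1, 2 * coarse_top + 2)
    cols = range(2 * coarse_left - 1, 2 * coarse_left + 2)
    return best_match(selected_image, template, rows, cols)
```

The exhaustive search runs only on the reduced image, which has a quarter of the pixels, while the full-resolution image is examined at no more than nine positions. Repeating `coarse_to_fine` for each image selected from the first image group would cover the different object sizes.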

A detection method for detecting an object present in a target image,
the detection method including:
a hierarchical image group creating step of creating a hierarchical image group including a first image group and a second image group, the first image group being used to detect objects of a plurality of sizes in the target image and having a plurality of images with different magnifications relative to the target image, and the second image group having a plurality of images corresponding to a state in which each of the plurality of images in the first image group is reduced at a predetermined magnification; and
an object detection step of detecting the object using the hierarchical image group,
wherein the hierarchical image group is created with a part of the images shared between the first image group and the second image group, the shared images having the same magnification with respect to the target image in both image groups.

The detection method according to claim 7,
wherein the hierarchical image group creating step creates the hierarchical image group, which includes a plurality of reference images and a plurality of reduced images, by creating the plurality of reference images with different magnifications relative to the target image and creating the plurality of reduced images by further reducing each of the plurality of reference images at a predetermined magnification, and
a part of the plurality of reduced images is created as shared images belonging to both the first image group and the second image group.

The detection method according to claim 8, wherein
the hierarchical image group creating step includes:
a determination step of determining a first reduction rate α and a second reduction rate β so as to satisfy the relationship β = α^N (N is an integer of 2 or more);
a step of creating the plurality of reference images, including a first reference image and (N−1) intermediate layer reference images, by creating the first reference image based on the target image and creating the (N−1) intermediate layer reference images by sequentially reducing the first reference image at an inter-image reduction rate equal to the first reduction rate α; and
a step of creating the plurality of reduced images by further reducing each of the plurality of reference images at the second reduction rate β.

The detection method according to claim 9, wherein
the hierarchical image group creating step further includes
a step of creating at least one re-reduced image by further reducing one of, or each of, the plurality of reduced images at the second reduction rate β.

The detection method according to claim 9 or 10,
wherein the second reduction rate β is 1/K (K is an integer of 2 or more).

The detection method according to any one of claims 7 to 11,
wherein the object detection step includes
detecting the objects of the plurality of sizes by repeating an operation of detecting an object of the size corresponding to a selected image, the selected image being an image selected from the first image group, through a comparison process using the selected image and a template, and
in the operation of detecting the object of the size corresponding to the selected image, detecting the object in order starting from the lower-resolution image among the selected image and the image in the second image group obtained by reducing the selected image at the predetermined magnification, thereby efficiently detecting the object of that size.

A program for detecting an object present in a target image by causing a computer to execute:
a hierarchical image group creating step of creating a hierarchical image group including a first image group and a second image group, the first image group being used to detect objects of a plurality of sizes in the target image and having a plurality of images with different magnifications relative to the target image, and the second image group being used to detect each of the objects of the plurality of sizes at high speed and having a plurality of images corresponding to a state in which each of the plurality of images in the first image group is reduced at a predetermined magnification; and
an object detection step of detecting the object using the hierarchical image group,
wherein the hierarchical image group is created with a part of the images shared between the first image group and the second image group, the shared images having the same magnification with respect to the target image in both image groups.

The program according to claim 13, wherein
the hierarchical image group creating step creates the hierarchical image group, which includes a plurality of reference images and a plurality of reduced images, by creating the plurality of reference images with different magnifications relative to the target image and creating the plurality of reduced images by further reducing each of the plurality of reference images at a predetermined magnification, and
a part of the plurality of reduced images is created as shared images belonging to both the first image group and the second image group.

The program according to claim 14, wherein
the hierarchical image group creating step causes the computer to execute:
a determination step of determining a first reduction rate α and a second reduction rate β so as to satisfy the relationship β = α^N (N is an integer of 2 or more);
a step of creating the plurality of reference images, including a first reference image and (N−1) intermediate layer reference images, by creating the first reference image based on the target image and creating the (N−1) intermediate layer reference images by sequentially reducing the first reference image at an inter-image reduction rate equal to the first reduction rate α; and
a step of creating the plurality of reduced images by further reducing each of the plurality of reference images at the second reduction rate β.

The program according to claim 15, wherein
the hierarchical image group creating step further causes the computer to execute
a step of creating at least one re-reduced image by further reducing one of, or each of, the plurality of reduced images at the second reduction rate β.

The program according to claim 15 or 16,
wherein the second reduction rate β is 1/K (K is an integer of 2 or more).

The program according to any one of claims 13 to 17,
wherein the object detection step causes the computer to execute
detecting the objects of the plurality of sizes by repeating an operation of detecting an object of the size corresponding to a selected image, the selected image being an image selected from the first image group, through a comparison process using the selected image and a template, and
in the operation of detecting the object of the size corresponding to the selected image, detecting the object in order starting from the lower-resolution image among the selected image and the image in the second image group obtained by reducing the selected image at the predetermined magnification, thereby efficiently detecting the object of that size.