JP2014142760A

JP2014142760A - Image detection device, control program and image detection method

Info

Publication number: JP2014142760A
Application number: JP2013010092A
Authority: JP
Inventors: Kenta Nishiyuki; 健太西行
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2013-01-23
Filing date: 2013-01-23
Publication date: 2014-08-07
Anticipated expiration: 2033-01-23
Also published as: JP6030457B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for reducing the amount of processing performed on a processing target image from which a detection target image is to be detected. SOLUTION: A detection unit 3 performs, on a captured image, detection processing that detects, as a detection result region, a region highly likely to be a detection target image of the same size as a detection frame. The detection unit 3 distributes a plurality of types of detection frames of different sizes over a plurality of captured images, including the processing target image, so that at least one type of detection frame is associated with each captured image. The detection unit 3 performs the detection processing on each captured image using the at least one type of detection frame associated with it. A detection target image specifying unit 7 specifies the detection target image in the processing target image based on the detection result regions that the detection unit 3 detected for the plurality of captured images.

Description

The present invention relates to a technique for detecting a detection target image from a processing target image.

Patent Documents 1 to 3 disclose techniques for detecting a detection target image from a processing target image.

Patent Document 1: JP 2008-27058 A
Patent Document 2: JP 2009-175821 A
Patent Document 3: JP 2011-221791 A

When a detection target image is detected from a processing target image, it is desirable to reduce the amount of processing performed on the processing target image.

The present invention has been made in view of the above, and its object is to provide a technique capable of reducing the amount of processing performed on a processing target image from which a detection target image is to be detected.

To solve the above problem, one aspect of an image detection device according to the present invention is an image detection device that detects a detection target image from a processing target image. The device includes a detection unit that performs, on a captured image and using a detection frame, detection processing that detects, as a detection result region, a region highly likely to be the detection target image of the same size as the detection frame, and a detection target image specifying unit that specifies the detection target image in the processing target image. The detection unit distributes a plurality of types of detection frames of mutually different sizes over a plurality of captured images, which include the processing target image and were captured at mutually different timings, so that at least one type of detection frame corresponds to each of the captured images. The detection unit then performs the detection processing on each of the captured images using the at least one type of detection frame associated with that captured image. The detection target image specifying unit specifies the detection target image in the processing target image based on the detection result regions that the detection unit detected for the plurality of captured images.

In one aspect of the image detection device according to the present invention, the plurality of types of detection frames include a reference detection frame of a reference size and a non-reference detection frame whose size differs from the reference size. When performing the detection processing on a captured image using the non-reference detection frame, the detection unit resizes the non-reference detection frame so that its size matches the reference size, and resizes the captured image in accordance with the resizing of the non-reference detection frame. The detection unit then moves the resized detection frame (the non-reference detection frame after resizing) over the resized image (the captured image after resizing) and determines whether the image inside the resized detection frame is highly likely to be the detection target image.
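The resizing described above can be illustrated with a minimal sketch. It assumes square detection frames and a reference size of 16 pixels; neither value is stated in this section, and the function name `resize_plan` is hypothetical.

```python
def resize_plan(frame_size, image_size, reference_size=16):
    """Return (scale, resized_frame_size, resized_image_size).

    Assumptions (not from the source): frames are square and the
    reference size is 16 px. The same scale factor is applied to the
    non-reference frame and to the captured image, so the frame ends
    up at the reference size.
    """
    scale = reference_size / frame_size
    image_w, image_h = image_size
    resized_image = (round(image_w * scale), round(image_h * scale))
    resized_frame = round(frame_size * scale)
    return scale, resized_frame, resized_image


# A 32 px frame on a VGA image: both are scaled by 0.5.
plan = resize_plan(32, (640, 480))
```

Scaling the image instead of the classifier window means a single fixed-size detector can serve every frame size, at the cost of one image resize per non-reference frame.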

In one aspect of the image detection device according to the present invention, when the detection unit performs the detection processing on a captured image other than the processing target image, using a detection frame included in the plurality of types of detection frames, and that captured image turns out to contain a region very highly likely to be the detection target image of the same size as that detection frame, the detection unit also performs the detection processing on the processing target image using that detection frame.

In one aspect of the image detection device according to the present invention, the detection target image specifying unit specifies the detection target image in the processing target image based on detection result frames, each of which is the outline frame of a detection result region that the detection unit detected for the plurality of captured images. The detection target image specifying unit divides the processing target image, with the detection result frames superimposed on it, into a plurality of blocks. For a block that overlaps a detection result frame, if the number of captured images from which a detection result frame overlapping that block was obtained is equal to or less than a threshold, the detection target image specifying unit specifies the detection target image in the processing target image without using the detection result frames that overlap that block.

In one aspect of the image detection device according to the present invention, even when the number of captured images from which a detection result frame overlapping a block was obtained is equal to or less than the threshold, the detection target image specifying unit uses the detection result frames overlapping that block to specify the detection target image in the processing target image, provided that the detection result regions outlined by those frames include a detection result region that is very highly likely to be the detection target image.
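The block-based rule and its exception can be sketched as a per-block decision. This is an illustrative reading only: the 16-pixel block size, the threshold of 1, the `very_likely` flag and the names `ResultFrame` and `block_decisions` are all assumptions, and how the decision feeds into later processing is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultFrame:
    image_id: int              # captured image that produced this frame
    x: int
    y: int
    w: int
    h: int
    very_likely: bool = False  # region "very highly likely" to be the target


def overlaps(frame, bx, by, block):
    """True if the frame intersects the block whose top-left is (bx, by)."""
    return (frame.x < bx + block and bx < frame.x + frame.w and
            frame.y < by + block and by < frame.y + frame.h)


def block_decisions(frames, image_w, image_h, block=16, threshold=1):
    """Map each block that overlaps a frame to True (use its frames) or False."""
    decisions = {}
    for by in range(0, image_h, block):
        for bx in range(0, image_w, block):
            hits = [f for f in frames if overlaps(f, bx, by, block)]
            if not hits:
                continue  # blocks with no frame are not judged at all
            supporting_images = {f.image_id for f in hits}
            few = len(supporting_images) <= threshold
            exception = any(f.very_likely for f in hits)
            decisions[(bx, by)] = (not few) or exception
    return decisions


frames = [
    ResultFrame(0, 0, 0, 16, 16),
    ResultFrame(1, 0, 0, 16, 16),
    ResultFrame(2, 64, 64, 16, 16),
    ResultFrame(2, 128, 128, 16, 16, very_likely=True),
]
decisions = block_decisions(frames, 320, 240)
```

In this example the block at (0, 0) is supported by two captured images, the block at (64, 64) by only one, and the block at (128, 128) by one image whose region is flagged as very likely, so only the middle block is excluded.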

In one aspect of the image detection device according to the present invention, the detection target image specifying unit specifies the detection target image in the processing target image based on detection accuracy values. Each detection accuracy value indicates how likely a detection result region, detected by the detection unit for the plurality of captured images, is to be the detection target image.

In one aspect of the image detection device according to the present invention, the detection target image specifying unit weights the detection accuracy values of the detection result regions that the detection unit detected for the plurality of captured images, and then specifies the detection target image in the processing target image based on the weighted values. When weighting the detection accuracy value of a detection result region detected for a captured image, the detection target image specifying unit makes the weight smaller the further the imaging timing of that captured image is from the imaging timing of the processing target image.
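As a rough illustration of this timing-based weighting, the sketch below uses an exponential decay over the frame gap. The source only requires that the weight shrink as the timing gap grows; the decay form, the factor 0.5 and the name `weighted_accuracy` are assumptions.

```python
def weighted_accuracy(value, image_frame, target_frame, decay=0.5):
    """Scale a detection accuracy value by how far its image is from the target.

    Assumption: weight = decay ** gap, with the gap measured in frames.
    Any monotonically decreasing weight would satisfy the description.
    """
    gap = abs(target_frame - image_frame)
    return value * (decay ** gap)


# Frame 3 is the processing target image; older frames count for less.
weights = [weighted_accuracy(10.0, f, 3) for f in (1, 2, 3)]
```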

In one aspect of the image detection device according to the present invention, the detection target image is a human face image.

One aspect of a control program according to the present invention is a control program for controlling an image detection device that detects a detection target image from a processing target image. The control program causes the image detection device to execute (a) a step of performing, on a captured image and using a detection frame, detection processing that detects, as a detection result region, a region highly likely to be the detection target image of the same size as the detection frame, and (b) a step of specifying the detection target image in the processing target image. In step (a), a plurality of types of detection frames of mutually different sizes are distributed over a plurality of captured images, which include the processing target image and were captured at mutually different timings, so that at least one type of detection frame corresponds to each of the captured images, and the detection processing is performed on each of the captured images using the at least one type of detection frame associated with that captured image. In step (b), the detection target image is specified in the processing target image based on the detection result regions detected for the plurality of captured images in step (a).

One aspect of an image detection method according to the present invention is an image detection method for detecting a detection target image from a processing target image. The method includes (a) a step of performing, on a captured image and using a detection frame, detection processing that detects, as a detection result region, a region highly likely to be the detection target image of the same size as the detection frame, and (b) a step of specifying the detection target image in the processing target image. In step (a), a plurality of types of detection frames of mutually different sizes are distributed over a plurality of captured images, which include the processing target image and were captured at mutually different timings, so that at least one type of detection frame corresponds to each of the captured images, and the detection processing is performed on each of the captured images using the at least one type of detection frame associated with that captured image. In step (b), the detection target image is specified in the processing target image based on the detection result regions detected for the plurality of captured images in step (a).

According to the present invention, the amount of processing performed on a processing target image can be reduced.

A diagram showing the configuration of the image detection device.
A diagram showing the configuration of the functional blocks of the image detection device.
A diagram showing how a plurality of types of detection frames are associated with a plurality of used captured images.
A flowchart showing the operation of the detection unit.
A diagram showing the configuration of the detection unit.
Diagrams for explaining the operation of the detection unit (six figures).
A diagram showing detection result frames superimposed on the processing target image.
Diagrams for explaining the method of generating the output value map (two figures).
A diagram showing the output value map.
Diagrams for explaining the method of searching for a local maximum point (two figures).
A diagram showing the distribution of detection accuracy values near a local maximum point in the output value map.
A diagram for explaining the method of determining a detected image region.
A flowchart showing the operation of the detection target image determination unit.
A diagram showing a detected image region and integrated detection result frames superimposed on the processing target image.
A diagram showing the processing target image, with detection result frames superimposed, divided into a plurality of blocks.

FIG. 1 is a diagram showing the configuration of an image detection device 1 according to an embodiment. The image detection device 1 detects a detection target image from a captured image represented by input image data. The image detection device 1 is used in, for example, a surveillance camera system or a digital camera system. In this embodiment, the detection target image is, for example, a human face image. Hereinafter, "face image" means a human face image. The captured image from which the detection target image is to be detected is called the "processing target image". The image detection device 1 treats a plurality of captured images that include the processing target image, and that were captured at different but closely spaced timings, as showing the same image, and uses those captured images when detecting the detection target image from the processing target image. The detection target image may be an image other than a face image.

As shown in FIG. 1, the image detection device 1 includes a CPU (Central Processing Unit) 10 and a storage unit 11. The storage unit 11 consists of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The storage unit 11 stores, among other things, a control program 12 for controlling the operation of the image detection device 1. The functions of the image detection device 1 are realized by the CPU 10 executing the control program 12 in the storage unit 11. Executing the control program 12 forms the functional blocks shown in FIG. 2 in the image detection device 1.

As shown in FIG. 2, the image detection device 1 includes, as functional blocks, an image input unit 2, a detection unit 3, and a detection target image specifying unit 7. The following first outlines the operation of each functional block and then describes each in detail. The functions of the image detection device 1 may be realized by hardware circuits instead of functional blocks.

<Overview of the operation of the image detection device>
Image data representing captured images taken one after another by an imaging unit (camera) of a surveillance camera system or the like is input sequentially to the image input unit 2. The image input unit 2 outputs image data representing K (K ≧ 2) captured images, including the processing target image, that are used when the detection target image is detected from the processing target image. Hereinafter, these K captured images may be referred to collectively as the "used captured image group", and each of the K captured images in the group may be referred to as a "used captured image". The used captured images in a used captured image group are captured at mutually different timings.

The image input unit 2 may treat every captured image obtained by the imaging unit as a processing target image, or may treat only a captured image obtained every few seconds as a processing target image. The imaging unit captures, for example, L (L ≧ 2) images per second. That is, the imaging frame rate of the imaging unit is L fps (frames per second). For example, L = 30.

In a captured image obtained by the imaging unit, M (M ≧ 2) pixels are arranged in the row direction and N (N ≧ 2) pixels are arranged in the column direction. The resolution of the captured image is, for example, VGA (Video Graphics Array), in which case M = 640 and N = 480.

Hereinafter, the size of a region in which m (m ≧ 1) pixels are arranged in the row direction and n (n ≧ 1) pixels are arranged in the column direction is written as mp × np, where p stands for pixels. In addition, among values arranged in a matrix, the value located in the m-th row and the n-th column, counted from the upper left, may be called the m × n-th value.

The detection unit 3 uses the image data output from the image input unit 2, which represents the used captured images in the used captured image group, to detect, in each used captured image, regions that are highly likely to be a face image.

The detection target image specifying unit 7 includes a map generation unit 4, a local maximum point search unit 5, and a detection target image determination unit 6. The detection target image specifying unit 7 specifies a face image in the processing target image based on the detection results of the detection unit 3.

Based on the detection results of the detection unit 3, the map generation unit 4 generates an output value map that shows how the detection accuracy value, which indicates the likelihood of being a face image, is distributed over the processing target image. The local maximum point search unit 5 searches the output value map generated by the map generation unit 4 for local maximum points of the detection accuracy value. The detection target image determination unit 6 determines that a predetermined region of the processing target image, containing the pixel at the same position as a local maximum point found by the local maximum point search unit 5, is a face image. In this way, the detection target image specifying unit 7 specifies a face image in the processing target image, and the image detection device 1 thereby detects a face image from the processing target image.
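The three-stage flow above (map generation, peak search, region decision) can be sketched in a heavily simplified form. The sketch assumes that the map is built by summing the detection accuracy value of each detection result region over the pixels it covers, and it uses a global maximum as a stand-in for the local maximum search. The device's actual map generation and search methods are not described in this overview, and the names `build_output_map` and `find_peak` are hypothetical.

```python
def build_output_map(width, height, regions):
    """Accumulate accuracy values per pixel.

    regions: iterable of (x, y, w, h, accuracy). Summation is an
    assumption made for illustration only.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y, w, h, accuracy in regions:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                grid[row][col] += accuracy
    return grid


def find_peak(grid):
    """Return (x, y, value) of the first pixel holding the largest value."""
    best = (0, 0, grid[0][0])
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


# Two overlapping regions; the overlap accumulates the highest value.
output_map = build_output_map(8, 8, [(0, 0, 4, 4, 1.0), (2, 2, 4, 4, 2.0)])
peak = find_peak(output_map)
```

A region of predetermined size placed around the returned peak position would then correspond to the detected face image in this simplified picture.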

<Detailed description of the operation of the image detection device>
<Detection processing>
Using a detection frame, the detection unit 3 performs detection processing that detects, as a detection result region, a region of a used captured image that is highly likely to be a face image of the same size as the detection frame. Hereinafter, "detection processing" means this processing by the detection unit 3. The detection processing is described in detail later.

To detect face images of various sizes in the processing target image, the detection unit 3 uses a plurality of types of detection frames of different sizes. The detection unit 3 distributes T types (T ≧ 2) of detection frames over the K used captured images in the used captured image group so that at least one type of detection frame corresponds to each used captured image. The detection unit 3 then performs the detection processing on each of the K used captured images using the at least one type of detection frame associated with it.

FIG. 3 is a diagram showing how the plurality of types of detection frames 100 used by the detection unit 3 are distributed over the used captured images in a used captured image group. In this embodiment, for example, K = 3, and the used captured image group consists of three captured images taken consecutively by the imaging unit. If the frame rate of the imaging unit is 30 fps, the interval between consecutive used captured images in the group is 1/30 second. In this embodiment, of the three used captured images in the used captured image group, the one captured last, for example, is the processing target image.

FIG. 3 shows the plurality of types of detection frames 100 distributed over the used captured images of the (k-2)th, (k-1)th, and k-th frames, which make up the used captured image group. In the example of FIG. 3, the used captured image of the k-th frame is the processing target image. The operation of the image detection device 1 is described below for the case where the used captured image group consists of the (k-2)th, (k-1)th, and k-th frames.

As shown in FIG. 3, the detection frames 100 of mutually different sizes are distributed over the used captured images of the (k-2)th, (k-1)th, and k-th frames. In the example of FIG. 3, the detection frames 100 are assigned one at a time, in ascending order of size, to the used captured images in the repeating order (k-2)th frame, (k-1)th frame, k-th frame. That is, the (3 × s + 1)-th smallest detection frames 100 (s = 0, 1, 2, ...) are associated with the used captured image of the (k-2)th frame, the (3 × s + 2)-th smallest detection frames 100 with the used captured image of the (k-1)th frame, and the (3 × s + 3)-th smallest detection frames 100 with the used captured image of the k-th frame.

Thus, in this embodiment, each of the three used captured images in the used captured image group is associated with every third detection frame 100 in ascending order of size (skipping two each time), so that no detection frame 100 is shared among the three used captured images. In this embodiment, a plurality of types of detection frames 100 are associated with each used captured image. For example, when T = 30, ten types of detection frames 100 are associated with each used captured image.
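The assignment just described is a round-robin split of the size-ordered detection frames. A minimal sketch follows; the concrete frame sizes (16 px upward in 4 px steps) and the function name `assign_frames` are illustrative assumptions, not values from the source.

```python
def assign_frames(frame_sizes, num_images=3):
    """Split detection frame sizes round-robin, in ascending order of size.

    Index 0 of the result corresponds to the oldest used captured image
    (the (k-2)th frame when num_images is 3) and the last index to the
    processing target image (the k-th frame).
    """
    ordered = sorted(frame_sizes)
    return [ordered[i::num_images] for i in range(num_images)]


# T = 30 hypothetical frame sizes, K = 3 used captured images.
sizes = [16 + 4 * i for i in range(30)]
groups = assign_frames(sizes)
```

With T = 30 and K = 3, each image is searched with only 10 frame sizes instead of 30, which is where the reduction in processing per image comes from.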

When the captured image of the (k-1)th frame is the processing target image, the used captured image group consists of the captured images of the (k-3)th, (k-2)th, and (k-1)th frames. In this case, the captured image of the (k-3)th frame is associated with the same detection frames 100 as those associated with the k-th frame.

When the captured image of the (k-2)th frame is the processing target image, the used captured image group consists of the captured images of the (k-4)th, (k-3)th, and (k-2)th frames. In this case, the captured image of the (k-4)th frame is associated with the same detection frames 100 as those associated with the (k-1)th frame.

When the captured image of the (k-3)th frame is the processing target image, the used captured image group consists of the captured images of the (k-5)th, (k-4)th, and (k-3)th frames. In this case, the captured image of the (k-5)th frame is associated with the same detection frames 100 as those associated with the (k-2)th frame.

To generalize: when the used captured image group includes the captured image of the (k + 3t)th frame (t is a nonzero integer), that captured image is associated with the same detection frames 100 as the k-th frame. When the used captured image group includes the captured image of the (k + 3t - 1)th frame, that captured image is associated with the same detection frames 100 as the (k-1)th frame. And when the used captured image group includes the captured image of the (k + 3t - 2)th frame, that captured image is associated with the same detection frames 100 as the (k-2)th frame.
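This generalization amounts to a modulo-3 rule on the frame index. The sketch below returns which of the frames k-2, k-1, or k shares its detection frames with a given frame; the function name `shares_frames_with` is hypothetical.

```python
def shares_frames_with(frame_index, k, num_images=3):
    """Return the frame among k-2, k-1, k whose detection frames are reused.

    (frame_index - k) mod 3 is 0 for k + 3t, 2 for k + 3t - 1,
    and 1 for k + 3t - 2. Python's % is non-negative for a positive
    modulus, so earlier frames (negative differences) work too.
    """
    offset = (frame_index - k) % num_images
    return k if offset == 0 else k - (num_images - offset)


k = 10
mapping = {f: shares_frames_with(f, k) for f in (5, 6, 7, 13)}
```

Here frames 5, 6, and 7 are the (k-5)th, (k-4)th, and (k-3)th frames from the examples above, and they map to the (k-2)th, (k-1)th, and k-th frames respectively.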

FIG. 4 is a flowchart showing the series of operations that the detection unit 3 performs when it carries out detection processing on each of the plural used captured images constituting the used captured image group, using the plural types of detection frames associated with that used captured image.

As shown in FIG. 4, when image data representing the used captured image of the (k-2)th frame is input from the image input unit 2 in step s1, the detection unit 3 uses that image data in step s2 to perform detection processing on the used captured image of the (k-2)th frame with each detection frame associated with it. That is, for each detection frame associated with the used captured image of the (k-2)th frame, the detection unit 3 performs detection processing that detects, as a detection result region, a region in that used captured image that is highly likely to be a detection target image of the same size as the detection frame.

After step s2, when image data representing the used captured image of the (k-1)th frame is input from the image input unit 2 in step s3, the detection unit 3 uses that image data in step s4 to perform detection processing on the used captured image of the (k-1)th frame with each detection frame associated with it.

After step s4, when image data representing the used captured image of the kth frame is input from the image input unit 2 in step s5, the detection unit 3 uses that image data in step s6 to perform detection processing on the used captured image of the kth frame with each detection frame associated with it.

As described above, in the present embodiment the T types of detection frames are distributed over the K used captured images that constitute the used captured image group and include the processing target image. The processing target image is therefore associated with fewer than T types of detection frames, and detection processing on the processing target image uses fewer than T types of detection frames. Compared with the case where all T types of detection frames are used for detection processing on the processing target image, the processing amount for the processing target image can thus be reduced.
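The effect of distributing the detection frames can be illustrated with a small sketch. It assumes, purely for illustration, that the T frame sizes are sorted and split into K contiguous groups of nearly equal length; the embodiment's actual assignment is defined elsewhere in the specification, and the function name and the sizes in the usage note are hypothetical.

```python
from typing import List


def distribute_frames(sizes: List[int], num_images: int) -> List[List[int]]:
    """Split the frame sizes into num_images contiguous groups so that every
    image gets at least one size (illustrative assignment only)."""
    if len(sizes) < num_images:
        raise ValueError("need at least one detection frame type per image")
    ordered = sorted(sizes)
    base, extra = divmod(len(ordered), num_images)
    groups, start = [], 0
    for i in range(num_images):
        # The first `extra` groups take one additional size.
        count = base + (1 if i < extra else 0)
        groups.append(ordered[start:start + count])
        start += count
    return groups
```

With six hypothetical sizes and K = 3, each image, including the processing target image, is scanned with two sizes instead of six.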

＜Details of detection processing＞
Next, the detection processing is described in detail. FIG. 5 is a diagram showing the configuration of the detection unit 3. As shown in FIG. 5, the detection unit 3 includes an image cropping unit 30, a feature amount extraction unit 31, a discriminator 32, a determination unit 33, and an image size changing unit 34.

In the present embodiment, as described later, the feature amount extraction unit 31 extracts a feature amount from an input image. To extract a feature amount, the feature amount extraction unit 31 must be given an image of a reference size.

In the present embodiment, the T types of detection frames, which differ from one another in size, include a detection frame of the same size as the reference size and detection frames of sizes different from the reference size. Hereinafter, the detection frame of the same size as the reference size is called the "reference detection frame", and a detection frame of a size different from the reference size is called a "non-reference detection frame". In the present embodiment, the smallest of the T types of detection frames is the reference detection frame. Accordingly, (T-1) of the T types of detection frames are non-reference detection frames, and each of the (T-1) types of non-reference detection frames is larger than the reference size. The size of the reference detection frame is, for example, 16p × 16p. The (T-1) types of non-reference detection frames include, for example, a non-reference detection frame of 18p × 18p and a non-reference detection frame of 20p × 20p.

In the present embodiment, when the detection unit 3 performs detection processing on a used captured image with the reference detection frame, it moves the reference detection frame over the used captured image, performs face image detection on the image within the reference detection frame at each position, and determines whether that image is highly likely to be a face image. The detection unit 3 then takes each region of the used captured image determined to be highly likely to be a face image (the image within the reference detection frame) as a detection result region.

When the detection unit 3 performs detection processing on a used captured image with a non-reference detection frame, on the other hand, it resizes the non-reference detection frame so that its size matches the reference size, and resizes the used captured image in accordance with the resizing of the non-reference detection frame. The detection unit 3 moves the resized non-reference detection frame over the resized used captured image, performs face image detection on the image within the frame at each position, and determines whether that image is highly likely to be a face image. Then, based on each region of the resized used captured image determined to be highly likely to be a face image (the image within the resized non-reference detection frame), the detection unit 3 identifies the region highly likely to be a face image in the used captured image at its original, unresized size, and takes that region as a detection result region.

Hereinafter, the resized used captured image obtained when detection processing is performed on a used captured image with a non-reference detection frame is called a "size-changed image", and the resized non-reference detection frame used in that detection processing is called a "size-change detection frame".

Thus, in the present embodiment, the operation of the detection unit 3 when it performs detection processing on a used captured image with the reference detection frame differs from its operation when it performs detection processing on a used captured image with a non-reference detection frame. The operation of the detection unit 3 is described in detail below.

In the detection unit 3, when the reference detection frame is used for detection processing, the image cropping unit 30 sets the reference detection frame on the used captured image, crops the image within the reference detection frame from the used captured image, and inputs it to the feature amount extraction unit 31. When a non-reference detection frame is used for detection processing, the image cropping unit 30 sets the size-change detection frame, obtained by resizing the non-reference detection frame, on the size-changed image, obtained by the image size changing unit 34 resizing the used captured image. It then crops the image within the size-change detection frame from the size-changed image and inputs it to the feature amount extraction unit 31.

Because the size of the reference detection frame matches the reference size, the image within the reference detection frame cropped by the image cropping unit 30 has the reference size. Likewise, because the size of the size-change detection frame matches the reference size, the image within the size-change detection frame cropped by the image cropping unit 30 has the reference size. The feature amount extraction unit 31 therefore always receives an image of the reference size.

The feature amount extraction unit 31 extracts a feature amount, such as a Haar-like feature amount or an LBP (Local Binary Pattern) feature amount, from the input image (the image within the reference detection frame in the used captured image, or the image within the size-change detection frame in the size-changed image).

Based on the feature amount extracted by the feature amount extraction unit 31 and on learning data, the discriminator 32 performs face detection on the image cropped by the image cropping unit 30 and outputs, as a real number, a detection accuracy value indicating how likely the image is to be a face image. In other words, the detection accuracy value output by the discriminator 32 indicates the face-likeness of the image within the reference detection frame or within the size-change detection frame. The discriminator 32 is, for example, an SVM (Support Vector Machine) or AdaBoost.

If the detection accuracy value output from the discriminator 32 is equal to or greater than a threshold value, the determination unit 33 determines that the image cropped by the image cropping unit 30 is highly likely to be a face image. That is, when the reference detection frame is used, the determination unit 33 determines that the image within the reference detection frame in the used captured image is a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the determination unit 33 determines that the image within the size-change detection frame in the size-changed image is a region highly likely to be a face image of the same size as the size-change detection frame.

If, on the other hand, the detection accuracy value output from the discriminator 32 is less than the threshold value, the determination unit 33 determines that the image cropped by the image cropping unit 30 is highly likely not to be a face image. That is, when the reference detection frame is used, the determination unit 33 determines that the image within the reference detection frame in the used captured image is not a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the determination unit 33 determines that the image within the size-change detection frame in the size-changed image is not a region highly likely to be a face image of the same size as the size-change detection frame.

When the determination unit 33 determines that the image within the reference detection frame in the used captured image is a region highly likely to be a face image of the same size as the reference detection frame, it takes that image as a detection result region and that reference detection frame as a detection result frame.

When the determination unit 33 determines that the image within the size-change detection frame in the size-changed image is a region highly likely to be a face image of the same size as the size-change detection frame, it takes that size-change detection frame, which is the outline of the region, as a temporary detection result frame. Based on the temporary detection result frame, the determination unit 33 then identifies, in the used captured image at its original size before resizing, the region highly likely to be a face image of the same size as the non-reference detection frame. It takes that region as a detection result region and the outline of that region as the final detection result frame.

＜Detection processing using the reference detection frame＞
Next, the series of operations is described in which the detection unit 3 moves the reference detection frame over the used captured image and determines whether the image within the reference detection frame is highly likely to be a face image. FIGS. 6 to 9 are diagrams for explaining these operations. The detection unit 3 performs face image detection on the image within the reference detection frame while raster-scanning the reference detection frame. In the present embodiment, the reference detection frame is the detection frame of the minimum size and, as shown in FIG. 3, the detection frame of the minimum size is associated with the used captured image of the (k-2)th frame. Detection processing with the reference detection frame is therefore performed on the used captured image of the (k-2)th frame.

As shown in FIG. 6, the image cropping unit 30 first sets the reference detection frame 100 at the upper left of the used captured image 20 with which the reference detection frame 100 is associated (the used captured image of the (k-2)th frame) and crops the image within the reference detection frame 100. The feature amount extraction unit 31 then extracts a feature amount from the image cropped by the image cropping unit 30. Because the size of the reference detection frame matches the reference size, an image of the reference size is input to the feature amount extraction unit 31.

Based on the feature amount extracted by the feature amount extraction unit 31 and on the learning data, the discriminator 32 obtains a detection accuracy value for the image cropped by the image cropping unit 30. If the detection accuracy value obtained by the discriminator 32 is equal to or greater than the threshold value, the determination unit 33 determines that the cropped image, that is, the region within the upper-left reference detection frame 100 in the used captured image 20, is highly likely to be a face image. It takes that region as a detection result region and the reference detection frame 100, which is the outline of the region, as a detection result frame.

Next, the image cropping unit 30 moves the reference detection frame 100 slightly to the right in the used captured image 20, for example by one pixel or by several pixels. The image cropping unit 30 then crops the image within the moved reference detection frame 100 from the used captured image 20.

The feature amount extraction unit 31 then extracts a feature amount from the image cropped by the image cropping unit 30, and the discriminator 32 obtains a detection accuracy value for that image based on the feature amount and the learning data. If the detection accuracy value obtained by the discriminator 32 is equal to or greater than the threshold value, the determination unit 33 determines that the cropped image is highly likely to be a face image. It takes that image as a detection result region and the reference detection frame 100 set by the image cropping unit 30, which is the outline of that image, as a detection result frame.

The detection unit 3 continues to operate in the same way. When the reference detection frame 100 reaches the right edge of the used captured image 20, as shown in FIG. 7, the detection unit 3 obtains a detection accuracy value for the image within the reference detection frame 100 at the right edge. If the obtained detection accuracy value is equal to or greater than the threshold value, the detection unit 3 takes the image within that reference detection frame 100 as a detection result region and that reference detection frame 100 as a detection result frame.

Next, as shown in FIG. 8, the image cropping unit 30 moves the reference detection frame 100 to the left edge of the used captured image 20 while lowering it slightly, and then crops the image within the reference detection frame 100. In the vertical direction (column direction), the image cropping unit 30 moves the reference detection frame 100 down by, for example, one pixel or several pixels. The feature amount extraction unit 31 then extracts a feature amount from the cropped image, and the discriminator 32 obtains and outputs a detection accuracy value for that image based on the feature amount and the learning data. If the detection accuracy value output from the discriminator 32 is equal to or greater than the threshold value, the determination unit 33 determines that the cropped image is highly likely to be a face image. It takes that image as a detection result region and the reference detection frame 100 set by the image cropping unit 30 as a detection result frame.

The detection unit 3 continues to operate in the same way. When the reference detection frame 100 reaches the lower right of the used captured image 20, as shown in FIG. 9, the detection unit 3 obtains a detection accuracy value for the image within the reference detection frame 100 at the lower right. If the obtained detection accuracy value is equal to or greater than the threshold value, the detection unit 3 takes the image within that reference detection frame 100 as a detection result region and that reference detection frame 100 as a detection result frame.

In this way, the detection unit 3 uses the reference detection frame to detect, as a detection result region, each region in the used captured image associated with the reference detection frame that is highly likely to be a face image of the same size as the reference detection frame.
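The raster scan described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the image is a plain list of pixel rows, and `score` is a hypothetical stand-in for the combination of the feature amount extraction unit 31 and the discriminator 32.

```python
from typing import Callable, Iterator, List, Tuple

Patch = List[List[float]]
Frame = Tuple[int, int, int, int]  # (left, top, width, height)


def raster_positions(img_w: int, img_h: int, win: int,
                     step: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corners of the frame, left to right, then top to bottom."""
    for top in range(0, img_h - win + 1, step):
        for left in range(0, img_w - win + 1, step):
            yield left, top


def detect(image: Patch, win: int, score: Callable[[Patch], float],
           threshold: float, step: int = 1) -> List[Frame]:
    """Return every frame whose cropped image scores at or above the threshold."""
    img_h, img_w = len(image), len(image[0])
    results = []
    for left, top in raster_positions(img_w, img_h, win, step):
        # Crop the image inside the frame (role of the image cropping unit 30).
        patch = [row[left:left + win] for row in image[top:top + win]]
        # `score` is an assumed placeholder for feature extraction + discriminator.
        if score(patch) >= threshold:
            results.append((left, top, win, win))
    return results
```

The `step` parameter corresponds to moving the frame by one pixel or by several pixels at a time.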

＜Detection processing using a non-reference detection frame＞
When the detection unit 3 performs detection processing with a non-reference detection frame, the image cropping unit 30 resizes the non-reference detection frame so that its size matches the reference size (the size of the reference detection frame). The image size changing unit 34 then resizes the used captured image associated with the non-reference detection frame by the same ratio as that applied to the non-reference detection frame.

In the present embodiment, the reference size is 16p × 16p. Therefore, when a non-reference detection frame of size Rp × Rp (R > 16) is used, for example, the image cropping unit 30 multiplies the vertical width and the horizontal width of the non-reference detection frame by (16/R) each, reducing the frame to generate the size-change detection frame. The image size changing unit 34 multiplies the vertical width (number of pixels) and the horizontal width (number of pixels) of the used captured image associated with that non-reference detection frame by (16/R) each, reducing the image to generate the size-changed image. The detection unit 3 then moves the size-change detection frame over the size-changed image, in the same way as in the processing described with reference to FIGS. 6 to 9, and determines whether the image within the size-change detection frame is highly likely to be a face image of the same size as the size-change detection frame. In other words, the detection unit 3 uses the size-change detection frame to detect regions in the size-changed image that are highly likely to be a face image of the same size as the size-change detection frame. Hereinafter, this processing is called "size-changed version detection processing".
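The (16/R) scaling can be checked with a short sketch. The function name is hypothetical, and rounding the scaled dimensions down to whole pixels is an assumption; the specification does not state how fractional pixel counts are handled.

```python
from typing import Tuple


def size_change(image_w: int, image_h: int, frame_size: int,
                ref_size: int = 16) -> Tuple[Tuple[int, int], int]:
    """Return the size-changed image dimensions and the size-change
    detection frame size for an R x R non-reference detection frame."""
    if frame_size <= ref_size:
        raise ValueError("a non-reference detection frame is larger than the reference size")
    # Scale both dimensions by ref_size / frame_size, flooring to whole pixels
    # (the flooring is an assumption made for this sketch).
    scaled_w = image_w * ref_size // frame_size
    scaled_h = image_h * ref_size // frame_size
    return (scaled_w, scaled_h), ref_size
```

For a hypothetical 640 × 480 used captured image and R = 20, the scale factor is 16/20 = 0.8, giving a 512 × 384 size-changed image scanned with a 16 × 16 size-change detection frame.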

In the size-changed version detection processing, the detection unit 3 sets the size-change detection frame on the size-changed image and, when it determines that the image within the size-change detection frame is highly likely to be a face image of the same size as the size-change detection frame, takes that size-change detection frame, which is the outline of the image, as a temporary detection result frame. The detection unit 3 thereby obtains at least one temporary detection result frame for the size-changed image. Because the size of the size-change detection frame matches the reference size, an image of the reference size is input to the feature amount extraction unit 31 in the size-changed version detection processing.

When at least one temporary detection result frame has been obtained for the size-changed image, the determination unit 33 converts each temporary detection result frame into a detection result frame corresponding to the used captured image at its original size.

Specifically, the determination unit 33 first sets the obtained temporary detection result frame or frames on the size-changed image. FIG. 10 is a diagram showing temporary detection result frames 130 set on a size-changed image 120. In the example of FIG. 10, plural temporary detection result frames 130 are set on the size-changed image 120.

Next, as shown in FIG. 11, the determination unit 33 enlarges (resizes) the size-changed image 120 on which the temporary detection result frames 130 are set back to its original size, thereby converting the size-changed image 120 into the used captured image 20. The temporary detection result frames 130 set on the size-changed image 120 are enlarged along with it, so that each temporary detection result frame 130 is converted into a detection result frame 150 corresponding to the used captured image 20, as shown in FIG. 11. The region within a detection result frame 150 in the used captured image 20 is a detection result region highly likely to be a face image of the same size as the non-reference detection frame. In this way, the detection unit 3 identifies, based on the temporary detection result frames 130 obtained by the size-changed version detection processing, the detection result regions in the used captured image that are highly likely to be a face image of the same size as the non-reference detection frame.
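Enlarging the size-changed image back to its original size is equivalent, for the frames themselves, to multiplying each frame coordinate by R/16. A minimal sketch, assuming frames are stored as (left, top, width, height) tuples and coordinates are rounded to the nearest pixel; both the representation and the rounding are assumptions made for illustration.

```python
from typing import Tuple

Frame = Tuple[int, int, int, int]  # (left, top, width, height)


def to_detection_result_frame(temp_frame: Frame, frame_size: int,
                              ref_size: int = 16) -> Frame:
    """Map a temporary detection result frame on the size-changed image to a
    detection result frame on the original-size used captured image."""
    left, top, width, height = temp_frame
    # Inverse of the (ref_size / frame_size) reduction applied before scanning.
    factor = frame_size / ref_size
    return (round(left * factor), round(top * factor),
            round(width * factor), round(height * factor))
```

A 16 × 16 temporary detection result frame found with R = 20 thus becomes a 20 × 20 detection result frame, the same size as the non-reference detection frame.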

Thus, when the detection unit 3 performs detection processing on a used captured image with a non-reference detection frame, it carries out the size-changed version detection processing using the non-reference detection frame resized to match the reference size and the used captured image resized accordingly. As a result, an image of the reference size is input to the feature amount extraction unit 31 even when a detection frame of a size different from the reference size is used. Based on the result of the size-changed version detection processing, the detection unit 3 then identifies the detection result regions in the used captured image that are highly likely to be a face image of the same size as the non-reference detection frame.

When performing detection processing on a used captured image with a non-reference detection frame, the detection unit 3 may alternatively, as in the case of the reference detection frame, leave the non-reference detection frame and the used captured image unresized, move the non-reference detection frame over the used captured image, and determine whether the image within the non-reference detection frame is highly likely to be a face image. In that case, each time the image cropping unit 30 sets the non-reference detection frame on the used captured image (each time it moves the frame), it must resize the image within the non-reference detection frame to the reference size before inputting it to the feature amount extraction unit 31. An image resizing operation is therefore needed at every frame position. From the standpoint of reducing the processing amount, it is preferable to resize the non-reference detection frame and the used captured image first and then perform the processing, as in the size-changed version detection processing described above.
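The difference in resizing work between the two approaches can be estimated with a rough count. The sketch below counts only the number of source pixels that pass through a resize operation; it is a back-of-the-envelope model with hypothetical function names, not a measurement of the embodiment.

```python
def window_count(img_w: int, img_h: int, win: int, step: int = 1) -> int:
    """Number of frame positions visited by a raster scan."""
    if win > img_w or win > img_h:
        return 0
    return ((img_w - win) // step + 1) * ((img_h - win) // step + 1)


def pixels_resized(img_w: int, img_h: int, frame_size: int,
                   step: int = 1, resize_whole_image: bool = True) -> int:
    """Source pixels fed into resizing for one non-reference detection frame."""
    if resize_whole_image:
        # Size-changed version detection processing: the image is resized once.
        return img_w * img_h
    # Alternative: every cropped frame_size x frame_size patch is resized.
    return window_count(img_w, img_h, frame_size, step) * frame_size * frame_size
```

Under this model, resizing every cropped patch handles each source pixel many times over when the step is small, whereas resizing the whole image handles each pixel once.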

In addition, when the size of a non-reference detection frame is an integer multiple of the size of the reference detection frame (the reference size) in each of the vertical and horizontal directions, the vertical and horizontal widths of the used captured image can each be multiplied by (1/integer) when the size-changed image is generated, so the used captured image can be reduced accurately using an averaging filter or the like. It is therefore desirable that the vertical and horizontal widths of a non-reference detection frame be integer multiples of the vertical and horizontal widths of the reference detection frame, respectively.
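The reduction by (1/integer) with an averaging filter can be sketched as plain block averaging. This is a minimal illustration rather than the patented implementation: the function name `shrink_by_integer` and the representation of an image as a list of rows are assumptions made here.

```python
def shrink_by_integer(image, factor):
    """Shrink a 2-D list of pixel values by an integer factor.

    Each output pixel is the mean of one factor x factor block of input
    pixels, which is one simple form of averaging filter (an assumption;
    the text only says "an averaging filter or the like").
    """
    height, width = len(image), len(image[0])
    if factor < 1 or height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    shrunk = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        shrunk.append(row)
    return shrunk


image = [[0, 2, 4, 6],
         [2, 4, 6, 8],
         [1, 1, 3, 3],
         [1, 1, 3, 3]]
half = shrink_by_integer(image, 2)  # both widths multiplied by 1/2
```

Because the factor is an integer, every output pixel covers whole input pixels, so no interpolation between pixels is needed.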

Furthermore, regarding the plurality of types of detection frames associated with one used captured image, when those detection frames are arranged in ascending order of size, it is desirable that, in each pair of two adjacent types of detection frames, the vertical and horizontal widths of the larger detection frame be a fixed integer multiple of the vertical and horizontal widths of the smaller detection frame, respectively.

For example, suppose that the reference detection frame and first and second non-reference detection frames of mutually different sizes are associated with the used captured image of the (k-2)th frame, and that the sizes increase in the order of the reference detection frame, the first non-reference detection frame, and the second non-reference detection frame. In this case, the vertical and horizontal widths of the first non-reference detection frame are set to, for example, twice the vertical and horizontal widths of the reference detection frame, respectively (four times in terms of area). The vertical and horizontal widths of the second non-reference detection frame are then set to twice the vertical and horizontal widths of the first non-reference detection frame, respectively.

When detection frames of these sizes are associated with the used captured image of the (k-2)th frame, the vertical and horizontal widths of that used captured image are multiplied by (1/2) to generate the size-changed image for the first non-reference detection frame when the detection process is performed using the first non-reference detection frame. When the detection process is performed using the second non-reference detection frame, the used captured image of the (k-2)th frame is not reduced; instead, the vertical and horizontal widths of the size-changed image for the first non-reference detection frame are multiplied by (1/2) to generate the size-changed image for the second non-reference detection frame. In this way, the size-changed images for the first and second non-reference detection frames can both be generated by a reduction process with the same reduction ratio.

Similarly, suppose that third to fifth non-reference detection frames of mutually different sizes are associated with the used captured image of the (k-1)th frame, and that the sizes increase in the order of the third, fourth, and fifth non-reference detection frames. In this case, the vertical and horizontal widths of the third non-reference detection frame are set to, for example, three times the vertical and horizontal widths of the reference detection frame, respectively (nine times in terms of area). The vertical and horizontal widths of the fourth non-reference detection frame are set to three times those of the third non-reference detection frame, and the vertical and horizontal widths of the fifth non-reference detection frame are set to three times those of the fourth non-reference detection frame.

When detection frames of these sizes are associated with the used captured image of the (k-1)th frame, the vertical and horizontal widths of that used captured image are multiplied by (1/3) to generate the size-changed image for the third non-reference detection frame when the detection process is performed using the third non-reference detection frame. When the detection process is performed using the fourth non-reference detection frame, the used captured image of the (k-1)th frame is not reduced; instead, the vertical and horizontal widths of the size-changed image for the third non-reference detection frame are multiplied by (1/3) to generate the size-changed image for the fourth non-reference detection frame. Likewise, when the detection process is performed using the fifth non-reference detection frame, the vertical and horizontal widths of the size-changed image for the fourth non-reference detection frame are multiplied by (1/3) to generate the size-changed image for the fifth non-reference detection frame. In this way, the size-changed images for the third to fifth non-reference detection frames can all be generated by a reduction process with the same reduction ratio.

Similarly, suppose that sixth to eighth non-reference detection frames of mutually different sizes are associated with the used captured image of the kth frame, and that the sizes increase in the order of the sixth, seventh, and eighth non-reference detection frames. In this case, the vertical and horizontal widths of the sixth non-reference detection frame are set to, for example, five times the vertical and horizontal widths of the reference detection frame, respectively (25 times in terms of area). The vertical and horizontal widths of the seventh non-reference detection frame are set to five times those of the sixth non-reference detection frame, and the vertical and horizontal widths of the eighth non-reference detection frame are set to five times those of the seventh non-reference detection frame.

When detection frames of these sizes are associated with the used captured image of the kth frame, the vertical and horizontal widths of that used captured image are multiplied by (1/5) to generate the size-changed image for the sixth non-reference detection frame when the detection process is performed using the sixth non-reference detection frame. When the detection process is performed using the seventh non-reference detection frame, the used captured image of the kth frame is not reduced; instead, the vertical and horizontal widths of the size-changed image for the sixth non-reference detection frame are multiplied by (1/5) to generate the size-changed image for the seventh non-reference detection frame. Likewise, when the detection process is performed using the eighth non-reference detection frame, the vertical and horizontal widths of the size-changed image for the seventh non-reference detection frame are multiplied by (1/5) to generate the size-changed image for the eighth non-reference detection frame. In this way, the size-changed images for the sixth to eighth non-reference detection frames can all be generated by a reduction process with the same reduction ratio.
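The chained reduction described for each frame can be sketched as follows: each size-changed image is produced from the previous one, so a single fixed ratio serves every level. The helper names `shrink` and `resized_chain`, and block averaging as the reduction method, are assumptions chosen for illustration.

```python
def shrink(image, factor):
    """Reduce a 2-D list by an integer factor using block averaging."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[top + dy][left + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for left in range(0, cols, factor)]
            for top in range(0, rows, factor)]


def resized_chain(image, factor, levels):
    """Return one size-changed image per non-reference detection frame.

    Level 1 is the captured image reduced by 1/factor; every later level
    is the previous level reduced by the same 1/factor.
    """
    chain = []
    current = image
    for _ in range(levels):
        current = shrink(current, factor)
        chain.append(current)
    return chain


# An 8 x 8 stand-in for the (k-2)th frame, with factor 2 for both levels.
frame = [[float(x + y) for x in range(8)] for y in range(8)]
chain = resized_chain(frame, 2, 2)
sizes = [(len(img), len(img[0])) for img in chain]
```

For this input, the second level equals the frame reduced directly by 1/4, but it is computed from the already reduced image, which has only a quarter of the pixels.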

The detection unit 3 performs the detection process described above on each of the plurality of used captured images constituting the used captured image group, using each of the plurality of types of detection frames (ten types in this example) associated with that used captured image. As a result, for each used captured image, at least one detection result region (a region that is highly likely to be a face image) and detection result frame (the outer frame of such a region) are obtained for each of the plurality of types of detection frames associated with that used captured image, together with a detection accuracy value corresponding to each detection result frame. The detection accuracy value corresponding to a detection result frame obtained for a used captured image indicates the likelihood that the image within that detection result frame in the used captured image is a face image.

＜Description of operation of the detection target image specifying unit＞
The plurality of used captured images constituting the used captured image group, which is used when a face image is detected from the processing target image, can be regarded as the same image as one another if their imaging timings are brought close to one another.

In the present embodiment, the imaging frame rate of the imaging unit is 30 fps, and the used captured image group consists of three captured images captured consecutively by the imaging unit, so the imaging interval between used captured images is (1/30) second. If a person walks at 5 km/h, the distance the person moves in (1/30) second is only a few centimeters. In other words, a person's face hardly moves while the plurality of used captured images are captured. Therefore, from the viewpoint of detecting a human face image from the processing target image, the processing target image and the other used captured images can be regarded as the same image. It follows that a detection result frame obtained for a used captured image other than the processing target image in the used captured image group can be considered equivalent to a detection result frame obtained for the processing target image. That is, if a detection result frame obtained for a used captured image other than the processing target image is placed over the processing target image, the image within that detection result frame in the processing target image is highly likely to be a face image. Likewise, the detection accuracy value for a detection result frame obtained for a used captured image other than the processing target image can be said to indicate how face-like the image within that detection result frame is when the frame is placed over the processing target image.
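As a check on the figures quoted above, the distance covered in one imaging interval at the stated walking speed works out as follows (only the unit conversion is added here):

```latex
\[
5\ \mathrm{km/h} = \frac{5000\ \mathrm{m}}{3600\ \mathrm{s}} \approx 1.39\ \mathrm{m/s},
\qquad
1.39\ \mathrm{m/s} \times \frac{1}{30}\ \mathrm{s} \approx 0.046\ \mathrm{m} \approx 4.6\ \mathrm{cm}.
\]
```

Across the two intervals spanned by three consecutive images, the same speed gives roughly 9.3 cm.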

Accordingly, the detection target image specifying unit 7 according to the present embodiment uses all of the detection result frames obtained for the used captured images other than the processing target image as detection result frames for the processing target image. That is, the detection target image specifying unit 7 treats all of the detection result frames obtained for the used captured image group as detection result frames for the processing target image. The detection target image specifying unit 7 then specifies a face image in the processing target image based on each detection result frame for the processing target image and the detection accuracy value for each of those detection result frames.

By using the detection result frames obtained for the used captured images other than the processing target image as detection result frames for the processing target image in this way, detection result frames equivalent to those that would be obtained by performing the detection process with all T types of detection frames can be obtained, even though only some of the T types of detection frames are used in the detection process for the processing target image itself. Hereinafter, unless otherwise specified, the detection result frames for the processing target image that are used in the detection target image specifying unit 7 mean all of the detection result frames obtained for the used captured image group.
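A minimal sketch of this pooling step is shown below. The function name `pool_detection_results`, the `(left, top, width, height)` frame tuples, and the sample values are all hypothetical; they only illustrate that the per-image results are merged into one list attributed to the processing target image.

```python
def pool_detection_results(results_per_image):
    """Merge the detection results of every used captured image.

    results_per_image holds one list per used captured image; each entry
    is (frame, accuracy), where frame is (left, top, width, height).
    The index of the source image is kept only for reference.
    """
    pooled = []
    for image_index, results in enumerate(results_per_image):
        for frame, accuracy in results:
            pooled.append((frame, accuracy, image_index))
    return pooled


# Hypothetical results: each image was scanned with different frame sizes.
results_per_image = [
    [((40, 30, 16, 16), 0.9)],
    [((38, 28, 48, 48), 0.7), ((120, 60, 48, 48), 0.6)],
    [((36, 26, 80, 80), 0.4)],
]
pooled = pool_detection_results(results_per_image)
```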

FIG. 12 is a diagram showing a state in which all of the detection result frames 150 obtained for the plurality of used captured images constituting the used captured image group are placed over the processing target image 20a as detection result frames for the processing target image 20a.

As shown in FIG. 12, performing the detection process with a plurality of types of detection frames of mutually different sizes yields detection result frames 150 of various sizes. This means that face images of various sizes included in the processing target image 20a have been detected. Also, as shown in FIG. 12, when the obtained detection result frames 150 are placed over the processing target image 20a, a plurality of detection result frames 150 concentrate in the vicinity of a single face image. That is, a plurality of detection result frames 150 are obtained for one face image included in the processing target image 20a.

Because a plurality of detection result frames 150 are obtained for one face image included in the processing target image 20a, it is difficult, as matters stand, to determine the number of face images included in the processing target image 20a. In addition, if the processing target image 20a with the detection result frames 150 placed over it as in FIG. 12 were displayed on a display device, a face image included in the processing target image 20a could be hidden by the plurality of detection result frames 150, making that face image difficult to identify.

It is therefore desirable to integrate the plurality of detection result frames 150 concentrated in the vicinity of a face image into a single detection result frame, thereby generating an integrated detection result frame, so that one integrated detection result frame corresponds to one face image.

On the other hand, if the plurality of detection result frames 150 are not integrated appropriately, the face image will not fit properly within the integrated detection result frame, and the detection accuracy for the face image may decrease as a result.

The detection target image specifying unit 7 according to the present embodiment specifies a face image in the processing target image using the output value map generated by the map generation unit 4, and uses the outer frame of that face image as the integrated detection result frame. This yields an accurate integrated detection result frame, that is, one within which the face image fits properly. The output value map generation process is described first.

＜Output value map generation process＞
Based on the detection results of the detection unit 3, the map generation unit 4 generates an output value map indicating the distribution, over the processing target image, of the detection accuracy value that indicates the likelihood of being a face image (face-likeness).

Specifically, the map generation unit 4 considers a map 200 consisting of a total of (M × N) values, with M values arranged in the row direction and N values arranged in the column direction, in the same way as the processing target image. The map generation unit 4 then takes one detection result frame for the processing target image as the target detection result frame and sets, on the map 200, a frame 210 of the same size as the target detection result frame at the same position as the target detection result frame. FIG. 13 is a diagram showing a state in which the frame 210 has been set on the map 200.

Next, the map generation unit 4 sets each value of the map 200 outside the frame 210 to "0" and determines each value inside the frame 210 using the detection accuracy value corresponding to the target detection result frame (the detection accuracy value obtained as a result of performing face image detection on the image within the detection frame that became the target detection result frame). If the size of the target detection result frame is, for example, 16p × 16p, the frame 210 contains 16 values in the row direction and 16 values in the column direction, for a total of 256 values. If the size of the target detection result frame is, for example, 20p × 20p, the frame 210 contains 20 values in the row direction and 20 values in the column direction, for a total of 400 values. FIG. 14 is a diagram for explaining the method of determining each value within the frame 210.

The map generation unit 4 sets the value at the center 211 of the frame 210 to the detection accuracy value corresponding to the target detection result frame, as obtained by the detection unit 3. The map generation unit 4 then sets the other values within the frame 210 so that they gradually decrease from the center 211 of the frame 210 toward the outside, following a normal distribution curve whose maximum is the value at the center 211 of the frame 210. Each of the values constituting the map 200 is thereby determined, and the map 200 corresponding to the target detection result frame is completed.
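The construction of one map 200 can be sketched as below. The spread of the normal distribution is not specified in the text, so the `sigma_ratio` parameter is an assumption, as are the function name `frame_map` and the `(left, top, size_x, size_y)` frame tuple.

```python
import math


def frame_map(width, height, frame, accuracy, sigma_ratio=0.25):
    """Build one map: zero outside the frame, a Gaussian bump inside it.

    The bump peaks at the frame centre with the detection accuracy value.
    For frames with an even side length the centre lies between values,
    so no single value reaches the peak exactly.
    sigma_ratio (sigma as a fraction of the frame side) is an assumption.
    """
    left, top, size_x, size_y = frame
    cx = left + (size_x - 1) / 2.0
    cy = top + (size_y - 1) / 2.0
    sigma_x = sigma_ratio * size_x
    sigma_y = sigma_ratio * size_y
    grid = [[0.0] * width for _ in range(height)]
    for y in range(max(top, 0), min(top + size_y, height)):
        for x in range(max(left, 0), min(left + size_x, width)):
            d2 = ((x - cx) / sigma_x) ** 2 + ((y - cy) / sigma_y) ** 2
            grid[y][x] = accuracy * math.exp(-0.5 * d2)
    return grid


# A 9 x 7 map with a 5 x 5 frame whose top-left corner is at (2, 1).
single_map = frame_map(9, 7, (2, 1, 5, 5), 0.8)
```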

In this way, the map generation unit 4 generates a plurality of maps 200 corresponding respectively to the plurality of detection result frames for the processing target image. In other words, the map generation unit 4 generates a plurality of maps 200 corresponding respectively to the plurality of detection result frames obtained for the plurality of used captured images constituting the used captured image group. The map generation unit 4 then combines the generated maps 200 to generate the output value map. Specifically, the map generation unit 4 adds together the m×n-th values of the generated maps 200 (that is, the values at the same position in each map) and uses the resulting sum as the m×n-th detection accuracy value of the output value map. The map generation unit 4 obtains each detection accuracy value constituting the output value map in this manner. This completes the output value map, which indicates the distribution of detection accuracy values over the processing target image. By referring to the output value map, a region with high face-likeness in the processing target image can be identified. That is, the face image in the processing target image can be specified by referring to the output value map.
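The combination step is an element-wise sum over all maps 200, which can be sketched as follows (the function name `combine_maps` and the small sample maps are illustrative assumptions):

```python
def combine_maps(maps):
    """Add same-sized maps value by value to form the output value map."""
    height, width = len(maps[0]), len(maps[0][0])
    output = [[0.0] * width for _ in range(height)]
    for single in maps:
        for y in range(height):
            for x in range(width):
                output[y][x] += single[y][x]
    return output


# Two tiny maps whose frames overlap in the middle column.
map_a = [[0.0, 0.5, 0.0],
         [0.0, 1.0, 0.0]]
map_b = [[0.0, 0.25, 0.5],
         [0.0, 0.5, 1.0]]
output_map = combine_maps([map_a, map_b])
```

Positions covered by several detection result frames accumulate larger sums, which is why regions where frames cluster stand out in the output value map.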

FIG. 15 is a diagram showing the output value map for the processing target image 20a shown in FIG. 12, superimposed on that processing target image 20a. For ease of understanding, FIG. 15 shows the output value map with the magnitude of the detection accuracy value divided into, for example, five levels from a first level to a fifth level. In the output value map shown in FIG. 15, regions whose detection accuracy values belong to the fifth (largest) level are shown with vertical-line hatching, and regions belonging to the fourth (second largest) level are shown with sand-pattern hatching. Regions belonging to the third (third largest) level are shown with hatching rising to the right, and regions belonging to the second (fourth largest) level are shown with hatching rising to the left. Regions whose detection accuracy values belong to the first (smallest) level are shown without hatching.

FIG. 15 shows that, in the output value map, the detection accuracy values are high in the regions corresponding to the face images included in the processing target image 20a.

＜Maximum point search process＞
As shown in FIG. 15, in the output value map, the detection accuracy values in a region corresponding to a face image in the processing target image are likely to be large. Looking at the output value map more closely, among the detection accuracy values in the region corresponding to a face image in the processing target image, the value at the same position as the center of the face image is likely to be the largest. Therefore, the center position of a face image can be specified by searching the output value map for a maximum point of the detection accuracy value. Then, by determining that a predetermined region of the processing target image that includes the pixel at the same position as the specified maximum point (which corresponds to the center position of the face image) is a face image, the face image in the processing target image can be specified accurately. Accordingly, an accurate integrated detection result frame can be obtained by using the outer frame of that predetermined region as the integrated detection result frame.

This section describes a method of searching the output value map for maximum points of the detection accuracy value. In the present embodiment, the maximum point search unit 5 searches for maximum points of the detection accuracy value in the output value map using, for example, the Mean-Shift method. The operation of the maximum point search unit 5 is described in detail below. Hereinafter, the term "maximum point" by itself means "a maximum point of the detection accuracy value in the output value map".

In the output value map arranged on two-dimensional coordinates, the maximum point search unit 5 searches for a maximum point by repeating the following process: using the detection accuracy values as weighting coefficients, it calculates the weighted average of the coordinate values of the positions at which the detection accuracy values included in a processing target region are located, and moves the processing target region so that its center position coincides with that weighted average. In the present embodiment, as many maximum points are obtained as there are detection result frames 150 for the processing target image.

FIG. 16 is a diagram for explaining the method of searching for a maximum point. In FIG. 16, an output value map 300 is arranged on XY coordinates, which are two-dimensional coordinates. In the present embodiment, for example, the upper left of the output value map 300 is the origin O of the XY coordinates, the row direction is the X-axis direction, and the column direction is the Y-axis direction. The processing target region 400 that is moved when searching for a maximum point is, for example, circular.

The maximum point search unit 5 takes one detection result frame 150 among the plurality of detection result frames 150 for the processing target image 20a as a target detection result frame 150t.

Next, the maximum point search unit 5 determines the position from which the processing target region 400 starts moving over the output value map 300. Because the image within a detection result frame 150 in the processing target image is highly likely to be a face image, the center of the face image in the processing target image is likely to lie within that detection result frame 150. Therefore, in the output value map 300, a maximum point is likely to exist within the region at the same position as the target detection result frame 150t. In the present embodiment in particular, the value at the center 211 of the frame 210 is set to the detection accuracy value when each map 200 used to generate the output value map 300 is completed, so in the output value map 300 a maximum point is likely to exist near the position corresponding to the center of the target detection result frame 150t.

Accordingly, as shown in FIG. 16, the maximum point search unit 5 uses a position 410 on the output value map 300 that coincides with a predetermined position within the target detection result frame 150t, for example its center position, as the initial center position of the processing target region 400. That is, when starting the search for a maximum point, the maximum point search unit 5 places the processing target region 400 on the output value map 300 so that the center position of the processing target region 400 coincides with the center position of the target detection result frame 150t. This allows the maximum point to be found quickly.

The processing target region 400 is, for example, large enough that, when it is placed on the output value map 300, about 50 to 60 detection accuracy values are lined up along the radial direction from its center.

次に、極大点探索部５は、ＸＹ座標に配置された出力値マップ３００において、検出確度値を重み係数として、処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についての座標値の重み付け平均値（ＸＭ，ＹＭ）を算出する。極大点探索部５は、以下の式（１）を用いて重み付け平均値（ＸＭ，ＹＭ）を算出する。 Next, in the output value map 300 laid out on XY coordinates, the local maximum point search unit 5 calculates the weighted average value (XM, YM) of the coordinate values of the positions of the detection accuracy values contained in the processing target region 400, using each detection accuracy value as the weighting factor. The local maximum point search unit 5 calculates the weighted average value (XM, YM) using the following equation (1).

ここで、式（１）中のＪは、処理対象領域４００内に存在する複数の検出確度値の個数を示している。また、ｉは、処理対象領域４００内の複数の検出確度値のそれぞれに対して付された番号を示している。そして、ｖｉは、ｉ番の検出確度値を意味しており、（Ｘｉ，Ｙｉ）は、ＸＹ座標に配置された出力値マップ３００においてｉ番の検出確度値が存在する位置についてのＸＹ座標値を示している。 Here, J in Expression (1) indicates the number of a plurality of detection accuracy values existing in the processing target area 400. In addition, i indicates a number assigned to each of a plurality of detection accuracy values in the processing target area 400. And vi means the i-th detection accuracy value, and (Xi, Yi) is the XY coordinate value for the position where the i-th detection accuracy value exists in the output value map 300 arranged in the XY coordinates. Is shown.
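
Equation (1) itself is not reproduced in this text. Reading the definitions above literally (a weighted average of the coordinate values, with the detection accuracy values as the weights), it would presumably take the following form. This is a reconstruction from the surrounding description, with the index assumed to run from 1 to J, and not a copy of the published formula.

```latex
X_M = \frac{\sum_{i=1}^{J} v_i X_i}{\sum_{i=1}^{J} v_i}, \qquad
Y_M = \frac{\sum_{i=1}^{J} v_i Y_i}{\sum_{i=1}^{J} v_i} \tag{1}
```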

極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を求めると、処理対象領域４００の中心位置のＸＹ座標が当該重み付け平均値（ＸＭ，ＹＭ）となるように処理対象領域４００を移動させる。図１６の矢印は、処理対象領域４００が移動する様子を示している。 When the local maximum point search unit 5 obtains the weighted average value (XM, YM), it moves the processing target region 400 so that the XY coordinates of the center position of the processing target region 400 become the weighted average value (XM, YM). The arrows in FIG. 16 indicate how the processing target region 400 moves.

次に極大点探索部５は、処理対象領域４００の移動距離（シフト量）がしきい値未満であるかを判定する。処理対象領域４００の移動距離は、移動前の処理対象領域４００の中心位置と移動後の処理対象領域４００の中心位置との間の距離を求めることによって得られる。極大点探索部５は、処理対象領域４００の移動距離がしきい値以上であると判定すると、移動後の処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についての座標値の重み付け平均値（ＸＭ，ＹＭ）を式（１）を用いて算出する。そして、極大点探索部５は、処理対象領域４００の中心位置のＸＹ座標が、新たに求めた重み付け平均値（ＸＭ，ＹＭ）となるように当該処理対象領域４００をさらに移動させる。 Next, the local maximum point search unit 5 determines whether the movement distance (shift amount) of the processing target region 400 is less than a threshold value. The movement distance is obtained as the distance between the center position of the processing target region 400 before the movement and its center position after the movement. When the local maximum point search unit 5 determines that the movement distance is equal to or greater than the threshold value, it uses equation (1) to calculate the weighted average value (XM, YM) of the coordinate values of the positions of the detection accuracy values contained in the processing target region 400 after the movement. The local maximum point search unit 5 then moves the processing target region 400 again so that the XY coordinates of its center position become the newly obtained weighted average value (XM, YM).

一方で、極大点探索部５は、処理対象領域４００の移動量がしきい値未満であると判定すると、処理対象領域４００の移動量が収束したと判断して、処理対象領域４００の移動を終了する。そして、極大点探索部５は、現在の処理対象領域４００の中心位置を極大点とする。これより、対象検出結果枠１５０の位置付近での極大点が求められる。 On the other hand, when the local maximum point search unit 5 determines that the movement amount of the processing target region 400 is less than the threshold value, it judges that the movement has converged and stops moving the processing target region 400. The local maximum point search unit 5 then takes the current center position of the processing target region 400 as the local maximum point. In this way, the local maximum point near the position of the target detection result frame 150 is obtained.
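
The iteration described above (weighted centroid, move, repeat until the shift falls below a threshold) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name, the list-of-tuples representation of the output value map, the circular shape of the processing target region, and the `max_iters` safety cap are all assumptions made for the example.

```python
import math

def find_local_maximum(points, start, radius, threshold, max_iters=100):
    """Search for a local maximum by repeatedly moving to the weighted centroid.

    points    : list of (x, y, v), v being the detection accuracy value at (x, y)
    start     : (x, y) initial center, e.g. the center of a detection result frame
    radius    : radius of the (assumed circular) processing target region
    threshold : the search stops once the center moves less than this distance
    max_iters : hypothetical safety cap, not part of the description
    """
    cx, cy = start
    for _ in range(max_iters):
        sum_v = sum_vx = sum_vy = 0.0
        for x, y, v in points:
            # Use only the values that lie inside the processing target region.
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                sum_v += v
                sum_vx += v * x
                sum_vy += v * y
        if sum_v == 0:
            break  # nothing to weight by; keep the current center
        new_x, new_y = sum_vx / sum_v, sum_vy / sum_v
        shift = math.hypot(new_x - cx, new_y - cy)
        cx, cy = new_x, new_y  # move the region to the weighted average
        if shift < threshold:
            break  # movement has converged
    return cx, cy

if __name__ == "__main__":
    # Synthetic map with a single smooth peak centered at (12, 8).
    demo = [(x, y, math.exp(-((x - 12) ** 2 + (y - 8) ** 2) / 8.0))
            for x in range(25) for y in range(17)]
    print(find_local_maximum(demo, start=(10, 7), radius=4, threshold=0.01))
```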

以上のようにして、極大点探索部５は、処理対象画像についての複数の検出結果枠１５０のそれぞれについて、当該検出結果枠１５０の位置付近での極大点を求める。 As described above, the maximal point search unit 5 obtains a maximal point near the position of the detection result frame 150 for each of the plurality of detection result frames 150 for the processing target image.

なお、極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する際には、検出確度値が間引かれた出力値マップ３００を用いても良い。言い換えれば、極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する際には、出力値マップ３００において処理対象領域４００内に含まれる複数の検出確度値のすべてを用いなくても良い。 Note that, when calculating the weighted average value (XM, YM), the local maximum point search unit 5 may use an output value map 300 in which the detection accuracy values have been thinned out. In other words, the local maximum point search unit 5 need not use all of the detection accuracy values contained in the processing target region 400 of the output value map 300 when calculating the weighted average value (XM, YM).

図１７は、検出確度値が間引かれた出力値マップ３００の一例を示す図である。図１７の例では、出力値マップ３００が、行方向（Ｘ軸方向）に等間隔で並ぶ複数の分割線５００と列方向（Ｙ軸方向）に等間隔で並ぶ複数の分割線５１０とによって格子状に分割されている。そして、出力値マップ３００においては、格子交点（分割線５００，５１０の交点）に存在する検出確度値以外の検出確度値が削除されている。これにより、出力値マップ３００においては、検出確度値が、列方向においてＰ個（Ｐ≧２）ごとに（Ｐ−１）個間引かれ、行方向においてＱ個（Ｑ≧２）ごとに（Ｑ−１）個間引かれる。図１７中の丸印は、検出確度値が間引かれた出力値マップ３００に存在する検出確度値を示している。 FIG. 17 is a diagram illustrating an example of an output value map 300 in which the detection accuracy values have been thinned out. In the example of FIG. 17, the output value map 300 is divided into a grid by a plurality of dividing lines 500 arranged at equal intervals in the row direction (X-axis direction) and a plurality of dividing lines 510 arranged at equal intervals in the column direction (Y-axis direction). In the output value map 300, all detection accuracy values other than those located at the grid intersections (the intersections of the dividing lines 500 and 510) are deleted. As a result, in the output value map 300, (P-1) of every P detection accuracy values (P ≧ 2) are thinned out in the column direction, and (Q-1) of every Q detection accuracy values (Q ≧ 2) are thinned out in the row direction. The circles in FIG. 17 indicate the detection accuracy values that remain in the thinned-out output value map 300.

図１７に示される出力値マップ３００では、図１５での出力値マップと同様に、検出確度値の大きさが例えば第１段階から第５段階の５段階に分けられて各検出確度値が示されている。図１７での出力値マップ３００では、最も大きい第５段階に属する検出確度値を示す丸印には横線のハッチングが示されており、２番目に大きい第４段階に属する検出確度値を示す丸印には縦線のハッチングが示されている。また、図１７での出力値マップ３００では、３番目に大きい第３段階に属する検出確度値を示す丸印には右上がりのハッチングが示されており、４番目に大きい第２段階に属する検出確度値を示す丸印には左上がりのハッチングが示されている。そして、図１７に示される出力値マップ３００では、最も小さい第１段階に属する検出確度値を示す丸印にはハッチングが示されていない。 In the output value map 300 shown in FIG. 17, as in the output value map of FIG. 15, each detection accuracy value is shown with its magnitude classified into, for example, five stages from a first stage to a fifth stage. In the output value map 300 of FIG. 17, circles indicating detection accuracy values in the largest, fifth stage are marked with horizontal hatching, and circles indicating detection accuracy values in the second largest, fourth stage are marked with vertical hatching. Circles indicating detection accuracy values in the third largest, third stage are marked with hatching that rises to the right, and circles indicating detection accuracy values in the fourth largest, second stage are marked with hatching that rises to the left. Circles indicating detection accuracy values in the smallest, first stage are not hatched.

極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する場合に、図１７に示されるような、検出確度値が間引かれた出力値マップ３００を用いる際には、当該出力値マップ３００において処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についてのＸＹ座標値と当該複数の検出確度値を上記の式（１）に代入する。これにより、重み付け平均値（ＸＭ，ＹＭ）を算出する際に使用される検出確度値の数が低減する。さらに、処理対象領域４００の移動量が収束するまでに必要な、重み付け平均値（ＸＭ，ＹＭ）の算出回数も低減する。よって、極大点を探索する処理の負荷が軽減される。 When calculating the weighted average value (XM, YM), the maximum point search unit 5 uses the output value map 300 in which the detection accuracy values are thinned as shown in FIG. In the map 300, the XY coordinate values and the plurality of detection accuracy values for a plurality of positions where the plurality of detection accuracy values included in the processing target area 400 respectively exist are substituted into the above equation (1). This reduces the number of detection accuracy values used when calculating the weighted average value (XM, YM). Furthermore, the number of times of calculating the weighted average value (XM, YM) required until the movement amount of the processing target area 400 converges is also reduced. Therefore, the processing load for searching for the maximum point is reduced.
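
As a rough illustration of the thinning described above, the following sketch keeps only the values at the grid intersections of a map stored as a list of rows. The function name, the row-list representation, and the choice to start the grid at index 0 are assumptions for the example; the description only fixes the spacing (P in the column direction, Q in the row direction).

```python
def thin_map(value_map, p, q):
    """Keep only the detection accuracy values at grid intersections.

    value_map : list of rows, value_map[y][x] is the value at (x, y)
    p         : spacing in the column (Y-axis) direction, p >= 2
    q         : spacing in the row (X-axis) direction, q >= 2
    Returns a list of (x, y, v) for the surviving values.
    """
    return [(x, y, value_map[y][x])
            for y in range(0, len(value_map), p)      # every p-th row
            for x in range(0, len(value_map[0]), q)]  # every q-th column

if __name__ == "__main__":
    full = [[row * 10 + col for col in range(6)] for row in range(4)]
    print(thin_map(full, p=2, q=3))
```

The surviving `(x, y, v)` triples can be passed directly to a weighted-average step, which then sums over roughly 1/(P·Q) of the original values.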

＜極大点統合処理＞ <Local maximum point integration processing>
上述の説明から理解できるように、極大点探索部５では、互いに位置が異なる複数の極大点が求められることがある。出力値マップにおいて、互いに近くに位置している複数の極大点については、同じ顔画像の中心を示している可能性が高い。一方で、互いに離れて位置している複数の極大点については、別々の顔画像の中心を示している可能性が高い。 As can be understood from the above description, the local maximum point search unit 5 may obtain a plurality of local maximum points at different positions. In the output value map, local maximum points located close to one another are likely to indicate the center of the same face image. On the other hand, local maximum points located far apart from one another are likely to indicate the centers of different face images.

そこで、検出対象画像決定部６は、極大点探索部５で求められた極大点を用いて、処理対象画像において顔画像を特定する際には、まず、極大点探索部５で求められた、互いに近くに位置している複数の極大点を一つの極大点に統合する。以下に、互いに近くに位置する複数の極大点の統合方法の一例について説明する。 Therefore, when the detection target image determination unit 6 specifies a face image in the processing target image using the local maximum points obtained by the local maximum point search unit 5, it first integrates those local maximum points that are located close to one another into a single local maximum point. An example of a method for integrating such nearby local maximum points is described below.

検出対象画像決定部６は、検出部３が検出処理において検出枠を処理対象画像の左上から右下にかけて移動させる場合と同様に、出力値マップを左上から右下にかけて見ていき（ラスタスキャンの方向に見ていき）、極大点探索部５で求められた極大点が現れると、当該極大点を基準点として、当該基準点と次に現れる極大点との間の距離を求める。そして、検出対象画像決定部６は、求めた距離がしきい値未満であれば、基準点を残して、後に現れた極大点を削除する。一方で、検出対象画像決定部６は、求めた距離がしきい値以上であれば、現在の基準点を残しつつ、後に現れた極大点を新たな基準点とする。 Similar to the case where the detection unit 3 moves the detection frame from the upper left to the lower right of the processing target image in the detection process, the detection target image determination unit 6 looks at the output value map from the upper left to the lower right (raster scanning). When the local maximum obtained by the local maximum searching section 5 appears, the distance between the reference point and the next local maximum appearing is determined using the local maximum as a reference point. Then, if the obtained distance is less than the threshold value, the detection target image determination unit 6 leaves the reference point and deletes the maximum point that appears later. On the other hand, if the obtained distance is equal to or greater than the threshold value, the detection target image determination unit 6 leaves the current reference point and sets the maximum point that appears later as a new reference point.

極大点の統合で使用されるしきい値については、どの程度の大きさの顔画像を検出すべきかに応じて決定される。例えば、本画像検出装置１が監視カメラシステムで使用される場合であって、カメラから比較的近いエリアを監視するのであれば、比較的大きい顔画像を検出することになるため、しきい値としては大きな値が使用される。また、本画像検出装置１が監視カメラシステムで使用される場合であって、カメラから比較的遠いエリアを監視するのであれば、比較的小さい顔画像を検出することになるため、しきい値としては小さな値が使用される。本例では、しきい値は、例えば、処理対象画像での５ピクセル分の距離に設定される。なお、しきい値は、ユーザによって調整可能（書き替え可能）とすることが好ましい。 The threshold value used in the integration of local maximum points is determined according to how large a face image should be detected. For example, when this image detection apparatus 1 is used in a surveillance camera system and an area relatively close to the camera is monitored, a relatively large face image is detected. A large value is used. In addition, when the image detection apparatus 1 is used in a surveillance camera system and an area relatively far from the camera is monitored, a relatively small face image is detected. A small value is used. In this example, the threshold value is set to a distance of 5 pixels in the processing target image, for example. The threshold value is preferably adjustable (rewritable) by the user.

検出対象画像決定部６は、求めた距離がしきい値未満であれば、現在の基準点を残して、後に現れた極大点を削除し、その後、現在の基準点と削除した極大点の次に現れる極大点との間の距離と、しきい値とを比較する。また、検出対象画像決定部６は、求めた距離がしきい値以上であり、後に現れた極大点を新たな基準点とすると、その新たな基準点の次に現れる極大点と当該新たな基準点との間の距離と、しきい値とを比較する。 If the obtained distance is less than the threshold value, the detection target image determination unit 6 keeps the current reference point and deletes the local maximum point that appeared later; it then compares the threshold value with the distance between the current reference point and the local maximum point that appears next after the deleted one. If the obtained distance is equal to or greater than the threshold value and the local maximum point that appeared later has become the new reference point, the detection target image determination unit 6 compares the threshold value with the distance between the new reference point and the local maximum point that appears next after it.

以後、検出対象画像決定部６は同様に動作して、最後に現れる極大点と基準点との間の距離がしきい値未満の場合には、最後に現れる極大点を削除して、極大点の統合処理を終了する。一方で、検出対象画像決定部６は、最後に現れる極大点と基準点との間の距離がしきい値以上の場合には、最後に現れる極大点を削除せずに、極大点の統合処理を終了する。検出対象画像決定部６は、極大点の統合処理の終了後に残った少なくとも一つの極大点を使用して、処理対象画像において顔画像を特定する。 Thereafter, the detection target image determination unit 6 operates in the same manner. If the distance between the last local maximum point to appear and the reference point is less than the threshold value, it deletes that last local maximum point and ends the integration processing. If, on the other hand, that distance is equal to or greater than the threshold value, it ends the integration processing without deleting the last local maximum point. The detection target image determination unit 6 then specifies a face image in the processing target image using the at least one local maximum point remaining after the integration processing.

なお、上記の例では、基準点と後に現れた極大点との間の距離がしきい値未満であれば、基準点及び後に現れた極大点のうち基準点だけを残していたが、基準点及び後に現れた極大点のうち、その位置での検出確度値が大きい方の極大点だけを残しても良い。これにより、極大点の統合処理の終了後に残った極大点が、顔画像の中心位置を示す可能性が高くなる。 In the above example, if the distance between the reference point and the maximum point that appears later is less than the threshold value, only the reference point is left among the reference point and the maximum point that appears later. Of the local maximum points that appear later, only the local maximum point with the larger detection accuracy value at that position may be left. Accordingly, there is a high possibility that the maximal point remaining after the maximal point integration process ends indicates the center position of the face image.
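
The integration procedure described above can be sketched as a single pass over the local maximum points in raster order. The function and parameter names are hypothetical. For the variant that keeps the point with the larger detection accuracy value, the text does not say which point then serves as the reference point; this sketch assumes the surviving point does.

```python
import math

def merge_maxima(maxima, threshold, keep_stronger=False):
    """Integrate nearby local maximum points.

    maxima        : list of (x, y, v), v being the detection accuracy value
    threshold     : points closer than this to the reference point are merged
    keep_stronger : if True, keep whichever of the two points has the larger v
                    (the variant mentioned at the end of the description)
    """
    # Raster-scan order: top to bottom, then left to right.
    ordered = sorted(maxima, key=lambda m: (m[1], m[0]))
    kept = []
    for m in ordered:
        if not kept:
            kept.append(m)  # the first point becomes the reference point
            continue
        ref = kept[-1]      # current reference point
        if math.hypot(m[0] - ref[0], m[1] - ref[1]) < threshold:
            # Too close: drop the later point, or swap if it is stronger.
            if keep_stronger and m[2] > ref[2]:
                kept[-1] = m
        else:
            kept.append(m)  # far enough: it becomes the new reference point
    return kept

if __name__ == "__main__":
    found = [(10, 10, 5), (12, 10, 9), (40, 30, 3)]
    print(merge_maxima(found, threshold=5))
    print(merge_maxima(found, threshold=5, keep_stronger=True))
```

Because each point is compared only with the current reference point, two nearby maxima that are separated in raster order by a distant one would not be merged by this sketch; that follows from the procedure as described.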

＜顔画像決定処理＞ <Face image determination processing>
検出対象画像決定部６は、極大点の統合処理が終了すると、その後に残った各極大点について、処理対象画像において当該極大点と同じ位置のピクセルを含む所定領域を顔画像（検出対象画像）であると決定する。そして、検出対象画像決定部６は、顔画像であると決定した当該所定領域の外形枠を統合後検出結果枠とする。以後、顔画像であると決定される当該所定領域を「検出画像領域」と呼ぶ。また、説明の対象の極大点を「対象極大点」と呼ぶことがある。 When the integration processing of the local maximum points is complete, the detection target image determination unit 6 determines, for each remaining local maximum point, that a predetermined region of the processing target image containing the pixel at the same position as that local maximum point is a face image (detection target image). The detection target image determination unit 6 then sets the outline frame of the predetermined region determined to be a face image as a post-integration detection result frame. Hereinafter, the predetermined region determined to be a face image is referred to as a “detected image area”. In addition, the local maximum point under discussion may be referred to as the “target maximum point”.

ここで、出力値マップでの顔画像に対応する領域においては、当該顔画像の中心位置と同じ位置での検出確度値が大きくなり、当該同じ位置から離れるにつれて検出確度値が小さくなる可能性が高い。そして、出力値マップにおいては、顔画像に対応する領域以外の領域では、検出確度値が零あるいは非常に小さくなる可能性が高い。したがって、出力値マップでは、ある顔画像の中心位置に相当する極大点から、当該顔画像の端に相当する位置に向かうにつれて、検出確度値が小さくなる可能性が高い。言い換えれば、出力値マップでは、ある顔画像の中心位置に相当する極大点から、当該顔画像の端に相当する位置に向かうにつれて、検出確度値が単調減少する可能性が高い。 Here, in the region corresponding to the face image in the output value map, the detection accuracy value at the same position as the center position of the face image increases, and the detection accuracy value may decrease as the distance from the same position increases. high. In the output value map, there is a high possibility that the detection accuracy value is zero or very small in an area other than the area corresponding to the face image. Therefore, in the output value map, there is a high possibility that the detection accuracy value becomes smaller from the maximum point corresponding to the center position of a certain face image toward the position corresponding to the end of the face image. In other words, in the output value map, there is a high possibility that the detection accuracy value monotonously decreases from the maximum point corresponding to the center position of a certain face image toward the position corresponding to the end of the face image.

図１８は、出力値マップでの対象極大点７００付近の検出確度値の分布の一例を示すグラフである。図１８では、対象極大点７００を中心とした左右方向の検出確度値の分布が示されている。また図１８では、縦軸は検出確度値を示しており、横軸は出力値マップ３００での左右方向の位置を示している。図１８に示されるように、検出確度値は、対象極大点７００から右方向ＤＲ１に向かうにつれて小さくなっている（単調減少している）。また、検出確度値は、対象極大点７００から左方向ＤＲ２に向かうにつれて小さくなっている（単調減少している）。 FIG. 18 is a graph showing an example of the distribution of detection accuracy values near the target maximum point 700 in the output value map. FIG. 18 shows a distribution of detection accuracy values in the left-right direction with the target maximum point 700 as the center. In FIG. 18, the vertical axis represents the detection accuracy value, and the horizontal axis represents the position in the left-right direction on the output value map 300. As shown in FIG. 18, the detection accuracy value decreases from the target maximum point 700 in the right direction DR1 (monotonically decreases). Further, the detection accuracy value decreases from the target maximum point 700 in the left direction DR2 (monotonically decreases).

本実施の形態では、このような点に鑑みて、検出対象画像決定部６は、出力値マップにおいて、対象極大点７００から離れる方向に沿って対象極大点７００から検出確度値を見ていった際に、検出確度値が、対象極大点７００での検出確度値に対して最初に（１／Ｚ）倍以下（Ｚ＞１）となる位置と同じ処理対象画像での位置を検出画像領域の端とすることによって、顔画像の端を特定する。これにより、処理対象画像に含まれる顔画像の端を正確に特定することができる。 In the present embodiment, in view of such points, the detection target image determination unit 6 looks at the detection accuracy value from the target maximum point 700 along the direction away from the target maximum point 700 in the output value map. In this case, the position in the processing target image is the same as the position where the detection accuracy value first becomes (1 / Z) times or less (Z> 1) with respect to the detection accuracy value at the target maximum point 700. By defining the edge, the edge of the face image is specified. Thereby, the edge of the face image included in the processing target image can be accurately specified.

また、撮像部の撮像エリアにおいて複数の顔が前後に存在するなどして、処理対象画像において、複数の顔画像が互いに接している場合には、対象極大点７００から離れる方向に沿って対象極大点７００から検出確度値を見ていった際に、検出確度値が、対象極大点７００での検出確度値に対して（１／Ｚ）倍以下となる位置が現れる前に、検出確度値の変化が単調減少でなくなる可能性がある。このような場合には、検出確度値の変化が単調減少でなくなった位置と同じ処理対象画像での位置が、顔画像の端である可能性が高い。 In addition, when a plurality of face images touch one another in the processing target image, for example because a plurality of faces are located one behind another in the imaging area of the imaging unit, the change in the detection accuracy value may stop being monotonically decreasing before a position appears at which the detection accuracy value is (1/Z) times or less the detection accuracy value at the target maximum point 700, as the detection accuracy values are examined from the target maximum point 700 along a direction away from it. In such a case, the position in the processing target image corresponding to the position where the change stopped being monotonically decreasing is likely to be the edge of the face image.

そこで、検出対象画像決定部６は、出力値マップ３００において、対象極大点７００から離れる方向に沿って対象極大点７００から検出確度値を見ていった際に、検出確度値が、対象極大点７００での検出確度値に対して（１／Ｚ）倍以下となる位置が現れる前に、検出確度値の変化が単調減少でなくなったと判断すると、当該変化が単調減少でなくなったと判断した位置と同じ処理対象画像での位置を、検出画像領域の端とする。これにより、複数の顔画像が接している場合であっても、当該複数の顔画像のそれぞれを個別に特定することができる。以下に検出対象画像決定部６の動作について詳細に説明する。 Therefore, when the detection target image determination unit 6 looks at the detection accuracy value from the target maximum point 700 along the direction away from the target maximum point 700 in the output value map 300, the detection accuracy value is the target maximum point. If it is determined that the change in the detection accuracy value is no longer monotonically decreasing before a position that is less than (1 / Z) times the detection accuracy value in 700, the position that is determined that the change is no longer monotonically decreasing The position in the same processing target image is the end of the detected image area. Thereby, even when a plurality of face images are in contact with each other, each of the plurality of face images can be specified individually. The operation of the detection target image determination unit 6 will be described in detail below.

図１９は、処理対象画像において対象極大点７００と同じ位置のピクセルを含む検出画像領域６００（以後、「対象検出画像領域６００」と呼ぶ）の決定方法を説明するための図である。図１９では出力値マップ３００が拡大して示されている。また図１９では、出力値マップ３００に対して対象検出画像領域６００の外形枠６００ａが重ねられて示されている。 FIG. 19 is a diagram for explaining a method of determining a detection image region 600 (hereinafter, referred to as “target detection image region 600”) including a pixel at the same position as the target maximum point 700 in the processing target image. In FIG. 19, the output value map 300 is shown enlarged. Also, in FIG. 19, the outer shape frame 600 a of the target detection image region 600 is shown superimposed on the output value map 300.

本実施の形態では、検出画像領域の形状は例えば四角形に設定される。検出対象画像決定部６は、四角形の検出画像領域の右側端、左側端、上側端及び下側端を決定することによって、当該検出画像領域を決定する。 In the present embodiment, the shape of the detected image area is set to, for example, a quadrangle. The detection target image determination unit 6 determines the detection image region by determining the right end, left end, upper end, and lower end of the square detection image region.

まず、検出対象画像決定部６が対象検出画像領域６００の右側端６１０を決定する際の当該検出対象画像決定部６の動作について説明する。図２０は当該動作を示すフローチャートである。 First, the operation of the detection target image determination unit 6 when the detection target image determination unit 6 determines the right end 610 of the target detection image region 600 will be described. FIG. 20 is a flowchart showing the operation.

検出対象画像決定部６は、図１９に示されるように、出力値マップ３００において、対象極大点７００から右方向ＤＲ１に沿って検出確度値８００（丸印で示されている）を見ていって（抽出していって）、ペアを変えながら前後２つの検出確度値８００を比較し、その比較結果に基づいて対象検出画像領域６００の右側端６１０を決定する。このとき、検出対象画像決定部６は、検出確度値８００を一つずつ見ていっても良いし、一つ置きに見ていっても良いし、複数個置きに見ていっても良い。本例では、検出対象画像決定部６は、検出確度値８００を一つずつ見ていくものとする。 As illustrated in FIG. 19, the detection target image determination unit 6 examines (extracts) the detection accuracy values 800 (indicated by circles) in the output value map 300 from the target maximum point 700 along the right direction DR1, compares two consecutive detection accuracy values 800 while changing the pair, and determines the right end 610 of the target detection image region 600 based on the comparison results. The detection target image determination unit 6 may examine the detection accuracy values 800 one by one, every other one, or at intervals of several. In this example, the detection target image determination unit 6 examines the detection accuracy values 800 one by one.

具体的に説明すると、図２０に示されるように、検出対象画像決定部６は、ステップｓ１１において、対象極大点７００での検出確度値８００（図１９に示される検出確度値８００ａ）を第１の確度値ｖ１とし、その右側の検出確度値８００を第２の確度値ｖ２とする。そして、ステップｓ１２において、検出対象画像決定部６は（ｖ１−ｖ２）を求めて、第１の確度値ｖ１と第２の確度値ｖ２を比較する。 Specifically, as shown in FIG. 20, in step s11 the detection target image determination unit 6 sets the detection accuracy value 800 at the target maximum point 700 (the detection accuracy value 800a shown in FIG. 19) as a first accuracy value v1 and sets the detection accuracy value 800 to its right as a second accuracy value v2. In step s12, the detection target image determination unit 6 calculates (v1-v2) to compare the first accuracy value v1 with the second accuracy value v2.

次にステップｓ１３において、検出対象画像決定部６は、ｖ１とｖ２の比較結果が、（ｖ１−ｖ２）＜０であるかを判断する。検出対象画像決定部６は、（ｖ１−ｖ２）＜０でないと判断した場合には、検出確度値８００が単調減少していると判断して、ステップｓ１４を実行する。ステップｓ１４において、検出対象画像決定部６は、第２の確度値ｖ２が、対象極大点７００での検出確度値８００の（１／Ｚ）倍以下であるか判断する。Ｚについては、例えば３≦Ｚ≦５に設定される。検出対象画像決定部６は、第２の確度値ｖ２が、対象極大点７００での検出確度値８００ａの（１／Ｚ）倍以下であると判断すると、ステップｓ１５において、出力値マップ３００において第２の確度値ｖ２が存在する位置７１０（図１８，１９参照）と同じ処理対象画像での位置を、対象検出画像領域６００の右側端６１０とする。この位置７１０は、出力値マップ３００において、対象極大点７００から右方向ＤＲ１に沿って検出確度値８００を見ていった際に、検出確度値８００が、対象極大点７００での検出確度値８００ａに対して最初に（１／Ｚ）倍以下となる位置である。 Next, in step s13, the detection target image determination unit 6 determines whether the result of comparing v1 and v2 is (v1-v2) < 0. If it determines that (v1-v2) < 0 does not hold, it judges that the detection accuracy values 800 are monotonically decreasing and executes step s14. In step s14, the detection target image determination unit 6 determines whether the second accuracy value v2 is (1/Z) times or less the detection accuracy value 800a at the target maximum point 700. Z is set, for example, so that 3 ≦ Z ≦ 5. If the detection target image determination unit 6 determines that the second accuracy value v2 is (1/Z) times or less the detection accuracy value 800a at the target maximum point 700, then in step s15 it sets the position in the processing target image corresponding to the position 710 (see FIGS. 18 and 19) of the second accuracy value v2 in the output value map 300 as the right end 610 of the target detection image region 600. This position 710 is the first position at which, as the detection accuracy values 800 are examined in the output value map 300 from the target maximum point 700 along the right direction DR1, the detection accuracy value 800 becomes (1/Z) times or less the detection accuracy value 800a at the target maximum point 700.

一方でステップｓ１４において、検出対象画像決定部６は、第２の確度値ｖ２が、対象極大点７００での検出確度値８００ａの（１／Ｚ）倍以下ではないと判断すると、ステップｓ１１において、現在の第２の確度値ｖ２を新たな第１の確度値ｖ１とし、その右側の検出確度値８００を新たな第２の確度値ｖ２とする。その後、検出対象画像決定部６は同様に動作する。 On the other hand, in step s14, if the detection target image determination unit 6 determines that the second accuracy value v2 is not less than (1 / Z) times the detection accuracy value 800a at the target maximum point 700, in step s11, The current second accuracy value v2 is set as a new first accuracy value v1, and the detection accuracy value 800 on the right side thereof is set as a new second accuracy value v2. Thereafter, the detection target image determination unit 6 operates in the same manner.

またステップｓ１３において、検出対象画像決定部６は、ｖ１とｖ２の比較結果が、（ｖ１−ｖ２）＜０であると判断した場合には、ステップｓ１６を実行する。ステップｓ１６において、検出対象画像決定部６は、今回の比較結果も含めて（ｖ１−ｖ２）＜０という比較結果が連続して所定回数Ｃ（Ｃ≧２）だけ発生したかを判断する。つまり、検出対象画像決定部６は、出力値マップ３００において、対象極大点７００から右方向ＤＲ１に沿って、ペアを変えながら前後２つの検出確度値８００を比較していく際に、前の検出確度値８００が後の検出確度値８００よりも小さいという比較結果が連続して所定回数Ｃだけ発生したかを判断する。所定回数Ｃについては、例えばＣ＝２に設定される。 In step s13, if the detection target image determination unit 6 determines that the result of comparing v1 and v2 is (v1-v2) < 0, it executes step s16. In step s16, the detection target image determination unit 6 determines whether the comparison result (v1-v2) < 0 has occurred a predetermined number of times C (C ≧ 2) in succession, including the current comparison. That is, it determines whether, while comparing two consecutive detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the right direction DR1 with a changing pair, the result that the earlier detection accuracy value 800 is smaller than the later one has occurred C times in succession. The predetermined number of times C is set, for example, to C = 2.

ステップｓ１６において、検出対象画像決定部６は、（ｖ１−ｖ２）＜０という比較結果が連続して所定回数Ｃだけ発生したと判断すると、検出確度値が、対象極大点７００での検出確度値に対して（１／Ｚ）倍以下となる位置が現れる前に、検出確度値の変化が単調減少でなくなったと判断して、ステップｓ１５を実行して、現在の第２の確度値ｖ２が存在する位置７１０と同じ処理対象画像での位置を、対象検出画像領域６００の右側端６１０とする。この位置７１０は、検出対象画像決定部６が、出力値マップ３００において、対象極大点７００から右方向ＤＲ１に沿って検出確度値８００を見ていった際に、検出確度値８００が、対象極大点７００での検出確度値８００ａに対して（１／Ｚ）倍以下となる位置が現れる前に、検出確度値の変化が単調減少でなくなったと判断した位置となる。 In step s16, when the detection target image determination unit 6 determines that the comparison result of (v1-v2) <0 has continuously occurred a predetermined number of times C, the detection accuracy value is the detection accuracy value at the target maximum point 700. Before a position that is (1 / Z) times or less appears, it is determined that the change in the detection accuracy value is no longer monotonically decreasing, and step s15 is executed, so that the current second accuracy value v2 exists. The position in the processing target image that is the same as the position 710 to be processed is set as the right end 610 of the target detection image area 600. This position 710 indicates that when the detection target image determination unit 6 looks at the detection accuracy value 800 along the right direction DR1 from the target maximum point 700 in the output value map 300, the detection accuracy value 800 is the target maximum. Before a position that is (1 / Z) times or less than the detection accuracy value 800a at the point 700 appears, the detection accuracy value is determined to be no longer monotonically decreasing.

一方で、ステップｓ１６において、検出対象画像決定部６は、（ｖ１−ｖ２）＜０という比較結果が連続して所定回数Ｃだけ発生したと判断しない場合には、ステップｓ１１において、現在の第２の確度値ｖ２を新たな第１の確度値ｖ１とし、その右側の検出確度値８００を新たな第２の確度値ｖ２とする。その後、検出対象画像決定部６は同様に動作する。 On the other hand, in step s16, if the detection target image determination unit 6 does not determine that the comparison result of (v1-v2) <0 has continuously occurred a predetermined number of times C, the current second image is determined in step s11. The accuracy value v2 is set as a new first accuracy value v1, and the detection accuracy value 800 on the right side thereof is set as a new second accuracy value v2. Thereafter, the detection target image determination unit 6 operates in the same manner.

このようにして、検出対象画像決定部６は、対象検出画像領域６００の右側端６１０を決定する。 In this way, the detection target image determination unit 6 determines the right end 610 of the target detection image region 600.
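
Steps s11 to s16 can be sketched as a loop over the detection accuracy values read outward from the target maximum point in one direction. This is only an illustration under stated assumptions: the function name and the list representation are hypothetical, the default `z=4.0` is simply a value inside the range 3 ≦ Z ≦ 5 given above, and returning the last index when the map border is reached is a choice the description does not cover.

```python
def find_edge_offset(profile, z=4.0, c=2):
    """Return how many steps from the peak the edge of the region lies.

    profile : detection accuracy values read outward from the target maximum
              point; profile[0] is the value at the maximum point itself
    z       : the edge is where a value first drops to peak / z or below
    c       : number of consecutive increases that also ends the search
    """
    peak = profile[0]
    rises = 0  # consecutive comparisons with (v1 - v2) < 0
    for k in range(1, len(profile)):
        v1, v2 = profile[k - 1], profile[k]  # s11: current pair
        if v1 - v2 < 0:                      # s12/s13: value went up
            rises += 1
            if rises >= c:                   # s16: no longer monotonic
                return k                     # s15: edge at position of v2
        else:
            rises = 0                        # the run of increases is broken
            if v2 <= peak / z:               # s14: dropped to 1/Z of peak
                return k                     # s15: edge at position of v2
    # Assumption: if neither condition is met, use the border of the map.
    return len(profile) - 1

if __name__ == "__main__":
    isolated_face = [100, 80, 60, 40, 20, 10]
    touching_faces = [100, 80, 60, 70, 90, 95]
    print(find_edge_offset(isolated_face), find_edge_offset(touching_faces))
```

The same function applies to the left, upward, and downward directions if the values are read outward from the maximum point in that direction.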

同様にして、検出対象画像決定部６は、対象検出画像領域６００の左側端６２０を決定する際には、図１９に示されるように、出力値マップ３００において、対象極大点７００から左方向ＤＲ２に沿って検出確度値８００を見ていって、ペアを変えながら前後２つの検出確度値８００を比較し、その比較結果に基づいて対象検出画像領域６００の左側端６２０を決定する。対象検出画像領域６００の右側端６１０及び左側端６２０が決定されると、対象検出画像領域６００の左右方向（行方向）の幅Ｗ１（図１８，１９参照）が決定される。 Similarly, when determining the left end 620 of the target detection image region 600, the detection target image determination unit 6, as shown in FIG. 19, examines the detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the left direction DR2, compares two consecutive detection accuracy values 800 while changing the pair, and determines the left end 620 of the target detection image region 600 based on the comparison results. Once the right end 610 and the left end 620 of the target detection image region 600 are determined, the width W1 of the target detection image region 600 in the left-right direction (row direction) is determined (see FIGS. 18 and 19).

また、検出対象画像決定部６は、対象検出画像領域６００の上側端６３０を決定する際には、図１９に示されるように、出力値マップ３００において、対象極大点７００から上方向ＤＲ３に沿って検出確度値８００を見ていって、ペアを変えながら前後２つの検出確度値８００を比較し、その比較結果に基づいて対象検出画像領域６００の上側端６３０を決定する。そして、検出対象画像決定部６は、対象検出画像領域６００の下側端６４０を決定する際には、図１９に示されるように、出力値マップ３００において、対象極大点７００から下方向ＤＲ４に沿って検出確度値８００を見ていって、ペアを変えながら前後２つの検出確度値８００を比較し、その比較結果に基づいて対象検出画像領域６００の下側端６４０を決定する。対象検出画像領域６００の上側端６３０及び下側端６４０が決定されると、対象検出画像領域６００の上下方向（列方向）の幅Ｗ２（図１９参照）が決定される。 Likewise, when determining the upper end 630 of the target detection image region 600, the detection target image determination unit 6, as shown in FIG. 19, examines the detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the upward direction DR3, compares two consecutive detection accuracy values 800 while changing the pair, and determines the upper end 630 based on the comparison results. When determining the lower end 640 of the target detection image region 600, it examines the detection accuracy values 800 from the target maximum point 700 along the downward direction DR4 in the same way and determines the lower end 640 based on the comparison results. Once the upper end 630 and the lower end 640 of the target detection image region 600 are determined, the width W2 of the target detection image region 600 in the up-down direction (column direction) is determined (see FIG. 19).

このようにして、検出対象画像決定部６は、四角形の検出画像領域の右側端、左側端、上側端及び下側端を決定することによって、当該検出画像領域の位置及び大きさを決定する。そして、検出対象画像決定部６は、決定した検出画像領域の外形枠を、統合後検出結果枠とする。処理対象画像領域での統合後検出結果枠内の画像が、顔画像であると決定された検出画像領域となる。 In this way, the detection target image determination unit 6 determines the position and size of the detection image region by determining the right end, left end, upper end, and lower end of the quadrangle detection image region. Then, the detection target image determination unit 6 sets the determined outer frame of the detected image region as a post-integration detection result frame. An image in the post-integration detection result frame in the processing target image area becomes a detected image area determined to be a face image.

検出対象画像決定部６は、極大点の統合処理が終了した後に残った各極大点について、当該極大点に対応する検出画像領域（処理対象画像での顔画像）を決定するとともに、当該検出画像領域の外形枠を統合後検出結果枠とする。これにより、処理対象画像に含まれる各顔画像に関して、一つ顔画像に対して一つの統合後検出結果枠が求められる。 For each local maximum point remaining after the local maximum point integration process is completed, the detection target image determination unit 6 determines the detection image region (a face image in the processing target image) corresponding to that local maximum point, and sets the outer frame of that detection image region as a post-integration detection result frame. As a result, one post-integration detection result frame is obtained for each face image included in the processing target image.

なお、検出対象画像決定部６は、求めた検出画像領域の大きさがあまりにも小さい場合には、当該検出画像領域は顔画像ではないとして、当該検出画像領域を削除しても良い。言い換えれば、検出対象画像決定部６は、求めた統合後検出結果枠の大きさがあまりにも小さい場合には、当該統合後検出結果枠内の画像は顔画像ではないとして、当該統合後検出結果枠を削除しても良い。 Note that when the obtained detection image region is too small, the detection target image determination unit 6 may delete that detection image region on the assumption that it is not a face image. In other words, when the obtained post-integration detection result frame is too small, the detection target image determination unit 6 may delete that post-integration detection result frame on the assumption that the image within it is not a face image.
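The optional size check described above can be sketched as follows. This is only an illustrative sketch: the description does not state how small is "too small", so the `Frame` type and the `min_width` and `min_height` parameters are hypothetical.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """A post-integration detection result frame (hypothetical layout)."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def drop_small_frames(frames, min_width, min_height):
    """Keep only frames that are at least min_width x min_height pixels.

    The minimum size is an assumed parameter; the description only says that
    frames that are too small may be deleted as non-face regions.
    """
    return [
        f for f in frames
        if (f.right - f.left) >= min_width and (f.bottom - f.top) >= min_height
    ]
```

For example, with a minimum size of 8 x 8 pixels, a 40 x 40 frame would be kept and a 4 x 3 frame would be deleted.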

図２１は、図１２，１５に示される処理対象画像２０ａに関して、検出対象画像決定部６で求められた検出画像領域６００及び統合後検出結果枠９００（検出画像領域６００の外形枠６００ａ）を示す図である。図２１では、検出画像領域６００及び統合後検出結果枠９００が処理対象画像２０ａに重ねて示されている。 FIG. 21 is a diagram showing the detection image regions 600 and the post-integration detection result frames 900 (the outer frames 600a of the detection image regions 600) obtained by the detection target image determination unit 6 for the processing target image 20a shown in FIGS. 12 and 15. In FIG. 21, the detection image regions 600 and the post-integration detection result frames 900 are shown superimposed on the processing target image 20a.

図２１に示されるように、処理対象画像２０ａに含まれる各顔画像に対して、おおよそ一つの検出画像領域６００が求められている。つまり、処理対象画像２０ａに含まれる各顔画像に対して、おおよそ一つの統合後検出結果枠９００が求められている。これは、一つの顔画像に対して求められた複数の検出結果枠１５０（図１２参照）が統合されて、当該一つの顔画像に対して一つの統合後検出結果枠９００が求められたことを意味している。そして、各統合後検出結果枠９００内には顔画像が適切に収まっている。よって、本実施の形態に係る画像検出装置１では、適切に顔画像が検出されていると言える。 As shown in FIG. 21, approximately one detection image region 600 is obtained for each face image included in the processing target image 20a. That is, approximately one post-integration detection result frame 900 is obtained for each face image included in the processing target image 20a. This means that the plurality of detection result frames 150 (see FIG. 12) obtained for one face image have been integrated, so that one post-integration detection result frame 900 is obtained for that face image. In addition, each face image fits properly within its post-integration detection result frame 900. Therefore, it can be said that face images are appropriately detected by the image detection apparatus 1 according to the present embodiment.

このように、本実施の形態では、検出対象画像としての確からしさを示す検出確度値についての処理対象画像での分布を示す出力値マップでの検出確度値の極大点と同じ当該処理対象画像での位置のピクセルを含む所定領域が当該検出対象画像であると決定されている。出力値マップでの検出確度値の極大点は、処理対象画像での検出対象画像の中心位置に対応すると考えられることから、処理対象画像において、当該極大点と同じ位置でのピクセルを含む所定領域を検出対象画像であると決定することによって、当該処理対象画像から当該検出対象画像を精度良く検出することができる。つまり、検出対象画像についての検出精度を向上することができる。 As described above, in the present embodiment, a predetermined region of the processing target image that includes the pixel at the same position as a local maximum point of the detection accuracy value in the output value map is determined to be the detection target image, where the output value map indicates the distribution, over the processing target image, of the detection accuracy value representing the likelihood of being the detection target image. Since a local maximum point of the detection accuracy value in the output value map is considered to correspond to the center position of a detection target image in the processing target image, determining that a predetermined region including the pixel at the same position as the local maximum point is the detection target image allows the detection target image to be detected accurately from the processing target image. That is, the detection accuracy for the detection target image can be improved.

なお、上記の例では、ノイズの影響により、単調減少でなくなったと誤って判断することを抑制するためにステップｓ１６を実行しているが、ステップｓ１６は実行しなくても良い。この場合には、ステップｓ１３において、（ｖ１−ｖ２）＜０であると判断されると、ステップｓ１５が実行されることになる。つまり、（ｖ１−ｖ２）＜０という比較結果が１回でも得られると、検出確度値が、対象極大点７００での検出確度値に対して（１／Ｚ）倍以下となる位置が現れる前に、検出確度値の変化が単調減少でなくなったと判断されて、ステップｓ１５が実行される。 In the above example, step s16 is executed to suppress an erroneous determination, caused by noise, that the detection accuracy value is no longer monotonically decreasing; however, step s16 need not be executed. In that case, if it is determined in step s13 that (v1-v2) < 0, step s15 is executed. That is, if the comparison result (v1-v2) < 0 is obtained even once, it is determined that the detection accuracy value stopped decreasing monotonically before reaching a position where it is (1/Z) times or less of the detection accuracy value at the target maximum point 700, and step s15 is executed.
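The variant without step s16 can be illustrated with the following sketch of the walk from the target maximum point in one direction. It rests on assumptions that the passage above does not settle: the function name, the order in which the two stop conditions are tested, and the choice to report the index of the later value of the pair are all hypothetical, and what step s15 actually does with the stop position is not reproduced here.

```python
def walk_until_stop(values, z):
    """Walk away from the target maximum point along one direction.

    values[0] is the detection accuracy value at the target maximum point and
    values[1:] are the values met along the direction (e.g. DR1 to DR4).
    Consecutive pairs (v1, v2) are compared. The walk stops when v2 has fallen
    to (1/Z) of the peak or lower, or, in this variant without the noise guard
    of step s16, as soon as (v1 - v2) < 0 is observed once.
    Returns (index, reason); the reason strings are illustrative labels.
    """
    peak = values[0]
    for i in range(1, len(values)):
        v1, v2 = values[i - 1], values[i]
        if v2 <= peak / z:
            return i, "below_ratio"
        if v1 - v2 < 0:
            return i, "not_monotonic"
    # No stop condition met before the end of the map in this direction.
    return len(values) - 1, "map_border"
```

With `z = 4`, the sequence `[10, 8, 6, 7, 2]` stops at index 3 because 7 is larger than 6, while `[10, 8, 5, 2, 1]` stops at index 3 because 2 is no larger than 10/4.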

また、画像検出装置１は、処理対象画像を表示装置で表示する際に、図２１に示されるように、当該処理対象画像に対して統合後検出結果枠９００（検出画像領域６００の外形枠６００ａ）を重ねて表示しても良い。 Further, when displaying the processing target image on a display device, the image detection apparatus 1 may display the post-integration detection result frames 900 (the outer frames 600a of the detection image regions 600) superimposed on the processing target image, as shown in FIG. 21.

また、画像検出装置１は、予め登録された顔画像と、処理対象画像において顔画像であると判断された検出画像領域６００（統合後検出結果枠９００内の画像）とを比較し、両者が一致するか否かを判定しても良い。そして、画像検出装置１は、予め登録された顔画像と、処理対象画像での検出画像領域６００とが一致しない場合には、当該検出画像領域６００に対してモザイク処理を行った上で、当該処理対象画像を表示装置に表示しても良い。これにより、本実施の形態に係る画像検出装置１を監視カメラシステムに使用した場合において、監視カメラによって隣家の人の顔画像が撮影された場合であっても、当該顔画像を認識できないようにすることができる。つまり、プライバシーマスクを実現することができる。 In addition, the image detection apparatus 1 may compare a face image registered in advance with a detection image region 600 determined to be a face image in the processing target image (the image within a post-integration detection result frame 900) and determine whether the two match. If the face image registered in advance and the detection image region 600 in the processing target image do not match, the image detection apparatus 1 may apply mosaic processing to that detection image region 600 and then display the processing target image on the display device. As a result, when the image detection apparatus 1 according to the present embodiment is used in a surveillance camera system, a neighbor's face captured by the surveillance camera can be made unrecognizable. That is, a privacy mask can be realized.
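A minimal sketch of the privacy mask follows, assuming a grayscale image stored as a list of rows of integers. The function names, the `cell` size, and the `is_registered` callback (standing in for the comparison against pre-registered face images, whose method is not described here) are hypothetical.

```python
def mosaic_region(image, left, top, right, bottom, cell=8):
    """Pixelate the rectangle [left, right) x [top, bottom) in place.

    Each cell x cell tile is replaced by its (integer) mean value.
    """
    for y0 in range(top, bottom, cell):
        for x0 in range(left, right, cell):
            ys = range(y0, min(y0 + cell, bottom))
            xs = range(x0, min(x0 + cell, right))
            total = sum(image[y][x] for y in ys for x in xs)
            mean = total // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    image[y][x] = mean
    return image


def apply_privacy_mask(image, regions, is_registered, cell=8):
    """Mosaic every detected face region that does not match a registered face.

    regions is a list of (left, top, right, bottom) tuples; is_registered is a
    caller-supplied predicate taking (image, region) and returning True on a match.
    """
    for region in regions:
        if not is_registered(image, region):
            mosaic_region(image, *region, cell=cell)
    return image
```

Regions for which `is_registered` returns True are left untouched, so only faces that are not registered in advance are obscured before display.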

以上のように、本実施の形態では、使用撮像画像群を構成する、処理対象画像を含むＫ枚の使用撮像画像に対してＴ種類の検出枠が分散して対応付けられていることから、処理対象画像に対してはＴ種類よりも少ない種類の検出枠が対応付けられる。したがって、処理対象画像については、Ｔ種類よりも少ない種類の検出枠が使用されて検出処理が行われる。よって、処理対象画像から顔画像が検出される際に、処理対象画像だけが使用され、処理対象画像についてＴ種類の検出枠が使用されて検出処理が行われる場合と比較して、処理対象画像についての処理量を低減することができる。 As described above, in the present embodiment, the T types of detection frames are distributed over and associated with the K used captured images, including the processing target image, that constitute the used captured image group, so fewer than T types of detection frames are associated with the processing target image. Accordingly, detection processing is performed on the processing target image using fewer than T types of detection frames. Therefore, compared with the case where only the processing target image is used to detect a face image and detection processing is performed on it using all T types of detection frames, the processing amount for the processing target image can be reduced.

処理対象画像についての処理量を低減することによって様々な効果を得ることができる。例えば、画像検出装置１において、撮像部で撮像される各撮像画像（各フレーム画像）が処理対象画像とされる場合には、つまり、撮像部で撮像される各撮像画像に対して顔画像の検出が行われる際には、各撮像画像についての処理量を低減することができる。よって、画像検出装置１で行われる、顔画像の検出の処理量を低減することができる。 Various effects can be obtained by reducing the processing amount for the processing target image. For example, when each captured image (each frame image) captured by the imaging unit is treated as a processing target image in the image detection apparatus 1, that is, when face image detection is performed on every captured image captured by the imaging unit, the processing amount for each captured image can be reduced. Therefore, the processing amount of the face image detection performed in the image detection apparatus 1 can be reduced.

上記の例では、処理対象画像については、Ｔ種類の検出枠のうち（Ｔ／３）種類の検出枠だけが使用されるため、各撮像画像が処理対象画像とされる場合には、各撮像画像については（Ｔ／３）種類の検出枠だけが使用されることになる。よって、各撮像画像についてＴ種類の検出枠のすべてが使用される場合と比較して、検出部３での処理量が（１／３）倍となり、当該処理量を低減することができる。 In the above example, only (T/3) of the T types of detection frames are used for the processing target image, so when every captured image is treated as a processing target image, only (T/3) types of detection frames are used for each captured image. Therefore, compared with the case where all T types of detection frames are used for each captured image, the processing amount in the detection unit 3 is reduced to one third.
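The distribution of T types of detection frames over K used captured images can be sketched as follows. The round-robin assignment and the concrete frame sizes are assumptions made for illustration; the passage above only requires that the T types be spread over the K images so that each image is processed with fewer than T types.

```python
def distribute_detection_frames(frame_sizes, k):
    """Split the T detection frame sizes over k used captured images.

    Round-robin assignment is an assumed policy, chosen so that each image
    receives a mix of small and large detection frames.
    """
    groups = [[] for _ in range(k)]
    for i, size in enumerate(frame_sizes):
        groups[i % k].append(size)
    return groups


# Hypothetical example: T = 6 detection frame sizes (side length in pixels), K = 3.
sizes = [16, 24, 32, 40, 48, 56]
groups = distribute_detection_frames(sizes, 3)
# Each used captured image is scanned with T/3 = 2 detection frame sizes.
```

With K = 3, each image carries one third of the detection frames, which corresponds to the one-third processing amount stated above.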

また、画像検出装置１において、撮像部で撮像される撮像画像がＵ枚（Ｕ＞Ｋ）ごとに処理対象画像とされる場合において、つまり、撮像部でＫ枚以上の撮像画像が得られるたびに、撮像画像から顔画像の検出が行われる場合において、本実施の形態とは異なり、処理対象画像についてＴ種類の検出枠のすべてが使用されて検出処理が行われるとすると、撮像部で得られる撮像画像のうち処理対象画像だけ処理量が非常に大きくなる。これに対して、本実施の形態のように、Ｔ種類の検出枠が用いられる検出処理が、Ｋ枚の使用撮像画像に分散して行われることによって、処理対象画像だけ処理量が大きくなることを抑制することができる。よって、撮像画像間での処理量の差異を低減することができる。 Further, consider the case where, in the image detection apparatus 1, one captured image out of every U captured images (U > K) is treated as a processing target image, that is, where a face image is detected from a captured image each time K or more captured images are obtained by the imaging unit. If, unlike the present embodiment, detection processing were performed on the processing target image using all T types of detection frames, the processing amount would become very large only for the processing target image among the captured images obtained by the imaging unit. In contrast, when the detection processing using the T types of detection frames is distributed over the K used captured images as in the present embodiment, the processing amount is kept from growing for the processing target image alone. Therefore, the difference in processing amount between captured images can be reduced.

なお上記の例では、Ｋ＝３としたが、Ｋ＝２であっても良いし、Ｋ≧４であっても良い。また、上記の例では、使用撮像画像群を構成するＫ枚の使用撮像画像のうち、最後に撮像された使用撮像画像を処理対象画像としたが、他の使用撮像画像を処理対象画像としても良い。例えば、使用撮像画像群を構成するＫ枚の使用撮像画像のうち最初に撮像された使用撮像画像を処理対象画像としても良い。 In the above example, K = 3, but K = 2 or K ≧ 4 may be used. Also, in the above example, the used captured image captured last among the K used captured images constituting the used captured image group is treated as the processing target image, but another used captured image may be treated as the processing target image. For example, the used captured image captured first among the K used captured images constituting the used captured image group may be treated as the processing target image.

＜各種変形例＞ <Various modifications>
以下に、使用撮像画像群が（ｋ−２）フレーム目、（ｋ−１）フレーム目及びｋフレーム目の使用撮像画像で構成されている場合を例に挙げて、本実施の形態についての各種変形例について説明する。 Various modifications of the present embodiment are described below, taking as an example the case where the used captured image group consists of the used captured images of the (k-2)th, (k-1)th, and kth frames.

＜第１変形例＞ <First Modification>
使用撮像画像群を構成する複数枚の使用撮像画像については互いに近いタイミングで撮像されているものの、処理対象画像とその他の使用撮像画像との間には多少の差異が存在する。したがって、顔画像の検出精度を向上させるという点だけを考慮すると、処理対象画像に対しては、できるだけ多くの種類の検出枠が使用されて検出処理が行われることが望ましい。 Although the plurality of used captured images constituting the used captured image group are captured at timings close to each other, there are some differences between the processing target image and the other used captured images. Therefore, considering only the point of improving the detection accuracy of the face image, it is desirable that the detection process is performed on the processing target image using as many types of detection frames as possible.

そこで、本変形例では、検出部３は、使用撮像画像群を構成する複数枚の使用撮像画像における、処理対象画像以外の使用撮像画像について、当該使用撮像画像に対応付けられている検出枠を用いて検出処理を行った結果、当該使用撮像画像において、当該検出枠と同じサイズの顔画像である可能性が非常に高い領域が存在する場合には、処理対象画像についても当該検出枠を用いて検出処理を行う。 Therefore, in the present modification, the detection unit 3 uses a detection frame associated with the use-captured image for the use-captured image other than the processing target image in the plurality of use-captured images constituting the use-captured image group. As a result of performing detection processing using the detection frame, if there is a region that is very likely to be a face image of the same size as the detection frame in the used captured image, the detection frame is also used for the processing target image. Detection process.

例えば、検出部３は、（ｋ−２）フレーム目の使用撮像画像について得られた少なくとも一つの検出結果枠において、それに対応する検出確度値がしきい値以上の検出結果枠が含まれているかを判定する。ここでのしきい値（以後、「第２のしきい値」と呼ぶ）は、判定部３３において検出結果枠が特定される際に使用されるしきい値（以後、「第１のしきい値」と呼ぶ）よりも大きい値となっている。 For example, the detection unit 3 determines whether the at least one detection result frame obtained for the used captured image of the (k-2)th frame includes a detection result frame whose corresponding detection accuracy value is greater than or equal to a threshold value. This threshold value (hereinafter referred to as the “second threshold value”) is larger than the threshold value used when the determination unit 33 specifies detection result frames (hereinafter referred to as the “first threshold value”).

検出部３は、（ｋ−２）フレーム目の使用撮像画像について、それに対応する検出確度値が第２のしきい値以上の検出結果枠を特定すると、（ｋ−２）フレーム目の使用撮像画像における、当該検出結果枠内の画像を、（ｋ−２）フレーム目の使用撮像画像において、当該検出結果枠と同じサイズの顔画像である可能性が非常に高い領域であると判断する。つまり、検出部３は、（ｋ−２）フレーム目の使用撮像画像について、それに対応する検出確度値が第２のしきい値以上の検出結果枠を特定すると、（ｋ−２）フレーム目の使用撮像画像において、当該検出結果枠に対応する検出枠（当該検出結果枠と同じサイズの検出枠）と同じサイズの顔画像である可能性が非常に高い領域が存在すると判断する。そして、検出部３は、（ｋ−２）フレーム目の使用撮像画像についての、検出確度値が第２のしきい値以上の検出結果枠に対応する検出枠を用いて処理対象画像に対して検出処理を行う。 When the detection unit 3 identifies a detection result frame whose detection accuracy value corresponding to the use captured image of the (k-2) frame is equal to or greater than the second threshold value, the use imaging of the (k-2) frame is used. It is determined that the image within the detection result frame in the image is a region that is very likely to be a face image having the same size as the detection result frame in the (k-2) -th frame used captured image. That is, when the detection unit 3 specifies a detection result frame having a detection accuracy value corresponding to the second threshold value or more for the use captured image of the (k-2) th frame, the (k-2) th frame In the used captured image, it is determined that there is a region that is very likely to be a face image having the same size as the detection frame corresponding to the detection result frame (a detection frame having the same size as the detection result frame). Then, the detection unit 3 uses the detection frame corresponding to the detection result frame whose detection accuracy value is greater than or equal to the second threshold for the use captured image of the (k-2) frame to the processing target image. Perform detection processing.

このように、検出部３は、（ｋ−２）フレーム目の使用撮像画像について、それに対応付けられている検出枠を用いて検出処理を行った結果、検出確度値が第２のしきい値以上の検出結果枠が得られると、（ｋ−２）フレーム目の使用撮像画像において、当該検出枠と同じサイズの顔画像である可能性が非常に高い領域が存在すると判断して、処理対象画像についても当該検出枠を用いて検出処理を行う。検出対象画像特定部７では、この検出処理によって得られた検出結果枠及び検出確度値も用いられて、処理対象画像中の顔画像が特定される。 As described above, the detection unit 3 performs the detection process on the use captured image of the (k−2) th frame using the detection frame associated with the use captured image, and as a result, the detection accuracy value is the second threshold value. When the above detection result frame is obtained, it is determined that there is a region in the use captured image of the (k-2) th frame that is very likely to be a face image having the same size as the detection frame. The detection process is also performed on the image using the detection frame. In the detection target image specifying unit 7, the face image in the processing target image is specified using the detection result frame and the detection accuracy value obtained by the detection processing.

同様にして、検出部３は、（ｋ−１）フレーム目の使用撮像画像について、それに対応付けられている検出枠を用いて検出処理を行った結果、検出確度値が第２のしきい値以上の検出結果枠が得られると、（ｋ−１）フレーム目の使用撮像画像において、当該検出枠と同じサイズの顔画像である可能性が非常に高い領域が存在すると判断して、処理対象画像についても当該検出枠を用いて検出処理を行う。検出対象画像特定部７では、この検出処理によって得られた検出結果枠及び検出確度値も用いられて、処理対象画像中の顔画像が特定される。 Similarly, the detection unit 3 performs the detection process on the used captured image of the (k−1) th frame using the detection frame associated therewith, and the detection accuracy value is the second threshold value. When the above detection result frame is obtained, it is determined that there is an area in the use captured image of the (k−1) th frame that is very likely to be a face image of the same size as the detection frame, and the processing target The detection process is also performed on the image using the detection frame. In the detection target image specifying unit 7, the face image in the processing target image is specified using the detection result frame and the detection accuracy value obtained by the detection processing.

このように、本変形例では、処理対象画像以外の使用撮像画像について、当該使用撮像画像に対応付けられている検出枠が用いられて検出処理が行われた結果、当該使用撮像画像において、当該検出枠と同じサイズの顔画像である可能性が非常に高い領域が存在する場合には、処理対象画像についても当該検出枠が用いられて検出処理が行われる。したがって、処理対象画像において顔画像が特定される際に、当該検出処理の結果も使用されることによって、顔画像が精度良く特定される。よって、顔画像の検出精度が向上する。 As described above, in this modified example, as a result of the detection processing performed on the use captured image other than the processing target image using the detection frame associated with the use captured image, If there is a region that is very likely to be a face image having the same size as the detection frame, the detection frame is also used for the processing target image and the detection process is performed. Therefore, when the face image is specified in the processing target image, the face image is specified with high accuracy by using the result of the detection process. Therefore, the detection accuracy of the face image is improved.
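The selection step of this modification can be sketched as follows: from the detection results of the used captured images other than the processing target image, collect the detection frame sizes whose detection accuracy value reaches the second threshold value, so that those sizes are also applied to the processing target image. The data layout, the function name, and the numeric values are hypothetical.

```python
def extra_frame_sizes_for_target(results_by_image, second_threshold):
    """Pick detection frame sizes to additionally run on the processing target image.

    results_by_image maps a label of a non-target used captured image (e.g. "k-2")
    to a list of (detection_frame_size, detection_accuracy_value) pairs.
    A size is selected when at least one of its results is greater than or equal
    to the second threshold value.
    """
    extra = set()
    for results in results_by_image.values():
        for size, accuracy in results:
            if accuracy >= second_threshold:
                extra.add(size)
    return sorted(extra)


# Hypothetical results for frames (k-2) and (k-1); only size 24 is confident enough.
results = {"k-2": [(16, 0.6), (24, 0.95)], "k-1": [(32, 0.7)]}
selected = extra_frame_sizes_for_target(results, second_threshold=0.9)
```

The detection results obtained on the processing target image with the selected sizes would then be passed to the detection target image specifying unit 7 together with the other results.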

＜第２変形例＞ <Second Modification>
上述の図１２に示されるように、使用撮像画像群を構成する複数枚の使用撮像画像について得られたすべての検出結果枠１５０が処理対象画像２０ａについての検出結果枠として処理対象画像２０ａに重ねて配置されると、一つの顔画像付近に複数の検出結果枠１５０が集中する。つまり、一つの顔画像付近には、使用撮像画像群を構成する複数枚の使用撮像画像について得られた検出結果枠１５０が集中することになる。したがって、検出結果枠１５０が重ねられた処理対象画像２０ａにおいて、検出結果枠１５０が存在する領域であっても、複数枚の使用撮像画像において、その領域と重なる検出結果枠１５０が得られた使用撮像画像の数が少ない場合には、当該領域が顔画像である可能性は低い。よって、当該領域と重なる検出結果枠１５０に対応する検出結果領域は、顔画像である可能性が高いと誤って判定された領域であると考えることができる。したがって、当該領域と重なる検出結果枠１５０については、処理対象画像において検出対象画像が特定される際に使用されない方が好ましい。 As shown in FIG. 12 described above, when all the detection result frames 150 obtained for the plurality of used captured images constituting the used captured image group are superimposed on the processing target image 20a as detection result frames for the processing target image 20a, a plurality of detection result frames 150 concentrate near each face image. That is, detection result frames 150 obtained for the plurality of used captured images constituting the used captured image group concentrate near one face image. Therefore, in the processing target image 20a on which the detection result frames 150 are superimposed, even a region where a detection result frame 150 is present is unlikely to be a face image if only a small number of the used captured images yielded detection result frames 150 overlapping that region. The detection result region corresponding to a detection result frame 150 overlapping such a region can thus be regarded as a region erroneously determined to be highly likely to be a face image. Accordingly, it is preferable that the detection result frames 150 overlapping such a region not be used when the detection target image is specified in the processing target image.

そこで、本変形例では、検出対象画像特定部７は、図１２のように検出結果枠１５０が重ねられた処理対象画像２０ａを複数のブロックに分割する。そして、検出対象画像特定部７は、複数のブロックにおける、検出結果枠１５０と重なるブロックについて、複数枚の使用撮像画像のうち、当該ブロックと重なる検出結果枠１５０が得られた使用撮像画像の数がしきい値以下である場合には、当該ブロックと重なる検出結果枠１５０を使用せずに、処理対象画像２０ａにおいて顔画像を特定する。本変形例では、検出対象画像特定部７は、検出結果枠１５０と重なるブロックについて、複数枚の使用撮像画像のうち、当該ブロックと重なる検出結果枠１５０が得られた使用撮像画像の数が、例えば１以下、つまり１である場合には、当該ブロックと重なる検出結果枠１５０を使用せずに、処理対象画像２０ａにおいて顔画像を特定する。以下に、本変形例について具体的に説明する。以下の説明では、ｋフレーム目の使用撮像画像について得られた検出結果枠１５０を「検出結果枠１５０ａ」とし、（ｋ−１）フレーム目の使用撮像画像について得られた検出結果枠１５０を「検出結果枠１５０ｂ」とし、（ｋ−２）フレーム目の使用撮像画像について得られた検出結果枠１５０を「検出結果枠１５０ｃ」とする。 Therefore, in the present modification, the detection target image specifying unit 7 divides the processing target image 20a, on which the detection result frames 150 are superimposed as shown in FIG. 12, into a plurality of blocks. Then, for each block that overlaps a detection result frame 150, if the number of used captured images, among the plurality of used captured images, that yielded a detection result frame 150 overlapping that block is equal to or less than a threshold value, the detection target image specifying unit 7 specifies the face image in the processing target image 20a without using the detection result frames 150 overlapping that block. In the present modification, the threshold value is, for example, 1: when the number of used captured images that yielded a detection result frame 150 overlapping the block is 1 or less, that is, exactly 1, the detection target image specifying unit 7 specifies the face image in the processing target image 20a without using the detection result frames 150 overlapping that block. This modification is described concretely below. In the following description, a detection result frame 150 obtained for the used captured image of the kth frame is referred to as a “detection result frame 150a”, a detection result frame 150 obtained for the used captured image of the (k-1)th frame is referred to as a “detection result frame 150b”, and a detection result frame 150 obtained for the used captured image of the (k-2)th frame is referred to as a “detection result frame 150c”.

図２２は、検出結果枠１５０が重ねられた処理対象画像２０ａが複数のブロック９５０に分割されている様子の一例を示す図である。図２２の例では、処理対象画像２０ａが、行方向に９個のブロック９５０に分割され、列方向に７個のブロック９５０に分割されている。なお、処理対象画像２０ａの分割方法はこの限りではない。 FIG. 22 is a diagram illustrating an example of a state in which the processing target image 20 a on which the detection result frames 150 are superimposed is divided into a plurality of blocks 950. In the example of FIG. 22, the processing target image 20a is divided into nine blocks 950 in the row direction and is divided into seven blocks 950 in the column direction. Note that the method of dividing the processing target image 20a is not limited to this.

複数のブロック９５０のうちのあるブロック９５０ａにおいては、検出結果枠１５０ａと、検出結果枠１５０ｂと、検出結果枠１５０ｃとが重なっている。したがって、ブロック９５０ａについては、使用撮像画像群を構成する３枚の使用撮像画像のうち、当該ブロック９５０ａと重なる検出結果枠１５０が得られた使用撮像画像の数は“３”であり、しきい値“１”よりも大きくなっている。したがって、検出対象画像特定部７は、ブロック９５０ａに重なっている検出結果枠１５０のすべてを、処理対象画像２０ａにおいて顔画像を特定する際に使用する。 In a block 950a among the plurality of blocks 950, the detection result frame 150a, the detection result frame 150b, and the detection result frame 150c overlap. Therefore, for the block 950a, of the three used captured images constituting the used captured image group, the number of used captured images in which the detection result frame 150 overlapping with the block 950a is obtained is “3”, and the threshold is set. The value is larger than “1”. Therefore, the detection target image specifying unit 7 uses all of the detection result frames 150 overlapping the block 950a when specifying a face image in the processing target image 20a.

また、複数のブロック９５０のうちのブロック９５０ｂにおいては、検出結果枠１５０ａと検出結果枠１５０ｂとが重なっている。したがって、ブロック９５０ｂについては、使用撮像画像群を構成する３枚の使用撮像画像のうち、当該ブロック９５０ｂと重なる検出結果枠１５０が得られた使用撮像画像の数は“２”であり、しきい値“１”よりも大きくなっている。したがって、検出対象画像特定部７は、ブロック９５０ｂに重なっている検出結果枠１５０のすべてを、処理対象画像２０ａにおいて顔画像を特定する際に使用する。 In addition, in the block 950b among the plurality of blocks 950, the detection result frame 150a and the detection result frame 150b overlap. Therefore, for the block 950b, out of the three used captured images constituting the used captured image group, the number of used captured images in which the detection result frame 150 overlapping the block 950b is obtained is “2”, and the threshold is set. The value is larger than “1”. Therefore, the detection target image specifying unit 7 uses all of the detection result frames 150 overlapping the block 950b when specifying the face image in the processing target image 20a.

一方で、複数のブロック９５０のうちのブロック９５０ｃにおいては、検出結果枠１５０ａだけが重なっている。したがって、ブロック９５０ｃについては、使用撮像画像群を構成する３枚の使用撮像画像のうち、当該ブロック９５０ｃと重なる検出結果枠１５０が得られた使用撮像画像の数は“１”であり、しきい値“１”以下となっている。したがって、検出対象画像特定部７は、ブロック９５０ｃに重なっている検出結果枠１５０について、ノイズの影響を受けて誤って取得された検出結果枠１５０であると判断して、処理対象画像２０ａにおいて顔画像を特定する際に使用しない。 On the other hand, in the block 950c of the plurality of blocks 950, only the detection result frame 150a overlaps. Therefore, for the block 950c, the number of used captured images, among the three used captured images constituting the used captured image group, that yielded a detection result frame 150 overlapping the block 950c is “1”, which is equal to or less than the threshold value “1”. The detection target image specifying unit 7 therefore determines that the detection result frame 150 overlapping the block 950c was erroneously obtained under the influence of noise, and does not use it when specifying a face image in the processing target image 20a.

図２２の例では、ブロック９５０ａには、一つの検出結果枠１５０ａと、一つの検出結果枠１５０ｂと、一つの検出結果枠１５０ｃとが重なっているが、実際には、複数の検出結果枠１５０ａと、複数の検出結果枠１５０ｂと、複数の検出結果枠１５０ｃとが重なる可能性が高い。同様に、ブロック９５０ｂには、一つの検出結果枠１５０ａと、一つの検出結果枠１５０ｂとが重なっているが、実際には、複数の検出結果枠１５０ａと、複数の検出結果枠１５０ｂとが重なる可能性が高い。同様に、ブロック９５０ｃには、一つの検出結果枠１５０ａが重なっているが、実際には、複数の検出結果枠１５０ａが重なる可能性が高い。ブロック９５０ｃに複数の検出結果枠１５０ａが重なっている場合であっても、使用撮像画像群を構成する３枚の使用撮像画像のうち、ブロック９５０ｃと重なる検出結果枠１５０が得られた使用撮像画像の数は“１”であることから、ブロック９５０ｃに重なる複数の検出結果枠１５０ａのすべてが、処理対象画像２０ａにおいて顔画像を特定する際に使用されない。 In the example of FIG. 22, one detection result frame 150a, one detection result frame 150b, and one detection result frame 150c overlap the block 950a, but in practice a plurality of detection result frames 150a, a plurality of detection result frames 150b, and a plurality of detection result frames 150c are likely to overlap it. Similarly, one detection result frame 150a and one detection result frame 150b overlap the block 950b, but in practice a plurality of detection result frames 150a and a plurality of detection result frames 150b are likely to overlap it. Similarly, one detection result frame 150a overlaps the block 950c, but in practice a plurality of detection result frames 150a are likely to overlap it. Even when a plurality of detection result frames 150a overlap the block 950c, the number of used captured images, among the three used captured images constituting the used captured image group, that yielded a detection result frame 150 overlapping the block 950c is still “1”, so none of the plurality of detection result frames 150a overlapping the block 950c is used when specifying a face image in the processing target image 20a.

このように、本変形例では、検出対象画像特定部７は、複数のブロックにおける、検出結果枠１５０と重なるブロックについて、複数枚の使用撮像画像のうち、当該ブロックと重なる検出結果枠１５０が得られた使用撮像画像の数がしきい値以下である場合には、当該ブロックと重なる検出結果枠１５０を使用せずに、処理対象画像２０ａにおいて顔画像を特定する。したがって、使用撮像画像において、ノイズの影響により、顔画像についての特徴量とよく似た特徴量を有する画像が偶然に存在し、当該画像を顔画像である可能性が高いと誤って判断した結果によって得られた検出結果枠１５０が存在する場合であっても、検出対象画像特定部７は、当該検出結果枠１５０を除去して処理対象画像において顔画像を特定することができる。よって、顔画像の検出精度が向上する。 As described above, in this modification, the detection target image specifying unit 7 obtains the detection result frame 150 that overlaps the block among the plurality of used captured images for the block that overlaps the detection result frame 150 in the plurality of blocks. When the number of used captured images is equal to or less than the threshold value, the face image is specified in the processing target image 20a without using the detection result frame 150 that overlaps the block. Therefore, a result of erroneously determining that an image having a feature amount that closely resembles the feature amount of the face image accidentally exists in the used captured image due to the influence of noise, and that the image is likely to be a face image. Even if the detection result frame 150 obtained by the above is present, the detection target image specifying unit 7 can specify the face image in the processing target image by removing the detection result frame 150. Therefore, the detection accuracy of the face image is improved.

なお、検出対象画像特定部７は、検出結果枠１５０と重なるブロックについて、複数枚の使用撮像画像のうち、当該ブロックと重なる検出結果枠１５０が得られた使用撮像画像の数がしきい値以下である場合であっても、当該ブロックと重なる検出結果枠を外形枠とする検出結果領域に、顔画像である可能性が非常に高い検出結果領域が含まれる場合には、当該ブロックと重なる検出結果枠１５０を使用して、処理対象画像において顔画像を特定しても良い。 Note that, for a block overlapping a detection result frame 150, even when the number of used captured images, among the plurality of used captured images, that yielded a detection result frame 150 overlapping that block is equal to or less than the threshold value, the detection target image specifying unit 7 may use the detection result frames 150 overlapping that block to specify a face image in the processing target image if the detection result regions whose outer frames are the detection result frames overlapping that block include a detection result region that is very likely to be a face image.

例えば、図２２の例において、ブロック９５０ｃに複数の検出結果枠１５０ａが重なっている場合を考える。このような場合、検出対象画像特定部７は、ブロック９５０ｃに重なっている複数の検出結果枠１５０ａにおいて、それに対応する検出確度値が第３のしきい値（＞第１のしきい値）以上である検出結果枠１５０ａが存在するかどうかを判断する。そして、検出対象画像特定部７は、ブロック９５０ｃに重なっている複数の検出結果枠１５０ａにおいて、それに対応する検出確度値が第３のしきい値以上である検出結果枠１５０ａが存在する場合には、ブロック９５０ｃと重なる検出結果枠１５０ａを外形枠とする検出結果領域に、顔画像である可能性が非常に高い検出結果領域が含まれていると判断して、ブロック９５０ｃと重なる複数の検出結果枠１５０ａを使用して、処理対象画像において顔画像を特定する。これにより、処理対象画像において顔画像が特定される際に、正しい検出結果枠１５０が誤って使用されなくなることを抑制することができる。 For example, consider a case where a plurality of detection result frames 150a overlap the block 950c in the example of FIG. 22. In such a case, the detection target image specifying unit 7 determines whether the plurality of detection result frames 150a overlapping the block 950c include a detection result frame 150a whose corresponding detection accuracy value is greater than or equal to a third threshold value (> the first threshold value). If such a detection result frame 150a exists, the detection target image specifying unit 7 determines that the detection result regions whose outer frames are the detection result frames 150a overlapping the block 950c include a detection result region that is very likely to be a face image, and uses the plurality of detection result frames 150a overlapping the block 950c to specify a face image in the processing target image. This suppresses the situation in which a correct detection result frame 150 is erroneously left unused when the face image is specified in the processing target image.

On the other hand, if, among the detection result frames 150a overlapping block 950c, there is no detection result frame 150a whose corresponding detection accuracy value is equal to or greater than the third threshold, the detection target image specifying unit 7 determines that the detection result regions whose outer frames are the detection result frames 150a overlapping block 950c do not include a detection result region that is very likely to be a face image, and specifies a face image in the processing target image without using the plurality of detection result frames 150a overlapping block 950a.
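The per-block decision described in this modification can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `ResultFrame`, `keep_frames_for_block`, `count_threshold`, and `third_threshold` are hypothetical, and the sketch assumes that each detection result frame carries the index of the used captured image it came from together with its detection accuracy value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultFrame:
    """Hypothetical record for one detection result frame overlapping a block."""
    image_index: int   # which used captured image produced this frame
    accuracy: float    # detection accuracy value of the frame


def keep_frames_for_block(frames, count_threshold, third_threshold):
    """Return True if the frames overlapping one block should be used.

    Assumed rule, following the text: frames are normally discarded when the
    number of used captured images that produced them is at or below
    count_threshold, but they are kept anyway if at least one frame has an
    accuracy value at or above third_threshold.
    """
    if not frames:
        return False
    images_with_frames = len({f.image_index for f in frames})
    if images_with_frames > count_threshold:
        return True
    # Exception of this modification: one very confident frame rescues the block.
    return any(f.accuracy >= third_threshold for f in frames)
```

For instance, with `count_threshold=1` and `third_threshold=0.9`, two frames that both come from the same used captured image are kept only if one of them has an accuracy value of at least 0.9.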

<Third Modification>
Although the plurality of used captured images constituting the used captured image group are captured at timings close to one another, there are some differences between the processing target image and the other used captured images. Therefore, a detection accuracy value obtained for a used captured image other than the processing target image is less accurate when it is treated as a detection accuracy value for the processing target image. Moreover, the farther the capture timing of that used captured image is from the capture timing of the processing target image, the less accurate its detection accuracy value becomes when treated as a detection accuracy value for the processing target image.

Therefore, in this modification, the detection target image specifying unit 7 weights the detection accuracy values corresponding to the detection result frames (detection result regions) obtained for the plurality of used captured images constituting the used captured image group, and then specifies a face image in the processing target image based on the weighted detection accuracy values. Specifically, the map generation unit 4 of the detection target image specifying unit 7 weights the detection accuracy values corresponding to the detection result frames obtained for the plurality of used captured images, and then generates an output value map based on the weighted detection accuracy values. When weighting the detection accuracy value corresponding to a detection result frame obtained for a used captured image, the map generation unit 4 makes the weight smaller the farther the capture timing of that used captured image is from the capture timing of the processing target image.

For example, when generating the map 200 (see FIG. 13) for a detection result frame obtained for the used captured image of the (k-2)th frame, the map generation unit 4 according to this modification sets the value at the center 211 of the frame 210 set in the map 200 (see FIG. 14) to the value obtained by multiplying the detection accuracy value corresponding to that detection result frame by "0.8". The map generation unit 4 then sets the other values in the frame 210 so that they gradually decrease from the center 211 toward the outside, following a normal distribution curve whose maximum is the value at the center 211 of the frame 210.

Similarly, when generating the map 200 for a detection result frame obtained for the used captured image of the (k-1)th frame, the map generation unit 4 sets the value at the center 211 of the frame 210 set in the map 200 to the value obtained by multiplying the detection accuracy value corresponding to that detection result frame by "0.9". The map generation unit 4 then sets the other values in the frame 210 so that they gradually decrease from the center 211 toward the outside, following a normal distribution curve whose maximum is the value at the center 211 of the frame 210.

Finally, when generating the map 200 for a detection result frame obtained for the used captured image of the kth frame, which is the processing target image, the map generation unit 4 sets the value at the center 211 of the frame 210 set in the map 200 to the value obtained by multiplying the detection accuracy value corresponding to that detection result frame by "1.0". The map generation unit 4 then sets the other values in the frame 210 so that they gradually decrease from the center 211 toward the outside, following a normal distribution curve whose maximum is the value at the center 211 of the frame 210.
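The weighted map generation in this example can be sketched in Python as follows. Only the weights 0.8, 0.9, and 1.0 and the normal-distribution falloff from the center come from the text. The function name `make_weighted_map`, the `frame_box` layout, the `sigma_ratio` parameter that sets the spread of the curve, and the choice to leave values outside the frame at zero are all assumptions made for illustration.

```python
import math

# Weights quoted in the text, keyed by how many frames the used captured
# image precedes the processing target image (frame k).
FRAME_WEIGHTS = {0: 1.0, 1: 0.9, 2: 0.8}


def make_weighted_map(map_width, map_height, frame_box, accuracy,
                      frames_before_target, sigma_ratio=0.25):
    """Build one map for a single detection result frame.

    frame_box is (left, top, width, height) in map coordinates (assumed layout).
    The center value is accuracy times the frame weight; other values inside
    the frame fall off along a 2-D Gaussian. sigma_ratio is a hypothetical
    tuning parameter, and values outside the frame are left at zero
    (also an assumption).
    """
    left, top, width, height = frame_box
    peak = accuracy * FRAME_WEIGHTS[frames_before_target]
    cx = left + (width - 1) / 2.0
    cy = top + (height - 1) / 2.0
    sx = width * sigma_ratio
    sy = height * sigma_ratio
    grid = [[0.0] * map_width for _ in range(map_height)]
    for y in range(max(top, 0), min(top + height, map_height)):
        for x in range(max(left, 0), min(left + width, map_width)):
            d2 = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            grid[y][x] = peak * math.exp(-0.5 * d2)
    return grid


# A frame detected two frames before the processing target image:
older_map = make_weighted_map(9, 9, (2, 2, 5, 5), accuracy=0.5,
                              frames_before_target=2)
```

Here the center value of `older_map` is the accuracy value 0.5 scaled by 0.8, whereas the same frame detected in the processing target image itself (`frames_before_target=0`) would keep its full accuracy value at the center.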

In this way, when the detection accuracy value for a detection result region obtained for a used captured image is weighted, the weight is made smaller the farther the capture timing of that used captured image is from the capture timing of the processing target image, so a highly accurate output value map can be generated. The face image can therefore be specified accurately in the processing target image, and as a result the detection accuracy of the face image improves.

Two or more of the first to third modifications described above may be used in combination. For example, the first and second modifications may be combined, or all of the first to third modifications may be combined.

Although the image detection device 1 has been described in detail above, the foregoing description is illustrative in all aspects, and the present invention is not limited thereto. The various examples described above may be applied in combination as long as they do not contradict one another. It is understood that numerous modifications not illustrated here can be devised without departing from the scope of the present invention.

Description of Symbols
1 Image detection device
3 Detection unit
7 Detection target image specifying unit
12 Control program

Claims

An image detection device that detects a detection target image from a processing target image, comprising:
a detection unit that performs a detection process of using a detection frame to detect, as a detection result region, a region that is highly likely to be the detection target image having the same size as the detection frame; and
a detection target image specifying unit that specifies the detection target image in the processing target image,
wherein the detection unit distributes and associates a plurality of types of detection frames having different sizes with a plurality of captured images, which are captured at different timings and include the processing target image, such that at least one type of detection frame included in the plurality of types of detection frames is associated with each of the plurality of captured images, and performs the detection process on each of the plurality of captured images using the at least one type of detection frame associated with that captured image, and
wherein the detection target image specifying unit specifies the detection target image in the processing target image based on the detection result regions detected by the detection unit for the plurality of captured images.

The image detection device according to claim 1,
wherein the plurality of types of detection frames include a reference detection frame of a reference size and a non-reference detection frame of a size different from the reference size, and
wherein, when performing the detection process on the captured image using the non-reference detection frame, the detection unit
changes the size of the non-reference detection frame so that its size matches the reference size, and changes the size of the captured image in accordance with the size change of the non-reference detection frame, and
while moving a size-changed detection frame, which is the non-reference detection frame after the size change, over a size-changed image, which is the captured image after the size change, determines whether the image within the size-changed detection frame in the size-changed image is highly likely to be the detection target image.

The image detection device according to claim 1 or 2,
wherein, when the detection unit performs the detection process on a captured image other than the processing target image among the plurality of captured images using a detection frame included in the plurality of types of detection frames, and a region that is very likely to be the detection target image having the same size as that detection frame exists in that captured image, the detection unit performs the detection process on the processing target image using that detection frame.

The image detection device according to any one of claims 1 to 3,
wherein the detection target image specifying unit specifies the detection target image in the processing target image based on detection result frames, each being the outer frame of a detection result region detected by the detection unit for the plurality of captured images,
wherein the detection target image specifying unit divides the processing target image, on which the detection result frames are superimposed, into a plurality of blocks, and
wherein, for a block among the plurality of blocks that overlaps a detection result frame, when the number of captured images, among the plurality of captured images, for which a detection result frame overlapping that block was obtained is equal to or less than a threshold, the detection target image specifying unit specifies the detection target image in the processing target image without using the detection result frames overlapping that block.

The image detection device according to claim 4,
wherein, for a block among the plurality of blocks that overlaps a detection result frame, even when the number of captured images, among the plurality of captured images, for which a detection result frame overlapping that block was obtained is equal to or less than the threshold, the detection target image specifying unit uses the detection result frames overlapping that block to specify the detection target image in the processing target image if the detection result regions whose outer frames are the detection result frames overlapping that block include a detection result region that is very likely to be the detection target image.

The image detection device according to any one of claims 1 to 5,
wherein the detection target image specifying unit specifies the detection target image in the processing target image based on detection accuracy values, each indicating, for a detection result region detected by the detection unit for the plurality of captured images, the likelihood that the detection result region is the detection target image.

The image detection device according to claim 6,
wherein the detection target image specifying unit weights the detection accuracy values for the detection result regions detected by the detection unit for the plurality of captured images, and then specifies the detection target image in the processing target image based on the weighted detection accuracy values, and
wherein, when weighting the detection accuracy value for a detection result region detected for a captured image, the detection target image specifying unit makes the weight for that detection accuracy value smaller the farther the capture timing of that captured image is from the capture timing of the processing target image.

The image detection device according to any one of claims 1 to 7,
wherein the detection target image is a human face image.

A control program for controlling an image detection device that detects a detection target image from a processing target image, the control program causing the image detection device to execute:
(a) a step of performing a detection process of using a detection frame to detect, as a detection result region, a region that is highly likely to be the detection target image having the same size as the detection frame; and
(b) a step of specifying the detection target image in the processing target image,
wherein, in step (a), a plurality of types of detection frames having different sizes are distributed and associated with a plurality of captured images, which are captured at different timings and include the processing target image, such that at least one type of detection frame included in the plurality of types of detection frames is associated with each of the plurality of captured images, and the detection process is performed on each of the plurality of captured images using the at least one type of detection frame associated with that captured image, and
wherein, in step (b), the detection target image is specified in the processing target image based on the detection result regions detected for the plurality of captured images in step (a).

An image detection method for detecting a detection target image from a processing target image, comprising:
(a) a step of performing a detection process of using a detection frame to detect, as a detection result region, a region that is highly likely to be the detection target image having the same size as the detection frame; and
(b) a step of specifying the detection target image in the processing target image,
wherein, in step (a), a plurality of types of detection frames having different sizes are distributed and associated with a plurality of captured images, which are captured at different timings and include the processing target image, such that at least one type of detection frame included in the plurality of types of detection frames is associated with each of the plurality of captured images, and the detection process is performed on each of the plurality of captured images using the at least one type of detection frame associated with that captured image, and
wherein, in step (b), the detection target image is specified in the processing target image based on the detection result regions detected for the plurality of captured images in step (a).