JP6116916B2 - Image detection apparatus, control program, and image detection method

Info

Publication number: JP6116916B2
Application number: JP2013004591A
Authority: JP
Inventors: 健太西行
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2013-01-15
Filing date: 2013-01-15
Publication date: 2017-04-19
Anticipated expiration: 2033-01-15
Also published as: JP2014137629A

Description

The present invention relates to a technique for detecting a detection target image from a processing target image.

Patent Documents 1 to 3 disclose techniques for detecting a detection target image from a processing target image.

Patent Document 1: JP 2008-27058 A
Patent Document 2: JP 2009-175821 A
Patent Document 3: JP 2011-221791 A

When a detection target image is detected from a processing target image, improved detection accuracy is desired.

The present invention has been made in view of the above, and its object is to provide a technique capable of improving the detection accuracy for a detection target image.

To solve the above problem, one aspect of an image detection apparatus according to the present invention is an image detection apparatus that detects a detection target image from a processing target image, comprising: a detection unit that sets a plurality of detection frames at mutually different positions on the processing target image and, for each of the detection frames, performs detection processing for the detection target image on the image within that detection frame; a map generation unit that generates, based on the detection results of the detection unit, a map showing the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image; a search unit that searches the map for local maximum points of the detection accuracy value; and a determination unit that performs integration processing that merges a plurality of local maximum points close to one another in the map into one based on a predetermined criterion, and determines that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration processing is the detection target image. When the determination unit examines the detection accuracy values in the map starting from the local maximum point after the integration processing and moving along a direction away from that point, it takes, as an edge of the predetermined region determined to be the detection target image, the position in the processing target image corresponding to the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) of the detection accuracy value at the local maximum point. If, during the same examination, the determination unit judges that the change in the detection accuracy value has ceased to be monotonically decreasing before a position at which the value is (1/Z) times or less of the value at the local maximum point appears, it takes the position at which that judgment was made as the edge of the predetermined region.
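The edge rule described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `find_region_edge`, the one-dimensional `profile` list (the detection accuracy values sampled outward from the integrated local maximum along one direction), the use of a strict increase as the test for "no longer monotonically decreasing", and the fallback to the last sample are all assumptions made for illustration.

```python
def find_region_edge(profile, z=2.0):
    """Return the index in `profile` taken as the edge of the detected region.

    profile[0] is the detection accuracy value at the integrated local
    maximum; profile[k] is the value k steps away along one direction.
    (Hypothetical data layout, assumed for this sketch.)
    """
    if z <= 1:
        raise ValueError("Z must be greater than 1")
    limit = profile[0] / z  # (1/Z) times the value at the local maximum
    for k in range(1, len(profile)):
        if profile[k] <= limit:
            # First position at or below (1/Z) times the peak value.
            return k
        if profile[k] > profile[k - 1]:
            # Assumption: a rise means the change is no longer a
            # monotonic decrease, so the walk stops here.
            return k
    # Assumption: if neither condition occurs, use the last sample.
    return len(profile) - 1
```

For example, with `profile = [10, 8, 6, 4, 3]` and `z = 2` the limit is 5, so the edge is index 3; with `profile = [10, 8, 9, 3]` the value rises at index 2 before the limit is reached, so the edge is index 2.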

In one aspect of the image detection apparatus according to the present invention, the search unit searches for the local maximum point by repeating the following processing on the map arranged in two-dimensional coordinates: using the detection accuracy values as weighting coefficients, it calculates the weighted average of the coordinate values of the positions at which the detection accuracy values contained in a processing target region are located, and moves the processing target region so that its center coincides with that weighted average.
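The repeated weighted-average update described here resembles a mean-shift iteration. A minimal NumPy sketch follows, under stated assumptions: the map is a 2-D array `score_map` indexed as `[y, x]`, the processing target region is a square window of half-width `half`, the window is placed on the nearest grid position, and the loop stops when the center moves by less than `tol`. None of these names or choices come from the patent.

```python
import numpy as np

def search_local_maximum(score_map, start_xy, half=2, max_iter=100, tol=1e-3):
    """Move a square window to the score-weighted mean of its coordinates
    until the center stops moving. Returns the final (x, y) center."""
    h, w = score_map.shape
    cx, cy = float(start_xy[0]), float(start_xy[1])
    for _ in range(max_iter):
        ix, iy = int(round(cx)), int(round(cy))
        x0, x1 = max(ix - half, 0), min(ix + half + 1, w)
        y0, y1 = max(iy - half, 0), min(iy + half + 1, h)
        window = score_map[y0:y1, x0:x1]
        total = float(window.sum())
        if total <= 0:
            break  # no weight inside the window, nothing to follow
        ys, xs = np.mgrid[y0:y1, x0:x1]
        # Weighted average of coordinates, detection accuracy values as weights.
        nx = float((xs * window).sum() / total)
        ny = float((ys * window).sum() / total)
        moved = np.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if moved < tol:
            break
    return cx, cy
```

Because the window sits on a discrete grid, the returned center approaches the peak but need not land exactly on it; a smaller window or a finer map reduces the residual offset.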

In one aspect of the image detection apparatus according to the present invention, the search unit starts the search for the local maximum point with the center of the processing target region placed at the position on the map corresponding to a predetermined position within a detection frame for which the detection unit, as a result of the detection processing, has determined that the image within the frame is likely to be the detection target image.

In one aspect of the image detection apparatus according to the present invention, when calculating the weighted average, the search unit uses the coordinate values of the positions at which the detection accuracy values contained in the processing target region are located in a map whose detection accuracy values have been thinned out.

In one aspect of the image detection apparatus according to the present invention, the detection target image is a human face image.

One aspect of a control program according to the present invention is a control program for controlling an image detection apparatus that detects a detection target image from a processing target image. The control program causes the image detection apparatus to execute: (a) a step of setting a plurality of detection frames at mutually different positions on the processing target image and, for each of the detection frames, performing detection processing for the detection target image on the image within that detection frame; (b) a step of generating, based on the detection results of step (a), a map showing the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image; (c) a step of searching the map for local maximum points of the detection accuracy value; (d) a step of performing integration processing that merges a plurality of local maximum points close to one another in the map into one based on a predetermined criterion; and (e) a step of determining that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration processing is the detection target image. In step (e), the control program causes the image detection apparatus to do the following. When the detection accuracy values in the map are examined starting from the local maximum point after the integration processing and moving along a direction away from that point, the position in the processing target image corresponding to the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) of the detection accuracy value at the local maximum point is taken as an edge of the predetermined region determined to be the detection target image. If, during the same examination, it is judged that the change in the detection accuracy value has ceased to be monotonically decreasing before a position at which the value is (1/Z) times or less of the value at the local maximum point appears, the position at which that judgment was made is taken as the edge of the predetermined region.

One aspect of an image detection method according to the present invention is an image detection method for detecting a detection target image from a processing target image, comprising: (a) a step of setting a plurality of detection frames at mutually different positions on the processing target image and, for each of the detection frames, performing detection processing for the detection target image on the image within that detection frame; (b) a step of generating, based on the detection results of step (a), a map showing the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image; (c) a step of searching the map for local maximum points of the detection accuracy value; (d) a step of performing integration processing that merges a plurality of local maximum points close to one another in the map into one based on a predetermined criterion; and (e) a step of determining that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration processing is the detection target image. In step (e), when the detection accuracy values in the map are examined starting from the local maximum point after the integration processing and moving along a direction away from that point, the position in the processing target image corresponding to the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) of the detection accuracy value at the local maximum point is taken as an edge of the predetermined region determined to be the detection target image. If, during the same examination, it is judged that the change in the detection accuracy value has ceased to be monotonically decreasing before a position at which the value is (1/Z) times or less of the value at the local maximum point appears, the position at which that judgment was made is taken as the edge of the predetermined region.

According to the present invention, the detection accuracy for a detection target image can be improved.

FIG. 1 is a diagram showing the configuration of the image detection apparatus.
FIG. 2 is a diagram showing the configuration of the functional blocks of the image detection apparatus.
FIG. 3 is a diagram showing the configuration of the detection unit.
FIGS. 4 to 9 are diagrams for explaining the operation of the detection unit.
FIG. 10 is a diagram showing detection result frames superimposed on the processing target image.
FIGS. 11 and 12 are diagrams for explaining the method of generating the output value map.
FIG. 13 is a diagram showing the output value map.
FIGS. 14 and 15 are diagrams for explaining the method of searching for a local maximum point.
FIG. 16 is a diagram showing the distribution of detection accuracy values near a local maximum point in the output value map.
FIG. 17 is a diagram for explaining the method of determining the detected image region.
FIG. 18 is a flowchart showing the operation of the detection target image determination unit.
FIG. 19 is a diagram showing detected image regions and post-integration detection result frames superimposed on the processing target image.

FIG. 1 is a diagram showing the configuration of an image detection apparatus 1 according to an embodiment. The image detection apparatus 1 according to the present embodiment detects a detection target image from the image represented by input image data. The image detection apparatus 1 is used in, for example, a surveillance camera system or a digital camera system. In the present embodiment, the detection target image is, for example, a human face image. Hereinafter, the term "face image" alone means a human face image. The image on which detection processing for the detection target image is performed is called the "processing target image". The detection target image of the image detection apparatus 1 may be an image other than a face image.

As shown in FIG. 1, the image detection apparatus 1 includes a CPU (Central Processing Unit) 10 and a storage unit 11. The storage unit 11 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The storage unit 11 stores, among other things, a control program 12 for controlling the operation of the image detection apparatus 1. The various functions of the image detection apparatus 1 are realized by the CPU 10 executing the control program 12 in the storage unit 11. In the image detection apparatus 1, executing the control program 12 forms a plurality of functional blocks as shown in FIG. 2.

As shown in FIG. 2, the image detection apparatus 1 includes, as functional blocks, an image input unit 2, a detection unit 3, a map generation unit 4, a local maximum point search unit 5, and a detection target image determination unit 6. The following first outlines the operation of each functional block of the image detection apparatus 1 and then describes the operation of each block in detail. Each functional block of the image detection apparatus 1 may instead be implemented as a hardware circuit.

<Overview of the operation of the image detection apparatus>
The image input unit 2 sequentially receives a plurality of pieces of image data, each representing one of a plurality of images captured in sequence by an imaging unit (camera) of a surveillance camera system or the like. The image input unit 2 outputs image data representing the processing target image. The image input unit 2 may treat every image obtained by the imaging unit as a processing target image, or may treat only the images obtained every few seconds as processing target images. The imaging unit captures, for example, L images (L ≥ 2) per second; that is, the imaging frame rate of the imaging unit is L fps (frames per second). In an image captured by the imaging unit, M pixels (M ≥ 2) are arranged in the row direction and N pixels (N ≥ 2) are arranged in the column direction. The resolution of the image captured by the imaging unit is, for example, VGA (Video Graphics Array), with M = 640 and N = 480.

Hereinafter, the size of a region in which m pixels (m ≥ 1) are arranged in the row direction and n pixels (n ≥ 1) are arranged in the column direction is written as mp × np (p stands for pixel). In addition, among a plurality of values arranged in a matrix, the value located in the m-th row and the n-th column, counted from the upper left, is sometimes called the m × n-th value.

The detection unit 3 uses the image data output from the image input unit 2 to detect face images in the processing target image. Based on the detection results of the detection unit 3, the map generation unit 4 generates an output value map showing the distribution over the processing target image of a detection accuracy value indicating the likelihood of being a face image. The local maximum point search unit 5 searches the output value map generated by the map generation unit 4 for local maximum points of the detection accuracy value. The detection target image determination unit 6 determines that a predetermined region of the processing target image including the pixel at the same position as a local maximum point found by the local maximum point search unit 5 is a face image. In this way, the image detection apparatus 1 detects face images from the processing target image.

<Detailed operation of the image detection apparatus>
<Face detection processing>
The detection unit 3 detects face images in the partial image within a detection frame while moving the detection frame over the processing target image. In other words, the detection unit 3 sets a plurality of detection frames at mutually different positions on the processing target image and, for each of the detection frames, performs face image detection processing on the partial image within that detection frame.

In the present embodiment, the detection unit 3 uses a plurality of types of detection frames of mutually different sizes in order to detect face images of various sizes in the processing target image. Hereinafter, the smallest of these types of detection frames is sometimes called the "reference detection frame", and the other types are sometimes called "non-reference detection frames". In the present embodiment, the types of detection frames include a plurality of types of non-reference detection frames of mutually different sizes. The size of the reference detection frame is, for example, 16p × 16p. The non-reference detection frames include, for example, one of size 18p × 18p and one of size 20p × 20p.

FIG. 3 is a diagram showing the configuration of the detection unit 3. As shown in FIG. 3, the detection unit 3 includes a detection frame setting unit 30, a feature extraction unit 31, a classifier 32, and a judgment unit 33.

The detection frame setting unit 30 sets a detection frame on the processing target image. The feature extraction unit 31 extracts features, such as Haar-like features or LBP (Local Binary Pattern) features, from the partial image of the processing target image within the detection frame set by the detection frame setting unit 30. Based on the features extracted by the feature extraction unit 31 and on learning data, the classifier 32 performs face detection on the partial image within the set detection frame and outputs, as a real number, a detection accuracy value indicating the likelihood that the partial image is a face image. In other words, the detection accuracy value output from the classifier 32 indicates how face-like the partial image within the detection frame is. An SVM (Support Vector Machine) or AdaBoost, for example, is used as the classifier 32.

If the detection accuracy value output from the classifier 32 is equal to or greater than a threshold, the judgment unit 33 judges that the partial image within the detection frame is likely to be a face image. If the detection accuracy value output from the classifier 32 is less than the threshold, the judgment unit 33 judges that the partial image within the detection frame is unlikely to be a face image. Hereinafter, a detection frame whose partial image the judgment unit 33 has judged likely to be a face image, that is, a detection frame in which a face image has been detected, is called a "detection result frame".

Next, the sequence of operations performed by the detection unit 3 when it detects face images in the partial image within a detection frame while moving the detection frame over the processing target image is described. FIGS. 4 to 7 are diagrams for explaining this operation of the detection unit 3. The detection unit 3 detects face images in the partial image within the detection frame while raster-scanning the detection frame.

As shown in FIG. 4, the detection frame setting unit 30 first sets a detection frame 100 at the upper left of a processing target image 20. The feature extraction unit 31 then extracts features from the partial image within the upper-left detection frame 100. The classifier 32 obtains a detection accuracy value for the partial image within the detection frame 100 based on the features extracted by the feature extraction unit 31 and on the learning data. If the detection accuracy value obtained by the classifier 32 is equal to or greater than the threshold, the judgment unit 33 judges that the partial image within the detection frame 100 is likely to be a face image and treats that detection frame 100 as a detection result frame.

Next, the detection frame setting unit 30 moves the detection frame 100 slightly to the right within the processing target image 20 and thereby sets a new detection frame 100 on the processing target image 20. The detection frame setting unit 30 moves the detection frame 100 to the right by, for example, one pixel or several pixels. The feature extraction unit 31 then extracts features from the partial image within the newly set detection frame 100, and the classifier 32 obtains a detection accuracy value for that partial image based on those features and the learning data. If the detection accuracy value obtained by the classifier 32 is equal to or greater than the threshold, the judgment unit 33 judges that the partial image within the newly set detection frame 100 is likely to be a face image and treats the newly set detection frame 100 as a detection result frame.

The detection unit 3 then continues to operate in the same way. When the detection frame 100 reaches the right edge of the processing target image as shown in FIG. 5, the detection unit 3 obtains a detection accuracy value for the partial image within the detection frame 100 at the right edge. If the obtained detection accuracy value is equal to or greater than the threshold, the detection unit 3 treats the detection frame 100 at the right edge as a detection result frame.

Next, as shown in FIG. 6, the detection frame setting unit 30 moves the detection frame 100 to the left edge of the processing target image 20 while lowering it slightly, and thereby sets a new detection frame 100 on the processing target image 20. The detection frame setting unit 30 moves the detection frame 100 downward in the vertical direction (column direction) by, for example, one pixel or several pixels. The feature extraction unit 31 then extracts features from the partial image within the newly set detection frame 100, and the classifier 32 obtains and outputs a detection accuracy value for that partial image based on those features and the learning data. If the detection accuracy value output from the classifier 32 is equal to or greater than the threshold, the judgment unit 33 judges that the partial image within the newly set detection frame 100 is likely to be a face image and treats the newly set detection frame 100 as a detection result frame.

The detection unit 3 then continues to operate in the same way. When the detection frame 100 reaches the lower right of the processing target image 20 as shown in FIG. 7, the detection unit 3 obtains a detection accuracy value for the partial image within that lower-right detection frame 100. If the obtained detection accuracy value is equal to or greater than the threshold, the detection unit 3 treats the lower-right detection frame 100 as a detection result frame.
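The raster scan of FIGS. 4 to 7 and the threshold test can be summarized in a short Python sketch. The function names, the `score_fn` callback standing in for feature extraction plus the classifier, and the tuple layout of the results are hypothetical and chosen only for illustration; the 16-pixel frame and the step of one or a few pixels follow the text.

```python
def raster_scan_positions(image_w, image_h, frame=16, step=1):
    """Yield the (left, top) corner of each detection frame, left to right
    and then top to bottom, as in a raster scan."""
    for top in range(0, image_h - frame + 1, step):
        for left in range(0, image_w - frame + 1, step):
            yield left, top

def detect(image_w, image_h, score_fn, threshold, frame=16, step=1):
    """Collect detection result frames.

    score_fn(left, top, frame) is a placeholder (an assumption of this
    sketch) that returns the detection accuracy value for one frame.
    """
    results = []
    for left, top in raster_scan_positions(image_w, image_h, frame, step):
        value = score_fn(left, top, frame)
        if value >= threshold:  # frame is likely to contain a face image
            results.append((left, top, frame, value))
    return results
```

With a step larger than one pixel the last frame in a row may stop short of the right edge; how the apparatus handles that case is not stated in this passage, so the sketch simply skips it.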

The detection unit 3 performs the face detection processing described above with each of the plurality of types of detection frames of mutually different sizes. In the present embodiment, when using the reference detection frame, the detection unit 3 detects face images in the partial image within the reference detection frame while moving the reference detection frame over the processing target image, as described above. The detection unit 3 then stores, as the face detection result for the reference detection frame, the at least one detection result frame obtained by detecting face images in the partial image within the reference detection frame while raster-scanning the reference detection frame over the processing target image.

When using a non-reference detection frame, on the other hand, the detection unit 3 does not use the frame at its original size. Instead, it reduces the non-reference detection frame so that its size matches that of the reference detection frame, and it reduces the processing target image by the same ratio.

検出部３は、例えば、大きさがＲｐ×Ｒｐ（Ｒ＞１６）の非基準検出枠を使用する場合には、当該非基準検出枠の上下方向（列方向）の幅及び左右方向（行方向）の幅をそれぞれ（１６／Ｒ）倍して、当該非基準検出枠を縮小する。以後、このように縮小された非基準検出枠を「縮小非基準検出枠」と呼ぶ。また、検出部３は、処理対象画像の上下方向の幅（画素数）及び左右方向の幅（画素数）をそれぞれ（１６／Ｒ）倍して処理対象画像を縮小する。以後、このように縮小された処理対象画像を「縮小処理対象画像」と呼ぶ。そして、検出部３は、上述の図４〜７を用いて説明した顔検出処理と同様に、縮小処理対象画像に対して縮小非基準検出枠を移動させながら、当該縮小非基準検出枠内の部分画像に対して顔画像の検出を行う。以後、縮小非基準検出枠が使用されて顔画像の検出処理が行われた結果得られる、縮小処理対象画像に応じた検出結果枠、つまり、縮小非基準検出枠内の部分画像が顔画像である可能性が高いと判定された当該縮小非基準検出枠を特に「縮小検出結果枠」と呼ぶ。 For example, when using a non-reference detection frame of size Rp × Rp (R > 16), the detection unit 3 reduces the frame by multiplying both its width in the vertical direction (column direction) and its width in the horizontal direction (row direction) by (16/R). Hereinafter, a non-reference detection frame reduced in this way is referred to as a "reduced non-reference detection frame". The detection unit 3 also reduces the processing target image by multiplying both its vertical width (number of pixels) and its horizontal width (number of pixels) by (16/R). Hereinafter, a processing target image reduced in this way is referred to as a "reduction processing target image". Then, as in the face detection process described with reference to FIGS. 4 to 7, the detection unit 3 detects a face image in the partial image within the reduced non-reference detection frame while moving the reduced non-reference detection frame over the reduction processing target image. Hereinafter, a detection result frame that corresponds to the reduction processing target image and is obtained by the face image detection process using the reduced non-reference detection frame, that is, a reduced non-reference detection frame whose partial image has been determined to be highly likely to be a face image, is referred to in particular as a "reduction detection result frame".

検出部３は、縮小非基準検出枠を用いて縮小処理対象画像に対して顔画像の検出処理を行い、その結果、少なくとも一つの縮小検出結果枠が得られると、当該少なくとも一つの縮小検出結果枠を処理対象画像に応じた検出結果枠に変換する。 The detection unit 3 performs the face image detection process on the reduction processing target image using the reduced non-reference detection frame. When at least one reduction detection result frame is obtained as a result, the detection unit 3 converts the at least one reduction detection result frame into a detection result frame corresponding to the processing target image.

具体的には、検出部３は、まず、縮小処理対象画像に対して、得られた少なくとも一つの縮小検出結果枠を設定する。図８は、縮小処理対象画像１２０に対して縮小検出結果枠１３０が設定されている様子を示す図である。図８の例では、縮小処理対象画像１２０に対して複数の縮小検出結果枠１３０が設定されている。 Specifically, the detection unit 3 first sets at least one obtained reduction detection result frame for the reduction processing target image. FIG. 8 is a diagram illustrating a state in which a reduction detection result frame 130 is set for the reduction processing target image 120. In the example of FIG. 8, a plurality of reduction detection result frames 130 are set for the reduction processing target image 120.

次に検出部３は、縮小検出結果枠１３０が設定された縮小処理対象画像１２０を拡大することによって、図９に示されるように、縮小処理対象画像１２０を処理対象画像２０に変換する。これにより、縮小処理対象画像１２０に設定された縮小検出結果枠１３０も拡大されて、縮小検出結果枠１３０は、図９に示されるように、処理対象画像２０に応じた検出結果枠１５０に変換される。検出部３は、各非基準検出枠について、このようにして求めた検出結果枠１５０を、非基準検出枠を用いた顔検出結果として記憶する。 Next, the detection unit 3 enlarges the reduction processing target image 120 on which the reduction detection result frames 130 are set, thereby converting the reduction processing target image 120 into the processing target image 20, as shown in FIG. 9. The reduction detection result frames 130 set on the reduction processing target image 120 are enlarged together with it, so that each reduction detection result frame 130 is converted into a detection result frame 150 corresponding to the processing target image 20, as shown in FIG. 9. For each non-reference detection frame, the detection unit 3 stores the detection result frames 150 obtained in this way as the face detection result for that non-reference detection frame.
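
The reduction by (16/R) and the conversion of a reduction detection result frame back to the processing target image amount to a pair of inverse scalings. The sketch below illustrates this under stated assumptions: frames are represented as `(left, top, size)` tuples, coordinates are rounded to the nearest pixel, and the function names are hypothetical. None of these details is specified in the description.

```python
def reduction_scale(r, base=16):
    """Factor applied to both the R x R frame and the image: 16 / R."""
    return base / r


def to_original_frame(reduced_frame, r, base=16):
    """Map a reduction detection result frame back to the original image.

    reduced_frame is (left, top, size) in the reduced image; enlarging by
    R / 16 undoes the reduction applied before detection.
    """
    left, top, size = reduced_frame
    k = r / base
    return (round(left * k), round(top * k), round(size * k))


if __name__ == "__main__":
    # A 32p x 32p non-reference frame: image and frame are both halved,
    # and a 16 x 16 hit at (8, 4) maps back to a 32 x 32 frame at (16, 8).
    print(reduction_scale(32))
    print(to_original_frame((8, 4, 16), 32))
```

Scaling the image instead of the detector lets the same 16p × 16p detection logic be reused for every frame size.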

以上のように、検出部３が、複数種類の検出枠のそれぞれを用いて顔検出を行うことによって、検出部３では、当該複数種類の検出枠のそれぞれについて、当該検出枠に対応した複数の検出確度値が得られる。そして、検出部３では、顔検出結果として、一つの種類の検出枠について少なくとも一つの検出結果枠が得られる。 As described above, by performing face detection using each of the plurality of types of detection frames, the detection unit 3 obtains, for each type of detection frame, a plurality of detection accuracy values corresponding to that detection frame. As the face detection result, the detection unit 3 obtains at least one detection result frame for each type of detection frame.

図１０は、ある処理対象画像２０に対して、複数種類の検出枠を用いて顔画像の検出処理を行った結果得られる複数の検出結果枠１５０を当該処理対象画像２０に重ねて示す図である。図１０に示されるように、複数種類の検出枠を用いて顔画像の検出処理を行うことによって、様々な大きさの検出結果枠１５０が得られる。これは、処理対象画像２０に含まれる様々な大きさの顔画像が検出されていることを意味している。また、図１０に示されるように、得られた検出結果枠１５０を処理対象画像２０に重ねると、一つの顔画像付近に複数の検出結果枠１５０が集中する。つまり、一つの顔画像に対して複数の検出結果枠１５０が得られる。 FIG. 10 is a diagram showing a plurality of detection result frames 150 obtained by performing face image detection processing on a certain processing target image 20 using a plurality of types of detection frames, overlaid on the processing target image 20. is there. As shown in FIG. 10, detection result frames 150 of various sizes are obtained by performing face image detection processing using a plurality of types of detection frames. This means that face images of various sizes included in the processing target image 20 are detected. As shown in FIG. 10, when the obtained detection result frame 150 is superimposed on the processing target image 20, a plurality of detection result frames 150 are concentrated in the vicinity of one face image. That is, a plurality of detection result frames 150 are obtained for one face image.

＜出力値マップ生成処理＞ <Output value map generation processing>
上述のように、検出枠を用いて顔検出処理を行うことによって、処理対象画像から顔画像を検出することは可能である。 As described above, it is possible to detect a face image from the processing target image by performing the face detection process using the detection frame.

しかしながら、図１０に示されるように、一つの顔画像に対して複数の検出結果枠１５０が得られることから、処理対象画像に含まれる顔画像の数を特定することが困難である。また、図１０のように検出結果枠１５０が重ねられた処理対象画像２０を表示装置に表示したとすると、処理対象画像２０中に含まれる顔画像が複数の検出結果枠１５０で隠れてしまう可能性があり、当該顔画像を識別することが困難となる。 However, as shown in FIG. 10, since a plurality of detection result frames 150 are obtained for one face image, it is difficult to specify the number of face images included in the processing target image. Further, when the processing target image 20 with the detection result frames 150 superimposed thereon is displayed on the display device as shown in FIG. 10, the face image included in the processing target images 20 may be hidden by the plurality of detection result frames 150. And it is difficult to identify the face image.

そこで、顔画像付近に集中している複数の検出結果枠１５０を一つの検出結果枠に統合して統合検出結果枠を生成し、一つの顔画像には一つの統合検出結果枠が対応することが望ましい。 It is therefore desirable to integrate the plurality of detection result frames 150 concentrated near a face image into a single detection result frame, generating an integrated detection result frame, so that one integrated detection result frame corresponds to one face image.

一方で、複数の検出結果枠１５０を適切に統合しないと、統合検出結果枠内に顔画像が適切に収まらず、その結果、顔画像の検出精度が低下する可能性がある。 On the other hand, if the plurality of detection result frames 150 are not properly integrated, the face image does not fit properly in the integrated detection result frame, and as a result, the detection accuracy of the face image may be lowered.

本実施の形態では、マップ生成部４が生成する出力値マップを用いて、処理対象画像において顔画像である領域を特定し、その領域の外形枠を統合検出結果枠とすることによって、精度の良い統合検出結果枠、つまりその内側に適切に顔画像が収まっている統合検出結果枠を生成する。 In the present embodiment, the output value map generated by the map generation unit 4 is used to identify a region of the processing target image that is a face image, and the outline frame of that region is used as the integrated detection result frame. This yields an accurate integrated detection result frame, that is, one in which the face image is properly contained.

ここでは、出力値マップの生成方法について説明する。 Here, a method for generating an output value map will be described.

マップ生成部４は、検出部３での検出結果に基づいて、顔画像としての確からしさ（顔画像らしさ）を示す検出確度値についての処理対象画像での分布を示す出力値マップを生成する。 The map generation unit 4 generates an output value map indicating the distribution in the processing target image with respect to the detection accuracy value indicating the likelihood (face image likelihood) as the face image based on the detection result of the detection unit 3.

具体的には、マップ生成部４は、処理対象画像と同様に、行方向にＭ個の値が並び、列方向にＮ個の値が並ぶ、合計（Ｍ×Ｎ）個の値から成るマップ２００を考える。そして、マップ生成部４は、基準検出枠を用いた顔検出結果として得られた少なくとも一つの検出結果枠に含まれる一つの検出結果枠を対象検出結果枠とし、対象検出結果枠と同じ位置に、対象検出結果枠と同じ大きさの枠２１０をマップ２００に対して設定する。図１１は、マップ２００に対して枠２１０を設定した様子を示す図である。 Specifically, the map generation unit 4 considers a map 200 that, like the processing target image, consists of a total of (M × N) values, with M values arranged in the row direction and N values arranged in the column direction. The map generation unit 4 takes one of the at least one detection result frame obtained as the face detection result for the reference detection frame as a target detection result frame, and sets on the map 200 a frame 210 that has the same size as the target detection result frame and is located at the same position. FIG. 11 is a diagram illustrating the frame 210 set on the map 200.

次にマップ生成部４は、マップ２００における、枠２１０外の各値については“０”とし、枠２１０内の各値については、対象検出結果枠に対応する検出確度値（対象検出結果枠となった検出枠内の部分画像に対して顔画像の検出処理を行った結果得られた検出確度値）を用いて決定する。本実施の形態では、基準検出枠の大きさは１６ｐ×１６ｐであることから、枠２１０内には、行方向に１６個、列方向に１６個、合計２５６個の値が存在する。図１２は、枠２１０内の各値を決定する方法を説明するための図である。 Next, the map generation unit 4 sets each value of the map 200 outside the frame 210 to "0" and determines each value inside the frame 210 using the detection accuracy value corresponding to the target detection result frame (the detection accuracy value obtained by performing the face image detection process on the partial image within the detection frame that became the target detection result frame). In the present embodiment, the reference detection frame is 16p × 16p, so the frame 210 contains 16 values in the row direction and 16 values in the column direction, 256 values in total. FIG. 12 is a diagram for explaining how each value in the frame 210 is determined.

マップ生成部４は、枠２１０内の中心２１１の値を、検出部３で求められた、対象検出結果枠に対応する検出確度値とする。そして、マップ生成部４は、枠２１０内のそれ以外の複数の値を、枠２１０の中心２１１の値を最大値とした正規分布曲線に従って枠２１０内の中心２１１から外側に向けて値が徐々に小さくなるようにする。これにより、マップ２００を構成する複数の値のそれぞれが決定されて、対象検出結果枠に対応するマップ２００が完成する。マップ生成部４は、基準検出枠を用いた顔検出結果として得られた少なくとも一つの検出結果枠にそれぞれ対応する少なくとも一つのマップ２００を生成する。これにより、マップ生成部４では、基準検出枠に対応する少なくとも一つのマップ２００が生成される。 The map generation unit 4 sets the value at the center 211 of the frame 210 to the detection accuracy value that the detection unit 3 obtained for the target detection result frame. The map generation unit 4 then sets the other values in the frame 210 so that they gradually decrease from the center 211 toward the outside of the frame 210, following a normal distribution curve whose maximum is the value at the center 211. Each of the values constituting the map 200 is thereby determined, and the map 200 corresponding to the target detection result frame is completed. The map generation unit 4 generates at least one map 200, one for each of the at least one detection result frame obtained as the face detection result for the reference detection frame. In this way, the map generation unit 4 generates at least one map 200 corresponding to the reference detection frame.

またマップ生成部４は、上記と同様にして、複数種類の非基準検出枠のそれぞれについて、当該非基準検出枠を用いた顔検出結果として得られた少なくとも一つの検出結果枠にそれぞれ対応する少なくとも一つのマップ２００を生成する。ここで、非基準検出枠を用いた顔検出結果として得られた少なくとも一つの検出結果枠の大きさが、例えば１８ｐ×１８ｐであるとすると、枠２１０内には、行方向に１８個、列方向に１８個、合計３２４個の値が存在する。また、非基準検出枠を用いた顔検出結果として得られた少なくとも一つの検出結果枠の大きさが、例えば２０ｐ×２０ｐであるとすると、枠２１０内には、行方向に２０個、列方向に２０個、合計４００個の値が存在する。 In the same manner, for each of the plurality of types of non-reference detection frames, the map generation unit 4 generates at least one map 200, one for each of the at least one detection result frame obtained as the face detection result for that non-reference detection frame. If a detection result frame obtained with a non-reference detection frame is, for example, 18p × 18p, the frame 210 contains 18 values in the row direction and 18 values in the column direction, 324 values in total. If it is, for example, 20p × 20p, the frame 210 contains 20 values in the row direction and 20 values in the column direction, 400 values in total.

以上のようにして、複数種類の検出枠のそれぞれについて、当該検出枠に対応する少なくとも一つのマップ２００が生成されると、マップ生成部４は、これらのマップ２００を合成して出力値マップを生成する。具体的には、マップ生成部４は、生成した複数のマップ２００のｍ×ｎ番目の値を加算し、それによって得られた加算値を出力値マップのｍ×ｎ番目の検出確度値とする。マップ生成部４は、このようにして、出力値マップを構成する各検出確度値を求める。これにより、処理対象画像での検出確度値の分布を示す出力値マップが完成される。出力値マップを参照すれば、処理対象画像において顔画像らしさが高い領域を特定することができる。つまり、出力値マップを参照することによって、処理対象画像に含まれる顔画像を特定することができる。 As described above, when at least one map 200 corresponding to each detection frame is generated for each of a plurality of types of detection frames, the map generation unit 4 combines these maps 200 to generate an output value map. Generate. Specifically, the map generation unit 4 adds the m × n-th values of the plurality of generated maps 200, and uses the obtained addition value as the m × n-th detection accuracy value of the output value map. . In this way, the map generation unit 4 obtains each detection accuracy value constituting the output value map. Thereby, an output value map indicating the distribution of detection accuracy values in the processing target image is completed. By referring to the output value map, it is possible to specify a region having a high likelihood of a face image in the processing target image. That is, the face image included in the processing target image can be specified by referring to the output value map.
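
The construction of the per-frame maps 200 and their element-wise addition into the output value map can be sketched as below. This is an illustrative sketch only. The description says the values follow a normal distribution curve peaking at the center 211, but it does not give the spread or say which pixel is the center of an even-sized frame, so `sigma_ratio` and the choice of `left + size // 2` as the center pixel are assumptions made here, as are the function names.

```python
import math


def frame_map(width, height, frame, score, sigma_ratio=0.25):
    """Build one map 200 for a detection result frame.

    frame is (left, top, size). Values outside the frame are 0; inside, they
    fall off from `score` at the center pixel following a Gaussian profile.
    sigma_ratio (spread relative to the frame size) is an assumed parameter.
    """
    left, top, size = frame
    cx, cy = left + size // 2, top + size // 2   # assumed center pixel
    sigma = size * sigma_ratio
    m = [[0.0] * width for _ in range(height)]
    for y in range(max(top, 0), min(top + size, height)):
        for x in range(max(left, 0), min(left + size, width)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            m[y][x] = score * math.exp(-d2 / (2.0 * sigma ** 2))
    return m


def output_value_map(width, height, detections):
    """Sum the maps of all detection result frames element by element.

    detections is a list of ((left, top, size), score) pairs.
    """
    out = [[0.0] * width for _ in range(height)]
    for frame, score in detections:
        m = frame_map(width, height, frame, score)
        for y in range(height):
            for x in range(width):
                out[y][x] += m[y][x]
    return out


if __name__ == "__main__":
    dets = [((0, 0, 16), 2.0), ((2, 2, 18), 1.5)]
    out = output_value_map(24, 24, dets)
    print(out[8][8], out[23][23])
```

Because overlapping frames add up, a location covered by many detection result frames ends up with a higher value than any single frame would give it, which is what makes the peaks of the output value map useful for locating faces.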

図１３は、図１０に示される処理対象画像２０についての出力値マップを当該処理対象画像２０に重ねて示す図である。図１３では、理解し易いように、検出確度値の大きさを例えば第１段階から第５段階の５段階に分けて出力値マップを示している。図１３に示される出力値マップにおいては、検出確度値が、最も大きい第５段階に属する領域については縦線のハッチングが示されており、２番目に大きい第４段階に属する領域については砂地のハッチングが示されている。また、図１３での出力値マップにおいては、検出確度値が、３番目に大きい第３段階に属する領域については右上がりのハッチングが示されており、４番目に大きい第２段階に属する領域については左上がりのハッチングが示されている。そして、図１３に示される出力値マップにおいては、検出確度値が、最も小さい第１段階に属する領域についてはハッチングが示されていない。 FIG. 13 is a diagram showing the output value map for the processing target image 20 of FIG. 10, superimposed on that processing target image 20. For ease of understanding, FIG. 13 shows the output value map with the magnitude of the detection accuracy value divided into, for example, five levels from a first level to a fifth level. In the output value map of FIG. 13, regions whose detection accuracy values belong to the fifth level (the largest) are shown with vertical-line hatching, and regions belonging to the fourth level (the second largest) are shown with stippled (sand-pattern) hatching. Regions belonging to the third level (the third largest) are shown with hatching rising to the right, and regions belonging to the second level (the fourth largest) are shown with hatching rising to the left. Regions belonging to the first level (the smallest) are not hatched.

図１３を参照すると、出力値マップでは、処理対象画像２０に含まれる各顔画像に対応する領域での検出確度値が高くなっていることが理解できる。 Referring to FIG. 13, it can be understood that in the output value map, the detection accuracy value in the region corresponding to each face image included in the processing target image 20 is high.

＜極大点探索処理＞ <Maximum point search processing>
図１３に示されるように、出力値マップにおいては、処理対象画像での顔画像に対応する領域での検出確度値が大きくなる可能性が高い。そして、ミクロな視点で出力値マップを見てみると、出力値マップにおいては、処理対象画像での顔画像に対応する領域での検出確度値のうち、顔画像の中心位置と同じ位置での検出確度値が最も大きくなる可能性が高い。したがって、出力値マップにおいて検出確度値の極大点を探索することによって、顔画像の中心位置を特定することができる。そして、処理対象画像において、特定した極大点（顔画像の中心位置に対応）と同じ位置のピクセルを含む所定領域を顔画像であると決定することによって、処理対象画像中での顔画像を正確に特定することができる。よって、当該所定領域の外形枠を統合検出結果枠とすることによって、精度の良い統合検出結果枠を得ることができる。 As shown in FIG. 13, in the output value map, the detection accuracy values in a region corresponding to a face image in the processing target image are likely to be large. Looking at the output value map at a finer scale, among the detection accuracy values in the region corresponding to a face image, the value at the same position as the center of the face image is likely to be the largest. Therefore, the center position of a face image can be identified by searching the output value map for a maximum point of the detection accuracy value. By determining that a predetermined region of the processing target image that includes the pixel at the same position as the identified maximum point (which corresponds to the center position of the face image) is a face image, the face image in the processing target image can be identified accurately. Accordingly, an accurate integrated detection result frame can be obtained by using the outline frame of the predetermined region as the integrated detection result frame.

ここでは、出力値マップにおいて検出確度値の極大点を探索する方法について説明する。本実施の形態では、極大点探索部５は、例えば、Mean-Shift法を用いて出力値マップでの検出確度値の極大点を探索する。以下に極大点探索部５の動作について詳細に説明する。以後、単に「極大点」と言えば、「出力値マップでの検出確度値の極大点」を意味するものとする。 Here, a method for searching for the maximum point of the detection accuracy value in the output value map will be described. In the present embodiment, the maximum point search unit 5 searches for the maximum point of the detection accuracy value in the output value map using, for example, the Mean-Shift method. The operation of the maximum point search unit 5 will be described in detail below. Hereinafter, simply speaking, “maximum point” means “maximum point of detection accuracy value in output value map”.

極大点探索部５は、二次元座標に配置された出力値マップにおいて、検出確度値を重み係数として、処理対象領域内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についての座標値の重み付け平均値を算出し、当該処理対象領域の中心位置が当該重み付け平均値となるように当該処理対象領域を移動させる処理を繰り返すことによって極大点を探索する。本実施の形態では、検出部３が複数種類の検出枠を用いて顔画像の検出処理を行った結果として得られた複数の検出結果枠１５０の数と同じ数だけ極大点が求められる。 In the output value map placed on two-dimensional coordinates, the maximum point search unit 5 calculates a weighted average of the coordinate values of the positions at which the detection accuracy values contained in a processing target region are located, using the detection accuracy values as weighting factors, and moves the processing target region so that its center position coincides with the weighted average. It searches for a maximum point by repeating this process. In the present embodiment, as many maximum points are obtained as there are detection result frames 150 obtained as a result of the face image detection process that the detection unit 3 performs using the plurality of types of detection frames.

図１４は極大点の探索方法を説明するための図である。図１４では、二次元座標であるＸＹ座標に出力値マップ３００が配置されている。本実施の形態では、例えば、出力値マップ３００の左上をＸＹ座標の原点Ｏとし、行方向をＸ軸方向とし、列方向をＹ軸方向とする。また極大値の探索の際に移動させる処理対象領域４００の形状を例えば円形とする。 FIG. 14 is a diagram for explaining a search method for local maximum points. In FIG. 14, an output value map 300 is arranged at XY coordinates that are two-dimensional coordinates. In the present embodiment, for example, the upper left of the output value map 300 is the origin O of the XY coordinates, the row direction is the X axis direction, and the column direction is the Y axis direction. The shape of the processing target area 400 that is moved when searching for the maximum value is, for example, circular.

極大点探索部５は、検出部３が複数種類の検出枠を用いて顔画像の検出処理を行った結果として得られた複数の検出結果枠１５０のうちの一つの検出結果枠１５０を対象検出結果枠１５０ａとする。 The local maximum point search unit 5 performs target detection on one detection result frame 150 among the plurality of detection result frames 150 obtained as a result of the detection unit 3 performing face image detection processing using a plurality of types of detection frames. Let it be a result frame 150a.

次に、極大点探索部５は、出力値マップ３００上を移動させる処理対象領域４００の移動開始位置を決定する。ここで、処理対象画像での検出結果枠１５０内の部分画像は顔画像である可能性が高いことから、処理対象画像において顔画像の中心は検出結果枠１５０内に存在する可能性が高い。したがって、出力値マップ３００においては、対象検出結果枠１５０ａと同じ位置の領域内に極大点が存在する可能性が高い。特に本実施の形態では、出力値マップ３００の生成で使用される上述のマップ２００を完成する際には、枠２１０内の中心２１１の値を検出確度値としていることから、出力値マップ３００においては、対象検出結果枠１５０ａ内の中心位置と同じ位置の近くに極大点が存在する可能性が高い。 Next, the local maximum point search unit 5 determines the movement start position of the processing target area 400 to be moved on the output value map 300. Here, since the partial image in the detection result frame 150 in the processing target image is highly likely to be a face image, the center of the face image in the processing target image is likely to exist in the detection result frame 150. Therefore, in the output value map 300, there is a high possibility that a local maximum point exists in the region at the same position as the target detection result frame 150a. In particular, in the present embodiment, when the above-described map 200 used for generating the output value map 300 is completed, the value of the center 211 in the frame 210 is used as the detection accuracy value. Is likely to have a local maximum near the same position as the center position in the object detection result frame 150a.

そこで、図１４に示されるように、極大点探索部５は、対象検出結果枠１５０ａ内の所定位置、例えば中心位置と同じ出力値マップ３００での位置４１０を、処理対象領域４００の中心位置の初期位置とする。つまり、極大点探索部５は、極大点の探索を開始する際には、処理対象領域４００の中心位置が、対象検出結果枠１５０ａ内の中心位置と同じ位置となるように、当該処理対象領域４００を出力値マップ３００に配置する。これにより、極大点をすぐに探索することができる。 Therefore, as illustrated in FIG. 14, the local maximum point search unit 5 uses a predetermined position in the target detection result frame 150 a, for example, a position 410 in the output value map 300 that is the same as the center position, as the center position of the processing target region 400. The initial position. That is, when starting the search for the maximum point, the local maximum point search unit 5 sets the processing target region 400 so that the center position of the processing target region 400 is the same as the central position in the target detection result frame 150a. 400 is placed in the output value map 300. Thereby, the local maximum point can be searched immediately.

なお、処理対象領域４００の大きさは、例えば、出力値マップ３００上に配置された当該処理対象領域４００内において、その中心から半径方向に沿って５０〜６０個の検出確度値が並ぶ程度の大きさとなっている。 Note that the size of the processing target area 400 is, for example, such that 50 to 60 detection accuracy values are arranged along the radial direction from the center in the processing target area 400 arranged on the output value map 300. It is a size.

次に、極大点探索部５は、ＸＹ座標に配置された出力値マップ３００において、検出確度値を重み係数として、処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についての座標値の重み付け平均値（ＸＭ，ＹＭ）を算出する。極大点探索部５は、以下の式（１）を用いて重み付け平均値（ＸＭ，ＹＭ）を算出する。 Next, the maximal point search unit 5 uses the detection accuracy value as a weighting factor in the output value map 300 arranged in the XY coordinates, and a plurality of positions where a plurality of detection accuracy values included in the processing target region 400 exist. A weighted average value (XM, YM) of coordinate values is calculated. The maximum point search unit 5 calculates a weighted average value (XM, YM) using the following equation (1).

ここで、式（１）中のＪは、処理対象領域４００内に存在する複数の検出確度値の個数を示している。また、ｉは、処理対象領域４００内の複数の検出確度値のそれぞれに対して付された番号を示している。そして、ｖｉは、ｉ番の検出確度値を意味しており、（Ｘｉ，Ｙｉ）は、ＸＹ座標に配置された出力値マップ３００においてｉ番の検出確度値が存在する位置についてのＸＹ座標値を示している。 Here, J in Expression (1) indicates the number of a plurality of detection accuracy values existing in the processing target area 400. In addition, i indicates a number assigned to each of a plurality of detection accuracy values in the processing target area 400. And vi means the i-th detection accuracy value, and (Xi, Yi) is the XY coordinate value for the position where the i-th detection accuracy value exists in the output value map 300 arranged in the XY coordinates. Is shown.
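
Equation (1) itself is not reproduced in this text. Based on the symbol definitions above (J, i, v_i, and (X_i, Y_i)) and on the statement that the detection accuracy values serve as weighting factors, a form consistent with the description would be the ordinary weighted mean shown below. This is a reconstruction under that assumption, not a quotation of the original equation.

```latex
X_M = \frac{\sum_{i=1}^{J} v_i X_i}{\sum_{i=1}^{J} v_i},
\qquad
Y_M = \frac{\sum_{i=1}^{J} v_i Y_i}{\sum_{i=1}^{J} v_i}
\tag{1}
```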

極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を求めると、処理対象領域４００の中心位置のＸＹ座標が当該重み付け平均値（ＸＭ，ＹＭ）となるように処理対象領域４００を移動させる。図１４の矢印は、処理対象領域４００が移動する様子を示している。 When the maximum point search unit 5 obtains the weighted average value (XM, YM), the processing target region 400 is moved so that the XY coordinates of the center position of the processing target region 400 become the weighted average value (XM, YM). . The arrows in FIG. 14 indicate how the processing target area 400 moves.

次に極大点探索部５は、処理対象領域４００の移動距離（シフト量）がしきい値未満であるかを判定する。処理対象領域４００の移動距離は、移動前の処理対象領域４００の中心位置と移動後の処理対象領域４００の中心位置との間の距離を求めることによって得られる。極大点探索部５は、処理対象領域４００の移動距離がしきい値以上であると判定すると、移動後の処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についての座標値の重み付け平均値（ＸＭ，ＹＭ）を式（１）を用いて算出する。そして、極大点探索部５は、処理対象領域４００の中心位置のＸＹ座標が、新たに求めた重み付け平均値（ＸＭ，ＹＭ）となるように当該処理対象領域４００をさらに移動させる。 Next, the local maximum point search unit 5 determines whether the moving distance (shift amount) of the processing target area 400 is less than a threshold value. The movement distance of the processing target area 400 is obtained by calculating the distance between the center position of the processing target area 400 before the movement and the center position of the processing target area 400 after the movement. When the local maximum point search unit 5 determines that the movement distance of the processing target area 400 is equal to or greater than the threshold value, the local maximum point searching unit 5 has a plurality of detection accuracy values included in the processing target area 400 after the movement. A weighted average value (XM, YM) of coordinate values is calculated using equation (1). Then, the local maximum point search unit 5 further moves the processing target region 400 so that the XY coordinates of the center position of the processing target region 400 become the newly obtained weighted average value (XM, YM).

一方で、極大点探索部５は、処理対象領域４００の移動量がしきい値未満であると判定すると、処理対象領域４００の移動量が収束したと判断して、処理対象領域４００の移動を終了する。そして、極大点探索部５は、現在の処理対象領域４００の中心位置を極大点とする。これより、対象検出結果枠１５０の位置付近での極大点が求められる。 On the other hand, when the maximum point search unit 5 determines that the movement amount of the processing target region 400 is less than the threshold value, it judges that the movement of the processing target region 400 has converged and stops moving the processing target region 400. The maximum point search unit 5 then takes the current center position of the processing target region 400 as the maximum point. In this way, the maximum point near the position of the target detection result frame 150 is obtained.
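
The iteration described above (compute the weighted average inside a circular processing target region, move the region there, stop when the shift falls below a threshold) can be sketched as follows. This is a minimal sketch of a Mean-Shift style search with a flat circular window, not the implementation of the maximum point search unit 5. The names `mean_shift_peak`, `eps`, and `max_iter`, the iteration cap, and the handling of an all-zero window are assumptions added here.

```python
import math


def mean_shift_peak(out_map, start, radius, eps=0.5, max_iter=100):
    """Search for a maximum point of out_map starting from `start` = (x, y).

    out_map[y][x] holds detection accuracy values. The window is a circle of
    the given radius; iteration stops when the shift is smaller than eps.
    """
    h, w = len(out_map), len(out_map[0])
    cx, cy = float(start[0]), float(start[1])
    for _ in range(max_iter):
        sv = sx = sy = 0.0
        y0, y1 = max(0, math.floor(cy - radius)), min(h - 1, math.ceil(cy + radius))
        x0, x1 = max(0, math.floor(cx - radius)), min(w - 1, math.ceil(cx + radius))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    v = out_map[y][x]
                    sv += v          # sum of weights
                    sx += v * x      # weighted sum of X coordinates
                    sy += v * y      # weighted sum of Y coordinates
        if sv == 0.0:
            break                    # no weight in the window: cannot move
        nx, ny = sx / sv, sy / sv    # weighted average (XM, YM)
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < eps:
            break                    # movement has converged
    return cx, cy


if __name__ == "__main__":
    grid = [[0.0] * 30 for _ in range(30)]
    grid[12][10] = 1.0               # single peak at x=10, y=12
    print(mean_shift_peak(grid, start=(8, 10), radius=5))
```

Starting each search at the center of a detection result frame, as the description suggests, keeps the number of iterations small because the start is already close to a peak.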

極大点探索部５は、検出部３が複数種類の検出枠を用いて顔画像の検出処理を行った結果として得られた複数の検出結果枠１５０のそれぞれについて、当該検出結果枠１５０の位置付近での極大点を同様にして求める。 In the same manner, for each of the plurality of detection result frames 150 obtained as a result of the face image detection process that the detection unit 3 performs using the plurality of types of detection frames, the maximum point search unit 5 obtains the maximum point near the position of that detection result frame 150.

なお、極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する際には、検出確度値が間引かれた出力値マップ３００を用いても良い。言い換えれば、極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する際には、出力値マップ３００において処理対象領域４００内に含まれる複数の検出確度値のすべてを用いなくても良い。 In addition, when calculating the weighted average value (XM, YM), the maximum point search unit 5 may use the output value map 300 from which the detection accuracy value is thinned out. In other words, when calculating the weighted average value (XM, YM), the local maximum point search unit 5 does not have to use all of the plurality of detection accuracy values included in the processing target area 400 in the output value map 300. good.

図１５は、検出確度値が間引かれた出力値マップ３００の一例を示す図である。図１５の例では、出力値マップ３００が、行方向（Ｘ軸方向）に等間隔で並ぶ複数の分割線５００と列方向（Ｙ軸方向）に等間隔で並ぶ複数の分割線５１０とによって格子状に分割されている。そして、出力値マップ３００においては、格子交点（分割線５００，５１０の交点）に存在する検出確度値以外の検出確度値が削除されている。これにより、出力値マップ３００においては、検出確度値が、列方向においてＰ個（Ｐ≧２）ごとに（Ｐ−１）個間引かれ、行方向においてＱ個（Ｑ≧２）ごとに（Ｑ−１）個間引かれる。図１５中の丸印は、検出確度値が間引かれた出力値マップ３００に存在する検出確度値を示している。 FIG. 15 is a diagram illustrating an example of an output value map 300 in which detection accuracy values are thinned out. In the example of FIG. 15, the output value map 300 is latticed by a plurality of dividing lines 500 arranged at equal intervals in the row direction (X-axis direction) and a plurality of dividing lines 510 arranged at equal intervals in the column direction (Y-axis direction). It is divided into shapes. In the output value map 300, detection accuracy values other than the detection accuracy values existing at the grid intersections (intersections of the dividing lines 500 and 510) are deleted. Thereby, in the output value map 300, (P-1) pieces of detection accuracy values are thinned out every P (P ≧ 2) in the column direction, and every Q pieces (Q ≧ 2) in the row direction ( Q-1) Thinned out. The circles in FIG. 15 indicate the detection accuracy values existing in the output value map 300 from which the detection accuracy values are thinned out.

図１５に示される出力値マップ３００では、図１３での出力値マップと同様に、検出確度値の大きさが例えば第１段階から第５段階の５段階に分けられて各検出確度値が示されている。図１５での出力値マップ３００では、最も大きい第５段階に属する検出確度値を示す丸印には横線のハッチングが示されており、２番目に大きい第４段階に属する検出確度値を示す丸印には縦線のハッチングが示されている。また、図１５での出力値マップ３００では、３番目に大きい第３段階に属する検出確度値を示す丸印には右上がりのハッチングが示されており、４番目に大きい第２段階に属する検出確度値を示す丸印には左上がりのハッチングが示されている。そして、図１５に示される出力値マップ３００では、最も小さい第１段階に属する検出確度値を示す丸印にはハッチングが示されていない。 In the output value map 300 shown in FIG. 15, as in the output value map in FIG. 13, the magnitude of the detection accuracy value is divided into, for example, five stages from the first stage to the fifth stage to indicate each detection accuracy value. Has been. In the output value map 300 in FIG. 15, a circle indicating a detection accuracy value belonging to the fifth highest stage is indicated by a horizontal line hatching, and a circle indicating a detection accuracy value belonging to the second highest fourth stage. The mark shows vertical hatching. In addition, in the output value map 300 in FIG. 15, the circle indicating the detection accuracy value belonging to the third largest third stage indicates a right-up hatching, and the detection belonging to the fourth largest second stage. The circle indicating the accuracy value indicates a left-upward hatching. In the output value map 300 shown in FIG. 15, the circle indicating the detection accuracy value belonging to the smallest first stage is not hatched.

極大点探索部５は、重み付け平均値（ＸＭ，ＹＭ）を算出する場合に、図１５に示されるような、検出確度値が間引かれた出力値マップ３００を用いる際には、当該出力値マップ３００において処理対象領域４００内に含まれる複数の検出確度値がそれぞれ存在する複数の位置についてのＸＹ座標値と当該複数の検出確度値を上記の式（１）に代入する。これにより、重み付け平均値（ＸＭ，ＹＭ）を算出する際に使用される検出確度値の数が低減する。さらに、処理対象領域４００の移動量が収束するまでに必要な、重み付け平均値（ＸＭ，ＹＭ）の算出回数も低減する。よって、極大点を探索する処理の負荷が軽減される。 When calculating the weighted average value (XM, YM), the maximum point search unit 5 uses the output value map 300 in which the detection accuracy value is thinned as shown in FIG. In the map 300, the XY coordinate values and the plurality of detection accuracy values for a plurality of positions where the plurality of detection accuracy values included in the processing target area 400 respectively exist are substituted into the above equation (1). This reduces the number of detection accuracy values used when calculating the weighted average value (XM, YM). Furthermore, the number of times of calculating the weighted average value (XM, YM) required until the movement amount of the processing target area 400 converges is also reduced. Therefore, the processing load for searching for the maximum point is reduced.
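
The effect of thinning on the weighted average can be illustrated with a short sketch. It assumes, for illustration only, that the retained grid intersections are the rows with index divisible by P and the columns with index divisible by Q, and the function names are hypothetical. The description does not state where the grid is anchored.

```python
def thinned_samples(out_map, p, q):
    """Keep only values at grid intersections: every p-th row (column
    direction) and every q-th column (row direction)."""
    return [(x, y, out_map[y][x])
            for y in range(0, len(out_map), p)
            for x in range(0, len(out_map[0]), q)]


def weighted_mean(samples, center, radius):
    """Weighted average of sample positions inside a circular region,
    using the detection accuracy values as weights."""
    cx, cy = center
    inside = [(x, y, v) for x, y, v in samples
              if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    total = sum(v for _, _, v in inside)
    if total == 0:
        return center                 # nothing to average: stay in place
    return (sum(v * x for x, _, v in inside) / total,
            sum(v * y for _, y, v in inside) / total)


if __name__ == "__main__":
    grid = [[1.0] * 8 for _ in range(8)]
    samples = thinned_samples(grid, p=2, q=2)
    print(len(samples))               # 16 of the 64 values remain
    print(weighted_mean(samples, center=(3, 3), radius=2))
```

With P = Q = 2 only a quarter of the values enter each sum, which is the source of the reduced processing load mentioned above.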

＜極大点統合処理＞ <Maximum point integration processing>
上述の説明から理解できるように、極大点探索部５では、互いに位置が異なる複数の極大点が求められることがある。出力値マップにおいて、互いに近くに位置している複数の極大点については、同じ顔画像の中心を示している可能性が高い。一方で、互いに離れて位置している複数の極大点については、別々の顔画像の中心を示している可能性が高い。 As can be understood from the above description, the maximum point search unit 5 may obtain a plurality of maximum points at mutually different positions. In the output value map, maximum points located close to one another are likely to indicate the center of the same face image, whereas maximum points located far apart are likely to indicate the centers of different face images.

Therefore, when the detection target image determination unit 6 uses the maximum points obtained by the maximum point search unit 5 to identify the regions of the processing target image that are face images, it first integrates maximum points located close to one another into a single maximum point. An example of a method for integrating such maximum points is described below.

In the same way that the detection unit 3 moves the detection frame from the upper left to the lower right of the processing target image during face detection processing, the detection target image determination unit 6 scans the output value map from the upper left to the lower right (that is, in raster-scan order). When a maximum point obtained by the maximum point search unit 5 appears, the detection target image determination unit 6 takes it as a reference point and obtains the distance between the reference point and the next maximum point to appear. If the obtained distance is less than a threshold value, the detection target image determination unit 6 keeps the reference point and deletes the maximum point that appeared later. If the obtained distance is equal to or greater than the threshold value, the detection target image determination unit 6 keeps the current reference point and takes the maximum point that appeared later as the new reference point.

The threshold value used in integrating maximum points is determined according to the size of the face images to be detected. For example, if the image detection device 1 is used in a surveillance camera system that monitors an area relatively close to the camera, relatively large face images will be detected, so a large threshold value is used. If the surveillance camera system monitors an area relatively far from the camera, relatively small face images will be detected, so a small threshold value is used. In this example, the threshold value is set to, for example, a distance of 5 pixels in the processing target image. The threshold value is preferably adjustable (rewritable) by the user.

If the obtained distance is less than the threshold value, the detection target image determination unit 6 keeps the current reference point, deletes the maximum point that appeared later, and then compares with the threshold value the distance between the current reference point and the maximum point that appears next after the deleted one. If the obtained distance is equal to or greater than the threshold value, so that the maximum point that appeared later becomes the new reference point, the detection target image determination unit 6 compares with the threshold value the distance between the new reference point and the maximum point that appears next after it.

The detection target image determination unit 6 continues in the same manner. If the distance between the last maximum point to appear and the reference point is less than the threshold value, the unit deletes that last maximum point and ends the integration processing. If that distance is equal to or greater than the threshold value, the unit ends the integration processing without deleting the last maximum point. The detection target image determination unit 6 then uses the one or more maximum points remaining after the integration processing to identify the regions of the processing target image that are face images.

In the above example, when the distance between the reference point and the maximum point that appeared later is less than the threshold value, only the reference point is kept. Alternatively, of these two points, only the one with the larger detection accuracy value at its position may be kept. This increases the likelihood that the maximum points remaining after the integration processing indicate the center positions of face images.
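The integration procedure described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function name `merge_maxima`, the `(x, y, value)` tuple layout, and the use of Euclidean distance are assumptions, since the description does not specify how the distance is measured. The `keep_stronger` flag corresponds to the variation in which the point with the larger detection accuracy value is kept.

```python
import math

def merge_maxima(maxima, threshold=5.0, keep_stronger=False):
    """Merge nearby maximum points in raster-scan order.

    maxima: list of (x, y, value) tuples (hypothetical layout).
    threshold: distance in pixels; 5 is the example value in the text.
    keep_stronger: if True, keep whichever of the two close points has
        the larger detection accuracy value instead of the reference.
    """
    # Raster-scan order: top to bottom, then left to right.
    ordered = sorted(maxima, key=lambda m: (m[1], m[0]))
    kept = []
    ref = None
    for point in ordered:
        if ref is None:
            ref = point  # the first maximum point becomes the reference
            continue
        # Euclidean distance is an assumption made for this sketch.
        dist = math.hypot(point[0] - ref[0], point[1] - ref[1])
        if dist < threshold:
            # Too close: delete one of the two points.
            if keep_stronger and point[2] > ref[2]:
                ref = point
        else:
            # Far enough: keep the reference, the later point takes over.
            kept.append(ref)
            ref = point
    if ref is not None:
        kept.append(ref)
    return kept
```

Because each maximum point is compared only with the current reference point, as in the description, the result of this sketch depends on the order in which the points appear during the scan.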

<Face image determination processing>
When the integration processing of maximum points is complete, the detection target image determination unit 6 determines, for each remaining maximum point, that a predetermined region of the processing target image including the pixel at the same position as that maximum point is a face image (detection target image). The detection target image determination unit 6 then takes the outline frame of the region determined to be a face image as a post-integration detection result frame. Hereinafter, a predetermined region determined to be a face image is called a "detected image region", and the maximum point under discussion may be called the "target maximum point".

In the region of the output value map that corresponds to a face image, the detection accuracy value is likely to be large at the position corresponding to the center of the face image and to become smaller with increasing distance from that position. In regions of the output value map other than those corresponding to face images, the detection accuracy value is likely to be zero or very small. Therefore, in the output value map, the detection accuracy value is likely to decrease from the maximum point corresponding to the center position of a face image toward the positions corresponding to the edges of that face image. In other words, the detection accuracy value is likely to decrease monotonically from the maximum point toward the edges of the face image.

FIG. 16 is a graph showing an example of the distribution of detection accuracy values near the target maximum point 700 in the output value map. FIG. 16 shows the distribution in the left-right direction, centered on the target maximum point 700. The vertical axis indicates the detection accuracy value, and the horizontal axis indicates the position in the left-right direction in the output value map 300. As shown in FIG. 16, the detection accuracy value becomes smaller (decreases monotonically) from the target maximum point 700 in the right direction DR1. Likewise, the detection accuracy value becomes smaller (decreases monotonically) from the target maximum point 700 in the left direction DR2.

In view of this, in the present embodiment the detection target image determination unit 6 examines the detection accuracy values in the output value map starting from the target maximum point 700 and moving in a direction away from it. The position in the processing target image corresponding to the first position at which the detection accuracy value is equal to or less than (1/Z) times (Z > 1) the detection accuracy value at the target maximum point 700 is taken as an edge of the detected image region, and the edge of the face image is thereby identified. This allows the edges of a face image included in the processing target image to be identified accurately.

When a plurality of face images are in contact with each other in the processing target image, for example because a plurality of faces are located one behind another in the imaging area of the imaging unit, the detection accuracy value may stop decreasing monotonically before a position appears at which it is equal to or less than (1/Z) times the detection accuracy value at the target maximum point 700, as the values are examined from the target maximum point 700 in a direction away from it. In such a case, the position in the processing target image corresponding to the position at which the detection accuracy value stopped decreasing monotonically is likely to be an edge of the face image.

Therefore, if the detection target image determination unit 6, while examining the detection accuracy values in the output value map 300 from the target maximum point 700 in a direction away from it, determines that the detection accuracy value has stopped decreasing monotonically before a position appears at which it is equal to or less than (1/Z) times the detection accuracy value at the target maximum point 700, it takes the position in the processing target image corresponding to the position at which it made that determination as an edge of the detected image region. In this way, even when a plurality of face images are in contact with each other, each of them can be identified individually. The operation of the detection target image determination unit 6 is described in detail below.

FIG. 17 is a diagram for explaining a method of determining the detected image region 600 that includes the pixel of the processing target image at the same position as the target maximum point 700 (hereinafter called the "target detected image region 600"). FIG. 17 shows an enlarged view of the output value map 300, with the outline frame 600a of the target detected image region 600 superimposed on it.

In the present embodiment, the shape of the detected image region is set to, for example, a quadrangle. The detection target image determination unit 6 determines the detected image region by determining the right edge, left edge, upper edge, and lower edge of the quadrangular detected image region.

First, the operation of the detection target image determination unit 6 in determining the right edge 610 of the target detected image region 600 is described. FIG. 18 is a flowchart showing this operation.

As shown in FIG. 17, the detection target image determination unit 6 examines (extracts) the detection accuracy values 800 (indicated by circles) in the output value map 300 from the target maximum point 700 along the right direction DR1, compares two successive detection accuracy values 800 while advancing the pair, and determines the right edge 610 of the target detected image region 600 based on the comparison results. In doing so, the detection target image determination unit 6 may examine every detection accuracy value 800, every other value, or one value out of every several. In this example, the detection target image determination unit 6 examines every detection accuracy value 800.

Specifically, as shown in FIG. 18, in step s1 the detection target image determination unit 6 takes the detection accuracy value 800 at the target maximum point 700 (the detection accuracy value 800a shown in FIG. 17) as a first accuracy value v1 and the detection accuracy value 800 to its right as a second accuracy value v2. In step s2, the detection target image determination unit 6 calculates (v1 - v2) to compare the first accuracy value v1 with the second accuracy value v2.

Next, in step s3, the detection target image determination unit 6 determines whether the result of comparing v1 and v2 is (v1 - v2) < 0. If it determines that (v1 - v2) < 0 does not hold, it determines that the detection accuracy value 800 is decreasing monotonically and executes step s4. In step s4, the detection target image determination unit 6 determines whether the second accuracy value v2 is equal to or less than (1/Z) times the detection accuracy value 800a at the target maximum point 700. Z is set so that, for example, 3 ≤ Z ≤ 5. If the detection target image determination unit 6 determines that the second accuracy value v2 is equal to or less than (1/Z) times the detection accuracy value 800a at the target maximum point 700, then in step s5 it takes the position in the processing target image corresponding to the position 710 (see FIGS. 16 and 17) of the second accuracy value v2 in the output value map 300 as the right edge 610 of the target detected image region 600. This position 710 is the first position at which the detection accuracy value 800 is equal to or less than (1/Z) times the detection accuracy value 800a at the target maximum point 700 when the detection accuracy values 800 are examined in the output value map 300 from the target maximum point 700 along the right direction DR1.

If, on the other hand, the detection target image determination unit 6 determines in step s4 that the second accuracy value v2 is not equal to or less than (1/Z) times the detection accuracy value 800a at the target maximum point 700, it returns to step s1, takes the current second accuracy value v2 as the new first accuracy value v1, and takes the detection accuracy value 800 to its right as the new second accuracy value v2. The detection target image determination unit 6 then operates in the same manner.

If the detection target image determination unit 6 determines in step s3 that the result of comparing v1 and v2 is (v1 - v2) < 0, it executes step s6. In step s6, the detection target image determination unit 6 determines whether the comparison result (v1 - v2) < 0 has occurred a predetermined number of times C (C ≥ 2) in succession, including the current comparison. In other words, it determines whether, while comparing two successive detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the right direction DR1 and advancing the pair, the result that the earlier detection accuracy value 800 is smaller than the later detection accuracy value 800 has occurred C times in succession. The predetermined number of times C is set to, for example, C = 2.

If the detection target image determination unit 6 determines in step s6 that the comparison result (v1 - v2) < 0 has occurred the predetermined number of times C in succession, it determines that the detection accuracy value stopped decreasing monotonically before a position appeared at which it is equal to or less than (1/Z) times the detection accuracy value at the target maximum point 700. It then executes step s5 and takes the position in the processing target image corresponding to the position 710 of the current second accuracy value v2 as the right edge 610 of the target detected image region 600. In this case, the position 710 is the position at which the detection target image determination unit 6, examining the detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the right direction DR1, determined that the detection accuracy value had stopped decreasing monotonically before a position appeared at which it is equal to or less than (1/Z) times the detection accuracy value 800a at the target maximum point 700.

If, on the other hand, the detection target image determination unit 6 does not determine in step s6 that the comparison result (v1 - v2) < 0 has occurred the predetermined number of times C in succession, it returns to step s1, takes the current second accuracy value v2 as the new first accuracy value v1, and takes the detection accuracy value 800 to its right as the new second accuracy value v2. The detection target image determination unit 6 then operates in the same manner.

In this way, the detection target image determination unit 6 determines the right edge 610 of the target detected image region 600.
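The flow of steps s1 to s6 can be sketched for a single row of detection accuracy values as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function name `find_edge`, the list representation of the row, and the `step` parameter are hypothetical, and returning the last position when the border of the map is reached is an assumption, because the description does not cover that case.

```python
def find_edge(values, peak_index, step=1, z=4.0, c=2):
    """Return the index taken as the edge when scanning away from the peak.

    values: detection accuracy values along one row of the map.
    peak_index: index of the target maximum point.
    step: +1 scans to the right, -1 scans to the left.
    z: the edge is reached when a value is <= peak / z (Z > 1).
    c: number of consecutive rises treated as "no longer monotonically
       decreasing"; c = 1 corresponds to omitting step s6.
    """
    peak = values[peak_index]
    rises = 0
    i = peak_index
    while 0 <= i + step < len(values):
        v1, v2 = values[i], values[i + step]   # step s1
        if v1 - v2 < 0:                        # steps s2 and s3
            rises += 1                         # step s6
            if rises >= c:
                return i + step                # step s5
        else:
            rises = 0                          # run of rises is broken
            if v2 <= peak / z:                 # step s4
                return i + step                # step s5
        i += step
    return i  # border of the map reached (assumption of this sketch)
```

For example, with the row `[1, 2, 5, 10, 6, 3, 2, 1]`, a peak at index 3, and Z = 4, the scan to the right stops at index 6 and the scan to the left (`step=-1`) stops at index 1.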

Similarly, when determining the left edge 620 of the target detected image region 600, the detection target image determination unit 6 examines the detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the left direction DR2, as shown in FIG. 17, compares two successive detection accuracy values 800 while advancing the pair, and determines the left edge 620 based on the comparison results. Once the right edge 610 and the left edge 620 of the target detected image region 600 have been determined, the width W1 of the target detected image region 600 in the left-right direction (row direction) is determined (see FIGS. 16 and 17).

When determining the upper edge 630 of the target detected image region 600, the detection target image determination unit 6 examines the detection accuracy values 800 in the output value map 300 from the target maximum point 700 along the upward direction DR3, as shown in FIG. 17, compares two successive detection accuracy values 800 while advancing the pair, and determines the upper edge 630 based on the comparison results. When determining the lower edge 640, it likewise examines the detection accuracy values 800 from the target maximum point 700 along the downward direction DR4 and determines the lower edge 640 based on the comparison results. Once the upper edge 630 and the lower edge 640 of the target detected image region 600 have been determined, the width W2 of the target detected image region 600 in the up-down direction (column direction) is determined (see FIG. 17).

In this way, the detection target image determination unit 6 determines the position and size of the quadrangular detected image region by determining its right edge, left edge, upper edge, and lower edge. The detection target image determination unit 6 then takes the outline frame of the determined detected image region as the post-integration detection result frame. The partial image of the processing target image inside the post-integration detection result frame is the detected image region determined to be a face image.
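Combining the four scans into one quadrangular region might look like the following sketch. It is illustrative only: the names `scan_edge` and `detected_region`, the representation of the output value map as a list of rows, and the behaviour at the map border are assumptions that do not come from the description.

```python
def scan_edge(grid, x, y, dx, dy, z=4.0, c=2):
    """Scan from the maximum point (x, y) in direction (dx, dy).

    Returns the (x, y) position taken as the edge in that direction.
    grid is a list of rows of detection accuracy values (hypothetical).
    """
    peak = grid[y][x]
    rises = 0
    height, width = len(grid), len(grid[0])
    while 0 <= x + dx < width and 0 <= y + dy < height:
        v1 = grid[y][x]
        x, y = x + dx, y + dy
        v2 = grid[y][x]
        if v1 - v2 < 0:
            rises += 1
            if rises >= c:        # no longer monotonically decreasing
                break
        else:
            rises = 0
            if v2 <= peak / z:    # fell to (1/Z) of the peak or below
                break
    return x, y  # also reached when the map border stops the scan


def detected_region(grid, x, y, z=4.0, c=2):
    """Return (left, top, right, bottom) around the maximum point (x, y)."""
    right, _ = scan_edge(grid, x, y, 1, 0, z, c)    # direction DR1
    left, _ = scan_edge(grid, x, y, -1, 0, z, c)    # direction DR2
    _, top = scan_edge(grid, x, y, 0, -1, z, c)     # direction DR3
    _, bottom = scan_edge(grid, x, y, 0, 1, z, c)   # direction DR4
    return left, top, right, bottom
```

In this sketch the widths W1 and W2 correspond to `right - left` and `bottom - top`, measured in map positions.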

For each maximum point remaining after the integration processing of maximum points, the detection target image determination unit 6 determines the detected image region (the region that is a face image) corresponding to that maximum point and takes the outline frame of that region as a post-integration detection result frame. As a result, one post-integration detection result frame is obtained for each face image included in the processing target image.

If an obtained detected image region is too small, the detection target image determination unit 6 may regard it as not being a face image and delete it. In other words, if an obtained post-integration detection result frame is too small, the detection target image determination unit 6 may regard the partial image inside that frame as not being a face image and delete the frame.

FIG. 19 is a diagram showing the detected image regions 600 and the post-integration detection result frames 900 (the outline frames 600a of the detected image regions 600) obtained by the detection target image determination unit 6 for the processing target image 20 shown in FIGS. 10 and 13. In FIG. 19, the detected image regions 600 and the post-integration detection result frames 900 are shown superimposed on the processing target image 20.

As shown in FIG. 19, approximately one detected image region 600, and hence approximately one post-integration detection result frame 900, is obtained for each face image included in the processing target image 20. This means that the plurality of detection result frames 150 obtained for one face image (see FIG. 10) have been integrated into one post-integration detection result frame 900 for that face image. Each face image fits appropriately inside its post-integration detection result frame 900. It can therefore be said that the image detection device 1 according to the present embodiment detects face images appropriately.

In the above example, step s6 is executed in order to prevent noise from causing an erroneous determination that the detection accuracy value has stopped decreasing monotonically, but step s6 may be omitted. In that case, step s5 is executed as soon as it is determined in step s3 that (v1 - v2) < 0. That is, if the comparison result (v1 - v2) < 0 is obtained even once, it is determined that the detection accuracy value stopped decreasing monotonically before a position appeared at which it is equal to or less than (1/Z) times the detection accuracy value at the target maximum point 700, and step s5 is executed.

When displaying the processing target image on a display device, the image detection device 1 may display the post-integration detection result frames 900 (the outline frames 600a of the detected image regions 600) superimposed on the processing target image, as shown in FIG. 19.

The image detection device 1 may also compare a face image registered in advance with a detected image region 600 determined to be a face image in the processing target image (the image inside the post-integration detection result frame 900) and determine whether the two match. If the registered face image and the detected image region 600 do not match, the image detection device 1 may apply mosaic processing to the detected image region 600 before displaying the processing target image on the display device. Thus, when the image detection device 1 according to the present embodiment is used in a surveillance camera system, even if the surveillance camera captures the face of a neighbor, that face can be made unrecognizable. In other words, a privacy mask can be realized.
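The mosaic processing mentioned above is not specified further in the description. One common way to obtain a mosaic effect, shown here purely as an assumption, is to replace each small tile inside the detected image region with its average value. The function name `mosaic`, the grayscale list-of-rows image, and the `block` size are all hypothetical; the comparison with the registered face image is outside the scope of this sketch.

```python
def mosaic(image, left, top, right, bottom, block=8):
    """Return a copy of image with the given region pixelated.

    image: list of rows of integer grayscale pixel values (hypothetical).
    left, top, right, bottom: inclusive bounds of the detected image region.
    block: tile size in pixels; each tile is filled with its mean value.
    """
    out = [row[:] for row in image]  # leave the input image untouched
    for by in range(top, bottom + 1, block):
        for bx in range(left, right + 1, block):
            ys = range(by, min(by + block, bottom + 1))
            xs = range(bx, min(bx + block, right + 1))
            pixels = [image[y][x] for y in ys for x in xs]
            mean = sum(pixels) // len(pixels)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

A caller would apply `mosaic` only to those detected image regions that did not match a registered face image, and then display the returned copy.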

As described above, in the present embodiment, a predetermined region of the processing target image that includes the pixel at the same position as a maximum point of the detection accuracy value in the output value map is determined to be the detection target image, where the output value map indicates the distribution over the processing target image of the detection accuracy value, which indicates the likelihood of being the detection target image. Because a maximum point of the detection accuracy value in the output value map is considered to correspond to the center position of a detection target image in the processing target image, determining that a predetermined region including the pixel at the same position as that maximum point is the detection target image allows the detection target image to be detected accurately from the processing target image. That is, the detection accuracy for the detection target image can be improved.

Although the image detection device 1 has been described in detail, the above description is illustrative in all aspects, and the present invention is not limited to it. The modifications described above may be applied in combination as long as they do not contradict one another. It is understood that countless modifications not illustrated here can be envisaged without departing from the scope of the present invention.

DESCRIPTION OF SYMBOLS
1 Image detection device
3 Detection unit
4 Map generation unit
5 Maximum point search unit
6 Detection target image determination unit
12 Control program

Claims

An image detection device that detects a detection target image from a processing target image, comprising:
a detection unit that sets, for the processing target image, a plurality of detection frames at mutually different positions and, for each of the plurality of detection frames, performs detection processing for the detection target image on the image within that detection frame;
a map generation unit that generates, based on the detection results of the detection unit, a map indicating the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image;
a search unit that searches the map for local maximum points of the detection accuracy value; and
a determination unit that performs an integration process of integrating, based on a predetermined criterion, a plurality of the local maximum points that are close to each other in the map into one, and determines that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration process is the detection target image,
wherein the determination unit:
when examining the detection accuracy values in the map starting from the local maximum point after the integration process along a direction away from that local maximum point, sets, as an edge of the predetermined region determined to be the detection target image, the position in the processing target image that is the same as the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) the detection accuracy value at that local maximum point; and
when, in examining the detection accuracy values in the map starting from the local maximum point after the integration process along a direction away from that local maximum point, it determines that the change in the detection accuracy value has ceased to be monotonically decreasing before a position appears at which the detection accuracy value is (1/Z) times or less the detection accuracy value at that local maximum point, sets the position at which the change was determined to have ceased to be monotonically decreasing as the edge of the predetermined region determined to be the detection target image.
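
As an informal illustration only, and not as part of the claim, the edge rule recited above can be sketched in Python for a single direction. The function name `find_region_edge`, the one-dimensional list `profile` (the detection accuracy values sampled outward from the integrated local maximum point), and the fallback to the last sample are all assumptions made for this sketch. It is also assumed that "no longer monotonically decreasing" means a value strictly greater than its predecessor, and that the edge is placed at the sample where the rise is observed.

```python
def find_region_edge(profile, z):
    """Return the offset (in samples) of the region edge along one direction.

    profile[0] is the detection accuracy value at the integrated local
    maximum point; profile[k] is the value k samples away from it.
    z is the constant Z of the claim and must be greater than 1.
    """
    if z <= 1:
        raise ValueError("z must be greater than 1")
    threshold = profile[0] / z
    for k in range(1, len(profile)):
        # Rule 1: first position whose value is (1/Z) times the peak or less.
        if profile[k] <= threshold:
            return k
        # Rule 2 (assumed reading): the value rose again before the
        # threshold was reached, so the decrease is no longer monotonic.
        if profile[k] > profile[k - 1]:
            return k
    # Assumption: if neither rule fires, use the last available sample.
    return len(profile) - 1


if __name__ == "__main__":
    print(find_region_edge([10, 8, 6, 4, 2], z=2))  # threshold rule
    print(find_region_edge([10, 8, 7, 9, 3], z=2))  # monotonicity rule
```

Applying the same function along several directions (for example left, right, up, and down from the local maximum point) would give one edge per direction, from which a bounding region could be formed.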

The image detection device according to claim 1,
wherein, in the map arranged in two-dimensional coordinates, the search unit searches for the local maximum point by repeating a process of calculating, with the detection accuracy values used as weighting coefficients, a weighted average of the coordinate values of the plurality of positions at which the plurality of detection accuracy values included in a processing target region respectively exist, and moving the processing target region so that the center position of the processing target region coincides with the weighted average.
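
As an informal illustration only, and not as part of the claim, the repeated weighted-average step recited above can be sketched as follows. It is similar in spirit to a mean-shift iteration, although the source does not use that term. The function name `search_local_maximum`, the square window of half-width `half_size`, the rounding of the weighted average to integer coordinates, and the iteration cap `max_iter` are assumptions of this sketch.

```python
def search_local_maximum(value_map, start, half_size=1, max_iter=100):
    """Move a square window over value_map until its center stops moving.

    value_map is a list of rows (value_map[y][x]) of non-negative
    detection accuracy values; start is the initial (x, y) window center.
    """
    height, width = len(value_map), len(value_map[0])
    cx, cy = start
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        # Accumulate coordinates weighted by the detection accuracy values
        # inside the current window (clipped to the map bounds).
        for y in range(max(0, cy - half_size), min(height, cy + half_size + 1)):
            for x in range(max(0, cx - half_size), min(width, cx + half_size + 1)):
                weight = value_map[y][x]
                total += weight
                sum_x += weight * x
                sum_y += weight * y
        if total == 0:
            break  # no evidence in the window; stay where we are
        new_center = (round(sum_x / total), round(sum_y / total))
        if new_center == (cx, cy):
            break  # the window center coincides with the weighted average
        cx, cy = new_center
    return cx, cy


if __name__ == "__main__":
    demo_map = [
        [0, 0, 0, 0, 0],
        [0, 0, 1, 2, 1],
        [0, 1, 2, 5, 2],
        [0, 0, 1, 2, 1],
        [0, 0, 0, 0, 0],
    ]
    print(search_local_maximum(demo_map, start=(1, 2)))
```

In this toy map the window drifts from its starting position toward the cell with the largest value, which is the behavior the weighted-average search is meant to produce.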

The image detection device according to claim 2,
wherein the search unit starts the search for the local maximum point with, as the center position of the processing target region, the position that is the same as a predetermined position in a detection frame whose image has been determined, as a result of the detection processing by the detection unit, to be highly likely to be the detection target image.

The image detection device according to claim 2 or 3,
wherein, when calculating the weighted average, the search unit uses the coordinate values of the plurality of positions at which the plurality of detection accuracy values included in the processing target region respectively exist in the map in which the detection accuracy values have been thinned out.

The image detection device according to any one of claims 1 to 4,
wherein the detection target image is a human face image.

A control program for controlling an image detection device that detects a detection target image from a processing target image,
the control program causing the image detection device to execute:
(a) a step of setting, for the processing target image, a plurality of detection frames at mutually different positions and, for each of the plurality of detection frames, performing detection processing for the detection target image on the image within that detection frame;
(b) a step of generating, based on the detection results in step (a), a map indicating the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image;
(c) a step of searching the map for local maximum points of the detection accuracy value;
(d) a step of performing an integration process of integrating, based on a predetermined criterion, a plurality of the local maximum points that are close to each other in the map into one; and
(e) a step of determining that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration process is the detection target image,
wherein, in step (e), the control program causes the image detection device to: when examining the detection accuracy values in the map starting from the local maximum point after the integration process along a direction away from that local maximum point, set, as an edge of the predetermined region determined to be the detection target image, the position in the processing target image that is the same as the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) the detection accuracy value at that local maximum point; and, when it is determined during that examination that the change in the detection accuracy value has ceased to be monotonically decreasing before a position appears at which the detection accuracy value is (1/Z) times or less the detection accuracy value at that local maximum point, set the position at which the change was determined to have ceased to be monotonically decreasing as the edge of the predetermined region determined to be the detection target image.

An image detection method for detecting a detection target image from a processing target image, comprising:
(a) a step of setting, for the processing target image, a plurality of detection frames at mutually different positions and, for each of the plurality of detection frames, performing detection processing for the detection target image on the image within that detection frame;
(b) a step of generating, based on the detection results in step (a), a map indicating the distribution over the processing target image of a detection accuracy value indicating the likelihood of being the detection target image;
(c) a step of searching the map for local maximum points of the detection accuracy value;
(d) a step of performing an integration process of integrating, based on a predetermined criterion, a plurality of the local maximum points that are close to each other in the map into one; and
(e) a step of determining that a predetermined region of the processing target image including the pixel at the same position as the local maximum point after the integration process is the detection target image,
wherein, in step (e):
when the detection accuracy values in the map are examined starting from the local maximum point after the integration process along a direction away from that local maximum point, the position in the processing target image that is the same as the position at which the detection accuracy value first becomes (1/Z) times or less (Z > 1) the detection accuracy value at that local maximum point is set as an edge of the predetermined region determined to be the detection target image; and
when it is determined during that examination that the change in the detection accuracy value has ceased to be monotonically decreasing before a position appears at which the detection accuracy value is (1/Z) times or less the detection accuracy value at that local maximum point, the position at which the change was determined to have ceased to be monotonically decreasing is set as the edge of the predetermined region determined to be the detection target image.