JP2007264887A - Human region detection apparatus, human region detection method and program

Info

Publication number: JP2007264887A
Application number: JP2006087149A
Authority: JP
Inventors: Masahiro Hayakawa (雅弘早川); Munemasa Hirota (宗正弘田); Koji Matsuo (浩次松尾); Kohei Ueda (晃平上田); Sachiko Mizukami (幸子水上)
Original assignee: Dainippon Screen Manufacturing Co Ltd
Current assignee: Dainippon Screen Manufacturing Co Ltd
Priority date: 2006-03-28
Filing date: 2006-03-28
Publication date: 2007-10-11

Abstract

PROBLEM TO BE SOLVED: To provide a technique to specify candidate regions for human regions independently of the pixel color of an image.

SOLUTION: A human region detection apparatus derives the edge strength of each pixel of a target image (Step S21) and sets each group of adjacent pixels whose edge strength belongs to the same level in the target image as a contour region belonging to the level (Step S23). One contour region whose edge strength belongs to the highest level is set as an initial target region, and contour regions near the target region belonging to levels different from the highest level are combined with the target region in descending order of edge strength level (Step S26). The resultant target region is a human candidate region. The human candidate regions are specified based on the edge strength rather than on the pixel color of the target image, so they can be specified independently of the pixel color of the target image.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a technique for detecting, from an image, a person region that represents a person.

In recent years, with the spread of digital cameras, computers, and the like, digital images have come to be handled in a wide range of fields. Accordingly, there is demand for techniques that detect a person region representing a person in such digital images, and various techniques have been proposed.

For example, Patent Document 1 discloses a technique that detects a region of skin-colored pixels in a target image and detects a person region based on the shape of that skin-colored region. Patent Document 2 discloses a technique that detects a region of skin-colored pixels in a target image and detects a person region based on whether elements characteristic of a face (such as eyes and a mouth) are present in that skin-colored region.

Patent Document 1: JP 2000-48184 A
Patent Document 2: JP 2004-5384 A

All of the conventionally proposed person region detection techniques presuppose the detection of a region of skin-colored pixels. However, skin color varies from person to person and also changes with the illumination, so skin color in an image cannot be defined uniquely. A relatively wide tolerance therefore has to be allowed in the skin color criterion, and as a result many regions other than persons are detected.

Moreover, when the target is a grayscale image or an image whose hue has been substantially altered, even a genuine person region is not skin-colored, so a person region cannot be detected by detecting skin-colored regions.

In other words, because the conventional techniques depend on the color of the image pixels, they cannot always correctly identify a person region or a candidate for one.

The present invention has been made in view of the above problem, and an object thereof is to provide a technique that can identify candidate regions for person regions without depending on the color of the image pixels.

To solve the above problem, the invention of claim 1 is a person region detection apparatus for detecting, from a target image, a person region representing a person, the apparatus comprising: means for deriving an edge strength at each pixel of the target image; means for dividing the edge strength into a plurality of levels and setting each group of mutually adjacent pixels in the target image whose edge strength belongs to the same level as a contour region belonging to that level; and combining means for taking one contour region belonging to the level of maximum edge strength as an initial region of interest and combining, with the region of interest, contour regions that belong to a level different from the maximum level and are close to the region of interest so that they become part of the region of interest, wherein the combining means combines the contour regions with the region of interest in descending order of the edge strength level to which they belong, and the resulting region of interest is taken as a person candidate region that is a candidate for the person region.

The invention of claim 2 is the person region detection apparatus according to claim 1, further comprising means for expanding the contour regions before they are combined by the combining means.

The invention of claim 3 is the person region detection apparatus according to claim 1 or 2, further comprising complementing means for filling empty regions that exist inside the person candidate region obtained by the combining means.

The invention of claim 4 is the person region detection apparatus according to claim 3, further comprising eye candidate region extracting means for extracting, in the person candidate region after the empty regions have been filled, each group of mutually adjacent pixels whose edge strength is equal to or greater than a predetermined threshold as an eye candidate region that is a candidate for a region representing a person's eye.

The invention of claim 5 is the person region detection apparatus according to claim 4, wherein the eye candidate region extracting means extracts, as the eye candidate regions, those groups of mutually adjacent pixels whose edge strength is equal to or greater than the predetermined threshold and whose size satisfies a predetermined condition.

The invention of claim 6 is the person region detection apparatus according to claim 4 or 5, further comprising means for excluding, from the person region candidates, any person candidate region from which a predetermined number or more of the eye candidate regions are not extracted.

The invention of claim 7 is the person region detection apparatus according to any one of claims 4 to 6, further comprising determination means for determining whether the person candidate region is the person region based on a result of comparing the sizes of any two of the eye candidate regions extracted from the person candidate region.

The invention of claim 8 is the person region detection apparatus according to any one of claims 4 to 7, further comprising determination means for determining whether the person candidate region is the person region based on the mutual positional relationship of any two of the eye candidate regions extracted from the person candidate region.

The invention of claim 9 is a person region detection method for detecting, from a target image, a person region representing a person, the method comprising: (a) a step of deriving an edge strength at each pixel of the target image; (b) a step of dividing the edge strength into a plurality of levels and setting each group of mutually adjacent pixels in the target image whose edge strength belongs to the same level as a contour region belonging to that level; and (c) a step of taking one contour region belonging to the level of maximum edge strength as an initial region of interest and combining, with the region of interest, contour regions that belong to a level different from the maximum level and are close to the region of interest so that they become part of the region of interest, wherein in step (c) the contour regions are combined with the region of interest in descending order of the edge strength level to which they belong, and the resulting region of interest is taken as a person candidate region that is a candidate for the person region.

The invention of claim 10 is a program for detecting, from a target image, a person region representing a person, the program causing a computer to execute: (a) a step of deriving an edge strength at each pixel of the target image; (b) a step of dividing the edge strength into a plurality of levels and setting each group of mutually adjacent pixels in the target image whose edge strength belongs to the same level as a contour region belonging to that level; and (c) a step of taking one contour region belonging to the level of maximum edge strength as an initial region of interest and combining, with the region of interest, contour regions that belong to a level different from the maximum level and are close to the region of interest so that they become part of the region of interest, wherein in step (c) the contour regions are combined with the region of interest in descending order of the edge strength level to which they belong, and the resulting region of interest is taken as a person candidate region that is a candidate for the person region.

According to the inventions of claims 1 to 10, person candidate regions are identified based on the edge strength of the target image, so they can be identified without depending on the color of the pixels of the target image.

In particular, according to the invention of claim 2, the criterion for the proximity determination used when combining regions can be relaxed.

In particular, according to the invention of claim 3, filling the empty regions yields a person candidate region that more closely resembles a person.

In particular, according to the invention of claim 4, eye candidate regions can be extracted easily based on the edge strength.

In particular, according to the invention of claim 5, eye candidate regions can be extracted more accurately.

In particular, according to the invention of claim 6, person candidate regions that do not contain a predetermined number or more of eye candidate regions are excluded from the person region candidates, so only the more plausible person candidate regions remain as candidates.

In particular, according to the invention of claim 7, a person region can be identified accurately based on the result of comparing the sizes of the eye candidate regions.

In particular, according to the invention of claim 8, a person region can be identified accurately based on the mutual positional relationship of the eye candidate regions.

Embodiments of the present invention are described below with reference to the drawings.

<1. Device configuration>
FIG. 1 is an external view of a person region detection apparatus 10 according to an embodiment of the present invention. The person region detection apparatus 10 has a function of detecting, in a target digital image (hereinafter, "target image"), a region (hereinafter, "person region") representing an image of a person (typically, an image of a person's face). As shown in the figure, the person region detection apparatus 10 is configured as a general-purpose computer and includes a main body 1 serving as the computer, a display 2 that displays various information, and a keyboard 3 and a mouse 4 that accept various inputs from the user.

FIG. 2 is a block diagram schematically showing the internal configuration of the main body 1. The main body 1 includes a CPU 21, a ROM 22, a RAM 23, a hard disk 25 serving as a storage device that stores various information, a reading device 26 that reads a recording medium 91, a communication unit 27 that communicates via a network 92 such as the Internet, and the like, all connected by a bus line 29. The display 2, keyboard 3, and mouse 4 described above are also connected to the bus line 29 via interfaces (I/F). With this configuration, each part of the person region detection apparatus 10 operates under the control of the CPU 21.

A program 41 is stored on the hard disk 25. The program 41 is placed on the hard disk 25 by being read from the recording medium 91, downloaded from an external server connected to the network 92, or the like. The various functions of the person region detection apparatus are realized when the CPU 21 executes the program 41 in cooperation with the RAM 23, which serves as memory.

In the figure, the main functions realized by the CPU 21 executing the program 41 are shown schematically as a person candidate extraction unit 31, a region complementing unit 32, an eye candidate extraction unit 33, and a person determination unit 34.

<2. Processing>
Next, the processing performed by the person region detection apparatus 10 is described. The overall process for detecting person regions from a target image is outlined first, followed by the details of each sub-process that constitutes a step of the overall process.

<2-1. Overview of overall processing>
FIG. 3 shows the overall flow of the process for detecting person regions.

First, the person candidate region extraction process, which is a sub-process, is performed (step S11). This process extracts, from the target image, person candidate regions that are candidates for person regions. It normally yields a plurality of person candidate regions.

Next, one of the extracted person candidate regions is selected as the "target person candidate region" to be processed (step S12). Three sub-processes are then applied to the target person candidate region in this order: the empty region complementing process (step S13), the eye candidate region extraction process (step S14), and the person determination process (step S15).

In the empty region complementing process (step S13), pixels are added to the target person candidate region so that the empty regions inside it are filled. In the eye candidate region extraction process (step S14), eye candidate regions, which are candidates for regions representing a person's eyes, are extracted from the target person candidate region after the empty regions have been filled. In the person determination process (step S15), whether the target person candidate region is a person region is determined based on the extracted eye candidate regions. Person regions are thus detected through these three sub-processes.

Once one target person candidate region has been processed, it is determined whether any unprocessed person candidate region remains (step S16). If one remains (Yes in step S16), the next target person candidate region is selected (step S12) and the same three sub-processes are applied to it. This is repeated until the three sub-processes have been applied to every person candidate region and each has been judged to be a person region or not. All person regions in the target image are detected in this way.
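The control flow of steps S11 to S16 can be sketched roughly as follows. This is only an illustration of the loop structure: the function name `detect_person_regions` and the four callables passed in are hypothetical placeholders for the sub-processes, whose details are described separately, and representing a region as a set of pixel coordinates is an assumption.

```python
def detect_person_regions(target_image, extract_candidates, fill_empty_regions,
                          extract_eye_candidates, is_person):
    """Run the three sub-processes on every person candidate region.

    All four callables are hypothetical stand-ins for the sub-processes
    of steps S11, S13, S14 and S15.
    """
    person_regions = []
    # Step S11: extract the person candidate regions.
    # Steps S12/S16: the for-loop visits each candidate exactly once.
    for candidate in extract_candidates(target_image):
        filled = fill_empty_regions(candidate)      # step S13
        eyes = extract_eye_candidates(filled)       # step S14
        if is_person(filled, eyes):                 # step S15
            person_regions.append(filled)
    return person_regions
```

Passing the sub-processes in as parameters keeps the sketch self-contained; an actual implementation would more likely call fixed functions.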

<2-2. Person candidate region extraction process>
Next, the person candidate region extraction process (FIG. 3: step S11) is described in detail. FIG. 4 shows the detailed flow of the process. Unless otherwise noted, every step of the person candidate region extraction process is performed by the person candidate extraction unit 31 shown in FIG. 2.

First, the edge strength at each pixel of the target image is derived (step S21). Specifically, each pixel of the target image is convolved with the Laplacian filter 5 shown in FIG. 5, producing an edge image that represents the edge strength of each pixel. For example, when the image shown in the upper part of FIG. 6 is the target image 61, applying the Laplacian filter 5 yields the edge image 62 shown in the lower part of FIG. 6. The edge image 62 has the same number of pixels as the target image.

Each pixel of the edge image 62 represents the edge strength at the corresponding pixel of the target image. In the convolution with the Laplacian filter 5, the larger the luminance difference between the center pixel and its surrounding pixels (that is, the larger the edge strength), the larger the resulting value. The pixel value of a pixel in the edge image 62 is therefore larger the greater the edge strength of the corresponding pixel in the target image 61.
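As a minimal sketch of step S21, the following pure-Python code convolves a luminance image with a 3x3 Laplacian kernel. The coefficients of the filter in FIG. 5 are not given in the text, so the common 8-neighbor kernel is assumed here. Clamping at the image border and taking the absolute value of the response as the edge strength are also assumptions made for illustration.

```python
# Assumed 8-neighbor Laplacian kernel; the actual coefficients of the
# filter shown in FIG. 5 are not stated in the text.
LAPLACIAN = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def edge_strength(image):
    """Return an edge image with the same size as `image`.

    `image` is a list of rows of luminance values.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    # Clamp coordinates so that border pixels reuse
                    # their nearest in-image neighbor (an assumption).
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    total += LAPLACIAN[ky][kx] * image[sy][sx]
            # Assumption: the magnitude of the response is the edge strength.
            edges[y][x] = abs(total)
    return edges
```

With this kernel a uniform area gives a response of zero, and the response grows with the luminance difference between the center pixel and its neighbors, which matches the behavior described above.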

Next, based on the generated edge image 62, mask data is generated, which is data indicating the positions of the pixels in the target image 61 whose edge strength belongs to the same level (FIG. 4: step S22).

In this embodiment, the edge strength is divided into three levels: level 0, level 1, and level 2. Level 0 is the level of maximum edge strength, and the edge strength decreases in the order of level 1 and level 2. Specifically, an edge strength is classified as level 0 if it is equal to or greater than a threshold Th1, as level 1 if it is less than Th1 and equal to or greater than a threshold Th2 (Th2 < Th1), and as level 2 if it is less than Th2 and equal to or greater than a threshold Th3 (0 < Th3 < Th2).
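The three-level classification can be written as a small function. The function name `classify_level` is hypothetical, and returning `None` for an edge strength below Th3 is an assumption; the text only defines the three levels and does not name what falls below them.

```python
def classify_level(strength, th1, th2, th3):
    """Map an edge strength to level 0, 1 or 2, or None if below th3."""
    if not 0 < th3 < th2 < th1:
        raise ValueError("thresholds must satisfy 0 < Th3 < Th2 < Th1")
    if strength >= th1:
        return 0  # largest edge strength
    if strength >= th2:
        return 1
    if strength >= th3:
        return 2  # smallest edge strength that still has a level
    return None   # below Th3: belongs to no level (assumption)
```

For instance, with illustrative thresholds Th1 = 80, Th2 = 40, and Th3 = 10 (values chosen only for this example), an edge strength of 50 falls in level 1.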

As shown in FIG. 7, mask data 63a to 63c are generated, one for each level. Each mask data is binary image data in which every pixel has the value "0" or "1", and it has the same number of pixels as the target image 61 and the edge image 62.

In the level 0 mask data 63a, the pixels corresponding to pixels of the edge image 62 whose pixel value (that is, edge strength) belongs to level 0 have the value "1", and all other pixels have the value "0". Similarly, in the level 1 mask data 63b, the pixels corresponding to pixels of the edge image 62 whose pixel value belongs to level 1 have the value "1", and all other pixels have the value "0". Likewise, in the level 2 mask data 63c, the pixels corresponding to pixels of the edge image 62 whose pixel value belongs to level 2 have the value "1", and all other pixels have the value "0".

The level 0 mask data 63a therefore indicates the positions of the pixels in the target image 61 whose edge strength belongs to level 0 (pixels with relatively large edge strength). Similarly, the level 1 mask data 63b indicates the positions of the pixels in the target image 61 whose edge strength belongs to level 1, and the level 2 mask data 63c indicates the positions of the pixels in the target image 61 whose edge strength belongs to level 2 (pixels with relatively small edge strength).
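The generation of the per-level mask data (step S22) might look like the sketch below. The function name `build_masks` and the list-of-rows representation are assumptions for illustration; the thresholds are passed as a tuple (Th1, Th2, Th3).

```python
def build_masks(edge_image, thresholds):
    """Return [mask_level0, mask_level1, mask_level2] as binary images.

    `thresholds` is (Th1, Th2, Th3) with 0 < Th3 < Th2 < Th1.
    """
    th1, th2, th3 = thresholds
    # Half-open strength interval [low, high) for levels 0, 1 and 2.
    bounds = [(th1, float("inf")), (th2, th1), (th3, th2)]
    masks = []
    for low, high in bounds:
        mask = [[1 if low <= value < high else 0 for value in row]
                for row in edge_image]
        masks.append(mask)
    return masks
```

Each mask has the same dimensions as the edge image, and a pixel is "1" in at most one mask. A pixel whose edge strength is below Th3 is "0" in all three.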

Next, based on the generated mask data, each group of mutually adjacent pixels whose edge strength belongs to the same level is set as a contour region belonging to that level (FIG. 4: step S23).

Specifically, as shown in FIG. 7, the mask data 63a to 63c are used to generate, for each level, images 64a to 64c consisting only of the pixels of the target image 61 that correspond to pixels with the value "1" in the mask data. Each of the images 64a to 64c is thus an image consisting only of the pixels of the target image 61 whose edge strength belongs to the same level (hereinafter, "contour image").

Then, in each of the contour images 64a to 64c, each group of pixels that are adjacent in the horizontal or vertical direction and form a single region is set as one contour region belonging to that level and stored in the RAM 23. A contour region is set as belonging to the level of the contour image that contains it. For example, a contour region contained in the level 0 contour image 64a is set as a contour region belonging to level 0.

One or more contour regions are set from a single contour image. For example, in the level 0 contour image 64a illustrated in FIG. 7, a region A1 near the eye, a region A2 near the ear, a region A3 near the nose, a region A4 near the mouth, and so on are each set as contour regions belonging to level 0. In the level 1 contour image 64b illustrated in FIG. 7, a region A5 corresponding to the skin and so on are set as contour regions belonging to level 1.
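Setting the contour regions of one level (step S23) amounts to labeling 4-connected components, since adjacency is judged only horizontally and vertically. The sketch below does this with a breadth-first search over a binary mask. The function name `contour_regions` and the representation of a region as a set of (row, column) tuples are assumptions, and the text does not say which labeling algorithm the apparatus uses.

```python
from collections import deque


def contour_regions(mask):
    """Split the '1' pixels of a binary mask into 4-connected regions.

    Returns a list of sets of (row, column) coordinates.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search from an unvisited '1' pixel.
            region = set()
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.add((cy, cx))
                # Only horizontal and vertical neighbors count as adjacent.
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```

Under this rule, two pixels that touch only diagonally end up in separate contour regions.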

Returning to FIG. 4, one of the level 0 contour regions is then selected as the initial region of interest (step S24). Next, level 1 is set as the target level to be processed (step S25), and the contour region combining process is performed, in which contour regions belonging to the target level (level 1) are combined, under a predetermined condition, with the region of interest (a contour region belonging to level 0) (step S26).

FIG. 8 shows the detailed flow of the contour region combining process (FIG. 4: step S26). First, one of the contour regions belonging to the target level is selected as the "target contour region" to be processed (step S31).

Next, it is determined whether the target contour region is close to the region of interest (step S32). This proximity determination is based on whether at least some pixels of the two regions are adjacent to each other in the horizontal or vertical direction.

For example, suppose the region illustrated in FIG. 9 is a region of interest TA1 and the region illustrated in FIG. 10 is a target contour region EA1. If, as shown in FIG. 11, pixels of the two regions TA1 and EA1 are adjacent to each other in the target image, the target contour region EA1 is determined to be close to the region of interest TA1. If, on the other hand, as shown in FIG. 12, no pixels of the two regions TA1 and EA1 are adjacent in the target image (the regions are separated), the target contour region EA1 is determined not to be close to the region of interest TA1. Because pixel adjacency is judged in the horizontal or vertical direction, the pixels do not count as adjacent even in the state of FIG. 12.
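The proximity determination of step S32 can be sketched as a check for any horizontally or vertically adjacent pair of pixels. The function name `is_close` and the representation of regions as sets of (row, column) tuples are assumptions for illustration.

```python
def is_close(region_of_interest, contour_region):
    """Return True if any pixel of `contour_region` is horizontally or
    vertically adjacent to a pixel of `region_of_interest`."""
    for y, x in contour_region:
        for neighbor in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if neighbor in region_of_interest:
                return True
    return False
```

With set-based regions each membership test takes constant time on average, so the check is linear in the size of the contour region.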

If this determination finds that the target contour region is close to the region of interest (FIG. 8: Yes in step S32), the target contour region is set as a combination target to be combined with the region of interest (step S33). If the target contour region is not close to the region of interest (No in step S32), it is excluded without being set as a combination target.

Once the proximity determination has been made for one target contour region, it is determined whether any unprocessed contour region belonging to the target level remains (step S34). If one remains (Yes in step S34), the next target contour region is selected (step S31) and the same proximity determination is applied to it. This is repeated until the proximity determination has been made for every contour region belonging to the target level, and every contour region found to be close has been set as a combination target.

Next, all contour regions set as combination targets are combined with the region of interest and become part of it (step S35). That is, the whole region consisting of the original region of interest and all contour regions set as combination targets becomes the new region of interest. In the example of FIG. 11, the original region of interest TA1 and the contour region EA1 are combined to form a new region of interest TA2.
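Steps S31 to S35 for a single target level can be sketched as follows. Names such as `combine_level` are hypothetical. As in the flow described above, every contour region is tested against the region of interest as it stood before any combining at this level, and the combination targets are merged only afterwards.

```python
def is_close(region_of_interest, contour_region):
    """4-neighbor adjacency test between two sets of (row, column) pixels."""
    return any(
        neighbor in region_of_interest
        for y, x in contour_region
        for neighbor in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
    )


def combine_level(region_of_interest, level_regions):
    """Combine every close contour region of one level into the region
    of interest and return the new region of interest."""
    # Steps S31 to S34: collect the combination targets first.
    targets = [r for r in level_regions if is_close(region_of_interest, r)]
    # Step S35: merge all combination targets at once.
    new_region = set(region_of_interest)
    for region in targets:
        new_region |= region
    return new_region
```

A contour region that is not adjacent to the region of interest is left out, even if it lies nearby.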

Note that, in the proximity determination of step S32, the target contour region may first be expanded by adding a predetermined number of pixels (for example, one or two) in both the horizontal and vertical directions to the pixels forming its periphery, and the proximity determination may then be performed. Because the target contour region is expanded, the proximity criterion is relaxed by the number of added pixels, and regions that have become separated under the influence of noise or the like can be judged to be close. For example, even in the state of FIG. 12, the target contour region EA1 is judged to be close to the region of interest TA1. Such an expansion may cause some pixels to overlap; overlapping regions should likewise be judged to be close.
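The relaxed proximity determination can be sketched by growing the target contour region before testing it. The names `expand` and `is_close_relaxed` are hypothetical. Growing by one pixel per iteration in the four axis directions is one possible reading of "adding pixels in both the horizontal and vertical directions", and image bounds are ignored for brevity.

```python
NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def expand(region, pixels=1):
    """Grow a set of (row, column) pixels by `pixels` steps in the
    horizontal and vertical directions (image bounds are not checked)."""
    expanded = set(region)
    for _ in range(pixels):
        added = {(y + dy, x + dx)
                 for y, x in expanded for dy, dx in NEIGHBOR_OFFSETS}
        expanded |= added
    return expanded


def is_close_relaxed(region_of_interest, contour_region, pixels=1):
    """Proximity test performed after expanding the contour region."""
    grown = expand(contour_region, pixels)
    # An overlap caused by the expansion also counts as close.
    if grown & region_of_interest:
        return True
    return any((y + dy, x + dx) in region_of_interest
               for y, x in grown for dy, dx in NEIGHBOR_OFFSETS)
```

In this sketch a one-pixel expansion is enough to treat two regions that touch only diagonally as close, while a region three pixels away along a row needs a two-pixel expansion. These two cases are illustrative examples and are not taken from the figures.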

The description now returns to FIG. 4. When the contour region combining process (step S26) is completed as described above, it is determined whether a lower level, that is, a level with smaller edge strength than the current target level, exists (step S27). If a lower level exists, the level one below the current target level is set as the new target level (step S28). For example, if the current target level is level 1, level 2 becomes the new target level. The contour region combining process is then performed again (step S26), combining contour regions belonging to the new target level with the attention region.

As described above, in this embodiment the edge strength is divided into three levels: level 0, level 1, and level 2. The above processing therefore performs the contour region combining process with level 1 as the target level and then with level 2 as the target level. Consequently, level 1 contour regions are first combined with the initial level 0 attention region, and level 2 contour regions are then combined with the resulting attention region.

In another embodiment, the edge strength may be divided into four or more levels. In that case too, the above processing performs the contour region combining process for each level other than level 0 in turn, in descending order of edge strength. In other words, contour regions are combined with the attention region starting from those belonging to the levels with greater edge strength.

When the contour region combining process has been performed for all levels other than level 0, the attention region obtained at that point is taken as a person candidate region (step S29). That is, the person candidate region is the result of combining contour regions of each level, in descending order of edge strength, with the initial level 0 attention region.
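The level-by-level combining loop of steps S26 to S29 can be sketched as below. This is an illustrative sketch under stated assumptions: regions are sets of (x, y) pixels, `regions_by_level` maps a level number (1, 2, ...) to the contour regions of that level, and the default proximity test is plain 4-neighbour contact. The names `touches` and `grow_candidate` are hypothetical.

```python
def touches(attention, candidate):
    """Assumed proximity rule: overlap or 4-neighbour contact."""
    if attention & candidate:
        return True
    return any(
        (x + dx, y + dy) in attention
        for (x, y) in candidate
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )


def grow_candidate(seed, regions_by_level, is_close=touches):
    """Grow a level 0 seed region into a person candidate region."""
    attention = set(seed)
    # Level numbers ascend as edge strength descends, so sorted() visits
    # the stronger levels first.
    for level in sorted(regions_by_level):
        # Steps S31 to S34: judge every region of this level against the
        # current attention region before merging anything.
        targets = [r for r in regions_by_level[level] if is_close(attention, r)]
        # Step S35: merge all combination targets at once.
        for region in targets:
            attention |= region
    return attention


seed = {(0, 0)}
regions_by_level = {
    1: [{(1, 0)}, {(5, 5)}],   # only the first touches the seed
    2: [{(2, 0)}, {(1, 1)}],   # both touch the level 1 pixel merged above
}
print(sorted(grow_candidate(seed, regions_by_level)))
```

Because the targets of a level are collected before any of them is merged, a region that only touches another region of the same level is not pulled in during that pass, which matches the order of steps S31 to S35 as described.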

In general, in an image showing a person's face, the eye regions have the greatest edge strength, so a level 0 region is likely to be an eye region. Therefore, as shown in FIG. 13, if a level 0 region is taken as the attention region TA and nearby lower-level contour regions EA are combined with it in descending order of edge strength, the resulting region NA is likely to be a region showing a person. This embodiment extracts person candidate regions on this principle.

Returning to FIG. 4, when a person candidate region has been extracted with one level 0 contour region as the attention region, it is determined whether any unprocessed contour region that belongs to level 0 and has not yet been used as an attention region remains (step S30). If one remains (Yes in step S30), the next contour region is selected as the initial attention region (step S24), nearby lower-level contour regions are combined with it in the same way, and a person candidate region is extracted. This is repeated until, for every contour region belonging to level 0, lower-level contour regions have been combined and a person candidate region has been extracted.

<2-3. Empty region complementing process>
Next, the empty region complementing process (FIG. 3: step S13) is described in detail. Because of the way the algorithm works, a person candidate region extracted by the person candidate region extraction process described above often contains, as shown in the lower part of FIG. 13, empty regions Sa (missing regions where pixels have dropped out) scattered inside the closed region NA that are not recognized as part of the region. The empty region complementing process fills in these empty regions in the target person candidate region, producing a target person candidate region without empty regions.

FIG. 14 is a diagram showing the detailed flow of the empty region complementing process. Unless otherwise noted, every step of the process is performed by the region complementing unit 32 shown in FIG. 2.

First, mask data indicating where the target person candidate region lies in the target image is generated (step S41). The mask data is image data with the same number of pixels as the target image, in which each pixel takes one of two values, "0" or "1". A pixel of the mask data has the value "1" if the corresponding pixel of the target image is included in the target person candidate region, and "0" otherwise. For example, the mask data MD0 illustrated in the upper part of FIG. 15 is generated from the person candidate region NA illustrated in the lower part of FIG. 13. In FIG. 15, pixels of the mask data whose value is "1" are shown hatched.

In this process, as shown in FIG. 15, an XY coordinate system is set on the mask data, with the horizontal direction as the X axis and the vertical direction as the Y axis. The origin is the upper left corner of the mask data, the +X side is to the right, and the +Y side is downward. Each pixel of the mask data is denoted P(x, y) using its coordinates (x, y) in this system. For example, the upper left pixel is P(1, 1), the pixel to its right is P(2, 1), and the pixel below it is P(1, 2).

Once the mask data has been generated, each horizontal pixel row of the mask data is scanned in the +X direction, and the operation of equation (1) below is applied to each pixel (step S42).

P(x, y) = P(x-1, y) OR P(x, y) ... (1)
That is, for each pixel, the logical OR of the pixel's value and the value of the pixel adjacent on its left becomes the pixel's new value. The pixels are processed in order from the left end to the right end. Consequently, if a horizontal pixel row contains a pixel whose value is "1", every pixel to its right also takes the value "1". This is done for every horizontal pixel row of the mask data, producing the first correction data. For example, the first correction data MD1 illustrated in the lower part of FIG. 15 is generated from the mask data MD0 illustrated in the upper part of FIG. 15.

Next, in the opposite direction to step S42, each horizontal pixel row of the mask data is scanned in the -X direction, and the operation of equation (2) below is applied to each pixel (step S43).

P(x, y) = P(x, y) OR P(x+1, y) ... (2)
That is, for each pixel, the logical OR of the pixel's value and the value of the pixel adjacent on its right becomes the pixel's new value. The pixels are processed in order from the right end to the left end. Consequently, if a horizontal pixel row contains a pixel whose value is "1", every pixel to its left also takes the value "1". This is done for every horizontal pixel row of the mask data, producing the second correction data. For example, the second correction data MD2 illustrated in the lower part of FIG. 15 is generated from the mask data MD0 illustrated in the upper part of FIG. 15.

Next, each vertical pixel column of the mask data is scanned in the +Y direction, and the operation of equation (3) below is applied to each pixel (step S44).

P(x, y) = P(x, y-1) OR P(x, y) ... (3)
That is, for each pixel, the logical OR of the pixel's value and the value of the pixel adjacent above it becomes the pixel's new value. The pixels are processed in order from the top end to the bottom end. Consequently, if a vertical pixel column contains a pixel whose value is "1", every pixel below it also takes the value "1". This is done for every vertical pixel column of the mask data, producing the third correction data. For example, the third correction data MD3 illustrated in the lower part of FIG. 15 is generated from the mask data MD0 illustrated in the upper part of FIG. 15.

Next, in the opposite direction to step S44, each vertical pixel column of the mask data is scanned in the -Y direction, and the operation of equation (4) below is applied to each pixel (step S45).

P(x, y) = P(x, y) OR P(x, y+1) ... (4)
That is, for each pixel, the logical OR of the pixel's value and the value of the pixel adjacent below it becomes the pixel's new value. The pixels are processed in order from the bottom end to the top end. Consequently, if a vertical pixel column contains a pixel whose value is "1", every pixel above it also takes the value "1". This is done for every vertical pixel column of the mask data, producing the fourth correction data. For example, the fourth correction data MD4 illustrated in the lower part of FIG. 15 is generated from the mask data MD0 illustrated in the upper part of FIG. 15.

Once the first to fourth correction data have been obtained in this way, the logical AND of these four correction data is computed. This produces complementary mask data, in which the pixels corresponding to empty regions in the original mask data are filled in (step S46). For example, the complementary mask data MD5 shown in the lower part of FIG. 15 is generated from the first to fourth correction data MD1 to MD4 illustrated in FIG. 15. Comparing the mask data MD0 with the complementary mask data MD5 shows that the pixels corresponding to empty regions are filled in MD5.
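Steps S42 to S46 can be sketched as follows. This is a minimal sketch assuming the mask is a list of rows of 0/1 integers indexed as `mask[y][x]` with 0-based indices (the text uses 1-based coordinates); the function name `fill_empty_regions` is hypothetical.

```python
def fill_empty_regions(mask):
    """Return the complementary mask (MD5) for a binary mask (MD0)."""
    h, w = len(mask), len(mask[0])
    # Each correction data set starts from a copy of the original mask MD0.
    md1 = [row[:] for row in mask]
    md2 = [row[:] for row in mask]
    md3 = [row[:] for row in mask]
    md4 = [row[:] for row in mask]

    for y in range(h):
        for x in range(1, w):                # equation (1): scan in +X
            md1[y][x] |= md1[y][x - 1]
        for x in range(w - 2, -1, -1):       # equation (2): scan in -X
            md2[y][x] |= md2[y][x + 1]
    for x in range(w):
        for y in range(1, h):                # equation (3): scan in +Y
            md3[y][x] |= md3[y - 1][x]
        for y in range(h - 2, -1, -1):       # equation (4): scan in -Y
            md4[y][x] |= md4[y + 1][x]

    # Step S46: logical AND of the four correction data.
    return [
        [md1[y][x] & md2[y][x] & md3[y][x] & md4[y][x] for x in range(w)]
        for y in range(h)
    ]


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in fill_empty_regions(ring):
    print(row)
```

A pixel survives the AND only if it is "1" itself or has a "1" somewhere to its left, to its right, above it, and below it in the original mask. The hole in the middle of the ring is therefore filled, while the background around the ring stays "0".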

Once the complementary mask data has been obtained, it is used to acquire the region made up of the pixels of the target image that correspond to pixels with the value "1" in the complementary mask data. This yields the target person candidate region with its empty regions complemented (step S47).

<2-4. Eye candidate region extraction process>
Next, the eye candidate region extraction process (FIG. 3: step S14) is described in detail. In this process, eye candidate regions, which are candidates for eye regions, are extracted from the target person candidate region whose empty regions have been complemented as described above. As noted earlier, the eye regions have the greatest edge strength in an image showing a person's face, so the edge strength is also used to extract the eye candidate regions.

FIG. 16 is a diagram showing the detailed flow of the eye candidate region extraction process. Unless otherwise noted, every step of the process is performed by the eye candidate extraction unit 33 shown in FIG. 2.

First, the edge strength of each pixel in the target person candidate region is derived (step S51). As in step S21 of FIG. 4, the edge strength is derived by convolution with the Laplacian filter 5.

Next, each group of mutually adjacent pixels whose derived edge strength is equal to or greater than a predetermined threshold Th0 is set as a high edge region (step S52). One or more high edge regions are set in this way. Apart from the threshold used, this method is substantially the same as the method for setting level 0 contour regions described above. Therefore, if the threshold Th0 may be the same as the threshold Th1, the level 0 contour regions contained in the target person candidate region may be used as the high edge regions as they are.
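Grouping adjacent above-threshold pixels into high edge regions (step S52) amounts to connected-component labelling. The sketch below is one possible way to do it, not necessarily the method used by the apparatus. It assumes that `edge` is a list of rows of edge strengths indexed as `edge[y][x]`, that "adjacent" means 4-connected (the text does not say whether diagonal neighbours count), and the name `high_edge_regions` is hypothetical.

```python
from collections import deque


def high_edge_regions(edge, th0):
    """Return a list of regions, each a set of (x, y) pixels with edge strength >= th0."""
    h, w = len(edge), len(edge[0])
    seen = set()
    regions = []
    for y in range(h):
        for x in range(w):
            if edge[y][x] < th0 or (x, y) in seen:
                continue
            # Breadth-first search collects one connected group of pixels.
            region = set()
            queue = deque([(x, y)])
            seen.add((x, y))
            while queue:
                cx, cy = queue.popleft()
                region.add((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and (nx, ny) not in seen
                            and edge[ny][nx] >= th0):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            regions.append(region)
    return regions


edge = [
    [9, 9, 0, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 8],
]
for region in high_edge_regions(edge, th0=8):
    print(sorted(region))
```

In this toy input the two strong pixels in the top row form one region and the two in the right column form another.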

Next, one of the high edge regions that have been set is selected as the "target high edge region" to be processed (step S53), and it is determined whether the size of the target high edge region satisfies a predetermined condition (step S54). The size of the target high edge region is expressed by three size variables, and it is determined whether each size variable is equal to or greater than its predetermined threshold.

One of the size variables is the "number of pixels" (Pnum) contained in the target high edge region. It is determined whether this number of pixels (Pnum) is equal to or greater than a predetermined threshold (PnumTh). Specifically, it is determined whether equation (5) below is satisfied.

Pnum > PnumTh ... (5)
Another size variable is the "rectangle area ratio" (Rar). As shown in FIG. 17, this is the ratio of the number of pixels (Pnum) contained in the target high edge region EN to the number of pixels (RaPn) of the circumscribed rectangle Re, the smallest rectangle circumscribing the target high edge region EN. The rectangle area ratio (Rar) is given by equation (6) below.

Rar = Pnum / RaPn ... (6)
It is then determined whether this rectangle area ratio (Rar) is equal to or greater than a predetermined threshold (RarTh). Specifically, it is determined whether equation (7) below is satisfied.

Rar > RarTh ... (7)
The remaining size variable is the "image area ratio" (Aar), the ratio of the number of pixels (Pnum) contained in the target high edge region to the total number of pixels (AaPn) of the target image. The image area ratio (Aar) is given by equation (8) below.

Aar = Pnum / AaPn ... (8)
It is then determined whether this image area ratio (Aar) is equal to or greater than a predetermined threshold (AarTh). Specifically, it is determined whether equation (9) below is satisfied.

Aar > AarTh ... (9)
When all three size variables satisfy their conditions (that is, when equations (5), (7), and (9) are all satisfied), the size of the target high edge region is judged to satisfy the predetermined condition (FIG. 16: Yes in step S54). In that case the target high edge region is set as an eye candidate region (step S55). If the size of the target high edge region does not satisfy the predetermined condition (that is, the region is relatively small), it is not set as an eye candidate region.

Once the size of one target high edge region has been judged in this way, it is determined whether any unprocessed high edge region remains (step S56). If one remains (Yes in step S56), the next target high edge region is selected (step S53) and its size is judged in the same way. This is repeated until the size of every high edge region has been judged, and only the high edge regions that satisfy the condition are extracted as eye candidate regions. Relatively small high edge regions, which are unlikely to be eyes, are thereby excluded, so the eye candidate regions can be extracted more accurately.

Once the eye candidate regions have been extracted in this way, it is determined whether two or more eye candidate regions were extracted (step S57). Because a person's face has two eyes, a person region should contain at least two eye candidate regions. Therefore, if two or more eye candidate regions are not extracted from the target person candidate region (No in step S57), the target person candidate region is excluded from the candidates for a person region (step S58). In that case the subsequent person determination process (FIG. 3: step S15) is skipped, and processing proceeds directly to step S16 of FIG. 3.

<2-5. Person determination process>
Next, the person determination process (FIG. 3: step S15) is described in detail. In this process, whether the target person candidate region is a person region is determined on the basis of any two of the eye candidate regions extracted from the target person candidate region as described above. If two eye candidate regions really are a person's eyes, they should be of approximately the same size and should lie within a certain range of each other. The person determination process therefore assigns a score to the size comparison of the two eye candidate regions and to their relative arrangement, with a higher score indicating a higher likelihood of a person region. If the score is equal to or greater than a predetermined value, the target person candidate region is judged to be a person region.

FIG. 18 is a diagram showing the detailed flow of the person determination process. Unless otherwise noted, every step of the process is performed by the person determination unit 34 shown in FIG. 2.

First, two of the eye candidate regions extracted from the target person candidate region are selected as the target eye candidate regions to be processed (step S61).

Next, the sizes of the two target eye candidate regions are compared, and a first determination score used for the person determination is obtained from the comparison (step S62). Specifically, the circumscribed rectangle of each of the two target eye candidate regions is considered, and the first determination score P1 is derived so that it is higher the closer the two circumscribed rectangles are in size (that is, the higher the likelihood of a person region). Let Ht1 and Wt1 be the vertical and horizontal widths of one circumscribed rectangle, and Ht2 and Wt2 those of the other. The first determination score P1 is then given by equation (10) below.

P1 = {1 - 2 * ABS[(Ht1 - Ht2) / (Ht1 + Ht2)]} * {1 - 2 * ABS[(Wt1 - Wt2) / (Wt1 + Wt2)]}
(where ABS[ ] denotes the absolute value of the bracketed expression) ... (10)
Once the first determination score P1 has been derived, a second determination score, also used for the person determination, is obtained from the relative arrangement of the two target eye candidate regions (step S63).

Specifically, as shown in FIG. 19, let DLAv be the average of the length of the diagonal DL1 of the circumscribed rectangle Re1 of one target eye candidate region and the length of the diagonal DL2 of the circumscribed rectangle Re2 of the other. Further, consider the smallest rectangle Re3 that contains the two circumscribed rectangles Re1 and Re2 (hereinafter the "inclusion rectangle"), and let DLL be the length of its diagonal DL3. The value K given by equation (11) below is then derived.

K = DLL / DLAv ... (11)
Then, with α a predetermined constant, the second determination score P2 is derived by equation (12) below.

When K ≥ α,
P2 = α / K;
when K < α,
P2 = K / α ... (12)
That is, the determination score P2 is derived so that it is higher the closer the value K is to the predetermined constant α. The constant α is set to "3", for example. In general, in a person's face, the span occupied by the two eyes is about three times the length of one eye. Equation (12) therefore treats the length (DLL) of the diagonal DL3 of the inclusion rectangle Re3 as the span occupied by the two eyes and the average length (DLAv) of the diagonals of the two circumscribed rectangles as the length of one eye, and gives a higher determination score P2 the closer the value K is to the constant α (= "3"), that is, the higher the likelihood of a person region.

Once the two determination scores P1 and P2 have been derived, they are added to give the total determination score P3 (FIG. 18: step S64).
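Equations (10) to (12) and the sum P3 can be sketched as follows. This is an illustrative sketch with assumptions: each eye candidate region is a set of (x, y) pixels, rectangle widths are counted in pixels (maximum minus minimum plus one), and α defaults to 3 as in the example in the text. The names `bounding_box` and `score_pair` are hypothetical.

```python
import math


def bounding_box(region):
    """Return (x0, y0, x1, y1) of the circumscribed rectangle of a region."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), min(ys), max(xs), max(ys)


def score_pair(region_a, region_b, alpha=3.0):
    """Return (P1, P2, P3) for two eye candidate regions."""
    ax0, ay0, ax1, ay1 = bounding_box(region_a)
    bx0, by0, bx1, by1 = bounding_box(region_b)
    wt1, ht1 = ax1 - ax0 + 1, ay1 - ay0 + 1
    wt2, ht2 = bx1 - bx0 + 1, by1 - by0 + 1

    # Equation (10): size similarity of the two circumscribed rectangles.
    p1 = ((1 - 2 * abs((ht1 - ht2) / (ht1 + ht2)))
          * (1 - 2 * abs((wt1 - wt2) / (wt1 + wt2))))

    # Equation (11): inclusion-rectangle diagonal over mean eye diagonal.
    dlav = (math.hypot(wt1, ht1) + math.hypot(wt2, ht2)) / 2
    wt3 = max(ax1, bx1) - min(ax0, bx0) + 1
    ht3 = max(ay1, by1) - min(ay0, by0) + 1
    k = math.hypot(wt3, ht3) / dlav

    # Equation (12): highest (1.0) when K equals alpha.
    p2 = alpha / k if k >= alpha else k / alpha

    return p1, p2, p1 + p2


left_eye = {(x, y) for x in range(0, 4) for y in range(0, 2)}
right_eye = {(x, y) for x in range(8, 12) for y in range(0, 2)}
p1, p2, p3 = score_pair(left_eye, right_eye)
print(p1, p2, p3)
```

For two rectangles of identical size P1 is exactly 1, and P2 approaches 1 as the span of the pair approaches α times the size of one eye, so P3 is at most 2.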

Once the total determination score P3 has been derived for one combination of two eye candidate regions in this way, it is determined whether any unprocessed combination remains (step S65). If one remains (Yes in step S65), the next combination of two eye candidate regions is selected (step S61), and the total determination score P3 is derived for it in the same way. This is repeated until the total determination score P3 has been derived for every combination. If n eye candidate regions were extracted from the target person candidate region, the number of combinations is nC2 = n(n-1)/2, so nC2 total determination scores P3 are derived.

Next, the maximum of the derived total determination scores P3 is obtained, and it is determined whether this maximum is equal to or greater than a predetermined threshold (step S66). If the maximum total determination score P3 is equal to or greater than the threshold, the target person candidate region is judged to be a person region (step S67); if it is less than the threshold, the target person candidate region is judged not to be a person region (step S68).
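The loop over all pairs (steps S61 to S65) and the final decision (steps S66 to S68) can be sketched as below. The pair-scoring function is passed in as a parameter so that the sketch stays independent of how P3 is computed; the name `is_person_region`, the toy scoring function, and the threshold values are hypothetical.

```python
from itertools import combinations


def is_person_region(eye_candidates, score, threshold):
    """Return True if the best-scoring pair of eye candidates reaches the threshold."""
    # combinations(..., 2) yields each of the nC2 unordered pairs exactly once.
    totals = [score(a, b) for a, b in combinations(eye_candidates, 2)]
    if not totals:
        return False  # fewer than two candidates: no pair to judge
    return max(totals) >= threshold


def toy_score(a, b):
    """Toy stand-in for P3: higher when two numbers are closer together."""
    return 2 - abs(a - b)


print(is_person_region([1.0, 5.0, 1.5], toy_score, threshold=1.5))
print(is_person_region([1.0, 5.0, 1.5], toy_score, threshold=1.6))
```

Only the maximum matters, so a single well-matched pair is enough for the region to be accepted regardless of how the other pairs score.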

<3. Summary>
As described above, the person region detection apparatus 10 of this embodiment derives the edge strength of each pixel of the target image and sets each group of mutually adjacent pixels whose edge strength belongs to the same level as a contour region belonging to that level. One contour region belonging to the level with the greatest edge strength is taken as the initial attention region, and contour regions that belong to other levels and are close to the attention region are combined with it in descending order of edge strength level. The attention region obtained as a result is taken as a person candidate region.

The person region detection apparatus 10 thus identifies person candidate regions from the edge strength, without using the colors of the pixels of the target image, and can therefore identify them independently of those colors. As a result, person candidate regions can be identified correctly even when the target image is a grayscale image or an image whose hue has been greatly altered.

In addition, a person candidate region that contains empty regions has them filled in by the empty region complementing process, so a more person-like person candidate region can be obtained. Furthermore, eye candidate regions are extracted from the person candidate region, and whether it is a person region is determined from the size comparison and the relative arrangement of two eye candidate regions, so person regions can be identified accurately.

<4. Other embodiments>
An embodiment of the present invention has been described above, but the invention is not limited to that embodiment and can be modified in various ways. Other such embodiments are described below. The embodiments below may of course be combined as appropriate.

In the above embodiment, because of the way the algorithm works, the result of the empty region complementing process (FIG. 3: step S13) for one person candidate region may be almost identical to the result for another person candidate region that has already been processed. In that case, performing the subsequent sub-processes (steps S14 and S15) would only detect the same person region again, so they may be omitted.

In the eye candidate area extraction process of the above embodiment, the size of the target high edge region is determined to satisfy the predetermined condition when all of expressions (5), (7), and (9) are satisfied. Alternatively, the predetermined condition may be regarded as satisfied when only one or two of them are satisfied. Also, expressions (5), (7), and (9) specify only a lower limit for each size variable by a threshold; an upper limit may additionally be specified.
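The variations above amount to a configurable size test, sketched below. Expressions (5), (7), and (9) are not reproduced here, so the size-variable names (`width`, `height`, `area`), the inclusive comparisons, and the function name `size_ok` are all hypothetical.

```python
def size_ok(sizes, lower, upper=None, required=None):
    """Check size variables against lower (and optional upper) limits.

    sizes, lower, upper: dicts keyed by size-variable name.
    required: how many variables must pass; defaults to all of them.
    """
    if required is None:
        required = len(lower)
    passed = 0
    for name, low in lower.items():
        value = sizes[name]
        high = None if upper is None else upper.get(name)
        if value >= low and (high is None or value <= high):
            passed += 1
    return passed >= required
```

With `required` left unset the test corresponds to requiring every condition; setting it to 1 or 2 relaxes the test, and passing `upper` adds upper limits.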

In the eye candidate area extraction process of the above embodiment, a target person candidate area from which fewer than two eye candidate areas were extracted is excluded from subsequent processing. The predetermined number used for this exclusion decision is not limited to 2; it may be 1, or 3 or more.

In the person determination process of the above embodiment, whether an area is a person area is determined based on both the comparison of the sizes of the two target eye candidate areas and their mutual arrangement relationship. The determination may instead be based on only one of the two.

The above embodiment has been described for the case where the target image is a still image, but a moving image may also be used as the target image. In that case, the same processing as described above is performed on each frame of the moving image.
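Applying the still-image processing to a moving image reduces to a loop over frames, as in the minimal sketch below. The `detect_person_regions` callback is a hypothetical stand-in for the whole still-image procedure, and how frames are decoded is left to the caller.

```python
def detect_in_video(frames, detect_person_regions):
    """Yield (frame_index, detected_regions) for each frame in order."""
    for index, frame in enumerate(frames):
        yield index, detect_person_regions(frame)
```

Written as a generator, the loop handles one frame at a time and does not need the whole moving image in memory.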

The above embodiment describes the person area detection device as being implemented on a general-purpose computer. However, the functions described in the above embodiment may be included as part of the functions of another type of device that handles images, such as an imaging device (a digital still camera, a video camera, or the like) or a printing device (a printer or the like), so that the device functions as a person area detection device.

The above embodiment describes the various functions as being realized in software through arithmetic processing by the CPU in accordance with the program. However, some of these functions may be realized by electrical hardware circuits.

An external view of the person area detection device.
A block diagram schematically showing the internal configuration of the main body.
A diagram showing the overall flow of the process for detecting a person area.
A diagram showing the detailed flow of the person candidate area extraction process.
A diagram showing an example of a Laplacian filter.
A diagram explaining how an edge image is obtained from the target image.
A diagram explaining how contour regions are set.
A diagram showing the detailed flow of the contour region combining process.
A diagram showing an example of an attention region.
A diagram showing an example of a target contour region.
A diagram showing an example of the arrangement of an attention region and a target contour region.
A diagram showing an example of the arrangement of an attention region and a target contour region.
A diagram explaining the principle by which a person candidate area is extracted.
A diagram showing the detailed flow of the empty area complementing process.
A diagram explaining how an empty area is complemented.
A diagram showing the detailed flow of the eye candidate area extraction process.
A diagram showing a target high edge region and its circumscribed rectangle.
A diagram showing the detailed flow of the person determination process.
A diagram showing two circumscribed rectangles and the rectangle enclosing them.

Explanation of symbols

10 Person area detection device
31 Person candidate extraction unit
32 Area complementing unit
33 Eye candidate extraction unit
34 Person determination unit
41 Program
61 Target image
62 Edge image
63a-63c Mask data
64a-64c Contour images
A1-A5 Contour regions

Claims

A person area detection device for detecting a person area indicating a person from a target image, comprising:
means for deriving an edge strength at each pixel of the target image;
means for classifying the edge strength into a plurality of levels, and setting each group of mutually adjacent pixels belonging to the same level in the target image as a contour region belonging to that level; and
combining means for setting one of the contour regions belonging to the level of maximum edge strength as an initial attention region, and combining with the attention region, as part of the attention region, any contour region that belongs to a level other than the maximum level and is close to the attention region,
wherein the combining means combines the contour regions with the attention region in descending order of the edge strength level to which the contour regions belong, and the resulting attention region is taken as a person candidate area that is a candidate for the person area.

The person area detection device according to claim 1, further comprising:
means for expanding the contour regions before they are combined by the combining means.

The person area detection device according to claim 1 or 2, further comprising:
complementing means for complementing an empty area existing inside the person candidate area obtained by the combining means.

The person area detection device according to claim 3, further comprising:
eye candidate area extraction means for extracting, from the person candidate area after the empty area has been complemented, groups of mutually adjacent pixels whose edge strength is equal to or greater than a predetermined threshold as eye candidate areas that are candidates for areas indicating a person's eyes.

The person area detection device according to claim 4, wherein
the eye candidate area extraction means extracts, as the eye candidate areas, those groups of mutually adjacent pixels whose edge strength is equal to or greater than the predetermined threshold and whose size satisfies a predetermined condition.

The person area detection device according to claim 4 or 5, further comprising:
means for excluding, from the person candidate areas, any person candidate area from which a predetermined number or more of the eye candidate areas have not been extracted.

The person area detection device according to any one of claims 4 to 6, further comprising:
determination means for determining whether the person candidate area is the person area based on a comparison of the sizes of any two of the eye candidate areas extracted from the person candidate area.

The person area detection device according to any one of claims 4 to 7, further comprising:
determination means for determining whether the person candidate area is the person area based on the mutual arrangement relationship between any two of the eye candidate areas extracted from the person candidate area.

A person area detection method for detecting a person area indicating a person from a target image, comprising the steps of:
(a) deriving an edge strength at each pixel of the target image;
(b) classifying the edge strength into a plurality of levels, and setting each group of mutually adjacent pixels belonging to the same level in the target image as a contour region belonging to that level; and
(c) setting one of the contour regions belonging to the level of maximum edge strength as an initial attention region, and combining with the attention region, as part of the attention region, any contour region that belongs to a level other than the maximum level and is close to the attention region,
wherein, in the step (c), the contour regions are combined with the attention region in descending order of the edge strength level to which the contour regions belong, and the resulting attention region is taken as a person candidate area that is a candidate for the person area.

A program for detecting a person area indicating a person from a target image, the program causing a computer to execute the steps of:
(a) deriving an edge strength at each pixel of the target image;
(b) classifying the edge strength into a plurality of levels, and setting each group of mutually adjacent pixels belonging to the same level in the target image as a contour region belonging to that level; and
(c) setting one of the contour regions belonging to the level of maximum edge strength as an initial attention region, and combining with the attention region, as part of the attention region, any contour region that belongs to a level other than the maximum level and is close to the attention region,
wherein, in the step (c), the contour regions are combined with the attention region in descending order of the edge strength level to which the contour regions belong, and the resulting attention region is taken as a person candidate area that is a candidate for the person area.