JP2008181439A

JP2008181439A - Face detection device and method, and imaging apparatus

Info

Publication number: JP2008181439A
Application number: JP2007015786A
Authority: JP
Inventors: Kazuhiro Kojima; 和浩小島; Hideto Fujita; 日出人藤田; Masahiko Yamada; 晶彦山田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2007-01-26
Filing date: 2007-01-26
Publication date: 2008-08-07

Abstract

PROBLEM TO BE SOLVED: To detect an edge face, that is, a face part of which runs off the input image.

SOLUTION: A face detection device is provided with an edge face detection processing part that detects an edge face, in which part of the face lies on the edge of the input image, by adding a dummy image 201 around the input image 200 and performing face detection beyond the edge of the input image. The edge face detection processing part calculates, from the image in a judgement area defined in the synthesized image and from prescribed dictionary data, the similarity between that image and a reference face image, and judges whether an edge face is present in the judgement area by comparing the similarity with a judgement threshold. Each pixel of the dummy image is set to a pixel value that does not influence the edge face detection judgement.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a face detection device and a face detection method for detecting a face in an image, and to an imaging device using them.

One failure that occurs when shooting a still image or a moving image with an imaging device is subject cutoff. Subject cutoff means that the subject the photographer is focusing on extends beyond the shooting area. In particular, when the subject is a person, subject cutoff is perceived by the user as a serious failure.

A person's face is very often included as a subject when shooting, and it is unlikely that a photographer would intentionally let the face extend beyond the shooting area when photographing a person. If a face suffering subject cutoff can be detected, a more desirable image can be obtained, for example by adjusting the composition based on the detection result. For this reason as well, it is important to detect a face suffering subject cutoff (an edge face).

Various face detection methods have been proposed, such as methods using face matching by pattern recognition, methods using color extraction based on skin color, and methods that rely on a learning process in which facial feature quantities are learned from a large amount of data. However, these conventional face detection methods assume that the entire face appears in the image. That is, because the conventional methods perform face detection on complete faces present in the image, they cannot detect a face that partially extends beyond the image.

In view of this, Patent Document 1 below discloses a method for detecting whether a subject's face extends beyond the frame (image). In this method, subject cutoff of a face is detected through extraction of a skin color region. However, as is generally known, algorithms based on skin color extraction are strongly affected by differences in skin color between races and by differences in shooting conditions (such as ambient light). Patent Document 1 also discloses detecting subject cutoff of a face through detection of feature points corresponding to the eyes, nose, mouth, and so on. However, that technique presupposes that the eyes, nose, and mouth, which are face parts, are each detected independently, and as is generally known, face part detection produces relatively many false detections. The method of Patent Document 1 thus has many problems.

Patent Document 2 below discloses a method for detecting subject cutoff of a face using inter-frame differences. However, this method targets moving images and cannot be applied to still images. Moreover, because it detects subject cutoff by using inter-frame differences to detect a change from a state in which the face is visible, it cannot handle a face entering from outside the shooting area.

Various other methods relating to face detection have also been proposed (for example, Patent Documents 3 to 5 below), but no method exists that satisfactorily detects subject cutoff of a face.

Patent Document 1: JP 2005-65265 A
Patent Document 2: JP 2003-323621 A
Patent Document 3: JP 2006-101186 A
Patent Document 4: JP 2005-217768 A
Patent Document 5: JP 2005-117316 A

As described above, conventional face detection methods assume that a complete face is present in the input image supplied to the part that performs face detection, and as they stand they cannot detect a face suffering subject cutoff (an edge face). In addition, when detecting such a face, setting the detection conditions appropriately (for example, which range of the image is used as the detection range) can be expected to shorten the processing time, among other benefits. How these detection conditions are set is therefore also important.

Accordingly, an object of the present invention is to provide a face detection device, a face detection method, and an imaging device capable of good edge face detection. Another object of the present invention is to provide a face detection device and an imaging device that contribute to, among other things, shortening the processing time for edge face detection.

To achieve the above object, a first face detection device according to the present invention is a face detection device that performs face detection, characterized by comprising edge face detection means for detecting an edge face, in which part of a face lies on an edge of an input image, by adding a dummy image around the input image and performing face detection beyond the edge of the input image.

This makes good edge face detection possible.
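The padding step can be pictured with a minimal sketch. It assumes a grayscale image held as a list of rows; the function name `pad_with_dummy`, the margin width, and the fill value 0 are illustrative assumptions only, since the pixel value actually used for the dummy image is chosen so that it does not influence the edge face judgement.

```python
def pad_with_dummy(image, margin, dummy_value=0):
    """Return a composite image: `image` surrounded by a dummy border.

    `margin` and `dummy_value` are hypothetical parameters for illustration.
    """
    width = len(image[0])
    padded_width = width + 2 * margin
    # Dummy rows above the input image.
    composite = [[dummy_value] * padded_width for _ in range(margin)]
    # Each input row gets dummy pixels on its left and right.
    for row in image:
        composite.append([dummy_value] * margin + list(row) + [dummy_value] * margin)
    # Dummy rows below the input image.
    composite.extend([dummy_value] * padded_width for _ in range(margin))
    return composite
```

A determination area scanned over the composite image can then overlap the border, so that a face lying partly outside the input image still falls inside some window.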

Specifically, for example, the edge face detection means comprises similarity calculation means that defines a determination area in the composite image of the input image and the dummy image and calculates the similarity between the image in the determination area and predetermined dictionary data, and determination threshold setting means for setting a determination threshold, and determines whether the edge face is present in the determination area by comparing the similarity with the determination threshold.

For example, the determination threshold setting means changes the determination threshold according to the component that the overlapping region between the determination area and the dummy image contributes to the similarity.

This makes it possible to eliminate or suppress the influence that the pixel values of the image in the overlapping region have on edge face detection.

Alternatively, for example, the edge face detection means may set, as the pixel values of the image in the overlapping region between the determination area and the dummy image, pixel values for which the component that the overlapping region contributes to the similarity is zero.

Further, for example, the device additionally comprises face detection means for detecting a face present in the input image, and the edge face detection means changes the detection conditions for the edge face according to the face detection result of the face detection means.

This makes it possible, for example, to set upper and lower limits on the detection size of the edge face according to the size of the face detected by the face detection means. Setting such limits can be expected to shorten the processing time for edge face detection and to suppress false detection or over-detection of edge faces.

Specifically, for example, the edge face detection means changes the detection conditions for the edge face based on at least one of the size, position, and orientation of the face detected by the face detection means.

Further, for example, the face detection device is mounted on an imaging device having imaging means, the input image is generated from an image captured by the imaging means, and the edge face detection means changes the detection conditions for the edge face according to the tilt of the imaging device at the time the captured image was shot.

This makes it possible, for example, to restrict the detection range for edge faces according to the tilt of the imaging device at the time of shooting. Restricting the detection range can be expected to shorten the processing time for edge face detection and to suppress false detection or over-detection of edge faces.

To achieve the above object, a second face detection device according to the present invention is a face detection device comprising face detection means for detecting a face present in an input image, and further comprising edge face detection means for detecting an edge face in which part of a face lies on an edge of the input image, wherein the edge face detection means changes the detection conditions for the edge face according to the face detection result of the face detection means.

To achieve the above object, a third face detection device according to the present invention is a face detection device mounted on an imaging device having imaging means, comprising edge face detection means that receives an input image generated from an image captured by the imaging means and detects an edge face in which part of a face lies on an edge of the input image, wherein the edge face detection means changes the detection conditions for the edge face according to the tilt of the imaging device at the time the captured image was shot.

To achieve the above object, an imaging device according to the present invention comprises imaging means and any one of the face detection devices described above, wherein the input image for the face detection device is generated based on an image captured by the imaging means.

To achieve the above object, a face detection method according to the present invention is a face detection method that performs face detection, characterized by detecting an edge face, in which part of a face lies on an edge of an input image, by adding a dummy image around the input image and performing face detection beyond the edge of the input image.

According to the present invention, a face detection device, a face detection method, and an imaging device capable of good edge face detection can be provided. In addition, a face detection device and an imaging device that contribute to, among other things, shortening the processing time for edge face detection can be provided.

The significance and effects of the present invention will become clearer from the following description of the embodiments. However, the following embodiments are merely embodiments of the present invention, and the meanings of the terms of the present invention and of its constituent elements are not limited to those described in the following embodiments.

Embodiments of the present invention are described in detail below with reference to the drawings. In the drawings referred to, the same parts are denoted by the same reference numerals, and duplicate descriptions of the same parts are omitted in principle.

<< First Embodiment >>
First, a first embodiment of the present invention will be described. FIG. 1 is an overall block diagram of an imaging device 1 according to the first embodiment of the present invention. The imaging device 1 in FIG. 1 is a digital still camera capable of capturing and recording still images, or a digital video camera capable of capturing and recording still images and moving images.

The imaging device 1 includes an imaging unit 11, an AFE (Analog Front End) 12, a main control unit 13, an internal memory 14, a display unit 15, a recording medium 16, an operation unit 17, a tilt sensor 18, and a face detection unit 19. The operation unit 17 is provided with a shutter button 17a.

The imaging unit 11 has an optical system, a diaphragm, an image sensor such as a CCD (Charge Coupled Devices) or CMOS (Complementary Metal Oxide Semiconductor) image sensor, and a driver for controlling the optical system and the diaphragm (none of which are shown). The driver controls the zoom magnification and focal length of the optical system and the opening of the diaphragm based on AF/AE control signals from the main control unit 13. The image sensor photoelectrically converts the optical image of the subject that enters through the optical system and the diaphragm, and outputs the resulting electrical signal to the AFE 12.

The AFE 12 amplifies the analog signal output from the imaging unit 11 (image sensor) and converts the amplified analog signal into a digital signal. The AFE 12 sequentially outputs this digital signal to the main control unit 13.

The main control unit 13 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like, and also functions as a video signal processing unit. Based on the output signal of the AFE 12, the main control unit 13 generates a video signal representing the image captured by the imaging unit 11 (hereinafter also called the "captured image"). The main control unit 13 also functions as display control means for controlling the display content of the display unit 15, and performs the control needed for display on the display unit 15.

The internal memory 14 is formed of SDRAM (Synchronous Dynamic Random Access Memory) or the like, and temporarily stores various data generated in the imaging device 1. The display unit 15 is a display device such as a liquid crystal display panel and, under the control of the main control unit 13, displays images such as the image captured in the immediately preceding frame and images recorded on the recording medium 16. The recording medium 16 is a non-volatile memory such as an SD (Secure Digital) memory card, and stores captured images and the like under the control of the main control unit 13.

The operation unit 17 accepts operations from outside. The content of an operation on the operation unit 17 is transmitted to the main control unit 13. The shutter button 17a is a button for instructing the capture and recording of a still image.

The tilt sensor 18 detects the tilt of the imaging device 1 with respect to the vertical. The state in which a photographed horizon coincides with the horizontal direction (a horizontal line) of the captured image is defined as the horizontal shooting state. The imaging device 1 is regarded as not tilted in this horizontal shooting state (that is, the tilt is 0 degrees), and the tilt sensor 18 detects the tilt of the imaging device 1 relative to this state. When the imaging device 1 in the horizontal shooting state is tilted by 90 degrees, a photographed horizon coincides with the vertical direction (a vertical line) of the captured image; this state is called the vertical shooting state. Tilt data indicating whether the imaging device 1 is currently in the horizontal shooting state or the vertical shooting state is transmitted to the main control unit 13.

The operation modes of the imaging device 1 include a shooting mode, in which still images or moving images can be captured and recorded, and a playback mode, in which still images or moving images recorded on the recording medium 16 are played back on the display unit 15. Transitions between the modes are made in response to operations on the operation unit 17. In the shooting mode, the imaging unit 11 shoots sequentially at a predetermined frame period (for example, 1/60 second), and the captured image obtained in each frame is supplied to the face detection unit 19. The captured image may be reduced when the face detection unit 19 performs face detection. To distinguish clearly between such a reduced image and the captured image itself, the unreduced captured image is hereinafter called the "original image".

FIG. 2 shows an internal block diagram of the face detection unit 19. The face detection unit 19 in FIG. 2 includes an image input unit 30, an all-face detection processing unit 31, an edge face detection processing unit 32, a face detection result integration unit 33, and a face dictionary memory 34.

The image input unit 30 supplies an image based on the original image to the all-face detection processing unit 31 and the edge face detection processing unit 32 as an input image. Hereinafter, unless otherwise stated, "input image" refers to the input image for the all-face detection processing unit 31 and the edge face detection processing unit 32 (or 32a or 32b, described later). This input image is the original image or a reduced image of the original image. The reduced image of the original image is generated by the image input unit 30 (or by the main control unit 13 in FIG. 1).

The all-face detection processing unit 31 detects faces present in the input image. The face detection method used by the all-face detection processing unit 31 is described with reference to FIG. 3.

In FIG. 3(a), reference numeral 100 denotes an input image and reference numeral 110 denotes a determination area for face detection. For face detection, the initial state places the determination area at the upper left corner of the input image, and the determination area is scanned horizontally across the input image from left to right, one pixel at a time. When the determination area reaches the right edge of the input image, it is shifted down by one pixel and the horizontal scan is repeated. In this way, whether a face is present in the determination area is checked successively while the determination area is scanned in the horizontal and vertical directions.
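The raster scan of the determination area can be sketched as a generator of window positions. This is a minimal illustration; the function name `scan_positions` and its parameters are assumptions, and the per-window face check itself is left out.

```python
def scan_positions(image_width, image_height, window=8):
    """Yield (left, top) corners of the determination area in scan order.

    The window starts at the upper left corner, moves right one pixel at a
    time, then drops one pixel and repeats until it reaches the bottom.
    """
    for top in range(image_height - window + 1):
        for left in range(image_width - window + 1):
            yield left, top
```

For a 10 × 9 input image and an 8 × 8 determination area this produces three positions per row over two rows, six windows in total.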

So that faces of different sizes can be detected with a determination area of a single size, the input image is reduced as appropriate. Specifically, the reduction ratio used when generating the input image from the original image is changed in steps, and the original image and the reduced images generated at each reduction ratio are output from the image input unit 30 as input images. Reference numerals 101 and 102 in FIGS. 3(b) and 3(c) denote input images obtained by reduction; if the input image 100 in FIG. 3(a) is the original image, the input images 101 and 102 shown in FIGS. 3(b) and 3(c) correspond to reduced images of the original image.

The size of the determination area 110 is the same for every input image (100, 101, and 102). Specifically, for example, the size of the determination area 110 is set to 24 × 24 pixels. In the following, however, the size of the determination area 110 is taken to be 8 × 8 pixels to simplify the description and drawings.
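The stepwise reduction with a fixed-size determination area can be sketched as follows. The reduction ratio of 0.5 per step and the stopping rule (stop once the image is smaller than the window) are assumptions made for illustration; the text does not state the ratios actually used.

```python
def pyramid_sizes(width, height, window=24, ratio=0.5):
    """List the (width, height) of each input image in the reduction series.

    `ratio` is a hypothetical per-step reduction ratio; the series ends when
    the reduced image can no longer contain one determination area.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    sizes = []
    scale = 1.0
    while int(width * scale) >= window and int(height * scale) >= window:
        sizes.append((int(width * scale), int(height * scale)))
        scale *= ratio
    return sizes
```

A 24 × 24 window on an image reduced to half size covers the same scene content as a 48 × 48 window on the original image, which is how one window size can match faces of several sizes.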

The determination area is scanned as described above. The following describes, for the state in which the determination area is at a particular position in a given input image, the method of determining whether a face is present in that determination area. FIG. 4 is a flowchart showing the procedure of this method. The processes of steps S1 to S5 in FIG. 4 are executed by the all-face detection processing unit 31.

In step S1, edge enhancement is applied to the image in the determination area with each of four types of differential filters, generating four first edge-enhanced images (first edge-enhanced images in four directions). As the four differential filters, Prewitt-type differential filters (edge detection operators) for four directions, corresponding to the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions, are used, for example as shown in FIGS. 5(a) to 5(d). The first edge-enhanced images obtained by applying them are called the first edge-enhanced images in the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions, and are collectively called the four-direction first edge-enhanced images.
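Step S1 can be sketched with 3 × 3 kernels of the usual Prewitt form. The coefficients below, the assignment of kernels to direction names, the use of the absolute response, and the replicate-border handling are all assumptions for illustration; the filters actually used are those of FIG. 5, which is not reproduced here.

```python
# Assumed Prewitt-form kernels; the patent's own coefficients are in FIG. 5.
PREWITT_KERNELS = {
    "horizontal": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    "vertical": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "upper_right": [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    "upper_left": [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
}


def apply_kernel(image, kernel):
    """Return the absolute 3x3 filter response at every pixel of `image`."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    # Replicate the border pixel outside the image (assumption).
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = abs(acc)
    return out


def first_edge_images(window):
    """Build the four first edge-enhanced images for one determination area."""
    return {name: apply_kernel(window, k) for name, k in PREWITT_KERNELS.items()}
```

With these assumed kernels, an 8 × 8 window whose left half is 0 and right half is 10 gives a response of 30 along the step in the "vertical" image and 0 everywhere in the "horizontal" image.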

FIG. 6 shows an example of each first edge-enhanced image. Reference numerals 121, 122, 123, and 124 denote the first edge-enhanced images in the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions, respectively, and FIG. 6 shows some of the pixel values of the pixels forming each first edge-enhanced image.

Next, in step S2, for each set of corresponding pixels across the four-direction first edge-enhanced images, the one with the largest pixel value is identified; only that largest pixel value is kept as is and the other pixel values are set to zero, generating four-direction second edge-enhanced images. In FIG. 6, reference numerals 131, 132, 133, and 134 denote the second edge-enhanced images in the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions, respectively, and FIG. 6 shows some of the pixel values of the pixels forming each second edge-enhanced image.

For example, if the pixel values of the first edge-enhanced images 121, 122, 123, and 124 at a particular pixel position are 10, 4, 6, and 3, respectively, the pixel values of the second edge-enhanced images 131, 132, 133, and 134 at that pixel position are 10, 0, 0, and 0, respectively (see FIG. 6).
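Step S2 amounts to a per-pixel "keep the strongest direction" operation, sketched below. The function name is hypothetical, and the tie rule (the first direction in list order wins) is an assumption, since the text does not say how equal values are handled.

```python
def keep_strongest_direction(first_images):
    """Build second edge-enhanced images from a list of first edge-enhanced images.

    At each pixel position only the largest value among the directions is
    kept; the values in all other directions are set to zero.
    """
    height, width = len(first_images[0]), len(first_images[0][0])
    second = [[[0] * width for _ in range(height)] for _ in first_images]
    for y in range(height):
        for x in range(width):
            values = [img[y][x] for img in first_images]
            winner = values.index(max(values))  # first maximum wins (assumption)
            second[winner][y][x] = values[winner]
    return second
```

Feeding in the values 10, 4, 6, and 3 from the example above for a single pixel yields 10, 0, 0, and 0.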

Next, in step S3, smoothing is applied to each of the four-direction second edge-enhanced images, and the smoothed four-direction second edge-enhanced images are used as the four-direction feature images. FIGS. 7(a) to 7(d) show the generated four-direction feature images, that is, the feature images in the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions. Alternatively, the four-direction second edge-enhanced images themselves may be used as the four-direction feature images without smoothing.

In each of the four-direction feature images, a pixel position is specified by coordinates (x, y) within the feature image. Here x and y denote the horizontal and vertical coordinates in each feature image, and in the present example each takes integer values from 0 to 7. The type of feature image is denoted by q, which takes integer values from 0 to 3; the feature images for q = 0, 1, 2, and 3 are the feature images in the horizontal, vertical, upper-right diagonal, and upper-left diagonal directions, respectively. A pixel in a feature image is called a feature pixel.

The face dictionary memory 34 in FIG. 2 stores a weight table matched to the size of the feature images (in other words, matched to the size of the determination area). The weight table is created in advance from a large number of face images serving as training samples and is stored in the face dictionary memory 34. FIG. 8 shows an example of the contents of the weight table when the size of the determination area is 8 × 8 pixels. For each feature pixel, the weight table stores a weight corresponding to each pixel value of that feature pixel. The weight is a value representing face-likeness, and in this embodiment it is called a score.

The variables q, x, and y are also consolidated into a single variable n. Since 4 × 8 × 8 = 256, n takes integer values from 0 to 255. The value of n is uniquely determined once q, x, and y are determined. Each feature pixel in the four-direction feature images (256 feature pixels in total) is denoted F(n) using n, and the pixel value of feature pixel F(n) (corresponding to the horizontal direction of FIG. 8) is denoted i(n). Because the score is also specified depending on n, the score is denoted w(n). For example, assuming that q = x = y = 0 corresponds to n = 0, the pixel value at pixel position (0, 0) of the horizontal feature image (q = 0) coincides with the pixel value i(0) of feature pixel F(0). The score w(0) is determined from the pixel value i(0) using the weight table. If this pixel value i(0) is 3, the score w(0) is 35 in the example of FIG. 8.
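The consolidation of (q, x, y) into n and the score lookup can be sketched as below. The row-major ordering n = 64q + 8y + x is one assumed mapping that satisfies q = x = y = 0 → n = 0; the text only requires that the mapping be unique. The table holds just the single entry quoted from FIG. 8 (pixel value 3 at n = 0 gives score 35); its dictionary layout is hypothetical.

```python
WINDOW = 8       # determination area is 8 x 8 in the running example
DIRECTIONS = 4   # q = 0..3


def feature_index(q, x, y):
    """Map (q, x, y) to n in 0..255 using an assumed row-major order."""
    if not (0 <= q < DIRECTIONS and 0 <= x < WINDOW and 0 <= y < WINDOW):
        raise ValueError("q, x or y out of range")
    return q * WINDOW * WINDOW + y * WINDOW + x


def lookup_score(weight_table, n, pixel_value):
    """Return the score w(n) for pixel value i(n) from a nested-dict table."""
    return weight_table[n][pixel_value]


# Hypothetical table layout: weight_table[n][pixel_value] -> score.
# Only this one entry is taken from the text (FIG. 8 example).
example_table = {0: {3: 35}}
```

With this mapping, `feature_index(3, 7, 7)` is 255, the last of the 256 feature pixels.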

Such a weight table can be created using, for example, a known learning method called AdaBoost (Yoav Freund, Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", European Conference on Computational Learning Theory, September 20, 1995).

AdaBoost is an adaptive boosting learning method. Based on a large number of teacher samples, it selects, from a plurality of weak classifier candidates, a plurality of weak classifiers that are effective for classification, and realizes a highly accurate classifier by weighting and combining them. Here, a weak classifier is a classifier whose discrimination ability is better than pure chance but not high enough to provide sufficient accuracy. When a weak classifier is selected and other weak classifiers have already been selected, learning is concentrated on the teacher samples that the already selected weak classifiers misclassify. In this way, the most effective weak classifier is selected from the remaining candidates.
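The selection and weighting procedure outlined above can be sketched as follows. This is a generic textbook form of discrete AdaBoost over threshold classifiers on one-dimensional samples, not the training procedure actually used to build the weight table; make_stump, train_adaboost and classify are hypothetical names. Unlike the description above, this simplified version does not remove a selected candidate from the pool.

```python
import math


def make_stump(threshold, polarity):
    """Weak classifier candidate: returns polarity above the threshold, -polarity otherwise."""
    return lambda x: polarity if x > threshold else -polarity


def train_adaboost(samples, labels, candidates, rounds):
    """Select and weight weak classifiers; labels are +1 or -1."""
    n = len(samples)
    weights = [1.0 / n] * n          # sample weights, uniform at the start
    ensemble = []
    for _ in range(rounds):
        # Pick the candidate with the lowest weighted error on the samples.
        best, best_err = None, None
        for h in candidates:
            err = sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)
            if best_err is None or err < best_err:
                best, best_err = h, err
        # Clamp the error to avoid division by zero or log of zero.
        err = min(max(best_err, 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, x):
    """Weighted vote of the selected weak classifiers."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1


samples = [1, 2, 3, 4]
labels = [-1, -1, 1, 1]
candidates = [make_stump(t, p) for t in (1.5, 2.5, 3.5) for p in (1, -1)]
ensemble = train_adaboost(samples, labels, candidates, rounds=2)
```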

The weight table can also be expressed by a function, and this function may be stored in the face dictionary memory 34 instead of the weight table. In this function, the score is determined by the pixel value of the feature pixel and the variable n. The function is a continuous function: in a three-dimensional space with X, Y and Z coordinate axes, taking the pixel value of the feature pixel on the X axis, the variable n on the Y axis and the score on the Z axis, the function forms a continuous curved surface. The function or the weight table can be called face dictionary data for deriving, from the image in the determination area, the similarity described later.

In step S4, which follows step S3 in FIG. 4, the similarity S₁ is calculated according to equation (1) below. The pixel values i(n) can be regarded as feature quantities that are effective for identifying whether the image in the determination area is a face image (that is, an image representing a face), and the similarity S₁ is the sum of the scores w(n) obtained from the pixel values i(n) of the determination area of interest. Since the weight table is a data representation of the reference face image obtained through the learning process, the sum of the scores w(n) represents the similarity between the reference face image represented by the weight table and the image in the determination area.
S₁ = Σ w(n), summed over n = 0 to 255 ... (1)

Then, in step S5 in FIG. 4, the similarity S₁ is compared with a predetermined face determination threshold TH. When inequality (2) below holds, the image in the determination area is judged to be a face image (in other words, a face exists in the determination area); when inequality (2) does not hold, the image in the determination area is judged not to be a face image (in other words, no face exists in the determination area).
S₁ > TH ... (2)

The weight table is set through the learning process so that the similarity S₁ takes a relatively large value when the image in the determination area is a face image and a relatively small value when it is not. Face detection is therefore possible by the processing described above.
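Steps S4 and S5 can be sketched as below, assuming the weight table is held as a list indexed by n and then by pixel value. The function names and the four-entry toy table are hypothetical and stand in for the 256-entry table of the 8 × 8 example.

```python
def similarity(weight_table, feature_values):
    """Similarity: the sum of the scores w(n) looked up from the pixel values i(n)."""
    return sum(weight_table[n][value] for n, value in enumerate(feature_values))


def is_face(weight_table, feature_values, threshold):
    """Judge a face to be present when the similarity exceeds the threshold TH."""
    return similarity(weight_table, feature_values) > threshold


# Toy data: 4 feature pixels, each taking a value 0..2 (hypothetical sizes).
weight_table = [
    [-5, 10, 20],
    [0, 15, -10],
    [5, 5, 30],
    [-20, 0, 25],
]
feature_values = [2, 1, 2, 1]   # i(0)..i(3)

s1 = similarity(weight_table, feature_values)        # 20 + 15 + 30 + 0
print(s1, is_face(weight_table, feature_values, threshold=50))
```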

Next, the end face detection processing unit 32 in FIG. 2 will be described. The end face detection processing unit 32 generates a composite image by adding a dummy image around the input image, and detects end faces by performing face detection within the composite image. An end face is a face part of which lies on an edge of the input image. That is, as shown in FIG. 9, an end face is a face that is located at an edge of the input image 200 and part of which protrudes outside the input image 200.

FIG. 10 shows the relationship among the input image, the dummy image and the composite image. In FIG. 10, the area inside the rectangular frame denoted by reference numeral 200 represents the input image, and the donut-shaped area bounded by broken lines and denoted by reference numeral 201 represents the dummy image. The dummy image is provided so as to surround the input image on all four sides, and the input image and the dummy image together constitute the composite image.

In FIG. 10, the rectangle denoted by reference numeral 210 represents the determination area. As in the all-face detection processing unit 31, the determination area is scanned in the horizontal and vertical directions within the composite image, and whether an end face exists in the determination area is detected.

[Method for determining the dummy image size]
First, the method for determining the size of the dummy image will be described with reference to FIG. 10. Let the width (horizontal size) and height (vertical size) of the input image be W₁ and H₁, respectively, and let the width and height of the composite image be W₂ and H₂, respectively. Here, either W₁ ≤ W₂ and H₁ < H₂, or W₁ < W₂ and H₁ ≤ H₂. FIG. 10 illustrates the typical case where W₁ < W₂ and H₁ < H₂, and the following description takes this case as an example. The image additional width used when generating the composite image from the input image is denoted W_dum; thus W_dum = W₂ − W₁. The width (horizontal size) of the determination area is denoted W_dic.

For example, as shown in FIG. 10, the dummy image is set so that the center of the input image coincides with the center of the composite image. In this case, an image of width W_dum/2 is provided as part of the dummy image at each of the left and right ends of the input image. In the example of FIG. 10 the image additional width W_dum is divided equally between left and right, but the widths of the images added to the left and to the right of the input image may be set separately.
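Generation of the composite image can be sketched as follows, assuming the image is held as a list of pixel rows and the dummy image has a single constant pixel value. The names make_composite, split_evenly and dummy_value are hypothetical, and the handling of an odd additional width is an assumption.

```python
def split_evenly(total):
    """Divide an additional width or height between the two sides.

    For an odd total, the extra pixel goes to the second side (an assumption).
    """
    first = total // 2
    return first, total - first


def make_composite(image, pad_left, pad_right, pad_top, pad_bottom, dummy_value=0):
    """Surround the input image with a dummy image of constant pixel value."""
    width = len(image[0])
    new_width = pad_left + width + pad_right
    top = [[dummy_value] * new_width for _ in range(pad_top)]
    middle = [[dummy_value] * pad_left + list(row) + [dummy_value] * pad_right
              for row in image]
    bottom = [[dummy_value] * new_width for _ in range(pad_bottom)]
    return top + middle + bottom


image = [[1, 2, 3],
         [4, 5, 6]]
left, right = split_evenly(2)    # W_dum = 2, one pixel on each side
above, below = split_evenly(2)   # H_dum = 2
composite = make_composite(image, left, right, above, below)
```

Unequal values for pad_left and pad_right correspond to setting the left and right additional widths separately.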

The image additional width W_dum is set using any one of the following first to third additional width setting methods.

In the first additional width setting method, the image additional width W_dum is set based on the width W_dic of the determination area. More specifically, W_dum is determined by equation (3a) below, where k₁ is an arbitrary value satisfying inequality (3b). k₁ may be a fixed value or a value determined according to the width W₁ and/or the width W_dic.
W_dum = k₁ · W_dic ... (3a)
0 < k₁ < 1 ... (3b)

In the second additional width setting method, the image additional width W_dum is set based on the size of a face already detected by the all-face detection processing unit 31. Here, "the size of a face already detected by the all-face detection processing unit 31" is the size of the face on the same input image that is given to both the all-face detection processing unit 31 and the end face detection processing unit 32, and the width of that face is denoted W_fd. The width W_fd of the face detected on the input image by the all-face detection processing unit 31 is, for example, equal to the width of the determination area used by the all-face detection processing unit 31.

More specifically, W_dum is determined by equation (4a) below, where k₂ is an arbitrary value satisfying inequality (4b). k₂ may be a fixed value or a value determined according to the width W₁ and/or the width W_fd.
W_dum = k₂ · W_fd ... (4a)
0 < k₂ · W_fd < W_dic ... (4b)

In the third additional width setting method, the image additional width W_dum is set based on both the width W_dic and the width W_fd. More specifically, W_dum is determined by equation (5a) below, where k_3A and k_3B are arbitrary values satisfying inequality (5b). k_3A and k_3B may be fixed values or values determined according to the width W₁, the width W_dic and/or the width W_fd.
W_dum = k_3A · W_dic + k_3B · W_fd ... (5a)
0 < k_3A · W_dic + k_3B · W_fd < W_dic ... (5b)

Performing end face detection beyond a specific area determined by the width W_dic of the determination area is either meaningless (because no end face can exist in that area) or cannot yield accurate end face detection. In view of this, the first, second or third additional width setting method is adopted, and conditions such as (3b), (4b) and (5b) are imposed. In addition, when a plurality of faces (including end faces) that the photographer intends to capture exist in the same input image, those faces are usually of similar size. In view of this, the second or third additional width setting method may be adopted.
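The three additional width setting methods and their conditions can be written out as the following sketch. The function names are hypothetical, and raising an error when a condition is violated is a design choice of this example rather than something the specification prescribes.

```python
def additional_width_method1(w_dic, k1):
    """Equation (3a) under condition (3b): 0 < k1 < 1."""
    if not 0 < k1 < 1:
        raise ValueError("k1 must satisfy 0 < k1 < 1")
    return k1 * w_dic


def additional_width_method2(w_dic, w_fd, k2):
    """Equation (4a) under condition (4b): 0 < k2 * W_fd < W_dic."""
    w_dum = k2 * w_fd
    if not 0 < w_dum < w_dic:
        raise ValueError("k2 * W_fd must lie strictly between 0 and W_dic")
    return w_dum


def additional_width_method3(w_dic, w_fd, k3a, k3b):
    """Equation (5a) under condition (5b)."""
    w_dum = k3a * w_dic + k3b * w_fd
    if not 0 < w_dum < w_dic:
        raise ValueError("k3A * W_dic + k3B * W_fd must lie strictly between 0 and W_dic")
    return w_dum


# Example with an 8-pixel-wide determination area and an 8-pixel-wide detected face.
print(additional_width_method1(8, 0.5))
print(additional_width_method2(8, 8, 0.75))
print(additional_width_method3(8, 8, 0.25, 0.25))
```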

Whichever additional width setting method is adopted, wasteful or unnecessary detection processing is omitted, which shortens the processing time and suppresses erroneous detection and over-detection of end faces. Over-detection of end faces means, for example, detecting an end face that is markedly smaller or larger than the faces detected by the all-face detection processing unit 31 (such an end face is presumed to be the face of a person the photographer is not interested in).

The method for determining the size of the dummy image has been described with attention to width, but the image additional height H_dum (= H₂ − H₁) used when generating the composite image from the input image is set in the same manner as the image additional width W_dum. That is, H_dum is set based on the height of the determination area and/or the height of a face already detected by the all-face detection processing unit 31.

[Operation after generation of the composite image]
Next, the operation of the end face detection processing unit 32 in FIG. 2 after the composite image has been generated will be described.

When the end face detection processing unit 32 performs end face detection, the initial state is the state in which the determination area is placed at the upper left corner of the composite image, and the determination area is scanned horizontally from left to right, one pixel at a time, within the composite image. When the determination area reaches the right end of the composite image, it is shifted down by one pixel and horizontal scanning is performed again. In this way, while the determination area is scanned in the horizontal and vertical directions, whether an end face exists in the determination area is detected successively.
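The horizontal and vertical scan described above can be sketched as a generator of the upper left corner positions of the determination area. The name raster_positions and the representation of a position as a (left, top) pair are assumptions of this example.

```python
def raster_positions(composite_width, composite_height, region_size):
    """Yield the (left, top) corner of the determination area in scan order.

    The area starts at the upper left corner of the composite image, moves one
    pixel at a time to the right, and drops one pixel after reaching the right end.
    """
    for top in range(composite_height - region_size + 1):
        for left in range(composite_width - region_size + 1):
            yield left, top


# A 10 x 9 composite image scanned with an 8 x 8 determination area.
positions = list(raster_positions(10, 9, 8))
print(positions)
```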

However, the determination area defined in the end face detection processing unit 32 always overlaps part of the dummy image, as does the determination area 210 in FIG. 11. The input image 200 and the dummy image 201 in FIG. 11 are the same as those shown in FIG. 10. In FIG. 11, the shaded area 211 represents the area where the determination area 210 and the dummy image 201 overlap; this area is hereinafter called the overlap area between the determination area and the dummy image (or simply the overlap area).

Since scanning is performed so that the determination area and the dummy image always have an overlap area, when the determination area is scanned within a composite image 202 as shown in FIG. 12(a), the center coordinates of the determination area move within the hatched area 203 near the outer periphery of the composite image 202.
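The restriction to positions that overlap the dummy image can be sketched as follows. The function names and the toy sizes are hypothetical; a position is kept when the determination area is not wholly contained in the input image.

```python
def overlaps_dummy(left, top, region_size, pad_left, pad_top, input_width, input_height):
    """Return True when the determination area covers at least one dummy pixel."""
    inside_input = (left >= pad_left
                    and top >= pad_top
                    and left + region_size <= pad_left + input_width
                    and top + region_size <= pad_top + input_height)
    return not inside_input


def end_face_positions(input_width, input_height, region_size,
                       pad_left, pad_right, pad_top, pad_bottom):
    """List the (left, top) positions at which end face detection is performed."""
    composite_width = pad_left + input_width + pad_right
    composite_height = pad_top + input_height + pad_bottom
    return [(left, top)
            for top in range(composite_height - region_size + 1)
            for left in range(composite_width - region_size + 1)
            if overlaps_dummy(left, top, region_size, pad_left, pad_top,
                              input_width, input_height)]


# 16 x 16 input image, 2 dummy pixels on every side, 8 x 8 determination area.
positions = end_face_positions(16, 16, 8, 2, 2, 2, 2)
```

In this toy configuration, 88 of the 169 possible positions overlap the dummy image, and they form a ring along the periphery of the composite image.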

Instead of scanning the determination area horizontally from left to right one pixel at a time and, when it reaches the right end of the composite image, shifting it down by one pixel while returning it to the left end for another horizontal scan, the determination area may be scanned in a spiral. In the former scanning method, when the determination area is moved from the right end to the left end of the composite image, all the pixel values of the image in the determination area that were read into a memory (not shown) for face detection must be discarded, and all the pixel values of the image in the determination area after the move must be newly read into the memory. In comparison, the latter (spiral) scanning method has the advantage of reducing the number of reads into the memory.

FIG. 12(b) is a conceptual diagram of the scanning direction of the spiral scan. In the spiral scan, the determination area is scanned horizontally from left to right one pixel at a time, and when it reaches the right end of the composite image the scanning direction is changed to vertical: the determination area is scanned vertically from top to bottom without being moved horizontally. With this scanning, when the determination area is moved downward after reaching the right end of the composite image, part of the pixel values read into the memory before the move can be reused after the move. The number of reads into the memory can therefore be reduced, and faster processing can be expected. When the determination area reaches the lower end of the composite image, it is scanned horizontally from right to left without being moved vertically. Thereafter, as shown in FIG. 12(b), the determination area is scanned so that its trajectory traces a spiral within the composite image.
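A spiral scan order can be sketched as below. The function spiral_positions is a hypothetical helper that enumerates a grid of candidate positions clockwise from the upper left; it visits every position in the grid, whereas the scan described above only needs the positions that overlap the dummy image. The point it illustrates is that consecutive positions always differ by exactly one pixel, so most pixel values already in memory remain usable.

```python
def spiral_positions(columns, rows):
    """Enumerate grid positions (x, y) clockwise in a spiral from the upper left."""
    left, right, top, bottom = 0, columns - 1, 0, rows - 1
    order = []
    while left <= right and top <= bottom:
        for x in range(left, right + 1):            # left to right along the top
            order.append((x, top))
        for y in range(top + 1, bottom + 1):        # top to bottom along the right
            order.append((right, y))
        if top < bottom:
            for x in range(right - 1, left - 1, -1):    # right to left along the bottom
                order.append((x, bottom))
        if left < right:
            for y in range(bottom - 1, top, -1):        # bottom to top along the left
                order.append((left, y))
        left, right, top, bottom = left + 1, right - 1, top + 1, bottom - 1
    return order


order = spiral_positions(3, 3)
print(order)
```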

The input image for the end face detection processing unit 32 is the same as the input image for the all-face detection processing unit 31. Accordingly, the input image for the end face detection processing unit 32 is also the original image or a reduced image of the original image, which makes it possible to detect end faces of different sizes with a determination area of a single size.

The size of the determination area 210 is the same as that of the determination area 110 in FIGS. 3(a) to 3(c). Hereinafter, to simplify the description and illustration, the size of the determination area 210 is assumed to be 8 × 8 pixels, like that of the determination area 110.

The determination area is scanned as described above. The following describes, focusing on a state in which the determination area is located at a particular position within a composite image, the method of judging whether an end face exists in that determination area. FIG. 13 is a flowchart showing the procedure of this method.

The end face detection processing unit 32 performs the same processing as steps S1 to S3 shown in FIG. 4. That is, in step S1, four-direction first edge-enhanced images are generated by applying edge enhancement to the image in the determination area defined within the composite image, and in the following step S2, four-direction second edge-enhanced images are generated from the four-direction first edge-enhanced images. Then, in step S3, four-direction feature images are generated from the four-direction second edge-enhanced images. The first edge-enhanced images, the second edge-enhanced images and the feature images are generated in the same way as in the all-face detection processing unit 31. In the procedure of the end face detection processing unit 32, the process proceeds to step S14 after step S3. Steps S14 and S15 are executed by the end face detection processing unit 32.

In step S14, as in the description of the all-face detection processing unit 31, the variables q, x and y are consolidated into the variable n, and the score w(n) is obtained from the weight table described above based on the pixel value i(n) of each feature pixel F(n) of the four-direction feature images generated by the end face detection processing unit 32. The similarity S₂ is then calculated according to equation (6) below. The similarity S₂ is the sum of the scores w(n) based on the four-direction feature images generated by the end face detection processing unit 32. The similarity in equation (6) is calculated in the same way as in equation (1); the symbol S₂, different from S₁, is introduced here to distinguish the similarity calculated by the end face detection processing unit 32 (S₂) from the similarity calculated by the all-face detection processing unit 31 (S₁).
S₂ = Σ w(n), summed over n = 0 to 255 ... (6)

After the similarity S₂ has been calculated, the process proceeds to step S15. As described above, the all-face detection processing unit 31 judges whether the image in the determination area is a face image by judging whether S₁ > TH holds. However, depending on the pixel values of the pixels forming the dummy image (hereinafter, dummy pixel values), the similarity S₂ may contain, as an error, a score component attributable to the overlap area between the determination area and the dummy image. This is because values unrelated to any face are set as the dummy pixel values.

Therefore, in step S15, a threshold correction term α is introduced and whether inequality (7) below holds is judged. When inequality (7) holds, it is judged that an end face exists in the determination area; when inequality (7) does not hold, it is judged that no end face exists in the determination area.
S₂ > TH − α ... (7)

The threshold correction term α is the sum of the scores w(n) calculated from the pixel values i(n) corresponding to the overlap area. A specific example illustrating how α is calculated will be described with reference to FIGS. 14 and 15. Suppose that, as shown in FIG. 14, only the leftmost column of the determination area is the overlap area. The four-direction first edge-enhanced images for the image in this determination area are denoted by reference numerals 221 to 224 in FIG. 15. The hatched leftmost column of each of the first edge-enhanced images 221 to 224 is obtained by applying edge enhancement to the pixels of the overlap area. In FIG. 15, reference numerals 231 to 234 denote the four-direction feature images generated from the four-direction first edge-enhanced images 221 to 224. The hatched leftmost column of each of the feature images 231 to 234 is calculated from the pixel values of the leftmost column of the corresponding first edge-enhanced image 221 to 224. A total of 32 scores w(n) are calculated from the pixel values i(n) of the 32 pixels belonging to the leftmost columns of the feature images 231 to 234, and the sum of these 32 scores w(n) is taken as the threshold correction term α.

If the part of the determination area other than the overlap area is called the non-overlap area, the similarity S₂ is the sum of the scores corresponding to the image in the overlap area (the 32 scores w(n) in the above example) and the scores corresponding to the image in the non-overlap area. The former is the contribution of the overlap area to the similarity S₂ and corresponds to the threshold correction term α; the latter is the contribution of the non-overlap area to the similarity S₂.

As described above, correcting the face determination threshold TH with the threshold correction term α makes it possible to perform end face detection that does not depend on the dummy pixel values.
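Steps S14 and S15 can be sketched as follows. The code evaluates inequality (7) literally as written above; the function name, the list-based weight table and the four-entry toy data are assumptions standing in for the 256 feature pixels of the 8 × 8 example, where overlap_indices would hold the 32 indices n of the leftmost columns.

```python
def end_face_exists(weight_table, feature_values, overlap_indices, threshold):
    """Evaluate inequality (7): S2 > TH - alpha.

    S2 is the sum of all scores w(n); alpha is the sum of the scores of the
    feature pixels that correspond to the overlap area.
    """
    scores = [weight_table[n][value] for n, value in enumerate(feature_values)]
    s2 = sum(scores)
    alpha = sum(scores[n] for n in overlap_indices)
    return s2 > threshold - alpha


# Toy data: 4 feature pixels with 2 possible values each (hypothetical sizes).
weight_table = [
    [10, 0],
    [20, 0],
    [-5, 0],
    [5, 0],
]
feature_values = [0, 0, 0, 0]   # scores 10, 20, -5, 5, so S2 = 30

print(end_face_exists(weight_table, feature_values, [2, 3], threshold=25))  # alpha = 0
print(end_face_exists(weight_table, feature_values, [2], threshold=25))     # alpha = -5
```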

Considering that the total number of feature pixels contributing to the similarity S₂ decreases as the overlap area grows, whether inequality (7a) below holds may instead be judged in step S15. In this case, when inequality (7a) holds, it is judged that an end face exists in the determination area; when inequality (7a) does not hold, it is judged that no end face exists in the determination area. Here, k₄ is determined according to the total number of pixels belonging to the overlap area (8 in the example of FIG. 14). To make end faces easier to detect, k₄ may be set to a value less than 1; to suppress erroneous detection and over-detection, k₄ may be set to a value greater than 1.
S₂ > k₄ · TH − α ... (7a)
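The effect of k₄ in inequality (7a) can be illustrated with the short sketch below. How k₄ is derived from the number of overlap pixels is not specified, so it is simply passed in as a parameter; the function name and the numeric values are hypothetical.

```python
def end_face_exists_scaled(s2, threshold, alpha, k4=1.0):
    """Evaluate inequality (7a): S2 > k4 * TH - alpha."""
    return s2 > k4 * threshold - alpha


# With S2 = 30, TH = 25 and alpha = 0:
print(end_face_exists_scaled(30, 25, 0))            # k4 = 1 reduces to inequality (7)
print(end_face_exists_scaled(30, 25, 0, k4=1.5))    # stricter: suppresses detections
print(end_face_exists_scaled(20, 25, 0, k4=0.5))    # looser: end faces detected more easily
```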

The face detection result of the all-face detection processing unit 31 and the face detection result of the end face detection processing unit 32 are sent to the face detection result integration unit 33. The face detection result of the all-face detection processing unit 31 specifies the size and position, on the original image, of each face detected by the all-face detection processing unit 31. The face detection result of the end face detection processing unit 32 specifies the size and position, on the original image, of each end face detected by the end face detection processing unit 32. The face detection result integration unit 33 outputs an integrated face detection result that combines both face detection results.

Each part of the imaging device 1 refers to the integrated face detection result as needed and performs the necessary processing. For example, when the optical system of the imaging unit 11 is provided with a correction lens or a vari-angle prism (neither shown), these are driven and controlled to shift the optical image formed on the image sensor of the imaging unit 11 so that the whole face of the person recognized as an end face is included in the original image. The original image is usually formed from the pixel signals of an effective imaging area contained within the entire imaging area of the image sensor, and this effective imaging area can be moved within the entire imaging area. Therefore, when an end face is detected, the effective imaging area may be moved within the entire imaging area so that the whole face of the person recognized as an end face is included in the original image. Alternatively, a zoom lens (not shown) provided in the optical system may be driven and controlled to increase the angle of view of the optical system so that the whole face of the person recognized as an end face is included in the original image.

<<Second Embodiment>>
Next, an imaging device according to a second embodiment of the present invention will be described. The imaging device according to the second embodiment is similar to the imaging device 1 according to the first embodiment, and the overall block diagram and the like are the same for both. However, the method by which the end face detection processing unit 32 in FIG. 2 calculates the similarity differs between the first and second embodiments. Since the two embodiments are otherwise the same, duplicate description of the common parts is omitted and the description focuses on the differences. The matters described in the first embodiment also apply to the second embodiment unless a contradiction arises.

In the first embodiment, end face detection that does not depend on the dummy pixel values is realized by correcting the face determination threshold TH with the threshold correction term α. In the second embodiment, accurate end face detection is realized without such correction.

In the second embodiment, when the composite image is generated, a pixel value that does not affect the end face detection determination is set as the dummy pixel value of the dummy image forming part of the composite image. For example, if the score that does not affect the end face detection determination is 0, as illustrated in the first embodiment, the dummy pixel value should be a pixel value for which the sum of the scores corresponding to the overlapping region between the determination area and the dummy image becomes 0. In other words, when the composite image is generated, a pixel value from which a score of 0 is derived (hereinafter referred to as the forced zero pixel value) is set as the dummy pixel value of the overlapping region.

The forced zero pixel value can be set to any value at the design stage. If each pixel value of the composite image can take an integer from 0 to 255, the forced zero pixel value is chosen from the integers 0 to 255.

As described in the first embodiment, the score w(n) is calculated from the pixel value i(n) of the corresponding feature pixel F(n); once that pixel value and the variable n are fixed, the score is uniquely determined by the weight table or the function. As noted above, this function is continuous and forms a continuous surface in a three-dimensional space whose X, Y and Z axes are the pixel value of the feature pixel, the variable n and the score, respectively. Consequently, a shift of about 1 in the pixel value of a feature pixel does not greatly change the similarity S₂ that is finally obtained. The same holds for the weight table created from this function. It follows that a shift of about 1 in a pixel value of the input image likewise has no large effect on the similarity S₂.

Consider a mechanism in which the pixel value of every pixel in the dummy image (overlapping region) is set to the forced zero pixel value and a score of 0 is forcibly given to every pixel that has the forced zero pixel value. Suppose that a pixel in the input image (non-overlapping region) happens to have the same pixel value as the forced zero pixel value. FIG. 16 shows this situation; the pixel 300 in FIG. 16 is a pixel in the input image that has the forced zero pixel value. If nothing is done, the score corresponding to that pixel of the input image (the pixel 300 in FIG. 16) will differ from the score it should have. To avoid this, the second embodiment exploits the property described above and slightly increases or decreases the pixel value of any input image pixel whose value equals the forced zero pixel value.

FIG. 17 shows an internal block diagram of the end face detection processing unit 32a according to the second embodiment. The end face detection processing unit 32a includes a composite image generation unit 41, a score calculation preprocessing unit 42, a score calculation unit 43, a similarity calculation unit 44 and an end face determination unit 45, and can be used as the end face detection processing unit 32 of FIG. 2.

The composite image generation unit 41 generates a composite image by adding a dummy image around the input image supplied to the end face detection processing unit 32a. In doing so, it sets the forced zero pixel value as the dummy pixel value (the pixel value of each pixel of the dummy image). In addition, when the input image contains a pixel that has the forced zero pixel value (corresponding to the pixel 300 in FIG. 16), the unit slightly increases or decreases the pixel value of that pixel so that it differs from the forced zero pixel value. Specifically, the pixel value is increased by 1 or decreased by 1. Whether to increase or decrease it may be decided according to the forced zero pixel value.

The score calculation preprocessing unit 42 refers to the composite image generated by the composite image generation unit 41 and forcibly gives a score of 0 to every pixel in the composite image that has the forced zero pixel value. The region whose pixel values differ from the forced zero pixel value, that is, the region of the input image, is processed by the score calculation unit 43 in the same way as in the first embodiment, and each score is calculated from the weight table.

In short, through the composite image generation unit 41, the score calculation preprocessing unit 42 and the score calculation unit 43, the scores corresponding to pixels in the dummy image are all set to 0, and the scores corresponding to pixels in the input image are calculated from the weight table. The scores obtained are supplied to the similarity calculation unit 44.
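
The processing of units 41 to 43 can be sketched as follows. This is only an illustration under stated assumptions: images are lists of rows of integers in 0..255, `FORCED_ZERO`, `make_composite` and `pixel_score` are hypothetical names, the weight table is assumed to be indexable as `weight_table[n][value]`, and the rule "add 1 unless the forced zero pixel value is 255" is one possible way of choosing between increasing and decreasing.

```python
FORCED_ZERO = 0  # hypothetical design-time choice from 0..255


def make_composite(input_image, pad, forced_zero=FORCED_ZERO):
    """Surround the input image with a dummy border of forced-zero pixels."""
    # Nudge input pixels that collide with the forced zero pixel value by 1.
    step = 1 if forced_zero < 255 else -1
    body = [[p + step if p == forced_zero else p for p in row]
            for row in input_image]
    width = len(body[0]) + 2 * pad
    top = [[forced_zero] * width for _ in range(pad)]
    bottom = [[forced_zero] * width for _ in range(pad)]
    middle = [[forced_zero] * pad + row + [forced_zero] * pad for row in body]
    return top + middle + bottom


def pixel_score(value, n, weight_table, forced_zero=FORCED_ZERO):
    """Score of feature pixel n: forced to 0 for dummy pixels."""
    if value == forced_zero:
        return 0.0  # preprocessing step: dummy pixels never contribute
    return weight_table[n][value]  # ordinary weight-table lookup
```

After `make_composite`, the forced zero pixel value occurs only in the dummy border, so testing a pixel value against it is enough to tell dummy pixels from input pixels.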

The similarity calculation unit 44 calculates the similarity S₂ from the supplied scores in accordance with equation (6) above. As in the first embodiment, the similarity S₂ is calculated for each determination area. For a given determination area of interest, the sum of the 256 scores w(n) calculated from the image in that determination area is taken as the similarity S₂ for that area. Of these 256 scores w(n), at least those corresponding to the overlapping region are all 0. That is, the contribution of the overlapping region to the similarity S₂ is zero.

The end face determination unit 45 compares the similarity S₂ calculated by the similarity calculation unit 44 with the face determination threshold TH. If the former is larger than the latter, it determines that an end face exists in the determination area of interest; otherwise it determines that no end face exists in that determination area. Alternatively, just as equation (7) was modified into equation (7a) in the first embodiment, the threshold TH to be compared with the similarity S₂ may be multiplied by the coefficient k₄ described above. In that case the end face determination unit 45 compares the similarity S₂ with k₄·TH and determines that an end face exists in the determination area of interest if the former is larger than the latter, and that no end face exists otherwise. The determination result of the end face determination unit 45 is output as a face detection result, which is equivalent to that of the end face detection processing unit 32 of FIG. 2.
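
A minimal sketch of the similarity sum and the threshold comparison is given below. The function names are hypothetical, and the scores are assumed to have been computed already, with zeros for the overlapping region; equations (6), (7) and (7a) themselves are defined earlier in the document and are not reproduced here.

```python
def similarity(scores):
    """Similarity S2 of one determination area: the sum of its scores w(n)."""
    return sum(scores)


def end_face_present(scores, th, k4=1.0):
    """True when S2 exceeds k4 * TH (k4 = 1.0 gives the plain comparison)."""
    return similarity(scores) > k4 * th
```

For example, with 100 overlapping-region scores equal to 0 and 156 input-image scores of 0.5, S₂ is 78, so the area is judged to contain an end face for TH = 70 and k₄ = 1.0 but not for k₄ = 1.2.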

With the configuration of the second embodiment, the threshold correction term α need not be calculated each time. Moreover, the similarity for end face detection can be calculated by the same similarity calculation method as that of the whole-face detection processing unit 31 of FIG. 1 (this also applies to the first embodiment). That is, end face detection requires neither a dedicated similarity calculation method nor a dedicated weight table. The part that calculates the similarity can therefore be shared between the whole-face detection processing unit and the end face detection processing unit.

The description above states that the pixel value of each pixel of the dummy image is set to the forced zero pixel value and that a score of 0 is forcibly given to pixels having the forced zero pixel value, but this can be modified as follows. The pixel value of each pixel of the dummy image is set to the forced zero pixel value, and some value is forcibly given to pixels having the forced zero pixel value such that the sum of the scores calculated from the pixel values of the pixels in the overlapping region becomes a value C_SCORE. The similarity S₂ then includes the component C_SCORE. The value C_SCORE is, for example, a predetermined value. It may also be determined according to the total number of pixels in the overlapping region. To make end faces easier to detect, C_SCORE is set to a positive value; to suppress false detection and over-detection, C_SCORE is set to a negative value.
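
The C_SCORE variant can be sketched as below. Splitting C_SCORE evenly over the overlapping pixels is an assumption made for illustration; the source only requires that the overlapping-region scores sum to C_SCORE. The function names are hypothetical.

```python
def forced_overlap_score(c_score, overlap_count):
    """Per-pixel score forced onto dummy pixels so that they sum to c_score."""
    if overlap_count == 0:
        return 0.0  # no overlap, so no offset is applied
    return c_score / overlap_count


def similarity_with_offset(input_scores, overlap_count, c_score):
    """S2 = scores from the input image + total forced score of the overlap."""
    per_pixel = forced_overlap_score(c_score, overlap_count)
    return sum(input_scores) + per_pixel * overlap_count
```

A positive `c_score` raises S₂ and so makes the threshold easier to exceed, while a negative one lowers it.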

<< Third Embodiment >>
Next, a third embodiment of the present invention will be described. The overall block diagram of the imaging apparatus according to the third embodiment is the same as that of the first embodiment (FIG. 1), and the matters described in the first embodiment (or the second embodiment) also apply to the third embodiment as long as no contradiction arises. The third embodiment is characterized by the internal configuration of the face detection unit 19 of FIG. 1, so the description focuses on that feature.

FIG. 18 is an internal block diagram of a face detection unit that can be used as the face detection unit 19 of FIG. 1. The face detection unit of FIG. 18 is similar to the face detection unit 19 of FIG. 2, except that the end face detection processing unit 32 of the face detection unit 19 of FIG. 2 is replaced with an end face detection processing unit 32b. Apart from this replacement, the face detection unit of FIG. 18 and the face detection unit 19 of FIG. 2 have the same configuration and functions. The end face detection processing unit 32b has the same functions as the end face detection processing unit 32 and, in addition, a distinctive function that uses the face detection result of the whole-face detection processing unit 31 and/or the tilt data output from the tilt sensor 18 of FIG. 1.

This distinctive function is described below. It restricts the detection conditions used when detecting an end face. There are seven specific ways of restricting the detection conditions, which are described individually as the first to seventh restriction examples. The restriction examples can be combined freely as long as no contradiction arises, and matters described for one restriction example also apply to the others as long as no contradiction arises. In the following description, a face detected by the whole-face detection processing unit 31 is also called a whole face, to distinguish it clearly from an end face detected by the end face detection processing unit 32b.

[First restriction example]
First, the first restriction example will be described, focusing on one original image. In the first restriction example, an upper limit and a lower limit are set on the end face detection size range, which serves as a detection condition, based on the size of the whole face detected beforehand by the whole-face detection processing unit 31. This is explained concretely below.

As described above, by using reduced images of the original image as input images and changing the reduction ratio in steps, the whole-face detection processing unit 31 and the end face detection processing unit 32b can detect whole faces or end faces of various sizes. The whole-face detection processing unit 31 performs whole-face detection on each input image derived from the original image of interest, and supplies detected whole-face size information, which specifies the size of each detected whole face, to the end face detection processing unit 32b (and, if necessary, to the image input unit 30) as a face detection result.

Let F_W be the size of the whole face on the original image as specified by the detected whole-face size information. When a plurality of whole faces are detected in the original image of interest, F_W may be their average size, their maximum size or their minimum size.

In this case, the upper limit of the end face detection size range is (F_W + ΔA) and the lower limit is (F_W − ΔB). That is, the end face detection processing unit 32b operates so as to detect end faces whose size on the original image is at most (F_W + ΔA) and at least (F_W − ΔB), and not to detect end faces of any other size (specifically, it restricts the reduction ratios applied to the original image). Here, ΔA and ΔB are positive values, which may or may not be equal. The size of a whole face or end face on the original image is expressed, for example, as the total number of pixels of the face region (the region judged to contain a whole face or end face) on the original image.
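
How the size range translates into a restriction on reduction ratios can be sketched as follows. The determination-area side length of 24 pixels, the definition of the reduction ratio as a linear scale factor in (0, 1], and the function name are assumptions for illustration only.

```python
def allowed_reduction_ratios(ratios, f_w, delta_a, delta_b, area_side=24):
    """Keep the reduction ratios whose detectable face size lies in range.

    A square determination area of side `area_side` on an image reduced by
    ratio r covers (area_side / r) ** 2 pixels of the original image.
    """
    lower, upper = f_w - delta_b, f_w + delta_a
    allowed = []
    for r in ratios:
        face_size = (area_side / r) ** 2  # size in original-image pixels
        if lower <= face_size <= upper:
            allowed.append(r)
    return allowed
```

With F_W = 2304 pixels and ΔA = ΔB = 500, only the ratio 0.5 survives from the candidates 1.0, 0.5 and 0.25, so end face detection runs on a single reduced image instead of three.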

In commemorative photographs and the like, the faces of the people the photographer is interested in are usually of roughly the same size. With this in mind, the upper and lower limits of the end face detection size range are set according to the size of the whole face detected beforehand, as described above. This shortens the processing time and also suppresses false detection and over-detection of end faces.

[Second restriction example]
Next, the second restriction example will be described, beginning with the matters needed to explain it. As shown in FIG. 19, consider an input image 400 and define a region 401 centered on the left edge of the input image 400, a region 402 centered on the right edge, a region 403 centered on the upper edge and a region 404 centered on the lower edge. The upper left corner of the input image 400 is taken as the origin 410 of the input image 400. In FIG. 19, the up-down direction corresponds to the vertical direction of the image and the left-right direction corresponds to the horizontal direction of the image.

On the left side of the region 403, the regions 403 and 401 share a common region, and on the right side of the region 403, the regions 403 and 402 share a common region. On the left side of the region 404, the regions 404 and 401 share a common region, and on the right side of the region 404, the regions 404 and 402 share a common region. The regions 401 and 402 have no common region. The determination area for end face detection is scanned so that it lies within one of the regions 401 to 404. When the second restriction example is not applied, the determination area for end face detection is scanned, for example, over all of the regions 401 to 404. That is, the end face detection range consists of all of the regions 401 to 404.

In the second restriction example, the end face detection range, which serves as a detection condition, is restricted based on the position of the whole face detected beforehand by the whole-face detection processing unit 31. For example, when a whole face 420 is detected toward the left side of the input image 400 as shown in FIG. 19, the region 401 is included in the end face detection range while the region 402 is excluded from it. The determination area for end face detection is then scanned within the region 401 but not within the region 402, so no end face detection is performed on the region 402. Whether the regions 403 and 404 are included in the end face detection range in this case is optional. The restriction applied to the end face detection range for each detected whole-face position is set in advance.

When several faces are to be included in the shooting area, as in a commemorative photograph, those faces are often concentrated on one side of the shooting area. With this in mind, the end face detection range is set (restricted) according to the position of the whole face detected beforehand, as described above. This shortens the processing time and also suppresses false detection and over-detection of end faces.
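
A minimal sketch of such a preset rule is shown below. The rule "left half keeps region 401, right half keeps region 402" and the default inclusion of the regions 403 and 404 are assumptions for illustration; the source only states that the restriction is set in advance and gives the left-side case as an example.

```python
def regions_for_position(face_center_x, image_width, include_top_bottom=True):
    """Return the set of regions (401..404) to scan for end faces."""
    regions = set()
    if face_center_x < image_width / 2:
        regions.add(401)  # whole face on the left: scan the left edge only
    else:
        regions.add(402)  # whole face on the right: scan the right edge only
    if include_top_bottom:
        regions.update({403, 404})  # optional according to the source
    return regions
```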

[Third restriction example]
Next, the third restriction example will be described. When the third restriction example or the fourth restriction example described later is used, the whole-face detection processing unit 31 is configured so that it can also detect the orientation of a whole face. In this case, the whole-face detection processing unit 31 can distinguish whether a face detected in the input image is a frontal face (a face seen from the front) or a profile (a face seen from the right or left side). Various methods of detecting face orientation have been proposed, and the whole-face detection processing unit 31 may adopt any of them.

For example, as in the method described in Japanese Patent Laid-Open No. 10-307923, facial parts such as the eyes, nose and mouth are found in the input image one after another to detect the position of the face in the image, and the face orientation is detected based on projection data of the facial parts. Alternatively, the method described in Japanese Patent Laid-Open No. 2006-72770 may be used, for example. This method is discussed again later.

In the third restriction example, the end face detection range, which serves as a detection condition, is restricted based on the orientation of the whole face detected beforehand by the whole-face detection processing unit 31. For example, when the whole face 430 detected in the input image 400 faces right as shown in FIG. 20, the region 402 is included in the end face detection range while the region 401 is excluded from it. The determination area for end face detection is then scanned within the region 402 but not within the region 401, so no end face detection is performed on the region 401. Whether the regions 403 and 404 are included in the end face detection range in this case is optional. The restriction applied to the end face detection range for each detected whole-face orientation is set in advance.

When the whole face contained in the input image faces right, any end face that exists is presumed likely to lie toward the right side of the composite image. With this in mind, the end face detection range is set (restricted) according to the orientation of the whole face detected beforehand, as described above. This shortens the processing time and also suppresses false detection and over-detection of end faces.
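
The orientation-based restriction can be sketched in the same style. Only the right-facing case comes from the source; treating a left-facing whole face as the mirror image, leaving both side regions active for a frontal face, and the string labels for orientations are assumptions.

```python
def regions_for_orientation(orientation, include_top_bottom=True):
    """Return the set of regions (401..404) to scan for end faces.

    orientation: "right", "left" or "front" (hypothetical labels).
    """
    regions = {401, 402}
    if orientation == "right":
        regions.discard(401)  # right-facing whole face: skip the left edge
    elif orientation == "left":
        regions.discard(402)  # mirrored rule, assumed for illustration
    if include_top_bottom:
        regions |= {403, 404}  # optional according to the source
    return regions
```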

[Fourth restriction example]
Next, the fourth restriction example will be described. When the fourth restriction example is used, the end face detection processing unit 32 is configured so that it can also detect the orientation of an end face. In this case, the end face detection processing unit 32 can distinguish whether an end face detected in the composite image is a frontal face (an end face seen from the front) or a profile (an end face seen from the right or left side). The orientation of an end face can be detected by the same method as that used in the whole-face detection processing unit 31.

In the fourth restriction example, the orientation of the end faces to be detected is restricted based on the orientation of the whole face detected beforehand by the whole-face detection processing unit 31. In this example, the restriction on end face orientation corresponds to the detection condition used when detecting an end face.

For example, when the whole face 440 detected in the input image 400 faces right as shown in FIG. 21, the right-facing end face 441 is detected while the left-facing end face 442 is not (the end face 442 is not recognized as an end face). Whether an end face facing the front is detected in this case is optional. The restriction applied to the orientation of the end faces to be detected for each detected whole-face orientation is set in advance.

When the whole face contained in the input image faces right, a left-facing end face that may be contained in the composite image is likely to be the face of a person in whom the photographer is not interested. With this in mind, the orientation of the end faces to be detected is set (restricted) according to the orientation of the whole face detected beforehand, as described above. This shortens the processing time and also suppresses false detection and over-detection of end faces.

The fourth restriction example can also be combined with the third restriction example described above. For example, when the whole face 440 detected in the input image 400 faces right as shown in FIG. 21, the left-facing end face 442 located in the region 401 is not detected, whereas a left-facing end face (not shown) located in the region 402 can still be detected.
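
The orientation filter and its combination with the third restriction example can be sketched as below. The function name, the string labels, the mirrored handling of a left-facing whole face and the decision to always keep frontal end faces are assumptions; the source leaves these choices to a preset rule.

```python
def keep_end_face(end_orientation, region, whole_orientation,
                  combine_with_third=False):
    """Decide whether a candidate end face is accepted.

    end_orientation, whole_orientation: "right", "left" or "front".
    region: 401 (left edge) or 402 (right edge) where the candidate lies.
    """
    opposite = {"right": "left", "left": "right"}
    if end_orientation != opposite.get(whole_orientation):
        return True  # same direction or frontal: accept
    if combine_with_third:
        # Reject the opposite-facing candidate only in the region behind
        # the whole face (401 when the whole face looks right).
        far_region = 401 if whole_orientation == "right" else 402
        return region != far_region
    return False  # plain fourth example: reject every opposite-facing face
```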

[Fifth restriction example]
Next, the fifth restriction example will be described. The fifth restriction example uses the tilt data from the tilt sensor 18 of FIG. 1. The tilt data specifies whether the imaging apparatus 1 is currently in the horizontal shooting state or the vertical shooting state.

The end face detection processing unit 32b restricts the end face detection range based on this tilt data. The input image 400 shown in FIGS. 19 to 21 is an original image shot in the horizontal shooting state, or a reduced image of it. Accordingly, based on the tilt data at the time the original image was shot (tilt data indicating the horizontal shooting state), the end face detection processing unit 32b includes the regions 401 and 402 in the end face detection range while excluding the regions 403 and 404 from it. This is because, in the horizontal shooting state, a face is relatively unlikely to be cut off at the vertical ends of the image. The determination area for end face detection is then scanned within the regions 401 and 402 but not within the regions 403 and 404, so no end face detection is performed on the regions 403 and 404.

In contrast, the input image 400 shown in FIG. 22 is an original image shot in the vertical shooting state, or a reduced image of it. In this case, based on the tilt data at the time the original image was shot (tilt data indicating the vertical shooting state), the end face detection processing unit 32b includes the regions 403 and 404 in the end face detection range while excluding the regions 401 and 402 from it. This is because, in the vertical shooting state, a face is relatively unlikely to be cut off at the horizontal ends of the image. The determination area for end face detection is then scanned within the regions 403 and 404 but not within the regions 401 and 402, so no end face detection is performed on the regions 401 and 402.

The fifth restriction example likewise shortens the processing time and suppresses false detection and over-detection of end faces.
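
The mapping from tilt data to the end face detection range described above can be written as a small lookup. The string labels for the two shooting states and the function name are hypothetical; the region numbers follow FIG. 19.

```python
def regions_for_tilt(shooting_state):
    """Return the regions to scan for end faces for a given shooting state."""
    if shooting_state == "horizontal":
        return {401, 402}  # left and right edges only
    if shooting_state == "vertical":
        return {403, 404}  # regions kept in the vertical shooting state
    raise ValueError(f"unknown shooting state: {shooting_state!r}")
```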

[Sixth restriction example]
Next, the sixth restriction example will be described. Before the specifics, the method of detecting a tilted whole face or end face is explained. In all of the embodiments, the whole-face detection processing unit (31) and the end face detection processing unit (32, 32a or 32b) can also detect a tilted whole face and a tilted end face, respectively, as shown in FIG. 23. To detect a tilted whole face, the input image is rotated by the required amount and the whole-face detection processing described above (the processing for detecting a whole face) is performed on the rotated input image. A tilted end face is detected in the same way, by performing the end face detection processing described above (the processing for detecting an end face) on the rotated input image. The sixth restriction example is described below with a focus on end faces, but the same processing applies to whole faces.

To detect end faces with various tilt angles, the input image is, for example, rotated in steps of a predetermined angle, and the whole face detection processing and the end face detection processing described above are performed on each rotated input image. In the sixth restriction example, the angle range over which the input image is rotated is restricted based on the tilt data described above. That is, based on the tilt data, end faces whose tilt angle lies within a predetermined tilt angle range remain detectable, while end faces whose tilt angle lies outside that range are excluded from the detection targets. The restriction applied to the angle range (or tilt angle range) as a detection condition for each value of the tilt data may be determined in advance.

A specific example follows. When the tilt data recorded at the time the original image was captured indicates the horizontal shooting state, an end face 451 such as that shown in FIG. 24(a), with the chin on the region 403 side and the eyes on the region 404 side of the composite image, is unlikely to exist. Therefore, when the tilt data indicates the horizontal shooting state, the angle range described above is restricted so that end faces such as the end face 451 are not detection targets.

Similarly, when the tilt data recorded at the time the original image was captured indicates the vertical shooting state, an end face 452 such as that shown in FIG. 24(b), with the chin on the region 402 side and the eyes on the region 401 side of the composite image, is unlikely to exist. Therefore, when the tilt data indicates the vertical shooting state, the angle range described above is restricted so that end faces such as the end face 452 are not detection targets.

The sixth restriction example also shortens the processing time and suppresses false detection and over-detection of end faces.
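The angle restriction of the sixth restriction example can be pictured as a lookup table that is fixed in advance and consulted with the tilt data. The following Python sketch is only an illustration under stated assumptions: the table name EXCLUDED_ANGLES, the function rotation_angles, the state labels, and in particular the numeric intervals are hypothetical placeholders, since the description gives no concrete angle values.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: for each tilt state, the half-open interval
# [start, end) of rotation angles in degrees that is skipped.
# The numeric intervals are placeholders, not values from the description.
EXCLUDED_ANGLES: Dict[str, Tuple[int, int]] = {
    "horizontal": (135, 226),
    "vertical": (45, 136),
}


def rotation_angles(tilt: str, step: int = 15) -> List[int]:
    """Return the rotation angles (degrees) to try for the given tilt state.

    Candidate angles are generated in fixed steps over a full turn, and any
    angle inside the interval excluded for this tilt state is dropped.
    """
    start, end = EXCLUDED_ANGLES[tilt]
    return [a for a in range(0, 360, step) if not (start <= a < end)]
```

Detection would then be run once per returned angle on the correspondingly rotated input image, so fewer angles means fewer scans.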

[Seventh restriction example]
Next, the seventh restriction example will be described. In the seventh restriction example, end face detection is performed only in the direction from which a human voice is heard. To this end, a plurality of directional microphones are mounted on the imaging device 1 of FIG. 1. These directional microphones are installed on the imaging device 1 so that their directions of high sensitivity differ from one another.

A specific example is given with reference to FIG. 25. FIG. 25 shows the imaging device 1, provided with directional microphones 51 and 52, as viewed from above (from the sky side). In FIG. 25, reference numeral 501 denotes the region in which the directivity of the directional microphone 51 is high, and reference numeral 502 denotes the region in which the directivity of the directional microphone 52 is high. The directional microphone 51 collects sound arriving from the left side of the imaging device 1, and the directional microphone 52 collects sound arriving from the right side of the imaging device 1. In other words, on the two-dimensional plane in which the imaging device 1 is viewed from above, and taking the center of the imaging device 1 as the reference, the directional microphone 51 has relatively high sensitivity to sound arriving from the left and relatively low sensitivity to sound arriving from the right, whereas the directional microphone 52 has relatively high sensitivity to sound arriving from the right and relatively low sensitivity to sound arriving from the left.

The main control unit 13 of FIG. 1 also functions as an audio signal processing unit. The audio signals representing the sounds input to the directional microphones 51 and 52 are supplied to the main control unit 13. Based on each audio signal, and using known speech recognition processing, the main control unit 13 determines whether the sound input to the directional microphone 51 contains a human voice and whether the sound input to the directional microphone 52 contains a human voice. The result of this determination is called the voice determination result.

FIG. 25 also shows the imaging device 1 in the horizontal shooting state as viewed from above (from the sky side). Accordingly, a voice uttered by a person located on the region 401 side of the composite image (see FIG. 19 and others) is collected by the directional microphone 51, and a voice uttered by a person located on the region 402 side of the composite image is collected by the directional microphone 52. The seventh restriction example uses this property to restrict the end face detection range based on the voice determination result.

For example, when it is determined that the sound input to the directional microphone 51 contains a human voice and the sound input to the directional microphone 52 does not, a person is presumed to be on the left side of the imaging device 1, so region 401 is included in the end face detection range while region 402 is excluded from it. Conversely, when it is determined that the sound input to the directional microphone 51 does not contain a human voice and the sound input to the directional microphone 52 does, a person is presumed to be on the right side of the imaging device 1, so region 402 is included in the end face detection range while region 401 is excluded from it. In these cases, whether regions 403 and 404 are included in the end face detection range is optional. The restriction applied to the end face detection range for each voice determination result is set in advance. Although the restriction processing has been described for the horizontal shooting state, the same processing is possible in the vertical shooting state with the obvious modifications.

The seventh restriction example also shortens the processing time and suppresses false detection and over-detection of end faces.
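The mapping from the voice determination result to the end face detection range in the seventh restriction example can be sketched as follows. This is a minimal Python illustration, not the device's implementation: the function name end_face_regions and the flag include_403_404 are hypothetical, and the handling of the cases where both or neither microphone picks up a voice is an assumption, because the description covers only the two one-sided cases.

```python
from typing import Set


def end_face_regions(voice_on_mic51: bool, voice_on_mic52: bool,
                     include_403_404: bool = True) -> Set[int]:
    """Select the regions scanned for end faces from the voice determination result.

    Region numbers follow the description: microphone 51 corresponds to the
    region 401 side and microphone 52 to the region 402 side.
    """
    regions: Set[int] = set()
    if voice_on_mic51 and not voice_on_mic52:
        regions.add(401)  # person presumed on the left; region 402 is excluded
    elif voice_on_mic52 and not voice_on_mic51:
        regions.add(402)  # person presumed on the right; region 401 is excluded
    else:
        # Assumption: the description does not cover "both" or "neither",
        # so no restriction is applied to regions 401 and 402 here.
        regions.update({401, 402})
    if include_403_404:
        # The description leaves inclusion of regions 403 and 404 optional.
        regions.update({403, 404})
    return regions
```

Making the inclusion of regions 403 and 404 a flag mirrors the statement that this choice is optional and set in advance.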

<<Fourth Embodiment>>
The technique described in JP-A-2006-72770 can be applied to the whole face detection processing unit (31) and the end face detection processing unit (32, 32a, or 32b) of the first to third embodiments. This is described as the fourth embodiment.

In this technique, one frontal face is treated as a left half (hereinafter, the left face) and a right half (hereinafter, the right face), and parameters for the left face and parameters for the right face are generated in advance through learning processing. At face detection time, the determination area is divided into left and right parts to define a left determination area and a right determination area, and two similarities are calculated: one based on the left determination area and the left-face parameters, and one based on the right determination area and the right-face parameters. When one or both of the similarities are equal to or greater than a threshold, the determination area is judged to be a face region. In addition, the orientation of the face is detected from the magnitude relationship between the two similarities.

When this technique is applied to the first to third embodiments, the left-face parameters and the right-face parameters are generated as weight tables. That is, a left-face weight table representing the left-face parameters and a right-face weight table representing the right-face parameters are prepared as the weight tables stored in the face dictionary memory 34 of FIG. 1.

Then, corresponding to the division of the determination area into the left determination area and the right determination area, the feature pixels F(n) are classified into left-face feature pixels and right-face feature pixels. The sum of the scores w(n) calculated from the pixel value i(n) of each left-face feature pixel F(n) and the left-face weight table is taken as the left-face similarity S_L, and the sum of the scores w(n) calculated from the pixel value i(n) of each right-face feature pixel F(n) and the right-face weight table is taken as the right-face similarity S_R.

The whole face detection processing unit determines whether the inequality "S_L > TH_B" and the inequality "S_R > TH_B" hold. If one or both hold, it judges that a whole face exists in the determination area of interest; if neither holds, it judges that no whole face exists in that determination area. Here, TH_B is a predetermined threshold.

The end face detection processing unit determines, for example, whether the inequality "S_L > TH_B - α_L" and the inequality "S_R > TH_B - α_R" hold. If one or both hold, it judges that an end face exists in the determination area of interest; if neither holds, it judges that no end face exists in that determination area. The threshold correction term α_L is the sum of the scores w(n) calculated from the pixel values i(n) corresponding to the overlapping region that belongs to the left determination area. The threshold correction term α_R is the sum of the scores w(n) calculated from the pixel values i(n) corresponding to the overlapping region that belongs to the right determination area.
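The two decisions of the fourth embodiment reduce to a pair of threshold comparisons, which the following Python sketch restates. The function names and the plain-float interface are assumptions made for illustration; how the scores w(n) are looked up from the weight tables is outside this sketch.

```python
from typing import Sequence


def similarity(scores: Sequence[float]) -> float:
    """Sum the scores w(n) of one half of the determination area (S_L or S_R)."""
    return sum(scores)


def is_whole_face(s_l: float, s_r: float, th_b: float) -> bool:
    """A whole face is judged present if S_L > TH_B or S_R > TH_B (or both)."""
    return s_l > th_b or s_r > th_b


def is_end_face(s_l: float, s_r: float, th_b: float,
                alpha_l: float, alpha_r: float) -> bool:
    """An end face is judged present if S_L > TH_B - alpha_L or S_R > TH_B - alpha_R.

    alpha_l and alpha_r are the threshold correction terms: the sums of the
    scores contributed by the overlapping region in the left and right
    determination areas, respectively.
    """
    return s_l > th_b - alpha_l or s_r > th_b - alpha_r
```

When neither half overlaps the dummy image, both correction terms are derived from an empty overlapping region, and the end face test uses the same threshold as the whole face test.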

In addition, by applying the technique described in JP-A-2006-72770, the orientation of a detected whole face or end face may be determined from the magnitude relationship between S_L and S_R.

<<Modifications, etc.>>
Notes 1 to 4 below describe modifications of, or remarks on, the embodiments described above. The contents of the notes may be combined in any way as long as no contradiction arises.

[Note 1]
In each of the embodiments described above, the same input image is supplied to the whole face detection processing unit (31) and the end face detection processing unit (32, 32a, or 32b), and the face detection result of the whole face detection processing unit for that input image (the whole face detection result) is reflected in the end face detection processing performed on the same input image by the end face detection processing unit. That is, whole face detection processing is performed on the captured image of a given frame, and its result is reflected in the end face detection processing for the captured image of the same frame.

However, whole face detection processing may instead be performed on the captured image of a first frame, with the whole face detection result reflected in the end face detection processing for the captured image of a second frame. The second frame is a frame later than the first frame; typically, for example, it is the frame immediately following the first frame.

This also makes the following processing possible. Based on the result of the whole face detection processing for the captured image of the first frame, the conditions of the end face detection processing for the captured image of the second frame are set, and a composite image based on the captured image of the second frame is generated. Here, the "conditions of the end face detection processing" refer to all conditions that must be set in order to perform end face detection, including the image addition width W_dum described in the first embodiment (see FIG. 10) and the detection conditions for end face detection described in the third embodiment. The determination area is then scanned once over the composite image corresponding to the second frame, for both the whole face detection processing and the end face detection processing. That is, instead of scanning the determination area separately for whole face detection and for end face detection, the scans for both are performed in a single pass over the composite image corresponding to the second frame. This eliminates the need for a scan dedicated to end face detection.
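The frame-to-frame flow of Note 1 can be sketched as a loop in which each frame's whole face result supplies the end face detection conditions for the next frame. In this Python sketch the three callables (detect_whole, conditions_from, scan_combined) are hypothetical stand-ins for the processing units, and carrying the scheme beyond the second frame is an assumption; the note itself describes only a first and a second frame.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Faces = List[Any]


def process_frames(
    frames: Iterable[Any],
    detect_whole: Callable[[Any], Faces],
    conditions_from: Callable[[Faces], Any],
    scan_combined: Callable[[Any, Any], Tuple[Faces, Faces]],
) -> List[Tuple[Faces, Faces]]:
    """Return (whole faces, end faces) per frame.

    The first frame gets whole face detection only. Every later frame is
    scanned once for both kinds of face, using conditions (for example the
    image addition width) derived from the previous frame's whole faces.
    """
    results: List[Tuple[Faces, Faces]] = []
    conditions: Optional[Any] = None
    for frame in frames:
        if conditions is None:
            whole, ends = detect_whole(frame), []
        else:
            # Single pass over the composite image built under `conditions`.
            whole, ends = scan_combined(frame, conditions)
        results.append((whole, ends))
        conditions = conditions_from(whole)
    return results
```

The trade-off is that the conditions lag by one frame, which is acceptable when consecutive frames differ little.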

[Note 2]
The specific numerical values given in the description above are merely examples and can, of course, be changed to various other values.

[Note 3]
The imaging device in each embodiment can be realized by hardware or by a combination of hardware and software. In particular, the functions of the parts shown in FIG. 2, FIG. 17, or FIG. 18 (excluding the memory) can be realized by hardware, by software, or by a combination of hardware and software, and each of those functions can also be realized outside the imaging device.

When the imaging device is configured using software, a block diagram of a part realized by software represents a functional block diagram of that part. All or some of the functions of the parts shown in FIG. 2, FIG. 17, or FIG. 18 may also be written as a program, and all or some of those functions may be realized by executing the program on a program execution device (for example, a computer).

[Note 4]
In each of the embodiments described above, the end face detection processing unit (32, 32a, or 32b) shown in FIG. 2 and elsewhere functions as end face detection means. This end face detection means comprises similarity calculating means for calculating the similarity described above (S₂ and the like) and determination threshold setting means for setting the determination threshold with which the similarity is compared. This determination threshold corresponds to (TH - α) and the like described above.

FIG. 1 is an overall block diagram of an imaging device according to a first embodiment of the present invention.
FIG. 2 is an internal block diagram of the face detection unit of FIG. 1.
FIG. 3 shows how the determination area is scanned within the input image to the whole face detection processing unit of FIG. 2.
FIG. 4 is a flowchart showing the operating procedure of the face detection processing (whole face detection processing) performed by the whole face detection processing unit of FIG. 2.
FIG. 5 shows the differential filters used to generate first edge-enhanced images in four directions from the image in the determination area.
FIG. 6 shows the first edge-enhanced images in four directions and the second edge-enhanced images in four directions generated from the image in the determination area.
FIG. 7 shows the feature images in four directions generated from the image in the determination area.
FIG. 8 shows an example of the contents of the weight table stored in the face dictionary memory of FIG. 2.
FIG. 9 shows end faces that may exist in an input image.
FIG. 10 shows a composite image generated by adding a dummy image to the input image.
FIG. 11 shows that an overlapping region exists between the determination area and the dummy image.
FIG. 12 shows the locus of the center coordinates of the determination area scanned within the composite image.
FIG. 13 is a flowchart showing the operating procedure of the face detection processing (end face detection processing) performed by the end face detection processing unit of FIG. 2.
FIG. 14 illustrates how the end face detection processing unit of FIG. 2 calculates the threshold correction term.
FIG. 15 illustrates how the end face detection processing unit of FIG. 2 calculates the threshold correction term.
FIG. 16 illustrates the operation of the end face detection processing unit according to a second embodiment of the present invention.
FIG. 17 is an internal block diagram of the end face detection processing unit according to the second embodiment of the present invention.
FIG. 18 is an internal block diagram of the face detection unit according to a third embodiment of the present invention.
FIG. 19 illustrates the relationship between the input image and the end face detection range according to the third embodiment of the present invention.
FIG. 20 illustrates the relationship between the input image and the end face detection range according to the third embodiment of the present invention.
FIG. 21 illustrates the relationship between the orientation of a detected whole face and the orientation of the end faces to be detected according to the third embodiment of the present invention.
FIG. 22 illustrates the relationship between the tilt of the imaging device and the end face detection range according to the third embodiment of the present invention.
FIG. 23 shows a tilted whole face and a tilted end face according to the third embodiment of the present invention.
FIG. 24 shows tilted end faces that are excluded from the detection targets according to the third embodiment of the present invention.
FIG. 25 shows the imaging device provided with directional microphones, as viewed from above (from the sky side), according to the third embodiment of the present invention.

Explanation of symbols

1 Imaging device
11 Imaging unit
19 Face detection unit
30 Image input unit
31 Whole face detection processing unit
32, 32a, 32b End face detection processing unit
33 Face detection result integration unit
34 Face dictionary memory

Claims

A face detection device that performs face detection, comprising end face detection means for detecting an end face, in which a part of a face lies on an edge of an input image, by adding a dummy image around the input image and performing face detection beyond the edge of the input image.

The face detection device according to claim 1, wherein the end face detection means defines a determination area in a composite image of the input image and the dummy image and comprises:
similarity calculating means for calculating a similarity between the image in the determination area and predetermined dictionary data; and
determination threshold setting means for setting a determination threshold,
and determines whether or not the end face is present in the determination area by comparing the similarity with the determination threshold.

The face detection device according to claim 1, further comprising face detection means for detecting a face present in the input image, wherein the end face detection means changes a detection condition for the end face according to a face detection result of the face detection means.

4. The face detection device, wherein the end face detection means changes the detection condition for the end face based on at least one of a size, a position, and an orientation of the face detected by the face detection means.

The face detection device according to claim 1, wherein the face detection device is mounted on an imaging device having imaging means, the input image is generated from a captured image of the imaging means, and the end face detection means changes a detection condition for the end face according to a tilt of the imaging device at the time the captured image was shot.

A face detection device comprising face detection means for detecting a face present in an input image, and further comprising end face detection means for detecting an end face in which a part of a face lies on an edge of the input image, wherein the end face detection means changes a detection condition for the end face according to a face detection result of the face detection means.

A face detection device mounted on an imaging device having imaging means, the face detection device receiving an input image generated from a captured image of the imaging means and comprising end face detection means for detecting an end face in which a part of a face lies on an edge of the input image, wherein the end face detection means changes a detection condition for the end face according to a tilt of the imaging device at the time the captured image was shot.

An imaging device comprising: imaging means; and the face detection device according to claim 1, wherein the input image to the face detection device is generated based on a captured image of the imaging means.

A face detection method for performing face detection, wherein an end face, in which a part of a face lies on an edge of an input image, is detected by adding a dummy image around the input image and performing face detection beyond the edge of the input image.