JP2009017194A - Object detecting device and method therefor, and imaging apparatus

Info

Publication number: JP2009017194A
Application number: JP2007176214A
Authority: JP
Inventors: Kazuhiro Kojima; 和浩小島; Masahiko Yamada; 晶彦山田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2007-07-04
Filing date: 2007-07-04
Publication date: 2009-01-22

Abstract

PROBLEM TO BE SOLVED: To suppress interruptions and fluctuations in face detection.

SOLUTION: A face detection device includes a first detection part that provisionally detects the face position in each frame image and a second detection part that detects the final face position. When the face position in the current frame image is to be detected, a center view area (220) and a peripheral view area (230) surrounding the area (220) are set with the face position in the preceding frame image as a reference. The center view area and the peripheral view area are defined on both the preceding and the current frame images, and image changes between the two frame images within each area are detected, for example by calculating motion vectors. The second detection part finally detects the face position in the current frame on the basis of the first detection part's results for the current and preceding frames and the detected image changes.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an object detection device and an object detection method for detecting an object, such as a human face, in an image. The present invention also relates to an imaging apparatus that uses them.

Imaging apparatuses with a face detection function that detects a face in an image are in practical use, and various kinds of camera control using face detection results have been proposed. For example, the face detection result can be used in image compression. In this type of compression, more code is allocated to the face portion than to other portions so that the resolution of the face portion stays relatively high (see, for example, Japanese Patent No. 3748717).

Camera control that relies on face detection results requires stable face detection, meaning detection with no (or few) interruptions and fluctuations. Suppose the image compression described above is performed using the face detection result. If face detection is interrupted (that is, detection fails although a face is actually present in the image), the resolution of the face portion abruptly rises and falls. This is a bigger problem than shooting with the face portion left at low resolution. Similar problems arise when autofocus control, automatic exposure control, or auto white balance control is executed using the face detection result. As these circumstances show, stable face detection is important.

However, face detection does not succeed under all conditions, and in practice it is sometimes unstable. For example, a person's face can point in any direction, but the range of face orientations that can be detected is limited, and detection may fail near that limit. Detection may also fail because of noise, image blur caused by face movement, an abrupt change in the light source, or occlusion. In addition, the regions searched for a face are sometimes thinned out to reduce the computational cost of face detection, in which case the detected face position may fluctuate. Camera control based on such unstable detection results causes the problems described above.

Patent Document 1 below discloses a technique that, when face detection becomes impossible during continuous shooting, controls the shooting operation based on a face detection result obtained within a predetermined time. With this technique, however, if the face remains stationary for longer than the predetermined time, the face detection result needed for shooting control is lost.

Patent Document 1: JP 2006-157617 A
Non-patent document: Hironobu Fujiyoshi, "Video Understanding Technology and its Applications", [online], Chubu University Faculty of Engineering, Department of Information Engineering, [retrieved May 25, 2007], Internet <URL: http://www.vision.cs.chubu.ac.jp/VU/pdf/VU.pdf>

The conventional problem has been described for the case where the object to be detected is a face, but the same problem arises when detecting objects other than faces.

Accordingly, an object of the present invention is to provide an object detection device, an object detection method, and an imaging apparatus that contribute to stable object detection.

A first object detection device according to the present invention detects the position of a specific type of object in a moving image based on the moving image, and is characterized in that it detects the position of the object by also taking into account image changes around the object in the moving image.

Image changes around the object make it possible to infer the movement of the object itself or of its surroundings, and the object position can be detected well based on that inference. The above configuration is therefore expected to yield stable object detection.

A second object detection device according to the present invention detects the position of a specific type of object in a moving image based on the moving image, and comprises: first detection means for provisionally detecting the position of the object in the moving image; and second detection means for finally detecting the position of the object in the moving image based on the detection result of the first detection means and on image changes within a reference area that contains the object and within a peripheral area located around the reference area.

The position of an object, once detected, can sometimes be corrected to a more appropriate position based on the above image changes. The device is therefore configured as described above. This, too, is expected to yield stable object detection.

Specifically, for example, the moving image is formed from a plurality of input images arranged in time series, and the first detection means provisionally detects the position of the object in each input image. The second detection means comprises: area setting means for setting the reference area and the peripheral area for the current input image based on the position of the object detected by the first detection means for a past input image; and image change detection means for defining the reference area and the peripheral area set for the current input image on both the current and past input images, and detecting the image changes within the reference area and the peripheral area between the current and past input images as the image changes for the current input image. The second detection means finally detects the position of the object in the current input image based on the detection results of the first detection means for the past and current input images and on the image changes for the current input image detected by the image change detection means.
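The area setting and image change detection described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `margin` parameter, and the box representation are hypothetical, and the image change is measured here as a mean absolute pixel difference rather than the motion vectors mentioned in the abstract.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image, indexed as frame[y][x]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def set_areas(prev_face: Box, margin: int, width: int, height: int) -> Tuple[Box, Box]:
    """Return (reference_box, outer_box) derived from the past frame's face box.

    The peripheral area is the part of outer_box that lies outside reference_box.
    `margin` (hypothetical) is the ring thickness in pixels.
    """
    l, t, r, b = prev_face
    outer = (max(0, l - margin), max(0, t - margin),
             min(width, r + margin), min(height, b + margin))
    return prev_face, outer


def inside(box: Box, x: int, y: int) -> bool:
    l, t, r, b = box
    return l <= x < r and t <= y < b


def area_changes(prev: Frame, cur: Frame, reference: Box, outer: Box) -> Tuple[float, float]:
    """Mean absolute difference between the two frames, computed separately
    for the reference area and for the peripheral ring around it."""
    sums = [0, 0]
    counts = [0, 0]
    l, t, r, b = outer
    for y in range(t, b):
        for x in range(l, r):
            k = 0 if inside(reference, x, y) else 1
            sums[k] += abs(cur[y][x] - prev[y][x])
            counts[k] += 1
    ref_change, peri_change = (s / c if c else 0.0 for s, c in zip(sums, counts))
    return ref_change, peri_change
```

The same two boxes are applied to both frames, which mirrors the statement that the areas set for the current input image are defined on the current and past input images alike.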

Also, for example, the peripheral area is set so as to surround the entire circumference of the reference area, and the image change detection means detects the image change for the peripheral area after excluding a part of the peripheral area.

For example, when the object to be detected is a person's face, excluding the area corresponding to the body is expected to further stabilize object detection.
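As a rough illustration of excluding the body from the peripheral area, the sketch below lists the peripheral pixels around a face box and drops the strip below the face. Treating everything below the face box as the body presumes an upright face and is an assumption made here, not something the text specifies; the function name and parameters are hypothetical, and clipping to the image bounds is omitted for brevity.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def peripheral_pixels(reference: Box, margin: int,
                      exclude_body: bool = True) -> List[Tuple[int, int]]:
    """Pixels of the ring around `reference`, optionally without the body strip."""
    l, t, r, b = reference
    pixels = []
    for y in range(t - margin, b + margin):
        for x in range(l - margin, r + margin):
            in_reference = l <= x < r and t <= y < b
            in_body = exclude_body and y >= b  # assumed body region: rows below the face
            if not in_reference and not in_body:
                pixels.append((x, y))
    return pixels
```

Image changes caused by the torso, which tends to move differently from the background, then no longer count toward the peripheral change.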

Alternatively, for example, the peripheral area may be set so as to surround only a part of the circumference of the reference area.

Also, for example, the peripheral area may be hierarchized so as to include a first peripheral area relatively close to the reference area and a second peripheral area relatively far from the reference area, and the second detection means may finally detect the position of the object based on the detection result of the first detection means and on the image changes within the reference area and the first and second peripheral areas.
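One way to picture the hierarchized peripheral area is to assign each pixel a level according to its distance from the reference box. The sketch below is an assumption-laden illustration: the function name, the two margin parameters, and the use of the Chebyshev (chessboard) distance are choices made here, not taken from the text.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def area_level(reference: Box, x: int, y: int,
               near_margin: int, far_margin: int) -> Optional[int]:
    """Return 0 for the reference area, 1 for the first (near) peripheral area,
    2 for the second (far) peripheral area, or None outside all areas."""
    l, t, r, b = reference
    dx = max(l - x, 0, x - (r - 1))  # horizontal distance to the box, 0 if inside
    dy = max(t - y, 0, y - (b - 1))  # vertical distance to the box, 0 if inside
    d = max(dx, dy)
    if d == 0:
        return 0
    if d <= near_margin:
        return 1
    if d <= far_margin:
        return 2
    return None
```

Image changes can then be accumulated per level, giving the second detection means one change value for each of the three areas.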

More specifically, for example, the specific type of object is a human face.

An imaging apparatus according to the present invention comprises imaging means for acquiring a moving image of a subject, and any one of the object detection devices described above, which receives the moving image obtained by the imaging means.

An object detection method according to the present invention detects the position of a specific type of object in a moving image based on the moving image, and is characterized in that it detects the position of the object by also taking into account image changes around the object in the moving image.

According to the present invention, it is possible to provide an object detection device, an object detection method, and an imaging apparatus that contribute to stable object detection.

The significance and effects of the present invention will become clearer from the following description of embodiments. However, the following embodiments are merely examples of the present invention, and the meanings of the terms of the present invention and of its constituent elements are not limited to those described in the following embodiments.

Embodiments of the present invention are described below in detail with reference to the drawings. In the referenced drawings, identical parts are given identical reference numerals, and duplicate descriptions of identical parts are omitted in principle. The first to fourth examples are described later; matters common to the examples, or referred to in them, are described first.

FIG. 1 is an overall block diagram of an imaging apparatus 1 according to an embodiment of the present invention. The imaging apparatus 1 of FIG. 1 is a digital still camera capable of capturing and recording still images, or a digital video camera capable of capturing and recording still images and moving images.

The imaging apparatus 1 includes an imaging unit 11, an AFE (Analog Front End) 12, a main control unit 13, an internal memory 14, a display unit 15, a recording medium 16, an operation unit 17, and a face detection unit 18.

The imaging unit 11 has an optical system, an aperture, an image sensor such as a CCD (Charge Coupled Devices) or CMOS (Complementary Metal Oxide Semiconductor) image sensor, and a driver for controlling the optical system and the aperture (none of which are shown). Based on control signals from the main control unit 13, the driver controls the zoom magnification and focal length of the optical system and the opening of the aperture. The image sensor photoelectrically converts the optical image of the subject that enters through the optical system and the aperture, and outputs the resulting electrical signal to the AFE 12.

The AFE 12 amplifies the analog signal output from the imaging unit 11 (image sensor) and converts the amplified analog signal into a digital signal. The AFE 12 sequentially outputs this digital signal to the main control unit 13.

The main control unit 13 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like, and also functions as a video signal processing unit. Based on the output signal of the AFE 12, the main control unit 13 generates a video signal representing the image captured by the imaging unit 11 (hereinafter also called a "captured image" or "frame image"). The main control unit 13 also functions as display control means that controls the display content of the display unit 15, and performs the control necessary for display on the display unit 15.

The internal memory 14 is formed of SDRAM (Synchronous Dynamic Random Access Memory) or the like and temporarily stores various data generated within the imaging apparatus 1. The display unit 15 is a display device such as a liquid crystal display panel and, under the control of the main control unit 13, displays the image captured in the immediately preceding frame, images recorded on the recording medium 16, and so on. The recording medium 16 is a non-volatile memory such as an SD (Secure Digital) memory card and stores captured images and the like under the control of the main control unit 13. The recording medium 16 may of course be formed using a recording medium other than an SD memory card (for example, an HDD (Hard Disk Drive) or a DVD (Digital Versatile Disc)). The operation unit 17 accepts operations from outside, and the content of each operation on the operation unit 17 is conveyed to the main control unit 13.

The face detection unit 18 detects a person's face in each captured image and conveys information representing the face detection result to the main control unit 13. The main control unit 13 performs various kinds of processing using the face detection result of the face detection unit 18. For example, it performs autofocus control so that the face detected by the face detection unit 18 is in focus. Alternatively, it performs exposure control (control of the aperture opening and of the exposure time of the image sensor) so that the exposure for the detected face is optimal. Alternatively, it performs auto white balance control on the image of the detected face. In addition, when a moving image is captured and recorded, each frame image forming the moving image is compressed by a predetermined compression method before being recorded on the recording medium 16, and the face detection result of the face detection unit 18 can also be used in this compression. For example, more code can be allocated to the face portion than to other portions.

The operation modes of the imaging apparatus 1 include a shooting mode, in which still images or moving images can be captured and recorded, and a playback mode, in which still images or moving images recorded on the recording medium 16 are played back on the display unit 15. Transitions between the modes occur in response to operations on the operation unit 17. In the shooting mode, the imaging unit 11 captures images sequentially at a predetermined frame period (for example, 1/60 second). Unless otherwise stated, the following description concerns operation in the shooting mode.

Assume that, each time the frame period elapses, the first, second, third, ..., (n-2)th, (n-1)th, and nth frames arrive in this order (n is an integer of 3 or more), and call the captured images obtained in those frames the first, second, third, ..., (n-2)th, (n-1)th, and nth frame images, respectively. The first to nth frame images arranged in time series form a moving image. Each frame image is given to the face detection unit 18 as its input image. The first to nth frame images forming the moving image may include still images that are obtained in response to a still image shooting instruction on the operation unit 17 and recorded on the recording medium 16, and may also include through-display images. A through-display image is an image acquired from the imaging unit 11 to update the display content of the display unit 15 even when no still image shooting instruction has been given.

FIG. 2 shows an internal block diagram of the face detection unit 18. As shown in FIG. 2, the face detection unit 18 includes a face detection processing unit 21 and a face detection stabilization processing unit 22. The input image to the face detection unit 18 (each frame image) is given to both the face detection processing unit 21 and the face detection stabilization processing unit 22. In the following description, "input image" by itself refers to the input image to the face detection unit 18. Also, unless otherwise stated, each input image is assumed to contain a face.

For each input image, the face detection processing unit 21 detects a person's face in the input image based on its image data and extracts a face region containing the detected face. Various methods for detecting a face in an image are known, and the face detection processing unit 21 can adopt any of them. For example, a face (face region) may be detected by extracting a skin-color region from the input image, as in the method described in JP 2000-105819 A, or by using the method described in JP 2006-211139 A or JP 2006-72770 A.

Typically, for example, the image in a region of interest set within the input image is compared with a reference face image of a predetermined size to determine the similarity between the two, and whether the region of interest contains a face (whether it is a face region) is detected from that similarity. The similarity is judged by extracting features that are effective for distinguishing faces from non-faces, such as horizontal edges, vertical edges, right-diagonal edges, and left-diagonal edges.

Within the input image, the region of interest is shifted one pixel at a time in the horizontal or vertical direction. The image in the shifted region of interest is then compared with the reference face image, the similarity between the two is judged again, and the same detection is performed. In this way, the region of interest is updated while being shifted one pixel at a time, for example from the upper left toward the lower right of the input image. The input image is also reduced at a fixed ratio, and the same face detection processing is applied to the reduced image. Repeating this processing makes it possible to detect faces of any size in the input image.
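The scan described above, a fixed-size region of interest shifted one pixel at a time over successively reduced copies of the input image, can be sketched roughly as follows. The names `shrink` and `scan_for_faces`, the `ratio` parameter, and the caller-supplied `is_face` predicate (standing in for the similarity test against the reference face image) are all hypothetical, and nearest-neighbour reduction is an assumption.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale image, indexed as image[y][x]


def shrink(image: Image, scale: float) -> Image:
    """Nearest-neighbour reduction of `image` by `scale` (0 < scale <= 1)."""
    src_h, src_w = len(image), len(image[0])
    h, w = max(1, int(src_h * scale)), max(1, int(src_w * scale))
    return [[image[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
             for x in range(w)] for y in range(h)]


def scan_for_faces(image: Image, window: int,
                   is_face: Callable[[Image], bool],
                   ratio: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (left, top, size) boxes in original-image coordinates."""
    hits = []
    scale = 1.0
    current = image
    while len(current) >= window and len(current[0]) >= window:
        # Shift the region of interest one pixel at a time, upper left to lower right.
        for top in range(len(current) - window + 1):
            for left in range(len(current[0]) - window + 1):
                patch = [row[left:left + window] for row in current[top:top + window]]
                if is_face(patch):
                    hits.append((int(left / scale), int(top / scale), int(window / scale)))
        # Reduce the input image at a fixed ratio and repeat the same scan.
        scale *= ratio
        current = shrink(image, scale)
    return hits
```

Because the window size is fixed, a large face is found only in a reduced copy, and its box is mapped back to the original scale by dividing by the current scale factor.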

The face detection processing unit 21 is also configured to detect the tilt of the face in each input image. For example, by rotating the input image and applying the same face detection processing to the rotated image, a tilted face can be detected together with its tilt (tilt angle). Here, the tilt of the face means its tilt relative to the vertical direction of the input image, for example the tilt of the straight line connecting the center of the mouth and the point between the eyebrows relative to a straight line parallel to the vertical direction of the input image.

The face detection processing unit 21 further detects the orientation of the face in each input image. That is, for example, it can distinguish whether a face detected in an input image is a frontal face (a face seen from the front) or a profile (a face seen from the side). Various methods for detecting face orientation have been proposed, and the face detection processing unit 21 can adopt any of them.

For example, as in the method described in JP H10-307923 A, facial parts such as the eyes, nose, and mouth are found one after another in the input image to detect the position of the face, and the orientation of the face is detected based on projection data of the facial parts.

Alternatively, the method described in JP 2006-72770 A may be used. In this method, a frontal face is treated as a left half (hereinafter, the left face) and a right half (hereinafter, the right face), and parameters for the left face and parameters for the right face are generated in advance through a learning process. At detection time, the region of interest in the input image is divided into left and right parts, and the similarity between each part and the corresponding one of the two parameter sets is calculated. When one or both similarities are at or above a threshold, the region of interest is judged to be a face region. Furthermore, the orientation of the face is detected from the relative magnitudes of the two similarities.
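The decision rule in this paragraph can be illustrated with a small sketch. It takes the two similarities as already computed and does not reproduce the learning process or the similarity measure of the cited publication. The `tolerance` parameter and the returned labels are hypothetical; in particular, the text does not say which half being more similar corresponds to which facing direction, so the labels only report which half dominates.

```python
from typing import Optional, Tuple


def judge_region(sim_left: float, sim_right: float, threshold: float,
                 tolerance: float = 0.1) -> Tuple[bool, Optional[str]]:
    """Return (is_face, orientation_label) for one region of interest."""
    # Face region if one or both similarities reach the threshold.
    if sim_left < threshold and sim_right < threshold:
        return False, None
    # Orientation from the relative magnitudes of the two similarities.
    if abs(sim_left - sim_right) <= tolerance:
        return True, "frontal"
    if sim_left > sim_right:
        return True, "left_half_dominant"
    return True, "right_half_dominant"
```

For instance, similarities of 0.9 and 0.3 against a threshold of 0.7 mark the region as a face whose left-half match dominates, which suggests a face turned to one side.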

The face detection processing unit 21 outputs face detection information representing its own face detection result. When the face detection processing unit 21 detects a face in an input image, the face detection information for that input image specifies the position, size, tilt, and orientation of the face on the input image. In practice, for example, the face detection processing unit 21 extracts a rectangular region containing the face as the face region and expresses the position and size of the face by the position and image size of that face region on the input image. The position of a face means, for example, the center position of its face region. The face detection information for each input image is given to the face detection stabilization processing unit 22. When the face detection processing unit 21 detects no face, no face detection information is generated or output; instead, information to that effect is conveyed to the face detection stabilization processing unit 22.
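A minimal sketch of how the face detection information might be represented is shown below. The class and field names are hypothetical; only the four attributes (position, size, tilt, orientation) and the convention that the position is the center of the rectangular face region come from the text. Using `None` to signal "no face detected" is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FaceDetectionInfo:
    left: int          # left edge of the rectangular face region, in pixels
    top: int           # top edge of the rectangular face region, in pixels
    width: int         # image size of the face region (horizontal)
    height: int        # image size of the face region (vertical)
    tilt_deg: float    # tilt relative to the vertical direction of the image
    orientation: str   # e.g. "frontal" or "profile"

    @property
    def position(self) -> Tuple[float, float]:
        """Face position, taken as the center of the face region."""
        return (self.left + self.width / 2, self.top + self.height / 2)


# None stands for "no face was detected in this input image".
DetectionResult = Optional[FaceDetectionInfo]
```

A face region with its upper left corner at (10, 20) and a size of 40 by 60 pixels would then have the position (30.0, 50.0).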

Face detection by the face detection processing unit 21 does not succeed under all conditions, and is sometimes unstable. For example, the range of face orientations that can be detected is limited, and detection may fail near that limit. Detection may also fail because of noise, a change in facial expression that the face detection processing unit 21 does not anticipate, image blur caused by face movement, an abrupt change in the light source, or occlusion. In addition, the regions searched for a face are sometimes thinned out to reduce the computational cost of face detection, in which case the detected face position may fluctuate.

As described above, the face detection processing unit 21 alone may not detect faces stably, but in the face detection unit 18 the face detection stabilization processing unit 22 ensures the stability of face detection. The face detection stabilization processing unit 22 stabilizes face detection based on input images for a plurality of frames and the face detection information from the face detection processing unit 21, and thereby generates and outputs stabilized face information. Stabilizing face detection means suppressing interruptions and fluctuations in face detection. The stabilized face information represents the face detection result after stabilization. It includes at least information specifying the position and size of the face on the input image of interest, and may also include information specifying the tilt and orientation of the face on that image.

The stabilized face information for the input image of interest is conveyed to the main control unit 13 of FIG. 1 as information representing the face detection result of the face detection unit 18 for that input image. That is, the main control unit 13 performs the various kinds of processing described above (image compression, autofocus control, and so on) based on the face position, size, and other attributes specified by the stabilized face information.

FIG. 3 is an internal block diagram of the face detection stabilization processing unit 22. As shown in FIG. 3, the face detection stabilization processing unit 22 includes an area setting unit 31, an image change determination unit 32, and a stabilized face information generation unit 33.

今、説明の具体化のため、顔検出の安定化を施したい入力画像が第ｎフレーム画像であるとし、第ｎフレーム画像についての安定化顔情報を生成することを想定する（この想定は、特に記述無き限り、以下の説明文の全てに適用される）。顔検出の安定化を施したい入力画像（今の例の場合、第ｎフレーム画像）を、「安定化対象画像」とも呼ぶ。 Now, to make the explanation concrete, assume that the input image on which face detection is to be stabilized is the nth frame image and that stabilized face information is generated for the nth frame image (this assumption applies to all of the following description unless otherwise stated). The input image on which face detection is to be stabilized (in the present example, the nth frame image) is also referred to as the “stabilization target image”.

概略的には、顔検出安定化処理部２２は、顔領域及び顔領域周辺の画像変化を監視し、その画像変化をも考慮して最終的に検出すべき顔の位置等を決定する。 Schematically, the face detection stabilization processing unit 22 monitors the face area and image changes around the face area, and determines the position of the face to be finally detected in consideration of the image change.

例えば、或る区間において、顔領域及び顔領域周辺に画像変化が殆どなければ顔自体及び顔の位置は変わっていないと判断することができる。また例えば、顔領域周辺に画像変化がなければ、顔自体は動いているかもしれないが顔の位置は変わっていないと判断できる。顔領域及び顔領域周辺の双方に画像変化がある場合は、顔の位置の移動等が発生したと判断できる。 For example, in a certain section, it can be determined that the face itself and the face position have not changed if there is almost no image change in the face area and the periphery of the face area. Also, for example, if there is no image change around the face area, it can be determined that the face itself may have moved but the face position has not changed. When there is an image change in both the face area and the periphery of the face area, it can be determined that the movement of the face position has occurred.
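
The judgment described above can be illustrated by the following minimal Python sketch. It is not part of the disclosed embodiment: the function name `classify_face_state` and the returned labels are hypothetical, and the case in which only the periphery changes, which the text does not address, is simply reported as undetermined.

```python
def classify_face_state(center_changed: bool, peripheral_changed: bool) -> str:
    """Map per-area change flags to the coarse judgments described above."""
    if not peripheral_changed:
        if not center_changed:
            # Almost no change in the face area or around it.
            return "face_and_position_unchanged"
        # The face itself may have moved, but its position is unchanged.
        return "face_moved_position_unchanged"
    if center_changed:
        # Change in both the face area and its periphery.
        return "position_may_have_moved"
    # Assumption: a change in the periphery alone is not covered by the text.
    return "undetermined"
```

For example, `classify_face_state(True, False)` corresponds to a face that changes expression or turns slightly while staying in place.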

そして例えば、顔の位置が変わっていないと判断できたならば、図４に示す如く、安定化対象画像の前のフレーム画像に対する顔検出結果を安定化対象画像に対する顔検出結果として流用することができる。これにより、安定化対象画像に対する顔検出処理部２１の顔検出が失敗したとしても、顔検出の途切れ等の発生を抑制可能である。 Then, for example, if it can be determined that the face position has not changed, the face detection result for the frame image preceding the stabilization target image can be reused as the face detection result for the stabilization target image, as shown in FIG. 4. Thereby, even if the face detection processing unit 21 fails to detect the face in the stabilization target image, interruptions in face detection and the like can be suppressed.
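
The reuse of the preceding result might be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout of `Face`, the function name `stabilized_face`, and the use of `None` for a failed detection are all hypothetical, and the detailed generation rules of the embodiment are given in the later examples rather than here.

```python
from typing import Optional, Tuple

# Hypothetical face record: (x, y, width, height) on the input image.
Face = Tuple[int, int, int, int]


def stabilized_face(current: Optional[Face],
                    previous: Optional[Face],
                    position_unchanged: bool) -> Optional[Face]:
    """Choose the face result to output for the stabilization target image."""
    if position_unchanged and previous is not None:
        # The position is judged unchanged, so the previous result is reused,
        # even if detection on the current frame failed (current is None).
        return previous
    # Otherwise fall back to whatever the detector produced for this frame.
    return current
```

With this rule a single failed detection on a static face does not interrupt the output, because the previous rectangle is carried over.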

図３の顔検出安定化処理部２２内の各部の機能を概略的に説明する（詳細説明は、後述の各実施例内に設ける）。 The function of each unit in the face detection stabilization processing unit 22 of FIG. 3 will be schematically described (detailed description will be provided in each example described later).

エリア設定部３１は、エリア設定用画像についての顔検出情報に基づいて画像変化を監視するべき中心視エリア及び周辺視エリア（例えば、後述の図５（ａ）の中心視エリア２２０及び周辺視エリア２３０）を設定する。安定化対象画像が第ｎフレーム画像である場合、エリア設定用画像を第（ｎ−１）フレーム画像とする。 Based on the face detection information of the area setting image, the area setting unit 31 sets a central vision area and a peripheral vision area in which image changes are to be monitored (for example, the central vision area 220 and the peripheral vision area 230 in FIG. 5A described later). When the stabilization target image is the nth frame image, the area setting image is the (n−1)th frame image.

画像変化判定部３２は、安定化対象画像についての中心視エリア及び周辺視エリア内の各画像変化を検出する。安定化対象画像についての中心視エリア及び周辺視エリア内の各画像変化とは、安定化対象画像を含む２つの画像変化検出用画像間における中心視エリア及び周辺視エリア内の各画像変化である。安定化対象画像が第ｎフレーム画像である場合、上記２つの画像変化検出用画像は、第（ｎ−１）及び第ｎフレーム画像である。尚、画像変化判定部３２にて実施される処理を、総称して「画像変化検出処理」と呼ぶ。 The image change determination unit 32 detects the image change in each of the central vision area and the peripheral vision area for the stabilization target image. These image changes are the changes within the central vision area and the peripheral vision area between two image change detection images, one of which is the stabilization target image. When the stabilization target image is the nth frame image, the two image change detection images are the (n−1)th and nth frame images. The processing performed by the image change determination unit 32 is collectively referred to as “image change detection processing”.

安定化顔情報生成部３３は、画像変化判定部３２による画像変化検出処理の結果と、参照過去画像及び安定化対象画像についての顔検出情報とに基づいて、安定化対象画像についての安定化顔情報を生成して出力する。安定化対象画像が第ｎフレーム画像である場合、参照過去画像とは、第（ｎ−１）フレーム画像である。 The stabilized face information generation unit 33 generates and outputs the stabilized face information for the stabilization target image based on the result of the image change detection processing by the image change determination unit 32 and on the face detection information for the reference past image and the stabilization target image. When the stabilization target image is the nth frame image, the reference past image is the (n−1)th frame image.

以下に、図１の撮像装置１の顔検出機能に関与する詳細な実施例として、第１〜第４実施例を説明する。 First to fourth examples are described below as detailed examples relating to the face detection function of the imaging apparatus 1 of FIG. 1.

＜＜第１実施例＞＞ << First Example >>
まず、第１実施例について説明する。第１実施例では、エリア設定部３１によるエリア設定法について説明する。エリア設定部３１が利用可能なエリア設定法として、第１〜第４エリア設定法を個別に例示する。尚、本明細書では、画像内の或る部分を表す言葉としてエリアと領域という２つの文言を用いているが、両者は同義である。 First, the first example will be described. The first example describes the area setting method used by the area setting unit 31. First to fourth area setting methods are individually illustrated as area setting methods that the area setting unit 31 can use. In this specification, the two words “area” and “region” are both used to denote a certain part of an image; they are synonymous.

［第１エリア設定法］ [First area setting method]
第１エリア設定法について説明する。図５（ａ）を参照する。図５（ａ）において、符号２００は、エリア設定用画像としての第（ｎ−１）フレーム画像である。符号２１０が付された破線矩形領域は、図２の顔検出処理部２１によって第（ｎ−１）フレーム画像から抽出された顔領域である。顔領域２１０の位置及び大きさは、第（ｎ−１）フレーム画像の顔検出情報によって特定される。 The first area setting method will be described with reference to FIG. 5A. In FIG. 5A, reference numeral 200 denotes the (n−1)th frame image serving as the area setting image. The broken-line rectangular area denoted by reference numeral 210 is the face area extracted from the (n−1)th frame image by the face detection processing unit 21 in FIG. 2. The position and size of the face area 210 are specified by the face detection information of the (n−1)th frame image.

図５（ａ）において、符号２２０が付された実線矩形領域は、顔領域２１０を含む中心視エリア（基準エリア）である。中心視エリア２２０は、顔領域２１０についての顔の位置、大きさ及び向きに基づいて設定される。典型的には例えば、中心視エリア２２０の中心と顔領域２１０の中心を一致させる。また、図５（ｂ）に示す如く、顔の向きに応じて両中心をずらすようにしてもよい。中心視エリア２２０の大きさ（画像サイズ）は、顔領域２１０の大きさ（画像サイズ）に比例した大きさとする、或いは、顔領域２１０の大きさに応じた一定の大きさとする。 In FIG. 5A, the solid-line rectangular area denoted by reference numeral 220 is a central vision area (reference area) that includes the face area 210. The central vision area 220 is set based on the position, size, and orientation of the face in the face area 210. Typically, for example, the center of the central vision area 220 is made to coincide with the center of the face area 210. Alternatively, as shown in FIG. 5B, the two centers may be offset from each other according to the orientation of the face. The size (image size) of the central vision area 220 is either proportional to the size (image size) of the face area 210 or a fixed size chosen according to the size of the face area 210.

図５（ａ）において、符号２３０が付されたロの字状の斜線領域は、中心視エリア２２０の周辺に配置された周辺視エリア（周辺エリア）である。周辺視エリア２３０は、中心視エリア２２０に隣接しつつ中心視エリア２２０の全周を取り囲むように配置されたエリアであり、中心視エリア２２０と周辺視エリア２３０は互いに重なり合わない。故に、周辺視エリア２３０内には、顔領域２１０は存在しない。周辺視エリア２３０は、典型的には例えば、中心視エリア２２０の中心と中心が合致する、中心視エリア２２０よりもサイズの大きな矩形領域から、中心視エリア２２０を省いた部分とされる。 In FIG. 5A, the hollow-square-shaped (frame-shaped) hatched area denoted by reference numeral 230 is a peripheral vision area (peripheral area) arranged around the central vision area 220. The peripheral vision area 230 is adjacent to the central vision area 220 and surrounds its entire circumference, and the central vision area 220 and the peripheral vision area 230 do not overlap each other. Therefore, the face area 210 does not exist in the peripheral vision area 230. Typically, for example, the peripheral vision area 230 is the portion that remains when the central vision area 220 is removed from a rectangular region that is larger than the central vision area 220 and whose center coincides with the center of the central vision area 220.

第１エリア設定法を採用する場合、図３のエリア設定部３１は、エリア設定用画像の顔検出情報に基づいて中心視エリア２２０及び周辺視エリア２３０を設定し、入力画像上における中心視エリア２２０及び周辺視エリア２３０の位置及び大きさを画像変化判定部３２に伝達する。画像変化判定部３２は、２つの画像変化検出用画像（今の例の場合、第（ｎ−１）及び第ｎフレーム画像）に対して中心視エリア２２０及び周辺視エリア２３０を設定し、２つの画像変化検出用画像間において中心視エリア２２０及び周辺視エリア２３０内の各画像変化を検出する。これにより、中心視エリア２２０及び周辺視エリア２３０内の動き（顔が動いたか等）を把握することができる。 When the first area setting method is adopted, the area setting unit 31 in FIG. 3 sets the central vision area 220 and the peripheral vision area 230 based on the face detection information of the area setting image, and transmits the positions and sizes of the central vision area 220 and the peripheral vision area 230 on the input image to the image change determination unit 32. The image change determination unit 32 sets the central vision area 220 and the peripheral vision area 230 on the two image change detection images (in the present example, the (n−1)th and nth frame images) and detects the image change in each of the central vision area 220 and the peripheral vision area 230 between the two image change detection images. This makes it possible to grasp the motion within the central vision area 220 and the peripheral vision area 230 (for example, whether the face has moved).
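
As an illustration only, the first area setting method could be sketched in Python as follows for an untilted face whose center coincides with the center of both areas. The `Rect` type, the function names, and the scale factors 1.5 and 2.5 are assumptions introduced for this sketch; the specification states only that the central vision area is sized in relation to the face area and that the peripheral vision area is a larger concentric rectangle with the central vision area removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height

    def contains(self, x: float, y: float) -> bool:
        return abs(x - self.cx) <= self.w / 2 and abs(y - self.cy) <= self.h / 2


def set_areas(face: Rect, central_scale: float = 1.5, outer_scale: float = 2.5):
    """Return (central, outer); the peripheral area is outer minus central."""
    central = Rect(face.cx, face.cy, face.w * central_scale, face.h * central_scale)
    outer = Rect(face.cx, face.cy, face.w * outer_scale, face.h * outer_scale)
    return central, outer


def in_peripheral(central: Rect, outer: Rect, x: float, y: float) -> bool:
    """True if (x, y) lies in the frame-shaped peripheral vision area."""
    return outer.contains(x, y) and not central.contains(x, y)
```

Because the two rectangles are derived from the (n−1)th frame only, the same pair can be applied unchanged to both image change detection images.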

尚、図５（ａ）は、顔が傾いていない時の例を示しているが、顔が傾いているときは、その顔の傾きに応じて中心視エリア２２０及び周辺視エリア２３０も傾く（第２〜第４エリア設定法についても同様）。 Although FIG. 5A shows an example in which the face is not tilted, when the face is tilted, the central vision area 220 and the peripheral vision area 230 are tilted in accordance with the tilt of the face (the same applies to the second to fourth area setting methods).

［第２エリア設定法］ [Second area setting method]
次に、第２エリア設定法について説明する。図６（ａ）及び（ｂ）を参照する。図６（ａ）において、符号２００はエリア設定用画像としての第（ｎ−１）フレーム画像であり、顔領域２１０及び中心視エリア２２０は、図５（ａ）のそれらと同じものである。 Next, the second area setting method will be described with reference to FIGS. 6A and 6B. In FIG. 6A, reference numeral 200 denotes the (n−1)th frame image serving as the area setting image, and the face area 210 and the central vision area 220 are the same as those in FIG. 5A.

第２エリア設定法では、周辺視エリアが４つに分割されており、上エリア２３１、下エリア２３２、左エリア２３３及び右エリア２３４によって、図５（ａ）に示すそれと同じ周辺視エリア２３０が形成される。尚、図６（ｂ）は、周辺視エリア２３０の部分のみを抽出した図である。また、記述の簡略化上、以下、上エリア、下エリア、左エリア及び右エリアを、夫々単にエリアとも記述する。 In the second area setting method, the peripheral vision area is divided into four parts: the upper area 231, the lower area 232, the left area 233, and the right area 234 together form the same peripheral vision area 230 as that shown in FIG. 5A. FIG. 6B shows only the peripheral vision area 230. For simplicity, the upper area, the lower area, the left area, and the right area are each also referred to below simply as an area.

図６（ａ）の紙面上側と合致する顔の額側を上とし、図６（ａ）の紙面下側と合致する顔の顎側を下と定義する。上エリア２３１は周辺視エリア２３０内の上側の一部矩形領域であり、下エリア２３２は周辺視エリア２３０内の下側の一部矩形領域であり、左エリア２３３は周辺視エリア２３０内の左側の一部矩形領域であり、右エリア２３４は周辺視エリア２３０内の右側の一部矩形領域である。各エリア２３１〜２３４を合成したエリアが周辺視エリア２３０であり、各エリア２３１〜２３４は互いに重なり合わない。 The forehead side of the face, which coincides with the upper side of the page of FIG. 6A, is defined as up, and the chin side of the face, which coincides with the lower side of the page of FIG. 6A, is defined as down. The upper area 231 is a partial rectangular area on the upper side of the peripheral vision area 230, the lower area 232 is a partial rectangular area on its lower side, the left area 233 is a partial rectangular area on its left side, and the right area 234 is a partial rectangular area on its right side. The areas 231 to 234 combined form the peripheral vision area 230, and the areas 231 to 234 do not overlap each other.

第２エリア設定法を採用する場合、図３のエリア設定部３１は、エリア設定用画像の顔検出情報に基づいて中心視エリア２２０及びエリア２３１〜２３４を設定し、入力画像上における中心視エリア２２０及びエリア２３１〜２３４の位置及び大きさを画像変化判定部３２に伝達する。画像変化判定部３２は、２つの画像変化検出用画像（今の例の場合、第（ｎ−１）及び第ｎフレーム画像）に対して中心視エリア２２０及びエリア２３１〜２３４を設定し、２つの画像変化検出用画像間において中心視エリア２２０及びエリア２３１〜２３４内の各画像変化を検出する。 When the second area setting method is adopted, the area setting unit 31 in FIG. 3 sets the central vision area 220 and the areas 231 to 234 based on the face detection information of the area setting image, and transmits the positions and sizes of the central vision area 220 and the areas 231 to 234 on the input image to the image change determination unit 32. The image change determination unit 32 sets the central vision area 220 and the areas 231 to 234 on the two image change detection images (in the present example, the (n−1)th and nth frame images) and detects the image change in each of the central vision area 220 and the areas 231 to 234 between the two image change detection images.
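
One possible way to divide the frame-shaped peripheral vision area into four non-overlapping rectangles is sketched below in Python. The specification does not state where the dividing lines run, so the layout chosen here (upper and lower areas spanning the full outer width, left and right areas spanning only the height of the central vision area) is an assumption, as are the `Box` layout and the function names. Image coordinates with the y axis pointing downward and an untilted face are assumed.

```python
from typing import Dict, Tuple

# Hypothetical box layout: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def split_peripheral(central: Box, outer: Box) -> Dict[str, Box]:
    """Split outer-minus-central into four non-overlapping rectangles."""
    cl, ct, cr, cb = central
    ol, ot, o_r, ob = outer
    return {
        "upper": (ol, ot, o_r, ct),   # forehead side, full outer width
        "lower": (ol, cb, o_r, ob),   # chin side, full outer width
        "left": (ol, ct, cl, cb),     # beside the central area
        "right": (cr, ct, o_r, cb),   # beside the central area
    }


def area(box: Box) -> int:
    left, top, right, bottom = box
    return (right - left) * (bottom - top)
```

The four parts tile the peripheral vision area exactly, so their total area equals the outer rectangle minus the central vision area.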

また、顔領域の下側には人物の身体が存在しているため、下エリア２３２には人物の身体が描画されることとなる。そして、顔の位置が変わっていなくても身体は動きうる。本実施形態では、顔領域周辺の変化を監視して顔の位置が変化したのか否か等を判断し、これによって顔検出の安定化を図ろうとしているため、顔の位置に変化を及ぼさないような身体の動きは、顔検出の安定化にとって、いわばノイズ成分となる。身体の動きを考慮して、顔領域周辺に変化が生じたか否かを判別する判定閾値を大きめに設定しておくという手法をとることもできるが、周辺視エリアの分割によって、これに対応することが可能である。 In addition, since the person's body is located below the face area, the body appears in the lower area 232. The body can move even when the position of the face does not change. In this embodiment, changes around the face area are monitored to determine, among other things, whether the position of the face has changed, and face detection is thereby stabilized; body movement that does not affect the position of the face is therefore, so to speak, a noise component for the stabilization of face detection. One possible approach is to allow for body movement by setting a larger determination threshold for deciding whether a change has occurred around the face area, but the problem can also be handled by dividing the peripheral vision area.

つまり、画像変化判定部３２の画像変化検出処理において、エリア２３１〜２３４の一部を無視するようにしてもよい。エリア２３１〜２３４の内、何れのエリアを無視するかは、エリア設定用画像の顔検出情報に応じて決定することができる。例えば、図６（ａ）の例の場合、顔の顎側のエリア（即ち、身体の一部を含むエリア）である下エリア２３２を、画像変化検出処理において無視するようにする。尚、顔検出可能な顔検出処理部２１（図２）は、当然、顔の額側や顎側を認識可能であり、その認識内容も顔検出情報に含まれている。 That is, in the image change detection process of the image change determination unit 32, a part of the areas 231 to 234 may be ignored. Which of the areas 231 to 234 is to be ignored can be determined according to the face detection information of the area setting image. For example, in the example of FIG. 6A, the lower area 232 that is an area on the jaw side of the face (that is, an area including a part of the body) is ignored in the image change detection process. The face detection processing unit 21 (FIG. 2) capable of detecting a face can naturally recognize the forehead side and the chin side of the face, and the recognition content is also included in the face detection information.

また、重力の向きを検出可能な傾きセンサ（不図示）を撮像装置１に設けることも可能であり、この場合は、傾きセンサの検出結果に基づいて、エリア２３１〜２３４の内の何れのエリアを無視するかを決定しても良い。つまり、傾きセンサによって検出された重力の向きから、エリア２３１〜２３４の内、何れのエリアが顔の顎側のエリア（即ち、身体の一部を含むエリア）であるかを特定し、特定されたエリアを画像変化検出処理において無視するようにする。 A tilt sensor (not shown) capable of detecting the direction of gravity can also be provided in the imaging apparatus 1; in that case, which of the areas 231 to 234 to ignore may be decided based on the detection result of the tilt sensor. That is, from the direction of gravity detected by the tilt sensor, the area among the areas 231 to 234 that lies on the chin side of the face (that is, the area including a part of the body) is identified, and the identified area is ignored in the image change detection process.

尚、画像変化検出処理において、エリア２３１〜２３４の内の何れかのエリアを完全に無視するのではなく、エリア２３１〜２３４の夫々に重みを付けた上で画像変化検出処理を実施するようにしても良い。例えば、顔の顎側のエリアである下エリア２３２の重みを他のエリア（２３１、２３３又は２３４）の重みよりも小さくすることによって、前者の画像変化が画像変化検出処理の結果に与える寄与度を後者のそれよりも小さくする。或るエリアに関し、重みをゼロにするということは、画像変化検出処理において、そのエリアを無視することと等価である。 In the image change detection process, instead of completely ignoring one of the areas 231 to 234, the process may be carried out after weighting each of the areas 231 to 234. For example, by making the weight of the lower area 232, which is the area on the chin side of the face, smaller than the weights of the other areas (231, 233, or 234), the contribution of the image change in the former to the result of the image change detection process is made smaller than that of the latter. Setting the weight of an area to zero is equivalent to ignoring that area in the image change detection process.
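
The weighting can be illustrated with the following Python sketch. The specification does not say how the weighted per-area changes are combined; a weighted average of a scalar change measure per area is assumed here purely for illustration, and the function name and dictionary keys are hypothetical.

```python
from typing import Dict


def weighted_peripheral_change(changes: Dict[str, float],
                               weights: Dict[str, float]) -> float:
    """Weighted average of per-area change measures.

    changes: scalar change measure per area (for example, a motion magnitude).
    weights: non-negative weight per area; a weight of zero ignores the area.
    """
    total_weight = sum(weights[name] for name in changes)
    if total_weight == 0:
        return 0.0  # every area is ignored
    return sum(changes[name] * weights[name] for name in changes) / total_weight
```

With equal weights a large body movement in the lower area dominates the result, whereas a zero weight on the lower area removes its influence entirely, which matches the equivalence stated above.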

［第３エリア設定法］ [Third area setting method]
また、第２エリア設定法の如く周辺視エリアを分割するのではなく、図７に示す如く、周辺視エリアの形状をコの字状にすることによっても、身体の動きに対応可能である。これを第３エリア設定法とする。 Instead of dividing the peripheral vision area as in the second area setting method, body movement can also be handled by making the peripheral vision area U-shaped, as shown in FIG. 7. This is the third area setting method.

図７において、符号２００はエリア設定用画像としての第（ｎ−１）フレーム画像であり、顔領域２１０及び中心視エリア２２０は、図５（ａ）のそれと同じものである。符号２３０ａが付されたコの字状の斜線領域は、第３エリア設定法に係る周辺視エリアであり、周辺視エリア２３０ａは、図６（ａ）のエリア２３１、２３３及び２３４を合成したエリアに相当する。 In FIG. 7, reference numeral 200 denotes the (n−1)th frame image serving as the area setting image, and the face area 210 and the central vision area 220 are the same as those in FIG. 5A. The U-shaped hatched area denoted by reference numeral 230a is the peripheral vision area according to the third area setting method, and the peripheral vision area 230a corresponds to the combination of the areas 231, 233, and 234 in FIG. 6A.

第３エリア設定法を採用する場合、図３のエリア設定部３１は、エリア設定用画像の顔検出情報に基づいて中心視エリア２２０及び周辺視エリア２３０ａを設定し、入力画像上における中心視エリア２２０及び周辺視エリア２３０ａの位置及び大きさを画像変化判定部３２に伝達する。画像変化判定部３２は、２つの画像変化検出用画像（今の例の場合、第（ｎ−１）及び第ｎフレーム画像）に対して中心視エリア２２０及び周辺視エリア２３０ａを設定し、２つの画像変化検出用画像間において中心視エリア２２０及び周辺視エリア２３０ａ内の各画像変化を検出する。 When the third area setting method is adopted, the area setting unit 31 in FIG. 3 sets the central vision area 220 and the peripheral vision area 230a based on the face detection information of the area setting image, and transmits the positions and sizes of the central vision area 220 and the peripheral vision area 230a on the input image to the image change determination unit 32. The image change determination unit 32 sets the central vision area 220 and the peripheral vision area 230a on the two image change detection images (in the present example, the (n−1)th and nth frame images) and detects the image change in each of the central vision area 220 and the peripheral vision area 230a between the two image change detection images.

［第４エリア設定法］ [Fourth area setting method]
次に、第４エリア設定法について説明する。図８を参照する。図８において、符号２００はエリア設定用画像としての第（ｎ−１）フレーム画像であり、顔領域２１０、中心視エリア２２０は、図５（ａ）のそれらと同じものである。 Next, the fourth area setting method will be described with reference to FIG. 8. In FIG. 8, reference numeral 200 denotes the (n−1)th frame image serving as the area setting image, and the face area 210 and the central vision area 220 are the same as those in FIG. 5A.

図８において、符号２４０は、中心視エリア２２０の周辺に配置された第１周辺視エリアである。第１周辺視エリア２４０は、ロの字状の領域であり、図５（ａ）の周辺視エリア２３０と同様のものである（但し、サイズは異なりうる）。つまり、第１周辺視エリア２４０は、中心視エリア２２０に隣接しつつ中心視エリア２２０の全周を取り囲むように配置されたエリアであり、中心視エリア２２０と第１周辺視エリア２４０は互いに重なり合わない。第１周辺視エリア２４０は、典型的には例えば、中心視エリア２２０の中心と中心が合致する、中心視エリア２２０よりもサイズの大きな矩形領域から、中心視エリア２２０を省いた部分とされる。 In FIG. 8, reference numeral 240 denotes a first peripheral vision area arranged around the central vision area 220. The first peripheral vision area 240 is a hollow-square-shaped (frame-shaped) region similar to the peripheral vision area 230 in FIG. 5A (although its size may differ). That is, the first peripheral vision area 240 is adjacent to the central vision area 220 and surrounds its entire circumference, and the central vision area 220 and the first peripheral vision area 240 do not overlap each other. Typically, for example, the first peripheral vision area 240 is the portion that remains when the central vision area 220 is removed from a rectangular region that is larger than the central vision area 220 and whose center coincides with the center of the central vision area 220.

図８において、符号２５０が付されたロの字状の斜線領域は、第１周辺視エリア２４０の周辺に配置された第２周辺視エリアである。第２周辺視エリア２５０は、第１周辺視エリア２４０に隣接しつつ第１周辺視エリア２４０の全周を取り囲むように配置されたエリアであり、第１周辺視エリア２４０と第２周辺視エリア２５０は互いに重なり合わない。 In FIG. 8, the hollow-square-shaped (frame-shaped) hatched area denoted by reference numeral 250 is a second peripheral vision area arranged around the first peripheral vision area 240. The second peripheral vision area 250 is adjacent to the first peripheral vision area 240 and surrounds its entire circumference, and the first peripheral vision area 240 and the second peripheral vision area 250 do not overlap each other.

第４エリア設定法を採用する場合、図３のエリア設定部３１は、エリア設定用画像の顔検出情報に基づいて中心視エリア２２０、第１周辺視エリア２４０及び第２周辺視エリア２５０を設定し、入力画像上における中心視エリア２２０、第１周辺視エリア２４０及び第２周辺視エリア２５０の位置及び大きさを画像変化判定部３２に伝達する。画像変化判定部３２は、２つの画像変化検出用画像（今の例の場合、第（ｎ−１）及び第ｎフレーム画像）に対して中心視エリア２２０、第１周辺視エリア２４０及び第２周辺視エリア２５０を設定し、２つの画像変化検出用画像間において中心視エリア２２０、第１周辺視エリア２４０及び第２周辺視エリア２５０内の各画像変化を検出する。 When the fourth area setting method is adopted, the area setting unit 31 in FIG. 3 sets the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 based on the face detection information of the area setting image. Then, the positions and sizes of the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 on the input image are transmitted to the image change determination unit 32. The image change determination unit 32 performs the central view area 220, the first peripheral view area 240, and the second view for two image change detection images (in the present example, the (n−1) th and nth frame images). The peripheral vision area 250 is set, and each image change in the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 is detected between the two image change detection images.

第４エリア設定法における周辺視エリアは、第１周辺視エリア２４０及び第２周辺視エリア２５０を合成したエリアであり、周辺視エリアが階層化されている。周辺視エリアを階層化ことにより、より高度な画像変化状態を検出可能となる。例えば、顔の位置が移動した場合と、顔が他の物体によって隠れた場合とを、正確に区別することも可能となる。なぜなら、顔の位置が移動した場合は、中心視エリア２２０、第１周辺視エリア２４０、第２周辺視エリア２５０の順番で動きが伝播するのに対して、顔が他の物体によって隠れた場合は、第２周辺視エリア２５０、第１周辺視エリア２４０、中心視エリア２２０の順番で動きが伝播するからであり、各エリアの画像変化を監視することで両者の区別が可能となるからである。 The peripheral vision area in the fourth area setting method is the combination of the first peripheral vision area 240 and the second peripheral vision area 250; that is, the peripheral vision area is layered. Layering the peripheral vision area makes it possible to detect more sophisticated image change states. For example, the case where the position of the face has moved can be accurately distinguished from the case where the face has been hidden by another object. This is because, when the position of the face moves, motion propagates in the order of the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250, whereas, when the face is hidden by another object, motion propagates in the order of the second peripheral vision area 250, the first peripheral vision area 240, and the central vision area 220; monitoring the image change in each area therefore allows the two cases to be distinguished.
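
The distinction based on propagation order might be sketched in Python as follows. The specification does not describe how the order is measured; this sketch assumes, hypothetically, that for each area the index of the frame in which a change was first observed is recorded, and the function name, dictionary keys, and labels are likewise introduced only for illustration.

```python
from typing import Dict, Optional


def classify_propagation(first_change: Dict[str, Optional[int]]) -> str:
    """Classify by the order in which change first appears in each area.

    first_change maps 'center', 'ring1' and 'ring2' to the frame index at
    which a change was first seen in that area, or None if none was seen.
    """
    c = first_change.get("center")
    r1 = first_change.get("ring1")
    r2 = first_change.get("ring2")
    if c is None or r1 is None or r2 is None:
        return "undetermined"
    if c <= r1 <= r2 and c < r2:
        return "face_moved"      # change spreads outward from the face
    if r2 <= r1 <= c and r2 < c:
        return "face_occluded"   # change arrives from outside
    return "undetermined"
```

Here `ring1` and `ring2` stand for the first peripheral vision area 240 and the second peripheral vision area 250, respectively.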

尚、第１エリア設定法を第２エリア設定法に変形したように、第１周辺視エリア２４０及び第２周辺視エリア２５０の夫々を、複数のエリアに分割するようにしてもよい。そして、顔の顎側のエリア（即ち、身体の一部を含むエリア）を画像変化検出処理において無視するようにしてもよい。また、第３エリア設定法のように、第１周辺視エリア２４０及び第２周辺視エリア２５０の夫々を、顔の顎側のエリアを含まない、コの字状の領域としてもよい。 Just as the first area setting method was modified into the second area setting method, each of the first peripheral vision area 240 and the second peripheral vision area 250 may be divided into a plurality of areas, and the area on the chin side of the face (that is, the area including a part of the body) may be ignored in the image change detection process. Alternatively, as in the third area setting method, each of the first peripheral vision area 240 and the second peripheral vision area 250 may be a U-shaped region that does not include the area on the chin side of the face.

また、説明の明確化及び具体化のため、中心視エリア２２０と周辺視エリア２３０又は２３０ａとが互いに重なり合わない場合を想定したが、両者の一部が重なり合っていても構わない（図５（ａ）及び図７参照）。これは、「互いに重なり合わない」と述べた他のエリア間についても当てはまる。つまり例えば、図６（ａ）のエリア２３１の一部と中心視エリア２２０の一部、図６（ａ）のエリア２３１の一部とエリア２３３の一部、図８の中心視エリア２２０の一部と第１周辺視エリア２４０の一部、又は、図８の第１周辺視エリア２４０の一部と第２周辺視エリア２５０の一部が重なり合っていても構わない。 Further, for clarity and concreteness of explanation, it has been assumed that the central vision area 220 and the peripheral vision area 230 or 230a do not overlap each other, but the two may partially overlap (see FIG. 5A and FIG. 7). The same holds for the other pairs of areas described as “not overlapping each other”. That is, for example, a part of the area 231 and a part of the central vision area 220 in FIG. 6A, a part of the area 231 and a part of the area 233 in FIG. 6A, a part of the central vision area 220 and a part of the first peripheral vision area 240 in FIG. 8, or a part of the first peripheral vision area 240 and a part of the second peripheral vision area 250 in FIG. 8 may overlap each other.

＜＜第２実施例＞＞ << Second Example >>
次に、第２実施例について説明する。第２実施例では、図３の画像変化判定部３２で実施される画像変化の検出手法を説明する。 Next, the second example will be described. The second example describes the image change detection method performed by the image change determination unit 32 in FIG. 3.

画像変化判定部３２は、エリア設定部３１で設定されたエリアの夫々を検出対象エリアとして取り扱い、検出対象エリアごとに検出対象エリア内の画像変化を検出する。 The image change determination unit 32 treats each of the areas set by the area setting unit 31 as a detection target area, and detects an image change in the detection target area for each detection target area.

例えば、エリア設定部３１が上述の第１エリア設定法を採用する場合は、図５（ａ）の中心視エリア２２０及び周辺視エリア２３０の夫々が検出対象エリアとして取り扱われる。エリア設定部３１が上述の第２エリア設定法を採用する場合は、図６（ａ）の中心視エリア２２０並びにエリア２３１〜２３４の夫々が検出対象エリアとして取り扱われる。エリア設定部３１が上述の第３エリア設定法を採用する場合は、図７の中心視エリア２２０及び周辺視エリア２３０ａの夫々が検出対象エリアとして取り扱われる。エリア設定部３１が上述の第４エリア設定法を採用する場合は、図８の中心視エリア２２０並びに第１周辺視エリア２４０及び第２周辺視エリア２５０の夫々が検出対象エリアとして取り扱われる。 For example, when the area setting unit 31 adopts the first area setting method described above, each of the central vision area 220 and the peripheral vision area 230 in FIG. 5A is handled as the detection target area. When the area setting unit 31 adopts the above-described second area setting method, each of the central vision area 220 and the areas 231 to 234 in FIG. 6A is handled as the detection target area. When the area setting unit 31 adopts the above-described third area setting method, each of the central vision area 220 and the peripheral vision area 230a in FIG. 7 is handled as a detection target area. When the area setting unit 31 adopts the above-described fourth area setting method, each of the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 in FIG. 8 is handled as the detection target area.

但し、画像変化検出処理において無視されるエリアは、検出対象エリアとならない。例えば、上述の第２エリア設定法の例において、図６（ａ）の下エリア２３２を画像変化検出処理において無視することができると述べたが、その場合、下エリア２３２は検出対象エリアとならない。 However, an area that is ignored in the image change detection process is not a detection target area. For example, in the description of the second area setting method above, it was stated that the lower area 232 in FIG. 6A can be ignored in the image change detection process; in that case, the lower area 232 is not a detection target area.

画像変化判定部３２は、２つの画像変化検出用画像（今の例の場合、第（ｎ−１）及び第ｎフレーム画像）間における各検出対象エリア内の画像変化を検出する。或る１つの検出対象エリアに着目して、画像変化の検出手法を説明する。 The image change determination unit 32 detects an image change in each detection target area between two image change detection images (in the present example, the (n−1) th and nth frame images). An image change detection method will be described by focusing on one detection target area.

［動きベクトルによる画像変化検出］ [Image change detection based on motion vectors]
例えば、画像変化判定部３２は、各画像変化検出用画像における検出対象エリア内の映像信号（画像データ）に基づいて動きベクトルを算出し、この動きベクトルの算出によって検出対象エリア内の画像変化を検出する。着目した検出対象エリアについて算出された動きベクトルは、その検出対象エリア内の画像の動きの向き及び大きさを特定する。 For example, the image change determination unit 32 calculates a motion vector based on the video signal (image data) within the detection target area in each image change detection image, and detects the image change in the detection target area through this calculation. The motion vector calculated for the detection target area of interest specifies the direction and magnitude of the motion of the image within that detection target area.

動きベクトルの算出手法自体は公知であるため、その詳細な算出手法の説明を割愛する。例えば、特開昭６１−２０１５８１号公報等に示されている代表点マッチング法を用いて動きベクトルを算出することができる。また、検出対象エリアのオプティカルフローを検出することによって動きベクトルを算出しても良い。オプティカルフローは、上記非特許文献１（特にページ３６〜４０）等に示されている、ブロックマッチング法、勾配法又はＬＫ（Lucas-Kanade）法に基づいて検出することができる。 Since the motion vector calculation method itself is known, a detailed description of the calculation method is omitted. For example, a motion vector can be calculated using a representative point matching method disclosed in Japanese Patent Application Laid-Open No. 61-201581. Alternatively, the motion vector may be calculated by detecting the optical flow in the detection target area. The optical flow can be detected based on the block matching method, the gradient method, or the LK (Lucas-Kanade) method described in Non-Patent Document 1 (particularly, pages 36 to 40).

１つの検出対象エリアに対して１つの動きベクトルを算出するようにしても構わないが、より高度な画像変化検出を行うために、検出対象エリアを複数のブロックに分割し、ブロックごとに動きベクトルを算出するとよい。この場合、１つの検出対象エリアに対して複数の動きベクトルが算出される。 A single motion vector may be calculated for each detection target area; however, for more sophisticated image change detection, it is preferable to divide the detection target area into a plurality of blocks and calculate a motion vector for each block. In this case, a plurality of motion vectors are calculated for one detection target area.
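
For illustration, a per-block motion vector could be obtained with an exhaustive block matching search such as the Python sketch below. This is a generic sum-of-absolute-differences (SAD) search, not the representative point matching method of the cited publication; the function name, the parameters, and the representation of images as 2-D lists of luminance values are assumptions of this sketch.

```python
def block_motion_vector(prev, curr, top, left, size, search):
    """Exhaustive SAD block matching for one square block.

    prev, curr: 2-D lists of luminance values with identical dimensions.
    (top, left): upper-left corner of the block in prev; size: block side.
    search: maximum displacement examined in each direction, in pixels.
    Returns (dx, dy) such that the block at (left, top) in prev best matches
    the block at (left + dx, top + dy) in curr.
    """
    height, width = len(prev), len(prev[0])
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate block would leave the image
            sad = sum(
                abs(prev[top + r][left + c] - curr[y0 + r][x0 + c])
                for r in range(size)
                for c in range(size)
            )
            if best_sad is None or sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best
```

Calling the function once per block of a detection target area yields the plurality of motion vectors mentioned above.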

動きベクトルを算出する場合、検出対象エリア内の画像変化を以下のように判断することができる。即ち、着目した検出対象エリアの動きベクトルの大きさが所定の判定閾値未満である場合、その検出対象エリア内の画像変化はないと判断し、その動きベクトルの大きさが該判定閾値以上である場合、その検出対象エリア内の画像変化があると判断する。尚、「画像変化がない」とは、画像変化が完全にないことを意味するだけでなく、画像変化が殆どないことをも意味する概念である。また、検出対象エリア内の画像変化を、検出対象エリア内の状態変化とも言い換えることができる。 When a motion vector is calculated, the image change in the detection target area can be judged as follows. If the magnitude of the motion vector of the detection target area of interest is less than a predetermined determination threshold, it is determined that there is no image change in that detection target area; if the magnitude of the motion vector is greater than or equal to the determination threshold, it is determined that there is an image change in that detection target area. Here, “no image change” covers not only the complete absence of image change but also the case where there is almost no image change. An image change in the detection target area can also be described as a state change in the detection target area.

また、１つの検出対象エリアに対して複数の動きベクトルを算出する場合は、複数の動きベクトルの大きさの平均値を算出し、その平均値と所定の判定閾値とを比較すれば良い。そして、その平均値が該判定閾値未満である場合に、着目した検出対象エリア内の画像変化はないと判断し、その平均値が該判定閾値以上である場合に、検出対象エリア内の画像変化はあると判断する。尚、複数の動きベクトルの大きさの平均値の代わりに、複数の動きベクトルの大きさの最大値を用いて画像変化の有無を検出するようにしてもよい。最大値の方が画像変化の有無検出に適している場合があるからである。同様に、複数の動きベクトルの大きさの平均値の代わりに、複数の動きベクトルの大きさの、最頻値、最小値又は中央値などを用いることも可能である。 When a plurality of motion vectors are calculated for one detection target area, the average of the magnitudes of the motion vectors may be calculated and compared with a predetermined determination threshold. If the average is less than the determination threshold, it is determined that there is no image change in the detection target area of interest; if the average is greater than or equal to the determination threshold, it is determined that there is an image change in the detection target area. Instead of the average, the maximum of the magnitudes of the motion vectors may be used to detect the presence or absence of an image change, because the maximum is sometimes better suited to this purpose. Similarly, the mode, the minimum, the median, or the like of the magnitudes of the motion vectors may be used instead of the average.
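
The threshold judgment with a selectable statistic can be sketched in Python as follows. The function name `area_has_change`, the `aggregate` parameter, and the representation of motion vectors as `(dx, dy)` pairs are hypothetical; only the comparison rule (change if the statistic is greater than or equal to the threshold) is taken from the text.

```python
import math
import statistics

# Statistics mentioned in the text as alternatives to the average.
AGGREGATORS = {
    "mean": statistics.mean,
    "max": max,
    "min": min,
    "median": statistics.median,
    "mode": statistics.mode,
}


def area_has_change(vectors, threshold, aggregate="mean"):
    """Return True if the aggregated motion magnitude reaches the threshold.

    vectors: iterable of (dx, dy) motion vectors, one per block.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in vectors]
    return AGGREGATORS[aggregate](magnitudes) >= threshold
```

For the vectors `[(0, 0), (0, 0), (3, 4), (0, 0)]` and a threshold of 2.0, the average (1.25) indicates no change while the maximum (5.0) indicates a change, which illustrates why the choice of statistic matters.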

［カメラ制御情報による画像変化検出］ [Image change detection based on camera control information]
動きベクトルではなく、カメラ制御情報に基づいて画像変化を検出することも可能である。これについて説明する。 Image changes can also be detected based on camera control information instead of motion vectors. This will now be described.

The camera control information is, for example, an AE evaluation value usable for automatic exposure control. The AE evaluation value depends on the brightness of the image in the area of interest and is, for example, the average of the luminance signal values in that area. An image change (such as movement of the face position) can be detected from a change in the image brightness represented by the AE evaluation value. When the AE evaluation value is used, the AE evaluation value of the detection target area is calculated for each of the two image change detection images, and the two values are compared. If the difference between them is less than a predetermined difference threshold, it is judged that there is no image change in the detection target area of interest; otherwise, it is judged that there is an image change.

The camera control information may also be, for example, an AF evaluation value usable for autofocus control. The AF evaluation value depends on the contrast amount (edge amount) of the image in the area of interest and is calculated, for example, from the integrated value of predetermined high-frequency components contained in the image in that area. An image change (such as movement of the face position) can be detected from a change in the contrast amount represented by the AF evaluation value. The presence or absence of an image change in the detection target area can then be detected in the same way as with the AE evaluation value. That is, the AF evaluation value of the detection target area is calculated for each of the two image change detection images, and the two values are compared. If the difference between them is less than a predetermined difference threshold, it is judged that there is no image change in the detection target area of interest; otherwise, it is judged that there is an image change.

The camera control information may also be, for example, an AWB evaluation value usable for auto white balance control. In auto white balance control, the ratio of the RGB signals of the image in the area of interest is automatically controlled to a desired ratio, and the evaluation value on which this control is based is the AWB evaluation value. The AWB evaluation value depends, for example, on the ratio of the RGB signals of the image in the area of interest. When motion occurs in the image within the detection target area (for example, when the face position moves), the AWB evaluation value of that area changes. The presence or absence of an image change in the detection target area can therefore also be detected from the AWB evaluation value, in the same way as with the AE or AF evaluation value.
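The comparison of evaluation values between two image change detection images can be sketched as follows. The helper names are hypothetical, the area is assumed to be given as rows of luminance samples, and the sum of absolute horizontal differences is only a stand-in for the "predetermined high-frequency components" mentioned above; the actual filter is not specified in the text.

```python
def ae_value(luma_rows):
    """AE-style value: mean luminance of the area (rows of luminance samples)."""
    samples = [v for row in luma_rows for v in row]
    return sum(samples) / len(samples)


def af_value(luma_rows):
    """AF-style value: a crude contrast measure (assumed high-pass stand-in)."""
    return sum(
        abs(row[i + 1] - row[i])
        for row in luma_rows
        for i in range(len(row) - 1)
    )


def evaluation_changed(value_prev, value_curr, diff_threshold):
    """Image change is judged present unless the difference is below the threshold."""
    return abs(value_curr - value_prev) >= diff_threshold
```

The same `evaluation_changed` comparison would apply to an AWB-style value once it is reduced to a single number, for example one RGB ratio; how that reduction is done is left open here.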

The areas set by the area setting unit 31, such as the central vision area and the peripheral vision area, may be aligned with the areas used for calculating the camera control information. For example, the frame image is divided into 16 parts vertically and 16 parts horizontally, giving a total of 256 divided areas, and the camera control information is calculated for each of these 256 divided areas. Each area set by the area setting unit 31 (the central vision area and the peripheral vision area) may then be formed from one or more of the divided areas. For example, on the basis of the position and size of the face region extracted by the face detection processing unit 21 of FIG. 2, one divided area, or a combined area of a plurality of mutually adjacent divided areas, that contains the face region is set as the central vision area. This removes the need to recalculate the camera control information for the central vision area and the peripheral vision area.
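One way to form the central vision area from divided areas is sketched below. It assumes the face region is an axis-aligned rectangle in pixel coordinates and returns the (row, column) indices of every grid cell the rectangle overlaps; the function name and parameters are illustrative only.

```python
def covering_cells(face_x, face_y, face_w, face_h, frame_w, frame_h, grid=16):
    """Return (row, col) indices of the divided areas that contain the face region.

    The frame is split into grid x grid divided areas (16 x 16 = 256 in the
    example in the text). The face rectangle format is an assumption.
    """
    cell_w = frame_w / grid
    cell_h = frame_h / grid
    col0 = int(face_x // cell_w)
    row0 = int(face_y // cell_h)
    # Use the last pixel inside the rectangle, clamped to the grid.
    col1 = min(grid - 1, int((face_x + face_w - 1) // cell_w))
    row1 = min(grid - 1, int((face_y + face_h - 1) // cell_h))
    return [(r, c) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)]
```

The evaluation values already computed for these cells could then be combined (for example, averaged) rather than recalculated for the central vision area.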

An image change in a detection target area can be detected by reusing the AE evaluation value calculated outside the face detection unit 18 for automatic exposure control, the AF evaluation value calculated outside the face detection unit 18 for autofocus control, or the AWB evaluation value calculated outside the face detection unit 18 for auto white balance control. Alternatively, a section that calculates AE, AF, or AWB evaluation values suited to image change detection may be provided within the image change determination unit 32 of FIG. 3.

Whether motion vectors or camera control information is used, the image change determination unit 32 ultimately detects the image change in a detection target area on the basis of the video signal (image data) within that area in each image change detection image.

<< Third Embodiment >>
Next, a third embodiment will be described. The third embodiment describes a method of generating stabilized face information based on the image change detection processing of the image change determination unit 32 of FIG. 3. The third embodiment is implemented in combination with the first and second embodiments described above. The central vision area and the peripheral vision area are set as in the first embodiment, and the image changes in the central vision area and the peripheral vision area, as detection target areas, are detected as in the second embodiment.

In the third embodiment, the peripheral vision area refers to the peripheral vision area 230 of FIG. 5(a) or FIG. 6(b) when the first or second area setting method is adopted, to the peripheral vision area 230a of FIG. 7 when the third area setting method is adopted, and to the combined area of the first peripheral vision area 240 and the second peripheral vision area 250 of FIG. 8 when the fourth area setting method is adopted. The central vision area refers to the central vision area 220 regardless of which area setting method is used (see FIG. 5(a) and others).

On the basis of the detected presence or absence of image change in the central vision area and in the peripheral vision area, the image change determination unit 32 classifies the image change state of the stabilization target image into one of four states. This classification is called the "rough classification process", and the four states are called the first, second, third, and fourth classification states. When the image is classified into the fourth classification state, the image change determination unit 32 additionally performs a "detailed classification process". FIG. 9 outlines the rough classification process and the detailed classification process.

In the rough classification process, when it is judged that there is no image change in either the central vision area or the peripheral vision area, the image is classified into the first classification state. In this case, nothing has changed from the previous frame, and the face position is considered unchanged. For example, when the first area setting method corresponding to FIG. 5(a) is adopted, the image changes in the central vision area 220 and the peripheral vision area 230 between the (n-1)th and nth frame images are detected as in the second embodiment, and if it is judged that neither area has an image change, the image is classified into the first classification state.

In the rough classification process, when it is judged that there is an image change in the central vision area but none in the peripheral vision area, the image is classified into the second classification state. In this case, the face region (and its vicinity) has changed, for example because the person turned their head, but the face position itself is considered unchanged.

In the rough classification process, when it is judged that there is no image change in the central vision area but there is one in the peripheral vision area, the image is classified into the third classification state. Noise, leaves swaying around the face region, movement of the body, and the like can lead to this classification. In this case too, the face position itself is considered unchanged.

When the second area setting method described above is adopted and the peripheral vision area is subdivided, it is judged that there is no image change in the peripheral vision area if none of the areas 231 to 234 of FIG. 6(a) has an image change, and that there is an image change in the peripheral vision area if any of the areas 231 to 234 has one.

As described in the second embodiment, an area that is ignored in the image change detection processing is not a detection target area, so an image change in such an area does not affect the rough classification process or the detailed classification process (it is ignored in both). For example, it was stated for the second area setting method that the lower area 232 of FIG. 6(a) can be ignored in the image change detection processing. When the second area setting method is adopted and the lower area 232 is ignored, the peripheral vision area is taken to consist only of the upper area 231, the left area 233, and the right area 234. If none of the upper area 231, the left area 233, and the right area 234 has an image change, it is judged that there is no image change in the peripheral vision area; if any of them has an image change, it is judged that there is an image change in the peripheral vision area.

In any of the first to third classification states, the face position itself is considered unchanged. The face detection information for the previous frame image can therefore be reused as the stabilized face information for the stabilization target image, which suppresses interruptions and fluctuations in face detection. The handling may, however, differ depending on whether face detection information exists for the nth frame image, which is the stabilization target image. For example, in the first to third classification states, when there is no face detection information for the nth frame image, the face detection information of the (n-1)th frame image, serving as the reference past image, is used as is as the stabilized face information for the nth frame image. When there is face detection information for the nth frame image, either that information itself is used as the stabilized face information for the nth frame image, or the stabilized face information for the nth frame image is generated by taking into account the face detection information of both the (n-1)th and nth frame images.
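The handling for the first to third classification states can be sketched as follows. Face information is assumed to be an `(x, y, size)` tuple, and the linear blend controlled by the hypothetical `blend` weight is just one possible way of "taking both into account"; the text does not specify how the two are combined.

```python
def stabilized_info_when_static(prev_info, curr_info=None, blend=None):
    """Stabilized face info for the nth frame when the face position is unchanged.

    prev_info -- face detection info of the (n-1)th frame, as (x, y, size)
    curr_info -- face detection info of the nth frame, or None if detection failed
    blend     -- optional weight in [0, 1] given to curr_info (an assumption)
    """
    if curr_info is None:
        # No detection in the current frame: carry the previous result forward.
        return prev_info
    if blend is None:
        # Use the current detection result as is.
        return curr_info
    # Combine both results; a simple linear blend is assumed here.
    return tuple((1 - blend) * p + blend * c for p, c in zip(prev_info, curr_info))
```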

In the rough classification process, when it is judged that there is an image change in both the central vision area and the peripheral vision area, the stabilization target image is classified into the fourth classification state, and the "detailed classification process" is additionally performed.
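The rough classification process amounts to a two-bit decision table, sketched below. The enum and function names are illustrative; `peripheral_changed` also shows how subdivided peripheral areas and ignored areas could be folded into one flag.

```python
from enum import Enum


class RoughState(Enum):
    FIRST = 1   # no change in the central or the peripheral vision area
    SECOND = 2  # change in the central vision area only
    THIRD = 3   # change in the peripheral vision area only
    FOURTH = 4  # change in both areas; detailed classification follows


def peripheral_changed(sub_area_changes, ignored=()):
    """Combine per-sub-area flags (e.g. upper/lower/left/right) into one flag.

    Sub-areas listed in `ignored` are excluded, mirroring an area that is
    not a detection target. The key names are hypothetical.
    """
    return any(
        changed for name, changed in sub_area_changes.items() if name not in ignored
    )


def rough_classify(central_changed, peripheral_changed_flag):
    """Map the two image change flags to one of the four classification states."""
    if central_changed and peripheral_changed_flag:
        return RoughState.FOURTH
    if central_changed:
        return RoughState.SECOND
    if peripheral_changed_flag:
        return RoughState.THIRD
    return RoughState.FIRST
```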

The detailed classification process subdivides the fourth classification state into one of the first to third subdivided states. The first subdivided state corresponds to a state in which the face has actually moved in real space, for example because the person of interest is walking. The second subdivided state corresponds to a state in which the imaging device 1 itself has moved because of camera shake or a pan or tilt operation. The third subdivided state corresponds to a state in which another object has come in front of the face and hides it (occurrence of occlusion).

The procedure of the detailed classification process is described below, assuming that image changes are detected on the basis of motion vectors. First, it is judged whether the direction and magnitude of the motion vector of the central vision area are the same as those of the motion vector of the peripheral vision area. If they are the same, it is judged that the imaging device 1 itself has moved because of camera shake or a pan or tilt operation, and the stabilization target image is classified into the second subdivided state. Here, "the same" with respect to the direction and magnitude of motion vectors is a broad concept that covers not only exactly the same but also substantially the same. If the imaging device 1 is provided with a gyro sensor (not shown) for detecting its angular velocity, an overall motion vector representing the motion of the imaging device 1 as a whole, obtained from the detection result of the gyro sensor, can also be used to judge whether the stabilization target image is classified into the second subdivided state.

If the direction and magnitude of the motion vector of the central vision area are not the same as those of the motion vector of the peripheral vision area, the stabilization target image is classified into either the first or the third subdivided state. The first and third subdivided states can also be distinguished on the basis of the motion vectors of the central vision area and the peripheral vision area.

For example, as described in the second embodiment, each detection target area is divided into a plurality of blocks, and a plurality of motion vectors are detected for each of the central vision area and the peripheral vision area. If the face position moves to the right between the (n-1)th and nth frames, then, as shown in FIG. 10(a), the motion vectors of the central vision area and the peripheral vision area point to the right (out of the central vision area), and in the peripheral vision area a rightward motion vector of significant magnitude is detected only on the right side. When such motion vectors are detected, the stabilization target image is classified into the first subdivided state. If another object comes in front of the face from the right side of the face between the (n-1)th and nth frames, then, as shown in FIG. 10(b), the motion vectors of the central vision area and the peripheral vision area point to the left (into the central vision area), and in the peripheral vision area a leftward motion vector of significant magnitude is detected only on the right side. When such motion vectors are detected, the stabilization target image is classified into the third subdivided state.
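The motion-vector-based detailed classification can be sketched as follows. This is a simplified interpretation, not the disclosed procedure: each peripheral block is assumed to be given as its offset from the centre of the central vision area together with its motion vector, "substantially the same" is approximated by a tolerance `tol`, and the outward/inward test uses the dot product of offset and vector. Treating an ambiguous (non-positive) result as occlusion is also an assumption.

```python
import math
from enum import Enum


class Detail(Enum):
    FACE_MOVED = 1    # first subdivided state
    CAMERA_MOVED = 2  # second subdivided state
    OCCLUSION = 3     # third subdivided state


def mean_vector(vectors):
    """Average a non-empty list of (dx, dy) motion vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def detailed_classify(central_vectors, peripheral_blocks, tol=1.0, min_mag=1.0):
    """Subdivide the fourth classification state.

    central_vectors   -- motion vectors of the blocks in the central vision area
    peripheral_blocks -- list of ((ox, oy), (vx, vy)): block offset from the
                         centre of the central vision area, and its motion vector
    """
    cx, cy = mean_vector(central_vectors)
    # Camera motion: every peripheral block moves like the central area.
    if all(math.hypot(vx - cx, vy - cy) <= tol for _, (vx, vy) in peripheral_blocks):
        return Detail.CAMERA_MOVED
    # Otherwise check whether significant peripheral motion points outward
    # (face leaving the central area) or inward (object entering it).
    outward = 0.0
    for (ox, oy), (vx, vy) in peripheral_blocks:
        if math.hypot(vx, vy) >= min_mag:
            outward += ox * vx + oy * vy
    return Detail.FACE_MOVED if outward > 0 else Detail.OCCLUSION
```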

When the image is classified into the fourth classification state, the stabilized face information generation unit 33 of FIG. 3 estimates the face position and the like in the nth frame image on the basis of the face detection information of the (n-1)th frame image and the motion vectors of the central vision area and the peripheral vision area between the (n-1)th and nth frame images.

For example, when the face position moves to the right between the (n-1)th and nth frames and motion vectors such as those shown in FIG. 10(a) are detected, the nth frame image is classified into the first subdivided state, and the face position in the nth frame image is estimated to be shifted to the right of the face position specified by the face detection information of the (n-1)th frame image (the direction and amount of the shift can be determined from the direction and magnitude of the motion vectors).
As another example, when another object comes in front of the face from the right side of the face between the (n-1)th and nth frames and motion vectors such as those shown in FIG. 10(b) are detected, the nth frame image is classified into the third subdivided state, and the face position in the nth frame image is estimated to be the same as the face position specified by the face detection information of the (n-1)th frame image.
As a further example, when the nth frame image is classified into the second subdivided state, the face position in the nth frame image is estimated to be shifted from the face position specified by the face detection information of the (n-1)th frame image by an amount corresponding to the motion vectors of the central vision area and the peripheral vision area between the (n-1)th and nth frame images (the direction and amount of the shift can be determined from the direction and magnitude of the motion vectors). The face position estimated as described above is called the estimated face position.

In the fourth classification state, the method of generating the stabilized face information differs depending on whether face detection information exists for the nth frame image, which is the stabilization target image.

When the nth frame image is classified into the fourth classification state and there is no face detection information for the nth frame image, the stabilized face information of the nth frame image is generated from the estimated face position described above. That is, the estimated face position is used as the face position specified by the stabilized face information of the nth frame image. The face size and the like specified by the stabilized face information of the nth frame image are made the same as those of the (n-1)th frame image.

When the nth frame image is classified into the fourth classification state and there is face detection information for the nth frame image, either the estimated face position described above or the face position specified by the face detection information of the nth frame image is used as the face position in the stabilized face information of the nth frame image. Basically, whichever of the two is closer to the face position in the stabilized face information of the (n-1)th frame image is adopted. This suppresses fluctuation of the detected face position. The face size and the like specified by the stabilized face information of the nth frame image may follow the face detection information of the nth frame image.
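The position estimate and the selection rule for the fourth classification state can be sketched as follows. Positions are assumed to be `(x, y)` tuples, the subdivided state is passed as a hypothetical string label, and a single representative motion vector is assumed to be available; ties are resolved in favour of the estimated position, which is an arbitrary choice.

```python
import math


def estimate_face_position(prev_pos, detail_state, motion_vector):
    """Estimated face position in the nth frame.

    detail_state -- "face_moved", "camera_moved" or "occluded" (hypothetical labels)
    """
    if detail_state == "occluded":
        # Third subdivided state: the face is assumed not to have moved.
        return prev_pos
    # First or second subdivided state: shift by the motion vector.
    return (prev_pos[0] + motion_vector[0], prev_pos[1] + motion_vector[1])


def stabilized_position(prev_stabilized, estimated, detected=None):
    """Choose the face position for the stabilized face info of the nth frame."""
    if detected is None:
        # No detection result for the nth frame: use the estimate.
        return estimated
    # Otherwise adopt whichever candidate is closer to the previous
    # stabilized position, which damps frame-to-frame fluctuation.
    d_est = math.dist(prev_stabilized, estimated)
    d_det = math.dist(prev_stabilized, detected)
    return estimated if d_est <= d_det else detected
```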

<< Fourth Embodiment >>
Next, a fourth embodiment will be described. Like the third embodiment, the fourth embodiment describes a method of generating stabilized face information based on the image change detection processing of the image change determination unit 32 of FIG. 3. The fourth embodiment is also implemented in combination with the first and second embodiments described above. However, the fourth embodiment assumes only the case where the fourth area setting method described above is adopted in the area setting unit 31 of FIG. 3.

When the fourth area setting method is adopted, the rough classification process is the same as in the third embodiment. If the stabilization target image is classified into the fourth classification state in the rough classification process, the detailed classification process is executed. The fourth area setting method is particularly useful for detecting the occurrence of occlusion, which bears on distinguishing the first and third subdivided states. The principle of occlusion detection is described with reference to FIGS. 11(a) to 11(c).

Assume that the face position in the input image is stationary from the (n-2)th frame to the nth frame. In this case, the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 are set at the same positions in each of the (n-2)th to nth frame images. Assume further that an object 300 approaches the face from the right side of the face between the (n-2)th and nth frames. FIGS. 11(a), 11(b), and 11(c) show the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 set in the (n-2)th, (n-1)th, and nth frame images, respectively, together with the object 300.

In the (n-2)th frame image, the object 300 is located in the second peripheral vision area 250. The object 300 moves to the left between the (n-2)th and (n-1)th frames, so that in the (n-1)th frame image it is located in the first peripheral vision area 240. The object 300 moves further to the left between the (n-1)th and nth frames, so that in the nth frame image it is located in the central vision area 220.

In this case, an image change appears in both the first peripheral vision area 240 and the second peripheral vision area 250 between the (n-2)th and (n-1)th frame images, and an image change appears in both the first peripheral vision area 240 and the central vision area 220 between the (n-1)th and nth frame images. By detecting such image changes, the occlusion of the face by the object 300 can be detected.

Specifically, the detailed classification process is executed as follows. When the object 300 moves as shown in FIGS. 11(b) and 11(c), it is judged that there is an image change in both the central vision area and the peripheral vision area, the stabilization target image is classified into the fourth classification state, and the detailed classification process is executed. In the detailed classification process, it is first judged whether the stabilization target image is classified into the second subdivided state. This judgment can be made in the same way as in the third embodiment.

If the stabilization target image is not classified into the second subdivided state, the first and third subdivided states are distinguished. For this purpose, the image change determination unit 32 of FIG. 3 refers not only to the image change between the (n-1)th and nth frame images but also to the image change between the (n-2)th and (n-1)th frame images. In the fourth embodiment, therefore, the image change detection images for the image change determination unit 32 of FIG. 3 include three frame images (the (n-2)th to nth frame images).

The image change determination unit 32 detects the presence or absence of image change in each of the central vision area 220, the first peripheral vision area 240, and the second peripheral vision area 250 between the (n-2)th and (n-1)th frame images. This detection result has already been obtained in the image change detection processing performed when the (n-1)th frame image was the stabilization target image, so it can be reused. If it is judged that, between the (n-2)th and (n-1)th frame images, there is an image change only in the first peripheral vision area 240 and the second peripheral vision area 250, it is judged that occlusion of the face occurs in the nth frame image, and the nth frame image (the stabilization target image) is classified into the third subdivided state. Otherwise, the nth frame image (the stabilization target image) is classified into the first subdivided state.
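The three-frame occlusion test can be sketched as follows. The per-area change flags for the (n-2)th to (n-1)th frame pair are assumed to be cached from the previous stabilization step in a dictionary with hypothetical keys; the function is meant to be called only after the second subdivided state has been ruled out.

```python
def classify_first_or_third(prev_pair_changes):
    """Distinguish the first and third subdivided states.

    prev_pair_changes -- image change flags between the (n-2)th and (n-1)th
                         frames, keyed by "central", "peripheral1", "peripheral2"
    Returns 3 (occlusion) or 1 (the face actually moved).
    """
    only_peripheral = (
        prev_pair_changes["peripheral1"]
        and prev_pair_changes["peripheral2"]
        and not prev_pair_changes["central"]
    )
    # A change confined to both peripheral areas one frame earlier suggests an
    # object that was approaching the face before it reached the central area.
    return 3 if only_peripheral else 1
```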

The first and third subdivided states can be distinguished whether the presence or absence of image change is detected from motion vectors or from camera control information.

The processing performed after classification into one of the first to third subdivided states is almost the same as in the third embodiment. That is, the stabilized face information generation unit 33 in FIG. 3 estimates the position and other attributes of the face in the nth frame image based on the face detection information of the (n−1)th frame image and on the motion vectors of the central vision area and the peripheral vision area between the nth and (n−1)th frame images, and thereby determines the estimated face position described above. In particular, when the nth frame image serving as the stabilization target image is classified into the third subdivided state, the estimated face position may be set equal to the face position specified by the face detection information of the (n−1)th frame image. The estimation process for an image classified into the first or second subdivided state is the same as in the third embodiment.

When the nth frame image is classified into the fourth classification state and there is no face detection information for the nth frame image, the stabilized face information of the nth frame image is generated from the estimated face position. That is, the estimated face position is used as the face position specified by the stabilized face information of the nth frame image. The face size and other attributes specified by the stabilized face information of the nth frame image are set equal to those of the (n−1)th frame image.
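The generation of stabilized face information when the detection for the nth frame image is missing can be sketched as follows. This is a simplified illustration, not the implementation of the stabilized face information generation unit 33: `FaceInfo` and `fill_missing_face_info` are hypothetical names, and the estimated position for the first and second subdivided states is assumed to be computed elsewhere (by the estimation of the third embodiment) and passed in by the caller.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceInfo:
    """Face position and size (hypothetical record for this sketch)."""
    x: float
    y: float
    size: float


def fill_missing_face_info(
    prev: FaceInfo,
    subdivided_state: int,
    estimated_position: Tuple[float, float],
) -> FaceInfo:
    """Build stabilized face information for frame n when its detection is missing.

    prev               -- face detection information of the (n-1)th frame
    subdivided_state   -- 1, 2 or 3, as classified beforehand
    estimated_position -- position estimated from motion vectors; only used
                          for states 1 and 2 (assumed to be supplied by the caller)
    """
    if subdivided_state == 3:
        # Occlusion: the face is assumed to stay where it was in frame (n-1).
        x, y = prev.x, prev.y
    else:
        x, y = estimated_position
    # The face size is carried over from frame (n-1) in every case.
    return FaceInfo(x, y, prev.size)
```

For example, with `prev = FaceInfo(10, 20, 5)` and the third subdivided state, the result keeps the position (10, 20) and the size 5 regardless of the estimate passed in.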

<<Modifications, etc.>>
The specific numerical values given in the above description are merely examples and can naturally be changed to various other values. Notes 1 to 6 below describe modifications of, or remarks on, the above-described embodiment. The contents of the notes can be combined in any manner as long as no contradiction arises.

[Note 1]
In the fourth area setting method corresponding to FIG. 8, the peripheral vision area is hierarchized into two levels, but it may be hierarchized into three or more levels. For example, if the peripheral vision area is hierarchized into three levels, the occurrence of occlusion can be detected with higher accuracy on the basis of four frame images.

[Note 2]
The area setting unit 31 in FIG. 3 sets the central vision area and the peripheral vision area in which image changes are to be monitored, based on the face detection information for the area setting image. In the above-described example, when the stabilization target image is the nth frame image, the area setting image is the (n−1)th frame image.

Since the aim is to monitor image changes in and around the face region with the nth frame image as the reference, it is preferable to set the above areas using the face detection information of a past frame image that is as recent as possible (the face position and other attributes in a frame image from the distant past are likely to differ greatly from those in the nth frame image). It is therefore desirable to use the (n−1)th frame image as the area setting image, but an earlier frame image (for example, the (n−2)th frame image) can also be used as the area setting image.

If the (n−2)th frame image is used as the area setting image, the central vision area and the peripheral vision area for the nth frame image are set based on the face detection information of the (n−2)th frame image.

[Note 3]
The stabilized face information generation unit 33 in FIG. 3 generates stabilized face information for the stabilization target image based on the result of the image change detection process performed by the image change determination unit 32 and on the face detection information for the reference past image and the stabilization target image. In the above-described example, when the stabilization target image is the nth frame image, the reference past image is the (n−1)th frame image.

Because the reference past image is used, for example, to estimate the position of the face in the stabilization target image, it should be a past frame image that is as recent as possible. It is therefore desirable to use the (n−1)th frame image as the reference past image, but an earlier frame image (for example, the (n−2)th frame image) can also be used as the reference past image.

For example, even when the (n−2)th frame image is used as the reference past image, if the nth frame image serving as the stabilization target image is classified into one of the first to third classification states and there is no face detection information for the nth frame image, the face detection information of the (n−2)th frame image itself can be used as the stabilized face information of the nth frame image. This effectively suppresses interruptions and similar failures in face detection.

As can be seen from Note 2 and this note (Note 3), the area setting image and the reference past image for the same stabilization target image may differ from each other.

[Note 4]
In the above-described embodiment, the face position is finally detected in consideration of image changes in both the central vision area and the peripheral vision area, but considering only the image change in the peripheral vision area also contributes to stabilizing face detection. For example, suppose that face detection information exists for the (n−1)th frame image but not for the nth frame image serving as the stabilization target image. If it is determined that there is no image change in the peripheral vision area (such as the peripheral vision area 230 in FIG. 5(a)) between the (n−1)th and nth frame images, the face position is presumed not to have changed, so the face detection information of the (n−1)th frame image itself can be used as the stabilized face information of the nth frame image. This effectively suppresses interruptions and similar failures in face detection.
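The peripheral-only fallback described in Note 4 can be sketched as follows. This is a minimal illustration: the function name `stabilize_peripheral_only` and its parameters are hypothetical, and passing the current detection result through unchanged when it exists is an assumption made for this sketch rather than something stated in the note.

```python
from typing import Optional, TypeVar

F = TypeVar("F")  # any representation of face detection information


def stabilize_peripheral_only(
    prev_face: Optional[F],
    curr_face: Optional[F],
    peripheral_changed: bool,
) -> Optional[F]:
    """Return stabilized face information for frame n, or None if unavailable.

    prev_face          -- face detection information of frame (n-1), if any
    curr_face          -- face detection information of frame n, if any
    peripheral_changed -- whether the peripheral vision area changed
                          between frames (n-1) and n
    """
    if curr_face is None and prev_face is not None and not peripheral_changed:
        # No change around the face: presume the face has not moved and
        # reuse the previous detection result as is.
        return prev_face
    # Assumption for this sketch: otherwise pass the current result through.
    return curr_face
```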

[Note 5]
In the above-described embodiment, the face detection unit 18 is provided in the imaging apparatus 1 as an object detection apparatus, and the target (a specific type of object) to be detected in the input image is a human face, but the present invention is not limited to this. That is, something other than a face can be handled as the target to be detected in the input image. For example, if the target to be detected in the input image is a vehicle, the present invention can be applied to a parking lot management system or the like. Detection of targets other than faces can also be realized by using known methods (such as pattern matching).

[Note 6]
The imaging apparatus 1 in FIG. 1 can be realized by hardware or by a combination of hardware and software. In particular, the function of each part shown in FIGS. 2 and 3 can be realized by hardware, software, or a combination of hardware and software. When the imaging apparatus 1 is configured using software, a block diagram of a part realized by software represents a functional block diagram of that part.

In addition, all or part of the functions realized by the face detection unit 18 may be described as a program, and all or part of those functions may be realized by executing the program on a program execution device (for example, a computer).

An overall block diagram of an imaging apparatus according to an embodiment of the present invention.
An internal block diagram of the face detection unit in FIG. 1.
An internal block diagram of the face detection stabilization processing unit in FIG. 2.
A diagram for explaining the general function of the face detection stabilization processing unit in FIG. 2.
Diagrams (a) and (b) showing examples of the central vision area and the peripheral vision area set by the area setting unit in FIG. 3 (first area setting method), according to the first embodiment of the present invention.
Diagram (a) showing an example of the central vision area and the peripheral vision area set by the area setting unit in FIG. 3 (second area setting method), and diagram (b) showing only the peripheral vision area extracted, according to the first embodiment of the present invention.
A diagram showing an example of the central vision area and the peripheral vision area set by the area setting unit in FIG. 3 (third area setting method), according to the first embodiment of the present invention.
A diagram showing an example of the central vision area and the peripheral vision area set by the area setting unit in FIG. 3 (fourth area setting method), according to the first embodiment of the present invention.
A diagram showing the contents of the rough classification process and the detailed classification process performed by the image change determination unit in FIG. 3, according to the third embodiment of the present invention.
Diagram (a) showing the motion vectors that appear when the face moves to the right, and diagram (b) showing the motion vectors that appear when another object approaches the face from the right, according to the third embodiment of the present invention.
Diagrams (a), (b), and (c) for explaining the principle of occlusion detection that can be performed when the peripheral vision area is hierarchized, according to the fourth embodiment of the present invention.

Explanation of symbols

1 Imaging apparatus
18 Face detection unit
21 Face detection processing unit
22 Face detection stabilization processing unit
31 Area setting unit
32 Image change determination unit
33 Stabilized face information generation unit

Claims

An object detection apparatus that detects the position of a specific type of object in a moving image based on the moving image,
wherein the position of the object is detected in consideration of an image change around the object in the moving image.

An object detection apparatus that detects the position of a specific type of object in a moving image based on the moving image, comprising:
first detection means for tentatively detecting the position of the object in the moving image; and
second detection means for finally detecting the position of the object in the moving image based on the detection result of the first detection means and on image changes in a reference area including the object and in a peripheral area located around the reference area.

The object detection apparatus according to claim 2, wherein
the moving image is formed from a plurality of input images arranged in time series,
the first detection means tentatively detects the position of the object in each input image,
the second detection means includes:
area setting means for setting the reference area and the peripheral area for the current input image based on the position of the object in a past input image detected by the first detection means; and
image change detecting means for defining, on the current and past input images, the reference area and the peripheral area set for the current input image, and for detecting the image changes in the reference area and in the peripheral area between the current and past input images as the image change for the current input image, and
the second detection means finally detects the position of the object in the current input image based on the detection results of the first detection means for the past and current input images and on the image change for the current input image detected by the image change detecting means.

The object detection apparatus according to claim 2, wherein
the peripheral area is set so as to surround the entire circumference of the reference area, and
the image change detecting means detects the image change for the peripheral area after excluding a part of the peripheral area.

The object detection apparatus according to claim 2, wherein the peripheral area is set so as to surround a part of the entire circumference of the reference area.

The object detection apparatus according to claim 2, wherein
the peripheral area is hierarchized to include a first peripheral area that is relatively close to the reference area and a second peripheral area that is relatively far from the reference area, and
the second detection means finally detects the position of the object based on the detection result of the first detection means and on image changes in the reference area and in the first and second peripheral areas.

The object detection apparatus according to claim 1, wherein the specific type of object is a human face.

An imaging apparatus comprising:
imaging means for acquiring a moving image of a subject; and
the object detection apparatus according to claim 1, which receives the moving image obtained by the imaging means.

An object detection method for detecting the position of a specific type of object in a moving image based on the moving image,
wherein the position of the object is detected in consideration of an image change around the object in the moving image.