JP2010027000A

JP2010027000A - Image detection device and image detection method

Info

Publication number: JP2010027000A
Application number: JP2008191316A
Authority: JP
Inventors: Yoshihiro Suzuki; 義弘鈴木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-07-24
Filing date: 2008-07-24
Publication date: 2010-02-04

Abstract

PROBLEM TO BE SOLVED: To track and detect a face seamlessly at a detection speed that keeps up with the frame rate of a moving image, so that a scene that changes from moment to moment can be photographed clearly.

SOLUTION: This image detection device includes: a scaling part 201 for enlarging or reducing image data on a screen; an image detecting part 203 for detecting a specific image in the scaled image data, where the image data obtained by the scaling part is divided into a plurality of sections each having a prescribed number of pixels, by scanning the screen provided with these sections with a rectangular area of a prescribed size horizontally, or horizontally and vertically; a candidate area processing part 204 for setting a detection likelihood value corresponding to the degree of overlap of the screen sections of the specific image obtained by the detection; and a storing part 202 for storing the set detection likelihood value.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an image detection apparatus and an image detection method for detecting a specific image on a screen, and more particularly to an image detection apparatus and an image detection method that detect an image of a person's face in a specific area of the screen and perform a desired process.

In recent years, digital still cameras, which mainly capture still images, and digital video cameras, which mainly capture moving images, have come to include functions that detect the face of the person being photographed and use the detection result to control AF (Auto Focus), AE (Auto Exposure), and AWB (Auto White Balance), or to adjust the skin tone of the captured image. To capture a momentary, constantly changing scene clearly while making use of these controls, the face must be tracked and detected at a speed equal to or higher than the frame rate of the moving image.

Regarding methods of face detection, Patent Document 1 discloses a method that determines whether an object is a face based on a detection accuracy value indicating face likeness, and that searches an image for the position and size of a face by scanning the image while reducing it stepwise.
In Patent Document 1, the position of a face is found by scanning a rectangular area of a predetermined size across the screen, from the upper-left corner to the lower-right corner of the captured image, moving it by Δx in the horizontal direction and Δy in the vertical direction.
The size of the face is found by generating reduced images, each obtained by reducing the captured image stepwise by a predetermined reduction ratio Δr, and repeating the scan on each reduced image.
For each rectangular area of the predetermined size, the luminance differences between pairs of pixels, with the pairs determined in advance by learning, are all calculated from the pixels in the area, and a detection accuracy value indicating face likeness is obtained from the results. A threshold for deciding whether the object is a face is set in advance for the detection accuracy value of the rectangular area: if the calculated value is equal to or greater than the threshold, the object is judged to be a face; if it is lower than the threshold, the object is judged not to be a face.
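The scan described for Patent Document 1 can be sketched roughly as follows. This is only an illustrative sketch, not the method of Patent Document 1 itself: the scoring rule (a plain sum of luminance differences over the learned pixel pairs), the nearest-neighbour reduction, and all names (`shrink`, `scan_pyramid`, `step`, `pairs`) are assumptions introduced here. The parameter `step` plays the role of the per-level reduction and must satisfy 0 < step < 1.

```python
from typing import List, Tuple

Image = List[List[int]]                          # rows of luminance values
Pair = Tuple[Tuple[int, int], Tuple[int, int]]   # ((y1, x1), (y2, x2)) inside the window


def shrink(img: Image, scale: float) -> Image:
    """Nearest-neighbour reduction of img to `scale` times its size (0 < scale <= 1)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    return [[img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(nw)] for y in range(nh)]


def scan_pyramid(img: Image, win: int, dx: int, dy: int, step: float,
                 pairs: List[Pair], threshold: int) -> List[Tuple[int, int, int]]:
    """Return (x, y, size) of every window judged to be a face, in original coordinates."""
    hits = []
    scale = 1.0
    cur = img
    # Keep reducing the image until the window no longer fits.
    while len(cur) >= win and len(cur[0]) >= win:
        for y in range(0, len(cur) - win + 1, dy):          # vertical step dy
            for x in range(0, len(cur[0]) - win + 1, dx):   # horizontal step dx
                # Hypothetical score: sum of luminance differences of the learned pairs.
                score = sum(cur[y + ay][x + ax] - cur[y + by][x + bx]
                            for (ay, ax), (by, bx) in pairs)
                if score >= threshold:                      # threshold decision
                    hits.append((int(x / scale), int(y / scale), int(win / scale)))
        scale *= step
        cur = shrink(img, scale)
    return hits
```

A window found on a more strongly reduced image corresponds to a larger face in the original image, which is why the window size is divided by the current scale when a hit is reported.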

The face detection method of Patent Document 1 requires a very large number of arithmetic operations to process a single image. The number of evaluations needed to decide whether an object is a face is "number of image reductions × number of movements of the rectangular area". This large number of operations is what lowers the face detection speed.
To overcome this disadvantage, Patent Document 2 discloses a method that reduces the number of operations by searching, on the basis of color features, for candidate areas that are likely to contain a face and searching for faces only within those candidate areas.
Here, an area "likely to contain a face" is, for example, an image area having a skin color. However, this method has the disadvantage that, if the background contains a color close to skin color, the background is included in the candidate area. Moreover, skin color is not limited to the face: parts where skin is exposed, such as arms and legs, are also skin-colored and are likewise included in the candidate area.
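To give a feel for the operation count mentioned above, the following sketch counts how many rectangular areas must be evaluated for one image. The image size, window size, step sizes, and scale factor in the usage note are illustrative assumptions, not values from the source. Because each reduced image offers fewer window positions than the one before it, the sketch sums the positions per reduction level; the product "reductions × movements" quoted above can be read as a simplified form of the same count.

```python
def window_positions(width: int, height: int, win: int, dx: int, dy: int) -> int:
    """Number of positions a win x win rectangle can take with steps dx, dy."""
    if width < win or height < win:
        return 0
    return ((width - win) // dx + 1) * ((height - win) // dy + 1)


def total_evaluations(width: int, height: int, win: int, dx: int, dy: int,
                      scale: float, levels: int) -> int:
    """Sum of window positions over `levels` images, each `scale` times the previous."""
    total = 0
    for k in range(levels):
        w = int(width * scale ** k)
        h = int(height * scale ** k)
        total += window_positions(w, h, win, dx, dy)
    return total
```

For example, a 320 × 240 image scanned with a 20 × 20 window in steps of 2 pixels already gives `window_positions(320, 240, 20, 2, 2) == 16761` evaluations at the original size alone, before any reduced images are scanned.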

Also to overcome this disadvantage, Patent Document 3 discloses a method in which face detection results obtained in past shooting (face size, face position, and so on) are accumulated, and the imaging apparatus learns at which positions and at what sizes faces tend to appear in the images it captures. Based on this learning data, candidate areas in which a face may exist are sought in the captured image data, and face detection is performed only on those areas, thereby reducing the number of operations.
This method is very effective when the scene is constant and the tendency of the captured images is fixed, as with a fixed-point camera. However, with an imaging apparatus that can be carried freely, such as a digital still camera or a digital video camera, the tendency of the captured images changes with time and place, so the learning data is of no use and candidate areas cannot be identified.

These face detection methods obtain a detection accuracy value indicating whether a rectangular area is a face, and judge the object to be a face if the value is equal to or greater than a threshold and not a face if the value is lower than the threshold. If the detection accuracy value of a rectangular area drops because of the conditions of the captured image, the area is not judged to be a face. Conditions that lower the detection accuracy value include the orientation of the face, the way light falls on it, the presence of objects that hide the face such as glasses or a hat, and obstacles crossing in front of the face. A moment at which the face cannot be detected means that the position and size of the face in the captured image are lost at that moment, and the face can no longer be tracked afterwards.

To overcome this disadvantage, Patent Document 4 discloses the following method. Surrounding information acquired from an area defined relative to the face position is recorded in association with the detected face position. If no face is detected, a plurality of face position candidates are obtained based on the face position detected immediately before, and the surrounding information corresponding to each candidate is acquired. The candidate whose surrounding information is most similar to the surrounding information associated with the face position detected immediately before is taken as the face position.

An example of the surrounding information described above is the clothing located below the face. With this method, however, when a face is not detected, the surrounding information associated with the face position detected immediately before does not necessarily remain, so similar surrounding information may not be found.
Take clothing as the surrounding information. If the face can no longer be detected because its orientation has changed, and the reason is that the whole body has turned, the orientation of the clothing has probably changed as well. If the face can no longer be detected because of the lighting, and the light falling on the whole body has changed, for example because the person has entered the shadow of a building, the light falling on the clothing has probably changed as well.
If the face can no longer be detected because an obstacle crosses in front of it, the obstacle may be a small one that hides only the face, but it may also be a large one that hides the whole body, in which case the clothing is hidden as well. Furthermore, when photographing people in identical clothing, such as uniforms, there are a plurality of position candidates with similar surrounding information, and the face position cannot be narrowed down.

JP 2005-157679 A
JP 2001-309225 A
JP 2007-122484 A
JP 2007-42072 A

Accordingly, an object of the present invention is to provide an image detection apparatus and an image detection method that speed up detection processing and perform seamless tracking detection of a face.

The image detection apparatus of the present invention includes: a scaling unit that enlarges or reduces image data on a screen; an image detection unit that, with a plurality of sections each having a predetermined number of pixels provided for the image data obtained by the scaling unit, scans the screen provided with the plurality of sections with a rectangular area of a predetermined size in the horizontal direction, or in the horizontal and vertical directions, and detects a specific image in the scaled image data; a candidate area processing unit that sets a detection accuracy value corresponding to the degree of overlap of the screen sections of the specific image obtained by the detection; and a storage unit that stores the detection accuracy value set by the candidate area processing unit.

The image detection method of the present invention includes the steps of: enlarging or reducing image data on a screen by scaling processing; providing a plurality of sections, each having a predetermined number of pixels, in the image data enlarged or reduced by the scaling processing; scanning the plurality of sections provided on the screen with a rectangular area of a predetermined size in the horizontal direction, or in the horizontal and vertical directions, of the screen; detecting, by the scanning of the rectangular area, a specific image in the scaled image data; setting a detection accuracy value corresponding to the degree of overlap of the sections corresponding to the specific image obtained by the detection; and storing the set detection accuracy value.

In the image detection apparatus or image detection method of the present invention, a plurality of sections, each composed of a plurality of pixels, are provided in a frame image, and the sections on the screen are scanned in the horizontal and vertical directions with an inspection area of a specific size. If the inspection area contains the specific image, an inspection accuracy value is set; otherwise the inspection accuracy value is set to zero (0). The same inspection is performed on each frame image to obtain the detection candidate area.

According to the image detection apparatus and the image detection method of the present invention, detection of a specific image area on the screen, for example a face, can be made faster without the tracking of the detection being interrupted.

FIG. 1 shows a configuration example of an imaging apparatus 100 according to an embodiment of the present invention. An image detection apparatus that performs face detection is described below as a specific example, but the image to be detected is not limited to a face and may be any other specified image on the screen.

The imaging apparatus 100 includes an optical block 101, a signal conversion unit 102, a camera signal processing unit 103, a face detection unit 104, a display processing unit 105, an image signal processing unit 106, a storage unit 107, a display unit 108, an image RAM (Random Access Memory) 109, a control unit (CPU: Central Processing Unit) 110, a ROM (Read Only Memory) 111, a RAM 112, and an image bus 113.

The optical block 101 contains a lens group for imaging a subject (not shown), an aperture adjustment mechanism, a focus adjustment mechanism, a zoom mechanism, a shutter mechanism, a flash mechanism, and the like. Zoom control, shutter control, exposure control, and other control operations are performed in accordance with control signals from the control unit (CPU) 110 described later.

The signal conversion unit 102 is composed of an imaging element such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) image sensor, and the incoming subject image is focused onto its imaging plane by the optical block 101. The signal conversion unit 102 receives, for example, an image capture timing signal supplied from the control unit 110 in response to a shutter operation, converts the optical signal of the subject image formed on the imaging plane into an electrical imaging (video) signal, and supplies it to the camera signal processing unit 103.

The camera signal processing unit 103 performs various kinds of signal processing on the imaging signal output from the signal conversion unit 102, based on control signals from the control unit 110. Specifically, it performs processing such as gamma correction and AGC (Auto Gain Control) on the imaging signal from the signal conversion unit 102 and converts the imaging signal into image data as a digital signal. Based on control signals from the control unit 110, the camera signal processing unit 103 further performs white balance control, exposure correction control, and the like on this image data.

The face detection unit 104 receives the image data output from the camera signal processing unit 103 via the image RAM (Random Access Memory) 109, detects a person's face in the received image data, and supplies the detection result to the control unit 110. The configuration of the face detection unit 104 and the details of the face detection processing are described later.

The display processing unit 105 receives, via the image RAM 109, the image data output from the camera signal processing unit 103 and from the image signal processing unit 106 described later, converts it into, for example, an image with a resolution suitable for display, and supplies it to the display unit 108.

The display unit 108 displays the image data processed by the display processing unit 105 on a display screen. The display unit 108 is used as a viewfinder of the imaging apparatus and also as a monitor for images reproduced from the storage unit 107. An LCD (Liquid Crystal Display), for example, can be used as the display unit 108.

The image signal processing unit 106 receives the image data output from the camera signal processing unit 103 via the image RAM 109, compression-encodes this image data, and outputs it to the storage unit 107 as a moving image or still image data file.
The image signal processing unit 106 also decodes an image data file read from the storage unit 107 described later and supplies the result to the display processing unit 105 via the image RAM 109. For example, the MPEG (Moving Picture Experts Group) system can be applied as the moving image encoding system, and the JPEG (Joint Photographic Experts Group) system as the still image encoding system.

The storage unit 107 stores the image files generated through encoding by the image signal processing unit 106. A drive device for a recording medium such as a magnetic tape or an optical disc, a flash memory, or an HDD (Hard Disk Drive), for example, can be used as the storage unit 107. In addition to reading out stored image data files to the image signal processing unit 106, the storage unit 107 supplies information accompanying those files to the control unit 110. The information accompanying an image data file is, for example, data such as imaging conditions and face detection results embedded in the file.

The image RAM 109 is connected to the camera signal processing unit 103, the face detection unit 104, the display processing unit 105, and the image signal processing unit 106 via the image bus 113. The image RAM 109 is shared by these processing blocks connected to the image bus 113, and image data is passed between the blocks via the image RAM 109. In this example the processing blocks are described as exchanging image data via the image RAM 109, but the present invention is not limited to this. For example, the face detection unit 104 and the display processing unit 105 may be configured to receive the output image data from the camera signal processing unit 103 and the image signal processing unit 106 directly, without going through the image bus 113.

The control unit (CPU) 110 includes a CPU and the like, and controls the entire imaging apparatus 100 in accordance with a program stored in advance in the ROM 111, using the RAM 112 as a work memory. For example, the CPU of the control unit 110 exchanges commands and data with each unit of the imaging apparatus 100 and controls each of them. The CPU of the control unit 110 also generates control signals for controlling the focus, aperture, zoom, and the like of the optical block 101, based on control signals corresponding to operations on an operation unit (not shown), on the imaging signal, and so on, and supplies them to the optical block 101.

FIG. 2 shows a configuration example of the circuit blocks of the face detection unit 104. The face detection unit 104 includes a scaling unit 201 serving as an image conversion unit, an image buffer 202, a face detection core unit 203 serving as a face detection processing unit, a candidate area processing unit 204, and a controller 205.

The scaling unit 201 performs scaling processing that reduces or enlarges the size (the number of pixels in the horizontal and vertical directions) of the image data read from the image RAM 109 via the image bus 113 so that it suits the face detection processing in the face detection core unit 203, and supplies the result to the image buffer 202.
The scaling unit 201 also receives from the controller 205 the face detection candidate area bitmap, which represents the candidate areas likely to contain a face as calculated by the candidate area processing unit 204, applies the same scaling processing to it as to the image, and supplies it to the image buffer 202. Details of the scaling processing of the face detection candidate area bitmap are described later.

The image buffer 202 temporarily holds the image data whose size has been converted by the scaling unit 201. Under the control of the controller 205, it performs a scanning process that cuts out rectangular image data of a predetermined size at a designated position in the image data, and supplies the rectangular image data to the face detection core unit 203.
In addition, using the face detection candidate area bitmap whose size has been converted by the scaling unit 201, the image buffer 202 determines whether the cut-out rectangular image data falls within a face detection candidate area, creates flag data indicating whether face detection processing is to be performed (On) or not performed (Off), and supplies it to the face detection core unit 203. Details of how this flag data is created are described later.

The face detection core unit 203 performs face detection processing on the rectangular image data supplied from the image buffer 202. As face detection processing, the face detection core unit 203 performs face determination processing and overlap determination processing, and outputs the face detection result to the control unit 110 and the candidate area processing unit 204. However, this face detection processing is performed only on rectangular image data for which the flag data supplied from the image buffer 202 is On; it is not performed on rectangular image data for which the flag data is Off.
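The On/Off gating described above can be sketched as follows. This is a minimal sketch under assumptions: the tuple layout of a window, the function name `detect_with_flags`, and the `classify` callback standing in for the face determination processing are all hypothetical and are not taken from the source.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of a cut-out rectangle


def detect_with_flags(
    windows: Iterable[Tuple[Rect, Sequence[int], bool]],
    classify: Callable[[Sequence[int]], bool],
) -> List[Rect]:
    """Run the (hypothetical) face classifier only on windows whose flag is On."""
    faces = []
    for rect, pixels, flag_on in windows:
        if not flag_on:
            continue              # flag Off: the window is skipped without any evaluation
        if classify(pixels):      # flag On: normal face determination
            faces.append(rect)
    return faces
```

The saving comes from the `continue` branch: the cost of the classifier is paid only for windows inside a candidate area, so the number of evaluations scales with the size of the candidate areas rather than with the size of the whole frame.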

The candidate area processing unit 204 calculates, based on the face detection result supplied from the face detection core unit 203, the area in which a face is likely to be detected in the next frame image, and sets that area as the face detection candidate area for the next frame image. The area in which a face is likely to be detected is the neighborhood of the position at which a face was detected in the current frame image. The face detection candidate area data that has been set is supplied to the controller 205. The method of calculating the face detection candidate area data is described later.
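One simple way to picture "the neighborhood of the position at which a face was detected" is to grow the detected face rectangle by a margin and clip it to the frame. The sketch below is only an illustration of that idea: the source defers its actual calculation to a later passage, and the function name `candidate_region` and the `margin` parameter are assumptions introduced here.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def candidate_region(face: Rect, frame_w: int, frame_h: int,
                     margin: float = 0.5) -> Rect:
    """Grow the detected face rectangle by `margin` on every side, clipped to the frame."""
    x, y, w, h = face
    mx, my = int(w * margin), int(h * margin)   # hypothetical margin, as a fraction of face size
    left, top = max(0, x - mx), max(0, y - my)
    right, bottom = min(frame_w, x + w + mx), min(frame_h, y + h + my)
    return (left, top, right - left, bottom - top)
```

For a face at (100, 80) of size 40 × 40 in a 320 × 240 frame, the default margin gives the candidate area (80, 60, 80, 80), which would be searched in the next frame.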

The controller 205 controls each part of the face detection unit 104 under the control of the CPU provided in the control unit 110, for example by instructing the scaling unit 201 on the reduction ratio (or enlargement ratio) of the image data and instructing the image buffer 202 on the memory addresses for writing or reading. The controller 205 also converts the face detection candidate area data obtained by the candidate area processing unit 204 into a face detection candidate area bitmap and supplies it to the scaling unit 201. The method of calculating the face detection candidate area bitmap is described later.

Next, the operation of the imaging apparatus 100 is described with reference to FIGS. 1 and 2.
First, image capture is described.
When an image is recorded, light from the subject enters the imaging element (not shown) via the optical block 101, and the image formed on the imaging element is photoelectrically converted by the signal conversion unit 102 into an electrical imaging (video) signal. The imaging signal is supplied sequentially to the camera signal processing unit 103, where it undergoes digital conversion, image quality correction, and the like. The processed image data is temporarily stored in the image RAM 109 via the image bus 113.

The display processing unit 105 receives the image data from the camera signal processing unit 103 via the image RAM 109, generates an image signal for display, and supplies it to the display unit 108. The image currently being captured is thus displayed on the display unit 108, and the photographer can view this image to check the angle of view.

The image signal processing unit 106 sequentially receives the image data output from the camera signal processing unit 103 via the image RAM 109, compression-encodes it using, for example, the MPEG system to generate a moving image file, and records the file in the storage unit 107.

In addition, in response to, for example, the pressing of a shutter button provided on an operation unit (not shown), one frame of image data can be taken from the camera signal processing unit 103, compression-encoded by the image signal processing unit 106 using, for example, the JPEG system to generate a still image file, and recorded in the storage unit 107.

When an image is reproduced, the image file stored in the storage unit 107 is read out and decoded by the image signal processing unit 106, supplied to the display processing unit 105, and converted into an image for display. A moving image or still image can thus be reproduced and displayed on the display unit 108.

During image recording as described above, the face detection unit 104 receives the output image data from the camera signal processing unit 103 via the image RAM 109 and executes face detection processing. The face detection result of the face detection unit 104 is supplied to the control unit 110.

Next, the AE, AF, and AWB control mentioned above will be described.
The CPU of the control unit 110 performs control such as AE (Auto Exposure), AF (Auto Focus), and AWB (Auto White Balance) based on the face detection result of the face detection unit 104. For example, it adjusts the aperture, shutter speed, and white balance gain so as to optimize the brightness and color of the detected face, or it performs control such as focusing on the detected face.
When an image file in the storage unit 107 is played back, the face detection unit 104 may also receive the image data decoded by the image signal processing unit 106 via the image RAM 109 and execute face detection processing in the same way as during image recording.

Next, the main processing operations of face detection will be described: scaling processing, scanning processing, face determination processing, and detection candidate area processing.
First, the scaling processing will be described.
The scaling unit 201 generates image data 301 by reducing (or enlarging) the captured image data 300 on the screen at a predetermined reduction ratio (or enlargement ratio). It also reduces (or enlarges) the bitmap 302 of the face detection candidate area supplied from the controller 205 at the same ratio as the image data, generating the scaled bitmap 303 of the face detection candidate area. FIG. 3 shows the image data and the corresponding bitmaps.
In FIGS. 3(a) and 3(b), the image data 300 and 301 are images that are enlarged or reduced relative to each other. FIGS. 3(c) and 3(d) show the bitmap data corresponding to the enlarged and reduced image data, respectively.
The bitmap of the face detection candidate area is a group of flag data corresponding to the pixels of the image data. A flag is On (1) if the upper left pixel of the rectangular image data cut out by the image buffer 202 for the scanning processing lies inside the face detection candidate area, and Off (0) if it lies outside.

FIG. 3(a) shows the image data 300 in the enlarged state, and FIG. 3(c) shows the bitmap 302 of the face detection candidate area corresponding to this image data 300. In the bitmap 302, the area of data "1" is the face detection candidate area, and the area of data "0" is everything outside it.
FIG. 3(b) shows the image data 301 in the reduced state. In the corresponding bitmap 303, the area of data "1" is likewise the face detection candidate area, and the area of data "0" is everything outside it.
Comparing FIG. 3(c) with FIG. 3(d), the area of data "1" is a 5 * 5 area in the enlarged state (where * denotes multiplication) and a 3 * 3 area in the reduced state.

Next, the scanning processing will be described with reference to FIGS. 4 and 5.
In the scanning processing (corresponding to step ST-102 in FIG. 12, described later), a rectangular area of a predetermined size is moved over the scaled image data by a predetermined amount at a time in the horizontal and vertical directions, and rectangular image data is cut out at each position.
Under the control of the controller 205, the image buffer 202 performs the processing shown in FIG. 4 on the image data obtained by the scaling processing. The rectangular area 310 of the predetermined size is moved horizontally from the upper left corner of the screen in steps of Δx (the rectangular area 311 shows it after a horizontal move). When it reaches the right edge of the image data, it returns to the left edge, shifts vertically by Δy, and moves horizontally again. This operation is repeated until the lower right corner of the image data is reached, cutting out rectangular image data in sequence.
The controller 205 also extracts the flag data of the face detection candidate area that corresponds to the upper left pixel of each rectangular image data cut out by the scanning processing.
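The raster scan described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the image buffer 202: the function name `scan_windows`, its parameters, and the representation of the candidate-area bitmap as a list of rows are all hypothetical.

```python
def scan_windows(width, height, win, dx, dy, flags=None):
    """Yield the upper-left (x, y) of each win*win window in raster order.

    flags is an optional candidate-area bitmap (list of rows of 0/1).
    When given, a window is yielded only if the flag at its upper-left
    pixel is On (1), mirroring the flag check described in the text.
    """
    for y in range(0, height - win + 1, dy):        # shift down by dy per row
        for x in range(0, width - win + 1, dx):     # shift right by dx per step
            if flags is None or flags[y][x] == 1:
                yield (x, y)
```

Passing a bitmap as `flags` restricts the scan to windows whose upper-left pixel lies in the candidate area, which is where the reduction in computation comes from.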

FIG. 5 shows the images obtained when the screen is reduced by the scaling unit 201. Image data 300a is the image data 300 multiplied by the scaling ratio Δr, and image data 300b is the image data 300a multiplied by Δr again. The image is reduced further in the same way, by a predetermined or arbitrary scaling ratio.
FIG. 5 shows scaling by reduction, but scaling by enlargement can be performed in the same way. For example, taking the image data 300d as the reference, the images may be produced in the order 300c, 300b, 300a, 300. Because enlargement is simply the reverse of reduction, its description is omitted.

As shown in FIG. 5, the rectangular area 310 used in the scanning processing always has the same size, regardless of the size of the enlarged or reduced image data. The scaling unit 201 changes the size of the image data input to the face detection unit 104, and the resized image data is scanned with the fixed-size rectangular area 310. Because the size of the face that fits within the rectangular area 310 changes with the scaling, the actual size of the face can be detected.
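The relationship between the repeated scaling by Δr and the detectable face size can be sketched as below. The function names, the integer rounding, and the stopping rule (stop once the fixed window no longer fits) are assumptions for illustration; the text only states that the scaling is predetermined or arbitrary.

```python
def pyramid_sizes(width, height, ratio, win):
    """List the image sizes obtained by repeatedly scaling by `ratio`.

    Assumed stopping rule: stop when the fixed win*win window no longer
    fits inside the reduced image.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be between 0 and 1 for reduction")
    if win <= 0:
        raise ValueError("win must be positive")
    sizes = []
    w, h = width, height
    while w >= win and h >= win:
        sizes.append((w, h))
        w, h = int(w * ratio), int(h * ratio)
    return sizes


def face_size_in_original(win, ratio, level):
    """Size, in original-image pixels, of a face that fills the fixed
    window after the image has been scaled `level` times by `ratio`."""
    return win / ratio ** level
```

Under these assumptions, a 24-pixel window on an image reduced twice by a ratio of 0.5 corresponds to a face about 96 pixels across in the original image, which is how a fixed-size window can report the actual face size.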

Next, the face determination processing will be described.
In the face determination processing (corresponding to step ST-104 in FIG. 12, described later), a predetermined calculation is applied to predetermined pixels in the rectangular image data cut out by the scanning processing to determine whether the rectangular image data contains a face.
When rectangular image data has been cut out of the scaled image data by the scanning processing, the face detection core unit 203 calculates the difference between the luminance values of two pixels, selected in advance by learning, among the pixels of the rectangular image data supplied from the image buffer 202.
It calculates the luminance differences of other pixel pairs selected by learning in the same way and, based on these differences, determines a detection accuracy value indicating how face-like the rectangular image data is. A predetermined threshold is set for this detection accuracy value: if the value is equal to or greater than the threshold, the rectangular image data is determined to contain a face; if it is lower than the threshold, the rectangular image data is determined not to contain a face.
Luminance values are used for face detection in this example because the luminance data of facial features such as the eyebrows, eyes, nose, and mouth are related to one another; the face therefore need not be detected from face color data, face shape, or the like.
Alternatively, a face may be detected by combining conventional face color data, face shape data, and luminance data; the present invention is not limited to any of these.
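The pixel-pair luminance test can be illustrated with the following sketch. The text does not specify how the luminance differences are combined into a detection accuracy value, so the weighted sum used here, the `pairs` format, and both function names are assumptions chosen only to make the idea concrete.

```python
def face_score(patch, pairs):
    """Combine luminance differences of learned pixel pairs into one score.

    patch: 2-D list of luminance values (rows of pixels).
    pairs: list of ((y1, x1), (y2, x2), weight) tuples; in a real detector
           the pixel positions and weights would come from training.
    The weighted sum is an assumed combination rule, not the patent's.
    """
    score = 0.0
    for (y1, x1), (y2, x2), weight in pairs:
        score += weight * (patch[y1][x1] - patch[y2][x2])
    return score


def contains_face(patch, pairs, threshold):
    """Return True when the score is equal to or greater than the threshold."""
    return face_score(patch, pairs) >= threshold
```

The comparison uses `>=` to match the text, which treats a value equal to the threshold as a face.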

By performing the face determination processing on the rectangular image data obtained by scanning the scaled image data with the rectangular area 310 in this way, the position of a face in the image data can be obtained from the detection accuracy value. By repeating the same face determination processing on images that are successively reduced (or enlarged) by the ratio Δr in the scaling processing, the size of the face in the image data can also be obtained.
As indicated by the branch described later (step ST-103 in FIG. 12), the face determination processing is performed only when the flag data of the face detection candidate area, extracted together with the rectangular image data cut out by scanning, is On (1); it is skipped when the flag is Off (0). Detection is thus restricted to rectangular image data inside the face detection candidate area, which reduces the amount of computation.
The face determination processing is repeated until every area of the image has been scanned, and then until image data of every size has been processed.

Next, the detection candidate area processing will be described.
Based on the face detection result output by the face detection core unit 203, the detection candidate area processing calculates, as the face detection candidate area, the image range in which face detection will be performed in the next frame image.

The detection candidate area mentioned above is defined as follows.
As shown in FIG. 6(a), the face detection candidate area data is defined by partitioning the input image into sections of m * m pixels (m is a positive integer), with the horizontal direction defined as row candidate area data (column_block) and the vertical direction as column candidate area data (row_block). A smaller m allows the face detection candidate area to be set in finer detail, but it also increases the amount of row and column candidate area data, so a very small m is impractical. The value of m is therefore chosen to suit the application. The sections may also be m * n pixels, with m and n (positive integers) set to different values.
The row candidate area data holds, for each section, a face detection accuracy value within a predetermined range, for example a 2-bit value from 0 to 3, and these 2 bits represent one row of sections in the horizontal direction. Likewise, the column candidate area data holds a 2-bit face detection accuracy value per section, and these 2 bits represent one column of sections in the vertical direction.

Note that the face detection accuracy values 0 to 3 are not stored at a separate coordinate address for every section on the screen. As shown in FIG. 6(b), they are set only for one line of sections along the row direction and one line along the column direction. In principle, a face detection accuracy value could be set for every section individually. However, setting the values for only one row and one column, and treating the range where both the row and column values are at or above a predetermined value as the face detection candidate area, reduces the number of arithmetic operations and thereby speeds up image processing. A detailed example is given later.
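As a rough illustration of why one row and one column of values is cheaper than one value per section, the sketch below counts the number of bits needed in each case. The function name and the 20-by-15 section layout used in the usage note are assumptions for the example; the 2-bit width comes from the text.

```python
def storage_bits(num_x_sections, num_y_sections, bits_per_value=2):
    """Return (bits for one value per section, bits for one row plus one column)."""
    per_section = num_x_sections * num_y_sections * bits_per_value
    per_axis = (num_x_sections + num_y_sections) * bits_per_value
    return per_section, per_axis
```

For a screen divided into 20 by 15 sections, this gives 600 bits for per-section values against 70 bits for the row and column data, and the number of values to update per frame shrinks in the same proportion.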

The larger the face detection accuracy value, the more likely it is that a face will be detected in the next frame. A face detection accuracy value of 0 means there is no possibility that a face will be detected.
The area in which the face detection accuracy values in both the row candidate area data and the column candidate area data are ≥ 1 is the detection candidate area for the next frame image.

Here, the face detection accuracy value is a weighted value from 0 to 3 (2 bits) that changes from one frame image to the next.
For example, suppose that in the current frame image the rectangular area 310 is moved horizontally or vertically over the detection candidate area on the screen. If the coordinate origin (0, 0) of a rectangular area in which no face is detected falls on a section where a face was detected in the previous frame image, "1" is subtracted from that section's face detection accuracy value from the previous frame image. A previous value of "3" thus becomes "2", a previous value of "2" becomes "1", and a previous value of "1" becomes "0".
In a section where a face is newly detected in the current frame image, the face detection accuracy value becomes "3" whether the value in the previous frame image was "0", "1", or "2".
FIG. 6(b) shows a screen made up of multiple sections, where each section consists of m * m pixels (m is a positive integer), the sections of the row candidate area are numbered k = 0, 1, 2, ..., p-1, and the sections of the column candidate area are numbered i = 0, 1, 2, ..., n-1 (p and n are integers greater than 1).

Next, the setting of the detection candidate area will be described with reference to FIG. 7.
On receiving a face detection result, the candidate area processing unit 204 first subtracts 1 from the face detection accuracy value of every section whose value is 1 or more. Then, as shown in FIG. 7, it calculates the section number (i) of the row candidate area data and the section number (k) of the column candidate area data that contain the face detection coordinates, and sets the face detection accuracy value to 3 in the row and column candidate area data corresponding to the section (i, k) C00 and its surrounding sections within ±n. FIGS. 8(a) to 8(d) illustrate how the face detection accuracy values change over time (transitions 400 to 403).

In FIG. 7, C00 is the section at coordinates (i, k), which is the area when n = 0. When n = 1, the area is widened by one section in each of the horizontal and vertical directions around (i, k).
This widened area is the hatched area in FIG. 7. It consists of the nine sections (i-n, k-n), (i, k-n), (i+n, k-n), (i-n, k), (i, k), (i+n, k), (i-n, k+n), (i, k+n), and (i+n, k+n), and its face detection accuracy value is expressed as Row_block_i[1:0] = 3, colum_block_k[1:0] = 3.
As a result, the nine sections are treated together as a single area of the image data on the screen, and the rectangular area 310 for face detection scans this new area. This makes the face less likely to be lost when the image moves rapidly between frames. The area can be widened further by changing the variable n to 2, 3, and so on.
Varying the variable n in this way allows face detection to be tuned to how much the image moves.

In FIG. 7, the surrounding sections ±n represent the extent of the area in which a face is likely to be detected in the next frame image. This assumes that, between frame images cut out of the moving image at very short intervals, a person's face moves by no more than ±n sections (the maximum amount of movement).
FIGS. 7 and 8 show an example with n = 1. The value of n may be calculated from the maximum amount of movement or, to improve the prediction, by using a motion prediction algorithm such as those employed in moving image compression. The value of n may also be split into separate up, down, left, and right components with a different value for each direction.
In FIG. 8(b), "1" is subtracted from the face detection accuracy value of every section whose value was "3" in the previous frame image, giving "2". This is because the probability of detecting a face at the same position is considered to fall as time passes. In FIGS. 8(c) and 8(d), the face detection accuracy values likewise decrease as frames elapse. While the face detection accuracy value is ≥ 1, a face may still be detected there, so the section remains part of the face detection candidate area. When the value reaches 0, the section is excluded from the face detection candidate area.
Subtracting from the face detection accuracy value in this way gives inertia to the face detection candidate area data when a face temporarily cannot be detected because of a transient face condition (orientation, luminance, hairstyle, facial expression, and so on). Even if the face is temporarily undetectable, the face detection candidate area is not lost, and the face can continue to be tracked.
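The per-frame update described above (subtract 1 everywhere, then set 3 around the detected section) can be sketched as follows. The function name, the argument layout, and the use of `x_data` and `y_data` for the accuracy values along the x and y coordinate sequences are assumptions for illustration; the candidate area processing unit 204 is not specified at this level of detail.

```python
def update_candidates(x_data, y_data, detection, m, n=1):
    """Return updated copies of the per-section accuracy values (0 to 3).

    x_data, y_data: accuracy values along the x and y coordinate sequences.
    detection: (px, py) pixel coordinates of a detected face, or None.
    m: section size in pixels; n: number of surrounding sections to include.
    """
    # Step 1: every non-zero value decays by 1 (the "inertia" in the text).
    x_new = [max(v - 1, 0) for v in x_data]
    y_new = [max(v - 1, 0) for v in y_data]

    # Step 2: sections within +/- n of the detected section are set to 3.
    if detection is not None:
        px, py = detection
        i, k = px // m, py // m          # section indices of the detection
        for j in range(max(i - n, 0), min(i + n, len(x_new) - 1) + 1):
            x_new[j] = 3
        for j in range(max(k - n, 0), min(k + n, len(y_new) - 1) + 1):
            y_new[j] = 3
    return x_new, y_new
```

With 20 sections along x, 15 along y, n = 1, and a face that moves one section per frame as in FIGS. 8(a) to 8(d), this sketch ends with the x sequence 0000000001233300... and the y sequence 000123330000..., the values quoted for FIG. 8(d). A frame with no detection only performs the decay step, so the candidate area shrinks gradually rather than disappearing at once.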

FIG. 8 shows an example in which multiple sections formed on the screen are scanned with the rectangular area 310 for face detection.
FIG. 8(a) shows an example in which the screen is divided so that i = 20 for Row_block_i[1:0] and k = 15 for colum_block_k[1:0]. The rectangular area 310 marks the position where the face was detected.
Data "0" is a face detection accuracy value of 0 and indicates that no face was detected. Data "3" is a face detection accuracy value of 3 and indicates that a face was detected in the current frame image.
When the detection result is stored, the face detection accuracy value is not stored for the coordinates of every section on the screen. Instead, the values 00333000... for the y coordinate sequence and 00000000003330... for the x coordinate sequence are stored, for example in the controller 205.

FIG. 8(b) shows, as an example, the next frame image, in which the face has moved one section diagonally down and to the left. As in FIG. 8(a), the rectangular area 310a marks the position where the face was detected. The area bounded by i = 10 to 12 and k = 4 to 6 has a face detection accuracy value of 3 because a face was newly detected there in this frame image. In the area k = 3, i = 11 to 13, where a face was detected in the previous frame image but not in this one, the face detection accuracy value decreases by "1", from "3" to "2". In the area i = 13, k = 4 and 5, the value likewise decreases by "1" to "2". Areas (sections) where no face was detected in either the previous or the current frame image have a face detection accuracy value of "0".
As a result, 0000000033200000 for the x coordinate sequence and 0023330000000 for the y coordinate sequence are stored in the controller 205.

FIG. 8(c) shows how the face has moved in the frame image following FIG. 8(b): one section diagonally down and to the right of its position in FIG. 8(b). In the area i = 11 to 13, k = 3, the face detection accuracy value decreases by a further "1" to 1. In the area i = 10 to 12, k = 4, a face was detected in the previous frame image but not in the current one, so the value is 2. In the area i = 13, k = 4, no face has been detected for two frame images, so the value is 1. In the area i = 10, k = 5 and 6, a face was detected in the previous frame image, so the value is 2, and in the area i = 11 to 13, k = 5 to 7, a face is detected in the current frame image, so the value is 3.
As a result, 0000000002333000... for the x coordinate sequence and 001233300000... for the y coordinate sequence are stored in the storage unit 107.
FIG. 8(d) shows an example in which the face has again shifted one section diagonally down and to the right as the frames elapse. Detection in the same way as in the examples above gives face detection accuracy values of 0000000001233300... for the x coordinate sequence and 000123330000... for the y coordinate sequence, and these values are stored in the controller 205.

As described above, setting the face detection accuracy values as the frames elapse weights the face detection accuracy value of each face detection area, quantifies how often detections occur there, and allows a detection candidate area with high accuracy to be set.

The conversion of the detection candidate area into a bitmap will be described with reference to FIG. 9.
On receiving the face detection candidate area data shown in FIG. 9(a), the controller 205 expands it into a bitmap of the detection candidate area, as shown in FIG. 9(b).
The face detection accuracy values 00000000012333000... are obtained for the x coordinate sequence and 00012333000... for the y coordinate sequence. The area bounded by the values 12333 in the x coordinate sequence and the values 12333 in the y coordinate sequence is the detection candidate area.
FIG. 9(b) is an example of converting the data of FIG. 9(a) into a bitmap. In FIG. 9(b), the bitmap of the face detection candidate area is On (1) for every pixel in each section where the face detection accuracy values of both the row candidate area data and the column candidate area data are ≥ 1. In other words, the area bounded by row_block >= 1 and column_Block_k >= 1 is the detection candidate area. Sections where the face detection accuracy value of either the row candidate area data or the column candidate area data is 0 are set to Off (0).
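The expansion of the row and column data into a per-pixel bitmap can be sketched as below. The function name and the list-of-rows bitmap layout are assumptions for illustration; only the rule that a pixel is On when both values are ≥ 1 comes from the text.

```python
def expand_bitmap(x_data, y_data, m):
    """Expand per-section accuracy values into a per-pixel 0/1 bitmap.

    A pixel is On (1) only if the accuracy values of both the x section
    and the y section that contain it are >= 1.
    """
    width = len(x_data) * m
    height = len(y_data) * m
    return [
        [1 if x_data[x // m] >= 1 and y_data[y // m] >= 1 else 0
         for x in range(width)]
        for y in range(height)
    ]
```

Because the test is a logical AND of one x value and one y value, the candidate area produced this way is always a union of axis-aligned rectangles of sections.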

The setting for tracking recovery will be described with reference to FIG. 10.
FIG. 10(a) shows an example of frame images A-02, A-01, A00, A01, A02, A03, ... over time. Of these, face detection is performed over the entire screen in frame image A-02, and the detection accuracy of the detection is obtained in frame image A-01.
Frame image A00 shows the operation when face detection finds no face in the detection candidate area (bitmap = 1) carried over from the previous frame image. Frame image A01 shows the operation of swapping (inverting) the "1" and "0" areas of the previous frame image's bitmap so that face detection is performed in a different area. Frame image A02 shows the operation of scanning the area that became "1" in frame image A01, detecting a new face detection candidate area within it, and obtaining the detection candidate area (bitmap = 1).

The tracking recovery operation described above is explained concretely below with reference to FIGS. 10(b) to 10(d).
As shown in FIG. 10(b), in frame image A00, where no face is detected, the area row_block_i >= 1, colum_block_k >= 1 is bitmap = 1 and all other areas are bitmap = 0. No face image is present in the bitmap = 1 area of frame image A00; that is, the face has been lost.
After some time, as shown in FIG. 10(c), in frame image A01, where face detection is performed, the area row_block_i >= 1, colum_block_k >= 1 is bitmap = 0 and all other areas are bitmap = 1, which is the inverse of the bitmap data in FIG. 10(b). In this inverted bitmap, the face lies in the bitmap = 1 area.
After further time, as shown in FIG. 10(d), in frame image A02 a face is detected within the area that was bitmap = 1 in frame image A01, so that the area row_block_i >= 1, colum_block_k >= 1 becomes bitmap = 1 and all other areas become bitmap = 0. In other words, even if the face is lost once, it can be detected again by inverting the bitmap and performing face detection anew in the inverted area.

As described above, even when face detection is performed within the face detection candidate area, the face may remain undetected because the face conditions (orientation, brightness, hairstyle, expression, and so on) change, or the person's face may move out of the detection candidate area in an instant. In such cases the face cannot be re-detected within the period during which the face detection accuracy value >= 1 holds, and the face being tracked is lost in frame image A00. The controller 205 then swaps the On (1) and Off (0) values of the detection candidate area bitmap that was used up to the previous frame image, sets the swapped bitmap for the current frame image A01, and performs face detection. The lost face is thereby detected again in frame image A02, and face tracking is restored.
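The bitmap swap used for tracking recovery can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the 4 x 4 block grid, and the `accuracy_expired` flag (standing in for "the accuracy value has dropped below 1") are assumptions introduced here.

```python
def invert_bitmap(bitmap):
    """Swap On (1) and Off (0) for every block of a candidate-area bitmap."""
    return [[1 - bit for bit in row] for row in bitmap]


def next_search_bitmap(bitmap, face_found, accuracy_expired):
    """Return the bitmap to search in the next frame.

    Hypothetical rule: if no face was found and the accuracy period has
    run out, search the complementary area; otherwise keep the bitmap.
    """
    if not face_found and accuracy_expired:
        return invert_bitmap(bitmap)
    return bitmap


# Frame A00: rows >= 1 and columns >= 1 are the candidate area (bitmap = 1).
a00 = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

# The face was lost in A00, so frame A01 searches the inverted area.
a01 = next_search_bitmap(a00, face_found=False, accuracy_expired=True)
```

Because the two bitmaps are exact complements, the area searched in A01 never overlaps the area already searched in A00.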

Next, the face-detection tracking operation when a new face appears is described with reference to FIG. 11.
FIG. 11(a) shows, over time, frame images with and without face detection. A face is detected in frame image B00, and a new face then appears in the following frame image B01.

As shown in FIG. 11(b), in frame image B00 the area row_block_k >= 1, colum_block_i >= 1 has bitmap = 1 and all other areas have bitmap = 0; the face lies in the bitmap = 1 area.
After some time, as shown in FIG. 11(c), one new face appears in frame image B01, so that two faces are present. The bitmap = 1 area in which the face was detected in frame image B00 and the bitmap = 0 area are inverted, so that the area in which no face was detected changes from bitmap = 0 to bitmap = 1. Face detection is performed only on this area newly set to bitmap = 1. In frame image B01, the area row_block_k >= 1, colum_block_K >= 1 has bitmap = 0 and all other areas have bitmap = 1, and a face is also present in this bitmap = 1 area. Face detection is newly performed within the area set to bitmap = 1; only the detection candidate area in which a face is detected there takes row_block_k >= 1, colum_block_K >= 1, and everything else becomes bitmap = 0.
After further time, as shown in FIG. 11(d), in frame image B02 a simplified rectangular area that combines the areas containing the two faces becomes the new detection candidate area; the area row_block_k >= 1, colum_block_K >= 1 has bitmap = 1 and all other areas have bitmap = 0. That is, both faces, including the newly appeared one, lie within the area indicated by bitmap = 1.

In this way, as shown in FIG. 11, the controller 205 can also be operated so that it periodically swaps the On (1) and Off (0) values of the detection candidate area bitmap, as in frame image B01, so that tracking of a new face that has appeared in the image can start in frame image B02.
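The "simplified rectangular area" that combines two face areas in FIG. 11(d) can be illustrated as a bounding rectangle on the block grid. The sketch below is an assumption about how such a merge might be done; the tuple layout, function names, and grid size are hypothetical.

```python
def bounding_rect(rects):
    """Smallest rectangle covering all input rectangles.

    Each rectangle is (row0, col0, row1, col1) in block units, inclusive.
    """
    rows0, cols0, rows1, cols1 = zip(*rects)
    return (min(rows0), min(cols0), max(rows1), max(cols1))


def rect_to_bitmap(rect, n_rows, n_cols):
    """Mark the blocks inside `rect` with 1 and all other blocks with 0."""
    r0, c0, r1, c1 = rect
    return [
        [1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


# The tracked face and a newly appeared face, in block coordinates.
faces = [(1, 1, 2, 2), (0, 3, 1, 4)]
merged = bounding_rect(faces)
bitmap = rect_to_bitmap(merged, n_rows=4, n_cols=5)
```

A single rectangle is cheaper to store and test than an arbitrary set of blocks, at the cost of also searching some blocks that contain no face.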

Next, the operation of the face detection unit 104 is described with reference to the flowchart of FIG. 12.
Image data is input, for example from the image RAM 109 via the image bus 113, to the scaling unit 201 of the face detection unit 104 (ST-100).

In step ST-101, the scaling unit 201 performs scaling processing that reduces or enlarges the size of the image data (the number of pixels in the horizontal and vertical directions) to suit the face detection processing in the face detection core unit 203, and supplies the result to the image buffer 202 (see FIG. 3). The scaling unit 201 also receives from the controller 205 the bitmap of the face detection candidate area, calculated by the candidate area processing unit 204 as the area likely to contain a face, scales it in the same way as the image, and supplies the scaled data to the image buffer 202.

In step ST-102, after the scaling processing, scanning processing is performed to detect a face area: the rectangular area (detection area) 310 is shifted in the Δx and Δy directions to scan the screen.

In step ST-103, it is determined whether the area scanned by the rectangular area lies within the face detection candidate area. If it does not, the process proceeds to step ST-105; if it does, the process proceeds to step ST-104.

In step ST-104, a predetermined calculation is applied to predetermined pixels of the rectangular image data cut out by the scanning processing of step ST-102, to determine whether the rectangular image data contains a face. The details of this operation were described above with reference to FIGS. 8 to 11.

In step ST-105, it is determined whether the rectangular area 310 has been scanned over the entire display screen. If not, the process returns to step ST-102; if the entire screen has been scanned, the process proceeds to step ST-106.

In step ST-106, it is determined whether the image data, scanned over the entire screen, has been processed at all sizes. If all sizes have been processed, the process proceeds to step ST-107 and the face detection result is output. If not, the process returns to step ST-101 and scaling processing is performed.

In step ST-107, based on the face detection result output by the face detection core unit 203, the image range in which face detection is to be performed on the next frame image is calculated as the face detection candidate area. The process then proceeds to step ST-102.
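The scan loop of steps ST-102 to ST-105 can be sketched as below. This is only an illustration under stated assumptions: the text does not say how a window is judged to lie "within the candidate area", so the sketch tests the block containing the window centre, and `is_face` is a hypothetical classifier callback standing in for the face detection core unit.

```python
def scan_frame(image, bitmap, block, window, step, is_face):
    """Slide a square window over the image and classify only candidate windows.

    image   : 2D list of pixel values (already scaled, as in ST-101)
    bitmap  : 2D list of 0/1, one entry per block of block x block pixels
    is_face : hypothetical classifier taking (image, x, y, window)
    Returns the hit positions and the number of windows actually classified.
    """
    height, width = len(image), len(image[0])
    hits, evaluated = [], 0
    for y in range(0, height - window + 1, step):      # ST-102: shift by dy
        for x in range(0, width - window + 1, step):   # ST-102: shift by dx
            # ST-103 (assumed rule): look up the block under the window centre.
            block_row = (y + window // 2) // block
            block_col = (x + window // 2) // block
            if not bitmap[block_row][block_col]:
                continue                               # skip to ST-105
            evaluated += 1
            if is_face(image, x, y, window):           # ST-104
                hits.append((x, y))
    return hits, evaluated


# Toy data: an 8x8 image whose only "face" is marked by a 1 at (x=4, y=4).
image = [[0] * 8 for _ in range(8)]
image[4][4] = 1


def top_left_is_one(img, x, y, window):
    """Stand-in classifier: a window is a face if its top-left pixel is 1."""
    return img[y][x] == 1


hits, evaluated = scan_frame(image, [[0, 0], [0, 1]], 4, 4, 2, top_left_is_one)
hits_all, evaluated_all = scan_frame(image, [[1, 1], [1, 1]], 4, 4, 2, top_left_is_one)
```

In this toy case the candidate bitmap cuts the number of classified windows from 9 to 4 while still finding the same face, which is the saving the flowchart is designed to achieve.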

Detecting faces with the face detection candidate area data and the bitmap in this way reduces the amount of computation. Dividing the face detection candidate area data into sections also reduces the amount of candidate data. Controlling the face detection candidate accuracy enables seamless face tracking. Furthermore, controlling the face detection candidate area data allows face tracking to be restored and newly appearing faces to be tracked.

From the above, the present invention has the following advantages.
Narrowing face detection processing to the detection candidate area reduces the amount of computation needed to process one frame image and therefore increases the face detection speed. With faster face detection, finer-grained control over time becomes possible when the detection result is used for AF (Auto Focus), AE (Auto Exposure), or AWB (Auto White Balance) control. With such fine control, a clear image can be captured without missing a momentary photo opportunity.

Giving inertia to the setting of the face detection candidate area data prevents tracking from being interrupted even if the face temporarily goes undetected because of a change in face conditions (orientation, brightness, hairstyle, expression, and so on). This enables seamless tracking of face detection. With seamless tracking, a face can be detected even for a subject moving around outdoors, for example a child at a school sports day, and applying AF, AE, and AWB control to the detection result yields a clear moving image.

If a change in face conditions (orientation, brightness, hairstyle, expression, and so on) persists for a long time, or if a moving person's face moves so far that it leaves the detection candidate area in an instant, the detected face may be lost and tracking interrupted. In that case, the area outside what has so far been the face detection candidate area is newly set as the detection candidate, the lost face is detected again, and tracking resumes. This keeps the time during which face tracking is interrupted to a minimum.
In addition, varying the detection area according to the amount of movement of the face in the image allows the face to be followed even when it moves quickly.
Applying AF, AE, and AWB control to the detection result of the tracked face in this way minimizes the time during which AF, AE, and AWB cannot be controlled and, as a result, minimizes the time during which the moving image is unclear.
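One way to vary the detection area with the amount of face movement is to widen the candidate rectangle by the displacement observed between two frames. The text does not give a formula, so the rule below (margin equal to the centre displacement, with a hypothetical minimum of one block) is purely an assumption for illustration.

```python
def expand_candidate(prev_box, curr_box, grid_w, grid_h, min_margin=1):
    """Grow the current face box by the centre displacement since the last frame.

    Boxes are (x0, y0, x1, y1) in block units, inclusive. The result is
    clamped to a grid of grid_w x grid_h blocks.
    """
    px0, py0, px1, py1 = prev_box
    cx0, cy0, cx1, cy1 = curr_box
    # Centre displacement per axis, rounded up to whole blocks.
    dx = (abs((cx0 + cx1) - (px0 + px1)) + 1) // 2
    dy = (abs((cy0 + cy1) - (py0 + py1)) + 1) // 2
    margin_x = max(dx, min_margin)
    margin_y = max(dy, min_margin)
    return (
        max(cx0 - margin_x, 0),
        max(cy0 - margin_y, 0),
        min(cx1 + margin_x, grid_w - 1),
        min(cy1 + margin_y, grid_h - 1),
    )


# A stationary face gets the minimum margin; a fast one gets a wider area.
slow = expand_candidate((2, 2, 3, 3), (2, 2, 3, 3), grid_w=8, grid_h=6)
fast = expand_candidate((2, 2, 3, 3), (5, 2, 6, 3), grid_w=8, grid_h=6)
```

The faster the face moved between the two frames, the larger the area searched in the next frame, trading some of the computational saving for robustness.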

Periodically setting the area outside the current face detection candidate area as the detection candidate allows faces to be detected and tracked even when the number of detected faces increases or decreases. Applying AF, AE, and AWB control to the detection results for multiple faces then yields clear images even for moving images with multiple subjects.

The present invention thus rests on the following observation. When a moving image of a moving person is cut into frame images at minute time intervals and face detection is performed on two temporally consecutive frame images, the change in the position and size of the face between them is small, provided the time interval is sufficiently short. Therefore, the surrounding image area that contains the face detected in the current frame image can be treated as a candidate area likely to contain the face and used as the face search area in the next frame. This narrows the search range for face detection and consequently reduces the number of computations.

The probability that a face is detected at exactly the same position as in the current frame image is considered to decrease as time passes. This is because the person moves within the frame image, so the displacement from the position at which the face was detected grows with time. Until a certain time has elapsed, however, the probability cannot be said to be zero. Processing is therefore added that keeps an area in which a face has been detected as a candidate area likely to contain a face until a certain time has elapsed. As a result, even if there is a moment at which the face is not detected, the face search continues without losing the position and size of the face, and the face can be detected again once the conditions of the captured image return to normal.
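The idea of keeping an area as a candidate "until a certain time has elapsed" can be sketched with a per-block accuracy counter. The decrement-by-one rule and the initial value of 3 are assumptions made for this illustration; the text only states that a block remains a candidate while its detection accuracy value is >= 1.

```python
def update_accuracy(accuracy, detected_blocks, initial=3):
    """Advance the per-block accuracy map by one frame.

    Blocks where a face was detected are reset to `initial` (hypothetical
    value); every other block decays by 1, never going below 0.
    """
    return [
        [
            initial if (r, c) in detected_blocks else max(value - 1, 0)
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(accuracy)
    ]


def to_bitmap(accuracy):
    """A block is a detection candidate while its accuracy value is >= 1."""
    return [[1 if value >= 1 else 0 for value in row] for row in accuracy]


# A face is detected once in block (0, 1), then missed for three frames.
accuracy = update_accuracy([[0, 0, 0], [0, 0, 0]], {(0, 1)})
trace = []
for _ in range(3):
    accuracy = update_accuracy(accuracy, set())
    trace.append(to_bitmap(accuracy)[0][1])
```

The block stays a candidate for two missed frames and drops out on the third, which is the inertia that bridges momentary detection failures.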

Furthermore, if no face is detected within the candidate area even after a certain time has elapsed, it is judged that the area outside the candidate area is more likely to contain the face, and processing is added that switches the face search area to the area outside the candidate area. As a result, even if the position and size of the face are lost, the face can be detected anew and the tracking operation can resume. The inside and outside of the candidate area are switched exclusively so that the search range for face detection remains narrowed even while tracking is being restored.

In the present invention, the scaling unit that enlarges or reduces the image data on the screen corresponds to the scaling unit of the embodiment. The image detection unit corresponds to the face detection unit; it detects a specific image in the scaled image data by scanning, in the horizontal direction or in the horizontal and vertical directions, a rectangular area of a predetermined size over the screen, on which a plurality of sections each having a predetermined number of pixels are provided for the image data obtained by the scaling unit. The candidate area processing unit that sets a detection accuracy value according to the degree of overlap of the screen sections of the detected specific image corresponds to the candidate area processing unit of the embodiment. The storage unit that stores the detection accuracy value set by the candidate area processing unit corresponds to the image buffer.

FIG. 1 is a block diagram of the imaging apparatus.
FIG. 2 is a block diagram of the face detection unit of the imaging apparatus shown in FIG. 1.
FIG. 3 shows image data obtained by enlarging or reducing a frame image and the corresponding bitmap data.
FIG. 4 shows an example in which a frame image is scanned horizontally or vertically with a rectangular area of a specific size.
FIG. 5 shows image data obtained when a frame image is reduced.
FIG. 6 is a diagram showing a plurality of sections, each composed of a plurality of pixels, into which the image data on the screen is divided.
FIG. 7 is a diagram showing a configuration example of a detection candidate area.
FIG. 8 is a diagram for explaining the face detection operation in each frame image.
FIG. 9 is a diagram showing face detection candidate area data and the detection candidate area bitmap obtained by converting that data to a bitmap.
FIG. 10 is a diagram for explaining the switching of bitmap data and the detection of a detection candidate area in frame images.
FIG. 11 is a diagram for explaining the operation of detecting a new face image.
FIG. 12 is a flowchart for explaining the operation of detecting a face detection candidate area in a specific area of an image.

Explanation of symbols

100…imaging device, 101…optical block, 102…signal conversion unit, 103…camera signal processing unit, 104…face detection unit, 105…display processing unit, 106…image signal processing unit, 107…storage unit, 108…display unit, 109…image RAM, 110…control unit (CPU), 111…ROM, 112…RAM, 113…image bus, 201…scaling unit, 202…image buffer, 203…face detection core unit, 204…candidate area processing unit, 205…controller, 300, 300a to 300d…image data, 310, 310a to 310d, 311…rectangular area.

Claims

An image detection apparatus comprising:
a scaling unit that enlarges or reduces image data on a screen;
an image detection unit that detects a specific image in the scaled image data by scanning, in the horizontal direction or in the horizontal and vertical directions, a rectangular area of a predetermined size over the screen, on which a plurality of sections each having a predetermined number of pixels are provided for the image data obtained by the scaling unit;
a candidate area processing unit that sets a detection accuracy value according to the degree of overlap of the screen sections of the specific image obtained by the detection; and
a storage unit that stores the detection accuracy value set by the candidate area processing unit.

The image detection apparatus according to claim 1, wherein the specific image of the image data represents a face.

The image detection apparatus according to claim 2, wherein the detection accuracy value varies according to a frame image having the image data that changes with time.

The image detection apparatus according to claim 3, wherein the detection accuracy value is a value representing a degree of overlap in a specific range.

The image detection apparatus according to claim 4, wherein the detection accuracy value is set with respect to horizontal or vertical coordinates of a plurality of sections on the screen.

The image detection apparatus according to claim 5, wherein the detection accuracy value is stored in the storage unit for each frame image and is used to detect, in a subsequent frame image and based on the stored data, a specific image that was not detected in a specific frame image.

The image detection apparatus according to claim 1, wherein a detection candidate area for detecting the specific image is varied according to a movement amount of the specific image of the image data.

The image detection apparatus according to claim 1, wherein the bitmap value of the specific image detected in the frame image is inverted, and the specific image is inspected for areas other than the inverted region.

The image detection device according to claim 1, wherein the image detection unit detects the section of the entire frame image at a predetermined frame interval.

An image detection method comprising the steps of:
enlarging or reducing image data on a screen by scaling processing;
providing a plurality of sections, each having a predetermined number of pixels, in the image data enlarged or reduced by the scaling processing;
scanning the screen on which the plurality of sections are provided with a rectangular area of a predetermined size, in the horizontal direction or in the horizontal and vertical directions;
detecting a specific image in the scaled image data by the scanning of the rectangular area; and
setting a detection accuracy value according to the degree of overlap of the sections corresponding to the specific image obtained by the detection, and storing the set detection accuracy value.

The image detection method according to claim 10, wherein the specific area of the image data is face image data, and the detection accuracy value varies according to a frame image having the image data that changes with time.

The image detection method according to claim 10, wherein the detection accuracy value is stored in the storage unit for each frame image and is used to detect, in a subsequent frame image and based on the stored data, a specific image that was not detected in a specific frame image.

The image detection method according to claim 10, wherein the bitmap value of the specific image detected in the frame image is inverted and the specific image is inspected for areas other than the inverted region.

The image detection method according to claim 10, wherein the image detection step detects the section of the entire frame image at a predetermined frame interval.