JP2009009456A

JP2009009456A - Face detection device and face detection program

Info

Publication number: JP2009009456A
Application number: JP2007171759A
Authority: JP
Inventors: Fumiyuki Shiratani; 文行白谷
Original assignee: Olympus Corp
Current assignee: Olympus Corp
Priority date: 2007-06-29
Filing date: 2007-06-29
Publication date: 2009-01-15
Anticipated expiration: 2027-06-29
Also published as: JP4818997B2

Abstract

PROBLEM TO BE SOLVED: To improve the detection precision of a face of a person who becomes a subject with a high frequency, without burdening a user. SOLUTION: A digital camera includes a speed prioritized face detector 42 and a precision prioritized face detector 43. A control part 4 of the digital camera extracts an area which has been detected as a face area by the precision prioritized face detector 43 but has not been detected as a face area by the speed prioritized face detector 42, as an undetected area and performs learning of the speed prioritized face detector 42 on the basis of the undetected area so that the undetected area may be detected as a face area. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique for detecting a face area from an image.

In recent years, face detection has become increasingly automated, and a growing number of digital cameras are equipped with face detectors. A face detector in a digital camera enables automatic focus control and automatic exposure control that optimize the focus and exposure value for the face of the person being photographed, so that the face is captured clearly.

Face detectors in use include the detector proposed by Viola et al., which cascade-connects a plurality of classifiers (Patent Document 1), and detectors based on neural networks (Patent Document 2).
P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," in Proc. of CVPR, vol. 1, pp. 511-518, December 2001
JP 2006-301779 A
JP 2003-323615 A

Products such as digital cameras ship with a face detector that has already been trained. However, the detector is trained on standard face images so as to reach a desired detection accuracy, so it does not achieve high accuracy for every person's face. Depending on the user, the faces of people who are frequent subjects, such as family members or acquaintances, may therefore be hard to detect.

In this regard, Patent Document 3 improves the detection accuracy for the face of a person who is frequently a subject by having the user enter that person's features. However, requiring such an operation from the user is burdensome.

The present invention has been made in view of these technical problems of the prior art, and its object is to improve the detection accuracy for the face of a person who is frequently a subject without burdening the user.

A face detection apparatus according to the present invention comprises: a first face detector that detects a face area from an image; a second face detector that detects a face area from an image identical or substantially identical to that image and has a detection accuracy different from that of the first face detector; undetected area extraction means for extracting, as an undetected area, an area that was detected as a face area by the second face detector but not by the first face detector; and learning means for training the first face detector, based on the undetected area, so that the undetected area is detected as a face area.

According to the present invention, the first face detector is trained on face areas that it failed to detect (undetected areas). Consequently, when there is a face that is frequently a target of face detection and is hard for the first face detector to detect, the accuracy with which the first face detector detects that face can be increased.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

FIG. 1 is a schematic configuration diagram of a digital camera equipped with a face detection apparatus according to the present invention.

The optical system 1 consists of a lens and a diaphragm, and the imaging unit 2 consists of an image sensor such as a CCD or CMOS sensor. An image captured by the imaging unit 2 through the optical system 1 is converted into a digital signal in real time by the A/D converter 3 and sent to the control unit 4.

The control unit 4 is built around a microprocessor and includes a temporary memory 41, a speed-oriented face detector (first face detector) 42, and an accuracy-oriented face detector (second face detector) 43. The control unit 4 successively records the images captured by the imaging unit 2 in the temporary memory 41 as through images (live-view images) and displays them on a liquid crystal monitor 5 on the back of the camera. An operation unit 9 consisting of buttons, dials, and the like is provided on the top and back of the camera, and the user can perform various operations while viewing the menu screen, thumbnails of captured images, and so on displayed on the liquid crystal monitor 5.

A learning image database 6 connected to the control unit 4 stores a plurality of images, each associated with a teacher signal indicating whether it is a face image or a non-face image. The learning image database 6 is used when training the speed-oriented face detector 42. The training method for the speed-oriented face detector 42 is described later.

Until the release button 7 is pressed, the control unit 4 applies the speed-oriented face detector 42 to the through image and detects face areas in it. Based on the position (center position) and size of a detected face area, the control unit 4 then sends a signal to the automatic focus (AF) / automatic exposure (AE) control unit 10 to adjust the lens position and the aperture so that the focus and exposure value are optimal for the detected face area.

Actual shooting is performed when the user presses the release button 7. The control unit 4 records the image captured by actual shooting in the flash memory 8 as the actual captured image.

The control unit 4 applies the accuracy-oriented face detector 43 to the actual captured image and detects face areas in it. The control unit 4 then compares the detection result of the speed-oriented face detector 42 with that of the accuracy-oriented face detector 43, extracts as an undetected area any area that was detected as a face area by the accuracy-oriented face detector 43 but not by the speed-oriented face detector 42, and adds the image of the extracted undetected area to the learning image database 6 in association with a teacher signal value of 1, which indicates a face image.

When the number of images added to the learning image database 6 reaches a predetermined number, the control unit 4 trains the speed-oriented face detector 42 so that the added images of undetected areas are detected as face areas.

The speed-oriented face detector 42 and the accuracy-oriented face detector 43 are now described in more detail. Both are face detectors based on the method proposed by Viola et al. and have a structure in which a plurality of classifiers H_k (k = 1 to S) generated by the AdaBoost algorithm are cascade-connected as shown in FIG. 2.

To determine whether an area of an image is a face area, the first-stage classifier H_1 first judges whether the area is a face area. If it judges that the area is not a face area, processing ends; if it judges that the area is a face area, processing proceeds to the second-stage classifier H_2. Each classifier H_k from the second stage onward likewise judges whether the area is a face area; processing ends if the area is judged not to be a face area and proceeds to the next stage only if it is judged to be a face area.

Because the face detector judges an area to be a face area only when every stage up to the final classifier H_S judges it to be a face area, high detection accuracy is obtained. At the same time, processing ends immediately when an intermediate stage judges the area not to be a face area, so high processing speed is obtained.
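The early-exit behavior of the cascade can be sketched as follows. This is a minimal illustration, not the patent's implementation: the stage functions are hypothetical stand-ins for the classifiers H_1 to H_S, and the region is treated as an opaque object.

```python
from typing import Callable, Sequence

# A stage takes a candidate region and returns True if it judges the region
# to be a face. These callables are hypothetical stand-ins for H_1 ... H_S.
Stage = Callable[[object], bool]


def cascade_is_face(region: object, stages: Sequence[Stage]) -> bool:
    """Return True only if every stage in the cascade accepts the region."""
    for stage in stages:
        if not stage(region):
            # Rejected: stop here, so later stages are never evaluated.
            return False
    return True
```

Because most candidate windows in an image are not faces, most of them leave the loop after the first few stages, which is where the speed of the cascade comes from.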

Each classifier H_k is formed as a linear combination of a plurality of weak classifiers. A weak classifier is a pair consisting of a rectangular filter, made up of black rectangles and white rectangles as shown in FIG. 3, and a threshold θ. The weak classifier overlays the rectangular filter on the area subject to face detection and judges whether the difference between the sum of the luminance values in the area under the black rectangles and the sum of the luminance values in the area under the white rectangles is greater than the threshold θ. It outputs 1, indicating a face, when the difference is greater than θ, and 0, indicating a non-face, when it is smaller.
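A weak classifier of this kind can be sketched in a few lines. The sketch assumes a filter with one black and one white rectangle, a window given as a list of rows of luminance values, and rectangles written as (top, left, height, width); these representations are illustrative choices, not taken from the patent. The text does not say what happens when the difference equals θ, so the sketch outputs 0 in that case.

```python
from typing import Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical layout


def rect_sum(window: Sequence[Sequence[float]], rect: Rect) -> float:
    """Sum of the luminance values inside one rectangle of the window."""
    top, left, height, width = rect
    return sum(sum(row[left:left + width]) for row in window[top:top + height])


def weak_classify(window: Sequence[Sequence[float]],
                  black: Rect, white: Rect, theta: float) -> int:
    """Return 1 (face) if black-sum minus white-sum exceeds theta, else 0."""
    diff = rect_sum(window, black) - rect_sum(window, white)
    return 1 if diff > theta else 0
```

Practical implementations of the Viola and Jones method usually compute the rectangle sums from an integral image so that each sum costs a constant number of lookups; the direct summation here is only for clarity.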

When an area subject to face detection is input to the classifier H_k, the classifier H_k calculates the sum of the outputs of its weak classifiers, each multiplied by the reliability α of that weak classifier, and subtracts a predetermined threshold Th_T from the sum to obtain the confidence Conv(k). The confidence Conv(k) is a value representing how certain it is that the area contains a face. If Conv(k) is positive, the classifier H_k judges the area to be a face area and outputs +1; if it is negative, it judges the area to be a non-face area and outputs −1.
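The confidence calculation and the sign-based decision can be sketched as below. The function names are hypothetical, and the handling of a confidence of exactly zero (treated here as non-face) is an assumption, since the text only covers the positive and negative cases.

```python
from typing import Sequence


def stage_confidence(weak_outputs: Sequence[int],
                     alphas: Sequence[float], th: float) -> float:
    """Conv(k): reliability-weighted sum of weak outputs minus threshold Th_T."""
    return sum(a * h for a, h in zip(alphas, weak_outputs)) - th


def stage_decision(conf: float) -> int:
    """+1 (face) for positive confidence, otherwise -1 (non-face)."""
    return 1 if conf > 0 else -1
```

For example, weak outputs of 1, 0, 1 with reliabilities 0.5, 0.3, 0.4 and a threshold of 0.6 give a confidence of about 0.3, so the stage outputs +1.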

FIG. 4 is a flowchart showing the process for generating a classifier H_k. The weak classifiers that make up the classifier H_k are selected by the AdaBoost algorithm, and selection is repeated until the classifier H_k can distinguish faces from non-faces with the desired accuracy on the images stored in the learning image database 6. The subscript t is the number of times the classifier H_k has been updated (the number of times a weak classifier has been added to H_k); its initial value is 1.

First, in step S1, the weight of each image stored in the learning image database 6 is set to its initial value W_1(i) by equation (1). Here i is the serial number assigned to each image, and N is the total number of images stored in the learning image database 6.

In step S2, various weak classifiers are applied to all the images, and the error rate ε_t is calculated by equation (2).

In step S3, the weak classifier with the smallest error rate ε_t is selected as the weak classifier h_t of the classifier H_k. The selected weak classifier h_t is added to the classifier H_k, updating H_k.

In step S4, the reliability α_t of the selected weak classifier h_t is calculated from its error rate ε_t by equation (3).

In step S5, based on the reliability α_t of the selected weak classifier h_t, the weights W_t(i) of the images that h_t misjudged are increased by equation (4), and the weights W_t(i) of the images that it judged correctly are decreased by equation (5). The updated weights W_t(i) are then normalized by dividing them by their sum.

In step S6, the weak classifiers h_t are applied to all the images, each result is multiplied by the corresponding reliability α_t, the products are summed, and the threshold Th_T is subtracted from the sum to obtain the confidence Conv(k), as given by equation (6). Here x is the luminance information of the image. The confidence Conv(k) may be normalized to the range 0 to 1 by applying a sigmoid function or by dividing by its maximum value.

In step S7, for every image, whether the area subject to face detection is a face area is judged from the sign of the confidence Conv(k). The detection accuracy is then calculated by dividing the number of images for which the face/non-face judgment was correct by the total number N of images stored in the learning image database 6.

In step S8, it is determined whether the desired detection accuracy has been obtained. If so, processing proceeds to step S9.

In step S9, the classifier H_k is constructed by equation (7).

When the luminance information of an area of an image is input, the classifier H_k calculates the confidence Conv(k) by equation (6). If Conv(k) is positive, it judges the area to be a face area and outputs +1; if negative, it judges the area to be a non-face area and outputs −1.

In cascade processing, information from earlier stages is usually not carried forward. However, the classifier H_k may instead calculate the sum Convsum of the confidences Conv(k) from the first stage to the k-th stage by equation (8) and judge face or non-face from the sign of Convsum. Experience has shown that summing the confidences in this way, so that the confidences calculated in earlier stages are also reflected, yields higher detection accuracy.
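The cumulative variant can be sketched as follows, assuming that Convsum at stage k is simply the running total of the per-stage confidences from stage 1 to stage k. The function name, the list-of-floats input, and the rejection of a total of exactly zero are illustrative assumptions.

```python
from typing import Sequence, Tuple


def cascade_with_convsum(stage_confidences: Sequence[float]) -> Tuple[bool, int]:
    """Judge each stage by the sign of the running confidence total.

    Returns (is_face, stages_evaluated).
    """
    convsum = 0.0
    for k, conf in enumerate(stage_confidences, start=1):
        convsum += conf            # carry earlier confidences forward
        if convsum <= 0:
            return False, k        # rejected at stage k
    return True, len(stage_confidences)
```

With per-stage confidences of 0.5, −0.2 and 0.1, a plain cascade would reject at the second stage, whereas the running totals 0.5, 0.3 and 0.4 stay positive and the area is accepted.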

If, on the other hand, it is determined in step S8 that the desired detection accuracy has not been obtained, 1 is added to the update count t, and processing returns to step S2, where a new weak classifier is selected and added to the classifier H_k. Weak classifiers are added repeatedly until the desired detection accuracy is obtained.
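Steps S1 to S9 can be sketched as one training loop. Equations (1) to (7) are not reproduced in this text, so the sketch substitutes the standard discrete AdaBoost forms as assumptions: uniform initial weights 1/N, α = ½·ln((1 − ε)/ε), weight updates by factors of e^α and e^(−α), and a threshold Th_T equal to half the sum of the reliabilities. The names `train_stage`, `conv`, `weighted_error` and the `max_rounds` safety cap are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps a sample to 1 (face) or 0 (non-face).
Weak = Callable[[Sequence[float]], int]


def weighted_error(h: Weak, samples, labels, weights) -> float:
    """Step S2: sum of the weights of the samples that h misjudges."""
    return sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)


def conv(chosen: List[Tuple[float, Weak]], th: float, x) -> float:
    """Step S6: reliability-weighted vote minus the threshold Th_T."""
    return sum(alpha * h(x) for alpha, h in chosen) - th


def train_stage(samples, labels, candidates: Sequence[Weak],
                target_accuracy: float, max_rounds: int = 50):
    n = len(samples)
    weights = [1.0 / n] * n                        # S1 (assumed uniform start)
    chosen: List[Tuple[float, Weak]] = []
    th = 0.0
    for _ in range(max_rounds):                    # cap added for safety
        # S2 and S3: pick the candidate with the smallest weighted error.
        best = min(candidates,
                   key=lambda h: weighted_error(h, samples, labels, weights))
        eps = weighted_error(best, samples, labels, weights)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - eps) / eps)  # S4 (assumed formula)
        chosen.append((alpha, best))
        # S5: raise weights of misjudged samples, lower the rest, normalize.
        weights = [w * math.exp(alpha if best(x) != y else -alpha)
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # S6: assumed threshold, half of the total reliability.
        th = 0.5 * sum(a for a, _ in chosen)
        # S7: count samples whose sign-based decision matches the label.
        correct = sum((conv(chosen, th, x) > 0) == (y == 1)
                      for x, y in zip(samples, labels))
        if correct / n >= target_accuracy:         # S8
            break
    return chosen, th                              # S9: parameters of H_k
```

The returned pairs of reliability and weak classifier, together with the threshold, are what a stage needs at detection time; `conv(chosen, th, x) > 0` reproduces the face/non-face decision described for the classifier H_k.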

Both the speed-oriented face detector 42 and the accuracy-oriented face detector 43 are built by the above procedure. The difference between the two detectors 42 and 43 is obtained, for example, by changing the threshold θ of the weak classifiers.

For example, setting the threshold θ to a relatively low value θ_A as shown in FIG. 5 yields a weak classifier that judges almost all areas containing a face correctly as face areas but has a high rate of misjudging areas without a face as face areas. The classifiers of the accuracy-oriented face detector 43 are built by combining a large number of such weak classifiers and distinguish face areas from non-face areas with high accuracy. However, the large number of weak classifiers in each classifier makes processing correspondingly slower.

Setting the threshold θ to a higher value θ_B, on the other hand, yields a weak classifier with a moderately high rate of both correctly judging areas containing a face as face areas and correctly judging areas without a face as non-face areas. Combining a small number of such weak classifiers gives a classifier that is fast, although it has somewhat more missed detections and false detections, and the speed-oriented face detector 42 is built by combining such classifiers.

The accuracy-oriented face detector 43 may also combine different methods, for example using both a known method that combines Gabor filters with graph matching and the method proposed by Viola et al.

Next, the processing performed by the digital camera during shooting is described.

FIG. 6 is a flowchart showing the processing of the control unit 4. This processing is executed repeatedly by the control unit 4 while the digital camera is in shooting mode.

First, in step S11, the imaging unit 2 captures a through image, and the captured through image is displayed on the liquid crystal monitor 5.

In step S12, the through image stored in the temporary memory 41 is erased, and the newly captured through image is saved in the temporary memory 41.

In step S13, the speed-oriented face detector 42 is applied to the through image saved in the temporary memory 41, and face areas are detected in it. When a face area is detected, a rectangular frame surrounding the area is displayed on the liquid crystal monitor 5 to show the user that a face area has been detected.

In step S14, based on the position and size of the detected face area, the lens position and aperture are adjusted via the AF/AE control unit 10 to optimize the focus and exposure value for the detected face area.

In step S15, it is determined whether the release button 7 has been pressed. If it has, processing proceeds to step S16. Otherwise, processing returns to step S11, and the capture of through images and the detection of face areas in them are repeated.

In step S16, actual shooting is performed. Because the focus and exposure value have already been optimized for the face area detected by the speed-oriented face detector 42, the face of the person being photographed is captured clearly.

In step S17, the captured image is saved in the flash memory 8 as the actual captured image. The actual captured image is substantially identical to the through image captured immediately before it.

In step S18, the accuracy-oriented face detector 43 is applied to the actual captured image saved in the flash memory 8, and face areas are detected in it.

In step S19, the detection results of the two detectors 42 and 43 are compared, and any area that was detected as a face area by the accuracy-oriented face detector 43 but not by the speed-oriented face detector 42 is extracted as an undetected area. FIG. 7 illustrates this. In this example, of the faces of the three people being photographed, the rectangular area surrounding the face of the person at the lower right of the frame is extracted as an undetected area.
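The comparison in step S19 can be sketched as a box-matching problem. The patent does not say how the two sets of detections are matched, so this sketch assumes that both detectors report boxes as (x, y, width, height) in a common coordinate frame and that two boxes refer to the same face when their intersection over union reaches a hypothetical threshold `min_iou`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def undetected_regions(accurate_boxes: Sequence[Box],
                       fast_boxes: Sequence[Box],
                       min_iou: float = 0.5) -> List[Box]:
    """Boxes found by the accurate detector with no matching fast-detector box."""
    return [a for a in accurate_boxes
            if all(iou(a, f) < min_iou for f in fast_boxes)]
```

In a situation like the one in FIG. 7, where the accurate detector finds three faces and the fast detector finds two of them, the function returns the single box that has no counterpart. Because the through image and the actual captured image may differ in resolution, the boxes would need to be scaled to a common frame before the comparison.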

In step S20, the image of the extracted undetected area is associated with a teacher signal value of 1, indicating a face image, and stored in the learning image database 6.

If images of undetected areas were stored unconditionally in the learning image database 6 as face images, then whenever the accuracy-oriented face detector 43 misjudged an area without a face as a face area, a non-face image would be stored in the learning image database 6 as a face image and would adversely affect the subsequent training of the speed-oriented face detector 42.

Therefore, in step S20, it may be determined whether the confidence with which the accuracy-oriented face detector 43 judged the extracted undetected area to be a face is at or above a predetermined value, and the image of the extracted undetected area may be stored in the learning image database 6 only if it is.

Alternatively, the image of the extracted undetected area may be displayed on the liquid crystal monitor 5, and, after confirming that the image contains a face, the user may operate the operation unit 9 to add it to the learning image database 6 as a face image. If the undetected area does not contain a face, the image may instead be associated with a teacher signal value of 0, indicating a non-face image, and added to the learning image database 6 as a non-face image.

In step S21, it is determined whether the number of added images has reached a predetermined number, and if it has, the speed-oriented face detector 42 is trained. Training is deferred until the number of added images reaches the predetermined number in order to limit the computational load of training.
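The confidence gate of step S20 and the count check of step S21 can be sketched together. The class and function names are hypothetical, the `retrain` callback stands in for rerunning the training procedure, and resetting the counter after training is an assumption that the text does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LearningDatabase:
    # Each entry is (image, teacher_signal); 1 means face, 0 means non-face.
    samples: List[Tuple[object, int]] = field(default_factory=list)
    added_since_training: int = 0


def add_undetected(db: LearningDatabase, image: object,
                   confidence: float, min_confidence: float) -> bool:
    """Step S20 with the gate: store only sufficiently confident detections."""
    if confidence < min_confidence:
        return False
    db.samples.append((image, 1))
    db.added_since_training += 1
    return True


def maybe_retrain(db: LearningDatabase, batch_size: int,
                  retrain: Callable[[List[Tuple[object, int]]], None]) -> bool:
    """Step S21: retrain once enough new images have accumulated."""
    if db.added_since_training < batch_size:
        return False
    retrain(db.samples)
    db.added_since_training = 0   # assumed: start counting afresh
    return True
```

Batching the additions in this way keeps the costly training step from running after every shot.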

The speed-oriented face detector 42 is trained by rerunning the process shown in FIG. 4 on the learning image database 6, to which the images of undetected areas have been added, and rebuilding the detector. Because training is performed so that the undetected areas are detected as face areas, the rebuilt speed-oriented face detector 42 detects as face areas the images of undetected areas that it previously missed.

In step S22, it is determined whether shooting is to end. If so, processing ends; if not, processing returns to step S11.

With the above configuration, when the speed-oriented face detector 42 produces an undetected area, the detector is trained so that the undetected area is detected as a face area. Because training is based on the faces of people who were actually photographed, the detection accuracy for the faces of people who are frequent subjects improves as training progresses. In addition, the speed-oriented face detector 42 is trained without the user being aware of it, so the burden on the user is small.

In the above embodiment, the speed-oriented face detector 42 is applied to the through image and the accuracy-oriented face detector 43 to the actual captured image. However, any configuration that applies the face detectors 42 and 43 to identical or substantially identical images will do; for example, the same training is possible when both face detectors 42 and 43 are applied to the same actual captured image.

The targets whose detection accuracy can be expected to improve through training are not limited to the faces of particular people. For example, when the user often photographs pets such as dogs and cats, if the accuracy-oriented face detector 43 can detect the pet's face, the image of the photographed pet's face is added to the learning image database 6 as a face image and the speed-oriented face detector 42 is trained on it, so the accuracy with which the speed-oriented face detector 42 detects the pet's face can also be improved.

Although training here is performed by rebuilding the classifiers H_k of the speed-oriented face detector 42, the training method is not limited to this. For example, the threshold θ and the reliability α of the weak classifiers that made the wrong judgment may be changed so that an area judged to be a non-face is correctly detected as a face area. This method reduces the computational load compared with the training method that rebuilds the speed-oriented face detector 42.

Alternatively, even when the classifiers H_k are rebuilt, the procedure may be modified, for example by constraining the weights of the images of areas that were judged to be non-face areas despite containing a face differently from the weights of the other images. For example, the weight of a face image that was misjudged as a non-face area may be constrained so that it does not fall below a fixed value even when the image is judged correctly partway through training, or so that it is always greater than a fixed value when it is updated. This is expected to steer training toward judging these images correctly.
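One way to picture such a constraint is a weight update that applies a floor to the protected images. The sketch below is an assumption-laden illustration: it uses the standard AdaBoost factors e^α and e^(−α) in place of the patent's equations (4) and (5), applies the floor after normalization, and uses hypothetical names throughout.

```python
import math
from typing import List, Sequence, Set


def update_weights_with_floor(weights: Sequence[float],
                              misjudged: Sequence[bool],
                              alpha: float,
                              protected: Set[int],
                              floor: float) -> List[float]:
    """Update sample weights, keeping protected samples at or above `floor`.

    `protected` holds the indices of face images that the detector missed.
    """
    # Raise the weight of misjudged samples and lower the others (assumed form).
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misjudged)]
    total = sum(updated)
    normalized = [w / total for w in updated]
    # Applying the floor last means the weights may sum to slightly more
    # than 1; how to reconcile that is left open in this sketch.
    return [max(w, floor) if i in protected else w
            for i, w in enumerate(normalized)]
```

With the floor in place, a previously missed face keeps a noticeable share of the weighted error in every round, so weak classifiers that handle it correctly remain favored during selection.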

また、学習処理は制御部４の負担が大きいので、デジタルカメラの充電中等に学習させるようにしても良い。デジタルカメラの充電中等のユーザーが撮影を行わない時期に学習を行わせることで、学習による演算負荷が撮影動作に影響を及ぼすのを防止することができる。なお、学習のための演算は、制御部４ではなくデジタルカメラの充電器やクレードル側に設けられたプロセッサで行うようにしてもかまわない。 In addition, since the learning process is heavy on the control unit 4, the learning process may be performed while the digital camera is being charged. By performing learning at a time when the user does not perform shooting such as during charging of the digital camera, it is possible to prevent the calculation load due to learning from affecting the shooting operation. Note that the calculation for learning may be performed not by the control unit 4 but by a digital camera charger or a processor provided on the cradle side.

The face detector may also be built by a method other than the one proposed by Viola et al. For example, the face detector may be a neural network. In this case, learning consists of giving the face detector an image that was judged to be a non-face image despite containing a face, and updating weight parameters such as those of the intermediate layer so that the judgment value +1, which indicates a face, is output.
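As a rough illustration of this kind of update, the sketch below performs gradient steps on a tiny network with one intermediate layer so that its output for a missed face moves toward +1. The network size, the tanh activations, the squared-error loss, the learning rate, and the names `forward` and `train_step` are all assumptions; the patent does not specify them.

```python
import math


def forward(x, W1, w2):
    """One intermediate layer with tanh units and a tanh output in (-1, 1)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    out = math.tanh(sum(w * h for w, h in zip(w2, hidden)))
    return hidden, out


def train_step(x, target, W1, w2, lr=0.5):
    """One gradient-descent step on the loss 0.5 * (out - target) ** 2."""
    hidden, out = forward(x, W1, w2)
    delta_out = (out - target) * (1.0 - out ** 2)
    # Back-propagate through the output weights as they were before the update.
    delta_hidden = [delta_out * w * (1.0 - h ** 2) for w, h in zip(w2, hidden)]
    new_w2 = [w - lr * delta_out * h for w, h in zip(w2, hidden)]
    new_W1 = [[wi - lr * dh * xi for wi, xi in zip(row, x)]
              for row, dh in zip(W1, delta_hidden)]
    return new_W1, new_w2


# Toy feature vector standing in for a face image that the detector rejected.
x = [0.5, -0.2, 0.8]
W1 = [[0.1, -0.3, 0.2], [-0.4, 0.1, -0.1]]
w2 = [-0.5, 0.3]
_, out_before = forward(x, W1, w2)
for _ in range(100):
    W1, w2 = train_step(x, 1.0, W1, w2)  # target +1 means "face"
_, out_after = forward(x, W1, w2)
```

In practice the missed image would be trained together with the existing learning images so that the network does not forget them; the loop above uses a single sample only to keep the example short.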

As shown in FIG. 8, a communication unit 11 comprising a communication device such as a wireless LAN or Bluetooth device for connecting to a network outside the digital camera may also be provided, so that the latest learning results are downloaded from an external server over a network such as the Internet and the accuracy-oriented face detector 43 is updated based on those results.

If the detection accuracy of the accuracy-oriented face detector 43 improves, face areas that the speed-oriented face detector 42 failed to detect are more likely to be detected by the accuracy-oriented face detector 43, and learning of the speed-oriented face detector 42 advances further. As a result, the accuracy with which the speed-oriented face detector 42 detects the faces of people who are frequently photographed can be raised further.
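The face areas that drive this learning are those found by the accuracy-oriented face detector 43 but missed by the speed-oriented face detector 42. A minimal sketch of extracting them is shown below. Treating two boxes as the same face when their intersection-over-union is at least 0.5 is an assumption, as are the `(x, y, width, height)` box format and the function names.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def undetected_areas(accurate_boxes, fast_boxes, min_iou=0.5):
    """Boxes reported by the accuracy-oriented detector that have no
    sufficiently overlapping box from the speed-oriented detector."""
    return [box for box in accurate_boxes
            if all(iou(box, f) < min_iou for f in fast_boxes)]


# Hypothetical detection results for one scene.
accurate = [(10, 10, 40, 40), (100, 20, 30, 30)]
fast = [(12, 11, 40, 40)]
missed = undetected_areas(accurate, fast)
```

Here the first box is matched by the fast detector and the second is not, so only the second would be passed on as a learning candidate.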

Furthermore, the areas not detected by the speed-oriented face detector 42 may be transmitted over the network to a service center, where an operator judges whether each area is a face area or a non-face area. The judgment result is returned over the network to the control unit 4 of the digital camera as a teacher signal and is added to the learning image database 6 in association with the image of the undetected area.

With this method, images of areas that contain no face are prevented from being added to the learning image database 6 as face images, and thus from adversely affecting the learning of the speed-oriented face detector 42, without increasing the operating burden on the user.

If learning of the speed-oriented face detector 42 proceeds too far, the accuracy of detecting the faces of people who are not usually photographed falls, so face detection may fail in some shooting scenes. To avoid this, as shown in FIG. 9, two speed-oriented face detectors 42 and 44 are provided; learning is performed only on the speed-oriented face detector 42 and not on the other speed-oriented face detector (third face detector) 44. The user can then select either the learned speed-oriented face detector 42 or the unlearned speed-oriented face detector 44 according to the shooting scene by operating the operation unit 9.

With this configuration, when the unlearned speed-oriented face detector 44 is selected, face detection on the through image is performed with that detector, so a detection accuracy of at least a predetermined level is obtained even when photographing people who are not usually photographed.

The digital camera may also be given a function for maintaining the images stored in the learning image database 6. For example, the images stored in the learning image database 6 are displayed as thumbnails on the liquid crystal monitor 5 together with their teacher signals. If the teacher signal of an image that contains no face has the value 1, which indicates a face image, the user can operate the operation unit 9 to rewrite the teacher signal of that image to 0 (or the reverse), or to delete the image from the learning image database 6. This makes it possible to further raise the accuracy of detecting the faces of people who are frequently photographed.
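The two maintenance operations described above, rewriting a teacher signal and deleting an image, can be sketched as follows. Holding the learning image database 6 as an in-memory dictionary, and the names `Sample`, `flip_teacher`, and `delete_sample`, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: str
    teacher: int  # teacher signal: 1 = face image, 0 = non-face image


def flip_teacher(db, image_id):
    """Rewrite the teacher signal of one image from 1 to 0 or from 0 to 1."""
    db[image_id].teacher = 1 - db[image_id].teacher


def delete_sample(db, image_id):
    """Remove one image from the learning image database."""
    del db[image_id]


# Hypothetical database contents: both entries were stored as face images.
db = {"img_001": Sample("img_001", 1), "img_002": Sample("img_002", 1)}
flip_teacher(db, "img_001")   # the user marks img_001 as a non-face image
delete_sample(db, "img_002")  # the user discards img_002 entirely
```

Either operation keeps mislabeled images from being used as face examples the next time the speed-oriented face detector 42 is trained.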

An embodiment of the present invention has been described above. The embodiment is merely one example of how the present invention may be applied and is not intended to limit the technical scope of the present invention to the specific configuration of the embodiment.

FIG. 1 is a schematic block diagram of a digital camera provided with the face detection apparatus according to the present invention.
FIG. 2 is a diagram for explaining a face detector.
FIG. 3 shows examples of rectangular filters that constitute a weak classifier.
FIG. 4 is a flowchart showing the process of generating a classifier.
FIG. 5 is a diagram showing the relationship between the magnitude of the threshold and the number of samples recognized.
FIG. 6 is a flowchart showing the processing performed by the control unit at the time of shooting.
FIG. 7 is an illustration of face detection results from the two face detectors.
FIG. 8 shows a partial modification of the present invention.
FIG. 9 likewise shows a partial modification of the present invention.

Explanation of symbols

4 Control unit
6 Learning image database
42 Speed-oriented face detector (first face detector)
43 Accuracy-oriented face detector (second face detector)
44 Speed-oriented face detector (third face detector)

Claims

A face detection apparatus comprising:
a first face detector that detects a face area from an image;
a second face detector that detects a face area from an image identical or substantially identical to said image and has a detection accuracy different from that of the first face detector;
undetected area extracting means for extracting, as an undetected area, an area detected as a face area by the second face detector but not detected as a face area by the first face detector; and
learning means for training the first face detector based on the undetected area so that the undetected area is detected as a face area.

The face detection apparatus according to claim 1, further comprising:
a learning image database that stores a plurality of face images; and
image adding means for adding the image of the undetected area to the learning image database,
wherein the learning means trains the first face detector using the learning image database to which the image of the undetected area has been added.

The face detection apparatus according to claim 1 or 2, further comprising certainty factor calculating means for calculating a certainty factor indicating how certain it is that a face is included in the undetected area,
wherein the image adding means adds the image of the undetected area to the learning image database when the certainty factor is larger than a predetermined value.

The face detection apparatus according to claim 1 or 2, further comprising confirmation means for confirming whether or not a face is included in the undetected area,
wherein the image adding means adds the image of the undetected area to the learning image database when it is confirmed that a face is included in the undetected area.

The face detection apparatus according to claim 1, wherein
the face detection apparatus is mounted on a digital camera,
the first face detector detects a face area from a through image of the digital camera, and
the second face detector detects a face area from an actual captured image of the digital camera.

The face detection apparatus according to claim 1, further comprising:
a third face detector that detects a face area from the image; and
selecting means for selecting whether face detection is performed using the first face detector or the third face detector.

A face detection program comprising:
a first face detection process that detects a face area from an image;
a second face detection process that detects a face area from an image identical or substantially identical to said image and has a detection accuracy different from that of the first face detection process;
an undetected area extraction process that extracts, as an undetected area, an area detected as a face area by the second face detector but not detected as a face area by the first face detector; and
a learning process that trains the first face detector based on the undetected area so that the undetected area is detected as a face area.