JP2006072770A

JP2006072770A - Face detection device and face direction estimation device

Info

Publication number: JP2006072770A
Application number: JP2004256263A
Authority: JP
Inventors: Masahiko Yamada; 晶彦山田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2004-09-02
Filing date: 2004-09-02
Publication date: 2006-03-16

Abstract

PROBLEM TO BE SOLVED: To provide a device that can detect a face and estimate its direction with high accuracy, whatever direction the face has in the detection target image, without increasing the detection time or the amount of memory used. SOLUTION: A frontal face is divided into a left half and a right half, and a face-detection parameter is generated for each half face. At detection time, the region of interest is divided into left and right parts, and the similarity between each part and the corresponding one of the two parameters is calculated. When at least one similarity is at or above a threshold value, the region of interest is judged to be a face area. The direction and angle of the face are judged from the magnitude relation between the similarities of the two parts. Face detection and face direction estimation thus become possible with high accuracy for any face direction, without increasing the detection time or the amount of memory used. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a face detection device and a face direction estimation device that detect a face region and its face direction in an image.

Human face detection technology is coming into use in building surveillance systems and the like. Recently it has also begun to be used in digital cameras for autofocus on persons (faces) and automatic exposure correction, and for personal authentication by face recognition. It is also coming into use as a technique for extracting persons from moving or still images in image search and indexing of digital video, image album organization, and the like.

Conventional face detection techniques generally perform face detection using parameters for one fixed face orientation, such as parameters for a frontal face or parameters for a profile face (hereinafter called conventional technique 1). However, when face detection uses frontal-face parameters, the detection rate for face orientations other than frontal becomes low. In other words, detection accuracy is high for the face orientation that matches the parameters, but the detection rate drops considerably for other orientations.

Techniques that improve detection accuracy regardless of face orientation have therefore been developed recently. Two such techniques are known:
(1) a technique that generates a single parameter from an image database of various face orientations and performs face detection based on it (hereinafter called conventional technique 2), and
(2) a technique that prepares parameters for a plurality of face orientations (for example, separate parameters for right, front, and left) and performs face detection based on them (hereinafter called conventional technique 3).

Regarding conventional technique 3, known examples include a technique that uses pixel difference features and builds a classifier by combining, in a tree structure, a plurality of detectors created for each face orientation, thereby performing face detection that copes with changes in face orientation (see Non-Patent Document 1), and a technique that uses Haar wavelet features and builds a classifier by combining, in a pyramid structure, a plurality of detectors created for each face orientation, thereby performing face detection that copes with changes in face orientation (see Non-Patent Document 2).
Non-Patent Document 1: Kotaro Sabe and Kenichi Hidai, "Learning a Real-Time Arbitrary Posture Face Detector Using Pixel Difference Features", Proc. of the 10th Image Sensing Symposium, pp. 547-552, June 2004
Non-Patent Document 2: Z.Q. Zhang, L. Zhu, S.Z. Li, and H.J. Zhang, "Real-Time Multi-View Face Detection", In Proc. of IEEE International Conference on Automatic Face and Gesture Recognition, Washington, May 2000

Conventional technique 2 extracts feature amounts from images of various face orientations and, for example by averaging them, generates a single parameter that can be used generically for all face orientations. Because a single parameter suffices, this technique has the advantage that the detection time and the amount of memory used are about the same as for conventional technique 1. In general, however, using a parameter that blends the features of images with such varied face orientations improves detection accuracy over conventional technique 1 for orientations other than near-frontal, but has the drawback that detection accuracy for near-frontal orientations is lower than with conventional technique 1.

By contrast, conventional technique 3 uses parameters generated for each face orientation, so detection accuracy can be raised for any face orientation. However, because a plurality of parameters is used, it has the drawback that the detection time and the amount of memory used become considerably larger than with conventional techniques 1 and 2.

An object of the present invention is therefore to provide a face detection device that can raise detection accuracy for any face orientation while keeping the detection time and the amount of memory used low. A further object is to provide a face direction estimation device that can accurately detect which direction a detected face is facing.

When detecting a face, the present invention, unlike conventional technique 1, does not use a parameter for the whole frontal face. Instead, for example, the frontal face is divided into a left half and a right half, and detection uses a parameter for each half's face information. The invention relates to devices for face detection and face direction estimation in which the detection rate for frontal faces is kept at about the level of conventional technique 1, while the detection rate for non-frontal face orientations is raised to a level comparable to conventional technique 3. In addition, the amount of parameter information is about the same as in conventional technique 1, so the search time and the amount of memory used are also about the same as in conventional technique 1.

The face detection device according to the invention of claim 1 is a face detection device that detects a face area contained in an image based on the image data of that image, characterized by comprising: area setting means for setting a frame area on the image; storage means for storing parameters individually assigned to specific divided areas obtained by dividing the frame area; and determination means for determining, based on each of the parameters and the image data of the divided area to which that parameter is assigned, whether the image area on which the frame area is set is a face area.

The invention of claim 2 is the face detection device according to claim 1, characterized in that the divided areas are the areas obtained by dividing the frame area into left and right, and a parameter is prepared for each of the left and right divided areas.

The invention of claim 3 is the face detection device according to claim 2, characterized in that the determination means compares each of the parameters with the image data of the divided area to which that parameter is assigned to obtain the similarity between the image defined by the parameter and the image of the divided area, and determines, based on the obtained similarities, whether the image area on which the frame area is set is a face area.

The invention of claim 4 is the face detection device according to claim 3, characterized in that the determination means compares the similarities for the left and right divided areas with a threshold value and, when at least one of the similarities is equal to or greater than the threshold value, determines that the image area on which the frame area is set is a face area.
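The decision rule of claim 4 can be sketched as a one-line predicate. This is only an illustration: the function name `is_face_region`, its argument names, and the example threshold are assumptions, not taken from the patent.

```python
def is_face_region(sim_left: float, sim_right: float, threshold: float) -> bool:
    """Return True when at least one half-area similarity reaches the threshold.

    sim_left / sim_right are the similarities computed for the left and right
    divided areas of the frame area (hypothetical names).
    """
    return sim_left >= threshold or sim_right >= threshold


# A frame in which only one half matches well still counts as a face.
print(is_face_region(0.9, 0.2, threshold=0.7))
```

Because the two similarities are combined with a logical OR, a face turned to one side can still be accepted on the strength of its better-matching half.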

The invention of claim 5 is the face detection device according to any one of claims 1 to 4, characterized in that each of the parameters is generated by extracting the feature amounts of the respective divided area from a frontal-face image database, and only one is prepared for the one frame area.

The face direction estimation device according to the invention of claim 6 is a face direction estimation device that detects the direction of a face contained in a frame area based on the image data in that frame area, characterized by comprising: storage means for storing parameters individually assigned to specific divided areas obtained by dividing the frame area; and determination means for determining the direction of the face contained in the frame area based on each of the parameters and the image data of the divided area to which that parameter is assigned.

The invention of claim 7 is the face direction estimation device according to claim 6, characterized in that the divided areas are the areas obtained by dividing the frame area into left and right, and a parameter is prepared for each of the left and right divided areas.

The invention of claim 8 is the face direction estimation device according to claim 7, characterized in that the determination means compares each of the parameters with the image data of the divided area to which that parameter is assigned to obtain the similarity between the image defined by the parameter and the image of the divided area, and determines, based on the obtained similarities, the left-right orientation of the face contained in the frame area.

The invention of claim 9 is the face direction estimation device according to claim 8, characterized in that the determination means determines the left-right orientation of the face contained in the frame area based on the magnitude relation between the similarities for the left and right divided areas.

The invention of claim 10 is the face direction estimation device according to claim 9, characterized in that the determination means compares the similarities for the left and right divided areas with threshold values to determine the magnitude of the left-right orientation of the face contained in the frame area.

According to the inventions of claims 1 to 5, whether the image area on which the frame is set contains a face is determined by how well the image of each divided area matches the image defined by the parameter assigned to it. As a result, even if the face in that image area faces a direction different from the face orientation of the parameters, the rate at which the image area is determined to be a face area can be increased.

In this case the parameters can be frontal-face parameters, so the present invention can raise face detection accuracy for non-frontal face orientations while keeping frontal-face detection accuracy high.

Furthermore, because a parameter is held for each divided area, the total size of the parameters summed over the whole frame area can be about the same as when a single parameter is held for the whole frame area (conventional technique 1), and the parameter size does not grow dramatically as it does in conventional technique 3. Moreover, if the matching calculations for the divided areas are performed concurrently, the processing time does not increase either.

Thus, the present invention can provide a face detection device that raises detection accuracy for any face orientation while keeping the detection time and the amount of memory used low.

More specifically, with the configuration of claims 2 to 4, using two parameters, one for the left half of the face and one for the right half, keeps the frontal-face detection rate at about the level of conventional technique 1 and raises the detection rate for non-frontal orientations to a level comparable to conventional technique 3. That is, because both parameters are used, the frontal-face detection rate is no worse than with conventional technique 1. In addition, for a left-facing face, for example, the detection rate with the parameter for the right half of the face is high, and for a right-facing face the detection rate with the parameter for the left half of the face is high, so the detection rate for non-frontal orientations rises to a level comparable to conventional technique 3.

Note that the left-half parameter and the right-half parameter together form the parameter for one face, so the amount of parameter information is almost the same as that for one face, that is, about the same as in conventional technique 1. The amount of memory used is therefore about the same as in conventional technique 1, and if the left-face and right-face detection processes are performed in parallel, the processing time does not increase either.

According to claims 6 to 10, the direction in which the face contained in the frame is facing is determined by how well the image of each divided area matches the image defined by the parameter assigned to it, so the face direction can be detected by simple processing.

In particular, like the inventions of claims 1 to 5, this invention assigns a parameter to each divided area and makes its determination from the degree of matching of each divided area. The processing results of the inventions of claims 1 to 5 can therefore be used as they are, and face direction estimation following the face detection of claims 1 to 5 can be performed without complicating the processing routine. In other words, using this invention together with the inventions of claims 1 to 5 greatly simplifies the processing routine.

More specifically, with the configuration of claims 7 to 9, the left-right orientation of the face and its magnitude can be detected smoothly from the combination of the two values: the detection result (similarity) obtained with the parameter for the left half of the face and the detection result (similarity) obtained with the parameter for the right half of the face.
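One way to read the magnitude-relation idea is sketched below. The mapping follows the statement above that a left-facing face matches the right-half parameter better and a right-facing face matches the left-half parameter better. The function name, the `margin` band that reports "front", and its default value are assumptions made for illustration; this passage of the patent does not give the actual level values.

```python
def estimate_direction(sim_left_face: float, sim_right_face: float,
                       margin: float = 0.05) -> str:
    """Guess the left-right face direction from the two half-face similarities.

    sim_left_face  : similarity obtained with the left-face parameter
    sim_right_face : similarity obtained with the right-face parameter
    margin         : hypothetical band within which the face is treated as frontal
    """
    diff = sim_right_face - sim_left_face
    if abs(diff) <= margin:
        return "front"
    # The right half of the face matching better suggests a left-facing face.
    return "left" if diff > 0 else "right"


print(estimate_direction(0.3, 0.8))
```

A finer angle estimate would compare each similarity against several level values, as claim 10 suggests; those levels are not reproduced here.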

The significance and effects of the present invention will become clearer from the description of the embodiment below.

However, the following embodiment is only one embodiment of the present invention, and the meanings of the terms of the present invention and of its constituent elements are not limited to those described in the following embodiment.

An embodiment of the present invention is described below with reference to the drawings.

In the face detection and face direction estimation device according to this embodiment, a sample image database of frontal faces is subjected to learning processing to generate face detection parameters, and face detection on the target image is performed based on them. In general, such parameters consist of luminance information, edge information, statistical data on such information, configuration elements of an algorithm that emphasizes features for face detection, and the like. In this embodiment, edge information is used as the parameter, the similarity is calculated from it by the normalized cross-correlation method, and face detection is performed.

In this embodiment, instead of one parameter for the whole face, two parameters are used: one for the left half of the face (hereinafter called the left face) and one for the right half (hereinafter called the right face). That is, as shown in FIG. 1(a), these are the left-face parameter and the right-face parameter obtained when one frontal face is divided into a left face and a right face. Each parameter is generated by performing learning processing separately on a sample image database for the left face (frontal faces) and a sample image database for the right face (frontal faces). The parameters are generated offline and stored in the face detection and face direction estimation device.

Referring to FIG. 1(b), in the face detection and face direction estimation processing, the area of the image data that is examined for the presence of a face (hereinafter called the region of interest) is divided equally into a left area and a right area. A comparison operation (similarity calculation) is performed on the left area using the right-face parameter, and on the right area using the left-face parameter. The operations on the left and right areas may be performed simultaneously. The comparison operation follows the normalized cross-correlation method and yields a similarity separately for the left area and for the right area. Each of these similarities is compared with level values (threshold values), and face detection and face direction estimation are performed using the comparison results for the left and right areas.
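The half-by-half comparison can be sketched as follows. This is a minimal illustration, assuming the region of interest is given as a 2-D list of edge values with an even width and that each parameter is a flat vector of the same length as one half. The zero-mean form of normalized cross-correlation is also an assumption, since the text names the method without giving its formula. All function names are hypothetical.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def split_left_right(region):
    """Split a 2-D region (list of rows) into flattened left and right halves."""
    half = len(region[0]) // 2
    left = [v for row in region for v in row[:half]]
    right = [v for row in region for v in row[half:]]
    return left, right


def half_similarities(region_edges, right_face_param, left_face_param):
    """Return (similarity of left area, similarity of right area).

    As in the text, the left image area is compared with the right-face
    parameter and the right image area with the left-face parameter.
    """
    left_vec, right_vec = split_left_right(region_edges)
    return ncc(left_vec, right_face_param), ncc(right_vec, left_face_param)


region = [[1, 2, 5, 6],
          [3, 4, 7, 8]]
print(half_similarities(region, [1, 2, 3, 4], [8, 7, 6, 5]))
```

The two calls to `ncc` are independent of each other, which is what allows the left and right comparisons to run in parallel.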

First, FIG. 2 shows a functional block diagram of the face detection and face direction estimation device 100 according to this embodiment.

Referring to FIG. 2, 10 is an image storage unit that stores the target image for face detection and face direction estimation until the detection processing ends; 11 is an edge information calculation unit that calculates the edge information of the image; 12 is an arithmetic processing unit that performs the various computations and controls the operation and decisions of each unit; and 13 is an output unit that outputs the results of the detection processing (face area information, face direction information, and the like) to the stage following the detection device 100.

14 and 15 are the parameters for the left face and the right face, respectively, and are stored in the built-in memory of the detection device 100. 16 is a similarity calculation unit that calculates the similarity for each of the left and right areas of the image using the left-face and right-face parameters 14 and 15. 17 and 18 are a face determination unit and a face direction determination unit that perform face determination and face direction determination based on the similarities calculated for the left and right areas. 19 is a memory that stores computation results, detection information, and the like.

The operation of each unit of the detection device 100 is described next.

First, when the data of a target image for face detection and face direction estimation is input to the detection device 100, the image data is stored in the image storage unit 10. The image data is passed from the image storage unit 10 to the edge information calculation unit 11, which calculates the edge information of the image data. The edge information for the left and right areas of the region of interest is read out from there. In this embodiment, the edge information is treated as vector values, and the similarity is calculated by the normalized cross-correlation method. Under the control of the arithmetic processing unit 12, each time the detection processing for one region of interest ends, the region of interest is updated to an area shifted by one pixel horizontally or vertically, taking the image size into account. Face detection and face direction estimation are thereby performed over the whole input image. As described later, the arithmetic processing unit 12 also reduces the image stored in the image storage unit 10.
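The flow from image data to per-half edge vectors can be sketched as below. The patent does not name the edge operator, so the forward-difference edge strength used here is purely an assumption, as are the function names and the 2-D list representation of a grayscale image.

```python
def edge_magnitude(image):
    """Edge strength |dx| + |dy| by forward differences (assumed operator).

    `image` is a 2-D list of grayscale values; the result has the same size,
    with differences that would cross the right or bottom border set to zero.
    """
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][x + 1] - image[y][x] if x + 1 < w else 0
            dy = image[y + 1][x] - image[y][x] if y + 1 < h else 0
            edges[y][x] = abs(dx) + abs(dy)
    return edges


def region_edge_vectors(edges, top, left, size):
    """Cut a size x size region of interest and return (left_vec, right_vec)."""
    half = size // 2
    rows = [row[left:left + size] for row in edges[top:top + size]]
    left_vec = [v for row in rows for v in row[:half]]
    right_vec = [v for row in rows for v in row[half:]]
    return left_vec, right_vec


image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
edges = edge_magnitude(image)
print(edges)
print(region_edge_vectors(edges, top=0, left=0, size=2))
```

Computing the edge map once for the whole image and slicing it per region of interest avoids recomputing edges for every one-pixel shift.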

The similarity calculation unit 16 calculates the similarity for each of the left and right areas from the vector values of the edge information of that area and the left-face and right-face parameters 14 and 15. The face determination unit 17 and the face direction determination unit 18 compare these similarities with predetermined level values and determine which level each similarity falls into (details are given later). Face detection and face direction estimation are then performed based on the result (details are given later).

From these detection results, information on the face and its direction that is useful to the stage following the detection device 100 is obtained, such as face position information, face area information, and face direction angle information. This information is stored in the memory 19 and output from the output unit 13 as needed.

Next, the operation flow of face detection and face direction estimation in the detection device 100 is described.

In this embodiment, a template of a single size is set for face detection and face direction estimation, and the left-face and right-face parameters described above are assigned to the areas obtained by dividing this template equally into left and right. The region of interest is set to the same size as the template area. By comparing the edge information (vector values) obtained from the left and right areas of the region of interest with the template's right-face and left-face parameters, the similarity between the left-area image and the right-face parameter and the similarity between the right-area image and the left-face parameter are calculated.

As described above, the region of interest is shifted by one pixel horizontally or vertically each time the detection processing ends. At that point the arithmetic processing unit 12 determines whether the region of interest has reached the right edge or the bottom edge of the input image area, that is, it performs area determination for the region of interest. The region of interest is thus updated each time the detection processing ends, shifting one pixel at a time in raster-scan fashion from the upper left toward the lower right of the input image area.
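The raster-scan update of the region of interest amounts to two nested loops with edge checks. The sketch below uses a hypothetical generator, `raster_scan_positions`, and a hypothetical `step` argument (the text uses a one-pixel step).

```python
def raster_scan_positions(image_width, image_height, region_size, step=1):
    """Yield (top, left) for each region of interest, left to right, top to bottom.

    The range bounds play the role of the right-edge and bottom-edge checks:
    a position is produced only if the region still fits inside the image.
    """
    for top in range(0, image_height - region_size + 1, step):
        for left in range(0, image_width - region_size + 1, step):
            yield top, left


# A 4 x 3 image scanned with a 2 x 2 region of interest.
print(list(raster_scan_positions(4, 3, 2)))
```

If the image is smaller than the region of interest, the generator yields nothing, which gives a natural stopping condition when the image is repeatedly reduced.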

In this embodiment, so that faces of various sizes in the input image can be detected with a template of a single size, each time the raster scan reaches the lower right of the input image, the input image is reduced by a predetermined factor and the detection is run again in raster-scan fashion from the upper left toward the lower right. Repeating this a specified number of times makes faces of various sizes in the input image detectable.

図３を参照して、入力画像を徐々に縮小しながら、入力画像内における様々な大きさの顔の検出を行う動作について説明する。同図では、入力画像中に、顔Ａと顔Ｂが含まれている。 With reference to FIG. 3, an operation for detecting faces of various sizes in an input image while gradually reducing the input image will be described. In the figure, the face A and the face B are included in the input image.

同図（ａ）は、入力画像上において、注目領域をラスタースキャン方式で１画素ずつ右方または下方にずらして顔検出を行う際の動作を示すものである。この場合、注目領域に比べて顔Ａと顔Ｂが大きいため、顔Ａ、顔Ｂの何れも検出することができない。 FIG. 6A shows an operation when face detection is performed by shifting the attention area to the right or below by one pixel in the raster scan method on the input image. In this case, since face A and face B are larger than the region of interest, neither face A nor face B can be detected.

同図（ｂ）は、同図（ａ）の場合よりも入力画像を縮小し、その後、同図（ａ）の場合と同様に、注目領域をラスタースキャン方式で１画素ずつ右方または下方にずらして顔検出を行う際の動作を示すものである。この場合、顔Ｂが注目領域とほぼ一致する大きさとなるため、顔Ｂは検出されるが、顔Ａは検出されない。 In FIG. 6B, the input image is reduced more than in the case of FIG. 5A, and thereafter, as in the case of FIG. The operation when performing face detection by shifting is shown. In this case, since the face B has a size that substantially matches the attention area, the face B is detected, but the face A is not detected.

同図（ｃ）は、同図（ｂ）の状態の後、縮小処理とスキャン処理を何度か繰り返した際の処理動作を示すものである。この場合、顔Ａが注目領域とほぼ一致する大きさとなり、顔Ａを検出することができる。 FIG. 6C shows the processing operation when the reduction process and the scan process are repeated several times after the state of FIG. In this case, the size of the face A substantially coincides with the attention area, and the face A can be detected.

これ以後も、入力画像の縮小処理と注目領域のスキャン処理を規定回数まで繰り返す。その結果、入力画像から顔Ａと顔Ｂが検出され、入力画像のどこに顔があるか、その顔の向きとその角度はいくらか、などの情報がメモリ１９に蓄積される。 Thereafter, the reduction process of the input image and the scan process of the attention area are repeated a predetermined number of times. As a result, the face A and the face B are detected from the input image, and information such as where the face is in the input image, the direction of the face and what the angle is is stored in the memory 19.
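
A small worked example may help show why the smaller face B is found after fewer reductions than the larger face A. All numbers here are assumptions chosen for illustration (template size 20, reduction ratio 0.8, a 15% size tolerance, face sizes 32 and 64); the text does not state the actual ratio or sizes, and reductions_until_match is a hypothetical helper.

```python
from typing import Optional


def reductions_until_match(face_size: float, template_size: float, ratio: float,
                           max_reductions: int, tolerance: float = 0.15) -> Optional[int]:
    """Return how many reductions bring the face close to the template size, or None."""
    size = face_size
    for n in range(max_reductions + 1):
        # "Close" means within the assumed tolerance of the template size.
        if abs(size - template_size) <= tolerance * template_size:
            return n
        size *= ratio  # the image (and every face in it) shrinks by the same ratio
    return None


print(reductions_until_match(32, 20, 0.8, 10))  # smaller face (like face B)
print(reductions_until_match(64, 20, 0.8, 10))  # larger face (like face A)
```

Under these assumed values the smaller face matches after 2 reductions and the larger one after 5. A face that is already smaller than the template never matches, because the image is only ever reduced.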

In the present embodiment, normalized cross-correlation is used as the similarity calculation algorithm. With normalized cross-correlation, the similarity S is obtained from the expression "S = (Vf - Mf) * (T - Mt) / (|Vf - Mf| |T - Mt|)". Here * is the vector inner product operator and |A| is the magnitude of the vector A. Vf is the edge information matrix of the attention area written as a row vector (or column vector), and T is the edge information matrix of the template written as a row vector (or column vector). Mf is the vector all of whose components equal the mean m of all elements of the edge information vector of the attention area, that is, the vector given by "Mf = mE (E = [1, 1, 1, ..., 1])". Mt is the vector all of whose components equal the mean t of all elements of the edge information vector of the template, that is, the vector given by "Mt = tE". The parameters in normalized cross-correlation are T and Mt.
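
The expression above can be written out directly. The following is a minimal sketch in plain Python, not the device's implementation: the function name is hypothetical, the vectors are ordinary lists, and returning 0.0 for a constant vector (where the denominator is zero) is an assumption, since the text does not cover that case.

```python
import math
from typing import Sequence


def normalized_cross_correlation(vf: Sequence[float], t: Sequence[float]) -> float:
    """S = (Vf - Mf) * (T - Mt) / (|Vf - Mf| |T - Mt|) for two equal-length vectors."""
    if len(vf) != len(t) or len(vf) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    m = sum(vf) / len(vf)            # mean of the attention-area vector (Mf = mE)
    t_mean = sum(t) / len(t)         # mean of the template vector (Mt = tE)
    dv = [v - m for v in vf]         # Vf - Mf
    dt = [x - t_mean for x in t]     # T - Mt
    dot = sum(a * b for a, b in zip(dv, dt))
    norm = math.sqrt(sum(a * a for a in dv)) * math.sqrt(sum(b * b for b in dt))
    if norm == 0.0:
        return 0.0                   # assumption: a flat vector counts as "no match"
    return dot / norm


print(normalized_cross_correlation([1, 2, 3], [2, 4, 6]))
print(normalized_cross_correlation([1, 2, 3], [3, 2, 1]))
```

Because both vectors are mean-centered and divided by their magnitudes, the result does not change when the input is scaled or offset uniformly, and it lies between -1 and 1.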

FIG. 4 shows the operation flow of face detection and face orientation estimation. As a preliminary step to this flow, the image on which face detection and face orientation estimation are to be performed is input to the image storage unit 10.

Referring to FIG. 4, in step S1 the edge information calculation unit 11 calculates edge information for the image input to the image storage unit 10 or for a reduced image described later.

In step S2, the area on which face detection and face orientation estimation are performed, that is, the attention area, is specified, and the edge information of the left and right regions of the attention area is extracted. This yields the vector value Vf (hereinafter referred to as the image vector).

In step S3, the image vector is used to perform face detection and face orientation estimation on the left and right regions of the attention area. The details of this operation are described later.

In step S4, the detection result obtained in step S3 is stored in the memory 19.

In step S5, whether the scan of the entire input image has finished is determined by checking whether the attention area has reached the lower right of the input image. If it has finished, the process proceeds to step S6. If not, the process returns to step S2 and the attention area is set again.

In step S6, it is determined whether the input image has been reduced the prescribed number of times. If so, the entire detection operation ends. If not, the process proceeds to step S7.

In step S7, a reduced image of the image that was just processed is created. The process then returns to step S1 to perform face detection and face orientation estimation on the reduced image.
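
Steps S1 to S7 can be sketched as one loop over reduction levels with a raster scan inside it. Everything named here is hypothetical: detect_over_scales, the callables passed in, and the halve helper (which simply drops every other row and column, whereas the text only says "a predetermined ratio"). The demo uses stand-in callables so that the control flow can be run on its own.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def detect_over_scales(image: Image, win_w: int, win_h: int, max_reductions: int,
                       compute_edges: Callable[[Image], Image],
                       classify_area: Callable[[Image], object],
                       shrink: Callable[[Image], Image],
                       ) -> List[Tuple[int, int, int, object]]:
    """Return (reduction level, x, y, result) for every attention area visited."""
    results = []
    current, level = image, 0
    while True:
        edges = compute_edges(current)                       # S1: edge information
        height = len(edges)
        width = len(edges[0]) if edges else 0
        for y in range(height - win_h + 1):                  # S2/S5: raster scan
            for x in range(width - win_w + 1):
                area = [row[x:x + win_w] for row in edges[y:y + win_h]]
                results.append((level, x, y, classify_area(area)))  # S3 and S4
        if level >= max_reductions:                          # S6: enough reductions?
            return results
        current = shrink(current)                            # S7: make reduced image
        level += 1


def halve(image: Image) -> Image:
    """Assumed reduction: keep every other row and column."""
    return [row[::2] for row in image[::2]]


demo = detect_over_scales([[1.0] * 4 for _ in range(4)], 2, 2, 1,
                          compute_edges=lambda img: img,               # stand-in for S1
                          classify_area=lambda a: sum(map(sum, a)),    # stand-in for S3
                          shrink=halve)
print(len(demo), demo[-1])
```

Keeping the level index with each result is one way to map a hit at a reduced scale back to the original image; the text only says that position, direction and angle are stored in the memory 19.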

FIG. 5 shows the operation flow of the face and non-face detection operation (step S3).

Referring to FIG. 5, in step S31 the left half of the edge information of the attention area calculated in step S2 is extracted.

In step S33, the right half of the edge information of the attention area calculated in step S2 is extracted.

In steps S32 and S34, the similarity is calculated using normalized cross-correlation for the left region and the right region of the attention area, respectively. Details are described later.

In step S35, face detection and face orientation estimation are performed using the similarities of the left and right regions calculated in steps S32 and S34. Details of this are also described later. The process then proceeds to step S4.
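
Steps S31 and S33 can be pictured as slicing the edge matrix of the attention area down the middle. A minimal sketch, assuming the edge information is a list of equal-length rows with an even width and that each half is flattened row by row (the text allows either row-vector or column-vector notation); split_left_right is a hypothetical name.

```python
from typing import List, Tuple


def split_left_right(area: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Split an attention area into flattened left-half and right-half vectors."""
    width = len(area[0])
    if width % 2 != 0:
        raise ValueError("an even width is assumed so the halves are equal")
    half = width // 2
    left = [value for row in area for value in row[:half]]    # S31: left half
    right = [value for row in area for value in row[half:]]   # S33: right half
    return left, right


left_vec, right_vec = split_left_right([[1, 2, 3, 4],
                                        [5, 6, 7, 8]])
print(left_vec, right_vec)
```

Each of the two vectors would then be compared with the parameter assigned to that half, as in steps S32 and S34.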

FIG. 6 shows the operation flow of the similarity calculation for a half face (step S32 or S34). Since steps S32 and S34 operate in the same way, only step S32 is described here.

Referring to FIG. 6, in step S321 it is checked whether the parameters, such as the left-face template, have been loaded into the left face parameter 14. If they have been loaded, the process proceeds to step S323. If not, the process proceeds to step S322.

In step S322, the parameters such as the left-face template are loaded.

In step S323, the similarity is calculated using normalized cross-correlation. Specifically, the calculation uses the expression given above.

In step S324, the calculated similarity S is stored in the memory 19.

FIG. 7 shows the operation flow of the face and face orientation detection operation (step S35).

Referring to FIG. 7, in step S351 it is determined whether the similarity S_B for the left half face calculated in step S32 is greater than or equal to the threshold T_B, or whether the similarity S_A for the right half face calculated in step S34 is greater than or equal to the threshold T_A. If at least one of these conditions is satisfied, it is determined that at least one of the left face region and the right face region is a face, and the process proceeds to step S352. If neither is satisfied, it is determined that neither the left face region nor the right face region is a face, and the process proceeds to step S353.

In step S352, since at least one of the left face region and the right face region has been determined to be a face, the attention area is determined to be a face.

In step S353, since both the left face region and the right face region have been determined not to be faces, the attention area is determined not to be a face.

In step S354, the direction in which the detected face is facing in the input image is determined from, for example, the ranges of a table in which the similarity S_A and the similarity S_B each fall.
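
The face decision of steps S351 to S353 reduces to a logical OR of two threshold tests. A minimal sketch with a hypothetical function name; the threshold values in the example calls are made up, since the text does not give them.

```python
def is_face(s_a: float, s_b: float, t_a: float, t_b: float) -> bool:
    """S351: the attention area is a face if S_B >= T_B or S_A >= T_A."""
    return s_b >= t_b or s_a >= t_a


# Illustrative thresholds only; the values are assumptions.
print(is_face(s_a=0.2, s_b=0.7, t_a=0.6, t_b=0.6))  # only the left half matches
print(is_face(s_a=0.2, s_b=0.3, t_a=0.6, t_b=0.6))  # neither half matches
```

The OR is what lets a turned face, where only one half resembles the frontal half-face parameter, still be accepted.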

As described above, the detection device 100 performs the face detection and face orientation estimation operations.

Next, how face orientation is estimated from the normalized cross-correlation similarity (S354) is described. From the expression given above, the maximum value of the normalized cross-correlation similarity is 1 (when the directions of the vector (Vf - Mf) and the vector (T - Mt) coincide). For example, the face orientation determination unit 18 stores a table such as the one shown in FIG. 8, and the ranges of this table in which the similarities S_A and S_B calculated by the similarity calculation unit 16 fall indicate how many degrees to the left or right of the front the detected face is turned. The face orientation determination unit 18 compares the calculated similarities S_A and S_B with the table to determine the direction and angle of the face. In this table, the face angle is 0 degrees for a frontal face, negative for a right-facing face, and positive for a left-facing face.

The thresholds shown in the table are only examples and can be changed as appropriate based on verification results and the like. The face angle may be set in finer steps than this or, conversely, in coarser steps. When only the left or right direction is to be determined, without determining the angle, the face orientation may be determined according to which of the similarities S_A and S_B is at or above the threshold. In that case, if both S_A and S_B are at or above the threshold, the face is either estimated to be frontal or the direction is determined by giving priority to the higher similarity. When S_A and S_B are equal or approximately equal, the face is taken to be frontal, as in the table of FIG. 8.
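
The direction-only variant described above (no angle, no FIG. 8 table) might look like the sketch below. Several things are assumptions: the function name, the tolerance used for "approximately equal", and the neutral labels "a_side" and "b_side", which only say which similarity decided the result and are deliberately not mapped to "left" or "right" here.

```python
from typing import Optional


def coarse_direction(s_a: float, s_b: float, t_a: float, t_b: float,
                     both_means_front: bool = True,
                     equal_tol: float = 0.05) -> Optional[str]:
    """Return 'front', 'a_side', 'b_side', or None when the area is not a face."""
    a_ok = s_a >= t_a
    b_ok = s_b >= t_b
    if not (a_ok or b_ok):
        return None                      # neither half passed its threshold
    if abs(s_a - s_b) <= equal_tol:
        return "front"                   # equal or nearly equal similarities
    if a_ok and b_ok:
        if both_means_front:
            return "front"               # option 1: both passing means frontal
        return "a_side" if s_a > s_b else "b_side"  # option 2: higher one wins
    return "a_side" if a_ok else "b_side"


print(coarse_direction(0.9, 0.3, 0.6, 0.6))
print(coarse_direction(0.8, 0.7, 0.6, 0.6, both_means_front=False))
```

The both_means_front flag switches between the two alternatives the text offers for the case where both similarities reach the threshold.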

In the embodiment above, normalized cross-correlation is used as the similarity calculation algorithm, but other similarity calculation methods used in face detection can of course also be used.

For example, the similarity may be calculated using probability distributions of samples of face images and samples of non-face images. Specifically, a large number of face images and non-face images are each divided equally into left and right halves (for non-face images too, the halves are called the left face image and the right face image, as with face images), and samples of the image information of each half are used to calculate the mean M and standard deviation σ of the normalized probability distribution of each. Images of objects likely to be confused with faces during face detection are selected as the non-face images. The parameters in this case are these eight values and the projection vectors used to accentuate the difference between the face and non-face probability distributions. Specifically, there are four pairs of mean M and standard deviation σ, one pair for the normalized probability distribution of each of the left and right halves of the face images and of the non-face images, and two projection vectors, one for each of the left and right halves, shared between face images and non-face images.

The similarity S is given by the expression "S = P_F(V * T) / P_NF(V * T)" and is calculated for each of the left face image and the right face image. Here * is the vector inner product operator, and P_F(x) and P_NF(x) are the probability density functions of the normal distribution N(M, σ²) for face images and non-face images respectively, evaluated at x = V * T. V is, for example, the edge information matrix of the attention area written as a row vector (or column vector). The expression is the ratio of the probability that the region is a face to the probability that it is a non-face, so the similarity becomes large when the attention area contains a face image. The operation flow in this case is the same as when normalized cross-correlation is used.
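
The ratio S = P_F(V * T) / P_NF(V * T) can be sketched with the standard normal density formula. This reading takes T to be the projection vector for the half in question, which is an interpretation of the text; the function names and all numeric values are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, sigma: float) -> float:
    """Probability density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio_similarity(v: Sequence[float], projection: Sequence[float],
                                face_mean: float, face_sigma: float,
                                nonface_mean: float, nonface_sigma: float) -> float:
    """S = P_F(x) / P_NF(x) with x = V * T (inner product with the projection vector)."""
    x = sum(a * b for a, b in zip(v, projection))
    # Note: if the non-face density underflows to 0.0 this division fails;
    # a real implementation would likely compare log densities instead.
    return normal_pdf(x, face_mean, face_sigma) / normal_pdf(x, nonface_mean, nonface_sigma)


# Made-up parameters: face samples project near 1.0, non-face samples near 0.0.
print(likelihood_ratio_similarity([1.0, 2.0], [0.5, 0.25], 1.0, 1.0, 0.0, 1.0))
print(likelihood_ratio_similarity([0.0, 0.0], [0.5, 0.25], 1.0, 1.0, 0.0, 1.0))
```

A ratio above 1 means the projected value is more likely under the face distribution than under the non-face distribution, so larger still means "more face-like", as with normalized cross-correlation.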

A linear discriminant or a sum-of-squared-differences method may also be used as the similarity calculation algorithm in the present embodiment. The linear discriminant uses the normalized probability distribution and projection vector of samples of face images only. The similarity is given by the expression "S = (M - V * T) / σ" and is calculated for each of the left face image and the right face image. The characters and symbols in the expression are the same as those used above.

For the sum of squared differences, the similarity S is given by the expression "S = (Mf - V)^1/2" and is calculated for each of the left face image and the right face image. Mf and V are the same as described earlier.

Unlike the case of normalized cross-correlation or the normal-distribution probability density functions, the similarity S obtained with the linear discriminant or the sum of squared differences becomes smaller as the degree of matching becomes higher. The operation flow therefore needs to be changed from FIG. 7 to FIG. 9. The only difference between the flows of FIG. 9 and FIG. 7 is that the direction of the inequality defining the relationship between the similarity S and the threshold T is reversed. That is, step S351 of FIG. 7 is replaced by step S355 in FIG. 9. The other steps are the same.
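
The linear discriminant and the reversed comparison of step S355 can be sketched as follows. The expression is computed exactly as written, S = (M - V * T) / σ; whether an absolute value is intended is not stated, so none is taken here. The function names and example numbers are hypothetical.

```python
from typing import Sequence


def linear_discriminant_similarity(v: Sequence[float], projection: Sequence[float],
                                   mean: float, sigma: float) -> float:
    """S = (M - V * T) / sigma, using the face-only mean and standard deviation."""
    x = sum(a * b for a, b in zip(v, projection))  # V * T (inner product)
    return (mean - x) / sigma


def is_face_lower_is_better(s_a: float, s_b: float, t_a: float, t_b: float) -> bool:
    """S355: same OR rule as S351, but with the inequality reversed."""
    return s_b <= t_b or s_a <= t_a


print(linear_discriminant_similarity([1.0, 2.0], [0.5, 0.25], mean=3.0, sigma=2.0))
print(is_face_lower_is_better(s_a=0.2, s_b=0.9, t_a=0.5, t_b=0.5))
```

Only the comparison direction differs from the normalized cross-correlation flow; the rest of the steps are unchanged, as the text notes.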

According to the present embodiment, face detection and face orientation estimation can be performed smoothly and appropriately for faces in a variety of orientations. FIG. 10 shows the verification results.

Referring to FIG. 10, the vertical axis is the similarity and the horizontal axis is the face orientation. The center of the horizontal axis corresponds to a frontal face; moving right from the center gives left-facing faces, and moving left gives right-facing faces. The values on the horizontal axis are the image numbers of the images shown below the axis.

S_A and S_B in the figure are the similarity graphs obtained with the right face template (right-face parameters) and the left face template (left-face parameters), respectively, and S_WHOLE is the similarity graph obtained when the entire frontal face is used as the template (prior art 1 above). The parameters of each template were generated by a learning process on a database of frontal face images.

The similarity was calculated with an algorithm based on the normalized cross-correlation method (T. Sim, S. Baker, and M. Bsat, "The CMU Pose, Illumination, and Expression PIE Database", Proceedings of the 5th International Conference on Automatic Face and Gesture Recognition, 2002). Unlike the embodiment above, the input image was not scanned; instead, the eye positions of each template and the input image were aligned manually before each similarity value was calculated. That is, the eye position of the right face template was aligned with the eye position of the right face in the input image, and the eye position of the left face template was aligned with the eye position of the left face in the input image. For the template of the comparative example (prior art 1 above), the positions of both eyes were aligned with the eye position of the left face in the input image.

Even when raster scanning one pixel at a time as in the embodiment above, states in which the eye positions are aligned do occur, so characteristics similar to those of FIG. 10 are expected to be obtained. In other words, the characteristics of FIG. 10 can be regarded as the similarity at the moment the eye positions align while each image is raster scanned.

The similarity was also calculated for images facing slightly downward and slightly upward. Images C9, C25, and C31 in the figure face slightly downward compared with images C27, C2, and C14 at the same left-right angle. Image C7 faces slightly downward compared with image C27.

Referring to the figure, the S_WHOLE graph shows the highest similarity at the center, with similarity decreasing toward the left and right along the horizontal axis. That is, the face detection rate is high for frontal faces but low for left and right faces.

The S_A graph shows similarity increasing toward the right along the horizontal axis. From the center to the right, its values are higher than those of S_WHOLE. From the center to the left, however, they are lower than those of S_WHOLE. That is, the face detection rate from the frontal face to the right face is better than with S_WHOLE, but the face detection rate for the left face is worse.

The S_B graph, on the other hand, shows similarity increasing toward the left along the horizontal axis. From the center to the left, its values are higher than those of S_WHOLE. From the center to the right, however, they are lower than those of S_WHOLE. That is, the face detection rate from the frontal face to the left face is better than with S_WHOLE, but the face detection rate for the right face is worse. This is exactly the opposite of the S_A characteristic.

In the embodiment above, an area is determined to be a face when S_B ≥ T_B or S_A ≥ T_A. In FIG. 10, therefore, a left-facing image is determined to be a face on the basis of the similarity S_B, and a right-facing image on the basis of the similarity S_A. In effect, the present embodiment is equivalent to having a similarity characteristic that combines, in FIG. 10, the similarity S_B in the region from the front toward the left-facing side with the similarity S_A in the region from the front toward the right-facing side.

Thus, according to this example, compared with prior art 1, which uses the entire frontal face as the template, the face detection capability for frontal faces is kept at the same level while face detection accuracy is also kept high for other orientations. According to the present embodiment, the ability to detect faces in a variety of orientations can be enhanced, and the detection rate can be greatly improved.

In the present embodiment, two sets of parameters, one for the left face and one for the right face, are used. If the face is assumed to be left-right symmetric, however, the parameters for the right face or the left face can be used as the parameters for the other half after a simple conversion, so the device only needs to hold the parameters for one of the two. This makes either the left face parameter 14 or the right face parameter 15 of FIG. 2 unnecessary, and the memory used for parameters can be reduced to about half that of prior art 1 described above.
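
The text does not say what the "simple conversion" is. One plausible reading, offered here only as an assumption, is a horizontal mirror of the stored half-face template. The sketch below uses a hypothetical function name and a small made-up template.

```python
from typing import List


def mirror_template(template: List[List[float]]) -> List[List[float]]:
    """Assumed conversion: flip each row left to right to get the other half's template."""
    return [list(reversed(row)) for row in template]


left_template = [[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]]
right_template = mirror_template(left_template)  # derived at run time, not stored
print(right_template)
```

If the edge information also encodes direction, a plain flip of positions may not be enough and the horizontal component might need adjusting as well; the text gives no detail on this point.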

In the present embodiment the edge information of the image is used, but the similarity may instead be calculated from the luminance information of the ordinary image. The parameters in that case consist, as described above, of luminance information and statistical data on that information.

In terms of hardware, the face detection device and face orientation estimation device of the present embodiment can be realized by the CPU, memory, and other LSIs of any computer. In terms of software, they are realized by, for example, a program with face detection and face orientation estimation functions loaded into memory. FIG. 2 shows the functional blocks for face detection and face orientation estimation realized by hardware and software. It goes without saying, however, that these functional blocks can be realized in various forms, such as hardware only, software only, or a combination of the two.

The embodiments of the present invention can be modified in various ways as appropriate within the scope of the technical idea set out in the claims.

FIG. 1 is a diagram explaining the face parameters and the face detection and face orientation estimation operation according to the embodiment.
FIG. 2 is a functional block diagram of the face detection and face orientation estimation device according to the embodiment.
FIG. 3 is a diagram explaining the operation of detecting faces of various sizes in an input image according to the embodiment.
FIG. 4 is an operation flowchart of the face detection and face orientation estimation operation according to the embodiment.
FIG. 5 is an operation flowchart of the face detection and face orientation estimation operation according to the embodiment.
FIG. 6 is an operation flowchart of the face detection and face orientation estimation operation according to the embodiment.
FIG. 7 is an operation flowchart of the face detection and face orientation estimation operation according to the embodiment.
FIG. 8 is a diagram showing an example of the table used for face orientation estimation according to the embodiment.
FIG. 9 is an operation flowchart of the face detection and face orientation estimation operation according to the embodiment.
FIG. 10 is a graph showing the similarity with respect to face orientation when normalized cross-correlation is used according to the embodiment.

Explanation of symbols

10 Image storage unit
12 Arithmetic processing unit
14 Left face parameter
15 Right face parameter
16 Similarity calculation unit
17 Face determination unit
18 Face orientation determination unit
100 Face detection and face orientation estimation device

Claims

A face detection device that detects a face area included in an image based on image data of the image, comprising:
area setting means for setting a frame area on the image;
storage means for storing parameters individually assigned to specific divided regions obtained by dividing the frame area into regions; and
determination means for determining, based on each of the parameters and the image data of the divided region to which that parameter is assigned, whether the image area on which the frame area is set is a face area.

The face detection device according to claim 1,
wherein the divided regions are the regions obtained by dividing the frame area into left and right, and the parameters are prepared for the left and right divided regions respectively.

The face detection device according to claim 2,
wherein the determination means compares each of the parameters with the image data of the divided region to which that parameter is assigned to obtain the similarity between the image defined by the parameter and the image of the divided region, and determines, based on the obtained similarities, whether the image area on which the frame area is set is a face area.

The face detection device according to claim 3,
wherein the determination means compares the similarities for the left and right divided regions with a threshold and, if at least one of the similarities is greater than or equal to the threshold, determines that the image area on which the frame area is set is a face area.

The face detection device according to any one of claims 1 to 4,
wherein each of the parameters is generated by extracting the feature amount of each divided region from a database of frontal face images, and only one is prepared for one frame area.

A face orientation estimation device that detects the orientation of a face included in a frame area based on image data in the frame area, comprising:
storage means for storing parameters individually assigned to specific divided regions obtained by dividing the frame area into regions; and
determination means for determining, based on each of the parameters and the image data of the divided region to which that parameter is assigned, the orientation of the face included in the frame area.

The face orientation estimation device according to claim 6,
wherein the divided regions are the regions obtained by dividing the frame area into left and right, and the parameters are prepared for the left and right divided regions respectively.

The face orientation estimation device according to claim 7,
wherein the determination means compares each of the parameters with the image data of the divided region to which that parameter is assigned to obtain the similarity between the image defined by the parameter and the image of the divided region, and determines, based on the obtained similarities, the left-right orientation of the face included in the frame area.

The face orientation estimation device according to claim 8,
wherein the determination means determines the left-right orientation of the face included in the frame area based on the magnitude relationship between the similarities for the left and right divided regions.

The face orientation estimation device according to claim 9,
wherein the determination means compares the similarities for the left and right divided regions with a threshold to determine the magnitude of the left-right orientation of the face included in the frame area.