JP2007108990A

JP2007108990A - Face detecting method, device and program

Info

Publication number: JP2007108990A
Application number: JP2005298831A
Authority: JP
Inventors: Kensuke Terakawa; 賢祐寺川
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2005-10-13
Filing date: 2005-10-13
Publication date: 2007-04-26
Anticipated expiration: 2025-10-13
Also published as: JP4757598B2

Abstract

PROBLEM TO BE SOLVED: To suppress erroneous detection in processing for detecting a face included in an image.

SOLUTION: An index value computing means 34 computes, for each of the partial images W taken from different positions on an image to be detected, an index value showing the probability that the partial image W is a face image, and a face image candidate extracting means 34 extracts all partial images W whose index values are equal to or greater than a predetermined threshold value as face image candidates. A highly reliable face image determining means 31 determines a candidate for which a sufficiently high index value is computed to be a highly reliable face image, and an index value increasing means 31 increases the index value of each specific candidate whose face inclination is almost the same as that of the highly reliable face image. A face image detecting means 31 applies a threshold determination to the index values of all the candidates to detect the face image. Alternatively, only the specific candidates may be detected as face images.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a face detection method and apparatus for detecting a face image in a digital image, and to a program therefor.

Conventionally, the color distribution of a person's face area in a snapshot taken with a digital camera has been examined in order to correct the skin color, and persons appearing in digital video captured by the digital video camera of a surveillance system have been recognized. In such cases the face area corresponding to a person's face in the digital image must be detected, so various methods for detecting a face image containing a face in a digital image have been proposed.

One example is a method that cuts out partial images at a plurality of different positions on the detection target image, determines whether each partial image is an image containing a face (a face image), and thereby detects the face images on the detection target image. To determine whether a partial image is a face image, possible approaches include template matching and the use of a discriminator module that has learned facial features through machine learning (for example, the methods of Non-Patent Document 1 and Patent Documents 1 to 3). In either approach, the determination generally works by calculating, from the image pattern of the partial image, an index value indicating the probability that the partial image is a face image, and judging the partial image to be a face image when the index value exceeds a predetermined threshold.

"High-speed omnidirectional face detection", Shihong LAO et al., Image Recognition and Understanding Symposium (MIRU2004), July 2004, pp. II-271 to II-276
Japanese Patent Application No. 2003-316924; Japanese Patent Application No. 2003-316925; Japanese Patent Application No. 2003-316926

However, a face detection method that relies on threshold determination of an index value based on the image pattern, as described above, has the following problems. When the image quality of the detection target image is poor, for example when its brightness or contrast is uneven, or when a face in the detection target image is partly occluded by another subject, the facial features cannot be captured adequately and face images may be missed. Conversely, when a non-face pattern that happens to resemble a face is present in the detection target image, that non-face image may be erroneously detected as a face image.

In view of the above circumstances, an object of the present invention is to provide a face detection method and apparatus, and a program therefor, capable of further suppressing missed detections and erroneous detections of face images.

A first face detection method of the present invention is a face detection method for detecting a face image in an input image, comprising the steps of: cutting out partial images at different positions on the input image; calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined orientation used when that index value was calculated; determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; applying, to each face image candidate whose face inclination is substantially the same as the face inclination in the highly reliable face image, processing that increases the index value of that candidate; and detecting, as the face image, each candidate whose index value after the processing is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A second face detection method of the present invention is a face detection method for detecting a face image in an input image, comprising the steps of: cutting out partial images at different positions on the input image; calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined inclination used when that index value was calculated; determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; and detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.

A first face detection apparatus of the present invention is a face detection apparatus for detecting a face image in an input image, comprising: partial image cutting means for cutting out partial images at different positions on the input image; index value calculating means for calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; face image candidate extracting means for extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined orientation used when that index value was calculated; highly reliable face image determining means for determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; index value increasing means for applying, to each face image candidate whose face inclination is substantially the same as the face inclination in the highly reliable face image, processing that increases the index value of that candidate; and face image detecting means for detecting, as the face image, each candidate whose index value after the processing is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A second face detection apparatus of the present invention is a face detection apparatus for detecting a face image in an input image, comprising: partial image cutting means for cutting out partial images at different positions on the input image; index value calculating means for calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; face image candidate extracting means for extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined inclination used when that index value was calculated; highly reliable face image determining means for determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; and face image detecting means for detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.

A first program of the present invention is a program for causing a computer to function as a face detection apparatus that detects a face image in an input image, the program causing the computer to function as: partial image cutting means for cutting out partial images at different positions on the input image; index value calculating means for calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; face image candidate extracting means for extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined orientation used when that index value was calculated; highly reliable face image determining means for determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; index value increasing means for applying, to each face image candidate whose face inclination is substantially the same as the face inclination in the highly reliable face image, processing that increases the index value of that candidate; and face image detecting means for detecting, as the face image, each candidate whose index value after the processing is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A second program of the present invention is a program for causing a computer to function as a face detection apparatus that detects a face image in an input image, the program causing the computer to function as: partial image cutting means for cutting out partial images at different positions on the input image; index value calculating means for calculating, for each of the plurality of partial images cut out at the different positions and on the basis of feature quantities of the partial image, an index value indicating the probability that the partial image is a face image containing a face with a predetermined inclination, the calculation being performed for each of a plurality of different inclinations used in turn as the predetermined inclination; face image candidate extracting means for extracting all partial images whose calculated index value is equal to or greater than a first threshold, each as a candidate for a face image containing a face with the predetermined inclination used when that index value was calculated; highly reliable face image determining means for determining, from among the face image candidates, each candidate whose calculated index value is equal to or greater than a second threshold, which is larger than the first threshold, to be a highly reliable face image; and face image detecting means for detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.

In the present invention, a "face image" means an image that includes an image constituting a face.

Further, "face inclination" means inclination in the in-plane direction (within the image plane); in other words, it means the rotational position of the face on the image.

Further, "the face inclination is substantially the same as the face inclination in the highly reliable face image" means, for example, that the angular deviation between the face inclination in the face image candidate and the face inclination in the highly reliable face image is within a predetermined range; for example, the angular deviation may be within ±45 degrees.

According to the first face detection method, apparatus, and program of the present invention, partial images whose index value indicating the probability of being a face image is equal to or greater than the first threshold are extracted from the detection target image as face image candidates; among the extracted candidates, those whose index value is equal to or greater than the second threshold, which is higher than the first threshold, are determined to be highly reliable face images; the index values of candidates whose face inclination is substantially the same as that of a highly reliable face image are increased; and candidates whose index value is equal to or greater than the third threshold, which lies between the first and second thresholds, are detected as face images. Based on the empirical rule that the inclinations of a plurality of faces contained in the same image are often substantially the same, the index value of each face image candidate can thus be made to reflect the candidate's reliability more appropriately, and missed detections and erroneous detections of face images can be suppressed.
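The decision logic of the first method can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` type, the function names, the additive `boost`, and the example threshold values are all assumptions introduced here, and the text above only states that the index value is increased, not how.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    score: float     # index value (face-likeness) for this partial image
    tilt_deg: float  # in-plane face inclination used when the score was computed


def tilt_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two in-plane rotations, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def detect_with_boost(candidates, th1, th2, th3, boost, tol_deg=45.0):
    """Return the candidates accepted as faces (hypothetical helper)."""
    assert th1 <= th3 <= th2, "third threshold must lie between the first and second"
    pool = [c for c in candidates if c.score >= th1]   # face image candidates
    reliable = [c for c in pool if c.score >= th2]     # highly reliable face images
    detected = []
    for c in pool:
        near = any(tilt_difference(c.tilt_deg, r.tilt_deg) <= tol_deg for r in reliable)
        # Assumption: the increase is a fixed additive bonus.
        final_score = c.score + boost if near else c.score
        if final_score >= th3:
            detected.append(c)
    return detected


faces = detect_with_boost(
    [Candidate(0.9, 0), Candidate(0.5, 30), Candidate(0.5, 180), Candidate(0.1, 0)],
    th1=0.3, th2=0.8, th3=0.6, boost=0.2,
)
```

In this example the weak candidate tilted by 30 degrees is promoted because it roughly agrees with the highly reliable face at 0 degrees, while the equally weak candidate at 180 degrees is not.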

According to the second face detection method, apparatus, and program of the present invention, partial images whose index value indicating the probability of being a face image is equal to or greater than the first threshold are extracted from the detection target image as face image candidates; among the extracted candidates, those whose index value is equal to or greater than the second threshold, which is higher than the first threshold, are determined to be highly reliable face images; and only candidates whose face inclination is substantially the same as that of a highly reliable face image are detected as face images. Based on the empirical rule that the inclinations of a plurality of faces contained in the same image are often substantially the same, face image candidates considered to have low reliability can thus be excluded, and erroneous detection of face images can be suppressed.
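The second method replaces the score increase with a hard filter on inclination. The sketch below is illustrative only: the function names and the representation of candidates as `(score, tilt_deg)` pairs are assumptions. Note that, as written here, no candidate is returned when no highly reliable face is found; the text above does not say how that case is handled.

```python
def tilt_difference(a_deg, b_deg):
    """Smallest absolute angle between two in-plane rotations, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def detect_same_tilt_only(candidates, th1, th2, tol_deg=45.0):
    """candidates: iterable of (score, tilt_deg) pairs (hypothetical layout)."""
    pool = [(s, t) for s, t in candidates if s >= th1]   # face image candidates
    reliable_tilts = [t for s, t in pool if s >= th2]    # highly reliable face images
    # Keep only candidates whose tilt agrees with some highly reliable face.
    return [
        (s, t) for s, t in pool
        if any(tilt_difference(t, r) <= tol_deg for r in reliable_tilts)
    ]


kept = detect_same_tilt_only(
    [(0.9, 0), (0.5, 330), (0.5, 90), (0.2, 0)], th1=0.3, th2=0.8
)
```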

Hereinafter, embodiments of the present invention will be described.

FIG. 1 is a schematic block diagram showing the configuration of a face detection system 1, which is an embodiment (the first embodiment) of the first face detection apparatus according to the present invention. The face detection system 1 detects images containing a face (hereinafter, face images) in an input digital image regardless of the position, size, inclination (rotational position within the image plane), and orientation (direction of left-right head turning) of the face. As a face detection technique regarded as particularly good in detection accuracy and robustness, the face detection system 1 uses discriminator modules (hereinafter simply, discriminators) generated by machine learning with sample images. In this technique, facial features are learned from a plurality of different face sample images representing faces of a predetermined inclination and orientation and a plurality of different non-face sample images representing non-faces, so that a discriminator capable of determining whether a given image is a face image containing a face of the predetermined inclination and orientation is generated and prepared in advance. Partial images are then cut out one after another at different positions on the image subject to face detection (hereinafter, the detection target image), and the discriminator determines whether each partial image is a face image, thereby detecting the face images contained in the detection target image.

As shown in FIG. 1, the face detection system 1 includes a multi-resolution conversion unit 10, a normalization unit 20, a face detection unit 30, and an overlap detection determination unit 40.

The multi-resolution conversion unit 10 converts the input detection target image S0 into multiple resolutions to obtain a resolution image group S1 consisting of a plurality of images of different resolutions (S1_1, S1_2, ..., S1_n; hereinafter, resolution images).

The image size, that is, the resolution, of the detection target image S0 is first converted so that it is normalized to a predetermined resolution, for example a rectangular image whose short side is 416 pixels, giving a normalized detection target image S0′. Further resolution conversion is then performed on the basis of this normalized detection target image S0′ to generate a plurality of resolution images of different resolutions, giving the resolution image group S1. Such a resolution image group is generated for the following reason. The size of a face contained in the detection target image S0 is usually unknown, whereas the size of the face to be detected is fixed at a certain size in connection with the discriminator generation method described later. To detect faces of different sizes, it is therefore necessary to cut out partial images of a predetermined size while shifting position on images of different resolutions, and to determine whether each partial image is a face image.

FIG. 2 shows the process of converting the detection target image S0 into multiple resolutions. Specifically, as shown in FIG. 2, the resolution image group is generated as follows. The normalized detection target image S0′ is taken as the base resolution image S1_1. A resolution image S1_2, whose size is 2^(-1/3) times that of S1_1, and a resolution image S1_3, whose size is 2^(-1/3) times that of S1_2 (2^(-2/3) times that of the base image S1_1), are generated first. Each of the resolution images S1_1, S1_2, and S1_3 is then reduced to 1/2 size, each of those reduced images is reduced to 1/2 size again, and this process is repeated until a predetermined number of resolution images has been generated. In this way the main operation is the 1/2 reduction, which requires no interpolation of the pixel values representing luminance, and a plurality of images whose sizes decrease in steps of 2^(-1/3) from the base resolution image can be generated quickly. For example, when the resolution image S1_1 is a rectangle with a short side of 416 pixels, the resolution images S1_2, S1_3, ... are rectangles with short sides of 330, 262, 208, 165, 131, 104, 82, 65, ... pixels, respectively, so that a plurality of resolution images reduced in steps of 2^(-1/3) is obtained. Images generated without interpolating pixel values in this way tend strongly to retain the features of the original image pattern, which is preferable because improved accuracy can be expected in face detection processing.
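The short-side sizes listed above can be reproduced with a few lines of arithmetic. In this sketch the function name is hypothetical, and rounding down to whole pixels is an assumption inferred from the listed values (for example 165 / 2 giving 82); the text does not state the rounding rule.

```python
def pyramid_short_sides(base: int = 416, count: int = 9) -> list[int]:
    """Short-side lengths of the first `count` resolution images."""
    # Three seed images: the base image and its 2^(-1/3) and 2^(-2/3) scalings.
    seeds = [base, int(base * 2 ** (-1 / 3)), int(base * 2 ** (-2 / 3))]
    sizes = []
    level = 0
    while len(sizes) < count:
        for seed in seeds:
            if len(sizes) < count:
                # Halve the seed `level` times, rounding down (assumed).
                sizes.append(seed >> level)
        level += 1
    return sizes


sizes = pyramid_short_sides()
```

Only the two seed reductions need interpolation; every other level is a plain halving of one of the three seeds.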

The normalization unit 20 applies an overall normalization process and a local normalization process to each of the resolution images so that their contrast is suitable for face detection processing, and obtains a resolution image group S1′ consisting of a plurality of normalized resolution images (S1′_1, S1′_2, ..., S1′_n).

First, the overall normalization process is described. To bring the contrast of a resolution image close to a predetermined level suitable for face detection processing, that is, a level suitable for drawing out the performance of the discriminators described later, the overall normalization process converts the pixel values of the entire resolution image according to a conversion curve that brings them close to values representing the logarithm of the luminance of the subject in the image.

FIG. 3 shows an example of the conversion curve used for the overall normalization process. One possible overall normalization process converts the pixel values of the entire image according to a conversion curve (lookup table) such as that shown in FIG. 3, which applies the so-called inverse gamma conversion in sRGB space (raising to the power 2.2) and then takes the logarithm. The reason is as follows.

The light intensity I observed as an image is usually expressed as the product of the reflectance R of the subject and the intensity L of the light source (I = R × L). When the intensity L of the light source changes, the light intensity I observed as an image therefore changes as well. If the reflectance R of the subject alone could be evaluated, however, highly accurate face discrimination that does not depend on the intensity L of the light source, that is, that is unaffected by the brightness of the image, would be possible.

Here, with a light source of intensity L, let I1 be the light intensity observed from a part of the subject with reflectance R1, and I2 the light intensity observed from a part of the subject with reflectance R2. In logarithmic space the following holds.

log(I1) − log(I2) = log(R1 × L) − log(R2 × L) = log(R1) + log(L) − (log(R2) + log(L)) = log(R1) − log(R2) = log(R1 / R2)

That is, taking the logarithm of the pixel values of an image converts them into a space in which ratios of reflectance are expressed as differences, and in such a space the reflectance of the subject alone can be evaluated independently of the intensity L of the light source. In other words, contrast (here, the difference between pixel values itself), which otherwise varies with the brightness in the image, can be made uniform.
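A quick numerical check makes the point concrete: the difference of logarithms depends only on the reflectance ratio, whereas the raw intensity difference scales with the light source. The function names and the sample reflectance and intensity values are arbitrary choices for illustration.

```python
import math


def log_contrast(r1: float, r2: float, light: float) -> float:
    """log(I1) - log(I2) for observed intensities I = R * L."""
    return math.log(r1 * light) - math.log(r2 * light)


def linear_contrast(r1: float, r2: float, light: float) -> float:
    """I1 - I2 for observed intensities I = R * L."""
    return r1 * light - r2 * light


dim = log_contrast(0.8, 0.2, light=1.0)
bright = log_contrast(0.8, 0.2, light=50.0)
```

Here `dim` and `bright` both equal log(0.8 / 0.2) up to floating-point error, while `linear_contrast` grows fifty-fold between the two lighting conditions.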

Meanwhile, the color space of images acquired with devices such as ordinary digital cameras is sRGB. sRGB is an international standard color space that defines and unifies color, saturation, and the like in order to unify differences in color reproduction between devices. In this color space, to allow proper color reproduction on image output devices with a gamma value (γout) of 2.2, the pixel values of an image are obtained by raising the input luminance to the power 1/γout (= 0.45).

Accordingly, by converting the pixel values of the entire image according to a conversion curve that applies the so-called inverse gamma conversion, that is, raising to the power 2.2, and then takes the logarithm, evaluation based only on the reflectance of the subject, independent of the intensity of the light source, can be performed properly.
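A lookup table of this kind might be built as sketched below. The exact curve of FIG. 3 is not reproduced here: the function name, the 8-bit input range, the natural logarithm, the unscaled floating-point output, and the clamping of pixel value 0 to 1 (to avoid log(0)) are all assumptions, and a pure 2.2 power law is used as in the description rather than the piecewise sRGB transfer function.

```python
import math


def build_global_norm_lut(gamma: float = 2.2, levels: int = 256) -> list[float]:
    """Map each pixel value to the log of its (approximate) linear luminance."""
    top = levels - 1
    lut = []
    for value in range(levels):
        # Clamp 0 to 1 so the logarithm stays finite (assumed handling).
        encoded = max(value, 1) / top
        linear = encoded ** gamma          # inverse gamma conversion
        lut.append(math.log(linear))       # logarithm of luminance
    return lut


lut = build_global_norm_lut()
normalized_row = [lut[p] for p in (0, 64, 128, 255)]
```

In practice the table output would usually be rescaled to the working pixel range; that step is omitted because the description does not specify it.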

Put another way, this overall normalization process converts the pixel values of the entire image according to a conversion curve that maps a specific color space into a color space with different characteristics.

Applying this processing to the detection target image equalizes the contrast, which otherwise varies with the brightness in the image, and thereby improves the accuracy of the face detection process. The overall normalization process has a short processing time, but its result is easily affected by oblique light, the background, and differences in input modality in the detection target image.

Next, the local normalization process will be described. The local normalization process suppresses variation in contrast among local regions of a resolution image. Specifically, for each local region in the resolution image, if the degree of variance of the pixel values representing luminance is at or above a predetermined level, a first luminance gradation conversion process is applied that brings the degree of variance closer to a fixed level higher than the predetermined level; if the degree of variance is below the predetermined level, a second luminance gradation conversion process is applied that holds the degree of variance to a level lower than the fixed level. The local normalization process takes a long time, but its discrimination results are only slightly affected by oblique light, the background, and differences in input modality in the detection target image.

FIG. 4 is a diagram showing the concept of the local normalization process, and FIG. 5 is a diagram showing its flow. Expressions (1) and (2) are the gradation conversion expressions applied to pixel values in this local normalization process.

Here, X is the pixel value of the pixel of interest, X′ is its pixel value after conversion, mlocal is the mean of the pixel values in the local region centered on the pixel of interest, Vlocal is the variance of the pixel values in that local region, SDlocal is the standard deviation of the pixel values in that local region, (C1 × C1) is the reference value corresponding to the fixed level, C2 is the threshold corresponding to the predetermined level, and SDc is a predetermined constant. In this embodiment, luminance has 8-bit gradation, so pixel values range from 0 to 255.

As shown in FIG. 4, one pixel in the resolution image is first set as the pixel of interest (step S1). The variance Vlocal of the pixel values in a local region of predetermined size centered on the pixel of interest, for example 11 × 11 pixels, is calculated (step S2), and it is determined whether the variance Vlocal is equal to or greater than the threshold C2 corresponding to the predetermined level (step S3). If it is determined in step S3 that the variance Vlocal is equal to or greater than the threshold C2, the first luminance gradation conversion process is performed according to expression (1): the more the variance Vlocal exceeds the reference value (C1 × C1) corresponding to the fixed level, the more the difference between the pixel value X of the pixel of interest and the mean mlocal is reduced, and the more the variance Vlocal falls below the reference value (C1 × C1), the more that difference is enlarged (step S4). If it is determined in step S3 that the variance Vlocal is less than the threshold C2, the second luminance gradation conversion process, a linear gradation conversion that does not depend on the variance Vlocal, is performed according to expression (2) (step S5). It is then determined whether the pixel of interest set in step S1 is the last pixel (step S6). If it is not the last pixel, the process returns to step S1 and the next pixel on the same resolution image is set as the pixel of interest. If it is the last pixel, local normalization of that resolution image ends. By repeating steps S1 to S6 in this way, local normalization is applied to the entire resolution image.
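Steps S1 to S6 can be sketched as follows. Expressions (1) and (2) are not reproduced in this text, so the sketch assumes the forms X′ = (X − mlocal) × C1 / SDlocal + 128 and X′ = (X − mlocal) × C1 / SDc + 128, which are merely consistent with the behavior described above. The offset 128, the parameter values, the clipping of the window at image borders, and the clamping to 0 to 255 are all illustrative assumptions.

```python
import math

def local_normalize(image, c1=40.0, c2=100.0, sd_c=10.0, window=11, offset=128.0):
    """Local normalization of a 2-D list of pixel values (steps S1 to S6).

    The conversion formulas, the offset and all default parameter values are
    assumptions made for illustration; c2 must be positive.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):                      # S1 / S6: visit every pixel
        for x in range(width):
            # S2: mean and variance over the local region (clipped at borders)
            values = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            if variance >= c2:                   # S3
                gain = c1 / math.sqrt(variance)  # S4: pull SD toward C1
            else:
                gain = c1 / sd_c                 # S5: fixed linear gain
            converted = (image[y][x] - mean) * gain + offset
            result[y][x] = min(255.0, max(0.0, converted))
    return result

flat = local_normalize([[50] * 5 for _ in range(5)])
checker = local_normalize([[0, 200], [200, 0]], window=3)
```

With these assumed formulas, a region whose variance is at or above C2 is rescaled so that its standard deviation becomes C1, while a low-variance region receives a fixed gain and is therefore not amplified without limit.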

The predetermined level may be varied according to the luminance of all or part of the local region. For example, in the normalization process described above, in which gradation conversion is performed for each pixel of interest, the threshold C2 may be varied according to the pixel value of the pixel of interest. That is, the threshold C2 corresponding to the predetermined level may be set higher when the luminance of the pixel of interest is relatively high and lower when it is relatively low. In this way, a face that lies in a low-luminance, so-called dark region with low contrast (a state in which the variance of the pixel values is small) can also be normalized correctly.

Here, the inclinations of the face to be detected are a total of 12 inclinations obtained by rotating in 30-degree steps within the image plane of the detection target image S0, with the vertical direction of the detection target image S0 as the reference, and the order in which the inclination to be detected is switched is determined in advance. For example, expressed as clockwise rotation angles relative to the vertical direction of the detection target image S0, the order is: the three upward directions 0, 330, and 30 degrees (0-degree group); the three rightward directions 90, 60, and 120 degrees (90-degree group); the three leftward directions 270, 240, and 300 degrees (270-degree group); and the three downward directions 180, 150, and 210 degrees (180-degree group).
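The example switching order can be generated from the four group centers. This is only an illustrative sketch; the function name is hypothetical, and the specification merely requires that some order be fixed in advance.

```python
def inclination_switching_order():
    """Clockwise angles in the example order: up, right, left, then down."""
    order = []
    for base in (0, 90, 270, 180):           # group centers in switching order
        order.extend([base % 360, (base - 30) % 360, (base + 30) % 360])
    return order

order = inclination_switching_order()
```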

The face detection unit 30 applies the face detection process to each resolution image of the resolution image group S1′ normalized by the normalization unit 20, changing the inclination of the face to be detected according to the preset order, and thereby detects the face images S2 contained in each resolution image. The face detection unit 30 is itself made up of several elements.

FIG. 6 is a block diagram showing the configuration of the face detection unit 30. As shown in FIG. 6, the face detection unit 30 comprises a detection control unit (highly reliable face image determining means, index value increasing means, face image detecting means) 31, a resolution image selection unit 32, a sub-window setting unit (partial image cutting means) 33, and a classifier group (index value calculating means, face image candidate extracting means) 34.

The detection control unit 31 controls the other units of the face detection unit 30 and is mainly responsible for sequence control of the face detection process. That is, it controls the resolution image selection unit 32, the sub-window setting unit 33, and the classifier group 34 so that, for each resolution image (input image) of the resolution image group S1′, partial images are cut out one after another over the entire resolution image, all of the classifiers for the different face inclinations are applied to each partial image, and face image candidates are extracted from the resolution image regardless of face inclination. It then determines, for each extracted candidate, whether it is a true face image, and thereby detects the true face images S2 in each resolution image. For example, as appropriate, it instructs the resolution image selection unit 32 to select a resolution image, gives sub-window setting conditions to the sub-window setting unit 33, and switches which classifiers of the classifier group 34 are used. The sub-window setting conditions include the range on the image in which sub-windows are set and the movement interval of the sub-window, that is, the coarseness of detection.

Under the control of the detection control unit 31, the resolution image selection unit 32 selects, from the resolution image group S1′, the resolution image to be subjected to the face detection process in ascending order of size, that is, starting from the coarsest resolution. Because the face detection method of this embodiment detects faces in the detection target image S0 by determining, for partial images W of the same size cut out one after another from each resolution image, whether each partial image W is a face image, the resolution image selection unit 32 can be regarded as setting the size of the face to be detected in the detection target image S0, changing it from large to small each time.

Based on the sub-window setting conditions set by the detection control unit 31, the sub-window setting unit 33 sets, on the resolution image selected by the resolution image selection unit 32, the sub-window that cuts out the partial image W to be judged as a face image or not, shifting its position by a predetermined width each time.

For example, on the selected resolution image, a sub-window that cuts out a partial image W of a predetermined size, namely 32 × 32 pixels, is set successively while being moved by a predetermined number of pixels, for example 2 pixels at a time, and each partial image W cut out is input to the classifier group 34. As described later, each classifier of the classifier group 34 discriminates face images containing a face of one predetermined inclination and orientation, so this arrangement makes it possible to discriminate face images of faces at any inclination and orientation.
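The sub-window scan can be sketched as a generator of window positions. The raster order (left to right, then top to bottom) and the function names are assumptions; the text specifies only the 32 × 32 pixel window and the example step of 2 pixels.

```python
def sub_window_positions(width, height, window=32, step=2):
    """Yield (left, top) of every sub-window that fits inside the image."""
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            yield left, top

def cut_partial_image(image, left, top, window=32):
    """Cut the partial image W out of a 2-D list of pixel values."""
    return [row[left:left + window] for row in image[top:top + window]]

positions = list(sub_window_positions(36, 34))
partial = cut_partial_image([[0] * 36 for _ in range(34)], 4, 2)
```

A larger step makes detection coarser but reduces the number of partial images passed to the classifier group.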

For each partial image W, the classifier group 34 calculates, from the image pattern of the partial image W, a score (index value) SC indicating the probability that the partial image W is a face image containing a face of a predetermined inclination and orientation, doing so for each of the different predetermined inclinations and orientations. A partial image W whose calculated score SC is equal to or greater than a first threshold Th1 is extracted as a candidate for a face image S2 containing a face of the inclination and orientation for which that score was calculated.

FIG. 7 is a diagram showing the configuration of the classifier group 34. As shown in FIG. 7, the classifier group 34 consists of several kinds of classifier groups connected in parallel, each for a different face orientation: a front face classifier group 34_F that mainly discriminates front faces, a left profile classifier group 34_L that mainly discriminates left profiles, and a right profile classifier group 34_R that mainly discriminates right profiles. Each of these three classifier groups in turn consists of classifiers for a total of 12 inclinations that differ by 30 degrees, with the vertical direction of the image as the reference: the front face classifier group 34_F consists of classifiers 34_F0, 34_F30, ..., 34_F330; the left profile classifier group 34_L consists of classifiers 34_L0, 34_L30, ..., 34_L330; and the right profile classifier group 34_R consists of classifiers 34_R0, 34_R30, ..., 34_R330.

As shown in FIG. 7, each of these classifiers has a plurality of weak classifiers WC. A weak classifier WC calculates at least one feature relating to the distribution of the pixel values of the partial image W and uses this feature to calculate a score indicating the probability that the partial image W is a face image containing a face of the predetermined inclination and orientation.

The classifier group 34 described above discriminates three main face orientations, namely front face, left profile, and right profile. To improve detection accuracy for obliquely oriented faces, classifiers that discriminate right oblique faces and left oblique faces may additionally be provided.

The configuration of each classifier in the classifier group 34, the flow of processing in a classifier, and the method of training the classifiers will now be described.

A classifier has a plurality of weak classifiers WC that are effective for discrimination, selected from a large number of weak classifiers WC by the training described later. Each weak classifier WC calculates a feature from the partial image W according to a predetermined algorithm specific to that weak classifier, and, based on that feature and its own histogram (described later), which serves as a predetermined score table, obtains a score sc indicating the probability that the partial image W is a face image containing a face of the predetermined inclination and orientation. The classifier sums the individual scores sc obtained from these weak classifiers WC to obtain the score SC, and extracts a partial image W whose score SC is equal to or greater than the threshold Th1 as a candidate for a face image S2 containing a face of that predetermined inclination and orientation.

FIG. 8 is a flowchart showing the flow of processing in one classifier. When a partial image W is input to the classifier, each of the weak classifiers WC calculates a feature x of a different kind (step S11). For example, as shown in FIG. 9, 4-neighbor pixel averaging (the image is divided into blocks of 2 × 2 pixels, and the mean of the four pixel values in each block becomes the value of the single pixel corresponding to that block) is applied in stages to a partial image W of a predetermined size, for example 32 × 32 pixels, to obtain reduced images of 16 × 16 pixels and 8 × 8 pixels. Two predetermined points set in the planes of these three images, including the original image, form one pair. For each pair in one pair group consisting of several kinds of pairs, the difference in pixel value (luminance) between the two points is calculated, and the combination of these difference values is used as the feature. The two predetermined points of each pair are, for example, two points aligned vertically or two points aligned horizontally, chosen so that the characteristic light and dark pattern of a face in the image is reflected. A value corresponding to the combination of difference values that constitutes the feature is calculated as x. Next, according to the value x, each weak classifier obtains from its predetermined score table (its own histogram) a score sc indicating the probability that the partial image W is a face image containing the face to be discriminated (for example, in the case of the classifier 34_F30, a face whose orientation is frontal and whose inclination is a rotation angle of 30 degrees) (step S12). The individual scores sc calculated by the weak classifiers are summed to obtain the score SC (step S13), and it is determined whether the score SC is equal to or greater than the first threshold Th1 (step S14). If so, the partial image W is extracted as a candidate for a face image S2 containing a face of the predetermined inclination and orientation that this classifier discriminates (step S15).
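Steps S11 to S14 can be sketched as follows. The data layout is hypothetical: each weak classifier is represented as a tuple of a pair list and a score table (a dict from feature value x to score sc), each point is a (pyramid level, row, column) triple, and the differences are quantized into a few uniform bins over an assumed range of −255 to 255. None of these representational choices comes from the specification.

```python
def average_4_neighbors(image):
    """Halve the resolution by averaging each 2 x 2 block of pixels."""
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(len(image[0]) // 2)]
            for r in range(len(image) // 2)]

def build_pyramid(partial_image):
    """Original image plus its two successively reduced versions."""
    half = average_4_neighbors(partial_image)
    return [partial_image, half, average_4_neighbors(half)]

def quantize(difference, n_levels, max_value=255):
    """Map a difference in [-max_value, max_value] to a bin 0..n_levels-1."""
    scaled = (difference + max_value) / (2.0 * max_value)
    return min(n_levels - 1, int(scaled * n_levels))

def feature_value(pyramid, pairs, n_levels):
    """Encode the combination of quantized differences as one integer x."""
    x = 0
    for (lv_a, r_a, c_a), (lv_b, r_b, c_b) in pairs:
        diff = pyramid[lv_a][r_a][c_a] - pyramid[lv_b][r_b][c_b]
        x = x * n_levels + quantize(diff, n_levels)
    return x

def classify(partial_image, weak_classifiers, th1, n_levels=4):
    """Return (SC, is_candidate) for one classifier."""
    pyramid = build_pyramid(partial_image)
    total = 0.0
    for pairs, score_table in weak_classifiers:
        x = feature_value(pyramid, pairs, n_levels)   # S11
        total += score_table.get(x, 0.0)              # S12 and S13
    return total, total >= th1                        # S14

uniform = [[100] * 32 for _ in range(32)]
pairs = [((0, 5, 5), (0, 10, 5)), ((1, 3, 3), (2, 1, 1))]
score, is_candidate = classify(uniform, [(pairs, {10: 1.5})], th1=1.0)
```

Encoding the combination as a single integer lets the score table be indexed directly by x, which is one simple way to realize the histogram lookup described in the text.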

Next, the method of training (generating) a classifier will be described.

FIG. 10 is a flowchart showing the method of training a classifier. Training uses a plurality of sample images that have been standardized to a predetermined size, for example 32 × 32 pixels, and subjected to the same processing as the normalization performed by the normalization unit 20 described above. As sample images, a plurality of different face sample images known to be faces (a face sample image group) and a plurality of different non-face sample images known not to be faces (a non-face sample image group) are prepared.

For the face sample image group, multiple variations of each face sample image are used: the image is scaled vertically and/or horizontally in steps of 0.1 over the range 0.7 to 1.2, and each resulting image is rotated in the image plane in steps of 3 degrees over the range ±15 degrees. The size and position of the face in each face sample image are standardized so that the eyes lie at predetermined positions, and the in-plane rotation and scaling are performed with the eye positions as the reference. For example, for a sample image of size d × d, as shown in FIG. 11, the size and position of the face are standardized so that the two eyes lie at the positions reached by moving from the top-left vertex and the top-right vertex of the sample image, respectively, inward by d/4 and downward by d/4, and the in-plane rotation and scaling are performed about the midpoint between the eyes.

Each of these sample images is assigned a weight, that is, an importance. First, the initial weights of all sample images are set equally to 1 (step S21).

Next, with two predetermined points set in the planes of a sample image and its reduced images forming one pair, several kinds of pair groups, each consisting of several pairs, are defined, and a weak classifier is created for each of these pair groups (step S22). Each weak classifier provides a criterion for discriminating between face images and non-face images using the combination of the differences in pixel value (luminance) between the two points of each pair in one pair group, where the pairs are pairs of predetermined points set in the planes of the partial image cut out by the sub-window W and its reduced images. In this embodiment, a histogram of the combinations of the pixel value differences between the two points of each pair in one pair group is used as the basis of the score table of the weak classifier.

FIG. 12 is a diagram showing how a histogram is generated from sample images. As shown in the sample images on the left side of FIG. 12, the pairs in the pair group used to create this classifier are defined on a plurality of sample images known to be faces as follows. On the sample image, P1 is a point at the center of the right eye, P2 a point on the right cheek, and P3 a point between the eyebrows. On the 16 × 16 pixel image obtained by reducing the sample image with 4-neighbor pixel averaging, P4 is a point at the center of the right eye and P5 a point on the right cheek. On the 8 × 8 pixel image obtained by a further reduction with 4-neighbor pixel averaging, P6 is a point on the forehead and P7 a point on the mouth. The five pairs are P1-P2, P1-P3, P4-P5, P4-P6, and P6-P7. The coordinates of the two points of each pair in a pair group used to create a given classifier are the same in all sample images. For all sample images known to be faces, the combination of the pixel value differences between the two points of each of the five pairs is obtained, and a histogram of these combinations is created. The number of values that a combination of pixel value differences can take depends on the number of luminance gradations of the image. With 16-bit gradation, for example, there are 65536 possible values for each pixel value difference, and the total is the number of gradations raised to the power of the number of pairs, that is, 65536 to the fifth power, so that training and detection would require an enormous number of samples and a great deal of time and memory. In this embodiment, therefore, the pixel value differences are divided into intervals of suitable width and quantized into n values (for example, n = 100). The number of combinations of pixel value differences then becomes n to the fifth power, which reduces the amount of data needed to represent them.
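The size of the reduction can be checked with a short calculation. The sketch below simply evaluates the two counts given in the text for five pairs; the function name is illustrative.

```python
def combination_count(values_per_difference, number_of_pairs):
    """Number of distinct combinations of per-pair difference values."""
    return values_per_difference ** number_of_pairs

unquantized = combination_count(65536, 5)   # 16-bit gradation, 5 pairs
quantized = combination_count(100, 5)       # n = 100 after quantization
reduction_factor = unquantized // quantized
```

Here the unquantized count equals 2 to the 80th power, whereas the quantized count is 10 to the 10th power, a reduction of roughly fourteen orders of magnitude.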

A histogram is created in the same way for the non-face sample image group. For the non-face sample images, the positions corresponding to the two predetermined points of each pair on the face sample images are used (the same reference symbols P1 to P7 are used). The histogram shown at the far right of FIG. 12, which serves as the basis of the score table of the weak classifier, is obtained by taking the logarithm of the ratio of the frequency values of these two histograms. The value on the vertical axis of this weak classifier histogram is hereinafter called a discrimination point. According to this weak classifier, an image whose combination of pixel value differences corresponds to a positive discrimination point is likely to be a face, and the larger the absolute value of the discrimination point, the higher that likelihood. Conversely, an image whose combination of pixel value differences corresponds to a negative discrimination point is likely not to be a face, and again the larger the absolute value of the discrimination point, the higher that likelihood. In step S22, weak classifiers in this histogram form are created for the combinations of pixel value differences between the two predetermined points of each pair in the several kinds of pair groups that may be used for discrimination.
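The construction of discrimination points from the two histograms can be sketched as follows. Each sample is assumed to be already reduced to its quantized feature value x; the additive smoothing term, which avoids division by zero and log(0) for empty bins, is an assumption not found in the specification.

```python
import math
from collections import Counter

def build_score_table(face_features, non_face_features, smoothing=1.0):
    """Discrimination point per feature value: log of the frequency ratio.

    The smoothing term is an assumption added to handle empty bins.
    """
    face_hist = Counter(face_features)
    non_face_hist = Counter(non_face_features)
    bins = set(face_hist) | set(non_face_hist)
    face_total = len(face_features) + smoothing * len(bins)
    non_face_total = len(non_face_features) + smoothing * len(bins)
    table = {}
    for x in bins:
        p_face = (face_hist[x] + smoothing) / face_total
        p_non_face = (non_face_hist[x] + smoothing) / non_face_total
        table[x] = math.log(p_face / p_non_face)
    return table

table = build_score_table([1, 1, 1, 2], [2, 2, 2, 1])
```

In this toy example the feature value 1 occurs mostly in face samples and receives a positive discrimination point, while the value 2 receives a negative one of equal magnitude.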

Next, from the weak classifiers created in step S22, the weak classifier that is most effective for determining whether an image is a face image is selected. The selection takes the weight of each sample image into account. In this example, the weighted correct answer rates of the weak classifiers are compared, and the weak classifier with the highest weighted correct answer rate is selected (step S23). In the first execution of step S23, the weights of all sample images are equally 1, so the weak classifier that correctly determines whether the image is a face image for the largest number of sample images is simply selected as the most effective one. In the second execution of step S23, after the weights of the sample images have been updated in step S25 described later, sample images with weight 1, weight greater than 1, and weight less than 1 are mixed, and in evaluating the correct answer rate a sample image with weight greater than 1 counts for more than a sample image with weight 1, in proportion to its weight. As a result, from the second execution of step S23 onward, more emphasis is placed on correctly discriminating sample images with large weights than those with small weights.

Next, it is checked whether the correct answer rate of the combination of weak classifiers selected so far exceeds a predetermined threshold (step S24). This correct answer rate is the rate at which the result of determining whether each sample image is a face image, using the weak classifiers selected so far in combination (at the training stage the weak classifiers need not be combined linearly), agrees with whether the sample image actually is a face image. Either the sample image group with the current weights or the sample image group with equal weights may be used to evaluate the correct answer rate of the combination. If the threshold is exceeded, the weak classifiers selected so far can determine with sufficiently high probability whether an image is a face, and training ends. If the rate is at or below the threshold, the process proceeds to step S26 in order to select an additional weak classifier to be used in combination with those selected so far.

ステップＳ２６では、直近のステップＳ２３で選択された弱判別器が再び選択されないようにするため、その弱判別器が除外される。 In step S26, the weak discriminator selected in the most recent step S23 is excluded so as not to be selected again.

Next, the weights of the sample images that the weak classifier selected in the most recent step S23 could not correctly classify as face or non-face are increased, and the weights of the sample images that it classified correctly are decreased (step S25). The weights are adjusted in this way so that, in selecting the next weak classifier, importance is placed on the images that the already selected weak classifiers failed to classify, and a weak classifier that can classify those images correctly is selected, which increases the benefit of combining weak classifiers.

続いて、ステップＳ２３へと戻り、上記したように重み付き正答率を基準にして次に有効な弱判別器が選択される。 Subsequently, the process returns to step S23, and the next effective weak classifier is selected based on the weighted correct answer rate as described above.

Steps S23 to S26 are repeated, and weak classifiers corresponding to combinations of pixel-value differences between two predetermined points of each pair in a specific pair group are selected as weak classifiers suited to determining whether an image is a face image. When the correct-answer rate checked in step S24 exceeds the threshold, the types of weak classifiers to be used for face/non-face determination and their determination conditions are fixed (step S27), and learning ends. The selected weak classifiers are linearly combined in descending order of weighted correct-answer rate to form one classifier. For each weak classifier, a score table for calculating a score from a combination of pixel-value differences is generated on the basis of the histogram obtained for it. The histogram itself may also be used as the score table, in which case the determination points of the histogram serve directly as scores.
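The loop of steps S23 to S27 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function name `train_boosted`, the majority-vote evaluation used for step S24, the target accuracy, and the weight factors 2.0 and 0.5 used for step S25 are all hypothetical, since the description does not specify them. The weak classifiers here return +1 or -1 rather than histogram-based scores.

```python
def train_boosted(weak_classifiers, samples, labels, target_accuracy=0.95):
    """Greedy selection of weak classifiers with sample re-weighting.

    weak_classifiers: callables mapping a sample to +1 (face) or -1 (non-face).
    labels: ground-truth labels, +1 or -1, one per sample.
    Returns a list of (classifier, weighted_accuracy), best first.
    """
    weights = [1.0] * len(samples)      # every sample starts with weight 1
    remaining = list(weak_classifiers)
    selected = []

    def weighted_accuracy(h):
        hit = sum(w for w, x, y in zip(weights, samples, labels) if h(x) == y)
        return hit / sum(weights)

    def combined(x):
        # Assumed combination rule for S24: unweighted majority vote.
        return 1 if sum(h(x) for h, _ in selected) >= 0 else -1

    while remaining:
        # S23: select the classifier with the highest weighted accuracy.
        best = max(remaining, key=weighted_accuracy)
        selected.append((best, weighted_accuracy(best)))

        # S24: stop once the combination is accurate enough.
        correct = sum(combined(x) == y for x, y in zip(samples, labels))
        if correct / len(samples) > target_accuracy:
            break

        # S26: exclude the selected classifier from later rounds.
        remaining.remove(best)

        # S25: raise weights of misclassified samples, lower the others
        # (the factors 2.0 and 0.5 are assumptions).
        weights = [w * (2.0 if best(x) != y else 0.5)
                   for w, x, y in zip(weights, samples, labels)]

    # S27: order by weighted accuracy, highest first.
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected
```

The sketch also stops when the pool of weak classifiers is exhausted, a termination condition that the description does not state.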

このようにして、顔サンプル画像群と非顔サンプル画像群とを用いた学習により、判別器が生成されるわけであるが、上記のように、判別したい顔の傾きおよび向き毎に異なる複数の判別器を生成するには、顔の各傾きおよび各向きに対応した複数種類の顔サンプル画像群を用意し、その顔サンプル画像群と非顔サンプル画像群とを用いた学習を顔サンプル画像群の種類毎に行うこととなる。 In this way, the discriminator is generated by learning using the face sample image group and the non-face sample image group. However, as described above, a plurality of different discriminators depending on the inclination and orientation of the face to be discriminated. In order to generate a discriminator, a plurality of types of face sample image groups corresponding to each inclination and direction of the face are prepared, and learning using the face sample image group and the non-face sample image group is performed. It will be done for each type.

That is, in the present embodiment, a total of 36 types of face sample image groups are prepared: three face orientations (front, left profile, and right profile) combined with twelve inclinations (rotation angles from 0 degrees to 330 degrees in 30-degree steps).

Once these face sample image groups have been obtained, the learning described above is performed for each type of face sample image group, using that face sample image group and the non-face sample image group, whereby the plurality of classifiers constituting the classifier group 34 can be generated.

このように、顔の向き毎に、かつ、顔の傾き毎に学習された複数の判別器を用いることにより、多種の傾きおよび向きの顔を含む顔画像を判別することが可能となる。 As described above, by using a plurality of discriminators learned for each face orientation and each face inclination, it is possible to discriminate face images including faces having various inclinations and orientations.

When the above learning method is adopted, the weak classifier is not limited to the histogram form described above and may take any form, for example binary data, a threshold, or a function, as long as it provides a criterion for distinguishing face images from non-face images using a combination of pixel-value differences between two predetermined points of each pair in a specific pair group. Even within the histogram form, a histogram showing the distribution of the differences between the two histograms shown in the center of FIG. 12 may be used instead.

また、学習の方法としては上記手法に限定されるものではなく、ニューラルネットワーク等他のマシンラーニングの手法を用いることができる。 Further, the learning method is not limited to the above method, and other machine learning methods such as a neural network can be used.

When face image candidates have been extracted by the plurality of types of classifiers, the detection control unit 31 determines, as mentioned above, whether each candidate is a true face image S2. Specifically, the detection control unit 31 determines as a high-reliability face image any candidate whose calculated score SC is equal to or greater than a second threshold Th2, which is larger than the first threshold Th1. For each candidate whose face inclination is substantially the same as that of the high-reliability face image, it then increases the candidate's score SC, for example by adding a special score. Candidates whose score SC after this processing is equal to or greater than a third threshold Th3, a value between the first threshold Th1 and the second threshold Th2, are detected as true face images S2. A candidate may be regarded as having substantially the same face inclination as the high-reliability face image when, for example, its face inclination is within a rotation angle of ±30 degrees of the face inclination in the high-reliability face image.

For each face image S2 detected on the resolution images, the duplicate detection determination unit 40 determines, from the positional relationship of the face images, whether that face image represents the same face on the detection target image S0 and has been detected redundantly on a plurality of resolution images of adjacent resolution. It merges face images judged to have been detected redundantly into one, and outputs true face images S3 free of duplicate detections.

When the detection target image S0 is converted to multiple resolutions to obtain a plurality of resolution images, the resolution gap between adjacent resolution images cannot be made very large, in order to avoid missing face images. In addition, a classifier normally tolerates a certain range of face sizes. As a result, the same face on the detection target image S0 may be detected redundantly in a plurality of adjacent resolution images. The processing by the duplicate detection determination unit 40 described above is performed to eliminate such duplicate detections and obtain an accurate detection result.

次に、顔検出システム１における処理の流れについて説明する。 Next, the flow of processing in the face detection system 1 will be described.

FIGS. 13a and 13b are flowcharts showing the flow of processing in the face detection system 1. As shown in these figures, when the detection target image S0 is supplied to the multi-resolution unit 10 (step S31), an image S0′ is generated by converting the image size of the detection target image S0 to a predetermined size, and a resolution image group S1 is generated from the image S0′, consisting of a plurality of resolution images each reduced in size (resolution) by a factor of 2^(-1/3) relative to the previous one (step S32). The normalization unit 20 then applies the global normalization processing and local normalization processing described above to each resolution image of the resolution image group S1 to obtain a normalized resolution image group S1′ (step S33).
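The size sequence produced in step S32 can be illustrated with a short sketch. The function name `pyramid_sizes`, the base size of 416 pixels, and the minimum size of 32 pixels are assumptions introduced for illustration; this part of the description states only the reduction factor 2^(-1/3).

```python
def pyramid_sizes(base_size, min_size):
    """Return side lengths of successive resolution images, largest first.

    Each image is 2**(-1/3) times the size of the previous one, so the
    size halves every three steps. min_size (an assumed stopping point)
    must be at least 1.
    """
    sizes = []
    k = 0
    while True:
        size = round(base_size * 2 ** (-k / 3))
        if size < min_size:
            break
        sizes.append(size)
        k += 1
    return sizes


sizes = pyramid_sizes(416, 32)
print(sizes)
```

Because the factor is the cube root of 1/2, a face of a given size on S0′ appears at roughly the size a classifier expects in at least one of three consecutive resolution images, which is consistent with the remark that the resolution gap between adjacent images must not be too large.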

In the face detection unit 30, the resolution image selection unit 32, on instruction from the detection control unit 31, selects a resolution image S1′_i from the resolution image group S1′ in ascending order of image size, that is, in the order S1′_n, S1′_n-1, ..., S1′_1 (step S34). Next, the sub-window setting unit 33 sets a sub-window on the resolution image S1′_i while moving it at a predetermined pitch, for example every two pixels, sequentially cuts out partial images W of a predetermined size (step S35), and inputs each partial image W to the classifier group 34 (step S36). Each classifier in the classifier group 34 uses a plurality of weak classifiers to calculate a score SC indicating the probability that the input partial image W is a face image containing a face of a predetermined inclination and orientation (step S37). Partial images W whose calculated score SC is equal to or greater than the first threshold Th1 are extracted as face image candidates, and the detection control unit 31 acquires the extraction result R (step S38).
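Steps S35 to S38 for a single resolution image and a single classifier can be sketched as below. This is a simplified illustration: the function name `extract_candidates`, the representation of the image as a list of pixel rows, and the `score_fn` callable standing in for one classifier of the classifier group 34 are all assumptions.

```python
def extract_candidates(image, window, pitch, score_fn, th1):
    """Slide a square sub-window over `image` and keep windows scoring >= th1.

    image: 2-D list of pixel values (rows of equal length).
    window: side length of the sub-window in pixels.
    pitch: step between successive sub-window positions (e.g. 2 pixels).
    score_fn: stand-in for a classifier; maps a patch to a score SC.
    Returns a list of (top, left, score) tuples.
    """
    height, width = len(image), len(image[0])
    candidates = []
    for top in range(0, height - window + 1, pitch):
        for left in range(0, width - window + 1, pitch):
            # S35: cut out the partial image W.
            patch = [row[left:left + window]
                     for row in image[top:top + window]]
            # S36-S37: compute the score for W.
            score = score_fn(patch)
            # S38: keep W as a candidate if the score reaches Th1.
            if score >= th1:
                candidates.append((top, left, score))
    return candidates
```

In the embodiment every partial image is scored by all classifiers of the group, one per inclination and orientation; the sketch would be called once per classifier to reproduce that.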

It is then determined whether the current partial image W is the last partial image on the current resolution image (step S39). If it is not the last partial image, the process returns to step S35, a new partial image W is cut out on the current resolution image, and detection continues. If it is the last partial image, it is next determined whether the current resolution image is the last resolution image (step S40). If it is not the last resolution image, the process returns to step S34, a new resolution image is selected, and extraction of face image candidates continues. If it is the last resolution image, the detection control unit 31 determines whether each face image candidate is a true face image S2. That is, the detection control unit 31 determines as a high-reliability face image any candidate whose calculated score SC is equal to or greater than the second threshold Th2, which is larger than the first threshold Th1 (step S41). For each candidate whose face inclination is substantially the same as that of the high-reliability face image, that is, within a rotation angle of ±30 degrees of it, the detection control unit 31 adds a special score to the candidate's score SC to obtain a new score SC (step S42). Candidates whose score SC after this addition is equal to or greater than the third threshold Th3, a value between the first threshold Th1 and the second threshold Th2, are detected as true face images S2 (step S43).

The duplicate detection determination unit 40 then determines, for each detected face image and on the basis of the positional relationship of the detected face images, whether that face image represents the same face on the detection target image S0 and has been detected redundantly on a plurality of resolution images of adjacent resolution, and merges face images judged to have been detected redundantly into one (step S43).

図１４は、上記のステップＳ３４からステップＳ４０までを繰り返すことにより、解像度画像がサイズの小さい順に選択されて、各解像度画像上で部分画像Ｗが順次切り出され、顔検出が実施される様子を示した図である。 FIG. 14 shows how the resolution images are selected in ascending order of size by repeating the above steps S34 to S40, the partial images W are sequentially cut out on each resolution image, and face detection is performed. It is a figure.

FIG. 15 shows a snapshot image as an example of the detection target image S0. The snapshot contains the faces C1 to C3 of three people, and part of one face, C3, is occluded by another subject. When face detection is applied to such an image, the features of the two unoccluded faces C1 and C2 are captured well, so relatively high scores are calculated (for example, 70 points and 65 points, respectively). These scores are at or above the first threshold Th1 (for example, 30 points), so both are extracted as face image candidates, and they are also at or above the third threshold Th3 (for example, 50 points), so both are detected as face images. For the occluded face C3, the facial features cannot be captured well, and a relatively low score (for example, 40 points) would normally be calculated; C3 is extracted as a face image candidate but may not be detected as a face image because its score is below the third threshold Th3. However, the inclinations of multiple faces in the same image are often substantially the same. If, as in the above embodiment, the candidate whose calculated score is at or above the second threshold Th2 (for example, 70 points), here C1, is determined as a high-reliability face image, and a special score (for example, 20 points) is added to the face image candidates (C2, C3) whose faces have substantially the same inclination as the face in the high-reliability face image, then even the face C3, whose features cannot be captured well because of occlusion or the like, can be expected to obtain a score above the third threshold Th3 (60 points in this example).
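The arithmetic of this example can be reproduced with a small sketch of steps S41 to S43. The threshold values and scores are those given in the text; the inclination angles assigned to C1 to C3, the extra candidate "X", and the function names are hypothetical. The sketch also assumes that a high-reliability face image does not receive a bonus from itself, which matches the example (only C2 and C3 are boosted) but is not stated as a general rule.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two rotation angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def detect_faces(candidates, th2, th3, bonus, tolerance=30):
    """candidates: dict name -> (score, inclination_degrees), scores >= Th1.

    Returns dict name -> final score for candidates detected as faces.
    """
    # S41: candidates with score >= Th2 are high-reliability face images.
    trusted = {n for n, (s, _) in candidates.items() if s >= th2}
    detected = {}
    for name, (score, tilt) in candidates.items():
        # S42: add the special score if the inclination is within the
        # tolerance of some other high-reliability face image.
        near_trusted = any(
            other != name
            and angle_diff(tilt, candidates[other][1]) <= tolerance
            for other in trusted
        )
        final = score + bonus if near_trusted else score
        # S43: detect candidates whose final score reaches Th3.
        if final >= th3:
            detected[name] = final
    return detected


candidates = {
    "C1": (70, 0),    # unoccluded, high score
    "C2": (65, 0),    # unoccluded
    "C3": (40, 30),   # partly occluded, low score
    "X": (35, 180),   # hypothetical upside-down false candidate
}
print(detect_faces(candidates, th2=70, th3=50, bonus=20))
```

With these inputs C3 rises from 40 to 60 points and is detected, while the hypothetical candidate X, whose inclination differs from that of C1, stays at 35 points and is rejected.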

According to the face detection system of the present embodiment, partial images whose index value, indicating the probability of being a face image, is equal to or greater than the first threshold are extracted from the detection target image as face image candidates. Among the extracted candidates, those whose index value is equal to or greater than the second threshold, which is higher than the first threshold, are determined as high-reliability face images. The index values of candidates whose face inclination is substantially the same as that of a high-reliability face image are increased, and candidates whose index value is equal to or greater than the third threshold, a value between the first and second thresholds, are detected as face images. Based on the empirical rule that multiple faces in the same image often have substantially the same inclination, the index value of each face image candidate can thus reflect its reliability as a candidate more appropriately, and both missed detections and false detections of face images can be suppressed.

なお、上記第１の実施形態においては、抽出された顔画像の候補の中で、算出されたスコアが第２の閾値Ｔｈ２以上である顔画像の候補を高信頼顔画像として決定しているが、これとは別の手法として、例えば、抽出された顔画像の候補の中で、算出されたスコアが最も高い候補を高信頼顔画像として決定するようにしてもよい。 In the first embodiment, a face image candidate whose calculated score is greater than or equal to the second threshold Th2 is determined as a highly reliable face image among the extracted face image candidates. As another technique, for example, among the extracted face image candidates, a candidate having the highest calculated score may be determined as a highly reliable face image.

In the first embodiment described above, after a high-reliability face image with a certain high level of reliability has been determined, the scores of face image candidates whose face inclination is substantially the same as that of the high-reliability face image are increased. As an alternative, after the high-reliability face image has been determined, only those face image candidates whose face inclination is substantially the same as that of the high-reliability face image may be detected as true face images (this corresponds to an embodiment of the second face detection apparatus according to the present invention).

このようにすれば、検出対象となる画像から顔画像である蓋然性を示す指標値が第１の閾値以上である部分画像を顔画像の候補として抽出し、抽出された顔画像の候補の中で、その指標値が第１の閾値よりさらに高い第２の閾値以上である候補を高信頼顔画像として決定し、顔の傾きがこの高信頼顔画像と略同じである候補のみを顔画像として検出するので、同一の画像に含まれる複数の顔の傾きは略同じであることが多いという経験則に基づいて、信頼度が低いと考えられる顔画像の候補を排除することができ、顔画像の誤検出を抑制することが可能となる。 In this way, a partial image whose index value indicating the probability of being a face image is equal to or more than the first threshold is extracted as a face image candidate from the detection target image, and the extracted face image candidates are extracted. A candidate whose index value is equal to or higher than a second threshold value higher than the first threshold value is determined as a highly reliable face image, and only candidates whose face inclination is substantially the same as the highly reliable face image are detected as face images. Therefore, based on an empirical rule that the inclinations of a plurality of faces included in the same image are often substantially the same, it is possible to eliminate face image candidates that are considered to be low in reliability, and It becomes possible to suppress erroneous detection.

次に、本発明による第２の顔検出装置の実施形態（第２の実施形態）について説明する。本実施形態による顔検出システムは、第１の実施形態の場合と同様、図１の示すように、多重解像度化部１０、正規化部２０、顔検出部３０、重複検出判定部４０により構成されており、さらに、顔検出部３０は、検出制御部３１（高信頼顔画像決定手段、顔画像検出手段）、解像度画像選択部３２、サブウィンドウ設定部（部分画像切出し手段）３３、判別器群（指標値算出手段、顔画像候補抽出手段）３４により構成されるものであるが、顔検出部３０における処理が、第１の実施形態とは異なるものである。 Next, an embodiment (second embodiment) of a second face detection apparatus according to the present invention will be described. As in the case of the first embodiment, the face detection system according to the present embodiment is configured by a multi-resolution unit 10, a normalization unit 20, a face detection unit 30, and an overlap detection determination unit 40, as shown in FIG. Further, the face detection unit 30 includes a detection control unit 31 (highly reliable face image determination unit, face image detection unit), a resolution image selection unit 32, a sub window setting unit (partial image cutout unit) 33, a classifier group ( The index value calculation means and the face image candidate extraction means) 34 are configured, but the processing in the face detection unit 30 is different from that in the first embodiment.

In the present embodiment, as in the first embodiment, the classifier group 34 consists of a total of 36 types of classifiers (twelve face inclinations, from 0 degrees to 330 degrees in 30-degree steps, combined with three face orientations: front, left profile, and right profile). Here, however, these classifiers are divided into four groups according to the face inclination to be determined, as shown in FIG. 17. Classifiers for face inclinations of 330, 0, and 30 degrees form the first group (0-degree group); classifiers for face inclinations of 60, 90, and 120 degrees form the second group (90-degree group); classifiers for face inclinations of 240, 270, and 300 degrees form the third group (270-degree group); and classifiers for face inclinations of 150, 180, and 210 degrees form the fourth group (180-degree group).
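The grouping just described can be expressed compactly, as in the sketch below. The names `GROUP_ORDER`, `tilt_group`, and `build_groups` are hypothetical; the mapping itself (each group covering its center angle ±30 degrees, applied in the order 0, 90, 270, 180 degrees) follows the text.

```python
# Center angles of the four groups, in the order they are applied
# (first to fourth group).
GROUP_ORDER = [0, 90, 270, 180]


def tilt_group(tilt):
    """Map a classifier inclination (a multiple of 30 degrees) to the
    center angle of its group: the nearest multiple of 90 degrees."""
    return (round((tilt % 360) / 90) * 90) % 360


def build_groups():
    """Return dict: group center angle -> list of member inclinations."""
    groups = {center: [] for center in GROUP_ORDER}
    for tilt in range(0, 360, 30):
        groups[tilt_group(tilt)].append(tilt)
    return groups


for center, tilts in build_groups().items():
    print(center, tilts)
```

Each inclination belongs to exactly one group, and with three face orientations per inclination each group contains nine of the 36 classifiers.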

そして、検出制御部３１が、これらの判別器を、解像度画像上でサブウィンドウ設定部３３により切り出されたすべての部分画像Ｗに対して、第１のグループから第４のグループまで順番に適用して顔画像の候補を検出する。 Then, the detection control unit 31 applies these discriminators in order from the first group to the fourth group to all the partial images W cut out by the sub-window setting unit 33 on the resolution image. Face image candidates are detected.

If, during this process, a classifier of some group calculates a score at or above the second threshold Th2, a score high enough to be trusted, and a high-reliability face image is thereby detected, the detection control unit 31 thereafter does not apply the classifiers of the groups that come later in the order of application to the partial images W. In other words, the face inclination to be determined is fixed to substantially the same inclination as that of the face in the high-reliability face image. The reason is the empirical rule that, when one image contains multiple faces, the inclinations of the faces (their rotational positions on the image) are usually aligned. Detection can therefore be omitted for the face inclinations corresponding to the later groups, which speeds up face detection. However, to cope with tilted heads and the like, classifiers for neighboring face inclinations are collected into groups as described above and applied group by group, so that a variation in inclination of about ±30 degrees is tolerated.

As a specific example, suppose that, after face image candidates have been detected with the classifiers of the first group, a score at or above the second threshold Th2 is calculated and a high-reliability face image is detected while candidates are being detected with the classifier for a face inclination of 60 degrees, which belongs to the second group. In that case, the classifiers of the third and fourth groups, which come later in the order, are not used thereafter. Let the average detection processing time for each group be 1. Without fixing the face inclination to be determined, the processing takes 1 × 4 groups = 4. With fixing, and assuming that the probability of detecting face image candidates with each group's classifiers is equal regardless of face inclination (that is, the probability that a face is detected is 1/4 for each group), the expected computation time is 1 × 1/4 + 2 × 1/4 + 3 × 1/4 + 4 × 1/4 = 2.5. Fixing the face inclination to be determined therefore shortens the detection processing time.
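The expected-time figure can be checked with a few lines of code. The sketch assumes, as the text does, that the high-reliability face is found in group k with equal probability for every group and that processing then costs k units; the function name `expected_time` is hypothetical.

```python
from fractions import Fraction


def expected_time(num_groups, time_per_group=1):
    """Expected processing time when detection stops at the group that
    contains the high-reliability face, each group being equally likely."""
    p = Fraction(1, num_groups)
    return sum(k * time_per_group * p for k in range(1, num_groups + 1))


print(float(expected_time(4)))  # with inclination fixing
print(4 * 1)                    # without fixing: all four groups applied
```

Under these assumptions the expected cost is 5/2 time units, compared with 4 units when all four groups are always applied.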

なお、１８０度のグループを最後に検出するのは、天地が逆さになっている顔は、他の３つのグループに対応する顔の傾きに比べて最も存在確率が低いという経験則による。 The reason why the 180 degree group is detected lastly is based on an empirical rule that a face whose top and bottom are upside down has the lowest existence probability compared to the inclination of the face corresponding to the other three groups.

When a high-reliability face image is detected, the face image candidates detected with the classifiers of earlier groups are regarded as false detections and deleted. Suppose, for example, that a high-reliability face image is detected while the classifiers of the second group are in use and the face inclination to be determined is fixed. The classifiers of the third and fourth groups are then not used, so faces of the inclinations corresponding to those groups are not searched for and no false detections can occur for them. With the classifiers of the first group, however, face image candidates with relatively low scores may already have been detected. These are likely to be false detections, so treating the detection results of the first group's classifiers as false detections and deleting them suppresses false detection.
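The combined effect of group fixing and deletion of earlier candidates can be sketched as follows. This is a deliberately simplified illustration and not the flow of FIGS. 16a and 16b: it works on precomputed scores for a single image instead of iterating over resolution images and partial images, and the function name and data layout are assumptions.

```python
def detect_with_group_fixing(scores_by_group, group_order, th1, th2):
    """Apply classifier groups in order and fix the group once a
    high-reliability face image is found.

    scores_by_group: dict group -> list of (window_id, score) pairs that
        the classifiers of that group would produce (precomputed here).
    Returns a list of (group, window_id, score) candidates.
    """
    kept = []
    for group in group_order:
        # Candidates of this group: score at or above Th1.
        hits = [(group, w, s)
                for w, s in scores_by_group.get(group, [])
                if s >= th1]
        if any(s >= th2 for _, _, s in hits):
            # High-reliability face found: candidates from earlier
            # groups are treated as false detections and dropped, and
            # later groups are never applied.
            return hits
        kept.extend(hits)
    return kept


scores = {
    0: [("w1", 35)],                 # low-score candidate, likely false
    90: [("w2", 75), ("w3", 40)],    # w2 is a high-reliability face
    270: [("w4", 45)],               # never evaluated once 90 is fixed
    180: [],
}
print(detect_with_group_fixing(scores, [0, 90, 270, 180], th1=30, th2=70))
```

If no group yields a score at or above Th2, all four groups are applied and every candidate at or above Th1 is kept, which corresponds to the case where no fixing occurs.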

これより、第２の実施形態である顔検出システムにおける処理の流れについて説明する。 Hereafter, the flow of the process in the face detection system which is 2nd Embodiment is demonstrated.

FIGS. 16a and 16b are flowcharts showing the flow of processing in the face detection system of the second embodiment. As shown in these figures, when the detection target image S0 is supplied to the multi-resolution unit 10 (step S51), an image S0′ is generated by converting the image size of the detection target image S0 to a predetermined size, and a resolution image group S1 is generated from the image S0′, consisting of a plurality of resolution images each reduced in size by a factor of 2^(-1/3) relative to the previous one (step S52). The normalization unit 20 then applies the global normalization processing and local normalization processing described above to each resolution image of the resolution image group S1 to obtain a normalized resolution image group S1′ (step S53).

顔検出部３０においては、検出制御部３１からの指示を受けた解像度画像選択部３２により、解像度画像群Ｓ１′の中から画像サイズの小さい順、すなわち、Ｓ１′＿ｎ，Ｓ１′＿ｎ−１，・・・，Ｓ１′＿１の順に所定の解像度画像Ｓ１′＿ｉを選択する（ステップＳ５４）。 In the face detection unit 30, the resolution image selection unit 32 that has received an instruction from the detection control unit 31 starts from the resolution image group S 1 ′ in ascending order of image size, that is, S 1 ′ _n, S 1 ′ _n−1, ..., a predetermined resolution image S1'_i is selected in the order of S1'_1 (step S54).

The detection control unit 31 then selects the group of classifiers to be applied to the partial images W (step S55). Each time step S55 is executed, a different group is selected in a predetermined order, and the order is reset whenever a new resolution image is selected. If the group fixing described later has already taken place, only the fixed group is selected.

Next, the sub-window setting unit 33 sets a sub-window on the resolution image S1′_i while moving it at a predetermined pitch, for example every two pixels, sequentially cuts out partial images W of a predetermined size (step S56), and inputs each partial image W to the classifiers of the selected group within the classifier group 34 (step S57). Each classifier that receives the partial image W uses a plurality of weak classifiers to calculate a score SC indicating the probability that the partial image W is a face image containing a face of the inclination and orientation corresponding to that classifier (step S58). Partial images W whose calculated score SC is equal to or greater than the first threshold Th1 are extracted as face image candidates, and the detection control unit 31 acquires the extraction result R (step S59).

When a face image candidate has been extracted, the detection control unit 31 determines whether the group of discriminators to be applied to the partial images W has already been fixed (step S60). If the group has already been fixed, the process proceeds to step S63. If the group has not yet been fixed, the detection control unit 31 further determines whether the score SC of the face image candidate is equal to or greater than the second threshold Th2 (step S61). If the score SC is equal to or greater than the second threshold Th2, the candidate is determined to be a highly reliable face image, and the group of discriminators used from then on is fixed to the currently selected group (step S62).

It is then determined whether the current partial image W is the last partial image on the current resolution image (step S63). If it is not the last partial image, the process returns to step S56, a new partial image W is cut out from the current resolution image, and the detection process continues. If it is the last partial image, it is determined whether the currently selected group of discriminators is the last group (step S64). If the currently selected group is the last group, the process proceeds to step S65. If it is not the last group, a group still remains to be selected, so the process returns to step S55 and the next group of discriminators is selected.

In step S65, it is determined whether the current resolution image is the last resolution image. If it is not, the process returns to step S54, where a new resolution image is selected and the extraction of face image candidates continues. If it is the last resolution image, the detection control unit 31 deletes the candidates that were extracted by discriminators of the groups used before the group was fixed, and determines the remaining face image candidates to be true face images (step S66). In other words, when a highly reliable face image has been detected, only those face image candidates whose face inclination is substantially the same as that of the highly reliable face image are detected as face images.
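The control flow of steps S54 to S66 can be sketched as follows. This is one possible reading of the flowchart, not the implementation of the embodiment. The function name, the candidate tuple layout, and the score_fn callback are hypothetical. The sketch also assumes that, once a group has been fixed, the remaining groups are simply skipped, and that step S66 keeps exactly the candidates extracted by the fixed group.

```python
from typing import Callable, List, Optional, Tuple

# (resolution index, window index, group name, score SC); hypothetical layout
Candidate = Tuple[int, int, str, float]


def detect_with_group_fixing(
    num_resolutions: int,
    windows_per_resolution: int,
    groups: List[str],
    score_fn: Callable[[int, int, str], float],
    th1: float,
    th2: float,
) -> List[Candidate]:
    """Extract candidates with SC >= th1, fix the discriminator group once a
    candidate reaches SC >= th2, and finally keep only the fixed group's
    candidates (if a group was fixed at all)."""
    candidates: List[Candidate] = []
    fixed_group: Optional[str] = None
    for res in range(num_resolutions):                    # step S54
        for group in groups:                              # step S55, order restarts per image
            if fixed_group is not None and group != fixed_group:
                continue                                  # only the fixed group is used
            for win in range(windows_per_resolution):     # step S56
                sc = score_fn(res, win, group)            # steps S57-S58
                if sc < th1:
                    continue
                candidates.append((res, win, group, sc))  # step S59
                if fixed_group is None and sc >= th2:     # steps S60-S61
                    fixed_group = group                   # step S62
    if fixed_group is not None:                           # step S66
        candidates = [c for c in candidates if c[2] == fixed_group]
    return candidates
```

If no candidate ever reaches the second threshold, no group is fixed and nothing is deleted, so every extracted candidate is returned as a face image.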

Then, based on the positional relationship of the detected face images, the duplicate detection determination unit 40 determines, for each face image, whether it represents the same face on the input image S0 as another face image and was detected in duplicate on a plurality of resolution images of adjacent resolution. Face images found to have been detected in duplicate are merged into one (step S67).
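The text does not specify the criterion used to decide that two detections are duplicates, so the following is only a sketch under assumed rules. It treats two detections as the same face when they come from the same or adjacent resolution levels and their centres, expressed in input-image coordinates, lie within a fraction of the smaller window size of each other. The Detection type, the max_offset_ratio parameter, and the choice to keep the highest-scoring detection are all hypothetical.

```python
import math
from typing import List, NamedTuple


class Detection(NamedTuple):
    cx: float     # centre x in input-image (S0) coordinates
    cy: float     # centre y in input-image (S0) coordinates
    size: float   # side length of the face window in input-image pixels
    level: int    # index i of the resolution image S1'_i it was found on
    score: float  # score SC


def merge_duplicates(
    dets: List[Detection], max_offset_ratio: float = 0.5
) -> List[Detection]:
    """Greedy merge: visit detections from highest to lowest score and drop
    any detection that duplicates one already kept (assumed criterion)."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d.score, reverse=True):
        duplicate = any(
            abs(det.level - k.level) <= 1
            and math.hypot(det.cx - k.cx, det.cy - k.cy)
            <= max_offset_ratio * min(det.size, k.size)
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept
```

Other merge policies, such as averaging the positions of the duplicates, would fit the description equally well.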

The face detection system according to the embodiments of the present invention has been described above. A program for causing a computer to execute each process in the portion of this face detection system that corresponds to the face detection device of the present invention is also an embodiment of the present invention, as is a computer-readable recording medium on which such a program is recorded.

Block diagram showing the configuration of the face detection system 1
Diagram showing the process of converting the detection target image to multiple resolutions
Diagram showing an example of a conversion curve used in the global normalization process
Diagram showing the concept of the local normalization process
Diagram showing the flow of the local normalization process
Block diagram showing the configuration of the face detection unit 30
Block diagram showing the configuration of the discriminator group
Diagram showing the processing flow in a discriminator
Diagram for explaining the calculation of feature amounts in a weak discriminator
Flowchart showing the learning method of a discriminator
Diagram showing a face sample image standardized so that the eyes are at predetermined positions
Diagram showing a method of deriving the histogram of a weak discriminator
Flowchart showing the processing in the face detection system 1 according to the first embodiment (first half)
Flowchart showing the processing in the face detection system 1 according to the first embodiment (second half)
Diagram for explaining the switching of the resolution image subject to face detection and the movement of the sub-window on that image
Diagram showing a snapshot image as an example of an input image
Flowchart showing the processing in the face detection system 1 according to the second embodiment (first half)
Flowchart showing the processing in the face detection system 1 according to the second embodiment (second half)
Diagram showing how the discriminator group is divided into groups according to the face inclination to be discriminated

Explanation of symbols

1 Face detection system
10 Multi-resolution conversion unit
20 Normalization unit
30 Face detection unit
31 Detection control unit
32 Resolution image selection unit
33 Sub-window setting unit
34 Discriminator group
40 Duplicate detection determination unit

Claims

A face detection method for detecting a face image in an input image, the method comprising:
cutting out partial images at a plurality of different positions on the input image;
calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image;
performing, on those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image, a process of increasing the index value of the candidate; and
detecting, as the face image, each candidate whose index value after the process is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A face detection method for detecting a face image in an input image, the method comprising:
cutting out partial images at a plurality of different positions on the input image;
calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image; and
detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.

A face detection device for detecting a face image in an input image, the device comprising:
partial image cutout means for cutting out partial images at a plurality of different positions on the input image;
index value calculating means for calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
face image candidate extraction means for extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
highly reliable face image determining means for determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image;
index value increasing means for performing, on those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image, a process of increasing the index value of the candidate; and
face image detecting means for detecting, as the face image, each candidate whose index value after the process is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A face detection device for detecting a face image in an input image, the device comprising:
partial image cutout means for cutting out partial images at a plurality of different positions on the input image;
index value calculating means for calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
face image candidate extraction means for extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
highly reliable face image determining means for determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image; and
face image detecting means for detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.

A program for causing a computer to function as a face detection device for detecting a face image in an input image,
the program causing the computer to function as:
partial image cutout means for cutting out partial images at a plurality of different positions on the input image;
index value calculating means for calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
face image candidate extraction means for extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
highly reliable face image determining means for determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image;
index value increasing means for performing, on those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image, a process of increasing the index value of the candidate; and
face image detecting means for detecting, as the face image, each candidate whose index value after the process is equal to or greater than a third threshold, which is a value between the first threshold and the second threshold.

A program for causing a computer to function as a face detection device for detecting a face image in an input image,
the program causing the computer to function as:
partial image cutout means for cutting out partial images at a plurality of different positions on the input image;
index value calculating means for calculating, for each of the partial images cut out at the different positions, an index value indicating the probability that the partial image is a face image including a face with a predetermined inclination, based on feature amounts of the partial image, the index value being calculated for each of a plurality of different inclinations by changing the predetermined inclination;
face image candidate extraction means for extracting, from among the partial images, all partial images whose calculated index value is equal to or greater than a first threshold as face image candidates each including a face with the predetermined inclination for which the index value was calculated;
highly reliable face image determining means for determining, from among the face image candidates, a candidate whose calculated index value is equal to or greater than a second threshold, which is greater than the first threshold, as a highly reliable face image; and
face image detecting means for detecting, as the face image, only those face image candidates whose face inclination is substantially the same as the face inclination in the highly reliable face image.