JP4749884B2 - Learning method of face discriminating apparatus, face discriminating method and apparatus, and program

Info

Publication number: JP4749884B2
Application number: JP2006038925A
Authority: JP
Inventors: 嘉郎北村
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2005-03-07
Filing date: 2006-02-16
Publication date: 2011-08-17
Anticipated expiration: 2026-02-16
Also published as: JP2006285959A

Description

The present invention relates to a learning method for a face discriminating apparatus that determines whether a digital image is a face image, to a face discriminating method and apparatus, and to a program therefor.

Conventionally, the color distribution of a person's face area in a snapshot taken with a digital camera is examined in order to correct the skin color, and persons are recognized in digital video captured by the digital video camera of a surveillance system. In such cases, the face area corresponding to a person's face must be detected in the digital image. For this reason, various methods have been proposed for determining whether an image represents a face.

For example, the method of Non-Patent Document 1 normalizes luminance values, which are the feature amounts used for face detection, and determines whether an image is a face image by referring to the learning result of a neural network trained on faces. The method of Non-Patent Document 2 obtains high-frequency components contained in an image, such as edges, as the feature amounts used for object detection, normalizes them, and determines whether an image represents the object by referring to the result of learning these feature amounts with a machine learning technique called boosting. Because the methods of Non-Patent Documents 1 and 2 normalize the feature amounts used to detect an object such as a face, they can accurately determine whether an image represents the object.

The present applicant has also proposed a method that uses a plurality of discriminators (see Patent Documents 2 to 4). The discriminators are obtained by learning in advance, with a machine learning technique, feature amounts calculated from each image in a large group of sample images consisting of a plurality of sample images known to represent a predetermined object and a plurality of sample images known not to represent it. Given a feature amount as input, each discriminator outputs a reference value for distinguishing images that represent the predetermined object from images that do not. When the weighted sum of the reference values output by the discriminators exceeds a predetermined threshold, the image under discrimination is determined to represent the predetermined object.

A further method has been proposed (see Non-Patent Document 3) that uses a plurality of weak discriminators, each of which determines from an input feature amount whether the image under discrimination represents a face. The weak discriminators are obtained by learning in advance, with a machine learning technique, feature amounts calculated from each image in a large group of sample images consisting of a plurality of sample images known to represent a face and a plurality of sample images known not to represent a face. The weak discriminators are linearly combined into a cascade structure, and the image under discrimination is determined to represent a face only when every weak discriminator determines that it represents a face.

When faces are learned as the sample images using the methods described in Patent Documents 2 to 4 and Non-Patent Document 3, it is possible to determine reliably whether the image under discrimination represents a face.

In addition, the sample images are deformed in stages by stepwise enlargement and reduction or by stepwise rotation, and learning is performed using the sample images obtained at each stage of deformation. As a result, it is possible to determine whether the image under discrimination represents a face even if the scale of the face differs or the face is slightly rotated.

When a plurality of discriminators or weak discriminators are obtained by learning in advance with a machine learning technique, the sample images representing faces are face images in which the vertical (top-to-bottom) direction of the face and the orientation of the face (the direction in which the head is turned) are aligned (see Non-Patent Document 3, FIG. 7, etc.). When face images with aligned vertical direction and orientation are used as sample images, facial parts such as the eyes, nose, and mouth, as well as the face contour, appear at roughly fixed positions in the sample images, so pattern features common to faces are easy to find and an improvement in discrimination accuracy can be expected.

Furthermore, in learning that uses face images with aligned vertical direction and orientation as sample images, the face orientation of the sample images used for learning becomes, as is, the face orientation that can be discriminated. Therefore, to realize multi-directional face detection, which detects faces facing arbitrary directions, a face discriminating means (apparatus) that determines by the above method whether the image under discrimination represents a face is prepared for each face orientation, and these are used simultaneously.

Non-Patent Document 1: Henry A. Rowley, Shumeet Baluja, and Takeo Kanade, "Neural Network-Based Face Detection", volume 20, number 1, pages 23-38, January 1998.
Non-Patent Document 2: Rainer Lienhart, Jochen Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection", International Conference on Image Processing.
Non-Patent Document 3: Shihong LAO et al., "High-speed omnidirectional face detection", Image Recognition and Understanding Symposium (MIRU2004), July 2004, pp. II-271 to II-276.
Patent Document 1: Japanese Patent Laid-Open No. 5-282457
Patent Document 2: Japanese Patent Application No. 2003-316924
Patent Document 3: Japanese Patent Application No. 2003-316925
Patent Document 4: Japanese Patent Application No. 2003-316926

Incidentally, the image features that a face discriminating apparatus should learn are not always the same. In particular, they differ depending on the face orientation that the apparatus is to discriminate and on the type of processing, within face detection, for which the apparatus is used.

For example, for a face discriminating apparatus that discriminates profile faces, it is important to learn the presence of the relatively wide background that is characteristic of profiles. For a face discriminating apparatus that discriminates frontal faces and is used in processing that roughly detects face candidates, robustness is emphasized, and it is important to learn the simplest and most widely shared feature of a frontal face, namely that its shape is round, rather than the fine features of facial parts.

However, learning that uses face images with aligned vertical direction and orientation as sample images, as described above, merely aligns the orientation and vertical direction of the faces in the sample images. The image features to be learned are therefore not properly contained in the sample images, and the image features that are contained tend to be scattered across the images, which makes it difficult for the face discriminating apparatus to learn accurately the features it is actually meant to learn.

In view of the above circumstances, an object of the present invention is to provide a face discriminating apparatus that has accurately learned the features it is actually meant to learn, a program therefor, and a learning method for obtaining such a face discriminating apparatus.

The learning method for a face discriminating apparatus of the present invention causes a face discriminating apparatus, which determines whether an input image is a face image containing a face with a predetermined orientation and vertical direction, to learn facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are the predetermined orientation and vertical direction. The method is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

The learning device for a face discriminating apparatus of the present invention causes a face discriminating apparatus, which determines whether an input image is a face image containing a face with a predetermined orientation and vertical direction, to learn facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are the predetermined orientation and vertical direction. The device is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

The program of the present invention (the first program) causes a computer to execute processing that makes a face discriminating apparatus, which determines whether an input image is a face image containing a face with a predetermined orientation and vertical direction, learn facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are the predetermined orientation and vertical direction. The program is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

The face discriminating method of the present invention determines whether an input image is a face image by using a face discriminating apparatus that has learned facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are a predetermined orientation and vertical direction, and that determines whether an input image is a face image containing a face with the predetermined orientation and vertical direction. The method is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

The face discriminating apparatus of the present invention has learned facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are a predetermined orientation and vertical direction, and determines whether an input image is a face image containing a face with the predetermined orientation and vertical direction. The apparatus is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

The program of the present invention (the second program) causes a computer to function as a face discriminating apparatus that has learned facial features by a machine learning technique using a plurality of different learning face images whose face orientation and vertical direction are a predetermined orientation and vertical direction, and that determines whether an input image is a face image containing a face with the predetermined orientation and vertical direction. The program is characterized in that the plurality of learning face images consist only of images of a predetermined face region determined according to at least one of the predetermined orientation and the type of face detection processing in which the face discriminating apparatus is used.

In the present invention, when the predetermined orientation is sideways (a profile), the predetermined face region may be a region enclosing the entire face contour.

Also, in the present invention, when the predetermined orientation is frontal and the type of processing is the pre-stage processing of a face detection process that has pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates, the predetermined face region may be a region enclosing the entire face contour.

Also, in the present invention, when the predetermined orientation is frontal and the type of processing is the post-stage processing of a face detection process that has pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates, the predetermined face region may be: a region enclosing only both eyes, the nose, and the upper lip together with a region enclosing the entire face contour; a region enclosing only both eyes, the nose, and the upper lip; or a region enclosing only both eyes and the nose.

In the present invention, the face region is preferably a rectangular region.

In the present invention, the face discriminating apparatus may have a structure in which a plurality of mutually different weak discriminators are linearly combined.

In the present invention, it suffices that at least the learning face images are used as the face images for learning; learning non-face images may of course be used in addition to the learning face images.

In the present invention, techniques such as boosting or neural networks can be considered as the machine learning technique.

A “learning face image” is a sample image used for learning that is known to represent a face, and a “learning non-face image” is a sample image used for learning that is known not to represent a face.

The phrase “the face orientation and vertical direction are the predetermined orientation and vertical direction” is not limited to a state in which they coincide exactly with the predetermined orientation and vertical direction; slight deviations are permitted. For example, for the vertical direction of the face, a deviation in rotation angle of about ±15 degrees in the image plane is permitted.

A “weak discriminator” is a discriminating means (module) whose rate of correct answers exceeds 50%. A “structure in which a plurality of weak discriminators are linearly combined” is a structure in which such weak discriminators are connected in series so that, when a weak discriminator determines that the target image is a face image, processing proceeds to the next weak discriminator, and when it determines that the target image is a non-face image, the discrimination process is exited. A target image determined to be a face image by the last weak discriminator is finally determined to be a face image.

A “face” has eyes, a nose, and a mouth as its components, and the mouth consists of an upper lip and a lower lip. Ears are not counted as components, so the face region of a learning face image may or may not include the ears. The “eyes” may or may not include the eyebrows.

The “face contour” is the contour that remains when the contours of the head (including the back of the head) and the neck are excluded. For a frontal face it is the contour connecting the right temple, the chin, and the left temple; for a profile it is the contour connecting the forehead, the nose, and the lower jaw.

According to the learning method of the present invention, in addition to aligning the orientation and vertical direction of the faces in the learning face images, the face region of the learning face images is restricted to a predetermined face region for each case in which the image features to be learned differ. As a result, the image features to be learned can be properly included in the learning face images, scattering of those features across the images can be suppressed, and a face discriminating apparatus that has accurately learned the image features it is meant to learn can be obtained.

Further, because the face discriminating apparatus and program of the present invention are obtained by learning based on the above learning method, they are a face discriminating apparatus, and a program therefor, that have accurately learned the image features they are meant to learn.

Embodiments of the present invention are described below. FIG. 1 is a schematic block diagram showing the configuration of a face detection system to which the face discriminating apparatus (discriminator) of the present invention is applied. Based on the discrimination results of the discriminators, this face detection system detects faces in a digital image regardless of their position, size, orientation, or rotation angle. As shown in FIG. 1, the face detection system 1 comprises: a multi-resolution image generation unit 10 that converts the input image S0 to multiple resolutions to obtain a plurality of resolution images (reduced images) S1_i (i = 1, 2, 3, ...); an image normalization unit 20 that normalizes the luminance variance of the resolution images S1_i to obtain normalized resolution images S1′_i; a face detection pre-processing unit 30 that applies rough face detection to each resolution image S1′_i to extract face candidates S2; a face detection post-processing unit 40 that applies high-precision face detection to images in the vicinity of the face candidates to narrow the face candidates S2 down to faces S3; and a duplicate detection determination processing unit 50 that consolidates faces S3 detected redundantly across multiple resolution images to obtain faces S3′.

The multi-resolution image generation unit 10 converts the image size (resolution) of the input image S0, in which faces are to be detected, to a predetermined size, for example a rectangle whose short side is 416 pixels, to generate an image S1. Using the image S1 as a base image, it then generates a plurality of resolution images S1_i with different resolutions. Such images are generated because the size of a face in the input image is normally unknown, whereas the size (image size) of the face to be detected must be fixed owing to the structure of the discriminator described later; partial images of a predetermined size therefore have to be cut out from images of different resolutions and each judged as face or non-face. Specifically, as shown in FIG. 2, with the image S1 taken as the base image S1_1, an image S1_2 scaled by 2^(-1/3) relative to S1_1 and an image S1_3 scaled by 2^(-1/3) relative to S1_2 (that is, by 2^(-2/3) relative to the base image S1_1) are generated first. Then a reduced image of 1/2 size is generated from each of S1_1, S1_2, and S1_3, a further 1/2-size reduced image is generated from each of those, and so on, until a predetermined number of reduced images has been produced. In this way, relying mainly on the 1/2 reduction, which requires no interpolation of luminance signals, a series of images whose resolution decreases from the base image in steps of 2^(-1/3) can be generated at high speed. For example, when the image S1_1 is a rectangle with a short side of 416 pixels, the images S1_2, S1_3, ... have short sides of 330, 262, 208, 165, 131, 104, 82, 65, ... pixels, respectively, giving resolution images reduced in steps of 2^(-1/3). Images generated in this way without interpolating luminance signals tend strongly to retain the characteristics of the original image pattern, which is desirable because improved accuracy can be expected in face detection.
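The short-side lengths quoted above can be reproduced with a small calculation. The following is only an illustrative sketch, not the implementation described in this publication: the function name `pyramid_short_sides` and the stopping size `min_side` are assumptions, rounding is assumed for the two 2^(-1/3) steps, and integer halving is assumed for the 1/2 reductions.

```python
def pyramid_short_sides(base=416, min_side=32):
    """Return the short-side lengths of the resolution images, largest first.

    base     -- short side of the base image S1_1 (416 in the text's example)
    min_side -- hypothetical stopping size; not specified in the text
    """
    # Three seed images: S1_1, S1_2 = 2^(-1/3) * S1_1, S1_3 = 2^(-2/3) * S1_1.
    seeds = [round(base * 2 ** (-k / 3)) for k in range(3)]
    sides = []
    for seed in seeds:
        side = seed
        # Repeated 1/2 reduction; integer division models the halved size.
        while side >= min_side:
            sides.append(side)
            side //= 2
    return sorted(sides, reverse=True)


if __name__ == "__main__":
    print(pyramid_short_sides())
```

Only the three seed images need a non-integer scaling step; every other level comes from halving an existing image, which is the point the paragraph makes about avoiding interpolation.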

The image normalization unit 20 normalizes the resolution images S1_i by applying a gradation conversion that brings the luminance variance to a predetermined level, so that the accuracy of the face detection processing applied later is improved, and thereby obtains the normalized resolution images S1′_i.
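As a rough illustration of bringing the luminance variance to a fixed level, the sketch below rescales luminance values linearly about their mean. This passage does not specify the actual gradation conversion, so the linear stretch, the function name `normalize_luminance`, and the default `target_std` are all assumptions made for the example.

```python
import math


def normalize_luminance(pixels, target_std=40.0):
    """Linearly rescale luminance values about their mean.

    pixels     -- flat, non-empty list of luminance values
    target_std -- desired population standard deviation (assumed value)
    Returns a list of floats with the same mean and the requested spread.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0.0:
        # A perfectly flat image has no contrast to stretch.
        return [float(p) for p in pixels]
    gain = target_std / math.sqrt(variance)
    return [mean + (p - mean) * gain for p in pixels]
```

A practical version would also clamp the result to the valid luminance range, which slightly changes the resulting variance; that step is omitted here to keep the arithmetic easy to check.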

The face detection pre-processing unit 30 applies relatively coarse, high-speed face detection to each resolution image S1′_i normalized by the image normalization unit 20, and provisionally extracts face candidates S2 from each resolution image S1′_i. FIG. 3 is a block diagram showing the configuration of the face detection pre-processing unit 30. As shown in FIG. 3, the face detection pre-processing unit 30 consists of a first frontal face detection unit 31 that mainly detects frontal faces, a first left profile detection unit 32 that mainly detects left profile faces, and a first right profile detection unit 33 that mainly detects right profile faces. The face detection units 31 to 33 have discriminators 31a, 32a, and 33a, respectively, in each of which a plurality of weak discriminators WCi (i = 1 to N) are linearly combined to form a cascade structure.

FIG. 5 shows the overall processing flow of the discriminators, and FIG. 6 shows the processing flow of each weak discriminator within them.

First, the first weak discriminator WC1 determines whether a partial image of a predetermined size cut out from the resolution image S1′_i is a face (step SS1). Specifically, as shown in FIG. 7, the weak discriminator WC1 takes the partial image of predetermined size, for example 32×32 pixels, and applies 4-neighbor pixel averaging to obtain a 16×16 pixel image and a further reduced 8×8 pixel image. Taking two predetermined points set within the planes of these three images as one pair, it calculates the luminance difference between the two points of each pair in a pair group consisting of several kinds of pairs, and uses the combination of these difference values as the feature amount (step SS1-1). The two predetermined points of each pair are, for example, two points aligned vertically or two points aligned horizontally, chosen so as to reflect the light-and-dark characteristics of a face in the image. A score is then calculated by looking up a predetermined score table according to the combination of difference values that forms the feature amount (step SS1-2), and a cumulative score is calculated by adding this score to the score calculated by the preceding weak discriminator (step SS1-3). Because the first weak discriminator WC1 has no predecessor, its own score is used as the cumulative score as is. Whether the partial image is a face is determined by whether this cumulative score is equal to or greater than a predetermined threshold (step SS1-4). If the partial image is determined to be a face, processing moves on to discrimination by the next weak discriminator WC2 (step SS2); if it is determined to be a non-face, the partial image is immediately concluded to be a non-face (step SSB) and processing ends.
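The feature computation of step SS1-1 can be sketched as follows. This is a minimal illustration under stated assumptions: the 4-neighbor pixel averaging is interpreted as averaging non-overlapping 2×2 blocks, the pair format `(level, (y1, x1), (y2, x2))` is invented for the example, and the function names are hypothetical.

```python
def halve_by_averaging(img):
    """Halve a square grayscale image by averaging each 2x2 block."""
    n = len(img) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def pair_difference_features(patch32, pairs):
    """Return the tuple of luminance differences for a group of point pairs.

    patch32 -- 32x32 list of lists of luminance values
    pairs   -- iterable of (level, (y1, x1), (y2, x2)); level 0, 1, 2
               selects the 32x32, 16x16, or 8x8 image (assumed encoding)
    """
    level1 = halve_by_averaging(patch32)   # 16x16
    level2 = halve_by_averaging(level1)    # 8x8
    levels = [patch32, level1, level2]
    return tuple(
        levels[lv][y1][x1] - levels[lv][y2][x2]
        for lv, (y1, x1), (y2, x2) in pairs
    )
```

The returned tuple plays the role of the "combination of difference values"; in practice the differences would presumably be quantized before being used as a key into the score table, but the quantization scheme is not given in this passage.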

In step SS2, as in step SS1, the weak classifier WC2 calculates a feature amount of the kind described above, representing features of the image, from the partial image (step SS2-1), and calculates a score from the feature amount by referring to its score table (step SS2-2). It then adds its own score to the cumulative score calculated by the preceding weak classifier WC1 to update the cumulative score (step SS2-3), and determines whether the partial image is a face by whether the cumulative score is equal to or greater than a predetermined threshold (step SS2-4). Here too, if the partial image is determined to be a face, processing moves to the next weak classifier WC3 (step SS3); if it is determined to be a non-face, the partial image is immediately concluded to be a non-face (step SSB) and the processing ends. When all N weak classifiers have determined the partial image to be a face in this way, the partial image is finally extracted as a face candidate (step SSA).
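The cascade of steps SS1 to SSA/SSB can be illustrated by the following Python sketch. It is not the implementation of the embodiment: each weak classifier is represented by a hypothetical pair of a scoring function and a threshold, and the function name `cascade_is_face` is an assumption made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a trained weak classifier:
# (score_fn, threshold), where score_fn maps a partial image to a score.
Stage = Tuple[Callable[[object], float], float]


def cascade_is_face(patch: object, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (is_face_candidate, number_of_stages_evaluated)."""
    cumulative = 0.0
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        cumulative += score_fn(patch)      # steps SSi-2 and SSi-3
        if cumulative < threshold:         # step SSi-4
            return False, index            # step SSB: rejected immediately
    return True, len(stages)               # step SSA: passed all N stages
```

Because a partial image is rejected as soon as its cumulative score falls below a stage threshold, most non-face partial images are discarded after only a few weak classifiers have been evaluated.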

The face detection units 31 to 33 each use a classifier composed of a plurality of weak classifiers, defined by its own feature types, score tables, and thresholds, to discriminate faces of the orientation assigned to it, namely front faces, left profiles, and right profiles. As shown in FIG. 8, the face detection units 31 to 33 rotate each resolution image S1′_i through 360 degrees in its plane, set on the resolution image a sub-window W that cuts out a 32 × 32 pixel partial image, and, while moving the sub-window W across the resolution image by a predetermined number of pixels at a time, for example 5 pixels, determine whether the partial image cut out by the sub-window W is a face image. In this way they detect, in each resolution image S1′_i, front faces, left profiles, and right profiles at every in-plane rotation angle, and output face candidates S2. To improve detection accuracy for obliquely oriented faces, classifiers for right oblique faces and left oblique faces may additionally be provided, but they are not provided here.
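The movement of the sub-window W can be pictured with the short Python sketch below. It only enumerates sub-window positions on one resolution image at one rotation angle; the rotation of the resolution image is omitted, and the function name `subwindow_origins` and the choice to keep only sub-windows lying entirely inside the image are assumptions.

```python
from typing import List, Tuple


def subwindow_origins(width: int, height: int,
                      window: int = 32, step: int = 5) -> List[Tuple[int, int]]:
    """Top-left corners (x, y) of every window x window sub-window inside the image."""
    return [
        (x, y)
        for y in range(0, height - window + 1, step)
        for x in range(0, width - window + 1, step)
    ]
```

Calling the same function with `step=1` gives the finer scan in which the sub-window is moved one pixel at a time.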

The face detection post-processing unit 40 applies relatively high-accuracy face detection processing to the image in the vicinity of each face candidate S2 extracted by the face detection pre-processing unit 30, and detects true faces S3 from those images. The face detection post-processing unit 40 has basically the same configuration as the face detection pre-processing unit 30. As shown in FIG. 4, it comprises a second front face detection unit 41 that mainly detects front faces, a second left profile detection unit 42 that mainly detects left profiles, and a second right profile detection unit 43 that mainly detects right profiles. The face detection units 41 to 43 are provided with classifiers 41a, 42a, and 43a, respectively, each having a cascade structure in which a plurality of weak classifiers WCi (i = 1 to N) are linearly combined. These classifiers preferably have higher discrimination accuracy than those of the face detection pre-processing unit 30. In the face detection post-processing unit 40, the overall processing flow of the classifier and the processing flow of each weak classifier are basically the same as in the face detection pre-processing unit 30, but the positions at which the sub-window W is set are limited to a predetermined region containing a face candidate S2 extracted by the face detection pre-processing unit 30, and the sub-window W is moved in finer steps than in the face detection pre-processing unit 30, for example one pixel at a time. The face candidates S2 roughly extracted by the face detection pre-processing unit 30 are thereby narrowed down further, and only the true faces S3 are output.

Based on the position information of the faces S3 detected on each resolution image S1′_i by the face detection post-processing unit 40, the duplicate detection determination processing unit 50 merges instances of the same face that were detected more than once across the resolution images into a single face, and outputs the position information of the faces S3′ detected in the input image S0. This is necessary because, although it depends on the learning method, a classifier can generally detect faces over a certain range of sizes relative to the size of the partial image, so the same face may be detected in several resolution images of adjacent resolution levels.

FIG. 9 is a flowchart showing the flow of processing in the face detection system. As shown in FIG. 9, when the input image S0 is supplied to the multi-resolution image generation unit 10 (step S1), an image S1 is generated by converting the input image S0 to a predetermined image size, and a plurality of resolution images S1_i are generated by successively reducing the resolution of the image S1 by a factor of 2^(-1/3) (step S2). The image normalization unit 20 then normalizes the luminance variance of each resolution image S1_i to obtain normalized resolution images S1′_i (step S3). The face detection pre-processing unit 30 roughly detects face candidates S2 in each resolution image S1′_i using the classifiers 31a, 32a, and 33a for front faces and left and right profiles (step S4). Further, the face detection post-processing unit 40 performs face detection corresponding to close examination on the images in the vicinity of the face candidates S2 extracted in step S4, using the classifiers 41a, 42a, and 43a for front faces and left and right profiles in the same way as the face detection pre-processing unit 30, and narrows the candidates down to true faces S3 (step S5). Instances of the same face detected more than once across the resolution images S1′_i are then identified (step S6) and each merged into one, giving the finally detected faces S3′.
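The successive reduction by a factor of 2^(-1/3) in step S2 can be illustrated with the following Python sketch. The starting size, the rounding, and the stopping rule (stop once the image is smaller than the 32-pixel sub-window) are assumptions made for illustration; the function name `pyramid_sizes` is hypothetical.

```python
from typing import List


def pyramid_sizes(base_size: int, window: int = 32) -> List[int]:
    """Side lengths of successive resolution images, each 2**(-1/3) times the previous."""
    sizes = []
    level = 0
    while True:
        size = round(base_size * 2.0 ** (-level / 3.0))
        if size < window:          # assumed stopping rule
            break
        sizes.append(size)
        level += 1
    return sizes
```

With this factor the side length is halved every three levels, so three resolution images cover each octave of face size.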

Next, the learning method for the classifiers will be described. FIG. 10 is a flowchart showing this learning method. Learning is performed for each type of classifier, that is, for each face orientation to be discriminated.

The sample image group used for learning consists of a plurality of sample images known to be faces and a plurality of sample images known not to be faces, all standardized to a predetermined size, for example 32 × 32 pixels. The sample images known to be faces are images in which the face orientation is the same as the orientation the classifier is to discriminate and the top-bottom direction of the faces is aligned. For each sample image known to be a face, a plurality of variations are used: the image is scaled vertically and/or horizontally in steps of 0.1 over the range 0.7 to 1.2 times, and each scaled image is rotated in the plane in steps of 3 degrees over the range ±15 degrees. Each sample image is assigned a weight, that is, an importance. First, the weights of all sample images are initialized to the same value, 1 (step S11).
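The variations of a face sample image can be enumerated as in the Python sketch below. It assumes that the vertical and horizontal scale factors are chosen independently, which is only one reading of "vertically and/or horizontally"; the function name `sample_variations` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def sample_variations() -> List[Tuple[float, float, int]]:
    """All (vertical_scale, horizontal_scale, rotation_degrees) combinations."""
    scales = [round(0.7 + 0.1 * k, 1) for k in range(6)]   # 0.7, 0.8, ..., 1.2
    angles = list(range(-15, 16, 3))                       # -15, -12, ..., +15
    return list(product(scales, scales, angles))
```

Under this assumption there are 6 × 6 scale combinations and 11 rotation angles, giving 396 variations per face sample image, including the unmodified image (1.0, 1.0, 0).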

Next, taking two predetermined points set in the planes of a sample image and its reduced images as one pair, a plurality of types of pair groups, each consisting of a plurality of pairs, are set, and a weak classifier is created for each of these pair groups (step S12). Each weak classifier provides a criterion for discriminating face images from non-face images using the combination of luminance difference values between the two points of each pair in one pair group, where the pairs are set in the planes of the partial image cut out by the sub-window W and its reduced images. In this embodiment, a histogram over the combinations of luminance difference values of the pairs in one pair group is used as the basis of the weak classifier's score table.

The creation of one classifier will be described with reference to FIG. 11. As shown in the sample images on the left side of FIG. 11, the pair group used to create this classifier consists of five pairs, P1-P2, P1-P3, P4-P5, P4-P6, and P6-P7, defined on the sample images known to be faces as follows: P1 is a point at the center of the right eye, P2 a point on the right cheek, and P3 a point between the eyebrows, all on the sample image; P4 is a point at the center of the right eye and P5 a point on the right cheek on the 16 × 16 pixel image obtained by reducing the sample image by four-neighbor pixel averaging; and P6 is a point on the forehead and P7 a point on the mouth on the 8 × 8 pixel image obtained by a further four-neighbor pixel averaging. The coordinates of the two points of each pair in a pair group used to create a given classifier are the same in all sample images. For all sample images known to be faces, the combination of luminance difference values of the five pairs is obtained, and a histogram of these combinations is created. The number of values the combination can take depends on the number of luminance gradations of the image; with 16-bit gradation, for example, there are 65536 possible values per luminance difference, and (number of gradations)^(number of pairs) = 65536^5 combinations overall, which would require an enormous number of samples, time, and memory for learning and detection. In this embodiment, therefore, the luminance difference values are quantized by dividing them into bins of a suitable width, reducing them to n values (for example, n = 100).

The number of combinations of luminance difference values is thereby reduced to n^5, so the amount of data needed to represent the combinations can be reduced.
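The quantization and the resulting n^5 combinations can be illustrated by the following Python sketch. Equal-width bins over a symmetric range are an assumption (the embodiment only says the values are divided by a suitable width), and the names `quantize` and `combination_index` are hypothetical.

```python
from typing import Sequence


def quantize(diff: float, max_abs: float, n: int = 100) -> int:
    """Map a difference in [-max_abs, max_abs] to one of n equal-width bins."""
    position = (diff + max_abs) / (2.0 * max_abs)      # 0.0 ... 1.0
    return min(n - 1, max(0, int(position * n)))


def combination_index(diffs: Sequence[float], max_abs: float, n: int = 100) -> int:
    """Pack the quantized differences into one index in range(n ** len(diffs))."""
    index = 0
    for d in diffs:
        index = index * n + quantize(d, max_abs, n)
    return index
```

With five pairs and n = 100, the index ranges over 100^5 values, which is the number of entries a histogram or score table over the combinations would need in the worst case.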

Similarly, a histogram is created for the plurality of sample images known not to be faces. For these images, the positions corresponding to the two predetermined points of each pair on the sample images known to be faces are used (denoted by the same reference signs P1 to P7). The histogram shown at the far right of FIG. 11, which is used as the basis of the weak classifier's score table, is obtained by taking the logarithm of the ratio of the frequency values of these two histograms. The value on the vertical axis of this weak classifier histogram is hereinafter called a discrimination point. According to this weak classifier, an image whose combination of luminance difference values corresponds to a positive discrimination point is likely to be a face, and the larger the absolute value of the discrimination point, the higher that likelihood. Conversely, an image whose combination corresponds to a negative discrimination point is likely not to be a face, and again the likelihood increases with the absolute value of the discrimination point. In step S12, a weak classifier in this histogram form is created for the combination of luminance difference values of each of the plurality of types of pair groups that may be used for discrimination.
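The derivation of discrimination points as the logarithm of the ratio of the two histograms might be sketched in Python as follows. The additive `smoothing` term, which avoids taking the logarithm of zero for combinations seen in only one class, is an assumption not stated in the embodiment, and `build_score_table` is a hypothetical name. Both sample lists are assumed to be non-empty.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def build_score_table(face_indices: Sequence[int],
                      nonface_indices: Sequence[int],
                      smoothing: float = 1.0) -> Dict[int, float]:
    """Discrimination point per combination index: log(face freq / non-face freq)."""
    face_counts = Counter(face_indices)
    nonface_counts = Counter(nonface_indices)
    n_face, n_nonface = len(face_indices), len(nonface_indices)
    table = {}
    for key in set(face_counts) | set(nonface_counts):
        face_freq = (face_counts[key] + smoothing) / n_face
        nonface_freq = (nonface_counts[key] + smoothing) / n_nonface
        table[key] = math.log(face_freq / nonface_freq)
    return table
```

A combination that occurs more often among face samples receives a positive discrimination point, and one that occurs more often among non-face samples receives a negative one.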

Next, from the weak classifiers created in step S12, the one most effective for determining whether an image is a face is selected. The selection takes the weight of each sample image into account. In this example, the weighted correct answer rates of the weak classifiers are compared, and the weak classifier with the highest weighted correct answer rate is selected (step S13). In the first pass through step S13, every sample image has a weight of 1, so the weak classifier that correctly determines whether the image is a face for the largest number of sample images is simply selected as the most effective. In the second pass through step S13, after the weights have been updated in step S15 described below, the sample images include some with a weight of 1, some with a weight greater than 1, and some with a weight less than 1, and a sample image with a weight greater than 1 counts for correspondingly more in the evaluation of the correct answer rate than one with a weight of 1. From the second pass onward, therefore, step S13 places more emphasis on correctly discriminating sample images with large weights than those with small weights.
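The selection in step S13 can be illustrated by the following Python sketch, in which each candidate weak classifier is represented simply by its list of predictions (1 for face, 0 for non-face) on the sample images. The function names and this representation are assumptions made for illustration.

```python
from typing import Dict, Sequence


def weighted_accuracy(predictions: Sequence[int],
                      labels: Sequence[int],
                      weights: Sequence[float]) -> float:
    """Sum of the weights of correctly classified samples over the total weight."""
    correct = sum(w for p, y, w in zip(predictions, labels, weights) if p == y)
    return correct / sum(weights)


def select_best(candidates: Dict[str, Sequence[int]],
                labels: Sequence[int],
                weights: Sequence[float]) -> str:
    """Name of the candidate with the highest weighted correct answer rate."""
    return max(candidates,
               key=lambda name: weighted_accuracy(candidates[name], labels, weights))
```

With equal weights this reduces to counting correctly classified samples; once some weights exceed 1, a candidate that classifies those heavily weighted samples correctly can be preferred even if it classifies fewer samples correctly overall.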

Next, it is checked whether the correct answer rate of the combination of weak classifiers selected so far has exceeded a predetermined threshold (step S14). This rate is the proportion of sample images for which the result of determining whether the image is a face, using the weak classifiers selected so far in combination (at the learning stage the weak classifiers need not be linearly combined), agrees with whether the image actually is a face. Either the sample image group with its current weights or the sample image group with equalized weights may be used for this evaluation. If the threshold is exceeded, the weak classifiers selected so far can determine whether an image is a face with sufficiently high probability, and learning ends. If the rate is equal to or less than the threshold, processing proceeds to step S16 in order to select an additional weak classifier to be used in combination with those selected so far.

In step S16, the weak classifier selected in the most recent step S13 is excluded so that it will not be selected again.

Next, the weights of the sample images that the weak classifier selected in the most recent step S13 could not classify correctly as face or non-face are increased, and the weights of the sample images it classified correctly are decreased (step S15). The weights are adjusted in this way so that the selection of the next weak classifier gives priority to images that the already selected weak classifiers could not classify correctly, so that a weak classifier able to classify those images correctly is selected and the combination of weak classifiers becomes more effective.
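The weight adjustment of step S15 might look like the following Python sketch. The embodiment does not state by how much the weights are changed, so the fixed multiplicative `factor` used here is purely a hypothetical choice for illustration; boosting methods such as AdaBoost typically derive the factor from the error rate of the selected weak classifier instead.

```python
from typing import List, Sequence


def update_weights(weights: Sequence[float],
                   predictions: Sequence[int],
                   labels: Sequence[int],
                   factor: float = 2.0) -> List[float]:
    """Increase weights of misclassified samples and decrease the others."""
    return [
        w * factor if p != y else w / factor
        for w, p, y in zip(weights, predictions, labels)
    ]
```

After this update, a sample image that the selected weak classifier got wrong counts for more in the next evaluation of the weighted correct answer rate.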

Processing then returns to step S13, and the next most effective weak classifier is selected on the basis of the weighted correct answer rate as described above.

Steps S13 to S16 are repeated, selecting weak classifiers, each corresponding to the combination of luminance difference values of the pairs in a particular pair group, that are suited to determining whether an image is a face. When the correct answer rate checked in step S14 exceeds the threshold, the types of weak classifiers and the discrimination conditions to be used for determining whether an image is a face are fixed (step S17), and learning ends. The selected weak classifiers are linearly combined in descending order of weighted correct answer rate to form one classifier. For each weak classifier, a score table for calculating a score from the combination of luminance difference values is generated from the histogram obtained for it. The histogram itself may also be used as the score table, in which case the discrimination points of the histogram are used directly as scores.

When the above learning method is adopted, the weak classifier is not limited to the histogram form described above and may take any form, for example binary data, a threshold, or a function, as long as it provides a criterion for discriminating face images from non-face images using the combination of luminance difference values of the pairs in a particular pair group. Even in histogram form, a histogram showing the distribution of the differences between the two histograms shown in the center of FIG. 11 may be used, for example.

The learning method is not limited to the above technique; other machine learning techniques such as neural networks may be used.

We now consider how the image features that the classifier should learn differ depending on the face orientation to be discriminated and on the type of face detection processing in which the classifier is used.

First, consider the case where the face to be discriminated is a profile. An image of a profile is characterized by a relatively large background area other than the face and by the relatively sharp contour of the jaw. For the classifiers 32a, 33a, 42a, and 43a, which discriminate profiles, it is therefore important to learn this relatively large background, which is peculiar to profiles, and the contour shape of the jaw. For the classifier to learn these features, the sample images representing faces must contain them properly; for example, as shown in FIG. 12, the sample images representing faces may be limited to images of a region surrounding the entire face contour. It is more preferable to set profile sample images so that the contour of the side of the face lies at approximately the center of the image, because the feature portions in the image are then arranged in a well-balanced way.

Next, consider the case where the face to be discriminated is a front face. An image of a front face is characterized by the presence of both eyes, the nose, and the mouth, and by the round contour of the face. In a front face, however, the mouth deforms with changes in expression, as in a face with the mouth stretched sideways or opened wide, and large changes in shape tend to appear particularly in the part below the upper lip.

Therefore, where the face to be discriminated is a front face and the classifier is used in the pre-stage processing of a face detection process that has pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted candidates, as with the classifier 31a of the face detection pre-processing unit 30, it is important, from the viewpoint of not missing face candidates, to learn the round shape of the face, which is the simplest feature of front faces and the one they most have in common. For the classifier to learn this feature, the sample images representing faces must contain it properly; for example, as shown in FIG. 13, the sample images representing faces may be limited to images of a region surrounding the entire face contour.

Where the face to be discriminated is a front face and the classifier is used in the post-stage processing of a face detection process that has pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted candidates, as with the classifier 41a of the face detection post-processing unit 40, it is important, from the viewpoint of suppressing the influence of variations in deformation around the mouth and improving discrimination accuracy, to learn mainly the presence of both eyes, the nose, and the upper lip. For the classifier to learn these features, the sample images representing faces must contain them properly; for example, the sample images representing faces may be limited to images of a region surrounding only both eyes, the nose, and the upper lip, as shown in FIG. 14, or to both images of a region surrounding the entire face contour, as shown in FIG. 13, and images of a region surrounding only both eyes, the nose, and the upper lip, as shown in FIG. 14.

Here, the "face contour" is the contour excluding the contours of the head (including the back of the head) and the neck: for a front face, it is the contour connecting the right temple, the chin, and the left temple; for a profile, it is the contour connecting the forehead, the nose, and the lower jaw.

Thus, according to the learning method for the face discriminating apparatus (classifier) of this embodiment, in addition to aligning the face orientation and top-bottom direction of the learning face images, the face region of the sample images representing faces is limited to a predetermined face region chosen for each case in which the image features to be learned differ. The image features to be learned can therefore be properly included in the sample images representing faces, and the image features contained in those sample images can be kept from scattering across images, making it possible to obtain a face discriminating apparatus that has accurately learned the image features it was intended to learn. Since the face discriminating apparatus of this embodiment is obtained by learning based on this learning method, it is a face discriminating apparatus that has accurately learned the image features it was intended to learn.

For convenience, the face regions of the sample images representing faces are preferably all rectangular.

The face discriminating apparatus and the learning method for the face discriminating apparatus according to embodiments of the present invention have been described above. A program for causing a computer to execute each process of the face discriminating apparatus (classifier) is also one embodiment of the present invention, as is a computer-readable recording medium on which such a program is recorded.

FIG. 1: Block diagram showing the configuration of the face detection system 1
FIG. 2: Diagram showing the multi-resolution image generation process
FIG. 3: Block diagram showing the configuration of the face detection pre-processing unit 30
FIG. 4: Block diagram showing the configuration of the face detection post-processing unit 40
FIG. 5: Diagram showing the overall processing flow in a classifier
FIG. 6: Diagram showing the processing flow in a weak classifier
FIG. 7: Diagram for explaining the calculation of the feature amount in a weak classifier
FIG. 8: Diagram for explaining the rotation of the resolution image and the movement of the sub-window in the plurality of resolution images
FIG. 9: Flowchart showing the processing performed in the face detection system 1
FIG. 10: Flowchart showing the learning method for a classifier
FIG. 11: Diagram showing the method of deriving the histogram of a weak classifier
FIG. 12: Diagram showing an example of a sample image of a face region surrounding the entire face contour of a profile
FIG. 13: Diagram showing an example of a sample image of a face region surrounding the entire face contour of a front face
FIG. 14: Diagram showing an example of a sample image of a face region surrounding only both eyes, the nose, and the upper lip

Explanation of symbols

1 Face detection system
10 Multi-resolution image generation unit
20 Image normalization unit
30 Face detection pre-stage processing unit
31 First frontal face detection unit
32 First left profile face detection unit
33 First right profile face detection unit
31a, 32a, 33a Discriminators
40 Face detection post-stage processing unit
41 Second frontal face detection unit
42 Second left profile face detection unit
43 Second right profile face detection unit
41a, 42a, 43a Discriminators
50 Duplicate detection determination processing unit

Claims

A learning method for a face discrimination device that discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction, the method causing the face discrimination device to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.

A learning method for a face discrimination device that discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction, the method causing the face discrimination device to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

The learning method for a face discrimination device according to claim 2, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.
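As an illustration only, the region-selection rule set out in the claims above can be written as a small lookup. The names `Orientation`, `Stage`, `Region`, and `training_region` are hypothetical and do not appear in the patent. The `narrow_variant` flag is likewise an assumption, used here to switch between the two claimed post-stage regions for frontal faces (both eyes, nose, and upper lip, or both eyes and nose only).

```python
from enum import Enum


class Orientation(Enum):
    FRONT = "front"
    PROFILE = "profile"


class Stage(Enum):
    PRE = "pre-stage"    # extracts face candidates
    POST = "post-stage"  # narrows down the extracted candidates


class Region(Enum):
    FULL_CONTOUR = "all face contours"
    EYES_NOSE_UPPER_LIP = "both eyes, nose and upper lip"
    EYES_NOSE = "both eyes and nose"


def training_region(orientation, stage, narrow_variant=False):
    """Return the face region that learning images would be cropped to.

    narrow_variant is a hypothetical switch between the two claimed
    post-stage regions for frontal faces.
    """
    # Profile faces use the full contour in both stages.
    if orientation is Orientation.PROFILE:
        return Region.FULL_CONTOUR
    # Frontal faces use the full contour only in the pre-stage.
    if stage is Stage.PRE:
        return Region.FULL_CONTOUR
    return Region.EYES_NOSE if narrow_variant else Region.EYES_NOSE_UPPER_LIP
```

For example, `training_region(Orientation.FRONT, Stage.POST)` selects the region around both eyes, the nose, and the upper lip, while any profile case falls back to the full-contour region.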

4. The learning method for a face discrimination device according to claim 1, wherein the face area is a rectangular area.

5. The learning method for a face discrimination device according to claim 1, wherein the face discrimination device has a structure in which a plurality of weak discriminators different from each other are linearly coupled.
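A minimal sketch of what a linear combination of weak discriminators could look like, assuming each weak discriminator maps an image patch to a real-valued score. The function names, the `threshold` parameter, and the toy weak discriminators in the usage note are hypothetical; the claim does not specify the features or the weights.

```python
from typing import Callable, Sequence

# Hypothetical types: a patch is a 2-D grid of pixel values, and a weak
# discriminator maps a patch to a real-valued score.
Patch = Sequence[Sequence[float]]
WeakDiscriminator = Callable[[Patch], float]


def strong_score(patch: Patch,
                 weak_discriminators: Sequence[WeakDiscriminator],
                 weights: Sequence[float]) -> float:
    """Weighted sum of the weak discriminators' scores for one patch."""
    if len(weak_discriminators) != len(weights):
        raise ValueError("need exactly one weight per weak discriminator")
    return sum(w * h(patch) for h, w in zip(weak_discriminators, weights))


def is_face(patch: Patch,
            weak_discriminators: Sequence[WeakDiscriminator],
            weights: Sequence[float],
            threshold: float = 0.0) -> bool:
    """Declare a face when the combined score exceeds the threshold."""
    return strong_score(patch, weak_discriminators, weights) > threshold
```

As a usage example, two toy weak discriminators that each compare a pair of pixels, `h1 = lambda p: 1.0 if p[0][1] > p[0][0] else -1.0` and `h2 = lambda p: 1.0 if p[1][1] > p[1][0] else -1.0`, can be combined with `strong_score(patch, [h1, h2], [0.75, 0.25])`.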

6. The learning method for a face discrimination device according to claim 1, wherein the machine learning method is boosting.
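To make the term concrete, the following is a sketch of discrete AdaBoost, one common boosting variant, and not necessarily the procedure used in the patent. It assumes a single scalar feature per sample and swaps in simple threshold stumps as weak discriminators; the drawings refer to histogram-based weak discriminators instead. All function names are hypothetical.

```python
import math


def stump(x, threshold, polarity):
    """Threshold stump: returns polarity when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def train_adaboost(xs, ys, rounds):
    """Train on scalar features xs with labels ys in {+1, -1}.

    Returns a list of (alpha, threshold, polarity) tuples, one per round.
    """
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    thresholds = sorted(set(xs))
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for thr in thresholds:
            for pol in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump(x, thr, pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, thr, pol))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump(x, thr, pol) for alpha, thr, pol in model)
    return 1 if score >= 0 else -1
```

Each round reweights the learning samples so that the next weak discriminator concentrates on the samples the previous ones got wrong, and the final decision is the weighted vote, which matches the linearly combined structure of weak discriminators.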

A learning device for a face discrimination device, the learning device causing a face discrimination device that discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.

A learning device for a face discrimination device, the learning device causing a face discrimination device that discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

A program for causing a computer to execute processing that causes a face discrimination device, which discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction, to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.

A program for causing a computer to execute processing that causes a face discrimination device, which discriminates whether or not an input image is a face image containing a face of a predetermined orientation and vertical direction, to learn facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

A face discrimination method using a face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.
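The two-stage face detection processing recited in the claims (pre-stage processing that extracts face candidates, followed by post-stage processing that narrows them down) can be sketched as a simple filter chain. This is a hypothetical illustration: `two_stage_detect` and the boolean discriminator callables are assumed names, and the claims do not prescribe how windows are generated or scored.

```python
from typing import Callable, Iterable, List, TypeVar

W = TypeVar("W")  # a candidate window, in whatever representation the caller uses


def two_stage_detect(windows: Iterable[W],
                     pre_stage: Callable[[W], bool],
                     post_stage: Callable[[W], bool]) -> List[W]:
    """Run the pre-stage on every window, then the post-stage on survivors."""
    # Pre-stage: extract face candidates from all windows.
    candidates = [w for w in windows if pre_stage(w)]
    # Post-stage: narrow down the extracted candidates.
    return [w for w in candidates if post_stage(w)]
```

In this sketch the pre-stage discriminator would be the one trained on full-contour face regions, and the post-stage discriminator the one trained on the narrower inner-face region for frontal faces. The post-stage runs only on windows that passed the pre-stage.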

A face discrimination method using a face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

A face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.

A face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

A program for causing a computer to function as a face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes, the nose, and the upper lip.

A program for causing a computer to function as a face discrimination device that has learned facial features by a machine learning technique using a plurality of different learning face images in which the orientation and vertical direction of the face are a predetermined orientation and vertical direction, and that discriminates whether or not an input image is a face image containing a face of the predetermined orientation and vertical direction, wherein:
the plurality of learning face images consist only of images of a predetermined face region determined according to the predetermined orientation and the type of face detection processing in which the face discrimination device is used;
the face detection processing in which the face discrimination device is used comprises pre-stage processing for extracting face candidates and post-stage processing for narrowing down the extracted face candidates;
when the predetermined orientation is frontal and the type of processing is the pre-stage processing, the predetermined face region is a region surrounding all face contours; and
when the predetermined orientation is frontal and the type of processing is the post-stage processing, the predetermined face region is a region surrounding only both eyes and the nose.

The learning device for a face discrimination device according to claim 8, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.

The program according to claim 10, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.

The face discrimination method according to claim 12, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.

The face discrimination device according to claim 14, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.

The program according to claim 16, wherein, when the predetermined orientation is sideways (profile) and the type of processing is the pre-stage processing or the post-stage processing, the predetermined face region is a region surrounding all face contours.