JP2013020353A

JP2013020353A - Discriminator generation method, device and program

Info

Publication number: JP2013020353A
Application number: JP2011151873A
Authority: JP
Inventors: Makoto Yonaha; 誠與那覇
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2011-07-08
Filing date: 2011-07-08
Publication date: 2013-01-31

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of discriminating the open/closed state of eyes, while reducing tracking failures, when eyes are detected in and tracked across input images. SOLUTION: The open/closed state of the eye is divided into three or more stages. A plurality of sample images known to represent eyes at each stage, and a plurality of sample images known not to represent eyes, are prepared. For each stage, a discriminator that determines whether a determination target image represents an eye at that stage is generated by learning, as correct samples, the sample images known to represent eyes at that stage and, as incorrect samples, at least one of the sample images known to represent eyes two stages above or two stages below that stage together with the sample images known not to represent eyes. Sample images known to represent eyes one stage above or one stage below that stage are not used as incorrect samples.

Description

The present invention relates to a discriminator generation method, device, and program for generating a plurality of discriminators that each determine, for a respective predetermined state, whether a determination target image is an image containing a predetermined object whose state changes continuously.

Conventionally, drowsiness detection devices have been used that detect a driver's drowsy state and issue a warning by monitoring the open/closed state of the driver's eyes in images obtained by an in-vehicle camera or the like.

One known technique for monitoring the open/closed state of the eyes works as follows. A discriminator that determines whether a determination target image represents an open eye is generated by learning a plurality of sample images known to represent open eyes as correct samples and a plurality of sample images known not to represent eyes as incorrect samples. Likewise, a discriminator that determines whether a determination target image represents a closed eye is generated by learning a plurality of sample images known to represent closed eyes as correct samples and a plurality of sample images known not to represent eyes as incorrect samples. These discriminators are then used to sequentially detect and track the eyes in a plurality of input images obtained by continuous capture.

In this technique, however, when the discriminator for each state is generated, the features of an eye in that state are learned as correct, whereas the features of an eye in the other state are learned as neither correct nor incorrect. As a result, an image representing a closed eye may be judged to represent an open eye, or conversely an image representing an open eye may be judged to represent a closed eye, which is a problem.

To address this, Patent Document 1 proposes a modification of the above technique. When the discriminator that determines whether a determination target image represents an open eye is generated, sample images known to represent closed eyes are learned as incorrect samples in addition to the sample images known not to represent eyes. When the discriminator that determines whether a determination target image represents a closed eye is generated, sample images known to represent open eyes are learned as incorrect samples in addition to the sample images known not to represent eyes.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2007-323104 (JP 2007-323104 A)

However, because the technique of Patent Document 1 trains the discriminator for each state to treat the features of eyes in the other state as incorrect, an image representing a half-open eye, which lies midway between the two states, may be judged not to represent an eye at all. The eye is then lost, and eye tracking may fail.

In view of the above circumstances, an object of the present invention is to provide a discriminator generation method, device, and program that can improve state discrimination accuracy while reducing tracking failures when a predetermined object whose state changes continuously is detected and tracked.

The discriminator generation method of the present invention generates a plurality of discriminators that each determine, for a respective predetermined state, whether a determination target image is an image containing a predetermined object whose state changes repetitively and continuously. The method is characterized by dividing the states into three or more stages; acquiring a plurality of sample images known to represent the predetermined object in each state and a plurality of sample images known not to represent the predetermined object; and, for each state, generating a discriminator that determines whether the determination target image represents the predetermined object in that state. Each discriminator is generated by learning, as correct samples, the plurality of sample images known to represent the predetermined object in that state and, as incorrect samples, at least one of the plurality of sample images known to represent the predetermined object in a state two or more stages away from that state together with the plurality of sample images known not to represent the predetermined object. Sample images known to represent the predetermined object in a state at a stage adjacent to that state are not used as incorrect samples.

In the above discriminator generation method, the predetermined object may be an eye, and the state may be the open/closed state of the eye.

The three or more stages may be three stages: a closed state, a half-open state, and a fully open state of the eye.

Alternatively, the predetermined object may be an eye, and the three or more stages may be three stages: a state in which the pupil is located at the center of the eye, a state in which it is located at the left end of the eye, and a state in which it is located at the right end of the eye.

Alternatively, the predetermined object may be a face, and the three or more stages may be three stages: a state in which the face is facing forward, a state in which it is facing diagonally, and a state in which it is facing sideways.

The discriminator generation device of the present invention is characterized by comprising sample image acquisition means and learning means that carry out the above discriminator generation method.

The discriminator generation program of the present invention causes at least one computer to execute the above discriminator generation method. The program is provided to users either recorded on a recording medium such as a CD-ROM or DVD, or stored in a downloadable state in storage attached to a server computer or in network storage.

According to the discriminator generation method, device, and program of the present invention, a plurality of discriminators are generated that each determine, for a respective predetermined state, whether a determination target image is an image containing a predetermined object whose state changes repetitively and continuously. In doing so, the states are divided into three or more stages, and a plurality of sample images known to represent the predetermined object in each state and a plurality of sample images known not to represent the predetermined object are acquired. For each state, the discriminator for that state is generated by learning, as correct samples, the sample images known to represent the predetermined object in that state and, as incorrect samples, at least one of the sample images known to represent the predetermined object in a state two or more stages away together with the sample images known not to represent the predetermined object, without using sample images of the adjacent stage as incorrect samples. Consequently, an image representing the predetermined object in a state between two consecutive stages can be discriminated as the predetermined object in one of those states, which reduces tracking failures caused by missed detections. In addition, misclassification between images representing the predetermined object in states two or more stages apart can be prevented, which improves state discrimination accuracy.

In the above discriminator generation method, device, and program, when the predetermined object is an eye and the state is the open/closed state of the eye, the accuracy of discriminating the open/closed state of the eye can be improved while reducing eye tracking failures when the eye is detected in and tracked across input images.

When the three or more stages are the three stages of a closed state, a half-open state, and a fully open state, eye tracking failures can be reduced while improving the accuracy of discriminating whether the determination target image represents a closed eye or a fully open eye.

When the predetermined object is an eye and the three or more stages are the three stages in which the pupil is located at the center of the eye, at the left end of the eye, and at the right end of the eye, eye tracking failures can be reduced while improving the accuracy of discriminating whether the determination target image represents an eye looking to the left or an eye looking to the right.

When the predetermined object is a face and the three or more stages are the three stages in which the face is facing forward, facing diagonally, and facing sideways, face tracking failures can be reduced while improving the accuracy of discriminating whether the determination target image represents a face facing forward or a face facing sideways.

FIG. 1 is a block diagram showing the configuration of the eye detection device.
FIG. 2 is a block diagram showing the configuration of the discriminator generation device.
FIG. 3 is a diagram schematically showing the sample images used to generate the discriminator for each state.
FIG. 4 is a flowchart showing the discriminator generation method.

Embodiments of the present invention are described in detail below with reference to the drawings. The eye detection device 10 shown in FIG. 1 detects and tracks eyes in a multi-frame moving image using a discriminator group generated by the discriminator generation device 20 described later. It includes a frame memory 11, an eye detection unit 12, an eye tracking unit 13, a storage unit 14, and other components. The function of each unit in the eye detection device 10 can be realized by a computer system operating according to a predetermined program.

The frame memory 11 stores at least several frames' worth of frame images from the multi-frame moving image input to the eye detection device 10.

The eye detection unit 12 sequentially acquires frame images from the frame memory 11 and uses the discriminator group to detect the eye, which is the detection target, in each frame image. Specifically, with the entire image as the search range, it sequentially sets a search frame of a predetermined size within that range while shifting the frame's position, and cuts out the partial image enclosed by each search frame. Each cut-out partial image is input to the discriminator group, and the eye is detected by determining whether the partial image represents an eye in a predetermined state.
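The search-frame scan described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the device's actual implementation: the names `crop` and `scan`, the list-of-rows image representation, the `step` parameter, the optional `region` argument, and the rule of returning the first accepted frame are all hypothetical.

```python
def crop(image, left, top, width, height):
    """Cut out the partial image enclosed by a search frame."""
    return [row[left:left + width] for row in image[top:top + height]]


def scan(image, frame_w, frame_h, step, discriminators, region=None):
    """Slide a search frame over the search range and test each partial image.

    image          : list of rows of pixel values
    discriminators : dict mapping a state name to a function(patch) -> bool
    region         : (left, top, right, bottom) search range, or None for
                     the whole image (assumed convention)
    Returns (left, top, state) for the first accepted frame, else None.
    """
    img_h, img_w = len(image), len(image[0])
    x0, y0, x1, y1 = region if region is not None else (0, 0, img_w, img_h)
    # Clamp the search range to the image bounds.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(img_w, x1), min(img_h, y1)
    for top in range(y0, y1 - frame_h + 1, step):
        for left in range(x0, x1 - frame_w + 1, step):
            patch = crop(image, left, top, frame_w, frame_h)
            # The per-state discriminators are consulted side by side.
            for state, accepts in discriminators.items():
                if accepts(patch):
                    return left, top, state
    return None
```

Passing `region=None` corresponds to detection over the entire image; passing a small window around a previous position corresponds to the narrower search range used during tracking.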

Here, the discriminator group consists of a plurality of discriminators, each of which determines whether a partial image (determination target image) represents an eye in a predetermined state. Specifically, a first discriminator whose target is an image representing a closed eye, a second discriminator whose target is an image representing a half-open eye, and a third discriminator whose target is an image representing a fully open eye are connected in parallel.

Each discriminator calculates, as at least one feature related to the distribution of pixel values in the partial image, a feature based on differences in pixel values between a plurality of predetermined points, and uses this feature to determine whether the partial image represents an eye in the predetermined state. The determination results obtained from all or some of these discriminators are then evaluated to obtain a result indicating whether the determination target image represents an eye in any of the above states. Details of the configuration and generation method of each discriminator in the group are given later.

When an eye is detected in the current frame, the eye detection unit 12 stores in the storage unit 14 information such as the position of the detected eye in the image and which of the three states the detected eye was determined to be in. The eye detection unit 12 also notifies the eye tracking unit 13 that an eye has been detected, and suspends eye detection until the eye tracking unit 13 notifies it that the eye has been lost. When the eye tracking unit 13 reports that the eye has been lost, the eye detection unit 12 starts eye detection again.

When an eye has been detected by the eye detection unit 12, the eye tracking unit 13 sequentially acquires frame images from the frame memory 11 and tracks the detected eye by detecting it in each frame image using the discriminator group. Specifically, it takes as the search range a region of a predetermined size based on the position of the eye detected in the immediately preceding frame, sequentially sets a search frame of a predetermined size within that range while shifting the frame's position, cuts out the partial image enclosed by each search frame, inputs the cut-out partial image to the discriminator group, and detects the eye by determining whether the partial image represents an eye in a predetermined state. This process is repeated until the eye is lost.

When an eye is detected in the current frame, the eye tracking unit 13 stores in the storage unit 14 information such as the position of the detected eye in the image and which of the three states the detected eye was determined to be in. When no eye is detected, it notifies the eye detection unit 12 that the eye has been lost and suspends eye tracking until the eye detection unit 12 notifies it that an eye has been detected. When the eye detection unit 12 again reports that an eye has been detected, the eye tracking unit 13 resumes eye tracking.
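The handoff between the eye detection unit 12 and the eye tracking unit 13 can be sketched as a small loop over frames. This is a minimal Python sketch under assumptions: `find_eye` is a hypothetical callback that searches a frame (the whole frame when `region` is `None`, otherwise only the given window), `margin` is a hypothetical parameter defining the tracking window, and resuming detection on the frame after the eye is lost is an assumption, since the text does not say on which frame detection restarts.

```python
def run(frames, find_eye, margin):
    """Alternate between full-frame detection and local tracking.

    find_eye(frame, region) -> (x, y, state) or None   (hypothetical callback)
    Returns one entry per frame: the hit for that frame, or None.
    """
    tracking = False   # False: detection unit active; True: tracking unit active
    last_pos = None
    results = []       # stands in for the records kept in the storage unit
    for frame in frames:
        if tracking:
            # Tracking: search only around the position found in the previous frame.
            x, y = last_pos
            region = (x - margin, y - margin, x + margin, y + margin)
        else:
            # Detection: search the entire frame.
            region = None
        hit = find_eye(frame, region)
        if hit is None:
            # Eye not found: if tracking, this is the "eye lost" notification,
            # and the detection unit takes over on the next frame.
            tracking = False
        else:
            # Eye found: record it and let the tracking unit handle the next frame.
            tracking = True
            last_pos = hit[:2]
        results.append(hit)
    return results
```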

Next, the discriminator generation device 20 is described. As shown in FIG. 2, the discriminator generation device 20 includes a sample image acquisition unit 21 and a learning unit 22. The function of each unit in the discriminator generation device 20 can be realized by a computer system operating according to a predetermined program.

The sample image acquisition unit 21 inputs the sample images used for training the discriminators into the discriminator generation device 20. The discriminator generation device 20 generates three discriminators that respectively determine whether the image to be discriminated represents a closed eye, a half-open eye, or a fully open eye. The sample image acquisition unit 21 therefore acquires a plurality of sample images known to represent eyes in each of these states and a plurality of sample images known not to represent eyes, and provides them to the learning unit 22 as learning data.

The learning unit 22 performs learning with a predetermined learning algorithm using the learning data provided by the sample image acquisition unit 21, and generates the first, second, and third discriminators that make up the discriminator group.

Each discriminator consists of a plurality of weak discriminators: the weak discriminators found effective for discrimination are selected from a large number of candidates by the learning described later and connected in series in order of effectiveness. Each weak discriminator calculates a feature from the determination target image according to a predetermined algorithm specific to that weak discriminator and, based on the feature, obtains a score indicating the likelihood that the determination target image represents an eye in the predetermined state. Each discriminator evaluates the scores obtained from all or some of these weak discriminators to determine whether the determination target image represents an eye in the predetermined state.
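One way to read "evaluating the scores obtained from all or some of the weak discriminators" is a serial evaluation that can stop early. The Python sketch below illustrates that reading only; the function name `evaluate`, the running-sum rule, and the two thresholds `reject_below` and `accept_at` are assumptions and are not specified in the text.

```python
def evaluate(patch, weak_discriminators, reject_below, accept_at):
    """Evaluate weak discriminators in series, most effective first.

    weak_discriminators : list of functions(patch) -> float score
    reject_below        : assumed early-exit threshold on the running sum
    accept_at           : assumed acceptance threshold on the final sum
    Returns (accepted, total_score, number_of_weak_discriminators_used).
    """
    total = 0.0
    used = 0
    for weak in weak_discriminators:
        total += weak(patch)
        used += 1
        if total < reject_below:
            # Only some of the weak discriminators were needed to reject.
            return False, total, used
    # All weak discriminators were consulted.
    return total >= accept_at, total, used
```

Early rejection keeps the cost low for the many search frames that clearly do not contain an eye, while frames that survive are scored by the full series.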

Because these discriminators differ in the eye state they can discriminate, different sample images are used as learning data for each state. FIG. 3 schematically shows the sample images used to generate the discriminator for each state. As shown in FIG. 3, when the discriminator whose target is an image representing a closed eye is generated by learning, a plurality of sample images known to represent closed eyes are used as correct samples, and a plurality of sample images known to represent fully open eyes together with a plurality of sample images known not to represent eyes are used as incorrect samples. Images representing half-open eyes are used as neither correct nor incorrect samples.

When the discriminator whose target is an image representing a half-open eye is generated, a plurality of sample images known to represent half-open eyes are used as correct samples, and a plurality of sample images known not to represent eyes are used as incorrect samples. Images representing closed eyes and images representing fully open eyes are used as neither correct nor incorrect samples.

When the discriminator whose target is an image representing a fully open eye is generated by learning, a plurality of sample images known to represent fully open eyes are used as correct samples, and a plurality of sample images known to represent closed eyes together with a plurality of sample images known not to represent eyes are used as incorrect samples. Images representing half-open eyes are used as neither correct nor incorrect samples.
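The sample selection for the three discriminators follows a single rule based on stage distance, which can be sketched in Python as below. This is an illustrative sketch, not the device's implementation: the function name `build_training_set` and the integer stage indices (0 = closed, 1 = half open, 2 = fully open) are assumptions, and using every sample from stages two or more away as incorrect samples is also an assumption, since the method only requires at least one of them.

```python
def build_training_set(stage_samples, non_object_samples, target_stage):
    """Choose correct and incorrect samples for one stage's discriminator.

    stage_samples      : dict mapping stage index -> list of sample images
    non_object_samples : list of sample images that do not show the object
    target_stage       : stage index the discriminator should recognize
    Returns (correct_samples, incorrect_samples).
    """
    correct = list(stage_samples[target_stage])
    # Images that do not show the object are always incorrect samples.
    incorrect = list(non_object_samples)
    for stage, samples in stage_samples.items():
        distance = abs(stage - target_stage)
        if distance >= 2:
            # Two or more stages away: learned as incorrect samples.
            incorrect.extend(samples)
        # distance == 1 (adjacent stage): deliberately left out of both sets.
    return correct, incorrect
```

With three stages, the closed-eye and fully-open-eye discriminators each receive the opposite extreme as incorrect samples, while the half-open discriminator has no stage two steps away and is trained against non-eye images only, matching the description of FIG. 3.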

Next, the generation (learning) process for each discriminator performed by the learning unit 22 is described. The process is the same for every discriminator except that, as noted above, different sample images are used as learning data depending on the state to be discriminated. The flow of generating a discriminator is therefore described here on the assumption that sample images appropriate for that discriminator have been prepared.

FIG. 4 is a flowchart showing the discriminator generation method. First, the initial weights of all sample images (correct samples and incorrect samples) appropriately prepared for the discriminator to be generated are set equally to 1 (step S11).

Next, two predetermined points set on a sample image are treated as one pair, and multiple kinds of pair groups, each consisting of a plurality of such pairs, are defined. A weak discriminator is created for each kind of pair group (step S12). Each weak discriminator outputs a value indicating the likelihood that the determination target image represents an eye in the predetermined state, using the combination of pixel-value differences between the two points of each pair in the specific pair group assigned to that weak discriminator.
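The feature used by a weak discriminator, a combination of pixel-value differences over a pair group, can be sketched in Python as follows. The sketch rests on assumptions: the names `pair_differences`, `quantize`, and `weak_score`, the quantization into bins of width 64, and the score lookup table are hypothetical, since the text does not specify how the combination of differences is mapped to an output value.

```python
def pair_differences(patch, pair_group):
    """Pixel-value difference between the two points of each pair.

    patch      : list of rows of pixel values
    pair_group : list of ((x1, y1), (x2, y2)) point pairs
    """
    return tuple(patch[y1][x1] - patch[y2][x2]
                 for (x1, y1), (x2, y2) in pair_group)


def quantize(differences, bin_width=64):
    """Coarsen the differences so that similar combinations share one key
    (assumed step; bin_width is a hypothetical parameter)."""
    return tuple(d // bin_width for d in differences)


def weak_score(patch, pair_group, score_table, bin_width=64):
    """Look up the score for the observed combination of differences.

    score_table maps a quantized combination to a likelihood score;
    combinations never seen in training score 0.0 (assumed default).
    """
    key = quantize(pair_differences(patch, pair_group), bin_width)
    return score_table.get(key, 0.0)
```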

Then, from the weak discriminators created in step S12, the one most effective for determining whether the determination target image represents an eye in the predetermined state is selected. The selection takes the weight of each sample image into account. In this example, the weighted accuracy of each weak discriminator is compared, and the weak discriminator with the highest weighted accuracy is selected (step S13). In the first pass through step S13, every sample image has the same weight of 1, so the weak discriminator that correctly classifies the largest number of sample images is simply selected as the most effective one.

Next, the accuracy of the combination of weak discriminators selected so far is checked against a predetermined threshold (step S14). This accuracy is the rate at which the result of classifying each sample image with the selected weak discriminators used in combination agrees with whether the sample actually represents an eye in the predetermined state. If the accuracy exceeds the threshold, the weak discriminators selected so far are sufficient to determine with high enough probability whether a determination target image represents an eye in the predetermined state, so learning ends. If the accuracy is at or below the threshold, the process proceeds to step S16 in order to select an additional weak discriminator to be used in combination with those already selected. In step S16, the weak discriminator selected in the most recent step S13 is excluded so that it is not selected again.

Next, the weights of the sample images that the weak discriminator selected in the most recent step S13 failed to classify correctly are increased, and the weights of the sample images that it classified correctly are decreased (step S15). The process then returns to step S13, where the next most effective weak discriminator is selected on the basis of weighted accuracy as described above.

Steps S13 to S16 are repeated. Suppose that the weak classifiers corresponding to the combinations of the difference values between the pixel values at the two predetermined points of each pair constituting a specific pair group have been selected as weak classifiers suited to discriminating whether an image represents an eye in the predetermined state, and that the correct answer rate checked in step S14 then exceeds the threshold. At that point, the types of weak classifiers to be used for discriminating whether an image represents an eye in the predetermined state, and their discrimination conditions, are fixed (step S17), and learning ends. The selected weak classifiers are linearly combined in descending order of weighted correct answer rate to generate the discriminator.
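By way of illustration only, the loop of steps S13 to S17 described above may be sketched as follows. This is a minimal sketch and not the implementation of the embodiment: the names `train`, `weighted_rate` and `combined_predict`, the representation of a weak classifier as a function returning True or False, the simple vote used to combine the selected weak classifiers, and the weight update factor `boost` are all assumptions introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Weak = Callable[[Sample], bool]  # hypothetical weak classifier interface


def weighted_rate(clf: Weak, samples, labels, weights) -> float:
    """Weighted correct answer rate of one weak classifier."""
    hit = sum(w for s, y, w in zip(samples, labels, weights) if clf(s) == y)
    return hit / sum(weights)


def combined_predict(selected: List[Tuple[Weak, float]], sample: Sample) -> bool:
    # Assumption: the selected weak classifiers are combined by a simple vote.
    votes = sum(1 if clf(sample) else -1 for clf, _ in selected)
    return votes > 0


def train(pool: List[Weak], samples, labels,
          target_rate: float = 0.95, boost: float = 2.0):
    weights = [1.0 / len(samples)] * len(samples)
    remaining = list(pool)
    selected: List[Tuple[Weak, float]] = []
    while remaining:
        # Step S13: select the weak classifier with the highest weighted rate.
        rates = [weighted_rate(c, samples, labels, weights) for c in remaining]
        best = max(range(len(remaining)), key=rates.__getitem__)
        clf = remaining[best]
        selected.append((clf, rates[best]))
        # Step S14: check the correct answer rate of the combination.
        correct = sum(combined_predict(selected, s) == y
                      for s, y in zip(samples, labels))
        if correct / len(samples) > target_rate:
            break
        # Step S16: exclude the classifier just selected from later rounds.
        del remaining[best]
        # Step S15: raise the weights of misclassified samples, lower the rest.
        weights = [w / boost if clf(s) == y else w * boost
                   for s, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    # Step S17: fix the result, ordered by descending weighted rate.
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected
```

In this sketch the loop also ends when the pool of candidate weak classifiers is exhausted, which is a safeguard added for the example rather than a step of the flow described above.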

With the above configuration, according to the present embodiment, when the discriminator that discriminates whether a discrimination target image is an image representing an eye in the fully open state is generated, sample images known to be images representing an eye in the closed state are learned as incorrect samples, and when the discriminator that discriminates whether a discrimination target image is an image representing an eye in the closed state is generated, sample images known to be images representing an eye in the fully open state are learned as incorrect samples. This improves the accuracy of discriminating whether a discrimination target image is an image representing an eye in the closed state or an image representing an eye in the fully open state.

Furthermore, when the discriminator for the fully open state and the discriminator for the closed state are generated, sample images known to be images representing an eye in the half-open state are not used as incorrect samples, and when the discriminator for the half-open state is generated, neither the images representing an eye in the fully open state nor the images representing an eye in the closed state are used as incorrect samples. Consequently, when the discrimination target image represents an eye in a state between fully open and half open, it is discriminated as either an image representing an eye in the fully open state or an image representing an eye in the half-open state, and when the discrimination target image represents an eye in a state between closed and half open, it is discriminated as either an image representing an eye in the closed state or an image representing an eye in the half-open state. The state in which the eye is detected can therefore be maintained, and eye tracking failures can be reduced.
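By way of illustration only, the selection of correct and incorrect samples described above may be sketched as follows for any number of stages. This is a minimal sketch under stated assumptions: the function name `build_training_set`, the numbering of the stages as consecutive integers (0 for closed, 1 for half open, 2 for fully open), and the use of both the distant-stage samples and the non-eye samples as incorrect samples are choices made for this example.

```python
from typing import List, Sequence, Tuple


def build_training_set(target_stage: int,
                       eye_samples_by_stage: Sequence[Sequence[str]],
                       non_eye_samples: Sequence[str]
                       ) -> Tuple[List[str], List[str]]:
    """Return (correct samples, incorrect samples) for one stage's discriminator."""
    # Correct samples: images known to show the eye in the target stage.
    positives = list(eye_samples_by_stage[target_stage])
    # Incorrect samples: images known not to show an eye at all ...
    negatives = list(non_eye_samples)
    for stage, images in enumerate(eye_samples_by_stage):
        # ... plus eye images two or more stages away from the target stage.
        # Images of an adjacent stage are left out of both lists.
        if abs(stage - target_stage) >= 2:
            negatives.extend(images)
    return positives, negatives


# Hypothetical file names standing in for sample images.
stages = [["closed_1", "closed_2"], ["half_1"], ["open_1"]]
non_eye = ["background_1"]
open_pos, open_neg = build_training_set(2, stages, non_eye)
half_pos, half_neg = build_training_set(1, stages, non_eye)
```

With three stages, the half-open discriminator in this sketch receives only the non-eye images as incorrect samples, since both remaining stages are adjacent to it.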

In addition, the human eye blinks repeatedly (an opening and closing movement of the eyelid). Focusing on this point, while the eye is tracked as described above, changes in the open/closed state of the tracked eye are compared with a model of open/closed state changes prepared in advance; that is, it is monitored whether a blink-like opening and closing movement is taking place. On this basis it is determined whether the object being tracked is a true eye, and tracking is terminated when it is determined not to be a true eye. This reduces erroneous tracking of the eye and further improves the accuracy of eye detection and tracking.
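By way of illustration only, one very simple form of the comparison with a change model may be sketched as follows. The embodiment does not specify the model, so everything here is an assumption: the per-frame state codes, the function name `contains_blink`, and the model itself, which merely looks for an open, closed, open sequence and tolerates half-open frames in between.

```python
from itertools import groupby
from typing import Iterable

CLOSED, HALF, OPEN = 0, 1, 2  # hypothetical per-frame state codes


def contains_blink(states: Iterable[int]) -> bool:
    """Return True if the state history contains an open -> closed -> open run."""
    # Ignore half-open frames, then collapse runs of identical states.
    core = [s for s, _ in groupby(x for x in states if x != HALF)]
    return any(core[i:i + 3] == [OPEN, CLOSED, OPEN]
               for i in range(len(core) - 2))


blinking = contains_blink([OPEN, OPEN, HALF, CLOSED, CLOSED, HALF, OPEN])
static = contains_blink([OPEN, OPEN, OPEN, OPEN])
```

A tracker built on this sketch might terminate tracking when no blink has been observed over some number of frames; that number would be a further assumption.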

In the above embodiment, the discriminator generation method, device and program of the present invention have been described for the case where the predetermined object is an eye and the state to be discriminated is the open/closed state of the eye. However, the present invention is widely applicable to cases where whether a discrimination target image is an image representing a predetermined object is discriminated by dividing the continuously changing state of that object into a plurality of stages, for example where the predetermined object is a face and the state to be discriminated is the face orientation, or where the predetermined object is an eye and the state to be discriminated is the line-of-sight direction.

In the above embodiment, the case where the state of the predetermined object is divided into three stages has been given as an example. However, the state of the predetermined object may be divided into four or more stages, and a discriminator may be generated for each of those states.

In the above embodiment, the case where the eye tracking unit 13 performs the eye tracking process using the discriminator group has been described. However, the eye tracking unit 13 may instead perform the eye tracking process using template matching. For example, when an eye is detected by the eye detection unit 12, the partial image in which the eye was detected may be stored in the storage unit 14 as a template image. Each partial image sequentially cut out from the subsequent frame images is then compared with the template image stored in the storage unit 14, and the eye is tracked by determining whether the similarity is equal to or greater than a predetermined threshold, that is, whether the partial image represents the eye detected by the eye detection unit 12.
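By way of illustration only, the template matching variant described above may be sketched as follows. This is a minimal sketch under stated assumptions: images are plain lists of rows of grayscale values, the similarity measure is zero-mean normalized cross-correlation (the embodiment speaks only of a "similarity"), and the names `ncc`, `crop` and `track` and the threshold value 0.8 are introduced for this example.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image as a list of rows


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) *
                    sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0  # flat patches get similarity 0


def crop(frame: Image, top: int, left: int, h: int, w: int) -> Image:
    return [row[left:left + w] for row in frame[top:top + h]]


def track(frame: Image, template: Image,
          threshold: float = 0.8) -> Optional[Tuple[float, int, int]]:
    """Return (similarity, top, left) of the best match at or above threshold."""
    h, w = len(template), len(template[0])
    best = None
    for top in range(len(frame) - h + 1):
        for left in range(len(frame[0]) - w + 1):
            score = ncc(crop(frame, top, left, h, w), template)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, top, left)
    return best  # None means the eye was not found in this frame


# The template stands in for the partial image stored when the eye was detected.
template = [[10, 200], [200, 10]]
frame = [[0, 0, 0, 0],
         [0, 0, 10, 200],
         [0, 0, 200, 10],
         [0, 0, 0, 0]]
match = track(frame, template)
```

The exhaustive scan over every position is chosen for clarity; restricting the search to a neighborhood of the previous eye position would be a natural refinement but is not taken from the embodiment.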

DESCRIPTION OF SYMBOLS
10 Eye detection device
11 Frame memory
12 Eye detection unit
13 Eye tracking unit
14 Storage unit
20 Discriminator generation device
21 Sample image acquisition unit
22 Learning unit

Claims

A discriminator generation method for generating a plurality of discriminators, each of which discriminates, for a respective predetermined state, whether a discrimination target image is an image including a predetermined object whose state changes repeatedly and continuously, the method comprising:
dividing the state into three or more stages, and acquiring a plurality of sample images known to be images representing the predetermined object in each of the states and a plurality of sample images known not to be images representing the predetermined object; and
for each of the states, generating a discriminator that discriminates whether the discrimination target image is an image representing the predetermined object in that state, by learning the plurality of sample images known to be images representing the predetermined object in that state as correct samples, without treating as incorrect samples the sample images known to be images representing the predetermined object in a state adjacent to that state, and by learning, as incorrect samples, at least one of the plurality of sample images known to be images representing the predetermined object in a state two or more stages away from that state and the plurality of sample images known not to be images representing the predetermined object.

The discriminator generation method according to claim 1, wherein the predetermined object is an eye, and the state is the open/closed state of the eye.

The discriminator generation method according to claim 2, wherein the three or more stages are three stages: a state in which the eye is closed, a state in which it is half open, and a state in which it is fully open.

The discriminator generation method according to claim 1, wherein the predetermined object is an eye, and
the three or more stages are three stages: a state in which the pupil is located at the center of the eye, a state in which it is located at the left end of the eye, and a state in which it is located at the right end of the eye.

The discriminator generation method according to claim 1, wherein the predetermined object is a face, and
the three or more stages are three stages: a state in which the face is facing front, a state in which it is facing obliquely, and a state in which it is facing sideways.

A discriminator generation device that generates a plurality of discriminators, each of which discriminates, for a respective predetermined state, whether a discrimination target image is an image including a predetermined object whose state changes repeatedly and continuously, the device comprising:
sample image acquisition means for dividing the state into three or more stages and acquiring a plurality of sample images known to be images representing the predetermined object in each of the states and a plurality of sample images known not to be images representing the predetermined object; and
learning means for generating, for each of the states, a discriminator that discriminates whether the discrimination target image is an image representing the predetermined object in that state, by learning the plurality of sample images known to be images representing the predetermined object in that state as correct samples, without treating as incorrect samples the sample images known to be images representing the predetermined object in a state adjacent to that state, and by learning, as incorrect samples, at least one of the plurality of sample images known to be images representing the predetermined object in a state two or more stages away from that state and the plurality of sample images known not to be images representing the predetermined object.

The discriminator generation device according to claim 6, wherein the predetermined object is an eye, and the state is the open/closed state of the eye.

The discriminator generation device according to claim 7, wherein the three or more stages are three stages: a state in which the eye is closed, a state in which it is half open, and a state in which it is fully open.

The discriminator generation device according to claim 6, wherein the predetermined object is an eye, and
the three or more stages are three stages: a state in which the pupil is located at the center of the eye, a state in which it is located at the left end of the eye, and a state in which it is located at the right end of the eye.

The discriminator generation device according to claim 6, wherein the predetermined object is a face, and
the three or more stages are three stages: a state in which the face is facing front, a state in which it is facing obliquely, and a state in which it is facing sideways.

A discriminator generation program for causing a computer to function as a discriminator generation device that generates a plurality of discriminators, each of which discriminates, for a respective predetermined state, whether a discrimination target image is an image including a predetermined object whose state changes repeatedly and continuously, the program causing the computer to function as:
sample image acquisition means for dividing the state into three or more stages and acquiring a plurality of sample images known to be images representing the predetermined object in each of the states and a plurality of sample images known not to be images representing the predetermined object; and
learning means for generating, for each of the states, a discriminator that discriminates whether the discrimination target image is an image representing the predetermined object in that state, by learning the plurality of sample images known to be images representing the predetermined object in that state as correct samples, without treating as incorrect samples the sample images known to be images representing the predetermined object in a state adjacent to that state, and by learning, as incorrect samples, at least one of the plurality of sample images known to be images representing the predetermined object in a state two or more stages away from that state and the plurality of sample images known not to be images representing the predetermined object.