JP2020126573A

JP2020126573A - Image processor, eye detection method, and gazing direction detection method

Info

Publication number: JP2020126573A
Application number: JP2019108028A
Authority: JP
Inventors: Takehiro Matsumoto; Kaname Ogawa
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2019-02-06
Filing date: 2019-06-10
Publication date: 2020-08-20

Abstract

To provide a technique that both reduces the processing load of gaze area detection and improves its detection accuracy.
SOLUTION: In the feature point detection process in S1, an image processing device detects, from the entire captured image, a plurality of feature points indicating the positions of the eyes. In the eye determination process in S2, the image processing device sets a determination region in the captured image so as to include the detected feature points, and uses information obtained from the determination region to determine whether the feature points correctly indicate the eye positions. In the feature point detection process, the image processing device gives an initial value of a shape vector representing the positions of the feature points and repeatedly updates the shape vector using a correction amount calculated from information in the captured image, thereby detecting the feature points.
SELECTED DRAWING: Figure 2

Description

The present disclosure relates to a technique for detecting, from a captured image, the direction in which an eye in the captured image is gazing.

Patent Document 1 below describes a device that detects, from an image of a face, a gaze area indicating the direction in which the line of sight is directed. Specifically, the device detects the face position in the image, detects facial feature points from the detected face position, detects the pupil position and the face orientation based on the detected feature points, and detects the gaze direction from the pupil position. The gaze area is then obtained by a geometric calculation from the gaze direction, the face orientation, and the camera position.

Patent Document 1: JP 2012-38106 A

However, detailed study by the inventors revealed that the conventional technique disclosed in Patent Document 1 is inefficient, making it difficult to detect the gaze area both quickly and accurately.

That is, in the related art, the gaze area is detected from the results of face position detection, facial feature point detection, pupil position detection, face orientation detection, and so on. The errors of these individual detections therefore accumulate, and the accuracy of the finally detected gaze area is poor.

Furthermore, the conventional technique not only involves many types of detection processing but also requires an enormous amount of computation for the face detection performed first, which increases processing time. Face detection uses the sliding-window method: a detector trained to respond to a specific pattern inside a window is scanned over the image while the position and size of the window are shifted, in order to find a matching pattern. This method must evaluate windows cropped at many different sizes and positions, and most of each newly evaluated window overlaps the previous one, so it is inefficient and leaves ample room for improvement in speed and memory bandwidth. In addition, when the angle of the object to be detected varies, the sliding-window method needs a separate detector for each range of angles, which is also inefficient.

One aspect of the present disclosure is to provide a technique that both reduces the processing load and improves the detection accuracy of gaze area detection.

One aspect of the present disclosure is an image processing apparatus including a feature point detection unit (20: S1) and a determination unit (20: S2). The feature point detection unit detects a plurality of feature points representing eye positions from the entire captured image. The determination unit sets a determination region in the captured image so as to include the plurality of feature points detected by the feature point detection unit, and uses information obtained from the determination region to determine whether the plurality of feature points correctly represent the eye positions. The feature point detection unit treats a vector representing the positions of the plurality of feature points as a shape vector, gives an initial value of the shape vector, and repeatedly updates the shape vector using a correction amount calculated from information in the captured image, thereby detecting a plurality of feature points that represent the true eye positions in the captured image.

With this configuration, no face detection is needed to detect the eye positions in the captured image. Moreover, instead of pattern matching, a regression method that repeatedly corrects a shape vector representing the positions of the feature points is used, so the processing load can be reduced. As a result, processing time can be shortened and memory efficiency improved.

Further, because the present disclosure includes the determination unit, which determines whether eye detection by the feature point detection unit has succeeded, erroneous detection results can be kept from being output to subsequent stages, improving the reliability of the detection result.

One aspect of the present disclosure is an image processing apparatus including a feature point detection unit (20: S1), a determination unit (20: S2), and a direction detection unit (20: S4). The feature point detection unit detects a plurality of feature points representing eye positions from the entire captured image. The determination unit sets a determination region in the captured image so as to include the plurality of feature points detected by the feature point detection unit, and uses information obtained from the determination region to determine whether the plurality of feature points correctly represent the eye positions. When the determination unit determines that the plurality of feature points correctly represent the eye positions, the direction detection unit detects the gaze direction, that is, the direction in which the eyes in the captured image are gazing, using information of an eye peripheral region set so as to include the positions of the feature points detected by the feature point detection unit. The direction detection unit treats a vector representing the gaze direction as a gaze vector, gives an initial value of the gaze vector, corrects the gaze vector by a correction amount calculated from information of the eye peripheral region, and detects the direction indicated by the corrected gaze vector as the gaze direction.

With this configuration, the gaze direction is detected by a regression method using information of the eye peripheral region, so processing can be faster than with conventional methods that derive it from detection results such as head position, face orientation, and line-of-sight angle. Moreover, the accumulation of errors caused by chaining several detectors in series is suppressed, improving detection accuracy.

One aspect of the present disclosure is an eye detection method including a feature point detection step (S1) and a determination step (S2). The feature point detection step and the determination step perform the same processing as the feature point detection unit and the determination unit of the first image processing apparatus described above.

According to the eye detection method of the present disclosure, the same effects as those of the first image processing apparatus described above can be obtained.
One aspect of the present disclosure is a gaze direction detection method including a feature point detection step (S1), a determination step (S2), and a direction detection step (S4). The feature point detection step, the determination step, and the direction detection step perform the same processing as the feature point detection unit, the determination unit, and the direction detection unit of the second image processing apparatus described above.

According to the gaze direction detection method of the present disclosure, the same effects as those of the second image processing apparatus described above can be obtained.

FIG. 1 is a block diagram showing the configuration of a line-of-sight detection system.
FIG. 2 is a flowchart of the gaze area detection process.
FIG. 3 is a diagram explaining a regression tree.
FIG. 4 is a flowchart of the learning process that generates the feature point parameters.
FIG. 5 is a flowchart of the feature point detection process.
FIG. 6 is a diagram explaining the shape parameters and correction amounts used in the feature point detection process.
FIG. 7 is a diagram illustrating a case where the feature points detected by the feature point detection process succeed in detecting the eyes and a case where they fail.
FIG. 8 is a flowchart of the learning process that generates weak classifiers.
FIG. 9 is a flowchart of the eye determination process.
FIG. 10 is a diagram explaining an eye peripheral image.
FIG. 11 is a diagram showing an example of setting a plurality of gaze areas.
FIG. 12 is a flowchart of the learning process that generates the area parameters.
FIG. 13 is a flowchart of the area detection process.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
[1. Configuration]
The line-of-sight detection system 1 shown in FIG. 1 includes a camera 10 and an image processing device 20. The line-of-sight detection system 1 is mounted, for example, in a vehicle and detects the driver's line-of-sight direction, which is information required for driving support control and the like.

The camera 10 may use, for example, a known CCD or CMOS image sensor. The camera 10 is arranged, for example, so that the face of the driver seated in the driver's seat of the vehicle is within its imaging range. The camera 10 captures images periodically and outputs the captured image data to the image processing device 20.

The image processing device 20 includes a microcomputer having a CPU 21 and a semiconductor memory such as RAM or ROM (hereinafter, memory 22), and executes at least the line-of-sight detection process.

[2. Processing]
The line-of-sight detection process executed by the image processing device 20 will be described with reference to the flowchart of FIG. 2. The process is started each time captured image data is acquired from the camera 10.

In S1, the image processing device 20 executes the feature point detection process, which detects, from the acquired captured image, the plurality of feature points needed to locate the eyes in the image.

In the subsequent S2, the image processing device 20 executes the eye determination process. This process uses a feature point peripheral image, extracted from the captured image based on the feature points detected in S1, to determine whether those feature points correctly represent the eye positions, that is, whether eye detection has succeeded.

In the subsequent S3, if the eye determination process in S2 has determined that eye detection succeeded, the image processing device 20 proceeds to S4; if it has determined that eye detection failed, the line-of-sight detection process ends.

In S4, the image processing device 20 executes the area detection process and then ends the line-of-sight detection process. The area detection process uses an eye peripheral image, extracted from the captured image based on the feature points detected in S1, to detect which of a plurality of preset gaze areas the line of sight of the detected eyes is directed toward.
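The control flow of S1 to S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the device's implementation: the three stage functions are hypothetical callables standing in for the feature point detection, eye determination, and area detection processes, and the detected gaze area is assumed to be reported as an integer index.

```python
from typing import Any, Callable, List, Optional

# Hypothetical stage signatures (illustration only):
DetectPoints = Callable[[Any], List[float]]       # S1: image -> shape vector
IsEye = Callable[[Any, List[float]], bool]        # S2: (image, shape) -> success?
DetectArea = Callable[[Any, List[float]], int]    # S4: (image, shape) -> gaze area index


def line_of_sight_detection(
    image: Any,
    detect_points: DetectPoints,
    is_eye: IsEye,
    detect_area: DetectArea,
) -> Optional[int]:
    """Run S1 -> S2 -> S3 -> S4 on one captured frame.

    Returns the index of the gaze area, or None when eye detection fails.
    """
    shape = detect_points(image)       # S1: feature points from the whole image
    if not is_eye(image, shape):       # S2 + S3: verify before going further
        return None                    # eye detection failed: end the process
    return detect_area(image, shape)   # S4: which preset gaze area is looked at
```

Passing the stages in as parameters keeps the sketch independent of how each stage is realized; the early return mirrors the branch in S3 that ends the process when eye detection fails.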

The feature point detection process of S1, the eye determination process of S2, and the area detection process of S4 are described in detail below, in that order.
S1 corresponds to the feature point detection unit and the feature point detection step, S2 to the determination unit and the determination step, and S4 to the direction detection unit and the direction detection step.

[2-1. Feature point detection processing]
[2-1-1. Overview]
This section describes the method used in the feature point detection process to detect a plurality of feature points from a captured image.

The feature points are points that are effective for locating the right eye and the left eye in the captured image. In this embodiment, four points, the outer and inner corners of both eyes, are used as feature points. The types and number of feature points are not limited to these.

Here, a column vector listing the coordinates of all feature points is called a shape vector S. With the coordinates of the p-th feature point denoted (x_p, y_p), the shape vector S is expressed by equation (1), where P is the number of feature points and p = 1, 2, ..., P. The superscript T in equation (1) denotes transposition.
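Equation (1) itself does not appear in this text. Reconstructed from the definitions above, and assuming the coordinates are listed point by point (the original ordering may differ), it reads:

```latex
S = \left( x_1,\ y_1,\ x_2,\ y_2,\ \ldots,\ x_P,\ y_P \right)^{\mathsf{T}} \tag{1}
```

With the four feature points of this embodiment (P = 4), S is an 8-dimensional vector.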

In the feature point detection process, an initial value S^(0) of the shape vector S is given, and the estimated feature point positions are repeatedly corrected by correction amounts calculated from information obtained from the captured image, so that a shape vector S correctly representing the positions of the outer and inner corners of both eyes is obtained by iterative regression. The shape vector S^(t) calculated in the t-th regression step is expressed by equation (2), where T is the number of correction iterations and t = 1, 2, ..., T. The shape vector S^(T) finally obtained by repeating the regression step T times is expressed by equation (3). Here, r_t is the correction amount used in the t-th regression step and is a vector of the same dimension as the shape vector S.
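Equations (2) and (3) are likewise missing from this text; reconstructed from the surrounding description, they read:

```latex
\begin{align}
S^{(t)} &= S^{(t-1)} + r_t, \qquad t = 1, 2, \ldots, T \tag{2}\\
S^{(T)} &= S^{(0)} + \sum_{t=1}^{T} r_t \tag{3}
\end{align}
```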

Equation (2) means that in the t-th regression step, the updated shape vector S^(t) is calculated by adding the correction amount r_t to the shape vector S^(t-1) obtained in the previous regression step. Equation (3) means that the final shape vector S^(T) is the initial value S^(0) plus the sum of all the correction amounts r_1 to r_T obtained in the first through T-th regression steps.
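The update rule can be illustrated with a short Python sketch. It assumes that each regression step t has its own trained correction function, passed in as a hypothetical callable taking the image and the current shape vector; shape vectors are plain lists of floats.

```python
from typing import Any, Callable, List, Sequence

Vector = List[float]
# Hypothetical: a correction function maps (image, current shape) -> r_t.
CorrectionFn = Callable[[Any, Vector], Vector]


def run_cascade(image: Any, s0: Sequence[float],
                stages: Sequence[CorrectionFn]) -> Vector:
    """Apply S^(t) = S^(t-1) + r_t with r_t = f_t(I, S^(t-1)) for t = 1..T."""
    shape = list(s0)                                # S^(0): the initial value
    for correction in stages:                       # one correction function per step t
        r = correction(image, shape)                # r_t from the image and S^(t-1)
        shape = [s + d for s, d in zip(shape, r)]   # equation (2)
    return shape                                    # S^(T) = S^(0) + sum of r_t (equation (3))
```

For example, three steps that each move the estimate halfway toward a target point take S from (0, 0) to (3.5, 3.5) when the target is (4, 4); summing the corrections 2 + 1 + 0.5 gives the same result, which is exactly what equation (3) states.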

As shown in equation (4), the correction amount r_t is obtained by applying the correction function f_K to the captured image I and the shape vector S^(t-1) obtained in the previous regression step. The correction function f_K is an additive model of regression functions built by gradient boosting.
Such regression functions are described, for example, in V. Kazemi and J. Sullivan, "One Millisecond Face Alignment with an Ensemble of Regression Trees," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1867-1874 (hereinafter, Reference 1), and J. H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, vol. 29, no. 5, 2001, pp. 1189-1232 (hereinafter, Reference 2).

The correction function f_K is a function whose value is determined using K regression trees prepared in advance, and is defined by equation (5). Here, f_0 is the initial value of the correction function f_K, g_k is the regression function whose value is determined by the regression tree identified by k, and k = 1, 2, ..., K. γ is a learning rate set to 0 < γ < 1. Making γ smaller suppresses overfitting and helps accommodate the diversity of feature point positions.
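Reconstructed from the description, equations (4) and (5) read as follows (each regression step t has its own correction function f_K):

```latex
\begin{align}
r_t &= f_K\!\left(I,\ S^{(t-1)}\right) \tag{4}\\
f_K &= f_0 + \gamma \sum_{k=1}^{K} g_k, \qquad 0 < \gamma < 1 \tag{5}
\end{align}
```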

All K regression trees have the same structure. For example, as shown in FIG. 3, each regression tree 41 is a binary tree in which every node branches into two. A node at which the tree branches is called a normal node 42, and a leaf node of the regression tree 41 is called a terminal node 43. Letting e be the node index that identifies a normal node 42, the e-th normal node is associated with a pixel pair (P_e1, P_e2) and a threshold θ_e. Each terminal node 43 is associated with a regression amount (r_k1 to r_k8 in FIG. 3).

The pixel pair (P_e1, P_e2) is defined by coordinates relative to a base point on the captured image I (for example, the position of one of the feature points, or the average position of several feature points), which is set from the shape vector S^(t-1) obtained in the previous regression step. At each normal node 42, the branch leading to the next level is chosen according to whether the difference value (for example, the luminance difference) between the two pixels of the pair (P_e1, P_e2) is higher or lower than the threshold θ_e. In FIG. 3, the labels P_e1 and P_e2 stand for the pixel values whose difference is compared. Repeating the same processing while following the selected branches eventually reaches a terminal node 43, and the regression amount associated with that terminal node 43 becomes part of the output value of the regression function g_k.
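Evaluating one regression tree can be sketched as below. The sketch assumes a grayscale image stored as a list of rows, pixel offsets already expressed in image coordinates relative to the base point, rounding to the nearest pixel with clamping at the image border, and that the branch named `high` is taken when P_e1 - P_e2 exceeds θ_e. The patent does not fix these details, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Offset = Tuple[float, float]   # (dx, dy) relative to the base point


@dataclass
class Leaf:
    value: List[float]         # regression amount stored at a terminal node


@dataclass
class Split:
    p1: Offset                 # offset of pixel P_e1
    p2: Offset                 # offset of pixel P_e2
    threshold: float           # theta_e
    high: "Node"               # followed when P_e1 - P_e2 > theta_e (assumed)
    low: "Node"                # followed otherwise


Node = Union[Leaf, Split]


def sample(image: Sequence[Sequence[float]], base: Tuple[float, float],
           offset: Offset) -> float:
    """Intensity at base + offset, rounded to the nearest pixel and clamped."""
    h, w = len(image), len(image[0])
    x = min(max(int(round(base[0] + offset[0])), 0), w - 1)
    y = min(max(int(round(base[1] + offset[1])), 0), h - 1)
    return float(image[y][x])


def evaluate_tree(root: Node, image: Sequence[Sequence[float]],
                  base: Tuple[float, float]) -> List[float]:
    """Descend from the root through normal nodes to a terminal node."""
    node = root
    while isinstance(node, Split):
        diff = sample(image, base, node.p1) - sample(image, base, node.p2)
        node = node.high if diff > node.threshold else node.low
    return node.value
```

Because each test only compares two pixel values, evaluating a tree costs a handful of memory reads, which is the source of the speed advantage over sliding-window matching discussed earlier.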

Note that the relative coordinates giving the position of each pixel of the pixel pair (P_e1, P_e2) are expressed as vectors on a standard image (hereinafter, standard vectors). The standard image here is an average image obtained from a large number of learning samples.

However, the face in a captured image is not necessarily in the same pose as the face in the standard image, so applying the standard vectors directly to the captured image I introduces a misalignment. Therefore, when the pixel pair (P_e1, P_e2) is used, the pixel positions expressed by the standard vectors may be corrected, for each captured image I, with a similarity matrix (hereinafter, transformation matrix) Q that reduces this misalignment, and the corrected positions used as the pixel positions on the captured image I. The transformation matrix Q indicates which rotation and enlargement or reduction, applied to points on the standard image, best approximates the corresponding points on the captured image I. Using the transformation matrix Q is not essential, but it improves the estimation accuracy of the shape vector S.
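One way to obtain and apply a similarity matrix of the kind described is a closed-form least-squares fit. The sketch below assumes Q has the form [[a, -b], [b, a]] (a rotation combined with uniform scaling), removes translation by centring both point sets, and uses squared Euclidean error; the function names are hypothetical. In this method the source points would be the mean shape and the destination points the current shape estimate for an image.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def _centred(points: Sequence[Point]) -> List[Point]:
    """Subtract the centroid so that only rotation and scale remain."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(p[0] - mx, p[1] - my) for p in points]


def fit_similarity(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Least-squares Q = [[a, -b], [b, a]] minimizing sum ||dst_p - Q src_p||^2."""
    num_a = num_b = den = 0.0
    for (x1, x2), (y1, y2) in zip(_centred(src), _centred(dst)):
        num_a += x1 * y1 + x2 * y2      # dot products   -> scale * cos(angle)
        num_b += x1 * y2 - x2 * y1      # cross products -> scale * sin(angle)
        den += x1 * x1 + x2 * x2
    a, b = num_a / den, num_b / den
    return ((a, -b), (b, a))


def map_offset(q: Matrix, offset: Point) -> Point:
    """Map a standard-image offset (dx, dy) onto the captured image."""
    return (q[0][0] * offset[0] + q[0][1] * offset[1],
            q[1][0] * offset[0] + q[1][1] * offset[1])
```

For a square of points rotated by 90 degrees, scaled by 2, and translated, the fit recovers Q = ((0, -2), (2, 0)); the translation does not affect Q because pixel-pair offsets are measured from a base point anyway.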

Thus, the feature point detection process requires the initial value S^(0) of the shape vector S, the initial value f_0 of the correction function f_K, and the parameters defining each regression tree 41 to be prepared in advance. These parameters are hereinafter collectively referred to as feature point parameters.

The feature point parameters are generated by the image processing device 20 executing a learning process. However, the learning process need not be executed by the image processing device 20 and may be executed by another device.

[2-1-2. Learning]
The learning process for setting the feature point parameters (hereinafter, first learning process) will be described with reference to the flowchart of FIG. 4. Before the first learning process is executed, a large number of learning images serving as learning samples are stored in the memory 22. Here, N learning images are prepared, and the coordinates of the feature points in each learning image have been detected in advance.

First, in S11, the image processing device 20 acquires the feature point coordinates of each learning image and generates, for each learning image, the correct value of the shape vector S (hereinafter, teacher data), S_1 to S_N.
In the subsequent S12, the image processing device 20 calculates the average of the teacher data S_1 to S_N generated in S11 as the initial value S^(0) of the shape vector S.

In the subsequent S13, the image processing device 20 initializes to 1 the regression step t, which counts the iterations of the processing described below.
In the subsequent S14, the image processing device 20 calculates the transformation matrix Q_it of each learning image according to equation (6).

Equation (6) means that the transformation matrix Q_it is the matrix Q that minimizes the sum, over all feature points p, of the distance between the coordinates S_ip^(t-1) of feature point p in the current shape vector S^(t-1) of each learning image and the result of applying Q to the coordinates S_p of feature point p in the mean shape vector of the teacher data S_1 to S_N. Here, p = 1, 2, ..., P and i = 1, 2, ..., N.
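Written out, equation (6) reads as follows. This reconstruction assumes squared Euclidean distances and writes the p-th point of the mean shape (the average of S_1 to S_N, which is also S^(0) from S12) as \bar{S}_p:

```latex
Q_{it} = \mathop{\arg\min}_{Q} \sum_{p=1}^{P} \left\| S_{ip}^{(t-1)} - Q\,\bar{S}_p \right\|^2 \tag{6}
```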

In the subsequent S15, the image processing device 20 calculates, for each learning image, a correction residual ΔS_i^(t), which is the difference between the teacher data S_i and the shape vector S^(t-1), according to equation (7).
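Equation (7), reconstructed from this description with the learning image index i made explicit on the current estimate, reads:

```latex
\Delta S_i^{(t)} = S_i - S_i^{(t-1)}, \qquad i = 1, 2, \ldots, N \tag{7}
```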

In the subsequent S16, the image processing device 20 calculates, using equation (8), the initial value f_0 of the correction function f_K used to calculate the correction amount r_t.

Equation (8) means that the initial value f_0 of the correction function f_K learned in S17 to S23 is the vector V, of the same dimension as the shape vector S, that minimizes the sum over all learning images of the distance between each image's correction residual ΔS_i^(t) and V.
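Equation (8), reconstructed here under the assumption that the distance is the squared Euclidean distance, reads as follows; under that assumption the minimizer is simply the mean residual:

```latex
f_0 = \mathop{\arg\min}_{V} \sum_{i=1}^{N} \left\| \Delta S_i^{(t)} - V \right\|^2
    = \frac{1}{N} \sum_{i=1}^{N} \Delta S_i^{(t)} \tag{8}
```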

In the subsequent S17, the image processing device 20 initializes to 1 the regression tree index k used to identify regression trees.
In the subsequent S18, the image processing device 20 calculates the remaining correction amount c_k for each learning image using equation (9).

Equation (9) means that the remaining correction amount c_k is the difference between the correction residual ΔS_i^(t) calculated in S15 and the correction amount given by the correction function f_(k-1) being learned.
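Equation (9), with the learning image index i made explicit, can be written as follows, where f_{k-1} is the correction function learned so far (f_0 when k = 1):

```latex
c_{ik} = \Delta S_i^{(t)} - f_{k-1}\!\left(I_i,\ S_i^{(t-1)}\right) \tag{9}
```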

In the subsequent S19, the image processing device 20 selects the pixel pairs used to generate the regression tree 41.
In the subsequent S20, the image processing device 20 uses the difference values of the pixel pairs selected in S19 as indices for classifying the learning images, and generates a regression tree 41 that yields values close to the remaining correction amount c_k for all learning images; that is, it generates the regression function g_k realized by the regression tree 41. The pixel pairs may be selected and the regression tree generated using, for example, the method described in Section 2.3.2 of Reference 1.

In the subsequent S21, the image processing device 20 updates the correction function f_k by equation (10), using the regression function g_k generated in S20.
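Equation (10), reconstructed from the description, is the shrunken boosting update; applying it for k = 1 to K reproduces equation (5):

```latex
f_k = f_{k-1} + \gamma\, g_k \tag{10}
```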

In the subsequent S22, the image processing device 20 increments the regression tree index k by 1.

In the subsequent S23, the image processing device 20 determines whether k > K. If the determination in S23 is negative, the process returns to S18, so that a new regression tree (that is, regression function g_k) is generated for the remaining correction amount c_k. If the determination in S23 is affirmative, that is, once K regression functions g_1 to g_K have been generated, the process proceeds to S24.
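The inner loop S16 to S23 is gradient boosting under squared loss and can be sketched as follows. As a stand-in for the regression trees of Reference 1, the sketch uses a hypothetical depth-1 regression tree (a stump) over precomputed pixel-pair difference features; a real implementation would grow deeper trees and choose pixel pairs as described in Section 2.3.2 of Reference 1. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def mean_vec(vs: Sequence[Vec]) -> Vec:
    return [sum(col) / len(vs) for col in zip(*vs)]


def fit_stump(features: Sequence[Sequence[float]],
              targets: Sequence[Vec]) -> Callable[[Sequence[float]], Vec]:
    """Depth-1 regression tree: best (feature j, threshold) under squared error.
    Each side predicts the mean target vector of the samples that reach it."""
    best = None
    for j in range(len(features[0])):
        for thr in sorted({f[j] for f in features}):
            hi = [t for f, t in zip(features, targets) if f[j] > thr]
            lo = [t for f, t in zip(features, targets) if f[j] <= thr]
            if not hi or not lo:
                continue
            m_hi, m_lo = mean_vec(hi), mean_vec(lo)
            err = (sum((a - b) ** 2 for t in hi for a, b in zip(t, m_hi))
                   + sum((a - b) ** 2 for t in lo for a, b in zip(t, m_lo)))
            if best is None or err < best[0]:
                best = (err, j, thr, m_hi, m_lo)
    if best is None:                                  # no valid split: constant output
        m = mean_vec(targets)
        return lambda f: m
    _, j, thr, m_hi, m_lo = best
    return lambda f: m_hi if f[j] > thr else m_lo


def fit_correction_function(features, residuals, num_trees: int, gamma: float):
    """Boost num_trees stumps onto the residuals (S16-S23, squared loss)."""
    f0 = mean_vec(residuals)                          # S16: equation (8)
    current = [list(f0) for _ in residuals]           # f_0 evaluated on each sample
    trees = []
    for _ in range(num_trees):                        # k = 1..K
        c = [[d - p for d, p in zip(r, q)]            # S18: equation (9)
             for r, q in zip(residuals, current)]
        g = fit_stump(features, c)                    # S19-S20 (stand-in tree)
        trees.append(g)
        current = [[p + gamma * v for p, v in zip(q, g(f))]   # S21: equation (10)
                   for q, f in zip(current, features)]

    def f_K(feature_row: Sequence[float]) -> Vec:
        out = list(f0)
        for g in trees:
            out = [o + gamma * v for o, v in zip(out, g(feature_row))]
        return out
    return f_K
```

With γ = 0.5 and residuals that a single split separates perfectly, three trees recover 1 - 0.5^3 = 0.875 of each residual; the shrinkage deliberately leaves part of the correction to later trees and later regression steps.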

In S24, the image processing device 20 takes the value of the correction function f_K obtained through S18 to S23 as the correction amount r_t, as shown in Expression (4), and updates the shape vector S according to Expression (2).
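The expressions cited in this learning process are not reproduced in this text. As a hedged reading aid, the relations below restate Expressions (2), (4), (9), and (10) only as far as the prose describes them; the arguments passed to f (the image and the shape from the previous step) and the learning-rate symbol ν are assumptions, and the patent's own typeset forms may differ.

```latex
\begin{aligned}
S^{(t)} &= S^{(t-1)} + r_t && \text{(cf. Expression (2))}\\
r_t &= f_K\!\left(I,\, S^{(t-1)}\right) && \text{(cf. Expression (4))}\\
c_k &= \Delta S_i^{(t)} - f_{k-1}\!\left(I_i,\, S_i^{(t-1)}\right) && \text{(cf. Expression (9))}\\
f_k &= f_{k-1} + \nu\, g_k && \text{(cf. Expression (10))}
\end{aligned}
```

Unrolling the last relation over k = 1, ..., K gives f_K = f_0 + ν(g_1 + ... + g_K), the same additive form that Expression (19) uses for the area detection.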

In subsequent S25, the image processing device 20 increments the regression step t by 1.
In subsequent S26, the image processing device 20 determines whether t > T. If the determination in S26 is negative, the process returns to S2, and the generation of the correction function f_K used to calculate the correction amount r_t for the regression step t is repeated. If the determination in S26 is affirmative, that is, once the T correction functions f_K used in the regression steps t = 1 to T have all been generated, the learning process for the feature point parameters ends.

In the first learning process, one correction function f_K is generated for each of the T regression steps, and K regression functions g_k are used for each correction function f_K, so a total of T×K regression functions g_k (that is, regression trees 41) are generated. Hereinafter, the regression tree 41 identified by the regression step t and the regression tree index k is denoted RT_tk.

[2-1-3. Processing]
The feature point detection process executed by the image processing device 20 in S1 will be described with reference to the flowchart of FIG. 5. The feature point detection process uses the feature point parameters generated by the first learning process.

First, in S31, the image processing device 20 acquires the captured image I from the camera 10.
In subsequent S32, the image processing device 20 initializes the regression step t to 1.
In subsequent S33, the image processing device 20 calculates the transformation matrix Q_t for the captured image I acquired in S31, using Expression (11).

Expression (11) means that the transformation matrix Q_t is the matrix Q that minimizes the following sum over all feature points p: the distance between the coordinates S_p^(t−1) of feature point p in the shape vector S^(t−1) currently obtained from the captured image I, and the result of applying Q to the coordinates S_p of feature point p in the average shape vector of the training data S_1 to S_N of the learning images.
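Expression (11) is a least-squares alignment problem. The sketch below assumes squared Euclidean distance and restricts Q to a 2D similarity transform (rotation, uniform scale, and translation), which stays well-posed even when the four eye-corner points are nearly collinear; the text itself only says "matrix Q", so this choice and all function names are assumptions. `warp_offset` illustrates how a pixel-pair offset might be corrected with Q_t in S35.

```python
import numpy as np

def fit_similarity(mean_shape, current_shape):
    """Least-squares similarity transform Q (2x3) mapping mean_shape onto current_shape.

    Both inputs are (P, 2) arrays of feature-point coordinates. Solves
    x' = a*x - b*y + tx,  y' = b*x + a*y + ty  for (a, b, tx, ty).
    """
    x, y = mean_shape[:, 0], mean_shape[:, 1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    M = np.vstack([
        np.column_stack([x, -y, ones, zeros]),   # equations for the x' coordinates
        np.column_stack([y, x, zeros, ones]),    # equations for the y' coordinates
    ])
    rhs = np.concatenate([current_shape[:, 0], current_shape[:, 1]])
    (a, b, tx, ty), *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return np.array([[a, -b, tx], [b, a, ty]])

def apply_transform(Q, points):
    """Map (P, 2) points through Q."""
    return points @ Q[:, :2].T + Q[:, 2]

def warp_offset(Q, offset):
    """Rotate and scale a pixel-pair offset; translation does not apply to offsets."""
    return Q[:, :2] @ np.asarray(offset, dtype=float)
```

A general affine fit would need non-collinear points; the similarity restriction avoids that, at the cost of ignoring shear.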

In subsequent S34, the image processing device 20 initializes the regression tree index k to 1.
In subsequent S35, the image processing device 20 obtains the value of the regression function g_k using the regression tree RT_tk identified by the regression step t and the regression tree index k, and updates the correction function f_k using Expression (10). When the regression tree RT_tk is used, however, the position of the pixel pair associated with each normal node 42 is corrected with the transformation matrix Q_t calculated in S33 before use.

In subsequent S36, the image processing device 20 increments the regression tree index k by 1.
In subsequent S37, the image processing device 20 determines whether k > K, that is, whether all the regression trees RT_t1 to RT_tK associated with the regression step t have been used to calculate the correction function f_k. If the determination in S37 is negative, the process returns to S35; if it is affirmative, the process proceeds to S38.

In S38, the image processing device 20 sets the value of the correction function f_K finally obtained through S35 to S37 as the correction amount r_t, and updates the shape vector S using Expression (2); that is, it calculates the shape vector S^(t).

In subsequent S39, the image processing device 20 increments the regression step t by 1.
In subsequent S40, the image processing device 20 determines whether t > T. If the determination in S40 is negative, the process returns to S33; if it is affirmative, the process ends.

That is, as shown in FIG. 6, the feature point detection process starts from the initial shape vector S^(0) and, at each regression step, calculates the correction amount r_t and generates the updated shape vector S^(t). The shape vector S^(T) obtained at the end is the estimate of the positions of the four feature points.
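The cascade of S32 to S40 can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: each regression tree is replaced by an arbitrary callable `g(image, shape)`, the per-step initial value f_0 and the learning rate `nu` are assumed parameters (Expression (10) is not reproduced in the text), and the pixel-pair correction with Q_t from S33 is left to the callable. The function name `detect_feature_points` and the `pull` stand-in are hypothetical.

```python
import numpy as np

def detect_feature_points(image, initial_shape, stages):
    """Run T regression steps; each stage is (f0, nu, trees) with K tree callables."""
    shape = np.asarray(initial_shape, dtype=float)     # S^(0)
    for f0, nu, trees in stages:                       # t = 1 .. T
        prev = shape                                   # S^(t-1), fixed during step t
        f = np.array(f0, dtype=float)                  # f_0 for this step
        for g in trees:                                # k = 1 .. K (S34-S37)
            f = f + nu * g(image, prev)                # f_k = f_{k-1} + nu * g_k
        shape = prev + f                               # S38: S^(t) = S^(t-1) + r_t
    return shape                                       # S^(T)

# Toy usage: a stand-in "tree" that points from the current shape to a fixed target.
target = np.ones((4, 2))
pull = lambda img, s: target - s
stages = [(np.zeros((4, 2)), 0.5, [pull])] * 3
S = detect_feature_points(None, np.zeros((4, 2)), stages)
# Each step closes half the remaining gap, so every entry of S is 0.875.
```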

[2-2. Eye determination processing]
[2-2-1. Overview]
The eye determination process determines whether the feature point detection process has correctly estimated the shape vector S representing the positions of the outer and inner corners of both eyes, that is, whether the eyes have been successfully detected.

This prevents an incorrect gaze area from being output by the subsequent processing when eye detection has failed. As shown in FIG. 7, eye detection can fail, for example, when no person appears in the image at all, or when a person appears but incorrect points are detected. It can also fail when part or all of an eye is hidden by some object and cannot be seen.

In the eye determination process, the center coordinates S_L and S_R of the left and right eyes are calculated from the shape vector S obtained by the feature point detection process, and a rectangular image area of width Dw and height Dh centered on each of S_L and S_R is set as the determination area I_G. The determination area I_G may be normalized to absorb variation in face size in the captured image I. Specifically, the determination area I_G may be enlarged or reduced so that the width between the outer and inner corners of each eye, or the distance between the eyes, calculated from the shape vector S, becomes a fixed size.
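A minimal sketch of how the determination areas might be computed from the shape vector. The ordering of the four feature points, the (x0, y0, x1, y1) rectangle convention, and normalization by the inter-eye distance (the text also allows normalizing by the corner-to-corner width) are assumptions; `determination_regions` is a hypothetical name.

```python
import numpy as np

# Hypothetical ordering of the four feature points in S (not stated in the text):
# 0: right outer corner, 1: right inner corner, 2: left inner corner, 3: left outer corner.
def determination_regions(shape, dw, dh, ref_eye_distance=None):
    """Return rectangles (x0, y0, x1, y1) of I_G around S_R and S_L, plus a resize factor."""
    pts = np.asarray(shape, dtype=float).reshape(4, 2)
    s_r = pts[0:2].mean(axis=0)                  # right-eye center S_R
    s_l = pts[2:4].mean(axis=0)                  # left-eye center S_L
    scale = 1.0
    if ref_eye_distance is not None:
        # Factor that would bring the inter-eye distance to a fixed reference size
        scale = ref_eye_distance / np.linalg.norm(s_l - s_r)
    w, h = dw / scale, dh / scale                # region size in original-image pixels

    def rect(c):
        return (c[0] - w / 2, c[1] - h / 2, c[0] + w / 2, c[1] + h / 2)

    return rect(s_r), rect(s_l), scale
```

Resizing each rectangle by `scale` then yields a Dw × Dh patch regardless of how large the face appears in the image.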

Furthermore, the eye determination process determines whether eye detection has succeeded by using a large number of weak classifiers that take information obtained from the determination area I_G as input.
Let M be the number of weak classifiers and m = 1, 2, ..., M. The m-th weak classifier h_m(I_G) is expressed by Expression (12). P_m1 and P_m2 are the luminances of a pixel pair selected from the determination area I_G; the position of each pixel in the pair is defined relative to the position indicated by one of the two center coordinates S_L and S_R, or relative to their average position. θ_m is a threshold, and α_mH and α_mL are weights.

That is, the weak classifier h_m(I_G) outputs the weight α_mH or α_mL as its determination result, depending on whether the difference P_m1 − P_m2 between the pixel-pair luminances is greater than the threshold θ_m.

Then, according to the output of the strong classifier H(I_G) shown in Expression (13), which combines the M weak classifiers h_m(I_G), eye detection is determined to have succeeded when H(I_G) > 0 and to have failed when H(I_G) ≤ 0. Here, λ is a threshold applied to the output of the strong classifier H(I_G) and can be set freely, taking the required determination accuracy and other factors into account.
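Expressions (12) and (13) describe thresholded pixel-pair classifiers summed into a strong classifier. The sketch below follows the prose (output α_mH when P_m1 − P_m2 > θ_m, otherwise α_mL; H = Σ h_m − λ) and also mirrors the accumulation of S61 to S66. As a simplification, pixel positions are given directly as (row, col) indices into a grayscale I_G rather than relative to S_L or S_R; the class and function names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WeakClassifier:
    p1: tuple          # (row, col) of the first pixel of the pair in I_G
    p2: tuple          # (row, col) of the second pixel
    theta: float       # threshold θ_m
    alpha_hi: float    # α_mH, output when P_m1 - P_m2 > θ_m
    alpha_lo: float    # α_mL, output otherwise

    def __call__(self, region):
        diff = int(region[self.p1]) - int(region[self.p2])  # avoid uint8 wrap-around
        return self.alpha_hi if diff > self.theta else self.alpha_lo

def eye_detected(region, classifiers, lam):
    j = sum(h(region) for h in classifiers)   # accumulated value J (S62-S64)
    return j - lam > 0                         # H = J - λ > 0 means success (S65-S66)
```

Casting to `int` before subtracting matters for 8-bit images: with `uint8` arithmetic, a negative difference would wrap around to a large positive value.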

Thus, the eye determination process requires the M weak classifiers h_m(I_G) to be prepared in advance.

The weak classifiers h_m(I_G) are generated by the image processing device 20 executing a learning process. The learning may use, for example, Real AdaBoost, an ensemble learning method. However, the learning process does not necessarily have to be executed by the image processing device 20; it may be executed by another device.

[2-2-2. Learning]
The learning process for generating the weak classifiers h_m(I_G) (hereinafter, the second learning process) will be described with reference to the flowchart of FIG. 8. When the second learning process is executed, a large number of learning images serving as learning samples are stored in the memory 22. The learning images include positive images in which the eyes are visible, as well as negative images in which the eyes are hidden or no person appears at all. Also prepared in advance are the results of applying the feature point detection process of S1 to these learning images, together with a prior judgment of whether each detection result represents a successful eye detection.

In S51, the image processing device 20 initializes the weight of each learning image. The initial weights may, for example, be set uniformly for all learning images.
In subsequent S52, the image processing device 20 initializes the classifier index m, which identifies a weak classifier, to 1, and sets the maximum number of weak classifiers generated by this process to M.

In subsequent S53, the image processing device 20 generates a weak classifier h_m(I_G). Specifically, for various pixel pairs, the difference value of each pixel pair is calculated for all learning images. The pixel pair P_m1, P_m2 and the threshold θ_m that best separate the images in which eye detection succeeded (hereinafter, success images) from those in which it failed (hereinafter, failure images) are then selected. The values of the weights α_mH and α_mL are set according to the separation accuracy. The weights α_mH and α_mL may be set, and the weight of each learning image updated, using, for example, the method described in Schapire, R. E. and Singer, Y.: Improved Boosting Algorithms Using Confidence-rated Predictions, Machine Learning.

In subsequent S54, the image processing device 20 determines whether the strong classifier H(I_G), which combines the m weak classifiers h_1(I_G) to h_m(I_G) generated so far, correctly separates all learning images into success images and failure images. If the determination in S54 is affirmative, the process proceeds to S58; if it is negative, the process proceeds to S55.

In S58, the image processing device 20 sets the number M of generated weak classifiers to the current value of the classifier index m and ends the process.
In S55, the image processing device 20 increments the classifier index m by 1.

In subsequent S56, the image processing device 20 determines whether m > M. If the determination in S56 is affirmative, the process ends; in this case, the M set in S52 is the number of generated weak classifiers. If the determination in S56 is negative, the process proceeds to S57.

In S57, the image processing device 20 updates the weights of the learning images so that the weights of the images that were not correctly separated in S54 increase, and returns the process to S53. That is, the weights are adjusted so that the next weak classifier generated can correctly separate the previously misclassified images.
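The second learning process (S51 to S57) can be sketched with the confidence-rated boosting of Schapire and Singer, using a two-bin split per candidate pixel pair. This is an illustrative Python sketch under stated assumptions: the candidate pixel-pair differences are precomputed into a matrix `X`, labels are +1 for success images and -1 for failure images, the S54 check uses λ = 0, and a small `eps` smooths the logarithms. It is not claimed to be the patent's exact weighting or update rule, and `train_real_adaboost` is a hypothetical name.

```python
import numpy as np

def train_real_adaboost(X, y, max_rounds, eps=1e-6):
    """X: (N, F) candidate pixel-pair differences; y: +1 success image, -1 failure image."""
    n, n_feat = X.shape
    w = np.full(n, 1.0 / n)                      # S51: uniform initial weights
    score = np.zeros(n)                          # running sum of weak outputs
    stumps = []
    for _ in range(max_rounds):                  # at most M weak classifiers
        best = None
        for j in range(n_feat):                  # S53: search pixel pairs ...
            for theta in np.unique(X[:, j]):     # ... and thresholds θ_m
                hi = X[:, j] > theta
                wp_hi, wn_hi = w[hi & (y > 0)].sum(), w[hi & (y < 0)].sum()
                wp_lo, wn_lo = w[~hi & (y > 0)].sum(), w[~hi & (y < 0)].sum()
                z = 2.0 * (np.sqrt(wp_hi * wn_hi) + np.sqrt(wp_lo * wn_lo))
                if best is None or z < best[0]:
                    a_hi = 0.5 * np.log((wp_hi + eps) / (wn_hi + eps))   # α_mH
                    a_lo = 0.5 * np.log((wp_lo + eps) / (wn_lo + eps))   # α_mL
                    best = (z, j, theta, a_hi, a_lo)
        _, j, theta, a_hi, a_lo = best
        stumps.append((j, theta, a_hi, a_lo))
        h = np.where(X[:, j] > theta, a_hi, a_lo)
        score += h
        if np.all(np.sign(score) == y):          # S54: every image separated (λ = 0 assumed)
            break
        w *= np.exp(-y * h)                      # S57: raise weights of misclassified images
        w /= w.sum()
    return stumps
```

The split criterion z is the normalization factor that Schapire and Singer minimize for domain-partitioning weak hypotheses; minimizing it favors pixel pairs whose two bins are each dominated by one class.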

[2-2-3. Processing]
The eye determination process executed by the image processing device 20 in S2 will be described with reference to the flowchart of FIG. 9. The eye determination process uses the M weak classifiers h_m(I_G) generated by the second learning process.

In S61, the image processing device 20 initializes the classifier index m to 1 and the accumulated value J of the weak classifier outputs to 0.
In subsequent S62, the image processing device 20 applies the weak classifier h_m(I_G) to the pixel pair extracted from the determination area I_G, which is set based on the shape vector S, and accumulates the result as shown in Expression (14). This calculation corresponds to computing the first term on the right side of Expression (13).

In subsequent S63, the image processing device 20 increments the classifier index m by 1.

In subsequent S64, the image processing device 20 determines whether m > M. If the determination in S64 is affirmative, all M weak classifiers have been evaluated, and the process proceeds to S65. If it is negative, some weak classifiers remain to be evaluated, and the process returns to S62.

In S65, the image processing device 20 calculates the determination value H of the strong classifier H(I_G) by subtracting the threshold λ from the accumulated value J of the weak classifier outputs, as shown in Expression (15). Expressions (14) and (15) together implement Expression (13) above.

In subsequent S66, the image processing device 20 determines whether the determination value H is greater than 0.

If the determination in S66 is affirmative, the image processing device 20 advances the process to S67, determines that eye detection has succeeded, and ends the process.
If the determination in S66 is negative, the image processing device 20 advances the process to S68, determines that eye detection has failed, and ends the process. The result of this determination is passed to the subsequent processing, for example by means of a flag.

[2-3. Area determination processing]
[2-3-1. Overview]
This section describes the method, used in the area determination process, of detecting the gaze area, that is, the area at which an eye is gazing, from image information of the region around that eye.

The eye peripheral area I_E is set based on the shape vector S (that is, the plurality of feature points) for which eye detection was determined to have succeeded. Specifically, as shown in FIG. 10, a rectangular area of predetermined size that includes the region from the outer corner to the inner corner of the eye is used as the eye peripheral area I_E for each of the right and left eyes.

Alternatively, an area similar to the determination area I_G used for eye determination may be used as the eye peripheral area.
As shown in FIG. 11, for example, five gaze areas are set: a front area a_1, an upper area a_2, a lower area a_3, a right area a_4, and a left area a_5. The closed-eye state may also be set as a separate category. The shape and number of gaze areas are not limited to this example.

Here, a vector having as many elements as the number D of gaze areas is called a score vector A. The score vector A is expressed by Expression (16), and each element is associated with one of the gaze areas a_1 to a_5. The score vector A takes a one-hot-like form in which the element for the applicable gaze area has a large value and the elements for the other gaze areas have small values; the gaze area corresponding to the element with the largest value is finally taken as the detection result. The score vector A corresponds to the gaze vector.

In the area determination process, as shown in Expression (17), an initial value A^(0) of the score vector A is given and then corrected by a correction amount R calculated from information obtained from the image around the eye, yielding a score vector A that represents the area at which the line of sight is directed.

As shown in Expression (18), the correction amount R is obtained by applying the correction function F_K to the initial score vector A^(0) and the eye peripheral area I_E as input. Like the function used in the feature point detection process, F_K is an additive model of regression functions.

The correction function F_K is defined by Expression (19); its value is determined by regression using K regression trees prepared in advance. F_0 is the initial value of the correction function F_K, G_k is the regression function whose value is determined by the regression tree identified by k, and k = 1, 2, ..., K. γ is the learning rate, set in the range 0 < γ < 1; a smaller γ suppresses overfitting. K and γ need not take the same values as in eye detection.
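Collecting the prose of this section, Expressions (16) to (19) and the final decision can be restated as below. The element symbols s_d (chosen to avoid clashing with the teacher data A_1 to A_N used later) and the argmax notation are ours; the patent's typeset expressions are not reproduced in this text.

```latex
\begin{aligned}
A &= \left(s_1, s_2, \dots, s_D\right)^{\top}, \quad s_d \text{ is the score of gaze area } a_d && \text{(cf. (16))}\\
A &= A^{(0)} + R && \text{(cf. (17))}\\
R &= F_K\!\left(A^{(0)},\, I_E\right) && \text{(cf. (18))}\\
F_K &= F_0 + \gamma \sum_{k=1}^{K} G_k, \quad 0 < \gamma < 1 && \text{(cf. (19))}\\
\hat{a} &= a_{d^\ast}, \quad d^\ast = \arg\max_{d} s_d
\end{aligned}
```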

The structure of the regression trees is the same as that of the regression trees used in the feature point detection process, so its description is omitted.

In the area detection process, the initial score vector A^(0) is corrected by the correction amount R calculated with the correction function F_K, and the gaze area corresponding to the element with the maximum score in the corrected score vector A is output as the gaze area detection result. If the maximum score is less than or equal to a preset threshold, the output may be regarded as unreliable and left undetermined.

Thus, the area detection process requires the initial value A^(0) of the score vector A, the initial value F_0 of the correction function F_K, and the parameters defining the regression trees used to build F_K to be prepared in advance. Hereinafter, these parameters are collectively referred to as the area parameters.

The area parameters are generated by the image processing device 20 executing a learning process. However, the learning process does not necessarily have to be executed by the image processing device 20; it may be executed by another device.

[2-3-2. Learning]
The learning process for setting the area parameters (hereinafter, the third learning process) will be described with reference to the flowchart of FIG. 12. When the third learning process is executed, a large number of learning images serving as learning samples are stored in the memory 22. Here, N learning images are used, and the gaze area of the eyes in each learning image is assumed to have been labeled manually in advance.

First, in S71, the image processing device 20 acquires the gaze area labels of the learning images and generates the correct values of the score vector A (hereinafter, teacher data) A_1 to A_N for the respective learning images.

In subsequent S72, the image processing device 20 calculates the average of the teacher data A_1 to A_N generated in S71 as the initial value A^(0) of the score vector.
In subsequent S73, the image processing device 20 calculates, for each learning image, the correction residual ΔA_i, which is the difference between the teacher data A_i and the initial value A^(0), according to Expression (20).

In subsequent S74, the image processing device 20 calculates, using Expression (21), the initial value F_0 of the correction function F_K used to calculate the correction amount R.

Expression (21) means that the initial value F_0 of the correction function F_K is the vector V, of the same dimension as the score vector A, that minimizes the sum over all learning images of the distance between each image's correction residual ΔA_i and V.
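If the distance in Expression (21) is taken to be the squared Euclidean distance (an assumption; the text only says "distance"), F_0 has a closed form. Combined with S72, which sets A^(0) to the mean of the teacher data, it is in fact the zero vector:

```latex
\begin{aligned}
A^{(0)} &= \frac{1}{N}\sum_{i=1}^{N} A_i, \qquad \Delta A_i = A_i - A^{(0)}\\
F_0 &= \arg\min_{V} \sum_{i=1}^{N} \left\lVert \Delta A_i - V \right\rVert^2\\
0 &= \frac{\partial}{\partial V}\sum_{i=1}^{N} \left\lVert \Delta A_i - V \right\rVert^2 = -2\sum_{i=1}^{N}\left(\Delta A_i - V\right)\\
\Longrightarrow\; F_0 &= \frac{1}{N}\sum_{i=1}^{N}\Delta A_i = \frac{1}{N}\sum_{i=1}^{N}A_i - A^{(0)} = 0
\end{aligned}
```

Under this assumption, the whole correction amount R comes from the regression trees G_k. If the plain (unsquared) Euclidean distance were meant, the minimizer would instead be the geometric median of the residuals, which is not zero in general.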

In subsequent S75, the image processing device 20 initializes the regression tree index k, used to identify the regression trees, to 1.
In subsequent S76, the image processing device 20 calculates the remaining correction amount c_k for each learning image using Expression (22).

In subsequent S77, the image processing device 20 selects the pixel pairs used to generate a regression tree. Each pixel position is expressed relative to a base point determined by the shape vector representing the eye positions (that is, the plurality of feature points). A point on the captured image corresponding to the outer corner and/or the inner corner of the eye is used as the base point.

In subsequent S78, the image processing device 20 uses the difference value of the pixel pair selected in S77 as the criterion for splitting the learning images, and generates a regression tree that yields values close to the remaining correction amount c_k for all learning images; that is, it generates the regression function G_k realized by the regression tree.

In subsequent S79, the image processing device 20 updates the correction function F_k by Expression (23), using the regression function G_k generated in S78.

In subsequent S80, the image processing device 20 increments the regression tree index k by 1.

In subsequent S81, the image processing device 20 determines whether k > K. If the determination in S81 is negative, the process returns to S76 to create a new regression tree; if it is affirmative, the process ends.

The processing of S73 to S81 is the same as that of S15 to S23 described above.
In the third learning process, K regression functions G_k (that is, K regression trees) are generated for the single correction function F_K.
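The third learning process (S71 to S81) is ordinary gradient boosting on the score vector. The sketch below makes several simplifying assumptions: each image's pixel-pair differences (measured relative to an eye-corner base point, per S77) are precomputed into a row of `X`; each regression tree is a depth-1 stump; F_0 uses the squared-distance reading of Expression (21); and `train_area_regressor`, `fit_stump`, and `predict_stump` are hypothetical names.

```python
import numpy as np

def fit_stump(X, C):
    """Depth-1 regression tree fitted to residuals C; returns (j, theta, v_hi, v_lo)."""
    best = None
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j])[:-1]:      # keep both branches non-empty
            hi = X[:, j] > theta
            v_hi, v_lo = C[hi].mean(axis=0), C[~hi].mean(axis=0)
            sse = ((C[hi] - v_hi) ** 2).sum() + ((C[~hi] - v_lo) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, theta, v_hi, v_lo)
    return best[1:]

def predict_stump(stump, X):
    j, theta, v_hi, v_lo = stump
    return np.where((X[:, j] > theta)[:, None], v_hi, v_lo)

def train_area_regressor(X, A, K, gamma):
    """X: (N, F) pixel-pair differences; A: (N, D) teacher score vectors A_1..A_N."""
    A0 = A.mean(axis=0)                           # S72: initial score vector A^(0)
    dA = A - A0                                   # S73: residuals ΔA_i, Expression (20)
    F0 = dA.mean(axis=0)                          # S74: Expression (21), squared distance
    F = np.tile(F0, (len(X), 1))                  # F_0 evaluated for every image
    trees = []
    for _ in range(K):                            # S75-S81: k = 1 .. K
        C = dA - F                                # S76: remaining correction c_k
        stump = fit_stump(X, C)                   # S77-S78: regression function G_k
        F = F + gamma * predict_stump(stump, X)   # S79: F_k = F_{k-1} + γ G_k
        trees.append(stump)
    return A0, F0, trees
```

With γ = 0.5 on separable toy data, each round removes half of the remaining residual, which illustrates how a small γ trades convergence speed for the overfitting suppression mentioned above.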

[2-3-3. Processing]
The area detection process executed by the image processing device 20 in S4 will be described with reference to the flowchart of FIG. 13. The area detection process uses the area parameters generated by the learning process described above.

First, in S91, the image processing device 20 acquires the image of the eye peripheral area I_E set based on the shape vector S^(T) estimated in S1.
In subsequent S92, the image processing device 20 initializes the regression tree index k to 1.

In subsequent S93, the image processing device 20 obtains the value of the regression function G_k using the regression tree RT_k identified by the regression tree index k, and updates the correction function F_k using Expression (23).
In subsequent S94, the image processing device 20 increments the regression tree index k by 1.

In subsequent S95, the image processing device 20 determines whether k > K, that is, whether all regression trees RT_1 to RT_K have been used to calculate the correction function F_k. If the determination in S95 is negative, the process returns to S93; if it is affirmative, the process proceeds to S96. Through S93 to S95, the correction function F_K of Expression (19) is finally obtained, and, as shown in Expression (18), the value it yields is the correction amount R applied to the initial score vector A^(0).

In S96, the image processing device 20 generates the corrected score vector A using Expression (17).
In subsequent S97, the image processing device 20 extracts the element with the maximum score SC from the corrected score vector A.

In subsequent S98, the image processing device 20 determines whether SC > TH, where TH is a threshold for judging the reliability of the determination result. If the determination in S98 is affirmative, the process proceeds to S99; if it is negative, the process proceeds to S100.

In S99, the image processing apparatus 20 outputs the gaze area associated with the element having the maximum score SC as the detection result and ends the processing.
In S100, the image processing apparatus 20 judges the gaze-area detection result to be of low reliability, invalidates it, and ends the processing.
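The S93 to S100 flow can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the embodiment's implementation. Equations (17) to (19) are not reproduced in this section, so it assumes that the corrected score vector is A = A⁽⁰⁾ + R, and that R is the learning-rate-weighted sum of the outputs of regression trees RT_1 to RT_K. Each tree is assumed to branch on pixel-pair luminance differences. The names Split, Leaf, detect_gaze_area and learning_rate, and the gaze-area labels, are illustrative assumptions. For simplicity, pixel positions are taken as absolute coordinates in an eye-region crop.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union

Image = Sequence[Sequence[float]]  # grayscale eye-region crop, img[row][col]


@dataclass
class Leaf:
    """Terminal node: a correction vector with one entry per gaze area."""
    value: List[float]


@dataclass
class Split:
    """Internal node: branches on the luminance difference of one pixel pair."""
    p1: Tuple[int, int]  # (row, col) of the first pixel
    p2: Tuple[int, int]  # (row, col) of the second pixel
    threshold: float
    left: "Node"   # taken when the difference is below the threshold
    right: "Node"  # taken otherwise


Node = Union[Split, Leaf]


def evaluate_tree(node: Node, img: Image) -> List[float]:
    """Walk one regression tree from the root down to a terminal node."""
    while isinstance(node, Split):
        (r1, c1), (r2, c2) = node.p1, node.p2
        diff = img[r1][c1] - img[r2][c2]
        node = node.left if diff < node.threshold else node.right
    return node.value


def detect_gaze_area(img: Image, a0: List[float], trees: List[Node],
                     learning_rate: float, areas: List[str],
                     th: float) -> Optional[str]:
    # S93-S95: accumulate the correction amount R over the trees RT_1..RT_K.
    r = [0.0] * len(a0)
    for tree in trees:
        out = evaluate_tree(tree, img)
        r = [ri + learning_rate * oi for ri, oi in zip(r, out)]
    # S96: corrected score vector, assuming the additive form A = A0 + R.
    a = [ai + ri for ai, ri in zip(a0, r)]
    # S97: pick the element with the maximum score SC.
    best = max(range(len(a)), key=lambda i: a[i])
    sc = a[best]
    # S98-S100: output the gaze area only if SC exceeds TH; otherwise invalidate.
    return areas[best] if sc > th else None
```

Returning None stands in for the invalidation in S100. A caller can therefore treat "no reliable gaze area" separately from any specific area.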

[3. Effects]
According to the embodiment described in detail above, the following effects are achieved.
(3a) In the present embodiment, the eye positions are detected from the captured image using a regression-based method that repeatedly corrects the shape vector S representing the positions of a plurality of feature points. The feature points are thereby detected directly and simultaneously from the entire captured image I. Unlike the related art, neither face detection nor pattern matching is required, so the feature points can be detected efficiently. As a result, the processing time required to detect the eye positions can be shortened, and the memory efficiency of the processing can be improved.
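The iterative update in (3a) can be sketched as a cascade in which each stage computes a correction amount r from the image and the current shape vector S, and adds it to S. This is a minimal sketch under stated assumptions: the stage regressors here are arbitrary callables. make_toy_stage is a hypothetical stand-in for a learned stage (for example, a boosted set of regression trees) that simply moves S halfway toward a fixed target, and the names are illustrative.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]
Shape = List[float]  # flattened feature-point coordinates (x1, y1, x2, y2, ...)
# A stage regressor maps (image, current shape) to a correction amount r.
StageRegressor = Callable[[Image, Shape], Shape]


def detect_feature_points(img: Image, s0: Shape,
                          stages: List[StageRegressor]) -> Shape:
    """Iteratively update the shape vector S, starting from the initial value s0."""
    s = list(s0)
    for regress in stages:
        r = regress(img, s)  # correction amount computed from the image and S
        s = [si + ri for si, ri in zip(s, r)]
    return s


def make_toy_stage(target: Shape) -> StageRegressor:
    """Hypothetical stand-in for a learned stage: moves S halfway to a fixed target."""
    def stage(img: Image, s: Shape) -> Shape:
        return [0.5 * (t - si) for t, si in zip(target, s)]
    return stage
```

For example, starting from [0.0, 0.0] with three copies of make_toy_stage([8.0, 4.0]), the shape moves to [4.0, 2.0], then [6.0, 3.0], and finally [7.0, 3.5]. No face bounding box or template search appears anywhere in the loop, which is the point (3a) makes.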

(3b) In the present embodiment, the eye determination process determines whether the feature points detected by the feature point detection process correctly represent the eye positions. This prevents erroneous detection results from being passed to the region determination process and, in turn, prevents the gaze area from being detected incorrectly.
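The eye determination in (3b) uses a strong classifier built from weak classifiers whose inputs are pixel-pair differences in the determination region. A minimal sketch follows. It assumes AdaBoost-style decision stumps with a polarity and a vote weight alpha, and a strong classifier that compares the weighted vote with a bias. The source does not specify the form of the weak classifiers, so WeakClassifier, is_eye and their parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Region = Sequence[Sequence[float]]  # normalized determination region, region[row][col]


@dataclass
class WeakClassifier:
    """Decision stump on one pixel-pair luminance difference (assumed form)."""
    p1: Tuple[int, int]
    p2: Tuple[int, int]
    threshold: float
    polarity: int  # +1 or -1: which side of the threshold votes "eye"
    alpha: float   # weight of this classifier's vote

    def vote(self, region: Region) -> int:
        (r1, c1), (r2, c2) = self.p1, self.p2
        diff = region[r1][c1] - region[r2][c2]
        return 1 if self.polarity * (diff - self.threshold) > 0 else -1


def is_eye(region: Region, weak: List[WeakClassifier], bias: float = 0.0) -> bool:
    """Strong classifier: the weighted vote of the weak classifiers, compared with a bias."""
    score = sum(w.alpha * w.vote(region) for w in weak)
    return score > bias
```

Each weak vote costs one subtraction and one comparison. This is why pixel-pair differences keep the determination step inexpensive.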

(3c) In the present embodiment, the gaze area is estimated from the texture information of the eye peripheral region I_E. Conventional methods instead estimate the gaze area from detection results such as the head position, the face direction and the gaze angle. Compared with those methods, the processing time required to detect the gaze area can be shortened. Moreover, the accumulation of errors caused by using a plurality of detectors in series is reduced. The detection accuracy of the gaze area, and hence of the gaze direction, can therefore be improved.

(3d) In the present embodiment, regression trees are used to calculate the correction amount r of the shape vector S in the feature point detection process and the correction amount R of the score vector A in the region detection process. The difference value of a pixel pair is used as the parameter for the conditional branches of the regression trees. The pixel-pair difference value is also used as the input parameter of the weak classifiers in the eye determination process. In other words, all of these processes rely on simple parameters, so the processing can be kept light. Moreover, since both the feature point detection process and the region detection process use gradient boosting, high-speed operation can be achieved.
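The combination in (3d) of pixel-pair difference features and gradient-boosted regression trees can be illustrated with the following minimal training sketch. It assumes squared-error loss, for which the negative gradient is simply the residual. It also uses depth-1 trees (stumps) rather than the deeper regression trees of the embodiment. The targets stand in for correction amounts such as r or R, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Pair = Tuple[Tuple[int, int], Tuple[int, int]]  # ((row1, col1), (row2, col2))
Vec = List[float]
Stump = Tuple[int, float, Vec, Vec]  # (feature index, threshold, left value, right value)


def pixel_pair_features(img: Image, pairs: Sequence[Pair]) -> Vec:
    """Luminance difference of each pixel pair: the simple branching parameter."""
    return [img[r1][c1] - img[r2][c2] for (r1, c1), (r2, c2) in pairs]


def _mean(vecs: List[Vec], dim: int) -> Vec:
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def _sse(vecs: List[Vec], mean: Vec) -> float:
    return sum((v[d] - mean[d]) ** 2 for v in vecs for d in range(len(mean)))


def fit_stump(feats: List[Vec], residuals: List[Vec]) -> Stump:
    """Pick the (feature, threshold) split minimizing the squared error of the residuals."""
    dim = len(residuals[0])
    best = None
    for j in range(len(feats[0])):
        for t in sorted({f[j] for f in feats}):
            left = [r for f, r in zip(feats, residuals) if f[j] < t]
            right = [r for f, r in zip(feats, residuals) if f[j] >= t]
            lm, rm = _mean(left, dim), _mean(right, dim)
            err = _sse(left, lm) + _sse(right, rm)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    _, j, t, lm, rm = best
    return j, t, lm, rm


def predict_stump(stump: Stump, f: Vec) -> Vec:
    j, t, lm, rm = stump
    return lm if f[j] < t else rm


def boost(feats: List[Vec], targets: List[Vec], rounds: int, lr: float):
    """Gradient boosting with squared loss: each stump is fit to the current residuals."""
    dim = len(targets[0])
    base = _mean(targets, dim)
    preds = [list(base) for _ in targets]
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [[y[d] - p[d] for d in range(dim)] for y, p in zip(targets, preds)]
        stump = fit_stump(feats, residuals)
        stumps.append(stump)
        for p, f in zip(preds, feats):
            out = predict_stump(stump, f)
            for d in range(dim):
                p[d] += lr * out[d]
    return base, stumps, lr


def predict(model, f: Vec) -> Vec:
    """Sum the base value and the learning-rate-scaled outputs of all stumps."""
    base, stumps, lr = model
    out = list(base)
    for stump in stumps:
        out = [o + lr * v for o, v in zip(out, predict_stump(stump, f))]
    return out
```

At inference time, each stump costs one subtraction and one comparison per image. This is the lightweight property (3d) attributes to using simple parameters.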

(3e) In the present embodiment, the position of each pixel pair is corrected by applying the transformation matrix Q to the standard vector. This absorbs variations in the feature-point shape caused by rotation, enlargement, reduction and the like, which differ from one captured image to another. The detection accuracy of the processes that use pixel pairs can therefore be improved.
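The position correction in (3e) can be sketched under two assumptions. First, Q is a least-squares 2D similarity transform (rotation plus uniform scale) fitted between the mean shape and the current shape. Second, each pixel of a pair is defined by an anchor feature point plus an offset expressed in mean-shape coordinates. The closed form below is the standard one for 2D similarity fitting. The function names are hypothetical, and this is not necessarily how the embodiment defines Q.

```python
from typing import List, Tuple

Point = Tuple[float, float]      # (x, y)
Transform = Tuple[float, float]  # (a, b) encoding the matrix [[a, -b], [b, a]]


def fit_similarity(mean_shape: List[Point], shape: List[Point]) -> Transform:
    """Least-squares rotation+scale Q mapping the centered mean shape onto the centered shape."""
    n = len(mean_shape)
    mux = sum(p[0] for p in mean_shape) / n
    muy = sum(p[1] for p in mean_shape) / n
    mvx = sum(p[0] for p in shape) / n
    mvy = sum(p[1] for p in shape) / n
    num_a = num_b = den = 0.0
    for (ux, uy), (vx, vy) in zip(mean_shape, shape):
        ux, uy, vx, vy = ux - mux, uy - muy, vx - mvx, vy - mvy
        num_a += ux * vx + uy * vy  # projection onto the scaled-rotation "cos" term
        num_b += ux * vy - uy * vx  # projection onto the scaled-rotation "sin" term
        den += ux * ux + uy * uy
    return num_a / den, num_b / den


def pixel_position(shape: List[Point], anchor: int, offset: Point, q: Transform) -> Point:
    """Place a pixel relative to feature point `anchor`, with its offset transformed by Q."""
    a, b = q
    dx, dy = offset
    ax, ay = shape[anchor]
    return ax + a * dx - b * dy, ay + b * dx + a * dy
```

Translation is removed by centering before the fit and is restored by adding the anchor point. Q itself therefore only captures the rotation and scale that (3e) aims to absorb.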

[4. Other Embodiments]
Although embodiments of the present disclosure have been described above, the present disclosure is not limited to them and can be implemented with various modifications.

(4a) In the above embodiment, the luminance difference of a pixel pair is used as the input to the regression trees and the weak classifiers, but the present disclosure is not limited to this. For example, the absolute luminance value or the average luminance over a certain range may be used instead.

(4b) In the above embodiment, the regression function g_k is obtained using a regression tree, but a regression tree need not be used as long as the method uses a regression function.
(4c) In the above embodiment, the transformation matrix Q is used to correct the positions of the pixel pairs input to the regression trees, but the pixel-pair positions may also be used as they are, without correction by the transformation matrix Q.

(4d) In the above embodiment, the gaze vector is a score vector in which each element is associated with a gaze area, but the present disclosure is not limited to this. For example, the gaze vector may be a vector whose elements are coordinates in three-dimensional space, so that the direction in which the vector points directly indicates the gaze direction.

(4e) The image processing apparatus 20 and the method described in the present disclosure may be realized by a dedicated computer provided by configuring a processor and a memory programmed to execute one or more functions embodied by a computer program. Alternatively, they may be realized by a dedicated computer provided by configuring a processor with one or more dedicated hardware logic circuits. As a further alternative, they may be realized by one or more dedicated computers, each configured as a combination of a processor and a memory programmed to execute one or more functions and a processor configured with one or more hardware logic circuits. The computer program may be stored, as instructions executed by a computer, in a computer-readable non-transitory tangible recording medium. The functions of the units included in the image processing apparatus 20 need not be realized by software; all of them may be realized using one or more pieces of hardware.

(4f) Multiple functions of one constituent element in the above embodiment may be realized by multiple constituent elements. Likewise, one function of one constituent element may be realized by multiple constituent elements. Conversely, multiple functions of multiple constituent elements may be realized by one constituent element, and one function realized by multiple constituent elements may be realized by one constituent element. Part of the configuration of the above embodiment may be omitted. At least part of the configuration of the above embodiment may be added to, or substituted for, the configuration of another embodiment described above.

(4g) Besides the image processing apparatus 20 described above, the present disclosure can be implemented in various other forms. These include a system having the image processing apparatus 20 as a constituent element, a program for causing a computer to function as the image processing apparatus 20, a non-transitory tangible recording medium such as a semiconductor memory storing the program, an eye detection method, and a gaze area detection method.

1... Line-of-sight detection system, 10... Camera, 20... Image processing apparatus, 21... CPU, 22... Memory, 41... Regression tree, 42... Normal node, 43... Terminal node.

Claims

An image processing device comprising:
a feature point detection unit (20:S1) configured to detect a plurality of feature points representing eye positions from the entire captured image; and
a determination unit (20:S2) configured to set a determination region in the captured image so as to include the plurality of feature points detected by the feature point detection unit, and to determine, using information obtained from the determination region, whether or not the plurality of feature points correctly represent the eye positions,
wherein the feature point detection unit uses a vector representing the positions of the plurality of feature points as a shape vector, gives an initial value of the shape vector, and repeats a process of updating the shape vector using a correction amount of the shape vector calculated from information of the captured image, thereby detecting the plurality of feature points that represent the true eye positions in the captured image.

The image processing device according to claim 1, wherein
the feature point detection unit calculates the correction amount of the shape vector using a regression function representing a relationship between a feature amount calculated from the captured image and the correction amount of the shape vector.

The image processing device according to claim 2, wherein
the regression function used by the feature point detection unit is a regression tree learned by gradient boosting and is realized as a set of weak hypotheses having a tree structure.

The image processing device according to claim 2 or 3, wherein
the feature point detection unit uses, as the feature amount, a difference value between two pixels selected from the captured image.

The image processing device according to claim 4, wherein
the positions of the pixels used to calculate the feature amount are represented by relative positions from a base point determined by the shape vector.

The image processing device according to claim 5, wherein
the feature point detection unit uses a transformation matrix that reduces the positional deviation between the average positions of the plurality of feature points over a plurality of captured images and the positions of the plurality of feature points in each captured image, and selects the two pixels using the relative positions transformed by the transformation matrix.

The image processing device according to any one of claims 1 to 6, wherein
the feature point detection unit uses, as the feature points, points on the captured image corresponding to at least one of the outer corner and the inner corner of the eye.

The image processing device according to any one of claims 1 to 7, further comprising
a direction detection unit (20:S4) configured to detect, when the determination unit determines that the plurality of feature points correctly represent the eye positions, a gaze direction, which is the direction in which the eye in the captured image gazes, using information on an eye peripheral region set so as to include the positions of the plurality of feature points detected by the feature point detection unit.

The image processing device according to claim 8, wherein
the direction detection unit uses a vector representing the gaze direction as a gaze vector, gives an initial value of the gaze vector, corrects the gaze vector by a correction amount of the gaze vector calculated using information of the eye peripheral region, and detects the direction indicated by the corrected gaze vector as the gaze direction.

An image processing device comprising:
a feature point detection unit (20:S1) configured to detect a plurality of feature points representing eye positions from the entire captured image;
a determination unit (20:S2) configured to set a determination region in the captured image so as to include the plurality of feature points detected by the feature point detection unit, and to determine, using information obtained from the determination region, whether or not the plurality of feature points correctly represent the eye positions; and
a direction detection unit (20:S4) configured to detect, when the determination unit determines that the plurality of feature points correctly represent the eye positions, a gaze direction, which is the direction in which the eye in the captured image gazes, using information on an eye peripheral region set so as to include the positions of the plurality of feature points detected by the feature point detection unit,
wherein the direction detection unit uses a vector representing the gaze direction as a gaze vector, gives an initial value of the gaze vector, corrects the gaze vector by a correction amount of the gaze vector calculated using information of the eye peripheral region, and detects the direction indicated by the corrected gaze vector as the gaze direction.

The image processing device according to claim 9 or 10, wherein
the direction detection unit calculates the correction amount of the gaze vector using a regression function representing a relationship between a feature amount calculated from the captured image and the correction amount of the gaze vector.

The image processing device according to claim 11, wherein
the regression function used by the direction detection unit is a regression tree learned by gradient boosting and is realized as a set of weak hypotheses having a tree structure.

The image processing device according to any one of claims 9 to 12, wherein
the direction detection unit uses, as the feature amount, a difference value between two pixels selected from the captured image.

The image processing device according to claim 13, wherein
the positions of the pixels used by the direction detection unit to calculate the feature amount are represented by relative positions from a base point determined by the plurality of feature points detected by the feature point detection unit.

The image processing device according to claim 14, wherein
the direction detection unit uses, as the base point, a point on the captured image corresponding to at least one of the outer corner and the inner corner of the eye.

The image processing device according to any one of claims 9 to 15, wherein
the direction detection unit uses, as the gaze vector, a one-hot type vector having a plurality of elements, each associated with one of a plurality of preset gaze areas, in which the element associated with the gaze area located in the gaze direction has a large value and the other elements have small values, and takes, as the gaze direction, the direction in which the gaze area associated with the element having the largest value in the corrected gaze vector is located.

The image processing device according to claim 16, wherein
the direction detection unit takes, as the gaze direction, the direction in which the gaze area associated with the element having the largest value in the corrected gaze vector is located.

The image processing device according to any one of claims 1 to 17, wherein
the determination unit obtains a determination result using a strong classifier that is a combination of a plurality of weak classifiers.

The image processing device according to claim 18, wherein
the determination unit uses, as a parameter input to the weak classifiers, a difference value between two pixels selected from the determination region.

The image processing device according to claim 19, wherein
the determination unit uses the determination region normalized so that the size of the eye estimated from the plurality of feature points detected by the feature point detection unit is constant.

The image processing device according to any one of claims 1 to 20, wherein
a face image of a driver is used as the captured image.

An eye detection method for detecting eye positions from a captured image by image processing, the method comprising:
a feature point detection step (S1) of detecting a plurality of feature points representing the eye positions from the entire captured image; and
a determination step (S2) of setting a determination region in the captured image so as to include the plurality of feature points detected in the feature point detection step, and determining, using information obtained from the determination region, whether or not the plurality of feature points correctly represent the eye positions,
wherein, in the feature point detection step, a vector representing the positions of the plurality of feature points is used as a shape vector, an initial value of the shape vector is given, and a process of updating the shape vector using a correction amount of the shape vector calculated from information of the captured image is repeated, thereby detecting the plurality of feature points that represent the true eye positions in the captured image.

A gaze direction detection method for detecting, by image processing, a gaze direction that is the direction in which an eye in a captured image gazes, the method comprising:
a feature point detection step (S1) of detecting a plurality of feature points representing eye positions from the entire captured image;
a determination step (S2) of setting a determination region in the captured image so as to include the plurality of feature points detected in the feature point detection step, and determining, using information obtained from the determination region, whether or not the plurality of feature points correctly represent the eye positions; and
a direction detection step (S4) of detecting, when it is determined in the determination step that the plurality of feature points correctly represent the eye positions, the gaze direction of the eye in the captured image using information on an eye peripheral region set so as to include the positions of the plurality of feature points detected in the feature point detection step,
wherein, in the direction detection step, a vector representing the gaze direction is used as a gaze vector, an initial value of the gaze vector is given, the gaze vector is corrected by a correction amount of the gaze vector calculated using information of the eye peripheral region, and the direction indicated by the corrected gaze vector is detected as the gaze direction.