JP4692526B2

JP4692526B2 - Gaze direction estimation apparatus, gaze direction estimation method, and program for causing computer to execute gaze direction estimation method

Info

Publication number: JP4692526B2
Application number: JP2007185996A
Authority: JP
Inventors: 章内海; 大丈山添
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2006-07-18
Filing date: 2007-07-17
Publication date: 2011-06-01
Anticipated expiration: 2027-07-17
Also published as: JP2008102902A

Description

The present invention relates to image processing of images from a camera or the like, and more particularly to the field of image recognition for estimating and detecting the gaze direction of a person in an image.

Estimation of a person's gaze direction has conventionally been studied as, for example, one technique for man-machine interfaces.

Conventional camera-based gaze measurement methods can be classified as “head-mounted” or “non-mounted” according to where the camera is installed. In general, the head-mounted type is highly accurate but places a heavy burden on the user. In addition, because the camera coordinate system moves together with the head, some means of relating it to the external coordinate system is needed in order to determine the gaze target.

As a specific method, the pupil-corneal reflection method is well known: a near-infrared point light source (light emitting diode, LED) illuminates the eye, and the gaze is estimated from the image of the light source reflected by the cornea and the position of the pupil. It is used in both head-mounted and non-mounted systems. Under infrared illumination the contrast between pupil and iris is higher than under visible light, which makes the pupil easier to detect. However, the pupil is only a few millimeters in diameter and the reflection of the infrared source appears as a very small spot, so a high-resolution image is required. One eye is therefore imaged as large as possible. As a result, the non-mounted type has the problem that the eye leaves the camera's field of view when the face moves even slightly.

Non-mounted gaze estimation methods that do not use corneal reflection fall broadly into two types: stereo camera methods and monocular camera methods.

In the stereo camera method, the three-dimensional positions of facial feature points (artificially attached markers, or natural feature points such as the outer corners of the eyes) are first estimated by binocular stereo, and the eyeball center position is estimated from them. The gaze direction is then estimated as the straight line connecting the obtained eyeball center position and the pupil or iris position in the image. However, because the cameras must be calibrated against each other in advance, the observation range of the cameras cannot easily be changed.

The monocular camera method, on the other hand, can estimate the gaze even when the camera is far from the face, as long as the necessary image resolution can be obtained by zooming or the like.

Monocular camera methods include a neural-network approach that estimates the gaze from the image pattern of the eye, an approach that estimates the gaze direction (the normal direction of the iris) from the elliptical shape of the observed iris, and an approach that, as in the stereo camera method, extracts feature points and estimates the gaze from their geometric relationship to the iris position.

Among these, the method of estimating the gaze from the relative positional relationship between feature points on the face and the iris is intuitively easy to understand and has therefore been studied from early on.

To cope with changes in face orientation, Aoyama et al. estimate the face orientation using the trapezoid formed by the outer corners of both eyes and both corners of the mouth, and at the same time estimate how far the eyes deviate from the frontal view from the difference between the midpoint of the two outer eye corners and the midpoint of the left and right irises. They showed the principle of estimating the gaze direction by combining the two (see, for example, Non-Patent Document 1).

Ito et al. attach surgical tape or the like near the eye as an index point and have studied, as a means of communication for people with physical disabilities, a system that estimates the gaze direction from changes in the relative position of the index mark and the iris (see, for example, Non-Patent Document 2). In this case, the face must be kept in the same posture as during calibration.

Using a geometric model of the eyeball, Miyake et al. showed that if the left and right points where the straight line through the centers of the two eyeballs intersects the face surface are taken as reference points, the gaze direction can be calculated, regardless of face orientation, from the midpoint of those reference points in the image and the midpoint of the left and right irises (see, for example, Non-Patent Document 3).

Kawato et al. proposed a gaze direction estimation method that uses four reference points and is not affected by face orientation (see, for example, Non-Patent Document 4). In principle there is no restriction on the positions of the four reference points, provided they are not coplanar. The eyeball center is estimated from the four reference points, and the gaze direction is estimated as the straight line connecting the eyeball center and the iris center.

In the gaze direction estimation method of the present invention described below, a person's face is first detected in the image. Conventional techniques for detecting a face in an image include the following.

Many reports have been made on face detection systems that use skin color information and, among face detection methods that do not use color information (that is, methods that use grayscale information), on methods based on template matching or on learning techniques such as neural networks.

For example, as a highly stable method capable of tracking a face in real time, a method has been proposed that focuses on the point between the two eyes (hereinafter called the “between-the-eyes” point) as a stable facial feature point. Around this point the forehead and the bridge of the nose are relatively bright while the eyes and eyebrows on both sides are dark, and a ring frequency filter is used to detect this pattern (see, for example, Non-Patent Document 5 and Patent Document 1).

As another method, a face detection method has also been proposed in which digital data of the value of each pixel in a target image region including a human face region is prepared; the positions of between-the-eyes candidate points are extracted by successively filtering the target image region with a between-the-eyes detection filter composed of six joined rectangles; the target image is cut out at a predetermined size centered on each extracted candidate point; and the true candidate point is selected from among the candidates by pattern discrimination processing (see, for example, Patent Document 2).

Furthermore, a method for tracking the nose position in a face image in real time has also been reported (see, for example, Non-Patent Document 6).

Non-Patent Document 1: Hiroshi Aoyama, Masahiro Kawagoe, “Gaze sensing method using the plane symmetry of the face,” IPSJ SIG Technical Report 89-CV-61, pp. 1-8 (1989).
Non-Patent Document 2: Ito, Sudo, “Communication system based on eye movement using an image sensor,” IEICE Technical Report WIT99-39 (2000).
Non-Patent Document 3: Tetsuo Miyake, Seiji Haruta, Satoshi Horihata, “Gaze determination method using features independent of face orientation,” IEICE Transactions (D-II), Vol. J86-D-II, No. 12, pp. 1737-1744 (2003).
Non-Patent Document 4: Kawato, Utsumi, Abe, “Gaze estimation from a monocular camera based on four reference points and three calibration images,” Meeting on Image Recognition and Understanding (MIRU2005), pp. 1337-1342 (2005).
Non-Patent Document 5: Shinjiro Kawato, Nobuji Tetsutani, “Real-time detection of the between-the-eyes point using a ring frequency filter,” IEICE Transactions (D-II), Vol. J84-D-II, No. 12, pp. 2577-2584, Dec. 2001.
Non-Patent Document 6: Shinjiro Kawato, Nobuji Tetsutani, “Detection of nose position and real-time tracking,” IEICE Technical Report IE2002-263, pp. 25-29 (2003).
Patent Document 1: Japanese Patent Laid-Open No. 2001-52176
Patent Document 2: Japanese Patent Laid-Open No. 2004-185611

However, conventional monocular-camera methods for gaze detection have the problem that artificial marks are required as reference points.

An object of the present invention is therefore to provide a gaze direction estimation apparatus, a gaze direction estimation method, and a program for causing a computer to execute the gaze direction estimation method, which track the gaze in real time based on image information captured by a camera, without using artificial markers.

Still another object of the present invention is to provide a gaze direction estimation apparatus, a gaze direction estimation method, and a program for causing a computer to execute the gaze direction estimation method, which track the gaze in real time based on image information captured by a single camera (a monocular camera).

According to one aspect of the present invention, a gaze direction estimation apparatus includes: imaging means for acquiring an image; relative relationship specifying means for specifying, based on a frame image captured by the imaging means and including a human face region, the relative three-dimensional positional relationship among a plurality of feature points in the face region; and eyeball center estimation means for executing estimation processing of an eyeball center position, which is the three-dimensional position of the center of the human eyeball. The eyeball center estimation means executes the estimation processing using a previously specified relative three-dimensional positional relationship between the feature points and the eyeball center position, based on the relative three-dimensional positional relationship among the plurality of feature points specified by the relative relationship specifying means in the frame subject to the estimation processing. The apparatus further includes: iris center extraction means for extracting the iris center position, at which the iris center is projected onto the image, by matching the iris region against a model of the iris shape within the image region of the frame subject to the estimation processing; and gaze estimation means for estimating the gaze direction as the direction of the three-dimensional straight line connecting the eyeball center and the iris center, based on the iris center position extracted in the image of that frame and the eyeball center position.
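The final step of this aspect, taking the gaze as the line from the eyeball center through the iris center, can be sketched as follows. This is a minimal illustration rather than the claimed implementation: it assumes an orthographic camera looking along +z, an eyeball radius already expressed in pixels, and an iris center on the camera-facing half of the eyeball. The function names `gaze_direction` and `gaze_angles_deg` are hypothetical.

```python
import math


def gaze_direction(eye_center_xy, iris_center_xy, eyeball_radius):
    """Unit gaze vector from the eyeball center toward the iris center.

    Assumptions (not stated in the source text): orthographic projection,
    camera looking along +z, and eyeball_radius given in pixels.
    """
    dx = iris_center_xy[0] - eye_center_xy[0]
    dy = iris_center_xy[1] - eye_center_xy[1]
    planar_sq = dx * dx + dy * dy
    radius_sq = eyeball_radius * eyeball_radius
    if planar_sq > radius_sq:
        raise ValueError("iris center lies outside the projected eyeball")
    # The iris sits on the sphere surface facing the camera, so z is negative.
    dz = -math.sqrt(radius_sq - planar_sq)
    return (dx / eyeball_radius, dy / eyeball_radius, dz / eyeball_radius)


def gaze_angles_deg(direction):
    """Horizontal (yaw) and vertical (pitch) angles of a gaze vector.

    Both angles are zero when the person looks straight at the camera;
    positive pitch points toward the +y image axis.
    """
    x, y, z = direction
    yaw = math.degrees(math.atan2(x, -z))
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch


# Example: iris center displaced 6 px horizontally on a 12 px eyeball.
direction = gaze_direction((100.0, 100.0), (106.0, 100.0), 12.0)
yaw_deg, pitch_deg = gaze_angles_deg(direction)
```

Because the offset between the two projected centers is divided by the eyeball radius, any error in the assumed radius directly scales the estimated angle, which is one reason the eyeball center and radius are treated as quantities to be estimated.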

Preferably, the imaging means is a monocular imaging means for capturing and acquiring image data corresponding to each pixel in a target image region including a human face region.

Preferably, the relative relationship specifying means acquires in advance, at calibration time, a plurality of calibration images captured by the imaging means while the person is looking at the imaging means, and specifies the relative three-dimensional positional relationship between the feature points and the eyeball center; and the eyeball center estimation means estimates the projected position of the human eyeball center from the projected positions of the plurality of feature points detected in the target image region including the human face region captured by the imaging means, based on the specified relative three-dimensional positional relationship.

Preferably, the relative relationship specifying means includes: measurement matrix calculation means for extracting the projected positions of the plurality of feature points in the plurality of calibration images and calculating a measurement matrix whose elements are those projected positions; and factorization means for decomposing the measurement matrix, by factorization, into an imaging posture matrix whose elements carry information on the posture of the imaging means and a relative positional relationship matrix whose elements carry information on the relative three-dimensional positional relationship among the plurality of feature points.
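As a rough sketch of such a factorization, the following example decomposes a measurement matrix with a rank-3 singular value decomposition, in the style of the Tomasi-Kanade factorization method. The text above does not state which factorization algorithm is used, so the SVD, the orthographic camera model, the synthetic data, and the function name `factorize_measurements` are all assumptions. The two factors are determined only up to an invertible 3×3 transform, and the metric upgrade that would resolve this ambiguity is omitted.

```python
import numpy as np


def factorize_measurements(W):
    """Split a 2F x P measurement matrix into posture and shape factors.

    W stacks the x and y image coordinates of P feature points over F frames.
    Returns (posture, shape, centroid) with posture of size 2F x 3 and
    shape of size 3 x P, so that posture @ shape + centroid approximates W.
    """
    W = np.asarray(W, dtype=float)
    # Subtracting each row's mean removes the per-frame image translation.
    centroid = W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W - centroid, full_matrices=False)
    # Under an affine camera the centered matrix has rank at most 3.
    root = np.sqrt(s[:3])
    posture = U[:, :3] * root
    shape = root[:, None] * Vt[:3]
    return posture, shape, centroid


# Synthetic check: 6 rigid points seen in 3 frames under random rotations.
rng = np.random.default_rng(0)
points = rng.normal(size=(3, 6))
rows = []
for frame in range(3):
    rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    shift = rng.normal(size=(2, 1))
    rows.append(rotation[:2] @ points + shift)  # orthographic projection
W_demo = np.vstack(rows)

posture, shape, centroid = factorize_measurements(W_demo)
residual = np.abs(posture @ shape + centroid - W_demo).max()
```

With noise-free data the residual is at the level of floating-point rounding. With real feature tracks the discarded singular values give a quick indication of how well the rigid-face assumption holds.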

Preferably, the eyeball center estimation means includes feature point specifying means for associating the feature points observed in a captured image frame with the feature points in the calibration images, and estimates the projected position of the eyeball center from the observed feature points and the submatrix of the relative positional relationship matrix corresponding to the feature points observed in the captured image frame.
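One way to picture this estimate is shown below: the columns of the shape matrix that correspond to the currently visible feature points are used to fit an affine camera for the frame, and that camera then projects the eyeball center. The least-squares fit, the affine camera model, the sample coordinates, and the function name `project_eye_center` are assumptions made for illustration; the text above does not specify the solver.

```python
import numpy as np


def project_eye_center(shape_sub, observed_xy, eye_center):
    """Estimate the image position of the eyeball center in one frame.

    shape_sub   : 3 x K face-coordinate positions of the K observed points
                  (the submatrix of the relative positional relationship matrix)
    observed_xy : 2 x K image positions of the same points in this frame
    eye_center  : length-3 eyeball center in the same face coordinates
    """
    shape_sub = np.asarray(shape_sub, dtype=float)
    observed_xy = np.asarray(observed_xy, dtype=float)
    k = shape_sub.shape[1]
    if k < 4:
        raise ValueError("need at least four non-coplanar feature points")
    # Solve [X Y Z 1] @ camera = [x y] for the 4 x 2 affine camera.
    design = np.hstack([shape_sub.T, np.ones((k, 1))])
    camera, *_ = np.linalg.lstsq(design, observed_xy.T, rcond=None)
    homogeneous_center = np.append(np.asarray(eye_center, dtype=float), 1.0)
    return homogeneous_center @ camera


# Synthetic check: six calibrated points, of which four are visible.
shape_all = np.array([[0.0, 1.0, 0.0, 0.0, 1.0, 2.0],
                      [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
                      [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]])
eye_center = np.array([0.5, 0.2, 1.5])
visible = [0, 1, 3, 4]  # indices of feature points found in this frame

true_pose = np.array([[0.9, 0.1, -0.2],
                      [-0.1, 0.95, 0.3]])
true_shift = np.array([320.0, 240.0])
observed = true_pose @ shape_all[:, visible] + true_shift[:, None]

estimated = project_eye_center(shape_all[:, visible], observed, eye_center)
expected = true_pose @ eye_center + true_shift
```

Working with only the submatrix of visible points is what allows the estimate to continue when some feature points are occluded or fail to be tracked in a given frame.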

Preferably, the relative relationship specifying means acquires in advance, at calibration time, a plurality of calibration images captured by the imaging means, normalizes the calibration images, and then specifies the relative three-dimensional positional relationship between the plurality of feature points in the face region and the eyeball center; and the eyeball center estimation means estimates the projected position of the human eyeball center by matching, within the target image region including the human face region captured by the imaging means, the iris region in the captured image against an iris model region that is calculated and projected using an eyeball center position and an eyeball radius assumed in the face coordinate system.

Preferably, the relative relationship specifying means normalizes the image captured by the imaging means and then specifies the relative three-dimensional positional relationship among the plurality of feature points in the face region; and the eyeball center estimation means estimates the position of the human eyeball center by matching, within the target image region including the human face region captured by the imaging means, the iris region in the captured image against an iris model region that is calculated and projected using an eyeball center position and an eyeball radius assumed in the face coordinate system.

Preferably, the iris center extraction means extracts the iris center position by matching, within the target image region including the human face region captured by the imaging means, the iris region in the captured image against an iris model region that is calculated and projected using an eyeball center position and an eyeball radius assumed in the face coordinate system.

According to another aspect of the present invention, a gaze direction estimation method includes the steps of: capturing and acquiring, with a monocular imaging means, image data corresponding to each pixel in a target image region of a plurality of frames including a human face region; acquiring in advance a calibration image captured by the imaging means while the person is looking at the imaging means, and specifying the relative three-dimensional positional relationship between a plurality of feature points in the face region and the eyeball center; detecting the projected positions of the plurality of feature points in the target image region captured by the imaging means, and estimating the three-dimensional position of the human eyeball center, using the relative three-dimensional positional relationship between the eyeball center position and the feature points specified in advance at calibration time, based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame; extracting the iris center position, at which the iris center is projected onto the image, by matching the iris region against a model of the iris shape within the image region of the frame; and estimating the gaze direction as the direction of the three-dimensional straight line connecting the eyeball center and the iris center, based on the extracted iris center position and the projected position of the eyeball center.

According to still another aspect of the present invention, a gaze direction estimation method includes the steps of: capturing and acquiring, with a monocular imaging means, image data corresponding to each pixel in a target image region, including a human face region, of a frame image; acquiring an image captured by the imaging means, normalizing the image, and then specifying the relative three-dimensional positional relationship among a plurality of feature points in the face region; and detecting the projected positions of the plurality of feature points in the target image region captured by the imaging means, and estimating the three-dimensional position of the human eyeball center, using the relative three-dimensional positional relationship between the human eyeball center and the feature points specified in a frame captured before the target frame, based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame. The estimating step includes estimating the position of the human eyeball center by matching, within the target image region including the human face region captured by the imaging means, the iris region in the captured image against an iris model region that is calculated and projected using an eyeball center position and an eyeball radius assumed in the face coordinate system. The method further includes the steps of: extracting the iris center position, at which the iris center is projected onto the image, by matching the iris region against a model of the iris shape within the target image region of each frame; and estimating the gaze direction as the direction of the three-dimensional straight line connecting the eyeball center and the iris center, based on the extracted iris center position and the eyeball center position.

According to still another aspect of the present invention, a program causes a computer having arithmetic processing means to execute gaze direction estimation processing on a face in a target image region. The program causes the computer to execute the steps of: capturing and acquiring, with a monocular imaging means, image data corresponding to each pixel in a target image region of a plurality of frames including a human face region; acquiring in advance a calibration image captured by the imaging means while the person is looking at the imaging means, and specifying the relative three-dimensional positional relationship between a plurality of feature points in the face region and the eyeball center; detecting the projected positions of the plurality of feature points in the target image region captured by the imaging means, and estimating the three-dimensional position of the human eyeball center, using the relative three-dimensional positional relationship between the eyeball center position and the feature points specified in advance at calibration time, based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame; extracting the iris center position, at which the iris center is projected onto the image, by matching the iris region against a model of the iris shape within the image region of the frame; and estimating the gaze direction as the direction of the three-dimensional straight line connecting the eyeball center and the iris center, based on the extracted iris center position and the projected position of the eyeball center.

According to still another aspect of the present invention, a program causes a computer having arithmetic processing means to execute gaze direction estimation processing on a face in a target image region. The program causes the computer to execute the steps of: capturing and acquiring, with a monocular imaging means, image data corresponding to each pixel in a target image region, including a human face region, of a frame image; acquiring an image captured by the imaging means, normalizing the image, and then specifying the relative three-dimensional positional relationship among a plurality of feature points in the face region; and detecting the projected positions of the plurality of feature points in the target image region captured by the imaging means, and estimating the three-dimensional position of the human eyeball center, using the relative three-dimensional positional relationship between the human eyeball center and the feature points specified in a frame captured before the target frame, based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame. The estimating step includes estimating the position of the human eyeball center by matching, within the target image region including the human face region captured by the imaging means, the iris region in the captured image against an iris model region that is calculated and projected using an eyeball center position and an eyeball radius assumed in the face coordinate system. The program further causes the computer to execute the steps of: extracting the iris center position, at which the iris center is projected onto the image, by matching the iris region against a model of the iris shape within the target image region of each frame; and estimating the gaze direction as the direction of the three-dimensional straight line connecting the eyeball center and the iris center, based on the extracted iris center position and the eyeball center position.

[Embodiment 1]
[Hardware configuration]
A “gaze direction estimation apparatus” according to Embodiment 1 of the present invention is described below. The apparatus is realized by software executed on a computer such as a personal computer or workstation. It extracts a person's face from a target image and then estimates and detects the gaze direction based on the video of the person's face. FIG. 1 shows the external appearance of the gaze direction estimation apparatus.

However, some or all of the functions of the “gaze direction estimation apparatus” described below may instead be realized by hardware.

Referring to FIG. 1, the system 20 constituting the gaze direction estimation apparatus includes: a computer main body 40 equipped with drive devices for reading data from recording media, such as a CD-ROM (Compact Disc Read-Only Memory) or DVD-ROM (Digital Versatile Disc Read-Only Memory) drive (hereinafter “optical disk drive”) 50 and an FD (Flexible Disk) drive 52; a display 42 connected to the computer main body 40 as a display device; a keyboard 46 and a mouse 48, likewise connected to the computer main body 40, as input devices; and a monocular camera 30 connected to the computer main body 40 for capturing images. In this embodiment, a camera including a solid-state image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) sensor is used as the monocular camera 30, and the apparatus estimates and detects the face position and gaze of the person who is in front of the camera 30 operating the system 20.

That is, the camera 30 provides digital data of the value of each pixel in the target image region of an image that includes a human face region.

The camera 30 is not necessarily limited to a monocular camera; gaze direction estimation as described below can also be performed based on images from multiple viewpoints. As the following description makes clear, however, the gaze direction estimation apparatus of the present invention can estimate the gaze even when relying solely on image information from a monocular camera.

FIG. 2 shows an example in which the result of processing by the computer main body 40, based on an image captured by the camera 30, is displayed on the display 42.

As shown in FIG. 2, the image captured by the camera 30 is displayed in real time as a moving image in the captured image display area 200 of the display 42. Although not particularly limited, a line segment extending from between the eyebrows in the gaze direction may, for example, be displayed on the captured image display area 200 as an indicator of the gaze direction.

FIG. 3 shows the configuration of the system 20 in block diagram form. As shown in FIG. 3, the computer main body 40 constituting the system 20 includes, in addition to the optical disk drive 50 and the FD drive 52, a CPU (Central Processing Unit) 56, a ROM (Read Only Memory) 58, a RAM (Random Access Memory) 60, a hard disk 54, and an image capture device 68 for capturing images from the camera 30, each connected to a bus 66. A CD-ROM (or DVD-ROM) 62 is loaded into the optical disk drive 50. An FD 64 is loaded into the FD drive 52.

As already described, the main part of the gaze direction estimation apparatus is realized by computer hardware and by software executed by the CPU 56. Such software is generally distributed on a storage medium such as the CD-ROM (or DVD-ROM) 62 or the FD 64, read from the storage medium by the optical disk drive 50, the FD drive 52, or the like, and temporarily stored on the hard disk 54. Alternatively, when the apparatus is connected to a network, the software is first copied from a server on the network to the hard disk 54. It is then read from the hard disk 54 into the RAM 60 and executed by the CPU 56. When the apparatus is connected to a network, the software may instead be loaded directly into the RAM 60 and executed without being stored on the hard disk 54.

The computer hardware shown in FIGS. 1 and 3 and its operating principle are conventional. The most essential part of the present invention is therefore the software stored on a storage medium such as the CD-ROM (or DVD-ROM) 62, the FD 64, or the hard disk 54.

As a recent general trend, various program modules are commonly provided as part of the computer's operating system, and an application program proceeds by calling these modules in a predetermined sequence as needed. In that case, the software that realizes the gaze direction estimation apparatus does not itself contain those modules, and the apparatus is realized only when the software cooperates with the operating system on the computer. As long as a common platform is used, however, there is no need to distribute software that includes such modules. The software without those modules, the recording medium on which that software is recorded, and the data signal that carries the software when it is distributed over a network can each be regarded as constituting the embodiment.

[System functional blocks]
As described below, the gaze direction estimation apparatus of the present invention estimates the gaze direction with a monocular camera by detecting and tracking facial feature points.

The gaze direction estimation apparatus of the present invention estimates, as the gaze direction, the three-dimensional straight line connecting the eyeball center and the iris center. The eyeball center cannot be observed directly in the image. However, when the user is gazing at the camera, the eyeball center and the iris center are observed at the same image position. Using this fact, the apparatus models the relative relationship between the eyeball center and the facial feature points, and from that model estimates the projected position of the eyeball center.

In the initial calibration, a sequence of two or more image frames is acquired in which the user gazes at the camera with different face orientations. The facial feature points and iris centers are extracted and tracked in these frames, and the relative relationship between the facial feature points and the eyeball centers is modeled.

During gaze estimation, the facial feature points are tracked in the captured image to estimate the projected position of the eyeball center, and the three-dimensional vector connecting that point and the iris center is estimated as the gaze direction.

An outline of the configuration for gaze detection according to the present invention is described below.
FIG. 4 is a functional block diagram showing the software processing that the CPU 56 shown in FIG. 3 performs based on a program stored on the hard disk 54 or the like. As noted above, all or part of these functional blocks may instead be implemented by dedicated hardware.

The processing of the system 20 of the gaze direction estimation apparatus is based on the monocular camera 30. As long as the face position can be detected and an optically sufficient zoom ratio is available, the processing is not constrained by the observation distance.

In the following, it is assumed that a rough position of the person's face is given in advance by some procedure, for example adjustment of the camera by the user.

The video signal corresponding to the moving image captured by the camera 30 is captured frame by frame as digital data under the control of an image capture processing unit 5602, and is stored by an image data recording processing unit 5604 in a storage device such as the hard disk 54.

A face detection unit 5606 detects the face position (the positions of both eyes) in the image. It first applies a face candidate search with a six-segment rectangular filter to the captured frame image sequence, that is, a search for candidate positions of the point between the eyebrows. It then applies a classification process, such as template matching on the left and right eyes, to select among the candidate positions found.

Next, a face region extraction unit 5608 extracts the iris centers and the nose position from the eye positions thus obtained.
Further, a feature point extraction unit 5610 uses the extracted positions of the face, nose, and eyes (irises) to extract and track a plurality of feature points, thereby detecting image feature points within the face region. The texture around each image feature point is stored at calibration time as a template for tracking, is updated as appropriate, and is used to associate feature points between frames. Meanwhile, an iris center extraction unit 5612, as described later, extracts iris edge candidates in the region around each eye with a Laplacian and applies a circular Hough transform to detect the projected position of the iris center. A relative relationship identification unit 5614 identifies, at calibration time, the three-dimensional relative positional relationship between the eyeball centers and the feature points as a model, as described later. An eyeball center estimation unit 5616 estimates the center position of the eyeball using the projected positions of the feature points and the model of the relative positional relationship.

A gaze direction estimation unit 5618 estimates the gaze direction based on the extracted projected position of the iris center and the estimated center position of the eyeball.

A display control unit 5630 performs processing for displaying the gaze direction estimated in this way on the acquired image frame.

That is, to keep feature point tracking stable, the system 20 of the gaze direction estimation apparatus holds, for each feature point, a plurality of textures observed in different frames. In the initial calibration process, the relative relationship between the facial feature points and the eyeball centers is obtained from the relationship between these feature points and the iris centers. In the gaze estimation process, the eyeball center position is estimated from the set of feature points obtained in the current frame, based on the relationship obtained in the calibration process, and the gaze direction is determined from that position and the iris center position.

[Operation of the gaze direction estimation process]
(Face detection process)
The face detection process, which is performed as a prerequisite for gaze direction estimation, is described first.

(Face detection process using a six-segment rectangular filter)
Although not particularly limited, the system 20 of the gaze direction estimation apparatus processes a video of a continuously captured face by, for example, scanning the frame with a rectangular filter whose width is about the face width and whose height is about half of that. The rectangle is divided into six segments, for example 3 × 2, and the average brightness of each segment is computed. When the relative brightness of the segments satisfies certain conditions, the center of the rectangle is taken as a between-the-eyebrows candidate.

When contiguous pixels all become between-the-eyebrows candidates, only the center of the frame enclosing them is kept as a candidate. The remaining candidates are then compared with a standard pattern by template matching or the like, so that false candidates are discarded and the true between-the-eyebrows point is extracted.

The face detection procedure of the present invention is described in more detail below.
(Six-segment rectangular filter)
FIG. 5 is a conceptual diagram for explaining the filter used to detect between-the-eyebrows candidate regions.

FIG. 5(a) shows the rectangular filter divided 3 × 2 into six segments as described above (hereinafter the "six-segment rectangular filter").

The six-segment rectangular filter extracts two facial characteristics, 1) the bridge of the nose is brighter than both eye regions, and 2) the eye regions are darker than the cheeks, in order to find the between-the-eyebrows position of a face. A rectangular frame of i pixels horizontally and j pixels vertically (i, j: natural numbers) is placed with the point (x, y) at its center.

As shown in FIG. 5(a), this rectangular frame is divided into three equal parts horizontally and two equal parts vertically, giving six blocks S1 to S6.

FIG. 5(b) shows the six-segment rectangular filter applied to the eye regions and cheeks of a face image.

FIG. 6 is a conceptual diagram showing another configuration of the six-segment rectangular filter.
Because the bridge of the nose is usually narrower than the eye regions, it is preferable for the width w2 of blocks S2 and S5 to be narrower than the width w1 of blocks S1, S3, S4, and S6. Preferably, the width w2 can be half the width w1. FIG. 6 shows the configuration of the six-segment rectangular filter in this case. The height h1 of blocks S1, S2, and S3 and the height h2 of blocks S4, S5, and S6 also need not be the same.

In the six-segment rectangular filter shown in FIG. 6, the average pixel luminance $\bar{S}_i$ is computed for each block $S_i$ ($1 \le i \le 6$).

Assuming that one eye and eyebrow lie in block S1 and the other eye and eyebrow lie in block S3, the following relational expression (1) holds.

Points that satisfy these relationships are therefore extracted as between-the-eyebrows candidates (face candidates).
The sum of the pixels within a rectangular frame can be computed quickly with the integral image technique disclosed in a known document (P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," Proc. of IEEE Conf. CVPR, 1, pp. 511-518, 2001). With an integral image, the computation runs at high speed regardless of the filter size. Applying this method to multi-resolution images makes it possible to extract face candidates even when the size of the face in the image changes.
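The candidate test described above can be sketched with an integral image as follows. This is a minimal illustration, not the apparatus's actual code. The function names are hypothetical, and because the relational expressions themselves are not reproduced in this text, the inequalities used here (each upper eye block darker than both the nose-bridge block and the cheek block below it) are an assumption derived from the two facial characteristics stated for the filter.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0, dtype=np.float64), axis=1)
    return ii

def block_mean(ii, top, left, height, width):
    """Mean luminance of a block, using four lookups in the integral image."""
    total = (ii[top + height, left + width] - ii[top, left + width]
             - ii[top + height, left] + ii[top, left])
    return total / (height * width)

def is_between_eyebrows_candidate(ii, top, left, w1, w2, h1, h2):
    """Evaluate the six-segment filter whose top-left corner is (top, left).

    Upper row: S1 (width w1), S2 (width w2), S3 (width w1), height h1.
    Lower row: S4, S5, S6 with the same widths, height h2.
    """
    lefts = [left, left + w1, left + w1 + w2]
    widths = [w1, w2, w1]
    s1, s2, s3 = (block_mean(ii, top, x, h1, w) for x, w in zip(lefts, widths))
    s4, _s5, s6 = (block_mean(ii, top + h1, x, h2, w) for x, w in zip(lefts, widths))
    # Assumed conditions: eye blocks darker than the nose bridge and the cheeks.
    return bool(s1 < s2 and s1 < s4 and s3 < s2 and s3 < s6)
```

Because each block mean costs four table lookups, the cost per filter position does not depend on the filter size, which is the property the text relies on when scanning multi-resolution images.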

For the between-the-eyebrows candidates (face candidates) obtained in this way, the true between-the-eyebrows position (true face region) can be identified by template matching against a standard pattern of both eyes.

The face region can also be determined by applying to the obtained face candidates a verification process that uses a face model based on a support vector machine (SVM). To avoid a drop in recognition rate caused by differences in hairstyle, the presence or absence of a beard, and changes in facial expression, the SVM model can be built, for example, from an image region centered between the eyebrows, as shown in FIG. 7. Determination of the true face region by such an SVM is disclosed in: S. Kawato, N. Tetsutani and K. Hosaka: "Scale-adaptive face detection and tracking in real time with SSR filters and support vector machine", IEICE Trans. on Info. and Sys., E88-D, 12, pp. 2857-2863 (2005). Combining fast candidate extraction by the six-segment rectangular filter with SVM processing enables real-time face detection.

(Eye and nose detection process)
Next, the positions of the eyes, the nose, and the iris centers are extracted using the methods of Non-Patent Document 4 and Non-Patent Document 6 mentioned above.
As for the positions of both eyes, the face region detection of the preceding section has already searched for the between-the-eyebrows pattern, so rough eye positions can be estimated by searching again for the dark regions on either side of the point between the eyebrows. Gaze direction estimation, however, requires the iris centers to be extracted more accurately. Here, iris edge candidates are extracted with a Laplacian in the region around each eye obtained above, and a circular Hough transform is applied to detect the iris and the projected position of the iris center.
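The Laplacian and circular Hough step can be sketched in plain NumPy as follows. This is an illustrative sketch under stated assumptions, not the method of the cited documents: the function names, the 4-neighbor Laplacian kernel, the fixed edge threshold, and the exhaustive vote over a small set of candidate radii are all choices made for the example.

```python
import numpy as np

def laplacian(img):
    """4-neighbor discrete Laplacian with replicated borders."""
    img = img.astype(np.float64)
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def find_iris_center(eye_region, radii, edge_thresh, n_angles=72):
    """Return (cx, cy, r) of the strongest circle, or None if no edges exist.

    Edge candidates are pixels whose absolute Laplacian exceeds edge_thresh.
    Each edge pixel votes for every center lying at distance r from it.
    """
    edges = np.abs(laplacian(eye_region)) > edge_thresh
    ys, xs = np.nonzero(edges)
    if xs.size == 0:
        return None
    h, w = edges.shape
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best_votes, best = -1, None
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int64)
        cx = np.rint(xs[:, None] - r * np.cos(thetas)[None, :]).astype(int)
        cy = np.rint(ys[:, None] - r * np.sin(thetas)[None, :]).astype(int)
        inside = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
        np.add.at(acc, (cy[inside], cx[inside]), 1)  # accumulate votes
        iy, ix = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[iy, ix] > best_votes:
            best_votes, best = acc[iy, ix], (int(ix), int(iy), r)
    return best
```

In practice the candidate radii would be narrowed using the eye positions and the expected face size, and a partly occluded iris (eyelids) yields only an arc of votes, which is one reason a voting scheme is a natural fit here.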

The nose position is extracted using two facts: the tip of the nose is a convex surface and therefore tends to be observed as a point brighter than its surroundings, and the range in which the nose can lie is constrained by the positions of both eyes. The positions of both eyes and the nose can also be used to estimate the approximate face orientation.

FIG. 8 shows an example of a face detection result. In the detected face, the iris centers, the tip of the nose, the mouth, and so on are also detected. As feature points, for example, the tip of the nose, the outer and inner corners of the left and right eyes, both ends of the mouth, and the centers of the nostrils can be used.

(Gaze estimation)
(Principle of gaze estimation)
In the present invention, the gaze direction is taken to be the three-dimensional straight line connecting the eyeball center and the iris center.

FIG. 9 is a conceptual diagram illustrating the model used to determine the gaze direction.
Let $l$ be the eyeball radius in the image, and let $d_x$ and $d_y$ be the distances in the x-axis and y-axis directions between the eyeball center and the iris center in the image. The angles between the gaze direction and the camera optical axis, that is, the angles $\psi_x$ and $\psi_y$ that the vector pointing in the gaze direction forms with the x-axis and the y-axis, are then expressed by the following equation.

Estimating the gaze direction with Equation (3) therefore requires the eyeball radius in the image and the projected positions of the eyeball center and the iris center. The projected position of the iris center can be obtained by the Hough transform method described above. The eyeball diameter $r$ in the image may be taken from an anatomical model (the standard human eyeball diameter) or may be obtained separately by calibration.
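The geometry described above can be sketched as follows. Equation (3) itself is not reproduced in this text, so the formulas below are an assumption rather than the patent's exact expression: under orthographic projection, the iris center lies on a sphere of radius $l$ around the eyeball center, which fixes the depth component of the gaze vector up to sign, and the angles are taken here as $\psi_x = \sin^{-1}(d_x / l)$ and $\psi_y = \sin^{-1}(d_y / l)$. The function name and the convention that negative z points toward the camera are likewise hypothetical.

```python
import numpy as np

def gaze_from_offsets(dx, dy, l):
    """Return (unit gaze vector, psi_x, psi_y) from image-plane offsets.

    dx, dy: offsets of the iris center from the eyeball center (pixels).
    l:      eyeball radius in the image (pixels).
    """
    d2 = dx * dx + dy * dy
    if d2 > l * l:
        raise ValueError("iris offset exceeds the eyeball radius")
    dz = -np.sqrt(l * l - d2)          # assumed sign: toward the camera
    gaze = np.array([dx, dy, dz], dtype=np.float64) / l
    psi_x = np.arcsin(dx / l)          # assumed form of the horizontal angle
    psi_y = np.arcsin(dy / l)          # assumed form of the vertical angle
    return gaze, psi_x, psi_y
```

When the user looks straight at the camera, the offsets vanish and both angles are zero, which is consistent with the calibration condition used later in the text.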

FIG. 10 is a conceptual diagram showing the relationship between the iris center, the eyeball center, and the projected point after the user has moved from the state shown in FIG. 9 to a state of gazing at the camera.

The projected position of the eyeball center cannot, in general, be observed directly in the image. Consider, however, the case in which the user gazes at the camera. As shown in FIG. 10, the camera, the iris center, and the eyeball center then lie on a single straight line, so the iris center and the eyeball center are projected onto the same point in the image.

In the present invention, therefore, a sequence of image frames is captured in which the user changes the pose of the face while gazing at the camera, and the iris positions and facial feature points are extracted and tracked in this sequence to estimate the relative geometric relationship between the eyeball centers and the facial feature points.

As described in more detail later, the gaze direction estimation apparatus of the present invention performs two processes: estimation of the relative relationship between the eyeball centers and the facial feature points, and estimation of the projected position of the eyeball center.

(Gaze estimation by tracking facial feature points)
FIG. 11 is a flowchart for explaining the calibration performed as the initial setup of the gaze direction estimation apparatus.

First, as the image sequence for calibration, a sequence of image frames is captured in which the user changes the pose of the face while gazing at the camera (step S102).

FIG. 12 shows four image frames captured in this way during calibration.

More generally, suppose that an image sequence of $N$ ($N \ge 2$) frames has been obtained. Denote the image frames by $I_1, \dots, I_N$.

Next, for each image frame of the obtained sequence, the face detection unit 5606 performs face detection by the method described above (step S104), and the face region extraction unit 5608 then performs eye and nose detection (step S106).

Further, the feature point extraction unit 5610 extracts and tracks feature points (step S108). Besides the method described above, the method proposed in J. Shi and C. Tomasi: "Good features to track", Proc. CVPR94, pp. 593-600 (1994), for example, can also be used to extract feature points.

Suppose that in each image frame $I_i$ ($i = 1, \dots, N$), $M$ ($M \ge 4$) feature points $p_j$ ($j = 1, \dots, M$) have been detected and tracked. Let the two-dimensional observed position of feature point $p_j$ in image frame $I_i$ be $\mathbf{x}_j^{(i)} = [x_j^{(i)}, y_j^{(i)}]^t$ ($i = 1, \dots, N$, $j = 1, \dots, M$), and let the two-dimensional observed positions of the iris centers of the two eyes be $\mathbf{x}_r^{(i)} = [x_r^{(i)}, y_r^{(i)}]^t$ and $\mathbf{x}_l^{(i)} = [x_l^{(i)}, y_l^{(i)}]^t$ ($i = 1, \dots, N$). The matrix $W$ is then defined as follows.

By the factorization method, the matrix $W$ (the measurement matrix), in which the two-dimensional observed positions of the feature points in each frame are stacked vertically, can be decomposed as follows.

Here, the matrix $M$ (called the "capture pose matrix") contains only information about the pose of the camera, and the matrix $S$ (called the "relative positional relationship matrix") contains only information about the shape of the observed object, so the three-dimensional relative positions of the facial feature points and the eyeball centers are obtained as the matrix $S$ (step S110). That is, assuming orthographic projection, each row of $M$ is a unit vector representing the camera pose in an image frame. Under the constraint that these vectors each have magnitude 1 and are mutually orthogonal, it is known that $W$ can be decomposed uniquely into the product of $M$ and $S$ by singular value decomposition. The decomposition of such a measurement matrix $W$ by factorization into a matrix representing the camera motion and a matrix representing the object shape is disclosed in: Kanade, Poelman, Morita: "Recovery of object shape and camera motion by the factorization method", IEICE Transactions D-II, J76-D-II, 8, pp. 1497-1505 (1993).
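A minimal sketch of such a factorization is shown below, assuming an orthographic camera. The layout of $W$ is an assumption of this sketch, since the defining equation is not reproduced in the text: rows $2i$ and $2i+1$ hold the x and y coordinates observed in frame $i$, and the columns are the $M$ feature points followed by the two iris centers (which coincide with the eyeball centers during calibration). The function name is hypothetical, and the metric upgrade via a symmetric matrix $L = QQ^t$ follows the standard factorization literature rather than any wording in this document.

```python
import numpy as np

def factorize(W):
    """Split a (2N, P) measurement matrix into motion, shape, and translation.

    Returns M (2N, 3), S (3, P), and t (2N, 1) with W ~= M @ S + t, where
    each frame's two rows of M are unit length and mutually orthogonal.
    """
    t = W.mean(axis=1, keepdims=True)            # per-row centroid
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    root = np.sqrt(s[:3])
    M_hat = U[:, :3] * root                      # rank-3 affine motion
    S_hat = root[:, None] * Vt[:3]               # rank-3 affine shape

    def coeffs(u, v):
        # Coefficients of u^T L v in the 6 unknowns of the symmetric matrix L.
        return [u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[0] * v[2] + u[2] * v[0],
                u[1] * v[1], u[1] * v[2] + u[2] * v[1], u[2] * v[2]]

    A, b = [], []
    for m_x, m_y in zip(M_hat[0::2], M_hat[1::2]):
        A += [coeffs(m_x, m_x), coeffs(m_y, m_y), coeffs(m_x, m_y)]
        b += [1.0, 1.0, 0.0]                     # unit norm, unit norm, orthogonal
    l = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    evals, evecs = np.linalg.eigh(L)
    Q = evecs * np.sqrt(np.clip(evals, 1e-12, None))   # L = Q @ Q.T
    return M_hat @ Q, np.linalg.inv(Q) @ S_hat, t
```

The recovered shape is determined only up to a global rotation (and reflection), which does not matter for the later steps because only the relative positions of the feature points and eyeball centers are used. With few frames or nearly identical poses the constraint system becomes ill-conditioned, so varied head poses during calibration help.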

(Gaze direction estimation process)
FIG. 13 shows a flowchart of the real-time gaze direction estimation process.

The procedure for estimating the gaze direction using the results obtained above is described next.
First, when an image frame is acquired from the camera 30 (step S200), the face and then the eyes and nose are detected in the same way as during calibration (step S202), and the feature points in the acquired image frame are extracted (step S204).

Suppose that an image frame $I_k$ has been obtained, and that $m$ of the feature points other than the eyeball centers, $p_j$ ($j = j_1, \dots, j_m$), are observed at $\mathbf{x}_j^{(k)} = [x_j^{(k)}, y_j^{(k)}]^t$. For the observed feature points, template matching with the templates of the feature point neighborhoods is performed as described above. This associates the feature points identified at calibration time with the feature points observed in the current image frame, so that the feature points in the current image frame are identified (step S206).

As noted above, the templates used to identify feature points are not limited to those taken at calibration time. For example, a predetermined number of images may be retained, each being a region of predetermined size around a feature point detected in one of a predetermined number of recent image frames. Matching is then performed against all of these templates, and the feature point is identified at the position with the highest degree of agreement.
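Matching against several retained templates can be sketched as follows. The text does not specify the similarity measure, so normalized cross-correlation is used here as an assumption, and the function names and the square search window are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def match_feature(image, templates, center, radius):
    """Search around center=(cx, cy) for the best match to any template.

    templates: list of equally sized 2-D arrays (e.g. patches retained from
    calibration and from recent frames). Returns ((cx, cy), score), or
    (None, -inf) if no window fits inside the image.
    """
    th, tw = templates[0].shape
    h, w = image.shape
    best_pos, best_score = None, -np.inf
    for cy in range(center[1] - radius, center[1] + radius + 1):
        for cx in range(center[0] - radius, center[0] + radius + 1):
            top, left = cy - th // 2, cx - tw // 2
            if top < 0 or left < 0 or top + th > h or left + tw > w:
                continue
            patch = image[top:top + th, left:left + tw].astype(np.float64)
            score = max(ncc(patch, tpl.astype(np.float64)) for tpl in templates)
            if score > best_score:
                best_pos, best_score = (cx, cy), score
    return best_pos, best_score
```

Keeping several templates per feature point lets the matcher tolerate appearance changes caused by head rotation, at the cost of one correlation per template at each search position.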

Between the two-dimensional observed positions $\mathbf{x}_j^{(k)} = [x_j^{(k)}, y_j^{(k)}]^t$ of the facial feature points $p_j$ and their three-dimensional positions $\mathbf{s}_j = [X_j, Y_j, Z_j]^t$ ($j = 1, \dots, M$) obtained by calibration, the following relation holds for the $m$ feature points that are observed among the $M$ feature points.

Here, the matrix $P^{(k)}$ is a $2 \times 3$ matrix. The matrix $S^{(k)}$, the second factor on the right-hand side, is the submatrix of $S$ consisting only of the elements that correspond to the observed feature points. As stated above, the camera and the face are assumed to be sufficiently far apart, and orthographic projection is assumed. If four or more feature points are observed, the matrix $P^{(k)}$ can be computed as follows (step S208).

The projected positions $\mathbf{x}_r^{(k)}$ and $\mathbf{x}_l^{(k)}$ of the eyeball centers in image frame $I_k$ can be computed using the matrix $P^{(k)}$ as follows (step S210).
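Steps S208 and S210 can be sketched as a least-squares fit followed by a projection. This is an illustrative sketch: the function name and array layouts are hypothetical, and the handling of translation (subtracting the centroids of the observed points before fitting the $2 \times 3$ matrix) is an assumption added so that the example is self-consistent when only a subset of the feature points is visible.

```python
import numpy as np

def project_eyeball_centers(x_obs, S_obs, s_eyes):
    """Fit P from observed feature points, then project the eyeball centers.

    x_obs:  (2, m) observed 2-D positions of m >= 4 feature points.
    S_obs:  (3, m) calibrated 3-D positions of the same points (columns of S).
    s_eyes: (3, 2) calibrated 3-D positions of the two eyeball centers.
    Returns (P, x_eyes) with P of shape (2, 3) and x_eyes of shape (2, 2).
    """
    if x_obs.shape[1] < 4:
        raise ValueError("at least four observed feature points are required")
    x_mean = x_obs.mean(axis=1, keepdims=True)
    s_mean = S_obs.mean(axis=1, keepdims=True)
    # Least-squares solution of (x - x_mean) = P (S - s_mean).
    P_t, *_ = np.linalg.lstsq((S_obs - s_mean).T, (x_obs - x_mean).T, rcond=None)
    P = P_t.T
    x_eyes = P @ (s_eyes - s_mean) + x_mean
    return P, x_eyes
```

The gaze for each eye then follows from the offset between the detected iris center and the corresponding column of `x_eyes`. The fit degenerates if the observed points are coplanar in the model, which is one reason to track feature points spread over the nose, eyes, and mouth.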

The gaze can therefore be estimated using the projected position of the iris center extracted as a feature point in image frame $I_k$ together with this projected position of the eyeball center (step S212).

Note that by decomposing the matrix $P$ by QR decomposition, the face pose $R$ can be computed as follows.

Here, $r_1$ and $r_2$ are each $1 \times 3$ vectors. Detection of the face pose $R$ in this way is disclosed in: L. Quan: "Self-calibration of an affine camera from multiple views", Int'l Journal of Computer Vision, 19, pp. 93-105 (1996).
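One way to carry out such a decomposition is sketched below. The decomposition equation is not reproduced in this text, so the form used here is an assumption: $P = K\,[r_1; r_2]$ with $K$ a $2 \times 2$ triangular matrix and $r_1$, $r_2$ orthonormal rows, completed to a rotation by $r_3 = r_1 \times r_2$. The function name is hypothetical.

```python
import numpy as np

def pose_from_projection(P):
    """Recover a 3x3 rotation from a 2x3 affine projection matrix.

    QR-decomposing P^T gives P^T = Q K^T, i.e. P = K Q^T, where the rows of
    Q^T are the orthonormal vectors r1 and r2. Signs are fixed so that the
    diagonal of the triangular factor is positive.
    """
    Q, K_t = np.linalg.qr(P.T)            # Q: (3, 2), K_t: (2, 2) upper triangular
    signs = np.sign(np.diag(K_t))
    signs[signs == 0] = 1.0
    Q = Q * signs                         # flip columns to make diag(K_t) positive
    r1, r2 = Q[:, 0], Q[:, 1]
    r3 = np.cross(r1, r2)                 # completes a right-handed frame
    return np.vstack([r1, r2, r3])
```

Because an affine camera does not observe depth directly, the third row is determined only through the right-handedness assumption.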

If it is determined that tracking has been ended by an instruction from the user or the like (step S214), the process ends. If no end has been instructed, the process returns to step S202.

(Experiment)
To confirm the effectiveness of the gaze direction estimation apparatus described above, the results of an experiment using real images are described below.

カメラはElmo社製PTC−400Cを用い、被験者から約１５０［cm］の位置に設置した。
まず、５０フレームの画像列を用いて、眼球中心と顔特徴点のキャリブレーションを行なった。キャリブレーション用の画像フレーム列と抽出した特徴点の例は、図１２に示したとおりである。 The camera was an Elmo PTC-400C, and was installed at a position of about 150 cm from the subject.
First, the center of the eyeball and the facial feature point were calibrated using an image sequence of 50 frames. Examples of calibration image frame sequences and extracted feature points are as shown in FIG.

キャリブレーション用画像フレーム列の撮影に要した時間は約３秒であった。（＋印は抽出された虹彩中心（眼球中心））、×印は追跡した顔特徴点）。 The time required for capturing the calibration image frame sequence was about 3 seconds. (+ Mark is the extracted iris center (eyeball center)), x mark is the tracked facial feature point).

Next, gaze estimation was performed using the face model (matrix S) obtained by the calibration. The subject changed the position and orientation of the face while gazing toward the upper right, upward, and toward the lower left, respectively.

FIGS. 14 to 16 show the gaze estimation results. FIG. 14 shows gaze toward the upper right, FIG. 15 shows upward gaze, and FIG. 16 shows gaze toward the lower left. The gaze direction is the average of the gaze directions calculated for the two eyes. The results show that the gaze direction can be estimated regardless of changes in the position and orientation of the face.

As described above, the gaze direction estimation apparatus according to the first embodiment estimates the gaze direction by detecting and tracking facial feature points based on observation with a monocular camera. In the gaze direction estimation apparatus of the present invention, calibration first models the relationship between the eyeball centers and the facial feature points (that is, determines the matrix S) using the iris positions and facial feature points obtained from an image sequence in which only the face orientation varies while the gaze remains directed at the camera. The gaze direction is then determined from the relationship between the iris position and the eyeball center position in the input image, the latter being estimated on the basis of that model.

Because the gaze direction estimation apparatus according to the first embodiment estimates the gaze direction from passively observed images from a monocular camera, the estimation is not constrained by the observation distance as long as the face position can be detected and a sufficient optical zoom ratio is available. This makes it possible to estimate the gaze direction of a distant person, which has been difficult with conventional gaze estimation apparatuses.

[Embodiment 2]
In the gaze direction estimation apparatus according to the first embodiment, as described with reference to FIG. 11, an image frame sequence in which the user changes the face pose while gazing at the camera was first captured as the image sequence for calibration.

In the gaze direction estimation apparatus according to the second embodiment, as described below, calibration can be performed using an image sequence in which the user is not necessarily gazing at the camera. Gaze direction estimation can therefore be started simply by capturing images of the subject (the person whose gaze is to be estimated) in a natural state, which reduces the burden on the subject.

Note that, for the gaze direction estimation process itself after calibration, the operation of the apparatus according to the second embodiment is the same as that of the apparatus according to the first embodiment.

(Calibration process)
FIG. 17 is a flowchart for explaining the calibration performed as the initial setting of the gaze direction estimation apparatus according to the second embodiment.

First, an image sequence for calibration is acquired while the user faces arbitrary directions and changes the face pose (step S302).

(Generation of the three-dimensional face model)
A three-dimensional face model is generated by the following procedure.

Suppose that an image sequence of N frames (N ≥ 2) has been obtained. Let the image frames be I_1, ..., I_N.

Next, for each image frame obtained, the face detection unit 5606 performs face detection by the same method as in the first embodiment (step S304), and the face region extraction unit 5608 then detects the eyes, nose, and irises (step S306).

Further, the feature point extraction unit 5610 extracts and tracks feature points (step S308). The feature point extraction method is also the same as in the first embodiment.

Suppose that M feature points p_j (j = 1, ..., M; M ≥ 4) have been detected and tracked in each image frame I_i (i = 1, ..., N). Let the two-dimensional observed position of the feature point p_j in the image frame I_i be x_j^(i) (bold) = [x_j^(i), y_j^(i)]^t (i = 1, ..., N; j = 1, ..., M).

The two-dimensional observed position of the feature point p_j in the image I_i is written as follows.

The centroid of the feature points in the image I_i is denoted by the following symbol (12).

Similarly, each observed position, expressed relative to the centroid, is denoted by the following symbol (13).

Therefore, the relationships of the following equations (14) and (15) hold.

The matrix W of the relative observed positions can then be decomposed by the factorization method, as in the first embodiment, as in the following equations (16) and (17). Here, the camera and the face are assumed to be sufficiently far apart, and a weak perspective transformation is assumed.

As in the first embodiment, the matrix M (the “imaging pose matrix”) contains information on changes in the camera pose, and the matrix S (the “relative positional relationship matrix”) contains information on the shape of the observed object. The relative relationship between the facial feature points is thus obtained as the matrix S (step S310).
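Symbols (12) and (13) and equations (14) to (17) are not reproduced in this text. The following is a minimal sketch of a rank-3 factorization of this kind using the singular value decomposition in NumPy. It assumes that W stacks the centroid-relative x and y coordinates of each frame as two rows; the function name `factorize` and the array layout are illustrative assumptions. The result is determined only up to an invertible 3 × 3 transformation, and the metric constraints that would fix this ambiguity are omitted here.

```python
import numpy as np


def factorize(tracks):
    """Rank-3 factorization of tracked feature points.

    tracks: (N, M, 2) array, tracks[i, j] = 2D position of point j in frame i.
    Returns (M_hat, S_hat, centroids) with W ~= M_hat @ S_hat, where W is the
    2N x M matrix of centroid-relative observations.
    """
    tracks = np.asarray(tracks, dtype=float)
    n_frames, n_points, _ = tracks.shape
    centroids = tracks.mean(axis=1)                 # (N, 2), one per frame
    rel = tracks - centroids[:, None, :]            # centroid-relative positions
    # Row 2i holds the x coordinates of frame i, row 2i+1 the y coordinates
    W = rel.transpose(0, 2, 1).reshape(2 * n_frames, n_points)
    U, sing, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(sing[:3])                        # keep the three largest
    M_hat = U[:, :3] * root                         # (2N, 3) motion part
    S_hat = root[:, None] * Vt[:3]                  # (3, M) shape part
    return M_hat, S_hat, centroids
```

Splitting the singular values evenly between the two factors is one common convention; any other split yields an equally valid factorization before the metric upgrade.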

Next, suppose that an image I_k is obtained and that p of the M feature points are observed as follows.

Here, the centroid position of the feature points is denoted by the following symbol (18).

The matrix m_k representing the change in camera pose and the centroid position of the feature points denoted by expression (18) are then obtained as in the following equation (19).

Here, the matrix m_k is a 2 × 3 matrix, and []^* denotes the pseudo-inverse.
Further, by QR-decomposing the matrix m_k as in equation (20), the face pose R_k and position t_k in the image I_k can be calculated as in the following equation (21).

Here, s_k in the following equation (22) represents the scale factor.

Note that a_k in equation (20) above represents the skew of the image plane and is normally regarded as 0 (zero).

Therefore, in what follows, the subject's face can be treated as having a normalized size, and there is no need to consider changes in the size of the face in the image caused by the distance between the subject and the camera.
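Equations (20) to (22) are not reproduced in this text. As a hedged illustration, the sketch below assumes that the 2 × 3 matrix m_k is written as an upper-triangular 2 × 2 matrix K = [[s_x, a], [0, s_y]] times the first two rows r_1, r_2 of a rotation, and recovers the third row as their cross product. The function name `decompose_affine` and this particular parameterization are assumptions for illustration, not the patent's exact formulas.

```python
import numpy as np


def decompose_affine(m):
    """Split a 2x3 affine camera matrix into K (2x2 upper triangular) and R (3x3).

    Assumes m = K @ R[:2] with K = [[s_x, a], [0, s_y]], s_x > 0, s_y > 0.
    """
    m = np.asarray(m, dtype=float)
    row1, row2 = m
    s_y = np.linalg.norm(row2)
    r2 = row2 / s_y
    a = row1 @ r2                      # skew term, expected to be close to 0
    residual = row1 - a * r2           # component of row1 orthogonal to r2
    s_x = np.linalg.norm(residual)
    r1 = residual / s_x
    r3 = np.cross(r1, r2)              # completes a right-handed rotation
    K = np.array([[s_x, a], [0.0, s_y]])
    R = np.vstack([r1, r2, r3])
    return K, R
```

Under the zero-skew assumption stated above, s_x and s_y should be nearly equal, and either of them (or their mean) could serve as the scale factor used to normalize the face size.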

Further, the eyeball model parameters (eyeball center position, eyeball radius, iris radius, and iris position in the image) are estimated by a procedure described in detail later (step S312).

(Estimation of eyeball model parameters)
The eyeball model parameter estimation process is described in more detail below. The following processing is performed by the eyeball center estimation unit 5616.

To estimate the gaze direction, it is necessary to obtain the eyeball center position X_{LR} in the face coordinate system obtained in the previous section (hereinafter, the subscript {LR} stands collectively for l, meaning left, and r, meaning right), the eyeball radius l, and the iris position x_{{LR},k} in the image.

To determine the iris position efficiently in the input image, it is also desirable to obtain the iris radius r in advance.

(Generation of the three-dimensional eyeball model)
FIG. 18 is a diagram showing the procedure of the eyeball model parameter estimation process.

The process of estimating these parameters using the N-frame image sequence used in the calibration described above is explained below with reference to FIG. 18.

(Step 1: Calculation of the initial value of the eyeball center position)
First, a rough eyeball center position is estimated using RANSAC from the face position and pose t_k, R_k and the scale factor s_k of each frame, together with the iris positions obtained in step S306.

(RANSAC: random sample consensus)
The RANSAC process described below stably determines model parameters from data that contain outliers. Because it is described in, for example, the following references, only an outline is given here.

Reference: M. A. Fischler and R. C. Bolles: “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Comm. of the ACM, Vol. 24, pp. 381-395, 1981
Reference: Motoko Oe, Tomokazu Sato, Naokazu Yokoya: “Camera position and pose estimation based on a landmark database of image feature points,” Image Recognition and Understanding Symposium (MIRU2005), July 2005
Procedure for calculating the eyeball center positions X_L, X_R (initial values) from the observed data by RANSAC
From equation (19), the matrix m_k representing the camera geometry in frame k and the centroid position of the feature points denoted by expression (18) are obtained.

As shown in equations (20) and (21), the matrix m_k contains information on the position and pose of the camera in frame k and on the scale factor of the captured image.

Let the iris center position observed in the image of the i-th frame be given by the following expression (23).

If q frames f_1, f_2, ..., f_q are selected at random, the eyeball center position can be calculated from their observed data by the following equation (24).

The obtained eyeball center position is reprojected onto each input image i (i = 1, ..., N), and the deviation e_i from the iris center, given by the following equation (25), is evaluated.

FIG. 19 is a conceptual diagram showing the relationship between this reprojection and the deviation.
Because rotation of the eyeball can shift the iris center from the reprojected eyeball center by at most s_i l, the error is regarded as zero, as in the following equation (26), if the deviation |e_i| is s_i l or less (l is the eyeball radius and s_i is the scale factor).

If the number of frames for which the error e_i is at or below a fixed value is at least a predetermined number, the eyeball center position is recalculated by equation (24) using the data of those frames, and the error for the frames used is evaluated by equation (26).

The above process is repeated multiple times, and the eyeball center position that gives the smallest total error is output.
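Equations (23) to (26) are not reproduced in this text, so the following is only a sketch of the RANSAC loop outlined above under stated assumptions: each frame i projects a model point X as m_i X + c_i (with c_i the feature-point centroid), equation (24) is taken to be a linear least-squares fit of X to the iris centers of the sampled frames, and the error has a dead zone of width s_i l. The function names, the default thresholds, and the availability of an eyeball radius value (for example, an average value) are all hypothetical.

```python
import numpy as np


def solve_center(ms, cs, irises):
    """Least-squares eyeball centre X with m_i X + c_i ~= iris_i (assumed eq. 24)."""
    A = ms.reshape(-1, 3)                       # stack the 2x3 matrices: (2q, 3)
    b = (irises - cs).reshape(-1)               # matching right-hand side: (2q,)
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    return X


def dead_zone_errors(X, ms, cs, irises, scales, eye_radius):
    """Reprojection deviation, counted as zero within s_i * l (assumed eq. 25, 26)."""
    reproj = ms @ X + cs                        # (n, 2) reprojected centres
    dist = np.linalg.norm(reproj - irises, axis=1)
    return np.maximum(0.0, dist - scales * eye_radius)


def ransac_eyeball_center(ms, cs, irises, scales, eye_radius,
                          q=3, trials=200, tol=1.0, min_inliers=10, seed=0):
    """ms: (N, 2, 3), cs: (N, 2), irises: (N, 2), scales: (N,)."""
    rng = np.random.default_rng(seed)
    n = len(ms)
    best_X, best_err = None, np.inf
    for _ in range(trials):
        sample = rng.choice(n, size=q, replace=False)
        X = solve_center(ms[sample], cs[sample], irises[sample])
        err = dead_zone_errors(X, ms, cs, irises, scales, eye_radius)
        inliers = err <= tol
        if inliers.sum() < min_inliers:
            continue                            # too few consistent frames
        # Refit on all consistent frames and score the refit on those frames
        X = solve_center(ms[inliers], cs[inliers], irises[inliers])
        total = dead_zone_errors(X, ms[inliers], cs[inliers], irises[inliers],
                                 scales[inliers], eye_radius).sum()
        if total < best_err:
            best_X, best_err = X, total
    return best_X
```

The function returns `None` when no trial reaches `min_inliers`, which a caller would need to handle, for example by relaxing `tol`.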

(Step 2: Estimation of eyeball model parameters)
Using the obtained eyeball center position as the initial value, the eyeball model is fitted to the input images to estimate the optimal model parameters. The region around the eyes is cut out of each input image, and each pixel is labeled as one of three types, iris (dark part of the eye), sclera (white of the eye), or skin, according to the following equation (27) on the basis of color and brightness information.

Here, h_{s,k} represents the hue value of the k-th pixel of the skin region, h_{i,j} represents the hue value of pixel i,j in the input image, and v_{s,k} represents the brightness value of pixel i,j in the input image.
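Equation (27) is not reproduced in this text, so the exact labeling rule is unknown. Purely as an illustration of three-way labeling from hue and brightness, the sketch below uses a hypothetical rule: a pixel darker than every skin sample is labeled iris, a pixel whose hue lies within the range of the skin samples is labeled skin, and anything else is labeled sclera. The function name `label_pixel`, the `hue_margin` parameter, and the rule itself are assumptions; only `colorsys.rgb_to_hsv` from the Python standard library is relied on.

```python
import colorsys

SKIN, IRIS, SCLERA = "skin", "iris", "sclera"


def label_pixel(rgb, skin_samples, hue_margin=0.03):
    """Label one pixel given sample skin pixels; all colours are floats in [0, 1].

    Hypothetical rule, not equation (27). Hue wrap-around near 0/1 is ignored.
    """
    skin_hsv = [colorsys.rgb_to_hsv(*sample) for sample in skin_samples]
    skin_hues = [h for h, _, _ in skin_hsv]
    skin_values = [v for _, _, v in skin_hsv]
    h, _, v = colorsys.rgb_to_hsv(*rgb)
    if v < min(skin_values):
        return IRIS                    # darker than any skin sample
    if min(skin_hues) - hue_margin <= h <= max(skin_hues) + hue_margin:
        return SKIN                    # hue falls within the skin range
    return SCLERA                      # bright and not skin-coloured
```

In practice the skin samples would be taken from a facial region known to be skin, such as the cheek, within the same frame.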

FIG. 20 is a diagram showing an example of this labeling process.
Next, whether each pixel lies inside the iris model is checked, and the degree of match with the eyeball model is evaluated (nonlinear optimization).

FIG. 21 is a diagram showing the concept of this matching between the iris and the model.
For this nonlinear optimization, the following distance d_{{LR},i,j} is introduced.

If r_{{LR},i,j} denotes the iris radius in the direction from the iris center toward pixel i,j (the j-th pixel of frame i), then, as shown in FIG. 21, d_{{LR},i,j} is larger than r_{{LR},i,j} when pixel i,j lies outside the iris.

As shown in the following equation (29), r_{{LR},i,j} is a function of the eyeball center position, the iris position, the eyeball radius, and the iris radius.

Because the relative coordinates of the face are normalized, the eyeball center position should in principle remain at a fixed position regardless of the frame.

Finally, d_{{LR},i,j} is evaluated for all pixels around the eyes, and the model parameter θ of the following equation (30) that best fits the input images is determined according to equation (31).

Here, g_{i,j,{LR}} is the evaluation value of d_{{LR},i,j} for pixel j of frame i, and its sign is inverted according to the following equation depending on whether the pixel is in the iris region or the sclera region.

The labeling u_{ij} reflects the iris region in the captured image, and the function G_{i,j,{LR}} reflects the iris region calculated from the eyeball model.

Through the above processing, calibration can be performed with images in which the user faces arbitrary directions, and the eyeball model parameters can also be determined.

The subsequent procedure of the gaze direction estimation process is the same as in the first embodiment.
(Estimation of gaze direction)
That is, the gaze direction is modeled as the three-dimensional straight line connecting the eyeball center and the iris center. Let l be the eyeball radius in the image, and let dx and dy be the distances in the x-axis and y-axis directions between the eyeball center and the iris center in the image. Then the angles ψx and ψy between the gaze direction and the camera optical axis, that is, the angles that the vector pointing in the gaze direction makes with the x-axis and the y-axis, are given by the following equations.
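The equations for ψx and ψy are not reproduced in this text. Under the geometry described, with the iris center lying on a sphere of radius l around the eyeball center and an orthographic projection, one natural reading is ψx = arcsin(dx / l) and ψy = arcsin(dy / l). The sketch below follows that assumed form; the function names are illustrative, and the averaging of the two eyes reflects the averaging mentioned in the experiments.

```python
import math


def gaze_angles(eye_center, iris_center, eye_radius_px):
    """Return (psi_x, psi_y) in radians from 2D eyeball and iris centres.

    Assumes psi = asin(d / l) per axis; l is the eyeball radius in pixels.
    """
    dx = iris_center[0] - eye_center[0]
    dy = iris_center[1] - eye_center[1]
    # Clamp so that measurement noise cannot push asin outside its domain
    sx = max(-1.0, min(1.0, dx / eye_radius_px))
    sy = max(-1.0, min(1.0, dy / eye_radius_px))
    return math.asin(sx), math.asin(sy)


def mean_gaze(left, right):
    """Average the per-eye angles, as done for the reported results."""
    return tuple((a + b) / 2.0 for a, b in zip(left, right))
```

For example, with an eyeball radius of 11.21 pixels, an iris center displaced horizontally by half that radius corresponds to ψx of about 30 degrees under this assumed form.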

The gaze direction estimation apparatus and method of the second embodiment described above also provide the same effects as the first embodiment for gaze direction estimation.

(Experimental results)
To confirm the effectiveness of the method described above, an experiment using real images was performed. First, the changes in face position and pose and the eyeball model parameters were calibrated using an image sequence of 50 frames. Capturing the calibration image sequence took about 2 seconds, and the eyeball radius and iris radius obtained were 11.21 pixels and 6.27 pixels, respectively.

FIG. 22 shows the results of gaze estimation using the face model obtained by the calibration. The subject changed the position and orientation of the face while gazing to the right, to the front, and to the left, respectively (the white × marks indicate the estimated projected positions of the eyeball centers). The gaze direction is the average of the gaze directions calculated for the two eyes. The results show that the proposed method can estimate the gaze direction even when the position and orientation of the face change.

[Embodiment 3]
In the first and second embodiments, the eyeball model parameters were first estimated using, for example, an N-frame image sequence, and the process then moved on to gaze direction estimation. In the gaze direction estimation, the iris center was obtained by, for example, processing using the Hough transform.

For example, in the second embodiment, as described with reference to FIG. 18, the eyeball model parameter estimation first performed “calculation of the initial value of the eyeball center” as step 1 and then performed “estimation of the eyeball model parameters” by nonlinear optimization as step 2.

The third embodiment describes an example in which “nonlinear optimization” is additionally used in the gaze direction estimation process to obtain the iris center position.

As a premise for explaining the nonlinear optimization in the gaze direction estimation process, the “nonlinear optimization” in the estimation of the eyeball model parameters is first briefly summarized again.

FIG. 23 is a conceptual diagram for explaining the “nonlinear optimization” in this estimation of the eyeball model parameters.

Referring to FIG. 23, “calculation of the initial value of the eyeball center position” is performed as step 1 using RANSAC on the N-frame image sequence, for example in the same manner as in the second embodiment. The initial value of the eyeball center position is not limited to a value obtained in this way; for example, an average value obtained from anatomical knowledge can also be used.

Then, as step 2, the eyeball model is fitted to the input images with the obtained eyeball center position as the initial value, and the optimal model parameters are estimated iteratively.

For frames 1 to N, the region around the eyes is cut out, and each pixel is labeled as one of three types, iris (dark part of the eye), sclera (white of the eye), or skin, according to equation (27) above on the basis of color and brightness information.

Next, whether each pixel lies inside the iris model is checked, and the degree of match with the eyeball model is evaluated (nonlinear optimization).

That is, the distance d_{{LR},i,j} given by equation (28) is evaluated using the iris radius r_{{LR},i,j} in the direction from the iris center toward pixel i,j (the j-th pixel of frame i) that corresponds to the model parameters at the current step.

In this evaluation, equations (30) and (31) above are used: the eyeball model parameters (eyeball radius, iris radius, eyeball center position) are updated while the image onto which the eyeball center is reprojected is repeatedly matched against the labeled frames, so that the model parameter θ satisfying equation (31) is determined.

(Gaze direction estimation using nonlinear optimization)
After the model parameters have been determined as described above, the gaze direction is estimated for each newly observed input image using the three-dimensional models of the face and eyeballs obtained so far.

FIG. 24 is a conceptual diagram showing the process of determining the iris center position for this gaze estimation. This determination of the iris center position corresponds to the processing performed in step S212 of the gaze direction estimation process described with reference to FIG. 13.

For the image I_t (frame t) at time t, the facial feature points are tracked first. Suppose that the facial feature points are observed at positions x_1^(t) (bold), ..., x_M^(t) (bold), respectively.

The matrix P^(k) (bold) of equations (8) and (9) above can be calculated if four or more facial feature points are observed. Accordingly, using the matrix P^(t) (bold) for the image I_t and the three-dimensional positions X_{LR} (bold) of the eyeball centers, the two-dimensional eyeball center positions x_{e{LR}} (bold) of both eyes in the image I_t can be calculated as follows.

Next, the iris center position is estimated. The region around the eyes is cut out on the basis of the eyeball center positions obtained above, and, as in the estimation of the eyeball model parameters described with reference to FIG. 20, each pixel is labeled as one of three types, iris (dark part of the eye), sclera (white of the eye), or skin, according to the following equation on the basis of color and brightness information.

Then, again as in the estimation of the eyeball model parameters described with reference to FIG. 21, the following distance d_{{LR},i,j} is used to check whether each pixel lies inside the iris model and to evaluate the degree of match with the eyeball model.

Here, r_{{LR},t,j} denotes the iris radius in the direction from the iris center toward pixel j of the image I_t, and d_{{LR},t,j} takes a positive value if pixel j lies outside the iris in the image I_t.

r_{{LR},t,j} is a function of the eyeball center position, the iris position, the eyeball radius, and the iris radius; of these, the eyeball center position, the eyeball radius, and the iris radius have already been obtained as eyeball model parameters.

Therefore, r_{{LR},t,j} can be regarded as a function of the iris position x_{{LR},t} (bold) and can be expressed as follows.

The iris center positions can thus be calculated by evaluating d_{{LR},t,j} for all pixels around the eyes and determining, by a nonlinear optimization procedure, the parameter θ = [x_{L,t} (bold), x_{R,t} (bold)] that best fits the image I_t. In this nonlinear optimization procedure, the following expression is optimized.

Here, g_{t,j,{LR}} is the evaluation value of d_{{LR},t,j} for pixel j of the image I_t, and its sign is inverted according to the following equation depending on whether the pixel is in the iris region or the sclera region.
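The cost function and the optimizer are not reproduced in this text. As a rough illustration of fitting an iris center to labeled pixels, the sketch below makes several simplifying assumptions: the iris is treated as a circle of fixed radius in the image (foreshortening is ignored), iris-labeled pixels are penalized by how far they lie outside the circle, sclera-labeled pixels by how far they lie inside it, and an exhaustive grid search stands in for the nonlinear optimization. The function names, the hinge-style cost, and the `step` parameter are all hypothetical.

```python
import math


def iris_cost(center, iris_px, sclera_px, iris_radius):
    """Mismatch between labeled pixels and a circular iris model at `center`."""
    cx, cy = center
    cost = 0.0
    for x, y in iris_px:       # iris pixels should fall inside the circle
        cost += max(0.0, math.hypot(x - cx, y - cy) - iris_radius)
    for x, y in sclera_px:     # sclera pixels should fall outside the circle
        cost += max(0.0, iris_radius - math.hypot(x - cx, y - cy))
    return cost


def fit_iris_center(eye_center, eye_radius, iris_px, sclera_px,
                    iris_radius, step=0.5):
    """Grid search for the iris centre within eye_radius of the eyeball centre."""
    best, best_cost = tuple(eye_center), math.inf
    n = int(eye_radius / step)
    for ix in range(-n, n + 1):
        for iy in range(-n, n + 1):
            dx, dy = ix * step, iy * step
            if math.hypot(dx, dy) > eye_radius:
                continue       # the iris centre cannot leave the eyeball disc
            candidate = (eye_center[0] + dx, eye_center[1] + dy)
            cost = iris_cost(candidate, iris_px, sclera_px, iris_radius)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best
```

A gradient-based or simplex optimizer started from the previous frame's estimate would be a more efficient substitute for the grid search; the grid is used here only because it needs no external library.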

Finally, the gaze direction is calculated from the eyeball center position and iris center position obtained above. Let l be the eyeball radius in the image, and let dx and dy be the distances in the x-axis and y-axis directions between the eyeball center and the iris center in the image. Then the angles ψx and ψy between the gaze direction and the camera optical axis, that is, the angles that the vector pointing in the gaze direction makes with the x-axis and the y-axis, are given by the following equations.

Here, the eyeball radius l in the image can be calculated from the three-dimensional eyeball radius and the matrix P^(t).
With the processing described above, the iris position is obtained in each frame by nonlinear optimization in the gaze direction estimation process, so the gaze direction can be estimated with higher accuracy.

[実施の形態４]
実施の形態２および実施の形態３においては、眼球モデルパラメータの推定処理については、既に述べた複数フレーム画像を入力としたＲＡＮＳＡＣによる初期値推定と非線形最適化の組合せを用いていた。 [Embodiment 4]
In the second and third embodiments, the eyeball model parameter estimation process uses a combination of the above-described initial value estimation by RANSAC and nonlinear optimization using a plurality of frame images as inputs.

しかし、実施の形態４では、実施の形態２または実施の形態３の構成において、「眼球モデルパラメータの推定処理」を、以下に説明するような「逐次型眼球モデル推定」の処理に置き換える。 However, in the fourth embodiment, in the configuration of the second or third embodiment, the “eyeball model parameter estimation process” is replaced with a “sequential eyeball model estimation” process as described below.

図２５は、このような眼球モデルパラメータの推定処理を「逐次型眼球モデル推定」の処理に置き換えた場合の処理の流れを説明する概念図である。なお、実施の形態４でも、キャリブレーション処理において、顔３次元モデルの生成処理が、複数枚の画像に基づいてなされているものとする。 FIG. 25 is a conceptual diagram illustrating the flow of processing when such an eyeball model parameter estimation process is replaced with a “sequential eyeball model estimation” process. In the fourth embodiment, it is assumed that the face three-dimensional model generation process is performed based on a plurality of images in the calibration process.

すなわち、実施の形態４の眼球モデルパラメータの推定処理については、既に述べた複数フレーム画像を入力としたＲＡＮＳＡＣによる初期値推定と非線形最適化の組合せではなく、平均的なモデルパラメータを初期値とした逐次型のアルゴリズムを用いることもできる。 That is, in the eyeball model parameter estimation processing according to the fourth embodiment, the average model parameter is used as the initial value, not the combination of the initial value estimation by RANSAC and the non-linear optimization which has already been input with the multiple frame images. A sequential algorithm can also be used.

図２５を参照して、このアルゴリズムの実装例について説明している。まずアルゴリズムの開始時点では、被験者実験により対象ユーザの平均値を求めておく等の方法で得た眼球位置Ｘ^０ _Ｌ（太字）、Ｘ^０ _Ｒ（太字）、大きさ眼球半径ｌ^０、虹彩半径ｒ^０を初期パラメータとして、眼球モデルパラメータをたとえばハードディスク５４に保持している。 With reference to FIG. 25, an implementation example of this algorithm will be described. First, at the start of the algorithm, the eyeball position X ⁰ _L (bold), X ⁰ _R (bold), size eyeball radius l ⁰ , iris radius obtained by a method such as obtaining the average value of the target user by subject experiment the r ⁰ as the initial parameter holds the eyeball model parameters for example to the hard disk 54.

The CPU 56 takes the labeling result and the face posture for the first frame as input and, starting from the initial parameters, performs a nonlinear optimization process similar to the gaze direction estimation process described in the third embodiment. This yields the eyeball model parameters $\mathbf{X}^{1}_{L}$ and $\mathbf{X}^{1}_{R}$, the eyeball radius $l^{1}$, and the iris radius $r^{1}$, together with the iris center positions $\mathbf{x}_{L,1}$ and $\mathbf{x}_{r,1}$ in the first frame, all of which are stored, for example, on the hard disk 54. The gaze direction in the first frame can be calculated from the obtained iris center positions and eyeball center positions.
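
The final calculation in the paragraph above, a gaze direction obtained from an eyeball center and an iris center, can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes both centers are already available as three-dimensional points in one common coordinate system, and the function names, the axis convention (x to the right, y up, camera looking along +z, so that a gaze toward the camera points along -z), and the numeric values are all hypothetical.

```python
import math


def gaze_direction(eyeball_center, iris_center):
    """Unit vector pointing from the eyeball center to the iris center."""
    diff = [i - e for e, i in zip(eyeball_center, iris_center)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("eyeball center and iris center coincide")
    return tuple(c / norm for c in diff)


def gaze_angles_deg(direction):
    """Yaw and pitch in degrees under the assumed axis convention."""
    x, y, z = direction
    yaw = math.degrees(math.atan2(x, -z))  # 0 when looking straight at the camera
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    return yaw, pitch


# Hypothetical values in millimetres: the iris center lies 12 mm from the
# eyeball center, rotated toward +x.
eyeball = (0.0, 0.0, 600.0)
iris = (7.2, 0.0, 590.4)
direction = gaze_direction(eyeball, iris)
yaw, pitch = gaze_angles_deg(direction)
```

With these sample values the direction is approximately (0.6, 0, -0.8), that is, a gaze rotated about 37 degrees horizontally away from the camera axis.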

In the processing of the next and subsequent frames, the model parameters obtained in the previous frame are used as initial values, and the nonlinear optimization process is performed with the newly obtained input data added. This updates the model parameters and estimates the iris center position in the current frame.
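
The warm-started, frame-by-frame update described above can be illustrated with a toy problem. The sketch below is not the eyeball model of the embodiments: as a stand-in, it fits a circle (center and radius) to points that arrive a few per frame, and it uses plain gradient descent in place of the nonlinear optimizer, which the text does not specify. All names, the synthetic data, and the step size are assumptions made for illustration only.

```python
import math


def refine(params, points, steps=200, lr=0.1):
    """Gradient descent on the mean squared radial residual of a circle fit."""
    cx, cy, r = params
    n = len(points)
    for _ in range(steps):
        gx = gy = gr = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy) or 1e-12
            e = d - r  # signed distance from the point to the circle
            gx -= 2.0 * e * (x - cx) / d
            gy -= 2.0 * e * (y - cy) / d
            gr -= 2.0 * e
        cx -= lr * gx / n
        cy -= lr * gy / n
        r -= lr * gr / n
    return (cx, cy, r)


def frame_observations(true_params, frame_index, n_points=4):
    """Synthetic stand-in for per-frame input: points on a short arc."""
    cx, cy, r = true_params
    base = 0.8 * frame_index
    return [(cx + r * math.cos(base + 0.3 * j),
             cy + r * math.sin(base + 0.3 * j)) for j in range(n_points)]


def sequential_estimate(initial_params, frames):
    """Update the parameters frame by frame, warm-starting each update."""
    params = initial_params
    history = []    # input data accumulated over all frames so far
    per_frame = []
    for observations in frames:
        history.extend(observations)       # add the newly obtained input data
        params = refine(params, history)   # start from the previous result
        per_frame.append(params)           # an estimate is available every frame
    return per_frame


TRUE_PARAMS = (3.0, -2.0, 12.0)      # hypothetical ground truth
AVERAGE_PARAMS = (0.0, 0.0, 10.0)    # plays the role of the average initial model


def error(params):
    return max(abs(a - b) for a, b in zip(params, TRUE_PARAMS))


frames = [frame_observations(TRUE_PARAMS, k) for k in range(8)]
estimates = sequential_estimate(AVERAGE_PARAMS, frames)
```

An estimate exists after the very first frame and is refined as later frames add data from other parts of the circle, which mirrors the way the sequential scheme makes results available before all frames have been collected.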

Compared with the case in which N frames of images are acquired, the three-dimensional face model is generated by the calibration process, and gaze estimation starts only after the eyeball model parameter estimation process has also finished, this processing has the advantage that the gaze direction estimation process can be started in a short time.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

FIG. 1 is an external view of a system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the external appearance of a gaze detection apparatus.
FIG. 3 is a block diagram showing the hardware configuration of the system according to the embodiment of the present invention.
FIG. 4 is a functional block diagram showing the software processing that the CPU 56 performs based on a gaze direction estimation program.
FIG. 5 is a conceptual diagram for explaining a filter for detecting a candidate region for the point between the eyebrows.
FIG. 6 is a conceptual diagram showing another configuration of the six-segment rectangular filter.
FIG. 7 is a diagram explaining modeling by SVM using an image region centered on the point between the eyebrows.
FIG. 8 is a diagram showing an example of a face detection result.
FIG. 9 is a conceptual diagram explaining a model for determining the gaze direction.
FIG. 10 is a conceptual diagram showing the relationship among the iris center, the eyeball center, and the projection point after the user has shifted to a state of gazing at the camera.
FIG. 11 is a flowchart for explaining the flow of the initial setting process of the gaze direction estimation apparatus.
FIG. 12 is a diagram showing four image frames captured during calibration.
FIG. 13 is a flowchart for explaining the flow of the real-time gaze detection process executed by the gaze direction estimation apparatus.
FIG. 14 is a diagram showing a gaze estimation result while the user gazes to the upper right.
FIG. 15 is a diagram showing a gaze estimation result while the user gazes upward.
FIG. 16 is a diagram showing a gaze estimation result while the user gazes to the lower left.
FIG. 17 is a flowchart for explaining the calibration performed as the initial setting of the gaze direction estimation apparatus according to the second embodiment.
FIG. 18 is a diagram showing the procedure of the eyeball model parameter estimation process.
FIG. 19 is a conceptual diagram showing the relationship between reprojection and deviation.
FIG. 20 is a diagram showing an example of the labeling process.
FIG. 21 is a diagram showing the concept of the matching process between the iris and the model.
FIG. 22 is a diagram showing the result of gaze estimation performed using the face model obtained by calibration.
FIG. 23 is a conceptual diagram for explaining the “nonlinear optimization process” in the estimation of the eyeball model parameters.
FIG. 24 is a conceptual diagram showing the process of determining the iris center position for gaze estimation.
FIG. 25 is a conceptual diagram explaining the flow of processing when the eyeball model parameter estimation process is replaced with the “sequential eyeball model estimation” process.

Explanation of symbols

20 gaze direction estimation apparatus, 30 camera, 40 computer main body, 42 monitor, 5602 image capture processing unit, 5604 image data recording processing unit, 5606 face detection unit, 5608 face region extraction unit, 5610 feature point extraction unit, 5612 iris center extraction unit, 5614 relative relationship specifying unit, 5616 eyeball center estimation unit, 5618 gaze direction estimation unit, 5630 display control unit.

Claims

A gaze direction estimation apparatus comprising:
photographing means for acquiring images;
relative relationship specifying means for specifying a relative three-dimensional positional relationship among a plurality of feature points in a face region, based on an image of a frame that is captured by the photographing means and includes the face region of a human; and
eyeball center estimation means for executing an estimation process of an eyeball center position, which is the three-dimensional position of the eyeball center of the human,
wherein the eyeball center estimation means executes the estimation process based on the relative three-dimensional positional relationship among the plurality of feature points specified by the relative relationship specifying means in the target frame of the estimation process, using a relative three-dimensional positional relationship, specified in advance, between the feature points and the position of the eyeball center,
the apparatus further comprising:
iris center extraction means for extracting an iris center position, at which the iris center is projected onto the image, by matching an iris region against an iris shape model within the image region of the target frame of the estimation process; and
gaze estimation means for estimating, based on the iris center position extracted in the image of the target frame of the estimation process and the eyeball center position, the gaze direction as the direction of a three-dimensional straight line connecting the eyeball center and the iris center.

The gaze direction estimation apparatus according to claim 1, wherein the photographing means is monocular photographing means for capturing and acquiring image data corresponding to each pixel in a target image region including the face region of the human.

The gaze direction estimation apparatus according to claim 1, wherein
the relative relationship specifying means acquires in advance a plurality of calibration images captured by the photographing means at the time of calibration while the human is looking at the photographing means, and specifies the relative three-dimensional positional relationship between the feature points and the eyeball center, and
the eyeball center estimation means estimates a projection position of the eyeball center of the human based on the projection positions of the plurality of feature points detected in a target image region that is captured by the photographing means and includes the face region of the human, and on the specified relative three-dimensional positional relationship.

The gaze direction estimation apparatus according to claim 3, wherein the relative relationship specifying means includes:
measurement matrix calculating means for extracting the projection positions of the plurality of feature points in the plurality of calibration images and calculating a measurement matrix whose elements are the projection positions of the plurality of feature points; and
factorization means for factorizing the measurement matrix into a photographing posture matrix, whose elements are information on the posture of the photographing means, and a relative positional relationship matrix, whose elements are information on the relative three-dimensional positional relationship among the plurality of feature points.

The gaze direction estimation apparatus according to claim 4, wherein the eyeball center estimation means
includes feature point specifying means for associating the feature points observed in a captured image frame with the feature points in the calibration images, and
estimates the projection position of the eyeball center from the observed feature points and from a submatrix of the relative positional relationship matrix corresponding to the feature points observed in the captured image frame.

The gaze direction estimation apparatus according to claim 1, wherein
the relative relationship specifying means acquires in advance a plurality of calibration images captured by the photographing means at the time of calibration, normalizes the calibration images, and specifies the relative three-dimensional positional relationship between the plurality of feature points in the face region and the eyeball center, and
the eyeball center estimation means estimates the projection position of the eyeball center of the human by matching, within a target image region that is captured by the photographing means and includes the face region of the human, the iris region in the captured image against an iris model region that is calculated and projected using the eyeball center position assumed in a face coordinate system and the eyeball radius.

The gaze direction estimation apparatus according to claim 1, wherein
the relative relationship specifying means normalizes an image captured by the photographing means and then specifies the relative three-dimensional positional relationship among the plurality of feature points in the face region, and
the eyeball center estimation means estimates the position of the eyeball center of the human by matching, within a target image region that is captured by the photographing means and includes the face region of the human, the iris region in the captured image against an iris model region that is calculated and projected using the eyeball center position assumed in a face coordinate system and the eyeball radius.

The gaze direction estimation apparatus according to claim 1, wherein the iris center extraction means extracts the iris center position by matching, within a target image region that is captured by the photographing means and includes the face region of the human, the iris region in the captured image against an iris model region that is calculated and projected using the eyeball center position assumed in face coordinates and the eyeball radius.

A gaze direction estimation method comprising the steps of:
capturing and acquiring, by monocular photographing means, image data corresponding to each pixel in a target image region of each of a plurality of frames including a face region of a human;
acquiring in advance calibration images captured by the photographing means while the human is looking at the photographing means, and specifying a relative three-dimensional positional relationship between a plurality of feature points in the face region and the eyeball center;
detecting the projection positions of the plurality of feature points in the target image region captured by the photographing means, and estimating the three-dimensional position of the eyeball center of the human based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame, using the relative three-dimensional positional relationship between the position of the eyeball center of the human and the feature points specified in advance at the time of calibration;
extracting an iris center position, at which the iris center is projected onto the image, by matching an iris region against an iris shape model within the image region of the frame; and
estimating, based on the extracted iris center position and the projection position of the eyeball center, the gaze direction as the direction of a three-dimensional straight line connecting the eyeball center and the iris center.

A gaze direction estimation method comprising the steps of:
capturing and acquiring, by monocular photographing means, image data corresponding to each pixel in a target image region, including a face region of a human, of an image of a frame;
acquiring an image captured by the photographing means, normalizing the image, and specifying a relative three-dimensional positional relationship among a plurality of feature points in the face region;
detecting the projection positions of the plurality of feature points in the target image region captured by the photographing means, and estimating the three-dimensional position of the eyeball center of the human based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame, using the relative three-dimensional positional relationship between the eyeball center of the human and the feature points specified from frames captured before the target frame,
wherein the estimating step includes estimating the position of the eyeball center of the human by matching, within the target image region that is captured by the photographing means and includes the face region of the human, the iris region in the captured image against a projected iris model region calculated using the eyeball center position assumed in a face coordinate system and the eyeball radius;
extracting an iris center position, at which the iris center is projected onto the image, by matching the iris region against an iris shape model within the target image region of each frame; and
estimating, based on the extracted iris center position and the eyeball center position, the gaze direction as the direction of a three-dimensional straight line connecting the eyeball center and the iris center.

A program for causing a computer having arithmetic processing means to execute a gaze direction estimation process for a face in a target image region, the program causing the computer to execute the steps of:
capturing and acquiring, by monocular photographing means, image data corresponding to each pixel in a target image region of each of a plurality of frames including a face region of a human;
acquiring in advance calibration images captured by the photographing means while the human is looking at the photographing means, and specifying a relative three-dimensional positional relationship between a plurality of feature points in the face region and the eyeball center;
detecting the projection positions of the plurality of feature points in the target image region captured by the photographing means, and estimating the three-dimensional position of the eyeball center of the human based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame, using the relative three-dimensional positional relationship between the position of the eyeball center of the human and the feature points specified in advance at the time of calibration;
extracting an iris center position, at which the iris center is projected onto the image, by matching an iris region against an iris shape model within the image region of the frame; and
estimating, based on the extracted iris center position and the projection position of the eyeball center, the gaze direction as the direction of a three-dimensional straight line connecting the eyeball center and the iris center.

A program for causing a computer having arithmetic processing means to execute a gaze direction estimation process for a face in a target image region, the program causing the computer to execute the steps of:
capturing and acquiring, by monocular photographing means, image data corresponding to each pixel in a target image region, including a face region of a human, of an image of a frame;
acquiring an image captured by the photographing means, normalizing the image, and specifying a relative three-dimensional positional relationship among a plurality of feature points in the face region;
detecting the projection positions of the plurality of feature points in the target image region captured by the photographing means, and estimating the three-dimensional position of the eyeball center of the human based on the relative three-dimensional positional relationship among the plurality of feature points specified in the frame, using the relative three-dimensional positional relationship between the eyeball center of the human and the feature points specified from frames captured before the target frame,
wherein the estimating step includes estimating the position of the eyeball center of the human by matching, within the target image region that is captured by the photographing means and includes the face region of the human, the iris region in the captured image against a projected iris model region calculated using the eyeball center position assumed in a face coordinate system and the eyeball radius;
extracting an iris center position, at which the iris center is projected onto the image, by matching the iris region against an iris shape model within the target image region of each frame; and
estimating, based on the extracted iris center position and the eyeball center position, the gaze direction as the direction of a three-dimensional straight line connecting the eyeball center and the iris center.