JP2023125652A

JP2023125652A - Pupil detection device and pupil detection method

Info

Publication number: JP2023125652A
Application number: JP2022029876A
Authority: JP
Inventors: 嘉伸海老澤; Yoshinobu Ebisawa
Original assignee: Shizuoka University NUC
Current assignee: Shizuoka University NUC
Priority date: 2022-02-28
Filing date: 2022-02-28
Publication date: 2023-09-07

Abstract

To enhance calculation efficiency and detection accuracy with a simple device configuration. SOLUTION: A line-of-sight detection device 1 comprises: a camera 10 which acquires face images at successive timings by imaging the face of a subject A; a light source 13 which emits light toward the face of the subject A; and an image processing device 20 which processes the face images acquired by the camera 10 at the light irradiation timings. The image processing device 20 comprises: a line-of-sight detection unit 23 which detects the position of the pupil of the subject A on the face image; and a pupil position prediction unit 25 which predicts the position of the pupil while the eyes of the subject A are closed by searching the face image for the position of a feature of the eye in the closed-eye state. The line-of-sight detection unit 23 detects the position of the pupil by tracking the pupil using the pupil position predicted by the pupil position prediction unit 25. SELECTED DRAWING: Figure 4

Description

The present invention relates to a pupil detection device and a pupil detection method for detecting the position of a pupil from an image of a subject.

Background Art

Conventionally, devices have been used that acquire an image of a subject's face including the eyes and detect the position of the subject's pupils based on that image (see, for example, Patent Document 1 below). Such a device detects the pupil position from a difference image obtained by subtracting a dark pupil image, in which the pupil appears dark, from a bright pupil image, in which the pupil appears bright. To improve calculation efficiency and accuracy, the device predicts where the pupil will be on the image of the next frame from the pupil position detected on the image of the previous frame, sets a window at that predicted location, and searches for the pupil within the window.

Patent Document 1: JP 2007-268026 A

In the device described in Patent Document 1, when the subject closes his or her eyes by blinking or the like (hereinafter also referred to as "eye closure"), the window set on the image tends to drift away from the actual pupil position, so that the pupil can no longer be detected. Setting a larger window avoids this situation but lowers calculation efficiency.

The present invention has been made in view of the above problem, and an object of the present invention is to provide a pupil detection device and a pupil detection method that can improve calculation efficiency and detection accuracy.

To solve the above problem, a pupil detection device according to one aspect of the present invention comprises: a camera that acquires face images at successive timings by imaging the face of a subject; a light source that irradiates light toward the subject's face; and an arithmetic device that processes the face images acquired by the camera at the light irradiation timings. The arithmetic device has a pupil position detection unit that detects the position of the subject's pupil on the face image, and a pupil position prediction unit that predicts the position of the pupil while the subject's eyes are closed by searching the face image for the position of a facial feature in the closed-eye state. The pupil position detection unit detects the pupil position by tracking the pupil using the pupil position predicted by the pupil position prediction unit.

A pupil detection method according to another aspect of the present invention is a pupil detection method using a camera that acquires face images at successive timings by imaging the face of a subject, a light source that irradiates light toward the subject's face, and an arithmetic device that processes the face images acquired by the camera at the light irradiation timings. The method comprises: a pupil position detection step in which the arithmetic device detects the position of the subject's pupil on the face image; and a pupil position prediction step in which the arithmetic device predicts the position of the pupil while the subject's eyes are closed by searching the face image for the position of a facial feature in the closed-eye state. In the pupil position detection step, the pupil position is detected by tracking the pupil using the pupil position predicted in the pupil position prediction step.

According to the pupil detection device of the above aspect, or the pupil detection method of the other aspect, the pupil position is detected on face images acquired by the camera at successive timings, and the position of a facial feature in the closed-eye state is searched for in the face image, whereby the position of the pupil during eye closure is predicted on that face image. When the pupil position is detected, the pupil is tracked using the pupil position predicted on the face image. As a result, the pupil can be tracked and its position detected even when the subject closes his or her eyes, and highly accurate pupil detection can be achieved with high calculation efficiency.

Here, the facial feature may be a feature of the subject's eye. In this case, the position of the eye feature is searched for in the face image and the pupil position is predicted from that position, which improves the prediction accuracy of the pupil position and realizes more accurate pupil detection.

The pupil position prediction unit may predict the pupil position with a machine learning model that uses a neural network. In this case, the accuracy of predicting the pupil position on the face image can be reliably improved by a simple learning method, and stable pupil detection is realized.

The pupil position prediction unit may input partial images cut out from the face image to the machine learning model as input data, and predict the pupil position by using the machine learning model to search for the position of the partial image that contains the facial feature. Using partial images cut out from the face image as input data allows the pupil position on the face image to be predicted by simple processing. As a result, the calculation efficiency of pupil detection can be further improved.
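The partial-image search described above can be illustrated with a minimal sketch. It assumes a plain sliding-window scan in which every candidate patch is scored and the best-scoring patch is kept; the source does not specify how the model evaluates patches, so `score` is a hypothetical stand-in for the trained model, `mean_brightness` is a toy scorer used only to make the example runnable, and taking the patch center as the predicted position is likewise an assumption.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut a height x width partial image out of the face image."""
    return [row[left:left + width] for row in image[top:top + height]]


def mean_brightness(patch: Image) -> float:
    """Toy scorer standing in for a trained model (hypothetical)."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


def search_feature(image: Image, patch_h: int, patch_w: int,
                   score: Callable[[Image], float],
                   stride: int = 1) -> Tuple[int, int, float]:
    """Scan the image and return (top, left, score) of the best patch."""
    best = None
    for top in range(0, len(image) - patch_h + 1, stride):
        for left in range(0, len(image[0]) - patch_w + 1, stride):
            s = score(crop(image, top, left, patch_h, patch_w))
            if best is None or s > best[2]:
                best = (top, left, s)
    if best is None:
        raise ValueError("patch is larger than the image")
    return best


def patch_center(top: int, left: int, patch_h: int, patch_w: int) -> Tuple[int, int]:
    """Use the patch center as the predicted position (an assumption)."""
    return (top + patch_h // 2, left + patch_w // 2)
```

In practice the scan would be restricted to a neighborhood of the last known pupil position rather than the whole face image, which is what keeps the search cheap.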

The pupil position prediction unit may cut out the partial image at a size that contains the entire facial feature. In this case, the whole facial feature fits inside the partial image cut out from the face image, which improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.

The arithmetic device may further have a model learning unit that trains the machine learning model. The model learning unit cuts out a face image acquired while the subject's eyes are closed, using as a reference the pupil position detected by the pupil position detection unit immediately before the eyes closed, and trains the machine learning model using the cut-out face image as training data. In this way, appropriate training data can be created, and training on that data improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.
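A minimal sketch of how such training crops might be collected from a frame sequence is given below. It assumes that a failed detection is represented by `None` and that the crop is a square centered on the last successfully detected pupil position; the function names, the square crop, and the clamping at the image border are illustrative assumptions, not details taken from the source.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def crop_around(image: Image, center_row: int, center_col: int, size: int) -> Image:
    """Cut a size x size patch centered on (center_row, center_col),
    clamped so the patch stays inside the image (assumes image >= size)."""
    half = size // 2
    top = min(max(center_row - half, 0), len(image) - size)
    left = min(max(center_col - half, 0), len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def collect_closed_eye_samples(
    frames: List[Image],
    pupil_positions: List[Optional[Tuple[int, int]]],
    size: int,
) -> List[Image]:
    """For every frame where detection failed (position is None), crop the
    frame around the pupil position detected just before the eye closed."""
    samples = []
    last_position = None
    for frame, position in zip(frames, pupil_positions):
        if position is not None:
            last_position = position          # eye open: remember the position
        elif last_position is not None:
            samples.append(crop_around(frame, *last_position, size))
    return samples
```

This relies on the observation made later in the description that the head moves little between the frames just before and just after eye closure, so the earlier pupil position remains a valid label for the closed-eye frame.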

The arithmetic device may further have a pupil distance detection unit that detects the distance from the camera to the pupil, and the pupil position prediction unit may set the cut-out size of the partial image used as input data variably according to the pupil distance. In this case, the size of the partial image used as input data can be set appropriately, which improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.
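One way to make the cut-out size depend on distance is sketched below. The source only states that the size is variable according to the pupil distance; the inverse-proportional rule used here follows from a pinhole camera model, under which the apparent size of the eye on the image scales as 1/distance. The reference size, reference distance, millimetre units, and clamping limits are all hypothetical values chosen for illustration.

```python
def crop_size_for_distance(
    distance_mm: float,
    base_size_px: int = 64,         # hypothetical crop size at the reference distance
    base_distance_mm: float = 600.0,  # hypothetical reference distance
    min_size_px: int = 16,
    max_size_px: int = 256,
) -> int:
    """Scale the crop size inversely with camera-to-pupil distance
    (pinhole-model assumption) and clamp it to a sane range."""
    if distance_mm <= 0:
        raise ValueError("distance must be positive")
    size = round(base_size_px * base_distance_mm / distance_mm)
    return max(min_size_px, min(max_size_px, size))
```

With these illustrative defaults, a subject at twice the reference distance gets a crop half as wide, so the eye occupies roughly the same fraction of the patch regardless of where the subject sits.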

The arithmetic device may further have a pupil distance detection unit that detects the distance from the camera to the pupil, and the model learning unit may set the cut-out size of the face image used as training data variably according to the pupil distance. In this way, the cut-out size of the face image used as training data can be set appropriately, which improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.

The model learning unit may identify, within the cut-out face image, the region in which the facial feature exists, and set the cut-out size of the face image used as training data based on the identified region. In this way, the cut-out size of the face image used as training data can be set appropriately according to the region of the facial feature, which improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.

The model learning unit may identify, within the cut-out face image, the region in which the facial feature exists, and set the cut-out position of the face image used as training data so that the identified region lies at the center of the image. In this way, the cut-out position of the face image used as training data can be set appropriately according to the region of the facial feature, which improves the accuracy of predicting the pupil position. As a result, more stable pupil detection is realized.

The machine learning model may be a model that predicts the positional offset of the partial image containing the facial feature relative to a face image cut out with reference to the pupil position detected by the pupil position detection unit immediately before the subject's eyes closed. In that case the model learning unit generates shifted images by shifting the cut-out face image, and trains the machine learning model using each shifted image together with its shift amount as training data. This improves the accuracy of predicting the pupil position on the face image regardless of the magnitude of the offset, on the face image, between the partial images used as input data. As a result, more stable pupil detection is realized.
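The generation of shifted images and their offsets can be sketched as follows. The sketch assumes square crops, integer pixel shifts on a regular grid, and a label equal to the (row, column) displacement applied to the crop window; the source does not state the shift range, the step, or the sign convention of the label, so all of these are illustrative choices.

```python
from typing import List, Tuple

Image = List[List[float]]
Sample = Tuple[Image, Tuple[int, int]]  # (shifted crop, (dy, dx) label)


def make_shifted_samples(image: Image, center: Tuple[int, int], size: int,
                         max_shift: int, step: int = 1) -> List[Sample]:
    """Crop size x size patches whose window is displaced by (dy, dx) from the
    reference position, and pair each patch with its displacement."""
    height, width = len(image), len(image[0])
    samples = []
    for dy in range(-max_shift, max_shift + 1, step):
        for dx in range(-max_shift, max_shift + 1, step):
            top = center[0] - size // 2 + dy
            left = center[1] - size // 2 + dx
            # Skip windows that would fall outside the face image.
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue
            patch = [row[left:left + size] for row in image[top:top + size]]
            samples.append((patch, (dy, dx)))
    return samples
```

A model trained on such pairs outputs a displacement for a given crop; subtracting that displacement from the crop position would then recover the reference position, under the sign convention assumed here.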

When the subject opens his or her eyes again after closing them, the pupil position detection unit may detect the pupil position by setting a window on the face image based on the pupil position predicted by the pupil position prediction unit at the timing when the subject's eyes were closed. Conventionally, once the eyes reopened after being closed, the pupil tended to remain undetectable for several frames. With this configuration, pupil detection can be resumed stably and without delay from the frame immediately after the eyes open.

The pupil position detection unit may detect the pupil position by tracking it using the pupil positions detected on the face images of consecutive frames, and, when detection of the pupil position has failed on the face image of the immediately preceding frame, track the pupil position using the pupil position predicted by the pupil position prediction unit. With this configuration, the pupil position can be tracked stably during continuous detection even when the subject closes his or her eyes by blinking or the like and the pupil no longer appears on the image, and the pupil position can be detected stably when the subject opens his or her eyes and the pupil appears on the image again.

According to the present invention, the calculation efficiency and detection accuracy of pupil detection can be improved with a simple device configuration.

FIG. 1 is a perspective view showing a line-of-sight detection device according to an embodiment.
FIG. 2 is a plan view showing the lens portion of a camera.
FIG. 3 is a diagram showing the hardware configuration of an image processing device according to the embodiment.
FIG. 4 is a block diagram showing the functional configuration of the line-of-sight detection device according to the embodiment.
FIG. 5 is a diagram illustrating the partial images created by the partial image creation unit 24 of FIG. 4.
FIG. 6 is a diagram showing the layer structure of machine learning model 1 used by the pupil position prediction unit 25.
FIG. 7 is a diagram showing the layer structure of machine learning model 2 used by the pupil position prediction unit 25.
FIG. 8 is a diagram showing an example of the closed-eye images GP_L1 and GP_R1 of the left and right pupils created by the learning image creation unit 26 from the face image G_F2.
FIG. 9 is a diagram showing an example of the learning data "learning image 3" created by the learning image creation unit 26 from the face image G_F2.
FIG. 10 is a flowchart showing the operation procedure of the line-of-sight detection device 1.

Preferred embodiments of a pupil detection device and a pupil detection method according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, identical or corresponding parts are given the same reference numerals, and redundant description is omitted.

[Configuration of the line-of-sight detection device]
First, the configuration of a line-of-sight detection device 1, which is a pupil detection device according to the embodiment, is described with reference to FIGS. 1 to 4. The line-of-sight detection device 1 is a computer system that detects the subject's pupils and corneal reflections by imaging the subject's face and uses the detection results to detect the subject's line-of-sight direction. The pupil detection method according to this embodiment is carried out by this device. The subject is the person whose line-of-sight direction is to be detected, and may also be called a test subject. The purposes for which the line-of-sight detection device 1 and the pupil detection method are used are not limited in any way. For example, the line-of-sight detection device 1 can be used for detecting inattentive driving, confirming that a driver performs safety checks of the side mirrors and rear-view mirror, detecting driver drowsiness, investigating the degree of interest in products, inputting data to computers used in amusement devices and the like, diagnostic equipment such as for diagnosing autism in infants, communication systems used between remote locations, and observation devices for observing events remotely.

As schematically shown in FIG. 1, the line-of-sight detection device 1 includes a pair of cameras 10 that function as a stereo camera and an image processing device (arithmetic device) 20. In the following, where necessary, the pair of cameras 10 is distinguished as a left camera 10_L on the left side of the subject A and a right camera 10_R on the right side of the subject A. In this embodiment, the line-of-sight detection device 1 further includes a display device 30, which is the object the subject A looks at. However, because the purpose of the line-of-sight detection device 1 is not limited as noted above, the object in the subject A's line of sight is not limited to the display device 30 and may be, for example, the windshield of an automobile. The display device 30 is therefore not an essential element of the line-of-sight detection device 1. Each camera 10 is connected to the image processing device 20 wirelessly or by wire, and various data and commands are exchanged between the cameras 10 and the image processing device 20. Camera calibration is performed on each camera 10 in advance.

The camera 10 photographs the facial area of the subject A, including the left and right eyes, to acquire a face image. The pair of cameras 10 is arranged at a predetermined interval along the horizontal direction and is placed lower than the face of the subject A in order to prevent reflected light from appearing in the face image when the subject A is wearing glasses. The elevation angle of the cameras 10 with respect to the horizontal is set, for example, in the range of 20 to 35 degrees, taking into account both reliable detection of the pupils and avoidance of obstructing the subject A's field of view. Camera calibration is performed on each camera 10 in advance.

In this embodiment, the camera 10 is a camera capable of acquiring face images of a plurality of frames at successive, regular timings. The camera 10 images the subject A in response to a command from the image processing device 20 and outputs the face image to the image processing device 20.

The lens portion of the camera 10 is schematically shown in FIG. 2. As shown in this figure, in the camera 10 an objective lens 11 is housed in a circular opening 12, and a light source 13 is attached outside the opening 12. The light source 13 is a device for irradiating illumination light toward the face of the subject A, and consists of a plurality of light emitting elements 13a and a plurality of light emitting elements 13b. The light emitting elements 13a are semiconductor light emitting elements (LEDs) whose output light has a center wavelength of 850 nm, and are arranged in a ring at equal intervals along the edge of the opening 12. The light emitting elements 13b are semiconductor light emitting elements whose output light has a center wavelength of 940 nm, and are arranged in a ring at equal intervals outside the light emitting elements 13a. The distance from the optical axis of the camera 10 to the light emitting elements 13b is therefore greater than the distance from the optical axis to the light emitting elements 13a. Each of the light emitting elements 13a and 13b is installed so as to emit illumination light along the optical axis of the camera 10. The arrangement of the light source 13 is not limited to the configuration shown in FIG. 2, and other arrangements are possible as long as the camera can be regarded as a pinhole model. The light source 13 emits illumination light at timings according to commands from the image processing device 20.

The image processing device 20 is a computer (arithmetic device) that controls the cameras 10 and the light sources 13 and detects the line-of-sight direction using the face images of the subject A. The image processing device 20 may be built from a stationary or portable personal computer (PC), from a workstation, or from another type of computer. Alternatively, the image processing device 20 may be built by combining a plurality of computers of any type. When a plurality of computers is used, the computers are connected via a communication network such as the Internet or an intranet.

FIG. 3 shows a general hardware configuration of the image processing device 20. The image processing device 20 includes a CPU (processor) 101 that executes an operating system, application programs, and the like; a main storage unit 102 composed of ROM and RAM; an auxiliary storage unit 103 composed of a hard disk, flash memory, or the like; a communication control unit 104 composed of a network card or a wireless communication module; an input device 105 such as a keyboard or mouse; and an output device 106 such as a display or printer.

Each functional element of the image processing device 20 described later is realized by loading predetermined software onto the CPU 101 or the main storage unit 102, operating the communication control unit 104, the input device 105, the output device 106, and so on under the control of the CPU 101, and reading and writing data in the main storage unit 102 or the auxiliary storage unit 103. The data and databases required for processing are stored in the main storage unit 102 or the auxiliary storage unit 103.

As shown in FIG. 4, the image processing device 20 includes, as functional components, a lighting control unit 21, an image acquisition unit 22, a line-of-sight detection unit (pupil position detection unit, pupil distance detection unit) 23, a partial image creation unit 24, a pupil position prediction unit 25, a learning image creation unit 26, and a model learning unit 27. The lighting control unit 21 controls the lighting timing of the light source 13. The image acquisition unit 22 is a functional element that acquires face image data at the lighting timings from the camera 10 by controlling the lighting timing of the light source 13 so that it is synchronized with the imaging timing of the camera 10. The line-of-sight detection unit 23 is a functional element that detects the direction of the visual axis (also called the line of sight) based on a line-of-sight vector obtained from the face image. The visual axis (line of sight) is the line connecting the center of the subject's pupil and the subject's gaze point (the point the subject is looking at). The term "visual axis" includes the meanings (concepts) of start point, end point, and direction. The "line-of-sight vector" expresses the direction of the subject's visual axis as a vector and is one form of representing the "direction of the visual axis". The destination to which the image processing device 20 outputs the detected visual axis direction is not limited in any way. For example, the image processing device 20 may display the result on a monitor as an image, figure, or text, store it in a storage device such as a memory or database, or transmit it to another computer system via a communication network.

The basic operation of line-of-sight detection by the image processing device 20 is described next.

First, the lighting control unit 21 controls the lighting timing of the light emitting elements 13a and 13b included in the light source 13 so that they light alternately in synchronization with the imaging timing of the camera 10. In step with this alternating lighting, the image acquisition unit 22 acquires from each camera 10 a bright pupil image (face image) in which the pupil appears relatively bright and a dark pupil image (face image) in which the pupil appears relatively dark. Next, the line-of-sight detection unit 23 detects the position of the pupil center and the position of the corneal reflection in the face image of each camera 10, using the difference image (or division image) of the bright pupil image and the dark pupil image from that camera. The line-of-sight detection unit 23 then applies the stereo method to the pupil center positions detected in the face images of the two cameras 10 to obtain the three-dimensional coordinates of the left and right pupil centers of the subject A and the distance from each camera 10 to the left and right pupils. Furthermore, the line-of-sight detection unit 23 calculates the visual axes (line-of-sight vectors) of the left and right eyes based on the positions of the pupil center and corneal reflection in the face image of either camera 10 and the three-dimensional coordinates of the left and right pupils. The line-of-sight detection unit 23 may also calculate the gaze point on a predetermined viewing target plane by referring to the calculated visual axes. The above processing is executed repeatedly on the pairs of bright and dark pupil images that are obtained alternately.
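As a rough illustration of the difference-image step, the sketch below subtracts a dark pupil image from a bright pupil image inside a search window, thresholds the result, and returns the centroid and pixel count of the bright region. Taking the centroid of the thresholded pixels as the pupil center is an assumption made for brevity, and the threshold value and window layout are hypothetical; the source does not describe the device's actual pupil-center computation at this level of detail.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]
Window = Tuple[int, int, int, int]  # (top, left, height, width)


def pupil_centroid(bright: Image, dark: Image, window: Window,
                   threshold: float) -> Tuple[Optional[Tuple[float, float]], int]:
    """Return ((row, col) centroid, area in pixels) of the region inside the
    window where bright - dark exceeds the threshold, or (None, 0)."""
    top, left, height, width = window
    count = 0
    sum_row = 0
    sum_col = 0
    for r in range(top, top + height):
        for c in range(left, left + width):
            if bright[r][c] - dark[r][c] > threshold:
                count += 1
                sum_row += r
                sum_col += c
    if count == 0:
        return None, 0  # no pupil found inside the window
    return (sum_row / count, sum_col / count), count
```

Restricting the loop to the window is what makes the approach efficient: only a small region around the expected pupil position is examined in each frame.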

Here, the line-of-sight detection unit 23 detects the positions of the pupil center and the corneal reflection within a window of predetermined size set in the bright pupil image and the dark pupil image. The window is positioned so as to contain the pupil position on the current frame, which is predicted using at least the three-dimensional pupil position detected in the frame preceding the face image to be processed (for example, by the method described in Japanese Unexamined Patent Publication No. 2007-268026). In other words, the window is positioned so as to track the pupil across consecutive frames. However, if the line-of-sight detection unit 23 failed to detect the pupil center in the face image of the previous frame, the window on the face image of the current frame is positioned using the three-dimensional pupil position predicted by the pupil position prediction unit 25 described later. The line-of-sight detection unit 23 judges that detection of the pupil center has failed (that is, that the eye is closed, for example because of a blink) when the area of the pupil detected on the difference image (or division image) of the bright and dark pupil images (for example, the number of pixels in the pupil region of an image binarized with a threshold) is close to zero, so that no pupil is recognized. In general, eye closure due to blinking happens almost instantaneously, so the pupil suddenly becomes undetectable in a certain frame. Therefore, provided the head does not move between the moments just before and just after eye closure, the eye or pupil is at the same position at both moments, and the closed-eye image can be associated with the pupil position. The detection of the pupil position by the line-of-sight detection unit 23 and the prediction of the pupil position by the pupil position prediction unit 25 may be executed in parallel on the continuously acquired face images. In that case, when the line-of-sight detection unit 23 judges that detection failed in the previous frame, it uses the prediction result of the pupil position prediction unit 25 to set the window position in the current frame. Alternatively, the prediction by the pupil position prediction unit 25 may be executed for a frame only when the line-of-sight detection unit 23 has failed to detect the pupil position in that frame. In that case, the line-of-sight detection unit 23 uses the prediction result of the pupil position prediction unit 25 to set the window position in the next frame.
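The failure judgment and the choice of window source described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the threshold of 50, and the minimum area of 3 pixels are assumptions introduced here, and the motion prediction between frames is omitted so that the window is simply centered on the previous-frame estimate.

```python
def pupil_area(diff_image, threshold):
    """Count pixels of the difference image brighter than the threshold."""
    return sum(1 for row in diff_image for value in row if value > threshold)


def detection_failed(diff_image, threshold=50, min_area=3):
    """Treat a near-zero pupil area as a failed detection (closed eye).

    threshold and min_area are illustrative values, not taken from the source.
    """
    return pupil_area(diff_image, threshold) < min_area


def window_center(prev_detected, prev_predicted, prev_failed):
    """Pick the position on which the current-frame window is centered."""
    return prev_predicted if prev_failed else prev_detected


# Toy difference images: an open eye leaves a bright blob, a closed eye does not.
open_eye = [
    [0, 0, 0, 0],
    [0, 120, 130, 0],
    [0, 125, 140, 0],
    [0, 0, 0, 0],
]
closed_eye = [[5] * 4 for _ in range(4)]

failed = detection_failed(closed_eye)
center = window_center((10, 20), (12, 21), failed)
```

With the closed-eye input the detection is judged to have failed, so the window center falls back to the predicted position.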

Next, the functions of the other components of the image processing device 20 will be described. The processing by the partial image creation unit 24, the pupil position prediction unit 25, the learning image creation unit 26, and the model learning unit 27 described below is performed separately for each of the two cameras 10, on the bright pupil images and dark pupil images acquired continuously from that camera.

The partial image creation unit 24 cuts out partial images, each a rectangular region of predetermined size, from the face image acquired from each camera 10 by the image acquisition unit 22. For example, the face image is 150 pixels wide by 80 pixels high and the partial image is 30 pixels wide by 20 pixels high. The size of the partial image is chosen so that it contains the whole of the eyelashes, which are the feature part of the eye in the face of the subject A. The partial image creation unit 24 may instead set the cut-out size variably so that the whole feature part of the eye is contained, based on the distance from the camera 10 to the pupil that was detected or predicted by the line-of-sight detection unit 23 or the pupil position prediction unit 25 for the frame immediately preceding the face image to be processed, and then convert each cut-out partial image into image data of a predetermined size (for example, 8 pixels by 5 pixels). The partial image creation unit 24 repeatedly shifts the cut-out region two-dimensionally within the face image (for example, by 10 pixels horizontally and 5 pixels vertically) to create a plurality of partial images that overlap one another both horizontally and vertically.

FIG. 5 illustrates the partial images created by the partial image creation unit 24. For example, when the face image G_F1 to be cut out is 150 pixels wide by 80 pixels high, the cut-out size of the partial image is 30 pixels wide by 20 pixels high, the horizontal shift is 10 pixels, and the vertical shift is 5 pixels, the partial image creation unit 24 creates a total of 169 partial images GP_F1 from one frame of the face image G_F1.
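The count of 169 follows from 13 horizontal positions times 13 vertical positions, which the short sketch below reproduces. The function name `crop_origins` is introduced here for illustration only, and the sketch assumes that cut-out regions start at the top-left corner and must lie entirely inside the face image.

```python
def crop_origins(image_w, image_h, crop_w, crop_h, step_x, step_y):
    """Top-left corners of all cut-out regions that fit inside the image."""
    xs = range(0, image_w - crop_w + 1, step_x)
    ys = range(0, image_h - crop_h + 1, step_y)
    return [(x, y) for y in ys for x in xs]


# Values from the example: 150x80 face image, 30x20 crop, shifts of 10 and 5.
origins = crop_origins(150, 80, 30, 20, 10, 5)
print(len(origins))  # 169
```

Horizontally the origin runs from 0 to 120 in steps of 10 (13 positions) and vertically from 0 to 60 in steps of 5 (13 positions).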

In addition, when the pupil position prediction unit 25 has predicted the pupil position on the face image in the frame immediately preceding the frame to be processed, the partial image creation unit 24 sets a rectangular window centered on that position and cuts out the partial images from within the window. For example, the partial image creation unit 24 sets a window 60 pixels wide by 40 pixels high centered on the predicted position, and cuts out partial images 40 pixels wide by 20 pixels high from within that window while shifting by 5 pixels at a time both vertically and horizontally.

Furthermore, when the pupil position prediction unit 25 has predicted the three-dimensional pupil position in both the frame one before and the frame two before the frame to be processed, the partial image creation unit 24 predicts the pupil position in the frame to be processed with a constant-velocity model, on the assumption that the pupil is moving through three-dimensional space at constant velocity. In this case, the pupil position on the image is predicted by projecting the position in three-dimensional space onto the face image, which is a two-dimensional image. The partial image creation unit 24 then sets a rectangular window centered on the predicted position and cuts out the partial images from within the window. For example, the partial image creation unit 24 sets a window 60 pixels wide by 40 pixels high centered on the predicted position, and cuts out partial images 40 pixels wide by 20 pixels high from within that window while shifting by 5 pixels at a time both vertically and horizontally.
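As a rough sketch of this prediction step: under a constant-velocity assumption, the position in the frame to be processed is the latest position plus the displacement between the two preceding frames. The source does not specify the projection, so the ideal pinhole camera used below, together with the focal length, principal point, and all function names, is an assumption made for illustration.

```python
def predict_constant_velocity(p_prev2, p_prev1):
    """Extrapolate one frame ahead: p_next = p_prev1 + (p_prev1 - p_prev2)."""
    return tuple(2 * b - a for a, b in zip(p_prev2, p_prev1))


def project_pinhole(point, focal_px, cx, cy):
    """Project a camera-frame 3D point to pixels (assumed ideal pinhole model)."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def window_rect(center, width=60, height=40):
    """Rectangle (left, top, width, height) centered on the predicted position."""
    u, v = center
    return (u - width / 2, v - height / 2, width, height)


# Hypothetical pupil positions (mm, camera coordinates) two and one frames ago.
p_two_frames_ago = (0.0, 0.0, 600.0)
p_one_frame_ago = (10.0, 0.0, 600.0)

predicted_3d = predict_constant_velocity(p_two_frames_ago, p_one_frame_ago)
predicted_2d = project_pinhole(predicted_3d, focal_px=600.0, cx=75.0, cy=40.0)
window = window_rect(predicted_2d)
```

In a real system the projection would use the calibrated camera parameters rather than the hypothetical values shown here.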

The pupil position prediction unit 25 predicts the pupil position on the face image while the eyes of the subject A are closed by searching the face image for the position of the feature part of the eye in the closed state (the eyelashes in this embodiment), based on the plurality of partial images created by the partial image creation unit 24 from the face image of each camera 10. This prediction of the pupil position is performed for each of the left and right eyes.

Specifically, the pupil position prediction unit 25 predicts the pupil position with machine learning models based on a CNN (convolutional neural network). First, using machine learning model 1 trained in advance by the model learning unit 27, the pupil position prediction unit 25 selects, from the plurality of partial images, the partial image with a high likelihood of being an image that contains the feature part of the eye (the eyelashes) in the closed state (hereinafter simply a "closed-eye image"). FIG. 6 shows an example of the layer configuration of machine learning model 1 used by the pupil position prediction unit 25. The configuration shown in FIG. 6 is only an example, and other layer configurations may be adopted. Machine learning model 1 consists of a preprocessing part, which has convolution layers and an activation function such as a ReLU (Rectified Linear Unit) function that transforms the output of each convolution layer, and a post-processing part, which has, in this order, a flatten layer, a fully connected layer, an activation function such as a ReLU function that transforms the output of the fully connected layer, and a further fully connected layer. The model processes an input image and calculates its likelihood of being a closed-eye image. Because each input partial image has been converted into image data of a predetermined size, based on the distance from the camera 10 to the pupil, so that it contains the whole feature part of the eye, the size of the images to be processed hardly varies in the computation of machine learning model 1. Large and small features can therefore be preserved in the feature extraction without shrinking the image, so the pooling layer in the preprocessing part can be omitted. One role of a pooling layer, namely reducing the influence of positional displacement of the feature part in the input image, is instead fulfilled by the partial image creation unit 24, which creates the plurality of partial images with overlap so that the whole feature part is contained.

Furthermore, the pupil position prediction unit 25 inputs the partial image selected as described above into machine learning model 2, trained in advance by the model learning unit 27, and predicts, from the output of machine learning model 2, the two-dimensional positional displacement of the partial image from the closed-eye image. For example, taking the horizontal direction of the image as the x-axis and the vertical direction as the y-axis, the pupil position prediction unit 25 predicts the displacement Δx in the x-axis direction and the displacement Δy in the y-axis direction in sub-pixel units. FIG. 7 shows an example of the layer configuration of machine learning model 2 used by the pupil position prediction unit 25. The configuration shown in FIG. 7 is only an example, and other layer configurations may be adopted. Machine learning model 2 consists of a preprocessing part having convolution layers and pooling layers, and a post-processing part having a flatten layer, three fully connected layers, an activation function such as a ReLU function that transforms the output of a fully connected layer, and a linear function that transforms the output of a fully connected layer. Machine learning model 2 processes the input image and calculates a likelihood for each two-dimensional displacement (Δx, Δy) with respect to the closed-eye image.

In addition, based on the displacement (Δx, Δy) with respect to the closed-eye image predicted as described above, the pupil position prediction unit 25 refers to data on the relationship between the closed-eye image and the pupil position, stored in advance in the image processing device 20, and calculates the pupil position in the partial image so as to cancel out the displacement. The relationship data on the pupil position stored in advance in the image processing device 20 holds separate values for the left and right pupils. In this way, the pupil position prediction unit 25 predicts the pupil position on the face image while the eyes are closed. The pupil position prediction unit 25 then uses the stereo method to calculate the three-dimensional positions of the left and right pupils from the left and right pupil positions predicted on the two face images acquired simultaneously by the two cameras 10. The pupil position prediction unit 25 repeats this prediction of the left and right pupil positions for the face image of each successive frame.
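One way to read the cancellation step is sketched below. The source does not state the sign convention or the form of the stored relationship data, so the following are assumptions made purely for illustration: the displacement (Δx, Δy) is taken as the partial image's position minus the closed-eye image's position, the stored data is taken as a per-eye offset from the top-left corner of the closed-eye image to the pupil, and the offset values are hypothetical.

```python
# Hypothetical per-eye offsets (pixels) from the closed-eye image's
# top-left corner to the pupil; the real values come from calibration.
PUPIL_OFFSET = {"left": (14.0, 12.0), "right": (16.0, 12.0)}


def pupil_from_partial_image(crop_origin, displacement, eye):
    """Estimate the pupil position on the face image.

    crop_origin:  top-left corner of the selected partial image
    displacement: (dx, dy) predicted by machine learning model 2, assumed
                  to be partial-image position minus closed-eye position
    eye:          "left" or "right"
    """
    off_x, off_y = PUPIL_OFFSET[eye]
    # Subtracting the displacement recovers the closed-eye image position;
    # adding the stored offset then gives the pupil position.
    return (crop_origin[0] - displacement[0] + off_x,
            crop_origin[1] - displacement[1] + off_y)


pupil = pupil_from_partial_image((40, 25), (1.5, -0.5), "left")
```

If the displacement were defined with the opposite sign, the subtraction would become an addition; the structure of the calculation is otherwise unchanged.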

The learning image creation unit 26 creates the learning data (training data) used to train machine learning model 1 and machine learning model 2 in advance. The learning image creation unit 26 preferably creates the learning data immediately after completion of the calibration process for line-of-sight direction detection disclosed in, for example, Japanese Unexamined Patent Publication No. 2005-230049.

Specifically, during the calibration process the learning image creation unit 26 acquires a plurality of consecutive frames of face images from each camera 10 before and after the subject A closes the eyes in response to an instruction output to the subject A by the image processing device 20, and identifies from those face images the pupil position last detected by the line-of-sight detection unit 23. The learning image creation unit 26 then acquires the face image of the frame in which the subject A has closed the eyes, immediately after the frame in which the pupil position was identified, and cuts out an image of a rectangular region of predetermined size, referenced to the pupil position, as a closed-eye image. For example, when the face image is 150 pixels wide by 80 pixels high, the closed-eye image is 30 pixels wide by 20 pixels high. The size of the closed-eye image is chosen so that it contains the whole of the eyelashes, which are the feature part of the eye in the face of the subject A. The relationship between the pupil position and the cut-out position of the closed-eye image is set in advance, differently for the left and right pupils; these relationships are stored in the image processing device 20 and referred to when the pupil position prediction unit 25 predicts the pupil position. The learning image creation unit 26 repeats the same processing to create closed-eye images of the left and right pupils for a plurality of frames, and stores them in the image processing device 20 as positive (correct) learning data "learning image 1". The learning image creation unit 26 may also, by having the image processing device 20 output instruction information (for example, to the display device 30), have the subject A turn the face in a plurality of directions (for example, straight toward the display device 30, 30 degrees to the right, and 30 degrees to the left) and close the eyes in each, and create closed-eye images of the left and right pupils in each direction as positive (correct) learning data "learning image 1".

FIG. 8 shows an example of the closed-eye images GP_L1 and GP_R1 of the left and right pupils created by the learning image creation unit 26 from the face image G_F2. In this way, closed-eye images containing the whole of the eyelashes are created automatically from the face image G_F2, referenced to the pupil positions P_L1 and P_R1 detected immediately before eye closure.

At the same time as creating the positive learning data "learning image 1" for a plurality of frames as described above, the learning image creation unit 26 also creates negative (incorrect) learning data "learning image 2". Specifically, the learning image creation unit 26 creates, for a plurality of frames, images shifted two-dimensionally by a predetermined width from the reference position of the closed-eye image (the positive learning data) in the face image, and uses them as negative learning data. For example, for one closed-eye image, images shifted two-dimensionally in 8 patterns, in steps of 5 pixels within a range of ±5 pixels vertically and ±5 pixels horizontally, are created as negative learning data. In addition, images shifted two-dimensionally in 4 patterns, in steps of 20 pixels within a range of ±10 pixels vertically and ±10 pixels horizontally, are created as negative learning data. The learning image creation unit 26 may create the negative learning data at an image size different from that of the positive learning data. The learning image creation unit 26 stores the created negative learning data "learning image 2" in the image processing device 20.
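The two sets of shift patterns in this example can be enumerated as in the sketch below. The function name `grid_shifts` is introduced here for illustration, and the sketch assumes the patterns form a regular grid with the unshifted position excluded, which is consistent with the stated counts of 8 and 4.

```python
def grid_shifts(max_shift, step):
    """All (dx, dy) offsets on a regular grid, excluding the zero shift."""
    values = range(-max_shift, max_shift + 1, step)
    return [(dx, dy) for dy in values for dx in values if (dx, dy) != (0, 0)]


near_shifts = grid_shifts(max_shift=5, step=5)    # offsets drawn from {-5, 0, +5}
far_shifts = grid_shifts(max_shift=10, step=20)   # offsets drawn from {-10, +10}

print(len(near_shifts), len(far_shifts))  # 8 4
```

The first grid has 3 x 3 positions minus the unshifted center, giving 8 patterns; the second has only the four diagonal positions.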

In the same way as the partial image creation unit 24, the learning image creation unit 26 may set the cut-out size of the learning data variably so that the whole feature part of the eye (the eyelashes in this embodiment) is contained, based on the distance from the camera 10 to the pupil detected by the line-of-sight detection unit 23 for the frame immediately preceding the face image to be processed, and may convert the cut-out learning data into image data of a predetermined size (for example, 8 pixels by 5 pixels).

In addition, at the same time as creating the positive learning data "learning image 1" for a plurality of frames as described above, the learning image creation unit 26 also creates learning data "learning image 3" for machine learning model 2, which is used to determine the pupil position. Specifically, the learning image creation unit 26 creates, for a plurality of frames, images (shifted images) shifted two-dimensionally by a predetermined width from the reference position of the closed-eye image (the positive learning data) in the face image, as "learning image 3". The shift width of "learning image 3" is set smaller than the shift width of "learning image 2". For example, for one closed-eye image, a plurality of images shifted two-dimensionally in 25 patterns, in steps of 1 pixel within a range of ±2 pixels vertically and ±2 pixels horizontally, are created as learning data "learning image 3". The learning image creation unit 26 may create the learning data "learning image 3" at a larger image size than the learning data "learning image 1". The learning image creation unit 26 stores the created learning data "learning image 3" in the image processing device 20, together with the displacement Δx in the x-axis direction and the displacement Δy in the y-axis direction of each learning image with respect to the closed-eye image.
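A minimal sketch of how such labelled shifted images might be generated is given below. The helper names, the synthetic face image, and the reference position (60, 30) are hypothetical; the sketch also assumes that the 25 patterns include the unshifted image and that each label is simply the applied shift (Δx, Δy).

```python
def crop(image, x, y, w, h):
    """Cut a w x h region whose top-left corner is (x, y) from a 2D list."""
    return [row[x:x + w] for row in image[y:y + h]]


def make_learning_image_3(image, ref_x, ref_y, w, h, max_shift=2):
    """Return (shifted crop, (dx, dy)) pairs for every 1-pixel grid shift."""
    samples = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = crop(image, ref_x + dx, ref_y + dy, w, h)
            samples.append((shifted, (dx, dy)))
    return samples


# Synthetic 150x80 "face image" whose pixel value encodes its position.
face = [[(x + y) % 256 for x in range(150)] for y in range(80)]

# Hypothetical reference position of the 30x20 closed-eye image.
samples = make_learning_image_3(face, ref_x=60, ref_y=30, w=30, h=20)
print(len(samples))  # 25
```

Each pair can then serve as one training example for machine learning model 2, with the crop as input and the shift as the target.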

FIG. 9 shows an example of the learning data "learning image 3" created by the learning image creation unit 26 from the face image G_F2 acquired while the eyes of the subject A were closed. In this way, learning images GP_R2 to GP_R4, in which the closed-eye image containing the whole of the eyelashes is shifted within a range of ±2 pixels vertically and ±2 pixels horizontally, are created automatically from the face image G_F2.

The model learning unit 27 trains machine learning model 1 and machine learning model 2, which are used by the pupil position prediction unit 25, with the learning data created by the learning image creation unit 26. Specifically, the model learning unit 27 trains machine learning model 1 using the plurality of positive learning data "learning image 1" and the plurality of negative learning data "learning image 2" stored in the image processing device 20. The model learning unit 27 also trains machine learning model 2 using, as positive-image training data, the combinations of the plurality of learning data "learning image 3" stored in the image processing device 20 and the displacements Δx, Δy of the respective learning images.

Next, the operating procedure of the line-of-sight detection device 1 will be described, together with the steps of the pupil detection method according to this embodiment. FIG. 10 is a flowchart showing the operating procedure of the line-of-sight detection device 1.

First, when the line-of-sight direction detection process for the subject A is started, the lighting control unit 21 and the image acquisition unit 22 of the image processing device 20 start controlling the lighting timing of the light source 13 and the acquisition of images from the cameras 10 (step S101). Then, during the calibration process for line-of-sight direction detection, the learning image creation unit 26 of the image processing device 20 acquires a plurality of each of the learning data "learning image 1", "learning image 2", and "learning image 3" (step S102). Next, the model learning unit 27 of the image processing device 20 trains machine learning model 1 and machine learning model 2 with the learning data (step S103).

The line-of-sight detection unit 23 of the image processing device 20 then executes the line-of-sight direction detection process (including detection of the pupil position) on the face images of the frames acquired continuously from the cameras 10 (step S104; pupil position detection step). In parallel, the partial image creation unit 24 of the image processing device 20 obtains a plurality of partial images from the face image (step S105). The pupil position prediction unit 25 of the image processing device 20 then executes the pupil position prediction process based on the plurality of partial images (step S106; pupil position prediction step). The image processing device 20 further determines whether a face image of the next frame exists (step S107), and if so, the processing of steps S104 to S106 is repeated. When the window is set in the face image in step S104, it is set so as to track the three-dimensional pupil position predicted by the processing of step S106 for the frame preceding the face image being processed.

The operation and effects of the line-of-sight detection device 1 of the embodiment of the present disclosure, and of the pupil detection method using it, will now be described.

With the line-of-sight detection device 1, the pupil position is detected on the face images acquired at successive timings by the camera 10, and the pupil position during eye closure is predicted on the face image by searching the face image for the position of the feature part of the eye in the closed state. When the pupil position is detected, the pupil is tracked by setting the window using the pupil position predicted on the face image. As a result, the pupil can continue to be tracked, and its position detected, even when the subject A closes the eyes, so the window can be kept relatively small and highly accurate pupil detection can be achieved with high computational efficiency. In addition, since no complicated optical system is required, the pupil detection process can be realized with a simple device configuration.

The conventional line-of-sight detection device described in Japanese Unexamined Patent Publication No. 2017-102731 includes a subject detection device that detects the three-dimensional position of the subject's head, a narrow-field camera, and a pan-tilt mechanism that adjusts the orientation of the narrow-field camera. In that device, the orientation and zoom value of the narrow-field camera are controlled based on the three-dimensional position of the subject's head detected by the subject detection device, and the position of the subject's pupil is detected using the bright pupil image and dark pupil image obtained by the narrow-field camera so controlled. However, that conventional device requires two types of detection optical system, which tends to complicate the device configuration. In contrast, the configuration of the line-of-sight detection device 1 of this embodiment requires no complicated optical system, so the pupil detection process can be realized with a simple device configuration.

Here, in this embodiment, a machine learning model using a neural network is used to predict the pupil position. In this case, the accuracy of predicting the position of the pupil on the face image can be reliably improved by a simple learning method, and stable pupil detection processing is realized.

Furthermore, in this embodiment, partial images cut out from the face image are input to the machine learning model as input data, and the position of the pupil is predicted by using the machine learning model to search for the position of the partial image that includes the characteristic part of the eye. Using partial images cut out from the face image as input data allows the position of the pupil on the face image to be predicted by simple processing. As a result, the calculation efficiency of the pupil detection processing can be further improved.

Here, the partial image used as input data for the machine learning model is cut out at a size that includes the entire characteristic part of the eye. In this case, the entire characteristic part of the eye fits within the partial image cut out from the face image, and the accuracy of predicting the position of the pupil can be improved. As a result, more stable pupil detection processing is realized.

The image processing device 20 further includes a model learning unit 27 that trains the machine learning model. The model learning unit 27 cuts out a face image acquired while the subject A's eyes are closed, using as a reference the pupil position detected immediately before the subject A closed the eyes, and trains the machine learning model using the cut-out face image as training data. In this way, training data in which the relationship between the cut-out face image and the pupil position is appropriately set can be created, and training with that data improves the accuracy of predicting the pupil position. As a result, more stable pupil detection processing is realized. Furthermore, this embodiment enables automatic annotation, in which training data is acquired automatically during calibration processing. This eliminates the need to acquire training data manually and reduces the user's workload during training.
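The automatic annotation described above can be sketched as follows. This is a minimal illustration under stated assumptions: images are plain 2-D lists of pixel values, a frame in which pupil detection returns `None` is treated as a closed-eye frame, and the helper names `crop` and `collect_closed_eye_samples` are hypothetical.

```python
def crop(image, center, size):
    """Cut a size x size patch centered on (x, y); pixels outside the image become 0."""
    cx, cy = center
    half = size // 2
    patch = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            inside = 0 <= y < len(image) and 0 <= x < len(image[0])
            row.append(image[y][x] if inside else 0)
        patch.append(row)
    return patch


def collect_closed_eye_samples(frames, size):
    """Build (patch, reference_pupil) pairs without manual labelling.

    frames -- list of (image, pupil), where pupil is (x, y) or None when
              detection failed (assumed here to mean the eye is closed).
    Each closed-eye frame is cropped around the last pupil position that
    was detected before the eye closed.
    """
    samples = []
    last_pupil = None
    for image, pupil in frames:
        if pupil is not None:
            last_pupil = pupil
        elif last_pupil is not None:
            samples.append((crop(image, last_pupil, size), last_pupil))
    return samples
```

A closed-eye frame that arrives before any pupil has been detected is skipped, since no reference position exists for it.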

The image processing device 20 also has a function of detecting the distance from the camera 10 to the pupil, and variably sets the cutout size of the partial image used as input data according to the distance to the pupil. With this function, the size of the partial image used as input data can be set appropriately, and the accuracy of predicting the pupil position can be improved. As a result, more stable pupil detection processing is realized. In addition, the pooling layer that would otherwise be provided in the machine learning model used for pupil position prediction, so that prediction remains possible when the image size changes, can be omitted. Calculations during training and prediction can therefore be sped up.
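One way to set the cutout size from the pupil distance is sketched below. The text does not give a formula; inverse proportionality to distance (a pinhole-camera approximation) is an assumption of this example, as are the function name and the reference values `ref_size_px` and `ref_distance_mm`.

```python
def crop_size_for_distance(distance_mm, ref_size_px=64, ref_distance_mm=600.0):
    """Scale the cutout size so the eye occupies a similar share of the patch.

    Assumes the apparent size of the eye is inversely proportional to the
    camera-to-pupil distance. ref_size_px is the (hypothetical) cutout size
    chosen at ref_distance_mm.
    """
    if distance_mm <= 0:
        raise ValueError("distance must be positive")
    size = round(ref_size_px * ref_distance_mm / distance_mm)
    return max(1, size)
```

If each cutout is then resampled to a fixed resolution before being passed to the network, the input size stays constant regardless of distance, which is consistent with the remark that a pooling layer for handling size changes can be omitted.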

The image processing device 20 also has a function of detecting the distance from the camera 10 to the pupil, and variably sets the cutout size of the face image used as training data according to the distance to the pupil. With this function, the cutout size of the face image used as training data can be set appropriately, and the accuracy of predicting the pupil position can be improved. As a result, more stable pupil detection processing is realized.

The machine learning model 2 used in the image processing device 20 is a model that predicts the two-dimensional shift amount of the position of a partial image that includes the characteristic part of the eye. The model learning unit 27 generates shifted images by shifting the cut-out face image two-dimensionally, and trains the model using each shifted image and its two-dimensional shift amount as training data. In this case, the accuracy of predicting the position of the pupil on the face image can be improved regardless of the amount of shift on the face image between the plurality of partial images used as input data at the time of prediction. As a result, more stable pupil detection processing is realized.
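The generation of shifted images and their shift labels can be sketched as follows. This is an illustration only: images are plain 2-D lists, every integer shift within `max_shift` pixels is enumerated, and the sign convention of the label (the offset of the cutout center from the reference position) is an assumption, as are the helper names.

```python
def crop(image, center, size):
    """Cut a size x size patch centered on (x, y); pixels outside the image become 0."""
    cx, cy = center
    half = size // 2
    return [
        [
            image[y][x] if 0 <= y < len(image) and 0 <= x < len(image[0]) else 0
            for x in range(cx - half, cx - half + size)
        ]
        for y in range(cy - half, cy - half + size)
    ]


def make_shift_samples(image, reference, size, max_shift):
    """Return a list of (shifted_patch, (dx, dy)) training pairs.

    reference -- cutout center for zero shift, e.g. the pupil position
                 detected just before the eye closed
    max_shift -- largest shift in pixels applied along each axis
    """
    rx, ry = reference
    samples = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            patch = crop(image, (rx + dx, ry + dy), size)
            samples.append((patch, (dx, dy)))
    return samples
```

A regression model trained on such pairs can map a patch to its `(dx, dy)` label, so that subtracting the predicted shift from the patch position recovers the reference position.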

The line of sight detection unit 23 of the image processing device 20 may detect the position of the pupil by tracking it using the pupil positions detected on the face images of consecutive frames and, when detection of the pupil position fails on the face image of the immediately preceding frame, may track the pupil position using the pupil position predicted by the pupil position prediction unit 25. With this configuration, when the pupil position is detected continuously, the pupil position can be tracked stably even if the subject A closes the eyes, for example by blinking, and the pupil image no longer appears on the image. When the subject A opens the eyes and the pupil image appears on the image again, the pupil position tracked during eye closure is used for window setting, so the pupil position can be detected stably and without delay from immediately after the eyes open.
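The fallback between detection and prediction can be sketched as a small per-frame loop. This is a simplified illustration, assuming that both the detector and the predictor report either an `(x, y)` position or `None`; the function name is hypothetical.

```python
def track_window_centers(detections, predictions):
    """Choose the position used to place the next frame's search window.

    detections  -- per-frame pupil positions from normal detection, or None
    predictions -- per-frame pupil positions predicted from eye features, or None
    A detected position is preferred; when detection fails, the predicted
    position is used; when both are missing, the previous position is held.
    """
    centers = []
    current = None
    for detected, predicted in zip(detections, predictions):
        if detected is not None:
            current = detected
        elif predicted is not None:
            current = predicted
        centers.append(current)
    return centers
```

Because the tracked position keeps following the eye during closure, the window is already in place on the first frame after the eye reopens.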

The present invention is not limited to the embodiment described above. The configuration of the above embodiment may be modified in various ways.

For example, although the image processing device 20 according to the above embodiment includes the learning image creation unit 26 and the model learning unit 27, the functions of either or both of these units may be implemented in a computer separate from the image processing device 20, and either or both of the creation of the training data and the training of the machine learning model may be executed by that separate computer.

The characteristic part of the eye searched for in this embodiment may be the eyelid, the eyebrow, or the like instead of the eyelashes. The search target may also be a facial feature such as a nostril instead of a characteristic part of the eye. When a facial feature such as a nostril is the search target, the image processing device 20 preferably stores in advance information on the relative three-dimensional position between the pupil and the facial feature, and predicts the position of the pupil based additionally on information about the rotation and movement of the head obtained from the pupil and the facial feature.

In the above embodiment, the line of sight detection device 1 detects the position of the pupil by tracking the pupil using the predicted pupil position and setting the position of the window on the face image accordingly. However, as in the configuration described in Japanese Unexamined Patent Application Publication No. 2017-102731, a configuration in which the attitude and zoom value of a narrow-field camera are controllable may be adopted. In this case, the line of sight detection device 1 is configured to track the pupil by controlling the attitude and zoom value of the narrow-field camera using the predicted three-dimensional position of the pupil. With this modification as well, the pupil can be tracked even when the subject A closes the eyes, and highly accurate pupil detection processing can be realized with high calculation efficiency.

Although the image processing device 20 of the line of sight detection device 1 predicts the pupil position with a machine learning model using a neural network, it may instead operate to predict the pupil position by other image processing such as template matching. Such an operation can also search the face image for the position of the characteristic part of the eye, and the position of the pupil can be predicted based on that position.
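A template matching search of the kind mentioned above can be sketched as follows. The text names only template matching in general; the sum of absolute differences (SAD) score, the exhaustive scan, and the function name are assumptions chosen for the example.

```python
def match_template(image, template):
    """Find the template position with the smallest sum of absolute differences.

    image, template -- 2-D lists of pixel values
    Returns ((x, y), score), where (x, y) is the top-left corner of the best
    match and score is its SAD (0 means an exact match).
    """
    t_height, t_width = len(template), len(template[0])
    best_pos, best_score = None, None
    for y in range(len(image) - t_height + 1):
        for x in range(len(image[0]) - t_width + 1):
            score = sum(
                abs(image[y + j][x + i] - template[j][i])
                for j in range(t_height)
                for i in range(t_width)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (x, y), score
    return best_pos, best_score
```

Here the template would be a closed-eye image stored in advance, and the pupil position would follow from the match position plus the stored offset between the template and the pupil. Restricting the scan to a window around the last known position keeps the cost low.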

Instead of calculating the three-dimensional position of the pupil by the stereo method, the image processing device 20 may use the method described in Japanese Unexamined Patent Application Publication No. 2007-268164 to obtain the three-dimensional position of the pupil from the position of the pupil on the image, the position of the nostrils, the positions of other facial features, the positions of markers attached to the face, or the like. The image processing device 20 may also project a dot pattern onto the face and detect the pupil while simultaneously grasping the shape of the entire face from the dot pattern, thereby obtaining the three-dimensional position of the pupil relative to the structure (shape) of the entire face. The image processing device 20 may also obtain the three-dimensional position of the pupil relative to the facial structure by using a TOF (Time Of Flight) camera to obtain the distance to the face for each pixel.

In the creation of training data by the image processing device 20 according to the embodiment described above, a closed-eye image is generated that includes the whole of the eyelashes, which are the characteristic part of the eye. If the closed-eye image is too small, the eyelashes extend beyond the image, and the accuracy with which the machine learning model predicts the pupil position decreases. Conversely, if the closed-eye image is too large, features other than the eyelashes (for example, eyebrows or hair) are included in the image, and their influence reduces the prediction accuracy. To prevent the prediction accuracy from decreasing because of the size of the closed-eye image, the image processing device 20 may have a function of automatically optimizing that size. For example, the learning image creation unit 26 of the image processing device 20 first acquires a closed-eye image at a relatively large size, then identifies by image analysis the region of the acquired image in which the eyelashes are present, and sets the size of the closed-eye image so that a margin larger than the shift amount of the partial images is secured around the eyelashes. With this function, the cutout size of the face image used as training data can be set appropriately according to the eyelash region, and the accuracy of predicting the pupil position can be improved. As a result, more stable pupil detection processing is realized.
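The size selection described above can be sketched as follows. The image analysis that locates the eyelashes is not specified in the text and is left out here; the example assumes it has already produced an inclusive bounding box, and the square patch shape, the `extra` parameter, and the function name are likewise assumptions.

```python
def crop_size_from_region(region, max_shift, extra=1):
    """Pick a square cutout size that keeps the eyelashes inside under any shift.

    region    -- eyelash bounding box (left, top, right, bottom), inclusive
    max_shift -- largest shift in pixels applied when generating partial images
    extra     -- additional pixels so the margin is strictly larger than the shift
    """
    left, top, right, bottom = region
    margin = max_shift + extra
    width = (right - left + 1) + 2 * margin
    height = (bottom - top + 1) + 2 * margin
    return max(width, height)
```

For an eyelash region 30 pixels wide and 10 pixels high with a maximum shift of 4 pixels, the margin is 5 pixels on each side and the cutout is 40 pixels square.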

If the pupil position immediately before eye closure and the position of the eyelashes are misaligned in the closed-eye image, the eyelashes may lie at the edge of the closed-eye image, and the accuracy with which the machine learning model predicts the pupil position may decrease. In other words, it is considered preferable for the eyelashes to lie near the center of the closed-eye image in order to improve the prediction accuracy. In that case, the closed-eye image can also be set to a relatively small size, which shortens the calculation time for predicting the pupil position. To prevent the prediction accuracy from decreasing because of the eyelash misalignment in the closed-eye image, the learning image creation unit 26 of the image processing device 20 identifies by image analysis the region of the acquired closed-eye image in which the eyelashes are present, and changes the cutout region of the closed-eye image so that the eyelashes lie at the center of the image. In this case as well, the learning image creation unit 26 stores, in the image processing device 20, data indicating the relationship between the pupil position and the cutout position of the closed-eye image. With this function, the cutout position of the face image used as training data can be set appropriately according to the eyelash region, and the accuracy of predicting the pupil position can be improved. As a result, more stable pupil detection processing is realized.
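The recentering step and the stored pupil-to-cutout relationship can be sketched as follows. This is an illustration under assumptions: the eyelash region is given as an inclusive bounding box produced by some prior image analysis, the relationship is stored as a simple pixel offset, and both function names are hypothetical.

```python
def recenter_crop(region, pupil):
    """Center the cutout on the eyelash region and record where the pupil was.

    region -- eyelash bounding box (left, top, right, bottom), inclusive
    pupil  -- pupil position (x, y) detected just before the eye closed
    Returns (center, offset): the new cutout center and the pupil position
    relative to that center.
    """
    left, top, right, bottom = region
    center = ((left + right) // 2, (top + bottom) // 2)
    offset = (pupil[0] - center[0], pupil[1] - center[1])
    return center, offset


def pupil_from_crop_center(center, offset):
    """Recover the pupil position from a located cutout center and the stored offset."""
    return (center[0] + offset[0], center[1] + offset[1])
```

Storing the offset is what allows the cutout to be moved away from the pupil during training while the pupil position is still recoverable at prediction time.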

The line of sight detection unit 23 of the image processing device 20 of this embodiment may set the window used for pupil position detection as follows. Specifically, when the subject A closes and then reopens the eyes, the line of sight detection unit 23 can set a window of a predetermined size using the closed-eye pupil position tracked by the pupil position prediction unit 25, and resume pupil detection based on the difference image between the bright pupil image and the dark pupil image. Conventional pupil detection devices tend to be unable to detect the pupil for several frames after the eyes reopen. With the above function of the line of sight detection unit 23, pupil detection can be resumed stably and without delay from the frame immediately after the subject A reopens the eyes.

The pupil position prediction unit 25 of the image processing device 20 according to this embodiment predicts the position of the pupil using two networks, machine learning model 1 and machine learning model 2. As a modification, however, the image processing device 20 may predict the position of the pupil using a single network that integrates the prediction functions of the two networks, or using three or more networks. For example, the three networks may be "machine learning model 1", which predicts the closed-eye image using a plurality of partial images cut out from within the window of the face image; "machine learning model 2", which predicts the amount by which a partial image is shifted from the closed-eye image; and "machine learning model 3", which predicts the closed-eye image using a plurality of partial images cut out from the entire face image.

1: line of sight detection device (pupil detection device); 10: camera; 13: light source; 20: image processing device (arithmetic device); 23: line of sight detection unit (pupil position detection unit, pupil distance detection unit); 24: partial image creation unit; 25: pupil position prediction unit; 27: model learning unit; A: subject; G_F1, G_F2: face images; GP_F1: partial image; GP_L1, GP_R1: closed-eye images; P_L1, P_R1: pupil positions.

Claims

a camera that captures facial images of a subject at consecutive timings by capturing an image of the subject's face;
a light source that emits light toward the subject's face;
an arithmetic device that processes a face image acquired by the camera at the timing of irradiation of the light;
The arithmetic device includes:
a pupil position detection unit that detects the position of the subject's pupil on the face image;
a pupil position prediction unit that predicts the position of the pupil of the subject when the subject's eyes are closed by searching the face image for the position of a facial feature in the closed-eye state;
The pupil position detection unit detects the position of the pupil by tracking the pupil using the position of the pupil predicted by the pupil position prediction unit.
Pupil detection device.

The facial feature is a feature of the subject's eye;
The pupil detection device according to claim 1.

The pupil position prediction unit predicts the pupil position using a machine learning model using a neural network.
The pupil detection device according to claim 1 or 2.

The pupil position prediction unit inputs a partial image cut out from the face image to the machine learning model as input data, and predicts the position of the pupil by using the machine learning model to search for the position of the partial image that includes the facial feature;
The pupil detection device according to claim 3.

The pupil position prediction unit cuts out the partial image in a size that includes the entire feature of the face.
The pupil detection device according to claim 4.

The arithmetic device further includes a model learning unit that trains the machine learning model,
The model learning unit cuts out the face image acquired when the subject's eyes are closed, using as a reference the position of the pupil detected by the pupil position detection unit immediately before the subject closed the eyes, and trains the machine learning model using the cut-out face image as training data;
The pupil detection device according to claim 4 or 5.

The arithmetic device further includes a pupil distance detection unit that detects a distance from the camera to the pupil,
The pupil position prediction unit variably sets the size of the cutout of the partial image as the input data according to the distance of the pupil.
The pupil detection device according to claim 4 or 5.

The arithmetic device further includes a pupil distance detection unit that detects a distance from the camera to the pupil,
The model learning unit variably sets the size of the cutout of the face image as training data according to the distance of the pupil.
The pupil detection device according to claim 6.

The model learning unit specifies a region in which the facial feature exists from the cut out face image, and sets a size for cutting out the face image as training data based on the specified region.
The pupil detection device according to claim 6.

The model learning unit identifies a region in which the facial feature is present in the cut-out face image, and sets the cutout position of the face image used as training data so that the identified region is located at the center of the image;
The pupil detection device according to claim 6 or 9.

The machine learning model is a model that predicts the amount by which the position of a partial image including the facial feature is shifted with respect to the face image cut out using as a reference the position of the pupil detected by the pupil position detection unit immediately before the subject's eyes are closed,
The model learning unit generates shifted images by shifting the cut-out face image, and trains the machine learning model using each shifted image and its shift amount as training data;
The pupil detection device according to claim 6 or 8.

The pupil position detection unit detects the position of the pupil by tracking it using the pupil positions detected on the face images of consecutive frames and, when detection of the pupil position fails on the face image of the immediately preceding frame, tracks the position of the pupil using the pupil position predicted by the pupil position prediction unit;
The pupil detection device according to any one of claims 1 to 11.

The pupil position detection unit, when the subject closes and then reopens the eyes, detects the position of the pupil by setting a window on the face image based on the pupil position predicted by the pupil position prediction unit at the timing when the subject's eyes were closed;
The pupil detection device according to claim 12.

A pupil detection method using a camera that captures facial images of a subject at consecutive timings by imaging the subject's face, a light source that emits light toward the subject's face, and an arithmetic device that processes the face images acquired by the camera at the timing of irradiation of the light, the method comprising:
a pupil position detection step in which the arithmetic device detects the position of the subject's pupil on the face image;
a pupil position prediction step in which the arithmetic device predicts the position of the pupil of the subject when the subject's eyes are closed by searching the face image for the position of a facial feature in the closed-eye state;
In the pupil position detection step, the pupil position is detected by tracking the pupil using the pupil position predicted in the pupil position prediction step.
Pupil detection method.