JP7103443B2

JP7103443B2 - Information processing equipment, information processing methods, and programs

Info

Publication number: JP7103443B2
Application number: JP2021000498A
Authority: JP
Inventors: 雄介森下
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2021-01-05
Filing date: 2021-01-05
Publication date: 2022-07-20
Anticipated expiration: 2036-10-31
Also published as: JP2021061048A

Description

The present disclosure relates to a line-of-sight estimation device and the like.

A person's line of sight (the orientation of the eyes) can be an important clue in analyzing that person's behavior and intentions. For this reason, techniques for estimating information about the human line of sight have been widely studied, in particular techniques that estimate the line of sight from an image including a human face (hereinafter also referred to as a "face image").

Techniques for estimating the line of sight from a face image include, for example, those described in Patent Documents 1 to 3 and Non-Patent Documents 1 and 2. Patent Document 1 discloses an example of feature-based methods, which use feature points contained in an image (image feature points). Patent Document 2 and Non-Patent Document 2 disclose examples of appearance-based methods, which use the appearance of an object. Non-Patent Document 1 discloses a method of estimating the line of sight by approximating the shape of the iris with an ellipse.

Patent Document 1: Japanese Patent No. 4829141
Patent Document 2: Japanese Unexamined Patent Publication No. 2009-059257
Patent Document 3: Japanese Patent No. 5772821

Non-Patent Document 1: J. Wang, E. Sung, and R. Venkateswarlu, "Eye Gaze Estimation from a Single Image of One Eye," Proc. IEEE ICCV 2003, pp. I-136-143, 2003.
Non-Patent Document 2: X. Zhang, Y. Sugano, M. Fritz, and A. Bulling, "Appearance-Based Gaze Estimation in the Wild," Proc. IEEE CVPR 2015, pp. 4511-4520, 2015.

As described above, various methods are used to estimate the line of sight, each with its own characteristics. With any of these methods, however, the accuracy of estimation may decrease under specific conditions, such as a particular face orientation or illumination brightness.

An exemplary object of the present disclosure is to provide a technique for improving the accuracy of image-based line-of-sight estimation.

An information processing apparatus according to one aspect of the present disclosure includes: an estimation unit that estimates a plurality of lines of sight of a target included in a target image by using learning results obtained by learning a plurality of images with different conditions; and a determination unit that determines the line of sight of the target based on those conditions and a condition relating to the target image.

In an information processing method according to one aspect of the present disclosure, a computer estimates a plurality of lines of sight of a target included in a target image by using learning results obtained by learning a plurality of images with different conditions, and determines the line of sight of the target based on those conditions and a condition relating to the target image.

A program according to one aspect of the present disclosure causes a computer to execute: a process of estimating a plurality of lines of sight of a target included in a target image by using learning results obtained by learning a plurality of images with different conditions; and a process of determining the line of sight of the target based on those conditions and a condition relating to the target image.

According to the present disclosure, the accuracy of image-based line-of-sight estimation is improved.

FIG. 1 is a block diagram showing an example of the configuration of the line-of-sight estimation device.
FIG. 2 is a flowchart showing an example of the line-of-sight estimation method.
FIG. 3 is a block diagram showing an example of the configuration of the data processing device.
FIG. 4 is a flowchart showing an operation example of the data processing device.
FIG. 5 is a diagram showing an example of a face image.
FIG. 6 is a diagram showing an example of the eye region.
FIG. 7 is a conceptual diagram for explaining the imaging conditions of the image of the eye region.
FIG. 8 is a diagram showing an example of the effect of the embodiment.
FIG. 9 is a block diagram showing an example of the hardware configuration of the computer device.

[First Embodiment]
FIG. 1 is a block diagram showing the configuration of a line-of-sight estimation device 100 according to one embodiment. The line-of-sight estimation device 100 is a device for estimating the line of sight included in a face image. The line-of-sight estimation device 100 includes at least an estimation unit 110 and a determination unit 120. However, the line-of-sight estimation device 100 may include other components as needed.

Here, a face image means an image including part or all of a human face. The face image is an image captured by an imaging device (a surveillance camera, a camera built into an electronic device, or the like). The face image may be the captured image itself, or may be a part of the captured image, that is, an image obtained by extracting the region corresponding to the face from the captured image.

The estimation unit 110 estimates the line of sight of the face included in the face image. For example, the estimation unit 110 estimates the eye region included in the face image and thereby estimates the line of sight, that is, the direction (more precisely, the orientation) in which the person's eyes are looking. The estimation unit 110 may use any well-known method to estimate the line of sight. For example, the estimation unit 110 can estimate the line of sight by using machine learning such as supervised learning. Specifically, the estimation unit 110 may learn the relationship between face images and lines of sight by using face images collected in advance.

The estimation unit 110 estimates the line of sight of the face included in the face image with a plurality of estimators. In other words, the estimation unit 110 estimates the line of sight for a single face image by using a plurality of estimation methods. The lines of sight estimated by the plurality of estimators may differ in direction. The estimation unit 110 therefore produces a plurality of estimated lines of sight.

Each of the plurality of estimators estimates the line of sight of the face included in the face image based on a predetermined algorithm. The plurality of estimators may be realized by separate circuits or by a single circuit. The plurality of estimators may also be realized by software.

When the line of sight is estimated by machine learning, the estimators may differ because of differences in the data used for prior learning. That is, the estimation unit 110 may estimate the line of sight based on learning that uses one data set and, separately, on learning that uses another data set. If the data sets used for prior learning differ, the line-of-sight estimation results based on those data sets may also differ.

The determination unit 120 determines the line of sight of the face included in the face image. Specifically, the determination unit 120 determines the line of sight based on the line-of-sight estimation results of the estimation unit 110. In other words, the determination unit 120 determines a single direction based on the plurality of lines of sight (that is, the plurality of directions) estimated by the estimation unit 110.

More specifically, the determination unit 120 determines the line of sight based on the plurality of lines of sight estimated by the estimation unit 110, first condition information, and second condition information. The first condition information includes at least a condition relating to the capture of the face image. In other words, the first condition information includes information indicating how the face image was captured by the imaging device. The first condition information may express such a condition as a numerical value representing a physical quantity or the like.

As an example, the first condition information may be information indicating the relative positional relationship between the imaging device and the person who is the subject. Specifically, the first condition information may indicate the distance between the imaging device and the person, or the height of the imaging device relative to the height of the person's face. Alternatively, the first condition information may be information indicating the performance of the imaging device. Specifically, the first condition information may indicate parameters of the optical system of the imaging device (the angle of view and the like).

The first condition information may also indicate the installation angle of the imaging device. Here, the installation angle of the imaging device means the angle formed by the direction of the face of the person being imaged and the direction of the optical axis of the imaging device. The direction of the face may be calculated from the face image or may be determined in advance. For example, when an imaging device captures an unspecified number of people passing through a certain passage, the direction of the face may be taken to be the average or typical face direction of the people passing through that passage. In this case, the direction of the face is likely to coincide with the direction of travel along the passage. The installation angle may be expressed by a horizontal angle and an elevation/depression angle (also called a vertical angle), or by the horizontal angle alone with the vertical angle omitted.
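The installation angle defined above can be illustrated with a minimal sketch that computes the angle between a face direction and an optical-axis direction given as 3D vectors. The function name `angle_between_deg`, the vector representation, and the use of a single unsigned angle (rather than separate horizontal and vertical angles) are assumptions made for illustration; the disclosure does not specify a coordinate system or sign convention.

```python
import math


def angle_between_deg(face_dir, optical_axis):
    """Return the unsigned angle in degrees between two 3D direction vectors.

    Both arguments are (x, y, z) tuples; they do not need to be unit length.
    This is an illustrative helper, not a formula taken from the disclosure.
    """
    dot = sum(a * b for a, b in zip(face_dir, optical_axis))
    norm_face = math.sqrt(sum(a * a for a in face_dir))
    norm_axis = math.sqrt(sum(b * b for b in optical_axis))
    if norm_face == 0.0 or norm_axis == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_face * norm_axis)))
    return math.degrees(math.acos(cos_theta))
```

For a passage monitored by a fixed camera, `face_dir` could be set once to the direction of travel along the passage, as the paragraph above suggests, rather than computed per image.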

The second condition information, on the other hand, includes at least a condition corresponding to each of the plurality of estimators of the estimation unit 110. The condition represented by the second condition information is comparable with the condition represented by the first condition information. For example, when the line of sight is estimated by machine learning based on a data set of face images collected in advance, the condition represented by the second condition information may be the distance between the imaging device and the person, the installation angle of the imaging device, or its angle of view at the time the face images in that data set were captured (or the average value of any of these).

The determination unit 120 can determine the line of sight by comparing the first condition information with the second condition information. For example, the determination unit 120 compares the condition under which the face image was captured with the plurality of conditions corresponding to the plurality of lines of sight estimated by the estimation unit 110 (in other words, the plurality of conditions corresponding to the plurality of estimators used to estimate those lines of sight). The determination unit 120 determines the line of sight based on the results of these comparisons.

Specifically, the determination unit 120 determines the line of sight so that it approaches, among the plurality of lines of sight estimated by the estimation unit 110, those for which the condition represented by the second condition information is closer to the condition represented by the first condition information. For example, the determination unit 120 may determine the line of sight by executing a weighted operation (weighted addition, weighted average, or the like) in which each of the plurality of lines of sight estimated by the estimation unit 110 is given a weight according to the result of comparing the first condition information with the second condition information. The determination unit 120 may also compare the first condition information with the second condition information, exclude estimation results that do not satisfy a certain criterion, and then execute the above weighted operation.
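As a rough illustration of this weighted operation, the following sketch fuses several estimated lines of sight into one by a weighted average, giving larger weights to estimators whose learning condition is closer to the capture condition. Everything specific here is an assumption made for the example: the function name `fuse_gazes`, the representation of a line of sight as a (horizontal, vertical) angle pair in degrees, the use of a single scalar condition such as an installation angle, the exponential weight `exp(-gap / scale)`, and the optional `max_gap` exclusion threshold. This passage of the disclosure does not fix any of these choices.

```python
import math


def fuse_gazes(gazes, learned_conditions, capture_condition,
               scale=10.0, max_gap=None):
    """Fuse several gaze estimates into one by a weighted average.

    gazes: list of (horizontal_deg, vertical_deg) pairs, one per estimator.
    learned_conditions: scalar condition per estimator (second condition info).
    capture_condition: scalar condition of the input image (first condition info).
    scale: assumed decay constant of the illustrative weighting function.
    max_gap: if given, estimators farther than this from the capture
        condition are excluded before averaging.
    """
    if len(gazes) != len(learned_conditions):
        raise ValueError("one learned condition is required per gaze estimate")

    weighted = []
    for gaze, condition in zip(gazes, learned_conditions):
        gap = abs(condition - capture_condition)
        if max_gap is not None and gap > max_gap:
            continue  # exclude results that do not satisfy the criterion
        weighted.append((math.exp(-gap / scale), gaze))

    if not weighted:
        raise ValueError("all estimates were excluded")

    total = sum(weight for weight, _ in weighted)
    horizontal = sum(weight * gaze[0] for weight, gaze in weighted) / total
    vertical = sum(weight * gaze[1] for weight, gaze in weighted) / total
    return (horizontal, vertical)
```

Averaging angle pairs directly is reasonable only when the estimates lie within a narrow range; a more careful variant might average unit direction vectors and renormalize.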

FIG. 2 is a flowchart showing the line-of-sight estimation method according to the present embodiment. By executing processing according to this flowchart, the line-of-sight estimation device 100 can estimate the line of sight of the face included in a face image.

In step S11, the estimation unit 110 estimates a plurality of lines of sight based on the face image. More specifically, the estimation unit 110 applies a plurality of estimators to one face image and thereby calculates a plurality of lines of sight as estimation results. In other words, the estimation unit 110 estimates the line of sight in a plurality of ways.

In step S12, the determination unit 120 determines one line of sight based on the plurality of lines of sight estimated in step S11. More specifically, the determination unit 120 determines the line of sight corresponding to the face image used for the estimation in step S11, based on the first condition information and the second condition information.

As described above, the line-of-sight estimation device 100 of the present embodiment is configured to estimate the line of sight of the face included in a face image with a plurality of estimators and to determine one line of sight based on the plurality of estimated lines of sight. Compared with estimating the line of sight with a single estimator, this configuration can reduce the possibility that the accuracy of estimation decreases. The line-of-sight estimation device 100 can therefore improve the accuracy of line-of-sight estimation.

The accuracy of line-of-sight estimation can vary due to a variety of factors. For example, it can vary with the conditions under which the face image is captured. Specifically, it can vary with the relative positional relationship between the person who is the subject and the imaging device (the orientation of the face and the like). It can also vary with the performance of the imaging device itself and with lighting conditions such as brightness. In addition, depending on the estimation method, the accuracy of line-of-sight estimation may decrease under specific conditions.

By determining the line of sight based on a plurality of lines of sight estimated with a plurality of estimators, the line-of-sight estimation device 100 can suppress the decrease in accuracy that results from using a single estimator. The line-of-sight estimation device 100 can therefore obtain estimation results that are robust to the conditions under which the face image was captured. In other words, the line-of-sight estimation device 100 can achieve good line-of-sight estimation for face images captured under various conditions.

[Second Embodiment]
FIG. 3 is a block diagram showing the configuration of a data processing device 200 according to another embodiment. The data processing device 200 corresponds to an example of the line-of-sight estimation device 100 of the first embodiment. The data processing device 200 includes an image acquisition unit 210, a condition acquisition unit 220, a region extraction unit 230, a line-of-sight estimation unit 240, an integration unit 250, and an output unit 260.

The data processing device 200 is a device for estimating the line of sight based on an image. The image referred to here may be either a still image or a moving image. For example, when the line of sight is estimated from a moving image, a face image may be included in one period of the moving image and not in another. In such a case, the data processing device 200 may be configured to estimate the line of sight for images in periods that include a face image and not to estimate the line of sight (that is, not to output an estimation result) for images in periods that do not.
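The handling of moving images described above can be sketched as a loop that produces a result only for frames in which a face is found. The function `gaze_per_frame` and its two callable parameters, `detect_face` and `estimate_gaze`, are hypothetical placeholders introduced for this example; the disclosure does not name such functions.

```python
def gaze_per_frame(frames, detect_face, estimate_gaze):
    """Estimate a line of sight only for frames that contain a face.

    frames: iterable of frame images.
    detect_face: hypothetical callable returning a face image or None.
    estimate_gaze: hypothetical callable mapping a face image to a gaze.
    Returns a list of (frame_index, gaze) pairs; frames without a face
    produce no entry, so no estimation result is output for them.
    """
    results = []
    for index, frame in enumerate(frames):
        face = detect_face(frame)
        if face is None:
            continue  # no face in this period of the moving image
        results.append((index, estimate_gaze(face)))
    return results
```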

The image acquisition unit 210 acquires an image. For example, the image acquisition unit 210 acquires an image by receiving image data from another device. The other device may be an imaging device such as a surveillance camera, or a storage device such as a database in which a plurality of items of image data are recorded. The image acquisition unit 210 supplies the image data to the region extraction unit 230.

The image data referred to here is data in which an image is represented by the luminance values of a plurality of pixels. The number of pixels, the number of colors (the number of color components), the number of gradations, and the like of the image data are not limited to specific values. The number of pixels and the number of colors of the image data acquired by the image acquisition unit 210 may or may not be predetermined. For convenience of explanation, the image data acquired by the image acquisition unit 210 is hereinafter also referred to as "input image data".

For convenience of explanation, it is assumed below that one item of image data contains at most one face image and does not contain a plurality of face images. However, when one item of image data contains a plurality of face images, the data processing device 200 need only execute the processing described below for each of those face images.

The image acquisition unit 210 may supply the input image data to the region extraction unit 230 as it is, or may process the input image data before supplying it to the region extraction unit 230. For example, the image acquisition unit 210 may detect a human face in the image represented by the image data, generate image data representing the face image that is a part of that image, and supply the generated image data to the region extraction unit 230.

Alternatively, the image acquisition unit 210 may convert the image data so that the number of colors and the number of gradations of the image become predetermined values before supplying it to the region extraction unit 230. For example, the image acquisition unit 210 may convert image data that represents a color image with a plurality of color components such as R (red), G (green), and B (blue) into image data that represents a single-component grayscale image.
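One common way to perform such a color-to-grayscale conversion is a weighted sum of the R, G, and B components. The sketch below uses the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114); these coefficients, the function name `rgb_to_gray`, and the nested-list image representation are assumptions for illustration, since the disclosure does not specify a conversion formula.

```python
def rgb_to_gray(image):
    """Convert an RGB image to a single-component grayscale image.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Returns a list of rows of integer luminance values in 0..255.
    The BT.601 luma weights used here are an illustrative assumption.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]
```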

The condition acquisition unit 220 acquires camera information. The camera information is data including the imaging conditions of the image acquired by the image acquisition unit 210. The imaging condition referred to here is, for example, the installation angle of the imaging device. The imaging conditions may also include parameters of the lens of the imaging device (the angle of view and the like) and an estimated range of the line of sight at the time of imaging. The camera information corresponds to an example of the first condition information of the first embodiment.

The camera information may be input together with the image data. For example, the camera information may be described as metadata included in the image data. Alternatively, the camera information may be input by a user operation. In this case, the condition acquisition unit 220 accepts the user's operation via a keyboard or a touch screen display.

The region extraction unit 230 extracts a specific region from the image data. The region extraction unit 230 extracts the region necessary for line-of-sight estimation by the line-of-sight estimation unit 240. In the present embodiment, the region extraction unit 230 extracts, from the face image, the region around the eyes in particular. The region extracted by the region extraction unit 230 is hereinafter referred to as the "eye region". The eye region is, for example, a rectangle of a predetermined size that includes both of the person's eyes.

The region extraction unit 230 can extract the eye region based on image features characteristic of typical face images. For example, the region extraction unit 230 can extract the eye region by detecting the iris (colloquially, the pupil), the sclera (the white of the eye), the inner canthus (the inner corner of the eye), the outer canthus (the outer corner of the eye), the eyebrows, and the like. A well-known feature point detection technique, such as the method described in Patent Document 3, can be used to extract the eye region.

The region extraction unit 230 may execute preprocessing suited to the line-of-sight estimation method. For example, when the extracted eye region is not horizontal, that is, when the height of the center of the right eye and the height of the center of the left eye in the eye region do not match, the region extraction unit 230 may rotate the image so that the right eye and the left eye are positioned horizontally. The region extraction unit 230 may also enlarge or reduce the image so that the eye region has a fixed size. Well-known image processing can be applied to the rotation, enlargement (that is, interpolation), and reduction (that is, decimation) of the image. When such image processing is executed, the scale and tilt of the eye region become stable and no longer need to be learned, so the accuracy of line-of-sight estimation can be improved.
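The geometry behind this preprocessing can be sketched as follows: the tilt of the line joining the two eye centers gives the rotation needed to level the eyes, and the ratio of a target inter-eye distance to the measured one gives the scale factor. The function names, the point representation as (x, y) tuples, and the idea of normalizing by inter-eye distance are assumptions for illustration; the disclosure only states that well-known rotation and scaling can be applied.

```python
import math


def eye_tilt_deg(eye_a, eye_b):
    """Angle in degrees of the line from eye_a to eye_b relative to the x axis."""
    return math.degrees(math.atan2(eye_b[1] - eye_a[1], eye_b[0] - eye_a[0]))


def rotate_point(point, center, angle_deg):
    """Rotate a 2D point about a center by angle_deg in the x-y plane."""
    theta = math.radians(angle_deg)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


def scale_factor(eye_a, eye_b, target_distance):
    """Factor by which to resize the image so the eyes are target_distance apart."""
    distance = math.hypot(eye_b[0] - eye_a[0], eye_b[1] - eye_a[1])
    if distance == 0.0:
        raise ValueError("eye centers must be distinct")
    return target_distance / distance
```

Rotating every pixel coordinate about the midpoint of the eyes by `-eye_tilt_deg(eye_a, eye_b)` places both eye centers at the same height; an image library would normally apply the same transform to the pixel data itself.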

The line-of-sight estimation unit 240 estimates the line of sight of the face included in the face image. More specifically, the line-of-sight estimation unit 240 includes line-of-sight estimators 241_1, 241_2, ..., 241_n. The value of n, that is, the total number of line-of-sight estimators, is not limited to a specific value as long as it is 2 or more. In the following, the line-of-sight estimators 241_1, 241_2, ..., 241_n are collectively referred to as the "line-of-sight estimators 241" when they do not need to be distinguished from one another. The line-of-sight estimation unit 240 corresponds to an example of the estimation unit 110 of the first embodiment.

The line-of-sight estimator 241 estimates the line of sight by using the eye region extracted by the region extraction unit 230. In the present embodiment, the line-of-sight estimator 241 is configured to learn in advance, by machine learning, the lines of sight of eyes included in face images, and to estimate the line of sight by using the learning result.

The line-of-sight estimators 241_1, 241_2, ..., 241_n each use a different line-of-sight estimation method. For example, the line-of-sight estimators 241_1, 241_2, ..., 241_n each use different face images as samples in machine learning. Alternatively, the line-of-sight estimators 241_1, 241_2, ..., 241_n may each use a different machine learning algorithm.

The integration unit 250 integrates the estimation results produced by the line-of-sight estimation unit 240, more specifically by the line-of-sight estimators 241_1, 241_2, ..., 241_n. In other words, the integration unit 250 determines a line of sight in a single direction based on the plurality of lines of sight estimated by the line-of-sight estimators 241_1, 241_2, ..., 241_n. The integration unit 250 corresponds to an example of the determination unit 120 of the first embodiment.

The integration unit 250 integrates the plurality of lines of sight based on the camera information and learning information. Here, the learning information is data including the conditions relating to the learning of each of the line-of-sight estimators 241_1, 241_2, ..., 241_n. The learning information represents, for example, the imaging conditions of the imaging device used for the learning of each of the line-of-sight estimators 241_1, 241_2, ..., 241_n. The learning information is assumed to be stored in the data processing device 200. The learning information corresponds to an example of the second condition information of the first embodiment.
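One way to keep the learning information alongside each estimator is to store the two together, so that every estimated line of sight can later be paired with the condition under which its estimator was trained. The sketch below is a hypothetical data layout, not the structure used by the disclosure: the class name `TrainedEstimator`, the field names, and the choice of an installation angle as the stored learning condition are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Gaze = Tuple[float, float]  # assumed (horizontal_deg, vertical_deg) pair


@dataclass
class TrainedEstimator:
    """A line-of-sight estimator together with its learning information."""
    predict: Callable[[object], Gaze]   # maps an eye-region image to a gaze
    learned_angle_deg: float            # assumed learning condition


def estimate_all(estimators: List[TrainedEstimator],
                 eye_region: object) -> List[Tuple[Gaze, float]]:
    """Apply every estimator to one eye region.

    Returns (gaze, learned_angle_deg) pairs so that a later integration
    step can compare each learning condition with the camera information.
    """
    return [(e.predict(eye_region), e.learned_angle_deg) for e in estimators]
```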

The integration unit 250 integrates the plurality of lines of sight estimated by the line-of-sight estimators 241_1, 241_2, ..., 241_n by a weighted operation that uses a weight determined for each of the line-of-sight estimators 241_1, 241_2, ..., 241_n. In doing so, the integration unit 250 can determine the weight for each line of sight by using the camera information and the learning information. The weighted operation performed by the integration unit 250 is described in detail in the operation example below.

出力部２６０は、統合部２５０により統合された視線を示すデータ（以下「視線データ」ともいう。）を出力する。視線データは、例えば、統合部２５０により統合された視線、換言すれば統合部２５０により決定された方向を所定の規則に従って表す。出力部２６０による出力は、視線データを表示装置等の他の装置に供給することであってもよく、データ処理装置２００に含まれる記憶媒体に視線データを書き込むことであってもよい。 The output unit 260 outputs data indicating the line of sight integrated by the integration unit 250 (hereinafter, also referred to as “line of sight data”). The line-of-sight data represents, for example, the line-of-sight integrated by the integration unit 250, in other words, the direction determined by the integration unit 250 according to a predetermined rule. The output by the output unit 260 may be to supply the line-of-sight data to another device such as a display device, or to write the line-of-sight data to a storage medium included in the data processing device 200.

データ処理装置２００の構成は、以上のとおりである。この構成の下、データ処理装置２００は、画像データに基づいて視線を推定する。データ処理装置２００は、例えば、以下の動作例のように動作する。ただし、データ処理装置２００の具体的な動作は、この動作例に限定されない。 The configuration of the data processing device 200 is as described above. Under this configuration, the data processing device 200 estimates the line of sight based on the image data. The data processing device 200 operates, for example, as in the following operation example. However, the specific operation of the data processing device 200 is not limited to this operation example.

図４は、データ処理装置２００の動作例を示すフローチャートである。データ処理装置２００は、例えば、ユーザによって指定されたタイミングや、他の装置から画像データが送信されたタイミングなどの適当なタイミングで、図４に示される処理を実行することができる。この例において、画像データにより表される画像は、顔画像を含むものとする。また、カメラ情報及び学習情報は、撮像装置の設置角度であるとする。また、ここでいう画像の座標は、所定の位置を原点とする直交座標系によって表されるものとする。 FIG. 4 is a flowchart showing an operation example of the data processing device 200. The data processing device 200 can execute the process shown in FIG. 4 at an appropriate timing such as a timing specified by the user or a timing when image data is transmitted from another device. In this example, the image represented by the image data is assumed to include a face image. Further, it is assumed that the camera information and the learning information are the installation angles of the imaging device. Further, the coordinates of the image referred to here are represented by a Cartesian coordinate system having a predetermined position as the origin.

ステップＳ２１において、画像取得部２１０は、画像データを取得する。ステップＳ２２において、条件取得部２２０は、カメラ情報を取得する。なお、ステップＳ２１及びＳ２２の処理は、図４と逆の順序で実行されてもよく、同時に（すなわち並列的に）実行されてもよい。 In step S21, the image acquisition unit 210 acquires image data. In step S22, the condition acquisition unit 220 acquires camera information. The processes of steps S21 and S22 may be executed in the reverse order of FIG. 4, or may be executed at the same time (that is, in parallel).

ステップＳ２３において、領域抽出部２３０は、ステップＳ２１において取得された画像データを用いて目領域を抽出する。この例において、領域抽出部２３０は、右目の虹彩の中心の座標と左目の虹彩の中心の座標とを特定する。領域抽出部２３０は、これらの座標に基づいて目領域を決定する。説明の便宜上、以下においては、右目の虹彩の中心の座標を「右目の中心座標」、左目の虹彩の中心の座標を「左目の中心座標」ともいう。 In step S23, the area extraction unit 230 extracts the eye area using the image data acquired in step S21. In this example, the region extraction unit 230 identifies the coordinates of the center of the iris of the right eye and the coordinates of the center of the iris of the left eye. The region extraction unit 230 determines the eye region based on these coordinates. For convenience of explanation, in the following, the coordinates of the center of the iris of the right eye are also referred to as "center coordinates of the right eye", and the coordinates of the center of the iris of the left eye are also referred to as "center coordinates of the left eye".

具体的には、領域抽出部２３０は、右目の中心座標と左目の中心座標とを結ぶ線分の中点を目領域の中心とする。領域抽出部２３０は、右目の中心座標と左目の中心座標とを結ぶ線分の長さ（以下「瞳孔間距離（interpupillary distance）」ともいう。）の２倍の長さを目領域の幅とし、瞳孔間距離の０．７５倍の長さを目領域の高さとする。領域抽出部２３０は、このように決定された中心、幅、高さによって規定される矩形の領域を目領域として画像から切り出す。 Specifically, the region extraction unit 230 takes the midpoint of the line segment connecting the center coordinates of the right eye and the center coordinates of the left eye as the center of the eye region. The region extraction unit 230 sets the width of the eye region to twice the length of that line segment (hereinafter also referred to as the "interpupillary distance") and sets the height of the eye region to 0.75 times the interpupillary distance. The region extraction unit 230 cuts out of the image, as the eye region, the rectangular region defined by the center, width, and height determined in this way.
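As an illustration only, the determination of the eye region described above can be sketched in Python as follows. The function names `eye_region` and `to_box` are hypothetical and do not appear in the disclosure, and the sketch assumes that the iris centers have already been located as (x, y) image coordinates by some other means.

```python
import math

def eye_region(right_eye, left_eye):
    """Return (center_x, center_y, width, height) of the eye region.

    right_eye, left_eye: (x, y) iris-center coordinates in the image.
    """
    (rx, ry), (lx, ly) = right_eye, left_eye
    # Interpupillary distance: length of the segment joining the two centers.
    ipd = math.hypot(lx - rx, ly - ry)
    # The center of the region is the midpoint of that segment.
    cx, cy = (rx + lx) / 2.0, (ry + ly) / 2.0
    # Width is 2 x IPD and height is 0.75 x IPD, as stated in the text.
    return cx, cy, 2.0 * ipd, 0.75 * ipd

def to_box(cx, cy, width, height):
    """Convert center/size to (left, top, right, bottom) for cropping."""
    return (cx - width / 2.0, cy - height / 2.0,
            cx + width / 2.0, cy + height / 2.0)

region = eye_region((100.0, 120.0), (160.0, 120.0))
box = to_box(*region)
```

The resulting box can be handed to any image-cropping routine; clipping the box to the image bounds is left out of this sketch.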

また、領域抽出部２３０は、後続の処理が容易になるように、目領域の傾き、幅及び高さを補正する前処理を実行してもよい。より詳細には、領域抽出部２３０は、右目の中心座標と左目の中心座標が水平でなければ、これらの座標を水平にし、目領域の幅方向及び高さ方向の画素数が所定の画素数でなければ、目領域を拡大又は縮小する。 In addition, the region extraction unit 230 may execute preprocessing that corrects the inclination, width, and height of the eye region so as to facilitate subsequent processing. More specifically, if the center coordinates of the right eye and the center coordinates of the left eye are not horizontal, the region extraction unit 230 makes them horizontal, and if the number of pixels in the width direction and the height direction of the eye region is not a predetermined number of pixels, the region extraction unit 230 enlarges or reduces the eye region.
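The geometry behind this preprocessing can be sketched as below. This is a minimal sketch under assumptions: the disclosure does not state how the coordinates are made horizontal, so a rotation about the midpoint of the two eye centers is assumed here, and the helper names `roll_angle_deg`, `rotate_about`, and `scale_factors` are hypothetical.

```python
import math

def roll_angle_deg(right_eye, left_eye):
    """Angle (degrees) of the right-eye -> left-eye segment from the x axis."""
    (rx, ry), (lx, ly) = right_eye, left_eye
    return math.degrees(math.atan2(ly - ry, lx - rx))

def rotate_about(point, center, angle_deg):
    """Rotate `point` about `center` by `angle_deg` (standard 2-D rotation)."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

def scale_factors(width, height, target_width, target_height):
    """Factors by which the region is enlarged (>1) or reduced (<1)."""
    return target_width / width, target_height / height

right, left = (100.0, 100.0), (200.0, 200.0)
mid = ((right[0] + left[0]) / 2.0, (right[1] + left[1]) / 2.0)
angle = roll_angle_deg(right, left)
# Rotating by the negative of the measured angle levels the two centers.
right_h = rotate_about(right, mid, -angle)
left_h = rotate_about(left, mid, -angle)
```

In practice the same rotation and scaling would be applied to the image pixels with an image library; only the coordinate arithmetic is shown here.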

図５は、顔画像の一例を示す図である。図６は、この顔画像から抽出される目領域の一例を示す図である。図６に示される目領域６００は、図５に示される顔画像５００の一部に相当する。具体的には、目領域６００は、顔画像５００のうちの破線で囲まれた領域５１０に相当する。ただし、目領域６００は、上述の前処理が実行された場合には、その画素数や傾きが領域５１０と必ずしも一致しない。 FIG. 5 is a diagram showing an example of a face image. FIG. 6 is a diagram showing an example of an eye region extracted from this face image. The eye area 600 shown in FIG. 6 corresponds to a part of the face image 500 shown in FIG. Specifically, the eye region 600 corresponds to the region 510 surrounded by the broken line in the face image 500. However, the number of pixels and the inclination of the eye area 600 do not always match the area 510 when the above-mentioned preprocessing is executed.

ステップＳ２４において、視線推定部２４０は、ステップＳ２３において抽出された目領域に基づいて視線を推定する。視線推定部２４０は、事前に学習された視線推定器２４１₁～２４１_nを用いて視線を推定する。この例において、視線推定器２４１₁～２４１_nは、目領域から検出される画像特徴量に基づいて視線を推定する。 In step S24, the line-of-sight estimation unit 240 estimates the line of sight based on the eye region extracted in step S23. The line-of-sight estimation unit 240 estimates the line of sight using the line-of-sight estimators 241_1 to 241_n, which have been trained in advance. In this example, the line-of-sight estimators 241_1 to 241_n estimate the line of sight based on an image feature amount detected from the eye region.

この例における画像特徴量は、画像の輝度の勾配に関する特徴量である。輝度の勾配に関する特徴量としては、例えば、ＨＯＧ（Histograms of Oriented Gradients）特徴量が知られている。この例における画像特徴量は、目領域における輝度の変化の方向と大きさとを所定の次元数（例えば、数百～数千）で示す。以下において、この画像特徴量は、所定の要素数の列ベクトルｆによっても表現される。 The image feature amount in this example is a feature amount related to the gradient of the brightness of the image. As a feature amount related to the gradient of brightness, for example, a HOG (Histograms of Oriented Gradients) feature amount is known. The image feature amount in this example indicates the direction and magnitude of the change in brightness in the eye region by a predetermined number of dimensions (for example, hundreds to thousands). In the following, this image feature amount is also represented by a column vector f having a predetermined number of elements.

視線推定器２４１₁～２４１_nは、以下の式（１）を用いて視線（ｇ_ｘ，ｇ_ｙ）を算出する。ここにおいて、視線（ｇ_ｘ，ｇ_ｙ）は、顔の向きを基準とした視線の向きを水平角と仰俯角とによって示す。このうち、ｇ_ｘは、水平角を表し、－９０≦ｇ_ｘ≦９０を満たす（単位は[deg]）。また、ｇ_ｙは、仰俯角を表し、－９０≦ｇ_ｙ≦９０を満たす（単位は[deg]）。 The line-of-sight estimators 241_1 to 241_n calculate the line of sight (g_x, g_y) using the following equation (1). Here, the line of sight (g_x, g_y) expresses the direction of the line of sight relative to the direction of the face as a horizontal angle and an elevation/depression angle. Of these, g_x represents the horizontal angle and satisfies -90 ≤ g_x ≤ 90 (unit: [deg]). Likewise, g_y represents the elevation/depression angle and satisfies -90 ≤ g_y ≤ 90 (unit: [deg]).
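Equation (1) is not reproduced in this text. The description of the weight vectors u_x and u_y as row vectors whose inner product with the column vector f can be calculated suggests a linear form such as the following; this is a reconstruction under that assumption, not the published equation.

```latex
g_x = u_x f, \qquad g_y = u_y f \tag{1}
```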

視線（ｇ_ｘ，ｇ_ｙ）は、（ｇ_ｘ，ｇ_ｙ）＝（０，０）である場合を基準、すなわち顔に対して真正面を向いている視線であるとし、真正面からの視線のずれを水平角と仰俯角とによって表す。例えば、視線が真上を向いている場合に（ｇ_ｘ，ｇ_ｙ）＝（０，＋９０）であり、視線が真下を向いている場合に（ｇ_ｘ，ｇ_ｙ）＝（０，－９０）である。また、視線が真横（右）を向いている場合に（ｇ_ｘ，ｇ_ｙ）＝（＋９０，０）であり、視線が真横（左）を向いている場合に（ｇ_ｘ，ｇ_ｙ）＝（－９０，０）である。 The line of sight (g_x, g_y) takes the case (g_x, g_y) = (0, 0) as its reference, that is, a line of sight directed straight ahead relative to the face, and expresses the deviation of the line of sight from straight ahead as a horizontal angle and an elevation/depression angle. For example, (g_x, g_y) = (0, +90) when the line of sight is directed straight up, and (g_x, g_y) = (0, -90) when it is directed straight down. Likewise, (g_x, g_y) = (+90, 0) when the line of sight is directed straight to the side (right), and (g_x, g_y) = (-90, 0) when it is directed straight to the side (left).

なお、ここでいう正面の向きは、顔画像により表される顔の向きに依存する。すなわち、ここでいう正面は、顔の向きに応じて変化する。したがって、撮像された人物が実際に目で見ている方向は、視線（ｇ_ｘ，ｇ_ｙ）のみによっては特定されず、視線（ｇ_ｘ，ｇ_ｙ）と当該人物の顔の向きとによって特定される。 The front direction referred to here depends on the orientation of the face represented by the face image. That is, the front referred to here changes according to the orientation of the face. Therefore, the direction in which the imaged person is actually looking is not specified by the line of sight (g_x, g_y) alone, but by the line of sight (g_x, g_y) together with the orientation of the person's face.

式（１）において、ｕ_ｘ、ｕ_ｙは、重みベクトルである。重みベクトルｕ_ｘ、ｕ_ｙは、各々が画像特徴量ｆと同じ要素数の行ベクトルであり、画像特徴量ｆとの内積が算出可能である。重みベクトルｕ_ｘ、ｕ_ｙは、視線推定器２４１₁～２４１_n毎に異なり得る。重みベクトルｕ_ｘ、ｕ_ｙは、サポートベクトル回帰や最小二乗法による線形回帰などの周知の手法によって事前に学習可能である。視線推定器２４１₁～２４１_nにおける学習は、一般に、ステップＳ２３と同様に抽出された目領域の画像と、当該画像の実際の視線を示す情報（すなわち正解の情報）との組を多数用意して実行される。 In equation (1), u_x and u_y are weight vectors. Each of the weight vectors u_x and u_y is a row vector having the same number of elements as the image feature amount f, so that its inner product with the image feature amount f can be calculated. The weight vectors u_x and u_y can differ for each of the line-of-sight estimators 241_1 to 241_n. The weight vectors u_x and u_y can be learned in advance by well-known methods such as support vector regression or linear regression by the least squares method. Learning in the line-of-sight estimators 241_1 to 241_n is generally carried out by preparing a large number of pairs, each consisting of an image of the eye region extracted as in step S23 and information indicating the actual line of sight in that image (that is, correct-answer information).

この例において、視線推定器２４１₁～２４１_nは、撮像条件が異なる目領域の画像をそれぞれ用いて学習が実行される。具体的には、視線推定器２４１₁～２４１_nの学習には、設置角度が異なる撮像装置により撮像された目領域の画像がそれぞれ用いられる。 In this example, the line-of-sight estimators 241_1 to 241_n are each trained using images of the eye region captured under different imaging conditions. Specifically, images of the eye region captured by imaging devices with different installation angles are used for the learning of the respective line-of-sight estimators 241_1 to 241_n.

図７は、目領域の画像の撮像条件を説明するための概念図である。ここにおいて、視線推定器２４１の数（すなわちｎの値）は、「４」であるとする。図７の例において、カメラ７１０、７２０、７３０、７４０は、人物７００の顔画像を撮像する撮像装置である。カメラ７１０は、顔画像を右上方から撮像する。カメラ７２０は、顔画像を左上方から撮像する。カメラ７３０は、顔画像を右下方から撮像する。カメラ７４０は、顔画像を左下方から撮像する。なお、人物７００は、画像毎に異なる人物であってもよいし、いずれの画像においても同一の人物であってもよい。また、人物７００は、撮像時は同じ方向（正面）を向いているものとする。 FIG. 7 is a conceptual diagram for explaining the imaging conditions of the image in the eye region. Here, the number of the line-of-sight estimator 241 (that is, the value of n) is assumed to be "4". In the example of FIG. 7, the cameras 710, 720, 730, and 740 are image pickup devices that capture a facial image of a person 700. The camera 710 captures a face image from the upper right. The camera 720 captures a face image from the upper left. The camera 730 captures a face image from the lower right. The camera 740 captures a face image from the lower left. The person 700 may be a different person for each image, or may be the same person in any of the images. Further, it is assumed that the person 700 faces the same direction (front) at the time of imaging.

視線推定器２４１₁は、カメラ７１０により撮像された顔画像を学習に用いる。視線推定器２４１_２は、カメラ７２０により撮像された顔画像を学習に用いる。視線推定器２４１_３は、カメラ７３０により撮像された顔画像を学習に用いる。視線推定器２４１_４は、カメラ７４０により撮像された顔画像を学習に用いる。そうすると、視線推定器２４１_１～２４１_４は、学習に用いられた顔画像に対応する撮像装置の設置角度が互いに異なることになる。 The line-of-sight estimator 241_1 uses the face images captured by the camera 710 for learning. The line-of-sight estimator 241_2 uses the face images captured by the camera 720 for learning. The line-of-sight estimator 241_3 uses the face images captured by the camera 730 for learning. The line-of-sight estimator 241_4 uses the face images captured by the camera 740 for learning. As a result, the line-of-sight estimators 241_1 to 241_4 differ from one another in the installation angle of the imaging device corresponding to the face images used for learning.

視線推定器２４１₁～２４１_nは、機械学習の条件（ここでは、学習に用いられた顔画像の撮像条件）が異なるため、同一の目領域の画像を用いて視線を推定しても推定結果が異なり得る。換言すると、視線推定器２４１₁～２４１_nは、式（１）における重みベクトルｕ_ｘ、ｕ_ｙが互いに異なり得るゆえに、画像特徴量ｆの値が同一であっても視線（ｇ_ｘ，ｇ_ｙ）が異なる可能性がある。以下においては、視線推定器２４１₁により推定された視線を（ｇ^（１） _ｘ，ｇ^（１） _ｙ）、視線推定器２４１_２により推定された視線を（ｇ^（２） _ｘ，ｇ^（２） _ｙ）、・・・、視線推定器２４１_ｎにより推定された視線を（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）ともいう。 Because the line-of-sight estimators 241_1 to 241_n differ in their machine learning conditions (here, the imaging conditions of the face images used for learning), their estimation results can differ even when the line of sight is estimated from the same image of the eye region. In other words, because the weight vectors u_x and u_y in equation (1) can differ among the line-of-sight estimators 241_1 to 241_n, the line of sight (g_x, g_y) may differ even if the value of the image feature amount f is the same. In the following, the line of sight estimated by the line-of-sight estimator 241_1 is also referred to as (g^(1)_x, g^(1)_y), the line of sight estimated by the line-of-sight estimator 241_2 as (g^(2)_x, g^(2)_y), ..., and the line of sight estimated by the line-of-sight estimator 241_n as (g^(n)_x, g^(n)_y).

ステップＳ２５において、統合部２５０は、ステップＳ２４において推定された視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）を統合する。すなわち、統合部２５０は、ステップＳ２４において推定された視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）に基づいて単一の視線を算出する。ここにおいて、統合部２５０は、カメラ情報と学習情報とに基づいて重みを算出する。カメラ情報及び学習情報は、ここでは、撮像装置の設置角度である。 In step S25, the integration unit 250 integrates the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) estimated in step S24. That is, the integration unit 250 calculates a single line of sight based on the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) estimated in step S24. Here, the integration unit 250 calculates the weights based on the camera information and the learning information. In this example, the camera information and the learning information are installation angles of imaging devices.

統合部２５０は、以下の式（２）を用いて視線推定器２４１_ｉに対応する重みｗ_ｉを算出する。ここにおいて、ｃ_i、ｃ_ｊ及びｃ_ｔは、いずれも撮像装置の設置角度を表すベクトルである。ｃ_i（又はｃ_ｊ）は、視線推定器２４１_i（又は２４１_ｊ）の学習に用いられた複数の顔画像が表す顔の方向のそれぞれと当該顔画像を撮像した撮像装置の光軸方向とがなす角度の平均値を示す。一方、ｃ_ｔは、入力画像データに含まれる顔画像が表す顔の方向と当該顔画像を撮像した撮像装置の光軸方向とがなす角度を示す。ｃ_i、ｃ_ｊは、学習情報の一例である。一方、ｃ_ｔは、カメラ情報の一例である。また、αは、０より大きい適当な係数である。 The integration unit 250 calculates the weight w_i corresponding to the line-of-sight estimator 241_i using the following equation (2). Here, c_i, c_j, and c_t are all vectors representing installation angles of imaging devices. c_i (or c_j) indicates the average, over the plurality of face images used for learning of the line-of-sight estimator 241_i (or 241_j), of the angle formed between the direction of the face represented by each face image and the optical axis direction of the imaging device that captured that face image. c_t, on the other hand, indicates the angle formed between the direction of the face represented by the face image included in the input image data and the optical axis direction of the imaging device that captured that face image. c_i and c_j are examples of learning information, whereas c_t is an example of camera information. α is an appropriate coefficient greater than 0.
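Equation (2) is not reproduced in this text. Later passages refer to the term exp(-α||c_i-c_t||), to "the denominator of equation (2)" as the ratio of each weight, and to the weights summing to 1, which together suggest the normalized form below. It is offered as a reconstruction from those statements rather than as the published equation; the norm is presumably the Euclidean norm, which is an assumption.

```latex
w_i = \frac{\exp\left(-\alpha \lVert c_i - c_t \rVert\right)}
           {\sum_{j=1}^{n} \exp\left(-\alpha \lVert c_j - c_t \rVert\right)} \tag{2}
```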

例えば、ｎ＝２、すなわち視線推定器２４１の数が２つであるとすると、重みｗ_１、ｗ_２は、以下の式（３）、（４）によって表すことができる。なお、重みｗ_ｉは、学習情報ｃ_iとカメラ情報ｃ_ｔとの差が小さいほど大きくなる。 For example, assuming that n = 2, that is, that there are two line-of-sight estimators 241, the weights w_1 and w_2 can be expressed by the following equations (3) and (4). The weight w_i increases as the difference between the learning information c_i and the camera information c_t becomes smaller.
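Equations (3) and (4) are not reproduced in this text. Assuming that each weight is the term exp(-α||c_i-c_t||) (quoted later in the description) divided by the sum of such terms over all estimators, the case n = 2 would read as follows; this is a reconstruction under that assumption.

```latex
w_1 = \frac{\exp\left(-\alpha \lVert c_1 - c_t \rVert\right)}
           {\exp\left(-\alpha \lVert c_1 - c_t \rVert\right) + \exp\left(-\alpha \lVert c_2 - c_t \rVert\right)} \tag{3}

w_2 = \frac{\exp\left(-\alpha \lVert c_2 - c_t \rVert\right)}
           {\exp\left(-\alpha \lVert c_1 - c_t \rVert\right) + \exp\left(-\alpha \lVert c_2 - c_t \rVert\right)} \tag{4}
```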

統合部２５０は、重みｗ_ｉをこのように算出した後、以下の式（５）に従って視線（Ｇ_ｘ，Ｇ_ｙ）を算出する。式（５）が示すとおり、視線（Ｇ_ｘ，Ｇ_ｙ）は、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）の加重平均である。なお、式（５）の右辺の分母は、ここでは「１」である（式（２）参照）。 After calculating the weights w_i in this way, the integration unit 250 calculates the line of sight (G_x, G_y) according to the following equation (5). As equation (5) shows, the line of sight (G_x, G_y) is the weighted average of the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y). The denominator on the right-hand side of equation (5) is "1" here (see equation (2)).
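The weight calculation and the weighted average can be sketched in Python as follows. This is a sketch under assumptions, since the equations themselves are not reproduced in this text: the weights are taken to be exp(-α||c_i-c_t||) normalized to sum to 1, the norm is taken to be Euclidean, and the integrated line of sight is taken to be the sum of w_i times g^(i) divided by the sum of w_i. The function names `angle_weights` and `integrate_gazes` are hypothetical.

```python
import math

def angle_weights(train_angles, camera_angle, alpha):
    """Weights that grow as a training angle c_i gets closer to c_t.

    train_angles: list of (horizontal, vertical) angles c_i in degrees.
    camera_angle: (horizontal, vertical) angle c_t in degrees.
    """
    scores = []
    for c in train_angles:
        # Euclidean distance between c_i and c_t (assumed norm).
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c, camera_angle)))
        scores.append(math.exp(-alpha * dist))
    total = sum(scores)
    return [s / total for s in scores]  # normalized so the weights sum to 1

def integrate_gazes(gazes, weights):
    """Weighted average (G_x, G_y) of per-estimator gazes (g_x, g_y)."""
    total = sum(weights)
    gx = sum(w * g[0] for w, g in zip(weights, gazes)) / total
    gy = sum(w * g[1] for w, g in zip(weights, gazes)) / total
    return gx, gy

w = angle_weights([(10.0, 0.0), (-10.0, 0.0)], (0.0, 0.0), alpha=0.04)
G = integrate_gazes([(10.0, 0.0), (20.0, 10.0)], w)
```

Dividing by the sum of the weights in `integrate_gazes` is redundant when the weights are already normalized, but it keeps the average correct if some weights are later modified or set to zero.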

ステップＳ２６において、出力部２６０は、統合部２５０により算出された視線（Ｇ_ｘ，Ｇ_ｙ）を示す視線データを出力する。この視線データは、例えば、表示装置によって可視化される。視線データにより示される視線は、数値で表示されてもよいし、視線を示す矢印を顔画像に重ねて表示されてもよい。 In step S26, the output unit 260 outputs line-of-sight data indicating the line of sight (G_x, G_y) calculated by the integration unit 250. This line-of-sight data is visualized by, for example, a display device. The line of sight indicated by the line-of-sight data may be displayed numerically, or may be displayed as an arrow superimposed on the face image.

図８は、本実施形態の効果の一例を示す図である。この例において、視線推定器２４１の数は、２つである。この例は、２つの注視点を順に見つめる１人の被験者を撮像した動画を用いて視線を推定した例である。なお、視線推定器２４１_１に対応する学習情報（設置角度）は、（＋２．３[deg]，＋５．５[deg]）である。また、視線推定器２４１_２に対応する学習情報（設置角度）は、（＋１．２[deg]，－２２．７[deg]）である。また、カメラ情報（設置角度）は、（０[deg]，０[deg]）である。係数αは、ここでは「０．０４」である。 FIG. 8 is a diagram showing an example of the effect of the present embodiment. In this example, there are two line-of-sight estimators 241. The line of sight was estimated from a moving image of a single subject looking at two gaze points in turn. The learning information (installation angle) corresponding to the line-of-sight estimator 241_1 is (+2.3 [deg], +5.5 [deg]). The learning information (installation angle) corresponding to the line-of-sight estimator 241_2 is (+1.2 [deg], -22.7 [deg]). The camera information (installation angle) is (0 [deg], 0 [deg]). The coefficient α is "0.04" here.
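For reference, the weights implied by these values can be worked out as below. The sketch assumes that the weight is exp(-α||c_i-c_t||) normalized over both estimators and that the norm is Euclidean; neither assumption is confirmed by this text, and the variable names are illustrative only.

```python
import math

alpha = 0.04
c_t = (0.0, 0.0)                   # camera information (installation angle)
c = [(2.3, 5.5), (1.2, -22.7)]     # learning information of 241_1 and 241_2

# Distance of each training installation angle from the camera's angle.
dists = [math.hypot(ci[0] - c_t[0], ci[1] - c_t[1]) for ci in c]
scores = [math.exp(-alpha * d) for d in dists]
weights = [s / sum(scores) for s in scores]
```

Under these assumptions the weights come out at roughly 0.66 for estimator 241_1 and 0.34 for estimator 241_2, so the integrated line of sight leans toward the estimator whose training installation angle is closer to that of the camera.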

図８において、グラフ８１０は、視線推定器２４１_１により推定された視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）を表す。グラフ８２０は、視線推定器２４１_２により推定された視線（ｇ^（２） _ｘ，ｇ^（２） _ｙ）を表す。グラフ８３０は、統合部２５０により統合された視線（Ｇ_ｘ，Ｇ_ｙ）を表す。グラフ８４０は、被験者の実際の視線を表す。 In FIG. 8, graph 810 represents the line of sight (g^(1)_x, g^(1)_y) estimated by the line-of-sight estimator 241_1. Graph 820 represents the line of sight (g^(2)_x, g^(2)_y) estimated by the line-of-sight estimator 241_2. Graph 830 represents the line of sight (G_x, G_y) integrated by the integration unit 250. Graph 840 represents the actual line of sight of the subject.

図８に示されるように、統合部２５０により統合された視線（Ｇ_ｘ，Ｇ_ｙ）は、視線推定器２４１_１により推定された視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）及び視線推定器２４１_２により推定された視線（ｇ^（２） _ｘ，ｇ^（２） _ｙ）に比べ、実際の視線との誤差が少ない。したがって、データ処理装置２００は、単一の視線推定器２４１を用いる場合に比べ、視線推定の精度が向上しているといえる。 As shown in FIG. 8, the line of sight (G_x, G_y) integrated by the integration unit 250 has a smaller error from the actual line of sight than either the line of sight (g^(1)_x, g^(1)_y) estimated by the line-of-sight estimator 241_1 or the line of sight (g^(2)_x, g^(2)_y) estimated by the line-of-sight estimator 241_2. Therefore, it can be said that the data processing device 200 achieves higher line-of-sight estimation accuracy than when a single line-of-sight estimator 241 is used.

この例において、グラフ８１０、すなわち視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）とグラフ８２０、すなわち視線（ｇ^（２） _ｘ，ｇ^（２） _ｙ）とを比較すると、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）の方が実際の視線（グラフ８４０）に近い推定結果であるといえる。ここで、カメラ情報と視線推定器２４１_１、２４１_２に対応する学習情報とを比較すると、視線推定器２４１_１に対応する学習情報の方が、カメラ情報との差が小さいといえる。本実施形態の重み付き加算（式（２）～（５）参照）によれば、学習情報とカメラ情報との差が小さい視線推定器２４１ほど重みｗ_ｉが大きくなる。したがって、視線データが表す視線、すなわち最終的な推定結果は、撮像条件に含まれる設置角度がより近い視線推定器２４１により推定された視線に近付く。 In this example, comparing graph 810, that is, the line of sight (g^(1)_x, g^(1)_y), with graph 820, that is, the line of sight (g^(2)_x, g^(2)_y), the line of sight (g^(1)_x, g^(1)_y) can be said to be the estimation result closer to the actual line of sight (graph 840). Comparing the camera information with the learning information corresponding to the line-of-sight estimators 241_1 and 241_2, the learning information corresponding to the line-of-sight estimator 241_1 differs less from the camera information. According to the weighted addition of the present embodiment (see equations (2) to (5)), the smaller the difference between the learning information of a line-of-sight estimator 241 and the camera information, the larger its weight w_i. Therefore, the line of sight represented by the line-of-sight data, that is, the final estimation result, approaches the line of sight estimated by the line-of-sight estimator 241 whose installation angle, included in the imaging conditions, is closer.

この例において、視線推定器２４１の推定精度は、事前の学習における撮像条件と、データ処理装置２００による推定対象である顔画像、すなわち入力画像データが表す顔画像の撮像条件とに依存するといえる。より詳細には、視線推定器２４１の推定精度は、事前の学習における顔画像と撮像装置との相対的な位置関係（設置角度）と、推定対象である顔画像と当該顔画像を撮像した撮像装置との相対的な位置関係（設置角度）との近似する程度に依存するといえる。しかしながら、推定対象である顔画像と撮像装置との相対的な位置関係は、常に一定であるとは限らず、撮像方法によってはまちまちになる場合もある。 In this example, it can be said that the estimation accuracy of a line-of-sight estimator 241 depends on the imaging conditions in the prior learning and on the imaging conditions of the face image to be estimated by the data processing device 200, that is, the face image represented by the input image data. More specifically, the estimation accuracy of a line-of-sight estimator 241 depends on how closely the relative positional relationship (installation angle) between the face images and the imaging device in the prior learning approximates the relative positional relationship (installation angle) between the face image to be estimated and the imaging device that captured it. However, the relative positional relationship between the face image to be estimated and the imaging device is not always constant, and may vary depending on the imaging method.

本実施形態の視線推定方法によれば、式（２）～（５）のような重み付き加算を実行することで、異なる撮像条件を用いて学習された複数の視線推定器２４１による推定結果を統合することができる。したがって、本実施形態の視線推定方法によれば、入力画像データが表す顔画像の撮像条件と複数の視線推定器２４１の事前の学習における撮像条件とが一致していなくても、精度が良い視線推定が可能である。加えて、本実施形態の視線推定方法によれば、入力画像データが表す顔画像と撮像装置との相対的な位置関係が一定でなくても、精度が良い視線推定が可能である。 According to the line-of-sight estimation method of the present embodiment, executing the weighted addition of equations (2) to (5) makes it possible to integrate the estimation results of a plurality of line-of-sight estimators 241 trained under different imaging conditions. Therefore, according to the line-of-sight estimation method of the present embodiment, accurate line-of-sight estimation is possible even if the imaging conditions of the face image represented by the input image data do not match the imaging conditions used in the prior learning of the plurality of line-of-sight estimators 241. In addition, according to the line-of-sight estimation method of the present embodiment, accurate line-of-sight estimation is possible even if the relative positional relationship between the face image represented by the input image data and the imaging device is not constant.

以上のとおり、本実施形態のデータ処理装置２００は、顔画像に含まれる顔の視線を複数の視線推定器２４１で推定し、推定された複数の視線を統合する構成を有する。この構成により、データ処理装置２００は、第１実施形態の視線推定装置１００と同様の作用効果を奏することができる。 As described above, the data processing device 200 of the present embodiment has a configuration in which the line of sight of the face included in the face image is estimated by the plurality of line of sight estimators 241 and the estimated plurality of line of sight are integrated. With this configuration, the data processing device 200 can exert the same effects as the line-of-sight estimation device 100 of the first embodiment.

また、データ処理装置２００は、撮像条件を表す学習情報とカメラ情報とに応じて決定される重みに従った重み付き演算を実行することによって視線を統合する構成を有する。この構成は、複数の視線推定器２４１により推定された複数の視線に対し、撮像条件に応じた重みを付与することを可能にする。したがって、データ処理装置２００は、このような重みを付与しない場合に比べ、視線推定の精度を向上させることが可能である。 Further, the data processing device 200 has a configuration in which the line of sight is integrated by executing a weighted operation according to a weight determined according to the learning information representing the imaging condition and the camera information. This configuration makes it possible to give weights according to the imaging conditions to the plurality of lines of sight estimated by the plurality of line-of-sight estimators 241. Therefore, the data processing device 200 can improve the accuracy of the line-of-sight estimation as compared with the case where such a weight is not applied.

さらに、データ処理装置２００は、撮像条件を表す学習情報及びカメラ情報の比較結果に応じた重みを決定する構成を有する。データ処理装置２００は、より詳細には、複数の視線推定器２４１により推定された複数の視線に対し、視線推定器２４１の学習時における撮像装置の設置角度が入力画像データにより表される顔画像を撮像した撮像装置の設置角度に近いものほど重みを大きくする。このような構成により、データ処理装置２００は、出力される視線データが表す視線を、設置角度がより近い視線推定器２４１により推定された視線に近付けることが可能である。 Further, the data processing device 200 has a configuration that determines the weights according to the result of comparing the learning information and the camera information, both of which represent imaging conditions. More specifically, among the plurality of lines of sight estimated by the plurality of line-of-sight estimators 241, the data processing device 200 assigns a larger weight to a line of sight whose line-of-sight estimator 241 was trained with an imaging-device installation angle closer to the installation angle of the imaging device that captured the face image represented by the input image data. With such a configuration, the data processing device 200 can bring the line of sight represented by the output line-of-sight data closer to the line of sight estimated by the line-of-sight estimator 241 whose installation angle is closer.

［変形例］ [Modifications]
上述された実施形態は、例えば、以下のような変形を適用することができる。これらの変形例は、必要に応じて適宜組み合わせることも可能である。 For example, the following modifications can be applied to the above-described embodiments. These modifications can also be combined as needed.

（変形例１） (Modification 1)
決定部１２０は、周知の顔向き推定技術を用いることによって顔の方向を推定することが可能である。決定部１２０は、このように推定された顔の方向と、撮像装置の光軸方向とがなす角度とに基づいて撮像装置の設置角度を算出してもよい。 The determination unit 120 can estimate the direction of the face by using a well-known face orientation estimation technique. The determination unit 120 may calculate the installation angle of the imaging device based on the angle formed between the face direction estimated in this way and the optical axis direction of the imaging device.

（変形例２） (Modification 2)
カメラ情報及び学習情報は、顔画像の撮像に用いられた撮像装置の種類を示す情報を含んでもよい。ここでいう撮像装置の種類は、例えば、撮像装置の機種や、撮像装置が感度を有する光の波長帯を表す。 The camera information and the learning information may include information indicating the type of the imaging device used to capture the face image. The type of imaging device referred to here represents, for example, the model of the imaging device or the wavelength band of light to which the imaging device is sensitive.

例えば、可視光により撮像する可視光カメラと近赤外光により撮像する近赤外光カメラとが撮像装置に含まれる場合がある。このような場合において、視線推定器２４１の学習に用いられる撮像装置にも可視光カメラと近赤外光カメラとが含まれるときには、入力される顔画像の撮像に用いられた撮像装置と学習に用いられた撮像装置とに異同が生じる可能性がある。例えば、入力される顔画像の撮像に用いられた撮像装置が近赤外光カメラであれば、学習に用いられた撮像装置が近赤外光カメラである視線推定器２４１による推定結果の方が信頼できる（すなわち精度が保証される）可能性が高いといえる。 For example, the imaging devices may include a visible light camera that captures images with visible light and a near-infrared light camera that captures images with near-infrared light. In such a case, when the imaging devices used for the learning of the line-of-sight estimators 241 also include a visible light camera and a near-infrared light camera, the imaging device used to capture the input face image may or may not be of the same type as the imaging device used for learning. For example, if the imaging device used to capture the input face image is a near-infrared light camera, the estimation result of a line-of-sight estimator 241 whose learning used a near-infrared light camera is more likely to be reliable (that is, its accuracy is more likely to be guaranteed).

このような場合、統合部２５０は、入力される顔画像の撮像に用いられた撮像装置と学習に用いられた撮像装置の種類とが一致する視線推定器２４１_ｉに対応する重みｗ_ｉを大きくし、そうでない視線推定器２４１_ｉに対応する重みｗ_ｉを小さくする。このようにすれば、入力される顔画像の撮像に用いられた撮像装置と同種の撮像装置が学習に用いられている視線推定器２４１_ｉの推定結果を視線データにより強く反映させることが可能である。 In such a case, the integration unit 250 increases the weight w_i corresponding to a line-of-sight estimator 241_i for which the type of the imaging device used for learning matches the type of the imaging device used to capture the input face image, and decreases the weight w_i corresponding to a line-of-sight estimator 241_i for which it does not. In this way, the estimation result of a line-of-sight estimator 241_i whose learning used the same type of imaging device as the one that captured the input face image can be reflected more strongly in the line-of-sight data.

また、カメラ情報及び学習情報は、撮像装置の光学系のパラメータであってもよい。例えば、カメラ情報及び学習情報は、レンズの水平方向及び垂直方向の画角をパラメータとして含んでもよい。この場合、統合部２５０は、このようなパラメータを要素とするベクトルをカメラ情報及び学習情報として用いて、式（２）と同様の計算によって重みを算出することが可能である。 Further, the camera information and the learning information may be parameters of the optical system of the imaging device. For example, the camera information and the learning information may include the horizontal and vertical angles of view of the lens as parameters. In this case, the integration unit 250 can calculate the weight by the same calculation as in the equation (2) by using the vector having such a parameter as an element as the camera information and the learning information.

（変形例３） (Modification 3)
第２実施形態における重み付き演算の方法は、上述の動作例に限定されない。例えば、統合部２５０は、式（２）により算出された重みｗ_ｉの一部を用いずに視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）を統合してもよい。具体的には、統合部２５０は、重みｗ_ｉのうちの所定の閾値以上であるもの（又は値が大きい順に所定数のもの）以外のものを「０」に置換してもよい。この置換は、重みｗ_ｉのうち最終的な推定結果に与える影響が少ないものを切り捨てることに相当する。また、この場合において、統合部２５０は、切り捨て後の重みｗ_ｉの総和が「１」になるように各重みの比率（式（２）の分母）を再計算してもよい。 The method of the weighted operation in the second embodiment is not limited to the operation example described above. For example, the integration unit 250 may integrate the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) without using some of the weights w_i calculated by equation (2). Specifically, the integration unit 250 may replace with "0" every weight w_i other than those equal to or greater than a predetermined threshold (or other than a predetermined number of the largest weights). This replacement corresponds to discarding the weights w_i that have little effect on the final estimation result. In this case, the integration unit 250 may also recalculate the ratio of each weight (the denominator of equation (2)) so that the sum of the weights w_i after discarding becomes "1".
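The discarding and renormalization of weights described in this modification can be sketched as follows. The function name `prune_weights` and its parameters `threshold` and `top_k` are hypothetical, and raising an error when every weight is discarded is a design choice of this sketch rather than something stated in the disclosure.

```python
def prune_weights(weights, threshold=None, top_k=None):
    """Zero out small weights, then rescale the rest so they sum to 1.

    threshold: keep only weights >= threshold (if given).
    top_k: keep only the top_k largest weights (if given).
    """
    kept = list(weights)
    if threshold is not None:
        kept = [w if w >= threshold else 0.0 for w in kept]
    if top_k is not None:
        # Indices sorted by weight, largest first; keep the first top_k.
        order = sorted(range(len(kept)), key=lambda i: kept[i], reverse=True)
        keep = set(order[:top_k])
        kept = [w if i in keep else 0.0 for i, w in enumerate(kept)]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("all weights were discarded")
    return [w / total for w in kept]

by_threshold = prune_weights([0.5, 0.3, 0.15, 0.05], threshold=0.2)
by_count = prune_weights([0.5, 0.3, 0.15, 0.05], top_k=1)
```

With the threshold of 0.2, only the first two weights survive and are rescaled in the ratio 0.5 : 0.3.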

また、統合部２５０は、式（２）において、exp(-α||c_i-c_t||)に代えて、||c_i-c_t||の増加に対して単調減少する別の関数を用いてもよい。例えば、統合部２５０は、以下の式（６）を用いて重みｗ_ｉを算出してもよい。ここにおいて、max(a,b)は、ａ、ｂのうちのより大きい値を返す関数を表す。また、βは、０以上の定数である。 Further, in equation (2), the integration unit 250 may use, in place of exp(-α||c_i-c_t||), another function that decreases monotonically as ||c_i-c_t|| increases. For example, the integration unit 250 may calculate the weight w_i using the following equation (6). Here, max(a,b) represents a function that returns the larger of a and b. β is a constant equal to or greater than 0.

あるいは、統合部２５０は、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）の一部を切り捨ててから式（５）の視線（Ｇ_ｘ，Ｇ_ｙ）を算出してもよい。例えば、統合部２５０は、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）に外れ値が含まれる場合に、外れ値を除外して式（５）の計算を実行してもよい。外れ値に相当する視線は、推定に失敗した視線であると考えられるためである。ここでいう外れ値は、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）のうちの他の値と大きく外れた値である。外れ値は、例えば、視線（ｇ^（１） _ｘ，ｇ^（１） _ｙ）～（ｇ^（ｎ） _ｘ，ｇ^（ｎ） _ｙ）をベクトルと捉えた場合における視線間のユークリッド距離に基づいて特定される。 Alternatively, the integration unit 250 may discard some of the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) before calculating the line of sight (G_x, G_y) of equation (5). For example, when the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) include an outlier, the integration unit 250 may exclude the outlier and then execute the calculation of equation (5). This is because a line of sight that is an outlier is considered to be one for which estimation has failed. An outlier here is a value that deviates greatly from the other values among the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y). Outliers are identified, for example, based on the Euclidean distance between lines of sight when the lines of sight (g^(1)_x, g^(1)_y) to (g^(n)_x, g^(n)_y) are regarded as vectors.
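One possible way to exclude outliers based on Euclidean distance is sketched below. The disclosure does not specify the criterion, so the rule used here (drop a line of sight whose median distance to the other lines of sight exceeds `max_dist`) is an assumption chosen for illustration, as are the names `drop_outliers` and `max_dist`.

```python
import math
from statistics import median

def drop_outliers(gazes, weights, max_dist):
    """Remove gazes that lie far from the others, with their weights.

    gazes: list of (g_x, g_y) in degrees; weights: matching list of w_i.
    A gaze is treated as an outlier when the median of its Euclidean
    distances to all other gazes exceeds max_dist (assumed criterion).
    """
    kept_gazes, kept_weights = [], []
    for i, (g, w) in enumerate(zip(gazes, weights)):
        dists = [math.hypot(g[0] - h[0], g[1] - h[1])
                 for j, h in enumerate(gazes) if j != i]
        if not dists or median(dists) <= max_dist:
            kept_gazes.append(g)
            kept_weights.append(w)
    return kept_gazes, kept_weights

gazes = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (60.0, -50.0)]
kept_gazes, kept_weights = drop_outliers(gazes, [0.25] * 4, max_dist=10.0)
```

The surviving weights no longer sum to 1, so the weighted average should divide by their sum, or the weights should be renormalized before use.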

(Modification 4)
As described above, the learning information and the camera information may include the range of the line of sight to be estimated. In the camera information, this range indicates the range of the line of sight that is to be estimated; in the learning information, it indicates the range of the line of sight used for learning by the line-of-sight estimator 241. The range of the line of sight is expressed, for example, in the same manner as the line of sight (g_x, g_y), as a deviation from straight ahead given by numerical values in the range of -90 to +90 [deg]. The learning information and the camera information may express the range of the line of sight by both the horizontal angle and the elevation/depression angle, or by only one of them.

When such learning information and camera information are used, the integration unit 250 can calculate the weights based on the proportion by which the line-of-sight ranges overlap (hereinafter also referred to as the "overlap rate"). Here, the overlap rate is the ratio of the line-of-sight range included in both the learning information and the camera information to the line-of-sight range included in at least one of them.

For example, when the line-of-sight range represented by the learning information and the line-of-sight range represented by the camera information match completely, the overlap rate is "1.0". On the other hand, when the two ranges do not overlap at all, the overlap rate is "0". More specifically, when the horizontal line-of-sight range represented by the learning information of the line-of-sight estimator 241_1 is -10 to +5 [deg] and the horizontal line-of-sight range represented by the camera information is -10 to +10 [deg], the horizontal overlap rate is "0.75 (= 15/20)".
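The overlap rate defined above can be computed for one direction as the length of the range shared by both pieces of information divided by the length of the range covered by at least one of them. The following is a minimal sketch; the function name overlap_rate and the handling of zero-width ranges are assumptions, since the description does not address them.

```python
def overlap_rate(range_a, range_b):
    """Overlap rate of two angular ranges given as (low, high) in degrees.

    Numerator: length covered by both ranges.
    Denominator: length covered by at least one range.
    """
    low_a, high_a = range_a
    low_b, high_b = range_b
    shared = max(0.0, min(high_a, high_b) - max(low_a, low_b))
    covered = (high_a - low_a) + (high_b - low_b) - shared
    if covered == 0.0:
        # Both ranges have zero width; returning 0 here is an assumption.
        return 0.0
    return shared / covered

# Horizontal example from the description: learning range -10 to +5, camera range -10 to +10.
horizontal = overlap_rate((-10.0, 5.0), (-10.0, 10.0))  # 15 / 20
```

The same function can be applied separately to the horizontal ranges and to the vertical ranges to obtain the two components of the overlap rate.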

When such learning information and camera information are used, the integration unit 250 can use the horizontal and vertical overlap rates as the learning information c_i, c_j and the camera information c_t in equation (2). The integration unit 250 may use the overlap rates instead of the installation angle, or in addition to the installation angle. For example, when both the installation angle and the overlap rates are used, the learning information c_i, c_j and the camera information c_t are each a four-component vector (the horizontal and vertical components of the installation angle and the horizontal and vertical components of the overlap rate).

(Modification 5)
The region extraction unit 230 does not have to identify the center coordinates of the right eye and the left eye or the eye region by calculation. For example, the center coordinates of the right eye and the left eye and the eye region may be input by the user. In this case, the data processing device 200 can identify the center coordinates of the right eye and the left eye and the eye region based on the user's input.

(Modification 6)
The shape of the eye region is not necessarily limited to a rectangle. For example, the region extraction unit 230 may exclude, from the eye region described above (see FIG. 6), a region that does not directly affect the estimation of the line of sight (for example, the nose region). Also, the eye region does not necessarily have to include both eyes. For example, the region extraction unit 230 may extract, as the eye region, a region that includes one of the right eye and the left eye and does not include the other.

(Modification 7)
The learning by the line-of-sight estimation unit 240 is not limited to the example described above. For example, the line-of-sight estimation unit 240 may learn a nonlinear function for estimating the line of sight by an ensemble learning algorithm such as random forest.

(Modification 8)
The use of the line of sight estimated by the line-of-sight estimation device 100 (or the data processing device 200) is not particularly limited. For example, the line-of-sight estimation device 100 may be applied to a system that estimates the line of sight of a person captured by a surveillance camera installed in a retail store such as a convenience store and detects a suspicious person. The line-of-sight estimation device 100 may also be applied to a system that infers a user's interests based on the user's line of sight toward a screen on which information is displayed. Alternatively, the line-of-sight estimation device 100 may be applied to an electronic device that can be operated by movement of the line of sight, or to driving support for an automobile or the like.

(Modification 9)
The specific hardware configuration of the device according to the present disclosure (the line-of-sight estimation device 100 or the data processing device 200) may take various forms and is not limited to a specific configuration. For example, the device according to the present disclosure may be realized using software, or may be configured so that various processes are shared among a plurality of pieces of hardware.

FIG. 9 is a block diagram showing an example of the hardware configuration of a computer device 300 that realizes the device according to the present disclosure. The computer device 300 includes a CPU (Central Processing Unit) 301, a ROM (Read Only Memory) 302, a RAM (Random Access Memory) 303, a storage device 304, a drive device 305, a communication interface 306, and an input/output interface 307. The device according to the present disclosure can be realized by the configuration shown in FIG. 9 (or a part thereof).

The CPU 301 executes a program 308 using the RAM 303. The program 308 may be stored in the ROM 302. The program 308 may also be recorded on a recording medium 309 such as a memory card and read by the drive device 305, or may be transmitted from an external device via a network 310. The communication interface 306 exchanges data with external devices via the network 310. The input/output interface 307 exchanges data with peripheral devices (an input device, a display device, and the like). The communication interface 306 and the input/output interface 307 can function as components for acquiring or outputting data.

The components of the device according to the present disclosure may be configured by a single circuit (a processor or the like) or by a combination of a plurality of circuits. The circuitry referred to here may be either dedicated or general-purpose. For example, one part of the device according to the present disclosure may be realized by a dedicated processor and another part by a general-purpose processor.

A configuration described as a single device in the above-described embodiments may be distributed across a plurality of devices. For example, the line-of-sight estimation device 100 may be realized by the cooperation of a plurality of computer devices using cloud computing technology or the like. The line-of-sight estimators 241_1 to 241_n may also be realized by mutually different computer devices.

The present invention has been described above using the above-described embodiments and modifications as exemplary examples. However, the present invention is not limited to these embodiments and modifications. Within its scope, the present invention may include embodiments to which various modifications or applications that can be understood by those skilled in the art are applied. The present invention may also include embodiments in which the matters described in this specification are combined or replaced as appropriate. For example, matters described using a particular embodiment may also be applied to other embodiments to the extent that no contradiction arises.

[Supplementary Notes]
Part or all of the present disclosure may also be described as in the following appendices. However, the present disclosure is not necessarily limited to the aspects of these appendices.
(Appendix 1)
A line-of-sight estimation device comprising:
an estimation means for estimating the line of sight of a face included in a face image with a plurality of estimators; and
a determination means for determining the line of sight of the face based on first condition information including a condition relating to the imaging of the face image, a plurality of pieces of second condition information each including the condition corresponding to one of the plurality of estimators, and the plurality of estimated lines of sight.
(Appendix 2)
The line-of-sight estimation device according to Appendix 1, wherein the condition includes an imaging condition of an imaging means.
(Appendix 3)
The line-of-sight estimation device according to Appendix 1 or Appendix 2, wherein the condition includes the range of the line of sight to be estimated.
(Appendix 4)
The line-of-sight estimation device according to any one of Appendix 1 to Appendix 3, wherein the determination means executes a weighted operation according to weights that are determined for each of the plurality of lines of sight estimated by the plurality of estimators, each weight being determined according to the second condition information corresponding to the relevant estimator and the first condition information.
(Appendix 5)
The line-of-sight estimation device according to Appendix 4, wherein the determination means determines the weights based on a result of comparing the second condition information with the first condition information.
(Appendix 6)
The line-of-sight estimation device according to Appendix 5, wherein the determination means makes the weight larger as the second condition information is closer to the first condition information.
(Appendix 7)
The line-of-sight estimation device according to any one of Appendix 1 to Appendix 6, wherein the plurality of estimators are trained based on face images that differ from one another in the condition.
(Appendix 8)
The line-of-sight estimation device according to any one of Appendix 1 to Appendix 7, further comprising:
a first acquisition means for acquiring the face image;
a second acquisition means for acquiring the first condition information;
an extraction means for extracting a region around the eyes from the acquired face image; and
an output means for outputting line-of-sight information indicating the line of sight determined by the determination means,
wherein the estimation means estimates the line of sight of the face using the region of the face image.
(Appendix 9)
A line-of-sight estimation method comprising:
estimating the line of sight of a face included in a face image with a plurality of estimators; and
determining the line of sight of the face based on first condition information including a condition relating to the imaging of the face image, a plurality of pieces of second condition information each including the condition corresponding to one of the plurality of estimators, and the plurality of estimated lines of sight.
(Appendix 10)
The line-of-sight estimation method according to Appendix 9, wherein the first condition information and the second condition information include information indicating an imaging condition of an imaging means.
(Appendix 11)
A computer-readable program recording medium on which is recorded a program for causing a computer to execute:
a process of estimating the line of sight of a face included in a face image with a plurality of estimators; and
a process of determining the line of sight of the face based on first condition information including a condition relating to the imaging of the face image, a plurality of pieces of second condition information each including the condition corresponding to one of the plurality of estimators, and the plurality of estimated lines of sight.
(Appendix 12)
The program recording medium according to Appendix 11, wherein the first condition information and the second condition information include information indicating an imaging condition of an imaging means.

100 Line-of-sight estimation device
110 Estimation unit
120 Determination unit
200 Data processing device
210 Image acquisition unit
220 Condition acquisition unit
230 Region extraction unit
240 Line-of-sight estimation unit
241 Line-of-sight estimator
250 Integration unit
260 Output unit
300 Computer device

Claims

An information processing device comprising:
an estimation means for estimating a plurality of lines of sight of a target included in a target image by using a plurality of estimators that have learned from a plurality of images with mutually different conditions; and
a determination means for determining the line of sight of the target by executing a weighted operation according to weights that are determined for each of the plurality of lines of sight estimated by the plurality of estimators and that are determined according to the conditions and a condition relating to the target image.

The information processing device according to claim 1, wherein
the plurality of estimators include a first estimator trained under a first condition and a second estimator trained under a second condition, and
the estimation means estimates the plurality of lines of sight of the target by using the first estimator and the second estimator.

The information processing device according to claim 2, wherein
the first estimator is trained using images captured by a first imaging device, and
the second estimator is trained using images captured by a second imaging device.

The information processing device according to claim 3, wherein the first imaging device and the second imaging device have different installation angles.

The information processing device according to claim 3 or 4, wherein the first imaging device and the second imaging device have different performances.

An information processing method in which a computer:
estimates a plurality of lines of sight of a target included in a target image by using a plurality of estimators that have learned from a plurality of images with mutually different conditions; and
determines the line of sight of the target by executing a weighted operation according to weights that are determined for each of the plurality of lines of sight estimated by the plurality of estimators and that are determined according to the conditions and a condition relating to the target image.

A program that causes a computer to execute:
a process of estimating a plurality of lines of sight of a target included in a target image by using a plurality of estimators that have learned from a plurality of images with mutually different conditions; and
a process of determining the line of sight of the target by executing a weighted operation according to weights that are determined for each of the plurality of lines of sight estimated by the plurality of estimators and that are determined according to the conditions and a condition relating to the target image.