JP2014509535A

JP2014509535A - Gaze point mapping method and apparatus

Info

Publication number: JP2014509535A
Application number: JP2013558461A
Authority: JP
Inventors: Williams, Denis; Hoffmann, Jan
Original assignee: SensoMotoric Instruments Gesellschaft für innovative Sensorik mit beschränkter Haftung
Priority date: 2011-03-18
Filing date: 2012-03-16
Publication date: 2014-04-21
Also published as: CN103501688B; US20140003738A1; WO2012126844A1; CN103501688A; US9164583B2; EP2499963A1

Abstract

[Problem] To provide an apparatus for mapping a gaze point of a subject on a scene image to a gaze point in a reference image.
[Solution] The scene image and the reference image are taken by a camera from different positions. The apparatus comprises: a module for executing a feature detection algorithm on the reference image to identify a plurality of characteristic features and their locations in the reference image; a module for executing the feature detection algorithm on the scene image to re-identify the plurality of characteristic features and their locations in the scene image; a module for determining, based on the locations of the characteristic features detected in the reference image and the scene image, a point transfer mapping that transforms point positions between the scene image and the reference image; and a module for mapping a gaze point determined in the scene image to the corresponding point in the reference image using the point transfer mapping.
[Selected drawing] Figure 1

Description

The present invention relates to a method and an apparatus for gaze point mapping, and in particular to a method and an apparatus for mapping the gaze direction of a subject to a gaze point in a scene.

The problem to be solved is to find the point or object, or more specifically the part of an object's surface, that a (possibly moving) person is gazing at. Existing solutions to this problem are described below; the problem can be split into separate parts.

First, the person's gaze direction (or a representation of it, such as a pupil/CR combination, or cornea center and pupil/limbus) is found.

This gaze direction is mapped to an image of the scene captured by a head-mounted scene camera or by a scene camera at any fixed location. Because the head-mounted scene camera is fixed with respect to the eye, such a mapping can be performed once a corresponding calibration has been executed.

The next step is to map the gaze point in the scene image captured by the head-mounted camera, which can change as the subject moves, to a point in a (stable) reference image that does not move and that corresponds to a "real world" object.

An eye tracker can be used to determine the gaze direction. Eye trackers observe features of the eye such as the pupil, the limbus, blood vessels on the sclera, or reflections of light sources (corneal reflections) in order to calculate the gaze direction.
Any kind of eye tracker can be used as long as its gaze direction can be mapped into the image of a head-mounted scene camera.

If the subject's head does not move, then once calibration is done, determining the gaze direction directly yields the gaze point on the reference image. In this special case the calibration maps the gaze direction to a point of the scene image and thereby also of the reference image, because the scene image and the reference image are identical when the head-mounted camera does not move and has a fixed location with respect to the reference image.

If head and eyes move, however, it becomes more complicated to determine the gaze point in the non-moving reference image from the gaze direction detected with respect to a scene image taken by the head-mounted camera after the head has moved, because that scene image is no longer identical to the reference image used to calibrate the gaze direction with respect to the corresponding gaze point in the reference image.

One possible approach for determining the gaze point is to intersect the gaze direction with a virtual scene plane defined relative to the eye tracker. For this purpose, WO 2010/083853 A1 discloses the use of several active IR markers arranged at fixed locations, e.g. attached to a bookshelf. The locations of these markers are first detected with respect to a "test scene", which acts as a "reference" image obtained by the head-mounted camera, using two orthogonal IR line detectors that detect two orthogonal angles by finding the maximum intensity on the two line sensors. The detected angles of an IR source correspond to its location in the reference image. The angles of the markers are then detected for a scene taken later by the head-mounted camera from a different position, which gives the locations of the IR sources in the later scene image. A "perspective projection" is then determined: the mapping that transforms the locations of the IR sources detected in the image taken later (the scene image), when the head-mounted camera is at a different position, into the locations of the IR sources in the test image (or reference image). With this transformation, a gaze point determined later for the scene image can also be transformed into the corresponding (actual) gaze point in the test image.

Mapping the gaze point from the actual "scene image" to a stable reference image that is invariant over time becomes possible by defining the plane onto which the gaze point is mapped relative to scene-stable markers instead of relative to the eye tracker (ET). In this way, the plane of the reference image becomes stable over time, and the gazes of other participants can also be mapped onto it, so that gaze point information can be aggregated over time and over participants, as was previously possible only with eye trackers at a fixed location. For this purpose, the prior art disclosed in WO 2010/083853 A1 uses as artificial markers IR sources whose locations can be detected by orthogonal IR line detectors that detect the angle of maximum emission.

Using IR sources as markers to determine the transformation of the gaze point from a scene image to a reference image is complicated and inconvenient. It requires artificial IR light sources to be mounted, and an additional IR detector comprising two orthogonal line sensors must be provided. It is therefore desirable to provide an approach that can determine the gaze point mapping even when the head-mounted scene camera moves, without such external markers.

According to one embodiment, there is provided an apparatus for mapping a gaze point of a subject on a scene image to a gaze point in a reference image, wherein the scene image and the reference image have been taken by a camera from different positions, the apparatus comprising:
a module for executing a feature detection algorithm on the reference image to identify a plurality of characteristic features and their locations in the reference image;
a module for executing the feature detection algorithm on the scene image to re-identify the plurality of characteristic features and their locations in the scene image;
a module for determining a point transfer mapping that transforms point positions between the scene image and the reference image, based on the locations of the plurality of characteristic features detected in the reference image and the scene image; and
a module for mapping a gaze point determined in the scene image to its corresponding point in the reference image using the point transfer mapping.

This makes it possible to implement gaze point mapping that needs no artificial IR sources or IR detectors. It can operate on normal, unamended images of natural scenes taken by ordinary CCD cameras operating in the visible frequency range.

According to one embodiment, the point transfer mapping is a planar homography that is estimated from the features in the reference image and the scene image and that transfers point positions from the scene image to the reference image via a virtual transfer plane in the world scene.

This is a particularly suitable implementation of the point transfer mapping.
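As an illustration of how a planar homography transfers a point, the following Python sketch applies a 3x3 matrix to a gaze point in homogeneous coordinates. The function name `apply_homography` and the matrix values are assumptions made for this example and do not come from the application.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    # Lift to homogeneous coordinates (x, y, 1) and multiply by H.
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the third coordinate to return to image coordinates.
    return (u / w, v / w)


# Hypothetical scene-to-reference homography (illustrative values only).
H_scene_to_ref = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, -5.0],
    [0.0, 0.001, 1.0],
]
gaze_in_scene = (320.0, 200.0)
gaze_in_reference = apply_homography(H_scene_to_ref, gaze_in_scene)
```

The division by the third coordinate is what distinguishes a projective mapping from an affine one; when the last row is (0, 0, 1) the two coincide.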

According to one embodiment, the apparatus further comprises:
a module for determining one or more regions of interest in the reference image within which the feature detection algorithm is executed to detect and locate the characteristic features.
This allows regions suited to the feature detection algorithm to be selected. Letting the user choose regions with particularly distinctive features improves the efficiency of the feature detection algorithm.

According to one embodiment, the feature detection algorithm for detecting the plurality of characteristic features in the reference image and for detecting and re-identifying them in the scene image comprises one of the following:
a scale-invariant feature transform (SIFT) algorithm;
a speeded-up robust features (SURF) algorithm.

These are particularly suitable implementations of the feature detection and re-identification algorithms.

According to one embodiment, the apparatus further comprises:
a module for mapping gaze directions from different persons and/or different times to the corresponding gaze points in the reference image and for recording the mapped gaze points over time, possibly also for different users.

This allows gaze data to be aggregated and accumulated over time, possibly even across different users.
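A minimal Python sketch of such accumulation is given below. It assumes that gaze points have already been mapped into reference image coordinates; the class name `GazeAccumulator`, the grid cell size and the sample values are hypothetical and only serve the example.

```python
from collections import defaultdict


class GazeAccumulator:
    """Collects mapped gaze points in reference image coordinates."""

    def __init__(self, cell_size=20):
        self.cell_size = cell_size      # side length of one grid cell in pixels
        self.counts = defaultdict(int)  # (cell_x, cell_y) -> number of gaze points
        self.records = []               # (user_id, timestamp, x, y)

    def add(self, user_id, timestamp, x, y):
        # Keep the raw record so that per-user or per-time analysis stays possible.
        self.records.append((user_id, timestamp, x, y))
        cell = (int(x // self.cell_size), int(y // self.cell_size))
        self.counts[cell] += 1

    def hottest_cell(self):
        # Grid cell that received the most gaze points over all users and times.
        return max(self.counts.items(), key=lambda item: item[1])


acc = GazeAccumulator(cell_size=20)
acc.add("user_a", 0.0, 105, 42)
acc.add("user_a", 0.5, 110, 45)
acc.add("user_b", 3.2, 118, 59)
acc.add("user_b", 3.9, 300, 10)
```

Because every participant's gaze is expressed in the same stable reference image, a simple per-cell count is enough to compare users and sessions.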

According to one embodiment, the apparatus further comprises:
a module for displaying, in the reference image, the gaze point that has been mapped from the scene image to the reference image, in order to visualize the gaze point, possibly as its location evolves over time.

This allows the gaze point mapped to the reference image to be visualized, possibly even over time.

According to one embodiment, the apparatus further comprises:
a module for visualizing the gaze point that has been mapped from the scene image to the reference image in a visualization image different from the reference image, the module comprising:
a module for determining a point transfer mapping for transferring points from the reference image to the visualization image;
a module for mapping the gaze point from the reference image to its corresponding point in the visualization image using the point transfer mapping; and
a module for displaying the gaze point at the corresponding point in the visualization image.

This allows the gaze point to be mapped to, and visualized in, an image different from the reference image.
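If both point transfer mappings are represented as 3x3 homographies, which is an assumption of this sketch rather than a requirement stated in the text, the two steps can be chained by a matrix product. The names `matmul3`, `H_scene_to_ref` and `H_ref_to_vis` and all values are hypothetical.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical mappings (illustrative values only).
H_scene_to_ref = [[1.0, 0.0, 10.0], [0.0, 1.0, -5.0], [0.0, 0.0, 1.0]]  # shift
H_ref_to_vis = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]      # 2x upscale

# The scene-to-reference mapping is applied first, so it is the right factor.
H_scene_to_vis = matmul3(H_ref_to_vis, H_scene_to_ref)
gaze_in_vis = apply_homography(H_scene_to_vis, (100.0, 50.0))
```

Mapping the point in two separate steps gives the same result; the combined matrix is merely convenient when many gaze points share the same pair of images.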

According to one embodiment, the apparatus further comprises:
a module for determining the point transfer mapping based on characteristic features that have been detected in the reference image by a feature detection algorithm and that have been detected and re-identified in the visualization image by the feature detection algorithm.

This is a suitable implementation of the point transfer mapping for the visualization.

According to one embodiment, the visualization image is one of the following:
an image selected from scene images taken from different positions;
an image generated by stitching two or more of the scene images taken from different positions;
a geometrically modified image of the scene;
an image of the scene from which distortions have been removed;
an image taken at a time different from the actual measurement of the gaze points;
an image having a higher resolution than the reference image;
a cartoon or a sketch;
a stylized drawing;
a handmade drawing.

These are suitable implementations of visualization images different from the reference image.

According to one embodiment, the apparatus further comprises:
a module for visualizing the gaze point in the frames of a video sequence, the module comprising:
a module for determining a point transfer mapping from points of the reference image to corresponding points in the video frames of the video sequence; and
a module for mapping a gaze point from the reference image to the corresponding point in a frame of the video sequence using the point transfer mapping.

This allows the gaze point to be visualized in a video sequence.

According to one embodiment, the apparatus further comprises:
a module for mapping the gaze point of a user, as detected in a scene image taken by a first camera, to the corresponding location in a corresponding frame of a video taken by a second camera from a position different from that of the first camera.

This allows visualization in a video sequence taken from a position different from that of the scene images.

According to one embodiment, the apparatus further comprises:
an eye tracking module for tracking the gaze of a subject; and/or
a head-mounted camera for taking scene images; and/or
a calibration module for mapping a gaze point to the corresponding point in the scene taken by the head-mounted camera.

This makes it possible to implement a complete system comprising an eye tracker, a calibration module and a scene camera.

According to one embodiment, there is provided a method for mapping a gaze point of a subject on a scene image to a gaze point in a reference image, wherein the scene image and the reference image have been taken by a camera from different positions, the method comprising:
executing a feature detection algorithm on the reference image to identify a plurality of characteristic features and their locations in the reference image;
executing the feature detection algorithm on the scene image to re-identify the plurality of characteristic features and their locations in the scene image;
determining a point transfer mapping that transforms point positions between the scene image and the reference image, based on the locations of the plurality of characteristic features detected in the reference image and the scene image; and
using the point transfer mapping to map a gaze point determined in the scene image to its corresponding point in the reference image.

In this way, a method according to an embodiment of the invention can be implemented.

The method may further comprise steps carried out by the additional features of any of the other embodiments.

According to one embodiment, there is provided a computer program comprising computer program code which, when executed on a computer, enables the computer to carry out a method according to any of the embodiments of the invention.

FIG. 1 schematically illustrates the mapping of a gaze point according to an embodiment of the invention.
FIG. 2 schematically illustrates the mapping of an initial reference image to a final reference image according to an embodiment of the invention.

According to one embodiment, a method and an apparatus are provided that allow a gaze point to be mapped from a scene image onto a reference image taken from a different position.
For this purpose, a set of characteristic features (reference features) in the reference image is used.

According to one embodiment, these characteristic features are identified by a feature detection algorithm capable of searching for identifiable features in the reference image. According to one embodiment, the feature detection algorithm is chosen such that it can re-identify the detected features in an image taken later, even from a different position. Using a feature detection algorithm, characteristic features such as reference points, lines or regions can then be re-identified in a scene image taken later, even if that scene image was taken from a different position.

Suitable features that can be detected by a feature detection algorithm are, for example, characteristic points or regions in the reference image such as blobs, edges or corners. These points or regions, which can serve as characteristic features, can be identified by an image processing algorithm capable of feature detection in an image taken by a camera, and the same feature detection algorithm can later re-identify them in an image taken by the camera from a different position.

It should be noted that using a feature detection algorithm that processes an image to detect one or more characteristic features avoids the need for artificial markers such as IR sources and for additional detectors such as IR line detectors. Instead, the image of a natural scene taken by the camera can itself be used to identify the reference points for determining the mapping, without mounting any artificial components or light sources in the scene and without any additional detectors. This is achieved by applying a feature detection algorithm that detects the locations of characteristic features in the (unamended or "natural") scene image, with no need for the artificial markers provided by IR sources in the prior art.

According to one embodiment, the user may manually select (e.g. by mouse or another input means) one or more regions in which the feature detection algorithm is to be executed to detect the characteristic features. Such points or regions may be selected manually in the reference image, for example by marking them with a mouse click or by delimiting an area such as a rectangle or circle around them, thereby defining one or more "regions of interest" on which the feature detection algorithm is executed.

The characteristic features detected in such a region of interest by the feature detection algorithm, together with their corresponding locations, are then detected again in a later image taken by the head-mounted scene camera from a different position, the so-called "scene image", by re-identifying the characteristic features in the scene image as described in more detail below. The characteristic features detected in the reference image and those re-identified in the scene image can together be used to determine a point transfer mapping that transforms point positions between the scene image and the reference image. One possible embodiment of such a mapping is a homography that can be estimated from the feature locations in the reference image and the scene image and that transfers point positions from the scene image to the reference image via a virtual transfer plane in the world scene.

The reference image taken before the actual measurement sequence consists of a projection of the real-world scene onto the image plane of the camera. A scene image taken later, with the camera at a different position, consists of a different projection of the real-world scene onto the image plane of the camera at that position. However, if a plurality of characteristic features (at least three or more) are first identified in the reference image and later re-identified in the scene image, the mapping from points of the scene image to the corresponding points of the reference image can be determined. The transformation given by this mapping can then be used to map the gaze point from the scene image to its corresponding point in the reference image.
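The following Python sketch shows one way such a mapping could be computed from point correspondences. It is written under stated assumptions: the mapping is modelled as a planar homography, exactly four correspondences are used (the minimum that fixes the eight degrees of freedom of a general homography; three correspondences determine only an affine mapping), and the last matrix entry is fixed to 1. The function names and coordinates are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b] as floats.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def homography_from_4_points(scene_pts, ref_pts):
    """Homography H (with H[2][2] = 1) mapping 4 scene points to 4 reference points."""
    A, b = [], []
    for (x, y), (u, v) in zip(scene_pts, ref_pts):
        # Two linear equations per correspondence in the 8 unknown entries.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical feature locations: four features in the scene image and the
# same four features as located in the reference image.
scene_corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
ref_corners = [(10.0, 20.0), (110.0, 20.0), (110.0, 120.0), (10.0, 120.0)]
H = homography_from_4_points(scene_corners, ref_corners)
gaze_in_reference = apply_homography(H, (50.0, 50.0))
```

With more than four matched features, a least-squares or outlier-robust estimate would normally replace the exact solve used here.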

First, however, the setup procedure in which the characteristic reference features are obtained, and the corresponding calibration mechanism, are described according to one embodiment. Initially, a reference image is taken before the actual measurement sequence (the sequence in which "scene images" are taken from different positions and gaze directions are determined). This image is used during calibration of the eye tracker to calibrate the correspondence between a gaze direction detected by the eye tracker and its corresponding point in the reference image taken by the scene camera. In this way the gaze direction can be calibrated to the corresponding point in the reference image and thereby to the corresponding point of the "real world scene" seen in the reference image. This corresponds to conventional calibration, which maps the gaze direction to a gaze point when the head-mounted scene camera does not move.

In a later actual measurement, when the head-mounted scene camera moves together with the head and eyes, the image of the scene camera (the "scene image") changes. The locations of the characteristic features detected in the reference image change accordingly.
In such a scenario, according to one embodiment, gaze point detection is carried out as follows.

First, the characteristic reference features are detected again, or "re-identified", in a scene image taken by the head-mounted scene camera from a different position.
For this purpose, an image processing algorithm is used that can detect the characteristic features and match them to the known reference features of the reference image determined during the setup procedure.

The feature detection algorithm actually used to detect the characteristic features in the reference image and to re-identify them in scene images taken from different positions may be chosen according to the type of feature best suited to the application domain.

For example, if the characteristic features detected in the reference image and re-detected and re-identified in the scene image are corners, a corner detection algorithm may be used, such as the Moravec corner detection algorithm, the Harris & Stephens corner detection algorithm or the Plessey corner detection algorithm.
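To make the idea of a corner detector concrete, the sketch below computes a Harris-style corner response R = det(M) - k * trace(M)^2 on a small grayscale image stored as nested lists. It is a simplified illustration and not the detector used in any particular embodiment: gradients are plain central differences, the structure tensor M is summed over an unweighted 3x3 window instead of a Gaussian one, and the value k = 0.04 and the test image are assumptions.

```python
def harris_response(img, k=0.04):
    """Harris-style corner response for a 2D list of gray values."""
    h, w = len(img), len(img[0])
    Ix = [[0.0] * w for _ in range(h)]
    Iy = [[0.0] * w for _ in range(h)]
    # Central-difference gradients; border pixels keep a zero gradient.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            Ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            Iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            # Structure tensor entries summed over a 3x3 neighbourhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            R[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return R


# Synthetic 12x12 image: a bright quadrant whose corner lies at row 6, column 6.
image = [[1.0 if (y >= 6 and x >= 6) else 0.0 for x in range(12)] for y in range(12)]
response = harris_response(image)
best = max(((response[y][x], (y, x)) for y in range(12) for x in range(12)))
corner_row_col = best[1]
```

The response is positive at the corner, negative along straight edges and zero in flat areas, which is why thresholding it isolates corner-like features.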

Once the characteristic features have been detected or re-identified in a scene image taken from a position different from that of the reference image, they can be used to estimate a planar projective transformation from points of the scene image to points of the reference image.

This will now be explained in more detail with reference to FIG. 1.

FIG. 1 shows a scene comprising a bookshelf. As an initial reference image, an image of the scene is taken by the scene camera SC at a first position (the "reference position"), which projects the scene onto the image plane of the scene camera SC. As can be seen, the scene comprises a bookshelf, which is projected onto the image plane of the scene camera SC at the time the image is taken.

During the setup procedure, a set of characteristic reference features is defined by detecting characteristic features in the reference image, or in a part of it such as a region of interest, for example the rectangle surrounding the bookshelf. This region may be selected by the user as a "region of interest" during the calibration procedure, in order to define the region within which the feature detection algorithm is executed to detect the characteristic features. The region of interest may be selected, for example, by using the mouse to draw a rectangle surrounding the bookshelf in the reference image taken by the scene camera SC.

Another possibility for defining the set of characteristic reference features is to select, for example, four points such as the four front corners of the bookshelf. According to this embodiment, selecting the four corner points (e.g. by mouse) corresponds to selecting a small (predefined) area around each point chosen by the mouse click; the feature detection algorithm is executed within this area to determine the features and their locations (within the predefined area surrounding the clicked point). In this way the characteristic reference features are defined by a combination of manual selection (choosing points of interest, such as the corners of the bookshelf, by mouse click) and automatic detection by the feature detection algorithm (within the predefined area around each clicked point).
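The combination of manual selection and automatic detection can be sketched as follows in Python. The sketch assumes that the detector has already returned a list of feature locations and only shows the restriction to predefined windows around the clicked points; the helper names, the window half-size and all coordinates are hypothetical.

```python
def window_around(click, half_size, image_width, image_height):
    """Square region (x0, y0, x1, y1) around a clicked point, clipped to the image."""
    cx, cy = click
    x0 = max(0, cx - half_size)
    y0 = max(0, cy - half_size)
    x1 = min(image_width - 1, cx + half_size)
    y1 = min(image_height - 1, cy + half_size)
    return (x0, y0, x1, y1)


def features_in_regions(features, regions):
    """Keep only the feature locations that fall inside at least one region."""
    kept = []
    for (x, y) in features:
        if any(x0 <= x <= x1 and y0 <= y <= y1 for (x0, y0, x1, y1) in regions):
            kept.append((x, y))
    return kept


# Hypothetical mouse clicks near two bookshelf corners in a 640x480 reference image.
clicks = [(5, 5), (600, 400)]
regions = [window_around(c, 10, 640, 480) for c in clicks]

# Hypothetical detector output for the whole image.
detected = [(3, 4), (100, 100), (14, 15), (16, 15), (605, 395)]
reference_features = features_in_regions(detected, regions)
```

A rectangular region of interest drawn by the user can be passed to `features_in_regions` in the same (x0, y0, x1, y1) form.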

According to another embodiment, the whole reference image may form the "region of interest" within which the characteristic features are detected by the feature detection algorithm. In that case a "region of interest" such as the rectangle surrounding the bookshelf in FIG. 1 is not needed. In a preferred embodiment, however, the set of characteristic reference features is defined by selecting at least one shape, area or region of interest, whose content constrains the feature detection algorithm.

With this reference image (or any other image suitable for gaze calibration), the gaze direction can be calibrated in a conventional way; that is, a mapping is established between the gaze direction determined by the eye tracker ET and the corresponding gaze point in the reference image. This can be done, for example, by having the subject look at defined points in the reference image (or in another image suitable for gaze calibration) and determining the corresponding gaze direction measured by the eye tracker. Based on this data, the correspondence between the gaze direction determined by the eye tracker and the corresponding gaze point in the reference image (and thereby in the image of the real scene) can be determined. As in a conventional calibration procedure, this calibration is performed while the head-mounted scene camera is at a fixed position and does not move.
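One simple way to turn such calibration data into a mapping is a least-squares fit. The Python sketch below assumes, purely for illustration, that the eye tracker reports a two-dimensional gaze measurement (for example a pupil/CR vector) and that an affine model is adequate; the text does not prescribe a model, and practical systems may use higher-order ones. All names and sample values are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("calibration points are degenerate")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def fit_affine_calibration(gaze_samples, image_points):
    """Least-squares affine map from gaze measurements to image points."""
    rows = [(gx, gy, 1.0) for gx, gy in gaze_samples]
    # Normal equations (G^T G) a = G^T t, solved once per image axis.
    GtG = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (0, 1):
        Gtt = [sum(r[i] * p[axis] for r, p in zip(rows, image_points))
               for i in range(3)]
        coeffs.append(solve3(GtG, Gtt))
    return coeffs  # [[ax, bx, cx], [ay, by, cy]]


def gaze_to_image(coeffs, gaze):
    gx, gy = gaze
    return tuple(c[0] * gx + c[1] * gy + c[2] for c in coeffs)


# Hypothetical calibration: the subject fixates five defined points.
gaze_samples = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (0.0, 0.0)]
image_points = [(220.0, 160.0), (420.0, 160.0), (420.0, 320.0), (220.0, 320.0), (320.0, 240.0)]
calibration = fit_affine_calibration(gaze_samples, image_points)
gaze_point = gaze_to_image(calibration, (0.5, -0.25))
```

The fitted mapping is valid only while the scene camera stays where it was during calibration, which is the limitation the feature-based point transfer mapping addresses.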

With this calibration, the gaze point could then be detected over time in a conventional way by detecting the gaze direction, as long as the head-mounted camera does not move.

However, when the head-mounted camera moves, the scene image taken by the head-mounted scene camera changes, as shown in FIG. 1 by, for example, the scene image in the image plane of the scene camera SC2, which is now at a different position.

According to one embodiment, the previously defined characteristic reference features (for example, those found by the feature detection algorithm executed on the area bounded by the rectangle surrounding the bookshelf) are detected in the scene image taken later. For this purpose, an image processing algorithm is used that can at least partially re-identify the previously defined features (or objects) of the reference image in subsequent scene images taken from different positions of the head-mounted scene camera.

One suitable algorithm for this purpose is the scale-invariant feature transform (SIFT). With this algorithm, features of objects are first extracted from one or more reference images (such as the "reference image" described above) and stored in a database. Each feature is defined by its position in the image and by a feature descriptor vector that captures information about the image area surrounding that position in such a way that features can be compared by the distance between their descriptor vectors. An object is then recognized in a new image by individually comparing each feature of the new image against this database and finding candidate matching features based on the Euclidean distance of their feature vectors. From the full set of matches, subsets of features that agree on the object and on its location, scale and orientation in the new image are identified in order to filter out the good matches. Each cluster of four or more features that agree on an object and its pose is then subjected to more detailed model verification, after which outliers are discarded. Finally, the probability that a particular set of features indicates the presence of the object is computed, given the accuracy of fit and the number of probable false matches. Object or feature matches that pass all these tests can be identified as correct with high confidence.
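The candidate-matching step described above, which compares descriptor vectors by Euclidean distance, might be sketched as follows. This is a simplified illustration and not the SIFT implementation itself: the descriptors are plain lists of numbers, the function names are hypothetical, and the ratio check used to reject ambiguous candidates (with an assumed threshold of 0.8) is an addition for this example that the text above does not prescribe.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(scene_desc, ref_desc, max_ratio=0.8):
    """Return (scene_index, ref_index) pairs of candidate matches.

    For each scene descriptor the nearest reference descriptor is taken.
    A candidate is dropped when the second-nearest reference descriptor is
    almost as close (assumed ratio check, not taken from the source)."""
    matches = []
    for i, d in enumerate(scene_desc):
        ranked = sorted((euclidean(d, r), j) for j, r in enumerate(ref_desc))
        if not ranked:
            continue
        best_dist, best_j = ranked[0]
        if len(ranked) > 1 and best_dist > max_ratio * ranked[1][0]:
            continue  # ambiguous: two reference features are similarly close
        matches.append((i, best_j))
    return matches
```

The resulting pairs are only candidates; the clustering, model verification and outlier rejection described above would still have to follow.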

A more detailed description of the SIFT algorithm can be found, for example, in US Patent No. 6,711,293.

Using the SIFT algorithm, the characteristic reference features can be re-identified or recognized in the image SC2 shown in FIG. 1.

Therefore, even when the head-mounted camera has moved, a mapping can be determined that specifies which point in the reference image SC corresponds to which point in the scene image taken at SC2. To this end, a homography between the features of the reference image and the features re-identified in the scene image is determined as the mapping, for example by using a robust estimator such as RANSAC, which estimates the homography from some or all of the matched features and discards outliers that lie too far from the implicitly represented virtual transfer plane. In most cases this plane is the average plane of the scene object points corresponding to the features, as shown in FIG. 1.
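A robust homography estimate of the kind mentioned above might be sketched as follows in plain Python. This is an illustrative sketch under stated assumptions, not the estimator of the embodiment: the homography is normalized so that its last entry equals 1, each RANSAC hypothesis is fitted to exactly four correspondences, and the function names, the inlier tolerance of 2 pixels, the iteration count and the fixed random seed are all choices made for this example.

```python
import math
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_4(src, dst):
    """3x3 homography (last entry fixed to 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(h, p):
    """Map the point p through the homography h (with perspective division)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def ransac_homography(scene_pts, ref_pts, tol=2.0, iterations=200, seed=0):
    """Return (homography, inlier indices) mapping scene points to reference
    points, keeping the hypothesis supported by the most correspondences."""
    if len(scene_pts) < 4:
        raise ValueError("at least four correspondences are required")
    rng = random.Random(seed)
    indices = range(len(scene_pts))
    best_h, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(indices, 4)
        try:
            h = homography_from_4([scene_pts[i] for i in sample],
                                  [ref_pts[i] for i in sample])
            inliers = []
            for i in indices:
                u, v = apply_h(h, scene_pts[i])
                if math.hypot(u - ref_pts[i][0], v - ref_pts[i][1]) <= tol:
                    inliers.append(i)
        except (ValueError, ZeroDivisionError):
            continue  # degenerate sample, try another one
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

Correspondences whose scene points lie far from the dominant plane are left out of the inlier set, which mirrors the discarding of outliers described above. A practical system would typically refit the homography to all inliers afterwards.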

This mapping can then be used to map the gaze direction to a gaze point in the reference image. Based on the determined gaze direction, the actual gaze point is first determined in the scene image (based on the initial calibration), and then the corresponding point in the reference image is determined (based on the homography). This is also illustrated in FIG. 1 and is explained in more detail below.

A detected gaze direction corresponds to a certain point in the later scene image SC2, as indicated by the line intersecting the image plane of the image taken by camera SC2. This correspondence was obtained during calibration, which, as described above in connection with the calibration procedure, yields a mapping from gaze directions to points in the scene image. As can be seen from FIG. 1, this point in the scene image plane, identified by the gaze direction, corresponds to a certain point on the virtual transfer plane of FIG. 1, and this point on the virtual transfer plane in turn corresponds to a certain point in the reference image taken by camera SC at its initial position during calibration. The mapping that transfers points of the scene image SC2 to the reference image SC is given by the homography, which is determined from the recognition or re-identification of the features, preferably by robust estimation.

As can be seen from the above description, according to one embodiment a set of features (points, corners, edges and the like) that are identified in the reference image and re-identified in the scene image is used, and these features together define a "virtual transfer plane". A camera image is a projection of this virtual transfer plane onto the image plane of the camera, so the positions of the features differ between camera images taken from different camera positions. The virtual transfer plane may be regarded as a "scene-stable plane" in the sense that, even when the scene image changes because the camera moves, it can be "re-identified" in a later scene image by re-identifying the features in the later scene image taken by the head-mounted camera from a different position. Image processing of the images of the head-mounted scene camera is used first to define the characteristic reference features and then to find or re-identify them in later scene images by feature detection, instead of relying on artificial IR light sources that would have to be installed in the real-world scene. "Natural", unmodified images of the real-world scene taken with an ordinary camera can thus be used, without artificial markers or additional IR detectors, and the gaze point can be determined without any external markers. Even when the head-mounted camera moves, gaze point detection can be performed on the basis of unaltered scene images alone, without any artificial additions.

In the example above, the set of characteristic reference features was defined by selecting the rectangle surrounding the bookshelf as a "characteristic region" or "region of interest" that, as shown in FIG. 1, defines the features that lie within it and are identified by the feature detection algorithm.

However, any other region of interest of the reference image that contains features suitable for re-identification in later scene images may be selected. For example, the four front corners of the bookshelf may be chosen as feature points by selecting them with the mouse, thereby defining an area of interest around each of them that is then used for running a feature detection algorithm, in this case for example a corner detection algorithm. Such corners may then be re-identified in later scene images by a suitable corner detection algorithm known to the person skilled in the art, such as the Moravec corner detection algorithm, the Harris & Stephens corner detection algorithm or the Plessey corner detection algorithm. Furthermore, the SIFT algorithm described in the example above may also be used.

Another algorithm for feature identification and feature recognition that may be used instead of, or in combination with, the SIFT algorithm (either on the entire reference image or on one or more regions of interest defined in it) is the "speeded-up robust features" (SURF) algorithm. As its feature descriptor, SURF uses sums of approximated 2D Haar wavelet responses around the feature position and makes efficient use of integral images. It finds feature positions using an integer approximation to the determinant-of-Hessian blob detector, which can be computed very quickly with an integral image. A description of this algorithm can be found, for example, in Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008.
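The efficiency attributed to integral images can be illustrated with a short sketch. Once the integral image (summed-area table) has been built, the sum over any axis-aligned box takes four lookups regardless of the box size, so a Haar-like response reduces to the difference of two box sums. The function names and the simple horizontal response are assumptions for this example and do not reproduce the SURF descriptor.

```python
def integral_image(img):
    """Summed-area table with one extra leading row and column of zeros,
    so that ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, left, top, right, bottom):
    """Sum of pixels with left <= x < right and top <= y < bottom."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_x(ii, left, top, size):
    """Horizontal Haar-like response: right half minus left half of a box."""
    half = size // 2
    return (box_sum(ii, left + half, top, left + size, top + size)
            - box_sum(ii, left, top, left + half, top + size))
```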

As the feature detection algorithm, any image processing algorithm may be used that is capable of detecting features such as points, blobs, regions or objects in the reference image and that is further capable of recognizing or re-identifying, in a scene image taken later, the features detected in the reference image, even when that scene image was taken from a different position. The image processing algorithm must be capable of such feature detection on the basis of "natural" images taken by a conventional camera. This avoids the need for artificial IR light sources and for additional detectors. Instead, a feature detection algorithm is chosen that can detect inherent features in any "natural" scene image taken by an ordinary camera, provided the scene image has a certain contrast and is not completely uniform but contains shapes and contours. With such a feature detection algorithm, the gaze point mapping approach of the embodiments of the invention can in principle be performed on any scene image, without artificially enriching the scene or the scene images with IR light sources. Several image processing algorithms can fulfil this function, ranging from rather simple ones such as corner detection algorithms to more sophisticated ones such as the SIFT algorithm or the SURF algorithm. What these algorithms have in common is that they can detect and use features inherent to the image, without artificial IR sources acting as markers, and can therefore operate on natural images without any need for additional IR sources or additional IR detectors.

According to one embodiment, such features are determined automatically by the feature detection algorithm, but this may be supported manually, for example by selecting (with a mouse or any other input or selection device) the region of interest on which the feature detection algorithm is to be run, as described above.

According to another embodiment, however, characteristic features such as points, blobs or regions can also be determined fully automatically from the reference image. For example, the SIFT algorithm mentioned above can identify feature points (or features), so-called "keypoints", that are re-identified in later images. After the rectangle shown in FIG. 1 has been selected, the SIFT algorithm can therefore automatically identify the keypoints within this rectangle. If, instead of selecting a rectangle as described above, the entire reference image forms the basis for feature identification by the SIFT algorithm and thereby defines the virtual transfer plane, a fully automatic procedure is obtained.

To summarize the above description of embodiments, the mapping of the eye's gaze direction to a reference image of the scene can be performed as follows.

First, the gaze from the eye tracker ET is mapped to the scene camera reference image SC using a standard procedure.

Next, a set of features in the reference image is defined using the feature detection algorithm.

Next, the positions of the characteristic reference features are found in another scene camera image SC2 taken from a different position.

Then the transformation (homography) from the scene image plane to the virtual transfer plane is obtained or computed. Using this transformation, a gaze direction (and the corresponding point in the scene image obtained from the calibration procedure) can be mapped to the corresponding gaze point in the reference image.

Note that the other scene images may come not only from other positions and other times but also from other participants (that is, other persons wearing the head-mounted camera and eye tracker). Even when the head-mounted camera is worn by a different subject in a subsequent measurement sequence, no new calibration (that is, generation of a new reference image or of its features) is necessary.

In this way, the gaze direction can be mapped to the corresponding gaze point on the reference image over time (and even across different participants), so that the gaze point can be recorded over time even when the user's head moves or when the users differ.

Once the mapping of the gaze to the reference image has been obtained, the gaze can be displayed in the reference image, for example by highlighting the location at which the user's gaze is directed or by displaying a symbol such as a cross or another mark at the gaze location. If the measurement is carried out over a period of time, the corresponding evolution of the gaze point can be displayed by moving the mark or symbol in the reference image. This is a visualization of the subject's gaze in the reference image.

However, the initial reference image may not be the best image for such a visualization. For example, the initial reference image may show the bookshelf in a perspective view rather than a front view, as in FIG. 1. For visualizing the subject's gaze, it may be preferable to show the gaze point in an image that presents the bookshelf in a front view rather than in a perspective view.

For this purpose, according to one embodiment, a "final reference image" or "visualization image" may be generated that offers the "desired view" of the scene in which the gaze points are to be determined and tracked.

To generate such a "final reference image", one may, for example, select a suitable scene image that shows the scene in the desired view. For this scene image, the homography that maps between the initial reference image and the scene image has already been determined during the actual measurement phase.

Using this transformation, all gaze points can be mapped from the initial reference image to the "final reference image" or "visualization image" that corresponds to the selected scene image.
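If the homography determined during measurement is stored in the direction from the selected scene image to the initial reference image, mapping gaze points in the opposite direction requires its inverse. The following sketch shows one way this could be done. It assumes a homography held as a 3x3 nested list, and the function names are hypothetical.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via its adjugate and determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is not invertible")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def apply_homography(h, point):
    """Map a point through a homography, including the perspective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def map_gaze_points(points, h_scene_to_ref):
    """Map gaze points given in the reference image into the selected scene
    image by applying the inverse of the scene-to-reference homography."""
    h_ref_to_scene = invert_3x3(h_scene_to_ref)
    return [apply_homography(h_ref_to_scene, p) for p in points]
```

Because a homography is defined only up to scale, the inverse does not need to be renormalized before use; the perspective division removes any common factor.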

If no scene image shows the scene in the most desirable way, an "external" or "additional" image of the scene may be taken as the final reference image, possibly with a different camera, for example at a higher resolution and from the desired viewpoint. The feature detection algorithm is then run on this final reference image to detect or re-identify the characteristic reference features. Based on the recognized features, the corresponding transformation from the initial reference image to the final reference image (the visualization image) is again determined, and this transformation is used to transform or map the gaze points detected in the initial reference image to the visualization image.

According to one embodiment, the final reference image is generated so that it shows the scene in the way best suited to presenting the gaze points once they have been transformed into the final reference image. This is the case, for example, when the object on which gaze points are determined is shown in a front view rather than in a perspective view, or when the final reference image is larger or has a higher resolution than the scene images.

An example in which the final reference image is geometrically modified relative to the initial reference image in order to provide a more frontal view of the scene is shown schematically in FIG. 2. FIG. 2 shows the initial reference image of FIG. 1 and a final reference image that provides a more frontal view than the initial reference image. Also shown schematically is the projection that maps the initial reference image onto the final reference image and that corresponds to the homography or transformation determined to obtain the mapping. Another example of a geometric modification is a change in the aspect ratio of the image.

According to another embodiment, the final reference image may be obtained by removing distortions from the initial reference image, for example camera distortions such as barrel distortion.

According to one embodiment, generating the final reference image or visualization image may comprise stitching images. It may also comprise modifying or stylizing one or more scene images, for example by marking certain areas or parts or by sketching a stylized drawing. In one embodiment, the visualization image may even be a hand-made drawing.

Depending on the type of image used as the visualization image, different methods may be used to determine the point transfer mapping from the reference image to the visualization image. If, for example, the visualization image was taken with the same camera as the reference image, the feature detection algorithm described above may be used for this purpose.

If, however, the visualization image is a stylized image, or even a sketch or a hand-made drawing, the point transfer mapping may be defined manually, for example by mapping triangles or by mapping point by point.
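One way a manually defined triangle mapping could work is sketched below. The operator is assumed to have paired triangles in the reference image with triangles in the sketch; a gaze point is then expressed in barycentric coordinates of the reference triangle that contains it and rebuilt with the same weights in the paired triangle. The function names and the data layout are assumptions made for this example.

```python
def barycentric(p, tri):
    """Barycentric weights of point p with respect to triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def map_through_triangles(p, triangle_pairs, eps=1e-9):
    """Map p from the reference image to the visualization image.

    triangle_pairs is a list of (reference_triangle, visualization_triangle)
    tuples. Returns None when p lies in none of the reference triangles."""
    for src, dst in triangle_pairs:
        weights = barycentric(p, src)
        if all(w >= -eps for w in weights):
            return tuple(sum(w * v[k] for w, v in zip(weights, dst))
                         for k in (0, 1))
    return None
```

Within each triangle this amounts to an affine mapping, so it can follow a hand-drawn sketch piece by piece even where no single homography would fit.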

According to one embodiment, the mapped gaze points are visualized not in a single visualization image but in the frames of a video sequence, which act as a sequence of visualization images.

To this end, a point transfer mapping between the reference image and each frame of the video sequence is determined, and this mapping can be used to map the gaze points of one or more users over time into the frames of the video sequence. In this way, gaze points captured and recorded over time with the head-mounted video camera can be mapped, via the reference image, to another video taken by a different camera from a different position or viewpoint. The gaze of one or more users, as it evolves over time, can thus be mapped into the same video of the scene, taken from a different and possibly more desirable viewpoint.

Once the final reference image has been generated, the feature detection algorithm can be run on it to detect the characteristic reference features in it. By computing the homography between the features, the mapping of the gaze points to the final reference image can then be obtained. Other means of determining a mapping that transforms points from the initial reference image to their corresponding positions in the final reference image may also be used, such as defining the homography by hand (for example, by specifying four point correspondences) or defining the transformation point by point.

As described above in connection with other embodiments, for automatic image processing to detect the features, the part of the image on which feature detection is performed may be characterized by an area (a region of interest) or by the content of that area, which defines the region in which the algorithm looks for features or the objects that can be identified in the content of the area of interest. The homography is computed from the characteristic reference features in another scene image and the characteristic reference features in the reference image, which define an implicit virtual scene plane that facilitates the point transfer. The homography transforms all points of the scene image (which correspond to points of the virtual transfer plane in the real-world scene; points off the plane are subject to parallax that depends on their distance from the plane) into points of the reference image, and by extension of any further reference image, and thereby reduces the mapping of points for a moving person wearing a head-mounted eye tracker to the case of a fixed eye tracker. Gaze tracking can therefore be performed even when the head-mounted camera moves.

According to one embodiment, by defining a certain object (or one or more objects) in the reference image through its shape or contour (for example, through its boundary), it can be checked for a given gaze point whether that gaze point lies within the boundary of the object. If it does, it can be determined that the gaze rests on the corresponding object; if it does not, no such determination is made.
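The boundary test described above might, for example, be carried out with a standard ray-casting point-in-polygon check, as in the following sketch. The object outlines are assumed to be given as lists of polygon vertices in reference-image coordinates; the function names and the dictionary layout are choices made for this example.

```python
def point_in_polygon(p, polygon):
    """Ray casting: count how many polygon edges a horizontal ray that starts
    at p and runs to the right crosses; an odd count means p is inside."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fixated_object(gaze, objects):
    """Return the name of the first object whose outline contains the gaze
    point, or None if the gaze point lies in no outline."""
    for name, outline in objects.items():
        if point_in_polygon(gaze, outline):
            return name
    return None
```

The check is applied to gaze points already mapped to the reference image, so the outlines need to be drawn only once.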

Whether the gaze rests on a certain object is determined on the basis of the gaze mapped onto the reference image rather than the gaze determined in the scene images. By replacing all scene images with a single reference image and mapping the gaze onto it (including gaze from different participants and times), the aggregated gaze data can be visualized and statistics can be computed. Analyzing the data of a head-mounted eye tracker thereby becomes the same as analyzing head-fixed eye tracking data (at least for gaze data on the virtual transfer plane).
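As a minimal sketch of such statistics, the following example counts, per participant, how many mapped gaze samples fall into each area of interest of the reference image. The rectangular areas, the sample layout and the function name are assumptions for this example; the source does not specify which statistics are computed.

```python
from collections import Counter


def dwell_counts(samples, areas):
    """Count gaze samples per (participant, area) pair.

    samples: iterable of (participant, (x, y)) with coordinates already
             mapped to the reference image.
    areas:   dict mapping an area name to (left, top, right, bottom), with
             right and bottom exclusive."""
    counts = Counter()
    for participant, (x, y) in samples:
        for name, (left, top, right, bottom) in areas.items():
            if left <= x < right and top <= y < bottom:
                counts[(participant, name)] += 1
    return counts
```

Because all samples share the coordinate system of the single reference image, data from different participants and recording times can be pooled without any further alignment.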

It will be understood by the person skilled in the art that the embodiments described above may be implemented by hardware, by software, or by a combination of software and hardware. The modules and functions described in connection with embodiments of the invention may be implemented in whole or in part by microprocessors or computers that are suitably programmed to act in accordance with the methods described in connection with embodiments of the invention. This may include connecting the computer or microprocessor to suitable interfaces and/or measuring devices as used in the field of eye tracking and image processing, as will be readily understood by the person skilled in the art.

Claims

1. An apparatus for mapping a gaze point of a subject on a scene image to a gaze point in a reference image, wherein the scene image and the reference image have been taken by a camera from different positions, the apparatus comprising:
a module for executing a feature detection algorithm on the reference image to identify a plurality of features and their positions in the reference image;
a module for executing the feature detection algorithm on the scene image to re-identify the plurality of features and their positions in the scene image;
a module for determining, based on the positions of the plurality of features detected in the reference image and in the scene image, a point transfer mapping that transforms point positions between the scene image and the reference image; and
a module for mapping a gaze point determined in the scene image to the corresponding point in the reference image using the point transfer mapping.

2. The apparatus according to claim 1, wherein the point transfer mapping is a homography that is estimated from the features of the reference image and the scene image and that transfers point positions from the scene image to the reference image via a virtual transfer plane in the world scene.

3. The apparatus according to claim 1 or 2, further comprising a module for determining one or more regions of interest in the reference image within which the feature detection algorithm is executed to detect and locate the features therein.

4. The apparatus according to any one of the preceding claims, wherein the feature detection algorithm for detecting the plurality of features in the reference image and for detecting and re-identifying the features in the scene image comprises one of:
a scale-invariant feature transform algorithm;
a speeded-up robust features algorithm.

5. The apparatus according to any one of the preceding claims, further comprising a module for mapping gaze directions from different persons and/or different times to the corresponding gaze points in the reference image and for recording the mapped gaze points over time, also for different users.

6. The apparatus, further comprising a module for displaying, in the reference image, the gaze point mapped from the scene image to the reference image, thereby visualizing the gaze point and the evolution of its position over time.

7. The apparatus according to claim 1, further comprising a module for visualizing the gaze point mapped from the scene image to the reference image in a visualization image different from the reference image, the module comprising:
a module for determining a point transfer mapping for transferring points from the reference image to the visualization image;
a module for mapping the gaze point from the reference image to its corresponding point in the visualization image using the point transfer mapping; and
a module for displaying the gaze point at the corresponding point in the visualization image.

8. The apparatus according to claim 7, further comprising a module for determining the point transfer mapping based on features that are detected in the reference image by the feature detection algorithm and detected and re-identified in the visualization image by the feature detection algorithm.

9. The apparatus according to claim 7 or 8, wherein the visualization image is one of:
an image selected from scene images taken from different positions;
an image generated by stitching two or more scene images taken from different positions;
a geometrically modified image of the scene;
an image of the scene from which distortions have been removed;
an external image taken by a different camera;
an image taken at a time different from the actual measurement of the gaze points;
an image having a higher resolution than the reference image;
a cartoon or a sketch;
a stylized drawing;
a hand-made drawing.

10. The apparatus according to any one of the preceding claims, further comprising a module for visualizing the gazing point in a frame of a video sequence, the module comprising:
A module for determining a point transfer mapping from a point of the reference image to a corresponding point in a video frame of the video sequence; and
A module that uses the point transfer mapping to map a gazing point from the reference image to the corresponding point in the frame of the video sequence.

11. The apparatus of claim 10, further comprising a module for mapping a user's gaze point, detected in a scene image taken by a first camera, to a corresponding position in a corresponding frame of a video taken by a second camera from a position different from that of the first camera.

12. The apparatus according to any one of the preceding claims, further comprising an eye tracking module that tracks the gaze of the subject, and/or a head-mounted camera that takes the scene image, and/or a calibration module that maps a gazing point to a corresponding point in the scene image taken by the head-mounted camera.

13. A method for mapping a subject's gazing point on a scene image to a gazing point in a reference image, wherein the scene image and the reference image are taken by a camera from different positions, the method comprising:
Performing a feature detection algorithm on the reference image to identify a plurality of characteristics and their positions in the reference image;
Performing the feature detection algorithm on the scene image to re-identify the plurality of characteristics and their positions in the scene image;
Determining a point transfer mapping for converting point positions between the scene image and the reference image based on the positions of the plurality of characteristics detected in the reference image and the scene image; and
Mapping the gazing point determined in the scene image to a corresponding point in the reference image using the point transfer mapping.
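The last two steps of the method can be sketched as follows, assuming that the point transfer mapping is modeled as a planar homography and that exactly four matched characteristic positions are available. Both are simplifying assumptions for illustration, and the function names `solve_linear`, `estimate_homography` and `map_point` are hypothetical. A practical system would typically use many more matches together with a robust estimator.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def estimate_homography(scene_pts, ref_pts):
    """Estimate a 3x3 homography (bottom-right entry fixed to 1) from 4 point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(scene_pts, ref_pts):
        # Two linear equations per correspondence (x, y) -> (u, v).
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def map_point(h, pt):
    """Transfer a point from the scene image to the reference image."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Positions of four re-identified characteristics (toy values).
scene_pts = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
ref_pts = [(10.0, 20.0), (210.0, 20.0), (210.0, 220.0), (10.0, 220.0)]

h = estimate_homography(scene_pts, ref_pts)
mapped_gaze = map_point(h, (50.0, 50.0))  # close to (110.0, 120.0)
print(mapped_gaze)
```

The gaze point itself never needs to coincide with a detected characteristic; it is transferred by the mapping that the surrounding characteristics determine.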

14. A method according to claim 13, further comprising one or more steps corresponding to the features defined in any one of claims 2 to 12.

15. A computer program comprising computer program code which, when executed on a computer, causes the computer to execute the method according to claim 13 or claim 14.