JP2012146199A

JP2012146199A - Sight line position detection device, sight line position detection method, and computer program

Info

Publication number: JP2012146199A
Application number: JP2011005080A
Authority: JP
Inventors: Tetsuaki Kurokawa; 哲明黒川; Susumu Shiroyama; 晋白山; Hideyuki Kubota; 秀行窪田; Takakazu Kobayashi; 敬和小林
Original assignee: Nippon Steel Corp
Current assignee: Nippon Steel Corp
Priority date: 2011-01-13
Filing date: 2011-01-13
Publication date: 2012-08-02
Anticipated expiration: 2031-01-13
Also published as: JP5549605B2

Abstract

PROBLEM TO BE SOLVED: To easily and reliably detect what a user is looking at, based on the user's field-of-view image. SOLUTION: SIFT feature points S^i of a field-of-view camera image and SIFT feature points T^j of a visual target image are extracted, and corresponding points between the extracted SIFT feature points S^i and T^j are derived. Delaunay triangles are then formed. Their vertices are the SIFT feature points S^i of the field-of-view camera image that were found as corresponding points. From the vertices of these Delaunay triangles, three points near the operator's line-of-sight position are selected as line-of-sight neighborhood feature points A, B, and C. Finally, the point P' of the operator's line-of-sight position in the visual target image is determined from the coordinates of the SIFT feature points A', B', and C' in the visual target image that correspond to A, B, and C. P' is chosen so that it has the same positional relationship to A', B', and C' as the point P of the operator's line-of-sight position in the field-of-view camera image has to A, B, and C.

Description

The present invention relates to a line-of-sight position detection device, a line-of-sight position detection method, and a computer program. It is particularly suitable for detecting a user's line-of-sight position (actual point of gaze).

In recent years, manufacturing industries such as the steel industry have increasingly applied information processing technology to improve production efficiency, ensure safety, and reduce environmental impact. One such application analyzes gaze measurement data to make explicit certain actions of skilled workers. These are actions that rely on the workers' own experience and intuition and are not described in workflows or similar documents. The knowledge behind such actions is called "tacit knowledge". If it can be made explicit, it can be passed on to unskilled workers without the skilled workers having to teach them directly.

Against this background, Non-Patent Document 1 describes a technique for analyzing gaze measurement data. In that technique, the position of the field-of-view camera is obtained from the field-of-view camera image. In principle, if the three-dimensional coordinates of the measurement target are known, the technique can identify what the user was looking at.

Non-Patent Document 1: Yuji Kobashi et al., "Three-dimensional gaze point estimation using natural feature points for a head-mounted gaze measurement device", IEICE Technical Report (Human Information Processing), Vol. 109, No. 261, pp. 5-10, 2009.
Non-Patent Document 2: D. G. Lowe, "Object recognition from local scale-invariant features", International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157.
Non-Patent Document 3: Kokichi Sugihara, "Mathematical Models of Territories: An Introduction to Mathematical Engineering from Voronoi Diagrams", Kyoritsu Shuppan (2009).

However, the technique described in Non-Patent Document 1 has two drawbacks. It is difficult to achieve high measurement accuracy with it, and it only works when the three-dimensional coordinates of the measurement target are known. For these reasons it is not suitable for practical use. At present, what the user was looking at is therefore identified by manual visual inspection.
The present invention has been made in view of these problems. Its object is to detect easily, reliably, and automatically what a user is looking at, based on the user's field-of-view image.

The line-of-sight position detection device of the present invention comprises the following means.
Visual target image acquisition means acquire a visual target image, which is a two-dimensional image of the region the user is to look at.
Field-of-view image acquisition means acquire a field-of-view image, which is a two-dimensional image captured by imaging means worn by the user.
Visual target image feature point extraction means extract feature points in the visual target image. These feature points are used to associate the visual target image acquired by the visual target image acquisition means with the field-of-view image acquired by the field-of-view image acquisition means.
Field-of-view image feature point extraction means extract feature points in the field-of-view image for the same association.
Corresponding point extraction means extract corresponding points. Each corresponding point consists of a feature point of the field-of-view image, extracted by the field-of-view image feature point extraction means, and the feature point of the visual target image, extracted by the visual target image feature point extraction means, whose feature quantity corresponds to it.
Line-of-sight neighborhood feature point extraction means extract three line-of-sight neighborhood feature points. These are selected from the feature points of the field-of-view image that were extracted as corresponding points, and they form a triangle containing the user's line-of-sight position in the field-of-view image acquired by the field-of-view image acquisition means.
On-target-image line-of-sight position derivation means derive the position in the visual target image that corresponds to the user's line-of-sight position in the field-of-view image. This position is taken as the user's line-of-sight position in the visual target image. It is derived as follows. The coordinates of the user's line-of-sight position in the field-of-view image are measured in the coordinate system defined by the three line-of-sight neighborhood feature points. The derived position is the point that has the same coordinates in the coordinate system defined by the three corresponding feature points of the visual target image.

The line-of-sight position detection method of the present invention comprises the following steps.
A visual target image acquisition step acquires a visual target image, which is a two-dimensional image of the region the user is to look at.
A field-of-view image acquisition step acquires a field-of-view image, which is a two-dimensional image captured by imaging means worn by the user.
A visual target image feature point extraction step extracts feature points in the visual target image. These feature points are used to associate the visual target image acquired in the visual target image acquisition step with the field-of-view image acquired in the field-of-view image acquisition step.
A field-of-view image feature point extraction step extracts feature points in the field-of-view image for the same association.
A corresponding point extraction step extracts corresponding points. Each corresponding point consists of a feature point of the field-of-view image, extracted in the field-of-view image feature point extraction step, and the feature point of the visual target image, extracted in the visual target image feature point extraction step, whose feature quantity corresponds to it.
A line-of-sight neighborhood feature point extraction step extracts three line-of-sight neighborhood feature points. These are selected from the feature points of the field-of-view image that were extracted as corresponding points, and they form a triangle containing the user's line-of-sight position in the field-of-view image acquired in the field-of-view image acquisition step.
An on-target-image line-of-sight position derivation step derives the position in the visual target image that corresponds to the user's line-of-sight position in the field-of-view image. This position is taken as the user's line-of-sight position in the visual target image. It is derived as follows. The coordinates of the user's line-of-sight position in the field-of-view image are measured in the coordinate system defined by the three line-of-sight neighborhood feature points. The derived position is the point that has the same coordinates in the coordinate system defined by the three corresponding feature points of the visual target image.

A computer program of the present invention causes a computer to execute each step of the line-of-sight position detection method.

According to the present invention, corresponding points between the feature points of the field-of-view image and those of the visual target image are first extracted. From the field-of-view feature points extracted as corresponding points, three points are selected that form the vertices of a triangle containing the user's line-of-sight position in the field-of-view image. These three points are taken as the line-of-sight neighborhood feature points. Next, the coordinates of the user's line-of-sight position are measured in the coordinate system defined by these three points. The user's line-of-sight position in the visual target image is derived as the point with the same coordinates in the coordinate system defined by the three corresponding feature points of the visual target image. As a result, what the user is looking at can be detected easily, reliably, and automatically from the user's field-of-view image.
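The phrase "the same coordinates as seen from the coordinate system defined by three points" can be read as an affine frame. In that frame, P = A + u(B - A) + v(C - A), and the same (u, v) are reused with A', B', C'. The sketch below assumes this interpretation, which is equivalent to barycentric coordinates. The function name `map_gaze_point` is hypothetical and is not taken from the patent.

```python
def map_gaze_point(p, tri_src, tri_dst):
    """Map gaze point P in the field-of-view image to P' in the visual target image.

    p       : (x, y) of P in the field-of-view camera image
    tri_src : vertices A, B, C (field-of-view image) of a triangle containing P
    tri_dst : corresponding vertices A', B', C' (visual target image)
    """
    (ax, ay), (bx, by), (cx, cy) = tri_src
    # Axes of the source frame: e1 = B - A, e2 = C - A
    e1x, e1y = bx - ax, by - ay
    e2x, e2y = cx - ax, cy - ay
    det = e1x * e2y - e2x * e1y
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate")
    dx, dy = p[0] - ax, p[1] - ay
    # Solve u*e1 + v*e2 = P - A by Cramer's rule
    u = (dx * e2y - e2x * dy) / det
    v = (e1x * dy - dx * e1y) / det
    (ax2, ay2), (bx2, by2), (cx2, cy2) = tri_dst
    # Reuse (u, v) in the frame A'; B' - A', C' - A'
    return (ax2 + u * (bx2 - ax2) + v * (cx2 - ax2),
            ay2 + u * (by2 - ay2) + v * (cy2 - ay2))
```

For example, `map_gaze_point((1.0, 1.0), [(0, 0), (4, 0), (0, 4)], [(10, 10), (18, 10), (10, 18)])` gives `(12.0, 12.0)`. When P lies inside triangle ABC, u ≥ 0, v ≥ 0, and u + v ≤ 1. The mapping is exact for an affine distortion between the two images and only approximate under perspective. This is one reason to use small triangles near the gaze point.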

FIG. 1 is a diagram showing an example of the functional configuration of a line-of-sight position detection device according to an embodiment of the present invention.
FIG. 2 is a diagram (photograph) conceptually showing an example of SIFT feature points of a field-of-view camera image and SIFT feature points of a visual target image in the embodiment.
FIG. 3 is a diagram (photograph) conceptually showing an example of pairs of corresponding points after mismatched pairs have been removed in the embodiment.
FIG. 4 is a diagram showing an example of a Voronoi diagram and an example of Delaunay triangles obtained by Delaunay triangulation in the embodiment.
FIG. 5 is a diagram showing an example of the point of the worker's line-of-sight position in the visual target image that corresponds to the point of the worker's line-of-sight position in the field-of-view camera image.
FIG. 6 is a diagram (photograph) conceptually showing an example of the point of the worker's line-of-sight position in the visual target image.
FIG. 7 is a flowchart explaining an example of the operation of the line-of-sight position detection device.
FIG. 8 is a flowchart explaining the details of step S706 in FIG. 7.
FIG. 9 is a flowchart explaining the details of step S707 in FIG. 7.
FIG. 10 is a flowchart explaining the details of step S708 in FIG. 7.
FIG. 11 is a flowchart explaining the details of step S709 in FIG. 7.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a diagram showing an example of the functional configuration of the line-of-sight position detection device.

In this embodiment, a two-dimensional image of the region that the worker (user) is to look at is captured in advance with a digital camera. Hereinafter this image is referred to as the "visual target image" where appropriate. In addition, a head-mounted field-of-view camera is attached to the worker's head and captures a two-dimensional image of the worker's field of view. Hereinafter this image is referred to as the "field-of-view camera image" where appropriate.

The line-of-sight position detection device 100 receives the visual target image and the field-of-view camera image. It detects the position on the visual target image that corresponds to the line-of-sight position in the field-of-view camera image (the worker's point of gaze in that image). This position is output as the line-of-sight position on the visual target image.

The functions of the line-of-sight position detection device 100 are described in detail below. The hardware of the device 100 can be implemented, for example, by an information processing apparatus equipped with a CPU, ROM, RAM, HDD, and various interfaces.

(Image acquisition unit 101)
The image acquisition unit 101 captures the field-of-view camera image (a moving image) taken by the head-mounted field-of-view camera that the worker wears. The head-mounted field-of-view camera images the worker's field of view and can be realized with known technology, so its details are omitted here. In the following description, the "head-mounted field-of-view camera" is abbreviated to "field-of-view camera" where appropriate.
The image acquisition unit 101 also captures the visual target image taken with the digital camera.
The image acquisition unit 101 can be implemented, for example, as follows. An image interface receives the field-of-view camera image signal from the field-of-view camera and the visual target image signal from the digital camera. The CPU then stores these image signals in VRAM or the like.

(Feature point extraction unit 102)
The feature point extraction unit 102 extracts feature points (feature vectors) from the field-of-view camera image and the visual target image acquired by the image acquisition unit 101. For the field-of-view camera image, it extracts feature points from every frame.

In this embodiment, the Scale-Invariant Feature Transform (SIFT) is used to extract image feature points. SIFT selects pixels of interest in an image as feature points and computes a feature quantity from the luminance distribution in the region around each feature point. The resulting features are robust to rotation, changes in scale, and the like. For this reason SIFT is mainly used in applications such as image retrieval, where the same image is searched for in a database. The SIFT algorithm is a known technique described in Non-Patent Document 2 and elsewhere, so a detailed description is omitted here.

FIG. 2 is a diagram (photograph) conceptually showing an example of the SIFT feature points of a field-of-view camera image (FIG. 2(a)) and of a visual target image (FIG. 2(b)). In the example, the worker is looking at a monitor that displays the operating state of a blast furnace. The SIFT feature points are indicated by white circles.

The SIFT feature points S^i of the field-of-view camera image are expressed by equation (1) below. The SIFT feature points T^j of the visual target image are expressed by equation (2) below. Both S^i and T^j are 128-dimensional vectors.
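Equations (1) and (2) did not survive extraction. Both are described as 128-dimensional vectors indexed by i and j, so they presumably have the form below. The component symbols s and t are an assumed notation, not the patent's own.

```latex
S^{i} = \left(s^{i}_{1},\, s^{i}_{2},\, \ldots,\, s^{i}_{128}\right) \qquad (1)
T^{j} = \left(t^{j}_{1},\, t^{j}_{2},\, \ldots,\, t^{j}_{128}\right) \qquad (2)
```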

In equation (1), i = 1, 2, ..., N, where N is the number of SIFT feature points in the field-of-view camera image. In equation (2), j = 1, 2, ..., M, where M is the number of SIFT feature points in the visual target image.
The feature point extraction unit 102 can be implemented, for example, as follows. The CPU reads the data of the field-of-view camera image and the visual target image from VRAM or the like. It extracts their SIFT feature points with the SIFT algorithm and stores information on the extracted feature points in RAM or the like.

(Feature point matching unit 103)
The feature point matching unit 103 computes the error between the feature quantity of each SIFT feature point S^i of the field-of-view camera image and that of each SIFT feature point T^j of the visual target image. This error is their Euclidean distance d(i, j), given by equation (3) below.
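Equation (3) is missing from this copy. Let s^i_k and t^j_k denote the k-th components of the two 128-dimensional descriptors (an assumed notation). The Euclidean distance described in the text is then:

```latex
d(i,j) = \sqrt{\sum_{k=1}^{128} \left(s^{i}_{k} - t^{j}_{k}\right)^{2}} \qquad (3)
```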

Next, the feature point matching unit 103 searches the visual target image for the SIFT feature point T^{j(i,0)} whose Euclidean distance d(i, j) to S^i is smallest. It also searches for the SIFT feature point T^{j(i,1)} whose Euclidean distance d(i, j) to S^i is second smallest.
The feature point matching unit 103 then compares two distances: d(i, j(i,0)) between S^i and T^{j(i,0)}, and d(i, j(i,1)) between S^i and T^{j(i,1)}. It determines whether equation (4) below is satisfied.

If equation (4) is satisfied, the feature point matching unit 103 sets S^i and T^{j(i,0)} as corresponding points whose feature quantities correspond to each other. If equation (4) is not satisfied, they are not set as corresponding points. In other words, in this embodiment S^i and T^{j(i,0)} become corresponding points only when the difference between d(i, j(i,0)) and d(i, j(i,1)) is larger than a predetermined value. The value "0.5" in equation (4) is only an example, and other values may be used.

The feature point matching unit 103 performs the following operations for every SIFT feature point S^i of the field-of-view camera image: computing the Euclidean distances d(i, j), searching for T^{j(i,0)} and T^{j(i,1)}, evaluating equation (4), and setting corresponding points.
The feature point matching unit 103 then checks whether several corresponding points (several SIFT feature points T^{j(i,0)} of the visual target image) have been set for the same SIFT feature point S^i. If so, it keeps the pair with the smallest Euclidean distance d(i, j) and removes the other pairs.

The feature point matching unit 103 can be implemented, for example, as follows. The CPU reads the information on the SIFT feature points S^i of the field-of-view camera image and the SIFT feature points T^j of the visual target image from RAM or the like. As described above, it obtains pairs of corresponding points, each consisting of an S^i and the T^{j(i,0)} whose feature quantity corresponds to it. It then stores the information on these corresponding points in RAM or the like.
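The matching procedure above can be sketched as follows. Equation (4) did not survive extraction, so this sketch assumes the common nearest/second-nearest ratio form d(i, j(i,0)) < 0.5 · d(i, j(i,1)), taking 0.5 from the text. Each S^i has only one nearest neighbour, so the sketch also assumes that the duplicate case the text resolves arises when several S^i select the same T^j. The name `match_descriptors` is hypothetical.

```python
import math


def match_descriptors(src, dst, ratio=0.5):
    """Nearest/second-nearest descriptor matching with one-to-one resolution.

    src   : descriptors S^i of the field-of-view camera image
    dst   : descriptors T^j of the visual target image (at least two)
    ratio : threshold in the assumed test d0 < ratio * d1
    Returns a dict {i: j} in which each target index j is used at most once.
    """
    candidates = []  # (distance, i, j) for pairs that pass the test
    for i, s in enumerate(src):
        # Euclidean distance d(i, j) to every target descriptor
        dists = sorted((math.dist(s, t), j) for j, t in enumerate(dst))
        (d0, j0), (d1, _) = dists[0], dists[1]
        if d0 < ratio * d1:  # nearest clearly better than second nearest
            candidates.append((d0, i, j0))
    # If several S^i chose the same T^j, keep only the closest pair
    best = {}
    for d, i, j in candidates:
        if j not in best or d < best[j][0]:
            best[j] = (d, i)
    return {i: j for j, (d, i) in best.items()}
```

The descriptors are plain sequences, so the same code works for 128-dimensional SIFT vectors or for short test vectors. The brute-force search costs O(N·M) distance evaluations. Practical systems often use an approximate nearest-neighbour index instead.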

(Mismatch removal unit 104)
The mismatch removal unit 104 randomly selects four of the pairs of corresponding points obtained by the feature point matching unit 103. Each pair consists of a SIFT feature point S^i of the field-of-view camera image and the SIFT feature point T^{j(i,0)} of the visual target image whose feature quantity corresponds to it.
Using the image coordinates of the four selected pairs, the mismatch removal unit 104 then derives the projective transformation parameters a₁ to a₈ in equations (5) and (6) below.
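Equations (5) and (6) are likewise missing. An eight-parameter projective transformation is conventionally written as below. Which of a₁ to a₈ occupies which position is an assumption, not taken from the patent.

```latex
x' = \frac{a_1 x + a_2 y + a_3}{a_7 x + a_8 y + 1} \qquad (5)
y' = \frac{a_4 x + a_5 y + a_6}{a_7 x + a_8 y + 1} \qquad (6)
```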

In equations (5) and (6), (x', y') are coordinates on the visual target image and (x, y) are coordinates on the field-of-view camera image.

Next, the mismatch removal unit 104 selects a pair of corresponding points that was not used to derive a₁ to a₈. It substitutes the coordinates (x, y) of the pair's SIFT feature point S^i, together with a₁ to a₈, into equations (5) and (6). This projective transformation yields coordinates (x', y') on the visual target image.

The unit then computes the projection error between these derived coordinates and the actual coordinates (x', y') of the pair's SIFT feature point T^{j(i,0)} on the visual target image. If the projection error is equal to or less than a threshold, it adds 1 to a variable C. Otherwise, C is left unchanged.

The mismatch removal unit 104 repeats this computation of the projection error, comparison with the threshold, and incrementing of C for every pair not used to derive a₁ to a₈.
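The per-sample step just described can be sketched as follows: solving a₁ to a₈ from four pairs and counting C. The sketch assumes the conventional form of equations (5) and (6) given above. The helper names `solve_projective`, `project`, and `count_consensus` are hypothetical. Clearing the denominator turns each pair into two linear equations, so four pairs give an 8 × 8 system, solved here by Gauss-Jordan elimination.

```python
import math


def solve_projective(pairs):
    """Solve a1..a8 from four ((x, y), (x', y')) pairs, assuming
    x' = (a1 x + a2 y + a3) / (a7 x + a8 y + 1) and
    y' = (a4 x + a5 y + a6) / (a7 x + a8 y + 1)."""
    m = []
    for (x, y), (xp, yp) in pairs:
        # Clearing the denominator gives two linear equations per pair
        m.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp, xp])
        m.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp, yp])
    n = 8
    # Gauss-Jordan elimination with partial pivoting on the augmented 8x9 matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("degenerate sample (e.g. three collinear points)")
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[k][n] / m[k][k] for k in range(n)]


def project(a, x, y):
    """Apply the assumed equations (5) and (6) with a = [a1, ..., a8]."""
    w = a[6] * x + a[7] * y + 1.0
    if abs(w) < 1e-12:
        return (math.inf, math.inf)  # the point maps to infinity
    return ((a[0] * x + a[1] * y + a[2]) / w, (a[3] * x + a[4] * y + a[5]) / w)


def count_consensus(a, pairs, threshold):
    """Variable C: number of pairs whose projection error is <= threshold."""
    c = 0
    for (x, y), (xp, yp) in pairs:
        u, v = project(a, x, y)
        if math.hypot(u - xp, v - yp) <= threshold:
            c += 1
    return c
```

`count_consensus` should be given only the pairs that were not used to fit the parameters, matching the description above.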

The mismatch removal unit 104 repeats the above processing for every combination of four pairs of corresponding points. That processing consists of selecting the pairs, deriving a₁ to a₈, computing projection errors, comparing them with the threshold, and incrementing C. This yields one value of C per combination.
The mismatch removal unit 104 then selects the maximum value C_max among these values. The parameters a₁ to a₈ that produced C_max are set as the optimal projective transformation parameters.

Next, the erroneous correspondence removing unit 104 selects one pair of corresponding points obtained by the feature point matching unit 103.
Next, the unit substitutes the optimal projective transformation parameters a₁ to a₈ and the coordinates (x, y), on the field-of-view camera image, of the "SIFT feature point Sⁱ of the field-of-view camera image" (one member of the selected pair) into equations (5) and (6). This performs the projective transformation and gives coordinates (x′, y′) on the viewing target image. The unit then derives the projection error between these coordinates and the coordinates on the viewing target image of the "SIFT feature point T^j(i,0) of the viewing target image", the other member of the pair. If the projection error exceeds the threshold, the unit removes the selected pair as an erroneous correspondence.
The erroneous correspondence removing unit 104 repeats this selection, projection-error derivation, threshold comparison, and removal for every pair of corresponding points obtained by the feature point matching unit 103.
FIG. 3 is a diagram (photograph) conceptually showing an example of the pairs of corresponding points that remain after the erroneous pairs have been removed. Each pair consists of a SIFT feature point Sⁱ of the field-of-view camera image and the SIFT feature point T^j(i,0) of the viewing target image whose feature quantity corresponds to it. FIG. 3 shows a case in which 96 pairs of corresponding points are obtained between the SIFT feature points Sⁱ of the field-of-view camera image 301 and the SIFT feature points T^j(i,0) of the viewing target image 302. In FIG. 3, the two ends of each white straight line are corresponding points; the lines are drawn only for convenience of explanation and do not appear in the actual images.
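Once the optimal parameters are fixed, the removal step reduces to a filter over all pairs. A minimal sketch, again assuming the 8-parameter form of equations (5) and (6); `remove_mismatches`, its arguments and the pair layout ((x, y), (x′, y′)) are hypothetical.

```python
import math


def project(a, x, y):
    """Assumed form of equations (5)/(6): 8-parameter projective transformation."""
    w = a[6] * x + a[7] * y + 1.0
    if abs(w) < 1e-12:
        return float("inf"), float("inf")  # point maps to infinity
    return (a[0] * x + a[1] * y + a[2]) / w, (a[3] * x + a[4] * y + a[5]) / w


def remove_mismatches(pairs, a_opt, threshold):
    """Keep only pairs ((x, y), (u, v)) whose projection error is within threshold."""
    kept = []
    for (x, y), (u, v) in pairs:
        px, py = project(a_opt, x, y)
        if math.hypot(px - u, py - v) <= threshold:
            kept.append(((x, y), (u, v)))
        # pairs with a larger error are treated as erroneous correspondences and dropped
    return kept
```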

The erroneous correspondence removing unit 104 can be realized, for example, by reading the corresponding point information from a RAM or the like and, when a corresponding point is determined to be an erroneous correspondence as described above, erasing that corresponding point's information from the RAM or the like.

(Line-of-sight position neighborhood feature point extraction unit 105)
The line-of-sight position neighborhood feature point extraction unit 105 selects all of the SIFT feature points Sⁱ of the field-of-view camera image that are set as corresponding points.
Next, the unit creates triangles whose vertices are the coordinates (x, y) of these SIFT feature points Sⁱ on the field-of-view camera image, using Delaunay triangulation (these triangles are called Delaunay triangles).
For this purpose, the unit creates a Voronoi diagram. A Voronoi diagram is a division of space into Voronoi regions V(p_i) that satisfy the following equation (7).
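For reference, equation (7) in its standard form, written with the symbols defined in the next paragraph. This is a reconstruction, since the original equation image is not part of this text, and the patent's typesetting may differ:

```latex
V(p_i) = \left\{\, p \in \mathbb{R}^n \;\middle|\; D(p, p_i) \le D(p, p_j),\ j \ne i,\ j = 1, \dots, m \,\right\} \tag{7}
```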

Here, a point set P = [p₁, p₂, ..., p_m] consisting of m points (Voronoi generators) is assumed to be given. In equation (7), Rⁿ denotes the n-dimensional real space, and D(p, p_i) and D(p, p_j) denote the Euclidean distances between a point p and the Voronoi generators p_i and p_j. In other words, equation (7) states that, for any point p in the Voronoi region V(p_i), the Voronoi generator closest in Euclidean distance is p_i; the set of such points forms the Voronoi region.
In such a Voronoi diagram, the figure obtained by connecting the generators of adjacent regions to each other is called a Delaunay triangulation. Delaunay triangulation produces triangles whose minimum interior angle is maximized (that is, it avoids elongated triangles as far as possible). A Delaunay triangulation also has the circumcircle property: the circle circumscribing any Delaunay triangle contains no vertex of any other Delaunay triangle in its interior. Delaunay triangulation algorithms are well known, as described in Non-Patent Document 3 and elsewhere, so further detailed description is omitted here.
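The circumcircle property can be tested with the standard in-circle determinant. The sketch below is a generic geometric predicate, not code from the patent; the function names are hypothetical, and the result is normalized by the triangle's orientation so that clockwise and counter-clockwise vertex orders both work.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circle through a, b and c."""
    rows = []
    for px, py in (a, b, c):
        dx, dy = px - d[0], py - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # det > 0 means "inside" for a counter-clockwise triangle; flip the sign for clockwise
    return det * orientation(a, b, c) > 0
```

Points exactly on the circle return False, matching the usual convention that cocircular points do not violate the Delaunay property.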

FIG. 4 shows an example of a Voronoi diagram (FIG. 4(a)) and an example of Delaunay triangles obtained by Delaunay triangulation (FIG. 4(b)).
In FIG. 4, the points at the coordinates (x, y) of the SIFT feature points Sⁱ on the field-of-view camera image are the Voronoi generators p₁ to p₈.
Once the Delaunay triangles have been obtained in this way, the line-of-sight position neighborhood feature point extraction unit 105 detects the worker's line-of-sight position in the field-of-view camera image and determines whether the detected line-of-sight position lies inside a Delaunay triangle. If it does, the unit sets the SIFT feature points Sⁱ of the field-of-view camera image that form the three vertices of that Delaunay triangle as the line-of-sight position neighborhood feature points A, B, and C (A, B, and C denote coordinates in the field-of-view camera image).
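Deciding whether the gaze point lies inside a Delaunay triangle is a point-in-triangle test. A minimal sketch, assuming the triangles are already available as vertex triples (for example from any Delaunay library); `find_enclosing_triangle` is a hypothetical helper, and a point on an edge is treated as inside.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def contains(tri, p):
    """True if p is inside or on the boundary of triangle tri = (a, b, c)."""
    a, b, c = tri
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)  # all signs agree (or zero): inside


def find_enclosing_triangle(triangles, gaze):
    """Return the first triangle containing the gaze point, or None if there is none."""
    for tri in triangles:
        if contains(tri, gaze):
            return tri
    return None
```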

On the other hand, if the worker's line-of-sight position is not inside any Delaunay triangle, the line-of-sight position neighborhood feature point extraction unit 105 selects the two SIFT feature points Sⁱ of the field-of-view camera image that form the end points of the Delaunay triangle edge closest in Euclidean distance to the worker's line-of-sight position (these two points are denoted A and B, which represent coordinates in the field-of-view camera image).
Next, from among the SIFT feature points Sⁱ of the field-of-view camera image other than the selected points A and B, the unit selects the feature point Sⁱ closest in Euclidean distance to the worker's line-of-sight position (this point is denoted C, which represents coordinates in the field-of-view camera image).
Next, the unit determines whether the angle ∠ACB formed by the selected points A, B, and C satisfies the following equation (8).
30° < ∠ACB < 150° ... (8)
Equation (8) keeps the triangle whose vertices are the points A, B, and C in the field-of-view camera image from becoming elongated. The lower and upper limits of ∠ACB are therefore not restricted to 30° and 150°; other limits may be chosen as long as they serve this purpose.
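Equation (8) needs the angle at C between the rays CA and CB. One way to compute it is sketched below with hypothetical names; taking atan2 of the cross and dot products avoids the precision loss of acos near 0° and 180°. The 30° and 150° defaults follow equation (8), and the text allows other limits.

```python
import math


def angle_acb(a, b, c):
    """Angle at vertex C between rays CA and CB, in degrees (0 to 180)."""
    v1 = (a[0] - c[0], a[1] - c[1])
    v2 = (b[0] - c[0], b[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    crs = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(crs), dot))


def satisfies_eq8(a, b, c, lower=30.0, upper=150.0):
    """Equation (8): lower < angle ACB < upper keeps triangle ABC from being too thin."""
    return lower < angle_acb(a, b, c) < upper
```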

If equation (8) is satisfied, the unit selects the SIFT feature points T^j(i,0) of the viewing target image whose feature quantities correspond to the selected points A, B, and C (these are denoted A′, B′, and C′, which represent coordinates in the viewing target image).
Next, the unit determines whether the sign of the two-dimensional vector cross product given by the following equation (9) is the same as the sign of the cross product given by the following equation (10). If the signs are the same, the unit sets the selected points A, B, and C as the line-of-sight position neighborhood feature points A, B, and C.
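Equations (9) and (10) are not reproduced here. A natural reading, assumed in this sketch, is that (9) is the 2-D cross product (B − A) × (C − A) in the field-of-view camera image and (10) is (B′ − A′) × (C′ − A′) in the viewing target image; equal signs then mean both triangles list their vertices in the same rotational order. The function names are hypothetical.

```python
def cross2(u, v):
    """z-component of the cross product of two 2-D vectors."""
    return u[0] * v[1] - u[1] * v[0]


def same_orientation(a, b, c, a2, b2, c2):
    """True if triangles ABC and A'B'C' are both counter-clockwise or both clockwise."""
    s_cam = cross2((b[0] - a[0], b[1] - a[1]), (c[0] - a[0], c[1] - a[1]))        # assumed eq. (9)
    s_tgt = cross2((b2[0] - a2[0], b2[1] - a2[1]), (c2[0] - a2[0], c2[1] - a2[1]))  # assumed eq. (10)
    return s_cam * s_tgt > 0  # a zero product (degenerate triangle) is rejected
```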

If the signs of the cross products differ, the vertices of the triangle formed by A, B, and C in the field-of-view camera image are arranged differently from the vertices of the triangle formed by A′, B′, and C′ in the viewing target image. This can be caused by image distortion or the like. In that case, from among the SIFT feature points Sⁱ of the field-of-view camera image other than A and B, the unit selects the feature point Sⁱ that is next closest in Euclidean distance to the worker's line-of-sight position (this point becomes the new C, representing coordinates in the field-of-view camera image) and repeats the processing described above.
If equation (8) is not satisfied, the unit determines whether all of the SIFT feature points Sⁱ of the field-of-view camera image other than A and B have already been selected.
If all points have been selected but none satisfies the conditions for choosing C, namely equations (8), (9), and (10), the unit reselects as C, from among the SIFT feature points Sⁱ selected so far other than A and B, the one closest in Euclidean distance to the worker's line-of-sight position. The unit then sets the selected points A, B, and C as the line-of-sight position neighborhood feature points A, B, and C.

If not all points have been selected yet, the unit selects as C, from among the SIFT feature points Sⁱ other than A and B, the feature point that is next closest in Euclidean distance to the worker's line-of-sight position, and performs the determination of equation (8) again for the selected points A, B, and C.
In this way, the line-of-sight position neighborhood feature points are determined when the worker's line-of-sight position is not inside a Delaunay triangle.
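The procedure for a gaze point outside every Delaunay triangle can be condensed into one function: take the end points of the nearest edge as A and B, try the remaining points as C in order of increasing distance, accept the first that passes both equation (8) and the orientation check, and otherwise fall back to the nearest point. This is a sketch under assumptions: the data layout (edges as index pairs into a list of matched (camera, target) coordinates) and all names are hypothetical, and the orientation check uses the assumed reading of equations (9) and (10).

```python
import math


def _cross(u, v):
    return u[0] * v[1] - u[1] * v[0]


def _angle_acb(a, b, c):
    v1, v2 = (a[0] - c[0], a[1] - c[1]), (b[0] - c[0], b[1] - c[1])
    return math.degrees(math.atan2(abs(_cross(v1, v2)), v1[0] * v2[0] + v1[1] * v2[1]))


def _signed_area(a, b, c):
    return _cross((b[0] - a[0], b[1] - a[1]), (c[0] - a[0], c[1] - a[1]))


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    l2 = dx * dx + dy * dy
    if l2 == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / l2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def choose_abc(edges, matches, gaze, lower=30.0, upper=150.0):
    """Return indices (i, j, k) of A, B, C; matches[k] = (camera_pt, target_pt)."""
    i, j = min(edges, key=lambda e: point_segment_distance(
        gaze, matches[e[0]][0], matches[e[1]][0]))            # nearest edge gives A, B
    (a, a2), (b, b2) = matches[i], matches[j]
    others = sorted((k for k in range(len(matches)) if k not in (i, j)),
                    key=lambda k: math.dist(gaze, matches[k][0]))
    if not others:
        return None  # fewer than three matched points: no triangle can be formed
    for k in others:                                          # nearest candidate C first
        c, c2 = matches[k]
        if (lower < _angle_acb(a, b, c) < upper                # equation (8)
                and _signed_area(a, b, c) * _signed_area(a2, b2, c2) > 0):  # (9) vs (10)
            return i, j, k
    return i, j, others[0]  # no candidate passed: fall back to the nearest point
```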

The line-of-sight position neighborhood feature point extraction unit 105 can be realized, for example, by having a CPU read the corresponding point information from a RAM or the like, set the three SIFT feature points Sⁱ of the field-of-view camera image near the worker's line-of-sight position as the neighborhood feature points A, B, and C as described above, and store the information on A, B, and C in the RAM or the like.

(Viewing-target-image line-of-sight position deriving unit 106)
FIG. 5 shows an example of the point P of the worker's line-of-sight position in the field-of-view camera image and the point P′ of the worker's line-of-sight position in the viewing target image that corresponds to it.
In the left diagram of FIG. 5, consider an oblique coordinate system whose origin is point A of the line-of-sight position neighborhood feature points A, B, and C obtained by the extraction unit 105 and whose axes are the line segments AB and AC. In this system, the point P of the worker's line-of-sight position in the field-of-view camera image is expressed by the following equation (11). Rearranging equation (11) gives the parameters s and t as in the following equation (12). In equations (11) and (12), A, B, C, and P each denote coordinates in the field-of-view camera image.
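Equations (11) and (12) are not reproduced in this text. Written out from the description above (origin A, axes AB and AC), they take the following form, where u × v = u_x v_y − u_y v_x is the 2-D cross product. This is a reconstruction, and the patent's notation may differ:

```latex
P = A + s\,(B - A) + t\,(C - A) \tag{11}

s = \frac{(P - A) \times (C - A)}{(B - A) \times (C - A)}, \qquad
t = \frac{(B - A) \times (P - A)}{(B - A) \times (C - A)} \tag{12}
```

Taking the cross product of both sides of (11) with (C − A) eliminates t, and taking (B − A) × of both sides eliminates s, which gives (12).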

The viewing-target-image line-of-sight position deriving unit 106 derives the parameters s and t from the neighborhood feature points A, B, and C obtained by the extraction unit 105, using equation (12). Then, using s and t together with the SIFT feature points A′, B′, and C′ of the viewing target image whose feature quantities correspond to A, B, and C, the unit derives the point P′ of the worker's line-of-sight position in the viewing target image, as the line-of-sight position on the viewing target image, by the following equation (13) (see the right diagram of FIG. 5).

In the present embodiment, computing equations (12) and (13) gives, as the point P′ of the worker's line-of-sight position in the viewing target image, the position whose coordinates in the coordinate system defined by A′, B′, and C′ of the viewing target image are identical to the coordinates of the point P in the coordinate system defined by the neighborhood feature points A, B, and C of the field-of-view camera image.
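Following the description above, equation (13) applies the same s and t in the frame spanned by A′, B′, and C′, that is, P′ = A′ + s(B′ − A′) + t(C′ − A′). A minimal Python sketch of equations (12) and (13); `oblique_coords` and `map_gaze` are hypothetical names, and points are (x, y) tuples.

```python
def oblique_coords(a, b, c, p):
    """Equation (12): coordinates (s, t) of p in the oblique frame (A; AB, AC)."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - a[0], c[1] - a[1]
    wx, wy = p[0] - a[0], p[1] - a[1]
    den = ux * vy - uy * vx  # (B - A) x (C - A)
    if den == 0:
        raise ValueError("A, B and C are collinear")
    s = (wx * vy - wy * vx) / den  # (P - A) x (C - A) / den
    t = (ux * wy - uy * wx) / den  # (B - A) x (P - A) / den
    return s, t


def map_gaze(a, b, c, a2, b2, c2, p):
    """Equation (13): P' = A' + s (B' - A') + t (C' - A')."""
    s, t = oblique_coords(a, b, c, p)
    return (a2[0] + s * (b2[0] - a2[0]) + t * (c2[0] - a2[0]),
            a2[1] + s * (b2[1] - a2[1]) + t * (c2[1] - a2[1]))
```

Because s and t are affine coordinates relative to the triangle, this mapping reproduces any affine change between the two triangles exactly and approximates perspective effects only locally.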

FIG. 6 is a diagram (photograph) conceptually showing an example of the point P′ of the worker's line-of-sight position in the viewing target image.
The points u_a, u_b, u_c, and p^e in the left diagram of FIG. 6 correspond, respectively, to the neighborhood feature points A, B, and C and to the point P of the worker's line-of-sight position in the field-of-view camera image. The points U_a, U_b, U_c, and P^e in the right diagram of FIG. 6 correspond, respectively, to the points A′, B′, and C′ of the viewing target image and to the point P′ of the worker's line-of-sight position in the viewing target image.

The viewing-target-image line-of-sight position deriving unit 106 can be realized by reading from a RAM or the like the information on the neighborhood feature points A, B, and C, on the point P of the worker's line-of-sight position in the field-of-view camera image, and on the points A′, B′, and C′ of the viewing target image whose feature quantities correspond to A, B, and C; deriving the point P′ as described above; and storing the information on P′ in the RAM or the like.

(Viewing-target-image line-of-sight position display unit 107)
The viewing-target-image line-of-sight position display unit 107 displays the information on the point P′ of the worker's line-of-sight position in the viewing target image, derived by the deriving unit 106, on a display device such as a liquid crystal display. For example, the display unit 107 can display on the display device, over the viewing target image shown in FIG. 2(b), a mark indicating the point P^e as in the right diagram of FIG. 6 (a white × in that diagram).
The display unit 107 can be realized, for example, by having a CPU read the information on the point P′ from a RAM or the like, generate display data for displaying that information, and output the generated display data to the display device.

(Operation flowchart of the line-of-sight position detection apparatus 100)
Next, an example of the operation of the line-of-sight position detection apparatus 100 is described with reference to the flowchart of FIG. 7.
First, in step S701, the image acquisition unit 101 acquires the viewing target image.
Next, in step S702, the image acquisition unit 101 acquires the field-of-view camera image.
Next, in step S703, the feature point extraction unit 102 extracts SIFT feature points of the viewing target image from the viewing target image acquired in step S701.
Next, in step S704, the feature point extraction unit 102 selects one frame of the field-of-view camera image acquired in step S702, in the order of acquisition.

Next, in step S705, the feature point extraction unit 102 extracts SIFT feature points of the field-of-view camera image from the selected frame of the field-of-view camera image acquired in step S702.
Next, in step S706, the feature point matching unit 103 performs a feature point matching process that obtains, as corresponding points, the SIFT feature points Sⁱ of the field-of-view camera image and the SIFT feature points T^j(i,0) of the viewing target image whose feature quantities correspond to them. Details of the feature point matching process are described later with reference to FIG. 8.
Next, in step S707, the erroneous correspondence removing unit 104 performs an erroneous correspondence removal process that removes erroneous pairs from the pairs of corresponding points obtained in step S706. Details of this process are described later with reference to FIG. 9.

Next, in step S708, the line-of-sight position neighborhood feature point extraction unit 105 performs a neighborhood feature point extraction process. From among the SIFT feature points Sⁱ of the field-of-view camera image that form the corresponding points remaining after the erroneous ones were removed in step S707, it extracts the three points near the worker's line-of-sight position in the field-of-view camera image as the line-of-sight position neighborhood feature points. Details of this process are described later with reference to FIG. 10.
Next, in step S709, the viewing-target-image line-of-sight position deriving unit 106 performs a derivation process that obtains the point P′ of the worker's line-of-sight position in the viewing target image. Details of this process are described later with reference to FIG. 11.

Next, in step S710, the viewing-target-image line-of-sight position display unit 107 displays the information on the point P′ of the worker's line-of-sight position in the viewing target image, derived in step S709, on the display device (frame by frame).
Next, in step S711, the feature point extraction unit 102 determines whether all frames of the field-of-view camera image acquired in step S702 have been processed. If not, the process returns to step S704. If all frames have been processed, the processing of the flowchart of FIG. 7 ends.
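The per-frame flow of FIG. 7 can be summarized as a driver loop. In this sketch each stage is passed in as a callable, because the stages themselves are described elsewhere; the dictionary keys, the separate list of gaze points per frame, and the function name are all hypothetical.

```python
def process_frames(target_image, frames, gazes, stages):
    """Run the FIG. 7 loop; `stages` maps hypothetical step names to callables."""
    target_features = stages["extract"](target_image)              # S703 (once)
    results = []
    for frame, gaze in zip(frames, gazes):                         # S704 / S711
        frame_features = stages["extract"](frame)                  # S705
        pairs = stages["match"](frame_features, target_features)   # S706, FIG. 8
        pairs = stages["remove_mismatches"](pairs)                 # S707, FIG. 9
        abc = stages["neighbours"](pairs, gaze)                    # S708, FIG. 10
        p_target = stages["map_gaze"](abc, gaze)                   # S709, FIG. 11
        stages["display"](p_target)                                # S710
        results.append(p_target)
    return results
```

Extracting the viewing target image's features once, outside the loop, reflects that step S703 precedes the per-frame steps S704 to S711.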

Next, the feature point matching process of step S706 in FIG. 7 is described in detail with reference to the flowchart of FIG. 8.
First, in step S801, the feature point matching unit 103 selects one not-yet-selected SIFT feature point Sⁱ of the field-of-view camera image from those extracted in step S705.
Next, in step S802, the unit derives, using equation (3), the Euclidean distance d(i, j) between the feature quantity of the SIFT feature point Sⁱ selected in step S801 and the feature quantity of each SIFT feature point T^j of the viewing target image extracted in step S703, as the error between them.
Next, in step S803, the unit searches for the SIFT feature point T^j(i,0) of the viewing target image with the smallest Euclidean distance d(i, j) to Sⁱ.
Next, in step S804, the unit searches for the SIFT feature point T^j(i,1) of the viewing target image with the second smallest Euclidean distance d(i, j) to Sⁱ.

Next, in step S805, the feature point matching unit 103 determines whether the relationship between the Euclidean distance d(i, j(i,0)) from Sⁱ to T^j(i,0) and the Euclidean distance d(i, j(i,1)) from Sⁱ to T^j(i,1) satisfies equation (4).
If equation (4) is not satisfied, Sⁱ and T^j(i,0) are not treated as corresponding points, so step S806 is skipped and the process proceeds to step S807, described below.

If equation (4) is satisfied, the process proceeds to step S806, in which the feature point matching unit 103 sets Sⁱ and T^j(i,0) as corresponding points whose feature quantities correspond to each other.
Next, in step S807, the unit determines whether all SIFT feature points Sⁱ of the field-of-view camera image extracted in step S705 have been selected. If not, the process returns to step S801.

If all the SIFT feature points S^i of the field-of-view camera image have been selected, the process proceeds to step S808. In step S808, it is determined whether multiple corresponding points (multiple SIFT feature points T^{j(i,0)} of the viewing target image) have been set for the same SIFT feature point S^i of the field-of-view camera image. If so, the process proceeds to step S809.
In step S809, the feature point matching unit 103 selects, from among those corresponding points, the pair with the smallest Euclidean distance d(i,j) and removes the others. As a result, each SIFT feature point S^i of the field-of-view camera image and the SIFT feature point T^{j(i,0)} of the viewing target image whose feature value corresponds to it are in a one-to-one relationship. The process then proceeds to step S707 in FIG. 7.
If multiple corresponding points have not been set, there is nothing to remove, so step S809 is skipped and the process proceeds to step S707 in FIG. 7.
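Steps S808 and S809 reduce the candidate pairs to a one-to-one relationship by keeping, among competing pairs, the one with the smallest Euclidean distance d(i, j). A minimal sketch, assuming the candidates arrive as hypothetical (i, j, d) tuples, resolves collisions on either side greedily in order of increasing distance:

```python
from typing import List, Tuple

Pair = Tuple[int, int, float]  # (index i in the field-of-view image, index j in the target image, d(i, j))


def enforce_one_to_one(pairs: List[Pair]) -> List[Pair]:
    """Keep only the closest pair whenever several candidate pairs share a point."""
    used_i, used_j = set(), set()
    kept: List[Pair] = []
    # Visiting pairs from smallest to largest distance means the first pair
    # that claims a point is the one with the smallest d(i, j) for that point.
    for i, j, d in sorted(pairs, key=lambda p: p[2]):
        if i in used_i or j in used_j:
            continue  # a closer pair already claimed this point
        kept.append((i, j, d))
        used_i.add(i)
        used_j.add(j)
    return kept
```

Handling both sides in one pass is a design choice of this sketch; the text above phrases the check in terms of a single S^i with several candidates.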

Next, the erroneous correspondence removal process in step S707 of FIG. 7 will be described in detail with reference to the flowchart of FIG. 9.
First, in step S901, the erroneous correspondence removal unit 104 selects a not-yet-selected combination of four corresponding-point pairs from the pairs obtained in step S706 of FIG. 7 (the flowchart of FIG. 8). Each pair consists of a SIFT feature point S^i of the field-of-view camera image and the SIFT feature point T^{j(i,0)} of the viewing target image whose feature value corresponds to it.
Next, in step S902, the erroneous correspondence removal unit 104 uses the image coordinates of the four corresponding-point pairs selected in step S901 to derive the projective transformation parameters a₁ to a₈ given by expressions (5) and (6).

Next, in step S903, the erroneous correspondence removal unit 104 selects a not-yet-selected pair from among the corresponding-point pairs that were not used to derive the projective transformation parameters a₁ to a₈ (that is, pairs other than those selected in step S901).
Next, in step S904, the erroneous correspondence removal unit 104 substitutes the coordinates (x, y), on the field-of-view camera image, of the SIFT feature point S^i (one member of the pair selected in step S903) together with the parameters a₁ to a₈ into expressions (5) and (6). This projective transformation yields coordinates (x', y') on the viewing target image. The unit then derives the projection error, which is the error between these transformed coordinates and the coordinates (x', y'), on the viewing target image, of the SIFT feature point T^{j(i,0)}, the other member of the pair selected in step S903.
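Expressions (5) and (6) are not reproduced in this section. Assuming they take the usual eight-parameter form x′ = (a₁x + a₂y + a₃)/(a₇x + a₈y + 1) and y′ = (a₄x + a₅y + a₆)/(a₇x + a₈y + 1), the four pairs chosen in step S901 give eight linear equations in a₁ to a₈. The sketch below solves them with plain Gaussian elimination (step S902) and computes the projection error of step S904 as a Euclidean distance. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def projective_params(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return [a1, ..., a8] mapping four field-of-view points onto four target points."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (xp, yp) in zip(src, dst):
        # x'(a7 x + a8 y + 1) = a1 x + a2 y + a3, rearranged to be linear in a1..a8
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * xp, -y * xp])
        rhs.append(xp)
        # y'(a7 x + a8 y + 1) = a4 x + a5 y + a6
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * yp, -y * yp])
        rhs.append(yp)
    return solve_linear(rows, rhs)


def project(params: Sequence[float], p: Point) -> Point:
    """Apply the assumed form of expressions (5) and (6) to p = (x, y)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = params
    x, y = p
    w = a7 * x + a8 * y + 1.0
    return ((a1 * x + a2 * y + a3) / w, (a4 * x + a5 * y + a6) / w)


def projection_error(params: Sequence[float], s: Point, t: Point) -> float:
    """Distance between the projected field-of-view point s and its partner t."""
    return math.dist(project(params, s), t)
```

For a pure scale-and-shift, the recovered parameters reduce to a₇ = a₈ = 0, so the model contains affine maps as a special case.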

Next, in step S905, the erroneous correspondence removal unit 104 determines whether the projection error derived in step S904 is equal to or less than a preset threshold. If it is not, the projection error indicates that the parameters a₁ to a₈ derived in step S902 are probably incorrect, so the variable C is left unchanged: step S906 is skipped and the process proceeds to step S907, described later.
If the projection error is equal to or less than the threshold, the process proceeds to step S906, where the erroneous correspondence removal unit 104 adds 1 to the variable C.

Next, in step S907, the erroneous correspondence removal unit 104 determines whether all corresponding-point pairs not used to derive the parameters a₁ to a₈ (pairs other than those selected in step S901) have been selected. If not, the process returns to step S903.
If all such pairs have been selected, the process proceeds to step S908, where the erroneous correspondence removal unit 104 determines whether every combination of four corresponding-point pairs obtained in step S706 of FIG. 7 (the flowchart of FIG. 8) has been selected. If not, the process returns to step S901; if so, the process proceeds to step S909. At this point, one value of the variable C has been obtained for each combination of four corresponding-point pairs.

In step S909, the erroneous correspondence removal unit 104 selects the maximum value C_max among the obtained values of C and sets the parameters a₁ to a₈ that produced C_max as the optimal projective transformation parameters.
Next, in step S910, the erroneous correspondence removal unit 104 selects one not-yet-selected pair from the corresponding-point pairs obtained in step S706 (the flowchart of FIG. 8).
Next, in step S911, the erroneous correspondence removal unit 104 substitutes the coordinates (x, y), on the field-of-view camera image, of the SIFT feature point S^i (one member of the pair selected in step S910) together with the optimal projective transformation parameters a₁ to a₈ into expressions (5) and (6), obtaining coordinates (x', y') on the viewing target image. It then derives the projection error between these coordinates and the coordinates (x', y'), on the viewing target image, of the SIFT feature point T^{j(i,0)}, the other member of the pair.

Next, in step S912, the erroneous correspondence removal unit 104 determines whether the projection error derived in step S911 is equal to or less than a preset threshold. If it is, the pair selected in step S910 is not an erroneous correspondence, so step S913 is skipped and the process proceeds to step S914, described later.
If the projection error exceeds the threshold, the process proceeds to step S913, where the erroneous correspondence removal unit 104 removes the pair selected in step S910 as an erroneous correspondence.
Next, in step S914, the erroneous correspondence removal unit 104 determines whether all the corresponding-point pairs obtained in step S706 (the flowchart of FIG. 8) have been selected. If not, the process returns to step S910; if so, the process proceeds to step S708 in FIG. 7.
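Steps S901 to S914 amount to an exhaustive, RANSAC-like consensus search followed by a residual filter. The sketch below is model-agnostic: `fit` and `project` are caller-supplied callables (hypothetical names) that, in this embodiment, would solve and apply expressions (5) and (6). It assumes that the counter C starts at zero for each four-pair combination and that the projection error is a Euclidean distance.

```python
import math
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]  # (S^i on the field-of-view image, T^{j(i,0)} on the target image)
Params = Sequence[float]


def remove_mismatches(
    pairs: List[Pair],
    fit: Callable[[List[Point], List[Point]], Params],
    project: Callable[[Params, Point], Point],
    threshold: float,
) -> Tuple[Optional[Params], List[Pair]]:
    """Exhaustive consensus search (S901-S909), then outlier removal (S910-S914)."""
    best_params: Optional[Params] = None
    c_max = -1
    for quad in combinations(range(len(pairs)), 4):  # S901/S908: every 4-pair combination
        try:
            params = fit([pairs[k][0] for k in quad], [pairs[k][1] for k in quad])  # S902
        except ValueError:
            continue  # skip degenerate samples, e.g. collinear points
        c = 0  # variable C for this combination
        for k, (s, t) in enumerate(pairs):  # S903/S907: pairs not used for fitting
            if k in quad:
                continue
            if math.dist(project(params, s), t) <= threshold:  # S904/S905
                c += 1  # S906
        if c > c_max:  # S909: remember the parameters that give C_max
            c_max, best_params = c, params
    if best_params is None:
        return None, []
    # S910-S914: keep only pairs whose projection error is within the threshold
    kept = [(s, t) for s, t in pairs if math.dist(project(best_params, s), t) <= threshold]
    return best_params, kept
```

With an eight-parameter model the outer loop visits every four-pair combination, which is the cost that the modification described later in this document allows to be reduced.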

Next, the near-line-of-sight feature point extraction process in step S708 of FIG. 7 will be described in detail with reference to the flowchart of FIG. 10.
First, in step S1001, the near-line-of-sight feature point extraction unit 105 selects all the SIFT feature points S^i of the field-of-view camera image that remain set as corresponding points after erroneous correspondences were removed in step S707 of FIG. 7 (the flowchart of FIG. 9).
Next, in step S1002, the near-line-of-sight feature point extraction unit 105 creates Delaunay triangles whose vertices are the coordinates (x, y), on the field-of-view camera image, of the SIFT feature points S^i selected in step S1001 (see FIG. 4(b)).

Next, in step S1003, the near-line-of-sight feature point extraction unit 105 detects the operator's line-of-sight position in the field-of-view camera image and determines whether the detected position lies inside a Delaunay triangle. If it does, the process proceeds to step S1004.
In step S1004, the near-line-of-sight feature point extraction unit 105 sets the SIFT feature points S^i of the field-of-view camera image that form the three vertices of that Delaunay triangle as the near-line-of-sight feature points A, B, and C. The process then proceeds to step S709 in FIG. 7.
If the operator's line-of-sight position is not inside a Delaunay triangle, the process proceeds to step S1005, where the unit selects the SIFT feature points A and B of the field-of-view camera image that are the two endpoints of the Delaunay triangle edge closest, in Euclidean distance, to the operator's line-of-sight position.
Next, in step S1006, the unit selects, from the not-yet-selected SIFT feature points S^i of the field-of-view camera image other than the two points A and B selected in step S1005, the feature point C that is closest in Euclidean distance to the operator's line-of-sight position.
Next, in step S1007, the unit determines whether the angle ∠ACB formed by the SIFT feature points A and B selected in step S1005 and the SIFT feature point C selected in step S1006 satisfies expression (8).
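Steps S1003 to S1006 can be sketched with a point-in-triangle test and a point-to-segment distance. The code assumes that the Delaunay triangles from step S1002 are already available as index triples, for example from a triangulation library. Expression (8), the angle condition on ∠ACB, is not reproduced in this section, so the sketch stops at ordering the candidates for C. Function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Tri = Tuple[int, int, int]  # vertex indices of one Delaunay triangle


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def containing_triangle(pts: Sequence[Point], tris: Sequence[Tri], p: Point) -> Optional[Tri]:
    """S1003/S1004: return the triangle that contains the gaze point p, if any."""
    for tri in tris:
        a, b, c = (pts[k] for k in tri)
        d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
        if all(v >= 0 for v in d) or all(v <= 0 for v in d):  # same side of every edge
            return tri
    return None


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.dist(p, a)
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    u = max(0.0, min(1.0, u))  # clamp the foot of the perpendicular onto the segment
    return math.dist(p, (a[0] + u * dx, a[1] + u * dy))


def nearest_edge(pts: Sequence[Point], tris: Sequence[Tri], p: Point) -> Tuple[int, int]:
    """S1005: indices of A and B, the endpoints of the triangle edge closest to p."""
    edges = {tuple(sorted(e)) for t in tris for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0]))}
    return min(edges, key=lambda e: segment_distance(p, pts[e[0]], pts[e[1]]))


def candidates_for_c(pts: Sequence[Point], a_idx: int, b_idx: int, p: Point) -> List[int]:
    """S1006 order: remaining points, nearest to p first, to be tried one by one as C."""
    rest = [k for k in range(len(pts)) if k not in (a_idx, b_idx)]
    return sorted(rest, key=lambda k: math.dist(pts[k], p))
```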

If expression (8) is not satisfied, the process proceeds to step S1011, described later. If it is satisfied, the process proceeds to step S1008.
In step S1008, the near-line-of-sight feature point extraction unit 105 selects the SIFT feature points A′, B′, and C′ of the viewing target image whose feature values correspond to the SIFT feature points A and B selected in step S1005 and the SIFT feature point C selected in step S1006.
Next, in step S1009, the unit determines whether the sign of the cross product of the two-dimensional vectors in expression (9) is the same as the sign of the cross product of the two-dimensional vectors in expression (10). If the signs differ, the vertices of the triangle formed by the SIFT feature points A, B, and C of the field-of-view camera image are arranged differently from those of the triangle formed by the SIFT feature points A′, B′, and C′ of the viewing target image, so A, B, and C cannot be used as the near-line-of-sight feature points. In that case, the process returns to step S1006.
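Expressions (9) and (10) are not reproduced in this section. Assuming they are the 2-D cross products (B − A) × (C − A) and (B′ − A′) × (C′ − A′), the check in step S1009 reduces to comparing signs, as in this sketch (names are illustrative):

```python
from typing import Tuple

Point = Tuple[float, float]


def cross2(o: Point, a: Point, b: Point) -> float:
    """2-D cross product (a - o) x (b - o); its sign gives the turn direction o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def same_orientation(a: Point, b: Point, c: Point, a2: Point, b2: Point, c2: Point) -> bool:
    """True if triangles ABC and A'B'C' have the same vertex order (both CCW or both CW)."""
    return cross2(a, b, c) * cross2(a2, b2, c2) > 0
```

A negative product means one triangle is the mirror image of the other, which is the case the step rejects. This sketch also treats a zero product (collinear points) as a failure, a choice the patent text does not specify.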

If the signs are the same, the process proceeds to step S1010, where the near-line-of-sight feature point extraction unit 105 sets the SIFT feature points A and B selected in step S1005 and the most recent SIFT feature point C selected in step S1006 as the near-line-of-sight feature points A, B, and C. The process then proceeds to step S709 in FIG. 7.
As described above, if expression (8) is not satisfied in step S1007, the process proceeds to step S1011, where the unit determines whether every point other than the two SIFT feature points A and B selected in step S1005 has been tried as the SIFT feature point C. If not, the process returns to step S1006.

If every point other than the two SIFT feature points A and B has been tried, the process proceeds to step S1012. In step S1012, the near-line-of-sight feature point extraction unit 105 re-selects as the feature point C the SIFT feature point S^i of the field-of-view camera image, other than the two points A and B selected in step S1005, that is closest in Euclidean distance to the operator's line-of-sight position. The process then proceeds to step S1010. When step S1010 is reached from step S1012, the unit sets the SIFT feature points A and B selected in step S1005 and the SIFT feature point C selected in step S1012 as the near-line-of-sight feature points A, B, and C, and the process proceeds to step S709 in FIG. 7.

Next, the process of deriving the line-of-sight position on the viewing target image in step S709 of FIG. 7 will be described in detail with reference to the flowchart of FIG. 11.
First, in step S1101 of FIG. 11, the viewing-target-image line-of-sight position derivation unit 106 derives the parameters s and t by expression (12) from the near-line-of-sight feature points A, B, and C obtained in step S708 of FIG. 7 (the flowchart of FIG. 10). As shown in expressions (11) and (12), s and t express the point P of the line-of-sight position (its coordinates in the field-of-view camera image) in an oblique coordinate system whose origin is the near-line-of-sight feature point A and whose axes are the side AB (with endpoints A and B) and the side AC (with endpoints A and C).
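Assuming expression (11) writes P as P = A + s(B − A) + t(C − A), which is what an oblique frame with origin A and axes AB and AC implies, s and t follow from Cramer's rule using the 2-D cross product. This is a derivation sketch; the notation of expression (12) in the patent may differ.

```latex
\begin{aligned}
&\mathbf{u} = B - A,\qquad \mathbf{v} = C - A,\qquad \mathbf{w} = P - A,\qquad
\mathbf{w} = s\,\mathbf{u} + t\,\mathbf{v},\\
&\mathbf{a}\times\mathbf{b} := a_x b_y - a_y b_x,\qquad \mathbf{v}\times\mathbf{v} = \mathbf{u}\times\mathbf{u} = 0,\\
&\mathbf{w}\times\mathbf{v} = s\,(\mathbf{u}\times\mathbf{v})
\;\Rightarrow\; s = \frac{\mathbf{w}\times\mathbf{v}}{\mathbf{u}\times\mathbf{v}},\qquad
\mathbf{u}\times\mathbf{w} = t\,(\mathbf{u}\times\mathbf{v})
\;\Rightarrow\; t = \frac{\mathbf{u}\times\mathbf{w}}{\mathbf{u}\times\mathbf{v}},
\end{aligned}
```

The division requires u × v ≠ 0, that is, A, B, and C must not be collinear.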

Next, in step S1102, the viewing-target-image line-of-sight position derivation unit 106 uses the SIFT feature points A′, B′, and C′ of the viewing target image, whose feature values correspond to the near-line-of-sight feature points A, B, and C, together with the parameters s and t derived in step S1101, to derive by expression (13) the point P′ of the operator's line-of-sight position in the viewing target image, which serves as the line-of-sight position on the viewing target image (see the right-hand diagram of FIG. 5). The process then proceeds to step S710 in FIG. 7.
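Assuming expressions (11) to (13) take the forms P = A + s(B − A) + t(C − A) and P′ = A′ + s(B′ − A′) + t(C′ − A′), steps S1101 and S1102 amount to transferring P through the triangle pair ABC → A′B′C′. A minimal sketch with illustrative names:

```python
from typing import Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def cross(u: Point, v: Point) -> float:
    """2-D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]


def oblique_coords(a: Point, b: Point, c: Point, p: Point) -> Tuple[float, float]:
    """S1101: (s, t) such that P = A + s (B - A) + t (C - A)."""
    u = (b[0] - a[0], b[1] - a[1])
    v = (c[0] - a[0], c[1] - a[1])
    w = (p[0] - a[0], p[1] - a[1])
    det = cross(u, v)
    if det == 0.0:
        raise ValueError("A, B and C are collinear")
    return cross(w, v) / det, cross(u, w) / det


def transfer_gaze(abc: Triangle, abc_prime: Triangle, p: Point) -> Point:
    """S1102: map gaze point P in the field-of-view image to P' in the viewing target image."""
    s, t = oblique_coords(*abc, p)
    a2, b2, c2 = abc_prime
    return (a2[0] + s * (b2[0] - a2[0]) + t * (c2[0] - a2[0]),
            a2[1] + s * (b2[1] - a2[1]) + t * (c2[1] - a2[1]))
```

Because s and t are invariant under affine maps, P′ is exact whenever the local mapping between the two images is affine within triangle ABC; under a general perspective change it is an approximation that improves as the triangle shrinks.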

(Summary)
As described above, in this embodiment the three-dimensional space in which the line-of-sight position is to be detected is first converted into a viewing target image (a two-dimensional plane) captured with a digital camera or the like, and the actual line-of-sight position detection is confined to that plane. Next, the SIFT feature points S^i of the field-of-view camera image, which is the image from the mobile eye-tracking camera, and the SIFT feature points T^j of the viewing target image are extracted, and corresponding points between them are derived. Delaunay triangles are then formed whose vertices are the coordinates (x, y), on the field-of-view camera image, of the corresponding SIFT feature points S^i. From the vertices of these triangles, three points near the operator's line-of-sight position are selected as the near-line-of-sight feature points A, B, and C. Finally, the point P′ of the operator's line-of-sight position in the viewing target image is obtained from the image coordinates of the SIFT feature points A′, B′, and C′ of the viewing target image, which correspond to A, B, and C, so that P′ stands in the same positional relationship to A′, B′, and C′ as the point P of the operator's line-of-sight position in the field-of-view camera image does to A, B, and C. Feature points can therefore be associated between the two-dimensional field-of-view camera image and the two-dimensional viewing target image, and the triangle of near-line-of-sight feature points A, B, and C surrounding P is prevented from becoming extremely flattened (that is, from having one extremely small interior angle).
Consequently, what the operator is looking at in the real three-dimensional space can be determined without three-dimensional coordinates or a three-dimensional projective transformation: the point P′ in the viewing target image is obtained accurately by a simple calculation. P′ is therefore unaffected by errors in the projective transformation parameters, and it can be obtained accurately without the preprocessing to remove image distortion that would otherwise be needed to reduce those parameter errors. In short, by associating the field-of-view camera image the operator is viewing with a viewing target image acquired in advance of the region the operator is to look at, this embodiment can automatically detect what the operator is looking at, easily and reliably.
In addition, because the near-line-of-sight feature points A, B, and C are extracted after erroneous correspondences have been removed, the point P′ of the operator's line-of-sight position in the viewing target image can be obtained even more accurately.

(Modifications)
In this embodiment, SIFT is used to extract the feature points of the field-of-view camera image and the viewing target image because, in addition to being applicable to general images, it is highly robust. However, any method capable of extracting feature points may be used; the method is not limited to SIFT.
Also, in this embodiment, the erroneous correspondence removal unit 104 performs steps S901 to S908 (selecting corresponding-point pairs, deriving the parameters a₁ to a₈, deriving the projection error, comparing it with the threshold, and incrementing the variable C) for every combination of four corresponding-point pairs. This is not strictly necessary. For example, if it has been confirmed that performing this processing fewer times than the total number of four-pair combinations yields the same optimal projective transformation parameters as performing it for every combination, the processing may be performed only that number of times.
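To illustrate why fewer iterations can matter: the number of four-pair combinations grows as C(n, 4), whereas the usual RANSAC estimate of how many random four-pair samples are needed depends only on the inlier ratio and the desired confidence. The RANSAC formula below is a standard technique swapped in for illustration; the patent itself only requires that the reduced count be confirmed to give the same optimal parameters. Function names are hypothetical.

```python
import math


def exhaustive_trials(n_pairs: int) -> int:
    """Number of four-pair combinations visited when steps S901-S908 enumerate all of them."""
    return math.comb(n_pairs, 4)


def ransac_trials(inlier_ratio: float, confidence: float = 0.99, sample_size: int = 4) -> int:
    """Standard RANSAC estimate of random samples needed to draw one all-inlier sample."""
    p_good = inlier_ratio ** sample_size  # chance that one random sample is all inliers
    if p_good <= 0.0:
        raise ValueError("inlier_ratio must be positive")
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

For example, `exhaustive_trials(50)` is 230,300, whereas `ransac_trials(0.5)` is 72 random samples for 99% confidence at a 50% inlier ratio.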

Also, in this embodiment, the near-line-of-sight feature point extraction unit 105 uses Delaunay triangulation to create triangles whose vertices are the coordinates (x, y), on the field-of-view camera image, of the SIFT feature points S^i, and, when the operator's line-of-sight position lies inside a Delaunay triangle, sets the SIFT feature points S^i forming its three vertices as the near-line-of-sight feature points A, B, and C. This is not strictly necessary; the following procedure may be used instead. First, all SIFT feature points S^i of the field-of-view camera image that are set as corresponding points are selected. Next, SIFT feature points S^i are taken in order of increasing distance from the user's line-of-sight position, and three feature points forming a triangle that contains the line-of-sight position are extracted as the near-line-of-sight feature points. If the three points closest to the line-of-sight position do not form such a triangle, the next-closest feature point is added, and the three feature points that form a triangle containing the line-of-sight position are extracted as the near-line-of-sight feature points.
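A minimal sketch of this alternative, under one reading of the text: points are added in order of distance from the gaze position, and at each step only triangles that include the newly added point are tested. Function names are illustrative, and ties in distance are broken by index order.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _contains(a: Point, b: Point, c: Point, p: Point) -> bool:
    """True if p lies inside or on the boundary of a non-degenerate triangle abc."""
    if _cross(a, b, c) == 0.0:
        return False  # collinear vertices do not form a triangle
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)


def nearest_enclosing_triangle(pts: Sequence[Point], p: Point) -> Optional[Tuple[int, int, int]]:
    """Indices of three points, taken nearest-first, whose triangle contains p."""
    order = sorted(range(len(pts)), key=lambda k: math.dist(pts[k], p))
    for n in range(3, len(order) + 1):
        newest = order[n - 1]
        # Only triples that include the newly added point are new at this step.
        for i, j in combinations(order[: n - 1], 2):
            if _contains(pts[i], pts[j], pts[newest], p):
                return (i, j, newest)
    return None  # p lies outside the convex hull of all points
```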

Also, in this embodiment, the erroneous correspondence removal unit 104 performs a projective transformation from the field-of-view camera image to the viewing target image (step S902). A projective transformation from the viewing target image to the field-of-view camera image may be used instead.
Furthermore, the imaging device need not be a head-mounted field-of-view camera, as long as it can capture the operator's field of view and the operator's line-of-sight position in the captured image can be detected.

(Correspondence with the claims)
In this embodiment, for example, the image acquisition unit 101 realizes the viewing target image acquisition means and the field-of-view image acquisition means, and the feature point extraction unit 102 realizes the viewing target image feature point extraction means and the field-of-view image feature point extraction means.
Also, in this embodiment, the feature point matching unit 103 realizes, for example, the corresponding point extraction means. Specifically, the derivation means is realized by the feature point matching unit 103 performing step S802, the search means by it performing steps S803 and S804, and the extraction means by it performing steps S805 and S806. Here, the case where expression (4) is satisfied is an example of the case where the difference between the error of the viewing-target-image feature point with the smallest error and the error of the viewing-target-image feature point with the second-smallest error is greater than a threshold.
Also, in this embodiment, the near-line-of-sight feature point extraction unit 105 realizes, for example, the near-line-of-sight feature point extraction means, and the viewing-target-image line-of-sight position derivation unit 106 realizes the viewing-target-image line-of-sight position derivation means.
Also, in this embodiment, the erroneous correspondence removal unit 104 realizes, for example, the corresponding point removal means. For example, by performing steps S901 and S902, the erroneous correspondence removal unit 104 derives the parameters for the projective transformation between points of the field-of-view image and points of the viewing target image. By performing steps S910 to S912 and S914, it compares, for each corresponding point extracted by the corresponding point extraction means, the position on the image obtained by projectively transforming one member of the pair with the derived parameters against the position of the other member on its image.

The embodiments of the present invention described above can be realized by a computer executing a program. Means for supplying the program to the computer, for example a computer-readable recording medium such as a CD-ROM on which the program is recorded, or a transmission medium for transmitting the program, can also be applied as an embodiment of the present invention. A program product, such as a computer-readable recording medium on which the program is recorded, can likewise be applied as an embodiment of the present invention. The program, the computer-readable recording medium, the transmission medium, and the program product are all included within the scope of the present invention.
Furthermore, the embodiments of the present invention described above merely show specific examples of carrying out the present invention, and the technical scope of the present invention must not be construed as being limited by them. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

Description of Symbols
100 Line-of-sight position detection apparatus
101 Image acquisition unit
102 Feature point extraction unit
103 Feature point matching unit
104 Erroneous correspondence removal unit
105 Line-of-sight position vicinity feature point extraction unit
106 On-target-image line-of-sight position derivation unit
107 On-target-image line-of-sight position display unit

Claims

1. A line-of-sight position detection apparatus comprising:
visual recognition target image acquisition means for acquiring a visual recognition target image, which is a two-dimensional image of an area to be visually recognized by a user;
field-of-view image acquisition means for acquiring a field-of-view image, which is a two-dimensional image captured by imaging means worn by the user;
visual recognition target image feature point extraction means for extracting feature points, which are points in the visual recognition target image used to associate the visual recognition target image acquired by the visual recognition target image acquisition means with the field-of-view image acquired by the field-of-view image acquisition means;
field-of-view image feature point extraction means for extracting feature points, which are points in the field-of-view image used to associate the field-of-view image acquired by the field-of-view image acquisition means with the visual recognition target image acquired by the visual recognition target image acquisition means;
corresponding point extraction means for extracting, as corresponding points, a feature point of the field-of-view image extracted by the field-of-view image feature point extraction means and a feature point of the visual recognition target image, extracted by the visual recognition target image feature point extraction means, whose feature amount corresponds to that of the feature point of the field-of-view image;
line-of-sight position vicinity feature point extraction means for extracting, as line-of-sight position vicinity feature points, three feature points that are selected from the feature points of the field-of-view image extracted as the corresponding points by the corresponding point extraction means and that form a triangle containing the user's line-of-sight position in the field-of-view image acquired by the field-of-view image acquisition means; and
on-target-image line-of-sight position derivation means for deriving the position in the visual recognition target image corresponding to the user's line-of-sight position in the field-of-view image as the user's line-of-sight position in the visual recognition target image,
wherein the on-target-image line-of-sight position derivation means derives, as the user's line-of-sight position in the visual recognition target image, the position whose coordinates, as viewed in a coordinate system determined by the three feature points of the visual recognition target image corresponding to the three line-of-sight position vicinity feature points, are identical to the coordinates of the user's line-of-sight position in the field-of-view image as viewed in a coordinate system determined by the three line-of-sight position vicinity feature points.

2. The line-of-sight position detection apparatus according to claim 1, wherein the line-of-sight position vicinity feature point extraction means performs Delaunay triangulation on the feature points of the field-of-view image extracted as the corresponding points by the corresponding point extraction means, and extracts, as the line-of-sight position vicinity feature points, the three feature points of the field-of-view image that form the vertices of the Delaunay triangle, among the Delaunay triangles obtained by the Delaunay triangulation, that contains the user's line-of-sight position in the field-of-view image.

3. The line-of-sight position detection apparatus according to claim 1, further comprising corresponding point removal means for deriving, based on the image coordinates of the corresponding points extracted by the corresponding point extraction means, parameters of a projective transformation between points of the field-of-view image and points of the visual recognition target image; comparing, for each of the corresponding points extracted by the corresponding point extraction means, the position on the image obtained by projectively transforming one point of the corresponding pair using the derived parameters with the position of the other point of the pair on the other image; and removing corresponding points based on the result of the comparison.

4. The line-of-sight position detection apparatus according to any one of claims 1 to 3, wherein the corresponding point extraction means comprises:
derivation means for deriving the error between the feature amount of a feature point of the field-of-view image extracted by the field-of-view image feature point extraction means and the feature amount of a feature point of the visual recognition target image extracted by the visual recognition target image feature point extraction means;
search means for searching, among the feature points of the visual recognition target image extracted by the visual recognition target image feature point extraction means, for the feature point whose error derived by the derivation means with respect to the feature point of the field-of-view image is the smallest and the feature point whose error is the second smallest; and
extraction means for extracting, as the corresponding point, the feature point of the visual recognition target image with the smallest error when the difference between its error and the error of the feature point of the visual recognition target image with the second smallest error is larger than a threshold value.

5. The line-of-sight position detection apparatus according to any one of claims 1 to 4, wherein the visual recognition target image feature point extraction means extracts the feature points of the visual recognition target image acquired by the visual recognition target image acquisition means by SIFT (Scale-Invariant Feature Transform), and
the field-of-view image feature point extraction means extracts the feature points of the field-of-view image acquired by the field-of-view image acquisition means by SIFT (Scale-Invariant Feature Transform).

6. A line-of-sight position detection method comprising:
a visual recognition target image acquisition step of acquiring a visual recognition target image, which is a two-dimensional image of an area to be visually recognized by a user;
a field-of-view image acquisition step of acquiring a field-of-view image, which is a two-dimensional image captured by imaging means worn by the user;
a visual recognition target image feature point extraction step of extracting feature points, which are points in the visual recognition target image used to associate the visual recognition target image acquired in the visual recognition target image acquisition step with the field-of-view image acquired in the field-of-view image acquisition step;
a field-of-view image feature point extraction step of extracting feature points, which are points in the field-of-view image used to associate the field-of-view image acquired in the field-of-view image acquisition step with the visual recognition target image acquired in the visual recognition target image acquisition step;
a corresponding point extraction step of extracting, as corresponding points, a feature point of the field-of-view image extracted in the field-of-view image feature point extraction step and a feature point of the visual recognition target image, extracted in the visual recognition target image feature point extraction step, whose feature amount corresponds to that of the feature point of the field-of-view image;
a line-of-sight position vicinity feature point extraction step of extracting, as line-of-sight position vicinity feature points, three feature points that are selected from the feature points of the field-of-view image extracted as the corresponding points in the corresponding point extraction step and that form a triangle containing the user's line-of-sight position in the field-of-view image acquired in the field-of-view image acquisition step; and
an on-target-image line-of-sight position derivation step of deriving the position in the visual recognition target image corresponding to the user's line-of-sight position in the field-of-view image as the user's line-of-sight position in the visual recognition target image,
wherein the on-target-image line-of-sight position derivation step derives, as the user's line-of-sight position in the visual recognition target image, the position whose coordinates, as viewed in a coordinate system determined by the three feature points of the visual recognition target image corresponding to the three line-of-sight position vicinity feature points, are identical to the coordinates of the user's line-of-sight position in the field-of-view image as viewed in a coordinate system determined by the three line-of-sight position vicinity feature points.

7. The line-of-sight position detection method according to claim 6, wherein the line-of-sight position vicinity feature point extraction step performs Delaunay triangulation on the feature points of the field-of-view image extracted as the corresponding points in the corresponding point extraction step, and extracts, as the line-of-sight position vicinity feature points, the three feature points of the field-of-view image that form the vertices of the Delaunay triangle, among the Delaunay triangles obtained by the Delaunay triangulation, that contains the user's line-of-sight position in the field-of-view image.

8. The line-of-sight position detection method according to claim 6, further comprising a corresponding point removal step of deriving, based on the image coordinates of the corresponding points extracted in the corresponding point extraction step, parameters of a projective transformation between points of the field-of-view image and points of the visual recognition target image; comparing, for each of the corresponding points extracted in the corresponding point extraction step, the position on the image obtained by projectively transforming one point of the corresponding pair using the derived parameters with the position of the other point of the pair on the other image; and removing corresponding points based on the result of the comparison.

9. The line-of-sight position detection method according to any one of claims 6 to 8, wherein the corresponding point extraction step comprises:
a derivation step of deriving the error between the feature amount of a feature point of the field-of-view image extracted in the field-of-view image feature point extraction step and the feature amount of a feature point of the visual recognition target image extracted in the visual recognition target image feature point extraction step;
a search step of searching, among the feature points of the visual recognition target image extracted in the visual recognition target image feature point extraction step, for the feature point whose error derived in the derivation step with respect to the feature point of the field-of-view image is the smallest and the feature point whose error is the second smallest; and
an extraction step of extracting, as the corresponding point, the feature point of the visual recognition target image with the smallest error when the difference between its error and the error of the feature point of the visual recognition target image with the second smallest error is larger than a threshold value.

10. The line-of-sight position detection method according to any one of claims 6 to 9, wherein the visual recognition target image feature point extraction step extracts the feature points of the visual recognition target image acquired in the visual recognition target image acquisition step by SIFT (Scale-Invariant Feature Transform), and
the field-of-view image feature point extraction step extracts the feature points of the field-of-view image acquired in the field-of-view image acquisition step by SIFT (Scale-Invariant Feature Transform).

11. A computer program for causing a computer to execute each step of the line-of-sight position detection method according to any one of claims 6 to 10.