JP2007025963A

JP2007025963A - Device and program for measuring line of sight, and program for generating line of sight calibration data

Info

Publication number: JP2007025963A
Application number: JP2005205635A
Authority: JP
Inventors: Yasuhito Sawahata; 康仁澤畠; Kazuaki Komine; 一晃小峯; Hisaya Morita; 寿哉森田
Original assignee: Nippon Hoso Kyokai NHK; NHK Engineering Services Inc; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2005-07-14
Filing date: 2005-07-14
Publication date: 2007-02-01
Anticipated expiration: 2025-07-14
Also published as: JP4537901B2

Abstract

PROBLEM TO BE SOLVED: To provide a line-of-sight measurement device that enables a user to easily perform calibration alone.
SOLUTION: The device 1 is provided with object detecting means 17 for detecting an object in video displayed on a screen; keyword registration control means 18 for registering a keyword in keyword storage means 12, or deleting its registration, according to whether or not the object is detected; voice recognizing means 11 for recognizing a voice uttered by a user; keyword detecting means 13 for detecting whether or not the keyword is included in the voice recognition result; calibration data generating means 20 for generating calibration data based on the position of the object at the time the keyword is detected; and line-of-sight calibrating means 23 for calibrating the line of sight based on the calibration data.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a line-of-sight measurement device and a line-of-sight measurement program for measuring the line of sight of a user viewing video displayed on a display device, and to a line-of-sight calibration data generation program.

Various techniques have been proposed for line-of-sight interfaces, in which a person's line of sight is measured and used to select and operate menus, icons, and the like displayed on a computer's display device. Such interface techniques presuppose a line-of-sight measurement device for measuring the line of sight.
As one such line-of-sight measurement device, a technique has been disclosed in which the eyeball is irradiated with infrared light and the line of sight is measured from a captured image of the eyeball, using the pupil center and the Purkinje image, which is the reflection of the infrared light on the corneal surface (see Patent Document 1).

In general, there is an error, caused by individual differences in eyeball shape and the like, between the position on the screen of the display device at which a person (user) gazes and the position on the screen that the line-of-sight measurement device measures from the movement of the eyeball and the like. The line-of-sight measurement device therefore performs calibration processing to correct this error.
In this calibration processing, the user gazes in advance at a plurality of markers on the screen in sequence, calibration data is generated from the positional relationship between each gazed position and the position measured by the line-of-sight measurement device from the movement of the eyeball and the like, and the user's line of sight is corrected using that calibration data.
A technique has also been disclosed that allows calibration processing to be performed with the user gazing at as few as two markers (see Non-Patent Document 1).
Patent Document 1: Japanese Patent Laying-Open No. 2003-79577 (paragraphs 0027 to 0041, FIG. 1)
Non-Patent Document 1: Takehiko Ohno and two others, "Gaze measurement system realizing simple calibration by two-point correction", Transactions of the Information Processing Society of Japan, Vol. 44, No. 4, pp. 1136-1149, Apr. 2003

In general, however, calibration data is generated as follows: an operator operates the line-of-sight measurement device to display video containing a plurality of markers on the screen of the display device, and the line of sight is measured while the operator confirms, through conversation or the like, that the user is gazing at the marker the operator has indicated. In other words, generating calibration data requires an operator other than the user, which is an obstacle to realizing a personal interface environment for computer operation.
It is possible for the user to measure the line of sight alone by using an input device such as a keyboard to display video containing markers on the screen and to change the position of the marker to be gazed at in sequence. However, because the line-of-sight measurement device must be operated each time the position of the marker to be gazed at is specified, the input device must always be kept at hand and the user must look away from the screen to operate it, so generating calibration data is laborious.

Furthermore, because a conventional line-of-sight measurement device must display dedicated video containing a plurality of markers on the display device when generating calibration data, preparation work is required. For example, when the line-of-sight measurement device is used as a computer interface, the preparation work must be repeated whenever the calibration data is to be readjusted or the computer's user changes, so it takes time before the computer can be used through the line-of-sight interface.

The present invention has been made in view of the above problems. Its object is to provide a line-of-sight measurement device, a line-of-sight measurement program, and a line-of-sight calibration data generation program with which a user can easily perform calibration alone, without an operator, and with which calibration can be performed without using dedicated video containing a plurality of markers.

The present invention was devised to achieve the above object. First, the line-of-sight measurement device according to claim 1 is a line-of-sight measurement device that measures the line of sight of a user viewing the screen of a display device, and comprises keyword storage means, voice recognition means, keyword detection means, calibration data generation means, and line-of-sight calibration means.

In this configuration, the line-of-sight measurement device stores in the keyword storage means, in advance, the position of each object to be gazed at in the video displayed on the screen in association with a keyword that identifies the object. The voice recognition means then recognizes the voice uttered by the user. When the user utters a keyword stored in the keyword storage means, the line-of-sight measurement device can identify the object on the screen.
The keyword detection means then detects whether the recognition result from the voice recognition means contains a keyword. If the voice recognition result contains a keyword, the line-of-sight measurement device regards the user as gazing at the corresponding object.

While the user is thus gazing at the object, the calibration data generation means generates calibration data for calibrating the line of sight, based on the position of the object at which the user is gazing. This calibration data is an index of how far the line of sight deviates from the position of the object.
The line-of-sight calibration means then calibrates the line of sight with reference to the calibration data. That is, correcting the measured line of sight by the amount of deviation represented by the calibration data makes it possible to measure the line of sight with which the user is actually gazing at the screen.
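As a minimal sketch of the deviation-and-correction idea, assuming the calibration data is a simple two-dimensional offset in screen pixels (the names `Point`, `make_calibration_offset`, and `calibrate`, and all coordinate values, are hypothetical and not taken from the patent):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A position on the screen, in pixels (assumed unit)."""
    x: float
    y: float


def make_calibration_offset(measured: Point, object_pos: Point) -> Point:
    """Return how far the measured gaze deviates from the known object position."""
    return Point(object_pos.x - measured.x, object_pos.y - measured.y)


def calibrate(measured: Point, offset: Point) -> Point:
    """Correct a measured gaze point by the stored deviation."""
    return Point(measured.x + offset.x, measured.y + offset.y)


# The user says the keyword while gazing at an object known to be at (300, 240);
# the uncalibrated measurement at that moment is (310, 255).
offset = make_calibration_offset(Point(310.0, 255.0), Point(300.0, 240.0))

# Later measurements are corrected by the same deviation.
corrected = calibrate(Point(500.0, 415.0), offset)
```

A single constant offset is the simplest possible form of calibration data; a real device may well use a richer model.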

The line-of-sight measurement device according to claim 2 is a line-of-sight measurement device that measures the line of sight of a user viewing the screen of a display device, and comprises keyword storage means, object detection means, keyword registration control means, voice recognition means, keyword detection means, calibration data generation means, and line-of-sight calibration means.

In this configuration, the line-of-sight measurement device stores in the keyword storage means keywords that identify the objects to be gazed at in the video displayed on the screen. The object detection means detects an object in the video and determines its position.
When the object detection means detects an object, the keyword registration control means registers the keyword preset for that object in the keyword storage means. When the object is no longer detected, the keyword registration control means deletes the registration of that object's preset keyword from the keyword storage means.
As a result, the keywords registered in the keyword storage means correspond only to objects currently displayed on the screen.
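The register-on-detection, delete-on-disappearance behaviour can be sketched as follows. This is an illustration only: the class name `KeywordRegistry`, its method names, the object identifiers, and the choice to keep the detected position alongside the keyword are all assumptions.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]


class KeywordRegistry:
    """Holds keywords only for objects that are currently on screen."""

    def __init__(self, preset_keywords: Dict[str, str]) -> None:
        # Hypothetical mapping: object id -> keyword preset for that object.
        self._preset = preset_keywords
        # keyword -> last detected position of the corresponding object.
        self._active: Dict[str, Position] = {}

    def on_object_detected(self, object_id: str, position: Position) -> None:
        """Register the object's preset keyword when the object is detected."""
        self._active[self._preset[object_id]] = position

    def on_object_lost(self, object_id: str) -> None:
        """Delete the keyword registration when the object is no longer detected."""
        self._active.pop(self._preset[object_id], None)

    def lookup(self, keyword: str) -> Optional[Position]:
        """Return the position for a registered keyword, or None if not registered."""
        return self._active.get(keyword)


registry = KeywordRegistry({"obj_star": "star", "obj_circle": "circle"})
registry.on_object_detected("obj_star", (120, 80))
```

With this arrangement, a keyword spoken while its object is off screen simply fails the lookup, so no calibration data is generated for it.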

The voice recognition means then recognizes the voice uttered by the user, and the keyword detection means detects whether the recognition result contains a keyword. If the voice recognition result contains a keyword, the line-of-sight measurement device regards the user as gazing at the corresponding object.
While the user is thus gazing at the object, the calibration data generation means generates calibration data for calibrating the line of sight, based on the position of the object at which the user is gazing. The line-of-sight calibration means then calibrates the line of sight with reference to the calibration data.

Further, the line-of-sight measurement device according to claim 3 is the line-of-sight measurement device according to claim 2, further comprising object information storage means that stores feature quantities characterizing the object in association with the keyword, wherein the object detection means detects the object in the video based on the feature quantities stored in the object information storage means.

In this configuration, the line-of-sight measurement device stores in the object information storage means, in advance, feature quantities characterizing each object in association with its keyword. Any feature quantity may be used as long as it allows the object to be recognized by image processing; examples are color, shape, and brightness.
Using these feature quantities, the object detection means can detect the object in the video.

The line-of-sight measurement device according to claim 4 is the line-of-sight measurement device according to claim 2, wherein the object detection means detects the object based on metadata that describes the position of the object and the keyword in association with a time interval of the video.

In this configuration, the object detection means detects objects by analyzing the metadata. The metadata describes in which time interval of the video an object is displayed, at which position, and what the keyword for that object is. This allows the object detection means to recognize where on the screen an object is displayed at a given time.
Because the metadata associates a keyword with each object, the keyword registration control means registers the keyword, or deletes its registration, in step with the object's appearance in the video.
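The metadata-based lookup can be illustrated with a small sketch. The record layout (`ObjectMetadata` with a keyword, a start and end time in seconds, and a pixel position), the half-open interval convention, and the sample values are assumptions made for illustration; the patent text here does not specify a metadata format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ObjectMetadata:
    """One hypothetical metadata record: which object is shown when and where."""
    keyword: str
    start_s: float              # start of the time interval, in seconds
    end_s: float                # end of the time interval (exclusive), in seconds
    position: Tuple[int, int]   # position on the screen, in pixels


def objects_at(metadata: List[ObjectMetadata], time_s: float) -> List[ObjectMetadata]:
    """Return the records whose time interval contains the given playback time."""
    return [m for m in metadata if m.start_s <= time_s < m.end_s]


metadata = [
    ObjectMetadata("star", 0.0, 10.0, (120, 80)),
    ObjectMetadata("circle", 8.0, 20.0, (500, 300)),
]

visible_at_5s = [m.keyword for m in objects_at(metadata, 5.0)]
visible_at_9s = [m.keyword for m in objects_at(metadata, 9.0)]
```

Because the lookup is a comparison on text-derived fields, no image processing is involved, which is the property the description attributes to the metadata approach.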

Further, the line-of-sight measurement program according to claim 5 causes a computer to function as reference position specifying means, voice recognition means, keyword detection means, calibration data generation means, and line-of-sight calibration means in order to measure the line of sight of a user viewing the screen of a display device.

In this configuration, the reference position specifying means of the line-of-sight measurement program specifies the position of the pupil center and the position of the corneal reflection point from an eyeball image that includes the user's eyeball irradiated with light. The position of the pupil changes within the eyeball according to the line of sight when the user gazes at an object, whereas the corneal reflection point does not change, so the corneal reflection point indicates a reference position in the eyeball.
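Since the corneal reflection point serves as the reference and the pupil center moves with the gaze, a natural quantity to derive from the two positions is the vector from the reflection point to the pupil center. The sketch below assumes both positions have already been extracted from the eyeball image as pixel coordinates; the function name and the sample coordinates are hypothetical.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def pupil_reflection_vector(pupil_center: Vec2, corneal_reflection: Vec2) -> Vec2:
    """Vector from the corneal reflection point (reference) to the pupil center."""
    return (pupil_center[0] - corneal_reflection[0],
            pupil_center[1] - corneal_reflection[1])


# Positions in eyeball-image pixels (illustrative values only).
vector = pupil_reflection_vector((212.0, 148.0), (200.0, 150.0))
```

How this vector is mapped to a point on the screen is not covered by this sketch.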

The voice recognition means then recognizes the voice uttered by the user, and the keyword detection means detects whether the recognition result contains a keyword.
Further, the calibration data generation means generates calibration data for calibrating the line of sight from the position of the object at which the user is gazing. The line-of-sight calibration means then calibrates the line of sight with reference to the calibration data. This makes it possible to measure the line of sight with which the user is actually gazing at the screen.

The line-of-sight calibration data generation program according to claim 6 causes a computer to function as reference position specifying means, voice recognition means, keyword detection means, and calibration data generation means in order to generate calibration data for calibrating the line of sight of a user viewing the screen of a display device.

In this configuration, the reference position specifying means of the line-of-sight calibration data generation program specifies the position of the pupil center and the position of the corneal reflection point from an eyeball image that includes the user's eyeball irradiated with light.
The voice recognition means then recognizes the voice uttered by the user, and the keyword detection means detects whether the recognition result contains a keyword.
Further, the calibration data generation means generates calibration data for calibrating the line of sight from the position of the object at which the user is gazing. This calibration data is an index of how far the line of sight deviates from the position of the object.

The present invention provides the following advantageous effects.
According to the invention of claim 1, claim 2, or claim 5, the user's line of sight can be measured with calibration applied simply by having the user gaze at an object in the video and utter the keyword corresponding to that object. The user can thus easily perform calibration alone, without an operator, which makes it possible to realize a personal interface environment for computer operation.

According to the invention of claim 3, objects can be detected in the video based on feature quantities, so there is no need to use calibration-only video containing markers, and no preparation work for it is required. The user can therefore perform calibration at any time while viewing content. Moreover, even when the user changes in a personal interface environment for computer operation, calibration can be performed immediately, shortening the time until computer operation becomes possible.

According to the invention of claim 4, objects can be detected in the video by means of metadata, so the position of an object can be specified even in ordinary video. Because the user does not need to display calibration-only video containing markers on the screen, calibration can be performed at any time while viewing content. In addition, because objects are detected from the text information in the metadata, complex processing such as image processing is unnecessary. This keeps the load on the device low and allows the line of sight to be measured at high speed.

According to the invention of claim 6, calibration data for calibrating the user's line of sight can be generated by having the user gaze at an object in the video and utter the keyword corresponding to that object. The user can thus easily perform calibration alone, without an operator.

Embodiments of the present invention will be described below with reference to the drawings.
[Overview of the line-of-sight measurement device]
First, an overview of the line-of-sight measurement device according to the present invention will be described with reference to FIG. 1. FIG. 1 is an explanatory diagram for explaining the overview of the line-of-sight measurement device according to the present invention.
As shown in FIG. 1, the line-of-sight measurement device 1 measures the line of sight of a user H when the user H gazes at the screen of a display device D. A feature of the line-of-sight measurement device 1 is that the calibration used in line-of-sight measurement is performed through the utterances of the user H.

That is, the line-of-sight measurement device 1 displays video (content) containing objects to be gazed at (gaze targets) on the screen of the display device D, identifies which object the user H is gazing at from the voice the user H utters through a microphone M, and calibrates the line of sight of the user H based on the position of that object. In the example of FIG. 1, the line-of-sight measurement device 1 displays a plurality of objects (here two, a star and a circle) on the screen of the display device D, and the user H gazes at the star and then says "calibrate with the star". The line-of-sight measurement device 1 then generates calibration data for calibrating the line of sight by analyzing the difference between the line of sight of the user H and the actual position of the star. After generating calibration data for two or more objects, the line-of-sight measurement device 1 successively calibrates the line of sight of the user H using the calibration data.
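One way to combine samples from two or more objects is to fit, for each screen axis, a gain and an offset that map measured positions onto the known object positions. The sketch below does this with an ordinary least-squares line fit. The linear model, the function names `fit_axis` and `apply_axis`, and the sample values are assumptions for illustration; this passage of the patent does not specify the form of the calibration data.

```python
from typing import Sequence, Tuple


def fit_axis(measured: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of target = gain * measured + offset for one screen axis."""
    if len(measured) != len(target) or len(measured) < 2:
        raise ValueError("need at least two paired samples")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_t = sum(target) / n
    var = sum((m - mean_m) ** 2 for m in measured)
    if var == 0:
        raise ValueError("measured positions must differ along this axis")
    cov = sum((m - mean_m) * (t - mean_t) for m, t in zip(measured, target))
    gain = cov / var
    return gain, mean_t - gain * mean_m


def apply_axis(value: float, params: Tuple[float, float]) -> float:
    """Correct one coordinate of a measured gaze point."""
    gain, offset = params
    return gain * value + offset


# Two keyword utterances give two samples: measured gaze x versus object x (pixels).
measured_x = [120.0, 520.0]
object_x = [100.0, 300.0]
params_x = fit_axis(measured_x, object_x)

corrected_x = apply_axis(320.0, params_x)
```

The y axis would be fitted the same way from the y coordinates. With exactly two samples the fit passes through both points; additional samples average out measurement noise.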

The line of sight of the user H can be obtained by a common technique such as the scleral reflection method, the corneal reflection method, or the pupil-corneal reflection method. Here, the line-of-sight measurement device 1 uses the pupil-corneal reflection method and obtains the line of sight from an image (eyeball image) in which a camera C captures infrared light emitted by light emitting means (LED) L and reflected by the eyeball of the user H. The line of sight may be either the gaze point on the screen or the vector (line-of-sight vector) from the pupil of the user's eyeball to the gaze point on the screen; in the following description it denotes the gaze point.
The specific configuration and operation of the line-of-sight measurement device 1 are described below.

[Configuration of the line-of-sight measurement device]
First, the specific configuration of the line-of-sight measurement device will be described with reference to FIG. 2 (and FIG. 1 as appropriate). FIG. 2 is a block diagram showing the configuration of the line-of-sight measurement device according to the present invention.
As shown in FIG. 2, the line-of-sight measurement device 1 comprises voice recognition means 11, keyword storage means 12, keyword detection means 13, content input means 14, content output means 15, object information storage means 16, object detection means 17, keyword registration control means 18, reference position specifying means 19, calibration data generation means 20, calibration data storage means 21, line-of-sight calculation means 22, and line-of-sight calibration means 23. Connected to the line-of-sight measurement device 1 are the microphone M, serving as voice input means for inputting the user's voice, and the light emitting means L and the camera C, serving as imaging means for capturing the user's eyeball image.

The voice recognition means 11 receives the voice uttered by the user through the microphone M and recognizes it. A common voice recognition technique can be used. For example, the voice recognition means 11 A/D-converts the input voice signal and analyzes it by the LPC (linear prediction) method or the like to extract acoustic feature parameters from the voice signal. The voice recognition means 11 then models the time series of acoustic feature parameters with a hidden Markov model (HMM) and refers to a statistical language model (an N-gram model or the like) to convert the voice into voice data in the form of a character string. The voice data recognized by the voice recognition means 11 is output to the keyword detection means 13.

The keyword storage means 12 stores keywords for identifying the objects to be gazed at contained in the content (video) displayed on the screen of the display device D, and is a common storage means such as RAM (Random Access Memory). The keyword storage means 12 stores the keyword for an object only while that object is displayed in the content (video) on the screen of the display device D. Keywords are registered (recorded), and their registrations deleted, by the keyword registration control means 18 described later.
In addition to keywords that directly identify objects, the keyword storage means 12 may store in advance keywords that specify the direction in which to search for an object, for example "up", "down", "left", and "right".

The keyword detection means 13 detects whether the voice data recognized by the voice recognition means 11 is stored in the keyword storage means 12. When it detects that the voice data is stored as a keyword in the keyword storage means 12, the keyword detection means 13 outputs a keyword detection notification to that effect to the object detection means 17.
The keyword detection means 13 preferably detects a keyword by checking whether a noun or noun phrase contained in the voice data is identical to a keyword. For example, when the user says "calibrate with the star", it detects whether the noun "star" is stored in the keyword storage means 12.

When keywords indicating direction are stored in the keyword storage means 12 and the user says, for example, "calibrate with the star on the right", the keyword detection means 13 detects the keywords "right" and "star" contained in that utterance. When a keyword indicating direction is detected, the keyword detection means 13 adds information indicating the direction to the keyword detection notification and outputs it to the object detection means 17.
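The keyword and direction matching can be sketched as follows. For simplicity the sketch assumes the recognition result is an English string that can be split on whitespace; Japanese text would instead need morphological analysis to extract nouns. The function name, the dictionary layout of the notification, and the sample utterances are hypothetical.

```python
from typing import Dict, Optional, Set

# Direction keywords mentioned in the description ("up", "down", "left", "right").
DIRECTION_KEYWORDS = {"up", "down", "left", "right"}


def detect_keywords(recognized_text: str,
                    registered: Set[str]) -> Optional[Dict[str, Optional[str]]]:
    """Return a detection notification, or None if no object keyword is present.

    Assumption: words are whitespace-delimited and keywords are single words.
    """
    words = recognized_text.lower().split()
    keyword = next((w for w in words if w in registered), None)
    if keyword is None:
        return None
    direction = next((w for w in words if w in DIRECTION_KEYWORDS), None)
    return {"keyword": keyword, "direction": direction}


registered_keywords = {"star", "circle"}
with_direction = detect_keywords("calibrate with the right star", registered_keywords)
without_direction = detect_keywords("calibrate with the star", registered_keywords)
no_keyword = detect_keywords("hello there", registered_keywords)
```

The direction field stays empty when no direction word is spoken, which mirrors the description: direction information is added to the notification only when a direction keyword is detected.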

The content input means 14 receives content from outside as video stream data and outputs it to the object detection means 17.
The content output means 15 outputs the content output from the object detection means 17 to the outside (the display device D).
If the content input to the content input means 14 has already been branched to the display device D upstream of the line-of-sight measurement device 1, the content output means 15 can be omitted from the configuration.

The object information storage means 16 stores, as object information, feature quantities that characterize each object together with the keyword associated with it in advance. It is a general storage device such as a hard disk.
General video feature quantities can be used. For example, when an object is distinctive in color, the feature can be the average color vector obtained by averaging the color vectors of the pixels in the object region; when it is distinctive in shape, the feature can be the area of the rectangle circumscribing the object.
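The two example feature quantities can be computed as in the sketch below. It assumes an object region is given as a list of RGB tuples and a list of pixel coordinates; both function names are hypothetical, and the pixel-inclusive area convention is a choice made for this illustration.

```python
def mean_color(pixels):
    """Average the (r, g, b) vectors of the pixels in an object region."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def bounding_rect_area(points):
    """Area of the axis-aligned rectangle circumscribing (x, y) pixel coordinates.

    The +1 terms count pixels inclusively, which is an assumed convention.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
```

A detector could compare these values against the stored object information with some tolerance; the matching rule itself is not specified in the text.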

The object detection means 17 detects objects in the content input from the content input means 14, based on the per-object feature quantities stored in the object information storage means 16. It has a memory (not shown), stores the content frame by frame, and detects objects in units of frames.
When the object detection means 17 detects an object in the content, it outputs an object detection notification to the keyword registration control means 18; when the object is subsequently no longer detected, it outputs an object non-detection notification to the keyword registration control means 18.

The object detection means 17 stores, as an internal state in the memory (not shown), whether an object is currently detected. Keeping this state lets it recognize transitions between the detected and undetected states.

Furthermore, when a keyword detection notification arrives from the keyword detection means 13, that is, when the user has uttered a keyword, the object detection means 17 outputs the position of the object corresponding to the detected keyword to the calibration data generation means 20.
When the keyword detection means 13 also reports direction information, the object detection means 17 selects, from among multiple objects, the one that matches the direction. For example, when “right” is uttered, it selects as the gaze target the object whose X coordinate is largest in the XY coordinate system of the screen.
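The direction-based selection can be illustrated with a short sketch. The text only states the rule for “right” (largest X coordinate); the “left” branch is an assumption by symmetry, and the function name and data layout are hypothetical.

```python
def select_by_direction(objects, direction):
    """Pick one object from several according to a spoken direction.

    objects   -- list of (name, (X, Y)) pairs in screen coordinates
    direction -- "right" or "left"; other directions are omitted in this sketch
    """
    if direction == "right":
        return max(objects, key=lambda o: o[1][0])  # largest X, as in the text
    if direction == "left":
        return min(objects, key=lambda o: o[1][0])  # smallest X (assumed by symmetry)
    raise ValueError(f"unsupported direction: {direction}")


stars = [("star", (100, 200)), ("star", (500, 220))]
target = select_by_direction(stars, "right")
```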

The keyword registration control means 18 registers keywords in, or deletes them from, the keyword storage means 12 according to whether the object detection means 17 detects an object.
Specifically, when an object detection notification is input from the object detection means 17, the keyword registration control means 18 reads the keyword corresponding to that object from the object information storage means 16 and registers (records) it in the keyword storage means 12. When an object non-detection notification is input, it deletes the keyword corresponding to that object from the keyword storage means 12.
As a result, a keyword is held in the keyword storage means 12 only while its target object is displayed on the screen of the display device D.
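The register-on-detection, delete-on-disappearance behavior can be sketched as a small state holder. The class name, the object identifiers, and the choice to rebuild the keyword set each frame are assumptions of this illustration, not details from the patent.

```python
class KeywordRegistrar:
    """Keeps keyword storage in step with what is currently on screen."""

    def __init__(self, object_info):
        self.object_info = object_info  # object id -> keyword (object information storage)
        self.keywords = set()           # stands in for keyword storage
        self.visible = set()            # detection state remembered from the previous frame

    def update(self, detected_ids):
        """Process one frame of detections and return the state changes."""
        detected = set(detected_ids)
        appeared = detected - self.visible     # would trigger detection notifications
        disappeared = self.visible - detected  # would trigger non-detection notifications
        self.visible = detected
        # Rebuild storage so a keyword shared by two visible objects is not lost.
        self.keywords = {self.object_info[o] for o in self.visible}
        return appeared, disappeared


registrar = KeywordRegistrar({"obj1": "star", "obj2": "moon"})
registrar.update(["obj1"])                       # "star" becomes a valid keyword
registrar.update(["obj1", "obj2"])               # "moon" is added
appeared, disappeared = registrar.update(["obj2"])  # "star" is removed
```

Because the keyword set only ever reflects visible objects, an utterance can match a keyword only while its object is on screen, which is the property the text relies on.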

The reference position specifying means 19 specifies the position of the pupil center and a specific reference position from the eyeball image captured by the camera C. The eyeball image is an image of the user's eyeball illuminated with infrared light by the light emitting means (LED) L. Here, the specific reference position is the center of the corneal reflection image (Purkinje image), which appears in the eyeball image as light reflected from the corneal surface.

The reference position specifying means 19 specifies the pupil center and the corneal reflection point from the eyeball image using general image processing techniques. For example, it detects the pupil by searching for a region whose luminance is lower than that of other regions and takes the center (centroid) of that region as the pupil center. It then finds the corneal reflection image by searching, within a predetermined range of the pupil center (for example, within the iris), for a region whose luminance is higher than that of other regions, and takes its center (centroid) as the corneal reflection point.
The pupil center and corneal reflection point specified by the reference position specifying means 19 are output to the calibration data generation means 20 and the line-of-sight calculation means 22.
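A minimal sketch of the dark-region and bright-region search follows. It assumes a grayscale image given as a list of rows and uses fixed luminance thresholds; the thresholds, the circular search range, and the function names are all assumptions, since the text only refers to general image processing techniques.

```python
def centroid(points):
    """Center of gravity of a list of (x, y) pixel coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def find_reference_points(image, dark_max, bright_min, radius):
    """Locate the pupil center and corneal reflection point in a grayscale image.

    image      -- list of rows of luminance values
    dark_max   -- assumed threshold: pixels at or below it count as pupil
    bright_min -- assumed threshold: pixels at or above it count as reflection
    radius     -- search range around the pupil center for the reflection
    """
    dark = [(x, y) for y, row in enumerate(image)
            for x, v in enumerate(row) if v <= dark_max]
    if not dark:
        return None  # no pupil candidate found
    pupil = centroid(dark)
    bright = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row)
              if v >= bright_min
              and (x - pupil[0]) ** 2 + (y - pupil[1]) ** 2 <= radius ** 2]
    if not bright:
        return None  # no reflection within the search range
    return pupil, centroid(bright)
```

A practical detector would also need connected-component filtering to reject eyelashes and stray highlights; that is left out here to keep the two-step search visible.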

The calibration data generation means 20 generates calibration data for calibrating the line of sight (gaze point) by associating the object position detected by the object detection means 17 with the pupil center and corneal reflection point specified by the reference position specifying means 19.
That is, assuming that the user is gazing at the position of the object on the screen, it calculates (generates) the difference between the pupil center and the corneal reflection point as calibration data.

Calibration data is described here with reference to FIG. 3, an explanatory diagram of the calibration data.
As shown in FIG. 3, suppose that the object O is displayed at coordinates P (X₁, Y₁) in the XY coordinate system of the screen of the display device D, and that in the xy coordinate system of the eyeball image G of the user gazing at the object O, the difference between the pupil center Q and the corneal reflection point R is x₁ in the x coordinate and y₁ in the y coordinate.
The coordinates P (X₁, Y₁) are then associated with the pair (x₁, y₁) of differences between the pupil center Q and the corneal reflection point R. Calibration data is generated by making this association at two or more different object positions.
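The association between a screen position and an eye-image offset can be sketched as a single record. The function name, the dictionary layout, and the sign convention (pupil center minus reflection point) are assumptions of this illustration.

```python
def make_calibration_sample(object_pos, pupil_center, reflection_point):
    """Pair a screen position P = (X, Y) with the eye-image offset (x, y).

    The offset is taken as pupil center Q minus corneal reflection point R;
    the sign convention is an assumption of this sketch.
    """
    dx = pupil_center[0] - reflection_point[0]
    dy = pupil_center[1] - reflection_point[1]
    return {"screen": object_pos, "offset": (dx, dy)}


sample = make_calibration_sample((640, 360), (52.0, 40.0), (50.0, 43.0))
```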
Returning to FIG. 2, the description of the configuration of the line-of-sight measurement device 1 will be continued.

The calibration data storage means 21 stores the calibration data generated by the calibration data generation means 20 and is a general storage device such as a semiconductor memory. It is organized as a FIFO (First In First Out) buffer that holds only a predetermined number of calibration entries. For example, with a buffer length of 3, only the three most recent calibration entries are kept, and storing another deletes the oldest. Thus, even if incorrect calibration data has been stored, repeating the calibration replaces it with correct data. The stored calibration data is referred to by the line-of-sight calibration means 23 described later.
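The fixed-length FIFO behavior maps directly onto a bounded double-ended queue. The sketch below uses Python's `collections.deque` with `maxlen=3`; the variable names and the placeholder sample labels are illustrative only.

```python
from collections import deque

calibration_store = deque(maxlen=3)  # buffer length 3, as in the example above

for sample in ["c1", "c2", "c3", "c4"]:
    calibration_store.append(sample)  # appending to a full deque drops the oldest entry

remaining = list(calibration_store)  # the three most recent entries
```

With this structure a faulty calibration entry ages out automatically after three further calibrations, with no explicit deletion step.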

The line-of-sight calculation means 22 calculates the user's gaze point, that is, the line of sight with respect to the screen of the display device D, based on the pupil center and corneal reflection point specified by the reference position specifying means 19 and on camera parameters (focal position, camera position, pan angle, tilt angle, and so on). For example, the gaze point can be obtained by deriving, from the pupil center and corneal reflection point in the eyeball image and the camera parameters, a gaze vector that indicates the gaze direction from the eyeball center, and projecting that vector onto the screen of the display device D, which is placed at a known position.
The line-of-sight data, namely the gaze point calculated by the line-of-sight calculation means 22, is output to the line-of-sight calibration means 23.
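The final projection step can be illustrated as a ray-plane intersection. The sketch assumes a coordinate system in which the screen lies in the plane z = 0 and the eyeball center and gaze vector are already expressed in that system; deriving them from the eye image and camera parameters is not shown, and the function name is hypothetical.

```python
def project_to_screen(eye, direction):
    """Intersect a gaze ray with the screen plane z = 0.

    eye       -- (x, y, z) eyeball center in screen-aligned coordinates
    direction -- (dx, dy, dz) gaze vector; dz must point toward the screen
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz == 0:
        return None  # ray is parallel to the screen
    t = -ez / dz
    if t <= 0:
        return None  # screen is behind the viewer
    return (ex + t * dx, ey + t * dy)


gaze_point = project_to_screen((0.0, 0.0, 600.0), (0.5, 0.25, -1.0))
```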

The line-of-sight calibration means 23 calibrates (corrects) the line-of-sight data (gaze point) calculated by the line-of-sight calculation means 22, based on the calibration data stored in the calibration data storage means 21.

The method by which the line-of-sight calibration means 23 calibrates line-of-sight data is described with reference to FIG. 4 (see FIG. 2 as needed for the configuration). FIG. 4 is an explanatory diagram of the calibration method.
The description covers calibrating the coordinates p (x, y) of an arbitrary gaze point to coordinates P (X, Y) based on calibration data for three points (P₁, P₂, P₃).
As calibration data, the coordinates P₁ (X₁, Y₁) are associated with the pair (x₁, y₁) of differences between the pupil center and the corneal reflection point. Likewise, P₂ (X₂, Y₂) is associated with (x₂, y₂), and P₃ (X₃, Y₃) with (x₃, y₃).

When calculating the X coordinate of P, the line-of-sight calibration means 23 uses the two calibration points with the largest X-axis differences between the pupil center and the corneal reflection point; when calculating the Y coordinate of P, it uses the two calibration points with the largest Y-axis differences.
Here, the X-axis differences are assumed to be, in descending order, x₃, x₁, x₂ (x₃ > x₁ > x₂), so calibration along the X axis uses the calibration data at P₁ and P₃. The Y-axis differences are assumed to be, in descending order, y₁, y₂, y₃ (y₁ > y₂ > y₃), so calibration along the Y axis uses the calibration data at P₁ and P₂.
Specifically, the line-of-sight calibration means 23 calibrates the coordinates p (x, y) of an arbitrary gaze point to the coordinates P (X, Y) by equation (1).
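Equation (1) does not survive in this text. One form consistent with the surrounding description, assuming simple linear interpolation between the two calibration points selected for each axis (P₁ and P₃ for X, P₁ and P₂ for Y), is sketched below. It is a reconstruction under that assumption, not the published formula.

```latex
X = X_1 + \frac{X_3 - X_1}{x_3 - x_1}\,(x - x_1), \qquad
Y = Y_2 + \frac{Y_1 - Y_2}{y_1 - y_2}\,(y - y_2)
```

Under this form, substituting (x₁, y₁) returns (X₁, Y₁), as expected for a point that was itself used for calibration.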

Although calibration here used, out of three calibration points, a different pair of points for the X-axis and Y-axis directions, calibration is also possible with only two calibration points. That is, at least two calibration points are sufficient.
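The per-axis selection rule can be sketched in code. The interpolation itself is assumed to be linear, because the calibration equation is not reproduced in the text; the function name and the sample layout are likewise hypothetical.

```python
def calibrate(p, samples):
    """Map a raw point p = (x, y) to screen coordinates, one axis at a time.

    samples -- list of ((X, Y), (x, y)) pairs: screen position and eye-image offset.
    For each axis, the two samples with the largest offset on that axis are used,
    and the mapping between them is assumed to be linear.
    """
    if len(samples) < 2:
        raise ValueError("at least two calibration points are required")
    out = []
    for axis in (0, 1):
        a, b = sorted(samples, key=lambda s: s[1][axis], reverse=True)[:2]
        span = a[1][axis] - b[1][axis]
        if span == 0:
            raise ValueError("calibration points coincide on this axis")
        scale = (a[0][axis] - b[0][axis]) / span
        out.append(b[0][axis] + scale * (p[axis] - b[1][axis]))
    return tuple(out)


# Three samples arranged so that x3 > x1 > x2 and y1 > y2 > y3, as in the text.
samples = [
    ((100, 500), (1.0, 5.0)),  # P1
    ((0, 300), (0.0, 3.0)),    # P2
    ((500, 100), (5.0, 1.0)),  # P3
]
corrected = calibrate((2.0, 4.0), samples)
```

With only two samples the same two points are used for both axes, which corresponds to the two-point case mentioned above.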

The configuration of the line-of-sight measurement device 1 has been described above, but the present invention is not limited to this configuration.
For example, content may carry metadata that describes object positions and keywords in association with time intervals, and the object detection means 17 may detect objects by analyzing that metadata to recognize which object is displayed on the display device D at a given time.

In this case, when the object detection means 17 detects an object from the metadata, it outputs the keyword that the metadata associates with that object, together with an object detection notification, to the keyword registration control means 18. When the metadata indicates that the object is no longer present, it outputs the associated keyword, together with an object non-detection notification, to the keyword registration control means 18.

Also in this case, when a keyword detection notification is input from the keyword detection means 13, the object detection means 17 outputs the object position described in the metadata to the calibration data generation means 20.
This allows the object information storage means 16 to be omitted from the line-of-sight measurement device 1. Moreover, if a broadcasting station transmits content with such metadata attached, the user can calibrate the line of sight while watching ordinary video content, and a television receiver incorporating the line-of-sight measurement device 1 can be operated by gaze.
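Metadata-driven detection reduces to a time-interval lookup. The sketch below assumes metadata entries with `start`, `end`, `keyword`, and `position` fields; this layout is hypothetical, since the text does not define a metadata format.

```python
def objects_at(metadata, t):
    """Return {keyword: position} for objects whose time interval contains t.

    metadata -- list of dicts with assumed fields "start" and "end" (seconds),
                "keyword", and "position" (X, Y)
    """
    return {m["keyword"]: m["position"]
            for m in metadata if m["start"] <= t < m["end"]}


metadata = [
    {"start": 0, "end": 10, "keyword": "star", "position": (100, 80)},
    {"start": 5, "end": 20, "keyword": "moon", "position": (400, 60)},
]
visible = objects_at(metadata, 7)
```

The keys of the returned mapping are the keywords that should currently be registered, and the values are the positions handed to calibration data generation when a keyword is spoken.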

The line-of-sight measurement device 1 extracts objects from the input content, but when the content used has object display positions that are known in advance, the configuration may be simplified as the line-of-sight measurement device 1B shown in FIG. 7.
The keyword storage means 12B of the line-of-sight measurement device 1B shown in FIG. 7 stores in advance the positions and keywords of the objects contained in the content and is a general storage device such as a hard disk. The other components are the same as in the line-of-sight measurement device 1, so they are given the same reference numerals and their description is omitted.

The line-of-sight measurement device 1 can be realized by a program (line-of-sight measurement program) that causes a general computer to function as each of the means described above. The procedure up to the generation of calibration data can also be provided as a line-of-sight calibration data generation program. These programs (the line-of-sight measurement program and the line-of-sight calibration data generation program) can be distributed over a communication line or written to a recording medium such as a CD-ROM.
As described above, the line-of-sight measurement device 1 can calibrate the line of sight using the user's voice, without an operator, and without a dedicated video containing multiple markers.

[Operation of the line-of-sight measurement device]
Next, the operation of the line-of-sight measurement device is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart showing the operation of generating calibration data in the line-of-sight measurement device according to the present invention. FIG. 6 is a flowchart showing the operation of generating calibrated (corrected) line-of-sight data in the line-of-sight measurement device according to the present invention.

(Calibration data generation operation)
First, the operation by which the line-of-sight measurement device 1 generates calibration data is described with reference to FIG. 5 (see FIG. 2 as needed for the configuration).
The line-of-sight measurement device 1 first receives content (video) from outside through the content input means 14 (step S1).
The object detection means 17 then detects objects in the content based on the feature quantities stored in the object information storage means 16 (step S2). The internal state of the object detection means 17 is initialized beforehand to the “undetected state”, meaning that no object has been detected.

The line-of-sight measurement device 1 then determines whether the object detection means 17 succeeded in detecting an object (step S3). If detection succeeded (Yes in step S3), the keyword registration control means 18 reads the keyword corresponding to the detected object from the object information storage means 16, registers (records) it in the keyword storage means 12 (step S4), and the process proceeds to step S7. At this point the object detection means 17 sets its internal state to the “detected state”, indicating that an object has been detected. If the same object had already been detected up to the immediately preceding frame, its keyword is already registered in the keyword storage means 12, so step S4 is skipped and the process proceeds directly to step S7 (not shown).

If detection did not succeed (No in step S3), the object detection means 17 determines from its internal state whether an object had been detected up to the immediately preceding frame (step S5). If so (Yes in step S5), the keyword registration control means 18 deletes the keyword corresponding to the now-undetected object from the keyword storage means 12 (step S6), and the process returns to step S1.
If the undetected state simply continues (No in step S5), the process returns to step S1 without further action.

The voice recognition means 11 receives the user's voice through the microphone M and recognizes it (step S7).
The keyword detection means 13 then determines whether the speech data recognized in step S7 is stored as a keyword in the keyword storage means 12, that is, whether a keyword was uttered (step S8).

If the speech data is not stored as a keyword (No in step S8), the process returns to step S1.
If the speech data is stored as a keyword (Yes in step S8), the object detection means 17 specifies the position of the object corresponding to the detected keyword (step S9).

The reference position specifying means 19 receives the eyeball image captured by the camera C (step S10) and specifies the pupil center and the corneal reflection point from that image (step S11).
The calibration data generation means 20 then generates (calculates) calibration data from the object position specified in step S9 and the pupil center and corneal reflection point specified in step S11, and stores it in the calibration data storage means 21 (step S12).

The line-of-sight measurement device 1 then returns to step S1 and continues detecting objects and keywords for as long as content is input.
In this way, calibration data is stored in the calibration data storage means 21 when the user gazes at an object and utters its keyword. This operation corresponds to the operation of the line-of-sight calibration data generation program.
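The flow of steps S2 to S12 can be condensed into a per-frame function, sketched below. It assumes that object detection, speech recognition, and eye-image analysis have already produced their results for the frame; the function name, the data shapes, and the `state` dictionary are hypothetical simplifications.

```python
from collections import deque


def calibration_step(detected, nouns, eye_points, state):
    """One pass of steps S2 to S12 for a single frame (hypothetical data shapes).

    detected   -- {keyword: (X, Y)} objects found in this frame (S2, S3)
    nouns      -- nouns from the speech recognition result, possibly empty (S7)
    eye_points -- (pupil_center, reflection_point) from the eye image (S10, S11)
    state      -- dict with "keywords" (set) and "store" (deque of samples)
    Returns True when a calibration sample was stored.
    """
    # S4 / S6: keyword storage mirrors what is currently detected.
    state["keywords"] = set(detected)
    # S8: was a registered keyword spoken?
    spoken = next((n for n in nouns if n in state["keywords"]), None)
    if spoken is None:
        return False  # back to S1
    position = detected[spoken]  # S9
    pupil, reflection = eye_points
    offset = (pupil[0] - reflection[0], pupil[1] - reflection[1])
    state["store"].append((position, offset))  # S12
    return True


state = {"keywords": set(), "store": deque(maxlen=3)}
eye = ((5, 5), (4, 6))
silent = calibration_step({"star": (100, 80)}, [], eye, state)
stored = calibration_step({"star": (100, 80)}, ["star"], eye, state)
```

Calling the function once per frame reproduces the loop back to step S1; a keyword spoken while its object is off screen is ignored because it is no longer in the keyword set.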

(Line-of-sight data generation operation)
Next, the operation by which the line-of-sight measurement device 1 generates calibrated line-of-sight data is described with reference to FIG. 6 (see FIG. 2 as needed for the configuration).
First, the reference position specifying means 19 receives the eyeball image captured by the camera C (step S21) and specifies the pupil center and the corneal reflection point from that image (step S22).

The line-of-sight calculation means 22 then calculates the user's gaze point, that is, the line of sight (line-of-sight data) with respect to the screen of the display device, based on the pupil center and corneal reflection point specified in step S22 and the camera parameters input from the camera C (focal position, camera position, pan angle, tilt angle, and so on) (step S23).
The line-of-sight calibration means 23 calibrates (corrects) the line of sight (gaze point) calculated in step S23 based on the calibration data stored in the calibration data storage means 21 (step S24) and outputs the result as line-of-sight data (step S25).
In this way, the line-of-sight measurement device 1 continuously generates and outputs line-of-sight data calibrated for the user.
With the operation described above, the user can calibrate simply by uttering a keyword. Because no dedicated calibration video is used, the user can calibrate at any convenient moment while viewing content.
The combination of the calibration data generation operation (FIG. 5) and this operation (FIG. 6) corresponds to the operation of the line-of-sight measurement program.

FIG. 1 is an explanatory diagram outlining the line-of-sight measurement device according to the present invention.
FIG. 2 is a block diagram showing the configuration of the line-of-sight measurement device according to the present invention.
FIG. 3 is an explanatory diagram of the calibration data in the line-of-sight measurement device according to the present invention.
FIG. 4 is an explanatory diagram of the method of calibrating line-of-sight data in the line-of-sight measurement device according to the present invention.
FIG. 5 is a flowchart showing the operation of generating calibration data in the line-of-sight measurement device according to the present invention.
FIG. 6 is a flowchart showing the operation of generating calibrated (corrected) line-of-sight data in the line-of-sight measurement device according to the present invention.
FIG. 7 is a block diagram showing another configuration of the line-of-sight measurement device according to the present invention.

Explanation of symbols

1 Line-of-sight measurement device
11 Voice recognition means
12 Keyword storage means
13 Keyword detection means
14 Content input means
15 Content output means
16 Object information storage means
17 Object detection means
18 Keyword registration control means
19 Reference position specifying means
20 Calibration data generation means
21 Calibration data storage means
22 Line-of-sight calculation means
23 Line-of-sight calibration means

Claims

A line-of-sight measurement device that measures the line of sight of a user viewing the screen of a display device, comprising:
keyword storage means for storing the position of an object to be gazed at in the video displayed on the screen, in association with a keyword that identifies the object;
voice recognition means for recognizing the voice uttered by the user;
keyword detection means for detecting whether a keyword stored in the keyword storage means is included in the recognition result of the voice recognition means;
calibration data generation means for generating calibration data for calibrating the line of sight, based on the position of the object at the time the keyword detection means detects the keyword; and
line-of-sight calibration means for calibrating the line of sight based on the calibration data generated by the calibration data generation means.

A line-of-sight measurement device that measures the line of sight of a user viewing the screen of a display device, comprising:
keyword storage means for storing a keyword that identifies an object to be gazed at in the video displayed on the screen;
object detection means for detecting the object in the video;
keyword registration control means for registering a keyword in, or deleting it from, the keyword storage means according to whether the object detection means detects the object;
voice recognition means for recognizing the voice uttered by the user;
keyword detection means for detecting whether a keyword registered in the keyword storage means is included in the recognition result of the voice recognition means;
calibration data generation means for generating calibration data for calibrating the line of sight, based on the position of the object detected by the object detection means at the time the keyword detection means detects the keyword; and
line-of-sight calibration means for calibrating the line of sight based on the calibration data generated by the calibration data generation means.

The line-of-sight measurement device according to claim 2, further comprising object information storage means for storing feature quantities that characterize the object in association with the keyword,
wherein the object detection means detects the object in the video based on the feature quantities stored in the object information storage means.

The line-of-sight measurement device according to claim 2, wherein the object detection means detects the object based on metadata in which the position of the object and the keyword are described in association with a time interval of the video.

A line-of-sight measurement program characterized by causing a computer, in order to measure the line of sight of a user viewing the screen of a display device, to function as:
Reference position specifying means for specifying the position of the pupil center and the position of the corneal reflection point from an eyeball image that captures the eyeball of the user while it is irradiated with light;
Voice recognition means for recognizing the voice uttered by the user;
Keyword detection means for detecting whether or not a keyword is included in the recognition result of the voice recognition means, with reference to keyword storage means that stores the position of an object to be watched in the video in association with the keyword identifying the object;
Calibration data generation means for generating calibration data for calibrating the line of sight based on the position of the pupil center, the position of the corneal reflection point, and the position of the object at the time the keyword is detected by the keyword detection means; and
Line-of-sight calibration means for calibrating the line of sight based on the calibration data generated by the calibration data generation means.
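
One way the line-of-sight calibration step could use such calibration data is sketched below. It assumes each sample pairs a pupil-center-minus-corneal-reflection vector with the screen position of the watched object, and it fits an independent linear map per axis by least squares. The linear model and all names are assumptions of this sketch; the claim does not specify the form of the calibration.

```python
def fit_axis(inputs, targets):
    """Least-squares fit of targets ~ gain * inputs + offset for one axis."""
    n = len(inputs)
    mean_in = sum(inputs) / n
    mean_out = sum(targets) / n
    var = sum((v - mean_in) ** 2 for v in inputs)
    if var == 0:
        raise ValueError("need at least two distinct input values")
    cov = sum((v - mean_in) * (t - mean_out) for v, t in zip(inputs, targets))
    gain = cov / var
    return gain, mean_out - gain * mean_in


def fit_calibration(samples):
    """samples: list of ((dx, dy), (sx, sy)) pairs, where (dx, dy) is the
    eye-image vector and (sx, sy) the object's screen position.
    Returns ((gain_x, offset_x), (gain_y, offset_y))."""
    fx = fit_axis([s[0][0] for s in samples], [s[1][0] for s in samples])
    fy = fit_axis([s[0][1] for s in samples], [s[1][1] for s in samples])
    return fx, fy


def apply_calibration(model, vector):
    """Map an eye-image vector to an estimated screen position."""
    (gx, ox), (gy, oy) = model
    return (gx * vector[0] + ox, gy * vector[1] + oy)
```

At least two samples with distinct vectors per axis are needed before the fit is defined, which is why several keyword utterances on differently placed objects would be collected first.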

A line-of-sight calibration data generation program characterized by causing a computer, in order to generate calibration data for calibrating the line of sight of a user viewing the screen of a display device, to function as:
Reference position specifying means for specifying the position of the pupil center and the position of the corneal reflection point from an eyeball image that captures the eyeball of the user while it is irradiated with light;
Voice recognition means for recognizing the voice uttered by the user;
Keyword detection means for detecting whether or not a keyword is included in the recognition result of the voice recognition means, with reference to keyword storage means that stores the position of an object to be watched in the video in association with the keyword identifying the object; and
Calibration data generation means for generating calibration data for calibrating the line of sight based on the position of the pupil center, the position of the corneal reflection point, and the position of the object at the time the keyword is detected by the keyword detection means.
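
The calibration data generation recited in this claim might look like the following sketch. It assumes the keyword storage is a plain mapping from keyword to object position and that one calibration record consists of the vector from the corneal reflection point to the pupil center together with the object position. The record layout and the function name are hypothetical.

```python
def generate_calibration_data(pupil_center, corneal_reflection,
                              recognized_text, keyword_positions):
    """Return one calibration record if a stored keyword occurs in the
    recognition result, otherwise None.

    keyword_positions: {keyword: (x, y)}, a stand-in for the keyword
    storage means (an assumption of this sketch).
    """
    for keyword, target in keyword_positions.items():
        if keyword in recognized_text:
            # Vector from the corneal reflection point to the pupil center.
            vector = (pupil_center[0] - corneal_reflection[0],
                      pupil_center[1] - corneal_reflection[1])
            return {"keyword": keyword, "vector": vector, "target": target}
    return None
```

For example, calling it with a pupil center of `(52, 40)`, a corneal reflection point of `(50, 43)`, the recognized text `"the ball"`, and the stored entry `{"ball": (320, 240)}` yields a record with vector `(2, -3)` and target `(320, 240)`.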