JP4061379B2

JP4061379B2 - Information processing apparatus, portable terminal, information processing method, information processing program, and computer-readable recording medium

Info

Publication number: JP4061379B2
Application number: JP2004344849A
Authority: JP
Inventors: 宗平嶋; 淳夫吉高; 知晃竹村
Original assignee: Hiroshima University NUC
Current assignee: Hiroshima University NUC
Priority date: 2004-11-29
Filing date: 2004-11-29
Publication date: 2008-03-19
Anticipated expiration: 2024-11-29
Also published as: JP2006155238A

Description

The present invention relates to augmented reality, and in particular to an information processing apparatus, a portable terminal, an information processing method, an information processing program, and a computer-readable recording medium capable of adaptively presenting to a user annotation information about an object the user is paying attention to.

Augmented reality (hereinafter abbreviated as AR) is a technology that adds information to the real world by presenting annotation information, such as text or CG, superimposed on or close to a real-world object. It lets the user obtain more information than the object's appearance alone conveys. Because an AR system aims to present information about an object in which the user is interested, it must recognize the object the user is paying attention to. In existing systems, object recognition methods fall broadly into those that tag objects for recognition and those that do not, and both approaches are being actively studied.

Among methods that attach a tag to an object, recognize the object by reading the tag, and present the associated information (see Non-Patent Documents 1 to 4), proposals include attaching a two-dimensional barcode, an RFID tag, or an infrared LED to the object (see Non-Patent Documents 1 to 3) and projecting a tag with infrared light instead of attaching a physical tag (see Non-Patent Document 4). With these methods, the user must perform an explicit action, namely reading the tag with a device such as a tag reader or a camera, before annotation information is presented. In contrast, methods that recognize objects without tagging and present the associated information (see Non-Patent Documents 5 and 6) recognize the object appearing in the image of a camera worn by the user on the basis of images of the object acquired in advance. In these methods, the recognition accuracy is 100% when the entire object appears in the image.
Non-Patent Document 1: 小林元樹, 小池英樹, "EnhancedDesk: a desk-type real-world interface for displaying and manipulating electronic information", Interactive Systems and Software V, Japan Society for Software Science and Technology, WISS 1997.
Non-Patent Document 2: 椎尾一郎, 増井俊之, 福地健太郎, "Real-world interaction with FieldMouse", Interactive Systems and Software VII, Japan Society for Software Science and Technology, WISS 1999.
Non-Patent Document 3: 青木恒, "Infrared tags read by a camera and their applications", Interactive Systems and Software VIII, Japan Society for Software Science and Technology, WISS 2000.
Non-Patent Document 4: 白井良成, 松下光範, 大黒毅, "秘映プロジェクタ: augmenting the real environment with invisible information", Interactive Systems and Software XI, Japan Society for Software Science and Technology, WISS 2003.
Non-Patent Document 5: T. Kurata, T. Okuma, M. Kourogi, T. Kato, and K. Sakaue, "VizWear: Toward Human-Centered Interaction through Wearable Vision and Visualization", The Second IEEE Pacific-Rim Conference on Multimedia, 2001.
Non-Patent Document 6: T. Jebara, B. Schiele, N. Oliver, A. Pentland, "DyPERS: Dynamic Personal Enhanced Reality System", M.I.T. Media Lab. Perceptual Computing Section Technical Report, No. 468, 1998.
Non-Patent Document 7: 池田光男, "眼はなにを見ているか" (What Does the Eye See?), Heibonsha, 1998.

However, because the conventional techniques do not determine whether the user is actually interested in an object before presenting information about it, they may present information the user does not need. Moreover, because they only recognize the object, they cannot detect which part of the object the user is paying attention to.

The present invention has been made in view of the above problems. Its object is to provide an information processing apparatus, a portable terminal, an information processing method, an information processing program, and a computer-readable recording medium capable of accurately presenting to the user annotation information about the object the user is paying attention to.

To solve the above problems, the information processing apparatus of the present invention comprises: eye movement detection means for detecting, on the basis of an eyeball image capturing the user's eye movement, a state in which the user is paying attention to an object; object recognition means for recognizing, when the eye movement detection means detects that the user is in the attention state, the object the user is paying attention to in a view image capturing the user's field of view; attention region determination means for determining the attention region of the object that the user is paying attention to; and annotation information presentation means for presenting to the user annotation information about the attention region determined by the attention region determination means.

According to the above configuration, the object recognition means recognizes the object only after the eye movement detection means has detected that the user is in the attention state, so the object in which the user is genuinely interested can be recognized.

In addition, because the attention region determination means determines the attention region of the object, it is possible to know which part of the object the user is paying attention to. Because the annotation information presentation means presents information about that attention region, annotation information about the region in which the user is genuinely interested can be presented accurately, and unnecessary information is kept from being presented to the user.

Thus, the information processing apparatus of the present invention accurately presents annotation information about the region in which the user is interested, and so offers the user a comfortable environment in which to use the apparatus.

Furthermore, to solve the above problems, in the information processing apparatus of the present invention the eye movement detection means comprises: line-of-sight detection means for extracting a pupil region from the eyeball image; gaze point calculation means for calculating the travel distance of the gaze point from the center coordinates of the pupil region; and attention state detection means for calculating the user's visual angle with respect to the object from the travel distance of the gaze point, detecting the fixation state and the saccade state of the eyeball on the basis of the magnitude of that visual angle, and detecting the state in which the user is paying attention to the object on the basis of the numbers of fixation states and saccade states.

It is known that when a person pays attention to something, the eye alternates frequently between the fixation state and the saccade state. Exploiting this characteristic of eye movement makes it possible to detect accurately that the user is in the attention state.

Accordingly, in the above configuration the eye movement detection means is provided with the line-of-sight detection means, the gaze point calculation means, and the attention state detection means.

That is, the fixation state and the saccade state can be detected from the user's visual angle, and the visual angle can be obtained by calculating the travel distance of the gaze point.

In the information processing apparatus configured as above, the line-of-sight detection means extracts the pupil region, for example by binarization, while the gaze point calculation means calculates the travel distance of the gaze point. The attention state detection means then calculates the user's visual angle from that travel distance and detects the frequency of the fixation and saccade states, so the user's attention state can be detected accurately.

Because the user's attention state can thus be grasped accurately, the object recognition means accurately recognizes the object in which the user is interested, and the annotation information presentation means can present still more accurate annotation information to the user.

Furthermore, to solve the above problems, the information processing apparatus of the present invention is characterized in that the attention state detection means defines a minimum polygon region in the view image by connecting a plurality of gaze points so as to enclose the distribution of gaze points observed while the user is in the attention state, and in that the attention region determination means identifies, among a plurality of regions in a registered image stored in a database, the region whose overlap with the minimum polygon region is greatest, and determines the region in the view image corresponding to that region to be the attention region.

According to the above configuration, the attention state detection means defines the minimum polygon region. Because this region is obtained by connecting gaze points so as to enclose their distribution, it can be said to represent accurately the region of the view image that the user is paying attention to.
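The construction of a polygon that encloses the gaze point distribution can be sketched as a convex hull computation. The sketch below is an illustration only: the text does not name an algorithm, so the choice of Andrew's monotone chain, the function names `cross` and `convex_hull`, and the sample coordinates are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Smallest convex polygon enclosing all gaze points (hypothetical helper).

    Uses Andrew's monotone chain, chosen here for illustration.
    Vertices are returned in counter-clockwise order.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Gaze points in view-image coordinates (made-up sample values).
gaze_points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(gaze_points)
```

Interior gaze points such as (2, 1) drop out, leaving only the vertices of the enclosing polygon.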

The attention region determination means then identifies, in the registered image in the database, the region whose overlap with the minimum polygon region is greatest, and determines the corresponding region in the view image to be the attention region. The region the user is paying attention to can therefore be determined accurately as the attention region.
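A minimal sketch of choosing the region with the greatest overlap follows. It rests on several assumptions that the text does not state: registered regions are modeled as axis-aligned rectangles, the polygon and the regions are already expressed in the same coordinate frame, the polygon is convex with counter-clockwise vertices, and overlap is measured by counting integer pixel positions. The names `inside_convex`, `overlap_pixels`, and `select_attention_region` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[int, int, int, int]  # (u_min, v_min, u_max, v_max), inclusive


def inside_convex(poly: List[Point], p: Point) -> bool:
    """True if p is inside or on a convex polygon with counter-clockwise vertices."""
    n = len(poly)
    for k in range(n):
        ax, ay = poly[k]
        bx, by = poly[(k + 1) % n]
        # A negative cross product means p is to the right of edge a->b.
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < 0:
            return False
    return True


def overlap_pixels(poly: List[Point], rect: Rect) -> int:
    """Number of integer pixel positions lying in both the rectangle and the polygon."""
    u0, v0, u1, v1 = rect
    return sum(
        1
        for u in range(u0, u1 + 1)
        for v in range(v0, v1 + 1)
        if inside_convex(poly, (u, v))
    )


def select_attention_region(poly: List[Point], regions: Dict[str, Rect]) -> Optional[str]:
    """Name of the region with the largest overlap, or None if nothing overlaps."""
    best_name: Optional[str] = None
    best_overlap = 0
    for name, rect in regions.items():
        overlap = overlap_pixels(poly, rect)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name


# Made-up example: a square gaze polygon and two annotated regions.
polygon = [(10, 10), (30, 10), (30, 30), (10, 30)]
regions = {"face": (0, 0, 15, 15), "hands": (20, 20, 40, 40)}
chosen = select_attention_region(polygon, regions)
```

Pixel counting keeps the sketch short; an exact polygon intersection area would serve equally well where regions are not rectangular.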

Furthermore, to solve the above problems, in the information processing apparatus of the present invention the object recognition means preferably recognizes the object the user is paying attention to on the basis of the hue of the view image and the hue of registered images stored in the database as candidates for the view image.

If the view image is an image with color, such as a painting, the objects it contains can be characterized by hue. In the above configuration, the object the user is paying attention to is recognized on the basis of the hue of the view image and the hue of the registered images, so object recognition accuracy can be improved.

Furthermore, to solve the above problems, in the information processing apparatus of the present invention the annotation information presentation means preferably also presents to the user annotation information about regions that form a hierarchical structure with the attention region.

According to the above configuration, annotation information about regions hierarchically related to the attention region is also presented, so the user can obtain more information about the attention region and understand it better.

In addition, mounting on a portable terminal one or both of a first camera that captures the eyeball image and a second camera that captures the view image allows both images to be captured even while the user moves, which further improves the convenience of the information processing apparatus of the present invention.

Furthermore, reproducing the annotation information on the portable terminal allows annotation information to be presented even to a moving user, which improves the convenience of the information processing apparatus of the present invention still further.

To solve the above problems, the information processing method of the present invention comprises: a first step of detecting, on the basis of an eyeball image capturing the user's eye movement, a state in which the user is paying attention to an object; a second step of recognizing, when the first step detects that the user is in the attention state, the object the user is paying attention to in a view image capturing the user's field of view; a third step of determining the attention region of the object that the user is paying attention to; and a fourth step of presenting to the user annotation information about the attention region determined in the third step. The first step consists of: a line-of-sight detection step of extracting a pupil region from the eyeball image; a gaze point calculation step of calculating the travel distance of the gaze point from the center coordinates of the pupil region; and an attention state detection step of calculating the user's visual angle with respect to the object from the travel distance of the gaze point, detecting the fixation state and the saccade state of the eyeball on the basis of the magnitude of that visual angle, and detecting the state in which the user is paying attention to the object on the basis of the numbers of fixation states and saccade states. The attention state detection step further defines a minimum polygon region in the view image by connecting a plurality of gaze points so as to enclose the distribution of gaze points observed while the user is in the attention state. The third step identifies, among a plurality of regions in a registered image stored in a database as a candidate for the view image, the region whose overlap with the minimum polygon region is greatest, and determines the region in the view image corresponding to that region to be the attention region.

The information processing method configured as above provides the same operation and effects as the information processing apparatus of the present invention.

An information processing program that causes a computer to execute each step of the above information processing method provides, on a computer, the same operation and effects as the information processing method of the present invention. Furthermore, storing the information processing program on a computer-readable recording medium allows the program to be executed on any computer.

According to the information processing apparatus of the present invention, annotation information matching the location of the user's attention can be presented adaptively: annotation information about the whole object when the user attends to the object as a whole, and annotation information about a particular part when the user attends to that part.

[1. Configuration of the information processing apparatus]
An information processing apparatus according to one embodiment of the present invention is described below with reference to FIGS. 1 to 16. As shown in FIG. 1, the information processing apparatus 1 includes an eye movement detection unit 2, an object region extraction unit 3, an object recognition unit 4, an attention region determination unit 5, an annotation information presentation unit 6, and a database 7.

To detect that the user is paying attention to something, the eye movement detection unit 2 detects the movement of the user's line of sight from the image of camera 17, which captures the eyeball (hereinafter, the eyeball image). More specifically, the eye movement detection unit 2 includes a line-of-sight detection unit 8, a gaze point calculation unit 9, and an attention state detection unit 10. The processing in each of these blocks is described later.

When the user is paying attention to something, the object region extraction unit 3 extracts an object region from the image of camera 18, which captures the user's field of view (hereinafter, the view image). Various methods can be used to extract the object region. As one example, this description covers the procedure in which the horizontal edge detection unit 11, the vertical edge detection unit 12, the hue division unit 13, and the histogram calculation unit 14 provided in the object region extraction unit 3 extract the object region. The processing in each of these blocks is detailed later.

The object recognition unit 4 matches the image in the region extracted from the view image by the object region extraction unit 3 against the registered images stored in the database 7, and thereby recognizes the image the user is paying attention to. A registered image is an image stored in the database in advance as a candidate for the view image.

As with object region extraction, various methods can be used for this matching. As one example, this description covers a method that uses the hue average calculation unit 15 and the color difference calculation unit 16 provided in the object recognition unit 4 and focuses on the average hue and the color difference of the image extracted by the object region extraction unit 3 and of each registered image in the database 7 (details are given later).
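As a rough illustration of hue-based matching, the sketch below compares the mean hue of an extracted region with a mean hue stored for each registered image and picks the closest. The details of the actual matching are given later in the description, so everything here is an assumption: the use of a circular mean (hue wraps around at 360 degrees), the nearest-hue rule, and the names `mean_hue`, `hue_distance`, and `best_match`.

```python
import colorsys
import math
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]


def mean_hue(pixels: Iterable[RGB]) -> float:
    """Circular mean of the hue of 8-bit RGB pixels, in degrees [0, 360)."""
    sin_sum = cos_sum = 0.0
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sin_sum += math.sin(2 * math.pi * h)
        cos_sum += math.cos(2 * math.pi * h)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


def hue_distance(a: float, b: float) -> float:
    """Shortest angular distance between two hues, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def best_match(view_pixels: Iterable[RGB], registered: Dict[str, float]) -> str:
    """Name of the registered image whose stored mean hue is closest (assumed rule)."""
    h = mean_hue(view_pixels)
    return min(registered, key=lambda name: hue_distance(h, registered[name]))


# Made-up data: a reddish extracted region and two registered images.
region_pixels = [(255, 0, 0), (250, 10, 0)]
registered_hues = {"sunset": 10.0, "ocean": 220.0}
match = best_match(region_pixels, registered_hues)
```

The circular mean avoids the artifact where reds on either side of 0 degrees average to an unrelated hue near 180 degrees.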

The attention region determination unit 5 determines in more detail which region the user is paying attention to within the image that the object recognition unit 4 has recognized as the object of the user's attention. Among the regions of the image registered in the database 7, it judges the user to be paying attention to the one whose overlap with the minimum convex polygon region determined from the gaze point coordinates (described later) is greatest. The determination processing is detailed later.

The annotation information presentation unit 6 reads from the database 7 the annotation information (for example, text, audio, still image, or moving image data) about the region that the attention region determination unit 5 has determined to be the object of the user's attention, and outputs it to an output device provided outside the information processing apparatus 1 (for example, a notebook computer, a head-mounted display, headphones, or a speaker).

The database 7 stores the registered images for which annotation information is provided to the user and, as data about the objects in each registered image, the color of the object, the coordinates of the regions on the object to which annotation information is attached, and the annotation information itself.
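One possible shape for a database record, including regions that form a hierarchy, is sketched below. The class and field names (`AnnotatedRegion`, `RegisteredImage`, `bounds`, `children`) and the rectangular region bounds are hypothetical; the text only states which kinds of data are stored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AnnotatedRegion:
    """A region of a registered image with its annotation (hypothetical layout)."""
    name: str
    bounds: Tuple[int, int, int, int]  # assumed rectangle: (u_min, v_min, u_max, v_max)
    annotation: str
    children: List["AnnotatedRegion"] = field(default_factory=list)


@dataclass
class RegisteredImage:
    """One database entry: image identifier, object color, and region hierarchy."""
    image_id: str
    mean_hue: float  # object color, stored here as a hue in degrees (assumption)
    root: AnnotatedRegion


def annotations_with_ancestors(root: AnnotatedRegion, target: str) -> List[str]:
    """Annotations from the root down to the target region, or [] if absent."""
    if root.name == target:
        return [root.annotation]
    for child in root.children:
        path = annotations_with_ancestors(child, target)
        if path:
            return [root.annotation] + path
    return []


# Made-up record: a painting whose "face" region contains an "eyes" region.
eyes = AnnotatedRegion("eyes", (60, 70, 90, 80), "Note on the eyes.")
face = AnnotatedRegion("face", (50, 50, 100, 100), "Note on the face.", [eyes])
whole = AnnotatedRegion("whole", (0, 0, 159, 119), "Note on the painting.", [face])
record = RegisteredImage("painting-1", 35.0, whole)
notes = annotations_with_ancestors(record.root, "eyes")
```

Walking from the root to the attended region yields the annotations of the region and of every region that contains it, which is one way to present hierarchically related information.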

Camera 17 is, for example, a CCD camera mounted below the eye. Camera 18 is, for example, a CCD camera mounted between the eyebrows. The two cameras 17 and 18 are positioned so that, when the user looks straight ahead, the eyeball and the center of the field of view each appear at the center of the respective camera image.

With the above configuration, the information processing apparatus 1 detects from the user's eye movement the state in which the user is paying attention to an object, and can thereby present annotation information about that object. The series of processes leading up to the presentation of annotation information is described with reference to the flowchart of FIG. 2.

First, as shown in FIG. 2, the eye movement detection unit 2 detects the state in which the user is paying attention to some object (step 1; hereinafter "step" is abbreviated as S). The object region extraction unit 3 then acquires a view image (S2) and extracts an object region from it (S3). Next, the object recognition unit 4 recognizes the image the user is paying attention to (S4), and the attention region determination unit 5 determines in more detail which region of the recognized image the user is paying attention to (S5). The annotation information presentation unit 6 then presents to the user the annotation information about the region so determined (S6), and the presentation ends after a fixed time has elapsed (S7).

Finally, it is determined whether to resume attention detection (S8). If detection is to resume, processing returns to S1; otherwise the series of processes ends. The determination in S8 can be made, for example, by checking whether the user has entered an instruction to resume attention detection through an input device such as a switch, a keyboard, or a mouse.
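The flow from S1 to S8 can be sketched as a loop over injected processing stages. This is an outline under stated assumptions, not the apparatus itself: the function `run_session` and all of its parameters are hypothetical stand-ins for the units described above, and `detect_attention` is assumed to block until attention is detected and then return the gaze points.

```python
from typing import Any, Callable, List, Tuple

Point = Tuple[float, float]


def run_session(
    detect_attention: Callable[[], List[Point]],
    capture_view: Callable[[], Any],
    extract_object_region: Callable[[Any], Any],
    recognize_object: Callable[[Any], str],
    determine_region: Callable[[str, List[Point]], str],
    present: Callable[[str, str], None],
    should_resume: Callable[[], bool],
) -> int:
    """Run S1 to S8 repeatedly; return how many annotations were presented."""
    presented = 0
    while True:
        gaze_points = detect_attention()                  # S1: attention detected
        view = capture_view()                             # S2: acquire view image
        area = extract_object_region(view)                # S3: extract object region
        image_id = recognize_object(area)                 # S4: recognize the image
        region = determine_region(image_id, gaze_points)  # S5: attention region
        present(image_id, region)                         # S6, S7: present, then stop
        presented += 1
        if not should_resume():                           # S8: resume detection?
            return presented


# Stub stages that only record the order in which they are called.
calls: List[str] = []
resume_answers = iter([True, False])
count = run_session(
    detect_attention=lambda: calls.append("S1") or [(80.0, 60.0)],
    capture_view=lambda: calls.append("S2") or "view",
    extract_object_region=lambda view: calls.append("S3") or "area",
    recognize_object=lambda area: calls.append("S4") or "painting-1",
    determine_region=lambda image_id, pts: calls.append("S5") or "face",
    present=lambda image_id, region: calls.append("S6"),
    should_resume=lambda: next(resume_answers),
)
```

Passing the stages in as callables keeps the control flow separate from any particular extraction or matching method, in line with the statement that various methods can be adopted for those steps.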

With the configuration and processing flow described above, the information processing apparatus 1 of this embodiment can accurately present to the user annotation information about the region the user is paying attention to. The processing performed by each block of the information processing apparatus 1 is described in detail below.

[2. Processing in the eye movement detection unit 2]
As described above, the eye movement detection unit 2 detects the movement of the user's line of sight from the eyeball image in order to detect that the user is paying attention to something. In the following description, when the user is detected to be in the attention state, the interval from the start of attention to the end of attention is defined as the "attention interval".

As described above, the eye movement detection unit 2 includes the line-of-sight detection unit 8, the gaze point calculation unit 9, and the attention state detection unit 10 (see FIG. 1). The line-of-sight detection unit 8 detects the user's line of sight, and the gaze point calculation unit 9 calculates the gaze point coordinates corresponding to that line of sight. The gaze point coordinates express, in view-image coordinates, the point in the real world on which the user's line of sight rested (hereinafter, the gaze point). The spread of gaze points within the attention interval is defined as the "gaze point distribution".

[2-1. Processing in the line-of-sight detection unit 8]
The line-of-sight detection unit 8 detects the user's line of sight by detecting the position of the pupil. To obtain a more accurate line-of-sight position, the eyeball is irradiated with infrared light from camera 17, which emphasizes the contrast between the pupil and the iris. The line-of-sight detection unit 8 then extracts the pupil region by binarizing the eyeball image, that is, by separating the pupil from everything else.

The line-of-sight detection unit 8 further calculates the center coordinates (x(t), y(t)) of the pupil region as the position of the pupil at time t (frame). In the eyeball image, the horizontal direction is the x axis, the vertical direction is the y axis, and the origin is at the lower left of the image. The setting values given in the following description are examples for images of 160 pixels horizontally by 120 pixels vertically acquired at 10 frames per second. If the image size is changed, or a camera with characteristics different from those assumed here is used, suitable values can be adopted as appropriate.
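Binarization followed by a centroid calculation can be sketched as follows. The sketch assumes that the pupil appears darker than its surroundings and that rows of the image are stored from top to bottom; the threshold value of 50 and the name `pupil_center` are illustrative choices, not values from the text.

```python
from typing import List, Optional, Tuple


def pupil_center(
    image: List[List[int]], threshold: int = 50
) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of pixels darker than the threshold, or None if there are none.

    `image` is a grayscale image given as rows from top to bottom. The
    returned y is measured from the bottom row, matching an origin at the
    lower left of the image.
    """
    height = len(image)
    sum_x = sum_y = count = 0
    for row_index, row in enumerate(image):
        y = height - 1 - row_index  # convert a top-down row index to a bottom-up y
        for x, value in enumerate(row):
            if value < threshold:  # binarization: pupil versus everything else
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


# Tiny made-up 4x4 image: a dark 2x2 "pupil" in the upper middle.
frame = [
    [255, 10, 10, 255],
    [255, 10, 10, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
center = pupil_center(frame)
```

In practice the same calculation would run on each 160 x 120 frame, giving one (x(t), y(t)) pair per frame.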

[2-2. Processing in the gaze point calculation unit 9]
Next, the gaze point calculation unit 9 obtains the gaze point coordinates (G_u(t), G_v(t)) by equation (1). In the view image, the horizontal direction is the u axis, the vertical direction is the v axis, and the origin is the lower-left corner.

Here, a partial region (40 ≤ x ≤ 120, 40 ≤ y ≤ 80) of the view image and of the eye image is divided into 16 blocks (4 × 4). (D(i,j)_x, D(i,j)_y) (i, j = 1, …, 4) denotes the center coordinates of block (i, j) of the eye image, and (Cv(i,j)_u, Cv(i,j)_v) denotes the center coordinates of block (i, j) of the view image.

Block (i, j) is the i-th block horizontally and the j-th block vertically, counted from the lower left of the image. α(i, j) and β(i, j) are the ratios, in each block, of the distance moved by the gaze point to the distance moved by the pupil center. The distance d moved by the pupil center is calculated in pixels by the gaze point calculation unit 9 according to equation (2).
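Equation (1) is not reproduced in this text, so the following is only one plausible reading of the definitions above: the pupil position is located in an eye-image block, and its offset from that block's center is scaled by α and β and added to the center of the corresponding view-image block. The function name `gaze_point`, the 0-based block indices, and the table layout are assumptions made for illustration.

```python
def gaze_point(x, y, view_centers, alpha, beta,
               x_range=(40.0, 120.0), y_range=(40.0, 80.0), blocks=4):
    """Map a pupil position (x, y) to view-image coordinates (G_u, G_v).

    Assumed form (not confirmed by the source):
        G_u = Cv_u(i, j) + alpha(i, j) * (x - D_x(i, j))
        G_v = Cv_v(i, j) + beta(i, j)  * (y - D_y(i, j))
    view_centers[i][j] is (Cv_u, Cv_v); alpha[i][j] and beta[i][j] are the
    per-block ratios. Indices are 0-based here, unlike the 1-based text.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return None                              # pupil outside calibrated area
    block_w = (x1 - x0) / blocks
    block_h = (y1 - y0) / blocks
    i = min(int((x - x0) // block_w), blocks - 1)
    j = min(int((y - y0) // block_h), blocks - 1)
    d_x = x0 + (i + 0.5) * block_w               # eye-image block center D
    d_y = y0 + (j + 0.5) * block_h
    c_u, c_v = view_centers[i][j]
    return (c_u + alpha[i][j] * (x - d_x), c_v + beta[i][j] * (y - d_y))
```

In practice the tables would come from a per-user calibration; the sketch takes them as inputs rather than guessing how they are obtained.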

[2-3. Processing in the attention state detection unit 10]
Eye movements reflect human attention and interest. It is known that when a person attends to a stationary object, the eye frequently alternates between fixations of about 300 milliseconds and saccades (jumps) that take about 30 milliseconds (see Non-Patent Document 7). The attention state detection unit 10 uses this characteristic of eye movement to detect the state in which the user is paying attention to an object.

Specifically, the attention state detection unit 10 detects the case where the pupil position stays still, or moves only slightly, for three frames (about 300 milliseconds) and then jumps in the fourth frame, and treats this as one fixation/saccade movement. When fixation/saccade movements are detected three or more times in succession, the attention state detection unit 10 determines that the user is in the attention state.

Here, the attention state detection unit 10 regards the eye as fixating if the movement of the pupil position over three frames is less than 2.1° in visual angle, and regards a movement of 2.1° or more from the fixation state as a saccade. If no further saccade occurs within 3 seconds after a saccade, the attention state detection unit 10 judges that the fixation/saccade sequence has ended. The value 2.1° is one example of a threshold for distinguishing fixation from saccade, and other values may be used.

The visual angle θ is calculated according to equation (3).

In equation (3), the value 0.024 is the distance (cm) per pixel in the eye image. It may of course take another value depending on the image resolution. The value 2.0 in equation (3) is the distance (cm) between the eyeball and the camera 17 that images it. It too may be changed according to the imaging conditions.
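Equations (2) and (3) are not reproduced in this text. The sketch below assumes the simplest forms consistent with the constants described: a Euclidean pixel displacement for d, and θ = arctan(0.024·d / 2.0) for the visual angle. Both forms, and all function names, are assumptions rather than the patent's confirmed formulas.

```python
import math

CM_PER_PIXEL = 0.024      # eye-image scale stated in the description
EYE_TO_CAMERA_CM = 2.0    # eye-to-camera distance stated in the description


def pupil_displacement(prev, cur):
    """Assumed equation (2): Euclidean distance in pixels between frames."""
    return math.hypot(cur[0] - prev[0], cur[1] - prev[1])


def visual_angle_deg(d_pixels, cm_per_pixel=CM_PER_PIXEL,
                     distance_cm=EYE_TO_CAMERA_CM):
    """Assumed equation (3): angle subtended by the displacement, in degrees."""
    return math.degrees(math.atan(d_pixels * cm_per_pixel / distance_cm))


def is_saccade(theta_deg, threshold_deg=2.1):
    """A movement of 2.1 degrees or more counts as a saccade."""
    return theta_deg >= threshold_deg
```

Under these assumptions the 2.1° threshold falls between a 3-pixel and a 4-pixel displacement of the pupil center.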

Each gaze point coordinate generated in an attention interval is calculated from the pupil position while the user was in the attention state, so it appears on or near the object the user is attending to. The attention state detection unit 10 therefore connects gaze point coordinates so as to enclose the gaze point distribution of one attention interval, that is, the period from the detected start of attention to its detected end, and thereby defines a minimum convex polygon region (minimum polygon region). This minimum convex polygon region consequently contains all or part of the attended object.
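The minimum convex polygon enclosing a set of gaze points is their convex hull. The source does not say which algorithm is used. The sketch below uses Andrew's monotone chain as one common choice, and the function names are illustrative.

```python
def _cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order.

    points is an iterable of (u, v) gaze coordinates in the view image.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:                       # build the lower chain left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):             # build the upper chain right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop duplicated endpoints
```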

[2-4. Processing flow in the eyeball motion detection unit 2]
Next, the processing executed in the eyeball motion detection unit 2 to detect that the user is paying attention to something is described with reference to the flowchart of FIG. 3.

First, the attention state detection unit 10 resets the consecutive count to 0 (S10) and also resets the fixation count to 0 (S11). The consecutive count is the number of fixation/saccade movements that have occurred in succession, and the term is used in this sense below.

The line-of-sight detection unit 8 then acquires an eye image from the camera 17 (S12) and binarizes the eye image acquired in S12 to extract the pupil region (S13).

Next, the gaze point calculation unit 9 calculates the distance moved by the pupil center using equation (2) (S14), and calculates the gaze point coordinates (G_u(t), G_v(t)) from the pupil-region center coordinates (x(t), y(t)) obtained by the line-of-sight detection unit 8, using equation (1) (S15).

The attention state detection unit 10 then calculates the visual angle θ using equation (3) (S16) and determines whether it is 2.1° or more (S17). If θ is less than 2.1°, the attention state detection unit 10 judges that the user is fixating and increments the fixation count by one (S18). After S18, processing returns to S12.

If, on the other hand, θ is found in S17 to be 2.1° or more, the attention state detection unit 10 judges that the user's line of sight has jumped (S19) and determines whether the fixation count so far is three or more (S20).

If the fixation count is found in S20 to be less than three, the attention state detection unit 10 resets the fixation count to 0 (S21) and processing returns to S12. If the fixation count is found in S20 to be three or more, the attention state detection unit 10 resets the fixation count to 0 (S22) and then judges that the user is in the fixation/saccade state (S23).

After S23, the attention state detection unit 10 determines whether the next saccade occurs within 3 seconds of the most recent one (S24). If it does, the unit judges that the user's fixation/saccade movements are continuing and increments the consecutive count by one (S25). After S25, processing returns to S12.

If the next saccade is found in S24 not to have occurred within 3 seconds, the attention state detection unit 10 determines whether the consecutive count is three or more (S26). If it is less than three, the unit resets the consecutive count to 0 (S27), and processing returns to S12. If it is three or more, the unit judges that the user is in the attention state (S28).

Through steps S10 to S28, the attention state detection processing by the eyeball motion detection unit 2 is completed.
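The flow of S10 to S28 can be approximated by a small state machine over per-frame visual angles. This is a simplified reading of the flowchart, not a transcription of FIG. 3: it counts a fixation/saccade movement at the saccade that ends a qualifying fixation, treats the end of the input as the end of the sequence, and measures the 3-second timeout in frames. The function name `detect_attention` and its parameters are hypothetical.

```python
def detect_attention(angles_deg, fps=10, threshold_deg=2.1,
                     min_fix_frames=3, min_repeats=3, timeout_s=3.0):
    """Return True if the angle sequence contains an attention state.

    angles_deg holds one visual angle per frame (S16). A saccade that
    follows at least `min_fix_frames` fixation frames counts as one
    fixation/saccade movement (S19 to S23, S25).
    """
    timeout_frames = int(timeout_s * fps)
    fixation_count = 0            # S11
    consecutive = 0               # S10
    frames_since_saccade = None   # None until a qualifying saccade occurs
    for theta in angles_deg:
        if frames_since_saccade is not None:
            frames_since_saccade += 1
            if frames_since_saccade > timeout_frames:   # S24: timed out
                if consecutive >= min_repeats:          # S26
                    return True                         # S28
                consecutive = 0                         # S27
                frames_since_saccade = None
        if theta < threshold_deg:                       # S17: fixation
            fixation_count += 1                         # S18
            continue
        if fixation_count >= min_fix_frames:            # S20
            consecutive += 1
            frames_since_saccade = 0
        fixation_count = 0                              # S21 / S22
    return consecutive >= min_repeats
```

With the default 10 frames per second, three fixation frames correspond to the roughly 300 milliseconds mentioned in the description.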

[3. Processing in the object region extraction unit 3]
As described above, the object region extraction unit 3 extracts an object region from the view image. In this embodiment, paintings exhibited in a museum or the like are taken as an example of objects, and a method of extracting the painting region that appears in the view image at the time of attention is described.

How the object region extraction unit 3 extracts the painting region is outlined first. In a view image showing a painting, straight edges appear horizontally and vertically at the boundary between the wall and the picture frame. The object region extraction unit 3 detects the horizontal edges with the horizontal edge detection unit 11 and the vertical edges with the vertical edge detection unit 12.

In addition, the hue in the HSV color system changes at the boundary between the wall and the frame. The object region extraction unit 3 detects this change in hue with the hue division unit 13 and the histogram calculation unit 14, and extracts the painting region from the view image on the basis of the straight edges and the hue boundaries.

Next, the series of processes in the object region extraction unit 3 is described with reference to the flowchart of FIG. 4. As shown in FIG. 4, first, the horizontal edge detection unit 11, the vertical edge detection unit 12, or both generate an edge image from the grayscale version of the view image, using for example a Roberts filter (S30). After S30, the horizontal edge detection unit 11 calculates the lengths of the straight edges that appear in the horizontal direction (S31). After S31, the vertical edge detection unit 12 calculates the lengths of the straight edges that appear in the vertical direction (S32).
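Step S30 can be illustrated with the Roberts cross operator, which differences diagonal neighbors in each 2 × 2 window. The description names the filter only as an example and gives no threshold, so the threshold value, the use of the sum of absolute responses as the magnitude, and the function name `roberts_edges` are assumptions.

```python
def roberts_edges(gray, threshold=30):
    """Return a boolean edge image from a grayscale image (list of rows).

    For each pixel the two Roberts cross responses are
        gx = I(y, x)     - I(y + 1, x + 1)
        gy = I(y, x + 1) - I(y + 1, x)
    and the pixel is marked as an edge when |gx| + |gy| >= threshold.
    The last row and column have no full 2x2 window and stay False.
    """
    height, width = len(gray), len(gray[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height - 1):
        for x in range(width - 1):
            gx = gray[y][x] - gray[y + 1][x + 1]
            gy = gray[y][x + 1] - gray[y + 1][x]
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = True
    return edges
```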

S32 need not follow S31: S32 may be executed before S31, or the two may run in parallel. The details of S31 and S32 are described later.

Furthermore, the horizontal edge detection unit 11 detects the longer edges among those adjacent in the horizontal direction and takes them as frame edge candidates. Likewise, the vertical edge detection unit 12 detects the longer edges among those adjacent in the vertical direction and takes them as frame edge candidates. In the flowchart of FIG. 4, these edge candidate detection processes by the horizontal edge detection unit 11 and the vertical edge detection unit 12 are shown together as S33.

The hue division unit 13 then divides the hue of the edge image into seven classes (S34), and the histogram calculation unit 14 calculates hue histograms from the frame edge candidates (S35). The details of S34 and S35 are described later.

Next, for every frame edge candidate, the histogram calculation unit 14 calculates the difference CS_e between the hue histograms of the regions on either side of the candidate, using equation (4) (S36).

Here, hist_ln is the hue histogram of the region shifted one pixel to the left of (or above) the frame edge candidate, and hist_rn is the hue histogram of the region shifted one pixel to the right of (or below) it. n is an integer from 1 to 7 that indicates the hue class. Experiments have confirmed that, with seven hue classes, the hue separates roughly at the boundaries between the wall, the frame, and the painting that appear in the view image.

Once CS_e has been calculated by equation (4), the horizontal edge detection unit 11 detects the two horizontal edges with the largest CS_e, and the vertical edge detection unit 12 detects the two vertical edges with the largest CS_e (S37). The object region extraction unit 3 then extracts the region enclosed by the four edges detected in S37 as the painting region (S38).
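Equation (4) is not reproduced in this text. The sketch below assumes CS_e is the sum over the seven hue classes of the absolute difference between hist_ln and hist_rn, and then picks the two candidates with the largest value in one direction (S37). The assumed formula, the function names, and the candidate tuple layout are illustrative only.

```python
def hue_histogram_difference(hist_l, hist_r):
    """Assumed equation (4): sum of |hist_ln - hist_rn| over hue classes n."""
    return sum(abs(left - right) for left, right in zip(hist_l, hist_r))


def top_two_edges(candidates):
    """Pick the two edge candidates with the largest CS_e.

    candidates is a list of (position, hist_l, hist_r) tuples for one
    direction, where position is the row (or column) of the candidate.
    Returns the two positions sorted in increasing order.
    """
    ranked = sorted(
        candidates,
        key=lambda c: hue_histogram_difference(c[1], c[2]),
        reverse=True,
    )
    return sorted(position for position, _, _ in ranked[:2])
```

Running `top_two_edges` once for the horizontal candidates and once for the vertical candidates yields the four lines that bound the painting region in S38.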

Through the series of steps S30 to S38 described above, the object region extraction unit 3 extracts the painting region from the view image as the object region.

[3-1. Calculation of horizontal edge lengths]
Next, the details of S31 in the flow of FIG. 4 are described. Suppose that a view image such as that shown in FIG. 5(a) is obtained. In this case, as shown in FIG. 6, the horizontal edge detection unit 11 performs the following processing for each straight line y = a (1 ≤ a ≤ 118, a an integer) in the view image.

First, the horizontal edge detection unit 11 resets to 0 the number of edge pixels that are consecutive on the line y = a (the consecutive pixel count) (S40). Then, with a held fixed, the horizontal edge detection unit 11 varies b, the x coordinate, over the integers from 1 to 159 and performs S41 to S45 below for each value.

In S41, the horizontal edge detection unit 11 determines whether any of the pixels at coordinates (b, a-1), (b, a), and (b, a+1) is an edge pixel.

If the determination in S41 is YES, that is, if any of the pixels at (b, a-1), (b, a), and (b, a+1) is an edge pixel, the horizontal edge detection unit 11 regards the pixel at (b, a) as an edge (S42). The horizontal edge detection unit 11 then determines whether the pixels at (b-1, a) and (b, a) are both edge pixels (S43).

The pixels at (b, a-1) and (b, a+1) are also taken into account because the edge being sought is only roughly a straight line. If the determination in S41 is NO, that is, if none of the pixels at (b, a-1), (b, a), and (b, a+1) is an edge pixel, the horizontal edge detection unit 11 proceeds to S43 without performing S42.

If the pixels at (b-1, a) and (b, a) are found in S43 to be both edge pixels, the horizontal edge detection unit 11 increments the consecutive pixel count by one (S44). If they are found in S43 not to be both edge pixels, the horizontal edge detection unit 11 resets the consecutive pixel count to 0 (S45).

S41 to S45 are thus performed while b is varied over the integers from 1 to 159. The horizontal edge detection unit 11 then takes the largest consecutive pixel count obtained over these values as the length of the straight edge on y = a (S46).

S40 to S46 described above are performed while a is varied over the integers from 1 to 118. The horizontal edge lengths obtained in this way are shown in the graph on the right of FIG. 5(b).
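Steps S40 to S46 amount to a tolerant run-length count along each row. The sketch below follows the steps as described, with one stated design choice: instead of writing the "regarded as an edge" result of S42 back into the image, it evaluates the three-pixel test on the fly so that processing one row cannot alter the input for the next. Function names are illustrative.

```python
def longest_horizontal_run(edges, a):
    """Length of the longest roughly straight edge on row y = a.

    edges is a boolean image indexed as edges[y][x]. A pixel (b, a) counts
    as an edge if any of (b, a-1), (b, a), (b, a+1) is an edge (S41, S42).
    The count grows by one for each adjacent pair of edge pixels (S43, S44)
    and resets otherwise (S45), so n pixels in a row give a count of n - 1.
    """
    width = len(edges[0])
    run = best = 0                               # S40
    prev = edges[a][0]                           # column 0 is used as-is
    for b in range(1, width):
        cur = edges[a - 1][b] or edges[a][b] or edges[a + 1][b]
        if prev and cur:
            run += 1
            best = max(best, run)                # S46 keeps the maximum
        else:
            run = 0
        prev = cur
    return best


def horizontal_edge_lengths(edges):
    """Edge length for every interior row, a = 1 .. height - 2."""
    return {a: longest_horizontal_run(edges, a)
            for a in range(1, len(edges) - 1)}
```

For a 160 × 120 image the loops cover b = 1 to 159 and a = 1 to 118, matching the ranges in the description. The vertical case of section 3-2 can be obtained by applying the same function to the transposed image.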

[3-2. Calculation of vertical edge lengths]
Next, the details of S32 in the flow of FIG. 4 are described. Suppose that a view image such as that shown in FIG. 5(a) is obtained. In this case, as shown in FIG. 7, the vertical edge detection unit 12 performs the following processing for each straight line x = b (1 ≤ b ≤ 158, b an integer) in the view image.

First, the vertical edge detection unit 12 resets to 0 the number of edge pixels that are consecutive on the line x = b (the consecutive pixel count) (S50). Then, with b held fixed, the vertical edge detection unit 12 varies a, the y coordinate, over the integers from 1 to 119 and performs S51 to S55 below for each value.

In S51, the vertical edge detection unit 12 determines whether any of the pixels at coordinates (b-1, a), (b, a), and (b+1, a) is an edge pixel.

If the determination in S51 is YES, that is, if any of the pixels at (b-1, a), (b, a), and (b+1, a) is an edge pixel, the vertical edge detection unit 12 regards the pixel at (b, a) as an edge (S52). The vertical edge detection unit 12 then determines whether the pixels at (b, a-1) and (b, a) are both edge pixels (S53).

If the determination in S51 is NO, that is, if none of the pixels at (b-1, a), (b, a), and (b+1, a) is an edge pixel, the vertical edge detection unit 12 proceeds to S53 without performing S52.

If the pixels at (b, a-1) and (b, a) are found in S53 to be both edge pixels, the vertical edge detection unit 12 increments the consecutive pixel count by one (S54). If they are found in S53 not to be both edge pixels, the vertical edge detection unit 12 resets the consecutive pixel count to 0 (S55).

S51 to S55 are thus performed while a is varied over the integers from 1 to 119. The vertical edge detection unit 12 then takes the largest consecutive pixel count obtained over these values as the length of the straight edge on x = b (S56).

S50 to S56 described above are performed while b is varied over the integers from 1 to 158.

[3-3. Hue division and histogram calculation]
Next, the details of S34 and S35 in the flow of FIG. 4 are described. First, as shown in FIG. 8, the hue division unit 13 calculates a 360-level hue histogram from the entire view image and detects the hue with the highest frequency (S60). The hue division unit 13 then sets the range of ±25° centered on the hue detected in S60 as one class (S61).

The hue division unit 13 then divides the remaining range, outside the range set in S61, into six equal parts, so that the hue is divided into seven classes (S62).

When S62 is finished, the histogram calculation unit 14 calculates, for every edge candidate set in S33 (see FIG. 4), histograms of the hue divided into the seven classes for the regions on both sides of the edge candidate, as shown in FIG. 9 (S63). This completes S34 and S35 of FIG. 4.
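The seven-class hue division of S60 to S62 can be sketched as follows: the ±25° band around the most frequent hue is one class (50° wide), and the remaining 310° is split into six equal classes of about 51.7° each. The class numbering (0 for the dominant band, then 1 to 6 going around the hue circle) and the function names are choices made for this illustration. The text numbers classes 1 to 7 without fixing an order.

```python
def dominant_hue(hues):
    """S60: most frequent hue in a 360-level histogram (hues in degrees)."""
    hist = [0] * 360
    for h in hues:
        hist[int(h) % 360] += 1
    return max(range(360), key=hist.__getitem__)


def hue_class(hue, dominant, half_width=25.0, classes=7):
    """S61, S62: map a hue in degrees to one of seven classes (0 .. 6)."""
    band = 2 * half_width
    offset = (hue - dominant + half_width) % 360.0   # 0 at dominant - 25 deg
    if offset <= band:
        return 0                                     # the +/-25 degree class
    step = (360.0 - band) / (classes - 1)            # 310 / 6 degrees
    return min(1 + int((offset - band) // step), classes - 1)


def class_histogram(hues, dominant, classes=7):
    """S63: seven-class hue histogram of a set of pixels."""
    hist = [0] * classes
    for h in hues:
        hist[hue_class(h, dominant, classes=classes)] += 1
    return hist
```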

[4. Processing in the object recognition unit]
Next, the details of the processing in the object recognition unit 4 are described. The object recognition unit 4 matches the image appearing in the region extracted from the view image by the object region extraction unit 3 against the registered images stored in the database 7, and thereby recognizes the image the user is attending to. The description here takes as an example the case where the attended image is a painting and is matched against painting images stored in the database 7 as registered images.

First, as shown in FIG. 10, the object recognition unit 4 divides the painting region extracted from the view image by the object region extraction unit 3 into 8 equal parts horizontally and 6 equal parts vertically, giving 48 blocks (S70). The hue average calculation unit 15 in the object recognition unit 4 then calculates the average hue of each block (S71).

As shown in FIG. 11, the average hue in S71 is calculated by dividing the sum of the hues of the pixels in a block by the number of pixels in that block, and this is done for every block.

Then, as shown in FIG. 10, the color difference calculation unit 16 calculates, for every painting image in the database 7, the color difference CS_pi over all blocks from the color differences of the individual blocks, using equation (5).

Equation (5) uses two quantities for each block (j, k) (j = 1, …, 8; k = 1, …, 6): the average hue of block (j, k) in the painting region extracted from the view image, and the average hue of block (j, k) in the i-th painting image in the database 7.
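The block averaging of S70 and S71 and the comparison against the database can be sketched as below. Equation (5) is not reproduced in this text, so the color difference is assumed here to be the sum of absolute differences of block averages, and the registered image with the smallest difference is assumed to be the match. The source states neither point explicitly. Function names and the dictionary layout of the database are hypothetical.

```python
def block_hue_means(hue_image, cols=8, rows=6):
    """S70, S71: average hue of each of the 8 x 6 blocks.

    hue_image is a list of rows of hue values and must be at least
    `cols` pixels wide and `rows` pixels high. The plain arithmetic mean
    is used, as in the description (no circular averaging).
    """
    height, width = len(hue_image), len(hue_image[0])
    means = []
    for k in range(rows):
        y0, y1 = k * height // rows, (k + 1) * height // rows
        row = []
        for j in range(cols):
            x0, x1 = j * width // cols, (j + 1) * width // cols
            values = [hue_image[y][x]
                      for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(values) / len(values))
        means.append(row)
    return means


def color_difference(means_a, means_b):
    """Assumed equation (5): sum of |mean_a - mean_b| over all blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(means_a, means_b)
               for a, b in zip(row_a, row_b))


def best_match(query_means, database):
    """Return the key of the registered image with the smallest difference."""
    return min(database,
               key=lambda name: color_difference(query_means, database[name]))
```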

Alternatively, object recognition and attention region determination may be performed by attaching tags such as two-dimensional barcodes to objects or regions.

[5. Presentation of annotation information based on the gaze point distribution]
In the information processing apparatus 1 of this embodiment, the attention region determination unit 5 determines the attention region within the object, and the annotation information presentation unit 6 presents the annotation information and hierarchy information corresponding to that region. For each registered image in the database 7, the following are defined: color information, the vertex coordinates of the convex polygon regions to which annotation information is attached, the annotation information itself, and hierarchy information on the regions that make up the image.

FIG. 12(a) shows the regions to which annotation information is attached when the registered image is, for example, a painting image, and FIG. 12(b) shows the structure of the hierarchy information for that painting image.

The painting shown in FIG. 12(a) can be divided roughly into two regions, region 1 (house) and region 2 (forest). Region 1 (house) can be subdivided further into region 11 (roof) and region 12 (veranda). As shown in FIG. 12(b), the hierarchy information therefore has a tree structure in which regions 11 and 12 sit below region 1.
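The tree of FIG. 12(b) can be represented with nested dictionaries, for example. The region numbers and labels come from the description. The storage layout and the helper `path_to` are assumptions for illustration and say nothing about how the database 7 actually stores the hierarchy.

```python
# Hierarchy information for the example painting (regions from FIG. 12).
REGION_TREE = {
    "1": {"label": "house", "children": {
        "11": {"label": "roof", "children": {}},
        "12": {"label": "veranda", "children": {}},
    }},
    "2": {"label": "forest", "children": {}},
}


def path_to(region_id, tree, prefix=()):
    """Return the chain of region ids from the top level down to region_id.

    Returns None when the region is not in the tree.
    """
    for rid, node in tree.items():
        path = prefix + (rid,)
        if rid == region_id:
            return path
        found = path_to(region_id, node["children"], path)
        if found is not None:
            return found
    return None
```

A lookup such as `path_to("12", REGION_TREE)` gives the ancestors of the veranda region, which is the kind of information a presenter could use to show annotations at a coarser or finer level.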

The attention region determination performed by the attention region determination unit 5 and the annotation information presentation performed by the annotation information presentation unit 6 are described more concretely below, taking the case where the registered image is a painting image as an example.

[5-1. Processing in the attention region determination unit 5]
After the processing in the object region extraction unit 3 and the object recognition unit 4, the attention region determination unit 5 determines the attention region on the basis of the minimum convex polygon region containing the gaze point distribution of the attention interval and the position information of the regions to which annotation information is attached for the painting in the database.

The painting region extracted from the view image differs in size from the painting region in the database 7. The attention region determination unit 5 therefore normalizes the size of the painting region on the basis of the reduction or enlargement ratio of the extracted painting region relative to the painting region in the database 7. The attention region determination unit 5 then determines the attention region contained in the minimum convex polygon region according to the flowchart shown in FIG. 13.

First, the attention region determination unit 5 obtains the horizontal and vertical distances from the lower left of the painting region to each vertex of the minimum convex polygon region, and thus calculates the position of the minimum convex polygon region on the painting region (S80).

For the painting region in the database 7, the attention region determination unit 5 likewise calculates the horizontal and vertical distances from the lower left of the painting region to each vertex of the convex polygon regions to which annotation information is attached (S81).

なお、Ｓ８１の処理は、データベース７内における絵画画像で、予め注釈情報が定義された領域について行われる。また、Ｓ８１の処理は必ずしもＳ８０の後に実行される必要はなく、予め行われるものであっても構わない。また、注目領域判別部５以外の処理部により、Ｓ８１が行われても構わない。 Note that the process of S81 is performed on a region in which annotation information is defined in advance for a painting image in the database 7. Moreover, the process of S81 does not necessarily need to be performed after S80, and may be performed in advance. Further, S81 may be performed by a processing unit other than the attention area determination unit 5.

そして、注目領域判別部５は、データベース７中の絵画領域の大きさを基準として、視界画像から抽出した絵画領域の大きさを正規化する（Ｓ８２）。また、Ｓ８０で距離を算出した絵画領域上での最小凸多角形領域の各頂点について、正規化後の位置を算出する（Ｓ８３）。 Then, the attention area determination unit 5 normalizes the size of the painting area extracted from the view field image with reference to the size of the painting area in the database 7 (S82). Further, a normalized position is calculated for each vertex of the minimum convex polygonal area on the painting area whose distance has been calculated in S80 (S83).

そして、注目領域判別部５は、視界画像の絵画領域における最小凸多角形領域と、データベース７中の注釈情報を付加された各領域との重なりがどの程度の大きさであるか判断し、その重なりが最大となる領域を、ユーザが注目した領域として判別する（Ｓ８４）。 Then, the attention area determination unit 5 determines how much the overlap between the minimum convex polygonal area in the painting area of the view field image and each area to which the annotation information is added in the database 7 is, The region where the overlap is maximum is determined as the region noted by the user (S84).
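Steps S80 to S84 can be illustrated with the following minimal Python sketch. It is not the implementation of this embodiment. It assumes that all polygons are convex with vertices in counter-clockwise order, that coordinates are already measured from the lower-left corner of the painting area (S80, S81), and that normalization (S82, S83) is a per-axis scaling by the ratio of the database size to the extracted size. The function names, the dictionary of annotated areas, and the use of Sutherland-Hodgman clipping to measure the overlap are all assumptions introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # assumed convex, counter-clockwise vertex order


def polygon_area(poly: Polygon) -> float:
    """Shoelace formula; degenerate polygons have area 0."""
    if len(poly) < 3:
        return 0.0
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _side(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: Polygon, clip: Polygon) -> Polygon:
    """Intersection of two convex polygons (Sutherland-Hodgman clipping)."""
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            sp, sq = _side(a, b, p), _side(a, b, q)
            if sp >= 0:  # p is inside or on the clip edge
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):  # edge p-q crosses
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def normalize(poly: Polygon, extracted_size: Point, db_size: Point) -> Polygon:
    """S82/S83: scale vertex positions to the size of the database painting."""
    sx = db_size[0] / extracted_size[0]
    sy = db_size[1] / extracted_size[1]
    return [(x * sx, y * sy) for x, y in poly]


def find_attention_region(gaze_polygon: Polygon, extracted_size: Point,
                          db_size: Point,
                          annotated: Dict[str, Polygon]) -> Optional[str]:
    """S84: name of the annotated area with the largest overlap, or None."""
    scaled = normalize(gaze_polygon, extracted_size, db_size)
    best_name, best_area = None, 0.0
    for name, region in annotated.items():
        area = polygon_area(clip_convex(scaled, region))
        if area > best_area:
            best_name, best_area = name, area
    return best_name
```

In this sketch, a gaze polygon that overlaps no annotated area, or that degenerates to fewer than three vertices, yields `None`; how the embodiment handles that case is not stated in this section.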

[5-2. Processing in the annotation information presentation unit 6]
After the attention area determination unit 5 has determined the attention area from the gaze point distribution, the annotation information presentation unit 6 presents annotation information about that attention area. For example, if the attention area determination unit 5 determines that the user is paying attention to the mansion in the painting image, the annotation information presentation unit 6 displays information about the mansion, as shown in FIG. 14, on an output device provided outside the information processing apparatus 1 (a notebook PC, a head-mounted display, or the like). As FIG. 14 also shows, information about the painting itself (author, author name) may be displayed as annotation information.

In addition to the annotation information about the attention area, the annotation information presentation unit 6 may present hierarchical information about the attention area. That is, by presenting to the user the annotation information of the areas that are the parent or children of the attention area in the hierarchical structure indicated by the hierarchical information, the unit can give the user clues for obtaining further information related to the attention area. The closer the annotated area displayed in this way is to the attention area in the hierarchy, the more precisely its information supplements the information about the attention area.
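As an illustration of presenting the annotations of hierarchically adjacent areas, the following Python sketch stores each annotated area as a tree node and collects the annotations of the attention area, its parent, and its direct children. The `Region` class, its field names, and the example annotations are hypothetical; this section does not specify how the hierarchical information is stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    """One annotated area in the hierarchy (hypothetical structure)."""
    name: str
    annotation: str
    # Excluded from repr/compare to avoid infinite recursion through the tree.
    parent: Optional["Region"] = field(default=None, repr=False, compare=False)
    children: List["Region"] = field(default_factory=list)

    def add_child(self, child: "Region") -> "Region":
        child.parent = self
        self.children.append(child)
        return child


def related_annotations(region: Region) -> List[str]:
    """Annotations of the attention area, its parent, and its direct children."""
    lines = [f"{region.name}: {region.annotation}"]
    if region.parent is not None:
        lines.append(f"parent {region.parent.name}: {region.parent.annotation}")
    for child in region.children:
        lines.append(f"child {child.name}: {child.annotation}")
    return lines


# Usage: the whole painting is the root; the mansion is one of its children.
painting = Region("painting", "information about the painting itself")
mansion = painting.add_child(Region("mansion", "information about the mansion"))
mansion.add_child(Region("door", "information about the door"))
```

Restricting the output to the direct parent and children reflects the remark that areas closer in the hierarchy supplement the attention area more precisely.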

While annotation information is being presented on the screen of the display device, the information processing apparatus 1 may refrain from analyzing eye movement. Detection of the user keeping the eyes closed for a certain period may then serve as the trigger for resuming eye movement analysis. The user can thus receive annotation information again simply by closing the eyes, which improves the convenience of the information processing apparatus 1.
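The suspend-and-resume behavior described above can be sketched as a small state holder, shown below in Python. The class name, the per-frame `eye_closed` flag, and the default of 10 consecutive closed frames are assumptions for illustration; the text only says that the eyes are closed "for a certain period" and does not state how eye closure is detected.

```python
class GazeAnalysisGate:
    """Suspends eye movement analysis while an annotation is displayed and
    resumes it after the eye stays closed for enough consecutive frames."""

    def __init__(self, closed_frames_to_resume: int = 10):
        # Hypothetical threshold, expressed in eye-camera frames.
        self.closed_frames_to_resume = closed_frames_to_resume
        self.analysis_enabled = True
        self._closed_run = 0

    def on_annotation_shown(self) -> None:
        """Call when annotation information appears on the display."""
        self.analysis_enabled = False
        self._closed_run = 0

    def on_frame(self, eye_closed: bool) -> bool:
        """Feed one eye-camera frame; returns True if analysis should run."""
        if self.analysis_enabled:
            return True
        # Count consecutive closed frames; any open frame resets the count.
        self._closed_run = self._closed_run + 1 if eye_closed else 0
        if self._closed_run >= self.closed_frames_to_resume:
            self.analysis_enabled = True
            self._closed_run = 0
        return self.analysis_enabled
```

Requiring consecutive closed frames, rather than a total count, keeps ordinary blinks from resuming the analysis in this sketch.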

When the user pays attention to the painting as a whole, the annotation information about the painting itself and the areas that are children of the painting may be displayed superimposed on the painting. When the user is paying attention to a particular area, the annotation information about that attention area and the areas that are its parent or children may be presented to the user.

[6. Experiments and evaluation]
Experiments were conducted to evaluate whether the attention area determined by the information processing apparatus 1 of this embodiment is the area the user actually paid attention to; they are described below. Because the painting area is extracted from the field-of-view image and the extracted painting is recognized as preprocessing for determining the attention area, evaluation experiments were also conducted on painting area extraction and painting recognition.

The PC used as the information processing apparatus 1 was a notebook PC with a 500 MHz Pentium (registered trademark) III CPU. The cameras 17 and 18 were set to a frame size of 160 × 120 [pixel] and a processing rate of 10 [fps]. The eyeball image is 256-level grayscale, and the field-of-view image is 24-bit color.

Twenty A3-size paintings were framed and hung on a wall of a single color, and the experiments were conducted with four subjects. The 20 paintings comprised 10 paintings of people or animals and 10 landscapes.

FIG. 15 shows the prototype of the information processing apparatus 1 in use, and FIG. 16 shows painting extraction. Of the five images in the upper part of FIG. 16, the one at the lower right represents the gaze point.

[6-1. Painting area extraction accuracy and painting recognition accuracy]
[6-1-1. Accuracy of extracting the painting area from the field-of-view image]
To evaluate whether the area extracted by the object area extraction unit 3 with the method described above is a painting area, two measures were used: the proportion of the extracted area that is actually painting area (Pr), and the proportion of the painting area appearing in the field-of-view image that was extracted by the object area extraction unit 3 (Re).

For each of the 20 paintings, the system acquired 20 field-of-view images showing the painting as seen from almost directly in front, and painting areas were extracted from the resulting 400 field-of-view images. The results are shown in Table 1, which gives the average Pr and Re obtained by extracting the painting from each field-of-view image, together with the average area occupancy of the painting. The area occupancy of a painting is the proportion of the field-of-view image area that the painting occupies.
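The three measures defined above (Pr, Re, and area occupancy) can be computed per image as in the following Python sketch. Representing areas as sets of pixel coordinates, the helper `rect_pixels`, and the example rectangles are assumptions for illustration; only the 160 × 120 frame size is taken from the experimental setup, and the example values are not experimental results.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def rect_pixels(x0: int, y0: int, x1: int, y1: int) -> Set[Pixel]:
    """Pixels of an axis-aligned rectangle, half-open on the upper bounds."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def extraction_scores(extracted: Set[Pixel], truth: Set[Pixel],
                      frame_w: int, frame_h: int) -> Tuple[float, float, float]:
    """Return (Pr, Re, area occupancy) for one field-of-view image.

    Pr: share of the extracted area that is actually painting area.
    Re: share of the true painting area that was extracted.
    Occupancy: share of the frame that the true painting area covers.
    """
    overlap = len(extracted & truth)
    pr = overlap / len(extracted) if extracted else 0.0
    re = overlap / len(truth) if truth else 0.0
    occupancy = len(truth) / (frame_w * frame_h)
    return pr, re, occupancy


# Illustrative values only: an extraction shifted 10 pixels to the right.
truth = rect_pixels(40, 30, 120, 90)
extracted = rect_pixels(50, 30, 130, 90)
pr, re, occupancy = extraction_scores(extracted, truth, 160, 120)
```

Averaging the per-image values over all field-of-view images would give figures of the kind Table 1 is described as reporting.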

[6-1-2. Painting recognition accuracy]
After the object area extraction unit 3 extracts the painting area appearing in the field-of-view image with the method described above, the object recognition unit 4 recognizes the painting with the method described above. For each of the 20 paintings, the information processing apparatus 1 acquired 20 field-of-view images showing the painting as seen from almost directly in front, and recognition was performed on the resulting 400 field-of-view images. Recognition was counted as successful when the painting identified for a field-of-view image was the same as the painting appearing in that image. The results are shown in Table 2.

Table 2 shows that paintings can be recognized with a certain degree of accuracy even as the distance between the user and the painting increases.

[6-2. Attention area determination accuracy]
An evaluation was made of whether the attention area determined by the attention area determination unit 5 with the method described above was the area actually attended to. For each of the 20 paintings, five locations corresponding to constituent elements such as people and buildings were defined in advance as attention areas.

These areas may be adjacent to one another, separated from one another, or in a containing (or contained) relationship. Each subject was asked to look once at each of the 100 attention areas in total, and a determination was counted as successful when the attention area determined by the attention area determination unit 5 matched the area actually attended to. Table 3 shows the average of the results for the four subjects. When the distance between the user and the painting is 1.0 m, the painting subtends a visual angle of 17.8° vertically and 22.6° horizontally; at 1.5 m, it subtends 11.4° vertically and 15.2° horizontally.

Comparing the accuracy at distances of 1.0 m and 1.5 m, the attention area is determined more accurately at the shorter distance. This is thought to be because, when the user is closer to the painting, the preprocessing is more accurate and gaze point extraction errors have less effect when the painting area extracted from the field-of-view image is normalized.

In addition, when the user was close to the painting, the size of the extracted painting area changed little under normalization. Consequently, even when the normalized position of the minimum convex polygon described for S83 of FIG. 13 was calculated, the minimum convex polygon was not deformed by gaze point extraction errors or by the effect of the distance between the user and the painting.

[7. Supplement]
The processing procedure of the information processing apparatus 1 of this embodiment can be realized by having a computing means such as a CPU execute a program stored in a storage means such as a ROM (Read Only Memory) or RAM, and control input means such as a keyboard, output means such as a display, or communication means such as an interface circuit.

Accordingly, a computer having these means can realize the various processes of the information processing apparatus 1 of this embodiment simply by reading a recording medium on which the program is recorded and executing the program. Recording the program on a removable recording medium also allows the functions and processes described above to be realized on any computer.

The recording medium may be a program medium in the form of a memory (not shown), such as a ROM, used for processing by a microcomputer. Alternatively, it may be a program medium that is read by inserting the recording medium into a program reading device (not shown) provided as an external storage device.

In either case, the stored program is preferably configured to be accessed and executed by a microprocessor. It is further preferable that the program be read out, downloaded to the program storage area of the microcomputer, and then executed. The program for this download is assumed to be stored in the main unit in advance.

The program medium is a recording medium that can be separated from the main unit and that carries the program in a fixed manner. Examples include tape media such as magnetic tape and cassette tape; disk media, including magnetic disks such as flexible disks and hard disks, and discs such as CD, MO, MD, and DVD; card media such as IC cards (including memory cards); and semiconductor memories such as mask ROM, EPROM (Erasable Programmable Read Only Memory), EEPROM (Electrically Erasable Programmable Read Only Memory), and flash ROM.

If the system is configured to be connectable to a communication network including the Internet, the recording medium is preferably one that carries the program in a fluid manner, so that the program can be downloaded from the communication network.

When the program is downloaded from the communication network in this way, the program for the download is preferably stored in the main unit in advance or installed from another recording medium.

As described above, the information processing apparatus 1 of this embodiment includes: an eye movement detection unit 2 that detects a state in which the user is paying attention to an object, based on an eyeball image capturing the user's eye movement; an object recognition unit 4 that, when the eye movement detection unit 2 detects that the user is in the attention state, recognizes the object the user is paying attention to in a field-of-view image capturing the user's field of view; an attention area determination unit 5 that determines the attention area the user is paying attention to within the object; and an annotation information presentation unit 6 that presents to the user annotation information about the attention area determined by the attention area determination unit 5.

With this configuration, the object recognition unit 4 recognizes the object the user is paying attention to only after the eye movement detection unit 2 has detected that the user is in the attention state, so the object in which the user is genuinely interested can be recognized.

Moreover, because the attention area determination unit 5 determines the attention area within the object, it is possible to grasp which part of the object the user is paying attention to. Because the annotation information presentation unit 6 presents information about that attention area, annotation information about the area in which the user is genuinely interested can be presented accurately, and unnecessary information can be kept from being presented to the user.

Thus, the information processing apparatus 1 accurately presents annotation information about the area in which the user is interested, giving the user a comfortable environment in which to use the apparatus.

Further, the eye movement detection unit 2 includes: a line-of-sight detection unit 8 that extracts a pupil region from the eyeball image; a gaze point calculation unit 9 that calculates the movement distance of the gaze point from the center coordinates of the pupil region; and an attention state detection unit 10 that calculates the user's visual angle with respect to the object from the movement distance of the gaze point, detects the fixation state and the saccade state of the eyeball based on the magnitude of that visual angle, and detects the state in which the user is paying attention to the object based on the numbers of fixation states and saccade states.

In the information processing apparatus 1 configured in this way, the line-of-sight detection unit 8 extracts the pupil region using binarization, while the gaze point calculation unit 9 calculates the movement distance of the gaze point. The attention state detection unit 10 then calculates the user's visual angle from the movement distance obtained by the gaze point calculation unit 9 and detects the numbers of fixation states and saccade states, so the user's attention state can be detected accurately.

Because the user's attention state can be grasped accurately in this way, the object recognition unit 4 accurately recognizes the object in which the user is interested, and the annotation information presentation unit 6 can present still more accurate annotation information to the user.

Further, the attention state detection unit 10 defines a minimum polygonal area in the field-of-view image by connecting a plurality of gaze points so as to enclose the distribution of gaze points recorded while the user is in the attention state. The attention area determination unit 5 determines which of the plurality of areas in a registered image stored in the database 7 has the largest overlap with the minimum polygonal area, and determines the area in the field-of-view image corresponding to that area to be the attention area.

With this configuration, the attention state detection unit 10 defines the minimum polygonal area. Because this area is obtained by connecting a plurality of gaze points so as to enclose the gaze point distribution, it can be said to represent accurately the area of the field-of-view image that the user is paying attention to.
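A polygon that connects gaze points so as to enclose all of them can be obtained as a convex hull, which matches the "minimum convex polygon" wording used in section 5-1. The Python sketch below uses Andrew's monotone chain algorithm; the choice of algorithm and the function name are assumptions, since this section does not say how the polygon is constructed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Smallest convex polygon enclosing the points, counter-clockwise,
    computed with Andrew's monotone chain algorithm."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive for a counter-clockwise turn o -> a -> b.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Illustrative gaze points in field-of-view image coordinates.
gaze_points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2), (2, 0)]
hull = convex_hull(gaze_points)
```

Interior gaze points and points lying on a hull edge are dropped, so only the vertices that define the enclosing polygon remain.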

The attention area determination unit 5 then determines, for the registered image in the database 7, the area with the largest overlap with the minimum polygonal area, and determines the corresponding area in the field-of-view image to be the attention area. The area the user is paying attention to can therefore be determined accurately as the attention area.

Further, the object recognition unit 4 recognizes the object the user is paying attention to based on the hue of the field-of-view image and the hues of registered images stored in the database 7 as candidates for the field-of-view image.
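One simple way to compare hues between a field-of-view image and registered images is a normalized hue histogram with histogram intersection as the similarity, as in the Python sketch below. This is a simplified stand-in, not the recognition procedure of this embodiment, whose details are described elsewhere in the document. The bin count, the function names, and the example pixel lists are assumptions; `colorsys.rgb_to_hsv` is from the Python standard library.

```python
import colorsys
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]  # 8-bit channels, as in a 24-bit color image


def hue_histogram(pixels: Sequence[RGB], bins: int = 12) -> List[float]:
    """Normalized histogram of hue values over the given pixels."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def recognize(view_pixels: Sequence[RGB],
              registered: Dict[str, Sequence[RGB]], bins: int = 12) -> str:
    """Name of the registered image whose hue histogram is most similar."""
    view = hue_histogram(view_pixels, bins)
    return max(
        registered,
        key=lambda name: histogram_intersection(
            view, hue_histogram(registered[name], bins)),
    )


# Illustrative pixel lists: a mostly red view against two registered images.
view = [(255, 0, 0)] * 8 + [(0, 0, 255)] * 2
registered = {
    "sunset": [(255, 0, 0)] * 9 + [(0, 0, 255)] * 1,
    "sea": [(0, 0, 255)] * 9 + [(255, 0, 0)] * 1,
}
best = recognize(view, registered)
```

Note that gray pixels have no meaningful hue and fall into the first bin in this sketch; a practical implementation would likely treat low-saturation pixels separately.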

The annotation information presentation unit 6 also presents to the user annotation information about areas that form a hierarchical structure with the attention area.

Mounting one or both of the camera 17, which captures the eyeball image, and the camera 18, which captures the field-of-view image, on a portable terminal allows the eyeball image and the field-of-view image to be captured even while the user moves, further improving the convenience of the information processing apparatus 1.

Furthermore, reproducing the annotation information on a portable terminal allows annotation information to be presented to a user who is moving, which improves the convenience of the information processing apparatus 1 still further.

The present invention is not limited to the embodiments described above. Various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

With the information processing apparatus of the present invention, in an environment where many paintings are exhibited, such as an art museum, information about a whole work or about part of a work can be presented as annotation information adapted to the user's attention area. Users can thereby deepen their understanding of the exhibited paintings, and the labor cost of stationing staff to explain the paintings can be saved.

The environment in which the information processing apparatus of the present invention can be used is not limited to painting exhibitions. For example, images of each part of a machine tool may be stored in the database as registered images, and text or audio data explaining how to operate the machine tool may be presented to the user as annotation information. The user can then learn how to operate the machine tool simply by looking at each of its parts, so that even a beginner can operate it.

FIG. 1 is a block diagram showing the configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of the series of processes performed until annotation information is presented by the information processing apparatus of FIG. 1.
FIG. 3 is a flowchart showing the flow of processing performed by the eye movement detection unit in the information processing apparatus of FIG. 1.
FIG. 4 is a flowchart showing the flow of processing performed by the object area extraction unit in the information processing apparatus of FIG. 1.
FIG. 5(a) is a diagram showing an example of a field-of-view image, and FIG. 5(b) is a graph showing the edge image obtained from the field-of-view image of FIG. 5(a) together with the horizontal edge lengths.
FIG. 6 is a flowchart showing in detail the calculation of the lengths of horizontal edges.
FIG. 7 is a flowchart showing in detail the calculation of the lengths of vertical edges.
FIG. 8 is a flowchart showing in detail the hue division process.
FIG. 9 is a flowchart showing in detail the hue histogram calculation process.
FIG. 10 is a flowchart showing in detail the process of calculating color differences for the blocks into which an object is divided.
FIG. 11 is a flowchart showing in detail the calculation of the average hue value.
FIG. 12(a) is a diagram showing the areas to which annotation information is added when the registered image is, for example, a painting image, and FIG. 12(b) is a diagram showing the structure of the hierarchical information of that painting image.
FIG. 13 is a diagram showing details of the attention area determination process.
FIG. 14 is a diagram showing an example of a screen presenting annotation information.
FIG. 15 is a diagram showing an embodiment of the information processing apparatus of the present invention in use.
FIG. 16 is a diagram showing a painting image being extracted by an embodiment of the information processing apparatus of the present invention.

Explanation of symbols

1 Information processing apparatus
2 Eye movement detection unit (eye movement detection means)
4 Object recognition unit (object recognition means)
5 Attention area determination unit (attention area determination means)
6 Annotation information presentation unit (annotation information presentation means)
7 Database
8 Line-of-sight detection unit (line-of-sight detection means)
9 Gaze point calculation unit (gaze point calculation means)
10 Attention state detection unit (attention state detection means)
17 Camera (first camera)
18 Camera (second camera)

Claims

An information processing apparatus comprising:
eye movement detection means for detecting a state in which a user is paying attention to an object, based on an eyeball image capturing the user's eye movement;
object recognition means for recognizing, when the eye movement detection means detects that the user is in the attention state, the object the user is paying attention to in a field-of-view image capturing the user's field of view;
attention area determination means for determining the attention area that the user is paying attention to within the object; and
annotation information presentation means for presenting to the user annotation information about the attention area determined by the attention area determination means,
wherein the eye movement detection means comprises:
line-of-sight detection means for extracting a pupil region from the eyeball image;
gaze point calculation means for calculating the movement distance of the gaze point from the center coordinates of the pupil region; and
attention state detection means for calculating the user's visual angle with respect to the object from the movement distance of the gaze point, detecting the fixation state and the saccade state of the eyeball based on the magnitude of the visual angle, and detecting the state in which the user is paying attention to the object based on the numbers of fixation states and saccade states,
wherein the attention state detection means further defines a minimum polygonal area in the field-of-view image by connecting a plurality of gaze points so as to enclose the distribution of gaze points recorded while the user is in the attention state, and
wherein the attention area determination means determines which of a plurality of areas in a registered image, stored in a database as a candidate for the field-of-view image, has the largest overlap with the minimum polygonal area, and determines the area in the field-of-view image corresponding to that area to be the attention area.

The information processing apparatus according to claim 1, wherein the object recognition means recognizes the object the user is paying attention to based on the hue of the field-of-view image and the hues of registered images stored in a database as candidates for the field-of-view image.

The information processing apparatus according to claim 1, wherein the annotation information presentation means further presents to the user annotation information about areas that form a hierarchical structure with the attention area.

A portable terminal for use with the information processing apparatus according to claim 1,
the portable terminal comprising a first camera that captures the eyeball image.

A portable terminal for use with the information processing apparatus according to claim 1,
the portable terminal comprising a second camera that captures the field-of-view image.

A portable terminal for use with the information processing apparatus according to claim 1,
the portable terminal comprising a first camera that captures the eyeball image and a second camera that captures the field-of-view image.

A portable terminal for use with the information processing apparatus according to claim 1,
wherein the portable terminal receives the annotation information from the annotation information presentation means and reproduces the annotation information.

A first step of detecting a state in which the user is paying attention to an object, based on an eyeball image obtained by photographing the user's eye movement;
A second step of recognizing, when the first step detects that the user is in the attention state, the object to which the user is paying attention in a view image obtained by photographing the user's field of view;
A third step of determining an attention area within the object to which the user is paying attention;
A fourth step of presenting to the user annotation information regarding the attention area determined in the third step,
wherein the first step includes:
A line-of-sight detection step of extracting a pupil region from the eyeball image,
A gazing point calculation step of calculating a moving distance of the gazing point from the center coordinates of the pupil region;
And an attention state detection step of calculating the viewing angle of the user with respect to the object from the moving distance of the gazing point, detecting a fixation state and a jumping state of the eyeball based on the magnitude of the viewing angle, and detecting, based on the number of fixation states and jumping states, the state in which the user is paying attention to the object,
The attention state detection step is a step of defining a minimum polygonal region in the view image by connecting a plurality of gazing points so as to enclose the distribution of gazing points recorded while the user is in the attention state,
The third step is a step of determining, from among a plurality of regions in the registered image stored in the database as a candidate for the view image, the region having the largest overlap with the minimum polygonal region, and discriminating the region in the view image corresponding to that region as the attention area. An information processing method characterized by the above.
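
For illustration only, the attention state detection step might be sketched as follows. The claim states that fixation and jumping states are distinguished by the magnitude of the viewing angle and that the attention state is detected from their numbers. The specific angle formula, the threshold values, and the counting rule below are assumptions, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def visual_angle_deg(move_dist: float, viewing_dist: float) -> float:
    """Angle subtended at the eye by a gaze shift of move_dist at distance viewing_dist.

    Both distances must be in the same unit (for example, millimetres).
    """
    return math.degrees(2.0 * math.atan2(move_dist, 2.0 * viewing_dist))


def classify_movements(gaze_points: Sequence[Point],
                       viewing_dist: float,
                       fixation_max_deg: float = 1.0) -> List[str]:
    """Label each shift between consecutive gazing points as 'fixation' or 'jump'.

    fixation_max_deg is an assumed threshold, not a value taken from the claims.
    """
    states: List[str] = []
    for (x0, y0), (x1, y1) in zip(gaze_points, gaze_points[1:]):
        angle = visual_angle_deg(math.hypot(x1 - x0, y1 - y0), viewing_dist)
        states.append("fixation" if angle <= fixation_max_deg else "jump")
    return states


def is_attention_state(states: Sequence[str],
                       min_fixations: int = 3,
                       max_jumps: int = 1) -> bool:
    """Assumed rule: enough fixations and few enough jumps within the observed window."""
    fixations = sum(1 for s in states if s == "fixation")
    jumps = sum(1 for s in states if s == "jump")
    return fixations >= min_fixations and jumps <= max_jumps
```

Under these assumed thresholds, three small gaze shifts followed by one large shift would count as an attention state, whereas a sequence consisting only of large shifts would not.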

An information processing program for causing a computer to execute each step of the information processing method according to claim 8.

A computer-readable recording medium on which the information processing program according to claim 9 is recorded.