JP2016025625A

JP2016025625A - Information processor, information processing method, and program

Info

Publication number: JP2016025625A
Application number: JP2014150865A
Authority: JP
Inventors: 泰輔石黒; Taisuke Ishiguro
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-07-24
Filing date: 2014-07-24
Publication date: 2016-02-08

Abstract

PROBLEM TO BE SOLVED: To accurately identify the subject of a photographed image. SOLUTION: An information processor has: region extraction means for extracting a plurality of content regions from a plurality of photographed images, each of which captures a part of one paper medium; feature amount calculating means for calculating feature amounts of the content regions; position identification means for identifying positional relations among the plurality of content regions; and electronic document identification means. The electronic document identification means refers to storage means that stores, in association with the electronic document corresponding to one paper medium, the feature amounts of the plurality of contents included in that electronic document and positional information showing the positional relations among those contents. It identifies the electronic document corresponding to the plurality of photographed images on the basis of the feature amounts of the content regions calculated by the feature amount calculating means and the positional relations among the content regions identified by the position identification means. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, there is a known technique in which a tablet terminal or a smartphone (hereinafter also referred to as a smart terminal) searches for information related to a subject photographed by its camera and displays the retrieved related information on the screen of the smart terminal. For example, with the technology called Augmented Reality (hereinafter, AR), information related to a subject photographed by a camera can be retrieved, and the search result can be superimposed on the real-world object photographed by the camera. Services have also been deployed in which the subject is a paper document: when the paper document is photographed, related information is superimposed on it. Content such as a moving image cannot be expressed on a paper medium, but a moving image can be retrieved as related information of a still image printed on the paper document and displayed in a superimposed manner.
As a technique for identifying a subject photographed by a camera, Patent Document 1 discloses reading a printed matter composed of a plurality of pages page by page and searching for the original electronic document from the read information.

Patent Document 1: JP 2004-349940 A

As described above, searching for information related to a subject requires that the subject be recognized accurately. For example, if recognition accuracy is low and a plurality of subject candidates are detected, the related information for the subject cannot be uniquely identified.
For example, when the object photographed by a smart terminal is a large sheet such as a street poster, or a specification that has undergone repeated minor revisions, the subject may be difficult to recognize, and appropriate related information may not be identified. When the related information cannot be uniquely identified, a plurality of pieces of related information could be presented to the user, but the user would then have to select the necessary related information, which makes the operation cumbersome.

The present invention has been made in view of these problems, and its object is to identify the subject of a captured image accurately.

To this end, the present invention is an information processing system comprising: area extraction means for extracting a plurality of content areas from a plurality of captured images in each of which a part of one paper medium is captured; feature amount calculation means for calculating a feature amount of each content area; position specifying means for specifying the positional relationship between the plurality of content areas; and electronic document specifying means. The electronic document specifying means refers to storage means that stores, in association with the electronic document corresponding to one paper medium, the feature amounts of the plurality of contents included in that electronic document and position information indicating the positional relationship between those contents. It specifies the electronic document corresponding to the plurality of captured images based on the feature amounts of the content areas calculated by the feature amount calculation means and the positional relationship between the content areas specified by the position specifying means.

According to the present invention, the subject of a captured image can be identified accurately.

A diagram showing the overall configuration of the AR processing system.
A diagram showing an example of a paper document.
A diagram showing the hardware configuration of the mobile terminal.
A diagram showing the software configuration of the mobile terminal.
A diagram showing the software configuration of the management server device.
A diagram showing an example of the data structure of the feature amount table.
A sequence diagram showing the AR processing.
A flowchart showing the image analysis processing.
A diagram showing an example of frames to be processed.
A diagram showing an example of the data structure of the captured image table.
An explanatory diagram of the processing in which information is recorded in the captured image table.
A flowchart showing the feature amount extraction processing.
An explanatory diagram of the reduced image generation processing.
A flowchart showing the electronic document search processing.
A flowchart showing the feature amount collation processing.
A flowchart showing the support information transmission processing.
An explanatory diagram of the support information transmission processing.
A diagram showing a display example of the support information.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a diagram showing the overall configuration of an AR (Augmented Reality) processing system as an information processing system. The AR processing system includes a mobile terminal 100 and a management server device 110 as an information processing apparatus. The mobile terminal 100 and the management server device 110 can communicate via a network 120. In the AR processing system, the mobile terminal 100 photographs a paper document as a paper medium, displays the captured image on a display device 101, and transmits the captured image to the management server device 110. Here, the captured image is assumed to be a moving image. On receiving the captured image, the management server device 110 identifies the electronic document associated with the captured image as related information and transmits it to the mobile terminal 100. While photographing, the mobile terminal 100 displays the captured image on the display screen, and when it receives the related information, it superimposes the related information on the captured image.
FIG. 2 is a diagram showing an example of a paper document 200 to be photographed by the mobile terminal 100. In the present embodiment, the paper document 200 to be photographed includes a plurality of contents, as shown in FIG. 2. The paper document 200 shown in FIG. 2 contains character string content 201, photo content 202, graphic content 203, and character string content 204. Thus, one paper document includes a plurality of contents, whose types may be the same or different.
In the present embodiment, the paper document 200 to be photographed is assumed to be relatively large, so each frame of the captured image includes only a part of the paper document 200. That is, each frame of the captured image includes only some of the contents of the paper document 200.

FIG. 3 is a diagram showing the hardware configuration of the mobile terminal 100. An imaging unit 300 captures images. An acceleration sensor 301 detects changes in the position and tilt of the mobile terminal 100. A CPU (Central Processing Unit) 302 performs calculations and logical decisions for various processes and controls the elements connected to a bus 310. Note that the mobile terminal 100 may have an orientation sensor instead of the acceleration sensor 301.
The mobile terminal 100 is equipped with memory including a program memory and a data memory. The program memory stores the programs used for control by the CPU 302, including the processing procedures described later with reference to flowcharts. This memory is realized by a ROM (Read-Only Memory) 303, a RAM (Random Access Memory) 304 into which programs are loaded from an external storage device or the like, or a combination of these.

A storage device 305 is a device such as a hard disk for storing data and programs. The storage device 305 is also used to hold, for example, photographs and documents. A data holding device (not shown) that is externally connected or connected to the network may also be used to hold photographs, documents, and the like. An external storage device 306 is realized by, for example, a medium (recording medium) and an external storage drive for accessing the medium. Examples of such media include a flexible disk (FD), CD-ROM, DVD, USB memory, MO, and flash memory.
Note that the functions and processes of the mobile terminal 100 described later are realized by the CPU 302 reading a program stored in the ROM 303 or the like and executing it.

An input device 307 is a device for inputting instructions from the user. The user inputs instructions to the AR processing system via the input device 307. The input device 307 can be realized by, for example, a keyboard or a pointing device.
The display device 101 displays and outputs processing results and the like. The display device 101 can be realized by a display device that includes output circuitry, such as a CRT (Cathode-Ray Tube) or a liquid crystal display. In the present embodiment, the display device 101 displays photographs, documents, and the like.
An interface (hereinafter, I/F) 309 is a device that mediates information, and the mobile terminal 100 exchanges data with external devices via the I/F 309.
The hardware configuration of the management server device 110 is the same as that of the mobile terminal 100 described with reference to FIG. 3. However, the management server device 110 need not include the imaging unit 300, the acceleration sensor 301, and the like.

FIG. 4 is a diagram showing the software configuration of the mobile terminal 100. A communication unit 400 transmits and receives information to and from external devices via the network 120. For example, the communication unit 400 transmits the captured image to the management server device 110 and receives from it the related information for the captured image. A reception unit 401 accepts instructions input by the user. An imaging control unit 402 performs imaging according to the user's instruction and obtains a moving image.
A position-related information generation unit 403 generates position-related information for each frame of the moving image. In the present embodiment, the position-related information includes the detection result of the acceleration sensor 301 at the time each frame was captured and information indicating the zoom state of the imaging unit 300. The captured image and the position-related information are transmitted to the management server device 110 via the communication unit 400. A display unit 404 displays the moving image obtained by the imaging control unit 402 on the display device 101 as the captured image. Furthermore, when the communication unit 400 receives related information from the management server device 110, the display unit 404 superimposes the related information on the captured image.

FIG. 5 is a diagram showing the software configuration of the management server device 110. A communication unit 500 transmits and receives information to and from external devices via the network 120. For example, the communication unit 500 receives the captured image from the mobile terminal 100 and transmits related information to the mobile terminal 100.
An analysis unit 501 analyzes the captured image and extracts the feature amounts of each frame of the moving image. An existing document analysis technique can be used for the analysis processing, which is described later.
An electronic document DB 502 stores electronic documents. Here, an electronic document is electronic data corresponding to, for example, the paper document 200 shown in FIG. 2. The electronic document DB 502 stores the electronic document corresponding to each paper document to be photographed. When there are a plurality of paper documents to be photographed, the electronic document DB 502 stores an electronic document for each of them. The electronic document DB 502 is assumed to be stored in the external storage device 306 or the like.

The electronic document DB 502 further stores a feature amount table. FIG. 6A is a diagram showing an example of the data structure of a feature amount table 600. The feature amount table 600 stores a content ID, an attribute, a feature amount, position information, and related information in association with a document ID. The electronic document with document ID "001" shown in FIG. 6A corresponds to the paper document 200.
The document ID is identification information of an electronic document. The content ID is identification information of each content included in the electronic document. A content is, for example, information corresponding to one of the contents 201 to 204 shown in FIG. 2. The attribute indicates the type of the content. In FIG. 6A, attribute "1" denotes a character string, "2" a photograph, and "3" a graphic. The position information includes the position (X, Y) of each content in the document, the content width W, and the content height H.
The related information is information set by an administrator or the like as information related to the content. In the present embodiment, the related information is an electronic document, but its type is not limited to this. As another example, the related information may be a UI screen for the associated paper document. Not every content registered in the feature amount table needs to have related information associated with it.

Note that the position information only needs to indicate the positional relationship between the plurality of contents included in the same document, and its specific form is not limited to that of the embodiment. FIG. 6B is a diagram showing another example, a feature amount table 610. In the example shown in FIG. 6B, the position information indicates the contents adjacent to the content in question. That is, for each content, the position information gives the content IDs of the contents adjacent to it above, below, to the left, and to the right.
When there is only one document for which related information is presented, the feature amount table need not record the document ID.
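As an illustration of the two forms of position information described above, the following is a minimal Python sketch of one feature amount table row. The class and field names, the content ID "C1", the feature vector, and the related-information string are hypothetical; the source specifies only which columns exist, the attribute codes 1 to 3, and the document ID "001".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Attribute codes as described for FIG. 6A.
ATTR_TEXT, ATTR_PHOTO, ATTR_GRAPHIC = 1, 2, 3


@dataclass(frozen=True)
class BoxPosition:
    """Position information in the FIG. 6A style: (X, Y) plus width and height."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class NeighborPosition:
    """Position information in the FIG. 6B style: IDs of adjacent contents."""
    up: Optional[str] = None
    down: Optional[str] = None
    left: Optional[str] = None
    right: Optional[str] = None


@dataclass
class FeatureRecord:
    document_id: str
    content_id: str
    attribute: int
    feature: Tuple[float, ...]            # hypothetical feature representation
    position: Union[BoxPosition, NeighborPosition]
    related_info: Optional[str] = None    # not every content has related information


def records_for_document(table: List[FeatureRecord], document_id: str) -> List[FeatureRecord]:
    """Return all rows registered for one electronic document."""
    return [r for r in table if r.document_id == document_id]


table = [
    FeatureRecord("001", "C1", ATTR_TEXT, (0.1, 0.9), BoxPosition(0, 0, 40, 30)),
    FeatureRecord("001", "C2", ATTR_PHOTO, (0.7, 0.2), NeighborPosition(left="C1"),
                  related_info="related-document-for-C2"),
]
```

Keeping the position field as a union of the two forms mirrors the statement that any representation of the positional relationship between contents is acceptable.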

Returning to FIG. 5, a document search unit 503 refers to the electronic document DB 502 and searches for the electronic document corresponding to the captured image based on the content feature amounts obtained by the analysis unit 501. A related information extraction unit 504 refers to the electronic document DB 502 and, for the identified electronic document, extracts the related information corresponding to the content displayed on the display unit 404 of the mobile terminal 100 at the time of processing. The extracted related information is transmitted to the mobile terminal 100 via the communication unit 500.

FIG. 7 is a sequence diagram showing the AR processing performed by the AR processing system. In S700, the imaging control unit 402 of the mobile terminal 100 starts photographing a paper document in accordance with an instruction from the user. The imaging control unit 402 continues photographing until the user inputs an end instruction. Next, in S701, the display unit 404 starts displaying, on the display device 101, the moving image captured by the imaging control unit 402. Photographing and display of the captured image continue until the user inputs an end instruction.
In S702, the position-related information generation unit 403 generates position-related information for each frame of the captured image. Next, in S703, the communication unit 400 transmits the captured image and the position-related information to the management server device 110.

In the management server device 110, when the communication unit 500 receives the captured image and the position-related information in S703, the analysis unit 501 analyzes the captured image in S704 and extracts the feature amounts of each frame. The image analysis processing in S704 is described later with reference to FIG. 8. In S705, the document search unit 503 refers to the feature amount table 600 and, based on the feature amounts extracted in S704, searches for the electronic document corresponding to the paper document that is the subject of the captured image. The document search unit 503 also identifies the angle (rotation angle) of the captured image relative to the electronic document.
Next, in S706, the document search unit 503 checks whether the electronic document corresponding to the paper document has been identified. If it has (Yes in S706), the document search unit 503 advances the processing to S707. If it has not (No in S706), the processing returns to S703 and waits for subsequent frames to be received.
In S707, the related information extraction unit 504 refers to the feature amount table 600 and extracts the related information associated with the content displayed in the last frame of the captured image (related information extraction processing). The related information extraction unit 504 also determines where the related information should be displayed, based on the position within the last frame of the content area showing the content that corresponds to the related information (referred to as the target content area). Next, in S708, the communication unit 500 transmits the related information, the display position of the related information, and the rotation angle of the target content area identified in S705 to the mobile terminal 100.
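The server-side flow of S703 to S708 can be sketched roughly as follows. This is only an outline under stated assumptions: `analyze`, `search`, and `extract_related` are hypothetical callables standing in for the analysis unit 501, the document search unit 503, and the related information extraction unit 504, and accumulating content areas across frames is an assumption based on the statement that areas are extracted from a plurality of captured images.

```python
def handle_frames(frames, analyze, search, extract_related):
    """Process frames in arrival order until a document is identified.

    analyze(frame)              -> list of content areas found in the frame (S704)
    search(areas)               -> (document_id, rotation_angle) or None   (S705)
    extract_related(doc, frame) -> (related_info, display_position)        (S707)
    """
    areas = []
    for frame in frames:                      # S703: a frame has been received
        areas.extend(analyze(frame))          # S704: accumulate extracted areas
        result = search(areas)                # S705: look up the electronic document
        if result is None:                    # S706: No, wait for the next frame
            continue
        document_id, rotation_angle = result  # S706: Yes
        related_info, position = extract_related(document_id, frame)  # S707
        return {                              # S708: payload sent to the terminal
            "related_info": related_info,
            "display_position": position,
            "rotation_angle": rotation_angle,
        }
    return None                               # no document identified so far
```

Returning `None` when no document is identified corresponds to the No branch of S706 with no further frames available.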

In the mobile terminal 100, when the communication unit 400 receives the related information and the other data in S708, the display unit 404 generates, in S709, the data to be superimposed, based on the related information, its display position, and the rotation angle. The display unit 404 then superimposes the generated data on the captured image being displayed (superimposed display processing).
A paper document may be photographed at various angles. In the present embodiment, therefore, the mobile terminal 100 uses the rotation angle to align the vertical direction of the related information to be superimposed with the vertical direction of the paper document in the captured image displayed on the display device 101. This allows the mobile terminal 100 to display the related information in a more readable state. Through the above processing, the user can browse, on the mobile terminal 100, the related information for the photographed paper document.
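One way to picture the alignment step is to rotate the overlay rectangle about its centre by the received rotation angle. The sketch below is illustrative only: the function name, the rectangle representation, and the sign convention (positive angles rotate clockwise on screen, where y grows downward) are assumptions not stated in the source.

```python
import math


def rotated_overlay_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h overlay centred at (cx, cy), rotated by angle_deg.

    Image coordinates are assumed (y grows downward), so a positive angle
    appears as a clockwise rotation on screen.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in corners
    ]
```

With an angle of 0 the function returns the unrotated rectangle, so the same drawing path can serve documents photographed upright.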

FIG. 8 is a flowchart showing the details of the image analysis processing (S704) described with reference to FIG. 7. In S800, the analysis unit 501 recognizes each meaningful block (area) in each frame of the captured image as a unit and determines the attribute of the content displayed in each block.
FIG. 9 is a diagram showing examples of frames to be processed. From a frame 900 shown in FIG. 9A, the analysis unit 501 extracts three content areas 901 to 903 (area extraction processing). Here, the content area 901 is the area showing the content 201, the content area 902 is the area showing the content 202, and the content area 903 is the area showing the content 203. The analysis unit 501 further identifies the attribute of each content area. For the content areas 901 to 903 shown in FIG. 9, the attributes text, photo, and graphic are identified, respectively.
From a frame 910 shown in FIG. 9B, the analysis unit 501 extracts only one content area 911 and identifies its attribute as text.

The processing for extracting content areas and identifying their attributes is described more specifically below. On receiving a frame, the analysis unit 501 binarizes it into a black-and-white image. It then performs contour tracing and extracts clusters of pixels enclosed by black-pixel contours. For a black-pixel cluster of at least a certain area, it also traces the contours of the white pixels inside and extracts white-pixel clusters. If an extracted white-pixel cluster has at least a certain area, the analysis unit 501 further extracts black-pixel clusters from within it. The analysis unit 501 repeats this extraction recursively as long as the extracted cluster has at least a certain area.
The analysis unit 501 then classifies the black-pixel clusters obtained by the extraction into blocks with various attributes according to their size and shape. For example, a cluster with an aspect ratio close to 1 is treated as a character-equivalent pixel cluster, and adjacent character-equivalent clusters that are aligned and can be grouped form a text block. Where irregularly shaped pixel clusters are scattered, the analysis unit 501 classifies them as a photo block, and anything else as a graphic block or the like.
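The classification rules above can be approximated with a short Python sketch. It is a simplification under several assumptions: it uses flood-fill connected-component labelling instead of contour tracing, it does not recurse into white regions, and the threshold, the aspect-ratio tolerance, the alignment test, and all function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return a 2D list of booleans, True where the pixel counts as black."""
    return [[value < threshold for value in row] for row in gray]


def black_components(binary, min_area=1):
    """Bounding boxes (x, y, w, h) of 4-connected black-pixel clusters."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            area = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # ignore clusters below the area threshold
                boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def is_character_like(box, tolerance=0.5):
    """True when the aspect ratio of the bounding box is close to 1."""
    _, _, w, h = box
    return abs(w / h - 1.0) <= tolerance


def classify_block(boxes, tolerance=0.5):
    """Classify a group of cluster boxes as 'text', 'photo', or 'graphic'."""
    if not boxes:
        raise ValueError("at least one cluster is required")
    if len(boxes) >= 2 and all(is_character_like(b, tolerance) for b in boxes):
        centers = [y + h / 2 for _, y, _, h in boxes]
        tallest = max(h for _, _, _, h in boxes)
        if max(centers) - min(centers) <= tallest / 2:  # roughly on one line
            return "text"
    if len(boxes) > 1:       # several scattered, irregular clusters
        return "photo"
    return "graphic"         # everything else
```

For instance, a row of three small square clusters is classified as text, while a single elongated cluster falls through to graphic.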

In S800, the analysis unit 501 further identifies the positional relationship between frames, based on the position-related information of the frame being processed and of the other frames in the captured image that have already been processed. Based on the positional relationship between frames, the analysis unit 501 then identifies the positional relationship between the content areas obtained from the plurality of frames (position specifying processing).
When one frame contains a plurality of content areas, as in the frame 900 shown in FIG. 9, the analysis unit 501 identifies the positional relationship between the content areas within the frame. For the frame 900 shown in FIG. 9, the identified relationship is that the content area 902 lies to the right of the content area 901 and the content area 903 lies below the content area 902.

The analysis unit 501 further assigns an area ID to each content area obtained as a block. The area ID is identification information for the content area. The analysis unit 501 records the area ID, the attribute specified for the area, and position information indicating the positional relationship in the captured image table. The captured image table is assumed to be stored in, for example, the RAM 304 or the external storage device 306.
FIG. 10 is a diagram illustrating an example of the data configuration of the captured image table 1000. The captured image table 1000 stores an area ID, an attribute, position information, and a feature amount in association with one another. Each row shown in FIG. 10 corresponds to one record. In S800, the analysis unit 501 associates the area ID, the attribute, and the position information with one another and records them in the captured image table 1000 as one record. The feature amount is added later, in S802, in association with the area ID.

Returning to FIG. 8, after the process of S800, in S801 the analysis unit 501 calculates the feature amount of each content area obtained in S800 (feature amount calculation process). Next, in S802, the analysis unit 501 records the feature amount in the captured image table 1000 in association with the content area.

The process by which information is recorded in the captured image table 1000 is now described with a specific example. Suppose that, as shown in FIG. 11(a), the mobile terminal 100 starts photographing the paper document 200 and the image of the content 201 is displayed on the display device 101 as the captured image.
When the user then moves the shooting direction toward the right of the paper document 200 (the direction of arrow A), for example by changing the tilt of the mobile terminal 100, the image of the content 202 is displayed on the display device 101 as the captured image, as shown in FIG. 11(b). When the shooting direction then moves toward the bottom of the paper document 200 (the direction of arrow B), the image of the content 203 is displayed on the display device 101 as the captured image, as shown in FIG. 11(c).

When a captured image involving such changes in shooting direction is processed, the frame 1111 shown in FIG. 11(a) is first input to the analysis unit 501, and the image analysis process (S704) is executed on the frame 1111. In the area analysis process (S800), the analysis unit 501 extracts the area of the image of the content 201 as the content area 1101 and issues the area ID "a001" to this content area 1101. The analysis unit 501 further specifies the attribute of the content area 1101 as text.
At this point no other frame has been analyzed, and the positional relationship with other content areas cannot be determined from the frame 1111 alone. The analysis unit 501 therefore records only the area ID "a001" in the captured image table 1000, together with the associated attribute "1". The analysis unit 501 then calculates the feature amount of the content area 1101 in S801 and, in S802, records it in association with the area ID "a001".

Subsequently, when the frame 1112 is input to the analysis unit 501, the image analysis process (S704) is executed on the frame 1112. That is, in the area analysis process (S800), the analysis unit 501 extracts the area of the image of the content 202 as the content area 1102 and issues the area ID "a002" to this content area 1102. The analysis unit 501 further specifies the attribute of the content area 1102 as photograph.
The analysis unit 501 also determines, from the position-related information and the pixel changes in the frames 1111 and 1112, that the shooting direction moved toward the right of the paper document 200 between the already processed frame 1111 and the frame 1112. From this change in shooting direction, the analysis unit 501 specifies that the content area 1101 lies to the left of the content area 1102.

The analysis unit 501 then records the area ID "a002" in the captured image table 1000, records the attribute "2" in association with it, and further records position information indicating that the content area 1101 identified by "a001" lies to its left. At the same time, the analysis unit 501 records, in association with the area ID "a001", position information indicating that the content area 1102 identified by "a002" lies to its right. Thereafter, the feature amount of the content area 1102 is calculated in S801 and recorded in association with the area ID "a002" in S802.
The captured image table 1000 shown in FIG. 10 also contains the content area 1103. The position information in the captured image table 1000 shown in FIG. 10 shows that the content area 1102 with the area ID "a002" lies to the right of the content area 1101 with the area ID "a001", and that the content area 1103 with the area ID "a003" lies below the content area 1102 with the area ID "a002".
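The walkthrough above can be sketched as a small in-memory table. This is only an illustration under stated assumptions: the names `Record`, `register`, and `OPPOSITE` are hypothetical, the direction labels are invented for the sketch, and the attribute of "a003" is not given in the text, so text is assumed for it.

```python
from dataclasses import dataclass, field

# Attribute codes as used in the walkthrough: "1" for text, "2" for photograph.
ATTR_TEXT, ATTR_PHOTO = 1, 2
OPPOSITE = {"left": "right", "right": "left", "above": "below", "below": "above"}

@dataclass
class Record:
    area_id: str
    attribute: int
    position: dict = field(default_factory=dict)  # direction -> neighbouring area ID
    feature: object = None                        # added later, as in S802

def register(table, area_id, attribute, direction=None, neighbour=None):
    """Add a record; if a neighbour is known, store the relation on both records."""
    table[area_id] = Record(area_id, attribute)
    if direction is not None:
        table[area_id].position[direction] = neighbour
        table[neighbour].position[OPPOSITE[direction]] = area_id

table = {}
register(table, "a001", ATTR_TEXT)                   # frame 1111: no neighbour known yet
register(table, "a002", ATTR_PHOTO, "left", "a001")  # frame 1112: a001 lies to the left
register(table, "a003", ATTR_TEXT, "above", "a002")  # attribute assumed; a002 lies above
```

Storing each relation on both records mirrors the text, where registering "a002" also updates the record for "a001".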

If an area ID associated with the feature amount calculated in S801 is already registered in the captured image table 1000, no new content area is registered in the captured image table 1000. An area ID corresponding to the calculated feature amount is already registered when the same content appeared in another, already processed frame and the content area was registered in the captured image table 1000 during the processing of that frame.
When one frame contains a plurality of content areas, the analysis unit 501 specifies the positional relationships among those content areas and registers them in the captured image table 1000 as position information.

FIG. 12 is a flowchart showing the detailed processing in the feature amount extraction process (S801) described with reference to FIG. 8. The feature amount extraction process extracts local feature amounts of an image. A local feature amount is invariant to rotation and to enlargement or reduction. Because of this property, a search using the feature amount remains possible even when the image has been rotated, enlarged, or reduced. The local feature amount extraction can be performed using an existing local feature extraction technique.
In S1200, the analysis unit 501 reads the frame to be processed. Next, in S1201, the analysis unit 501 extracts the luminance component from the input frame and creates a luminance component image. Next, in S1202, the analysis unit 501 generates reduced images from the luminance component image. Specifically, the analysis unit 501 successively reduces the luminance component image according to the magnification p and generates n reduced images. The magnification p and the number n of reduced images are assumed to be defined in advance and stored in the ROM 303 or the like.

FIG. 13 is a diagram for explaining the reduced image generation process (S1202). FIG. 13 shows an example in which the magnification p is set to 2 to the power of -(1/4) and the number n of reduced images is set to 9. In this example, the magnification p is a ratio of side lengths, not of areas.
In FIG. 13, reference numeral 1301 denotes the luminance component image created in S1201. Reference numeral 1302 denotes the reduced image obtained by reducing the luminance component image 1301 four times according to the magnification p, and corresponds to the luminance component image 1301 reduced to 1/2. Reference numeral 1303 denotes the reduced image obtained by reducing the luminance component image 1301 eight times according to the magnification p, and corresponds to the luminance component image 1301 reduced to 1/4. The scale number 1310 is a number assigned to the images in descending order of size.
Examples of methods for reducing an image include simply thinning out pixels, using linear interpolation, and sampling after applying a low-pass filter.
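The arithmetic of this example can be checked with a short sketch. It assumes that the n = 9 images include the unreduced luminance component image as scale number 1, which matches the eighth reduction reaching 1/4; the function names and the 640 x 480 frame size are hypothetical.

```python
def pyramid_scales(p, n):
    """Side-length scale of each image; index 0 is the unreduced luminance image."""
    return [p ** i for i in range(n)]

def pyramid_sizes(width, height, p, n):
    """(scale number, width, height) per image, largest first, scale numbers from 1."""
    return [(i + 1, max(1, round(width * s)), max(1, round(height * s)))
            for i, s in enumerate(pyramid_scales(p, n))]

# Values from the example: p = 2^(-1/4), n = 9.
scales = pyramid_scales(2 ** -0.25, 9)
sizes = pyramid_sizes(640, 480, 2 ** -0.25, 9)
```

Four reductions by 2^(-1/4) multiply the side length by 2^(-1) = 1/2, and eight reductions by 2^(-2) = 1/4, as stated for the images 1302 and 1303.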

Returning to FIG. 12, after S1202, in S1203 the analysis unit 501 extracts local feature points from each of the n reduced images obtained in S1202. The local feature points extracted here are robust ones that are extracted stably from the same location even after image processing such as rotation or reduction is applied to the image. One method for extracting such local feature points is the Harris operator.
Specifically, for each pixel of the image obtained by applying the Harris operator, the analysis unit 501 examines the pixel values of the pixel of interest and its eight neighboring pixels (nine pixels in total). The analysis unit 501 extracts the location of the pixel of interest as a local feature point when its pixel value is equal to or greater than a threshold and is a local maximum (the largest of the nine pixel values). The process by which the analysis unit 501 extracts robust local feature points is not limited to that of the embodiment.
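The threshold-and-local-maximum test can be sketched as follows. The sketch takes an already computed response map (a list of rows) as input and does not implement the Harris operator itself; skipping border pixels and accepting ties as maxima are assumptions, and `local_feature_points` is a hypothetical name.

```python
def local_feature_points(response, threshold):
    """Return (row, col) of pixels that are >= threshold and the largest
    value in their 3x3 neighbourhood. Border pixels are skipped."""
    rows, cols = len(response), len(response[0])
    points = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = response[r][c]
            if value < threshold:
                continue
            window = [response[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            if value == max(window):  # pixel of interest is the 9-pixel maximum
                points.append((r, c))
    return points

response = [
    [0, 0, 0, 0],
    [0, 5, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 0],
]
points = local_feature_points(response, threshold=4)
```

In this small map the value 5 passes the threshold but is not a local maximum because its neighbor holds 9, so only the pixel holding 9 is reported.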

Next, in S1204, for each local feature point obtained in S1203, the analysis unit 501 calculates a feature amount (local feature amount) defined so as to be invariant to rotation of the image. As the method for calculating this local feature amount, the analysis unit 501 uses a combination of the Local Jet and its derivatives described in the following document.
J. J. Koenderink and A. J. van Doorn, "Representation of local geometry in the visual system," Biological Cybernetics, vol. 55, pp. 367-375, 1987

A local feature amount calculated by this method can be given relatively high robustness against scaling and rotation. Specifically, the analysis unit 501 calculates the local feature amount v given by Expression (1).
The symbols used on the right side of Expression (1) are defined by Expressions (2) to (7). On the right side of Expression (2), G(x, y) is a Gaussian function, I(x, y) is the pixel value at the image coordinates (x, y), and "*" denotes convolution. Expression (3) is the partial derivative with respect to x of the variable L defined by Expression (2), and Expression (4) is the partial derivative of L with respect to y. Expression (5) is the partial derivative with respect to y of the variable Lx defined by Expression (3), Expression (6) is the partial derivative of Lx with respect to x, and Expression (7) is the partial derivative with respect to y of the variable Ly defined by Expression (4).
The method for calculating the local feature amount is not limited to the one described above, and other methods are also applicable.
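The equation images for Expressions (1) to (7) are not reproduced in this text. Expressions (2) to (7) follow directly from the definitions above and can be written as below. The form shown for Expression (1) is an assumption: it is a combination of Local Jet derivatives that is often used as a rotation-invariant descriptor, not a form confirmed by this text.

```latex
% Assumed form of Expression (1); the text does not reproduce it.
v = \begin{pmatrix}
  L \\
  L_x L_x + L_y L_y \\
  L_{xx} L_x L_x + 2 L_{xy} L_x L_y + L_{yy} L_y L_y \\
  L_{xx} + L_{yy} \\
  L_{xx} L_{xx} + 2 L_{xy} L_{xy} + L_{yy} L_{yy}
\end{pmatrix} \tag{1}

% Expressions (2) to (7), as defined in the text.
L = G(x, y) * I(x, y) \tag{2}
L_x = \frac{\partial L}{\partial x} \tag{3}
L_y = \frac{\partial L}{\partial y} \tag{4}
L_{xy} = \frac{\partial^2 L}{\partial x \, \partial y} \tag{5}
L_{xx} = \frac{\partial^2 L}{\partial x^2} \tag{6}
L_{yy} = \frac{\partial^2 L}{\partial y^2} \tag{7}
```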

FIG. 14 is a flowchart showing the detailed processing in the electronic document search process (S705) described with reference to FIG. 7. As described above, the electronic document search process searches for the electronic document corresponding to the captured image based on the feature amounts, and also specifies the rotation angle of the related information relative to the captured image. In S1400, the document search unit 503 selects, from the content areas registered in the captured image table 1000, one unprocessed content area on which the feature amount matching process (S1401) described below has not yet been performed.
To identify unprocessed content areas, a flag indicating whether processing has been completed may be stored in association with each area ID in the captured image table 1000. Unprocessed content areas may be selected by searching the captured image table 1000 in order from the top or, as another example, by searching at random.

Next, in S1401, the document search unit 503 collates the feature amount of the content area selected in S1400 with the feature amounts of the contents of the electronic documents registered in the feature amount table 600. When it determines that the two match, it uses the values calculated in the course of the determination to calculate the rotation angle of the related information to be superimposed on the content area. The feature amount matching process is described in detail later with reference to FIG. 15.
In S1402, the document search unit 503 checks whether an unprocessed content area remains, that is, whether the process of S1401 has yet to be completed for any of the content areas registered in the captured image table 1000. If an unprocessed content area exists (Yes in S1402), the document search unit 503 returns the process to S1400, selects an unprocessed content area, and continues. If no unprocessed content area exists (No in S1402), the document search unit 503 advances the process to S1403.
The processing of S1400 to S1402 is an example of a content specifying process that specifies, in the feature amount table 600, the content corresponding to each content area.

In S1403, the document search unit 503 specifies an electronic document that contains the contents corresponding to all the content areas registered in the captured image table 1000, as obtained through the repeated processing of S1400 and S1401. The document search unit 503 treats this document as a candidate for the electronic document corresponding to the captured image.
Next, in S1404, the document search unit 503 determines whether the positional relationships among all the content areas registered in the captured image table 1000 satisfy the positional relationships indicated by the position information registered in the feature amount table 600. If they do not (No in S1404), the document search unit 503 advances the process to S1405. In S1405, the document search unit 503 determines that the candidate is not the electronic document corresponding to the captured image and that the electronic document cannot be identified, and ends the electronic document search process.

If, on the other hand, the positional relationships indicated by the position information are satisfied (Yes in S1404), the document search unit 503 advances the process to S1406. In S1406, the document search unit 503 determines whether there is more than one electronic document candidate that satisfies the positional relationships indicated by the position information. If there is more than one (Yes in S1406), the document search unit 503 advances the process to S1405, determines that the electronic document cannot be identified, and ends the electronic document search process.
If there is exactly one electronic document candidate that satisfies the positional relationships indicated by the position information (No in S1406), the document search unit 503 advances the process to S1407. In S1407, the document search unit 503 identifies the candidate as the electronic document corresponding to the captured image (electronic document specifying process).
It suffices for the document search unit 503 to identify the electronic document corresponding to the captured image based on the feature amounts of the extracted content areas and the positional relationships among the content areas; the specific processing for doing so is not limited to that of the embodiment.
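The decision flow of S1403 to S1407 can be sketched roughly as follows. The data layout is an assumption made for illustration: relations are stored as (ID, direction, ID) triples, every captured content area is assumed to have been matched to a content, and `identify_document` is a hypothetical name. The sketch filters all candidates at once rather than following the flowchart branch by branch.

```python
def identify_document(matched, captured_relations, documents):
    """
    matched: area ID -> content ID found by feature matching (S1400 to S1402)
    captured_relations: set of (area ID, direction, area ID) triples
    documents: document ID -> {"contents": set of content IDs,
                               "relations": set of (content ID, direction, content ID)}
    Returns the single matching document ID, or None if identification fails.
    """
    needed = set(matched.values())
    # S1403: candidates must contain every matched content.
    candidates = [d for d, doc in documents.items() if needed <= doc["contents"]]
    # S1404: the stored layout must agree with every observed relation.
    observed = {(matched[a], direction, matched[b])
                for a, direction, b in captured_relations}
    consistent = [d for d in candidates if observed <= documents[d]["relations"]]
    # S1406 / S1407: succeed only when exactly one candidate remains.
    return consistent[0] if len(consistent) == 1 else None

documents = {
    "docA": {"contents": {"c1", "c2"}, "relations": {("c1", "right", "c2")}},
    "docB": {"contents": {"c1", "c2"}, "relations": {("c1", "below", "c2")}},
}
matched = {"a001": "c1", "a002": "c2"}
result = identify_document(matched, {("a001", "right", "a002")}, documents)
```

Both hypothetical documents contain the same two contents, so the feature amounts alone cannot separate them; the positional relation is what leaves a single candidate.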

FIG. 15 is a flowchart showing the detailed processing in the feature amount matching process (S1401) shown in FIG. 14. The feature amount matching process calculates the similarity between the image displayed in a content area obtained from the captured image and a content of an electronic document, and specifies the content corresponding to the content area based on the calculated similarity. Specifically, the document search unit 503 specifies the content whose similarity is equal to or greater than a threshold and is the highest as the content corresponding to the image in the content area.
Before the feature amount matching process is described, the symbols it uses are defined. A local feature point extracted from the captured image is denoted Q, its coordinates Q(x', y'), and its local feature amount Vq. A local feature point on one content of the electronic document being collated is denoted S, its coordinates S(x, y), and its local feature amount Vs.

In S1500, the document search unit 503 calculates the distance between the local feature amounts Vq and Vs for all combinations and creates a minimum distance corresponding point list. Specifically, the document search unit 503 first calculates the feature distance for every combination of Vq and Vs. Next, it extracts the combinations of Vq and Vs (corresponding points) for which the calculated feature distance is equal to or less than the threshold Tv and is the minimum. It then registers the extracted corresponding points in a list, thereby creating the minimum distance corresponding point list. The k-th minimum distance corresponding points are denoted Qk and Sk, and their coordinates are written with matching subscripts as Qk(x'_k, y'_k) and Sk(x_k, y_k). The local feature amounts of Qk and Sk are written Vq(k) and Vs(k), respectively.
Two or more local feature amounts may be associated with one local feature point, but for simplicity the description here covers the case where exactly one local feature amount is associated with each local feature point. The number of pairs of corresponding points registered in the minimum distance corresponding point list is denoted m.
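A minimal sketch of building the minimum distance corresponding point list is shown below. The text does not name the distance measure, so Euclidean distance between feature vectors is assumed, and the nearest stored point is chosen once per captured point; `min_distance_pairs` and the sample data are hypothetical.

```python
import math

def min_distance_pairs(query, stored, tv):
    """
    query, stored: lists of ((x, y), feature_vector).
    For each query point, keep the stored point with the smallest feature
    distance, provided that distance is <= tv.
    Returns a list of (Q coordinates, S coordinates) pairs.
    """
    pairs = []
    for q_xy, vq in query:
        best = min(stored, key=lambda s: math.dist(vq, s[1]), default=None)
        if best is not None and math.dist(vq, best[1]) <= tv:
            pairs.append((q_xy, best[0]))
    return pairs

query = [((0, 0), (1.0, 0.0)), ((5, 5), (9.0, 9.0))]
stored = [((10, 10), (1.1, 0.0)), ((20, 20), (3.0, 3.0))]
pairs = min_distance_pairs(query, stored, tv=0.5)
m = len(pairs)  # number of corresponding-point pairs, m in the text
```

The second captured point is dropped because even its nearest stored feature is farther away than Tv.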

After the process of S1500, in S1501 the document search unit 503 checks whether m is 3 or more. If m is less than 3 (No in S1501), the document search unit 503 determines that the similarity cannot be calculated and ends the feature amount matching process. If m is 3 or more (Yes in S1501), the document search unit 503 advances the process to S1503. In S1503, the document search unit 503 initializes the variable VoteMax, which represents the final number of votes, to 0. Next, in S1504, the document search unit 503 initializes the variable Count, which represents the iteration count of the similarity calculation process, to 0.
Next, in S1505, the document search unit 503 compares the iteration count Count with a predetermined maximum number of iterations Rn. If the iteration count Count is equal to or greater than the maximum number of iterations Rn (No in S1505), the document search unit 503 advances the process to S1506. In S1506, the document search unit 503 outputs the final number of votes VoteMax and the rotation angle, and ends the feature amount matching process.

If the iteration count Count is less than the maximum number of iterations Rn (Yes in S1505), the document search unit 503 advances the process to S1507. In S1507, the document search unit 503 initializes the variable Vote, which represents the number of votes, to 0. Next, in S1508, the document search unit 503 randomly extracts the coordinates of two pairs of corresponding points from the minimum distance corresponding point list. The two extracted pairs of coordinates are denoted Q1(x'₁, y'₁) and S1(x₁, y₁), and Q2(x'₂, y'₂) and S2(x₂, y₂).
Next, in S1509, the document search unit 503 calculates the transformation matrix M. The transformation matrix M is given by Expression (8). For simplicity, only similarity transformations are considered in this embodiment.
Specifically, the document search unit 503 obtains the variables a, b, e, and f in Expression (8). Using Q1(x'₁, y'₁) and S1(x₁, y₁), and Q2(x'₂, y'₂) and S2(x₂, y₂), the document search unit 503 obtains the variables a, b, e, and f from Expressions (9) to (12).
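Expressions (8) to (12) are not reproduced in this text, so the following is a sketch under an assumed parameterization of a similarity transformation: x' = a x - b y + e and y' = b x + a y + f. Under that assumption, two pairs of corresponding points determine a, b, e, and f in closed form, and the rotation angle follows from atan2(b, a). The function names are hypothetical, and the sign convention for b may differ from the patent's own.

```python
import math

def similarity_from_two_pairs(s1, q1, s2, q2):
    """Solve x' = a*x - b*y + e, y' = b*x + a*y + f from S1 -> Q1 and S2 -> Q2."""
    dx, dy = s1[0] - s2[0], s1[1] - s2[1]      # difference of document points
    dxp, dyp = q1[0] - q2[0], q1[1] - q2[1]    # difference of captured points
    denom = dx * dx + dy * dy
    if denom == 0:
        return None  # the two document points coincide; no unique solution
    a = (dx * dxp + dy * dyp) / denom
    b = (dx * dyp - dy * dxp) / denom
    e = q1[0] - a * s1[0] + b * s1[1]
    f = q1[1] - b * s1[0] - a * s1[1]
    return a, b, e, f

def apply_similarity(m, p):
    """Map a document point p into captured-image coordinates."""
    a, b, e, f = m
    return (a * p[0] - b * p[1] + e, b * p[0] + a * p[1] + f)

def rotation_degrees(m):
    """Rotation angle implied by the transform, in degrees."""
    return math.degrees(math.atan2(m[1], m[0]))

# A 90-degree rotation followed by a shift of (2, 3).
m = similarity_from_two_pairs((1, 0), (2, 4), (0, 1), (1, 3))
```

Because the rotation angle falls out of the same two variables used for the transform, no separate estimation step is needed for it.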

Next, in S1510, the document search unit 503 initializes the corresponding point selection variable k to 3 so as to select points other than the two pairs selected in S1508. Next, in S1511, the document search unit 503 checks whether the corresponding point selection variable k is equal to or less than the number m of pairs of corresponding points registered in the minimum distance corresponding point list. If k is equal to or less than m (Yes in S1511), the document search unit 503 advances the process to S1512. If k is greater than m (No in S1511), the document search unit 503 advances the process to S1517.
In S1512, the document search unit 503 extracts one new pair of corresponding points from the minimum distance corresponding point list. The extracted coordinates are denoted Sk(x_k, y_k) and Qk(x'_k, y'_k). Next, in S1513, the document search unit 503 obtains the coordinates S'k(x'_k, y'_k) to which the coordinates Sk(x_k, y_k) extracted in S1512 are mapped by Expression (8).

Next, in S1514, the document search unit 503 calculates the Euclidean distance D as the geometric distance between the coordinates S'k(x'_k, y'_k) and the coordinates Qk(x'_k, y'_k), and compares the Euclidean distance D with the threshold Td. If the Euclidean distance D is equal to or less than the threshold Td (Yes in S1514), the document search unit 503 advances the process to S1515.
In S1515, the document search unit 503 increments the number of votes Vote and advances the process to S1516. If the Euclidean distance D is greater than the threshold Td (No in S1514), the document search unit 503 advances the process to S1516 without performing S1515. In S1516, the document search unit 503 increments the corresponding point selection variable k and returns the process to S1511. This processing is repeated until the corresponding point selection variable k exceeds the number m of pairs of corresponding points registered in the minimum distance corresponding point list.

次に、Ｓ１５１１において、対応点選択変数ｋが対応点リストに登録されている対応点の組数ｍを超えた場合の処理について説明する。Ｓ１５１７において、文書検索部５０３は、投票数Ｖｏｔｅの値と最終投票数ＶｏｔｅＭａｘの値とを比較する。文書検索部５０３は、投票数Ｖｏｔｅの値が最終投票数ＶｏｔｅＭａｘの値よりも大きい場合には（Ｓ１５１７でＹｅｓ）、処理をＳ１５１８へ進める。文書検索部５０３は、投票数Ｖｏｔｅの値が最終投票数ＶｏｔｅＭａｘの値以下の場合には（Ｓ１５１７でＮｏ）、処理をＳ１５１９へ進める。
Ｓ１５１８において、文書検索部５０３は、最終投票数ＶｏｔｅＭａｘの値を投票数Ｖｏｔｅの値で置き換えるとともに、その時の変換行列Ｍを変数Ｍｍａｘに保存する。その後、Ｓ１５１９において、文書検索部５０３は、反復カウント数Ｃｏｕｎｔをインクリメントし、上述のＳ１５０５へ処理を進める。 Next, a process when the corresponding point selection variable k exceeds the number m of corresponding points registered in the corresponding point list in S1511 will be described. In step S1517, the document search unit 503 compares the value of the vote number Vote with the value of the final vote number VoteMax. If the value of the vote number Vote is larger than the value of the final vote number VoteMax (Yes in S1517), the document search unit 503 advances the process to S1518. If the value of the vote number Vote is equal to or smaller than the value of the final vote number VoteMax (No in S1517), the document search unit 503 advances the process to S1519.
In step S1518, the document search unit 503 replaces the value of the final vote number VoteMax with the value of the vote number Vote, and stores the conversion matrix M at that time in the variable Mmax. Thereafter, in S1519, the document search unit 503 increments the iteration count number Count, and advances the process to S1505 described above.
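The voting loop of S1511 to S1519 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the 3x3 nested-list matrix layout, and the idea of passing the candidate matrices in as a list (instead of generating them from randomly selected corresponding points as in S1508 and S1509) are assumptions made to keep the example deterministic.

```python
import math


def apply_transform(m, point):
    """Apply a 3x3 homogeneous matrix (nested lists, assumed layout) to a 2-D point."""
    x, y = point
    xt = m[0][0] * x + m[0][1] * y + m[0][2]
    yt = m[1][0] * x + m[1][1] * y + m[1][2]
    return xt, yt


def count_votes(m, correspondences, td):
    """S1511-S1516: count the pairs whose transformed point lies within Td."""
    vote = 0
    for s_k, q_k in correspondences:
        s_transformed = apply_transform(m, s_k)
        d = math.dist(s_transformed, q_k)  # Euclidean distance D (S1514)
        if d <= td:
            vote += 1                      # S1515
    return vote


def best_transform(candidates, correspondences, td):
    """S1517-S1519: keep the candidate matrix that collects the most votes."""
    vote_max, m_max = 0, None
    for m in candidates:                   # one pass per iteration count
        vote = count_votes(m, correspondences, td)
        if vote > vote_max:                # S1517
            vote_max, m_max = vote, m      # S1518
    return vote_max, m_max
```

Here `vote_max` plays the role of the final vote number VoteMax and `m_max` the role of the variable Mmax.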

なお、Ｓ１５０９において、式（８）に示す変換行列以外の変換行列を用いることにより、アフィン変換等その他の幾何学変換に対応可能である。アフィン変換の場合には、Ｓ１５０８においてランダムに選択する対応点の組の座標数を３とする。また、Ｓ１５０９において式（８）に替えて、式（１３）を用いることとし、Ｓ１５０８において選択した３組の対応点（合計６点）を使って変数ａ〜ｆを求めればよい。

In S1509, other geometric transformations such as an affine transformation can be handled by using a transformation matrix other than the one shown in Expression (8). In the case of an affine transformation, the number of corresponding point pairs randomly selected in S1508 is set to three. In S1509, Expression (13) is used instead of Expression (8), and the variables a to f are obtained from the three pairs of corresponding points (six points in total) selected in S1508.
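As an illustration of how six variables can be recovered from three pairs of corresponding points, the sketch below assumes the common affine form x′ = a·x + b·y + e and y′ = c·x + d·y + f. Expression (13) is not reproduced in this section, so this form, the function names, and the use of Cramer's rule are assumptions rather than the method of the embodiment.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a_matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(a_matrix)
    if abs(d) < 1e-12:
        raise ValueError("the three source points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in a_matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_pairs(pairs):
    """Return (a, b, c, d, e, f) from three ((x, y), (x', y')) pairs."""
    if len(pairs) != 3:
        raise ValueError("exactly three pairs are required")
    a_matrix = [[x, y, 1.0] for (x, y), _ in pairs]
    a, b, e = solve3(a_matrix, [q[0] for _, q in pairs])  # x' equations
    c, d, f = solve3(a_matrix, [q[1] for _, q in pairs])  # y' equations
    return a, b, c, d, e, f
```

Three non-collinear points give three equations per output coordinate, which is exactly enough to determine the three unknowns in each row of the transformation.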

以上の処理により、文書検索部５０３は、最終投票数ＶｏｔｅＭａｘを類似度として算出することができる（類似度算出処理）。そして、文書検索部５０３は、算出された類似度が予め定められた閾値よりも大きく、類似度が一番高くなるコンテンツを処理対象のコンテンツ領域に対応するコンテンツとして特定することができる。
なお、本実施形態においては、Ｓ１５０６において、最終投票数ＶｏｔｅＭａｘを類似度として出力することとしたが、類似度の算出方法は、実施形態に限定されるものではない。他の例としては、文書検索部５０３は、Ｓ１５０３以降の処理を行うことなく、対応点の組数ｍを類似度として出力してもよい。
また、上記説明では、局所特徴点／局所特徴量の比較に基づく画像の照合方法としてＲＡＮＳＡＣを利用した方法を説明した。しかしながら、特徴量照合処理は、２つの画像間の類似度が算出でき、類似度算出の過程において処理負荷の低い方法で回転角度を推定できる方法があれば、他の方法であってもよい。 Through the above processing, the document search unit 503 can calculate the final vote number VoteMax as the similarity (similarity calculation processing). Then, the document search unit 503 can specify the content having the calculated similarity higher than a predetermined threshold and the highest similarity as the content corresponding to the content area to be processed.
In the present embodiment, the final vote number VoteMax is output as the similarity in S1506, but the method of calculating the similarity is not limited to the embodiment. As another example, the document search unit 503 may output the number m of corresponding points as the similarity without performing the processing from S1503 onward.
In the above description, a method using RANSAC has been described as an image matching method based on the comparison of local feature points and local feature amounts. However, the feature amount matching process may use any other method that can calculate the similarity between two images and can estimate the rotation angle with a low processing load in the course of calculating the similarity.
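The selection rule described above (the similarity must exceed a predetermined threshold, and the content with the highest similarity is chosen) can be sketched as below. The function name and the dictionary of per-content similarities are hypothetical; the similarity values would be, for example, the final vote numbers VoteMax.

```python
def select_content(similarities, threshold):
    """Return the content ID with the highest similarity above the threshold.

    `similarities` maps a content ID to its similarity (e.g. VoteMax).
    Returns None when no content exceeds the threshold.
    """
    best_id, best_value = None, threshold
    for content_id, value in similarities.items():
        if value > best_value:
            best_id, best_value = content_id, value
    return best_id
```

When several contents share the highest similarity, this sketch keeps the first one encountered; the source does not state how ties are handled.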

以上のように、本実施形態に係るＡＲ処理システムは、撮影画像から得られたコンテンツと、特徴量テーブル６００に登録されているコンテンツとの単なる比較ではなく、周囲のコンテンツとの位置関係を考慮してコンテンツを特定する。したがって、ＡＲ処理システムは、より精度よくコンテンツの特定を行うことができ、これにより、ユーザが閲覧中のコンテンツに対応付けられた関連情報を精度よく提示することができる。
また、ＡＲ処理システムは、撮影時の撮影方向の変化等から、各フレーム間の位置関係を特定し、これに基づいてコンテンツ間の位置関係を特定する。したがって、ＡＲ処理システムは、１つの紙文書に配置された複数のコンテンツが１フレームに収まりきらない場合においても、精度よく各コンテンツの位置関係を特定することができる。 As described above, the AR processing system according to the present embodiment does not simply compare the content obtained from the photographed image with the content registered in the feature amount table 600; it identifies the content by also considering the positional relationship with the surrounding content. Therefore, the AR processing system can specify the content with higher accuracy, and can thereby accurately present the related information associated with the content that the user is viewing.
Further, the AR processing system specifies the positional relationship between the frames based on the change in the shooting direction at the time of shooting, and specifies the positional relationship between the contents based on the positional relationship. Therefore, the AR processing system can accurately specify the positional relationship of each content even when a plurality of contents arranged in one paper document cannot be accommodated in one frame.

第１の実施形態に係るＡＲ処理システムの第１の変更例としては、撮影画像中のすべてのフレームを処理対象とする必要はない。例えば、所定間隔毎のフレームを処理対象とし、このフレームに対して、図７に示す位置関連情報の生成（Ｓ７０２）、撮影画像等の送信（Ｓ７０３）、画像解析（Ｓ７０４）を行い、これらの結果に基づいて、Ｓ７０５以降の処理を行ってもよい。これにより、処理回数を削減することができ、処理負担を軽減することができる。
また、撮影画像は静止画であってもよい。この場合には、ＡＲ処理システムは、連続撮影又は異なるタイミングで撮影された撮影画像を処理対象としてもよい。具体的には、ＡＲ処理システムは、Ｓ７０２において、位置関連情報を生成し、さらに撮影画像が得られたタイミングも特定する。そして、撮影画像としての静止画に対し、Ｓ７０３以降の処理を行ってもよい。
また、第２の変更例としては、図７に示すＳ７０４〜Ｓ７０６の処理は、管理サーバ装置１１０に替えて、携帯端末１００が行ってもよい。また、他の例としては、管理サーバ装置１１０は、複数の装置で構成されており、Ｓ７０４〜Ｓ７０６の処理を複数の装置が分担して実行してもよい。 As a first modification of the AR processing system according to the first embodiment, not every frame in a captured image needs to be processed. For example, frames at a predetermined interval may be set as the processing targets; the generation of position related information (S702), the transmission of the photographed image and the like (S703), and the image analysis (S704) shown in FIG. 7 are performed on these frames, and the processing from S705 onward may be performed based on the results. This reduces the number of processing iterations and therefore the processing load.
Further, the captured image may be a still image. In this case, the AR processing system may process images captured continuously or at different timings. Specifically, in step S702, the AR processing system generates position related information and also specifies the timing at which each captured image was obtained. The processing from S703 onward may then be performed on the still images as the captured images.
As a second modification, the processing of S704 to S706 illustrated in FIG. 7 may be performed by the mobile terminal 100 instead of the management server device 110. As another example, the management server device 110 may be configured by a plurality of devices that share and execute the processing of S704 to S706.

（第２の実施形態）
次に、第２の実施形態に係るＡＲ処理システムについて説明する。第１の実施形態に係るＡＲ処理システムは、撮影画像から対応する電子文書を特定できなかった場合には、後続のフレームが入力されるのを待ち、後続のフレームから得られた情報に基づいて再度電子文書の特定を試みる。
これに対し、第２の実施形態に係るＡＲ処理システムは、撮影画像から対応する電子文書を特定できなかった場合には、特定するために必要な画像を撮影するための支援情報を携帯端末１００の表示装置１０１に表示する。以下、第２の実施形態に係るＡＲ処理システムの具体的な処理のうち、第１の実施形態に係るＡＲ処理システムの処理と異なる部分について説明する。 (Second Embodiment)
Next, an AR processing system according to the second embodiment will be described. When the AR processing system according to the first embodiment cannot identify the corresponding electronic document from the captured image, it waits for a subsequent frame to be input and tries again to identify the electronic document based on the information obtained from that frame.
In contrast, when the AR processing system according to the second embodiment cannot identify the corresponding electronic document from the captured image, it displays, on the display device 101 of the mobile terminal 100, support information for capturing the image needed for the identification. The following describes the parts of the specific processing of the AR processing system according to the second embodiment that differ from the processing of the AR processing system according to the first embodiment.

図１６は、管理サーバ装置１１０による、支援情報送信処理を示すフローチャートである。管理サーバ装置１１０は、図７を参照しつつ説明したＡＲ処理のＳ７０６においてＮｏと判断された場合に、処理をＳ７０３へ進めるのに替えて、支援情報送信処理を実行する。Ｓ１６０１において、管理サーバ装置１１０の文書検索部５０３は、電子文書検索処理（Ｓ７０５）において、位置情報を満たす電子文書の候補が複数得られたか否かを確認する。文書検索部５０３は、電子文書の候補が複数得られた場合には（Ｓ１６０１でＹｅｓ）、処理をＳ１６０３へ進める。文書検索部５０３は、位置情報を満たす電子文書の候補が得られなかった場合には（Ｓ１６０１でＮｏ）、処理をＳ１６０２へ進める。
Ｓ１６０２において、通信部５００は、は、関連情報がない旨を示す提示情報を携帯端末１００に送信する。これに対応し、携帯端末１００の通信部４００は、提示情報を受信する。そして、携帯端末１００の表示部４０４は、表示装置１０１に提示情報を表示する。これにより、携帯端末１００のユーザは、閲覧中の紙文書に対する関連情報が存在しないと把握することができる。 FIG. 16 is a flowchart showing support information transmission processing by the management server device 110. When it is determined No in S706 of the AR process described with reference to FIG. 7, the management server apparatus 110 executes the support information transmission process instead of proceeding to S703. In step S1601, the document search unit 503 of the management server apparatus 110 checks whether or not a plurality of electronic document candidates that satisfy the position information are obtained in the electronic document search process (S705). If a plurality of electronic document candidates are obtained (Yes in S1601), the document search unit 503 advances the process to S1603. If the electronic document candidate satisfying the position information is not obtained (No in S1601), the document search unit 503 advances the process to S1602.
In S1602, the communication unit 500 transmits presentation information indicating that there is no related information to the mobile terminal 100. In response to this, the communication unit 400 of the mobile terminal 100 receives the presentation information. Then, the display unit 404 of the portable terminal 100 displays the presentation information on the display device 101. Thereby, the user of the portable terminal 100 can grasp that there is no related information for the paper document being browsed.

Ｓ１６０３において、文書検索部５０３は、特徴量テーブル６００を参照し、複数の候補に対応付けられているコンテンツの中から、撮影画像に対応する１つの電子文書に絞り込むために必要なコンテンツを特定する（コンテンツ特定処理）。ここで、複数の候補は、電子文書検索処理（Ｓ７０５）において得られたものである。
次に、Ｓ１６０４において、文書検索部５０３は、Ｓ１６０３において特定したコンテンツに基づいて、支援情報を作成する（支援情報作成処理）。具体的には、文書検索部５０３は、特定したコンテンツが撮影されるように、携帯端末１００の撮影部３００による撮影方向を変更するための指示を支援情報として作成する。
次に、Ｓ１６０５において、通信部５００は、支援情報を携帯端末１００に送信し、その後、処理をＳ７０４（図７）へ進める。これに対応し、携帯端末１００の通信部４００は、支援情報を受信する。そして、携帯端末１００の表示部４０４は、表示装置１０１に支援情報を表示する（支援情報表示処理）。これにより、携帯端末１００のユーザは、撮影方向を変更することにより関連情報を見ることができる、ということを把握することができる。 In step S 1603, the document search unit 503 refers to the feature amount table 600 and specifies content necessary for narrowing down to one electronic document corresponding to the photographed image from the content associated with the plurality of candidates. (Content identification process). Here, the plurality of candidates are obtained in the electronic document search process (S705).
In step S1604, the document search unit 503 generates support information based on the content specified in step S1603 (support information generation process). Specifically, the document search unit 503 creates, as support information, an instruction for changing the shooting direction by the shooting unit 300 of the mobile terminal 100 so that the specified content is shot.
Next, in S1605, the communication unit 500 transmits the support information to the mobile terminal 100, and then the process proceeds to S704 (FIG. 7). In response to this, the communication unit 400 of the mobile terminal 100 receives support information. Then, the display unit 404 of the portable terminal 100 displays support information on the display device 101 (support information display process). Thereby, the user of the portable terminal 100 can grasp that the related information can be viewed by changing the shooting direction.

図１７を参照しつつ、支援情報送信処理を具体的に説明する。前提として、図１７（ａ）に示すように、特徴量テーブル１７００には、文書ＩＤ「０１１」，「０１２」の電子文書が登録されているものとする。また、ＡＲ処理において、３つのコンテンツ領域が抽出され、図１７（ｂ）に示すように、撮影画像テーブル１７１０には、３つのコンテンツ領域に対応するレコードが記録されているものとする。
図１７に示すように、領域ＩＤ「ａ０１１」のコンテンツ領域の特徴量は、文書ＩＤ「０１１」のコンテンツＩＤ「Ａ」のコンテンツの特徴量と一致するものとする。また、図１７に示すように、領域ＩＤ「ａ０１２」のコンテンツ領域の特徴量は、文書ＩＤ「０１２」のコンテンツＩＤ「Ｃ」のコンテンツの特徴量と一致するものとする。この場合、文書検索部５０３は、文書ＩＤ「０１１」,「０１２」により識別される２つの電子文書が候補として特定され、撮影画像がいずれの候補に対応するのか特定することができない。 The support information transmission process will be specifically described with reference to FIG. As a premise, as shown in FIG. 17A, it is assumed that electronic documents having document IDs “011” and “012” are registered in the feature amount table 1700. In the AR process, it is assumed that three content areas are extracted, and records corresponding to the three content areas are recorded in the captured image table 1710 as shown in FIG.
As illustrated in FIG. 17, it is assumed that the feature amount of the content area with the region ID “a011” matches the feature amount of the content with the content ID “A” of the document ID “011”. Also, as shown in FIG. 17, the feature amount of the content area with the region ID “a012” is assumed to match the feature amount of the content with the content ID “C” of the document ID “012”. In this case, the two electronic documents identified by the document IDs “011” and “012” are specified as candidates, and the document search unit 503 cannot determine which candidate the captured image corresponds to.

したがって、Ｓ１６０３において、文書検索部５０３は、電子文書の候補に対応付けられているコンテンツのうち、撮影画像テーブル１０００に記録されているコンテンツ領域と一致するコンテンツとの位置関係が登録されたコンテンツを特定する。すなわち、このとき特定されるコンテンツは、コンテンツ領域と一致するコンテンツ以外のコンテンツである。図１７の例においては、文書検索部５０３は、文書ＩＤ「０１１」のコンテンツＩＤ「Ｄ」のコンテンツと、文書ＩＤ「０１２」のコンテンツＩＤ「Ｅ」のコンテンツを特定する。 Therefore, in step S1603, the document search unit 503 specifies, from among the content associated with the candidate electronic documents, content for which a positional relationship is registered with the content that matches a content area recorded in the captured image table 1000. That is, the content specified at this time is content other than the content that matches a content area. In the example of FIG. 17, the document search unit 503 specifies the content with the content ID “D” of the document ID “011” and the content with the content ID “E” of the document ID “012”.

そして、Ｓ１６０４において、文書検索部５０３は、Ｓ１６０３において特定されたコンテンツＩＤ「Ｄ」，「Ｅ」のコンテンツそれぞれを撮影するための撮影方向を決定する。例えば、Ｓ１６０４の処理時点において携帯端末１００の表示装置１０１に表示されているフレームに、領域ＩＤ「ａ０１３」のコンテンツ領域が表示されているとする。この場合、文書検索部５０３は、撮影方向をより下又は左上に向けることを指示する支援情報を作成する。これにより、コンテンツＩＤ「Ｅ」のコンテンツを表示するコンテンツ領域が撮影されれば、撮影画像に対応する電子文書は、文書ＩＤ「０１２」の文書であるということがわかる。また、コンテンツＩＤ「Ｄ」のコンテンツを表示するコンテンツ領域が撮影されれば、撮影画像に対応する電子文書は、文書ＩＤ「０１１」の文書であるということがわかる。 In step S 1604, the document search unit 503 determines a shooting direction for shooting each of the content IDs “D” and “E” specified in step S 1603. For example, it is assumed that the content area with the area ID “a013” is displayed in the frame displayed on the display device 101 of the mobile terminal 100 at the time of the processing of S1604. In this case, the document search unit 503 creates support information for instructing to turn the shooting direction downward or upper left. As a result, if the content area displaying the content with the content ID “E” is photographed, it can be understood that the electronic document corresponding to the photographed image is the document with the document ID “012”. In addition, if the content area displaying the content with the content ID “D” is captured, it can be understood that the electronic document corresponding to the captured image is the document with the document ID “011”.
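The narrowing-down logic of S1601, S1603, and S1604 can be sketched as follows. This is only an illustration under stated assumptions: the page layout (content positions as (x, y) coordinates with y increasing downward), the anchor content “B” that is taken to be currently displayed, and all function names are hypothetical and are not taken from the feature amount table of FIG. 17. The coordinates were chosen so that the resulting directions match the "downward or upper left" example in the text.

```python
def undetected_contents(candidates, matched):
    """S1603: per candidate document, the contents not yet matched to a content area."""
    return {doc: sorted(set(layout) - matched.get(doc, set()))
            for doc, layout in candidates.items()}


def direction(from_xy, to_xy):
    """Describe where `to_xy` lies relative to `from_xy` (y grows downward)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    vertical = "down" if dy > 0 else "up" if dy < 0 else ""
    horizontal = "right" if dx > 0 else "left" if dx < 0 else ""
    return " ".join(part for part in (vertical, horizontal) if part) or "stay"


def support_info(candidates, matched, anchor):
    """S1601 + S1604: shooting-direction hints, or None unless several candidates remain."""
    if len(candidates) < 2:          # S1601: not an ambiguous result
        return None
    hints = {}
    for doc, missing in undetected_contents(candidates, matched).items():
        origin = candidates[doc][anchor[doc]]   # content currently in view
        hints[doc] = {cid: direction(origin, candidates[doc][cid])
                      for cid in missing}
    return hints


# Hypothetical layouts for the two candidate documents.
candidates = {
    "011": {"A": (0, 0), "B": (100, 100), "D": (100, 300)},
    "012": {"C": (200, 0), "B": (100, 100), "E": (0, 0)},
}
matched = {"011": {"A", "B"}, "012": {"C", "B"}}
anchor = {"011": "B", "012": "B"}
hints = support_info(candidates, matched, anchor)
# hints == {"011": {"D": "down"}, "012": {"E": "up left"}}
```

Capturing content “D” would then point to document “011”, and capturing content “E” to document “012”, which is the disambiguation the support information is meant to trigger.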

図１８は、支援情報の表示例を示す図である。このように、携帯端末１００の表示部４０４は、支援情報から撮影方向を示す矢印画像１８００を作成し、これを表示装置１０１に表示する。なお、支援情報の出力形態は実施形態に限定されるものではない。他の例としては、表示装置１０１は、テキスト形式で支援情報を表示してもよい。また、他の例としては、携帯端末１００がスピーカ（不図示）を備える場合には、支援情報を音声として出力してもよい。
なお、第２の実施形態にかかるＡＲ処理システムのこれ以外の構成及び処理は、第１の実施形態にかかるＡＲ処理システムの構成及び処理と同様である。
以上のように、第２の実施形態に係るＡＲ処理システムにおいては、撮影画像に対応する電子文書を一意に特定できない場合に、ユーザに対し必要な操作を求めることにより、より早く、また確実に電子文書を特定し、適切な関連情報を提示することができる。 FIG. 18 is a diagram illustrating a display example of support information. As described above, the display unit 404 of the mobile terminal 100 creates the arrow image 1800 indicating the shooting direction from the support information, and displays this on the display device 101. The output form of support information is not limited to the embodiment. As another example, the display device 101 may display support information in a text format. As another example, when the mobile terminal 100 includes a speaker (not shown), the support information may be output as sound.
The remaining configuration and processing of the AR processing system according to the second embodiment are the same as the configuration and processing of the AR processing system according to the first embodiment.
As described above, when the electronic document corresponding to the photographed image cannot be uniquely specified, the AR processing system according to the second embodiment asks the user to perform the necessary operation. This allows it to identify the electronic document more quickly and reliably and to present appropriate related information.

＜その他の実施形態＞
また、本発明の目的は、以下のようにすることによって達成されることは言うまでもない。すなわち、前述した実施形態の機能を実現するソフトウェアのプログラムコード（コンピュータプログラム）を記録した記録媒体（または記憶媒体）を、システムあるいは装置に供給する。係る記憶媒体は言うまでもなく、コンピュータ読み取り可能な記憶媒体である。
そして、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行する。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、ソフトウェアのプログラムコードは、ネットワークを介して、そのシステムあるいは装置に供給されてもよい。 <Other embodiments>
Needless to say, the object of the present invention can be achieved as follows. That is, a recording medium (or storage medium) that records a program code (computer program) of software that implements the functions of the above-described embodiments is supplied to the system or apparatus. Needless to say, such a storage medium is a computer-readable storage medium.
Then, the computer (or CPU or MPU) of the system or apparatus reads and executes the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. Moreover, the program code of software may be supplied to the system or apparatus via a network.

また、コンピュータが読み出したプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）等が実際の処理の一部または全部を行う。その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。 Needless to say, the invention also covers the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on instructions of the program code read by the computer, and the functions of the above-described embodiments are realized by that processing.

さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張ボードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれたとする。その後、そのプログラムコードの指示に基づき、その機能拡張ボードや機能拡張ユニットに備わるＣＰＵ等が実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
本発明を上記記憶媒体に適用する場合、その記憶媒体には、先に説明したフローチャートに対応するプログラムコードが格納されることになる。 Furthermore, assume that the program code read from the storage medium is written to a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer. Needless to say, the invention also covers the case where a CPU or the like provided in the function expansion board or function expansion unit then performs part or all of the actual processing based on instructions of the program code, and the functions of the above-described embodiments are realized by that processing.
When the present invention is applied to the storage medium, the storage medium stores program codes corresponding to the flowcharts described above.

以上、上述した各実施形態によれば、撮影画像の被写体を精度よく特定することができる。 As mentioned above, according to each embodiment mentioned above, the photographic subject can be specified accurately.

以上、本発明の好ましい実施形態について詳述したが、本発明は係る特定の実施形態に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形・変更が可能である。 The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments, and various modifications and changes can be made within the scope of the gist of the present invention described in the claims.

DESCRIPTION OF SYMBOLS
100 Mobile terminal
101 Display device
110 Management server device
300 Shooting unit
301 Acceleration sensor
302 CPU

Claims

An information processing system comprising:
area extracting means for extracting a plurality of content areas from a plurality of photographed images, each obtained by photographing a part of one paper medium;
feature amount calculating means for calculating a feature amount of each content area;
position specifying means for specifying a positional relationship between the plurality of content areas; and
electronic document specifying means for specifying an electronic document corresponding to the plurality of photographed images by referring to storage means that stores, in association with an electronic document corresponding to one paper medium, feature amounts of a plurality of contents included in that electronic document, and further stores position information indicating a positional relationship between those contents, the electronic document being specified based on the feature amounts of the respective content areas calculated by the feature amount calculating means and the positional relationship between the content areas specified by the position specifying means.

The information processing system according to claim 1, wherein the region extraction unit extracts a plurality of content regions from a plurality of frames included in a moving image as the plurality of captured images.

The information processing system according to claim 1, further comprising:
similarity calculation means for calculating a similarity based on the feature amounts of the plurality of content areas extracted by the area extracting means and the feature amounts of the contents; and
first content specifying means for specifying, based on the similarity, content corresponding to each of the plurality of content areas extracted by the area extracting means,
wherein the electronic document specifying means specifies the electronic document corresponding to the position information as the electronic document corresponding to the photographed images when, for the plurality of contents specified by the first content specifying means, the positional relationship specified by the position specifying means satisfies the positional relationship indicated in the position information stored in the storage means.

The information processing system according to claim 3, wherein the electronic document specifying means specifies the electronic document corresponding to the position information as the electronic document corresponding to the photographed images when all the contents associated with the electronic document have been specified by the first content specifying means and the positional relationship specified by the position specifying means satisfies the positional relationship indicated in the position information.

The information processing system according to claim 3 or 4, wherein the storage means further stores related information in association with the content,
the system further comprising related information extraction means for extracting the related information stored in the storage means in association with the content that is included in the electronic document specified by the electronic document specifying means and that corresponds to the content area extracted by the area extracting means.

The information processing system according to claim 5, further comprising superimposed display means for displaying the photographed image with the related information superimposed at the position, in the photographed image, of the content area corresponding to the related information extracted by the related information extraction means.

The information processing system according to claim 3, further comprising:
second content specifying means for specifying, based on the position information, content necessary for narrowing down when the positional relationship specified by the position specifying means satisfies the positional relationship indicated by the position information of each of a plurality of electronic documents;
support information creating means for creating, based on the content specified by the second content specifying means, support information that prompts the user to change the shooting direction; and
support information display means for displaying the support information.

An information processing apparatus comprising:
area extracting means for extracting a plurality of content areas from a plurality of photographed images, each obtained by photographing a part of one paper medium;
feature amount calculating means for calculating a feature amount of each content area;
position specifying means for specifying a positional relationship between the plurality of content areas; and
electronic document specifying means for specifying an electronic document corresponding to the plurality of photographed images by referring to storage means that stores, in association with an electronic document corresponding to one paper medium, feature amounts of a plurality of contents included in that electronic document, and further stores position information indicating a positional relationship between those contents, the electronic document being specified based on the feature amounts of the respective content areas calculated by the feature amount calculating means and the positional relationship between the content areas specified by the position specifying means.

An information processing method executed by an information processing system, the method comprising:
an area extracting step of extracting a plurality of content areas from a plurality of photographed images, each obtained by photographing a part of one paper medium;
a feature amount calculating step of calculating a feature amount of each content area;
a position specifying step of specifying a positional relationship between the plurality of content areas; and
an electronic document specifying step of specifying an electronic document corresponding to the plurality of photographed images by referring to storage means that stores, in association with an electronic document corresponding to one paper medium, feature amounts of a plurality of contents included in that electronic document, and further stores position information indicating a positional relationship between those contents, the electronic document being specified based on the feature amounts of the respective content areas calculated in the feature amount calculating step and the positional relationship between the content areas specified in the position specifying step.

An information processing method executed by an information processing apparatus, the method comprising:
an area extracting step of extracting a plurality of content areas from a plurality of photographed images, each obtained by photographing a part of one paper medium;
a feature amount calculating step of calculating a feature amount of each content area;
a position specifying step of specifying a positional relationship between the plurality of content areas; and
an electronic document specifying step of specifying an electronic document corresponding to the plurality of photographed images by referring to storage means that stores, in association with an electronic document corresponding to one paper medium, feature amounts of a plurality of contents included in that electronic document, and further stores position information indicating a positional relationship between those contents, the electronic document being specified based on the feature amounts of the respective content areas calculated in the feature amount calculating step and the positional relationship between the content areas specified in the position specifying step.

A program for causing a computer to function as:
area extracting means for extracting a plurality of content areas from a plurality of photographed images, each obtained by photographing a part of one paper medium;
feature amount calculating means for calculating a feature amount of each content area;
position specifying means for specifying a positional relationship between the plurality of content areas; and
electronic document specifying means for specifying an electronic document corresponding to the plurality of photographed images by referring to storage means that stores, in association with an electronic document corresponding to one paper medium, feature amounts of a plurality of contents included in that electronic document, and further stores position information indicating a positional relationship between those contents, the electronic document being specified based on the feature amounts of the respective content areas calculated by the feature amount calculating means and the positional relationship between the content areas specified by the position specifying means.