JP2009506394A

JP2009506394A - Method and machine-readable medium in mixed media document system

Info

Publication number: JP2009506394A
Application number: JP2008510935A
Authority: JP
Inventors: ジェーハル，ジョナサン; リー，ダル−シヤン; ピアソル，カート; イーハート，ピーター; グラハム，ジェイミー; エロール，ベルナ; ジーヴァンオルスト，ダニエル; ルゥ，シアオイエ
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2005-08-23
Filing date: 2006-08-22
Publication date: 2009-02-12
Anticipated expiration: 2026-08-22
Also published as: EP1917637A1; JP4897795B2; EP1917637A4; KR100960639B1; KR20080034480A; WO2007023993A1

Abstract

A mixed media reality (MMR) system and related techniques are disclosed. The MMR system provides a mechanism for forming a mixed media document that includes at least two types of media (e.g., printed paper as a first medium and digital content and/or web links as a second medium). In one particular embodiment, the MMR system includes a content-based retrieval database built with an index table. The index table represents the two-dimensional geometric relationships between objects extracted from a printed document in a way that allows them to be searched using a text-based index. A ranked set of document, page, and location candidates can be computed from the data in the index table. The technique efficiently converts features detected in an image patch into text terms (or other searchable features) that represent both the features themselves and their geometric relationships. A storage facility can be used to store additional characteristics for each document image patch.

Description

The present invention relates to techniques for producing a mixed media document formed from at least two types of media, and in particular to a Mixed Media Reality (MMR) system that combines print media and electronic media to create mixed media documents.

Document printing and copying technology has been used for many years in many contexts. For example, printers and copiers are used in private and commercial office environments, in home environments with personal computers, and in document printing and publishing service environments. However, printing and copying technology has not previously been thought of as a means of bridging the gap between static print media (i.e., paper documents) and the interactive "virtual world," which includes digital communication, networking, information provision, advertising, entertainment, and electronic commerce.

For centuries, print media has been the primary source for communicating information such as news and advertising. In recent years, the advent and spread of personal computers and personal electronic devices, such as personal digital assistant (PDA) devices and cellular telephones (e.g., camera-equipped cellular telephones), has extended the concept of print media by making it available in an electronically readable and searchable form and by introducing interactive multimedia capabilities unparalleled in traditional print media.

Unfortunately, a gap exists between the electronically accessible, virtual, multimedia-based world and the physical world of print media. For example, although almost everyone in the developed world has access to print media and to electronic information on a daily basis, users of print media and of personal electronic devices lack the tools and technology required to form a link between the two (i.e., to facilitate the creation of mixed media documents).

Moreover, conventional print media has particularly advantageous qualities, such as tactile feel, no power requirement, and permanence of structure and storage, that virtual or digital media lack. Similarly, conventional digital media has its own advantageous qualities, such as portability (e.g., it can be carried in the storage of a cellular telephone or laptop) and ease of transmission (e.g., email).

For these reasons, an object of the present invention is to provide techniques that make the advantages of both print media and virtual media available.

At least one aspect of one or more embodiments of the present invention provides a computer-implemented method for organizing and accessing information in a mixed media document system. The method includes generating an electronic representation of a paper document; identifying features on the paper document that capture its two-dimensional aspects; identifying the location of each feature; and indexing the features by their locations, thereby generating an index table. The method may include a preliminary step of receiving the paper document. The method may also include storing one or more characteristics associated with at least one of the features. In that case, the one or more characteristics include one or more actions, including at least one of extracting text information, extracting graphics information, executing a process, executing a command, ordering, extracting a video, extracting a sound, storing information, creating a new document, printing a document, and displaying a document. In another particular case, identifying features that capture the two-dimensional aspects includes identifying objects that are horizontally aligned (adjacent or not) or vertically aligned (adjacent or not). In another particular case, it includes identifying horizontal and vertical word pairs (whole or partial). In another particular case, generating the electronic representation of the paper document is performed during a scanning or printing process. In another particular case, identifying the features includes grouping sequences of text into logical lines by examining the amount of vertical overlap between two consecutive sequences.

The method may include receiving one or more query terms that capture two-dimensional relationships between objects in a search target document, and computing, based on data from the index table, at least one mixed media document and candidate locations that may be responsive to the query terms. In one such case, receiving the query terms is preceded by receiving the search target document, generating an image of at least a patch of the search target document, and generating the one or more query terms based on that image. In one such case, generating the query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.

In another particular case, computing the at least one mixed media document and candidate locations includes identifying the stored page most likely to match the patch of the search target document, and computing the location within that page most likely to be the center of the patch. In one such case, each word pair is associated with an inverse document frequency (IDF), and identifying the stored page most likely to match the patch includes adding the IDF of each word pair to an accumulator indexed by the document pages on which the word pair appears. In response to the maximum value in the accumulator exceeding a threshold, the method outputs the associated document page as matching the patch. In one such case, computing the location most likely to be the center of the patch includes adding a weight to each cell in a zone surrounding each word pair (the weight for each cell can be determined, for example, by the product of the word pair's IDF and the normalized geometric distance between the center of the zone and that cell), and searching the accumulator's corresponding Accum array for the cell with the maximum value. In response to that maximum value exceeding a threshold, the method further includes reporting the cell as the candidate location of the patch.

In another particular case, computing the at least one mixed media document and candidate locations includes looking up each of the one or more query terms in the index table to retrieve the one or more locations associated with each query term, and, for each location so identified, identifying one or more candidate regions that contain it. In one such case, the computation includes identifying the candidate region that best matches all of the query terms. In response to determining that one of the candidate regions satisfies a predetermined matching criterion, the method continues by confirming that region as matching the search target document.
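
The indexing-side steps described above (grouping text into logical lines by vertical overlap, then forming horizontal and vertical word pairs) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Word` bounding-box type, the 0.5 overlap threshold, and the `h:`/`v:` text-term encoding are all assumptions introduced here for illustration, and the word boxes are assumed to come from some upstream OCR or print-capture step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical word bounding box; y grows downward."""
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def vertical_overlap(a: Word, b: Word) -> float:
    """Fraction of the shorter box's height that both boxes share."""
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    shorter = min(a.y1 - a.y0, b.y1 - b.y0)
    return max(0.0, shared) / shorter if shorter > 0 else 0.0


def group_into_lines(words: List[Word], min_overlap: float = 0.5) -> List[List[Word]]:
    """Group words into logical lines by vertical overlap (threshold is assumed)."""
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.y0, w.x0)):
        for line in lines:
            if vertical_overlap(line[-1], w) >= min_overlap:
                line.append(w)
                break
        else:
            lines.append([w])  # no existing line overlaps enough: start a new one
    for line in lines:
        line.sort(key=lambda w: w.x0)
    return lines


def word_pair_terms(lines: List[List[Word]]) -> List[Tuple[str, float, float]]:
    """Return (term, x, y) tuples for horizontal and vertical word pairs.

    The "h:"/"v:" prefixes are an assumed encoding that lets a plain
    text index tell the two geometric relationships apart.
    """
    terms: List[Tuple[str, float, float]] = []
    for i, line in enumerate(lines):
        # Horizontal pairs: neighbouring words on the same logical line.
        for left, right in zip(line, line[1:]):
            terms.append((f"h:{left.text}|{right.text}", left.x0, left.y0))
        # Vertical pairs: words on the next line that overlap horizontally.
        if i + 1 < len(lines):
            for upper in line:
                for lower in lines[i + 1]:
                    if min(upper.x1, lower.x1) > max(upper.x0, lower.x0):
                        terms.append((f"v:{upper.text}|{lower.text}", upper.x0, upper.y0))
    return terms
```

Each returned tuple pairs a searchable text term with the position of its first word, which is the kind of record an index table keyed by term could store.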

At least one other aspect of one or more embodiments of the present invention provides a machine-readable medium encoded with instructions (e.g., one or more compact discs, diskettes, servers, memory sticks, hard drives, ROMs, RAMs, or any type of media suitable for storing electronic instructions) that, when executed by one or more processors, cause the processor to carry out a process for organizing and accessing information in a mixed media document system. The process may be, for example, similar to or different from the method described above.

At least one other aspect of one or more embodiments of the present invention provides a method for accessing information in a mixed media document system. The method may include receiving one or more query terms that capture two-dimensional relationships between objects in a search target document, and computing, based on data from an index table (which indexes document features and the feature locations of mixed media documents), at least one mixed media document and candidate locations that may be responsive to the query terms. In one such case, receiving the query terms is preceded by receiving the search target document, generating an image of at least a patch of the search target document, and generating the one or more query terms based on that image. In one such case, generating the query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.

In another particular case, computing the at least one mixed media document and candidate locations includes identifying the stored page most likely to match the patch of the search target document, and computing the location within that page most likely to be the center of the patch. In one such case, each word pair is associated with an inverse document frequency (IDF), and identifying the stored page most likely to match the patch includes adding the IDF of each word pair to an accumulator indexed by the document pages on which the word pair appears. In response to the maximum value in the accumulator exceeding a threshold, the method outputs the associated document page as matching the patch. In one such case, computing the location most likely to be the center of the patch includes adding a weight to each cell in a zone surrounding each word pair (the weight for each cell can be determined, for example, by the product of the word pair's IDF and the normalized geometric distance between the center of the zone and that cell), and searching the accumulator's corresponding Accum array for the cell with the maximum value. In response to that maximum value exceeding a threshold, the method further includes reporting the cell as the candidate location of the patch.

In another particular case, computing the at least one mixed media document and candidate locations includes looking up each of the one or more query terms in the index table to retrieve the one or more locations associated with each query term, and, for each location so identified, identifying one or more candidate regions that contain it. The computation includes identifying the candidate region that best matches all of the query terms. In response to determining that one of the candidate regions satisfies a predetermined matching criterion, the method continues by confirming that region as matching the search target document.
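
The query-side evidence accumulation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the index is a plain dictionary from term to `(doc_id, page, x, y)` postings, IDF is taken as `log(n_pages / pages_containing_term)`, the page is divided into a fixed grid of cells, and the "normalized geometric distance" factor is interpreted as a closeness value that is 1 at the zone center and decreases toward its edge. None of these choices is specified by the source.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed index layout: term -> list of (doc_id, page, x, y) postings.
Index = Dict[str, List[Tuple[str, int, float, float]]]
PageKey = Tuple[str, int]


def idf(index: Index, term: str, n_pages: int) -> float:
    """Inverse document frequency over pages (assumed formula)."""
    pages = {(d, p) for d, p, _, _ in index.get(term, [])}
    return math.log(n_pages / len(pages)) if pages else 0.0


def best_page(index: Index, query_terms: List[str], n_pages: int,
              threshold: float) -> Optional[PageKey]:
    """Add each term's IDF to an accumulator indexed by page; report the
    page with the maximum value if it exceeds the threshold."""
    accum: Dict[PageKey, float] = defaultdict(float)
    for term in query_terms:
        weight = idf(index, term, n_pages)
        for page_key in {(d, p) for d, p, _, _ in index.get(term, [])}:
            accum[page_key] += weight
    if not accum:
        return None
    page_key, score = max(accum.items(), key=lambda kv: kv[1])
    return page_key if score > threshold else None


def best_cell(index: Index, query_terms: List[str], page_key: PageKey,
              n_pages: int, grid: Tuple[int, int] = (10, 10),
              page_size: Tuple[float, float] = (100.0, 100.0),
              zone_radius: int = 1, threshold: float = 0.0
              ) -> Optional[Tuple[int, int]]:
    """Vote into a 2D Accum array of cells and return the best (col, row)."""
    cols, rows = grid
    accum = [[0.0] * cols for _ in range(rows)]
    max_dist = math.hypot(zone_radius, zone_radius) + 1.0
    for term in query_terms:
        weight = idf(index, term, n_pages)
        for d, p, x, y in index.get(term, []):
            if (d, p) != page_key:
                continue
            cx = min(cols - 1, int(x / page_size[0] * cols))
            cy = min(rows - 1, int(y / page_size[1] * rows))
            # Spread the vote over the zone of cells around the word pair.
            for gy in range(max(0, cy - zone_radius), min(rows, cy + zone_radius + 1)):
                for gx in range(max(0, cx - zone_radius), min(cols, cx + zone_radius + 1)):
                    closeness = 1.0 - math.hypot(gx - cx, gy - cy) / max_dist
                    accum[gy][gx] += weight * closeness
    score, gx, gy = max((accum[r][c], c, r) for r in range(rows) for c in range(cols))
    return (gx, gy) if score > threshold else None
```

Rare word pairs dominate both votes because their IDF is large, while a pair that appears on every indexed page contributes nothing under this IDF formula.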

Another aspect of one or more embodiments of the present invention provides a machine-readable medium encoded with instructions (e.g., one or more compact discs, diskettes, servers, memory sticks, hard drives, ROMs, RAMs, or any type of media suitable for storing electronic instructions) that, when executed by one or more processors, cause the processor to carry out a process for organizing and accessing information in a mixed media document system. The process may be, for example, similar to or different from the method described above.

The features and advantages described herein are not all-inclusive; in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the following description and drawings. Moreover, it should be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to limit the scope of the invention.

The invention is described below by way of non-limiting examples in conjunction with the accompanying drawings, in which like reference numerals refer to like elements.

Best Mode for Carrying Out the Invention
A Mixed Media Reality (MMR) system and associated methods are disclosed. The MMR system provides a mechanism for creating mixed media documents that include at least two types of media, for example printed paper as a first medium and a digital photograph, digital movie, digital audio file, digital text file, or web link as a second medium. Furthermore, the MMR system and/or techniques can be used to facilitate various business models that combine a portable electronic device (e.g., a PDA or camera-equipped cellular telephone) with a paper document to provide mixed media documents.

In one particular embodiment, the MMR system includes a content-based retrieval database that represents the two-dimensional geometric relationships between objects extracted from a printed document in a way that allows them to be searched using a text-based index. Evidence accumulation techniques combine the frequency of occurrence of a feature with the likelihood of its location in a two-dimensional zone. In one such embodiment, the MMR database system includes an index table that receives the descriptions computed by an MMR feature extraction algorithm. The index table identifies the document, the page, and the x-y coordinates within the page at which each feature occurs. An evidence accumulation algorithm computes a ranked set of hypothesized documents, pages, and locations given the data from the index table. A relational database (or other suitable storage facility) can be used to store additional characteristics for each document, page, and location, as desired.
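
As a rough illustration of the index table just described, the following Python sketch maps each text term to the documents, pages, and x-y coordinates where it occurs. The class and method names (`IndexTable`, `Posting`, `add`, `lookup`, `idf`) are hypothetical, and the smoothed IDF formula is an assumption chosen for the example rather than anything stated in the source.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Posting(NamedTuple):
    """One occurrence of a feature: which document, which page, and where."""
    doc_id: str
    page: int
    x: float
    y: float


class IndexTable:
    """Maps a text term to every (document, page, x, y) at which it occurs."""

    def __init__(self) -> None:
        self._postings: Dict[str, List[Posting]] = defaultdict(list)
        self._pages: Set[Tuple[str, int]] = set()

    def add(self, term: str, doc_id: str, page: int, x: float, y: float) -> None:
        """Record that `term` was extracted at (x, y) on the given page."""
        self._postings[term].append(Posting(doc_id, page, x, y))
        self._pages.add((doc_id, page))

    def lookup(self, term: str) -> List[Posting]:
        """Return all known occurrences of `term` (empty if unseen)."""
        return list(self._postings.get(term, []))

    def idf(self, term: str) -> float:
        """Smoothed inverse document frequency over indexed pages (assumed formula)."""
        pages_with_term = {(p.doc_id, p.page) for p in self._postings.get(term, [])}
        return math.log((1 + len(self._pages)) / (1 + len(pages_with_term))) + 1.0
```

A term that appears on fewer pages receives a higher `idf`, which is what lets an evidence accumulation step weight distinctive features more heavily when ranking candidate pages and locations.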

The MMR database system may include other components as well, such as an MMR processor, a capture device, a communication means, and a memory including MMR software. The MMR processor may also be coupled to a storage or source of media types, an input device, and an output device. In one such configuration, the MMR software includes routines executable by the MMR processor for accessing MMR documents with additional digital content, creating or modifying MMR documents, and using a document to perform other operations such as business transactions, data queries, and reporting.

Overview of the MMR System
Referring to FIG. 1A, a Mixed Media Reality (MMR) system 100a according to one embodiment of the present invention is shown. The MMR system 100a includes an MMR processor 102; a communication means 104; a capture device 106 having a portable input device 168 and a portable output device 170; a memory including MMR software 108; a base media storage 160; an MMR media storage 162; an output device 164; and an input device 166. The MMR system 100a creates a mixed media environment by providing a way to use information from an existing printed document (a first media type) as an index to a second media type (e.g., audio, video, text, updated information, and services).

The capture device 106 can generate a representation of a printed document (e.g., an image, a drawing, or another such representation), and that representation is sent to the MMR processor 102. The MMR system 100a then matches the representation against MMR documents and other second media types. The MMR system 100a is also responsible for recognizing the representation and for taking an action in response to the input. The actions taken by the MMR system 100a can be of any type, including, for example, extracting information, ordering, retrieving a video or sound, storing information, creating a new document, printing a document, and displaying a document or image. By using the content-based retrieval database technology described above, the MMR system 100a provides a mechanism that turns printed text into dynamic media, giving the user an entry point to electronic content or services of interest or value.

The MMR processor 102 processes data signals and may comprise various computing architectures, including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. In one particular embodiment, the MMR processor 102 comprises an arithmetic logic unit, a microprocessor, a general-purpose computer, or some other information appliance equipped to perform the operations of the present invention. In another embodiment, the MMR processor 102 comprises a general-purpose computer having a graphical user interface, which may be generated by, for example, a program written in Java® running on top of an operating system such as a Windows® or UNIX® based operating system. Although only a single processor is shown in FIG. 1A, multiple processors may be included. The processor is coupled to the memory 108 and executes the instructions stored therein.

The communication means 104 is any device or system for coupling the capture device 106 to the MMR processor 102. For example, the communication means 104 can be implemented using a network (e.g., WAN and/or LAN), a wired link (e.g., USB, RS232, or Ethernet), a wireless link (e.g., infrared, Bluetooth, or 802.11), a mobile communication link (e.g., GPRS or GSM), a public switched telephone network (PSTN) link, or any combination of these. Numerous communication architectures and protocols can be used here.

The capture device 106 includes a means, such as a transceiver, to interface with the communication means 104, and is any device capable of capturing an image or data digitally via an input device 168. The capture device 106 may optionally include an output device 170 and is optionally portable. For example, the capture device 106 can be a standard camera-equipped cellular telephone, a PDA device, a digital camera, a barcode reader, a radio frequency identification (RFID) reader, a computer peripheral such as a standard webcam, or a built-in device such as the video card of a PC. Several examples of the capture device 106 are described in more detail with reference to FIGS. 2A-2D. In addition, the capture device 106 may be a software application that enables content-based retrieval and links the capture device 106 to the infrastructure of the MMR system 100a/100b. More detailed functions of the capture device 106 are described with reference to FIG. 2E. Numerous conventional and customized capture devices 106, and their respective functionalities and architectures, will be apparent in light of this disclosure.

The memory 108 stores instructions and/or data that may be executed by the processor 102. The instructions and/or data may comprise code for performing any and/or all of the techniques described above. The memory 108 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, or any other suitable memory device. The memory 108 is described in more detail below with reference to FIG. 4. In one particular embodiment, the memory 108 includes the MMR software suite, an operating system, and other application programs (e.g., word processing applications, electronic mail applications, financial applications, and web browser applications).

The base media storage 160 stores second media types in their original form, and the MMR media storage 162 stores MMR documents, databases, and other information (detailed information for creating the MMR environment). Although shown as separate in the figure, in another embodiment the base media storage 160 and the MMR media storage 162 may be part of the same storage device or otherwise integrated. The data storage 160, 162 stores instructions and data for the MMR processor 102 and may include one or more devices, such as a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other suitable mass storage device.

The output device 164 is operatively coupled to the MMR processor 102 and represents any device equipped to output data, such as one that displays, sounds, or otherwise presents content. For instance, the output device 164 can be any of a variety of types, such as a printer, a display device, and/or speakers. Example display output devices 164 include a cathode ray tube (CRT), a liquid crystal display (LCD), or any other similarly equipped display device, screen, or monitor. In one embodiment, the output device 164 is equipped with a touch screen in which a touch-sensitive, transparent panel covers the screen of the output device 164.

The input device 166 is operatively coupled to the MMR processor 102 and may be any of a variety of types, such as a keyboard, cursor controller, scanner, multifunction peripheral, still camera, video camera, keypad, touch screen, detector, RFID tag reader, switch, or any other means that allows a user to interact with the system 100a. In one embodiment, the input device 166 is a keypad and a cursor controller. The cursor controller includes, for example, a mouse, trackball, stylus, pen, touch screen, pad, cursor direction keys, or other means for moving a cursor. In another embodiment, the input device 166 is a microphone, an audio add-in/expansion card designed for use within a general-purpose computer system, analog-to-digital converters, and digital signal processors that facilitate speech recognition and/or audio processing.

FIG. 1B shows a functional block diagram of an MMR system 100b configured in accordance with another embodiment of the present invention. In this embodiment, the MMR system 100b includes an MMR computer 112 (operated by an MMR user 110), a network media server 114, and a printer 116 that produces a printed document 118. The MMR system 100b further includes an office portal 120, a service provider server 122, and an electronic display 124, which is electronically connected to a set-top box 126 and a document scanner 127. The communication links among the MMR computer 112, network media server 114, printer 116, office portal 120, service provider server 122, set-top box 126, and document scanner 127 are provided by a network 128. The network 128 may be a LAN (e.g., an office or home network), a WAN (e.g., the Internet or a corporate network), a LAN/WAN combination, or any other suitable data path over which multiple computing devices can communicate.

The MMR system 100b further includes a capture device 106 that can communicate wirelessly with one or more of the network media server 114, user printer 116, office portal 120, service provider server 122, electronic display 124, set-top box 126, and document scanner 127 via cellular infrastructure 132, wireless fidelity (Wi-Fi) technology 134, Bluetooth technology 136, and/or infrared (IR) technology 138. Alternatively or additionally, the capture device 106 can communicate over a wired connection with the MMR computer 112, network media server 114, printer 116, office portal 120, service provider server 122, set-top box 126, and document scanner 127 via wired technology 140. Although Wi-Fi technology 134, Bluetooth technology 136, IR technology 138, and wired technology 140 are shown as separate elements in FIG. 1B, such technologies may be integrated into the processing environments (e.g., the MMR computer 112, network media server 114, capture device 106, etc.). In addition, the MMR system 100b includes a geolocation means 142 that communicates, wirelessly or over a wire, with the service provider server 122 or the network 128. The geolocation means 142 may also be integrated into the capture device 106.

The MMR user 110 is any individual who uses the MMR system 100b. The MMR computer 112 is any desktop, laptop, networked computer, or other processing environment. The user printer 116 is any home, office, or commercial printer capable of producing printed documents; a printed document is a document made up of one or more printed pages.

The network media server 114 is a networked computer that holds information and/or applications accessed by users of the MMR system 100b via the network 128. In one particular embodiment, the network media server 114 is a centralized computer on which a variety of media files are stored, such as text source files, web pages, audio and/or video files, and image files (e.g., still photographs). The network media server 114 is, for example, the Comcast Video on Demand service of Comcast Corporation, the Ricoh Document Mall of Ricoh Innovations, Inc., or a video server of Google Inc. In general, the network media server 114 provides access to any data that may be attached to, integrated with, or otherwise associated with the printed document 118 by means of the capture device 106.

The office portal 120 is an optional mechanism for capturing events that occur in the environment of the MMR user 110, such as events that occur in the office of the MMR user 110. The office portal 120 may, for example, be a computer separate from the MMR computer 112. In that case, the office portal 120 may be connected to the MMR computer 112 either directly or via the network 128. Alternatively, the office portal 120 may be built into the MMR computer 112. For example, the office portal 120 may consist of a conventional personal computer (PC) augmented with appropriate hardware to support any associated capture devices 106. The office portal 120 may include capture devices such as a video camera and an audio recorder. In addition, the office portal 120 may capture and store data from the MMR computer 112. For example, the office portal 120 can monitor and receive the events and functions that occur on the MMR computer 112. As a result, the office portal 120 can record all audio and video in the physical environment of the MMR user 110 and all events that occur on the MMR computer 112. In one particular embodiment, the office portal 120 captures events, for example by taking video screen captures from the MMR computer 112 while a document is being edited. In doing so, the office portal 120 captures the websites that were browsed and the other documents that were consulted while a given document was being created. That information may later be made available to the MMR user 110 through his or her MMR computer 112 or capture device 106. The office portal 120 may also be used as a multimedia server for the clips that users attach to their documents. Furthermore, the office portal 120 may capture other office events, such as conversations that took place while a document was on the desk (e.g., telephone or in-office conversations), telephone discussions, and small meetings in the office. A video camera (not shown) in the office portal 120 may identify documents on the physical desktop of the MMR user 110 by using the same content-based retrieval technology developed for the capture device 106.

The service provider server 122 is any commercial server that holds information or applications accessible to the MMR user 110 of the MMR system 100b via the network 128. In particular, the service provider server 122 represents any service provider associated with the MMR system 100b. Examples of the service provider server 122 include, but are not limited to, the commercial servers of a cable TV provider such as Comcast Corporation, a cellular telephone service provider such as Verizon Wireless, an Internet service provider such as Adelphia Communications, and an online music service provider such as Sony Corporation.

The electronic display 124 is any display device, such as, but not limited to, a standard analog or digital television (TV), a flat-screen TV, a flat-panel display, or a projection system. The set-top box 126 is a receiver that processes signals arriving from a satellite dish, an aerial, a cable, a network, or a telephone line. One example manufacturer of set-top boxes is Advanced Digital Broadcast. The set-top box 126 is electronically connected to the video input of the electronic display 124.

The document scanner 127 is a commercially available document scanner, such as the KV-S2026C full-color scanner from Panasonic Corporation. The document scanner 127 is used to convert existing printed documents into MMR-ready documents.

The cellular infrastructure 132 represents a plurality of cell towers and other cellular network interconnections. In particular, the cellular infrastructure 132 enables two-way voice and data communication for handheld, portable, and car-mounted phones via wireless modems built into devices such as the capture device 106.

Wi-Fi technology 134, Bluetooth technology 136, and IR technology 138 represent technologies for wireless communication between electronic devices. Wi-Fi technology 134 relates to wireless local area networks (WLANs) based on the 802.11 standards. Bluetooth technology 136 is defined by a telecommunications industry specification that describes how cellular telephones, computers, and PDAs are interconnected using short-range wireless connections. IR technology 138 allows electronic devices to communicate through short-range wireless signals. For example, IR technology 138 may be a line-of-sight wireless communication medium of the kind used in television remote controls, laptop computers, PDAs, and other devices. IR technology 138 uses the mid-microwave range of the spectrum, below visible light. In one or more other embodiments, wireless communication may also be supported using the IEEE 802.15 (UWB) and/or 802.16 (WiMAX) standards.

Wired technology 140 is any wired communication means, such as a standard Ethernet connection or a Universal Serial Bus (USB) connection. By using cellular infrastructure 132, Wi-Fi technology 134, Bluetooth technology 136, IR technology 138, and/or wired technology 140, the capture device 106 can communicate bidirectionally with any or all of the electronic devices of the MMR system 100b.

The geolocation means 142 is any suitable means for determining geographic location. The geolocation means 142 is, for example, the GPS satellites that, as is well known, provide position data to terrestrial GPS receivers. In the example shown in FIG. 1B, position data is provided to users of the MMR system 100b via the service provider server 122, which is connected to the network 128 and coupled to a GPS receiver (not shown). Alternatively, the geolocation means 142 is a set of cell towers (e.g., a subset of the cellular infrastructure 132) that provides triangulation, cell tower identification (ID), and/or enhanced 911 service as a means of determining geographic location. Alternatively, the geolocation means 142 may be provided by signal strength measurements from Wi-Fi access points or Bluetooth devices whose locations are known.
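The last alternative above, locating a device from signal strength measurements of Wi-Fi access points or Bluetooth devices at known locations, can be illustrated with a weighted centroid. The following Python sketch is only an illustration under stated assumptions: the `Beacon` record, the dBm-to-linear weighting, and the function names are hypothetical and are not taken from the patent, which does not specify how the measurements are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    """A Wi-Fi access point or Bluetooth device at a known location (hypothetical record)."""
    lat: float
    lon: float


def rssi_to_weight(rssi_dbm: float) -> float:
    """Map a received signal strength in dBm to a linear weight.

    Stronger (less negative) readings give larger weights. This weighting
    is an assumption chosen for illustration only.
    """
    return 10 ** (rssi_dbm / 10.0)


def estimate_position(readings):
    """Return the weighted centroid (lat, lon) of the observed beacons.

    readings: iterable of (Beacon, rssi_dbm) pairs.
    Averaging latitude and longitude directly is only reasonable over
    small areas, such as a single building or city block.
    """
    readings = list(readings)
    if not readings:
        raise ValueError("at least one reading is required")
    weights = [rssi_to_weight(rssi) for _, rssi in readings]
    total = sum(weights)
    lat = sum(w * b.lat for w, (b, _) in zip(weights, readings)) / total
    lon = sum(w * b.lon for w, (b, _) in zip(weights, readings)) / total
    return lat, lon
```

With two beacons heard at equal strength the estimate falls midway between them, and a much stronger reading pulls the estimate toward that beacon.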

In operation, the capture device 106 serves as a client in the possession of the MMR user 110. Software applications reside on it that enable content-based retrieval operations and that link the capture device 106 to the infrastructure of the MMR system 100b via cellular infrastructure 132, Wi-Fi technology 134, Bluetooth technology 136, IR technology 138, and/or wired technology 140. In addition, software applications reside on the MMR computer 112 that enable a number of operations, such as, but not limited to, print capture operations, event capture operations (e.g., saving the edit history of a document), server operations (e.g., saving data and events for later serving to other parties), and print management operations (e.g., the printer 116 may be set up to queue the data needed for MMR, such as document layouts and multimedia clips).

The network media server 114 provides access to the data attached to a printed document, such as the printed document 118 that is printed via the MMR computer 112 belonging to the MMR user 110. In this way, a second medium, such as video or audio, is associated with a first medium, such as a paper document. Further details of the software applications and/or mechanisms for associating the second medium with the first medium are described below with reference to FIGS. 2E, 3, 4, and 5.

Capture Device
FIGS. 2A, 2B, 2C, and 2D illustrate example capture devices 106 in accordance with embodiments of the present invention. More specifically, FIG. 2A shows a capture device 106a that is a cellular telephone. FIG. 2B shows a capture device 106b that is a PDA device. FIG. 2C shows a capture device 106c that is a computer peripheral device; one example of such a peripheral is any standard webcam. FIG. 2D shows a capture device 106d that is built into a computing device (e.g., the MMR computer 112). For example, the capture device 106d is a computer graphics card. Specific details of the capture device 106 are described with reference to FIG. 2E.

In the case of capture devices 106a and 106b, the capture device 106 may be in the possession of the MMR user 110. Its physical location may be tracked by the geolocation means 142 or by the ID numbers of the cell towers within the cellular infrastructure 132.

Referring to FIG. 2E, a functional block diagram of an example capture device 106 in accordance with one embodiment of the present invention is shown. The capture device 106 includes a processor 210, a display 212, a keypad 214, a storage device 216, a wireless communication link 218, a wired communication link 220, an MMR software suite 222, a capture device user interface (UI) 224, a document fingerprint matching module 226, a third-party software module 228, and at least one of a variety of capture means 230. Examples of the capture means 230 include, but are not limited to, a video camera 232, a still camera 234, a voice recorder 236, an electronic highlighter 238, a laser 240, a GPS device 242, and an RFID reader 244.

The processor 210 is a central processing unit (CPU), such as, but not limited to, a Pentium (registered trademark) microprocessor manufactured by Intel Corporation. The display 212 is any standard video display means of the kind used in handheld electronic devices. More specifically, the display 212 is, for example, any digital display, such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display. The keypad 214 is any standard alphanumeric input means, such as the keypads used in standard computing devices and handheld electronic devices (e.g., cellular telephones). The storage device 216 is any volatile or non-volatile memory device, such as a well-known hard disk drive or random access memory (RAM) device.

The wireless communication link 218 is a wireless data communication means that, as is well known, provides either direct one-to-one communication or wireless communication via an access point (not shown) and a LAN (e.g., IEEE 802.11 Wi-Fi or Bluetooth technology). The wired communication link 220 is a wired data communication means that provides direct communication, for example via a standard Ethernet and/or USB connection.

The MMR software suite 222 is the overall management software that performs MMR operations, such as merging one type of media with a second type. Further details of the MMR software suite 222 are described with reference to FIG. 4.

The capture device user interface (UI) 224 is the user interface for operating the capture device 106. Through the capture device UI 224, various menus of available functions are presented to the MMR user 110. More specifically, the menus of the capture device UI 224 allow the MMR user 110 to manage tasks such as, but not limited to, interacting with paper documents, reading data from existing documents, writing data into existing documents, viewing and interacting with the augmented reality associated with those documents, and viewing and interacting with the augmented reality associated with documents displayed on his or her MMR computer 112.

The document fingerprint matching module 226 is a software module that extracts features from a text image captured via at least one capture means 230 of the capture device 106. The document fingerprint matching module 226 can also perform pattern matching between the captured image and a database of documents. At the most basic level, according to one embodiment, the document fingerprint matching module 226 determines where an image patch lies within a larger page image, where that page image is selected from a large collection of documents. The document fingerprint matching module 226 includes routines or programs for receiving captured data, extracting a representation of the image from the captured data, performing patch recognition and motion analysis within documents, combining the resulting decisions, and outputting a list of x-y coordinates at which the input image is located. For example, the document fingerprint matching module 226 may be an algorithm that combines horizontal and vertical features extracted from an image of a fingerprint of text in order to identify the document, and the section within the document, from which the features were extracted. Once the features are extracted, a printed document index (not shown), located for example on the MMR computer 112 or the network media server 114, is queried to identify a symbolic document. Under the control of the capture device user interface (UI) 224, the document fingerprint matching module 226 has access to the printed document index. The printed document index is described in further detail with reference to the MMR computer 112 of FIG. 3. Note that in an alternative embodiment, the document fingerprint matching module 226 may be part of the MMR computer 112 rather than located in the capture device 106. In that case, the capture device 106 sends raw captured data to the MMR computer 112 for image extraction, pattern matching, and document and location identification. In yet another embodiment, the document fingerprint matching module 226 performs only feature extraction, and the extracted features are sent to the MMR computer 112 for pattern matching and identification.
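The flow described above (extract features from a captured patch, query a printed document index, and output candidate documents with x-y locations) can be sketched as an inverted index with voting. The Python below is a minimal illustration under assumptions, not the patented algorithm: the choice of feature (runs of word lengths along a line, which covers only the horizontal direction), the term format, and all class and function names are hypothetical.

```python
from collections import Counter, defaultdict


def horizontal_terms(lines, n=3):
    """Yield (term, x, y) for every run of n consecutive words on a line.

    lines: list of lines, each a list of (word, x, y) tuples.
    A term encodes the lengths of the words in the run, e.g. "h:5-7-6",
    so that a geometric pattern becomes a searchable text term.
    """
    for line in lines:
        for i in range(len(line) - n + 1):
            run = line[i:i + n]
            term = "h:" + "-".join(str(len(word)) for word, _, _ in run)
            _, x, y = run[0]
            yield term, x, y


class PrintedDocumentIndex:
    """Toy inverted index: term -> list of (doc_id, page, x, y) postings."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_page(self, doc_id, page, lines):
        """Index every term found on one page of a printed document."""
        for term, x, y in horizontal_terms(lines):
            self._postings[term].append((doc_id, page, x, y))

    def query(self, patch_lines):
        """Return candidates as ((doc_id, page), votes, [(x, y), ...]), best first."""
        votes = Counter()
        locations = defaultdict(list)
        for term, _, _ in horizontal_terms(patch_lines):
            for doc_id, page, x, y in self._postings.get(term, []):
                votes[(doc_id, page)] += 1
                locations[(doc_id, page)].append((x, y))
        return [(key, count, locations[key]) for key, count in votes.most_common()]
```

A query returns a ranked list of (document, page) candidates together with the page coordinates of the matching terms. A fuller version would also encode vertical neighbours, which the text mentions alongside horizontal features.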

The third-party software module 228 represents any third-party software module that supports any operation occurring on the capture device 106. Examples of third-party software include security software, image sensing software, image processing software, and MMR database software.

As noted above, the capture device 106 may include any number of capture means 230, examples of which are described below.

The video camera 232 is a digital video recording device, such as those found in standard digital cameras and cellular telephones.

The still camera 234 is any standard digital camera device capable of capturing digital images.

The voice recorder 236 is any standard audio recording device (a microphone and associated hardware) capable of capturing audio signals and outputting them in digital form.

The electronic highlighter 238 is an electronic highlighter capable of scanning, storing, and transferring printed text, barcodes, and small images to a PC, laptop computer, or PDA device. The electronic highlighter 238 is, for example, the Quicklink Pen Handheld Scanner from Wizcom Technologies, which allows information to be stored on the pen or transferred directly to a computer application via a serial port, infrared communication, or a USB adapter.

The laser 240 is, as is well known, a light source that produces coherent, nearly monochromatic light through stimulated emission. For example, the laser 240 is a standard laser diode, a semiconductor device that emits coherent light when forward biased. Associated with, and included in, the laser 240 is a detector that measures the amount of light reflected by the image at which the laser 240 is directed.

The GPS device 242 is any portable GPS receiver that supplies position data, for example digital latitude and longitude data. Examples of the portable GPS device 242 include the UV-U70 portable satellite navigation system from Sony Corporation and the Magellan-brand RoadMate Series, Meridian Series, and eXplorist Series GPS devices from Thales North America. As is well known, the GPS device 242 provides a way to determine the position of the capture device 106 in real time, in part by triangulation relative to a plurality of geolocation means 142.

The RFID reader 244 is a commercially available RFID tag reader system, such as the TI-RFID system manufactured by Texas Instruments. An RFID tag is a wireless device that uses radio waves to identify unique items. As is well known, an RFID tag consists of a microchip attached to an antenna, and the microchip stores a unique digital identification number.

In one particular embodiment, the capture device 106 includes the processor 210, display 212, keypad 214, storage device 216, wireless communication link 218, wired communication link 220, MMR software suite 222, capture device user interface (UI) 224, document fingerprint matching module 226, third-party software module 228, and at least one of the capture means 230. In that case, the capture device 106 is a full-function device. Alternatively, the capture device 106 may have less functionality and may include only a limited set of the functional components. For example, the MMR software suite 222 and the document fingerprint matching module 226 may reside remotely, for example on the MMR computer 112 or the network media server 114 of the MMR system 100b, and be accessed by the capture device 106 via the wireless communication link 218 or the wired communication link 220.
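The text describes three ways of dividing the fingerprint-matching work between the capture device 106 and a remote host such as the MMR computer 112: everything on the device, feature extraction only on the device, or raw captured data sent to the host. The Python sketch below illustrates one way such a split might be represented. The enum, the stage names, and the function are hypothetical and serve only to make the three configurations concrete.

```python
from enum import Enum


class Placement(Enum):
    """Where the fingerprint-matching stages run (hypothetical labels)."""
    ON_DEVICE = "all stages on the capture device"
    FEATURES_ONLY = "feature extraction on the device, matching on the remote host"
    RAW_UPLOAD = "raw captured data sent to the remote host"


# Processing stages, in order, paraphrased from the description above.
STAGES = (
    "feature_extraction",
    "pattern_matching",
    "document_and_location_identification",
)


def split_stages(placement):
    """Return (stages run on the capture device, stages run remotely)."""
    if placement is Placement.ON_DEVICE:
        return STAGES, ()
    if placement is Placement.FEATURES_ONLY:
        return STAGES[:1], STAGES[1:]
    if placement is Placement.RAW_UPLOAD:
        return (), STAGES
    raise ValueError(f"unknown placement: {placement!r}")
```

A limited-function capture device would correspond to `Placement.RAW_UPLOAD` or `Placement.FEATURES_ONLY`, with the remaining stages reached over the wireless communication link 218 or the wired communication link 220.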

MMR Computer
Referring to FIG. 3, an MMR computer 112 configured in accordance with one embodiment of the present invention is shown. As illustrated, the MMR computer 112 is connected to the network media server 114, which contains one or more multimedia (MM) files 336; to the user printer 116, which produces the printed document 118; to the document scanner 127; and to the capture device 106, which includes a first instance of the document fingerprint matching module 226 and the capture device user interface (UI) 224. The communication links between these components may be direct links or may run over a network. In addition, the document scanner 127 includes a second instance of the document fingerprint matching module, 226'.

The MMR computer 112 of this embodiment includes one or more source files 310, a first source document (SD) browser 312, a second SD browser 314, a printer driver 316, a printed document (PD) capture module 318, a document event database 320 that stores a PD index 322, an event capture module 324, a document analysis module 326, a multimedia (MM) clip browser/editor module 328, an MM printer driver 330, a document-to-video parser (DVP) printing system 332, and video paper 334.

The source file 310 represents any source file that is an electronic representation of a document (or a portion of one). Examples of source files 310 include hypertext markup language (HTML) files, Microsoft Word files, Microsoft PowerPoint files, simple text files, and portable document format (PDF) files, which are stored on the hard drive (or other suitable storage) of the MMR computer 112.

The first SD browser 312 and the second SD browser 314 are either stand-alone PC applications or plug-ins for existing PC applications, and they provide access to the data associated with the source files 310. The first and second SD browsers 312, 314 retrieve an original HTML file or MM clip for display on the MMR computer 112.

The printer driver 316 is printer driver software that controls, in a well-known manner, the communication link between applications and the page description language or printer control language used by a particular printer. In particular, whenever a document such as the printed document 118 is printed, the printer driver 316 supplies the printer 116 with data carrying the correct control commands, such as those provided by Ricoh Corporation for its printing devices. In one embodiment, the printer driver 316 differs from a conventional printer driver in that it automatically captures a representation of the x-y coordinates, font, and point size of every character on every printed page. In other words, it captures information about the content of the entire printed document and feeds that data back to the PD capture module 318.

The printed document (PD) capture module 318 is a software application that captures the printed representation of a document so that the layout of characters and graphics on the printed pages can be retrieved. With the PD capture module 318, the printed representation of a document is captured automatically, in real time and/or at the time of printing. More specifically, the PD capture module 318 is a software routine that captures the two-dimensional arrangement of text on the printed page and transmits this information to the PD index 322. In one embodiment, the PD capture module 318 operates by obtaining the Windows text layout commands for every character on the printed page. The text layout commands indicate to the operating system (OS) the x-y coordinates of every character on the printed page, as well as the font, point size, and so on. In effect, the PD capture module 318 intercepts the print data transmitted to the printer 116. In the illustrated example, the PD capture module 318 is coupled to the output of the first SD browser 312 to capture data. Alternatively, the functions of the PD capture module 318 may be implemented directly within the printer driver 316. It will be apparent from this disclosure that various configurations are possible.

The document event database 320 is any standard database modified, in accordance with one embodiment of the present invention, to store relationships between printed documents and events. (The document event database 320 is further described below as an MMR database with reference to FIG. 34A.) For example, the document event database 320 stores bi-directional links from source files 310 (e.g., Word, HTML, or PDF files) to events associated with the printed document 118. Examples of events include capturing a multimedia clip on the capture device 106 immediately after a Word document is printed, adding multimedia to a document with the client application of the capture device 106, and annotating multimedia clips. Other events associated with the source files 310 that may be stored in the document event database 320 include logging when a given source file 310 is opened, closed, or deleted; logging when a given source file 310 is in an active application on the desktop of the MMR computer 112; logging the times and destinations of document "copy" and "move" operations; and logging the edit history of a given source file 310. Such events are captured by the event capture module 324 and stored in the document event database 320. The document event database 320 is coupled to receive the source files 310 and the outputs of the event capture module 324, the PD capture module 318, and the document scanner 127, and it is also coupled to the capture device 106 to receive queries and data and to provide output.
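The bi-directional links between source files and events can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class name `DocumentEventStore`, the `Event` fields, and the file names are hypothetical, and the source does not specify how the document event database 320 is actually laid out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    kind: str          # e.g. "opened", "printed", "clip_captured"
    timestamp: float   # seconds since some epoch
    detail: str = ""


class DocumentEventStore:
    """Toy stand-in for a store of source-file/event links (not the patent's schema)."""

    def __init__(self):
        self._events_by_source = defaultdict(list)
        self._sources_by_event = defaultdict(set)

    def link(self, source_file, event):
        # Record the link in both directions so either side can be queried.
        events = self._events_by_source[source_file]
        if event not in events:
            events.append(event)
        self._sources_by_event[event].add(source_file)

    def events_for(self, source_file):
        # Events attached to a source file, oldest first.
        found = self._events_by_source.get(source_file, [])
        return sorted(found, key=lambda e: e.timestamp)

    def sources_for(self, event):
        # Source files attached to an event, in a stable order.
        return sorted(self._sources_by_event.get(event, set()))
```

Keeping two dictionaries is the simplest way to make lookups cheap in both directions; a real database would more likely use a link table with an index on each column.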

The document event database 320 also stores the PD index 322. The PD index is a software application that maps features extracted from images of printed documents onto their symbolic forms (e.g., scanned images to words). In one embodiment, the PD capture module 318 provides the PD index 322 with the x-y coordinates of every character on the printed page, as well as the font, point size, and so on. The PD index 322 is created at the time a given document is printed. However, all print data may be captured and stored in the PD index 322 in a manner that allows it to be retrieved at a later time. For example, if the printed document 118 contains the word "garden" positioned physically on the page one line before the word "rose", the PD index 322 supports such a query (i.e., the word "garden" preceding the word "rose"). The PD index 322 contains a record of which document, which page, and which location within the page has the word "garden" appearing before the word "rose". Accordingly, the PD index 322 is organized to support feature-based or text-based queries. The content of the PD index 322 is an electronic representation of the printed documents, generated by use of the PD capture module 318 during a print operation and/or by use of the document fingerprint matching module 226' of the document scanner 127 during a scan operation. Further architecture and functions of the database 320 and the PD index 322 are described below with reference to FIGS. 34A-C, 35, and 36.
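The "garden" one line before "rose" query can be sketched with a small positional index. This is only an illustration under stated assumptions: the class `PrintedDocumentIndex`, the `Posting` fields, and the use of a line number in place of raw x-y coordinates are all hypothetical simplifications, not the structure of the PD index 322.

```python
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    """Where one printed word occurrence sits (hypothetical layout)."""
    doc: str
    page: int
    line: int   # line number on the page, counted from the top
    x: float    # left edge of the word; stored but unused by the query below


class PrintedDocumentIndex:
    """Toy positional index keyed by word (an assumption, not the patent's design)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_word(self, word, doc, page, line, x):
        self._postings[word.lower()].append(Posting(doc, page, line, x))

    def find_line_before(self, first, second):
        """Return (doc, page, line) of `first` wherever `second` is on the next line."""
        second_at = {
            (p.doc, p.page, p.line)
            for p in self._postings.get(second.lower(), [])
        }
        hits = {
            (p.doc, p.page, p.line)
            for p in self._postings.get(first.lower(), [])
            if (p.doc, p.page, p.line + 1) in second_at
        }
        return sorted(hits)
```

Because every posting carries document, page, and line, the answer names the document, the page, and the place on the page, which is the kind of record the paragraph above describes.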

The event capture module 324 is a software application that captures, on the MMR computer 112, events associated with a given printed document 118 and/or source file 310. These events are captured during the lifecycle of a given source file 310 and saved in the document event database 320. In a specific example, the event capture module 324 is used to capture events relating to an HTML file that is active in a browser of the MMR computer 112, such as the first SD browser 312. These events may include the time at which the HTML file was displayed on the MMR computer 112, or the file names of other documents that were open at the same time the HTML file was displayed or printed. This event information is useful, for example, if the MMR user 110 wants to know (at a later time) what documents he/she was viewing or working on when the HTML file was displayed or printed. Examples of events captured by the event capture module 324 include a document edit history, video of office meetings that took place around the time a given source file 310 was on the desktop, and telephone calls that occurred when a given source file 310 was open (which may be captured by the office portal 120).

Specific functions of the event capture module 324 include: 1) tracking: tracking the active files and applications; 2) keystroke capturing: capturing keystrokes and associating them with the active application; 3) frame buffer capturing and indexing: each frame buffer image is indexed with the optical character recognition (OCR) result of the frame buffer data, so that a section of a printed document can be matched to the time it was displayed on the screen (alternatively, text can be captured with a graphical display interface (GDI) shadow dll that traps the text drawing commands issued for the PC desktop by the PC operating system; the MMR user 110 may then point the capture device 106 at a document and determine when it was active on the desktop of the MMR computer 112); and 4) reading history capture: the data from frame buffer capturing and indexing is linked with an analysis of the times at which documents were active on the desktop of the MMR computer 112, in order to track how long a particular document, and which parts of it, were visible to the MMR user 110. In doing so, correlation with other events, such as keystrokes or mouse movements, may be considered in order to infer whether the MMR user 110 was actually reading the document.
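The bookkeeping behind reading history capture can be sketched as summing how long each document held the desktop focus. This is a simplified illustration: the function `active_durations` and the `(timestamp, document)` event format are assumptions, and the source describes no concrete data format for tracking.

```python
def active_durations(focus_events, end_time):
    """Total seconds each document was the active one.

    `focus_events` is a time-sorted list of (timestamp, document) pairs,
    each meaning "this document became active at this time" (a hypothetical
    format). `end_time` closes the final interval.
    """
    totals = {}
    # Pair each focus change with the next one to get the interval length.
    following = focus_events[1:] + [(end_time, None)]
    for (start, doc), (stop, _next_doc) in zip(focus_events, following):
        totals[doc] = totals.get(doc, 0.0) + (stop - start)
    return totals
```

A fuller version would, as the paragraph notes, also weigh keystrokes and mouse movement before concluding that a visible document was actually being read.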

The combination of the document event database 320, the PD index 322, and the event capture module 324 may be implemented locally on the MMR computer 112 or, alternatively, as a shared database. A local implementation requires a lower level of security than a shared implementation.

The document analysis module 326 is a software application that analyzes the source files 310 associated with each printed document 118 in order to locate useful objects in them, such as uniform resource locators (URLs), addresses, titles, authors, times, or phrases that represent locations (e.g., Harry Dee Building). In doing so, the location of these objects in the printed version of the source file 310 is determined. The output of the document analysis module 326 can be used by a receiving device to augment the presentation of the document 118 with additional information and to improve the accuracy of pattern matching. Furthermore, the receiving device may take an action using the locations, for example, in the case of a URL, retrieving the web page associated with that URL. The document analysis module 326 is coupled to receive the source files 310 and provides its output to the document fingerprint matching module 226. Although it is shown as coupled only to the document fingerprint matching module 226 of the capture device, the output of the document analysis module 326 can be coupled to all or any number of document fingerprint matching modules 226, wherever they are located. Furthermore, the output of the document analysis module 326 may be stored in the document event database 320 for later use.
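As a rough illustration of locating one kind of useful object, the sketch below finds URLs in the text of a source file and reports where each occurs. The function name `find_urls`, the regular expression, and the use of (line, column) in plain text as a stand-in for a location on the printed page are all assumptions; the source does not describe how the document analysis module 326 finds objects.

```python
import re

# Deliberately simple pattern (an assumption): http(s) URLs up to whitespace
# or a closing bracket. Real documents would need a more careful pattern.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_urls(lines):
    """Return (url, line_number, column) for every URL found in `lines`.

    Line numbers start at 1 and columns at 0. Trailing sentence punctuation
    is stripped from each match.
    """
    found = []
    for line_no, text in enumerate(lines, start=1):
        for match in URL_PATTERN.finditer(text):
            url = match.group().rstrip(".,;")
            found.append((url, line_no, match.start()))
    return found
```

A receiving device could use the returned position to decide which URL the user is pointing at, then fetch the corresponding page.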

The MM clip browser/editor module 328 is a software application that provides an authoring function. The MM clip browser/editor module 328 may be a stand-alone application or, alternatively, a plug-in running on a document browser (represented by the dashed line to the second SD browser 314). The MM clip browser/editor module 328 displays multimedia files to the user and is coupled to the network media server to receive the multimedia files 336. In addition, when the MMR user 110 is authoring a document (e.g., attaching multimedia clips to a document), the MM clip browser/editor module 328 serves as a support tool for that function. The MM clip browser/editor module 328 is the application that shows metadata, such as the information analyzed from documents that were printed around the time the multimedia was captured.

The MM printer driver 330 provides the ability to author MMR documents. For example, the MMR user 110 may highlight text in a user interface (UI) generated by the MM printer driver 330 and add actions to that text, including retrieving multimedia data or executing some other process on the network 128 or on the MMR computer 112. The combination of the MM printer driver 330 and the DVP printing system 332 provides an alternative output format that uses barcodes. This format does not necessarily require content-based retrieval technology. The MM printer driver 330 is a printer driver that supports video paper technology, i.e., video paper 334. The MM printer driver 330 creates a paper representation that includes barcodes as a means of accessing the multimedia. By contrast, the printer driver 316 creates a paper representation that includes MMR technology as a means of accessing the multimedia. The authoring function embodied in the combination of the MM clip browser/editor module 328 and the SD browser 314 can create the same output format as the SD browser 312, thereby enabling the creation of MMR documents ready for content-based retrieval. The DVP printing system 332 links any data in the document event database 320 that is associated with a document to its printed representation, using either explicit or implicit barcodes. Implicit barcodes refer to patterns of text features that are used like a barcode.

Video paper 334 is a technology for representing audiovisual information on a printable medium such as paper. In video paper, barcodes are used as indices to electronic content that is stored in, or accessible from, a computer. When the user scans a barcode, a video clip or other multimedia content related to the text is output by the system. Systems for printing audio or video paper exist, and these systems in essence provide a paper-based interface to multimedia information.

The MM files 336 of the network media server 114 represent any collection of a variety of file types and file formats. For example, the MM files 336 may be text source files, web pages, audio files, video files, audio/video files, and image files (e.g., still photographs).

As described with reference to FIG. 1B, the document scanner 127 is used to convert existing printed documents into MMR-ready documents. With continuing reference to FIG. 3, the document scanner 127 MMR-enables existing documents by applying the feature extraction operation of the document fingerprint matching module 226' to every page of a scanned document. The PD index 322 is then populated with the results of the scanning and feature extraction, so that an electronic representation of the scanned document is stored in the document event database 320. The information in the PD index 322 can then be used to author MMR documents.

With continuing reference to FIG. 3, note that the software functions of the MMR computer 112 are not limited to the MMR computer 112 alone. Alternatively, the software functions shown in FIG. 3 may be distributed in any user-defined configuration among the MMR computer 112, the network media server 114, the service provider server 122, and the capture device 106 of the MMR system 100b. For example, the source files 310, the SD browser 312, the SD browser 314, the printer driver 316, the PD capture module 318, the document event database 320, the PD index 322, the event capture module 324, the document analysis module 326, the MM clip browser/editor module 328, the MM printer driver 330, and the DVP printing system 332 may reside entirely within the capture device 106, thereby providing enhanced functionality for the capture device 106.

MMR Software Group
FIG. 4 shows the set of software components included in the MMR software group 222 according to one embodiment of the present invention. Note that all or some of the MMR software group 222 may be included in the MMR computer 112, the capture device 106, the network media server 114, and other servers. Other embodiments of the MMR software group 222 may have any number of the illustrated components, from one to all of them. The MMR software group 222 of this embodiment includes multimedia annotation software 410 (which includes a text content-based extraction unit 412, an image content-based extraction unit 414, and a steganographic modification unit 416), a paper reading history log 418, an online reading history log 420, a collaborative document review unit 422, a real-time notification unit 424, a multimedia retrieval unit 426, a desktop video reminder unit 428, a web page reminder unit 430, a physical history log 432, a completed form review unit 434, a propagation time unit 436, a location awareness unit 438, a PC authoring unit 440, a document authoring unit 442, a capture device authoring unit 444, an unconscious upload unit 446, a document version retrieval unit 448, a PC document metadata unit 450, a capture device user interface (UI) unit 452, and a domain-specific component 454.

The multimedia annotation software 410, in combination with the organization of the document event database 320, forms the basic technology of the MMR system 100b according to one particular embodiment. More specifically, the multimedia annotation software 410 manages multimedia annotations for paper documents. For example, the MMR user 110 points the capture device 106 at any section of a paper document and then uses at least one capture means 230 of the capture device 106 to add an annotation to that section. In a specific example, a lawyer dictates notes (creating an audio file) about a certain section of a contract. The multimedia data (the audio file) is automatically attached to the original electronic version of the document. Subsequent printouts of the document optionally include an indication that these annotations exist.

The text content-based extraction unit 412 is a software application that retrieves content-based information from text. For example, with the text content-based extraction unit 412, content is retrieved from a patch of text, and the original document and the section within the document are identified, or other information linked to that patch is identified. The text content-based extraction unit 412 may use OCR-based techniques. Alternatively, a non-OCR-based technique for content-based extraction from text uses the two-dimensional arrangement of word lengths within a patch of text. One example of the text content-based extraction unit 412 is an algorithm that combines horizontal and vertical features extracted from an image of a fragment of text in order to identify the document and the section within the document from which the fragment was taken. The horizontal and vertical features may be used serially, in parallel, or simultaneously. Such a non-OCR-based feature set provides a fast implementation and robustness in noisy environments.
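The idea of matching on the two-dimensional arrangement of word lengths can be sketched without any character recognition. The following is a simplified illustration only: the function names are hypothetical, and it assumes each row of the patch appears as a contiguous run of words somewhere in consecutive page lines, ignoring the horizontal alignment between rows that a real implementation would also exploit.

```python
def word_lengths(text_lines):
    """Reduce each line of text to the list of its word lengths."""
    return [[len(word) for word in line.split()] for line in text_lines]


def contains_run(row, run):
    """True if `run` occurs as a contiguous sub-list of `row`."""
    n = len(run)
    return any(row[i:i + n] == run for i in range(len(row) - n + 1))


def locate_patch(page_lines, patch_lines):
    """Return the 0-based page line indices at which the patch could start.

    Row k of the patch must match, by word lengths alone, somewhere in page
    line (top + k). The words themselves are never compared.
    """
    page = word_lengths(page_lines)
    patch = word_lengths(patch_lines)
    if not patch:
        return []
    hits = []
    for top in range(len(page) - len(patch) + 1):
        if all(contains_run(page[top + k], patch[k]) for k in range(len(patch))):
            hits.append(top)
    return hits
```

A single row of word lengths is often ambiguous, while stacking a second row usually narrows the candidates sharply, which is one way to read the remark about combining horizontal and vertical features.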

The image content-based extraction unit 414 is a software application that retrieves content-based information from images. The image content-based extraction unit 414 performs image comparison between captured images and images in the database 320 to generate a list of possible image matches and associated confidence levels. In addition, each matching image may have associated data and associated actions that are performed in response to user input. In one example, the image content-based extraction unit 414 retrieves content based on a raster image (e.g., a map), for example by converting the image into a vector representation that can be used to query an image database for images with the same arrangement of features. Alternative embodiments use the color content of an image, or the geometric arrangement of objects within an image, to look up matching images in a database.
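A ranked list of candidate matches with confidence values can be sketched as follows. Cosine similarity between fixed-length feature vectors is used purely as an illustrative stand-in; the source does not say how images are compared, and the function names and the example database are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(query, database, top_k=3):
    """Return up to `top_k` (name, score) pairs, best match first.

    `database` maps an image name to its feature vector. The score stands in
    for the "confidence level" attached to each candidate.
    """
    scored = [(name, cosine(query, vector)) for name, vector in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Returning several scored candidates rather than a single answer leaves room for the user, or a later processing stage, to pick among near ties.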

The steganographic modification unit 416 is a software application that performs steganographic modifications prior to printing. To better enable MMR applications, digital information is added to text and images before they are printed. In an alternative embodiment, the steganographic modification unit 416 generates and stores an MMR document that includes: 1) original base content such as text, audio, or video information; and 2) additional content in any form, such as text, audio, video, applets, and hypertext links. Steganographic modifications include embedding a watermark in color or grayscale images, printing a dot pattern on the background of a document, and subtly modifying the outlines of printed characters to encode digital information.

The paper reading history log 418 is the reading history log of paper documents. The paper reading history log 418 resides, for example, in the document event database 320. The paper reading history log 418 may be based on document identification-from-video technology developed by Ricoh Innovations. The paper reading history log 418 is useful, for example, for reminding the MMR user 110 of documents that were read and/or of any associated events.

The online reading history log 420 is the reading history log of online documents. The online reading history log 420 is based on an analysis of operating system events and resides in the document event database 320. The online reading history log 420 is a record of the online documents read by the MMR user 110 and of which parts of those documents were read. Entries in the online reading history log 420 may be printed on any subsequent printout in a number of ways, for example by providing a note at the bottom of each page or by highlighting text in different colors based on the amount of time spent reading each passage. In addition, the multimedia annotation software 410 may index this data in the PD index 322. Optionally, the online reading history log 420 may be aided by an MMR computer 112 that is instrumented with devices such as a face detection system that monitors the MMR computer 112.

The collaborative document review unit 422 is a software application that allows more than one reader of different versions of the same paper document to review comments applied by other readers, by pointing his/her capture device 106 at any section of the document. For example, the annotations may be displayed on the capture device 106 as overlays on top of a document thumbnail. The collaborative document review unit 422 may be implemented with, or otherwise cooperate with, any type of existing collaboration software.

The real-time notification unit 424 is a software application that performs real-time notification of a document being read. For example, while the MMR user 110 reads a document, his/her reading trace is posted on a blog or on an online bulletin board. As a result, other people interested in the same topic may gather to chat about the document.

Multimedia acquisition unit 426 is a software application that retrieves multimedia associated with an arbitrary paper document. For example, by pointing capture device 106 at a paper document, MMR user 110 may retrieve all conversations that took place while that document was present on the desk of MMR user 110. This assumes that office portal 120 (or another suitable mechanism) is present in the office of MMR user 110 to capture the multimedia data.

Desktop video reminder unit 428 is a software application that reminds MMR user 110 of events that occurred on MMR computer 112. For example, by pointing capture device 106 at a section of a paper document, MMR user 110 may view a video clip showing the changes on the desktop of MMR computer 112 that occurred while that section was visible. Further, desktop video reminder unit 428 may be used to retrieve other multimedia recorded by MMR computer 112 (e.g., audio in the vicinity of MMR computer 112).

Web page reminder unit 430 is a software application that reminds MMR user 110 of web pages viewed on his/her MMR computer 112. For example, by panning capture device 106 over a paper document, MMR user 110 may see a trace of the web pages that were viewed while the corresponding section of the document was shown on the desktop of MMR computer 112. The web pages may be shown in a browser such as SD browser 312, 314, or on display 212 of capture device 106. Alternatively, the web pages may be presented simply as raw URLs on display 212 of capture device 106 or on MMR computer 112.

Physical history log 432 is a log of the physical history of paper documents and resides, for example, in document event database 320. For example, MMR user 110 points his/her capture device 106 at a paper document and, using the information stored in physical history log 432, the other documents that were adjacent to the document of interest at some time in the past are determined. This operation may be aided by a tracking system such as RFID, in which case capture device 106 includes RFID reader 244.

Completed document confirmation unit 434 is a software application that retrieves previously acquired information that was used to fill in a form. For example, MMR user 110 points his/her capture device 106 at a blank form (e.g., a claim form printed from a website) and is presented with a history of previously entered information. Completed document confirmation unit 434 then fills in the form automatically with that previously entered information.
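The automatic fill-in step can be pictured with a short sketch. It assumes, purely for illustration, that the history is a list of field-to-value records ordered oldest first and that the most recent value for a field wins; the function name `autofill` and the field names are hypothetical.

```python
def autofill(blank_fields, history):
    """Fill a blank form from previously entered information.

    blank_fields: list of field names on the blank form.
    history: list of {field: value} dicts, oldest first (assumed layout).
    Fields never entered before are left as empty strings.
    """
    latest = {}
    for record in history:
        latest.update(record)  # later records override earlier ones
    return {name: latest.get(name, "") for name in blank_fields}
```

Leaving unknown fields empty, rather than guessing, keeps the sketch consistent with filling in only information that was actually entered in the past.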

Propagation time unit 436 is a software application that fetches the source files for past and future versions of a document and retrieves and displays a list of events associated with those versions. This operation compensates for the fact that the printed document in hand may have been generated from a version created months after the most significant external events (e.g., discussions or meetings) associated with it.

Location confirmation unit 438 is a software application that manages location-aware documents. The management of location-aware documents is aided by, for example, an RFID-like tracking system. For example, capture device 106 captures a trace of the geographic location of MMR user 110 throughout the day and scans the RFID tags attached to documents or to folders that contain documents. The RFID scanning operation is performed by RFID reader 244 of capture device 106, which detects any RFID tags within its range. The geographic location of MMR user 110 may be tracked by the identification numbers of the cell towers in cellular infrastructure 132, by GPS device 242 of capture device 106, or in combination with geolocation mechanism 142. Alternatively, document identification may be accomplished with video camera 232 of capture device 106 or with "always-on video". The location data yields "geo-referenced" documents and enables a map-based interface that shows where the documents were located throughout the day. One application is a lawyer who carries files when visiting remote clients. In an alternative, documents 118 include a sensing mechanism attached to them that can sense when the document is moved and perform some basic face detection operation. This sensing function may be provided by a gyroscope or a similar set of devices attached to the document. Based on the position information, MMR system 100b "calls" the owner's cellular phone to tell him/her that the document is moving. The cellular phone adds the document to a virtual briefcase. A related concept is the "invisible" barcode, a machine-readable marking that is visible to video camera 232 or still camera 234 of capture device 106 but is invisible or very faint to humans. Various inks and steganographic or printed-image watermarking techniques (which may be decoded on capture device 106) may be considered for determining position.
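One way to picture how "geo-referenced" documents could be derived is to join the user's location trace with the RFID scans by time. The sketch below is only an illustration under stated assumptions: the tuple layouts, the function name `georeference`, and the rule of using the latest location fix at or before each scan are all choices made for this example, not details from the disclosure.

```python
from bisect import bisect_right


def georeference(location_trace, rfid_scans):
    """Attach a position to every RFID sighting of a document.

    location_trace: list of (timestamp, lat, lon), sorted by timestamp.
    rfid_scans: list of (timestamp, tag_id) in any order.
    Returns {tag_id: [(timestamp, lat, lon), ...]} in time order.
    """
    times = [t for t, _, _ in location_trace]
    sightings = {}
    for scan_time, tag_id in sorted(rfid_scans):
        i = bisect_right(times, scan_time) - 1  # latest fix at or before scan
        if i < 0:
            continue  # no location fix is available yet
        _, lat, lon = location_trace[i]
        sightings.setdefault(tag_id, []).append((scan_time, lat, lon))
    return sightings
```

The per-tag lists are the kind of data a map-based interface could plot to show where each document was during the day.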

PC authoring unit 440 is a software application that performs authoring operations on a PC such as MMR computer 112. PC authoring unit 440 is supplied as a plug-in for existing authoring applications such as Microsoft Word, PowerPoint and web page authoring packages. PC authoring unit 440 allows MMR user 110 to prepare paper documents that contain links to events from his/her MMR computer 112 or to events in his/her environment. It also allows paper documents with links to be generated automatically, so that a printed document is automatically linked to the Word file from which it was generated. It further allows MMR user 110 to retrieve a Word file and give it to someone else. Paper documents that contain links are referred to as MMR documents. Further details of MMR documents are described with reference to FIG. 5.

Document authoring unit 442 is a software application that performs authoring operations on existing documents. Document authoring unit 442 can be implemented, for example, as a personal edition or as a professional edition. In the personal edition, MMR user 110 scans documents and adds them to an MMR document database (e.g., document event database 320). In the professional edition, a publisher (or a third party) creates MMR documents from the original electronic source (or electronic galley proofs). This functionality may be embedded in high-end publishing packages (e.g., Adobe Reader) and linked with a back-end service provided by another entity.

Capture device authoring unit 444 is a software application that performs authoring operations directly on capture device 106. Using capture device authoring unit 444, MMR user 110 extracts key phrases from the paper document in his/her hands and stores them together with additional content captured on the spot, creating a temporary MMR document. Additionally, using capture device authoring unit 444, MMR user 110 may return to his/her MMR computer 112, download the temporary MMR document into an existing document application (e.g., PowerPoint), and edit it into a final version of the MMR document or into a document in another standard format for another application. In doing so, images and text are inserted automatically into the pages of the existing document (e.g., into the pages of a PowerPoint document).

Unconscious upload unit 446 is a software application that uploads printed documents to capture device 106 unconsciously (automatically, without user intervention). Because capture device 106 is in the possession of MMR user 110 most of the time, including when MMR user 110 is at his/her MMR computer 112, printer driver 316 may, in addition to sending documents to printer 116, push those same documents to storage device 216 of capture device 106 via wireless communication link 218 of capture device 106, using Wi-Fi technology 134 or Bluetooth technology 136, or via a wired connection if capture device 106 is coupled to MMR computer 112. MMR user 110 then never forgets to pick up a document after it has been printed, because the document is automatically uploaded to capture device 106.

Document version acquisition unit 448 is a software application that retrieves past and future versions of a given source file 310. For example, MMR user 110 points capture device 106 at a printed document, and document version acquisition unit 448 locates the current source file 310 (e.g., a Word file) and other past and future versions of source file 310. In one particular embodiment, this operation uses Windows file tracking software that keeps track of the locations to which source file 310 has been copied and moved. Other such file tracking software can also be used. For example, Google Desktop Search or the Microsoft Windows Search Companion can find the current version of a file with a query composed of words selected from source file 310.
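The idea of finding a file with a query composed of words selected from the source file can be sketched as follows. This is a toy stand-in, not the interface of any desktop search product: the word-selection rule (most frequent words of at least five letters), the function names, and the overlap-based scoring are assumptions made for illustration.

```python
import re
from collections import Counter


def select_query_words(text, k=5, min_len=5):
    """Pick up to k frequent, reasonably long words to use as a query."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return [w for w, _ in counts.most_common(k)]


def best_match(query_words, candidates):
    """Return the path whose text contains the most query words.

    candidates: {path: text}, a hypothetical stand-in for a file index.
    """
    def score(text):
        present = set(re.findall(r"[A-Za-z]+", text.lower()))
        return sum(1 for w in query_words if w in present)

    return max(candidates, key=lambda path: score(candidates[path]))
```

A real deployment would hand the selected words to whatever search facility is installed rather than scanning file contents directly.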

PC document metadata unit 450 is a software application that retrieves the metadata of a document. For example, MMR user 110 points capture device 106 at a printed document, and PC document metadata unit 450 determines who printed the document, when and where it was printed, and the file path of the given source file at the time of printing.

Capture device user interface (UI) unit 452 is a software application that manages the operation of the user interface (UI) of capture device 106 and allows MMR user 110 to interact with paper documents. Together, capture device UI unit 452 and capture device UI 224 allow MMR user 110 to read data from and write data to existing documents, to view and interact with the augmented reality associated with those documents (i.e., via capture device 106, MMR user 110 can learn what happened when the document was created or when it was edited), and to view and interact with the augmented reality associated with documents displayed on his/her capture device 106.

Domain-specific component 454 is a software application that manages domain-specific functions. For example, in a music application, domain-specific component 454 is a software application that matches music detected via, for example, voice recorder 236 of capture device 106 to a title, an artist or a composer. In this way, items of interest relating to the detected music, such as music CDs or sheet music, are presented to MMR user 110. Similarly, domain-specific component 454 is adapted to operate in a like manner for video content, video games and any other entertainment information. Domain-specific component 454 may be adapted to any electronic version of mass-media content.

With continuing reference to FIGS. 3 and 4, note that the software components of MMR software suite 222 may reside fully or in part on one or more of MMR computers 112, network media server 114, service provider server 122 and capture device 106 of MMR system 100b. In other words, the operations of MMR system 100b that are performed by MMR software suite 222 may be distributed in any user-defined configuration among MMR computer 112, network media server 114, service provider server 122 and capture device 106 (or among other such processing environments included in system 100b).

It will be apparent in light of this disclosure that the basic functions of MMR system 100a/100b can be performed with certain combinations of the software components of MMR software suite 222. For example, the basic functions of one embodiment of MMR system 100a/100b include the following:

● Creating, or adding to, an MMR document that includes a first media portion and a second media portion.

● Using the first media portion (e.g., a paper document) of an MMR document to access information in the second media portion.

● Using the first media portion (e.g., a paper document) of an MMR document to trigger or initiate a process in the electronic domain.

● Using the first media portion (e.g., a paper document) of an MMR document to create or add to the second media portion.

● Using the second media portion of an MMR document to create or add to the first media portion.

● Using the second media portion of an MMR document to trigger or initiate a process in the electronic domain or a process related to the first media portion.

MMR Document
FIG. 5 shows MMR document 500 according to one embodiment of the present invention. More specifically, MMR document 500 of FIG. 5 includes a representation 502 of a portion of printed document 118, an action or second media 504, an index or hotspot 506, and an electronic representation 508 of the entire document 118. MMR document 500 is typically stored in document event database 320, but may also be stored on the capture device or on any other device coupled to network 128. In one embodiment, multiple MMR documents 500 may correspond to a single printed document. In another embodiment, the structure shown in FIG. 5 is replicated to create multiple hotspots 506 in a single printed document. In one particular embodiment, MMR document 500 includes only representation 502 and hotspot 506, which identify the page and the location within the page; second media 504 and electronic representation 508 are optional and are delineated as such by dashed lines. Note that second media 504 and electronic representation 508 may be added later, after the MMR document has been created, if so desired. This basic embodiment may be used to locate the document associated with the representation or to locate a particular position within a document.

Representation 502 of a portion of printed document 118 may be in any form (image, vector, pixels, text, code, etc.) that can be used for pattern matching and that identifies at least one location in the document. Representation 502 preferably identifies a location in the printed document uniquely. In one embodiment, representation 502 is a text fingerprint as shown in FIG. 5. The text fingerprint is captured automatically via PD capture module 318 and stored in PD index 322 during a print operation. Alternatively, text fingerprint 502 may be captured automatically via document fingerprint matching module 226' of document scanner 127 and stored in PD index 322 during a scan operation. Representation 502 may alternatively be the entire document, a portion of text, a single word (if that word is a unique instance in the document), a portion of an image, a unique attribute, or any other representation of a matchable portion of the document.

Action or second media 504 is preferably a digital file or a data structure of some type. In its most basic embodiment, second media 504 may be text to be presented or one or more commands to be executed. More commonly, second media type 504 is a text file, an audio file or a video file related to the portion of the document identified by representation 502. Second media type 504 may also be a data structure or file that references or contains multiple different media types or multiple files of the same type. For example, second media 504 can be text, a command, an image, a PDF file, a video file, an audio file, an application file (e.g., a spreadsheet or a word processing document), and so on.

Index or hotspot 506 is the link between representation 502 and action or second media 504. Hotspot 506 associates representation 502 with second media 504. In one embodiment, index or hotspot 506 includes position information such as x and y coordinates within the document. Hotspot 506 may be a point, a region, or even the entire document. In one embodiment, the hotspot is a data structure with a pointer to representation 502, a pointer to second media 504, and a location within the document. It should be understood that MMR document 500 can include multiple hotspots 506, in which case the data structure creates links between multiple representations, multiple second media files, and multiple locations within printed document 118.

In an alternative embodiment, MMR document 500 includes an electronic representation 508 of the entire document 118. This electronic representation can be used to determine the position of hotspot 506 and to display the document, by way of a user interface, on capture device 106 or MMR computer 112.
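The four components described above can be summarized in a small data-structure sketch. The class and field names are hypothetical, strings stand in for the pointers to representation 502 and second media 504, and the (page, x, y) location convention is an assumption; the sketch only mirrors the described relationships, including the option of attaching second media after the MMR document has been created.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Hotspot:
    representation: str                 # stands in for the pointer to representation 502
    location: Tuple[int, float, float]  # (page, x, y); convention assumed here
    second_media: Optional[str] = None  # stands in for the pointer to second media 504


@dataclass
class MMRDocument:
    hotspots: List[Hotspot] = field(default_factory=list)
    electronic_representation: Optional[bytes] = None  # optional representation 508

    def attach_media(self, representation: str, media: str) -> int:
        """Attach second media after creation; return the number of hotspots updated."""
        updated = 0
        for hotspot in self.hotspots:
            if hotspot.representation == representation:
                hotspot.second_media = media
                updated += 1
        return updated
```

Making the second media and the electronic representation optional fields reflects the basic embodiment in which only the representation and the hotspot are required.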

An example use of MMR document 500 is as follows. By analyzing the text fingerprint or representation 502, a captured text fragment is identified by document fingerprint matching module 226 of capture device 106. For example, MMR user 110 points video camera 232 or still camera 234 of his/her capture device 106 at printed document 118 and captures an image. Document fingerprint matching module 226 then analyzes the captured image to determine whether there is an associated entry in PD index 322. If there is a match, the presence of hotspot 506 is highlighted to MMR user 110 on display 212 of his/her capture device 106. For example, a word or phrase is highlighted as shown in FIG. 5. Each hotspot 506 within printed document 118 serves as a link to other user-defined or predetermined data (e.g., one of the MM files 336 on network media server 114). Access to the text fingerprints or representations 502 stored in PD index 322 allows electronic data to be added to any MMR document 500 or to any hotspot 506 within a document. As described with reference to FIG. 4, a paper document that includes at least one hotspot 506 (e.g., a link) is referred to as MMR document 500.
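The capture-and-match flow can be illustrated with a toy lookup. Everything here is a simplification under stated assumptions: the captured image is replaced by an already-recognized text fragment, the "fingerprint" is merely the sequence of word lengths (a stand-in, not the fingerprinting method of the disclosure), and the index is a plain dictionary.

```python
def toy_fingerprint(text_fragment):
    """Toy stand-in for a text fingerprint: the sequence of word lengths."""
    return tuple(len(word) for word in text_fragment.split())


def build_pd_index(pages):
    """Build a fingerprint index.

    pages: {(doc_id, page_no): [(fragment, hotspot_label), ...]}
    Returns {fingerprint: [((doc_id, page_no), hotspot_label), ...]}.
    """
    index = {}
    for location, fragments in pages.items():
        for fragment, hotspot in fragments:
            index.setdefault(toy_fingerprint(fragment), []).append((location, hotspot))
    return index


def lookup(index, captured_fragment):
    """Return the hotspots to highlight for a captured fragment, if any."""
    return index.get(toy_fingerprint(captured_fragment), [])
```

An empty result corresponds to the case in which no associated entry exists and nothing is highlighted on the display.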

An example of the operation of MMR system 100b is described below with reference to FIGS. 1B, 2A through 2D, 3, 4 and 5. MMR user 110 or some other entity (e.g., a publishing company) opens a given source file 310 and initiates a print operation to produce a paper document such as printed document 118. During the print operation, the following operations (1) through (6) are performed automatically. (1) Capturing the printed format automatically at print time via PD capture module 318 and transferring it to capture device 106. Electronic representation 508 of the document is captured automatically at print time, for example by applying PD capture module 318 to the output of SD browser 312. For example, MMR user 110 prints content from browser 312, and the content is filtered by PD capture module 318. As described above, the two-dimensional arrangement of the text on a page can be determined at the time the document is laid out for printing. (2) Capturing source file 310 of the document automatically at print time via PD capture module 318. (3) Analyzing the printed format and/or source file 310 via document analysis module 326 to find "named entities" or other information of interest to be entered into the multimedia annotation interface of capture device 106. A named entity is, for example, an "anchor" for attaching multimedia later, and a hotspot 506 is generated for it automatically. Document analysis module 326 receives as input the source file 310 associated with a given printed document 118. Document analysis module 326 is an application that identifies representations 502 (e.g., title, author, time or location) for use with hotspots 506 in document 118 and prompts for information to be received on capture device 106. (4) Indexing the printed format and/or source file 310 automatically in preparation for content-based retrieval (i.e., building PD index 322). (5) Creating an entry for the document in document event database 320 and creating events associated with source file 310, such as its history and current location. (6) Performing an interactive dialog within printer driver 316. This dialog allows MMR user 110 to add hotspots 506 to the document before it is printed, whereby MMR document 500 is created. The associated data is stored on MMR computer 112 or uploaded to network media server 114.
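A few of the print-time operations above, namely anchor detection (3), indexing (4) and event creation (5), can be sketched in one hypothetical hook. The function name `on_print`, the word-level index, the dictionary-based event record and the use of a fixed anchor vocabulary in place of real named-entity analysis are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def on_print(source_path, page_texts, pd_index, event_db, anchors=()):
    """Illustrative print-time hook.

    page_texts: list of page strings in print order.
    pd_index: dict updated in place, {word: [(source, page, word_pos), ...]}.
    event_db: list updated in place with one event record per print job.
    anchors: terms treated as named entities (a stand-in for real analysis).
    Returns the automatically generated hotspots as (page, word_pos, term).
    """
    hotspots = []
    for page_no, text in enumerate(page_texts, start=1):
        for word_pos, word in enumerate(text.split()):
            # Step (4): index every word with its position on the page.
            pd_index.setdefault(word.lower(), []).append((source_path, page_no, word_pos))
            # Step (3): an anchor term yields an automatic hotspot.
            if word in anchors:
                hotspots.append((page_no, word_pos, word))
    # Step (5): record the print event for the source file.
    event_db.append({
        "source": source_path,
        "event": "printed",
        "time": datetime.now(timezone.utc).isoformat(),
        "pages": len(page_texts),
    })
    return hotspots
```

The interactive dialog of step (6) would sit between this hook and the actual print job, letting the user add further hotspots to the returned list.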

Alternative Embodiments
MMR system 100 (100a or 100b) is not limited to the configurations shown in FIGS. 1A-1B, 2A-2D and 3-5. The MMR software may be distributed in whole or in part between capture device 106 and MMR computer 112, and considerably fewer than all of the modules described with reference to FIGS. 3 and 4 may be required. Possible configurations include the following alternatives.

MMR system 100 according to a first alternative includes capture device 106 and capture device software. The capture device software comprises capture device user interface (UI) 224 and document fingerprint matching module 226 (e.g., as shown in FIG. 3). The capture device software may run on capture device 106 or on an external server, such as network media server 114 or service provider server 122, that is accessible to capture device 106. In this alternative, a network service that supplies data linked to publications is available. A hierarchical recognition scheme may be used in which the publication is identified first and the page and section within that publication are identified afterwards.
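The hierarchical scheme, publication first and then page and section, can be illustrated with a coarse-to-fine search. The nested-dictionary layout, the function names and the word-overlap score are assumptions chosen for brevity; the disclosure does not prescribe how each level is matched.

```python
def overlap(query_words, text):
    """Count the query words that occur in the text."""
    return len(set(query_words) & set(text.lower().split()))


def hierarchical_lookup(query, publications):
    """Identify the publication first, then the page, then the section.

    publications: {publication: {page_no: {section: text}}} (assumed layout).
    Returns (publication, page_no, section).
    """
    q = query.lower().split()

    def publication_text(pages):
        return " ".join(t for sections in pages.values() for t in sections.values())

    pub = max(publications, key=lambda p: overlap(q, publication_text(publications[p])))
    pages = publications[pub]
    page = max(pages, key=lambda n: overlap(q, " ".join(pages[n].values())))
    section = max(pages[page], key=lambda s: overlap(q, pages[page][s]))
    return pub, page, section
```

Narrowing the search level by level means the finer comparisons only run inside the publication that has already been selected.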

MMR system 100 according to a second alternative includes capture device 106, capture device software and document usage software. The second alternative includes software, such as that shown and described with reference to FIG. 4, that captures and indexes printed documents and links basic document events such as a document's edit history. This allows MMR user 110 to point his/her capture device 106 at any printed document and determine the name and location of the source file that generated the document, as well as the time and place of printing.

The MMR system 100 according to a third alternative includes the capture device 106, capture device software, document use software, and the event capture module 324. The event capture module 324 is added to the MMR computer 112 and captures events associated with documents, such as the times at which a document was visible on the desktop of the MMR computer 112 (determined by monitoring the GDI character generator), the URLs that were accessed while the document was open, or the characters typed on the keyboard while the document was open.

The MMR system 100 according to a fourth alternative includes the capture device 106, capture device software, and the printer 116. In this fourth alternative, the printer 116 is equipped with a Bluetooth transceiver or a similar communication link that communicates with the capture device 106 of any MMR user 110 in close proximity. Whenever an MMR user 110 picks up a document from the printer 116, the printer 116 pushes the MMR data (document layout and multimedia clips) to that user's capture device 106. The printer 116 includes a keypad with which a user logs in and enters a code in order to obtain the multimedia data associated with a specific document. The footer of the document may include a printed representation of the code, inserted by the printer driver 316.

The MMR system 100 according to a fifth alternative includes the capture device 106, capture device software, and the office portal 120. The office portal 120 is preferably a personalized version of the office portal 120. The office portal 120 captures events in the office, such as conversations, conference/telephone calls, and meetings. The office portal 120 identifies and tracks specific paper documents on the physical desktop. In addition, the office portal 120 executes the document identification software (i.e., the document fingerprint matching module 226 and the host document event database 320). This fifth alternative serves to offload computing workload from the MMR computer 112 and provides a convenient way to package the MMR system 100b as a consumer device (e.g., the MMR system 100b is sold as a hardware and software product that runs on a Mac Mini computer from Apple Computer, Inc.).

The MMR system 100 according to a sixth alternative includes the capture device 106, capture device software, and the network media server 114. In this alternative, the multimedia data resides on the network media server 114, such as a Comcast video-on-demand server. When the MMR user 110 scans a patch of document text with his/her capture device 106, the resulting lookup command is transmitted either to the set-top box 126 associated with the cable TV of the MMR user 110 (wirelessly, over the Internet, or by calling the set-top box 126 on the telephone line) or to the Comcast server. In both cases, the multimedia is streamed from the Comcast server to the set-top box 126. The system 100 knows where to send the data because the MMR user 110 previously registered his/her phone. Thus, the capture device 106 can be used to access and control the set-top box 126.

The MMR system 100 according to a seventh alternative includes the capture device 106, capture device software, the network media server 114, and a location service. In this alternative, a location-aware service discriminates between multiple destinations for the output from the Comcast system (or another suitable communication system). This function is performed either by automatically identifying the cellular telephone tower ID or by a keypad interface that lets the MMR user 110 select the location where the data is to be displayed. Thus, while visiting another location, the user can access the cable TV programming provided by his/her cable operator, as long as that other location has cable access.

Document fingerprint matching (image-based patch determination)
As described above, document fingerprint matching involves uniquely identifying a portion, or "patch", of an MMR document. Referring to FIG. 6, a document fingerprint matching module/system 610 receives a captured image 612. The document fingerprint matching system 610 queries a collection of pages in a document database 3400 (described below with reference to, e.g., FIG. 34) and returns a list of the pages and documents that contain the captured image 612. Each result includes the x-y location at which the captured input image 612 appears. Those skilled in the art will appreciate that although the database 3400 may be external to the document fingerprint matching module 610 (e.g., as shown in FIG. 6), it may also be internal to the document fingerprint matching module (e.g., as shown in FIGS. 7, 11, 12, 14, 20, 24, 26, 28, and 30-32, where the document fingerprint matching module 610 includes the database 3400).

FIG. 7 shows a block diagram of the document fingerprint matching system 610 according to one embodiment of the present invention. The capture device 106 captures an image. The captured image is sent to the quality assessment module 712, which makes a preliminary judgment about the content of the captured image based on the needs and capabilities of downstream processing. For example, if the captured image is of such poor quality that it cannot be processed downstream in the document fingerprint matching system 610, the quality assessment module 712 causes the capture device 106 to recapture the image at a higher resolution. The quality assessment module 712 may also detect many other relevant attributes of the captured image, for example, the sharpness of the text it contains (sharpness indicates whether the captured image is in focus). Furthermore, the quality assessment module 712 may determine whether the captured image contains something that could be part of a document. For example, an image patch that contains something other than a document image (e.g., a desk or an outdoor scene) indicates that the user is moving the view of the capture device 106 to a new document.

Further, in one or more embodiments, the quality assessment module 712 performs text/non-text discrimination so that only images likely to contain recognizable text are passed on. FIG. 8 shows a flowchart of text/non-text discrimination according to one or more embodiments. In step 810, a number of columns of pixels are extracted from an input image patch. Typically, the input image is grayscale and each value in a column is an integer from 0 to 255 (for 8-bit pixels). In step 812, the local peaks in each column are detected. This can be done with the commonly understood "sliding window" method, in which a window of fixed length (e.g., N pixels) is slid along the column M pixels at a time (where M < N). At each step, the presence of a peak is determined by looking for a significant difference in gray-level values (e.g., greater than 40). Once a peak has been located at some position in a window, detection of further peaks is suppressed for as long as the sliding window overlaps that position. The gaps between successive peaks are also detected in step 812. Step 812 is applied to a number C of columns in the image patch, and the gap values are accumulated in a histogram in step 814.
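The peak and gap detection of steps 812 and 814 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the default values for the window length, step, and threshold, and the choice of the darkest pixel in a window as the peak position are all assumptions made for the example.

```python
from collections import Counter


def find_peaks(column, window=8, step=4, min_diff=40):
    """Return row indices of peaks in one column of 8-bit gray levels.

    window (N), step (M < N) and min_diff mirror the parameters named in
    the text; the default values here are illustrative assumptions.
    """
    if not column:
        return []
    peaks = []
    last_start = max(len(column) - window, 0)
    for start in range(0, last_start + 1, step):
        # Suppress detection while the window still overlaps the last peak.
        if peaks and start <= peaks[-1]:
            continue
        segment = column[start:start + window]
        if max(segment) - min(segment) > min_diff:
            # Assumption: text is dark on light, so the darkest row is the peak.
            peaks.append(start + segment.index(min(segment)))
    return peaks


def gaps_between_peaks(peaks):
    """Distances (in rows) between consecutive peaks."""
    return [b - a for a, b in zip(peaks, peaks[1:])]


def gap_histogram(columns, **params):
    """Accumulate the gap values from all sampled columns (step 814)."""
    histogram = Counter()
    for column in columns:
        histogram.update(gaps_between_peaks(find_peaks(column, **params)))
    return histogram
```

For a patch of evenly spaced text lines, most gap values fall into a single histogram bin at the line spacing, which is the property the classification step relies on.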

The gap histogram is compared (step 816) with other histograms that were derived from training data with known classifications and stored in a database 818, and a decision about the category of the patch (text or non-text) is output together with a measure of confidence in that decision. The histogram classification in step 816 takes into account the typical appearance of a histogram derived from an image of text: it contains two tight peaks, one centered on the distance between lines, possibly with one or two much smaller peaks at integer multiples further along the histogram. The classification may characterize the shape of the histogram with a measure of statistical variance, or it may compare the histogram one by one with stored prototypes using a distance measure such as the Hamming or Euclidean distance.
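The prototype comparison in step 816 can be sketched as a nearest-prototype lookup under the Euclidean distance. The sketch below is an assumption-laden illustration: the histogram representation (a dict from gap value to count), the number of bins, and in particular the confidence formula are hypothetical and are not specified by the source.

```python
import math


def normalize(histogram, bins):
    """Turn a {gap: count} dict into a fixed-length list of frequencies."""
    total = sum(histogram.get(b, 0) for b in range(bins)) or 1
    return [histogram.get(b, 0) / total for b in range(bins)]


def classify_histogram(histogram, prototypes, bins=32):
    """Return (label, confidence) for the closest stored prototype.

    prototypes is a list of (label, histogram) pairs from training data.
    The confidence value is a hypothetical margin-based score in [0, 1].
    """
    query = normalize(histogram, bins)
    scored = sorted(
        (math.dist(query, normalize(proto, bins)), label)
        for label, proto in prototypes
    )
    best_distance, best_label = scored[0]
    if len(scored) == 1:
        return best_label, 1.0
    runner_up = scored[1][0]
    total = best_distance + runner_up
    # 1.0 when the best prototype matches exactly, 0.5 when both are equally far.
    confidence = runner_up / total if total else 0.5
    return best_label, confidence
```

A decision with a confidence near 0.5 means the two closest prototypes are almost equally far away, so a caller might treat the patch as undecided.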

FIG. 9 shows an example of text/non-text discrimination. An input image 910 is processed to sample a number of columns, a subset of which is indicated with dotted lines. The gray levels of a representative column 912 are shown at 914. The Y values are gray levels and the X values are rows in 910. The gaps detected between peaks are shown at 916. The histogram of gap values from all sampled columns is shown at 918. This example illustrates the shape of a histogram derived from a patch that contains text.

FIG. 10 shows a flowchart for estimating the point size of the text in an image patch. The flowchart relies on the fact that the blur in an image is inversely proportional to the distance of the capture device from the page. By estimating the amount of blur, the distance can be estimated, and that distance can be used to scale the size of objects in the image to a known "normalized" size. This behavior may be used to estimate the point size of the text in a new image.

In a training phase 1010, an image of a patch of text in a known font and point size (referred to as a "calibration" image) is captured with an image capture device at a known distance in step 1012. The height of the text characters in that image, expressed as a number of pixels, is measured in step 1014. This may be done manually with an image annotation tool such as Microsoft Photo Editor. The blur in the calibration image is estimated in step 1016. This may be done, for example, with known measures of the spectral cutoff of the two-dimensional fast Fourier transform. The blur may also be expressed as a number of pixels.

When a "new" image is presented to the MMR recognition system at run time in step 1024, the image is processed in step 1026 to locate text with commonly known methods of line segmentation and character segmentation, which produce a bounding box around each character. The heights of these boxes may be expressed as a number of pixels. The blur of the new image is estimated in step 1028 in the same manner as in step 1016. These measurements are combined in step 1030 to produce a first estimate 1032 of the point size of each character (or, equivalently, of each line). This may be done by computing:
(calibration image blur size / new image blur size) ×
(new image text height / calibration image text height) ×
(calibration image font size in points)
This scales the point size of the text in the calibration image to produce an estimated point size for the text in the input image patch. The same scaling function may be applied to the height of every character bounding box. This yields a decision for every character in the patch. For example, if the patch contains 50 characters, the procedure produces 50 votes for the point size of the font in the patch. A single estimate of the point size may then be derived as the median of the votes.
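As a worked illustration of the scaling formula and the median vote, the following sketch computes one vote per character bounding box. The function and parameter names are hypothetical, and the blur sizes and heights are assumed to be measured in pixels as described above.

```python
from statistics import median


def estimate_point_size(calib_blur, calib_text_height, calib_point_size,
                        new_blur, box_heights):
    """Return (median estimate, individual votes) for the text point size.

    Each bounding-box height in box_heights yields one vote computed as
    (calib_blur / new_blur) * (height / calib_text_height) * calib_point_size.
    """
    if new_blur <= 0 or calib_text_height <= 0:
        raise ValueError("blur size and calibration height must be positive")
    votes = [
        (calib_blur / new_blur) * (height / calib_text_height) * calib_point_size
        for height in box_heights
    ]
    return median(votes), votes
```

Using the median rather than the mean keeps a few badly segmented characters (for example, merged or split bounding boxes) from skewing the final estimate.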

Referring again more specifically to FIG. 7, in one or more embodiments, the feedback from the quality assessment module 712 to the capture device 106 may be directed to a user interface (UI) of the capture device 106. For example, the feedback may include an indication, in the form of a sound or vibration, that the captured image contains something that looks like text but is blurry and that the user should hold the capture device 106 steady. The feedback may also include commands that change the optical parameters of the capture device 106 to improve the quality of the captured image. For example, the focus, F-stop, and/or exposure time may be adjusted to improve the quality of the captured image.

Further, the feedback from the quality assessment module 712 to the capture device 106 may be specialized to the needs of the particular feature extraction algorithm being used. As described below, feature extraction converts an image into a symbolic representation. In a recognition system that computes the lengths of words, it may be desirable for the optics of the capture device 106 to blur the captured image. Those skilled in the art will note that such an adjustment may produce an image that, although perhaps unrecognizable to a human or to an optical character recognition (OCR) process, is well suited to the feature extraction technique. The quality assessment module 712 accomplishes this by feeding instructions back to the capture device 106, which defocuses the lens and thereby produces a blurry image.

The feedback process is modified by a control structure 714. In general, the control structure 714 receives data and symbolic information from the other components in the document fingerprint matching system 610. The control structure 714 determines the order in which the various procedures in the document fingerprint matching system 610 are executed and can optimize the computational load. The control structure 714 identifies the x-y coordinates of received image patches. More specifically, the control structure 714 receives information about the needs of the feature extraction process, the results from the quality assessment module 712, and the parameters of the capture device 106, and can change them as appropriate. This can be done dynamically on a frame-by-frame basis. In a system configuration that uses multiple feature extraction methods, one method may require blurry images of large patches of text while another may require sharply focused, high-resolution images of paper grain. In that case, the control structure 714 sends a command to the quality assessment module 712 instructing it to produce the appropriate image quality when text is in the field of view. The quality assessment module 712 interacts with the capture device 106 to produce the correct images (e.g., M sharply focused, high-resolution images of paper grain followed by N blurry images of a large patch). The control structure 714 tracks the progress of those images through the processing pipeline and ensures that the corresponding feature extraction and classification are applied.

The image processing module 716 modifies the quality of the input images based on the needs of the recognition system. Examples of types of image modification include sharpening, deskewing, and binarization. Such algorithms include many tunable parameters, such as mask sizes, expected rotations, and thresholds.

As shown in FIG. 7, the document fingerprint matching system 610 uses feedback from the feature extraction and classification modules 718, 720 (described below) to dynamically modify the parameters of the image processing module 716. This works because users typically point their capture device at the same location on a document for several seconds in a row. For example, if the capture device 106 processes 30 frames per second, the results of processing the first few frames in any sequence can affect how the frames captured later are processed.

The feature extraction module 718 converts a captured image into a symbolic representation. In one embodiment, the feature extraction module 718 locates words and computes their bounding boxes. In another embodiment, the feature extraction module 718 locates connected components and prepares descriptions of their shapes. Further, in one or more embodiments, the document fingerprint matching system 610 shares metadata about the results of feature extraction with the control structure 714 and uses that metadata to adjust the parameters of other system components. Those skilled in the art will recognize that this can significantly reduce computational requirements and improve accuracy by inhibiting the recognition of poor-quality data. For example, a feature extraction module 718 that identifies word bounding boxes reports to the control structure 714 the number of lines and "words" it found. If the number of words is too high (indicating, for example, that the input image is fragmented), the control structure 714 may instruct the quality assessment module 712 to produce blurrier images. The quality assessment module 712 then sends the appropriate signal to the capture device 106. Alternatively, the control structure 714 may instruct the image processing module 716 to apply a smoothing filter.

The classification module 720 converts a feature description from the feature extraction module 718 into the identity of one or more pages within a document and the x-y coordinates within those pages at which the input image patch occurs. The identification is made dependent on feedback from the database 3400, as described herein. Further, in one or more embodiments, a confidence value may be associated with each decision. The document fingerprint matching system 610 may use such decisions to determine the parameters of other components in the system. For example, if the confidence values of the top two decisions are close to one another, the control structure 714 may decide that the parameters of the image processing algorithms should be changed. This could mean increasing the range size of a median filter and carrying its results downstream to the remaining components.
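The confidence-driven adjustment described above might look like the following sketch. The margin, the step of two (to keep a hypothetical median-filter kernel size odd), and the maximum size are assumptions chosen only for illustration; the source does not specify how the control structure 714 picks new parameters.

```python
def next_median_filter_size(decisions, current_size, margin=0.05, max_size=9):
    """Suggest a median-filter size based on how close the top decisions are.

    decisions is a list of (confidence, page_id, x, y) tuples produced by a
    classifier. If the two most confident decisions differ by less than
    margin, the filter size is increased; otherwise it is left unchanged.
    """
    ranked = sorted(decisions, key=lambda d: d[0], reverse=True)
    if len(ranked) >= 2 and ranked[0][0] - ranked[1][0] < margin:
        # Ambiguous result: smooth more aggressively on the next frame.
        return min(current_size + 2, max_size)
    return current_size
```

Because capture devices typically deliver many frames of the same location, a parameter change suggested by one frame can be applied to the frames that follow.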

Further, as shown in FIG. 7, there may be feedback between the classification module 720 and the database 3400. Those skilled in the art will recall that the database 3400 may also be external to the module 610, as shown in FIG. 6. A decision about the identity of a patch can be used to query the database 3400 for other patches that have a similar appearance. Rather than comparing the input image patch with the database 3400, this compares the perfect image data of the patch stored in the database 3400 with other images in the database 3400. This may provide an additional level of confirmation of the decision of the classification module 720 and may allow some preprocessing of the matching data.

The database comparison can also be performed on the symbolic representation of a patch rather than only on its image data. For example, the best decision might indicate that the image patch contains double-spaced text in a 12-point Arial font. The database comparison could then locate patches in other documents with a similar font, line spacing, and word layout using only textual metadata rather than image data comparisons.

The database 3400 may support several types of content-based queries. The classification module 720 may pass the database 3400 a feature arrangement and receive a list of the documents and x-y coordinates at which that arrangement occurs. For example, the features may be trigrams (described below) of word lengths, taken either horizontally or vertically. The database 3400 may be organized to return a list of results in response to either type of query. The classification module 720 or the control structure 714 may combine those rankings to generate a single sorted list of decisions.
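One simple way to combine the ranked result lists from the two query types is a rank-sum (Borda-style) merge, sketched below. The source does not say how the rankings are combined, so the scoring rule, the tie-breaking order, and the (document, x, y) candidate format are all assumptions of this example.

```python
from collections import defaultdict


def combine_rankings(*ranked_lists):
    """Merge several ranked candidate lists into one sorted decision list.

    Each list is ordered best-first and holds hashable candidates such as
    (document_id, x, y). A candidate at position r in a list of length n
    earns n - r points; candidates are sorted by total points, with ties
    broken by the candidate tuple itself for determinism.
    """
    scores = defaultdict(int)
    for ranked in ranked_lists:
        n = len(ranked)
        for rank, candidate in enumerate(ranked):
            scores[candidate] += n - rank
    return sorted(scores, key=lambda c: (-scores[c], c))
```

A candidate that appears near the top of both the horizontal and the vertical result list therefore ends up ahead of one that ranks highly in only one of them.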

There may also be feedback among the database 3400, the classification module 720, and the control structure 714. In addition to storing information sufficient to identify a location from a feature vector, the database 3400 may store related information, including the original image of a document as well as symbolic representations of its graphical components. This allows the control structure 714 to modify the behavior of other system components on the fly. For example, if there are two plausible decisions for a given image patch, the database 3400 could indicate that they can be disambiguated by zooming out and inspecting the area to the right for the presence of an image. The control structure 714 would then send an appropriate message to the capture device 106 instructing it to zoom out. The feature extraction module 718 and the classification module 720 would inspect the right side of the captured image for an image printed on the document.

Further, it should be noted that once a patch has been located within a document, the database 3400 stores detailed information about the data surrounding the image patch. This may be used to trigger further hardware and software image analysis steps that are not anticipated in the prior art. That detailed information is provided, in one example, by a print capture system that saves a detailed symbolic description of a document. In one or more other embodiments, similar information may be obtained by scanning a document.

Still referring to FIG. 7, the position tracking module 724 receives information about the identity of an image patch from the control structure 714. The position tracking module 724 uses that information to retrieve from the database 3400 either a copy of the entire document page or a data structure describing the document. The initial position serves as the anchor for the start of the position tracking process. The position tracking module 724 receives image data from the capture device 106 when the quality assessment module 712 decides that the captured image is suitable for tracking. The position tracking module 724 also has information about the time that has elapsed since the last frame was successfully recognized. The position tracking module 724 applies an optical flow technique that allows it to estimate the distance over the document that the capture device 106 has moved between successive frames. Given the sampling rate of the capture device, its target can be estimated even when the data it sees is not recognizable. The estimated position of the capture device 106 may be confirmed by comparing its image data with the corresponding image data derived from the database document. A simple example is computing the cross-correlation between the captured image and the expected image in the database 3400.
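The cross-correlation check mentioned as a simple example can be sketched as below. The source only says "cross-correlation"; the zero-mean normalized variant used here, the list-of-rows image format, and the function name are assumptions made so that the score is insensitive to overall brightness and contrast.

```python
import math


def normalized_cross_correlation(captured, expected):
    """Return a similarity score in [-1, 1] for two equal-size gray images.

    Images are given as lists of rows of gray levels. A score near 1 means
    the captured patch agrees with the expected database patch.
    """
    flat_a = [pixel for row in captured for pixel in row]
    flat_b = [pixel for row in expected for pixel in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and of equal size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(flat_a, flat_b))
    denominator = math.sqrt(
        sum((a - mean_a) ** 2 for a in flat_a)
        * sum((b - mean_b) ** 2 for b in flat_b)
    )
    # A flat (constant) image carries no structure to correlate against.
    return numerator / denominator if denominator else 0.0
```

A tracker could accept the estimated position when the score exceeds a chosen threshold and fall back to full recognition otherwise; the threshold would have to be tuned empirically.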

Thus, the position tracking module 724 provides for the interactive use of database images to guide the progress of the position tracking algorithm. This allows electronic interactions to be attached to non-text objects such as graphics and images. Further, in one or more other embodiments, such attachment may be implemented without the image comparison/confirmation step described above. In other words, by estimating the instantaneous motion of the capture device 106 over the page, the electronic link that should be in view can be estimated independently of the captured image.

FIG. 11 illustrates a document fingerprint matching technique according to another embodiment of the present invention. The "feed forward" method shown in FIG. 11 processes each patch independently. It extracts from an image patch the features used to identify one or more pages and the x-y coordinates on those pages at which the patch appears. For example, in one or more embodiments, feature extraction for document fingerprint matching may depend on horizontal and vertical groupings of features of the captured image (e.g., words, characters, blocks). The groups of extracted features may then be used to look up the documents, and the patches within those documents, that contain them. An OCR function may be used to identify horizontal word pairs in the captured image. Each identified horizontal word pair is used to form a search query to database 3400 that determines all documents containing the word pair and the x-y coordinates of the word pair in those documents. For example, for the horizontal word pair "the, cat", database 3400 may return (15, x, y), (20, x, y), indicating that the pair occurs at those x-y locations in documents 15 and 20. Similarly, for each vertical word pair, database 3400 is queried for all documents containing an instance of the pair and the x-y coordinates of the pair in those documents. For example, for the vertically adjacent word pair "in, hat", database 3400 may return (15, x, y), (7, x, y), indicating that the pair occurs at those locations in documents 15 and 7. Using the document and location information returned from database 3400, a determination can then be made as to which document shows the greatest location overlap among the horizontal and vertical word pairs extracted from the captured image. This identifies the document that contains the captured image, and the presence or absence of hot spots and linked media can be determined accordingly.
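
The word-pair lookup can be sketched as a simple voting scheme. The sketch below is an assumption-laden simplification: the in-memory dictionary stands in for database 3400, the coordinate values are invented for illustration, and it counts hits per document without checking that the hit locations actually overlap, which the method described above also takes into account.

```python
from collections import Counter

# Hypothetical stand-in for database 3400:
# (direction, word1, word2) -> list of (doc_id, x, y). Coordinates are made up.
INDEX = {
    ("h", "the", "cat"): [(15, 120, 40), (20, 300, 88)],
    ("v", "in", "hat"): [(15, 124, 52), (7, 64, 210)],
}

def rank_documents(word_pairs, index):
    """Count, per document, how many query word pairs it contains."""
    votes = Counter()
    for pair in word_pairs:
        for doc_id, _x, _y in index.get(pair, []):
            votes[doc_id] += 1
    return votes.most_common()

ranking = rank_documents([("h", "the", "cat"), ("v", "in", "hat")], INDEX)
```

With the two example pairs from the text, document 15 is the only document returned for both queries, so it ranks first.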

FIG. 12 illustrates a document fingerprint matching technique according to another embodiment of the present invention. The "interactive image analysis" method shown in FIG. 12 involves interaction between image processing and feature extraction, which may occur before an image patch is recognized. For example, image processing module 716 may first estimate the blur in the input image. Feature extraction module 718 then calculates the distance from the page and the point size of the image text. Image processing module 716 may then perform a template matching step on the image using characters of fonts of that point size. Subsequently, feature extraction module 718 may extract character or word features from the result. Further, those skilled in the art will recognize that the fonts, point sizes, and features may be constrained by the fonts in document database 3400.

FIG. 13 shows an example of interactive image analysis as described above with reference to FIG. 12. An input image patch is processed at step 1310 to estimate the font and point size of the text in the patch as well as its distance from the camera. Those skilled in the art will note that font estimation (i.e., identifying candidates for the font of the text in the patch) may be done with known techniques. Point size and distance estimation may be performed, for example, using the flow process described with reference to FIG. 10. Other techniques, such as known focal-length methods that are readily applicable to the capture device, may also be used.

Still referring to FIG. 13, a line segmentation algorithm is applied at step 1312 to construct bounding boxes around the lines of text in the patch. The height of each line image is normalized to a fixed size at step 1314 using known techniques such as proportional scaling. The identity of each font detected in the image, along with its point size, is passed to a collection of font prototypes, where it is used to retrieve image prototypes for the characters in each named font.

Font database 1322 may be constructed from the font collection on the user's system that is used by the operating system and other software applications to print documents (e.g., Microsoft Windows raster fonts, TrueType, OpenType). In one or more other embodiments, the font collection may be generated from the source images of documents in database 3400. The XML files in database 3400 provide x-y bounding box coordinates that may be used to extract precise prototype images of characters from the source images. The XML file identifies exactly the name of the font and the point size of each character.

The character prototypes in the selected fonts are size-normalized at step 1320 based on a function of the parameters used in step 1314. Image classification at step 1316 compares the size-normalized characters output by step 1320 with the output of step 1314 to produce a decision at each x-y location in the image patch. Known image template matching techniques may be used to produce output such as (ci, xi, yi, wi, hi), where ci is the identity of a character, (xi, yi) is the upper-left corner of its bounding box, wi and hi are its width and height, and i = 1, ..., n indexes the characters detected in the image patch.

At step 1318, a lookup in the geometric constraint database is performed as described above, but it may be specialized for pairs of characters instead of pairs of words. In that case, "a-b" indicates that a and b are horizontally adjacent; "a+b" indicates that they are vertically adjacent; "a/b" indicates that a is southwest of b; and "a\b" indicates that a is southeast of b. The geometric relations may be derived from the xi, yi values of each pair of characters. MMR database 3400 is organized so that it returns a list of document pages that contain character pairs rather than word pairs. The output of step 1326 is a list of candidates that match the input image, expressed as n-tuples ranked by score: (documenti, pagei, xi, yi, actioni, scorei).
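
The character-pair notation above can be turned into index terms from template-matching output roughly as follows. This is only a sketch: the tolerance `tol`, the use of upper-left corners for comparison, the image coordinate convention (y grows downward), and the function name are all assumptions, and the sketch does not test how close the two characters are.

```python
def relation_term(a, b, tol=3):
    """Build a term such as 'a-b' from two detections (char, x, y, w, h).

    Assumes image coordinates: x grows rightward, y grows downward.
    Returns None when no relation from the text applies.
    """
    ca, xa, ya, _wa, _ha = a
    cb, xb, yb, _wb, _hb = b
    same_row = abs(ya - yb) <= tol
    same_col = abs(xa - xb) <= tol
    if same_row and xa < xb:
        return ca + "-" + cb          # horizontally adjacent
    if same_col and ya < yb:
        return ca + "+" + cb          # vertically adjacent
    if not same_row and not same_col and ya > yb:
        # a lies below b: southwest if also to the left, else southeast
        return ca + "/" + cb if xa < xb else ca + "\\" + cb
    return None
```

Because each term is a plain string, it can be indexed and queried like a word in a text search engine, which is the point of encoding geometry this way.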

FIG. 14 illustrates a document fingerprint matching technique according to another embodiment of the present invention. The "generate and test" method shown in FIG. 14 processes each patch independently. It extracts features from an image patch that are used to identify a number of page images that could contain the given image patch. Further, in one or more embodiments, an additional extraction and classification step may be performed to rank the pages by the likelihood that they contain the image patch.

Still referring to the "generate and test" method of FIG. 14, features of a captured image are extracted, and the document patches in database 3400 that contain the largest number of the extracted features may be identified. The first X document patches ("candidates") with the most matching features are processed further. In this processing, the relative locations of features in each matching candidate are compared with the relative locations of features in the query image, and a score is computed from the comparison. The highest score, corresponding to the best-matching document patch P, is then identified. If the highest score exceeds the threshold, document patch P is taken to match the query image. The threshold is adaptive to many parameters, including, for example, the number of features extracted. Database 3400 records where document patch P came from, so the query image is determined to come from the same location.

FIG. 15 shows an example of a word bounding box detection algorithm. An input image patch 1510 is shown after image processing that corrects for rotation. Commonly known as a skew correction algorithm, this class of technique rotates a text image so that it aligns with the horizontal axis. The next step in bounding box detection is computing the horizontal projection profile 1512. A threshold for line detection is chosen (1516) by a known adaptive thresholding or sliding-window algorithm, such that the areas "above threshold" correspond to lines of text. The areas within each line are extracted and processed in a similar fashion (1514 and 1518) to locate the above-threshold areas that represent words within the line. An example of the bounding boxes detected in one line of text is shown at 1520.
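
The projection-profile steps can be sketched as below. The sketch assumes a deskewed binary image (1 = ink, 0 = background) and uses a single fixed threshold for both the line and word passes; the adaptive thresholding and sliding-window selection mentioned in the text are not reproduced, and all function names are hypothetical.

```python
def runs_above(profile, threshold):
    """Return (start, end) index pairs of consecutive values above threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

def word_boxes(image, threshold=0):
    """Return (left, top, right, bottom) boxes for words in a binary image."""
    row_profile = [sum(row) for row in image]            # horizontal projection
    boxes = []
    for top, bottom in runs_above(row_profile, threshold):
        col_profile = [sum(image[r][c] for r in range(top, bottom + 1))
                       for c in range(len(image[0]))]    # projection within the line
        for left, right in runs_above(col_profile, threshold):
            boxes.append((left, top, right, bottom))
    return boxes

PATCH = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
]
```

On real scans a threshold of zero would split words at inter-character gaps, which is one reason an adaptive or windowed threshold is used in practice.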

Various features may be extracted for comparison with document patch candidates. For example, scale-invariant feature transform (SIFT) features, corner features, salient points, ascenders, descenders, word boundaries, and spaces may be extracted for matching. One of the features that can be reliably extracted from document images is word boundaries. Once word boundaries are extracted, they are formed into groups as shown in FIG. 16. In FIG. 16, for example, vertical groups are formed such that a word boundary has overlapping word boundaries both above and below it and the total number of overlapping word boundaries is at least three (note that the minimum number of overlapping word boundaries may differ in one or more other embodiments). For example, a first feature point (the second word box in the second line, of length 6) has two word boundaries above it (of lengths 5 and 7) and one word boundary below it (of length 5). A second feature point (the fourth word box in the third line, of length 5) has two word boundaries above it (of lengths 4 and 5) and two word boundaries below it (of lengths 8 and 7). Thus, as shown in FIG. 16, each feature is represented by the length of the middle word boundary, followed by the lengths of the word boundaries above it, followed by the lengths of the word boundaries below it. Note further that the length of a word box may be based on any metric, so alternative lengths may be available for some word boxes. In such cases, features containing all or some of those alternatives may be extracted.
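
A rough sketch of the vertical grouping is given below. The word-box extents are invented so that the first feature point from the text (length 6, with 5 and 7 above and 5 below) is reproduced; the tuple order (middle, above, below), the overlap test on horizontal extents, and the function name are assumptions of this sketch.

```python
def vertical_group_features(lines, min_total=3):
    """lines: list of text lines, each a list of (left, right, length) word boxes.

    Returns (length, lengths_above, lengths_below) for every word box that has
    horizontally overlapping boxes both above and below, min_total in all.
    """
    def overlapping(box, neighbours):
        left, right, _ = box
        return tuple(n_len for n_left, n_right, n_len in neighbours
                     if n_left < right and left < n_right)

    features = []
    for i in range(1, len(lines) - 1):
        for box in lines[i]:
            above = overlapping(box, lines[i - 1])
            below = overlapping(box, lines[i + 1])
            if above and below and len(above) + len(below) >= min_total:
                features.append((box[2], above, below))
    return features

# Hypothetical layout: extents chosen only to mirror the lengths in the text.
LINES = [
    [(0, 5, 5), (6, 13, 7)],
    [(0, 3, 3), (4, 10, 6)],
    [(5, 10, 5)],
]
```

The first word of the middle line yields no feature here because nothing overlaps it from below.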

Further, in one or more embodiments, features may be extracted such that spaces are represented by 0s and word regions are represented by 1s. An example is shown in FIG. 17. The block representations on the right side correspond to the word/space regions of the document patch on the left side.

Extracted features may be compared using various distance measures, including, for example, norms and Hamming distance. Alternatively, in one or more embodiments, hash tables may be used to identify document patches that have the same features as the query image. Once such patches are identified, the angles from each feature point to other feature points may be computed as shown in FIG. 18. Alternatively, angles between groups of feature points may be calculated. 1802 shows the angles 1803, 1804, and 1805 calculated from three feature points. The computed angles are then compared with the angles from each feature point to other feature points in the query image. If any angles for matching points are similar, the similarity score may be increased. Alternatively, if groups of angles are used and the groups of angles between similar groups of feature points in the two images are numerically similar, the similarity score may be increased. Once the scores between the query image and each retrieved document patch are computed, the document patch with the highest score is selected and compared with an adaptive threshold to determine whether the match meets some predetermined criteria. If the criteria are met, a matching document patch is reported as found.
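
One possible reading of the angle comparison is sketched below: the three interior angles of the triangle formed by three feature points are computed and compared within a tolerance. Treating the angles as triangle interior angles, the 5-degree tolerance, and both function names are assumptions; the text only states that angles between feature points are compared.

```python
import math

def interior_angles(p1, p2, p3):
    """Interior angles in degrees at p1, p2, p3 of the triangle they form."""
    def angle_at(v, a, b):
        diff = abs(math.atan2(a[1] - v[1], a[0] - v[0]) -
                   math.atan2(b[1] - v[1], b[0] - v[0]))
        return math.degrees(min(diff, 2 * math.pi - diff))
    return (angle_at(p1, p2, p3), angle_at(p2, p1, p3), angle_at(p3, p1, p2))

def angles_similar(angles_a, angles_b, tol_deg=5.0):
    """True when corresponding angles differ by at most tol_deg degrees."""
    return all(abs(x - y) <= tol_deg for x, y in zip(angles_a, angles_b))
```

Angles are unchanged by translation and uniform scaling, so a patch captured at a different distance can still produce matching values.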

Further, in one or more embodiments, extracted features may be based on word length. Each word is divided into estimated letters based on the word's height and width. As the word lines above and below a given word are scanned, a binary value is assigned to each estimated letter according to the space information in those lines. The binary code is then represented as an integer. For example, FIG. 19 shows an arrangement of word boxes, each representing a word detected in a captured image. Word 1910 is divided into estimated letters. This feature is described by (i) the length of word 1910, (ii) the text arrangement of the line above word 1910, and (iii) the text arrangement of the line below word 1910. The length of word 1910 is measured in number of estimated letters. The text arrangement information is derived from the binary coding of the space information above or below each estimated letter. For word 1910, only the last estimated letter is above a space, and the second and third estimated letters are below a space. Accordingly, the feature of word 1910 is coded as (6, 100111, 111110), where 0 means space and 1 means no space. Rewritten in integer form, word 1910 is coded as (6, 39, 62).
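
The conversion from the binary layout codes to integers for word 1910 can be checked with a short sketch. The helper names and the boolean-list input format are illustrative assumptions; only the codes (6, 100111, 111110) and the result (6, 39, 62) come from the text.

```python
def layout_bits(has_text):
    """Map per-letter flags (True = text present, False = space) to a bit string."""
    return "".join("1" if flag else "0" for flag in has_text)

def encode_word_feature(length, above_bits, below_bits):
    """Encode (word length, line-above code, line-below code) as integers."""
    return (length, int(above_bits, 2), int(below_bits, 2))

# Word 1910: letters 2 and 3 lie below a space; the last letter lies above a space.
above = layout_bits([True, False, False, True, True, True])   # "100111"
below = layout_bits([True, True, True, True, True, False])    # "111110"
feature = encode_word_feature(6, above, below)
```

Here 100111 in binary is 32 + 4 + 2 + 1 = 39 and 111110 is 62, matching the integer form given in the text.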

FIG. 20 illustrates a document fingerprint matching technique according to another embodiment of the present invention. The "multiple classification" method shown in FIG. 20 leverages the complementary information of different feature descriptions by classifying them independently and combining the results. An example of this paradigm applied to text patch matching is extracting the lengths of horizontally and vertically adjacent pairs of words and computing a ranking of the patches in the database separately for each. More specifically, for example, in one or more embodiments, the locations of features are determined by "classifiers" attendant with classification module 720. A captured image is fingerprinted using a combination of classifiers for the horizontal and vertical features of the captured image. This is done in view of the observation that an image of text contains two independent sources of information about its identity: in addition to the horizontal sequence of words, the vertical layout of the words can also be used to identify the document from which the image was extracted. For example, as shown in FIG. 21, captured image 2110 is classified by horizontal classifier 2112 and vertical classifier 2114. Each of classifiers 2112 and 2114, in addition to taking the captured image as input, takes information from database 3400 and outputs a ranking of the document pages to which it applies. In other words, the multiple classifier method shown in FIG. 21 classifies a captured image independently using horizontal and vertical features. The ranked lists of document pages are then combined by combination algorithm 2118 (examples are described below), which outputs a single ranked list of document pages based on both the horizontal and the vertical features of captured image 2110. In particular, in one or more embodiments, the separate rankings from horizontal classifier 2112 and vertical classifier 2114 are combined using information about how the detected features co-occur in database 3400.

FIG. 22 shows an example of how vertical layout is integrated with horizontal layout for feature extraction. In (a), captured image 2200 is shown with word divisions. From the captured image, horizontal and vertical "n-grams" are determined. An "n-gram" is a sequence of n numbers, each describing a quantity of some characteristic. For example, a horizontal trigram specifies the number of characters in each word of a horizontal sequence of three words. For captured image 2200, (b) shows the horizontal trigrams 5-8-7 (the numbers of characters in the horizontally sequenced words "upper", "division", and "courses" in the first line of captured image 2200), 7-3-5 (the words "Project", "has", and "begun" in the second line), 3-5-3 (the words "has", "begun", and "The" in the second line), 3-3-6 (the words "461", "and", and "permit" in the third line), and 3-6-8 (the words "and", "permit", and "projects" in the third line).
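
The horizontal trigrams listed for captured image 2200 can be reproduced with a short sketch. The rows below contain only the words named in the text for the first three lines; the actual image may contain more words, and the function name is hypothetical.

```python
def horizontal_ngrams(rows, n=3):
    """Return 'a-b-c' style n-grams of word lengths for each row of words."""
    grams = []
    for row in rows:
        lengths = [len(word) for word in row]
        for i in range(len(lengths) - n + 1):
            grams.append("-".join(str(v) for v in lengths[i:i + n]))
    return grams

ROWS = [
    ["upper", "division", "courses"],
    ["Project", "has", "begun", "The"],
    ["461", "and", "permit", "projects"],
]
trigrams = horizontal_ngrams(ROWS)
```

Changing `n` gives the other n-gram sizes without further changes to the function.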

A vertical trigram specifies the number of characters in each word of a vertical sequence of words above and below a given word. For captured image 2200, (c) shows the vertical trigrams 5-7-3 (the numbers of characters in the vertically sequenced words "upper", "Project", and "461"), 8-7-3 ("division", "Project", and "461"), 8-3-3 ("division", "has", and "and"), 8-3-6 ("division", "has", and "permit"), 8-5-6 ("division", "begun", and "permit"), 8-5-8 ("division", "begun", and "projects"), 7-5-6 ("courses", "begun", and "permit"), 7-5-8 ("courses", "begun", and "projects"), 7-3-8 ("courses", "The", and "projects"), 7-3-7 ("Project", "461", and "student"), and 3-3-7 ("has", "and", and "student").

Based on the horizontal and vertical trigrams determined from captured image 2200 shown in FIG. 22, lists of documents (d) and (e) are generated, indicating the documents that contain each of the horizontal and vertical trigrams. For example, in (d), horizontal trigram 7-3-5 occurs in documents 15, 22, and 134. Likewise, in (e), vertical trigram 7-5-6 occurs in documents 15 and 17. Using document lists (d) and (e), ranked lists of all the referenced documents are shown in (f) and (g), respectively. For example, in (f), document 15 is referenced by five horizontal trigrams in (d), whereas document 9 is referenced by only one trigram in (d). Similarly, in (g), document 15 is referenced by eleven vertical trigrams in (e), whereas document 18 is referenced by only one vertical trigram in (e).
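
Turning a trigram-to-documents list such as (d) into a ranked list such as (f) amounts to counting references per document, as in the sketch below. Only the posting for 7-3-5 and document 6 under 3-5-3 are stated in the text; the remaining entries are placeholders chosen to be consistent with document 15 being referenced five times and document 9 once.

```python
from collections import Counter

def rank_by_references(postings):
    """postings: n-gram string -> list of document ids containing it.

    Returns (doc_id, count) pairs, most-referenced first, ties by doc id.
    """
    counts = Counter(doc for docs in postings.values() for doc in docs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

HORIZONTAL_POSTINGS = {
    "5-8-7": [9, 15],          # placeholder entry
    "7-3-5": [15, 22, 134],    # as stated in the text
    "3-5-3": [6, 15],          # document 6 appears in the text
    "3-3-6": [15],             # placeholder entry
    "3-6-8": [15],             # placeholder entry
}
ranking = rank_by_references(HORIZONTAL_POSTINGS)
```

The same function applied to the vertical postings would give the list in (g).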

FIG. 23 shows a technique for combining the horizontal and vertical trigram information described with reference to FIG. 22. The technique combines the lists of votes from horizontal and vertical feature extraction using information about the known physical locations of the trigrams on the original printed pages. For every document in common among the top M choices output by each of the horizontal and vertical classifiers, the locations of all the horizontal trigrams that voted for the document are compared with the locations of all the vertical trigrams that voted for that document. A document receives a number of votes equal to the number of horizontal trigrams that overlap any vertical trigram, where an "overlap" occurs when the bounding boxes of two trigrams overlap. In addition, the x-y positions of the centers of the overlaps are counted with a suitably modified version of the evidence accumulation algorithm described below with reference to 3406 of FIG. 34. For example, as shown in FIG. 23, lists (a) and (b) (respectively (f) and (g) in FIG. 22) are intersected to determine a list (c) of pages that are referenced by both horizontal and vertical trigrams. Using intersected list (c), lists (d) and (e) (which show only the intersected documents as referenced by the identified trigrams), and printed document database 3400, document overlaps are determined. For example, document 6 is referenced by horizontal trigram 3-5-3 and by vertical trigram 8-3-6, and these two trigrams overlap at the word "has" in captured image 2200; document 6 therefore receives one vote for that one overlap. As shown in (f), for captured image 2200, document 15 receives the greatest number of votes and is thus identified as the document containing captured image 2200. (x1, y1) is identified as the location of the input image within document 15. In summary, in the document fingerprint matching technique described above with reference to FIGS. 22 and 23, a horizontal classifier uses features derived from the horizontal arrangement of words in the text, a vertical classifier uses features derived from the vertical arrangement of those words, and the results are combined based on the overlap of those features in the original documents. Such feature extraction provides a mechanism for uniquely identifying documents: the horizontal features are subject to the constraints of proper grammar and language, whereas the vertical features are not.
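
The overlap-based vote combination can be sketched as follows. Bounding boxes are (left, top, right, bottom) tuples whose values are invented for illustration, the per-document dictionaries stand in for lookups in printed document database 3400, and the accumulation of overlap centers is omitted.

```python
def boxes_overlap(a, b):
    """True when two (left, top, right, bottom) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def combine_votes(horizontal, vertical):
    """horizontal / vertical: doc_id -> trigram bounding boxes on that page.

    Only documents present in both inputs are scored. A document's votes are
    the number of its horizontal trigrams that overlap any vertical trigram.
    """
    votes = {}
    for doc in horizontal.keys() & vertical.keys():
        votes[doc] = sum(
            1 for h_box in horizontal[doc]
            if any(boxes_overlap(h_box, v_box) for v_box in vertical[doc]))
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical box coordinates for three documents per classifier.
H = {6: [(10, 20, 60, 30)], 15: [(0, 0, 50, 10), (20, 12, 80, 22)], 9: [(0, 0, 10, 10)]}
V = {6: [(30, 5, 45, 40)], 15: [(5, 0, 20, 35), (60, 0, 75, 35)], 18: [(0, 0, 5, 30)]}
combined = combine_votes(H, V)
```

Documents 9 and 18 drop out because each appears in only one of the two lists, which corresponds to the intersection step that produces list (c).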

Further, although the description of FIGS. 22 and 23 specifically uses trigrams, any n-gram may be used for either or both of horizontal and vertical feature extraction/classification. For example, in one or more embodiments, vertical and horizontal n-grams with n = 4 may be used for multi-classifier feature extraction. In one or more other embodiments, the horizontal classifier may extract features based on n-grams with n = 3, while the vertical classifier extracts features based on n-grams with n = 5.

Further, in one or more embodiments, classification may be based on adjacency relationships that are not strictly vertical or horizontal. For example, NW, SW, NE, and SE adjacency relationships may be used for extraction/classification.

FIG. 24 illustrates a document fingerprint matching technique according to another embodiment of the present invention. The "database-driven feedback" method shown in FIG. 24 takes into consideration that the accuracy of a document image matching system can be improved by using the images of the documents that could match the input to determine a subsequent step of image analysis, in which sub-images from the original images are matched to the input image. The method includes a transformation that duplicates the noise present in the input image. This is followed by a template matching analysis.

FIG. 25 shows a process flow for database-driven feedback according to one embodiment of the present invention. An input image patch is first processed and recognized in steps 2510 and 2512 as described above (e.g., using word OCR and word-pair lookup, character OCR and character-pair lookup, word bounding box configuration, etc.) to produce a number of candidates for the identity of the image patch 2522. Each candidate in this list contains the items (doc_i, page_i, x_i, y_i), where doc_i is the document identifier, page_i is the page within the document, and (x_i, y_i) are the x-y coordinates of the center of the image patch within that page.

The initial patch extraction algorithm in step 2512 normalizes the size of the entire input image patch to a fixed size, optionally using knowledge of the distance from the page, to ensure that the patch is converted to a known spatial resolution (e.g., 100 dpi). The font size estimation algorithm described above may be applied to this task. Similarly, known distance-from-focus or depth-from-focus techniques may be used. Size normalization can also scale the image patch proportionally based on the heights of its word bounding boxes.

The initial patch extraction algorithm queries the MMR database 3400 with the identifier of each document and page it receives, together with the center of the bounding box of the patch that the MMR database is to generate. The extent of the generated patch depends on the size of the normalized input patch. In this way, patches of the same spatial resolution and dimensions are obtained. For example, when normalized to 100 dpi, the input patch may extend 50 pixels on each side of its center. In this case, the MMR database is instructed to generate a 100 dpi initial patch that is 100 pixels high and 100 pixels wide, centered at the specified x-y value.
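The arithmetic in this example can be sketched as below. The function names and the shape of the request dictionary are hypothetical; the text does not specify how the query to the MMR database is encoded.

```python
def normalization_scale(estimated_dpi, target_dpi=100):
    """Scale factor that brings a captured patch to the target resolution."""
    return target_dpi / estimated_dpi

def patch_request(doc_id, page, cx, cy, half_extent=50, dpi=100):
    """Describe a square patch centered at (cx, cy), in pixels at `dpi`.

    With half_extent=50 the patch is 100 pixels wide and high, matching
    the 100 dpi example in the text. The dictionary layout is an
    assumption for illustration.
    """
    side = 2 * half_extent
    return {
        "doc": doc_id, "page": page, "dpi": dpi,
        "left": cx - half_extent, "top": cy - half_extent,
        "width": side, "height": side,
    }

scale = normalization_scale(estimated_dpi=200)  # 0.5: shrink a 200 dpi capture
request = patch_request("doc15", 3, cx=300, cy=420)
```

Because both the input patch and the generated patch share the same resolution and extent, they can later be compared pixel by pixel.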

Each initial image patch returned from the MMR database (2524) is associated with the items (doc_i, page_i, x_i, y_i, width_i, height_i, action_i), where (doc_i, page_i, x_i, y_i) are as described above, width_i and height_i are the width and height of the initial patch in pixels, and action_i is an optional action that may be associated with the corresponding region in the database entry for doc_i. The initial patch extraction algorithm outputs (2518) this list of image patches and data together with the size of the normalized input patch it constructed.

Further, in one or more embodiments, the patch matching algorithm (2516) compares the normalized input patch with each initial patch and assigns a score (2520) that measures how well they match one another. Those skilled in the art will appreciate that a simple cross-correlation based on Hamming distance suffices in many cases, because of the mechanisms used to ensure that the patch sizes are comparable. This process may also include introducing noise into the initial patch to mimic the image noise detected in the input. Alternatively, the comparison may be arbitrarily complex and may include a comparison of any feature set, including the OCR results of the two patches and a ranking based on the number of characters, character pairs, or word pairs, where the pairs may be constrained by the geometric relationships described above. In that case, the number of geometric pairs common to the input patch and the initial patch may be estimated and used as a ranking metric.
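A Hamming-distance comparison of two equally sized patches might look like the sketch below. It assumes the patches have already been binarized into rows of 0/1 pixels; the score definition (fraction of agreeing pixels) is one simple choice, not something the text prescribes.

```python
def hamming_distance(patch_a, patch_b):
    """Number of pixel positions at which two binary patches differ."""
    return sum(pa != pb
               for row_a, row_b in zip(patch_a, patch_b)
               for pa, pb in zip(row_a, row_b))

def match_score(patch_a, patch_b):
    """Fraction of agreeing pixels, in [0, 1]; 1.0 means identical."""
    total = len(patch_a) * len(patch_a[0])
    return 1.0 - hamming_distance(patch_a, patch_b) / total

# Tiny 2x2 patches for illustration; real patches would be e.g. 100x100.
input_patch = [[0, 1],
               [1, 0]]
initial_patch = [[0, 1],
                 [1, 1]]
score = match_score(input_patch, initial_patch)  # 0.75
```

The comparison is only meaningful because both patches were normalized to the same resolution and dimensions beforehand.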

Further, the output 2520 may take the form of n-tuples (doc_i, page_i, x_i, y_i, action_i, score_i), where the score is provided by the patch matching algorithm and measures how well the input patch matches the corresponding region of doc_i, page_i.

FIG. 26 shows an example of document fingerprint matching according to another embodiment of the present invention. The “database-driven classifier” technique shown in FIG. 26 uses an initial classification to generate a set of candidates that may contain the input image. These candidates are looked up in the database 3400, and a feature extraction plus classification strategy is automatically designed for them. One example is identifying an input patch as containing either a Times or an Arial font. In this case, the control structure 714 invokes a feature extractor and classifier specialized for serif/sans-serif discrimination.

FIG. 27 shows a flowchart of database-driven classification according to one embodiment of the present invention. Following a first feature extraction 2710, the input image patch is classified (2712) by any one or more of the recognition methods described above to produce a ranking of documents, pages, and x-y locations within those pages. Each candidate in this list contains, for example, the items (doc_i, page_i, x_i, y_i), where doc_i is the document identifier, page_i is the page within the document, and (x_i, y_i) are the x-y coordinates of the center of the image patch within that page. The initial patch extraction algorithm 2714 described with reference to FIG. 25 may be used to generate a patch image for each candidate.

Still referring to FIG. 27, a second feature extraction is applied to the initial patches (2716). This differs from the first feature extraction and may include, for example, one or more of a font detection algorithm, a character recognition technique, bounding boxes, and SIFT features. The features detected in each initial patch are input to an automatic classifier design method 2720 that includes, for example, a neural network, a support vector machine, and/or a nearest-neighbor classifier, each designed to classify an unknown sample as one of the initial patches. The same second feature extraction is applied to the input image patch (2718), and the features it detects are input to the newly designed classifier, which is specialized for those initial patches.
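One of the classifier types named above, a nearest-neighbor classifier built on the fly from the candidate patches, can be sketched as follows. The feature vectors, the candidate keys, and the score formula are all invented for illustration; the text does not define them.

```python
def design_nearest_neighbor(patch_features):
    """Build a classifier from the candidates' feature vectors.

    patch_features: hypothetical map from a candidate identity, e.g.
    (doc, page), to the feature vector of its initial patch.
    Returns a function that labels an unknown feature vector with the
    closest candidate and a score in (0, 1].
    """
    def classify(features):
        def sq_dist(candidate):
            return sum((a - b) ** 2
                       for a, b in zip(patch_features[candidate], features))
        best = min(patch_features, key=sq_dist)
        # Assumed score: 1.0 for an exact match, falling toward 0 with distance.
        return best, 1.0 / (1.0 + sq_dist(best))
    return classify

candidates = {
    ("doc6", 2): [0.9, 0.1, 12.0],   # made-up features: serif-ness, etc.
    ("doc15", 1): [0.2, 0.8, 14.0],
}
classify = design_nearest_neighbor(candidates)
label, score = classify([0.25, 0.75, 14.0])
```

The point of the design is that the classifier only has to separate the handful of candidates returned by the first stage, not every patch in the database.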

The output 2724 may take the form of n-tuples (doc_i, page_i, x_i, y_i, action_i, score_i), where the score is provided by the classification technique (2722) that was automatically designed at 2720. Those skilled in the art will appreciate that the score measures how well the input patch matches the corresponding region of doc_i, page_i.

FIG. 28 shows an example of document fingerprint matching according to another embodiment of the present invention. The “database-driven multiple classifier” technique shown in FIG. 28 reduces the chance of unrecoverable errors early in the recognition process by carrying multiple candidates throughout the decision process. Several initial classifications are performed. Each produces a different ranking for the input patch, and those rankings can be discriminated by different feature extraction and classification techniques. For example, one of these sets may be generated by horizontal n-grams and uniquely recognized by discriminating serif from sans-serif. Another may be generated by vertical n-grams and uniquely recognized by an accurate calculation of line spacing.

FIG. 29 shows a flowchart of database-driven multiple classification according to one embodiment of the present invention. This flow is similar to that shown in FIG. 27, except that it uses multiple different feature extraction algorithms 2910, 2912 which, together with classifications 2914, 2916, produce independent rankings for the input image patch. Examples of such features and classification techniques include the horizontal and vertical word-length n-grams described above. Each classifier produces a ranked list of patch identities containing at least the items (doc_i, page_i, x_i, y_i, score_i) for each candidate, where doc_i is the document identifier, page_i is the page within the document, (x_i, y_i) are the x-y coordinates of the center of the image patch within that page, and score_i measures how well the input patch matches the corresponding location in the database document.

The initial patch extraction algorithm described above with reference to FIG. 25 may be used to generate the set of initial image patches corresponding to the entries in the lists of patch identifiers output by 2914, 2916. Third and fourth feature extractions 2918, 2920 are applied to the initial patches, and classifiers are automatically designed and applied as described with reference to FIG. 27.

Still referring to FIG. 29, the rankings produced by these classifiers are combined to produce a single ranking (2924) with entries (doc_i, page_i, x_i, y_i, action_i, score_i) for candidates i = 1, ..., where the values in each entry are as described above. The ranking combination 2922 may be performed, for example, using the known Borda count measure, which assigns a score to an item based on its position in each of the two rankings. This is combined with the scores assigned by the individual classifiers to produce a composite score. Those skilled in the art will note that other methods of ranking combination may be used.
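A basic Borda count over two rankings can be sketched as below. Several variants of the Borda count exist; this one awards n points to the top item of an n-item ranking and one point fewer for each position below, which is an assumption rather than the exact measure intended by the text. The subsequent blending with per-classifier scores is not shown.

```python
def borda_combine(rankings):
    """Combine ranked lists (best first) into one ranking by Borda points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # Top item gets n points, the last item gets 1 point.
            scores[item] = scores.get(item, 0) + (n - position)
    # Highest total first; ties broken by the item's string form.
    combined = sorted(scores, key=lambda item: (-scores[item], str(item)))
    return combined, scores

# Candidate identities are placeholders for (doc, page, x, y) entries.
ranking_from_2918 = ["A", "B", "C"]
ranking_from_2920 = ["B", "C", "A"]
combined, scores = borda_combine([ranking_from_2918, ranking_from_2920])
```

Here candidate B wins because it is ranked consistently high by both classifiers, even though A is the top choice of one of them.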

FIG. 30 shows an example of document fingerprint matching according to another embodiment of the present invention. The “video sequence image accumulation” technique shown in FIG. 30 constructs an image by integrating data from nearby or adjacent frames. One example is “super-resolution.” It registers N temporally adjacent frames and uses knowledge of the point spread function of the lens to perform what is essentially a sub-pixel edge enhancement. The effect is to increase the spatial resolution of the image. Further, in one or more embodiments, the super-resolution method may be specialized to emphasize text-specific features such as holes, corners, and dots. A further extension is to use the characteristics of the candidate image patches, as determined from the database 3400, to specialize the super-resolution integration function.

FIG. 31 shows an example of document fingerprint matching according to another embodiment of the present invention. The “video sequence feature accumulation” technique shown in FIG. 31 accumulates features over a number of temporally adjacent frames before making a decision. This takes advantage of the high sampling rate of the capture device (e.g., 30 frames per second) and of the user's intent, which keeps the capture device pointed at the same spot on the document for at least several seconds. Feature extraction is performed independently on each frame, and the results are combined to produce a single unified feature map. The combination process includes an implicit registration step. The need for this technique becomes immediately apparent on inspecting video clips of text patches: autofocus and contrast adjustment in a typical capture device can produce significantly different results in adjacent video frames.

FIG. 32 shows an example of document fingerprint matching according to another embodiment of the present invention. The “video sequence decision combination” technique shown in FIG. 32 combines decisions from a number of temporally adjacent frames. This takes advantage of the high sampling rate of a typical capture device and of the user's intent, which keeps the capture device pointed at the same spot on the document for at least several seconds. Each frame is processed independently and generates its own ranked list of decisions. These decisions are combined to produce a single unified ranking for the set of input images. The technique includes an implicit registration method that controls the decision combination process.

In one or more embodiments, one or more of the various document fingerprint matching techniques described above with reference to FIGS. 6-32 may be used in combination with one or more known matching techniques (such a combination is referred to herein as “multi-tier (or multi-factor) recognition”). In general, in multi-tier recognition, a first matching technique is used to identify a set of pages in the document database that meet particular criteria, and a second matching technique is then used to uniquely identify one patch from among the pages in that set.

FIG. 33 shows a flowchart of multi-tier recognition according to one embodiment of the present invention. First, in step 3310, the capture device 106 is used to capture/scan a “culling” feature on the document of interest. The culling feature may be any feature whose capture effectively results in the selection of a set of documents within the document database. For example, the culling feature may be a numeric-only barcode (e.g., Universal Product Code (UPC)), an alphanumeric barcode (e.g., Code 39, Code 93, Code 128), or a two-dimensional barcode (e.g., QR code, PDF417, DataMatrix, MaxiCode). The culling feature may also be, for example, a graphic, an image, a trademark, a logo, a particular color or combination of colors, a keyword, or a phrase. Further, in one or more embodiments, the culling feature may be limited to features suitable for recognition by the capture device 106.

In step 3312, once the culling feature has been captured in step 3310, a set of documents and/or document pages in the document database is selected based on the captured culling feature. For example, if the captured culling feature is a company logo, all documents in the database that are indexed as containing that logo are selected. In another example, the database may contain a library of trademarks against which the captured culling image is compared. If there is a “hit” in the library, all documents associated with the matching trademark are selected for the subsequent matching described below. Further, in one or more embodiments, the selection of documents/pages in step 3312 may depend on both the captured culling feature and the location of that culling feature on the scanned document. For example, information associated with the captured culling feature may specify whether the culling image is located in the upper right corner of the document as opposed to the lower left corner.

Further, those skilled in the art will note that the determination that a particular captured image contains an image of a culling feature may be made by the capture device 106 or by some other component that receives raw image data from the capture device 106. For example, the database itself may determine that a particular captured image sent from the capture device 106 contains a culling feature and, in response, select the set of documents associated with the captured culling feature.

In step 3314, after a particular set of documents has been selected in step 3312, the capture device 106 continues to scan and captures images of the document of interest. The captured images of the document are matched against the documents selected in step 3312 using one or more of the various document fingerprint matching techniques described with reference to FIGS. 6-32. For example, after a set of documents indexed as containing a shoe-graphic culling feature has been selected in step 3312, based on the capture of a shoe graphic on the document of interest in step 3310, subsequently captured images of the document of interest may be matched against the selected set of documents using the multiple classifiers described above.
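Steps 3310 through 3314 can be sketched as a two-stage lookup. Everything named here is hypothetical: the culling index, the document identifiers, and the word-overlap scorer that stands in for whichever fingerprint matching technique is actually used in the second tier.

```python
def multi_tier_recognize(culling_index, culling_feature, patch, match_score):
    """Tier 1: select documents by culling feature. Tier 2: score only those."""
    candidates = culling_index.get(culling_feature, set())
    scored = [(match_score(patch, doc), doc) for doc in candidates]
    return max(scored)[1] if scored else None

# Illustrative data: which documents are indexed under which culling feature.
culling_index = {
    "shoe-graphic": {"catalog-a", "catalog-b"},
    "upc:0123": {"manual-c"},
}
texts = {
    "catalog-a": "red running shoe sale",
    "catalog-b": "blue hiking boot",
    "manual-c": "red running shoe manual",
}

def word_overlap(patch, doc):
    """Stand-in second-tier matcher: count of shared words."""
    return len(set(patch.split()) & set(texts[doc].split()))

result = multi_tier_recognize(culling_index, "shoe-graphic",
                              "red running shoe", word_overlap)
```

Note that `manual-c` would score as well as `catalog-a` on text alone, but it is never scored because the culling feature removed it from the candidate set.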

Thus, by using the multi-tier recognition flow described above with reference to FIG. 33, patch recognition time may be decreased by first reducing the number of pages/documents against which subsequently captured images are matched. Further, a user may take advantage of this improved recognition time by first scanning the document at a location where an image, barcode, graphic, or other culling feature is present. By doing so, the user may quickly reduce the number of documents against which subsequently captured images are matched.

MMR Database System
FIG. 34A shows a functional block diagram of an MMR database system 3400 configured in accordance with one embodiment of the present invention. The system 3400 is configured for content-based retrieval, in which two-dimensional geometric relationships between objects are represented in a way that allows look-up in a text-based index (or any other searchable index). The system 3400 employs evidence accumulation to enhance look-up efficiency by, for example, combining the frequency of occurrence of a feature with the likelihood of its location in a two-dimensional zone. In one particular embodiment, the database system 3400 is a detailed implementation of the document event database 320 (including the PD index 322), the contents of which include electronic representations of printed documents generated by the capture module 318 and/or the document fingerprint matching module 226 described above with reference to FIG. 3. Other applications and configurations for the system 3400 will be apparent in light of this disclosure.

As shown, the database system 3400 includes an MMR index table module 3404 (which receives the description computed by the MMR feature extraction module 3402), an evidence accumulation module 3406, and a relational database 3408 (or any other suitable storage facility). The index table module 3404 queries an index table that identifies the documents, pages, and x-y locations within those pages where each feature occurs. The index table can be generated, for example, by the MMR index table module 3404 or by some other dedicated module. The evidence accumulation module 3406 is programmed or otherwise configured to compute a ranked set of document, page, and location candidates 3410 given the data from the index table module 3404. The relational database 3408 can be used to store additional characteristics 3412 about each patch. These include, but are not limited to, 504 and 508 of FIG. 5. By using the two-dimensional arrangement of text within a patch when deriving a signature or fingerprint (i.e., a unique search term) for the patch, the uniqueness of even a small fragment of text is significantly increased. Other embodiments can similarly use any two-dimensional arrangement of objects/features within a patch when deriving its signature or fingerprint, and embodiments of the present invention are not intended to be limited to two-dimensional arrangements of text for uniquely identifying patches. Other components and functionality of the database system 3400 shown in FIG. 34A include a feedback-directed feature search module 3418, a document rendering application module 3414, and a sub-image extraction module 3416. These components interact with other components of the system 3400 to provide feedback-directed feature search as well as dynamic generation of initial images. In addition, the system 3400 includes an action processor 3413 that receives actions. The actions determine the operation performed by the database system 3400 and the output it provides. Each of these other components is described below.

FIG. 34B shows an example of an MMR feature extraction module 3402 that uses the two-dimensional arrangement of text within a patch. In one such example, the MMR feature extraction module 3402 is programmed or otherwise configured to use OCR-based techniques to extract features (text or other searchable features) from an image patch. In this particular embodiment, the feature extraction module 3402 extracts the x-y locations of words in the image of a text patch and represents those locations as the set of horizontally and vertically adjacent word pairs the patch contains. The image patch is effectively converted into word pairs that are joined by “-” if they are horizontally adjacent (e.g., the-cat, in-the, the-hat, is-back) and by “+” if they overlap vertically (e.g., the+in, cat+the, in+is, the+back). The x-y locations can be based, for example, on pixel counts in the x and y directions from some fixed point in the document image (e.g., from the upper left corner or the center of the document). Note that the horizontally adjacent pairs in the example may occur frequently in many other text passages, whereas the vertically overlapping pairs are likely to occur infrequently in other text passages. Other geometric relationships between image features can be encoded similarly, for example SW-NE adjacency with “/” between the words and NW-SE adjacency with “＼”. A “feature” may also be generalized to a word bounding box (or other feature bounding box) encoded with an arbitrary but consistent string. For example, a bounding box four times as long as it is high, with a ragged upper contour and a smooth lower contour, may be represented by the string “4rusl”. In addition, geometric relationships may be generalized to arbitrary angles and distances between features. For example, two words that each have the “4rusl” description and are NW-SE adjacent but separated by two word heights may be represented by “4rusl＼＼4rusl”. Numerous encoding schemes will be apparent in light of this disclosure. Note also that numbers, Boolean values, geometric shapes, and other such document features may be used instead of word pairs to identify a patch.
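The word-pair encoding can be sketched as follows. The sketch assumes that OCR has already grouped words into text lines ordered top to bottom, and that each word carries its left and right x coordinates; the coordinates below are invented so that the output reproduces the pairs quoted in the text.

```python
def encode_word_pairs(lines):
    """Turn a patch into "-" (horizontal) and "+" (vertical) word-pair terms.

    lines: list of text lines, top to bottom; each line is a list of
    (word, x_left, x_right) tuples ordered left to right.
    """
    terms = []
    # Horizontally adjacent words on the same line: "left-right".
    for line in lines:
        for (a, _, _), (b, _, _) in zip(line, line[1:]):
            terms.append(f"{a}-{b}")
    # Words on consecutive lines whose x ranges overlap: "upper+lower".
    for upper, lower in zip(lines, lines[1:]):
        for a, a0, a1 in upper:
            for b, b0, b1 in lower:
                if min(a1, b1) > max(a0, b0):
                    terms.append(f"{a}+{b}")
    return terms

patch = [
    [("the", 0, 30), ("cat", 40, 70)],
    [("in", 0, 20), ("the", 30, 60), ("hat", 70, 100)],
    [("is", 0, 20), ("back", 30, 70)],
]
terms = encode_word_pairs(patch)
```

Each resulting term is an ordinary string, so it can be stored and queried with a conventional text index even though it carries geometric information.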

FIG. 34C shows an example index table organization according to one embodiment of the present invention. As shown, the MMR index table includes an inverted term index table 3422 and a document index table 3424. Each unique term or feature (e.g., key 3421) points to a location in the term index table 3422 that holds a functional value of the feature (e.g., key x), which in turn points to a list of records 3423 (e.g., Rec#1, Rec#2, etc.); each record identifies a candidate region on a page within a document, as described above. In one example, the key and the functional value of the key (key x) are the same. In another example, a hash function is applied to the key, and the output of that function is key x.

Given a list of query terms, every record indexed by each key is examined, and the region most consistent with all of the query terms is identified. If that region has a sufficiently high matching score (e.g., based on a predefined matching threshold), the candidate is confirmed. Otherwise, matching is declared to have failed and no region is returned. In this embodiment, the keys are word pairs separated by “-” or “+” as described above (e.g., “the-cat” or “cat+the”). This technique of incorporating the geometric relationship into the key itself allows conventional text search technology to be used for two-dimensional geometric queries.
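The look-up and thresholding step can be sketched with a plain dictionary standing in for the inverted term index. The region tuples, the score definition (fraction of query terms that vote for a region), and the 0.5 threshold are assumptions for illustration.

```python
from collections import Counter

def lookup(index, query_terms, min_score=0.5):
    """Return the region supported by the most query terms, or None.

    index: hypothetical map from a term such as "cat+the" to the list
    of candidate regions (doc, page, x, y) where that term occurs.
    """
    if not query_terms:
        return None
    votes = Counter()
    for term in query_terms:
        # Each term contributes at most one vote per region.
        for region in set(index.get(term, ())):
            votes[region] += 1
    if not votes:
        return None
    region, hits = votes.most_common(1)[0]
    # Confirm the candidate only if enough query terms agree on it.
    return region if hits / len(query_terms) >= min_score else None

index = {
    "the-cat": [("doc1", 1, 0, 0)],
    "cat+the": [("doc1", 1, 0, 0), ("doc2", 3, 10, 20)],
    "in-the": [("doc2", 3, 10, 20)],
}
hit = lookup(index, ["the-cat", "cat+the", "the+in"])
miss = lookup(index, ["dog-ran", "ran+far"])
```

In the first query two of three terms agree on the same region of `doc1`, which clears the threshold; the second query finds no records at all and returns no region.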

Thus, the index table organization converts the features detected in an image patch into text terms that represent both the features themselves and the geometric relationships between them. This allows conventional text indexing and search methods to be used. For example, the vertically adjacent terms “cat” and “the” are represented by the symbol “cat+the”, which may be referred to as a “query term”, as will be apparent in light of this disclosure. Using conventional text search data structures and methodologies makes it easier to layer MMR techniques on top of Internet text search systems (e.g., Google, Yahoo, Microsoft, etc.).

In the inverted term index table 3422 of this example embodiment, each record identifies a candidate region on a page within a document using six parameters: a document identifier (DocID), a page number (PG), x and y offsets (X and Y, respectively), and the width and height of a rectangular zone (W and H, respectively). The DocID is a unique string generated from the timestamp (or other metadata) at the time the document is printed, but it can be any string that combines a device ID and a person ID. In any case, each document is identified by a unique DocID and has a record stored in the document index table. Page numbers correspond to the pagination of the paper output and start at 1. A rectangular region is specified by the x-y coordinates of its upper-left corner, together with the width and height of its bounding box, in a normalized coordinate system. Numerous internal document location/coordinate schemes will be apparent in light of this disclosure, and the present invention is not intended to be limited to any particular one.

An example record structure configured in accordance with one embodiment of the present invention uses a 24-bit DocID and an 8-bit page number, allowing for up to about 16×10⁶ documents and about 4×10⁹ pages in total (2²⁴ documents of up to 2⁸ pages each). One unsigned byte for each of the X and Y offsets of the bounding box provides a spatial resolution of about 30 dpi horizontally and 23 dpi vertically (assuming an 8.5" × 11" page, although other page sizes and/or spatial resolutions can be used). Similar treatment of the bounding box width and height (e.g., one unsigned byte each for W and H) allows the representation of a region as small as a period or the dot on an "i", or as large as an entire page (e.g., 8.5" × 11" or other). Therefore, eight bytes per record (3 bytes for DocID, 1 byte for PG, 1 byte for X, 1 byte for Y, 1 byte for W, and 1 byte for H) can accommodate a large number of regions.
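
As a minimal sketch of the eight-byte record described above, the following Python packs and unpacks the six fields. The field order, the big-endian DocID, and the helper names (`pack_record`, `unpack_record`, `quantize`) are assumptions made for illustration; the disclosure fixes only the field widths.

```python
def pack_record(doc_id, page, x, y, w, h):
    """Pack one index record into 8 bytes: 3-byte DocID, then PG, X, Y, W, H.

    Byte order and field order are assumptions, not taken from the source.
    """
    if not 0 <= doc_id < 1 << 24:
        raise ValueError("DocID must fit in 24 bits")
    # bytes() raises ValueError if any one-byte field is outside 0..255
    return doc_id.to_bytes(3, "big") + bytes([page, x, y, w, h])

def unpack_record(buf):
    """Inverse of pack_record: return (doc_id, page, x, y, w, h)."""
    doc_id = int.from_bytes(buf[:3], "big")
    page, x, y, w, h = buf[3:8]
    return doc_id, page, x, y, w, h

def quantize(inches, page_inches):
    """Map a position in inches onto the 0..255 normalized coordinate."""
    return min(255, round(inches / page_inches * 255))

# 255 steps across 8.5" is 30 steps per inch; across 11" it is about 23.
horizontal_dpi = 255 / 8.5
vertical_dpi = 255 / 11
```

The two resolution figures in the text fall out of the one-byte coordinate range: 255 quantization steps spread over the page width and height.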

The document index table 3424 includes information related to each document. In one particular embodiment, this information includes the document-related fields of an XML file, including print resolution, print date, paper size, shadow file name, page image location, and the like. Because print coordinates are converted to the normalized coordinate system when a document is indexed, the computation of search hypotheses does not involve this table. Thus, the document index table 3424 is consulted only for matched candidate regions. However, this decision implies some loss of information in the index, because the normalized coordinates usually have a lower resolution than the print resolution. Alternative embodiments may use the document index table 3424 (or a normalized coordinate system of higher resolution) when computing search hypotheses, if so desired.

Thus, the index table module 3404 operates to effectively provide an image index that enables content-based retrieval of objects (e.g., document pages) and the x-y coordinates within those objects where a given image query occurs. The combination of such an image index and the relational database 3408 allows for the location of objects that match an image patch and of characteristics of the patch (e.g., "actions" attached to the patch, or bar codes that can be scanned to cause retrieval of other content related to the patch). The relational database 3408 also provides a means for "reverse links" from a patch to the features in the index table for other patches in the document. Reverse links provide a way to find the features a recognition algorithm would expect to see as it moves from one part of a document image to another, which can significantly improve the performance of the front-end image analysis algorithms in an MMR system, as discussed above.

Feedback-Directed Feature Search
The x-y coordinates of the image patch (e.g., the x-y coordinates of the center of the image patch), as well as the identity of the document and page, can also be input to the feedback-directed feature search module 3418. The module 3418 searches the term index table 3422 for records 3423 that lie within a given distance from the center of the image patch. The search can be facilitated, for example, by storing the records 3423 for each DocID-PG combination in contiguous blocks of memory sorted in order of X or Y value. A lookup is performed by a binary search for a given value (X or Y, depending on how the data was sorted when stored) followed by a serial scan from that location for all records with the given X and Y values. Typically, this covers the x-y coordinates in an M-inch ring around the outside of a patch that measures W inches wide and H inches high in the given document and page. Records that fall within this ring are located, and their keys or features 3421 are found by tracing back pointers. The list of features and their x-y coordinates within the ring is reported, as shown at 3417 of FIG. 34A. The values of W, H, and M shown at 3415 can be set dynamically by the recognition system based on the size of the input image, so that the features 3417 lie outside the input image patch.
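
The binary search followed by a serial scan can be sketched as follows. This is an illustrative sketch under stated assumptions: the records for one DocID-PG combination are held as an X-sorted list of `(x, y, key)` tuples with coordinates in inches, and the function name `features_in_ring` is hypothetical.

```python
from bisect import bisect_left, bisect_right

def features_in_ring(records, cx, cy, w, h, m):
    """Return (key, x, y) for records in the M-inch ring around a patch.

    records: list of (x, y, key) for one DocID-PG, sorted by x (assumed).
    cx, cy:  center of the W-by-H patch; m is the ring width.
    """
    x0, x1 = cx - w / 2, cx + w / 2          # patch bounds
    y0, y1 = cy - h / 2, cy + h / 2
    ox0, ox1 = x0 - m, x1 + m                # outer ring bounds
    oy0, oy1 = y0 - m, y1 + m

    # A real system would keep this sorted column stored, not rebuild it.
    xs = [r[0] for r in records]
    lo = bisect_left(xs, ox0)                # binary search on X
    hi = bisect_right(xs, ox1)

    hits = []
    for x, y, key in records[lo:hi]:         # serial scan from that location
        inside_outer = oy0 <= y <= oy1
        inside_patch = x0 <= x <= x1 and y0 <= y <= y1
        if inside_outer and not inside_patch:
            hits.append((key, x, y))
    return hits

records = [
    (0.5, 0.5, "too-far"),
    (3.0, 3.0, "in-patch"),
    (3.0, 4.2, "blue-xylophone"),
    (7.0, 3.0, "far-right"),
]
ring = features_in_ring(records, cx=3.0, cy=3.0, w=2.0, h=2.0, m=1.0)
```

Only the X range is narrowed by binary search here; the Y test is applied during the scan, which matches the text's note that the data is sorted on one coordinate.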

This characteristic of the image database system 3400 is useful, for example, for disambiguating multiple hypotheses. If the database system 3400 reports more than one document that could match the input image patch, the features in the ring around the patch allow the recognition system (e.g., fingerprint matching module 226 or other suitable recognition system) to decide which document best matches the document the user is holding, by directing the user to move the image capture device slightly in the direction that would disambiguate the decision. For example (assuming OCR-based features are used, although the concept extends to any geometrically indexed feature set), an image patch in document A might be directly below the word pair "blue-xylophone", while an image patch in document B might be directly below the word pair "blue-thunderbird". The database system 3400 reports the expected locations of these features, and the recognition system can instruct the user (e.g., via a user interface) to move the camera up by the amount indicated by the difference in y coordinates between the features and the top of the patch. The recognition system then computes the features in that new area and uses the features from documents A and B to determine which one matches best. For example, the recognition system can post-process the OCR results from the new area with a "dictionary" of features composed of (xylophone, thunderbird). The word that best matches the OCR results corresponds to the document that best matches the input image. Examples of post-processing algorithms include commonly known spelling correction techniques (such as those used in word processing and email applications).
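
The dictionary-based post-processing step can be illustrated with a short Python sketch. It stands in for a spelling-correction technique by using the standard library's `difflib.SequenceMatcher` similarity ratio; that choice, and the function name `best_document`, are assumptions for illustration rather than the method of the disclosure.

```python
from difflib import SequenceMatcher

def best_document(ocr_word, dictionary):
    """Pick the candidate document whose expected word best matches the OCR.

    dictionary maps each expected word to the document it would confirm,
    e.g. {"xylophone": "A", "thunderbird": "B"}.
    """
    def similarity(entry):
        # Ratio in [0, 1]; tolerant of a few misrecognized characters.
        return SequenceMatcher(None, ocr_word.lower(), entry.lower()).ratio()

    best_word = max(dictionary, key=similarity)
    return dictionary[best_word], best_word

candidates = {"xylophone": "A", "thunderbird": "B"}
```

Because the dictionary contains only the words the database predicted for the ring, even a noisy OCR result such as "xy1ophone" is enough to separate the two candidate documents.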

As this example illustrates, the design of the database system 3400 allows the recognition system to disambiguate multiple candidates efficiently, by matching feature descriptions in a way that avoids the need for additional database accesses. An alternative solution would be to process each image independently.

Dynamic Pristine Image Generation
The x-y coordinates of the image patch location (e.g., the x-y coordinates of the center of the image patch), as well as the identity of the document and page, can also be input to the relational database 3408, where they can be used to retrieve the stored electronic original for that document and page. The document can then be rendered as a bitmap image by the document rendering application module 3414. An additional "box size" value provided by module 3414 is used by the sub-image extraction module 3416 to extract a portion of the bitmap around the center. This bitmap is a "pristine" representation of the expected appearance of the image patch, and it includes an exact representation of all features that should be present in the input image. The pristine patch is returned as patch characteristics 3412. This solution overcomes the excessive storage required by prior techniques that store image bitmaps, by instead storing a compact non-image representation that is converted to bitmap data on demand.

Such a storage scheme is advantageous because it enables the use of a hypothesize-and-test recognition strategy, in which a feature representation extracted from an image is used to retrieve a set of candidates that is then disambiguated by a detailed feature analysis. Because it is often impossible to predict which features will best disambiguate an arbitrary set of candidates, it is desirable that these features be determined from the original images of those candidates. For example, an image of the word pair "the cat" could be located in two database documents, one originally printed in a Times Roman font and the other in a Helvetica font. A check of which of these fonts the input image contains would identify the correctly matching database document. Comparing the pristine patches for those documents to the input image patch with a template-matching comparison metric, such as Euclidean distance, identifies the correct candidate.

An example includes a relational database 3408 that stores Microsoft Word ".doc" files (a similar methodology works for other document formats, such as PCL, PDF, or Microsoft's XML Paper Specification (XPS), or other such formats that can be converted to a bitmap by a rendering application, such as Ghostscript or, in the case of XPS, Microsoft's Internet Explorer with the WinFX components installed). Given the identifiers for a document and page, the x-y coordinates, the box dimensions, and a system parameter indicating that the preferred resolution is 600 dots per inch (dpi), the Word application can be invoked to generate a bitmap image. For an 8.5" × 11" page, this yields a bitmap with 6600 rows and 5100 columns. Additional parameters x=3", y=3", height=1", and width=1" indicate that the database should return a patch 600 pixels high and 600 pixels wide that is centered at a point 1800 pixels in both the x and y directions from the upper-left corner of the page.
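
The arithmetic in this example can be checked with a small Python sketch. The function names `page_pixels` and `patch_pixel_box` are hypothetical, and the 8.5" × 11" default page size is an assumption carried over from the record-structure discussion.

```python
def page_pixels(width_in=8.5, height_in=11.0, dpi=600):
    """Return (rows, columns) of the rendered page bitmap."""
    return round(height_in * dpi), round(width_in * dpi)

def patch_pixel_box(x_in, y_in, width_in, height_in, dpi=600):
    """Return (left, top, right, bottom) in pixels for a centered patch.

    x_in, y_in give the patch center in inches from the upper-left
    corner of the page; right and bottom are exclusive bounds.
    """
    cx, cy = round(x_in * dpi), round(y_in * dpi)
    w, h = round(width_in * dpi), round(height_in * dpi)
    left, top = cx - w // 2, cy - h // 2
    return left, top, left + w, top + h

rows, cols = page_pixels()
box = patch_pixel_box(3, 3, 1, 1)
```

With the parameters from the text, the page renders to 6600 rows by 5100 columns, and the 600-by-600 patch spans pixels 1500 to 2100 on both axes, centered at (1800, 1800).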

Multiple Databases
When multiple database systems 3400 are used, each containing a different document collection, pristine patches can be used to determine whether two databases return the same document, or which database returned the candidate that more closely matches the input. If two databases return the same document, possibly with different identifiers 3410 (i.e., it is not apparent that the original documents are the same, because they were entered separately in the different databases) and characteristics 3412, the pristine patches will be almost exactly the same. This can be determined by comparing the pristine patches to one another, for example, with a Hamming distance that counts the number of pixels that differ. The Hamming distance will be zero if the original documents are exactly the same pixel for pixel. It will be slightly greater than zero if the patches differ slightly, as might be caused by minor font differences. Such differences can produce a "halo" effect around the edges of characters when the image difference is computed with the Hamming operator. They could be caused by different versions of the original rendering application, different versions of the operating system on the server that runs the database, different printers, or different font collections.
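
A minimal sketch of the pixel-wise Hamming comparison follows, assuming each pristine patch is a list of rows of binary pixel values. The tolerance used in `same_document` is a hypothetical parameter; the disclosure says only that the distance is zero or slightly greater than zero for the same document.

```python
def hamming_distance(patch_a, patch_b):
    """Count the pixels that differ between two equally sized patches."""
    if len(patch_a) != len(patch_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(patch_a, patch_b)
    ):
        raise ValueError("patches must have the same dimensions")
    return sum(
        pa != pb
        for row_a, row_b in zip(patch_a, patch_b)
        for pa, pb in zip(row_a, row_b)
    )

def same_document(patch_a, patch_b, max_fraction=0.02):
    """Treat patches as the same document if few pixels differ.

    max_fraction is an assumed tolerance for halo effects caused by
    minor font rendering differences.
    """
    total = sum(len(row) for row in patch_a)
    return hamming_distance(patch_a, patch_b) <= max_fraction * total

a = [[0, 1, 1], [0, 0, 1]]
b = [[0, 1, 0], [0, 0, 1]]
```

Sampling several patches from different x-y coordinates and requiring most of them to pass `same_document` would add the redundancy against rendering differences that the following paragraph describes.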

The pristine patch comparison algorithm can be performed on patches from more than one x-y coordinate in the two documents. They should all be the same, but a sampling procedure like this provides redundancy that can overcome rendering differences between database systems. For example, one font might appear radically different when rendered on the two systems, while another font might appear almost exactly the same.

If two or more databases return different documents as their best match for the input image, the pristine patches can be compared to the input image by a pixel-based comparison metric, such as Hamming distance, to determine which one is correct.

An alternative strategy for comparing results from more than one database is to compare the contents of accumulator arrays that measure the geometric distribution of the features in the documents reported by each database. It is desirable that this accumulator be provided directly by the database, to avoid the need to perform a separate lookup of the original feature set. The accumulator should also be independent of the contents of the database system 3400. In the embodiment shown in FIG. 34A, an activity array 3420 is exported. Two activity arrays can be compared by measuring the internal distribution of their values.

In more detail, if two or more databases return the same document, possibly with different identifiers 3410 (i.e., it is not apparent that the original documents are the same, because they were entered separately in the different databases) and characteristics 3412, the activity arrays 3420 from each database will be almost exactly the same. This can be determined by comparing the arrays to one another, for example, with a Hamming distance that counts the number of array elements (pixels) that differ. The Hamming distance will be zero if the original documents are exactly the same.

If two or more databases return different documents as their best match for the input features, the activity arrays 3420 can be compared to determine which document "best" matches the input image. An activity array that correctly matches an image patch will contain a cluster of high values approximately centered on the location where the patch occurs. An activity array that incorrectly matches an image patch will contain randomly distributed values. There are many well-known strategies for measuring the dispersion or randomness of an image, such as entropy. Such algorithms can be applied to an activity array 3420 to obtain an indication of the presence of a cluster. For example, the entropy of an activity array 3420 that contains a cluster corresponding to an image patch will differ significantly from the entropy of an activity array 3420 whose values are randomly distributed.
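
One way to obtain such an indicator is the Shannon entropy of the normalized array values, sketched below. Treating the activity array as a probability distribution is an assumption made for illustration; the disclosure names entropy only as one of several possible dispersion measures, and the function name `activity_entropy` is hypothetical.

```python
import math

def activity_entropy(activity):
    """Shannon entropy (in bits) of an activity array's value distribution.

    The array is a list of rows of non-negative counts. Values are
    normalized to sum to 1; zero cells contribute nothing.
    """
    values = [v for row in activity for v in row if v > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in values)

# Votes concentrated in two neighboring cells (a cluster).
clustered = [
    [0, 0, 0, 0],
    [0, 8, 8, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The same number of votes spread evenly over every cell.
scattered = [[1] * 4 for _ in range(4)]
```

Under this measure, a lower entropy indicates that the votes are concentrated in few cells. Note that it does not check whether those cells are spatially adjacent, so a production system would likely pair it with a locality test.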

It is further noted that each client 106 may have access to multiple database systems 3400 at any time, and that the contents of those databases do not necessarily conflict with one another. For example, a corporation could have both publicly accessible patches and patches private to the corporation, each of which refers to a single document. In such a case, a client 106 maintains a list of databases D1, D2, D3, ..., which are consulted in turn, and produces activity arrays 3420 and identifiers 3410 that are combined into a unified display for the user. A given client device 106 might display the patches available from all databases, or it might allow the user to choose a subset of the databases (e.g., only D1, D3, and D7) and display only the patches in those databases. Databases may be added to the list by subscribing to a service, or be made available wirelessly when the client device 106 is in a certain location. A database may also become available because it is one of several that have been loaded onto the client device 106, because a certain user has been authenticated as currently using the device, or even because the device is operating in a certain mode. For example, some databases might become available because a particular client device has its audio speaker turned on or off, or because a peripheral device such as a video projector is currently attached to the client.

Actions
With further reference to FIG. 34A, the MMR database 3400 receives an action together with a set of features from the MMR feature extraction module 3402. Actions specify commands and parameters. In such an embodiment, the command and its parameters determine the patch characteristics 3412 that are returned. Actions can be received, for example, in a format that includes HTTP and that can be easily translated into text.

The action processor 3413 receives the identity of the document and page, and the x-y coordinates within the page, as determined by the evidence accumulation module 3406. It also receives a command and its parameters. The action processor 3413 is programmed or otherwise configured to transform the command into instructions that either retrieve or store data using the relational database 3408 at a location corresponding to the given document, page, and x-y coordinates.

In one such embodiment, the commands include: RETRIEVE, INSERT_TO <DATA>, RETRIEVE_TEXT <RADIUS>, TRANSFER <AMOUNT>, PURCHASE, PRISTINE_PATCH <RADIUS [DOCID PAGEID X Y DPI]>, and ACESS_DATABASE <DBID>. Each will now be discussed in turn.

RETRIEVE retrieves data linked to the x-y coordinates in the given document page. The action processor transforms the RETRIEVE command into a relational database query that retrieves data that might be stored near those x-y coordinates. This can require issuing more than one database query to search the area surrounding the x-y coordinates. The retrieved data is output as patch characteristics 3412. An example application of the RETRIEVE command is a multimedia browsing application that retrieves video clips or dynamic information objects (e.g., electronic addresses from which current information can be retrieved). The retrieved data can include menus that specify subsequent steps to be performed on the MMR device. It can also be static data, such as a JPEG image or a video clip, that can be displayed on a phone (or other display device). Parameters can be provided to the RETRIEVE command that determine the area searched for patch characteristics.

INSERT_TO <DATA> inserts <DATA> at the x-y coordinates specified by the image patch. The action processor 3413 transforms the INSERT_TO command into an instruction for the relational database that adds the data at the specified x-y coordinates. An acknowledgment of the successful completion of the INSERT_TO command is returned as patch characteristics 3412. An example application of the INSERT_TO command is a software application on the MMR device that allows a user to attach data to an arbitrary x-y coordinate in a passage of text. The data can be static multimedia, such as a JPEG image, video clip, or audio file, but it can also be arbitrary electronic data, such as a menu that specifies actions associated with the given location.

RETRIEVE_TEXT <RADIUS> retrieves text within <RADIUS> of the x-y coordinates determined by the image patch. The <RADIUS> can be specified, for example, as a number of pixels in image space, or as a number of characters in the words around the x-y coordinates determined by the evidence accumulation module 3406. <RADIUS> can also refer to parsed text objects. In this particular embodiment, the action processor 3413 transforms the RETRIEVE_TEXT command into a relational database query that retrieves the appropriate text. If <RADIUS> specifies a parsed text object, the action processor returns only that parsed text object. If no parsed text object is found near the specified x-y coordinates, the action processor returns a null indication. In an alternative embodiment, the action processor calls the feedback-directed feature search module to retrieve the text that occurs within the radius of the given x-y coordinates. The text string is returned as patch characteristics 3412. Optional data associated with each word in the text string includes its x-y bounding box in the original document. An example application of the RETRIEVE_TEXT command is choosing text phrases from a printed document for inclusion in another document. This could be used, for example, to compose a presentation file (e.g., in PowerPoint format) on the MMR system.

TRANSFER <AMOUNT> retrieves the entire document, or some of the data linked to it, in a form that can be loaded into another database. <AMOUNT> specifies the amount and type of data that is retrieved. If <AMOUNT> is ALL, the action processor 3413 issues a command to the database 3408 that dumps all the data associated with the document. Examples of such a command include DUMP or Unix (registered trademark) TAR. If <AMOUNT> is SOURCE, the original source file for the document is retrieved; for example, the action processor retrieves the Word file for a printed document. If <AMOUNT> is BITMAP, a JPEG-compressed version (or other commonly used format) of the bitmap for the printed document is retrieved. If <AMOUNT> is PDF, the PDF representation of the document is retrieved. The retrieved data is output as patch characteristics 3412 in a format known to the calling application by virtue of the command name. An example application of the TRANSFER command is a "document grabber" that allows a user to transfer the PDF representation of a document to an MMR device by imaging a small area of text.

PURCHASE retrieves a product specification linked to the x-y coordinates in a document. The action processor 3413 first performs a series of one or more RETRIEVE commands to obtain product specifications near the given x-y coordinates. A product specification includes, for example, a vendor name, the identification of a product (e.g., a stock number), and the electronic address of the vendor. Product specifications are retrieved in preference to other data types that might be located nearby. For example, if a JPEG is stored at the x-y coordinates determined by the image patch, the next closest product specification is retrieved instead. The retrieved product specification is output as patch characteristics 3412. An example application of the PURCHASE command is associated with an advertisement in a printed document. A software application on the MMR device receives the product specification associated with the advertisement and adds the user's personal identifying information (e.g., name, shipping address, credit card number, etc.) before sending it to the specified vendor at the specified electronic address.

PRISTINE_PATCH <RADIUS [DOCID PAGEID X Y DPI]> obtains the electronic representation of the specified document and extracts an image patch of radius RADIUS centered at x-y. RADIUS nominally specifies the radius of a circle, but it may instead specify a rectangular patch (e.g., 2 inches high by 3 inches wide) or the entire document page. The (DocID, PG, x, y) information may be given explicitly as part of the action or derived from an image of a text patch. The action processor 3413 retrieves the original representation of the document from the relational database 3408. That representation may be a bitmap, but it may also be a renderable electronic document. The original representation is passed to the document rendering application 3414, where it is converted to a bitmap (at a resolution that may be given by the DPI parameter, in dots per inch), and then to the sub-image extractor 3416, where the desired patch is extracted. The patch image is returned as patch feature 3412.
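
As a rough illustration of the geometry involved, the sketch below converts a circular patch request into the pixel bounding box that a sub-image extractor could crop from the rendered bitmap. It is only a sketch under stated assumptions: the text does not give the units of X, Y, or RADIUS, so inches are assumed here, and the function name `patch_box` and the default page size are hypothetical.

```python
def patch_box(x_in, y_in, radius_in, dpi, page_w_in=8.5, page_h_in=11.0):
    """Return (left, top, right, bottom) in pixels for a circular patch.

    Assumption: x, y, and radius are in inches (not stated in the text).
    The box is the square that encloses the circle, clipped to the page.
    """
    page_w_px = round(page_w_in * dpi)
    page_h_px = round(page_h_in * dpi)
    left = max(0, round((x_in - radius_in) * dpi))
    top = max(0, round((y_in - radius_in) * dpi))
    right = min(page_w_px, round((x_in + radius_in) * dpi))
    bottom = min(page_h_px, round((y_in + radius_in) * dpi))
    return left, top, right, bottom
```

A rectangular or whole-page request would skip the radius arithmetic and pass explicit bounds instead.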

ACESS_DATABASE <DBID> adds the database 3400 to the database list of the client 106. The client can then query the newly added database in addition to the databases already in its list. DBID specifies either a file or a remote network reference for the specified database.

Index Table Generation Method
FIG. 35 illustrates a method 3500 for generating an MMR index table according to one embodiment of the present invention. The method can be carried out, for example, by the database system 3400 of FIG. 34A. In such an embodiment, the MMR index table is generated from a scanned or printed document by the MMR index table module 3404 (or some other dedicated module). The generating module, like the other modules described herein, can be implemented in software, in hardware (e.g., gate-level logic), in firmware (e.g., a microcontroller configured with embedded routines that carry out the method), or in some combination of these.

The method includes receiving 3510 a paper document. The paper document can be any document, such as a memo of any number of pages (e.g., a business memo, a personal letter), a product label (e.g., canned goods, medicine, boxed electronic devices), a product specification (e.g., snow blower, computer system, manufacturing system), a product brochure or advertising material (e.g., automobile, boat, vacation resort), service description material (e.g., Internet service provider, cleaning service), one or more pages from a book, magazine, or other publication, a page printed from a website, a handwritten note, a note captured and printed from a whiteboard, or a page printed from any processing system (e.g., desktop or portable computer, camera, smartphone, remote terminal).

The method continues with generating 3512 an electronic representation of the paper document, the representation including the x-y locations of features shown in the document. The target features can be, for example, individual words, letters, and/or characters within the document. If the original document is scanned, it is first OCR-processed, and the words (or other target features) and their x-y locations are extracted (e.g., by the document fingerprint matching module 226 of the scanner 127). If the original document is printed, the indexing process receives an exact representation (e.g., through the operation of the printer driver 316 of the printer 116) in XML format of the font, size, and x-y bounding box of every character (or other target feature). In this case, index table generation begins at step 3514, because the electronic document is received with precisely specified x-y feature locations (e.g., from the printer driver 316). Formats other than XML will be apparent in light of this disclosure. Electronic documents such as Microsoft Word, Adobe Acrobat, and PostScript files can be entered into the database by "printing" them to a print driver whose output is directed to a file, so that no paper document need be created. This triggers production of the XML file structure shown below. In all cases, the XML, together with the original document format (Word, Acrobat, PostScript, etc.), is assigned an identifier (doc i for the i-th document added to the database) and stored in the relational database 3408 in a way that allows it to be retrieved later by that identifier, as well as by other "metadata" characteristics of the document, including the time it was captured, the date it was printed, the application that triggered the print, the output file name, and so on.

A specific example of the XML file structure is as follows:

In one particular embodiment, a word may contain any characters such as a-z, A-Z, 0-9, and @%$#; all other characters are delimiters. The original description of the .xml file can be produced by the print capture software used in the indexing process (e.g., running on a server, such as the database 320 server). The actual format evolves continually and gains more elements as new documents are captured by the system.
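
The word rule above can be sketched with a regular expression. This is a minimal illustration, assuming that the word characters are exactly a-z, A-Z, 0-9, and @%$# (the text says "such as," so the real set may be larger); the names `WORD_RE` and `split_words` are hypothetical.

```python
import re

# Assumed word alphabet: letters, digits, and the symbols @ % $ #.
# Every other character acts as a delimiter.
WORD_RE = re.compile(r"[A-Za-z0-9@%$#]+")

def split_words(text):
    """Split text into words, treating all non-word characters as delimiters."""
    return WORD_RE.findall(text)
```

For example, `split_words("Send $5 to bob@ricoh.com, now!")` keeps `$5` and `bob@ricoh` intact but splits at the period and the comma.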

The original sequence of text received by the print driver (e.g., print driver 316) is preserved, and a logical word structure is imposed based on punctuation marks, with the exception of "_@&$#". Taking the XML file as input, the index table module 3404 respects page boundaries and first tries to group sequences into logical lines by checking the amount of overlap between two consecutive sequences. In one particular embodiment, the heuristic is that a line break has occurred if two sequences overlap by less than half of their average height. Such a heuristic works well for typical text documents (e.g., Microsoft Word documents). For HTML pages with complex layouts, additional geometric analysis may be needed. However, it is not necessary to recover a perfect semantic document structure, as long as the query process can generate consistent indexing terms.
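
The line-break heuristic can be sketched as follows. This is an illustrative reading, not the module's actual code: it assumes that "overlap" means vertical overlap of the bounding boxes, that y grows downward, and that each sequence carries a page number; the `Seq` record and `group_into_lines` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record for one text sequence from the print driver.
Seq = namedtuple("Seq", "text page top bottom")

def group_into_lines(seqs):
    """Group consecutive sequences into logical lines.

    A new line starts at a page boundary, or when two consecutive
    sequences overlap vertically by less than half their average height.
    """
    lines = []
    for seq in seqs:
        if lines:
            prev = lines[-1][-1]
            overlap = min(prev.bottom, seq.bottom) - max(prev.top, seq.top)
            avg_height = ((prev.bottom - prev.top) + (seq.bottom - seq.top)) / 2
            if seq.page == prev.page and overlap >= avg_height / 2:
                lines[-1].append(seq)
                continue
        lines.append([seq])
    return lines
```

Comparing each sequence only with its immediate predecessor keeps the grouping to a single pass, which matches the "two consecutive sequences" wording.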

Based on the structure of the electronic representation of the paper document, the method continues with indexing 3514 the location of every target feature on every page of the paper document. In one particular embodiment, this step includes indexing the locations of all pairs of horizontally and vertically adjacent words on every page of the paper document. As explained above, horizontally adjacent words are pairs of neighboring words within a line. Vertically adjacent words are words in vertically adjacent lines. Other multi-dimensional aspects of a page can be exploited in the same way.

The method also includes storing 3516 patch features associated with each target feature. In one particular embodiment, the patch features include actions attached to the patch and are stored in a relational database. As explained above, combining such an image index with a storage facility makes it possible to locate objects that match both an image patch and the features of that patch. The features can be any data related to the patch, such as metadata. They can include, for example, actions that carry out a specific function, links that can be selected to provide access to other content related to the patch, and/or bar codes that can be scanned or otherwise processed to cause retrieval of other content related to the patch.

A more precise definition is given for search term generation, where only a fragment of the line structure is observed. For a horizontally adjacent pair, a query term is formed by concatenating the two words with a "-" separator. Vertical pairs are concatenated using a "+". The words can be used in their original form, preserving capitalization if desired (this creates more distinctive terms, but it also produces a larger index and requires additional query handling to account for such subtleties). This indexing scheme allows the same search strategy to be applied to horizontal word pairs, vertical word pairs, or a combination of both. The discriminating power of a term is accounted for in each case by its inverse document frequency.
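
The term construction can be sketched as below. The "-" and "+" separators come from the text; everything else is an assumption made for illustration: the `Word` record, the function names, and in particular the test for vertical adjacency (two words in adjacent lines are paired when their horizontal extents overlap), which the text does not define.

```python
from collections import namedtuple

# Hypothetical word record: text plus left and right x coordinates.
Word = namedtuple("Word", "text x0 x1")

def horizontal_terms(line, keep_case=True):
    """Join each pair of neighboring words in a line with '-'."""
    texts = [w.text if keep_case else w.text.lower() for w in line]
    return [f"{a}-{b}" for a, b in zip(texts, texts[1:])]

def vertical_terms(upper, lower, keep_case=True):
    """Join words from two adjacent lines with '+'.

    Assumption: words are vertically adjacent when their x ranges overlap.
    """
    fold = (lambda s: s) if keep_case else str.lower
    return [f"{fold(u.text)}+{fold(v.text)}"
            for u in upper for v in lower
            if u.x0 < v.x1 and v.x0 < u.x1]
```

Because indexing and querying would call the same functions, a scanned patch that sees only part of a line still yields terms that match the index.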

Evidence Accumulation Method
FIG. 36 illustrates a method 3600, according to one embodiment of the present invention, for computing a ranked set of document, page, and location hypotheses for a target document. The method can be carried out, for example, by the database system 3400 of FIG. 34A. In such an embodiment, the evidence accumulation module 3406 computes the hypotheses using data from the index table module 3404, as described above.

The method begins with receiving 3610 a target document image, such as an image patch of a larger document image or an entire document image. The method continues with generating 3612 one or more query terms that capture the two-dimensional relationships between objects in the target document image. In one particular embodiment, the query terms are generated by a feature extraction process that produces horizontal and vertical word pairs, as described with reference to FIG. 34B. However, as will be apparent in light of this disclosure, any of the feature extraction processes described herein can be used to generate query terms that capture the two-dimensional relationships between objects in the target document image. For example, the same feature extraction technique used to build the index of method 3500 can be used to generate the query terms, as described with reference to step 3512 (generating an electronic representation of the paper document). Note also that the two-dimensional aspect of the query terms can apply to each query term individually (e.g., a first query term that is a horizontal word pair and a second query term that is a vertical word pair) or to a set of search terms (e.g., a single query term that represents both horizontal and vertical objects in the target document).

The method continues with looking up 3614 each query term in the term index table 3422 to retrieve a list of locations associated with that query term. For each location, the method continues with generating 3616 a number of regions that contain the location. After all the queries have been processed, the method includes identifying 3618 the region that is most consistent with all the query terms. In one such embodiment, the score of every candidate region is incremented by a weight (e.g., based on how consistent the region is with all the query terms). The method continues with determining 3620 whether the identified region satisfies a predefined match criterion (e.g., based on a predefined match threshold). If so, the method continues with confirming 3622 the region as a match to the target document image (e.g., the page most likely to contain the region can be accessed or otherwise used). Otherwise, the method continues with rejecting 3624 the region.

Word pairs are stored in the term index table 3422 together with their locations in a "normalized" coordinate space. This provides uniformity across different printer and scanner resolutions. In one particular embodiment, an 85×110 coordinate space is used for 8.5" × 11" pages. In that case, every word pair is identified by its location in this 85×110 space.
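
A small sketch of the normalization follows, assuming integer pixel coordinates and ten cells per inch (which is what an 85×110 grid over an 8.5" × 11" page implies). The function name `to_cell` and the choice to clip to 0-based cell indices are assumptions made for illustration.

```python
GRID_W, GRID_H = 85, 110   # normalized cells for an 8.5" x 11" page
CELLS_PER_INCH = 10        # implied by 85 cells across 8.5 inches

def to_cell(x_px, y_px, dpi):
    """Map integer pixel coordinates at a given resolution to a grid cell.

    Integer arithmetic avoids rounding surprises; results are clipped so
    that they are valid 0-based indices into an 85 x 110 array.
    """
    cx = min(GRID_W - 1, max(0, x_px * CELLS_PER_INCH // dpi))
    cy = min(GRID_H - 1, max(0, y_px * CELLS_PER_INCH // dpi))
    return cx, cy
```

The same physical point maps to the same cell whether it was captured at 300 dpi or 600 dpi, which is the uniformity the paragraph describes.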

To improve search efficiency, a two-step process can be performed. The first step finds the page most likely to contain the input image patch. The second step computes the x-y location within that page most likely to be the center of the patch. Such an approach carries the risk that the true best match is missed in the first step. With a sparse indexing space, however, that possibility is rare. Thus, depending on the size of the index and the desired performance, this efficiency improvement can be employed.

In one such embodiment, the following algorithm is used to find the page most likely to contain the word pairs detected in the input image patch.

This technique adds the inverse document frequency (idf) of each word pair to an accumulator indexed by the documents and pages on which the word pair occurs. num_docs(wp) returns the number of documents that contain the word pair wp. The accumulator is implemented by the evidence accumulation module 3406. If the maximum value in the accumulator exceeds a threshold, the corresponding page is output as the page that best matches the patch. The algorithm thus identifies the page that best matches the word pairs in the query. Alternatively, the Accum array can be sorted and the top N pages reported as the "N best" pages that match the input image.
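
The page-finding step described above can be sketched as follows. This is not the listing from the original filing, which is not reproduced here; it is a minimal reconstruction from the prose. The index layout (a dictionary from word pair to a list of (doc, page) postings), the function names, and the idf formula 1/num_docs(wp) are all assumptions; the text names `num_docs` and the Accum array but does not give the exact weighting.

```python
from collections import defaultdict

def accumulate_pages(word_pairs, index):
    """Add each word pair's idf to Accum[(doc, page)] wherever it occurs."""
    accum = defaultdict(float)
    for wp in word_pairs:
        postings = set(index.get(wp, []))
        if not postings:
            continue
        num_docs = len({doc for doc, _page in postings})
        idf = 1.0 / num_docs  # assumed form of the inverse document frequency
        for doc_page in postings:
            accum[doc_page] += idf
    return accum

def best_page(word_pairs, index, threshold):
    """Return the top-scoring (doc, page), or None if it does not beat the threshold."""
    accum = accumulate_pages(word_pairs, index)
    if not accum:
        return None
    top = max(accum, key=accum.get)
    return top if accum[top] > threshold else None

def n_best_pages(word_pairs, index, n):
    """Alternative: sort the accumulator and report the top n pages."""
    accum = accumulate_pages(word_pairs, index)
    return sorted(accum, key=accum.get, reverse=True)[:n]
```

Word pairs that occur in few documents contribute more to a page's score, so a handful of distinctive pairs can outweigh many common ones.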

The following evidence accumulation algorithm accumulates evidence for the location of the input image patch within a single page, according to one embodiment of the present invention.

The algorithm operates to identify the cell in the 85×110 space that is most likely to be the center of the input image patch. In the example shown here, the algorithm does so by adding weights to the cells in a fixed region (also called a zone) around each word pair.

The extent function is given an x,y pair and returns the minimum and maximum values of a surrounding fixed-size region (1.5" high and 2" wide is typical). The extent function handles boundary conditions and ensures that the values it returns do not fall outside the accumulator (i.e., less than zero, or greater than 85 in x or 110 in y). The maxdist function finds the maximum Euclidean distance between two points in the bounding box described by the bounding box coordinates (minx, maxx, miny, maxy). For each cell in the zone, a weight is computed as the product of the inverse document frequency of the word pair and the normalized geometric distance between the cell and the center of the zone. This weights cells nearer the center more heavily than cells farther away. After all the word pairs have been processed by the algorithm, the Accum2 array is searched for the cell with the maximum value. If that value exceeds a threshold, the cell's coordinates are reported as the location of the image patch. The Activity array stores the accumulated norm_dist values. Because they are not scaled by idf, they do not take into account the number of documents in the database that contain a particular word pair. They do, however, provide a two-dimensional image representation of the x-y locations that best match a given set of word pairs. Furthermore, the entries in the Activity array are independent of the documents stored in the database. This data structure, which is normally used internally, can also be exported 3420.

The normalized geometric distance is calculated as shown below, according to one embodiment of the present invention.

The Euclidean distance between the location of the word pair and the center of the zone is calculated, and the difference between this Euclidean distance and the maximum distance that could have been calculated is returned.

After the word pairs have been processed by the evidence accumulation algorithm, the Accum2 array is searched for the cell with the maximum value. If that value exceeds a predefined threshold, the cell's coordinates are reported as the location of the center of the image patch.
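
The location step can be sketched as below. Again this is a reconstruction from the prose rather than the original listing. The names `extent`, `maxdist`, `norm_dist`, Accum2, and Activity come from the text; the zone size in cells (20 × 15, i.e. 2" × 1.5" at ten cells per inch), the 0-based clipping, the dictionary-based accumulators, and the `locate_patch` signature are assumptions. Following the description, `norm_dist` returns the maximum distance minus the Euclidean distance from the cell to the zone center.

```python
import math
from collections import defaultdict

GRID_W, GRID_H = 85, 110   # normalized page grid
ZONE_W, ZONE_H = 20, 15    # assumed zone: 2" wide x 1.5" high at 10 cells per inch

def extent(x, y):
    """Bounds (minx, maxx, miny, maxy) of the zone around (x, y), clipped to the grid."""
    return (max(0, x - ZONE_W // 2), min(GRID_W - 1, x + ZONE_W // 2),
            max(0, y - ZONE_H // 2), min(GRID_H - 1, y + ZONE_H // 2))

def maxdist(minx, maxx, miny, maxy):
    """Largest Euclidean distance between two points of the bounding box."""
    return math.hypot(maxx - minx, maxy - miny)

def norm_dist(cx, cy, x, y, max_d):
    """Maximum distance minus the distance from cell (cx, cy) to center (x, y)."""
    return max_d - math.hypot(cx - x, cy - y)

def locate_patch(word_pairs, locations, idf, threshold):
    """Return (best cell or None, Activity) for word pairs found on one page."""
    accum2 = defaultdict(float)    # idf-weighted evidence per cell
    activity = defaultdict(float)  # unweighted norm_dist evidence per cell
    for wp in word_pairs:
        for x, y in locations.get(wp, []):
            minx, maxx, miny, maxy = extent(x, y)
            max_d = maxdist(minx, maxx, miny, maxy)
            for cy in range(miny, maxy + 1):
                for cx in range(minx, maxx + 1):
                    nd = norm_dist(cx, cy, x, y, max_d)
                    activity[(cx, cy)] += nd
                    accum2[(cx, cy)] += idf[wp] * nd
    if not accum2:
        return None, activity
    best = max(accum2, key=accum2.get)
    return (best if accum2[best] > threshold else None), activity
```

Keeping Activity separate from Accum2 mirrors the text: Activity depends only on where the word pairs fall on the page, while Accum2 also reflects how rare each pair is in the database.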

MMR Printing Architecture
FIG. 37A shows a functional block diagram of MMR components according to one embodiment of the present invention. The main MMR components include a computer 3705 with an associated printer 116 and/or a shared document annotation (SDA) server 3755.

The computer 3705 is any standard desktop, laptop, or networked computer known in the art. In one embodiment, the computer is the MMR computer 112 described with reference to FIG. 1B. The user printer 116 is any standard home, office, or commercial printer known in the art. The user printer 116 produces a printed document 118, which is a paper document made up of one or more pages.

The SDA server 3755 is a standard networked or centralized computer that holds various files, applications, and/or information associated with shared annotation. For example, shared annotations associated with web pages or other documents are stored on the SDA server 3755. In this example, the annotations are the data or instructions used by the MMR system described above. In one embodiment, the SDA server 3755 is accessible over a network connection. In one embodiment, the SDA server 3755 is the networked media server 114 described with reference to FIG. 1B.

The computer 3705 includes a variety of components, some or all of which are optional depending on the embodiment. In one embodiment, the computer 3705 includes a source file 3710, a browser 3715, a plug-in 3720, a symbolic hotspot description 3725, a modified file 3730, a capture module 3735, page_desc.xml 3740, hotspot.xml 3745, a data store 3750, the SDA server 3755, and MMR printing software 3760.

The source file 3710 represents any source file that is an electronic representation of a document. Examples of the source file 3710 include hypertext markup language (HTML) files, Microsoft® Word® files, Microsoft® PowerPoint® files, simple text files, portable document format (PDF) files, and the like. As described above, documents received by the browser 3715 often originate from the source file 3710. In one embodiment, the source file 3710 is equivalent to the source file 310 described with reference to FIG. 3.

The browser 3715 is an application that provides access to the data associated with the source file 3710. For example, the browser 3715 may be used to retrieve web pages and/or documents from the source file 3710. In one embodiment, the browser 3715 is the SD browser 312, 314 described with reference to FIG. 3. In one embodiment, the browser 3715 is an Internet browser such as Internet Explorer.

The plug-in 3720 is a software application that provides an authoring function. The plug-in 3720 may be a standalone software application or a plug-in that runs in the browser 3715. In one embodiment, the plug-in 3720 is a computer program that interacts with an application, such as the browser 3715, to provide the specific functionality described herein. According to various embodiments, the plug-in 3720 performs transformations and other modifications on the web pages and documents displayed by the browser 3715. For example, the plug-in 3720 creates a hotspot by surrounding a hotspot designation with individually identifiable fiducial marks, returns a "marked-up" version of an HTML file to the browser 3715, applies a transformation rule to a portion of a document displayed in the browser 3715, and retrieves and/or receives shared annotations for documents displayed in the browser 3715. In addition, the plug-in 3720 may perform other functions, such as creating a modified document and creating the symbolic hotspot description 3725, as described herein. In conjunction with the capture module 3735, the plug-in 3720 supports the methods described with reference to FIGS. 38, 44, 45, 48, and 50A-B.

The symbolic hotspot description 3725 is a file that identifies the hotspots within a document. It specifies the hotspot number and content. In this example, the symbolic hotspot description 3725 is stored in the data store 3750. An example of a symbolic hotspot description is shown in more detail in FIG. 41.

The modified file 3730 is a document or web page created as a result of the modifications and transformations that the plug-in 3720 applies to the source file 3710. For example, the marked-up HTML file described above is one example of a modified file 3730. As will be apparent in light of this disclosure, the modified file 3730 is, in some cases, returned to the browser 3715 for display to the user.

The capture module 3735 is a software application that performs feature extraction and/or coordinate capture on the printed representation of a document, so that the layout of characters and graphics on the printed pages can be retrieved. The layout, that is, the two-dimensional arrangement of text on the printed page, may be captured automatically at the time of printing. For example, the capture module 3735 executes all of the text and drawing print commands and, in addition, intercepts and records the x-y coordinates and other characteristics of every character and/or image in the printed representation. According to one embodiment, the capture module 3735 is a print capture DLL as described herein, that is, a forwarding dynamically linked library (DLL) that allows additions or modifications to the functionality of an existing DLL. A more detailed description of the functionality of the capture module 3735 is given in conjunction with FIG. 44.

Those skilled in the art will recognize that the capture module 3735 is coupled to the output of the browser 3715 in order to capture data. Alternatively, the functions of the capture module 3735 may be implemented directly in a printer driver. In one embodiment, the capture module 3735 is equivalent to the PD capture module described with reference to FIG. 3.

page_desc.xml 3740 is an extensible markup language (XML) file to which text-related output is written for the text-related function calls processed by the capture module 3735. page_desc.xml 3740 includes coordinate information for the document for all printed text, by word and by character, as well as hotspot information, the printer port name, the browser name, the date and time of printing, and dots-per-inch (dpi) and resolution (res) information. page_desc.xml 3740 is stored, for example, in the data store 3750. The data store 3750 is equivalent to the MMR database 3400 described with reference to FIG. 34A. FIGS. 42A-B show an example of page_desc.xml 3740 for an HTML file in more detail.

hotspot.xml 3745 is an XML file created when a document is printed (e.g., through the operation of the print driver 316, as described above). hotspot.xml is the result of merging the symbolic hotspot description 3725 with page_desc.xml 3740. hotspot.xml includes hotspot identifier information such as the hotspot number, coordinate information, dimension information, and the content of the hotspot. An example of a hotspot.xml file is shown in FIG. 43.

The data store 3750 is any database known in the art for storing files, modified as needed to support the methods described herein. For example, according to one embodiment, the data store 3750 stores the source file 3710, the symbolic hot spot description 3725, page_desc.xml 3740, rendered page layouts, shared annotations, imaged documents, hot spot definitions, feature representations, and the like. In one embodiment, the data store 3750 is equivalent to the document event database 320 described with reference to FIG. 3 and to the database system 3400 described with reference to FIG. 34A.

The MMR printing software 3760 is software that supports the MMR printing processes described herein and is executed, for example, by components of the computer 3705 described above. The MMR printing software 3760 is described in further detail below with reference to FIG. 37B.

FIG. 37B shows a set of software components included in the MMR printing software according to one embodiment of the present invention. It should be understood that all or part of the MMR printing software 3760 may be included in the computers 112, 905, the capture device 106, the networked media server 114, and other servers as described above. Although the MMR printing software 3760 is described here as including all of these components, those skilled in the art will recognize that it may include any number of them, from one to all. The MMR printing software 3760 includes a conversion module 3765, an embed module 3768, a parse module 3770, a transformation module 3775, a feature extraction module 3778, an annotation module 3780, a hot spot module 3785, a render/display module 3790, and a storage module 3795.

The conversion module 3765 enables conversion of a source document into an imaged document from which a feature representation can be extracted, and is one means for doing so.

The embed module 3768 enables embedding of marks corresponding to the indication of a hot spot in an electronic document, and is one means for doing so. In one particular embodiment, the embedded marks indicate the beginning point of the hot spot and the ending point of the hot spot. Alternatively, a predetermined area around an embedded mark may be used to identify a hot spot in the electronic document. Various such marking schemes can be used.

The parse module 3770 enables parsing of an electronic document (that has been sent to the printer) for a mark indicating the beginning point of a hot spot, and is one means for doing so.

The transformation module 3775 enables application of a transformation rule to a portion of an electronic document, and is one means for doing so. In one particular embodiment, the portion is the stream of characters between a mark indicating the beginning point of a hot spot and a mark indicating the ending point of the hot spot.

The feature extraction module 3778 enables extraction of features and capture of coordinates corresponding to a printed representation of a document and a hot spot, and is one means for doing so. Coordinate capture includes tapping print commands using a forwarding dynamic link library and parsing the printed representation for the subset of the coordinates that corresponds to a hot spot or to transformed characters. According to one embodiment, the feature extraction module 3778 enables the functionality of the capture module 3735.

The annotation module 3780 enables receipt of a shared annotation and of an accompanying indication of the portion of a document associated with the shared annotation, and is one means for doing so. Receiving shared annotations includes receiving annotations from an end user and from an SDA server.

The hot spot module 3785 enables association of one or more clips with one or more hot spots, and is one means for doing so. The hot spot module 3785 also enables creation of a hot spot definition by first designating the location of a hot spot within a document and then defining the clip associated with that hot spot.

The render/display module 3790 enables a document, or a printed representation of a document, to be rendered or displayed, and is one means for doing so.

The storage module 3795 enables storage of various files, including page layouts, imaged documents, hot spot definitions, and feature representations, and is one means for doing so.

The software portions 3765-3795 need not be discrete software modules. The software configuration shown is merely an example; other configurations are contemplated within the scope of the present invention, as will be apparent from this disclosure.

Embedding Hot Spots in a Document
FIG. 38 shows a flowchart of a method of embedding a hot spot in a document according to one embodiment of the present invention.

According to the method, marks associated with an indication of a hot spot in a document are embedded in the document (3810). In one embodiment, a document containing a hot spot indication is received for display in a browser; for example, a document is received at the browser 3715 from the source file 3710. A hot spot includes some text or another document object, such as a graphic or photograph, together with electronic data. The electronic data may include multimedia such as audio or video, or may be a set of steps to be performed on the capture device when the hot spot is accessed. For example, if the document is a HyperText Markup Language (HTML) file, the browser may be Internet Explorer and the indication may be a Uniform Resource Locator (URL) in the HTML file. FIG. 39A shows an example of such an HTML file 3910 with a URL 3920. FIG. 40A shows the text of the HTML file 3910 of FIG. 39A as displayed in a browser 4010 such as Internet Explorer.

To embed the marks (3810), the plug-in 3720 of the browser 3715 surrounds each hot spot indication with individually distinguishable fiducial marks, thereby creating the hot spot. In one embodiment, the plug-in 3720 modifies the document displayed in the browser 3715 (continuing the example above, the HTML displayed in Internet Explorer) and inserts marks, or tags, that bracket the hot spot indication (for example, the URL). The marks may be imperceptible to an end user viewing the document in the browser 3715 or in a printed version of the document, but they can be detected in the print commands. In this example, a new font, referred to as MMR Courier New, is used to add the beginning and ending fiducial marks. In the MMR Courier New font, the usual glyphs or dot patterns for the characters "b" and "e" and for the digits are rendered as blank space.

Continuing with the example HTML page shown in FIGS. 39A and 40A, the plug-in 3720 embeds (3810) the fiducial mark "b0" at the beginning of the URL ("here") and the fiducial mark "e0" at the end of the URL, marking the hot spot with the identifier "0". Because the characters b and e and the digits are rendered as spaces, the user sees little or no change in the displayed document. The plug-in 3720 also creates a symbolic hot spot description 3725 that records these marks, as shown in FIG. 41. The symbolic hot spot description 3725 identifies the hot spot number as 0, which corresponds to the 0 in the "b0" and "e0" fiducial marks. In this example, the symbolic hot spot description 3725 is stored, for example, in the data store 3750.

As shown in FIG. 39B, the plug-in 3720 returns a marked-up version 3950 of the HTML to the browser 3715. The marked-up HTML 3950 encloses the fiducial marks in span tags 3960, which change the font to 1-point MMR Courier New. Because the characters b and e and the digits are rendered as spaces, the user sees little or no change in the displayed document. The marked-up HTML 3950 is one example of the modified file 3730. This example uses a single-page model for simplicity, but a multi-page model may use the same parameters. For example, if a hot spot spans a page boundary, it has fiducial marks corresponding to its location on each page, and the hot spot identifier is the same on each.
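The marking step can be illustrated with a minimal Python sketch. It is not the plug-in's actual code: the function name `embed_fiducial_marks`, the regular expression, and the choice to place the marks just inside the anchor element are assumptions made for illustration; only the "b<n>"/"e<n>" marks, the span wrapper, and the 1-point MMR Courier New font come from the description above.

```python
import re

# Span wrapper that switches to the 1-point fiducial font named in the text.
SPAN = '<span style="font-family:\'MMR Courier New\';font-size:1pt">{mark}</span>'

# Hypothetical pattern: a simple <a href="..."> ... </a> element.
ANCHOR = re.compile(
    r'(<a\b[^>]*\bhref="([^"]*)"[^>]*>)(.*?)(</a>)',
    re.IGNORECASE | re.DOTALL,
)

def embed_fiducial_marks(html):
    """Bracket the text of each link with b<n>/e<n> marks.

    Returns the marked-up HTML and a list of (hotspot_id, url) pairs,
    which stands in for the symbolic hot spot description.
    """
    hotspots = []

    def wrap(match):
        hotspot_id = len(hotspots)          # identifiers are assigned in order: 0, 1, ...
        hotspots.append((hotspot_id, match.group(2)))
        begin = SPAN.format(mark=f"b{hotspot_id}")
        end = SPAN.format(mark=f"e{hotspot_id}")
        return f"{match.group(1)}{begin}{match.group(3)}{end}{match.group(4)}"

    return ANCHOR.sub(wrap, html), hotspots
```

Calling `embed_fiducial_marks` on a page with one link yields one hot spot with identifier 0 and the link text bracketed by the two spans.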

Next, in response to a print command, coordinates corresponding to the printed representation and to the hot spot are captured (3820). In one embodiment, the capture module 3735 "taps" the text and drawing commands within the print command. The capture module 3735 executes all of the text and drawing commands and, in addition, intercepts and records the x-y coordinates and other characteristics of every character and/or image in the printed representation. In this example, the capture module 3735 references the device context (DC) for the printed representation. The DC is a handle to the structure of the printed representation, and that structure defines the attributes of the text and/or images to be output according to the output format (for example, printer, window, file format, or memory buffer). During the coordinate capture process (3820), hot spots are easily identified using the fiducial marks embedded in the HTML. For example, when a begin mark is encountered, the x-y coordinates of all characters are recorded until the matching end mark is found.
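The begin-mark/end-mark scan described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the capture module itself: the `Glyph` record, its `is_mark` flag (set when text is drawn in the fiducial font), and the function name are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    text: str              # character(s) drawn by one text command
    x: int
    y: int
    width: int
    height: int
    is_mark: bool = False  # assumed flag: drawn in the fiducial-mark font

def capture_hotspot_glyphs(glyphs):
    """Return {hotspot_id: [Glyph, ...]} for glyphs lying between b<n> and e<n>."""
    open_ids = set()
    captured = {}
    for g in glyphs:
        if g.is_mark and g.text[:1] == "b" and g.text[1:].isdigit():
            hotspot_id = int(g.text[1:])
            open_ids.add(hotspot_id)
            # setdefault keeps earlier glyphs if the same id reopens on a later page
            captured.setdefault(hotspot_id, [])
        elif g.is_mark and g.text[:1] == "e" and g.text[1:].isdigit():
            open_ids.discard(int(g.text[1:]))
        elif not g.is_mark:
            for hotspot_id in open_ids:
                captured[hotspot_id].append(g)
    return captured
```

Glyphs outside any begin/end pair are ignored, so only the characters that belong to a hot spot contribute coordinates.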

In one embodiment, the capture module 3735 is a forwarding DLL, referred to as the "print capture DLL", which allows functionality to be added to, or modified in, an existing DLL. A forwarding DLL appears to the client exactly like the original DLL, but additional code (a "tap") is added to some or all of the functions before the call is forwarded to the target (original) DLL. In this example, the print capture DLL is a forwarding DLL for the Windows Graphics Device Interface (Windows (registered trademark) GDI) DLL, gdi32.dll. gdi32.dll has over 600 exported functions, all of which need to be forwarded. The print capture DLL, referred to as gdi32_mmr.dll, allows the client to capture printouts from any Windows application that uses gdi32.dll for drawing, and it only needs to run on the local computer, even when printing to a remote server.

According to one embodiment, gdi32_mmr.dll is renamed gdi32.dll and copied into C:\Windows\system32, which causes it to monitor printing from nearly every Windows application. In another embodiment, gdi32_mmr.dll is renamed gdi32.dll and copied into the home directory of a particular application so that printing is monitored for that application only. For example, copying it into C:\Program Files\Internet Explorer monitors Internet Explorer on Windows XP. In this example, only that application (for example, Internet Explorer) automatically calls the functions in the print capture DLL.

FIG. 44 shows a flowchart of the process used by the forwarding DLL according to one embodiment of the present invention. The print capture DLL, gdi32_mmr.dll, first receives a function call directed to gdi32.dll (4405). In one embodiment, gdi32_mmr.dll receives all function calls directed to gdi32.dll. gdi32_mmr.dll monitors approximately 200 of the roughly 600 function calls in total, namely the functions that affect, in some way, the appearance of the printed page. The print capture DLL then determines whether the received call is a monitored function call. If it is not, the call bypasses steps 4415 through 4435 and is forwarded to gdi32.dll (4440).

If the received call is a monitored function call, the method next determines whether the function call specifies a "new" printer device context (DC), that is, a printer DC that has not been received before. This is determined by checking the printer DC against an internal DC table. As described above, a DC encapsulates not only drawing settings such as font and color but also the target to be drawn to (which may be a printer, a memory buffer, and so on). All drawing operations (for example, LineTo(), DrawText(), and so on) are performed on a DC. If the printer DC is not new, a memory buffer corresponding to that printer DC already exists, and step 4420 is skipped. If the printer DC is new, a memory buffer DC corresponding to the new printer DC is created. This memory buffer DC mirrors the appearance of the printed page and, in this example, is equivalent to the printed representation referred to above. Thus, when a printer DC is added to the internal DC table, a memory buffer DC (and memory buffer) of the same dimensions is created and associated with the printer DC in the internal DC table.

gdi32_mmr.dll next determines whether the call is a text-related function call (4425). Approximately 12 of the 200 monitored gdi32.dll calls are text related. If the call is not text related, step 4430 is skipped. If the function call is text related, the text-related output is written to an XML file (4430), referred to as page_desc.xml 3740 and shown in FIG. 37A.

FIGS. 42A and 42B show an example of page_desc.xml 3740 for the example HTML file 3910 described with reference to FIGS. 39A and 40A. page_desc.xml 3740 contains coordinate information for all printed text, by word 4210 (for example, "Get"), giving x, y, width, and height, and by character 4220 (for example, "G"). All coordinates are in dots, which are the printer's equivalent of pixels, and are relative to the upper-left corner of the page unless otherwise specified. page_desc.xml 3740 also contains hot spot information, such as the begin mark 4230 and the end mark 4240, in the form of a "sequence". For a hot spot that spans a page boundary (for example, from page N to page N+1), the hot spot appears on both pages (N and N+1), and the hot spot identifier is the same in both cases. In addition, page_desc.xml 3740 includes other information, such as the printer port name 4250 (which can have a significant effect on the .xml and .jpeg files produced), the name 4260 of the browser 3715 (or application), and the date and time of printing 4270, as well as the dots per inch (dpi) and resolution (res) for the page 4280 and the printable area 4290.

Referring again to FIG. 44, after determining that the call is not text related, or after writing the text-related output to page_desc.xml 3740, gdi32_mmr.dll executes the function call on the memory buffer for the DC (4435). This step 4435 allows the output sent to the printer to be output to a memory buffer on the local computer as well. Then, when the page is incremented, the contents of the memory buffer are compressed and written out in JPEG and PNG format. The function call is then forwarded to gdi32.dll (4440), which executes it as it normally would.
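The control flow of FIG. 44 can be summarized in a short Python model. This is only a sketch of the decision sequence, not a forwarding DLL: the class name, the contents of `MONITORED` and `TEXT_RELATED`, and the use of plain lists in place of a memory buffer DC and page_desc.xml are assumptions for illustration. The source does not list which GDI functions are monitored.

```python
# Assumed stand-ins for the ~200 monitored calls and ~12 text-related calls.
MONITORED = {"TextOutW", "LineTo", "StartPage"}
TEXT_RELATED = {"TextOutW"}

class PrintCaptureSketch:
    def __init__(self, forward):
        self.forward = forward   # callable standing in for the original gdi32.dll
        self.dc_table = {}       # printer DC -> list standing in for its memory buffer
        self.text_log = []       # stands in for page_desc.xml

    def call(self, name, dc, *args):
        """Handle one function call received from the application (4405)."""
        if name in MONITORED:
            if dc not in self.dc_table:
                # New printer DC: create its mirror buffer (step 4420).
                self.dc_table[dc] = []
            if name in TEXT_RELATED:                  # 4425
                self.text_log.append((name, args))    # 4430: record text output
            self.dc_table[dc].append((name, args))    # 4435: replay on the buffer
        # Monitored or not, the call always reaches the original DLL (4440).
        return self.forward(name, dc, *args)
```

The key property the sketch preserves is that every call is forwarded unchanged, so the application's printing behaves as it normally would while the tap records a copy.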

Referring again to FIG. 38, a page layout comprising the printed representation, including the hot spot, is rendered (3830). In one embodiment, the rendering 3830 includes printing the document. FIG. 40B shows an example of a printed version 4011 of the HTML file 3910 of FIGS. 39A and 40A. Note that the fiducial marks are not visually perceptible to the end user. The rendered layout is saved, for example, in the data store 3750.

According to one embodiment, the print capture DLL merges the data in the symbolic hot spot description 3725 and in page_desc.xml 3740, for example as shown in FIGS. 42A-B, into hotspot.xml 3745, as shown in FIG. 43. In this example, hotspot.xml 3745 is created when the document is printed. The example of FIG. 43 shows that hot spot 0 is located at x=1303, y=350 and is 190 pixels wide and 71 pixels high. The content of the hot spot is shown as http://www.richo.com.
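One plausible way to perform such a merge is to take the bounding box of the character boxes captured for each hot spot and attach the content from the symbolic description. The Python sketch below does this; the element and attribute names (`hotspots`, `hotspot`, `id`, `x`, `y`, `width`, `height`) and both function names are hypothetical, since the actual schema of FIG. 43 is not reproduced in this text.

```python
import xml.etree.ElementTree as ET

def bounding_box(boxes):
    """Smallest (x, y, width, height) rectangle enclosing all character boxes."""
    left = min(x for x, y, w, h in boxes)
    top = min(y for x, y, w, h in boxes)
    right = max(x + w for x, y, w, h in boxes)
    bottom = max(y + h for x, y, w, h in boxes)
    return left, top, right - left, bottom - top

def merge_hotspots(symbolic, glyph_boxes):
    """Merge {id: content} with {id: [(x, y, w, h), ...]} into an XML string."""
    root = ET.Element("hotspots", count=str(len(symbolic)))
    for hotspot_id in sorted(symbolic):
        x, y, w, h = bounding_box(glyph_boxes[hotspot_id])
        node = ET.SubElement(
            root, "hotspot",
            id=str(hotspot_id), x=str(x), y=str(y), width=str(w), height=str(h),
        )
        node.text = symbolic[hotspot_id]   # e.g. the URL the hot spot points to
    return ET.tostring(root, encoding="unicode")
```

With two adjacent character boxes of 95 by 71 dots starting at x=1303, y=350, the merged entry has the same position and size as the hot spot in the FIG. 43 example.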

According to an alternative embodiment of the coordinate capture step 3820, a filter in a Microsoft XPS (XML Print Specification) printer driver, commonly known as an "XPSDrv filter", receives the text drawing commands and creates the page_desc.xml file as described above.

Visibly Perceptible Hot Spots
FIG. 45 shows a flowchart of a method of transforming characters corresponding to a hot spot in a document according to one embodiment of the present invention. The method modifies the printed document in a way that indicates, to both the end user and the MMR recognition software, that a hot spot is present.

First, an electronic document to be printed is received as a character stream (4510). For example, the document may be received (4510) at a printer driver or at a software module capable of filtering the character stream. In one embodiment, the document is received at the browser 3715 from the source file 3710. FIG. 46 shows an example of an electronic version of a document 4610 according to one embodiment of the present invention. The document 4610 in this example contains two hot spots, one associated with "are listed below" and one associated with "possible prior art". In one embodiment, the hot spots are not visually perceptible to the end user. The hot spots may be established by the coordinate capture method described with reference to FIG. 38 or by any of the other methods described above.

The document is parsed for a begin mark indicating the beginning of a hot spot (4520). The begin mark may be a fiducial mark as described above, or any other individually distinguishable mark that identifies a hot spot. Once a begin mark is found, a transformation rule is applied to a portion of the document (4530), namely to the characters following the begin mark, until an end mark is found. The transformation rule causes a visible modification of the portion of the document corresponding to the hot spot, in one embodiment by modifying the font or color of the characters. In this example, the original font, such as Times New Roman, is converted to a different known font, such as OCR-A. In another example, the text is rendered in a different font color, such as blue #F86A. In one embodiment, the process of converting the font is similar to the process described above. For example, if the document is an HTML file 4610, the font is substituted in the HTML file when a fiducial mark is encountered in the document 4610.
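The parse-and-transform loop (4520, 4530) can be sketched in Python as below. The token representation is an assumption: plain strings are document text, and tuples such as `("begin", 0)` and `("end", 0)` stand in for the begin and end marks. The function names and the span-based font substitution are likewise illustrative, not the plug-in's actual output.

```python
def apply_transformation(tokens, rule):
    """Apply `rule` to every text token between a begin mark and its end mark."""
    out = []
    inside = False
    for token in tokens:
        if isinstance(token, tuple):
            # ("begin", id) opens a hot spot; ("end", id) closes it.
            inside = token[0] == "begin"
            continue
        out.append(rule(token) if inside else token)
    return "".join(out)

def ocr_a_rule(text):
    """Example rule: switch the hot spot text to the OCR-A font."""
    return f'<span style="font-family:\'OCR-A\'">{text}</span>'
```

Because the rule is passed in as a function, the same loop serves for a font change, a color change, or any other visible modification. This simple version does not handle nested or overlapping hot spots.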

According to one embodiment, the transformation step is performed by the plug-in 3720 to the browser 3715 and yields the modified document 3730. FIG. 47 shows an example of a printed modified document according to one embodiment of the present invention. As shown, the hot spots 4720 and 4730 are visually distinguishable from the remaining text. In particular, the hot spot 4720 is distinguished by a different font, and the hot spot 4730 is distinguished by a different color and underlining.

Next, the document with the transformed portion is rendered into a page layout that includes the electronic document and the location of the hot spot within the electronic document. In one embodiment, rendering the document includes printing the document. In one embodiment, rendering includes performing feature extraction on the document with the transformed portion, according to any of the methods described above. In one embodiment, feature extraction includes capturing, in response to a print command, page coordinates corresponding to the electronic document. The electronic document is then parsed for the subset of the coordinates that corresponds to the transformed characters. In one embodiment, the capture module 3735 of FIG. 37A performs the feature extraction and/or coordinate capture.

The MMR recognition software preprocesses every image using the same transformation rule. It first looks for text that obeys the rule (for example, text in OCR-A or in blue #F86A) and then applies its normal recognition algorithm.

This aspect of the present invention is advantageous because it relies on a very simple image processing routine, which substantially reduces the computational load on the MMR recognition software and eliminates much of the computing overhead. In addition, as described with reference to FIGS. 51A-D, for example, knowing that a bounding box lies over a particular portion of the document makes it possible to rule out most of the alternatives that might otherwise apply, which improves the accuracy of feature extraction. Furthermore, the visible modification of the text shows the end user which text (or other document object) is part of a hot spot.

Shared Document Annotation
FIG. 48 shows a flowchart of a shared document annotation method according to one embodiment of the present invention. The method enables users to annotate documents in a shared environment. In the embodiment described below, the shared environment is a web page viewed by various users; in other embodiments, the shared environment may be any environment in which resources are shared, such as a workgroup.

According to the method, a source document is displayed (4810) in a browser, such as the browser 3715. In one embodiment, the source document is received from the source file 3710; in another embodiment, the source document is a web page received over a network, such as an Internet connection. Using the web page example, FIG. 49A shows a sample source web page 4910 in a browser according to one embodiment of the present invention. In this example, the web page 4910 is an HTML file for a game related to a popular children's book character, the Jerry Butter Game.

With the source document displayed (4810), a shared annotation associated with the source document, and an indication of the portion of the source document associated with that annotation, are received (4820). A single annotation is used in this example for clarity of description, but multiple annotations may be used. In this example, the annotation is data, an instruction, or an interaction used in MMR as described herein. In one embodiment, the annotation is stored on, and received by retrieval from, a Shared Documentation Annotation (SDA) server, such as the server 3755 shown in FIG. 37A. In one embodiment, the SDA server 3755 is accessible over a network connection. A plug-in for retrieving shared annotations, in this example the plug-in 3720 shown in FIG. 37A, facilitates this functionality. In another embodiment, the annotation and the indication are received from a user. The user may create a shared annotation for a document that contains no annotations, or may add to or modify the shared annotations already attached to a document. For example, the user highlights a portion of the source document to designate it as associated with the shared annotation, and the user provides the annotation in any of the various ways described above.

Next, a modified document is displayed in the browser (4830). The modified document includes a hot spot corresponding to the portion of the source document designated in step 4820. The hot spot specifies the location of the shared annotation. According to one embodiment, the modified document is part of the modified file 3730 generated by the plug-in 3720 and returned to the browser 3715. FIG. 49B shows a sample modified web page 4920 in a browser according to one embodiment of the present invention. The web page 4920 shows the indication 4930 of the hot spot and the associated annotation 4940, which in this example is a video clip. The indication 4930 may be visually distinguished from the remaining text of the web page 4920, for example by highlighting. According to one embodiment, the annotation 4940 is displayed when the indication 4930 is clicked or moused over.

In response to a print command, text coordinates and hotspots corresponding to a printed representation of the modified document are captured (4840). The details of coordinate capture may follow any of the methods described herein.

Next, the page layout of the printed representation, including the hotspot, is rendered (4850). In one embodiment, the rendering 4850 is printing the document. FIG. 49C shows a sample printed web page 4950 according to one embodiment of the present invention. The printed web page layout 4950 includes the hotspot 4930 as designated; however, the line breaks in the printed layout 4950 differ from those in the web page 4920. In this example, the boundaries of the hotspot 4930 are not visible in the printed layout 4950.

In an optional final step, the shared annotations are stored, for example in data storage 3750, and are indexed using their association with the hotspots 4930 in the printed document 4950. The printed representation may also be saved locally. In one embodiment, the act of printing triggers the downloading and creation of a local copy.

Hotspots for Imaged Documents
FIG. 50A is a flowchart showing a method of adding a hotspot to an imaged document according to one embodiment of the present invention. The method allows hotspots to be added to a paper document after it is scanned, or to a symbolic electronic document after it is rendered for printing.

First, a source document is converted (5010) to an imaged document. In one embodiment, the source document is received at browser 3715 from source file 3710. The conversion (5010) may be by any method that produces a document on which feature extraction can be performed to produce a feature representation. According to one embodiment, a paper document is scanned to become an imaged document. In another embodiment, a renderable page proof of an electronic document is rendered using an appropriate application. For example, if the renderable page proof is in PostScript format, Ghostscript is used. FIG. 51A shows an example of a user interface 5105 displaying a portion of a newspaper page 5110 that has been scanned according to one embodiment. A main window 5115 shows an enlarged portion of the newspaper page 5110, and a thumbnail 5120 shows which portion of the page is being displayed.

Next, feature extraction is applied (5020) to the imaged document to create a feature representation. Any of the feature extraction methods described herein may be used for this purpose. In one embodiment, the feature extraction is performed by the capture module 3735 described with reference to FIG. 37A. One or more hotspots 5125 are then added (5030) to the imaged document. A hotspot may be predefined or may need to be defined, depending on the embodiment. If the hotspot is already defined, the definition includes the page number, the location of the hotspot's bounding box on the page, the electronic data or interaction attached to the hotspot, and so on. In one embodiment, the hotspot definition takes the form of a hotspot.xml file, as shown in FIG. 43.
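The contents of a hotspot definition listed above (page number, bounding box location, attached data or interaction) can be pictured as a small record. The following is a minimal sketch only; the class names, field names, and example URL are hypothetical and are not taken from the hotspot.xml format of FIG. 43.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical bounding box in page coordinates (units are an assumption)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        # True when the point lies inside or on the edge of the box.
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass
class Hotspot:
    """Hypothetical hotspot definition: page, location, attached data."""
    page: int
    box: BoundingBox
    payloads: list = field(default_factory=list)  # e.g. URLs or media IDs


def hotspots_at(hotspots, page, px, py):
    """Return the hotspots on `page` whose bounding box contains the point."""
    return [h for h in hotspots if h.page == page and h.box.contains(px, py)]


# Illustrative data only.
spots = [Hotspot(page=1, box=BoundingBox(100, 200, 300, 50),
                 payloads=["http://example.com/clip"])]
hits = hotspots_at(spots, page=1, px=150, py=220)
```

A lookup such as `hotspots_at` is one way a matched page location could be mapped back to the data or interaction attached to it.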

If the hotspot is not defined, the end user may define it. FIG. 50B is a flowchart of a method of defining a hotspot for addition to an imaged document according to one embodiment of the present invention. First, a candidate hotspot is selected (5032). For example, in FIG. 51A the end user has selected a portion of the document as a hotspot using a bounding box 5125. Next, in optional step 5034, it is determined whether the hotspot is unique for a given database. For example, there should be enough text in the surrounding n″ × n″ patch to uniquely identify the hotspot. A typical value for n is 2. If the hotspot is not sufficiently unique for the database, in one embodiment the user is presented with options for how to handle the ambiguity. For example, a user interface may offer alternatives such as selecting a larger area, or accepting the ambiguity but adding a description of it to the database. Other embodiments may use other methods of defining a hotspot.
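The text does not specify how uniqueness is tested in step 5034. As one illustrative possibility, the sketch below treats a candidate patch as ambiguous when it has too few words or when every one of its words also occurs in some patch already in the database. The function name, the `min_words` threshold, and the database layout are all assumptions.

```python
def is_sufficiently_unique(candidate_words, database, min_words=3):
    """Rough uniqueness test for a candidate hotspot patch.

    candidate_words: words found in the patch around the hotspot.
    database: dict mapping a patch identifier to the words in that patch.
    Both the subset criterion and min_words are illustrative assumptions.
    """
    words = {w.lower() for w in candidate_words}
    if len(words) < min_words:
        return False  # not enough text to identify the patch
    # Ambiguous if an existing patch already contains every candidate word.
    return not any(words <= {w.lower() for w in other}
                   for other in database.values())


# Illustrative data only.
database = {
    "docA/p1/patch0": ["mixed", "media", "reality", "system"],
    "docB/p4/patch2": ["printed", "paper", "index", "table"],
}
ok = is_sufficiently_unique(["hotspot", "bounding", "box"], database)
clash = is_sufficiently_unique(["mixed", "media", "reality"], database)
```

When the test fails, a user interface could offer the options described above, such as selecting a larger area.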

Once the hotspot location is selected (5032), the data or interaction is defined (5036) and attached to the hotspot. FIG. 51B shows an example user interface for defining the data or interaction to associate with a selected hotspot. For example, once the user has selected the bounding box 5125, an edit box 5130 is displayed. Using the associated buttons, the user may cancel the operation (5135), simply save the bounding box 5125 (5140), or assign data or interactions to the hotspot (5145). If the user chooses to assign data or interactions to the hotspot, an assign box 5150 is displayed, as shown in FIG. 51C. The assign box 5150 allows the user to assign images 5155, various other media 5160, and web links 5165 to the hotspot; the assigned items are distinguished by ID number. The user may then choose to save (5175) the hotspot definition. Although a single hotspot has been described for simplicity, multiple hotspots are possible. FIG. 51D shows a user interface displaying hotspots within a document. In one embodiment, bounding boxes of different colors correspond to different data and interaction types.

In an optional step, the imaged document, hotspot definition, and feature representation are stored (5040), for example in data store 3750.

FIG. 52 shows a method of using an MMR document 500 and the MMR system 100b according to one embodiment of the present invention.

The method 5200 begins by acquiring (5210) a first document or a representation of the first document. Example methods of acquiring the first document include the following: (1) the first document is acquired by automatically capturing, via the PD capture module 318, the text layout of a printed document within the operating system of the MMR computer 112; (2) the first document is acquired by automatically capturing the text layout of a printed document within the printer driver 316 of the MMR computer 112; (3) the first document is acquired by scanning a paper document, for example via a document scanner device 127 connected to the MMR computer 112; and (4) the first document is acquired by automatically or manually transferring, uploading, or downloading a file that is a representation of the printed document to the MMR computer 112. Although the acquiring step is described as acquiring most or all of the printed document, it should be understood that the acquiring step 5210 may be performed on as little as the smallest portion of a printed document. Further, although the method is described in terms of acquiring a single document, this step may be performed to acquire a number of documents and create a library of first documents.

Once the acquiring step 5210 has been performed, the method 5200 performs (5212) an indexing operation on the first document. The indexing operation identifies the corresponding electronic representation of the document and the associated second media types, and makes it possible to find input that matches the acquired first document or a portion of it. In one embodiment of this step, the document indexing operation is performed by the PD capture module 318, which generates the PD index 322. Example indexing operations include the following: (1) the x-y locations of characters of a printed document are indexed; (2) the x-y locations of words of a printed document are indexed; (3) the x-y locations of an image, or a portion of an image, in a printed document are indexed; (4) an OCR imaging operation is performed, and the x-y locations of characters and/or words are indexed accordingly; (5) feature extraction is performed on the image of a rendered page, and the x-y locations of the features are indexed; and (6) feature extraction is performed on the symbolic version of a page, and the x-y locations of the features are indexed. The indexing operation may include any one or any group of the above operations, depending on the application of the present invention.
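Indexing option (2) above, indexing the x-y locations of words, can be illustrated with a simple lookup table. This is a minimal sketch under assumed input: the tuple layout, the function name, and the document identifiers are hypothetical, and it is not the structure of the PD index 322.

```python
from collections import defaultdict


def build_word_index(pages):
    """Map each word to every (doc_id, page_no, x, y) where it appears.

    pages: iterable of (doc_id, page_no, [(word, x, y), ...]) tuples.
    This input layout is an assumption made for the example.
    """
    index = defaultdict(list)
    for doc_id, page_no, words in pages:
        for word, x, y in words:
            index[word.lower()].append((doc_id, page_no, x, y))
    return index


# Illustrative data only.
pages = [
    ("docA", 1, [("Mixed", 10, 20), ("media", 55, 20), ("hotspot", 10, 40)]),
    ("docB", 3, [("media", 72, 15)]),
]
word_index = build_word_index(pages)
locations = word_index["media"]  # every place the word "media" was printed
```

A word captured from a later scan can then be looked up to obtain candidate documents, pages, and positions.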

The method 5200 also captures (5214) a second document. In this step 5214, the second document captured may be an entire document or only a portion (patch) of the second document. Example methods of capturing the second document include the following: (1) scanning a patch of text using one or more capture mechanisms 230 of the capture device 106; (2) scanning a patch of text using one or more capture mechanisms 230 of the capture device 106 and then processing the image to determine the likelihood that the intended feature description will be extracted correctly (for example, if the index is based on OCR, the system may determine whether the image contains multiple lines of text and whether the image is sharp enough for a successful OCR operation; if this determination fails, another patch of text is scanned); (3) scanning a machine-readable identifier (e.g., an International Standard Book Number (ISBN) or Universal Product Code (UPC)) that identifies the document being scanned; (4) inputting data that identifies a document or a set of documents (e.g., the 2003 editions of a sports photo magazine), after which a patch of text is scanned according to item (1) or (2) of this method; (5) receiving an email with the second document attached; (6) receiving the second document by file transfer; (7) scanning a portion of an image with one or more capture mechanisms 230 of the capture device 106; and (8) inputting the second document with the input device 166.

Once steps 5210 and 5214 have been performed, the method performs (5216) document or pattern matching between the first document and the second document. In one embodiment, this is done by performing document fingerprint matching of the second document against the first document. The document fingerprint matching is performed by querying the PD index 322 for the second media document. An example of document fingerprint matching is extracting features from the image captured in step 5214, composing descriptors from those features, and looking up the documents and patches that contain a given percentage of those descriptors. It should be understood that, where the database stores a large number of documents, this pattern matching step may be performed multiple times, once for each document, to determine whether any document in the library or database matches the second document. Alternatively, the indexing step 5212 adds the document 5210 to an index that represents a collection of documents, and the pattern matching step is performed once.
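The lookup described above, finding the documents and patches that contain a given percentage of the query descriptors, can be sketched as follows. Descriptors are modeled as opaque hashable values; the function name, the `min_fraction` parameter, and the patch identifiers are assumptions, and a real index would avoid the linear scan used here.

```python
def match_patches(query_descriptors, indexed_patches, min_fraction=0.5):
    """Return (patch_id, fraction) pairs, best match first.

    query_descriptors: descriptors composed from the captured image.
    indexed_patches: dict mapping a patch identifier to its descriptors.
    fraction is the share of query descriptors found in the indexed patch.
    """
    query = set(query_descriptors)
    if not query:
        return []
    results = []
    for patch_id, descriptors in indexed_patches.items():
        fraction = len(query & set(descriptors)) / len(query)
        if fraction >= min_fraction:
            results.append((patch_id, fraction))
    # Highest fraction first; ties broken by identifier for stable output.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Illustrative data only.
indexed = {
    "docA/p1": {"d1", "d2", "d3", "d4"},
    "docB/p2": {"d1", "d9"},
    "docC/p7": {"d8"},
}
strict = match_patches(["d1", "d2", "d3", "d5"], indexed, min_fraction=0.5)
loose = match_patches(["d1", "d2", "d3", "d5"], indexed, min_fraction=0.25)
```

Lowering `min_fraction` trades precision for recall, which may matter when the captured patch is blurred or partially occluded.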

Finally, the method 5200 executes (5218) an action based on the result of step 5216 and, optionally, on user input. In one embodiment, the method 5200 looks up a predetermined action that is associated with the given document patch, for example an action stored in the second media 504 associated with the hotspot 506 found to match in step 5216. Examples of predetermined actions include the following: (1) retrieving information from the document event database 320, the Internet, or elsewhere; (2) writing information to a location identified by the MMR system 100b that is ready to receive the system's output; (3) looking up information; (4) displaying information on a client device, such as the capture device 106, and conducting an interactive dialog with the user; (5) queuing the action and data determined in method step 5216 for later execution (the user's participation may be optional); and (6) executing immediately the action and data determined in method step 5216. Example results of this method step include the retrieval of information, a modified document, the execution of other actions, and the input of a command sent to a cable TV box, such as the set-top box 126, that is coupled to a cable TV server (e.g., the service provider server 122) that streams video to the cable TV box. Once step 5218 has been performed, the method 5200 is complete and ends.
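Options (5) and (6) above differ only in when the matched action runs: later, from a queue, or immediately. The sketch below illustrates that distinction with a hypothetical dispatcher; the class, its method names, and the example actions are assumptions and do not describe components of the MMR system.

```python
from collections import deque


class ActionDispatcher:
    """Hypothetical dispatcher: run an action now or queue it for later."""

    def __init__(self):
        self.pending = deque()  # (action, data) pairs awaiting execution
        self.results = []       # results of executed actions, in order

    def submit(self, action, data, immediate=True):
        if immediate:
            self.results.append(action(data))    # option (6): execute now
        else:
            self.pending.append((action, data))  # option (5): queue

    def run_pending(self):
        # Execute queued actions in the order they were submitted.
        while self.pending:
            action, data = self.pending.popleft()
            self.results.append(action(data))


# Illustrative actions only: any callable taking the matched data works.
dispatcher = ActionDispatcher()
dispatcher.submit(str.upper, "show clip", immediate=True)
dispatcher.submit(len, "queued", immediate=False)
before = list(dispatcher.results)
dispatcher.run_pending()
after = list(dispatcher.results)
```

A first-in, first-out queue is used here so that deferred actions run in the order the matches occurred; other orderings are equally possible.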

FIG. 53 is a block diagram of an example set of business entities 5300 associated with the MMR system 100b according to one embodiment of the present invention. The set of business entities 5300 includes an MMR service provider 5310, an MMR consumer 5312, a multimedia company 5314, a printer user 5316, a cellular telephone service provider 5318, a hardware manufacturer 5320, a hardware retailer 5322, a financial institution 5324, a credit card processor 5326, a document publisher 5328, a document printer 5330, a fulfillment house 5332, a cable TV provider 5334, a service provider 5336, a software provider 5338, an advertising company 5340, and a business network 5370.

The MMR service provider 5310 is the owner and/or administrator of the MMR system 100 described with reference to FIGS. 1A through 5 and 52. The MMR consumer 5312 represents any MMR user 110 described previously with reference to FIG. 1B.

The multimedia company 5314 is any provider of digital multimedia products, such as Blockbuster Inc. (Dallas, Texas), which provides digital movies and video games, and Sony Corporation of America (New York, New York), which provides digital music, movies, and TV shows.

The printer user 5316 is any individual or entity that uses any printer of any kind to produce a printed paper document. For example, the MMR consumer 5312 may be a printer user 5316 or a document printer 5330.

The cellular telephone service provider 5318 is any cellular telephone service provider, such as Verizon Wireless (Bedminster, New Jersey), Cingular Wireless (Atlanta, Georgia), T-Mobile USA (Bellevue, Washington), and Sprint Nextel (Reston, Virginia).

The hardware manufacturer 5320 is any manufacturer of hardware devices, such as a manufacturer of printers, cellular telephones, or PDAs. Example hardware manufacturers include Hewlett-Packard (Houston, Texas), Motorola, Inc. (Chicago, Illinois), and Sony Corporation of America (New York, New York). The hardware retailer 5322 is any retailer of hardware devices, such as a retailer of printers, cellular telephones, or PDAs. Example hardware retailers include, but are not limited to, RadioShack Corporation (Fort Worth, Texas), Circuit City Stores, Inc. (Richmond, Virginia), Wal-Mart (Bentonville, Arkansas), and Best Buy Co. (Richfield, Minnesota).

The financial institution 5324 is any financial institution, such as a bank or credit union, that manages bank accounts and handles the transfer of funds to and from other banks or financial institutions. The credit card processor 5326 is any credit card institution that manages the credit card authentication and approval process for purchase transactions. Example credit card processors include, but are not limited to, ClickBank, which is a service of Click Sales Inc. (Boise, Idaho); ShareIt! Inc. (Eden Prairie, Minnesota); and CCNow Inc. (Eden Prairie, Minnesota).

The document publisher 5328 is any document publishing company, such as, but not limited to, Gregath Publishing Company (Wyandotte, Oklahoma), Prentice Hall (Upper Saddle River, New Jersey), and Pelican Publishing Company (Gretna, Louisiana). The document printer 5330 is any document printing company, such as, but not limited to, PSPrint LLC (Oakland, California), PrintLizard, Inc. (Buffalo, New York), and Mimeo, Inc. (New York, New York). In another example, the document publisher 5328 and/or the document printer 5330 is any entity that produces and distributes newspapers or magazines.

The fulfillment house 5332 is any third-party logistics warehouse that specializes in the fulfillment of orders, as is well known. Example fulfillment houses include, but are not limited to, Corporate Disk Company (McHenry, Illinois), OrderMotion, Inc. (New York, New York), and Shipwire.com (Los Angeles, California).

The cable TV provider 5334 is any cable TV service provider, such as, but not limited to, Comcast Corporation (Philadelphia, Pennsylvania) and Adelphia Communications (Greenwood Village, Colorado). The service provider 5336 represents any entity that provides a service of any kind.

The software provider 5338 is any software development company, such as, but not limited to, Art & Logic, Inc. (Pasadena, California), Jigsaw Data Corporation (San Mateo, California), DataMirror Corporation (New York, New York), and DataBank IMX, LLC (Beltsville, Maryland).

The advertising company 5340 is any advertising company or agency, such as, but not limited to, D and B Marketing (Elmhurst, Illinois), Black Sheep Marketing (Boston, Massachusetts), and Gotham Direct, Inc. (New York, New York).

The business network 5370 represents any mechanism by which a business relationship is established and/or facilitated.

FIG. 54 shows a generalized business method 5400 facilitated by use of the MMR system 100b according to one embodiment of the present invention. The method 5400 includes establishing a relationship between at least two entities, determining possible business transactions, executing at least one business transaction, and delivering or providing the product or service of the transaction.

First, a relationship is established (5410) between at least two business entities 5300. The business entities 5300 may be grouped into, for example, four broad categories: (1) MMR creators, (2) MMR distributors, (3) MMR users, and (4) others. A business entity may fall into more than one category. According to this example, the business entities 5300 are categorized as follows:
● MMR creators: the MMR service provider 5310, multimedia company 5314, document publisher 5328, document printer 5330, software provider 5338, and advertising company 5340.

● MMR distributors: the MMR service provider 5310, multimedia company 5314, cellular telephone service provider 5318, hardware manufacturer 5320, hardware retailer 5322, document publisher 5328, document printer 5330, fulfillment house 5332, cable TV provider 5334, service provider 5336, and advertising company 5340.

● MMR users: the MMR consumer 5312, printer user 5316, and document printer 5330.

● Others: the financial institution 5324 and credit card processor 5326.

For example, in this method step a business relationship is established between the MMR service provider 5310, which is an MMR creator; the MMR consumer 5312, which is an MMR user; and the cellular telephone service provider 5318 and hardware retailer 5322, which are MMR distributors. Further, the hardware manufacturer 5320 has a business relationship with the hardware retailer 5322, and both are MMR distributors.

Next, the method 5400 determines the possible business transactions between the parties whose relationship was established in step 5410. In particular, a variety of transactions may occur between any two or more business entities 5300. Example transactions include: purchasing information; purchasing physical merchandise; purchasing services; purchasing bandwidth; purchasing electronic storage; purchasing advertisements; purchasing advertisement statistics; shipping merchandise; selling information; selling physical merchandise; selling services; selling bandwidth; selling electronic storage; selling advertisements; selling advertisement statistics; renting or leasing; and collecting opinions, ratings, or votes.

Once the method 5400 has determined the possible transactions between the parties, the MMR system 100 is used (5414) to reach agreement on at least one transaction. In particular, a variety of actions may occur between any two or more business entities 5300 as a result of the transaction. Example actions include: purchasing information; receiving an order; clicking through for more information; providing space; providing local or remote access; hosting; shipping; generating transactions; storing private information; passing information to others; adding content; and podcasting.

Once the method 5400 has reached agreement on the transaction, the MMR system 100 is used (5416) to deliver or provide the product or service of the transaction, for example to the MMR consumer 5312. In particular, a variety of content may be exchanged between any two or more business entities 5300 as a result of the business transaction agreed upon in method step 5414. Example content includes text, web links, software, still photographs, video, audio, and any combination of these. Additionally, a variety of delivery mechanisms may be used between any two or more business entities 5300 to facilitate the transaction. Example delivery mechanisms include paper, personal computers, networked computers, the capture device 106, personal video devices, personal audio devices, and any combination of these.

In addition to the inventions described and claimed in the embodiments above, at least one aspect of one or more embodiments of the present invention provides a computer-implemented method for providing mixed media documents. The method includes receiving, at an index table, electronic descriptions of features extracted from a paper document. The index table associates the paper document, and the locations of the features within it, with a mixed media document composed of printed and digital media. The method continues with receiving query terms that capture two-dimensional positional relationships between objects in a search target document, and calculating, based on data from the index table, at least one mixed media document and location candidate that may be responsive to the query terms.

In one such case, the method includes storing additional features associated with the search target document. In one such case, the additional features include one or more actions, including extracting text information, extracting graphic information, executing a process, executing a command, arranging in a certain order, extracting video, extracting sound, storing information, creating a new document, printing a document, and/or displaying a document. In another particular case, receiving (at the index table) electronic descriptions of features extracted from a paper document includes receiving electronic descriptions of features extracted from a plurality of paper documents. In another particular case, calculating at least one mixed media document and location candidate includes calculating a ranked group of mixed media documents, pages, and location candidates based on data from the index table. In another particular case, receiving query terms that capture two-dimensional positional relationships between objects in the search target document includes receiving a group of horizontally and vertically adjacent word pairs extracted from the search target document. In another particular case, the index table includes an inverted term index table in which each unique term points to a list of records, and each record identifies a candidate region on a page of a mixed media document. In that case, calculating at least one mixed media document and location candidate includes examining all records indexed under the keys corresponding to the query terms and identifying the region that best matches all of the query terms. If the identified region has a match score that satisfies a matching criterion, the method may further include confirming the corresponding mixed media document and location candidate. In a particular case, the search target document is an image of a paper document or a patch of that paper document.
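The inverted term index lookup described above can be illustrated with a minimal sketch. It is not the implementation of this disclosure: the `Record` fields, the `h:`/`v:` term prefixes for horizontal and vertical word pairs, the simple vote count used as the match score, and the `min_score` criterion are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """One posting: a candidate region on a page of a mixed media document."""
    doc_id: str
    page: int
    x: int  # hypothetical grid column of the candidate region
    y: int  # hypothetical grid row of the candidate region


def build_inverted_index(entries):
    """Map each unique term to the list of records in which it occurs."""
    index = defaultdict(list)
    for term, record in entries:
        index[term].append(record)
    return index


def best_region(index, query_terms, min_score=2):
    """Examine every record keyed by a query term and return the region
    supported by the most query terms, or None if no region reaches the
    (assumed) matching criterion min_score."""
    votes = defaultdict(int)
    for term in set(query_terms):
        for record in index.get(term, []):
            votes[record] += 1
    if not votes:
        return None
    record, score = max(votes.items(), key=lambda item: item[1])
    return (record, score) if score >= min_score else None
```

A caller would build the index once from the terms extracted at print or scan time, then call `best_region` with the word-pair terms extracted from a captured patch.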

At least one other aspect of one or more embodiments of the present invention provides a database system for providing mixed media documents. The system includes an index table for receiving electronic descriptions of features extracted from a paper document and for associating the paper document, and the locations of the features within it, with a mixed media document composed of printed and digital media. The system also includes an accumulator module that receives query terms capturing two-dimensional positional relationships between objects in a search target document and calculates, based on data from the index table, at least one mixed media document and location candidate that may be responsive to the query terms. In a particular case, the system includes a storage facility (e.g., a relational database) for storing additional features associated with the search target document. The additional features include one or more actions, including extracting text information, extracting graphic information, executing a process, executing a command, arranging in a certain order, extracting video, extracting sound, storing information, creating a new document, printing a document, and/or displaying a document. In another particular case, the index table can receive electronic descriptions of features extracted from a plurality of paper documents. In another particular case, the paper document includes a plurality of pages, and the index table is constructed to identify mixed media documents, pages, and x-y location attributes within those pages. In another particular case, the calculation of at least one mixed media document and location candidate performed by the accumulator module includes calculating a ranked group of mixed media documents, pages, and location candidates based on data from the index table. In another particular case, the receiving of query terms that capture two-dimensional positional relationships between objects in the search target document (performed by the accumulator module) includes receiving a group of horizontally and vertically adjacent word pairs extracted from the search target document.

In another particular case, the index table includes an inverted term index table in which each unique term points to a list of records, and each record identifies a candidate region on a page of a mixed media document. In such a case, calculating at least one mixed media document and location candidate includes examining all records indexed under the keys corresponding to the query terms and identifying the region that best matches all of the query terms. If the identified region has a match score that satisfies a matching criterion, the accumulator module may confirm the corresponding mixed media document and location candidate. In another such case, the index table further includes a document index table containing relevant information for each mixed media document, the relevant information including at least one of print resolution, print date, paper size, shadow file name, and page image location. In another particular case, the descriptions received at the index table are calculated by a feature extraction module, which associates the extracted features with their location data within the document. In a particular case, the search target document is an image of a paper document or a patch of that paper document. The functionality of the system can be implemented by various means, such as software (e.g., executable instructions encoded on one or more computer-readable media), hardware (e.g., gate-level logic or one or more ASICs), firmware (e.g., one or more microcontrollers with I/O capability and embedded routines for carrying out the functionality described herein), or any combination thereof. The database system can be implemented in a mixed media reality (MMR) system as described herein, where the MMR system provides functionality executed, for example, on a server, on a computer system, on a portable device, or on any combination thereof.
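As a rough illustration of how an accumulator module might produce a ranked group of pages, the sketch below adds a weight for every query term to an accumulator keyed by document page. The inverse-document-frequency weighting follows the wording of the claims of this document; the logarithmic IDF formula, the `(doc_id, page)` key layout, the `threshold` parameter, and the function names are assumptions for illustration only.

```python
import math
from collections import defaultdict


def idf(index, term, total_pages):
    """Assumed IDF: log(total pages / pages containing the term)."""
    pages = set(index.get(term, ()))
    return math.log(total_pages / len(pages)) if pages else 0.0


def rank_pages(index, query_terms, total_pages, threshold=0.0):
    """Accumulate IDF weights per (doc_id, page) and return the pages whose
    accumulated score exceeds the threshold, best match first."""
    accumulator = defaultdict(float)
    for term in set(query_terms):
        weight = idf(index, term, total_pages)
        for page_key in set(index.get(term, ())):
            accumulator[page_key] += weight
    ranked = sorted(accumulator.items(), key=lambda item: item[1], reverse=True)
    return [(key, score) for key, score in ranked if score > threshold]
```

Rare word pairs contribute more than common ones, so a page that shares a few distinctive pairs with the query patch can outrank a page that shares many ubiquitous pairs.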

In a particular embodiment, the MMR system includes a content-based retrieval database configured with an index table. The index table represents the two-dimensional geometric relationships between objects extracted from a printed document in a way that allows them to be searched using a text-based index. A ranked group of document, page, and location candidates can be calculated from the index table given input data. The technique efficiently converts features detected in an image patch into text terms (or other searchable features) that represent both the features themselves and the geometric relationships between them. A storage facility can be used to store additional features for each document image patch.
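One way to picture the conversion of detected features into text terms is to pair each recognized word with its nearest neighbor to the right and its nearest neighbor below, so that each term encodes both the words and their geometric relationship. The sketch below assumes word bounding boxes are already available (for example from OCR), uses image coordinates with y increasing downward, and invents the `h:`/`v:` term format; none of these details come from the disclosure.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def _overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def word_pair_terms(words):
    """Emit one horizontal and one vertical word-pair term per word, where a
    neighbor exists. Horizontal neighbors must share vertical extent with the
    word; vertical neighbors must share horizontal extent."""
    terms = []
    for w in words:
        right = [o for o in words
                 if o is not w and o.x0 >= w.x1
                 and _overlap(w.y0, w.y1, o.y0, o.y1) > 0]
        if right:
            nearest = min(right, key=lambda o: o.x0)
            terms.append(f"h:{w.text}-{nearest.text}")
        below = [o for o in words
                 if o is not w and o.y0 >= w.y1
                 and _overlap(w.x0, w.x1, o.x0, o.x1) > 0]
        if below:
            nearest = min(below, key=lambda o: o.y0)
            terms.append(f"v:{w.text}-{nearest.text}")
    return terms
```

Because the output is plain strings, the terms can be stored in and queried from an ordinary text index.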

The algorithms presented here are not inherently tied to any particular computer or other apparatus. Various general-purpose and/or special-purpose systems may be programmed or built in accordance with embodiments of the invention. As will be apparent from this disclosure, a variety of programming languages and/or structures can be used to implement such systems. Furthermore, embodiments of the invention may operate on or within an information system or network. For example, the invention can run on a stand-alone multifunction printer, or on a networked printer whose functionality varies with its configuration. The invention can also operate with any information system using as little as a minimal subset of the functions described here.

The foregoing description of embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The scope of the invention is intended to be limited not by this detailed description but only by the claims of this application. As will be understood by those skilled in the art, the invention may be embodied in other forms without departing from its spirit or essential characteristics. Likewise, the particular naming and division of the modules, routines, features, attributes, methods, and other aspects are neither mandatory nor significant, and the mechanisms that implement the invention or its features may have different names, divisions, and/or formats. Furthermore, as will be apparent to one skilled in the relevant art, the modules, routines, features, attributes, methods, and other aspects of the invention can be implemented as software, hardware, middleware, or any combination thereof. Where a component of the invention (a module, for example) is implemented as software, the component can be implemented as a stand-alone program, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in any other way known now or in the future to those skilled in the art of computer programming. Additionally, the invention is not limited to implementation in any specific programming language, nor to any specific operating system or environment. Accordingly, the disclosure of the invention is illustrative and does not limit the scope of the invention, which is set forth in the appended claims.

This application is based on U.S. priority application No. 60/710,767, filed August 23, 2005; U.S. application No. 60/792,912, filed April 17, 2006; U.S. application No. 11/461,147, filed July 31, 2006; and U.S. application No. 11/461,164, filed July 31, 2006, the entire contents of which are incorporated herein by reference.

A functional block diagram of a mixed media reality (MMR) system configured in accordance with one embodiment of the present invention.
A functional block diagram of an MMR system configured in accordance with another embodiment of the present invention.
A diagram showing a capture device in accordance with one embodiment of the present invention.
A diagram showing a capture device in accordance with one embodiment of the present invention.
A diagram showing a capture device in accordance with one embodiment of the present invention.
A diagram showing a capture device in accordance with one embodiment of the present invention.
A functional block diagram of a capture device configured in accordance with one embodiment of the present invention.
A functional block diagram of an MMR computer configured in accordance with one embodiment of the present invention.
A diagram showing the set of software components included in MMR software configured in accordance with one embodiment of the present invention.
A diagram representing an example MMR document configured in accordance with one embodiment of the present invention.
A diagram for explaining a document fingerprint matching method in accordance with one embodiment of the present invention.
A diagram representing a document fingerprint matching system configured in accordance with one embodiment of the present invention.
A flowchart of text/non-text discrimination in accordance with one embodiment of the present invention.
A diagram showing an example of text/non-text discrimination in accordance with one embodiment of the present invention.
A flowchart for estimating the point size of text in an image patch in accordance with one embodiment of the present invention.
A diagram for explaining a document fingerprint matching method in accordance with another embodiment of the present invention.
A diagram for explaining a document fingerprint matching method in accordance with another embodiment of the present invention.
A diagram showing an example of interactive image analysis in accordance with one embodiment of the present invention.
A diagram for explaining a document fingerprint matching method in accordance with another embodiment of the present invention.
A diagram showing an example of word bounding box determination in accordance with one embodiment of the present invention.
A diagram showing a feature extraction method in accordance with one embodiment of the present invention.
A diagram showing a feature extraction method in accordance with another embodiment of the present invention.
A diagram showing a feature extraction method in accordance with another embodiment of the present invention.
A diagram showing a feature extraction method in accordance with another embodiment of the present invention.
A diagram for explaining a document fingerprint matching method in accordance with another embodiment of the present invention.
A diagram showing an example of multi-classifier feature extraction for document fingerprint matching in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A flowchart of database-driven feedback in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A flowchart of database-driven classification in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A flowchart of database-driven multiple classification in accordance with one embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A diagram showing an example of document fingerprint matching in accordance with another embodiment of the present invention.
A flowchart of multi-tier recognition in accordance with one embodiment of the present invention.
A functional block diagram of an MMR database system configured in accordance with one embodiment of the present invention.
A diagram showing an example of OCR-based MMR feature extraction in accordance with one embodiment of the present invention.
A diagram showing an example index table organization in accordance with one embodiment of the present invention.
A diagram showing a method for generating an MMR index table in accordance with one embodiment of the present invention.
A diagram showing a method, in accordance with one embodiment of the present invention, for calculating a ranked group of document, page, and location hypotheses for a search target document.
A functional block diagram of MMR components configured in accordance with another embodiment of the present invention.
A diagram showing the set of software components included in MMR printing software in accordance with one embodiment of the present invention.
A flowchart showing a method for embedding a hot spot in a document in accordance with one embodiment of the present invention.
A diagram showing an example HTML file in accordance with one embodiment of the present invention.
A diagram showing an example of a marked-up version of the HTML file of FIG. 39A.
A diagram showing an example of the HTML file of FIG. 39A displayed in a browser in accordance with one embodiment of the present invention.
A diagram showing an example of a printed version of the HTML file of FIG. 40A in accordance with one embodiment of the present invention.
A diagram showing a symbolic hot spot description in accordance with one embodiment of the present invention.
A diagram showing an example page_desc.xml file for the HTML file of FIG. 39A in accordance with one embodiment of the present invention.
A diagram showing an example page_desc.xml file for the HTML file of FIG. 39A in accordance with one embodiment of the present invention.
A diagram showing a hotspot.xml file corresponding to FIGS. 41, 42A, and 42B in accordance with one embodiment of the present invention.
A flowchart of a process used by a forwarding DLL in accordance with one embodiment of the present invention.
A flowchart showing a method for converting characters corresponding to a hot spot in a document in accordance with one embodiment of the present invention.
A diagram showing an example electronic version of a document in accordance with one embodiment of the present invention.
A diagram showing an example of a printed, modified document in accordance with one embodiment of the present invention.
A flowchart of a shared document annotation method in accordance with one embodiment of the present invention.
A diagram showing a sample source web page in a browser in accordance with one embodiment of the present invention.
A diagram showing a sample modified web page in a browser in accordance with one embodiment of the present invention.
A diagram showing a sample printed web page in accordance with one embodiment of the present invention.
A flowchart showing a method for adding a hot spot to an imaged document in accordance with one embodiment of the present invention.
A flowchart showing a method for defining a hot spot to add to an imaged document in accordance with one embodiment of the present invention.
A diagram showing an example user interface displaying a portion of a scanned newspaper page in accordance with one embodiment.
A diagram showing an example user interface for defining the data or operation to associate with a selected hot spot.
A diagram showing the example user interface of FIG. 51B including an insertion box in accordance with one embodiment of the present invention.
A diagram showing a user interface for displaying hot spots within a document in accordance with one embodiment of the present invention.
A flowchart showing a method for using an MMR document and an MMR system in accordance with one embodiment of the present invention.
A block diagram of an example set of business entities associated with an MMR system in accordance with one embodiment of the present invention.
A flowchart of a generalized business method facilitated by use of an MMR system in accordance with one embodiment of the present invention.

Explanation of symbols

100a, 100b MMR (mixed media reality) system
102 MMR processor
104 Communication means
106 Capture device
108 MMR software
110 User
112 MMR computer
114 Network media server
116 Printer
118 printed documents
120 office portal
122 Service Provider Server
124 electronic display
126 set top box
127 Document scanner
160 Base media storage
162 MMR media storage
164 Output device
166 Input device
168 Portable input device
170 Portable output device
210 processor
212 display
214 keypad
216 Storage device
218 Wireless communication link
220 Wired communication link
222 MMR software
224 Capture Device User Interface (UI)
226 Document Fingerprint Verification Module
228 Third Party Software Module
230 Capture means
232 Video camera
234 Still camera
236 Voice Recorder
238 Electronic highlighter
240 laser
242 GPS device
244 RFID reader
310 Source file
312 First Source Document (SD) Browser
314 Second SD Browser
316 Printer device
318 Print Document (PD) Capture Module
320 Document Event Database
322 PD index
324 Event Capture Module
326 Document Analysis Module
328 Multimedia (MM) clip browser / editor module
330 MM printer driver
332 Document Video Analysis (DVP) Printing System
334 video paper
410 Multimedia annotation software
412 Text Content Based Extraction Component
414 Image Content Based Extraction Component
416 Steganographic correction component
418 Paper reading history log
420 Online reading history log
422 Compatibility Document Confirmation Component
424 Real-time notification component
426 Multimedia Acquisition Component
428 Desktop Video Reminder Component
430 Web Page Reminder Component
432 Physical history log
434 Completed Form Confirmation Component
436 Time Propagation Component
438 Location component
440 PC initialization component
442 Document Authoring Component
444 Capture Device Authoring Component
446 Unconscious (automatic) upload component
448 Document Version Acquisition Component
450 PC Document Metadata Component
452 Capture Device User Interface (UI) Component
454 Domain specific components
610 Document fingerprint verification system
712 quality evaluation module
3400 database system
3402 MMR feature extraction module
3404 MMR index table module
3406 Evidence accumulation module
3408 relational database
3414 Document Conversion Application Module
3416 Sub-image extraction module
3418 Feedback-oriented feature search module
3705 computer
3710 Source file
3715 Browser
3720 plug-in
3725 Symbolic hotspot description
3730 Modified file
3735 Capture module
3740 page_desc.xml
3745 hotspot.xml
3750 data store
3755 SDA server
3760 MMR printing software
3765 conversion module
3768 Embedding module
3770 analysis module
3775 conversion module
3778 Feature Extraction Module
3780 annotation module
3785 Hotspot module
3790 representation / display module
3795 storage module
5300 Business entity
5310 MMR Service Provider
5312 MMR Consumer
5314 Multimedia company
5316 Printer user
5318 Cellular telephone service provider
5320 Hardware manufacturer
5322 Hardware retailer
5324 Financial institution
5326 Credit card processor
5328 Document publisher
5330 Document printer
5332 fulfillment house
5334 Cable TV Provider
5336 Service Provider
5338 Software Provider
5340 Advertising company
5370 Business Network

Claims

A computer-implemented method for organizing and accessing information in a mixed media document system, the method comprising:
generating an electronic representation of a paper document;
identifying features on the paper document and capturing a two-dimensional form of the paper document;
identifying the locations of the features; and
indexing the features by the locations and generating an index table.

The method of claim 1, further comprising a preliminary step of receiving the paper document.

The method of claim 1, further comprising storing one or more features associated with at least one of the features.

The method of claim 3, wherein the one or more features include one or more actions including at least one of: extracting text information; extracting graphic information; executing a process; executing a command; arranging in a certain order; extracting video; extracting sound; storing information; creating a new document; printing a document; and displaying a document.

The method of claim 1, wherein identifying features on the paper document and capturing a two-dimensional form of the paper document includes identifying horizontally aligned objects and vertically aligned objects.

The method of claim 1, wherein identifying features on the paper document and capturing a two-dimensional form of the paper document includes identifying a horizontal word pair and a vertical word pair.

The method of claim 1, wherein the step of generating an electronic representation of the paper document is performed during a scanning or printing process.

The method of claim 1, wherein identifying features on the paper document and capturing a two-dimensional form of the paper document includes grouping sequences of text into logical lines by examining the amount of vertical overlap between two successive sequences.

The method of claim 1, further comprising:
receiving one or more query terms that capture a two-dimensional positional relationship between objects in a search target document; and
calculating, based on data from the index table, at least one mixed media document and location candidate that may be responsive to the query terms.

The method of claim 9, further comprising, prior to receiving the one or more query terms that capture a two-dimensional positional relationship between objects in the search target document:
Receiving the search target document;
Creating an image of at least a patch of the search target document; and
Generating the one or more query terms based on the image.

The method of claim 10, wherein generating one or more query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.

The method of claim 10, wherein calculating at least one mixed media document and location candidate comprises:
Identifying a stored page that is most likely to match the patch of the search target document; and
Calculating a location in the page closest to the center of the patch.

The method of claim 12, wherein each word pair is associated with an inverse document frequency, and identifying a stored page that is most likely to match at least a patch of the search target document comprises:
Adding the inverse document frequency of each word pair to an accumulator indexed by the document page in which the word pair appears; and
Outputting the corresponding document page as matching the patch in response to the maximum value in the accumulator exceeding a threshold.
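
The page-voting step of this claim can be sketched as follows. The sketch assumes the inverse document frequency of a word pair is computed as log(N / n), where N is the number of stored pages and n the number of pages containing the pair; the claim does not give a formula, so this is a common textbook choice rather than the patent's definition. The index layout (word pair mapped to a set of `(doc_id, page)` keys) and the function names are likewise hypothetical.

```python
import math
from collections import defaultdict


def idf(num_pages, pages_with_pair):
    # Assumed IDF form: log(N / n). Not specified by the claims.
    return math.log(num_pages / pages_with_pair)


def best_page(query_pairs, index, num_pages, threshold):
    """Vote for pages with IDF weights and return the winner, or None.

    `index` maps a word-pair term to the set of (doc_id, page) keys on
    which it appears (a hypothetical layout for this sketch).
    """
    accum = defaultdict(float)  # accumulator indexed by document page
    for pair in query_pairs:
        pages = index.get(pair, set())
        if not pages:
            continue
        weight = idf(num_pages, len(pages))
        for page in pages:
            accum[page] += weight
    if not accum:
        return None
    page, score = max(accum.items(), key=lambda kv: kv[1])
    return page if score > threshold else None


index = {
    "mixed-media": {("doc1", 1), ("doc2", 3)},
    "media+reality": {("doc1", 1)},
    "reality-system": {("doc1", 1)},
}
match = best_page(["mixed-media", "media+reality"], index, num_pages=4, threshold=1.0)
```

Rare word pairs contribute more to a page's score than common ones, so a few distinctive pairs can outweigh many ordinary ones.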

The method of claim 13, wherein calculating the location in the page closest to the center of the patch comprises:
Adding a weight to each cell in a zone around each word pair, wherein the weight of each cell is determined by the product of the inverse document frequency of the word pair and the normalized geometric distance between the center of the cell and the zone;
Finding the cell with the maximum value in the corresponding accumulator array (Accum); and
Reporting the cell as the location of the patch in response to the maximum value exceeding a threshold.
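
The cell-weighting step can be sketched as a second accumulator laid over a grid on the matched page. The claim describes the weight as a product of the inverse document frequency and a normalized geometric distance; this sketch interprets the distance factor as a closeness term, 1 - d / (radius + 1), so that cells nearer the word pair receive more weight. That interpretation, the square zone shape, the grid size and the name `locate_patch` are all assumptions of the sketch.

```python
import math


def locate_patch(hits, grid_w, grid_h, radius, threshold):
    """Return the (x, y) grid cell most likely to be the patch location.

    `hits` is a list of (cell_x, cell_y, idf) tuples, one per matched
    word pair on the page. Layout and weighting are illustrative
    assumptions, not the patent's exact formula.
    """
    accum = [[0.0] * grid_w for _ in range(grid_h)]
    for cx, cy, idf in hits:
        # Visit every cell in a square zone around the word pair.
        for y in range(max(0, cy - radius), min(grid_h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(grid_w, cx + radius + 1)):
                dist = math.hypot(x - cx, y - cy)
                closeness = max(0.0, 1.0 - dist / (radius + 1))
                accum[y][x] += idf * closeness
    best_value, best_cell = max(
        (accum[y][x], (x, y)) for y in range(grid_h) for x in range(grid_w)
    )
    return best_cell if best_value > threshold else None


hits = [(2, 2, 1.0), (4, 2, 1.0), (3, 3, 2.0)]
location = locate_patch(hits, grid_w=8, grid_h=6, radius=2, threshold=1.0)
```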

The method of claim 9, wherein calculating at least one mixed media document and location candidate comprises:
Looking up each of the one or more query terms in the index table to retrieve one or more locations associated with each query term; and
Identifying, for each retrieved location, one or more candidate regions that include the location.

The method of claim 15, wherein calculating at least one mixed media document and location candidate further comprises:
Identifying the one of the one or more candidate regions that best matches all of the one or more query terms; and
Upon confirming that the identified candidate region meets a predetermined match criterion, determining that region to match the search target document.
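
The lookup and region-matching steps of the two preceding claims can be sketched together. For simplicity the sketch assigns each retrieved location to a single fixed-size tile of its page as the candidate region, and uses "fraction of distinct query terms found in the region" as the match criterion. The tile size, the 0.6 fraction and the name `best_region` are hypothetical; the claims leave both the region shape and the criterion open.

```python
from collections import defaultdict


def best_region(query_terms, index_table, tile=100, min_fraction=0.6):
    """Return the best matching (doc_id, page, tile_x, tile_y), or None.

    `index_table` maps a term to a list of (doc_id, page, x, y)
    locations (an assumed layout for this sketch).
    """
    unique_terms = set(query_terms)
    matched = defaultdict(set)  # region -> query terms found in it
    for term in unique_terms:
        for doc_id, page, x, y in index_table.get(term, []):
            region = (doc_id, page, x // tile, y // tile)
            matched[region].add(term)
    if not matched:
        return None
    region, terms = max(matched.items(), key=lambda kv: len(kv[1]))
    # Assumed match criterion: enough of the query terms fall in the region.
    if len(terms) / len(unique_terms) >= min_fraction:
        return region
    return None


index_table = {
    "mixed": [("doc1", 1, 10, 20), ("doc2", 3, 15, 200)],
    "media": [("doc1", 1, 60, 20)],
    "reality": [("doc1", 1, 30, 40)],
}
region = best_region(["mixed", "media", "reality"], index_table)
```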

A computer readable medium encoded with instructions that, when executed on one or more processors, cause the processors to carry out a process for organizing and accessing information in a mixed media document system, the process comprising:
Generating an electronic representation of a paper document;
Identifying features on the paper document and capturing a two-dimensional form of the paper document;
Identifying the location of the feature; and
Indexing the features by the location and generating an index table.

The computer readable medium of claim 17, wherein the process further comprises storing one or more features associated with at least one of the features.

The computer readable medium of claim 17, wherein identifying features on the paper document and capturing a two-dimensional form of the paper document includes identifying horizontal and vertical word pairs.

The computer readable medium of claim 17, wherein the process further comprises:
Receiving one or more query terms that capture a two-dimensional positional relationship between objects in a search target document; and
Calculating at least one mixed media document and location candidate that may be responsive to the query terms, based on data from the index table.

A computer-implemented method for accessing information in a mixed media document system, comprising:
Receiving one or more query terms that capture a two-dimensional positional relationship between objects in a search target document; and
Calculating at least one mixed media document and location candidate that may be responsive to the query terms, based on data from an index table;
wherein the index table indexes document features and feature locations of mixed media documents.

The method of claim 21, further comprising, prior to receiving the one or more query terms that capture a two-dimensional positional relationship between objects in the search target document:
Receiving the search target document;
Creating an image of at least a patch of the search target document; and
Generating the one or more query terms based on the image.

The method of claim 22, wherein generating one or more query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.

The method of claim 23, wherein calculating at least one mixed media document and location candidate comprises:
Identifying a stored page that is most likely to match the patch of the search target document; and
Calculating a location in the page closest to the center of the patch.

The method of claim 24, wherein each word pair is associated with an inverse document frequency, and identifying a stored page that is most likely to match at least a patch of the search target document comprises:
Adding the inverse document frequency of each word pair to an accumulator indexed by the document page in which the word pair appears; and
Outputting the corresponding document page as matching the patch in response to the maximum value in the accumulator exceeding a threshold.

The method of claim 25, wherein calculating the location in the page closest to the center of the patch comprises:
Adding a weight to each cell in a zone around each word pair, wherein the weight of each cell is determined by the product of the inverse document frequency of the word pair and the normalized geometric distance between the center of the cell and the zone;
Finding the cell with the maximum value in the corresponding accumulator array (Accum); and
Reporting the cell as the location of the patch in response to the maximum value exceeding a threshold.

The method of claim 21, wherein calculating at least one mixed media document and location candidate comprises:
Looking up each of the one or more query terms in the index table to retrieve one or more locations associated with each query term; and
Identifying, for each retrieved location, one or more candidate regions that include the location.

The method of claim 27, wherein calculating at least one mixed media document and location candidate further comprises:
Identifying the one of the one or more candidate regions that best matches all of the one or more query terms; and
Upon confirming that the identified candidate region meets a predetermined match criterion, determining that region to match the search target document.

A machine-readable medium encoded with instructions that, when executed on one or more processors, cause the processors to execute a process for accessing information in a mixed media document system, the process comprising:
Receiving one or more query terms that capture a two-dimensional positional relationship between objects in a search target document; and
Calculating at least one mixed media document and location candidate that may be responsive to the query terms, based on data from an index table;
wherein the index table indexes document features and feature locations of mixed media documents.

The machine-readable medium of claim 29, wherein the process further comprises, prior to receiving the one or more query terms that capture a two-dimensional positional relationship between objects in the search target document:
Receiving the search target document;
Creating an image of at least a patch of the search target document; and
Generating the one or more query terms based on the image.

The machine-readable medium of claim 30, wherein generating one or more query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.

The machine-readable medium of claim 31, wherein calculating at least one mixed media document and location candidate comprises:
Identifying a stored page that is most likely to match the patch of the search target document; and
Calculating a location in the page closest to the center of the patch.

The machine-readable medium of claim 32, wherein each word pair is associated with an inverse document frequency, and identifying a stored page that is most likely to match at least a patch of the search target document comprises:
Adding the inverse document frequency of each word pair to an accumulator indexed by the document page in which the word pair appears; and
Outputting the corresponding document page as matching the patch in response to the maximum value in the accumulator exceeding a threshold.

The machine-readable medium of claim 33, wherein calculating the location in the page closest to the center of the patch comprises:
Adding a weight to each cell in a zone around each word pair, wherein the weight of each cell is determined by the product of the inverse document frequency of the word pair and the normalized geometric distance between the center of the cell and the zone;
Finding the cell with the maximum value in the corresponding accumulator array (Accum); and
Reporting the cell as the location of the patch in response to the maximum value exceeding a threshold.

The machine-readable medium of claim 29, wherein calculating at least one mixed media document and location candidate comprises:
Looking up each of the one or more query terms in the index table to retrieve one or more locations associated with each query term;
Identifying, for each retrieved location, one or more candidate regions that include the location;
Identifying the one of the one or more candidate regions that best matches all of the one or more query terms; and
Upon confirming that the identified candidate region meets a predetermined match criterion, determining that region to match the search target document.