JP6262708B2

JP6262708B2 - Document detection method for detecting original electronic files from hard copy and objectification with deep searchability

Info

Publication number: JP6262708B2
Application number: JP2015255694A
Authority: JP
Inventors: カークテク，
Original assignee: Konica Minolta Laboratory U.S.A., Inc.
Priority date: 2014-12-31
Filing date: 2015-12-28
Publication date: 2018-01-17
Anticipated expiration: 2035-12-28
Also published as: CN105740317B; CN105740317A; JP2016129021A

Description

The present invention relates to objectification with deep searchability and to a document detection method for detecting an original electronic file from a hard copy.

Native electronic files allow users to edit documents easily using a variety of options and functions. A native file may be converted to a different file format (that is, a non-native file). In general, however, a document is less editable in a non-native format. Specifically, when using the native file, the user can edit individual cells of a table in a word processing document. When using a non-native format of the file, however, the user's ability to edit the table is limited. For example, the user may be unable to edit individual cells and may be limited to simply choosing where on the page the entire table is placed.

Managing electronic documents is a difficult task for organizations of any size. When users cannot find an original, thousands of hours and millions of dollars are wasted searching for the lost electronic document or recreating it. In some cases, a user may have a physical copy or other non-native copy of a document but be unable to find the original electronic document, which may be stored somewhere on a network drive or in a data repository such as an Enterprise Content Management (ECM) repository. The user can recreate the document, but even a high-quality recreation may not be identical to the original electronic document.

A physical document is one example of a non-native file. Physical documents are found everywhere: in homes, offices, and other environments. Many physical documents are printouts of electronic documents, for example from a word processing application on a computing device. A user may wish to edit a physical document using a computing device. To do so, the user first scans the physical document with a scanner or multifunction peripheral, and the rasterized image is then analyzed and processed by software capable of recognizing objects in the scanned document. For example, common processing such as text recognition and conversion can be performed using optical character recognition (OCR) software. Non-text objects, however, cannot be recognized or edited. Text in the image that is not well defined may likewise be neither recognizable nor editable. In either case, the object is generally treated as a bitmap object or converted from the original scan into a vector format, and it cannot be recognized in its native format.

To find an electronic document, a user may search a network drive or data repository for a string taken from the document text. For example, a user may scan a hard copy and use optical character recognition (OCR) software to run a comparison and find a match on a network drive or in an ECM repository. A simple text search is not always sufficient, however. For example, if the document contains no text, or its text is not well defined, the document cannot be searched because OCR software cannot recognize non-text objects. As another example, if the document contains only very common words, the search may return too many results.

According to one aspect of the present invention, a method for objectifying non-text content including an object in a non-native file, performed by a computing system comprising a computer processor, comprises: objectifying, by the computer processor, the object of the non-text content by determining a tag for recognizing the object in a native file format and creating an objectified object including the object and the tag; generating, by the computer processor and based on the objectified object, metadata including configuration information of the objectified object, at least part of the configuration information being text data searchable by a native application for the native file; and generating, by the computer processor, a new native file including the objectified object to which the metadata is added.

According to one aspect of the present invention, a system for objectifying non-text content including an object in a non-native file comprises a computer processor and an objectification unit executed on the computer processor. The objectification unit objectifies the object of the non-text content by determining a tag for recognizing the object in a native file format and creating an objectified object including the object and the tag; generates, based on the objectified object, metadata including configuration information of the objectified object, at least part of the configuration information being text data searchable by a native application for the native file; and generates a new native file including the objectified object to which the metadata is added.

According to one aspect of the present invention, a computer program includes instructions for objectifying non-text content including an object in a non-native file. The program causes a computer to objectify the object of the non-text content by determining a tag for recognizing the object in a native file format and creating an objectified object including the object and the tag; to generate, based on the objectified object, metadata including configuration information of the objectified object, at least part of the configuration information being text data searchable by a native application for the native file; and to generate a new native file including the objectified object to which the metadata is added.

In general, in one aspect, the invention relates to a method for detecting a document by a computing system comprising a computer processor. The method comprises: receiving, by the computer processor, a scan of a physical copy of a document having a non-text object; determining, by the computer processor, for the non-text object, a first tag for recognizing the non-text object in an original file; generating, by the computer processor and based on the first tag, non-text object metadata including configuration information of the non-text object; searching, by the computer processor and using the generated non-text object metadata, a plurality of electronic documents stored in a data repository, each electronic document including an object and searchable metadata associated with the object; comparing, by the computer processor, the non-text object metadata with the searchable metadata; and providing, by the computer processor, the location of the original file to a user when the non-text object metadata matches the searchable metadata.

In general, in one aspect, the invention relates to a system for detecting a document. The system comprises a data repository storing a plurality of electronic documents, each including an object and searchable metadata associated with the object; a computer processor; and a document locator executed on the computer processor. The document locator receives a scan of a physical copy of a document having a non-text object; determines, for the non-text object, a first tag for recognizing the non-text object in an original file; generates, based on the first tag, non-text object metadata including configuration information of the non-text object; searches the plurality of electronic documents stored in the data repository using the generated non-text object metadata; compares the non-text object metadata with the searchable metadata; and provides the location of the original file to a user when the non-text object metadata matches the searchable metadata.

In general, in one aspect, the invention relates to a computer program including instructions for detecting a document. The program causes a computer to receive a scan of a physical copy of a document having a non-text object; to determine, for the non-text object, a first tag for recognizing the non-text object in an original file; to generate, based on the first tag, non-text object metadata including configuration information of the non-text object; to search, using the generated non-text object metadata, a plurality of electronic documents stored in a data repository, each electronic document including an object and searchable metadata associated with the object; to compare the non-text object metadata with the searchable metadata; and to provide the location of the original file to a user when the non-text object metadata matches the searchable metadata.
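The search, compare, and provide steps recited in these aspects can be illustrated with a minimal Python sketch. It is not the claimed implementation: the repository layout (a mapping from file location to a list of metadata strings), the function names `normalize` and `locate_original`, and the use of exact equality after whitespace and case normalization as the "match" criterion are all assumptions made for illustration.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def locate_original(scan_metadata, repository):
    """Return the locations of documents whose searchable metadata matches.

    scan_metadata: metadata string generated from the scanned non-text object.
    repository:    hypothetical dict mapping a file location to the list of
                   searchable metadata strings stored with that document.
    """
    wanted = normalize(scan_metadata)
    matches = []
    for location, doc_metadata in repository.items():
        # Compare the scan's metadata with each object's metadata in the document.
        if any(normalize(m) == wanted for m in doc_metadata):
            matches.append(location)
    return matches


# Hypothetical repository contents, for illustration only.
repository = {
    "/share/reports/q1.docx": ["pie chart [sectors=3] on page 2"],
    "/share/reports/q2.docx": ["bar chart [bars=4] on page 1"],
}
found = locate_original("Pie  Chart [sectors=3] on page 2", repository)
```

A practical system would likely need a tolerant comparison rather than strict equality, since metadata derived from a scan may differ slightly from the metadata stored with the original.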

Other features of the invention will be apparent from the following description and the appended claims.

FIG. 1 shows a schematic diagram of a system according to the first embodiment of the present invention.

FIG. 2 shows a flowchart according to the first embodiment of the present invention.

FIG. 3 illustrates an example according to the first embodiment of the present invention.

FIG. 4 shows a schematic diagram of a system according to the second embodiment of the present invention.

FIG. 5 shows a flowchart according to the second embodiment of the present invention.

FIG. 6 illustrates an example according to the second embodiment of the present invention.

FIG. 7 illustrates a computing system according to one or more embodiments of the invention.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. For consistency, like elements in the drawings are denoted by like reference numerals.

In the following detailed description of embodiments of the present invention, specific details are set forth to provide a more thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention may be practiced without these specific details. In other instances, well-known elements are not described in detail to avoid unnecessarily complicating the description.

(First embodiment)
Generally, the first embodiment of the present invention provides a method and system for objectification (object definition), that is, for recognizing a non-native document object in its native format. For example, if a user can access a non-native file (such as a printed hard copy of an electronic document) but cannot access the native file, the user's ability to edit and search the document is limited. According to the first embodiment of the present invention, the user can create a new electronic document that can be edited in the native file format and that has deep searchability. Deep searchability allows the existing and/or built-in text search functions of operating systems and/or document programs to run searches using common, continuous descriptions of objects such as images, charts, tables, graphs, and photographs.

According to the first embodiment, a non-native file including an object is obtained. The object is objectified by determining a tag for the object and creating an objectified object from the object and the tag. Metadata may be generated based on the objectified object, and a new native file including the objectified object and the metadata may be generated. The objectified object in the new native file can be edited in its native format, and the metadata can be searched.
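The sequence just described (determine a tag, create the objectified object, generate metadata, generate the new native file) can be outlined in Python. This is a minimal sketch under stated assumptions: the class names `ObjectifiedObject` and `NativeFile`, the helper names, and the placeholder tag decision are all hypothetical, and the description does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectifiedObject:
    """A non-native object paired with the tag determined for it."""
    raw: bytes  # non-native payload, e.g. pixels cropped from a scan
    tag: str    # hypothetical tag name such as "image" or "pie_chart"


@dataclass
class NativeFile:
    """Stand-in for the new native file: objects plus searchable metadata."""
    objects: list = field(default_factory=list)
    metadata: list = field(default_factory=list)


def determine_tag(raw):
    # Placeholder decision; a real system would analyze the object itself.
    return "image"


def generate_metadata(obj):
    # Searchable text describing the objectified object.
    return f"type={obj.tag}; size={len(obj.raw)} bytes"


def build_native_file(raw_objects):
    native = NativeFile()
    for raw in raw_objects:
        obj = ObjectifiedObject(raw=raw, tag=determine_tag(raw))  # objectify
        native.objects.append(obj)
        native.metadata.append(generate_metadata(obj))            # add metadata
    return native


native = build_native_file([b"abc"])
```

Keeping the metadata as plain text alongside the objects reflects the statement that the metadata can be searched, while the objects themselves remain editable.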

Thus, the first embodiment of the present invention provides a document workflow that begins with a non-native file (for example, a printed hard copy, an electronic document in PDF (Portable Document Format), or a scanned image of a printed hard copy) and ends with a new electronic file of objectified non-text content that includes metadata with deep searchability. For example, according to the first embodiment, a user may scan a hard copy of an electronic document, objectify the scanned content, and generate searchable metadata based on the recognized objects. The metadata may be continuous descriptions embedded as searchable, hidden text associated with, or placed around, the recognized objects. As a result, the user obtains an electronic document that can be repurposed and/or deeply searched using natural-language queries. In the first embodiment, "non-text content" may include stylized text, graphical text, and other text that conventional OCR software could not recognize. In other words, "non-text content" may be content that is not recognized as text content when content is classified as either text or non-text.

FIG. 1 is a simplified schematic diagram showing an example of a system according to the first embodiment of the present invention. Specifically, FIG. 1 shows a system (100) that includes a computing device (105), a native file (110), content (115), an object (120), a tag (125), a non-native file (130), non-native content (135), a non-native object (140), a scanner (145), an objectification unit (150), and a server (155). In the first embodiment, the computing device (105) may be any device capable of creating an electronic file, such as a desktop computer, laptop computer, smartphone, or tablet. The computing device (105) includes various components such as a processor, memory, and an input device (none shown). In the first embodiment, the computing device (105) may execute various programs and applications (not shown) that a user can use to create electronic documents. Examples of these programs and applications include word processing programs, slide show programs, spreadsheet applications, and note-taking applications.

A user of a computing device may use these electronic documents to store, share, archive, and search information. The documents are stored in files, temporarily or permanently. Many different file formats exist. Each file format defines how the content of a file is encoded; that is, the content of a file is read and displayed according to its file format. Some file formats are used primarily for creating and/or editing documents, while others are used primarily for other purposes, such as sharing documents with others. Examples of file formats include Office Open XML (OOXML) and PDF.

A user may convert a document from one file format to another, for example from an OOXML document to a PDF document. A user may also print a physical copy of an electronic document. Such operations can cause various features of the native file format to be lost. These features are generally invisible to the user, but their loss can have serious consequences, such as reduced editability of the file. As described in detail below, however, the first embodiment of the present invention can mitigate these consequences.

Still referring to FIG. 1, according to the first embodiment, the native file (110) is an electronic document in the original file format in which the document was created. The native file (110) may be in any file format, whether now known or later developed. The native file (110) is stored on the computing device (105) or in any other suitable location. According to the first embodiment, the native file (110) may be converted into a file of another format, such as the non-native file (130). The native file (110) includes data such as the content (115), which is displayed when a user views the native file with the program used to create it.

In the first embodiment, the content (115) includes any content found in an electronic document, including but not limited to text, photographs, tables, charts, images, and formulas. In the first embodiment, the content (115) includes one or more objects (120). An object (120) may be text, a graphic image, or any other displayable portion of the content (115). Graphic images may include bitmap images and vector graphic images. For example, a graphic image may be stylized text (such as word art), a chart, a photographic image, or another graphic. Because conventional techniques such as binarization, word segmentation, and OCR apply to text objects, the following description omits the details of the case in which content is determined to be a text object.

According to the first embodiment, an object (120) determined to be a non-text object is delimited by one or more hidden tags (125). Specifically, the tags (125) set configuration information for one or more objects, and this information includes format information and type information. At least part of the configuration information may be text data searchable by the native application for the native file. The format indicates how the object is displayed and includes color, size, shading, image file name (for example, puppy.jpg), and other such information. The type indicates what the object is. For example, types may include a particular kind of chart, word art, image, table, clip art, bulleted list, and other types.

Continuing with the tags (125), a pie chart object (that is, an object corresponding to a pie chart), for example, may be delimited from the rest of the file by a hidden start tag and a hidden end tag that identify the pie chart. Within the pie chart object, there may be tags defining the size and color of each sector of the pie chart. A bar chart object may likewise be delimited from the rest of the original file by a hidden start tag and a hidden end tag that identify the bar chart. Text may be delimited by tags specifying the font size, font name, font color, and other properties of the text. Those skilled in the art will appreciate from this disclosure that many different tags may exist and that the invention is not limited to the examples above.
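As an illustration of a pie chart object delimited by a start tag and an end tag, with nested tags giving each sector's size and color, the following Python sketch builds such a structure using the standard `xml.etree.ElementTree` module. The element and attribute names (`pieChart`, `sector`, `size`, `color`) are hypothetical and are not taken from OOXML or any other real file format.

```python
import xml.etree.ElementTree as ET


def build_pie_chart(sectors):
    """Build a tagged pie chart object.

    sectors: list of (percent, color) tuples, one per sector of the chart.
    """
    chart = ET.Element("pieChart")  # serialized as a start tag and an end tag
    for percent, color in sectors:
        # One nested tag per sector, carrying its size and color.
        ET.SubElement(chart, "sector", size=str(percent), color=color)
    return chart


chart = build_pie_chart([(60, "blue"), (40, "red")])
xml_text = ET.tostring(chart, encoding="unicode")

# Reading the tags back recovers the per-sector configuration information.
recovered = [(s.get("size"), s.get("color")) for s in ET.fromstring(xml_text)]
```

Printing or converting the document discards this markup, which is why the tags must later be re-derived from the appearance of the object alone.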

According to the first embodiment, the non-native file (130) is a copy of the native file (110) in a file format different from that of the native file (110). The non-native file (130) may be in any file format. For example, in the first embodiment, the non-native file (130) is a printout or physical copy of the native file (110), or a scanned image of a printout. In the first embodiment, the non-native file (130) looks the same, or nearly the same, as the native file (110). However, some data, such as the tags (125), may have been lost when the native file (110) was printed or converted into the non-native file (130). In FIG. 1, a dotted line connects the non-native file (130) and the native file (110), indicating that they are the "same" file.

In the first embodiment, the non-native content (135) is a non-native copy of the content (115). Like the content (115), the non-native content (135) may include any content, including but not limited to photographs, tables, charts, and images. In the first embodiment, the non-native content (135) includes one or more non-native objects (140). A non-native object (140) is a non-native copy of an object (120). Importantly, the non-native object (140) is not associated with any tags, because the tags are lost during conversion and/or printing.

In the first embodiment, the scanner (145) is a scanner or other device with scanning capability, such as a multifunction peripheral (MFP). The scanner (145) includes a variety of components, including but not limited to a processor, memory, a display, and an input device. The scanner (145) provides any functionality commonly associated with scanners and/or MFPs, including optically scanning a document and converting it into a digital image, performing optical character recognition (OCR), and rasterizing images. The scanner (145) may generate documents in a variety of file formats and/or resolutions. The scanner (145) is communicatively connected to the server (155) via a wired and/or wireless connection. The scanner (145) may optionally be connected to other devices, such as a personal computer, tablet, or smartphone.

In the first embodiment, the scanner (145) is capable of executing the objectification unit (150). The objectification unit (150) is a program or module for objectifying objects in a non-native file. As shown in FIG. 1, the objectification unit (150) runs on the scanner (145) and/or the server (155). Specifically, the objectification unit (150) has functionality to determine the tags for an object, create an objectified object, generate metadata, and generate a new native file.

In the first embodiment, the objectification unit (150) has functionality to determine the tags for an object in any suitable manner. The object may be provided to the objectification unit (150) via a scanner, server, or other computing device, or may be detected by the objectification unit (150) using any method now known or later developed. Once the object has been identified, the objectification unit (150) analyzes it to determine the tags that were, or may have been, originally associated with the object when it was part of the native file. In the first embodiment, the objectification unit (150) may compare the object against various templates, each of which represents one or more tags. If a match or near match is found, the tags associated with the template may be used to objectify the object. In the first embodiment, the objectification unit (150) may make a best guess as to which tags are associated with the object. Alternatively, the objectification unit (150) may use any other suitable method to determine which tags are associated with the object.
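One way to picture the template comparison is the following Python sketch. It assumes, purely for illustration, that each object and each template has already been reduced to a set of named visual features, and it scores a "match or near match" with Jaccard similarity against a threshold. The feature names, the `TEMPLATES` table, the threshold value, and the choice of Jaccard similarity are all assumptions; the description leaves the comparison method open.

```python
# Hypothetical templates: tag name -> set of visual features expected for it.
TEMPLATES = {
    "pie_chart": {"circular_outline", "radial_lines", "filled_sectors"},
    "bar_chart": {"axes", "rectangles", "filled_regions"},
    "table": {"grid_lines", "rectangles", "text_cells"},
}


def jaccard(a, b):
    """Similarity of two feature sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def determine_tag(features, templates=TEMPLATES, threshold=0.5):
    """Return the tag of the best-matching template, or None if nothing is close."""
    best_tag, best_score = None, 0.0
    for tag, template in templates.items():
        score = jaccard(features, template)
        if score > best_score:
            best_tag, best_score = tag, score
    # Accept exact and near matches; reject weak ones.
    return best_tag if best_score >= threshold else None


tag = determine_tag({"circular_outline", "radial_lines"})
```

Here the scanned object shares two of the three pie chart features (similarity 2/3), so it is treated as a near match. Lowering the threshold would move the function toward the "best guess" behavior mentioned above.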

In the first embodiment, the objectification unit (150) has a function of creating an objectified object. The objectified object is similar or identical to the original object in the native file. The objectified object may be created by inserting tags at the appropriate positions in the electronic file. Alternatively, the objectified object may be created by any other object recognition or pattern matching method now known or later developed.

In the first embodiment, the objectification unit (150) has a function of creating metadata about the objectified object. The metadata of an objectified object may describe how the object is configured in the native file format. For example, it may include the type of the object, the format of the object, the position of the object, and/or other characteristics and/or descriptions of the object. For example, the position of each object may be specified by the page number of the page on which the object is located and its x-y coordinates on that page. In the first embodiment, the metadata is not used to render the object for display. That is, the metadata may be used solely for informational purposes such as searching. In the first embodiment, some or all of the metadata is hidden from the user.
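As a minimal sketch of what such a metadata record might hold, the following assumes a simple in-memory structure. The class name `ObjectMetadata`, its field names, and the `search_terms` helper are hypothetical; the description only states that type, format, and position (page number plus x-y coordinates) may be recorded and that the metadata serves searching rather than rendering.

```python
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    object_type: str     # what the object is, e.g. "pie chart"
    formatting: dict     # how it looks, e.g. {"colors": "blue green red"}
    page: int            # page number on which the object is located
    x: float             # x coordinate on that page
    y: float             # y coordinate on that page
    hidden: bool = True  # metadata may be hidden from the user

    def search_terms(self):
        """Flatten the record into plain-text terms usable by a text search."""
        terms = [self.object_type]
        terms.extend(str(value) for value in self.formatting.values())
        terms.append(f"page {self.page}")
        return terms
```

The record deliberately carries no drawing instructions, which mirrors the statement that the metadata is informational only.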

In the first embodiment, the objectification unit (150) has a function of generating a new native file. The new native file may be generated by any method now known or later developed. Specifically, the new native file contains the objectified objects. The new native file may have many or all of the features (for example, tags) of the original native file that were lost when the native file was converted to a non-native file. In the first embodiment, the new native file may instead be generated by other components, such as other software running on the server (155) or the scanner (145). As a result, the user can edit the non-text content of the document with the program that generated the original native file (for example, a Microsoft Office program) without having to recreate that content.

In the first embodiment, the server (155) is a server, rack, desktop computer, laptop computer, or other computing device capable of implementing the objectification unit (150). The server (155) may have a variety of different configurations, and the present invention is not limited to the configuration shown in FIG. 1.

FIG. 2 illustrates a flowchart according to the first embodiment of the present invention. Although the steps of the flowchart are presented and described sequentially, those skilled in the art will appreciate that some or all of the steps may be performed in a different order and that some or all of the steps may be performed in parallel. In the first embodiment of the present invention, one or more of the steps described below may also be omitted, repeated, and/or performed in a different order. Furthermore, additional steps not described below may be performed without departing from the scope of the present invention. Accordingly, the specific arrangement of steps shown in FIG. 2 should not be construed as limiting the scope of the invention.

In step 200, a non-native file containing an object is obtained. In the first embodiment, the non-native file is a printout or physical copy of an electronic document. Specifically, a user who holds the physical copy may not have access to the native electronic original or copy of the document, but wants to edit the document on his or her computer without manually recreating it. In the first embodiment, the non-native file may be obtained from a scanner. Alternatively, the non-native file may be an electronic file in a file format different from that of the native file. In the first embodiment, the non-native file may be obtained from memory, a data repository, or any other suitable source.

In the first embodiment, the user may provide input about the presumed native file format. For example, if the user believes that the physical document was originally created with "word processing program A", the user may supply that information as additional input. This additional input may help in determining, in step 205, which tags should be associated with the object and which file format should be created.

In step 205, tags are determined for the object. A tag defines at least part of the object and may be determined by any suitable method. In the first embodiment, the object may be compared with templates to determine whether it resembles a known tagged object. Alternatively, the tags may be determined by a best-guess algorithm, by user input, or by any other suitable method.

In step 210, an objectified object (for example, an OOXML object) is created. The objectified object includes the object and one or more tags of the object. That is, the objectified object is a partial or complete reconstruction of the object as it existed in the native file. An objectified object lets the user edit the object with more functionality than a "simple" object allows. For example, without the present invention, if the object is a circle, scanning stores the circle as a rasterized image, and the user has access only to basic editing functions such as choosing its position on the page or simple resizing. With an objectified circle, the user can edit the color of the circle, adjust the line weight, add a pattern, and so on.

In step 215, metadata is generated. All objectified content is analyzed by a known search algorithm, and metadata is added in the vicinity of each object. The metadata may be generated by any method now known or later developed and may be based on the objectified object. In the first embodiment, the metadata may describe characteristics of the object such as its type, size, color, position, and shape.

In step 220, a new native file containing the objectified object and the metadata is generated. The new native file may be generated by any method now known or later developed. The new native file is a new electronic document that is generated from the non-native file and to which the tags and metadata determined for, or by means of, the objectified object have been added. Advantageously, the objectified object and its attached metadata give the user more functionality in the new native file, such as deep searchability and improved editability of the objectified object.

In step 225, the new native file is edited and/or searched. As indicated by the dotted line, step 225 is optional and may be performed at any time using any computing device. The editing and/or searching may be performed by any method now known or later developed.
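Steps 200 to 225 can be outlined in code as follows. This is a rough sketch, not the patented implementation: the dictionary layout of the new file, the function names `build_native_file` and `search`, and the idea of passing tag determination in as a callable are assumptions made for illustration.

```python
def build_native_file(detected_objects, determine_tags):
    """Turn objects found in a non-native file into a tagged, searchable file."""
    new_file = {"objects": [], "metadata": []}
    for obj in detected_objects:                      # objects from the step 200 input
        tags = list(determine_tags(obj))              # step 205: determine tags
        new_file["objects"].append(                   # step 210: objectified object
            {"content": obj, "tags": tags}
        )
        new_file["metadata"].append(", ".join(tags))  # step 215: text metadata
    return new_file                                   # step 220: new native file


def search(new_file, term):
    """Step 225 (optional): indices of objects whose metadata mentions the term."""
    needle = term.lower()
    return [i for i, text in enumerate(new_file["metadata"]) if needle in text.lower()]
```

Keeping the metadata as plain text alongside the objects is what lets an ordinary text search reach non-text content in this sketch.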

FIG. 3 shows an example according to the first embodiment of the present invention. The example is for illustration only and does not limit the scope of the present invention. Specifically, FIG. 3 illustrates the process of converting a paper file (300) into an electronic document with deep searchability and improved editing functions. Initially, the user has a paper file (300). The paper file (300) is a single sheet of paper with a large black triangle printed at the center of the page. The user wants to edit the paper file (300) on his or her computer but does not want to spend time recreating the file manually. The user knows that the paper file (300) was originally an electronic document created on a computer, but does not know where the electronic copy is.

The user therefore places the paper file (300) on the scanner (305) and scans it to create a new native file (310). After scanning the paper file (300), the scanner (305) detects that an object is present, namely the black triangle. The scanner (305) objectifies the black triangle by determining which tags are associated with a black triangle in the selected native file format. The tags can be seen in the view of the new native file format (315). Specifically, the scanner (305) determines that the tag <triangle:black> should be associated with the black triangle. The tag <triangle:black> is therefore included in the new native file (310), although the tag is not visible to the user. With this tag, the user can edit the black triangle in the new native file (310) with the same features and functions that were available when the black triangle was created in the original native file (which the user cannot access).

The scanner (305) also uses the objectified content to generate metadata about the object. Specifically, the metadata in this example describes the object as: triangle, black, equilateral. These keywords can be defined arbitrarily, however, and changing or refining them makes it possible to better capture what a user would search for to find a particular type of object. Details of how such keywords are defined and associated with recognized objects are described in US 2014/0258258, which is incorporated herein by reference. The metadata is embedded in the new native file (310) and is not visible to the user. The user can nevertheless search for the new native file (310) using the metadata terms (in general, hidden text can be found with standard application or OS search tools). Thus, if the user does not know where the new native file (310) was saved, searching all documents on the computer for the terms "equilateral, black, triangle" will return the new native file (310) in the search results, and such distinctive search terms make it likely to be the top search hit.
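The search in this example can be sketched as a simple keyword ranking. The sketch assumes that each file's hidden metadata has already been extracted into a set of lowercase terms; the file names, the `score` and `search` functions, and the ranking by number of matching terms are hypothetical and stand in for whatever standard search tool is actually used.

```python
def score(document_terms, query):
    """Count how many comma-separated query terms appear in a document's terms."""
    query_terms = {term.strip().lower() for term in query.split(",") if term.strip()}
    return len(query_terms & document_terms)


def search(documents, query):
    """Return names of matching files, best match first."""
    hits = [name for name in documents if score(documents[name], query) > 0]
    return sorted(hits, key=lambda name: score(documents[name], query), reverse=True)


# Hypothetical index: file name -> terms found in its (hidden) metadata and text.
DOCUMENTS = {
    "report.docx": {"quarterly", "black"},
    "new_native_file.docx": {"triangle", "black", "equilateral"},
}
```

With this index, `search(DOCUMENTS, "equilateral, black, triangle")` lists `new_native_file.docx` first because all three terms match, which illustrates why distinctive metadata terms tend to produce a top hit.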

(Second embodiment)
In general, the second embodiment of the present invention provides a method and system for document detection. Specifically, the second embodiment allows a user to detect or locate an original electronic document from a copy of it, by creating searchable metadata about the non-text objects in a non-native copy such as a scanned hard copy of the electronic document. According to the second embodiment, for example, the user can use the searchable metadata as part of a text query that compares terms between an electronic document and a hard copy derived from it. This increases the likelihood that the user will find the original electronic document, which can then be edited, modified, printed, archived, and so on. In this specification, the terms physical copy, hard copy, paper copy, printout, and physical file are used interchangeably.

In the second embodiment, a scan of a physical copy of a document containing a non-text object is received. One or more tags are determined for the non-text object, and metadata is generated from those tags. The data repository that stores electronic documents is then searched using the non-text object metadata, and the original document is found. Optionally, character strings already present in the document, such as the title, headings, or other content of the electronic document, may be used in addition to the metadata. Once the original document is detected, its location is provided to the user.

FIG. 4 illustrates a system (400) that includes a data repository (405), an electronic document (410), an object (415), a tag (420), metadata (425), a physical copy (430), a non-text object (435), a scanner (440), a document locator (445), and a computing device (450). In the second embodiment, the data repository (405) is a memory, hard drive, database, network drive, and/or one or more storage devices provided in one or more devices. The data repository (405) may be a component of an enterprise content management (ECM) system. The data repository (405) may be of any size and may be accessed by any number of users. In the second embodiment, the data repository may grant users different levels of permission: some users may have full access to every file stored in the data repository, while other users may access only a limited set of files. The data repository (405) stores the electronic document (410).

In the second embodiment, the electronic document (410) is an electronic file stored in the data repository (405). Electronic documents (410) are used by users of computing devices to store, share, archive, and search information. Such documents are stored in files, temporarily or permanently. Many different file formats exist. Each file format defines how the content of a file is encoded; that is, the content of a file is read and displayed according to its file format. Some file formats are used mainly for creating and/or editing documents, while others are used mainly for other purposes, such as sharing documents with other people. Examples of file formats include Office Open XML (OOXML) and PDF.

A user may convert a document in one file format into a document in another file format, for example an OOXML document into a PDF document. A user may also print a physical copy of an electronic document. In either case, features of the native file format may be lost. Such features are generally invisible to the user, but losing them can have significant consequences, such as reducing the ability to edit the file or otherwise change its content. The electronic document (410) includes an object (415), a tag (420), and metadata (425). The electronic document (410) may be created by any suitable program, such as a word processing program, a note-taking program, a spreadsheet program, or a slide show program.

In the second embodiment, the object (415) may be text, a graphic image, or other displayable content. Graphic images may include bitmap images and vector graphic images. For example, a graphic image may be stylized text (for example, word art), a chart, a photographic image, or another graphic.

In the second embodiment, the object (415) is delimited by one or more hidden tags (420). Specifically, the tags (420) set configuration information for one or more objects, including format information and type information. The format indicates how the object is displayed; it includes color, size, shading, image file name (for example, puppy.jpg), and other such information. The type indicates what the object is. For example, the type may be a particular kind of chart, word art, text, image, table, clip art, bulleted list, or another type.

Continuing with the tags (420): a pie chart object (that is, an object corresponding to a pie chart), for example, may be separated from the rest of the file by a hidden start tag and a hidden end tag that identify the pie chart. Inside the pie chart object there may be tags that define the size and color of each sector of the pie chart. A bar graph object may likewise be separated from the rest of the original file by a hidden start tag and a hidden end tag that identify the bar graph. Text may be delimited by tags that specify the font size, font name, font color, and other properties of the text. Those skilled in the art will appreciate from this disclosure that many kinds of tags may exist and that the present invention is not limited to the above examples.
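The tag structure described above can be illustrated with a small XML fragment and Python's standard `xml.etree.ElementTree` parser. The element and attribute names (`pie-chart`, `slice`, `size`, `color`, and so on) are invented for this sketch and are not actual OOXML markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: start and end tags delimit each object, and nested
# tags define the size and color of each sector of the pie chart.
SAMPLE = """<document>
  <text font-name="Serif" font-size="12" font-color="black">Sales by region</text>
  <pie-chart>
    <slice size="50" color="blue"/>
    <slice size="30" color="green"/>
    <slice size="20" color="red"/>
  </pie-chart>
</document>"""


def describe_pie_charts(xml_text):
    """Return, for each pie chart, a list of (size, color) pairs for its sectors."""
    root = ET.fromstring(xml_text)
    charts = []
    for chart in root.iter("pie-chart"):
        sectors = [(int(s.get("size")), s.get("color")) for s in chart.findall("slice")]
        charts.append(sectors)
    return charts
```

Because the chart survives as structured tags rather than pixels, each sector's size and color remain individually readable and editable.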

In the second embodiment, the metadata (425) is data stored in the electronic document about the objects in that document. The metadata (425) may be in a searchable text format. The metadata (425) of an object may describe how the object is configured in the file format. For example, it may describe the type of the object, the format of the object, the position of the object, and/or other characteristics and/or descriptions of the object. For example, the position of each object may be given by the page number of the page on which the object is located and its x-y coordinates on that page. In the second embodiment, the metadata (425) is not used to render the object for display. That is, the metadata may be used solely for informational purposes such as searching. The search may be performed with any text search tool, program, and/or method now known or later developed, or with a dedicated method or application. In the second embodiment, some or all of the metadata is hidden from the user (for example, as hidden text). In the second embodiment, the metadata (425) is arbitrary and may be defined according to rules created by a software developer, a user, a software publisher, or another appropriate entity. For example, the metadata for a blue square may be defined by the software developer as "blue, square". The user may optionally modify which metadata is associated with the blue square, for example by adding the term "company logo".
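The rule-based definition of metadata, including the blue square example, can be sketched as follows. The rule table `DEFAULT_RULES`, the function `metadata_for`, and the way user additions are passed in are hypothetical; the description only says that developers define default terms and that users may modify them.

```python
# Hypothetical developer-defined rules: (color, shape) -> default metadata terms.
DEFAULT_RULES = {
    ("blue", "square"): ["blue", "square"],
}


def metadata_for(color, shape, user_terms=None):
    """Build the searchable metadata text for an object.

    user_terms optionally maps (color, shape) to extra terms added by the user.
    """
    key = (color, shape)
    terms = list(DEFAULT_RULES.get(key, [color, shape]))
    if user_terms:
        terms.extend(user_terms.get(key, []))
    return ", ".join(terms)
```

For instance, `metadata_for("blue", "square")` gives the developer default, and passing `{("blue", "square"): ["company logo"]}` as `user_terms` appends the user's own term.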

In the second embodiment, the physical copy (430) is a paper copy of an electronic document. The physical copy (430) may be printed by any method now known or later developed, and may be printed so as to match as closely as possible the way the corresponding electronic document is displayed on a computing device. The physical copy (430) may also differ substantially from the way the electronic document is displayed on a computing device. For example, the text font used in the document may not be printable, or the page margins may need to be adjusted. The physical copy (430) may include any content of the original electronic document and, in the second embodiment, includes a non-text object (435).

In the second embodiment, the non-text object (435) is a non-text object printed on a page of the physical copy (430). The non-text object (435) may be of any of the kinds described above for the object (415). For example, the non-text object (435) may be a red circle, a blue triangle, a photograph, and so on. A non-text object (435) may contain text as a component (for example, a column in a table), but it is not simple text: it has additional formatting or other characteristics.

In the second embodiment, the scanner (440) is a scanner or another device with a scanning function, such as a multifunction peripheral (MFP). The scanner (440) may have a variety of components, including but not limited to a processor, memory, a display, and input devices. The scanner (440) includes any functions commonly associated with scanners and/or MFPs, including optically scanning a document, converting the document into a digital image, performing optical character recognition (OCR), and rasterizing images. The scanner (440) may generate documents in a variety of file formats and/or resolutions. The scanner (440) is communicably connected to the data repository (405) and/or the computing device (450) via a wired and/or wireless connection such as the Internet.

In the second embodiment, the scanner (440) has a function of executing the document locator (445). The document locator (445) is a program or module for locating documents. As shown in FIG. 1, the document locator (445) may be executed on the scanner (440), the computing device (450), and/or any other suitable device. Specifically, the document locator (445) has functions of receiving a scan of a physical copy of a document, determining the tags of an object, generating metadata, determining permissions, finding the electronic document, and providing the electronic document to the user.

In the second embodiment, the document locator (445) has a function of receiving a scan of a physical copy of a document in any format now known or later developed and at any resolution. The document locator (445) may receive the scan from the scanner itself or from another computing device. Once the scan is received, the document locator (445) has a function of determining the tags of an object. The object may be identified within the scanned document by the scanner or another computing device, or by the document locator (445) in any manner now known or later developed. Once the object is identified, the document locator (445) analyzes it to determine which tags were originally associated with it, or may have been associated with it, when the object was part of the electronic file. In the second embodiment, the document locator (445) may compare various kinds of templates with the object. Each template may represent one or more tags. If there is a match or a close match, the tags associated with that template may be used to objectify the object. In the second embodiment, the document locator (445) may make a best guess as to which tags should be associated with the object. Alternatively, the document locator (445) may use any other suitable method to determine which object a tag should be associated with.

In the second embodiment, the document locator (445) has a function of generating metadata. The metadata may be generated by any method now known or later developed. Specifically, the metadata is based on the tags that the document locator (445) determines should be associated with the object, and it describes any characteristics of the object, such as size, dimensions, color, pattern, and position. In the second embodiment, the metadata is in text form so that existing search functions can be used to find the electronic copy of the document quickly.

In the second embodiment, the document locator (445) has a function of finding the electronic document. The electronic document may be found by any method now known or later developed, for example by a text search. The text used in the search is some or all of the metadata about the object. Optionally, the text used in the search may include ordinary text present in the electronic document. Provided that the electronic documents being searched were preprocessed to include metadata about their objects, such a text search can return the electronic documents that may match a given physical copy, even if the document contains little or no text. In other words, by using the metadata as search terms, the document locator (445) can find the electronic document efficiently on the basis of the objects found in it (for example, a pie chart containing blue, green, and red at the center of page 3), rather than on the basis of text that appears frequently in many other documents.

In the second embodiment, the document locator (445) has functionality to determine permissions. Permissions control who may view, modify, and/or access an electronic document. Holding a paper copy of a document does not necessarily mean that the user is authorized to access the electronic version of the file. Therefore, before the document locator (445) tells the user where the electronic copy of the document is located, it determines the user's permissions to ensure that the user is allowed to access the file. In the second embodiment, the document locator (445) may require the user to log in, provide a password, or otherwise identify themselves in order to determine the user's permissions. Alternatively, the document locator (445) may determine and check permissions by any other suitable method.

In the second embodiment, the document locator (445) has functionality to provide the located electronic document to the user. The located electronic document may be provided to the user in various ways. In the second embodiment, the located electronic document may be emailed to the user. Alternatively, the file name and/or location may be shown on the scanner's display, printed, emailed to the user, and so on. Those skilled in the art will appreciate from this disclosure that there are many ways to provide the located document to the user, and the invention is therefore not limited to the examples above.

In the second embodiment, the computing device (450) may be any device capable of creating electronic files, such as a desktop computer, laptop computer, smartphone, or tablet. The computing device (450) includes various components, such as a processor, memory, and input devices (not shown). In the second embodiment, the computing device (450) may run various programs/applications (not shown) that a user can use to create electronic documents, such as word processing programs, slide show programs, spreadsheet applications, and note-taking applications. In the second embodiment, the computing device (450) may store, modify, or access electronic documents stored in the data repository (405). As described above, the computing device (450) may also execute the document locator (445).

FIG. 5 shows a flowchart in accordance with the second embodiment of the invention. Although the steps of the flowchart are presented and described sequentially, those skilled in the art will appreciate that some or all of the steps may be executed in a different order and that some or all of the steps may be executed in parallel. Further, in the second embodiment of the invention, one or more of the steps described below may be omitted, repeated, and/or performed in a different order. Additional steps not described below may also be performed without departing from the scope of the invention. Accordingly, the specific arrangement of steps shown in FIG. 5 should not be construed as limiting the scope of the invention.

In step 500, the electronic documents in the data repository are processed to include metadata about the objects they contain. As indicated by the dotted line, an electronic document may be processed at any suitable time: for example, when it is saved to the data repository, on a predetermined schedule (for example, processing new documents once a week), or at any other suitable time. Processing an electronic document generates metadata about the objects in the document and stores that metadata in the document. As a result, the electronic copy of a physical document can be searched for using metadata about its objects.

Specifically, an electronic document may be processed as described in steps 510 and 515. That is, a tag is determined for every object in the electronic document, and metadata is generated based on the object and its tag. The generated metadata is then stored in the electronic document, for example in an invisible text layer or in any other suitable manner. Whether or not it is visible to the user, storing the metadata as text allows it to be searched with existing text search. In the second embodiment, the tags for the objects may also already exist in the electronic document (that is, the document is in its native format). In that case, the metadata is generated from the existing tags and stored in the electronic document.
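As a rough illustration of the preprocessing in step 500, the sketch below stores one text metadata entry per tagged object in a hidden text layer. The class names, the `hidden_text_layer` field, and the example path are hypothetical and are not taken from the patent; how a real file format stores invisible text is outside the scope of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaggedObject:
    tag: str                    # e.g. "pie chart"; already present in a native file
    features: Tuple[str, ...]   # descriptive terms such as colors or position


@dataclass
class ElectronicDocument:
    path: str
    body_text: str
    objects: List[TaggedObject] = field(default_factory=list)
    hidden_text_layer: List[str] = field(default_factory=list)  # assumed storage


def add_object_metadata(doc: ElectronicDocument) -> None:
    """Write one plain-text metadata entry per object into the hidden layer."""
    entries = [", ".join((obj.tag, *obj.features)) for obj in doc.objects]
    # Rebuild instead of appending so a scheduled re-run does not duplicate entries.
    doc.hidden_text_layer = entries


def searchable_text(doc: ElectronicDocument) -> str:
    """Text that an ordinary text search would see: body plus metadata."""
    return "\n".join([doc.body_text, *doc.hidden_text_layer])
```

Because the metadata is ordinary text, a document with an empty body still becomes findable by the terms that describe its objects.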

In step 505, a scan of a physical copy of a document containing an object is received. The scan may be received in any format now known or later developed, at any resolution and/or size. The scan may be received by a program or application running on the scanner itself or on another computing device.

In step 510, a tag is determined for the object. The tag defines at least a portion of the object and may be determined by any suitable method. In the second embodiment, the object may be compared with templates to determine whether it resembles a known tagged object. Alternatively, the tag may be determined by a best-guess algorithm, by user input, or by any other suitable method.
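The template comparison in step 510 could be sketched as a nearest-template lookup. The patent does not say how objects and templates are compared, so everything here is an assumption made for illustration: the two shape features (circularity, computed as 4πA/P², and aspect ratio), the template values, and the distance threshold.

```python
import math
from typing import Dict, Optional, Tuple

Features = Tuple[float, float]  # (circularity, aspect ratio); hypothetical choice

# Hypothetical templates: tag -> features of a known tagged object.
TEMPLATES: Dict[str, Features] = {
    "circle": (1.00, 1.0),
    "square": (0.79, 1.0),          # circularity of a square is pi/4
    "wide rectangle": (0.59, 3.0),  # circularity of a 3:1 rectangle is 3*pi/16
}


def determine_tag(
    features: Features,
    templates: Dict[str, Features] = TEMPLATES,
    max_distance: float = 0.15,
) -> Optional[str]:
    """Return the tag of the closest template, or None if nothing is close enough."""
    best_tag, best_distance = None, math.inf
    for tag, template in templates.items():
        distance = math.dist(features, template)  # Euclidean distance
        if distance < best_distance:
            best_tag, best_distance = tag, distance
    return best_tag if best_distance <= max_distance else None
```

Returning `None` when no template is close leaves room for the fallbacks the text mentions, such as a best-guess algorithm or user input.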

In step 515, metadata is generated based on the object and its tag. The metadata includes configuration information for the object and may be generated in any manner now known or later developed. In the second embodiment, the metadata may describe characteristics of the object such as its type, size, color, position, and shape.
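One way to picture step 515 is a function that turns a tagged object into a short text description. The field names, the size thresholds, and the three-by-three position grid below are assumptions chosen for illustration; the patent leaves the generation rules open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedObject:
    tag: str               # determined in step 510, e.g. "circle"
    color: str
    width_fraction: float  # object width divided by page width
    center_x: float        # 0.0 = left edge, 1.0 = right edge
    center_y: float        # 0.0 = top edge, 1.0 = bottom edge


def size_term(width_fraction: float) -> str:
    """Map relative width to a coarse size word (thresholds are illustrative)."""
    if width_fraction < 0.15:
        return "small"
    if width_fraction < 0.50:
        return "medium"
    return "large"


def position_term(x: float, y: float) -> str:
    """Map a center point to one cell of a 3x3 page grid."""
    horizontal = "left" if x < 1 / 3 else "right" if x > 2 / 3 else "center"
    vertical = "top" if y < 1 / 3 else "bottom" if y > 2 / 3 else "middle"
    if horizontal == "center" and vertical == "middle":
        return "center"
    return f"{vertical} {horizontal}"


def generate_metadata(obj: TaggedObject) -> str:
    """Describe the object as plain text so existing text search can find it."""
    return ", ".join(
        [obj.tag, obj.color, size_term(obj.width_fraction),
         position_term(obj.center_x, obj.center_y)]
    )
```

For a small black circle near the lower left corner of the page, this sketch yields a description of the form "circle, black, small, bottom left".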

In step 520, the data repository is searched using the metadata. Specifically, a text search is performed with search terms drawn from some or all of the metadata. The search terms may also include text content from the document, such as text recognized by OCR or another method. The metadata of the physical copy is thus compared with the metadata of the electronic documents in the data repository, which allows the electronic document to be found even when the document contains little or no text. The text search may be performed by any method now known or later developed. In the second embodiment, all documents in the data repository are searched. Alternatively, only a subset of the electronic documents may be searched, for example only those the user is permitted to access (step 525).
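The text search of step 520 can be approximated with a simple all-terms match over each document's searchable text. This is a minimal sketch: the repository is modeled as a hypothetical mapping from location to text (body text plus stored metadata), the paths are invented, and a real deployment would more likely use an existing search engine.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_repository(query: str, repository: Dict[str, str]) -> List[str]:
    """Return locations of documents whose text contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(
        location
        for location, text in repository.items()
        if terms <= tokenize(text)  # subset test: all terms must be present
    )
```

OCR text from the scan can be appended to the metadata string before calling `search_repository`, which narrows the result set when the document does contain some text.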

In step 525, it is determined whether the user has permission to access the electronic document. Optionally, in the second embodiment, step 525 may be performed before or concurrently with step 520. The user's permissions may specify which electronic documents the user may view, edit, or otherwise access. For example, the user's permissions may be determined by having the user enter a username, password, or other identification and checking whether the user is permitted to access the electronic document and/or the relevant portion of the data repository. If the user is not permitted to access the electronic document, the process ends. Alternatively, in the second embodiment, when the user is not permitted to access the electronic document, some information about the document may still be provided to the user, depending on settings and/or permissions. For example, the user may be told whether a match exists or whether multiple matches exist, and may also be told the permission level required or other suitable information. If the user has permission to access the electronic document, the process proceeds to step 530.
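The permission check of step 525 might look like the following filter over the search results. The access-list structure, the `disclose_match_count` setting, and the `hidden_matches` key are hypothetical; they only illustrate the idea that locations are withheld from unauthorized users while limited information may still be disclosed.

```python
from typing import Dict, List, Set


def filter_by_permission(
    matches: List[str],
    user: str,
    access_list: Dict[str, Set[str]],
    disclose_match_count: bool = False,
) -> Dict[str, object]:
    """Keep only locations the user may access; optionally report how many were withheld."""
    allowed = [loc for loc in matches if user in access_list.get(loc, set())]
    result: Dict[str, object] = {"locations": allowed}
    withheld = len(matches) - len(allowed)
    if disclose_match_count and withheld > 0:
        # Reveals that matches exist without revealing where they are.
        result["hidden_matches"] = withheld
    return result
```

Documents missing from the access list are treated as inaccessible, so the sketch fails closed.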

In step 530, the located electronic document is provided to the user in any suitable manner. Specifically, the location of the electronic document may be indicated by a link (for example, a hyperlink), or its name may be indicated on the scanner's display, in a voice message, in an email, on a printout, and so on. A copy of the electronic document may also be emailed to the user. It will be apparent to those skilled in the art from this disclosure that there are many ways to provide the located electronic document to the user, and the invention is therefore not limited to the examples above.

FIG. 6 shows an example in accordance with the second embodiment of the invention. The example is for explanatory purposes only and is not intended to limit the scope of the invention. Specifically, FIG. 6 shows an example of document discovery. In FIG. 6, a user has a physical copy (600), which is a printout of an electronic document. The physical copy (600) contains a small black circle in the lower left corner of the page. When the user wants to find the electronic copy of the document, the user places the physical copy (600) on the scanner (605) and scans it. The user may simply press a button on the scanner, such as "Find Original", to instruct the scanner to look for the electronic copy of the physical copy (600).

The scanner (605) then processes the physical copy (600) and recognizes that it contains an object, namely the small black circle. The scanner (605) determines a tag for the small black circle and then generates metadata (610) based on the object and the tag. Specifically, the metadata (610) describes the object. The metadata (610) is shown as it would appear in the electronic file, where it may not be visible to the user. The metadata generated in this example is "circle, black, small, bottom left". The metadata and tags generated in this example are based on rules set by any suitable entity. Those rules may be revised over time, so the same object may receive different tags or metadata if it is processed after the rules have changed. Next, a search (615) is performed using the generated metadata. The search terms used in the search (615) are shown as "small black circle, bottom left". The search (615) is run in or against a data repository (620) containing a large number of electronic documents (625). If the electronic documents (625) include an electronic copy of the physical copy (600), the search returns the location of the document or the document itself. Here, the result (630) of the search (615) is: "The electronic copy of the physical document is located in the folder 'Presentation' on network drive Y and has the file name 'Marketing Presentation 2012'." The user can then go to network drive Y and access the electronic document for editing or other purposes.
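The FIG. 6 walkthrough can be traced in a few lines. This is a sketch under stated assumptions: the repository contents and the slash-separated location string are invented for the example, and matching is done by comparing sets of terms, so the order of the search terms does not matter.

```python
import re
from typing import Dict, FrozenSet, Optional


def terms(text: str) -> FrozenSet[str]:
    """Lowercased alphanumeric terms of a metadata string."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def find_original(search_text: str, repository: Dict[str, str]) -> Optional[str]:
    """Return the first location whose stored metadata contains every search term."""
    wanted = terms(search_text)
    for location, stored_metadata in repository.items():
        if wanted <= terms(stored_metadata):
            return location
    return None


def result_message(location: Optional[str]) -> str:
    """Format a result in the spirit of result (630)."""
    if location is None:
        return "No electronic copy of the physical document was found."
    folder, _, file_name = location.rpartition("/")
    return (
        f'The electronic copy of the physical document is located in "{folder}" '
        f'and has the file name "{file_name}".'
    )


# Hypothetical repository: location -> metadata stored during preprocessing.
REPOSITORY = {
    "Network Drive Y/Reports/Budget": "bar chart, blue, large, center",
    "Network Drive Y/Presentation/Marketing Presentation 2012":
        "circle, black, small, bottom left",
}
```

Calling `find_original("small black circle, bottom left", REPOSITORY)` selects the marketing presentation even though the stored metadata lists the same terms in a different order.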

Optionally, in FIG. 6, the scanner (605) may require the user to log in or enter a password upon pressing the "Find Original" button. The login and/or password is used to determine whether the user is allowed to access the electronic copy; if not, the search returns no result even if the electronic copy is found.

Those skilled in the art will appreciate that the invention is not limited to hard copies as the non-native form. The second embodiment of the invention also applies to non-native electronic documents. Consider, for example, a user who has a PDF copy of a document that was originally created in another format. Because the PDF copy lacks the tags that the native document had, which prevents the user from editing the document easily, the user may want to find the original document in order to make changes to it. The PDF copy is analyzed in the same way as a scan of a hard copy: tags for its objects are determined and metadata is generated. A text search of a database of electronic documents may then be performed using the metadata, and optionally other ordinary text, as search terms. If one or more matches are found, their locations are displayed and/or provided to the user in a suitable manner.

Those skilled in the art will also appreciate that the invention is not limited to the examples above. As a further example, a native electronic file that has no tagged objects can benefit from having tags and/or metadata added. Many file formats, including JPEG, lack tags. Thus, if the native file is simply a JPEG image that was later printed, performing the steps above on the JPEG may allow the user to find the original JPEG file. In this example, the user places the printout of the JPEG on the scanner and can successfully find the original electronic JPEG document through a text search on metadata terms, even though the original document is an image.

Embodiments of the invention may be implemented on virtually any type of computing system, regardless of the platform being used. For example, the computing system may be one or more mobile devices (for example, a laptop computer, smartphone, personal digital assistant, tablet computer, or other mobile device), a desktop computer, a server, a blade in a server chassis, or any other type of computing device or devices that includes at least the minimum processing power, memory, and input and output means needed to perform one or more embodiments of the invention. For example, as shown in FIG. 7, the computing system (700) includes one or more computer processors (702), associated memory (704) (for example, RAM, cache memory, or flash memory), one or more storage devices (706) (for example, a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, or a flash memory stick), and numerous other elements and functionality. The computer processor (702) may be an integrated circuit for processing instructions. For example, the computer processor may be one or more cores or micro-cores of a processor. The computing system (700) includes one or more input devices (710), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. The computing system (700) also includes one or more output devices (708), such as a screen (for example, a liquid crystal display (LCD), plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same as or different from the input devices. The computing system (700) may be connected to a network (712) (for example, a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) via a network interface connection (not shown). The input and output devices may be connected locally or remotely (for example, via the network (712)) to the computer processor (702), memory (704), and storage devices (706). Many different types of computing systems exist, and the input and output devices described above may take many forms.

Software instructions in the form of computer readable program code for carrying out the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, diskette, tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor, is configured to perform embodiments of the invention.

Further, one or more elements of the computing system (700) described above may be located at a remote location and connected to the other elements over the network (712). Embodiments of the invention may also be implemented on a distributed system having a plurality of nodes, where each portion of the invention may be located on a different node within the distributed system. In one embodiment of the invention, a node corresponds to a distinct computing device. Alternatively, a node may correspond to a computer processor with associated physical memory, or to a computer processor or micro-core of a computer processor with shared memory and/or resources.

Although the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised that do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the appended claims.

Claims

A method of objectifying non-text content, including an object, in a non-native file by a computing system comprising a computer processor, the method comprising:
determining, by the computer processor, a tag for recognizing the object in a native file format, and creating an objectified object that includes the object and the tag, thereby objectifying the object of the non-text content;
generating, by the computer processor and based on the objectified object, metadata including configuration information of the objectified object, wherein at least part of the configuration information is text data searchable by a native application for a native file; and
generating, by the computer processor, a new native file containing the objectified object with the metadata added thereto.

The method of claim 1, wherein the non-native file is a physical document and the native file is an OOXML file.

The method of claim 1, wherein the objectified object is editable in a native format and the metadata is searchable.

The method of claim 1, wherein the object is a graphic object, and wherein the metadata describes a graphic type of the object and a position of the object in a page of the non-native file.

The method of claim 1, further comprising receiving, by the computer processor, a file format of the new native file from a user, wherein a portion of the tag determination is based on the file format.

The method of claim 1, wherein determining the tag comprises comparing the object with a plurality of templates, each template of the plurality of templates corresponding to a native tag from the file format of the native file.

The method of claim 1, wherein the non-native file is obtained from a scanner.

A system for objectifying non-text content, including an object, in a non-native file, the system comprising:
a computer processor; and
an objectification unit executing on the computer processor,
wherein the objectification unit:
determines a tag for recognizing the object in a native file format and creates an objectified object that includes the object and the tag, thereby objectifying the object of the non-text content;
generates, based on the objectified object, metadata including configuration information of the objectified object, wherein at least part of the configuration information is text data searchable by a native application for a native file; and
generates a new native file containing the objectified object with the metadata added thereto.

The system of claim 8, wherein the non-native file is a physical document and the native file is an OOXML file.

The system of claim 8, wherein the objectified object is editable in a native format and the metadata is searchable.

The system of claim 8, wherein the object is a graphic object, and wherein the metadata describes a graphic type of the object and a position of the object in a page of the non-native file.

The system of claim 8, wherein the objectification unit further receives a file format of the new native file from a user, and a portion of the tag determination is based on the file format.

The system of claim 8, wherein determining the tag further includes comparing the object with a plurality of templates, each template of the plurality of templates corresponding to a native tag from the file format of the native file.

The system of claim 8, further comprising a scanner that acquires the non-native file.

A computer program that includes instructions for objectifying non-text content, including an object, in a non-native file, the instructions causing a computer to:
determine a tag for recognizing the object in a native file format and create an objectified object that includes the object and the tag, thereby objectifying the object of the non-text content;
generate, based on the objectified object, metadata including configuration information of the objectified object, wherein at least part of the configuration information is text data searchable by a native application for a native file; and
generate a new native file containing the objectified object with the metadata added thereto.

The computer program product of claim 15, wherein the non-native file is a physical document and the native file is an OOXML file.

The computer program product of claim 15, wherein the objectified object is editable in a native format and the metadata is searchable.

The computer program product of claim 15, wherein the object is a graphic object, and the metadata describes a graphic type of the object and a position of the object in a page of the non-native file.

The computer program product of claim 15, further comprising instructions for receiving a file format of the new native file from a user, wherein a portion of the tag determination is based on the file format.

The computer program of claim 15, wherein determining the tag comprises comparing the object with a plurality of templates, each template of the plurality of templates corresponding to a native tag from the file format of the native file.

The computer program according to claim 15, wherein the non-native file is obtained from a scanner.

A method for detecting a document by a computing system comprising a computer processor, the method comprising:
receiving, by the computer processor, a scan of a physical copy of a document having a non-text object;
determining, by the computer processor and for the non-text object, a first tag for recognizing the non-text object in an original file;
generating, by the computer processor and based on the first tag, non-text object metadata including configuration information of the non-text object;
searching, by the computer processor and using the generated non-text object metadata, a plurality of electronic documents stored in a data repository, each of which includes an object and searchable metadata associated with the object;
comparing, by the computer processor, the non-text object metadata with the searchable metadata; and
providing, by the computer processor, a location of the original file to a user if the non-text object metadata matches the searchable metadata.

The method of claim 22, further comprising processing, by the computer processor, an electronic document from the plurality of electronic documents stored in the data repository by extracting a second tag for the object in the electronic document, generating the searchable metadata describing the object based on the second tag, and storing the searchable metadata in the electronic document in association with the object.

The method of claim 22, wherein the original file is an OOXML file, and the original file is one of the plurality of electronic documents stored in the data repository.

The method of claim 22, further comprising determining, by the computer processor, whether a user is authorized to access the original file, wherein the location is provided only if it is determined that the user is authorized to access the original file.

The method of claim 22, wherein the location is provided to the user by email.

The method of claim 22, wherein the location is provided by displaying the location on a scanner display.

The method of claim 22, wherein the data repository is part of an enterprise content management (ECM) system.

The method of claim 22, wherein the searching step further comprises using normal text detected in the document via optical character recognition (OCR).

A system for detecting documents, the system comprising:
a data repository that stores a plurality of electronic documents, each including an object and searchable metadata associated with the object;
a computer processor; and
a document locator executing on the computer processor,
wherein the document locator:
receives a scan of a physical copy of a document having a non-text object;
determines, for the non-text object, a first tag for recognizing the non-text object in an original file;
generates, based on the first tag, non-text object metadata including configuration information of the non-text object;
searches the plurality of electronic documents stored in the data repository using the generated non-text object metadata;
compares the non-text object metadata with the searchable metadata; and
provides a location of the original file to a user when the non-text object metadata matches the searchable metadata.

The system of claim 30, wherein the document locator processes an electronic document from the plurality of electronic documents stored in the data repository by extracting a second tag for the object in the electronic document, generating the searchable metadata describing the object based on the second tag, and storing the searchable metadata in the electronic document in association with the object.

The system of claim 30, wherein the original file is an OOXML file, and the original file is one of the plurality of electronic documents stored in the data repository.

The system of claim 30, wherein the document locator determines whether the user has authority to access the original file, and the location is provided only if it is determined that the user has authority to access the original file.

The system of claim 30, wherein the location is provided to the user by email.

The system of claim 30, wherein the location is provided by displaying the location on a scanner display.

The system of claim 30, wherein the data repository is part of an enterprise content management (ECM) system.

The system of claim 30, wherein the searching further comprises using normal text detected in the document via optical character recognition (OCR).

A computer program containing instructions for detecting a document, the instructions causing a computer to:
receive a scan of a physical copy of a document having a non-text object;
determine, for the non-text object, a first tag for recognizing the non-text object in an original file;
generate non-text object metadata including configuration information of the non-text object based on the first tag;
search, using the generated non-text object metadata, a plurality of electronic documents stored in a data repository, each containing an object and searchable metadata associated with the object;
compare the non-text object metadata with the searchable metadata; and
provide a location of the original file to a user when the non-text object metadata matches the searchable metadata.

The computer program of claim 38, further causing the computer to process an electronic document from the plurality of electronic documents stored in the data repository by extracting a second tag for the object in the electronic document, generating the searchable metadata describing the object based on the second tag, and storing the searchable metadata in the electronic document in association with the object.

The computer program according to claim 38, wherein the original file is an OOXML file, and the original file is one of the plurality of electronic documents stored in the data repository.

The computer program of claim 38, further causing the computer to determine whether the user has authority to access the original file, wherein the location is provided only if the user is determined to have authority to access the original file.

The computer program of claim 38, wherein the location is provided to the user by email.

The computer program of claim 38, wherein the location is provided by displaying the location on a scanner display.

The computer program of claim 38, wherein the data repository is part of an enterprise content management (ECM) system.

The computer program of claim 38, wherein the searching further comprises using normal text detected in the document via optical character recognition (OCR).
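
The compare-and-provide steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names (ObjectMetadata, StoredDocument, find_original), the metadata fields, and the use of exact equality as the matching rule are assumptions made for illustration only; the claims do not specify how a match is decided. Scanning, tag determination, and OCR are taken as already done, so the sketch starts from the generated non-text object metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class ObjectMetadata:
    """Configuration information for one non-text object (hypothetical fields)."""
    tag: str       # assumed tag vocabulary, e.g. "table" or "chart"
    rows: int = 0
    cols: int = 0


@dataclass
class StoredDocument:
    """One electronic document in the data repository (hypothetical model)."""
    location: str                     # where the original file is stored
    searchable: List[ObjectMetadata]  # metadata stored when the file was processed
    allowed_users: Set[str] = field(default_factory=set)


def find_original(scanned: ObjectMetadata,
                  repository: List[StoredDocument],
                  user: str) -> Optional[str]:
    """Return the location of the first matching document the user may access.

    Matching is plain equality here, which is an assumption of this sketch.
    """
    for doc in repository:
        if scanned in doc.searchable and user in doc.allowed_users:
            return doc.location
    return None


if __name__ == "__main__":
    # Hypothetical repository entry and scanned-object metadata.
    repo = [
        StoredDocument(
            location="ecm://reports/q3.docx",
            searchable=[ObjectMetadata("table", rows=3, cols=4)],
            allowed_users={"alice"},
        )
    ]
    print(find_original(ObjectMetadata("table", rows=3, cols=4), repo, "alice"))
```

The authorization check sits inside the matching loop so that a location is never returned for a file the user may not access, mirroring the dependent claims on access authority. A practical matcher would likely tolerate recognition errors in the scan rather than require equality.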