JP2016173822A

JP2016173822A - Information processing apparatus, information processing system and program

Info

Publication number: JP2016173822A
Application number: JP2016053997A
Authority: JP
Inventors: 圭輔中沢; Keisuke Nakazawa; 有登柴田; Yuto Shibata; 大介岡田; Daisuke Okada; ゼン顧; Zheng Ko; 暁子北山; Akiko Kitayama; 浩也潤田; Hiroya Uruta; 優香斎藤; Yuka Saito
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-03-17
Filing date: 2016-03-17
Publication date: 2016-09-29
Anticipated expiration: 2036-03-17
Also published as: JP6662132B2

Abstract

PROBLEM TO BE SOLVED: To generate, for a plurality of documents, thumbnail images that let a user easily distinguish the documents from one another and easily grasp the contents of each document.

SOLUTION: An information processing apparatus calculates feature amounts for the image of each page of a document to be stored and for the thumbnail images corresponding to other stored documents (S13, S15); calculates, on the basis of the feature amounts, the similarity between the images among the page images of the document to be stored and the thumbnail images corresponding to the other stored documents (S16); selects, according to a prescribed criterion based on the similarity, a page image of the document to be stored that has high similarity to the other pages of that document and low similarity to the thumbnail images corresponding to the other stored documents (S17 to S20); and creates a thumbnail image corresponding to the document to be stored on the basis of the selected image (S21).

SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, an information processing system, and a program.

Conventionally, in the field of information processing apparatuses and image forming apparatuses, a large number of documents are stored in a storage unit so that a user can select any document from among them and perform processing such as display or printing. When accepting the selection of a document, such apparatuses display document attribute information such as the document name, storage date and time, and print settings, and also display a thumbnail image, a reduced version of the image that appears when the document is displayed or printed, so that the user can more easily grasp the contents of the document.

When creating a thumbnail image, one option is simply to create it from the image of the first page of each document. However, this simple method can produce many similar thumbnail images, making the documents difficult to tell apart, for example when many documents share a common cover format.
Techniques for addressing this problem are known, for example those described in Patent Documents 1 and 2.

Patent Document 1 describes calculating a feature amount for the image data of each page included in a document, comparing each feature amount with a reference feature amount, and generating a thumbnail image based on the image of the page with the highest similarity.
Patent Document 2 describes performing clustering processing on the thumbnail images of a plurality of documents and, when the result contains a group of mutually similar thumbnail images, regenerating the thumbnail image of each document in that group based on the image of the page following the page currently in use.

However, the technique described in Patent Document 1 does not directly consider that thumbnail images may turn out similar across a plurality of documents, and so does not sufficiently meet the demand to prevent similar thumbnail images from being produced.
Further, in the technique described in Patent Document 2, the page used to generate the thumbnail image is determined according to page order, so the thumbnail image does not necessarily represent the contents of each document.

An object of the present invention is to solve these problems and to make it possible to generate, for a plurality of documents, thumbnail images that let a user easily distinguish the documents and easily grasp the contents of each document.

To achieve the above object, the present invention provides an information processing apparatus comprising: accumulation means for accumulating documents; saving means for saving a thumbnail image corresponding to each document accumulated by the accumulation means; feature amount calculation means for calculating feature amounts for the image of each page of one document for which a corresponding thumbnail image is to be created and for the thumbnail images, saved by the saving means, that correspond to documents other than the one document; similarity calculation means for calculating, based on the feature amounts calculated by the feature amount calculation means, the similarity between the images among the page images of the one document and the thumbnail images corresponding to the documents other than the one document; selection means for selecting, according to a predetermined criterion and based on the similarity calculated by the similarity calculation means, a page image of the one document that has high similarity to the other pages in the one document and low similarity to the thumbnail images corresponding to the documents other than the one document; and thumbnail creation means for creating a thumbnail image corresponding to the one document based on the image selected by the selection means.

With the above configuration, thumbnail images can be generated for a plurality of documents such that the user can easily distinguish the documents and easily grasp the contents of each document.

A diagram showing the hardware configuration of an information processing apparatus according to a first embodiment of the present invention.
A diagram showing the functional configuration of the information processing apparatus shown in FIG. 1.
A flowchart of the processing executed when the CPU of the information processing apparatus shown in FIG. 1 detects a document accumulation instruction.
A diagram schematically showing an example of the result of the clustering processing.
A diagram showing another example thereof.
A flowchart of the processing executed when the CPU of the information processing apparatus shown in FIG. 1 detects a thumbnail image creation instruction.
A flowchart of the processing in a second embodiment corresponding to FIG. 3.
A diagram showing an example of management data for stored documents.
A flowchart of the processing in a third embodiment corresponding to FIG. 3.

Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment: FIGS. 1 to 5]
First, a first embodiment of the present invention will be described.
FIG. 1 is a diagram showing the hardware configuration of an information processing apparatus according to the first embodiment of the present invention.
As shown in FIG. 1, the information processing apparatus 10 includes a CPU 11, a ROM 12, a RAM 13, an HDD (hard disk drive) 14, a communication I/F (interface) 15, an operation unit 16, and a display unit 17, which are connected by a system bus 20.

The CPU 11 executes programs stored in the ROM 12 or the HDD 14 using the RAM 13 as a work area, thereby controlling the entire information processing apparatus 10 and realizing various functions, including those described later with reference to FIG. 2.
The ROM 12 and the HDD 14 are non-volatile storage media (storage means) and store the various programs executed by the CPU 11 and the various data described later. The HDD 14 can also be used as the storage means in which documents are accumulated.

The communication I/F 15 is an interface for communicating with external devices via any communication path, such as a LAN (local area network), the Internet, or peer-to-peer communication. The storage means in which documents are accumulated may be provided in an external device that can be reached via the communication I/F 15.

The operation unit 16 is operation means for accepting operations from the user and can be configured with a keyboard and a pointing device such as a mouse.
The display unit 17 is presentation means for presenting the operating state, settings, messages, and the like of the information processing apparatus 10 to the user, and includes a liquid crystal display or the like. The display unit 17 also displays a screen for accepting the selection of a document to be processed using thumbnail images. Operations on that screen can be accepted by the operation unit 16.

The operation unit 16 and the display unit 17 may be external. If the information processing apparatus 10 does not need to receive operations directly from the user (that is, if operations can be accepted and information presented by an external device connected via the communication I/F 15), the operation unit 16 and the display unit 17 may be omitted.

In terms of hardware, the information processing apparatus 10 described above can be a general-purpose computer. However, as indicated by the broken lines in FIG. 1, it can also be provided with a scanner engine 18, which is image reading means that reads the image of an original and acquires image data, and a printer engine 19, which is image forming means that forms an image on paper, and thus be configured as an image processing apparatus such as an MFP (digital multifunction peripheral).
One of the characteristic features of the information processing apparatus 10 is its function for creating the thumbnail image corresponding to each document when documents are accumulated. This feature is described below.

Next, FIG. 2 shows the functional configuration of the information processing apparatus 10. FIG. 2 mainly shows the functions related to the document accumulation and thumbnail image creation described above. The function of each of these units is realized by the CPU 11 executing the required programs and controlling the required hardware.

As shown in FIG. 2, the information processing apparatus 10 includes a document management unit 110, a document storage unit 120, and a document processing unit 130.
Of these, the document storage unit 120 functions as storage means that stores the data of a plurality of documents, each including one or more pages. The function of the document storage unit 120 can be realized by the HDD 14, for example, but may also be realized by the storage of a device external to the information processing apparatus 10.
The document processing unit 130 has a function of executing the processing instructed by the user on a document that the user has selected from those stored in the document storage unit 120. Examples of this processing include display, printing, transmission to the outside, editing, and deletion.

The document management unit 110 has functions for accumulating documents in the document storage unit 120 and for managing the accumulated documents. More specifically, it includes a document accumulation unit 111, a document acquisition unit 112, a feature amount calculation unit 113, a clustering processing unit 114, a page selection unit 115, a thumbnail image creation unit 116, a thumbnail image saving unit 117, and a document selection reception unit 118.

Of these, the document accumulation unit 111 functions as accumulation means that accumulates a document by storing it in the document storage unit 120 when instructed to do so by a user, another process, an external device, or the like.
The document acquisition unit 112 has a function of acquiring, from the documents stored in the document storage unit 120, the document to be processed by the document processing unit 130.

The feature amount calculation unit 113 functions as feature amount calculation means that calculates, for the image of each page of a document or for a thumbnail image, the image feature amount used in the analysis performed when creating a thumbnail image. The feature amount is a numerical sequence that represents features of the image, such as color scheme, texture, edge distribution, and composition. More specifically, any combination of shape context, signature, skeleton, SIFT (Scale-Invariant Feature Transform), CSS (Color Self-Similarity), and the like may be used, but the feature amount is not limited to these.
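
As a rough illustration of "features expressed as a numerical sequence", the following sketch builds a small feature vector from a grayscale image. It is not the descriptor set named in the text (shape context, SIFT, and so on); the function name `feature_vector`, the list-of-rows image format, the bin count, and the edge threshold are all assumptions chosen for brevity.

```python
def feature_vector(image, bins=4, edge_threshold=64):
    """Return a normalized intensity histogram followed by an edge density.

    `image` is assumed to be a list of rows of integer intensities (0-255).
    This is a stand-in for the richer descriptors mentioned in the text.
    """
    pixels = [p for row in image for p in row]
    hist = [0] * bins
    for p in pixels:
        # Map intensity 0-255 onto one of `bins` equal-width bins.
        hist[min(p * bins // 256, bins - 1)] += 1
    hist = [h / len(pixels) for h in hist]

    # Edge density: share of horizontally adjacent pixel pairs whose
    # intensity difference exceeds the threshold.
    pairs = 0
    edges = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs += 1
            if abs(a - b) > edge_threshold:
                edges += 1
    edge_density = edges / pairs if pairs else 0.0
    return hist + [edge_density]
```

Each image then becomes a point in a five-dimensional feature space, which is the kind of input the clustering step expects.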

The clustering processing unit 114 functions as similarity calculation means that calculates the similarity between images based on the feature amounts calculated by the feature amount calculation unit 113. More specifically, the clustering processing unit 114 performs clustering processing on the feature amounts and, if the page images of the document for which a thumbnail is to be created and the thumbnail images of the stored documents being compared include a group of mutually similar images, extracts that group as a cluster. Naturally, if there are several groups of mutually similar images, several clusters are extracted. A single image may also form a cluster by itself.

As the algorithm for this clustering processing, unsupervised clustering can be used, as can supervised clustering that builds a classifier using the stored documents as training samples. More specifically, random forests (see L. Breiman, "Random Forests", Machine Learning, vol. 45, no. 1, pp. 5-32, Oct. 2001), the k-means method, self-organizing maps, and the like can be adopted, but the algorithm is not limited to these.
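
Of the algorithms named above, k-means is the simplest to sketch. The following is a minimal pure-Python version, assuming feature points are equal-length tuples of numbers; the function name `kmeans`, the deterministic choice of the first `k` points as initial centers, and the fixed iteration count are illustrative assumptions, not details taken from the patent.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster `points` into `k` groups; return (labels, centers).

    Initial centers are simply the first k points (an assumption made so
    that the sketch is deterministic).
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centers
```

In the setting described here, `points` would hold the feature amounts of both the page thumbnails of the target document and the thumbnails of the comparison documents, so that both kinds of image can fall into the same cluster.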

Note that the clustering processing unit 114 may calculate the similarities for the page images of the document for which a thumbnail is to be created using feature amounts obtained not from the page images themselves but from thumbnail images created from those page images. Because the comparison targets are thumbnail images of other documents, comparing thumbnail images for the document in question as well allows the comparison to be made on the same basis. The specific processing examples described below also use feature amounts obtained from thumbnail images.

The page selection unit 115 functions as selection means that selects, according to the similarity results calculated by the clustering processing unit 114, which page of the document the thumbnail image is to be created from.
The thumbnail image creation unit 116 functions as thumbnail creation means that creates a thumbnail image based on the image of any page in a document.

The thumbnail image saving unit 117 has a function of saving the thumbnail image that the thumbnail image creation unit 116 created for the page selected by the page selection unit 115, in association with the data of the source document, as the thumbnail image of that document. The save destination may be the document storage unit 120 or other storage means.
The document selection reception unit 118 has a function of accepting the user's selection of a document to be processed while displaying, on the display, the thumbnail images corresponding to the documents as saved by the thumbnail image saving unit 117.

Next, the thumbnail image creation processing executed by the CPU 11 of the information processing apparatus 10 will be described. FIG. 3 is a flowchart of that processing.
When the CPU 11 detects that a user, another process, an external device, or the like has instructed it to accumulate a document, it starts the processing shown in the flowchart of FIG. 3.
In the processing of FIG. 3, the CPU 11 first stores the data of the accumulation target document in the document storage unit 120 (S11). This processing corresponds to the function of the document accumulation unit 111.

Next, the CPU 11 creates a thumbnail image from the image of each page of the accumulation target document (S12). This processing corresponds to the function of the thumbnail image creation unit 116. The thumbnail images created here are used to examine which page's image to adopt.
Next, the CPU 11 calculates the feature amount of each thumbnail image created in step S12 and maps it into a feature space (S13). This processing corresponds to the function of the feature amount calculation unit 113.
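
The per-page thumbnail creation of step S12 can be pictured with a simple block-averaging downscale. This is only a sketch: the text does not say how thumbnails are reduced, and the function name `make_thumbnail`, the list-of-rows grayscale format, and the integer `factor` are assumptions. A real system would more likely rely on an image library.

```python
def make_thumbnail(image, factor):
    """Shrink a grayscale image by averaging each factor x factor block.

    `image` is assumed to be a list of equal-length rows of integers.
    Rows and columns that do not fill a whole block are dropped.
    """
    height = len(image) // factor
    width = len(image[0]) // factor
    thumb = []
    for ty in range(height):
        row = []
        for tx in range(width):
            block = [
                image[ty * factor + dy][tx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb
```

Applying this to every page yields one candidate thumbnail per page, and the feature amounts of step S13 are then computed on those candidates.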

Next, the CPU 11 selects, as comparison documents, the documents that satisfy a predetermined condition from among the documents stored in the document storage unit 120 other than the accumulation target document (S14). All documents stored in the document storage unit 120 other than the accumulation target document could be considered when creating the thumbnail image, but if there are too many, the processing load becomes large, so this step narrows down the number of documents considered.

Examples of the predetermined condition include, but are not limited to: the time since registration is at or below a certain value; the access count is at or above a certain value or within a certain top percentage; and the document has the same classification as the accumulation target document. It is advisable to set a condition that extracts documents whose thumbnail images are displayed relatively frequently or are displayed side by side with that of the accumulation target document. Setting the condition "all" gives substantially the same processing as setting no condition. These conditions can be set freely by the user or an administrator.
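
The selection of comparison documents in step S14 could look like the sketch below. The record type `StoredDoc`, its field names, and the function `select_comparison_docs` are hypothetical. Combining the conditions with a logical AND is also an assumption, since the text only lists example conditions. The "top percentage of accesses" variant is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StoredDoc:
    # Hypothetical management record for a stored document.
    name: str
    registered: date
    access_count: int
    category: str


def select_comparison_docs(docs, target_category, today,
                           max_age_days=None, min_access=None,
                           same_category=False):
    """Return the stored documents that satisfy every enabled condition.

    Leaving all conditions disabled corresponds to the "all" setting.
    """
    selected = []
    for doc in docs:
        if max_age_days is not None and \
                today - doc.registered > timedelta(days=max_age_days):
            continue  # registered too long ago
        if min_access is not None and doc.access_count < min_access:
            continue  # accessed too rarely
        if same_category and doc.category != target_category:
            continue  # different classification from the target document
        selected.append(doc)
    return selected
```

Keeping each condition optional mirrors the statement that the user or an administrator can set the conditions freely.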

Next, the CPU 11 calculates the feature amount of the thumbnail image corresponding to each comparison document selected in step S14 and maps it into the same feature space as in step S13 (S15). This processing corresponds to the function of the feature amount calculation unit 113. The thumbnail images used are those saved by the thumbnail image saving unit 117. They may have been created previously by the processing of FIG. 3, created by other processing, or may not even reflect an image of the document.

Next, the CPU 11 executes clustering processing on the feature amounts mapped in steps S13 and S15 (S16). This processing corresponds to the function of the clustering processing unit 114 as similarity calculation means that calculates the similarity between images.
The CPU 11 then looks at which cluster each page of the accumulation target document belongs to in the clustering result, and selects the cluster containing the most pages of the accumulation target document as the cluster of interest (S17). If several clusters tie for the most pages, the cluster containing the page with the smallest distance from its cluster center is chosen as the cluster of interest. The "distance" here can be the Euclidean distance in the feature space. The same applies to "distance", "near", and "far" in the description below. The cluster of interest can be regarded as a cluster that gathers pages having high similarity to other pages in the accumulation target document.
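
Step S17, including its tie-break rule, can be sketched as follows. The function name `select_cluster_of_interest` and the data layout (parallel lists of page feature points and cluster labels, plus a `centers` sequence indexable by label) are assumptions made for illustration.

```python
import math
from collections import Counter


def select_cluster_of_interest(page_points, page_labels, centers):
    """Return the label of the cluster holding the most target-document pages.

    Ties are broken in favor of the cluster that contains the page lying
    closest (Euclidean distance) to its own cluster center.
    """
    counts = Counter(page_labels)
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]

    def closest_page_distance(label):
        return min(
            math.dist(point, centers[label])
            for point, l in zip(page_points, page_labels)
            if l == label
        )

    return min(tied, key=closest_page_distance)
```

Only the pages of the target document are counted here. Comparison thumbnails influence the cluster centers through the clustering itself, but not the page counts.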

If no comparison document (that is, no comparison document's thumbnail image) belongs to the cluster of interest (No in S18), the CPU 11 selects the page mapped closest to the center of the cluster of interest as the page to use for creating the thumbnail image (S19). Because no page in the cluster is highly similar to the thumbnail image of a comparison document, this choice sets the comparison documents aside and uses the most typical image in the cluster, aiming for a thumbnail image that makes the contents of the accumulation target document easy to grasp.

On the other hand, if a comparison document (its thumbnail image) belongs to the cluster of interest (Yes in S18), the CPU 11 selects the page in the cluster of interest that is mapped farthest from the comparison document as the page to use for creating the thumbnail image (S20). Because the cluster contains pages that are highly similar to the thumbnail image of a comparison document, this choice picks a page within the cluster that has low similarity to that thumbnail image, aiming for a thumbnail image that is as easy as possible to distinguish from other documents.
The processing of steps S17 to S20 above selects, according to a predetermined criterion, an image that has high similarity to the other pages in the accumulation target document and low similarity to the thumbnail images corresponding to documents other than the accumulation target document, and corresponds to the function of the page selection unit 115.
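
The branch in steps S18 to S20 can be sketched as below. The function name `select_page` and its argument layout are hypothetical. The text does not say how "farthest from the comparison document" is measured when several comparison thumbnails share the cluster, so this sketch assumes the page whose nearest comparison thumbnail is farthest away.

```python
import math


def select_page(pages, comparisons, center):
    """Choose the page number to use for the thumbnail.

    pages:       {page_number: feature point} for target-document pages
                 that belong to the cluster of interest
    comparisons: feature points of comparison thumbnails in that cluster
    center:      center of the cluster of interest
    """
    if not comparisons:
        # S19: no comparison thumbnail in the cluster, so take the most
        # typical page, i.e. the one nearest the cluster center.
        return min(pages, key=lambda n: math.dist(pages[n], center))
    # S20: take the page farthest from the comparison thumbnails
    # (assumption: measured by distance to the nearest one).
    return max(
        pages,
        key=lambda n: min(math.dist(pages[n], c) for c in comparisons),
    )
```

For three pages at (0, 0), (1, 0), and (2, 0) with the center at (1, 0), the middle page is chosen when the cluster holds no comparison thumbnail, while a comparison thumbnail near (0, 0.5) shifts the choice to the page at (2, 0).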

In either case, the CPU 11 next saves the thumbnail image created from the image of the page selected in step S19 or S20 as the thumbnail image of the accumulation target document (S21), and ends the processing. This processing corresponds to the function of the thumbnail image saving unit 117.
Through the above processing, the information processing apparatus 10 can automatically create and save, for a document to be accumulated, a thumbnail image that represents the features of that document well and is also easy to distinguish from the thumbnail images of other documents already stored. In the above processing, the CPU 11 functions as first control means.

The processing of steps S17 to S20 is now described further with reference to FIGS. 4A and 4B.
FIGS. 4A and 4B schematically show results of the clustering processing in step S16. In these figures, filled shapes indicate the positions at which the feature amounts of the thumbnail images of the pages of the accumulation target document are mapped, and outlined shapes indicate the positions at which the thumbnail images of the comparison documents are mapped. In both examples, the accumulation target document has four pages and there are two comparison documents. The feature space is drawn schematically; the actual space is not limited to two dimensions.

In FIGS. 4A and 4B, the ellipses and rounded rectangles each indicate a cluster extracted by the clustering processing, and the ellipse indicates the cluster of interest selected in step S17.
In both FIG. 4A and FIG. 4B, three of the pages of the accumulation target document belong to the same cluster and the remaining page belongs to a different cluster. The cluster containing three pages, the largest number, is therefore the cluster of interest.

In the example of FIG. 4A, no thumbnail image of a comparison document belongs to the cluster of interest, so the page mapped closest to the center of the cluster is used for generating the thumbnail image.
In the example of FIG. 4B, a thumbnail image of a comparison document belongs to the cluster of interest, so the page in the cluster of interest mapped farthest from the comparison document is selected as the page used for creating the thumbnail image.

The timing at which the information processing apparatus 10 creates a thumbnail image is not limited to when a document is newly accumulated. A thumbnail image can also be created at any time by processing similar to that of FIG. 3, in response to a creation instruction that designates one of the documents already stored.

FIG. 5 shows a flowchart of the processing executed by the CPU 11 in this case. The processing of FIG. 5 has much in common with that of FIG. 3, and the same step numbers are used for the common parts.
When the CPU 11 detects that a user, another process, an external device, or the like has instructed it to create a thumbnail image, it starts the processing shown in the flowchart of FIG. 5.

In the processing of FIG. 5, a thumbnail image is created based on the image of each page of the creation target document, that is, the document designated as the target of thumbnail image creation (S12′). This step is the same as step S12 of FIG. 3 except that the target document is different.
Thereafter, through steps S13 to S21′, the CPU 11 can create and save a thumbnail image in the same manner as in FIG. 3. The steps marked with ′ differ from the corresponding steps of FIG. 3 in that they operate on the creation target document. The save in step S21′ should preferably overwrite any thumbnail image created previously. In this processing, the CPU 11 functions as the second control means.

Even if a thumbnail image is created by the processing of FIG. 3, it may become harder to distinguish from the thumbnail images of other documents as more of them are added later. In that case, creating the thumbnail image again may yield one that is easier to distinguish, based on the image of a different page. The balance between how typical a page is within its document and how easily it can be told apart from the thumbnail images of other documents is taken into account automatically by the clustering process and by the page selection based on its result.

[Second Embodiment: FIGS. 6 and 7]
Next, a second embodiment of the invention will be described.
The second embodiment differs from the first in that, when another document with the same content as the document for which a thumbnail image is to be created has already been stored, the thumbnail image of that other document is also used as the thumbnail image of the target document. In all other respects it is the same as the first embodiment described above, so only matters related to this difference are described. Components that are common to or correspond to those of the first embodiment are given the same reference numerals as in the first embodiment.

FIG. 6 shows a flowchart of the processing in the second embodiment that corresponds to FIG. 3.
This processing adds step SA between steps S11 and S12 of FIG. 3. Step SA determines whether another document with the same content as the document to be stored has already been stored; if the result is Yes, the processing proceeds to step SB. In step SB, the CPU 11 adopts the thumbnail image corresponding to the other document found in step SA as the thumbnail image of the document to be stored, saves it, and ends the processing. If the result in step SA is No, the processing proceeds to step S12 and the subsequent steps of FIG. 3.
The determination in step SA can be made, for example, by referring to the management data that the document management unit 110 maintains for managing the documents stored in the document storage unit 120.

FIG. 7 shows an example of this management data.
The management data summarizes bibliographic information about the documents stored in the document storage unit 120. It includes, for example, the file name, the date and time the document was registered (stored), the date and time the document was last updated, the file size, and the number of pages.
Of these, documents that share, for example, both the last update date and time and the size can be regarded as having the same content. Where the nature of the documents means that this alone is not conclusive, identity may additionally be confirmed by performing matching processing on the image of each page. Other criteria may of course be used.
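The step SA check against the management data might look like the following Python sketch. The record layout follows the fields listed for FIG. 7, but the class name `DocRecord`, its field names, and the function `find_same_content` are hypothetical, and the optional per-page image matching is omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DocRecord:
    """One row of management data (field names are illustrative assumptions)."""
    file_name: str
    registered: datetime
    last_updated: datetime
    size_bytes: int
    pages: int


def find_same_content(new_doc: DocRecord,
                      stored: Iterable[DocRecord]) -> Optional[DocRecord]:
    """Return a stored record presumed to have the same content, or None.

    Criterion taken from the description: equal last update date and time and
    equal size. A stricter check could compare page images afterwards.
    """
    for rec in stored:
        if (rec.last_updated == new_doc.last_updated
                and rec.size_bytes == new_doc.size_bytes):
            return rec
    return None
```

If a record is returned, its thumbnail image would be reused for the new document (step SB); otherwise processing would continue from step S12.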

When documents with the same content exist in this way, it is preferable to use the same thumbnail image so that the user can recognize the fact. This also reduces the processing load of thumbnail image creation when multiple documents with the same content are stored.
The processing shown in FIG. 5 can of course be modified in the same way as in FIG. 6. Because the processing of FIG. 5 has no step S11, step SA may simply be executed immediately after the processing starts.

[Third Embodiment: FIG. 8]
Next, a third embodiment of the invention will be described.
The third embodiment differs from the first in that the image of the first page of the document is used preferentially for thumbnail image creation. In all other respects it is the same as the first embodiment described above, so only matters related to this difference are described. Components that are common to or correspond to those of the first embodiment are given the same reference numerals as in the first embodiment.

FIG. 8 shows a flowchart of the processing in the third embodiment that corresponds to FIG. 3.
When the CPU 11 detects that the user, another process, an external device, or the like has instructed it to store a document, it starts the processing shown in the flowchart of FIG. 8.
In the processing of FIG. 8, the CPU 11 first stores the data of the document to be stored in the document storage unit 120, as in step S11 of FIG. 3 (S31).

Next, the CPU 11 creates a thumbnail image based on the image of the first page of the document to be stored (S32). This step is the same as step S12 of FIG. 3 except that only the image of the first page is used.
Next, the CPU 11 calculates the feature amount of the thumbnail image created in step S32 and maps it into the feature space (S33). This step is the same as step S13 of FIG. 3.

Next, from the documents stored in the document storage unit 120 other than the document to be stored, the CPU 11 selects those that satisfy a predetermined condition as comparison documents (S34). It then calculates the feature amount of the thumbnail image corresponding to each comparison document selected in step S34 and maps it into the same feature space as in step S33 (S35). These steps are the same as steps S14 and S15 of FIG. 3.

Thereafter, the CPU 11 determines whether the distance in the feature space (Euclidean distance) between the thumbnail image created in step S32 and the nearest thumbnail image of a comparison document is equal to or less than a predetermined value (S36). This determines whether a thumbnail image whose similarity to the thumbnail image created in step S32 is at or above a predetermined reference is already in use as the thumbnail image of a comparison document.

If the result in step S36 is No, no thumbnail image highly similar to the one created in step S32 from the first page of the document to be stored is in use. The thumbnail image created in step S32 should therefore allow the document to be stored to be told apart easily from other documents, so the CPU 11 saves it as the thumbnail image of the document to be stored (S37) and ends the processing. This step corresponds to step S21 of FIG. 3.

If the result in step S36 is Yes, on the other hand, a thumbnail image highly similar to the one created from the first page of the document to be stored is already in use for another document. The CPU 11 therefore executes the processing from step S12 of FIG. 3 onward and, using the same criteria as in the first embodiment and treating pages other than the first page as candidates too, decides which page image the thumbnail image of the document to be stored will be based on (S38). The first page may still be selected as a result.
In this processing, the CPU 11 functions as the third control means.
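The decision of step S36 can be sketched in Python as follows. The function name `first_page_is_distinct`, the `threshold` parameter standing in for the predetermined value, and the handling of an empty set of comparison thumbnails are assumptions for illustration, not details given in the embodiment.

```python
import math


def first_page_is_distinct(first_page_feat, other_feats, threshold):
    """Return True when the first-page thumbnail can be used as is (S36: No).

    first_page_feat: feature vector of the first-page thumbnail (S33).
    other_feats: feature vectors of the comparison thumbnails (S35).
    threshold: the predetermined distance value of S36 (assumed parameter).
    """
    if not other_feats:
        # Assumption: with no comparison documents, nothing can be confused.
        return True
    # Euclidean distance to the nearest comparison thumbnail.
    nearest = min(math.dist(first_page_feat, f) for f in other_feats)
    # S36 answers Yes when the distance is at or below the threshold,
    # in which case the fallback selection of S38 would run instead.
    return nearest > threshold
```

A caller would save the first-page thumbnail (S37) when this returns True and otherwise run the page selection of the first embodiment (S38).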

With the above processing, a thumbnail image that lets the user distinguish documents easily can be created while the image of the first page of the document is used preferentially. The first page often carries the title and otherwise represents the content of the document clearly, so as long as its similarity to other thumbnail images is low, using it preferentially yields a thumbnail image from which the user can easily grasp the content of each document.
The determination in step S36 could also be made by performing a clustering process on the feature amounts mapped in steps S33 and S35 and checking whether a thumbnail image of a comparison document belongs to the same cluster as the image of the first page of the document to be stored. In that case, if one does belong, a highly similar thumbnail image is deemed to exist and the processing proceeds to step S38; if none belongs, no highly similar thumbnail image is deemed to exist and the processing proceeds to step S37.

This concludes the description of the embodiments. In this invention, however, the specific configuration of the apparatus, the specific processing procedures, the data structures, the algorithms used in the processing, the determination criteria, and so on are not limited to those described in the embodiments.

For example, the similarity need not be calculated by a clustering process and may be calculated by another method.
The functions of the information processing apparatus 10 may also be distributed across a plurality of apparatuses that cooperate to operate as an information processing system with the same functions as the information processing apparatus 10. The information processing apparatus 10 may further include any functions other than those shown in FIG. 3.

An embodiment of the program of this invention is a program for causing a computer to control the required hardware and thereby realize the functions of the information processing apparatus 10 in the embodiments described above.
Such a program may be stored from the outset in a ROM or another nonvolatile storage medium (flash memory, EEPROM, or the like) provided in the computer. It can also be provided recorded on any nonvolatile recording medium such as a memory card, CD, DVD, or Blu-ray Disc. Installing the program recorded on such a medium on a computer and executing it causes the procedures described above to be carried out.

The program can also be downloaded from an external device that is connected to a network and that either has a recording medium on which the program is recorded or stores the program in its storage means, and then installed on a computer and executed.
The configurations of the embodiments and modifications described above can of course be combined in any way as long as they do not contradict one another.

10: information processing apparatus, 11: CPU, 12: ROM, 13: RAM, 14: HDD, 15: communication I/F, 16: operation unit, 17: display unit, 18: scanner engine, 19: printer engine, 20: system bus, 110: document management unit, 111: document accumulation unit, 112: document acquisition unit, 113: feature amount calculation unit, 114: clustering processing unit, 115: page selection unit, 116: thumbnail image creation unit, 117: thumbnail image saving unit, 118: document selection receiving unit, 120: document storage unit, 130: document processing unit

JP 2009-251587 A
JP 2012-8644 A

Claims

Storage means for storing documents;
Storage means for storing thumbnail images corresponding to each document stored by the storage means;
Feature amount calculation means for calculating a feature amount for the image of each page of one document for which a corresponding thumbnail image is to be created, and for each thumbnail image corresponding to a document, other than the one document, stored by the storage means;
Similarity calculation means for calculating, based on the feature amounts calculated by the feature amount calculation means, the similarity between images among the images of the pages of the one document and the thumbnail images corresponding to documents other than the one document;
Selection means for selecting, based on the similarity calculated by the similarity calculation means and according to a predetermined criterion, an image from among the images of the pages of the one document that has a high similarity to other pages in the one document and a low similarity to the thumbnail images corresponding to documents other than the one document;
An information processing apparatus comprising: thumbnail creation means for creating a thumbnail image corresponding to the one document based on the image selected by the selection means.

The information processing apparatus according to claim 1,
The similarity calculation means is means for performing a clustering process on the feature amounts of the images of the pages of the one document and of the thumbnail images corresponding to documents other than the one document,
The selection means has, as the predetermined criterion, a criterion of selecting one of the images belonging to the cluster into which the largest number of the images of the pages of the one document are classified.

An information processing apparatus according to claim 2,
The selection means further has, as the predetermined criterion, a criterion of selecting the image whose feature amount is closest to the center of the cluster into which the largest number of images are classified, when no thumbnail image corresponding to a document other than the one document belongs to that cluster.

An information processing apparatus according to claim 2 or 3,
The selection means further has, as the predetermined criterion, a criterion of selecting, within the cluster into which the largest number of images are classified, the image whose feature amount is farthest from the thumbnail image corresponding to a document other than the one document, when such a thumbnail image belongs to that cluster.

An information processing apparatus according to any one of claims 1 to 4,
The information processing apparatus, wherein the feature amount calculation means uses a feature amount of a thumbnail image created based on an image of each page as a feature amount of an image of each page of the one document.

An information processing apparatus according to any one of claims 1 to 5,
An information processing apparatus comprising means for setting, from among the documents stored by the storage means, the range of documents whose corresponding thumbnail images are subject to feature amount calculation by the feature amount calculation means.

An information processing apparatus according to any one of claims 1 to 6,
An information processing apparatus comprising first control means for causing the feature amount calculation means, the similarity calculation means, the selection means, and the thumbnail creation means to create a thumbnail image corresponding to a document when that document is to be newly stored in the storage means.

An information processing apparatus according to any one of claims 1 to 7,
An information processing apparatus comprising second control means for causing, at an arbitrary timing, the feature amount calculation means, the similarity calculation means, the selection means, and the thumbnail creation means to create, for any of the documents already stored in the storage means, a thumbnail image corresponding to that document.

An information processing apparatus according to any one of claims 1 to 8,
An information processing apparatus in which, when another document having the same content as the one document for which the corresponding thumbnail image is to be created is stored in the storage means, the thumbnail creation means adopts the thumbnail image corresponding to that other document as the thumbnail image corresponding to the one document.

An information processing apparatus according to any one of claims 1 to 9,
An information processing apparatus comprising third control means that:
causes the feature amount calculation means to calculate the feature amount for each of the image of the first page of the one document and the thumbnail images corresponding to the documents, other than the one document, stored by the storage means;
obtains, based on the feature amounts calculated by the feature amount calculation means, the similarity between the image of the first page of the one document and the thumbnail image corresponding to each document other than the one document;
causes a thumbnail image corresponding to the one document to be created based on the image of the first page of the one document when no thumbnail image has a similarity to the image of the first page that is equal to or greater than a predetermined reference; and
causes the selection means to select an image, with the other pages of the one document also as candidates, when a thumbnail image has a similarity to the image of the first page of the one document that is equal to or greater than the predetermined reference.

Storage means for storing documents;
Storage means for storing thumbnail images corresponding to each document stored by the storage means;
Feature amount calculation means for calculating a feature amount for the image of each page of one document for which a corresponding thumbnail image is to be created, and for each thumbnail image corresponding to a document, other than the one document, stored by the storage means;
Similarity calculation means for calculating, based on the feature amounts calculated by the feature amount calculation means, the similarity between images among the images of the pages of the one document and the thumbnail images corresponding to documents other than the one document;
Selection means for selecting, based on the similarity calculated by the similarity calculation means and according to a predetermined criterion, an image from among the images of the pages of the one document that has a high similarity to other pages in the one document and a low similarity to the thumbnail images corresponding to documents other than the one document;
An information processing system comprising: thumbnail creation means for creating a thumbnail image corresponding to the one document based on the image selected by the selection means.

A program for causing a computer to function as:
Storage means for storing documents;
Storage means for storing thumbnail images corresponding to each document stored by the storage means;
Feature amount calculation means for calculating a feature amount for the image of each page of one document for which a corresponding thumbnail image is to be created, and for each thumbnail image corresponding to a document, other than the one document, stored by the storage means;
Similarity calculation means for calculating, based on the feature amounts calculated by the feature amount calculation means, the similarity between images among the images of the pages of the one document and the thumbnail images corresponding to documents other than the one document;
Selection means for selecting, based on the similarity calculated by the similarity calculation means and according to a predetermined criterion, an image from among the images of the pages of the one document that has a high similarity to other pages in the one document and a low similarity to the thumbnail images corresponding to documents other than the one document; and
Thumbnail creation means for creating a thumbnail image corresponding to the one document based on the image selected by the selection means.