JP6135327B2

JP6135327B2 - Information processing apparatus, document data organizing apparatus, document presentation method, and computer program

Info

Publication number: JP6135327B2
Application number: JP2013129635A
Authority: JP
Inventors: 香美森脇; 聡史出石; 河渕洋一
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2013-06-20
Filing date: 2013-06-20
Publication date: 2017-05-31
Anticipated expiration: 2033-06-20
Also published as: JP2015005112A

Description

The present invention relates to an apparatus and a method for handling documents.

Techniques for searching documents using a word specified by a user as a search query (search key) have conventionally been in widespread use.

In recent years, more and more document data has been stored both on local computers and on computers on the Internet. As a result, a search may hit a very large number of documents. When many documents are hit, the user must further narrow them down to the ones he or she actually needs.

Users therefore want document searches to be easy. Several approaches can be considered for this purpose.

For example, documents may be classified in advance, which limits the scope of a search.

A method has also been proposed for hierarchically classifying mutually similar documents among a large number of documents (Patent Document 1).

A method has also been proposed in which one of several documents that differ only in version is shown in the search results as a representative document (Patent Document 2).

Patent Document 1: JP 2004-318527 A
Patent Document 2: JP 2006-127029 A

However, depending on the user, it may be enough to know that one of several mutually similar documents exists, or the user may want to know that all of them exist.

In view of this problem, an object of the present invention is to present mutually similar documents in a way that matches the user's preferences better than conventional techniques.

An information processing apparatus according to an aspect of the present invention notifies a user of the existence of some or all of a plurality of mutually similar documents. The apparatus includes: difference extraction means for extracting the differences between the plurality of documents; condition determination means for determining whether an interest keyword, which is stored in advance in interest keyword storage means and represents an item the user is interested in, appears in the differences so as to satisfy a predetermined condition; and presence presenting means for presenting the existence of all of the plurality of documents if the interest keyword appears in the differences so as to satisfy the predetermined condition, and otherwise preferentially presenting the existence of some of the plurality of documents.

Preferably, the apparatus further includes search keyword receiving means for receiving a search keyword used as a search key, and the plurality of documents are found by a search based on the search keyword.

The predetermined condition may be that the interest keyword appears in the differences at least once. Alternatively, the interest keyword storage means may store a plurality of interest keywords, each given a score, and the predetermined condition may be that the sum, over the interest keywords, of the number of times each keyword appears in the differences multiplied by its score is equal to or greater than a threshold. Alternatively, the interest keyword storage means may store, as interest keywords, a plurality of pairs of words, each pair given a score, and the predetermined condition may be that the sum, over the pairs, of the number of times each pair appears in the differences multiplied by its score is equal to or greater than a threshold.
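The three alternative conditions can be written as one weighted score compared against a threshold θ, i.e. S = Σ n_k·s_k ≥ θ, where n_k is how often keyword (or word pair) k occurs in the differences and s_k is its score. The Python sketch below illustrates this under stated assumptions: occurrences are counted by plain substring matching, and a word pair is assumed to "appear" once for each difference text that contains both words, since the text does not define how pairs are counted. All function names and sample data are hypothetical. The first condition (at least one appearance) is the special case in which every score is 1 and the threshold is 1.

```python
def count_occurrences(keyword: str, differences: list[str]) -> int:
    """Non-overlapping substring occurrences of keyword across all difference texts."""
    return sum(text.count(keyword) for text in differences)


def keyword_score(scored_keywords: dict[str, float], differences: list[str]) -> float:
    """Sum over interest keywords of (occurrences in the differences x score)."""
    return sum(score * count_occurrences(kw, differences)
               for kw, score in scored_keywords.items())


def pair_occurrences(pair: tuple[str, str], differences: list[str]) -> int:
    """ASSUMPTION: a pair counts once per difference text containing both words."""
    first, second = pair
    return sum(1 for text in differences if first in text and second in text)


def pair_score(scored_pairs: dict[tuple[str, str], float], differences: list[str]) -> float:
    """Sum over word pairs of (occurrences in the differences x score)."""
    return sum(score * pair_occurrences(pair, differences)
               for pair, score in scored_pairs.items())


def condition_met(score: float, threshold: float) -> bool:
    """The predetermined condition: the weighted sum reaches the threshold."""
    return score >= threshold


# Hypothetical differences and scores, for illustration only.
diffs = ["heated to 80 C", "temperature raised to 120 C; temperature held"]
print(keyword_score({"temperature": 2.0, "sales": 1.0}, diffs))   # 4.0
print(pair_score({("temperature", "held"): 1.5}, diffs))          # 1.5
# Condition 1 as a special case: all scores 1, threshold 1.
print(condition_met(keyword_score({"sales": 1}, diffs), 1))       # False
```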

When the interest keyword does not appear in the differences so as to satisfy the predetermined condition, the presence presenting means may preferentially present the existence of some of the plurality of documents by displaying the identifiers of those documents and not displaying the identifiers of the rest. Alternatively, in the same case, the presence presenting means may preferentially present those documents by displaying their identifiers more conspicuously than the remaining identifiers.

The apparatus may further include document data organizing means that, when the interest keyword does not appear in the differences so as to satisfy the predetermined condition, keeps the data of the documents preferentially presented by the presence presenting means and deletes the data of the other documents among the plurality of documents.
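A minimal sketch of this optional clean-up step, assuming the documents are plain files identified by path. The function name and the injectable `delete` callback are illustrative assumptions; passing a no-op callback turns the function into a dry run that only reports what would be removed.

```python
import os
from collections.abc import Callable, Iterable


def organize_document_data(paths: Iterable[str], presented: set[str], interested: bool,
                           delete: Callable[[str], None] = os.remove) -> list[str]:
    """Delete the document data that was not preferentially presented.

    Nothing is deleted when the user is interested, because all documents
    were presented in that case. Returns the paths that were (or would be) deleted.
    """
    if interested:
        return []
    removed: list[str] = []
    for path in paths:
        if path not in presented:
            delete(path)          # pass a no-op here for a dry run
            removed.append(path)
    return removed
```

Keeping deletion behind a callback makes it easy to ask the user for confirmation, or to move files to an archive instead of deleting them.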

According to the present invention, mutually similar documents can be presented in a way that matches the user's preferences better than conventional techniques.

FIG. 1 is a diagram illustrating an example of the overall configuration of a document management system.
FIG. 2 is a diagram illustrating an example of the hardware configuration of a user computer.
FIG. 3 is a diagram illustrating an example of the functional configuration of the user computer.
FIG. 4 is a diagram illustrating an example of an interest keyword input screen.
FIG. 5 is a diagram illustrating an example of personal profile data.
FIG. 6 is a diagram illustrating an example of a user code input screen.
FIG. 7 is a diagram illustrating an example of a search keyword input screen.
FIG. 8 is a diagram illustrating an example of two similar documents.
FIG. 9 is a diagram illustrating an example of differences.
FIG. 10 is a flowchart illustrating an example of the flow of an interest determination process.
FIG. 11 is a flowchart illustrating an example of the flow of a search result presentation process.
FIG. 12 is a diagram illustrating an example of a search result screen.
FIG. 13 is a diagram illustrating an example of a search result screen.
FIG. 14 is a diagram illustrating an example of a search result screen.
FIG. 15 is a diagram illustrating an example of a search result screen when there are two groups of similar documents.
FIG. 16 is a flowchart illustrating an example of the flow of a search process.
FIG. 17 is a diagram illustrating an example of similar documents and differences.
FIG. 18 is a diagram illustrating an example of personal profile data.
FIG. 19 is a flowchart illustrating a modified example of the flow of the interest determination process.
FIG. 20 is a diagram illustrating an example of differences.
FIG. 21 is a diagram illustrating an example of personal profile data.
FIG. 22 is a flowchart illustrating a modified example of the flow of the interest determination process.
FIG. 23 is a diagram illustrating an example of similar documents.
FIG. 24 is a flowchart illustrating a modified example of the flow of the search process.
FIG. 25 is a diagram illustrating an example of differences.
FIG. 26 is a flowchart illustrating an example of the flow of a second search result presentation process.
FIG. 27 is a diagram illustrating an example of a search result screen.
FIG. 28 is a diagram illustrating a modified example of the functional configuration of the user computer.
FIG. 29 is a diagram illustrating a modified example of the search result screen.

FIG. 1 is a diagram illustrating an example of the overall configuration of the document management system 1. FIG. 2 is a diagram illustrating an example of the hardware configuration of the user computer 2. FIG. 3 is a diagram illustrating an example of the functional configuration of the user computer 2.

The document management system 1 in FIG. 1 is a system for managing various kinds of information, in particular documents, and providing them to users. The document management system 1 is installed in an organization such as a company, a government office, or a school, and is used by the members of the organization. The following description takes as an example a case where the document management system 1 is used in a company, so the employees of the company are the users of the document management system 1. Each user is assigned one user code, which is a unique ID (identification).

The document management system 1 includes a plurality of user computers 2, a document server 3, a communication line 4, and the like. The user computers 2 and the document server 3 communicate with each other via the communication line 4. The Internet, a LAN (Local Area Network), a public line, a dedicated line, or the like is used as the communication line 4.

The document server 3 stores the document data 50 of documents shared by multiple employees (users) of the company and provides the document data 50 in response to requests from the user computers 2. For simplicity, the following description assumes that no access restrictions are set on the document data 50.

A commercially available document management server is used as the document server 3; a NAS (Network Attached Storage) may be used instead. The document data 50 may be in a text format, PDF (Portable Document Format), the native format of a document creation application, or the like.

The user computer 2 is a computer through which the user accesses the document data 50 stored in the document server 3. A personal computer, a tablet computer, a smartphone, or the like is used as the user computer 2. The following description takes as an example a case where a personal computer is used.

As shown in FIG. 2, the user computer 2 includes a CPU (Central Processing Unit) 20a, a RAM (Random Access Memory) 20b, a ROM (Read Only Memory) 20c, a mass storage device 20d, a touch panel display 20e, a keyboard 20f, a pointing device 20g, a NIC (Network Interface Card) 20h, and the like.

The touch panel display 20e displays screens that present messages to the user, screens that show processing results, screens on which the user inputs commands, and the like. The touch panel display 20e also detects a touched position and notifies the CPU 20a of that position.

The keyboard 20f and the pointing device 20g are used by the user to input commands, conditions, and the like.

The NIC 20h communicates with devices such as the document server 3 using a protocol such as TCP/IP.

The mass storage device 20d stores search software 2P1. A plurality of pieces of document data 50 are also stored in a specific directory of the mass storage device 20d. Hereinafter, this directory is referred to as the "local document database 2B1".

The search software 2P1 is software for searching the documents whose document data 50 is stored in the document server 3 or the local document database 2B1. The search software 2P1 implements the functions shown in FIG. 3, including a personal profile data generation unit 201, a personal profile data storage unit 202, a user code receiving unit 203, a search keyword receiving unit 204, a document search unit 205, a similar document selection unit 206, a difference extraction unit 207, an interest determination unit 208, and a search result presentation unit 209.

Next, the units of the user computer 2 shown in FIG. 3, from the personal profile data generation unit 201 through the search result presentation unit 209, are described in detail with reference to FIGS. 4 to 14.

[Preparation before searching]
FIG. 4 is a diagram illustrating an example of the interest keyword input screen 61. FIG. 5 is a diagram illustrating an example of the personal profile data 52.

The personal profile data generation unit 201 generates personal profile data 52 for each user in advance. The personal profile data 52 represents the user's profile; in this embodiment, in particular, it lists words representing the user's interests as interest keywords. The personal profile data generation unit 201 generates the personal profile data 52, for example, as follows.

When the user selects the profile creation mode, the personal profile data generation unit 201 displays an interest keyword input screen 61 such as the one in FIG. 4 on the touch panel display 20e. The user enters his or her user code and then words representing things he or she is interested in, separating multiple words with commas, and presses the "OK" button.

The personal profile data generation unit 201 then generates personal profile data 52 containing the entered user code and words.

The personal profile data 52 is stored in the personal profile data storage unit 202. In this way, personal profile data 52 for each user accumulates in the personal profile data storage unit 202, as shown in FIG. 5.
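The profile data can be pictured as a mapping from user code to a list of interest keywords. The class below is a hypothetical in-memory stand-in for the personal profile data storage unit 202 (a real system would persist the data). It splits the input on commas; also accepting the full-width and ideographic commas common in Japanese input (，、) is an added assumption, as is the sample user code.

```python
import re


class PersonalProfileStore:
    """Hypothetical in-memory stand-in for the personal profile data storage unit 202."""

    def __init__(self) -> None:
        self._profiles: dict[str, list[str]] = {}

    def register(self, user_code: str, raw_keywords: str) -> list[str]:
        """Parse the comma-separated input from screen 61 and store it under the user code."""
        # ASSUMPTION: ASCII, full-width and ideographic commas are all treated as separators.
        keywords = [w.strip() for w in re.split(r"[,，、]", raw_keywords) if w.strip()]
        self._profiles[user_code] = keywords  # one profile per user; re-registering replaces it
        return keywords

    def keywords_for(self, user_code: str) -> list[str]:
        """Interest keywords for a user, or an empty list if no profile exists."""
        return list(self._profiles.get(user_code, []))


store = PersonalProfileStore()
store.register("U001", "technology, material, synthesis, temperature")
print(store.keywords_for("U001"))  # ['technology', 'material', 'synthesis', 'temperature']
```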

[Processing during search]
FIG. 6 is a diagram illustrating an example of the user code input screen 62. FIG. 7 is a diagram illustrating an example of the search keyword input screen 63. FIG. 8 is a diagram illustrating an example of two similar documents 7A1 and 7A2. FIG. 9 is a diagram illustrating an example of the differences 7B1 and 7B2. FIG. 10 is a flowchart illustrating an example of the flow of the interest determination process. FIG. 11 is a flowchart illustrating an example of the flow of the search result presentation process.

The user code receiving unit 203 receives the user code of the user who requests a search, as follows.

When the user selects the search mode, the user code receiving unit 203 displays a user code input screen 62 such as the one in FIG. 6 on the touch panel display 20e. The user enters his or her user code and presses the "OK" button.

The user code receiving unit 203 then accepts the entered user code as the user code of the user requesting the search.

Alternatively, if the user has already been authenticated and is currently logged in to the user computer 2, the user code receiving unit 203 may obtain the user code by, for example, querying the operating system, instead of receiving it through the user code input screen 62.

Once the user code receiving unit 203 has received the user code, the search keyword receiving unit 204 receives a word or phrase to be used as the search key (search query), hereinafter referred to as a "search keyword", as follows. The search keyword receiving unit 204 displays a search keyword input screen 63 such as the one in FIG. 7 on the touch panel display 20e. The user enters a word or phrase as the search keyword and presses the "Search" button.

The search keyword receiving unit 204 then accepts the entered word or phrase as the search keyword.

The document search unit 205 searches the documents whose document data 50 is stored in the local document database 2B1 and those whose document data 50 is stored in the document server 3 for documents containing the received search keyword. The search can be performed by a known method. For documents on the document server 3, the search keyword may instead be sent to the document server 3 so that the server performs the search; the user computer 2 then only needs to obtain the document data 50 of the documents found (hit) from the document server 3.
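As a stand-in for the "known method", the sketch below treats the local document database 2B1 and the document server 3 as simple name-to-text mappings and reports every document whose text contains the search keyword. The plain substring test, the mapping representation, and the sample documents are assumptions; the sketch does not model delegating the search to the server.

```python
from collections.abc import Mapping


def search_documents(keyword: str, *sources: Mapping[str, str]) -> list[str]:
    """Return names of documents, from all sources, whose text contains the keyword."""
    hits: list[str] = []
    for source in sources:          # e.g. local database first, then the server
        for name, text in source.items():
            if keyword in text:     # simple substring match
                hits.append(name)
    return hits


local_db = {"AAAAA1": "reaction temperature 80 C", "BBBBB": "sales plan"}
server = {"AAAAA2": "reaction temperature 120 C"}
print(search_documents("reaction", local_db, server))  # ['AAAAA1', 'AAAAA2']
```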

When the document search unit 205 finds a plurality of documents, the similar document selection unit 206 selects mutually similar documents from among them (hereinafter referred to as "similar documents"). Similar documents can be selected by a known method. For example, if the number of characters in the matching portions of two or more documents is equal to or greater than a predetermined number, those documents are selected as similar documents. Alternatively, the similarity between two or more documents may be calculated as described in JP 2004-318527 A, and the documents may be selected as similar documents if the similarity is equal to or greater than a predetermined value.
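One way to realize the first criterion (a minimum number of matching characters) is Python's standard `difflib.SequenceMatcher`, summing the sizes of its matching blocks. The sketch below is illustrative only: grouping transitively linked documents with a union-find structure, the threshold value, and the sample texts are assumptions rather than part of the described method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def matching_chars(a: str, b: str) -> int:
    """Total size of the matching blocks difflib finds between two texts.

    difflib's matching is heuristic, so this is not guaranteed to be the
    longest common subsequence.
    """
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks)


def group_similar(docs: dict[str, str], min_matching: int) -> list[list[str]]:
    """Group documents linked by pairs with at least min_matching matching characters."""
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if matching_chars(docs[a], docs[b]) >= min_matching:
            parent[find(a)] = find(b)       # ASSUMPTION: similarity is treated as transitive

    groups: dict[str, list[str]] = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return [g for g in groups.values() if len(g) > 1]   # singletons are dissimilar documents


docs = {
    "AAAAA1": "The reaction is carried out at 80 C.",
    "AAAAA2": "The reaction is carried out at 120 C.",
    "BBBBB": "Quarterly sales figures",
}
print(group_similar(docs, min_matching=25))  # [['AAAAA1', 'AAAAA2']]
```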

The difference extraction unit 207 extracts the differences among the similar documents selected by the similar document selection unit 206. The differences can be extracted by a known method.

For example, when the two similar documents 7A1 and 7A2 shown in FIG. 8 are selected, the difference extraction unit 207 extracts the differences 7B1 and 7B2 shown in FIG. 9.
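For the extraction itself, `difflib.SequenceMatcher.get_opcodes()` reports which spans were replaced, deleted, or inserted between two sequences. The sketch below compares whitespace-separated words, which is an assumption suited to English text; Japanese text would need character-level comparison or a morphological analyzer. The sample sentences are invented and do not reproduce FIG. 8.

```python
from difflib import SequenceMatcher


def extract_differences(doc_a: str, doc_b: str) -> tuple[list[str], list[str]]:
    """Return (parts only in doc_a, parts only in doc_b), compared word by word."""
    a, b = doc_a.split(), doc_b.split()   # ASSUMPTION: whitespace-separated words
    only_a: list[str] = []
    only_b: list[str] = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag in ("replace", "delete"):
            only_a.append(" ".join(a[i1:i2]))
        if tag in ("replace", "insert"):
            only_b.append(" ".join(b[j1:j2]))
    return only_a, only_b


print(extract_differences("heat at 80 C for 2 h",
                          "heat at 120 C for 2 h at low temperature"))
# (['80'], ['120', 'at low temperature'])
```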

The interest determination unit 208 determines whether the user is interested in the similar documents selected by the similar document selection unit 206. This process is executed, for example, according to the procedure shown in FIG. 10.

The interest determination unit 208 reads, from the personal profile data storage unit 202 (see FIG. 5), the personal profile data 52 associated with the user code received by the user code receiving unit 203 (#801 in FIG. 10). It then checks whether any of the differences extracted by the difference extraction unit 207 contains an interest keyword listed in that personal profile data 52 (#802). If one does (Yes in #803), the unit determines that the user is interested in the similar documents (#804); if none does (No in #803), it determines that the user is not interested (#805).

For example, suppose the differences 7B1 and 7B2 shown in FIG. 9 are extracted from the two similar documents 7A1 and 7A2 shown in FIG. 8. If the personal profile data 52 lists "technology, material, synthesis, temperature", the interest keyword "temperature" is contained in the difference 7B2, so the user is determined to be interested in the similar documents 7A1 and 7A2. If, on the other hand, the personal profile data 52 lists "product, sales, needs, popularity", none of these interest keywords is contained in either 7B1 or 7B2, so the user is determined not to be interested.
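The #801 to #805 flow can be sketched as follows, with the profile store reduced to a dictionary from user code to interest keywords. The user codes and difference strings are hypothetical stand-ins, since the contents of FIG. 9 are not reproduced here; the two keyword lists are the ones from the example above.

```python
def is_interested(user_code: str, differences: list[str],
                  profiles: dict[str, list[str]]) -> bool:
    """Return True if any interest keyword of the user occurs in any difference."""
    keywords = profiles.get(user_code, [])           # #801: read the personal profile data
    for diff in differences:                         # #802: check each extracted difference
        if any(kw in diff for kw in keywords):       # #803: keyword contained?
            return True                              # #804: interested
    return False                                     # #805: not interested


profiles = {
    "U001": ["technology", "material", "synthesis", "temperature"],
    "U002": ["product", "sales", "needs", "popularity"],
}
diffs = ["80", "120 at low temperature"]  # hypothetical stand-ins for 7B1 and 7B2
print(is_interested("U001", diffs, profiles))  # True
print(is_interested("U002", diffs, profiles))  # False
```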

When three or more similar documents are selected, a difference is extracted from each of them, and the user is determined to be interested if any interest keyword is contained in (appears in) any of the differences. Here, the "difference" of a similar document means a portion of it that differs from at least one of the other similar documents.
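A minimal sketch of the three-or-more case, assuming the reading that a document's difference consists of the parts missing from at least one other similar document, and comparing whole lines rather than words. Both choices, and the document names and contents, are assumptions for illustration.

```python
def differences_per_document(docs: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each document, the lines absent from at least one other similar document."""
    result: dict[str, list[str]] = {}
    for name, lines in docs.items():
        others = [set(other) for other_name, other in docs.items() if other_name != name]
        result[name] = [line for line in lines if any(line not in o for o in others)]
    return result


def interested_in_group(docs: dict[str, list[str]], keywords: list[str]) -> bool:
    """True if any interest keyword appears in any document's difference."""
    diffs = differences_per_document(docs)
    return any(kw in line for lines in diffs.values() for line in lines for kw in keywords)


docs = {
    "AAAAA1": ["Overview", "Heat at 80 C"],
    "AAAAA2": ["Overview", "Heat at 120 C"],
    "AAAAA3": ["Overview", "Heat at 80 C"],
}
print(differences_per_document(docs)["AAAAA3"])   # ['Heat at 80 C']
```

A line shared by every document ("Overview" here) never counts as a difference, so a keyword that only occurs there does not signal interest.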

The search result presentation unit 209 presents the results of the search by the document search unit 205, that is, the documents found. This process is executed, for example, according to the procedure shown in FIG. 11.

The search result presentation unit 209 determines the arrangement of the names of the documents found by the document search unit 205 as follows. For the similar documents selected by the similar document selection unit 206 (Yes in #811 in FIG. 11), if the interest determination unit 208 has determined that the user is interested (Yes in #812), the names of the similar documents are arranged equally, at the same level and rank (#813). If the user has been determined not to be interested (No in #812), one of the similar documents is selected as a representative and its name is arranged preferentially (#814). Examples of preferential arrangement and of representative selection are described later.

Documents not selected by the similar document selection unit 206 (hereinafter referred to as "dissimilar documents") (Yes in #815) have their names arranged on an equal footing with the names of the other documents (#816).

The search result presentation unit 209 then displays, on the touch panel display 20e, a search result screen 64 in which the document names are arranged as described above (#817).
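Steps #811 to #817 can be sketched as a function that turns the search hits, the groups of similar documents, and the per-group interest results into a list of top-level entries, plus a plain-text rendering in the spirit of FIGS. 12(A) and 12(B). The `Entry` type, the "[+]" marker standing in for icon 64c, the alphabetical ordering, and the representative-picking callback are all assumptions for illustration. Because it loops over groups, the same function also handles the case in which more than one group of similar documents is selected.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Entry:
    name: str                                         # document name shown at top level
    hidden: list[str] = field(default_factory=list)   # similar documents folded under it


def arrange_results(hits: list[str], groups: list[list[str]], interested: list[bool],
                    pick_representative: Callable[[list[str]], str]) -> list[Entry]:
    entries: list[Entry] = []
    grouped: set[str] = set()
    for group, is_interested in zip(groups, interested):   # #811: similar documents
        grouped.update(group)
        if is_interested:                                   # #812 Yes -> #813: all equal
            entries.extend(Entry(name) for name in group)
        else:                                               # #812 No -> #814: representative
            rep = pick_representative(group)
            entries.append(Entry(rep, [n for n in group if n != rep]))
    entries.extend(Entry(n) for n in hits if n not in grouped)  # #815/#816: dissimilar
    entries.sort(key=lambda e: e.name)   # ASSUMPTION: alphabetical order
    return entries


def render(entries: list[Entry], expanded: bool = False) -> list[str]:
    """#817 as text lines; '[+]' stands in for icon 64c, indentation for FIG. 12(B)."""
    lines: list[str] = []
    for e in entries:
        lines.append(e.name + (" [+]" if e.hidden and not expanded else ""))
        if expanded:
            lines.extend("    " + n for n in e.hidden)
    return lines


hits = ["AAAAA1", "AAAAA2", "BBBBB", "CCCCC1", "CCCCC2"]
entries = arrange_results(hits, [["AAAAA1", "AAAAA2"], ["CCCCC1", "CCCCC2"]],
                          [False, True], pick_representative=lambda g: g[0])
print(render(entries))                 # ['AAAAA1 [+]', 'BBBBB', 'CCCCC1', 'CCCCC2']
print(render(entries, expanded=True))  # ['AAAAA1', '    AAAAA2', 'BBBBB', 'CCCCC1', 'CCCCC2']
```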

The data name (file name) of the document data 50 may be displayed instead of the document name, or together with it. The storage location of the document data 50 may also be displayed.

FIGS. 12 to 14 are diagrams illustrating examples of the search result screen 64. FIG. 15 is a diagram illustrating an example of the search result screen 64 when there are two groups of similar documents.

The preferential arrangement is described below using an example in which, through the processing of the document search unit 205 and the similar document selection unit 206, documents named "AAAAA1" and "AAAAA2" are found as similar documents and a document named "BBBBB" is found as a dissimilar document.

When the interest determination unit 208 has determined that the user is not interested in the similar documents, the search result presentation unit 209 arranges, for example as shown in FIG. 12(A), only the name of the representative similar document on an equal footing with the name of the dissimilar document. The names of the other similar documents are omitted at this stage, and an icon 64c is placed immediately after the representative's name. When the icon 64c is pressed, the search result screen 64 is redisplayed with the names of the remaining (non-representative) similar documents added below it with an indent, as shown in FIG. 12(B).

In the examples of FIGS. 12(A) and 12(B), the order of the representative's name and the dissimilar document's name can be chosen freely, for example ascending or descending alphabetical order, or ascending or descending order of creation date and time. The search result screen 64 may also be displayed from the outset with the names of the non-representative similar documents placed at a lower level, as in FIG. 12(B). The names of these lower-level similar documents may be shown in a smaller font than the representative's name.

If, on the other hand, the interest determination unit 208 has determined that the user is interested in the similar documents, the search result presentation unit 209 arranges the names of all the similar documents and the dissimilar document equally, as shown in FIG. 13. Here too, the order of the document names can be chosen freely.

Alternatively, when the user has been determined not to be interested in the similar documents, the search result presentation unit 209 may arrange, as shown in FIG. 14(A), an icon 64s representing the set of similar documents together with the representative's name, and an icon 64t1 representing the single dissimilar document together with its name. When the icon 64s is pressed, icons 64t2 and 64t3 for the individual similar documents may be arranged together with their names, as shown in FIG. 14(B).

The search result presentation unit 209 may select the representative of the similar documents as follows: for example, the document with the most recent creation date and time, the one with the most recent update date and time, the one accessed most often, or the one with the most characters. There may be two or more representatives; in particular, when there are many similar documents, several of them may be selected as representatives.
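The alternative selection criteria map naturally onto sort keys. In the sketch below, the `DocInfo` record, its field names, the criterion labels, and the sample metadata are hypothetical; the `count` parameter covers the case of several representatives.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DocInfo:
    """Hypothetical metadata record for one similar document."""
    name: str
    created: datetime
    updated: datetime
    access_count: int
    char_count: int


# Each criterion from the text, expressed as a sort key (larger = better candidate).
CRITERIA = {
    "newest_created": lambda d: d.created,
    "newest_updated": lambda d: d.updated,
    "most_accessed": lambda d: d.access_count,
    "most_characters": lambda d: d.char_count,
}


def select_representatives(docs: list[DocInfo], criterion: str = "newest_created",
                           count: int = 1) -> list[DocInfo]:
    """Return the top `count` documents under the chosen criterion."""
    return sorted(docs, key=CRITERIA[criterion], reverse=True)[:count]


docs = [
    DocInfo("AAAAA1", datetime(2013, 1, 10), datetime(2013, 5, 1), access_count=12, char_count=500),
    DocInfo("AAAAA2", datetime(2013, 3, 2), datetime(2013, 4, 1), access_count=3, char_count=800),
]
print([d.name for d in select_representatives(docs, "most_accessed")])  # ['AAAAA1']
```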

When a document name or an icon is selected on the search result screen 64, the user computer 2, as in conventional systems, reads the corresponding document data 50 from the local document database 2B1 or the document server 3 and opens it with a predetermined application.

More than one set of similar documents may be selected. In this case, the interest presence/absence determination unit 208 and the search result presentation unit 209 may operate as follows.

The interest presence/absence determination unit 208 determines, for each set of similar documents, whether the user is interested in that set. For each set, the search result presentation unit 209 then decides, according to the determination result, whether to arrange the representative preferentially or to arrange all the similar documents on an equal footing, and displays the search result screen 64 accordingly.

For example, suppose that one set of similar documents named "AAAAA1" and "AAAAA2", another set named "CCCCC1" and "CCCCC2", and a document named "BBBBB" that is similar to neither set are selected. The interest presence/absence determination unit 208 determines whether the user is interested in each of the two sets. For a set in which the user is interested, the search result presentation unit 209 arranges the document names of the similar documents on an equal footing, as indicated by the one-dot chain line in FIG. 15(A) or 15(B). For a set in which the user is not interested, it arranges the representative document name preferentially, as indicated by the two-dot chain line.
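A minimal sketch of how per-set interest results could drive the layout, assuming each set is a list of document names and the interest results arrive as one boolean per set. The function name `arrange_results`, the entry format, and the choice to list sets before dissimilar documents are illustrative assumptions (the text leaves the order arbitrary).

```python
def arrange_results(groups, singles, interested, representative):
    """Build the ordered display entries for the search result screen.

    groups: list of lists of document names, one list per set of similar documents
    singles: names of dissimilar documents
    interested: one boolean per group (result of the interest determination)
    representative: function that picks the representative name from a group
    Returns (name, hidden_names) tuples; hidden_names are the collapsed members
    that an icon such as 64s would reveal when pressed.
    """
    entries = []
    for group, is_interested in zip(groups, interested):
        if is_interested:
            # Interested: every similar document is shown on an equal footing.
            entries.extend((name, []) for name in group)
        else:
            # Not interested: only the representative is shown; the rest are collapsed.
            rep = representative(group)
            entries.append((rep, [n for n in group if n != rep]))
    entries.extend((name, []) for name in singles)
    return entries
```

With the example above, an interested "AAAAA" set is expanded while an uninterested "CCCCC" set collapses to one entry.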

FIG. 16 is a flowchart illustrating an example of the flow of the search processing. The overall processing flow of the user computer 2 is described below with reference to this flowchart.

The user computer 2 accepts the user code of the user performing the search (#11 in FIG. 16) and accepts a search keyword (#12). It then searches the local document database 2B1 and the document server 3 for documents containing the search keyword (#13) and selects, from the hits, a plurality of documents that are similar to each other (similar documents) (#14).

If similar documents are selected, the user computer 2 extracts the differences between them (#15) and determines whether the user is interested in the similar documents (#16). The determination method is as described with reference to FIG. 10.

The user computer 2 then generates the search result screen 64 (see FIGS. 12 to 14) by arranging the document name of each document on the basis of the results of steps #13 to #16, and displays it (#17). The procedure for this processing is as described with reference to FIG. 11.
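Steps #13 to #16 can be illustrated end to end with Python's standard `difflib` module. This is only a sketch under stated assumptions: this passage does not specify how similar documents are selected, so the greedy grouping by `SequenceMatcher.ratio()` with a threshold of 0.6 is a stand-in, the interest test is the simple "any keyword in any difference" rule, and all function names are hypothetical.

```python
import difflib


def search(docs, keyword):
    # Step #13: keep the documents whose text contains the search keyword.
    return {name: text for name, text in docs.items() if keyword in text}


def select_similar(hits, threshold=0.6):
    # Step #14 (assumed criterion): a document joins the first group whose
    # first member it matches with a SequenceMatcher ratio >= threshold.
    groups = []
    for name, text in hits.items():
        for group in groups:
            if difflib.SequenceMatcher(None, hits[group[0]], text).ratio() >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    similar = [g for g in groups if len(g) > 1]
    singles = [g[0] for g in groups if len(g) == 1]
    return similar, singles


def char_diff(a, b):
    # Step #15: character-level differences, i.e. the non-matching spans.
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def is_interested(diffs, interest_keywords):
    # Step #16: interested if any interest keyword occurs in any difference span.
    return any(kw in part for pair in diffs for part in pair for kw in interest_keywords)


def run_search(docs, keyword, interest_keywords):
    hits = search(docs, keyword)
    similar, singles = select_similar(hits)
    judged = []
    for group in similar:
        base = hits[group[0]]
        diffs = [span for other in group[1:] for span in char_diff(base, hits[other])]
        judged.append((group, is_interested(diffs, interest_keywords)))
    # Step #17 would arrange the document names on screen from these results.
    return judged, singles
```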

According to the present embodiment, a plurality of mutually similar documents (similar documents) can be presented in a way that better matches each user's preferences than in the related art. For example, if the user is an engineer, technical words are registered as interest keywords, so when the similar documents are technical, all of them are likely to be presented. If the user is a salesperson, on the other hand, only a representative of technical similar documents is more likely to be presented.

Next, modifications of the processing performed by each unit of the user computer 2 shown in FIG. 3 are described in turn with reference to FIGS. 17 to 29.

[Modification of personal profile data generation and management]
In the above embodiment, the personal profile data generation unit 201 acquires interest keywords by having the user enter them on the interest keyword input screen 61 (see FIG. 4), and generates the personal profile data 52 (see FIG. 5). However, the interest keywords may be acquired by other methods.

For example, the personal profile data generation unit 201 analyzes the words in each document the user has accessed in the past and the number of times each word appears. It then selects words that appear at least a predetermined number of times as interest keywords and generates the personal profile data 52. Alternatively, it may select a predetermined number of words in descending order of appearance count as interest keywords and generate the personal profile data 52.
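The access-history variant can be sketched as a word count over past documents. The sketch assumes regex tokenization (`\w+`), which suits English text; Japanese text would need a morphological analyzer instead. The function name and the `stopwords` parameter are hypothetical.

```python
import re
from collections import Counter


def extract_interest_keywords(accessed_texts, min_count=None, top_n=None,
                              stopwords=frozenset()):
    """Derive interest keywords from the documents a user accessed in the past."""
    counts = Counter(
        word
        for text in accessed_texts
        for word in re.findall(r"\w+", text.lower())  # assumed tokenizer
        if word not in stopwords
    )
    if min_count is not None:
        # Variant 1: every word appearing at least min_count times.
        return sorted(word for word, c in counts.items() if c >= min_count)
    if top_n is not None:
        # Variant 2: the top_n words in descending order of appearance count.
        return [word for word, _ in counts.most_common(top_n)]
    raise ValueError("specify min_count or top_n")
```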

The personal profile data 52 may also be managed centrally on the document server 3 or another server and shared among a plurality of user computers 2. In this case, the interest presence/absence determination unit 208 of the user computer 2 notifies the server of the user code accepted by the user code reception unit 203 and downloads the personal profile data 52 for that user code from the server.

[Modification of difference extraction]
FIG. 17 is a diagram illustrating examples of similar documents 7C1 and 7C2 and differences 7E1 and 7E2.

In the above embodiment, as illustrated in FIGS. 8 and 9, the difference extraction unit 207 extracts differences character by character. With this unit, however, only small differences may be obtained for some similar documents. For example, only differences 7D1 and 7D2, each a string of a few characters, are extracted from the two similar documents 7C1 and 7C2 shown in FIG. 17(A). The interest presence/absence determination unit 208 then cannot match the interest keywords sufficiently and may fail to determine properly whether the user is interested in the similar documents.

The difference extraction unit 207 may therefore extract differences in somewhat larger units, for example sentence by sentence. In this case, differences 7E1 and 7E2 are extracted from the similar documents 7C1 and 7C2 of FIG. 17(A), as shown in FIG. 17(B). Alternatively, differences may be extracted paragraph by paragraph, or in units of the character string running from the start of a sentence or from one comma to the next comma or full stop. As another alternative, differences may be extracted character by character while the range matched against the interest keywords is widened to a predetermined span around each difference (for example, 30 characters before and after it).
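Sentence-level and windowed character-level extraction can both be sketched with `difflib.SequenceMatcher`, which accepts lists of sentences as well as strings. The sentence splitter below is a simplifying assumption (it treats `.`, `!`, `?`, and `。` as boundaries), and the function names are hypothetical.

```python
import difflib
import re


def split_sentences(text):
    # Assumed boundary rule: split after '.', '!', '?', or the Japanese full stop.
    return [s.strip() for s in re.split(r"(?<=[.!?。])\s*", text) if s.strip()]


def sentence_diff(a, b):
    """Return (sentences only in a, sentences only in b), compared sentence by sentence."""
    sa, sb = split_sentences(a), split_sentences(b)
    sm = difflib.SequenceMatcher(None, sa, sb, autojunk=False)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            only_a.extend(sa[i1:i2])
            only_b.extend(sb[j1:j2])
    return only_a, only_b


def char_diff_with_context(a, b, width=30):
    """Character-level differences, each widened by `width` characters on both sides."""
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[max(0, i1 - width):i2 + width], b[max(0, j1 - width):j2 + width])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
```

The windowed variant keeps the precise character diff but gives keyword matching enough surrounding text to work with.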

[First modification of determining interest in similar documents]
FIG. 18 is a diagram illustrating an example of personal profile data 53. FIG. 19 is a flowchart illustrating a modified flow of the interest presence/absence determination processing. FIG. 20 is a diagram illustrating an example of differences 7F1 and 7F2.

In the above embodiment, the interest presence/absence determination unit 208 determines whether the user is interested in the similar documents solely by whether the user's interest keywords (see FIG. 5) are contained in the difference of any of the similar documents. To determine interest more accurately, the determination may be made as follows.

The personal profile data storage unit 202 stores personal profile data 53 as shown in FIG. 18 instead of the personal profile data 52. Like the personal profile data 52, the personal profile data 53 lists, together with the user code, words for topics the user is interested in as interest keywords. In addition, the personal profile data 53 gives each interest keyword a score representing its importance to the user or the strength of the user's interest. The score may be assigned by the user or determined on the basis of the number of times the keyword appears in the documents the user has accessed in the past.

The interest presence/absence determination unit 208 determines whether the user is interested in the similar documents by using the personal profile data 53 instead of the personal profile data 52. The determination method is described below with reference to FIG. 19.

The interest presence/absence determination unit 208 reads, from the personal profile data storage unit 202, the personal profile data 53 for the user code accepted by the user code reception unit 203 (#821 in FIG. 19). It then checks whether the differences extracted by the difference extraction unit 207 from the similar documents selected by the similar document selection unit 206 contain any of the interest keywords listed in the personal profile data 53 and, if so, counts the number of appearances of each interest keyword (#822).

If an interest keyword is contained (Yes in #823), the interest presence/absence determination unit 208 calculates, as the importance, the sum over the interest keywords of the product of each keyword's appearance count and the score given in the personal profile data 53 (#824). If the calculated importance is equal to or greater than a threshold α (Yes in #825), it determines that the user is interested in the similar documents (#826). If the importance is less than the threshold α (No in #825), it determines that the user is not interested (#827). If no interest keyword is contained at all (No in #823), it likewise determines that the user is not interested (#827).

For example, suppose that the differences between two similar documents are the differences 7F1 and 7F2 shown in FIG. 20 and that the threshold α is 8.0. With the personal profile data 53 of FIG. 18(A), "component" appears four times and "temperature" appears three times, so the importance is 4.0 × 4 + 3.0 × 3 = 25.0, and the user is determined to be interested. With the personal profile data 53 of FIG. 18(C), on the other hand, "temperature" appears three times, so the importance is 2.0 × 3 = 6.0, and the user is determined to be not interested.

In step #824, the average of the products may be calculated as the importance instead of their sum. The same applies to the second modification described below.
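The scoring of steps #822 to #825 amounts to importance = Σ (count of keyword k in the differences) × (score of k), taken over the keywords that occur. The sketch below reproduces the worked example; the scores 4.0 for "component" and 3.0 for "temperature" (FIG. 18(A)) and 2.0 for "temperature" (FIG. 18(C)) are inferred from the arithmetic above, the averaging option assumes the average is taken over the keywords found, and the function names are hypothetical.

```python
def keyword_products(diff_texts, scored_keywords):
    """Return count x score for every interest keyword found in the differences (#822)."""
    products = []
    for keyword, score in scored_keywords.items():
        count = sum(text.count(keyword) for text in diff_texts)
        if count:
            products.append(count * score)
    return products


def importance(diff_texts, scored_keywords, use_average=False):
    # #824: sum of the products, or (assumed variant) their mean over found keywords.
    products = keyword_products(diff_texts, scored_keywords)
    if not products:
        return None  # No interest keyword found (#823: No).
    return sum(products) / len(products) if use_average else sum(products)


def is_interested(diff_texts, scored_keywords, alpha, use_average=False):
    value = importance(diff_texts, scored_keywords, use_average)
    # #825: interested only if some keyword was found and the importance reaches alpha.
    return value is not None and value >= alpha
```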

[Second modification of determining interest in similar documents]
FIG. 21 is a diagram illustrating an example of personal profile data 54. FIG. 22 is a flowchart illustrating a modified flow of the interest presence/absence determination processing.

Taking the relationships between keywords into account, whether the user is interested in the similar documents may be determined as follows.

The personal profile data storage unit 202 stores personal profile data 54 as shown in FIG. 21 instead of the personal profile data 52. The personal profile data 54 lists, together with the user code, pairs of two words representing topics the user is interested in, as interest pair keywords. In addition, the personal profile data 54 gives each interest pair keyword a score representing its importance to the user or the strength of the user's interest. Alternatively, the score may represent how closely the two words making up the interest pair keyword are related.

The user may enter the interest pair keywords and their scores individually in advance. Alternatively, they may be acquired by searching blocks of a predetermined unit (for example, a sentence or a paragraph) of each document the user has accessed in the past for combinations of two words and counting how often each combination appears.
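Deriving candidate pair scores from past documents can be sketched as counting per-sentence co-occurrence. Restricting the count to a given candidate `vocabulary`, splitting sentences on `.`, `!`, `?`, and `。`, lowercasing, and matching words by substring are all simplifying assumptions; the function name is hypothetical, and the resulting counts could then serve as scores.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurring_pairs(texts, vocabulary):
    """Count how often two candidate words appear in the same sentence."""
    counts = Counter()
    for text in texts:
        for sentence in re.split(r"[.!?。]", text.lower()):
            present = sorted(w for w in vocabulary if w in sentence)
            # Each unordered pair is counted at most once per sentence.
            counts.update(combinations(present, 2))
    return counts
```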

The interest presence/absence determination unit 208 determines whether the user is interested in the similar documents by using the personal profile data 54 instead of the personal profile data 52. The determination method is described below with reference to FIG. 22.

The interest presence/absence determination unit 208 reads, from the personal profile data storage unit 202, the personal profile data 54 for the user code accepted by the user code reception unit 203 (#831 in FIG. 22). It then checks whether the differences extracted by the difference extraction unit 207 from the similar documents selected by the similar document selection unit 206 contain any of the interest pair keywords listed in the personal profile data 54 and, if so, counts the number of appearances of each interest pair keyword (#832).

"An interest pair keyword is contained in a difference" may mean that the two words of the pair appear anywhere within one difference. However, if the two words are far apart, it may be hard to regard them as a pair. The interest pair keyword may therefore be regarded as contained in the difference only when both words appear in the same sentence.

If an interest pair keyword is contained (Yes in #833), the interest presence/absence determination unit 208 calculates, as the importance, the sum over the interest pair keywords of the product of each pair's appearance count and the score given in the personal profile data 54 (#834). If the calculated importance is equal to or greater than a threshold β (Yes in #835), it determines that the user is interested in the similar documents (#836). If the importance is less than the threshold β (No in #835), it determines that the user is not interested (#837). If no interest pair keyword is contained at all (No in #833), it likewise determines that the user is not interested (#837).
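The pair-based test, with the same-sentence restriction described above, can be sketched as follows. The sentence splitter, lowercase matching, and function names are assumptions for illustration; `None` stands for the "no interest pair keyword found" branch (#833: No).

```python
import re


def pair_importance(diff_texts, scored_pairs):
    """Sum of (count x score) over interest pair keywords (#832-#834).

    A pair is counted once for each sentence of a difference containing both words.
    """
    total, found = 0.0, False
    for (w1, w2), score in scored_pairs.items():
        count = sum(
            1
            for text in diff_texts
            for sentence in re.split(r"[.!?。]", text.lower())  # assumed splitter
            if w1 in sentence and w2 in sentence
        )
        if count:
            found = True
            total += count * score
    return total if found else None


def is_interested_in_pairs(diff_texts, scored_pairs, beta):
    value = pair_importance(diff_texts, scored_pairs)
    # #833 No -> not interested; otherwise compare with threshold beta (#835).
    return value is not None and value >= beta
```

In the test below, the second difference mentions both words but in different sentences, so it does not count toward the pair.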

In this modification, each pair consists of two words, but three or more words may be combined into one set.

[Modification for three or more similar documents]
FIG. 23 is a diagram illustrating an example of similar documents 7G1 to 7G3. FIG. 24 is a flowchart illustrating a modified flow of the search processing. FIG. 25 is a diagram illustrating examples of differences 7H1 and 7H2 and differences 7J2 and 7J3. FIG. 26 is a flowchart illustrating an example of the flow of a second search result presentation process. FIG. 27 is a diagram illustrating an example of the search result screen 64.

A set of similar documents may share the same base but differ in version (edition); that is, they may be so-called different versions. For example, of three similar documents, the first may be a draft (first version), the second a revision (second version), and the third the official release (third version). Whether documents are different versions can be determined by referring to the properties of the document data 50 of each similar document: for example, if the titles are the same but the version numbers differ, the documents can be determined to be different versions. Alternatively, a version management system (for example, the system described in Japanese Unexamined Patent Application Publication No. 2006-127029) may be queried.

When a set contains three or more similar documents and they are different versions, interest may be determined for each pair of similar documents with adjacent versions, and only the document names of some of the similar documents may be presented preferentially according to the results. This modification is described below with reference to the flowchart of FIG. 24 and other figures, taking as an example the case where the similar document selection unit 206 selects the three similar documents 7G1, 7G2, and 7G3 shown in FIG. 23. The similar documents 7G1, 7G2, and 7G3 are the first, second, and third versions, respectively.

As in steps #11 to #14 of FIG. 16, the units of the user computer 2 accept a user code and a search keyword (#21 and #22 in FIG. 24), search for documents (#23), and select similar documents (#24). The modifications described above may be applied to these processes.

When three or more similar documents are selected as one set, the difference extraction unit 207 treats each two similar documents with adjacent versions as a pair and extracts the differences for each pair, and the interest presence/absence determination unit 208 determines whether the user is interested in each pair (#25 to #28). Interest is determined as described with reference to FIG. 10 or by one of the modifications described above; the modifications described above may likewise be used for the difference extraction.
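Pairing documents with adjacent versions might look like the following, assuming the title and version number have already been read from each document's properties (documents with the same title and different version numbers are treated as versions of one another). The tuple layout and the function name are hypothetical.

```python
from collections import defaultdict


def adjacent_version_pairs(docs):
    """Group documents by title and pair up consecutive versions.

    docs: iterable of (title, version_number, name) tuples.
    Returns {title: [(older_name, newer_name), ...]} for titles with 2+ versions.
    """
    by_title = defaultdict(list)
    for title, version, name in docs:
        by_title[title].append((version, name))
    pairs = {}
    for title, versions in by_title.items():
        if len(versions) < 2:
            continue  # A single version has no neighbour to compare with.
        versions.sort()
        names = [name for _, name in versions]
        pairs[title] = list(zip(names, names[1:]))
    return pairs
```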

In the example of FIG. 23, differences 7H1 and 7H2 are extracted from the pair of similar documents 7G1 and 7G2 as shown in FIG. 25(A), and interest is determined for that pair. Likewise, differences 7J2 and 7J3 are extracted from the pair of similar documents 7G2 and 7G3 as shown in FIG. 25(B), and interest is determined for that pair.

The search result presentation unit 209 then displays the search result screen 64 on the touch panel display 20e on the basis of the results of steps #23 to #28 (#29). The display processing follows the procedure shown in FIG. 26.

Among the similar documents 7G1 to 7G3 (Yes in #841), the search result presentation unit 209 gives a higher priority (#843) to each similar document for which the user was determined to be interested in step #26 at least once (Yes in #842).

For example, if the similar document 7G2 is determined to be of no interest in the determination for its pair with the similar document 7G1 but of interest in the determination for its pair with the similar document 7G3, the similar document 7G2 is given the higher priority.

On the other hand, the search result presentation unit 209 gives a lower priority (#844) to each similar document for which the user was never determined to be interested (No in #842).

The search result presentation unit 209 then preferentially arranges the document names of the higher-priority documents among the similar documents 7G1 to 7G3 (#845). When there are several such documents, their document names are arranged on an equal footing. If all the similar documents have the lower priority, however, one representative is selected and its document name is arranged preferentially.
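The priority rules of steps #841 to #845 reduce to a set membership test. In this sketch, assumed for illustration, `pair_interest` maps each adjacent-version pair to the result of step #26, `pick_representative` is a caller-supplied rule such as those described earlier for choosing a representative, and the function names are hypothetical.

```python
def assign_priorities(versions, pair_interest):
    """versions: names in version order; pair_interest: {(older, newer): bool}.

    Returns (higher, lower): a document gets the higher priority if at least
    one pair containing it was judged "interested" (#842-#844).
    """
    interested_docs = {name for pair, hit in pair_interest.items() if hit for name in pair}
    higher = [n for n in versions if n in interested_docs]
    lower = [n for n in versions if n not in interested_docs]
    return higher, lower


def documents_to_show_first(versions, pair_interest, pick_representative):
    higher, _ = assign_priorities(versions, pair_interest)
    if higher:
        # #845: all higher-priority documents are shown on an equal footing.
        return higher
    # Every document has the lower priority: show a single representative.
    return [pick_representative(versions)]
```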

Document names of dissimilar documents are arranged in the same way as described with reference to FIG. 11, that is, on an equal footing with the names of the other documents (Yes in #846, #847).

The search result screen 64 with the document names arranged in this way is then displayed on the touch panel display 20e (#848).

For example, suppose that the document names of the similar documents 7G1, 7G2, and 7G3 are "EEEEE1", "EEEEE2", and "EEEEE3", respectively, that the similar document 7G1 has the lower priority, that the similar documents 7G2 and 7G3 have the lower priority, and that the document name of the dissimilar document is "DDDDD". In this case, the search result presentation unit 209 displays the search result screen 64 with the document names arranged as shown in FIG. 27(A). When the icon 64c is pressed, it redisplays the search result screen 64 with the document name of the similar document 7G3 placed below those of the similar documents 7G1 and 7G2, as shown in FIG. 27(B).

[Document organization (database cleansing)]
FIG. 28 is a diagram illustrating a modified functional configuration of the user computer 2. FIG. 29 is a diagram illustrating a modified example of the search result screen 64.

The above embodiment and modifications are used to inform the user that documents matching a search keyword exist, but they can also be applied to organizing documents. Specifically, they can be used to keep only one of a plurality of mutually similar documents (similar documents) and delete the rest. This mechanism is described below with reference to FIGS. 28 and 29.

As shown in FIG. 28, a document data deletion unit 211 is additionally provided. The functions of the personal profile data generation unit 201 through the search result presentation unit 209 are basically the same as in the above embodiment or modifications.

However, the document search unit 205 may limit the search scope. For example, it may limit the search to documents whose document data 50 is stored in the local document database 2B1, or to documents owned solely by the user whose user code was accepted by the user code reception unit 203. Alternatively, the user may specify the search scope, or, when the documents have been classified in advance into one of several categories, the category selected by the user may be used as the search scope.

The search result presentation unit 209 displays the search result screen 64 on the basis of the results of processing by the document search unit 205, the similar document selection unit 206, and the interest presence/absence determination unit 208, but with a message indicating that documents can be organized and a "Delete" button, as shown in FIG. 29(A) or 29(B).

If the user agrees that the similar documents determined by the interest presence/absence determination unit 208 to be of no interest may be deleted, except for the representative, the user presses the "Delete" button.

The document data deletion unit 211 then deletes, from their current storage location, the document data 50 of the similar documents determined to be of no interest, other than the representative.
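The deletion rule can be sketched as computing which documents to drop and then removing them from a store. Here a plain dict stands in for the local document database 2B1 or the document server 3; the function names and the representative rule passed in are hypothetical, and a real system would delete files or database records instead.

```python
def documents_to_delete(similar_sets, interested, representative):
    """Names to delete: every non-representative member of an uninterested set."""
    doomed = []
    for group, hit in zip(similar_sets, interested):
        if not hit:
            rep = representative(group)
            doomed.extend(name for name in group if name != rep)
    return doomed


def delete_documents(store, names):
    """Remove the listed documents from a dict-based stand-in store."""
    for name in names:
        store.pop(name, None)  # Ignore names that are already gone.
```

Computing the list first makes it easy to show it to the user for confirmation before anything is removed.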

With the document organizing function, the overall processing flow of the user computer 2 is basically as described with reference to FIG. 16 or FIG. 24. However, if the user presses the "Delete" button after step #17 of FIG. 16 or step #29 of FIG. 24, the user computer 2 deletes the document data 50 of the similar documents as described above.

In this way, the document data deletion unit 211 makes it possible to cleanse similar documents during maintenance of the local document database 2B1 or the document server 3.

The document data deletion unit 211 may also execute the deletion without displaying the search result screen 64. Alternatively, the search result presentation unit 209 may display the document names of the similar documents as a list in a predetermined order (for example, in descending order of the number of interest keywords contained), and the document data deletion unit 211 may delete only the document data 50 of the documents the user selects from the list.

In addition, the configuration of the document management system 1 and the user computer 2 as a whole or of each unit, the processing content, the processing order, the data structure, and so on may be changed as appropriate within the spirit of the present invention.

2 User computer (information processing apparatus, document data organizing apparatus)
202 Personal profile data storage unit (interest keyword storage means)
204 Search keyword reception unit (search keyword receiving means)
207 Difference extraction unit (difference extracting means)
208 Interest presence/absence determination unit (condition suitability judging means)
209 Search result presentation unit (presence presenting means)
211 Document data deletion unit (document data organizing means, document data deleting means)
50 Document data
64 Search result screen

Claims

An information processing apparatus that notifies a user of the presence of some or all of a plurality of mutually similar documents, the apparatus comprising:
difference extracting means for extracting differences between the plurality of documents;
condition suitability judging means for judging whether an interest keyword, which represents an item of interest to the user and is stored in advance in interest keyword storage means, appears in the differences so as to satisfy a predetermined condition; and
presence presenting means for presenting the presence of all of the plurality of documents if the interest keyword appears in the differences so as to satisfy the predetermined condition, and otherwise preferentially presenting the presence of some of the plurality of documents.

The information processing apparatus according to claim 1, further comprising search keyword receiving means for receiving a search keyword used as a search key,
wherein the plurality of documents are found by a search based on the search keyword.

The information processing apparatus according to claim 1 or 2,
wherein the predetermined condition is that the interest keyword appears at least once in the differences.

The information processing apparatus according to claim 1 or 2, wherein the interest keyword storage means stores a plurality of interest keywords, each assigned a score, and the predetermined condition is that the sum, over the interest keywords, of the number of times each interest keyword appears in the difference multiplied by its score is equal to or greater than a threshold.
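The weighted condition of claim 4 amounts to checking Σ_k n_k·s_k ≥ T, where n_k is the number of times interest keyword k appears in the difference, s_k is its score, and T is the threshold. A minimal sketch, assuming the difference is already available as a list of words; the function name and sample scores are hypothetical.

```python
from collections import Counter


def weighted_condition_met(diff_words: list[str], keyword_scores: dict[str, float], threshold: float) -> bool:
    """Check sum(count_in_difference * score) over all interest keywords against a threshold."""
    counts = Counter(diff_words)  # keywords absent from the difference count as 0
    total = sum(counts[keyword] * score for keyword, score in keyword_scores.items())
    return total >= threshold


diff = ["budget", "budget", "deadline", "room"]
scores = {"budget": 2.0, "deadline": 3.0}
print(weighted_condition_met(diff, scores, 7.0))  # True  (2 * 2.0 + 1 * 3.0 = 7.0)
print(weighted_condition_met(diff, scores, 8.0))  # False
```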

The information processing apparatus according to claim 1 or 2, wherein the interest keyword storage means stores a plurality of word pairs as the interest keywords, each pair assigned a score, and the predetermined condition is that the sum, over the pairs, of the number of times each pair appears in the difference multiplied by its score is equal to or greater than a threshold.
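The claim does not state how an occurrence of a word pair is counted in the difference. The sketch below assumes, purely for illustration, that a pair occurs as many times as its less frequent word occurs in the difference; this counting rule, the function name, and the sample data are all hypothetical.

```python
from collections import Counter


def pair_condition_met(diff_words: list[str], pair_scores: dict[tuple[str, str], float], threshold: float) -> bool:
    """Check sum(pair_count * score) over all word pairs against a threshold."""
    counts = Counter(diff_words)
    total = 0.0
    for (first, second), score in pair_scores.items():
        # Hypothetical counting rule: a pair occurs as often as its rarer word
        pair_count = min(counts[first], counts[second])
        total += pair_count * score
    return total >= threshold


diff = ["osaka", "meeting", "meeting", "budget"]
pairs = {("osaka", "meeting"): 5.0, ("budget", "tokyo"): 10.0}
print(pair_condition_met(diff, pairs, 5.0))  # True  (1 * 5.0 + 0 * 10.0 = 5.0)
print(pair_condition_met(diff, pairs, 6.0))  # False
```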

The information processing apparatus according to any one of claims 1 to 5, wherein, when the interest keyword does not appear in the difference so as to satisfy the predetermined condition, the presence presenting means preferentially presents the presence of some of the plurality of documents by displaying the identifiers of those documents and not displaying the identifiers of the remaining documents.

The information processing apparatus according to any one of claims 1 to 5, wherein, when the interest keyword does not appear in the difference so as to satisfy the predetermined condition, the presence presenting means preferentially presents the presence of some of the plurality of documents by displaying the identifiers of those documents more prominently than the identifiers of the remaining documents.
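Claims 6 and 7 describe two ways of giving some documents priority on the result screen: hiding the identifiers of the remaining documents, or showing every identifier while making the preferred ones more prominent. A minimal text-only sketch, assuming identifiers are strings; the function name, the mode names, and the `>> ` prefix (standing in for real visual emphasis such as bold text or a larger font) are hypothetical.

```python
def render_results(doc_ids: list[str], preferred: set[str], mode: str = "hide") -> list[str]:
    """Return display lines for a result list; mode is 'hide' (claim 6) or 'emphasize' (claim 7)."""
    if mode not in ("hide", "emphasize"):
        raise ValueError(f"unknown mode: {mode}")
    lines = []
    for doc_id in doc_ids:
        if doc_id in preferred:
            lines.append(f">> {doc_id}" if mode == "emphasize" else doc_id)
        elif mode == "emphasize":
            lines.append(doc_id)  # remaining documents stay visible, without emphasis
        # in 'hide' mode the remaining documents are not displayed at all
    return lines


ids = ["v1", "v2", "v3"]
print(render_results(ids, {"v3"}, "hide"))       # ['v3']
print(render_results(ids, {"v3"}, "emphasize"))  # ['v1', 'v2', '>> v3']
```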

The information processing apparatus according to claim 1, further comprising document data organizing means for performing, when the interest keyword does not appear in the difference so as to satisfy the predetermined condition, a process of keeping the data of the documents preferentially presented by the presence presenting means and deleting the data of the other documents among the plurality of documents.

An information processing apparatus that indicates the presence of some or all of a plurality of similar documents, the apparatus comprising:
difference extracting means for extracting a difference between the plurality of documents;
condition suitability judging means for judging whether an interest keyword, which is stored in advance in interest keyword storage means and represents an item of interest to the user, appears in the difference or in a predetermined range before and after the difference so as to satisfy a predetermined condition; and
presence presenting means for presenting the presence of all of the plurality of documents if the interest keyword appears in the difference or the predetermined range so as to satisfy the predetermined condition, and otherwise for preferentially presenting the presence of some of the plurality of documents.
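This claim widens the area checked for interest keywords to a predetermined range before and after the difference. The sketch below assumes the range is measured in words and takes the surrounding context from the first document; the function name and the `context` parameter are hypothetical, and the diff itself uses Python's `difflib.SequenceMatcher`.

```python
import difflib


def difference_with_context(doc_a: str, doc_b: str, context: int = 3) -> list[str]:
    """Words in each differing region plus up to `context` words before and after it (taken from doc_a)."""
    a_words, b_words = doc_a.split(), doc_b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    words: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        words.extend(a_words[max(0, i1 - context):i1])  # range before the difference
        words.extend(a_words[i1:i2])                     # differing words in doc_a
        words.extend(b_words[j1:j2])                     # differing words in doc_b
        words.extend(a_words[i2:i2 + context])           # range after the difference
    return words


a = "please bring the report to room A on monday"
b = "please bring the report to room B on monday"
print(difference_with_context(a, b, context=0))  # ['A', 'B']
print(difference_with_context(a, b, context=2))  # ['to', 'room', 'A', 'B', 'on', 'monday']
```

With `context=0`, an interest keyword such as "room" is missed; with a two-word range it is found next to the changed room letter, which is the situation this claim addresses.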

A document data organizing apparatus for organizing data of a plurality of similar documents, the apparatus comprising:
difference extracting means for extracting a difference between the plurality of documents;
condition suitability judging means for judging whether an interest keyword, which is stored in advance in interest keyword storage means and represents an item of interest to the user, appears in the difference so as to satisfy a predetermined condition; and
document data deleting means for performing, when the interest keyword does not appear in the difference so as to satisfy the predetermined condition, a process of keeping the data of some of the plurality of documents and deleting the data of the other documents.
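The deletion step of this claim can be illustrated with an in-memory store. This sketch assumes documents are held in a dictionary keyed by identifier and that the caller has already evaluated the interest-keyword condition and chosen which document to keep; a real implementation would delete files or database records instead. All names are hypothetical.

```python
def organize_documents(store: dict[str, str], doc_ids: list[str], condition_met: bool, keep_id: str) -> list[str]:
    """Delete all similar documents except keep_id unless the interest-keyword condition was met."""
    if condition_met:
        return []  # the difference matters to the user, so every version is kept
    deleted = [doc_id for doc_id in doc_ids if doc_id != keep_id]
    for doc_id in deleted:
        store.pop(doc_id, None)  # remove the data of the other documents
    return deleted


store = {"v1": "draft", "v2": "revision", "v3": "final"}
print(organize_documents(store, ["v1", "v2", "v3"], condition_met=False, keep_id="v3"))  # ['v1', 'v2']
print(store)  # {'v3': 'final'}
```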

A document presentation method for indicating the presence of some or all of a plurality of similar documents, the method causing a computer to execute:
an extraction process of extracting a difference between the plurality of documents;
a determination process of determining whether an interest keyword, which is stored in advance in interest keyword storage means and represents an item of interest to the user, appears in the difference so as to satisfy a predetermined condition; and
a presentation process of presenting the presence of all of the plurality of documents if the interest keyword appears in the difference so as to satisfy the predetermined condition, and otherwise preferentially presenting the presence of some of the plurality of documents.

A computer program used in a computer that indicates the presence of some or all of a plurality of similar documents, the program causing the computer to execute:
an extraction process of extracting a difference between the plurality of documents;
a determination process of determining whether an interest keyword, which is stored in advance in interest keyword storage means and represents an item of interest to the user, appears in the difference so as to satisfy a predetermined condition; and
a presentation process of presenting the presence of all of the plurality of documents if the interest keyword appears in the difference so as to satisfy the predetermined condition, and otherwise preferentially presenting the presence of some of the plurality of documents.