JP5112027B2 - Document group presentation device and document group presentation program

Info

Publication number: JP5112027B2
Application number: JP2007308151A
Authority: JP
Inventors: 嘉隆伊藤
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2007-11-29
Filing date: 2007-11-29
Publication date: 2013-01-09
Anticipated expiration: 2027-11-29
Also published as: JP2009134378A

Description

The present invention relates to a document group presentation device and a document group presentation program in a document management system, and in particular to a technique that starts from an overall view of the entire document set and uses the semantic hierarchical relationships between concepts to zoom in and out semantically on important topics.

As a conventional method for detecting information trends, a method that automatically extracts and classifies topics and computes related words is known (see Patent Document 1 below).
As another conventional method for detecting information trends, a method that automatically extracts and presents topics arranged in time series is known (see Patent Document 2 below).
Furthermore, as an automatic document classification method in conventional document management systems, a classification method that uses abstraction levels is known (see Patent Document 3 below).

Prior art documents related to the invention of the present application include the following.
Patent Document 1: JP 2006-277767 A
Patent Document 2: JP H11-175530 A
Patent Document 3: JP 2003-85189 A

In the method described in Patent Document 1, topics are extracted based on appearance frequency and the strength between topics is calculated to relate them to one another. However, topics whose semantic relationship is unclear may be linked, so the method cannot be used to present document groups by tracing from a topic of a superordinate concept to topics of subordinate concepts.
In the method described in Patent Document 2, topics are projected onto a graph whose axes are the number of appearances and the date and time, which makes them easier to survey at a glance, but the method cannot present a very large number of topics.
Furthermore, Patent Document 3 does not show a specific method for presenting document groups, so the classified document groups cannot be presented.
The present invention has been made to solve these problems of the prior art. An object of the present invention is to provide a document group presentation device that can create the result of integrating and dividing document groups at a specified abstraction level, and that makes it possible to construct an interface in which the document space of the document groups to be classified is treated like a map that can be zoomed in and out semantically.
Another object of the present invention is to provide a program for causing a computer to execute the functions of the above document group presentation device.
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

The following is a brief outline of a representative one of the inventions disclosed in this application.
To achieve the above object, the present invention is a document group presentation device that presents document groups of a document management system having functions for inputting, registering, storing, searching, and displaying documents. The device comprises: concept tree construction means that reads a concept dictionary and constructs the concepts of words as a tree structure; important topic extraction means that reads a plurality of documents, extracts words, and determines and extracts important topics based on the number of occurrences of each word or on whether the word is a headword; document classification table construction means that constructs, as a document classification table, from the concept tree constructed by the concept tree construction means and the important topics extracted by the important topic extraction means, a concept tree representing the concept identifier, document identifier, abstraction level, and current abstraction level required for presenting the document groups; document group integration/division means that analyzes the document classification table constructed by the document classification table construction means and updates it based on the abstraction level input this time, the abstraction level input last time, and the current abstraction level; and presentation means that presents the document groups based on the document classification table updated by the document group integration/division means.

Further, in the present invention, when an important topic extracted by the important topic extraction means exists in the concept tree constructed by the concept tree construction means, the document classification table construction means sets the concept identifier of that important topic as the concept identifier in the document classification table, the document identifier of the document associated with that important topic as the document identifier in the document classification table, and the level of that important topic in the concept tree as the abstraction level and current abstraction level in the document classification table.
The document group integration/division means updates the document classification table as follows. When the abstraction level input last time is greater than the abstraction level input this time, it uses the abstraction level input this time and the current abstraction level in the document classification table to integrate, into each important topic at the abstraction level input this time, the important topics of the concepts subordinate to it. When the abstraction level input last time is less than or equal to the abstraction level input this time, it uses the same two values to divide each important topic of a superordinate concept, into which important topics of subordinate concepts have been integrated, into important topics at the abstraction level input this time.
Based on the document classification table updated by the document group integration/division means, the presentation means presents the document groups associated with the important topics at or below the abstraction level input this time on a graph whose vertical axis is appearance frequency and whose horizontal axis is appearance date and time, with an area proportional to the number of documents associated with those important topics.
The present invention is also a document group presentation program that presents document groups of a document management system having functions for inputting, registering, storing, searching, and displaying documents; the program causes a computer to implement each means of the above document group presentation device.

The following briefly describes the effects obtained by a representative one of the inventions disclosed in this application.
According to the present invention, the result of integrating and dividing document groups can be created at a specified abstraction level, and an interface can be constructed in which the document space of the document groups to be classified is treated like a map that can be zoomed in and out semantically.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
In all the drawings for describing the embodiment, components having the same function are given the same reference numerals, and repeated description of them is omitted.
[Functional block diagram]
FIG. 1 shows the functional blocks of the document group presentation device according to the embodiment of the present invention.
As shown in FIG. 1, the document group presentation device 11 includes a concept tree construction unit 101, an important topic extraction unit 102, a document classification table construction unit 103, a concept dictionary storage unit 104, a document storage unit 105, a document group integration/division unit 106, a document group presentation unit 107, an abstraction level input unit 108, and a display unit 109.
The concept tree construction unit 101 reads concept words from the concept dictionary storage unit 104, analyzes their parent-child relationships, and constructs a tree structure in memory.
The important topic extraction unit 102 reads documents from the document storage unit 105, extracts the text, performs morphological analysis to decompose it into parts of speech and extract topics, determines the importance of each topic by a predetermined procedure, and extracts the important topics.
Based on the concept tree constructed by the concept tree construction unit 101 and the important topics extracted by the important topic extraction unit 102, the document classification table construction unit 103 constructs a document classification table that contains the concept identifier, document identifier, abstraction level, and current abstraction level needed to present the document groups, together with a concept tree representing the current position.
The document group integration/division unit 106 analyzes the document classification table according to the abstraction level input through the abstraction level input unit 108, and updates the table so that the document classifications at abstraction levels below the level to be displayed are integrated into, or divided from, the document classifications at that level.
The document group presentation unit 107 analyzes the document classification table produced by the document group integration/division unit 106 and displays the document groups to be displayed on the display device.

[Hardware configuration]
FIG. 2 shows the hardware configuration of a computer that implements the document group presentation device 11 shown in FIG. 1.
As shown in FIG. 2, the computer that implements the document group presentation device 11 includes a display 201, a CPU 202, a memory 203, a keyboard/mouse 204, a hard disk 205, a CD-ROM drive 206 for reading a CD-ROM 207, and a communication circuit 208 connected to the Internet 209. The hard disk 205 stores a document presentation program 2051, a concept dictionary database 2052, and a document database 2053.
The concept tree construction unit 101 in FIG. 1 is realized by the document presentation program 2051, which uses the concept dictionary database 2052 and is executed by the CPU 202 using the memory 203.
The important topic extraction unit 102 in FIG. 1 is realized by the document presentation program 2051, which uses the document database 2053 and is executed by the CPU 202 using the memory 203.
The document classification table construction unit 103 in FIG. 1 is realized by the document presentation program 2051 executed by the CPU 202 using the memory 203.
The concept dictionary storage unit 104 in FIG. 1 is realized by the concept dictionary database 2052. The document storage unit 105 in FIG. 1 is realized by the document database 2053.
The document group integration/division unit 106 in FIG. 1 is realized by the document presentation program 2051 executed by the CPU 202 using the memory 203.
The document group presentation unit 107 in FIG. 1 is realized by the document presentation program 2051 executed by the CPU 202 using the memory 203.
The abstraction level input unit 108 in FIG. 1 is realized by the keyboard/mouse 204. The display unit 109 in FIG. 1 is realized by the display 201.

[Processing details]
The processing procedure of this embodiment will be described with reference to FIGS. 4 to 8.
FIG. 4 is a flowchart showing the processing procedure of the concept tree construction unit 101 of the document group presentation device 11 shown in FIG. 1.
FIG. 5 is a flowchart showing the processing procedure of the important topic extraction unit 102 of the document group presentation device 11 shown in FIG. 1.
FIG. 6 is a flowchart showing the processing procedure of the document classification table construction unit 103 of the document group presentation device 11 shown in FIG. 1.
FIG. 7 is a flowchart showing the processing procedure of the document group integration/division unit 106 of the document group presentation device 11 shown in FIG. 1.
FIG. 8 is a flowchart showing the processing procedure of the document group presentation unit 107 of the document group presentation device 11 shown in FIG. 1.
The flowcharts of FIGS. 4 to 8 are realized by the document presentation program 2051 shown in FIG. 2.

[Concept tree construction processing]
When the user starts the document presentation program 2051, the concept tree construction processing by the concept tree construction unit 101 begins. An area for holding the resulting concept tree is reserved in the memory 203 in advance.
First, the concept dictionary is read from the concept dictionary storage unit 104 and one concept is acquired (step S401). The concept dictionary is assumed to consist of the concept part shown in FIG. 9 and the relation part shown in FIG. 10. The first concept 901 in FIG. 9 is read, and then any relation in which that concept is the subordinate concept is read from the relation part shown in FIG. 10. In this case no relation has the concept 901 as its subordinate concept, so nothing is read.
Next, an object of the form shown as the concept 301 in FIG. 3 is created in the memory 203, and the concept identifier “1” and the concept name “concept” of the concept just read are set in it (step S402). No superordinate or subordinate concept exists, so neither is set.
Next, the resulting concept tree is traced from the root element down through the subordinate concepts (step S403) to determine whether it contains a superordinate concept of the concept just read (step S404). In this case the concept tree does not contain any element yet, so the current concept is set as the root element (step S405).

Next, it is determined whether the concept just read is the last concept (step S406). In this case it is not, so the next concept is read (step S401).
The second concept, “subject” 902 in FIG. 9, is read, and then the row 1001, in which this concept is the subordinate concept, is read from the relation part in FIG. 10.
Next, an object of the form shown as the concept 301 in FIG. 3 is created in the memory 203, and the concept identifier “2”, the concept name “subject”, and the superordinate concept identifier “1”, which is the identifier of “concept”, are set in it. No subordinate concept exists, so none is set.
Next, the resulting concept tree is traced from the root element down through the subordinate concepts (step S403) to determine whether it contains a superordinate concept of the concept just read (step S404). In this case the “subject” concept object is a subordinate concept of the “concept” concept object already in the concept tree, so the “subject” concept object itself is set as a subordinate concept of the “concept” concept object.
This processing is repeated up to the last concept to construct the concept tree shown in FIG. 11. As a result, the concept objects “subject”, “thing”, “event”, “position”, and “time” are set as subordinate concepts of the “concept” concept object.
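
The construction loop of steps S401 to S406 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the names `Concept` and `build_concept_tree`, the tuple layouts assumed for the concept part and the relation part, and the identifier given to “thing” are hypothetical. The sketch looks up the parent in a dictionary instead of tracing the tree from the root as step S403 does, and it assumes every parent appears in the concept part before its children.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """Hypothetical stand-in for the concept object 301 of FIG. 3."""
    concept_id: int
    name: str
    parent: Optional["Concept"] = None
    children: list = field(default_factory=list)


def build_concept_tree(concept_part, relation_part):
    """Build a tree from (id, name) rows and (parent_id, child_id) rows.

    Assumes each parent is listed before its children.
    """
    parent_of = {child: parent for parent, child in relation_part}
    by_id = {}
    root = None
    for concept_id, name in concept_part:      # S401: read one concept
        node = Concept(concept_id, name)       # S402: create the object
        by_id[concept_id] = node
        parent_id = parent_of.get(concept_id)
        if parent_id is None:                  # S404/S405: no parent, so root
            root = node
        else:                                  # attach under the parent
            node.parent = by_id[parent_id]
            by_id[parent_id].children.append(node)
    return root, by_id


# Sample rows: ids 1 and 2 follow the text; id 3 for "thing" is assumed.
concept_part = [(1, "concept"), (2, "subject"), (3, "thing")]
relation_part = [(1, 2), (1, 3)]
root, by_id = build_concept_tree(concept_part, relation_part)
```

Keeping a reference to the parent object in each node, as well as the list of children, lets later steps walk the tree in either direction.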

[Important topic extraction processing]
When the concept tree construction processing ends, the important topic extraction processing by the important topic extraction unit 102 begins.
First, one document is read from the document storage unit 105 (step S501). The document is assumed to consist of a heading 1201 and a body 1202, as shown in FIG. 12.
Next, an object of the form shown as the document 302 in FIG. 3 is created in the memory 203. A sequential document identifier, “0” in this case, is assigned automatically, and the document file path, “概念.doc” (“concept.doc”) in this case, is set from the path of the file (step S502).
Next, all text, including the heading 1201 and the body 1202, is extracted from the document, and one sentence is extracted from it (step S503). The first sentence is cut out of the captured text by, for example, detecting the first full stop “。” or line break. If the captured text is, for example, “A concept is the overall, general meaning of things. For a given matter…”, then “A concept is the overall, general meaning of things.” is cut out as the first sentence in step S503.
Next, the extracted sentence is subjected to morphological analysis and decomposed into parts of speech (step S504). The morphological analysis breaks the sentence into words and generates part-of-speech information. A known technique can be used for this analysis.
In this case the analysis is performed in the form shown as 1302 in FIG. 13, and the word list 1303 is obtained.
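
The sentence extraction of step S503 can be illustrated with a short sketch. It is only one possible reading of “detect the first full stop or line break”: the function name `split_sentences` is hypothetical, and the morphological analysis of step S504 is deliberately left out because the text only says that a known technique is used.

```python
import re


def split_sentences(text):
    """Cut text into sentences that end at a full stop "。" or a line break."""
    # Each match is a run of characters that are neither "。" nor a newline,
    # followed by the closing "。" when there is one.
    pieces = re.findall(r"[^。\n]+。?", text)
    return [p.strip() for p in pieces if p.strip()]


sample = "概念とは、物事の総括的・概括的な意味のこと。ある事柄に対して"
sentences = split_sentences(sample)
first_sentence = sentences[0]
```

Each returned sentence would then be handed to whichever morphological analyzer is chosen for step S504.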

Next, the list of decomposed words is treated as a list of topics, and the first topic is acquired (step S505). In this case “concept” is acquired.
Next, it is determined whether “concept” exists in the concept tree constructed by the concept tree construction unit 101 (step S506). It does, so an object of the form shown as the document classification 303 in FIG. 3 is generated (step S507).
Next, “1”, the identifier of “concept”, is set as the concept identifier of the document classification 303 object and “1” is set as its document identifier. The appearance frequency of the “concept” object of the form shown as the concept 301 in FIG. 3 is incremented by 1, and the creation date of the document, “June 15, 2003”, is set as the appearance date and time (step S508). Each time the appearance frequency of the “concept” object is incremented, the appearance date and time is updated to the creation date of the most recent document.
Next, it is determined whether “concept” is a headword (step S509). If it is not a headword, its importance is determined by, for example, whether the appearance frequency of the “concept” object exceeds a threshold set in a file stored in advance on the hard disk 205 (step S510). In this case “concept” is a headword, so it is determined to be an important word.
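
The bookkeeping of step S508 and the importance test of steps S509 and S510 can be sketched as follows. The names `ConceptStats`, `record_occurrence`, and `is_important` are hypothetical, and the threshold value used in any call is an assumption, since the text only says that the threshold is read from a file on the hard disk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ConceptStats:
    """Hypothetical holder for the frequency and date fields of a concept."""
    frequency: int = 0
    appeared_on: Optional[date] = None


def record_occurrence(stats, created_on):
    """S508: count one occurrence and keep the newest document date."""
    stats.frequency += 1
    if stats.appeared_on is None or created_on > stats.appeared_on:
        stats.appeared_on = created_on


def is_important(topic, headwords, frequency, threshold):
    """S509/S510: important if a headword, or if frequency exceeds threshold."""
    if topic in headwords:
        return True
    return frequency > threshold


stats = ConceptStats()
record_occurrence(stats, date(2003, 6, 15))
record_occurrence(stats, date(2001, 1, 1))  # older document: date is kept
```

Checking the headword condition first matches the order of the flowchart, so a topic that appears in a heading is accepted regardless of how rarely it occurs.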

Next, the document classification object generated in step S507 is added to the document classification table 304 in FIG. 3 (step S512).
Next, it is checked whether the current topic is the last topic (step S513). In this case it is not, so the next topic is acquired (step S505).
Steps S505 to S513 are then repeated. When the current topic is the last topic, it is checked whether the current sentence is the last sentence (step S514). In this case it is not, so the next sentence is extracted (step S503).
Steps S503 to S514 are then repeated. When the current sentence is the last sentence, it is checked whether the current document is the last document (step S515). In this case it is not, so the next document is read (step S501).
Steps S501 to S515 are then repeated, and the processing ends when the current document is the last document.

[Document classification table construction processing]
When the important topic extraction processing ends, the document classification table construction processing by the document classification table construction unit 103 begins.
First, one document classification object is acquired from the document classification table 304 created by the important topic extraction unit 102 (step S601). In this case the document classification whose important topic name is “concept” is acquired.
Next, it is determined whether the concept corresponding to this document classification exists in the concept tree constructed by the concept tree construction unit 101 (step S602). In this case the concept “concept” exists in the concept tree, so “1”, the level of that concept in the concept tree hierarchy, is set as the abstraction level of the document classification object (step S603) and also as its current abstraction level (step S604).
Next, the document classification table is updated with the updated document classification (step S605).
Next, it is determined whether this is the last document classification (step S606). In this case it is not, so the next document classification is acquired (step S601).
Steps S601 to S606 are repeated up to the last document classification to construct the document classification table shown in FIG. 14.
This document classification table is an example of the document classification table 304 in FIG. 3. For example, the single row indicated by 1401 corresponds to the document classification 303 in FIG. 3, and the three columns indicated by 1402, “tree representation of the concept”, express by concept name the hierarchical structure corresponding to the current concept of the document classification 303.
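
Steps S601 to S606 amount to stamping each row of the table with the level of its concept in the tree. A minimal sketch, assuming the table is a list of dictionaries and the tree is given as a child-to-parent mapping (both representations, and the names `level_in_tree` and `assign_abstraction_levels`, are hypothetical):

```python
def level_in_tree(concept_id, parent_of):
    """Return the level of a concept, counting the root as level 1."""
    level = 1
    while concept_id in parent_of:
        concept_id = parent_of[concept_id]
        level += 1
    return level


def assign_abstraction_levels(table, concept_ids, parent_of):
    """Set abstraction and current abstraction for rows whose concept is known."""
    for row in table:                               # S601: take one row
        if row["concept_id"] not in concept_ids:    # S602: not in the tree
            continue
        level = level_in_tree(row["concept_id"], parent_of)
        row["abstraction"] = level                  # S603
        row["current_abstraction"] = level          # S604
    return table


# "concept" (id 1) is the root and "subject" (id 2) is its child, as in the text.
concept_ids = {1, 2}
parent_of = {2: 1}
table = [
    {"concept_id": 1, "document_id": 1},
    {"concept_id": 2, "document_id": 1},
]
table = assign_abstraction_levels(table, concept_ids, parent_of)
```

Because both fields start with the same value, a later difference between them signals that a row has been folded into an ancestor.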

[Document group integration/division processing]
When the document classification table construction processing ends, the document group integration/division processing by the document group integration/division unit 106 begins.
The document group integration/division processing consists of document group integration processing and document group division processing. When the input abstraction level is smaller than the previous abstraction level, the document group integration processing begins; when the input abstraction level is larger than the previous abstraction level, the document group division processing begins.
[Document group integration processing]
The document group integration processing begins when the user inputs an abstraction level (step S701). Assume here that “4” is input.
Next, one document classification is acquired from the document classification table shown in FIG. 14, which was constructed by the document classification table construction unit 103 (step S702). In this case the document classification whose important topic name is “concept” is acquired.
Next, the abstraction level input last time is compared with the abstraction level input this time (step S703). In this case the table has been expanded to the maximum abstraction level and “5” is set as the abstraction level input last time, so the input abstraction level is compared with the abstraction level of the document classification (step S710).
In this case the current abstraction level of the document classification is “1” and the input abstraction level is “4”. Because the input abstraction level is greater than the current abstraction level of the document classification, processing moves to the next document classification (step S702).
Steps S702 to S710 are repeated in the same way up to row 31 of FIG. 14.

When the target document classification is the one on the 32nd line, whose important topic name is “high school teacher”, the input abstraction level is lower than the current abstraction level of the document classification. The superordinate concept of the classification's current concept is therefore traced (step S711), and it is determined whether the abstraction level of that superordinate concept equals the input abstraction level (step S712).
In this case, the superordinate concept of the document classification whose important topic name is “high school teacher” is the document classification whose important topic name is “teacher”, and the abstraction level of that document classification is “4”. Because this matches the input abstraction level, the current abstraction level is set to the input abstraction level, “4” (step S713).
Next, the superordinate concept is set as the current concept shown in the document classification 303 of FIG. 3 (step S714). In this case, the concept named “teacher” is set. The concept objects set as current concepts form a tree structure of superordinate and subordinate concepts.
Next, it is checked whether the current document classification is the last document classification (step S715). In this case, it is not yet the last document classification, so processing moves on to the next document classification (step S702).
Thereafter, steps S702 to S715 are repeated to construct the document classification table shown in FIG. 15.
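The integration pass described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the class names `Concept` and `DocClass`, the function `integrate`, and the field names are all hypothetical, and the concept tree is cut down to three nodes with the intermediate levels between “concept” (level 1) and “teacher” (level 4) omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Concept:
    name: str
    level: int                          # abstraction level of the concept
    parent: Optional["Concept"] = None  # superordinate concept, None at the root


@dataclass
class DocClass:
    topic: Concept        # concept of the important topic (never changes)
    current: Concept      # current concept
    current_level: int    # current abstraction level


def integrate(rows: List[DocClass], input_level: int) -> None:
    """Raise every row that is more specific than input_level to its ancestor."""
    for row in rows:                                   # S702 / S715 loop
        if input_level >= row.current_level:           # S710: nothing to integrate
            continue
        concept = row.current
        # S711-S712: follow superordinate concepts until the level is reached
        while concept.parent is not None and concept.level > input_level:
            concept = concept.parent
        row.current_level = concept.level              # S713
        row.current = concept                          # S714


# Hypothetical three-node tree; intermediate levels are omitted for brevity.
root = Concept("concept", 1)
teacher = Concept("teacher", 4, parent=root)
high = Concept("high school teacher", 5, parent=teacher)
rows = [DocClass(root, root, 1), DocClass(teacher, teacher, 4), DocClass(high, high, 5)]

integrate(rows, 4)
print([(r.topic.name, r.current.name, r.current_level) for r in rows])
```

After the call, the “high school teacher” row keeps its own important topic but reports “teacher” as its current concept at current abstraction level 4, which mirrors the change from FIG. 14 to FIG. 15.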

[Document group division processing]
The document group division process starts when the user inputs an abstraction level (step S701). In this case, it is assumed that “5” is input as the abstraction level.
Next, one document classification is acquired from the document classification table shown in FIG. 15, which was constructed by the document integration process (step S702). In this case, the document classification whose important topic name is “concept” is acquired.
Next, the abstraction level input last time is compared with the abstraction level input this time (step S703). In this case, “4” is set as the abstraction level input last time, so it is determined whether the abstraction level of the document classification differs from its current abstraction level (step S704). Here, the abstraction level of the document classification is “1” and its current abstraction level is also “1”, so processing moves on to the next document classification (step S702). Steps S702 to S704 are repeated in the same way up to the 31st line in FIG. 15.
When the target document classification is the one on the 32nd line, whose important topic name is “high school teacher”, the abstraction level of the document classification is “5” and its current abstraction level is “4”. Because these differ, the input abstraction level is compared with the abstraction level of the document classification (step S705).
Here, the abstraction level of the document classification is “5” and the input abstraction level is “5”. Because the input abstraction level is less than or equal to the current abstraction level of the document classification, the subordinate concepts of the classification's current concept (“teacher” in FIG. 15) are traced (step S706), and it is determined whether each has the input abstraction level (step S707).

In this case, the subordinate concepts set under “teacher”, which is the current concept of the document classification whose important topic name is “high school teacher”, are “high school teacher”, “junior high school teacher”, and “elementary school teacher”. Their abstraction level is “5”, which matches the input abstraction level.
Next, it is determined whether each subordinate concept set under “teacher” is a concept that includes the concept of the current document classification among its child concepts (S708). In this case, the concept of the current document classification is “high school teacher” and the subordinate concepts are “high school teacher”, “junior high school teacher”, and “elementary school teacher”, so “high school teacher” is selected.
Next, the current abstraction level is set to the input abstraction level (S709). In this case, the current abstraction level of the current document classification, “high school teacher”, is set to the input abstraction level “5”.
Next, the subordinate concept is set as the current concept shown in the document classification 303 of FIG. 3 (S714). In this case, the concept named “high school teacher” is set. The concept objects set as current concepts form a tree structure of superordinate and subordinate concepts.
Next, it is checked whether the current document classification is the last document classification (step S715). In this case, it is not yet the last document classification, so processing moves on to the next document classification (step S702).
Thereafter, steps S702 to S715 are repeated to construct the document classification table shown in FIG. 16.
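The division pass can be sketched in the same spirit. Again this is only an illustration under stated assumptions: `Concept`, `DocClass`, `divide`, and `contains` are hypothetical names, the tree omits the levels between “concept” and “teacher”, and the comparison at step S705 is reduced to a simple level check inside the loop.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Concept:
    name: str
    level: int
    parent: Optional["Concept"] = field(default=None, repr=False)
    children: List["Concept"] = field(default_factory=list, repr=False)

    def child(self, name: str, level: int) -> "Concept":
        node = Concept(name, level, parent=self)
        self.children.append(node)
        return node

    def contains(self, other: "Concept") -> bool:
        """True if other is this concept or one of its descendants."""
        node: Optional["Concept"] = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


@dataclass
class DocClass:
    topic: Concept        # concept of the important topic (never changes)
    current: Concept      # current concept
    current_level: int    # current abstraction level


def divide(rows: List[DocClass], input_level: int) -> None:
    """Lower integrated rows back toward their own concept, down to input_level."""
    for row in rows:                                   # S702 / S715 loop
        if row.topic.level == row.current_level:       # S704: row was never integrated
            continue
        concept = row.current
        # S706-S708: descend along the branch that contains the row's own concept
        while concept.level < input_level and concept is not row.topic:
            concept = next(c for c in concept.children if c.contains(row.topic))
        row.current_level = concept.level              # S709
        row.current = concept                          # S714


# Hypothetical tree and a table already integrated at abstraction level 4.
root = Concept("concept", 1)
teacher = root.child("teacher", 4)
high = teacher.child("high school teacher", 5)
junior = teacher.child("junior high school teacher", 5)
rows = [DocClass(root, root, 1), DocClass(teacher, teacher, 4),
        DocClass(high, teacher, 4), DocClass(junior, teacher, 4)]

divide(rows, 5)
print([(r.topic.name, r.current.name, r.current_level) for r in rows])
```

Selecting the child whose subtree contains the row's own concept corresponds to the choice of “high school teacher” among the three subordinate concepts of “teacher” in step S708.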

[Document group presentation processing]
When the document group integration/division process ends, the document group presentation means 107 starts the document group presentation process. In this example, the document classification table is assumed to be the one integrated at abstraction level “4”.
In the document group presentation process, first, one document classification object is acquired from the document classification table shown in FIG. 15, created by the document group integration/division means 106 (step S801). In this case, the document classification whose important topic name is “concept” is acquired.
Next, it is determined whether the current abstraction level of the document classification object is less than or equal to the input abstraction level (step S802). In this case, the input abstraction level is “4” and the current abstraction level of the document classification whose important topic name is “concept” is “1”, which is less than or equal to the input abstraction level, so processing continues.
Next, the document classification table 304 is checked for other document classifications with the same current concept as the document classification object (step S803). In this case, no other document classification has “concept” as its current concept, so processing continues.
Next, the display area is calculated from the number of documents belonging to the current concept (step S808). In this case, the number of documents whose important topic name is “concept” is “1”, so the display area is set to, for example, 32 × 1 dots vertically by 32 × 1 dots horizontally. The more documents there are, the larger the display area.
Next, the position of the display area is calculated, and the display area of 32 dots vertically by 32 dots horizontally is drawn on a graph whose vertical axis is appearance frequency and whose horizontal axis is appearance date and time (step S809). The display area is positioned so that its center coordinates are the appearance frequency of the concept (for example, “1”) and the appearance date and time of the concept (for example, “June 15, 2003”).
Next, the name of the current concept is displayed in the display area (step S810). In this case, “concept” is displayed.
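The sizing and positioning rule in steps S808 and S809 can be illustrated with a short Python sketch. The names `UNIT_DOTS`, `DisplayArea`, and `display_area` are hypothetical. The 32-dot unit and the sample values are the examples given in the description, and scaling each side by the document count simply mirrors the worked example rather than a rule stated elsewhere.

```python
from datetime import date
from typing import NamedTuple

UNIT_DOTS = 32  # dots per document along each side, the example value in the text


class DisplayArea(NamedTuple):
    width: int        # horizontal size in dots
    height: int       # vertical size in dots
    center_x: date    # horizontal axis: appearance date
    center_y: int     # vertical axis: appearance frequency


def display_area(doc_count: int, frequency: int, appeared: date) -> DisplayArea:
    """Size the box from the document count (S808) and center it on the graph (S809)."""
    side = UNIT_DOTS * doc_count
    return DisplayArea(side, side, appeared, frequency)


concept_area = display_area(1, 1, date(2003, 6, 15))
teacher_area = display_area(4, 100, date(2007, 9, 20))
print(concept_area.width, teacher_area.width)  # prints: 32 128
```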

Next, the document file path is acquired based on the document identifier of the document classification object, and the document itself is displayed in the display area (step S811). In this case, the content of the document “concept.doc”, shown in FIG. 12, is displayed in the display area.
Next, it is checked whether the current document classification is the last document classification (step S812). In this case, it is not yet the last document classification, so processing moves on to the next document classification (step S801).
Thereafter, steps S801 to S812 are repeated in the same way up to the 26th line in FIG. 15. When the target document classification is the one on the 27th line, whose important topic name is “teacher”, document classifications with the same current concept exist on the 32nd to 34th lines. These document classifications with the same current concept are extracted (step S804). The appearance frequency of the current concept of each extracted document classification is added to the appearance frequency of the current concept of the current document classification (step S805). The appearance dates and times of the two are compared, and if that of the extracted document classification is newer, the appearance date and time is updated (step S806). The documents corresponding to the document identifiers of the extracted document classifications are also made display targets (step S807).
In this case, the document classification objects whose important topic names are “high school teacher”, “junior high school teacher”, and “elementary school teacher” are extracted as document classifications with the same current concept. Their appearance frequencies are added to the appearance frequency of the “teacher” concept, the appearance date and time is updated after comparison, and the documents with document identifiers 1 to 4 become display targets as documents of the “teacher” concept.
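The aggregation in steps S804 to S807 amounts to grouping rows by current concept, summing frequencies, keeping the newest date, and pooling document identifiers. The sketch below assumes a hypothetical `Row` record and `merge_by_current_concept` function. The per-row frequencies and dates are invented for illustration, chosen only so that the totals match the example values (“100” and “September 20, 2007”) in the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Row:
    current: str         # name of the current concept
    frequency: int       # appearance frequency
    appeared: date       # appearance date
    doc_ids: List[int]   # document identifiers


def merge_by_current_concept(rows: List[Row]) -> Dict[str, Row]:
    """Combine rows that share a current concept into one displayable group."""
    merged: Dict[str, Row] = {}
    for row in rows:
        group = merged.get(row.current)
        if group is None:                                     # first row of this concept
            merged[row.current] = Row(row.current, row.frequency,
                                      row.appeared, list(row.doc_ids))
            continue
        group.frequency += row.frequency                      # S805: add frequencies
        group.appeared = max(group.appeared, row.appeared)    # S806: keep the newer date
        group.doc_ids.extend(row.doc_ids)                     # S807: add display targets
    return merged


# Invented sample rows: all four share the current concept "teacher" at level 4.
rows = [
    Row("teacher", 40, date(2007, 1, 10), [1]),   # important topic "teacher"
    Row("teacher", 30, date(2007, 9, 20), [2]),   # "high school teacher"
    Row("teacher", 20, date(2006, 5, 1), [3]),    # "junior high school teacher"
    Row("teacher", 10, date(2005, 3, 3), [4]),    # "elementary school teacher"
]
teacher = merge_by_current_concept(rows)["teacher"]
print(teacher.frequency, teacher.appeared, teacher.doc_ids)
```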

Next, the display area is calculated from the number of documents belonging to the current concept (step S808). In this case, the number of documents is “4”, so the display area is set to, for example, 128 dots (32 × 4) vertically by 128 dots (32 × 4) horizontally. The more documents there are, the larger the display area.
Next, the position of the display area is calculated, and the display area of 128 dots vertically by 128 dots horizontally is drawn on a graph whose vertical axis is appearance frequency and whose horizontal axis is appearance date and time (step S809). The display area is positioned so that its center coordinates are the appearance frequency of “teacher” (for example, “100”) and the appearance date and time of “teacher” (for example, “September 20, 2007”).
Next, the name of the current concept is displayed in the display area (step S810). In this case, “teacher” is displayed.
Next, the document file path is acquired based on the document identifier of the document classification object, and the document itself is displayed in the display area (step S811). In this case, the contents of the documents with document identifiers 1 to 4 are displayed in the display area. The document classifications on the 35th to 37th lines are then processed in the same way.
Finally, it is checked whether the current document classification is the last document classification (step S812). In this case, it is the last document classification, so the process ends.
Performing the processing described above displays, for example, a screen such as the one shown in FIG. 17. FIG. 17 shows an example of a screen for which “4” is specified as the abstraction level. To keep the figure easy to read, the center coordinates of the displayed “current concept” display areas (the appearance frequency and appearance date and time of each important topic name) do not match the description above.

The advantages of the present invention described in the above embodiment are explained below with reference to FIGS. 17 and 18.
FIG. 17 is a screen display example in which “4” is specified as the abstraction level. 1701 is a slide bar for specifying the abstraction level, 1702 is a document display area, 1705 is the “concept” document group, 1704 is the “teacher” document group, and 1703 is the “car” document group.
FIG. 18 is a screen display example in which “5” is input as the abstraction level. 1801 is a slide bar for specifying the abstraction level, 1802 is a document display area, 1806 is the “concept” document group, 1807 is the “teacher” document group, 1808 is the “high school teacher” document group, 1809 is the “junior high school teacher” document group, 1810 is the “elementary school teacher” document group, 1803 is the “sports” document group, 1804 is the “RV” document group, and 1805 is the “sedan” document group.
When the abstraction level changes from “4” to “5”, the teacher group 1704 in FIG. 17 is divided into the teacher group 1807, the high school teacher group 1808, the junior high school teacher group 1809, and the elementary school teacher group 1810 in FIG. 18; this can be regarded as a semantic enlargement. In FIG. 18, the “concept” display area has been moved near the center of the display so that overlapping figures do not make it hard to read. Consequently, the center coordinates of the displayed “current concept” display areas (the appearance frequency and appearance date and time of each important topic name) do not match the description above.

As described above, according to this embodiment, integration and division results for a document group can be created at a specified abstraction level. Using these results, it is possible to build a screen, as shown in FIGS. 17 and 18, that treats the document space of the document group to be classified as a map and allows semantic enlargement (division of document groups) and reduction (integration of document groups).
That is, in this embodiment, a document space of word meanings is constructed by associating the words appearing in documents with the hierarchical structure of word meanings, topics are projected onto a graph, and enlargement and reduction in terms of word meaning become possible. By treating the document space as a map and performing semantic enlargement and reduction of concepts, a user interface can be built that presents a more detailed view of the concept of interest, providing a document group presentation device with which analysis work can be carried out efficiently.
The invention made by the present inventor has been described concretely above based on the embodiment. However, the present invention is not limited to that embodiment, and it goes without saying that various modifications can be made without departing from its gist.

FIG. 1 is a diagram showing the functional blocks of the document group presentation device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of a computer that runs the document group presentation device according to the embodiment of the present invention.
FIG. 3 is a class diagram showing the data structure in the document group presentation device according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the processing procedure of the concept tree construction means of the document group presentation device shown in FIG. 1.
FIG. 5 is a flowchart showing the processing procedure of the important topic extraction means of the document group presentation device shown in FIG. 1.
FIG. 6 is a flowchart showing the processing procedure of the document classification table construction means of the document group presentation device shown in FIG. 1.
FIG. 7 is a flowchart showing the processing procedure of the document group integration/division means of the document group presentation device shown in FIG. 1.
FIG. 8 is a flowchart showing the processing procedure of the document group presentation means of the document group presentation device shown in FIG. 1.
FIG. 9 is a diagram showing an example of the concept part of the concept dictionary stored in the concept dictionary storage means shown in FIG. 1.
FIG. 10 is a diagram showing an example of the relation part of the concept dictionary stored in the concept dictionary storage means shown in FIG. 1.
FIG. 11 is a diagram showing an example of a concept tree constructed by the concept tree construction means of the document group presentation device shown in FIG. 1.
FIG. 12 is a diagram showing an example of a document stored in the document storage means shown in FIG. 1.
FIG. 13 is a diagram showing an example of important topics extracted by the important topic extraction means of the document group presentation device shown in FIG. 1.
FIG. 14 is a diagram showing an example of the document classification table when the abstraction level is 5 in the document group presentation device according to the embodiment of the present invention.
FIG. 15 is a diagram showing an example of the document classification table when the abstraction level is changed from 5 to 4 in the document group presentation device according to the embodiment of the present invention.
FIG. 16 is a diagram showing an example of the document classification table when the abstraction level is changed from 4 to 5 in the document group presentation device according to the embodiment of the present invention.
FIG. 17 is a diagram showing an example of the screen display when an abstraction level of 4 is specified in the document group presentation device according to the embodiment of the present invention.
FIG. 18 is a diagram showing an example of the screen display when the abstraction level is changed from 4 to 5 in the document group presentation device according to the embodiment of the present invention.

Explanation of symbols

11 Document group presentation device
101 Concept tree construction means
102 Important topic extraction means
103 Document classification table construction means
104 Concept dictionary storage means
105 Document storage means
106 Document group integration/division means
107 Document group presentation means
108 Abstraction level input means
109 Display means
201 Display
202 CPU
203 Memory
204 Keyboard/mouse
205 Hard disk
206 CD-ROM drive
207 CD-ROM
208 Communication circuit
209 Internet
2051 Document presentation program
2052 Concept dictionary database
2053 Document database
1701, 1801 Slide bar
1702, 1802 Document display area

Claims

1. A document group presentation device for presenting a document group of a document management system having functions for inputting, registering, storing, retrieving, and displaying documents, the device comprising:
a concept tree construction means for reading a concept dictionary and constructing word concepts as a tree structure;
an important topic extraction means for reading a plurality of documents, extracting words, and determining and extracting important topics according to the number of occurrences of each word or whether the word is a headword;
a document classification table construction means for constructing, based on the concept tree constructed by the concept tree construction means and the important topics extracted by the important topic extraction means, a document classification table that expresses the concept tree and holds the concept identifier, document identifier, abstraction level, and current abstraction level necessary for presenting a document group;
a document group integration/division means for analyzing the document classification table constructed by the document classification table construction means and updating the document classification table based on the abstraction level input this time, the abstraction level input last time, and the current abstraction level; and
a presentation means for presenting the document group based on the document classification table updated by the document group integration/division means.

2. The document group presentation device according to claim 1, wherein:
the document classification table construction means, when an important topic extracted by the important topic extraction means exists in the concept tree constructed by the concept tree construction means, sets the concept identifier of the important topic as the concept identifier of the document classification table, sets the document identifier of the document associated with the important topic as the document identifier of the document classification table, and sets the hierarchy level of the important topic on the concept tree as the abstraction level and the current abstraction level of the document classification table;
the document group integration/division means, when the previously input abstraction level is greater than the abstraction level input this time, updates the document classification table, based on the abstraction level input this time and the current abstraction level of the document classification table, so as to integrate the important topics of the subordinate concepts of the important topic at the abstraction level input this time into that important topic, and, when the previously input abstraction level is less than the abstraction level input this time, updates the document classification table, based on the abstraction level input this time and the current abstraction level of the document classification table, so as to divide the important topic of the superordinate concept, into which the important topics of the subordinate concepts have been integrated, into important topics at the abstraction level input this time; and
the presentation means, based on the document classification table updated by the document group integration/division means, presents the document group associated with each important topic at or below the abstraction level input this time, on a graph whose vertical axis is appearance frequency and whose horizontal axis is appearance date and time, with an area proportional to the number of documents associated with that important topic.

3. A document group presentation program for presenting a document group of a document management system having functions for inputting, registering, storing, retrieving, and displaying documents,
wherein the document group presentation program causes a computer to realize each means of the document group presentation device according to claim 1 or 2.