JP2001273458A - Device and method for supporting document group analysis and recording medium

Info

Publication number: JP2001273458A
Application number: JP2000083491A
Authority: JP
Inventors: Makoto Yamazaki; 真湖人山崎; Atsuo Shimada; 敦夫嶋田; Katsuhiko Fujita; 克彦藤田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-03-24
Filing date: 2000-03-24
Publication date: 2001-10-05

Abstract

PROBLEM TO BE SOLVED: To provide a document group analysis support device that enables efficient analysis by combining document classification based on qualitative data with document classification based on conventionally used quantitative data. SOLUTION: A document information management part 1 manages individual document data so that each attribute contained in a document can be used separately. A document group generation part 2 classifies a document set composed of a plurality of documents into a plurality of document subsets. Information on the resulting one or more document groups is managed by a document group information management part 3, and, based on classification dimension information 4 composed of at least one document group, a totalization processing part 5 performs totalization such as cross tabulation using the individual attribute values of the documents.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a document group analysis support apparatus, method, and recording medium, and more particularly to a document group analysis support apparatus, method, and recording medium that use document group classification techniques and information analysis techniques based on a multidimensional database.

[0002]

2. Description of the Related Art: In recent years, the spread of the Internet and similar networks has made large document collections accessible, and there is a growing need to make such collections usable efficiently and in line with the intentions of various users. A document can be regarded as data composed of a plurality of elements, and a document group can be stored and handled as multidimensional data by mapping each element to a separate field.

[0003] A common method for supporting the analysis of multidimensional data is to focus on a certain attribute item (dimension) of the data and apply some classification criterion (rule) to it, thereby determining the category to which each data item belongs. A dimension specifies, for a given data field, the criterion for categorizing data by its field value. By categorizing the data independently according to each dimension and later cross-tabulating the results, categories defined by a plurality of dimensions (a plurality of attribute items) can be obtained. For the categories obtained in this way, the analyst can display only the necessary portions or obtain statistics such as the number of matching documents or the mean of a particular element's values, and can thus carry out the analysis while easily obtaining the desired information. This method is known as OLAP (OnLine Analytical Processing).
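As a rough illustration of the cross tabulation described above, the following Python sketch categorizes a few records independently along two dimensions and then counts the records in each combined category. The records, the field names `age` and `region`, and the categorization rules are hypothetical and are not taken from the source.

```python
from collections import Counter

# Hypothetical records; each dict is one data item with two fields.
records = [
    {"age": 25, "region": "east"},
    {"age": 42, "region": "west"},
    {"age": 31, "region": "east"},
    {"age": 58, "region": "east"},
]

def age_band(record):
    """Dimension 1: categorize by the 'age' field (rule assumed for illustration)."""
    return "under 40" if record["age"] < 40 else "40 and over"

def region(record):
    """Dimension 2: categorize by the 'region' field."""
    return record["region"]

def cross_tabulate(items, dim_a, dim_b):
    """Count the items that fall into each (dim_a category, dim_b category) cell."""
    return Counter((dim_a(item), dim_b(item)) for item in items)

table = cross_tabulate(records, age_band, region)
print(table[("under 40", "east")])  # prints 2
```

Each dimension is applied independently, and the combined category is simply the pair of per-dimension categories, which mirrors the "categorize first, cross-tabulate later" order described in the text.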

[0004]

Problems to be Solved by the Invention: However, because a document may contain qualitative data (text, images, audio, and the like) as its elements, analyzing a document group requires an analysis method that uses both qualitative data and quantitative data (numerical values, categorical data, and the like) and integrates them.

[0005] When attention is paid to qualitative data, situations arise that differ from categorization based on quantitative data. First, for a given dimension, a single document may fall into more than one category. This is because qualitative data can be interpreted in multiple ways by the analyst handling it, and it is difficult to decide that it necessarily belongs to exactly one category.

[0006] Second, it must be possible to define a plurality of dimensions for a single piece of qualitative data. Qualitative data is data that has not been quantified according to a unified standard. When defining a dimension for such data, the analyst can envisage several different analysis criteria, or dimensions, depending on the purpose of the analysis.

[0007] An analysis support method that does not meet the above requirements may instead limit the knowledge that can be extracted from a document group, so an analysis support method that does not restrict these requirements is needed. Conventional multidimensional data analysis methods do not achieve this goal.

[0008] The present invention has been made in view of the above circumstances, and its object is to provide a document group analysis support apparatus, method, and recording medium that enable efficient analysis by combining document classification based on qualitative data having the above characteristics with document classification based on conventionally used quantitative data.

[0009]

Means for Solving the Problems: The invention of claim 1 is a document group analysis support apparatus that supports the task of obtaining information needed for decision making from a plurality of documents, comprising: document information management means for managing each document so that it can be used on a per-attribute basis for each attribute contained in the document; document group generation means for classifying a plurality of documents based on a specified classification criterion; document group information management means for managing information on the document groups generated by the document group generation means; aggregation processing means that has a classification dimension composed of one or more document groups and, according to the classification dimension, performs aggregation using the number of member documents and the attribute values of the member documents; and aggregation result display means for displaying the information obtained as a result of the aggregation.

[0010] The invention of claim 2 is the invention of claim 1, wherein each document group generated by the document group generation means is formed according to a classification criterion relating to a single attribute, and the document group information management means manages each document group in association with information on the attribute that the document group has.

[0011] The invention of claim 3 is the invention of claim 1 or 2, wherein the document group generation means allows the same document to belong to a plurality of different document groups, the document group information management means manages the documents belonging to the plurality of different document groups, and the aggregation processing means, when aggregating over a plurality of document groups, handles a document that belongs to more than one of those groups only once.

[0012] The invention of claim 4 is the invention of any one of claims 1 to 3, further comprising document meta group management means for defining and managing a document meta group that can contain one or more document groups formed with respect to the same attribute, wherein, when document groups are specified for a classification dimension in the aggregation processing means, the specification is made in units of document meta groups.

[0013] The invention of claim 5 is the invention of any one of claims 1 to 4, wherein the aggregation processing means has a plurality of classification dimensions, each composed of one or more document groups, and performs aggregation according to the plurality of classification dimensions using the number of member documents and the attribute values of the member documents.

[0014] The invention of claim 6 is the invention of any one of claims 1 to 5, wherein the aggregation processing means allows a plurality of classification dimensions to be specified for a single attribute and cross tabulation to be performed between those classification dimensions.

[0015] The invention of claim 7 is the invention of any one of claims 1 to 6, wherein the aggregation processing means, when aggregating an attribute whose value is text, can extract as aggregate values a summary of the text of one or more documents, the words contained in the text, the frequencies of those words, and the like.

[0016] The invention of claim 8 is a document group analysis support method that supports the task of obtaining information needed for decision making from a plurality of documents, comprising: a document information management step of managing each document so that it can be used on a per-attribute basis for each attribute contained in the document; a document group generation step of classifying a plurality of documents based on a specified classification criterion; a document group information management step of managing information on the document groups generated in the document group generation step; an aggregation processing step that has a classification dimension composed of one or more document groups and, according to the classification dimension, performs aggregation using the number of member documents and the attribute values of the member documents; and an aggregation result display step of displaying the information obtained as a result of the aggregation.

[0017] The invention of claim 9 is the invention of claim 8, wherein each document group generated in the document group generation step is formed according to a classification criterion relating to a single attribute, and the document group information management step manages each document group in association with information on the attribute that the document group has.

[0018] The invention of claim 10 is the invention of claim 8 or 9, wherein the document group generation step allows the same document to belong to a plurality of different document groups, the document group information management step manages the documents belonging to the plurality of different document groups, and the aggregation processing step, when aggregating over a plurality of document groups, handles a document that belongs to more than one of those groups only once.

[0019] The invention of claim 11 is the invention of any one of claims 8 to 10, further comprising a document meta group management step of defining and managing a document meta group that can contain one or more document groups formed with respect to the same attribute, wherein, when document groups are specified for a classification dimension in the aggregation processing step, the specification is made in units of document meta groups.

[0020] The invention of claim 12 is the invention of any one of claims 8 to 11, wherein the aggregation processing step has a plurality of classification dimensions, each composed of one or more document groups, and performs aggregation according to the plurality of classification dimensions using the number of member documents and the attribute values of the member documents.

[0021] The invention of claim 13 is the invention of any one of claims 8 to 12, wherein the aggregation processing step allows a plurality of classification dimensions to be specified for a single attribute and cross tabulation to be performed between those classification dimensions.

[0022] The invention of claim 14 is the invention of any one of claims 8 to 13, wherein the aggregation processing step, when aggregating an attribute whose value is text, can extract as aggregate values a summary of the text of one or more documents, the words contained in the text, the frequencies of those words, and the like.

[0023] The invention of claim 15 is characterized by recording a program for causing a computer to function as the document group analysis support apparatus according to any one of claims 1 to 7.

[0024] The invention of claim 16 is characterized by recording a program that supports the task of obtaining information needed for decision making from a plurality of documents by: managing document information so that each of a plurality of documents can be used on a per-attribute basis for each attribute contained in the document; classifying a plurality of documents based on a specified classification criterion to generate document groups; managing information on the generated document groups; performing aggregation, according to a classification dimension composed of one or more document groups, using the number of member documents and the attribute values of the member documents; and displaying the information obtained as a result of the aggregation.

[0025] The invention of claim 17 is the invention of claim 16, wherein each generated document group is formed according to a classification criterion relating to a single attribute, and the information on the document groups is managed such that each document group is associated with information on the attribute that the document group has.

[0026] The invention of claim 18 is the invention of claim 16 or 17, wherein the same document is allowed to belong to a plurality of different generated document groups, the documents belonging to the plurality of different document groups are managed, and the aggregation, when performed over a plurality of document groups, handles a document that belongs to more than one of those groups only once.

[0027] The invention of claim 19 is the invention of any one of claims 16 to 18, wherein a document meta group that can contain one or more document groups formed with respect to the same attribute is defined and managed, and, in the aggregation, when document groups are specified for a classification dimension, the specification is made in units of document meta groups.

[0028] The invention of claim 20 is the invention of any one of claims 16 to 19, wherein the aggregation has a plurality of classification dimensions, each composed of one or more document groups, and is performed according to the plurality of classification dimensions using the number of member documents and the attribute values of the member documents.

[0029] The invention of claim 21 is the invention of any one of claims 16 to 20, wherein the aggregation allows a plurality of classification dimensions to be specified for a single attribute and cross tabulation to be performed between those classification dimensions.

[0030] The invention of claim 22 is the invention of any one of claims 16 to 21, wherein the aggregation, when performed on an attribute whose value is text, can extract as aggregate values a summary of the text of one or more documents, the words contained in the text, the frequencies of those words, and the like.

[0031]

Embodiments of the Invention: FIG. 1 is a block diagram showing the configuration of a document group analysis support apparatus according to one embodiment of the present invention. As shown in FIG. 1, the apparatus of this embodiment comprises: a document information management unit 1, which is document management means for managing each item of document data; a classification processing unit (document group generation unit) 2, which is document classification processing means (document group generation means) for classifying a document set made up of a plurality of documents into a plurality of document subsets; a document group information management unit 3 for managing information on the one or more document groups that result from the processing of the document group generation unit 2; an aggregation processing unit 5 that performs aggregation such as cross tabulation using the attribute values of the documents based on classification dimension information 4; an aggregation result display unit 6 that displays the results of the aggregation processing unit 5; a display device 7 such as a CRT display; and an input device 8 such as a keyboard and mouse.

[0032] <Document Information Management Unit> In this specification, a document is an electronic file: a unit of information made up of a plurality of document attributes. For example, in a relational database, one record is treated as one document and one field as one attribute; for structured data such as HTML or XML, one HTML document is treated as one document and each structural element delimited by tags as an attribute. Each document attribute has at least an attribute name, a data type, and a value. The attribute name is a label identifying the attribute and may be any number or character string that does not duplicate that of any other attribute. The data type is the type of the attribute's value, such as text, numeric, or date-time. The value is data of the data type specified for the attribute.
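The attribute model described above (a unique name, a data type, and a value of that type) could be sketched in Python as follows. This is only an illustrative data structure under stated assumptions: the class names `Attribute` and `Document`, the `doc_id` field, and the use of Python types to stand for the text, numeric, and date-time types are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Union

Value = Union[str, int, float, datetime]

@dataclass(frozen=True)
class Attribute:
    name: str        # label identifying the attribute
    data_type: type  # e.g. str (text), int or float (numeric), datetime (date-time)
    value: Value     # must be an instance of data_type

    def __post_init__(self):
        if not isinstance(self.value, self.data_type):
            raise TypeError(f"{self.name!r} expects {self.data_type.__name__}")

@dataclass
class Document:
    doc_id: int  # hypothetical identifier, assumed for illustration
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add(self, attribute: Attribute) -> None:
        """Add an attribute, rejecting names that duplicate an existing one."""
        if attribute.name in self.attributes:
            raise ValueError(f"duplicate attribute name: {attribute.name!r}")
        self.attributes[attribute.name] = attribute

doc = Document(doc_id=1)
doc.add(Attribute("age", int, 36))
doc.add(Attribute("this year's resolution", str, "I want to avoid coming home late from work"))
```

Keying the attributes by name enforces the uniqueness requirement, and the type check in `__post_init__` enforces that each value matches its declared data type.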

[0033] The document information management unit 1 manages the documents in the form of a multidimensional database in which each document is one record and each attribute value is stored in the corresponding field. A general relational database can be used for this purpose.

[0034] <Document Group Generation Unit> The document group generation unit 2 applies classification criteria to attribute values and associates each document with a document group. In one example of this embodiment, the same document is allowed to belong to a plurality of different document groups. Because this document group analysis support apparatus performs classification separately for each document attribute, it can adopt a classification method that makes effective use of the data type of the attribute's value. For a document attribute that holds text, for example, words that the text should contain are specified as a condition, and the document group to which a document belongs is determined by whether the text in its attribute value satisfies that condition. With this method, one might define document group A as containing documents whose text in the specified attribute includes the word "incident" or "crime", and document group B as containing documents whose text includes any of "baseball", "soccer", or "player", and then determine for each document group whether a given document belongs to it.
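The keyword-based grouping of a text attribute can be sketched in Python as below. This is a minimal sketch, assuming case-insensitive substring matching stands in for word detection (real text, especially Japanese, would need proper tokenization); the names `KEYWORD_GROUPS` and `groups_for_text` are hypothetical.

```python
# Hypothetical group definitions: group ID -> words, any one of which qualifies a text.
KEYWORD_GROUPS = {
    "A": {"incident", "crime"},
    "B": {"baseball", "soccer", "player"},
}

def groups_for_text(text, keyword_groups=KEYWORD_GROUPS):
    """Return the set of group IDs whose keyword condition the text satisfies.

    Substring matching is an assumption made for brevity, not the source's method.
    """
    lowered = text.lower()
    return {
        group_id
        for group_id, words in keyword_groups.items()
        if any(word in lowered for word in words)
    }

# A single document can land in more than one group.
print(sorted(groups_for_text("A soccer player was arrested for a crime")))  # prints ['A', 'B']
```

Returning a set of group IDs, rather than a single ID, reflects the point that the same document may belong to several document groups.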

[0035] As a classification method for a document attribute that holds numeric data, that is, an attribute whose value is quantitative data, a range of possible values can be specified for each document group and a document assigned according to whether its value of the specified attribute falls within that range. For example, for an attribute holding integer data, document group C may be defined for values from 0 to 19 and document group D for values from 20 to 39, and it is then determined for each document group whether a given document belongs to it.
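The range-based grouping of a numeric attribute can be sketched in Python as follows. The ranges for groups C and D follow the example in the text; treating both bounds as inclusive is an assumption, and the names `RANGE_GROUPS` and `groups_for_value` are hypothetical.

```python
# Group ID -> (low, high) bounds, both assumed inclusive.
RANGE_GROUPS = {
    "C": (0, 19),
    "D": (20, 39),
}

def groups_for_value(value, range_groups=RANGE_GROUPS):
    """Return the set of group IDs whose range contains the value."""
    return {
        group_id
        for group_id, (low, high) in range_groups.items()
        if low <= value <= high
    }

print(groups_for_value(36))  # prints {'D'}
```

A value outside every range yields an empty set, so a document is simply left out of all groups defined on that attribute.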

[0036] FIG. 2 shows an example of classification using document attributes according to the present invention, illustrating a state in which document groups have been defined by the methods described above. The example shows classification criteria for the document attribute "age", which takes an integer value, and for the document attribute "this year's resolution", which takes text as its value. With these criteria, a document whose "age" attribute value is 36 belongs to the document group with ID 2, and a document whose "this year's resolution" attribute value is "I want to avoid coming home late from work" belongs to the document group with ID 12.

[0037] <Hierarchical Relationships between Document Groups> By combining a plurality of document groups and defining hierarchical or inclusion relationships among them, the analyst can easily obtain a document group that is the union of a plurality of document groups.

[0038] Allowing a document group to contain not only documents but also other document groups as members makes it possible to define a hierarchical structure among document groups. With this approach, however, a single document group may end up containing a wide variety of document groups, which makes the meaning and position of each individual document group hard to understand. In addition, when the classification criterion of a document group higher in the hierarchy is changed, the documents that should belong to the document groups below it must also be re-acquired.

[0039] There are broadly two ways to avoid the above problems. The first is to define the relationships between document groups as an attribute of the document group. As an example of this method, consider defining a "subgroup" attribute on document groups. The subgroup attribute is a list containing information for referring to zero or more groups. By specifying that a document group defined as a subgroup of another document group has subgroups of its own, a multi-level hierarchy can be defined among document groups. To obtain, for counting or aggregation, the documents contained in a document group and in all the groups it encompasses, it suffices to refer to the document groups contained in it and to the document groups contained in all the document groups indicated by its subgroup attribute.
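The subgroup-attribute approach can be sketched in Python as follows: each group carries a list of subgroup IDs, and the documents of a group and everything beneath it are gathered by following those references. The class `DocumentGroup`, the function `collect_documents`, and the cycle guard are illustrative assumptions, not details given in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class DocumentGroup:
    group_id: int
    documents: Set[int] = field(default_factory=set)    # IDs of member documents
    subgroups: List[int] = field(default_factory=list)  # IDs of zero or more subgroups

def collect_documents(group_id: int,
                      groups: Dict[int, DocumentGroup],
                      _seen: Optional[Set[int]] = None) -> Set[int]:
    """Return the documents of a group and of all groups reachable via 'subgroups'.

    Using a set means a document in several groups is counted once.
    The _seen guard against cyclic references is an added safety assumption.
    """
    seen = set() if _seen is None else _seen
    if group_id in seen:
        return set()
    seen.add(group_id)
    group = groups[group_id]
    docs = set(group.documents)
    for sub_id in group.subgroups:
        docs |= collect_documents(sub_id, groups, seen)
    return docs

groups = {
    1: DocumentGroup(1, {10}, [2, 3]),
    2: DocumentGroup(2, {11, 12}, [4]),
    3: DocumentGroup(3, {12}),
    4: DocumentGroup(4, {13}),
}
print(len(collect_documents(1, groups)))  # prints 4
```

Document 12 belongs to both group 2 and group 3 but contributes only once to the result, which is the behavior needed when counting over overlapping groups.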

[0040] In this method, the same effect can also be achieved by giving each document group an attribute, called for example "related document groups", that pairs label information indicating the type of relationship being defined with identification information (an ID or the like) of the related document group. By giving the "related document groups" attribute a list structure, a plurality of document groups can be defined as related to a single document group. Any type of relationship may be defined, such as inclusion, similarity, juxtaposition, cause, or result. This method has the advantage that multiple types of relationship can be expressed with the same data structure.
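A labelled-relationship attribute of this kind can be sketched in Python as a list of (label, target ID) pairs per document group. The class name `GroupRelations` and its methods are hypothetical; the relationship labels are those named in the text.

```python
from collections import defaultdict

class GroupRelations:
    """Stores, per document group, a list of (relationship label, related group ID) pairs."""

    def __init__(self):
        self._relations = defaultdict(list)

    def relate(self, source_id, label, target_id):
        """Record that target_id is related to source_id by the given label."""
        self._relations[source_id].append((label, target_id))

    def related(self, source_id, label=None):
        """Return related group IDs, optionally restricted to one relationship type."""
        return [
            target
            for rel_label, target in self._relations.get(source_id, [])
            if label is None or rel_label == label
        ]

relations = GroupRelations()
relations.relate(1, "inclusion", 2)
relations.relate(1, "similarity", 3)
relations.relate(1, "inclusion", 4)
print(relations.related(1, "inclusion"))  # prints [2, 4]
```

Because the label is just data in the pair, inclusion, similarity, cause, and any other relationship type share one structure, which is the advantage the text points out.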

[0041] With this method, however, a single document group may end up containing a wide variety of document groups, making the meaning and position of each individual document group hard to understand. In addition, when the classification criterion of a document group higher in the hierarchy is changed, the documents that should belong to the document groups below it must also be re-acquired.

[0042] FIG. 3 is a diagram showing a tree that describes an example of the structure between document groups according to the present invention. One way to hold information defining the structure between document groups separately from the information on the document groups themselves is to use sets formed by the inclusion relationships between document groups. Such a set is called a document meta group. A document meta group can contain one or more document groups. Information on document meta groups is represented in a table separate from the information on document groups, and includes information for referring to the document group information. To obtain all the documents in the union formed by document groups, for counting or aggregation, it suffices to refer to all the document groups contained in a given document meta group. If a document meta group is defined so that it can contain other document meta groups, an inclusion relationship with multiple levels of hierarchy can be defined between document groups. FIG. 3 shows such interrelationships. Document groups are managed independently of the hierarchical structure between document groups (which is formed by the document meta groups).
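The separation between document groups and document meta groups can be sketched as below, under stated assumptions. The names `group_members`, `MetaGroup`, and `all_documents`, and all IDs, are hypothetical; the specification only says that meta group information is kept in a separate table that refers to document groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Document groups are stored on their own: group ID -> IDs of member documents.
# All IDs below are hypothetical.
group_members: Dict[int, Set[int]] = {
    10: {1, 2, 3},
    11: {3, 4},
    12: {5},
}

@dataclass
class MetaGroup:
    name: str
    group_ids: List[int] = field(default_factory=list)         # references to document groups
    children: List["MetaGroup"] = field(default_factory=list)  # nested meta groups

def all_documents(meta: MetaGroup) -> Set[int]:
    """Union of the documents in every group reachable from this meta group."""
    docs: Set[int] = set()
    for gid in meta.group_ids:
        docs |= group_members[gid]
    for child in meta.children:
        docs |= all_documents(child)
    return docs

leaf = MetaGroup("leaf", group_ids=[11, 12])
root = MetaGroup("root", group_ids=[10], children=[leaf])
root_docs = all_documents(root)
```

Editing `root` or `leaf` never touches `group_members`, which is the independence the paragraph describes.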

[0043] With the above method, information on document groups need not be updated every time a document meta group is created, deleted, or edited, so the processing for these operations stays simple. Because a document group does not contain other document groups, the classification criterion of each document group can be kept simple regardless of how the hierarchy is defined. Likewise, changing the classification criterion of one document group never requires reacquiring the member documents of another document group. Furthermore, when all the documents belonging to a document meta group are to be obtained, computing the aggregate values of each document group in advance allows aggregation, including aggregation for specific elements, to be performed quickly.

[0044] When implementing a document group analysis support apparatus that manages documents belonging to a plurality of different document groups and can aggregate across a plurality of document groups, each document meta group may be configured to contain only groups built from one common document attribute. This makes it easy to determine which document groups can be specified as a classification dimension for that document attribute, so that, for example, only the specifiable document groups are displayed to the analyst as appropriate.

[0045] <Aggregation / Cross-Tabulation> Aggregation refers to the process of obtaining, for one or more document groups, statistical information such as the number of member documents and the sum, mean, and standard deviation of each attribute value of the member documents (referred to here as aggregate values). A variable that indicates how data is classified in aggregation is called a classification dimension. In the document group analysis support apparatus according to the present invention, using something that contains one or more document groups as a classification dimension makes it possible to obtain aggregate values for each document group. In a document group analysis support apparatus of another example, each document group contains the documents obtained by specifying a classification criterion for a single document attribute. Restricting each document group to a classification criterion on a single document attribute gives each classification dimension as simple a meaning as possible in cross-tabulation, which makes the aggregation easier for the analyst to understand. It also avoids the wasted aggregation that would result if the same classification criterion were used in more than one classification dimension, and so enables efficient aggregation.
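The aggregate values named above (count, sum, mean, standard deviation) can be computed per document group roughly as follows. This is a sketch under assumptions: the function name `aggregate`, the sample data, and the choice of the population standard deviation are all illustrative, since the specification does not say which form of standard deviation is meant.

```python
import statistics
from typing import Dict, List

def aggregate(values: List[float]) -> Dict[str, float]:
    """Count, sum, mean and population standard deviation of one attribute."""
    if not values:
        return {"count": 0, "sum": 0.0, "mean": float("nan"), "stdev": float("nan")}
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # assumption: population form
    }

# Hypothetical data: document ID -> value of a numeric attribute (e.g. age).
age = {1: 20, 2: 30, 3: 40, 4: 50}
# One classification dimension made of two document groups.
dimension = {"group A": [1, 2, 3], "group B": [3, 4]}

result = {name: aggregate([age[d] for d in docs]) for name, docs in dimension.items()}
```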

[0046] Aggregation that uses a plurality of classification dimensions is specifically called cross-tabulation. In cross-tabulation, specifying a plurality of classification dimensions allows data to be classified by combinations of classification conditions and then aggregated. The region where a particular row and column intersect is called a cell. In cross-tabulation, each cell contains the documents that belong to every one of the document groups associated with that cell.

[0047] The aggregate value may be the number of documents belonging to the cell in question, or information such as the mean, sum, or standard deviation of the values that those documents hold for a particular document attribute. In a document group analysis support apparatus according to another example of this embodiment, when a document attribute whose value is text is used, applying known natural-language summarization and morphological analysis techniques in the aggregation processing unit 5 allows a summary of the text content, or a list of the words it contains, to be used as a representative value.
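A word list as a representative value for a text attribute might look like the sketch below. It is not the technique of the specification: a plain regular-expression tokenizer for English stands in for morphological analysis (Japanese text would need a dedicated analyzer), and the function name `top_words` and the sample texts are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

def top_words(texts: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent words over the texts of the documents in one cell."""
    counts: Counter = Counter()
    for text in texts:
        # Stand-in tokenizer: lowercase runs of ASCII letters.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)

# Hypothetical text values of the documents in one cell.
cell_texts = ["study more", "study English", "work and study"]
words = top_words(cell_texts, n=1)
```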

[0048] Cross-tabulation is the process of obtaining, for data organized in a two-dimensional table with one or more rows and columns, statistical information such as the number of matching documents and the sum, mean, and standard deviation of values for a specified combination of row and column. An existing OLAP client can be used for the aggregation processing unit 5 and the aggregation result display unit 6. A typical OLAP client builds the cross-tabulation result as a table and can then selectively display only the results for specified rows and columns, or use the results to draw graphs in various forms such as bar charts and line charts. One example of a product that implements such functions is Seagate Analysis from Seagate Software.

[0049] <Membership in a Plurality of Document Groups> Qualitative data such as text and images can, even as a single value, match more than one classification criterion. This is because a single qualitative value may contain several pieces of content, or may be describable by different criteria from several viewpoints. Consequently, when classification is performed by applying some classification criterion to a document attribute whose value is qualitative data, it can be appropriate for one document to belong to a plurality of document groups at the same time.

[0050] The document group generation unit 2 is configured to allow such multiple membership, and the document group information management unit 3 is configured to manage information on document groups with multiple membership. As for implementing the document group generation unit 2, consider classifying a document attribute whose value is a sentence according to whether the sentence contains a given word. Taking the classification criteria of FIG. 2 as an example, a sentence containing both "work" and "study" can exist (for example, "I want to study more and improve my ability at work"), so some documents belong to both group 10 and group 12. Multiple membership can therefore arise under this kind of classification.
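Keyword-based classification with multiple membership can be sketched as follows. The mapping of "work" to group 10 and "study" to group 12 is an assumption made for illustration, as is the simple substring test; the sketch does not claim to reproduce FIG. 2.

```python
from typing import Dict, List

# Assumed mapping of group IDs to keywords (hypothetical).
criteria: Dict[int, str] = {10: "work", 12: "study"}

def classify(text: str) -> List[int]:
    """Return every group whose keyword occurs in the text (several allowed)."""
    lowered = text.lower()
    return [gid for gid, word in criteria.items() if word in lowered]

groups = classify("I want to study more and improve my ability at work")
no_groups = classify("I want to travel")
```

The second call returns an empty list, which is the case of a document that matches no criterion.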

[0051] FIG. 4 is a diagram showing the data format used in the document group information management unit according to the present invention. One way to implement the document group information management unit 3 is to hold, for each document group, a list of its member documents, using a data format like that of FIG. 4. In this example, the document with ID 50 belongs to both document groups 51 and 52, and the document with ID 83 belongs to both document groups 50 and 51. When the same document belongs to a plurality of different document groups in this way, the document group information management unit 3 manages those documents. When aggregation is performed over these different document groups, a document that belongs to more than one of them may be processed so that it is handled only once.
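The per-group member lists and the "handle each document only once" rule can be sketched as below. Documents 50 and 83 and groups 50 to 52 follow the text; the remaining document IDs (7, 12, 64) and the names `members` and `distinct_documents` are hypothetical filler, not the contents of FIG. 4.

```python
from typing import Dict, Iterable, List, Set

# Member lists per document group (group ID -> document IDs).
members: Dict[int, List[int]] = {
    50: [7, 83],
    51: [50, 83, 12],
    52: [50, 64],
}

def distinct_documents(group_ids: Iterable[int]) -> Set[int]:
    """Documents in any of the given groups, each counted once."""
    seen: Set[int] = set()
    for gid in group_ids:
        seen.update(members[gid])
    return seen

# Summing list lengths counts documents 50 and 83 twice.
naive_total = sum(len(members[g]) for g in (50, 51, 52))
# Taking the set union counts every document once.
distinct_total = len(distinct_documents((50, 51, 52)))
```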

[0052] <Document Attributes and Classification Dimensions> When classification that allows multiple membership is performed as described above, document groups built with one classification criterion may be independent of document groups built with another. In such cases, obtaining information on regions such as their intersection, union, and difference can be useful for analysis. It is therefore effective to let the classification dimension information 4 define a plurality of classification dimensions for a document attribute whose value is qualitative data, and to let the aggregation processing unit 5 perform cross-tabulation between them.

[0053] <Documents Matching No Classification Criterion> When classification criteria are set for a document attribute whose value is qualitative data, the conditions cannot always be set so that every document matches at least one criterion. In such cases, documents that match none of the criteria of a classification dimension can, where needed, be processed so that they always belong to a particular document group (for example, an "Other" group). This lets the analyst see how many of the documents overall are accounted for by the criteria that have been set.
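One way to realize the "Other" group is sketched below, assuming document groups are held as sets of document IDs. The function name `add_other_group`, the `coverage` figure, and the data are hypothetical.

```python
from typing import Dict, Set

def add_other_group(all_docs: Set[int], groups: Dict[str, Set[int]],
                    other_name: str = "Other") -> Dict[str, Set[int]]:
    """Return the groups plus one holding documents that matched no criterion."""
    classified = set().union(*groups.values()) if groups else set()
    result = dict(groups)
    result[other_name] = all_docs - classified
    return result

# Hypothetical data.
all_docs = {1, 2, 3, 4, 5}
groups = {"work": {1, 3}, "study": {2, 3}}

with_other = add_other_group(all_docs, groups)
# Share of documents accounted for by the criteria that were set.
coverage = 1 - len(with_other["Other"]) / len(all_docs)
```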

[0054] <Specifying a Classification Dimension> A classification dimension is specified, for example, by the following procedure. It is assumed that one or more document groups or document meta groups have already been configured. First, the analyst defines a new classification dimension for a document attribute. Next, the display device 7 displays the document groups and document meta groups configured for that document attribute, and the analyst uses the input device 8 to specify the ones to use.

[0055] Apart from defining the classification dimensions, the type of statistical information to be obtained as the aggregate value must be specified. Concretely, when the cross-tabulation result is displayed as a table, this means specifying how the value shown in each cell is calculated. Examples of aggregate values include the number of matching documents and the sum of the values of a given document attribute over the matching documents.

[0056] FIGS. 5 to 9 show an example of configured document groups and two classification dimensions defined with them. FIG. 5 shows an example of document information; FIG. 6 shows an example of classification criteria for the document set of FIG. 5; FIGS. 7 and 8 show the document meta groups for the respective attributes in FIG. 6; and FIG. 9 shows two classification dimensions defined for the document set of FIG. 5 using the document classification of FIG. 6, together with their cross-tabulation result. The example assumes an analysis of how the content of "this year's resolution" differs by age group: document groups and document meta groups are defined independently for each of the two document attributes, age group and this year's resolution, and are used as classification dimensions for a cross-tabulation between the two. The row and column totals do not equal the sums of the cell values because multiple membership occurs between document groups in the classification of the "this year's resolution" attribute.
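Why the totals differ from the cell sums can be seen in a small worked sketch. The data below is hypothetical and does not reproduce FIG. 5 to FIG. 9; it only assumes, as in the text, that each cell counts the documents belonging to both its row group and its column group, and that totals count each document once.

```python
from typing import Dict, Set

# Hypothetical dimensions: group name -> IDs of member documents.
age_dim: Dict[str, Set[int]] = {"20s": {1, 2}, "30s": {3, 4}}
# Document 3 belongs to both "work" and "study" (multiple membership).
topic_dim: Dict[str, Set[int]] = {"work": {1, 3, 4}, "study": {2, 3}}

# Each cell counts documents that are in both the row group and the column group.
table = {r: {c: len(rd & cd) for c, cd in topic_dim.items()}
         for r, rd in age_dim.items()}

# Row totals count each document once across all column groups.
any_topic = set().union(*topic_dim.values())
row_total = {r: len(rd & any_topic) for r, rd in age_dim.items()}
# Summing the cells of a row counts document 3 twice.
row_cell_sum = {r: sum(cells.values()) for r, cells in table.items()}
```

For the "30s" row the cell sum is 3 while the row total is 2, because document 3 appears in two cells.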

[0057] The present invention has been described above as a document group analysis support apparatus made up of the components described. The invention may also take the form of a document group analysis support method in which each of these components is executed as a step. It can also be implemented by installing on a computer a program that causes the computer to function as each component, and may be embodied as a computer-readable recording medium on which such a program is recorded.

[0058]

[Effects of the Invention] According to the present invention, document groups can be analyzed from different viewpoints while the classification results of a plurality of documents (a document set) are combined in various ways, which makes analysis based on document sets more efficient.

[0059] According to the present invention, furthermore, because document groups are configured with respect to a specific document attribute, an analyst specifying classification dimensions can easily exclude combinations for which cross-tabulation is meaningless and define a well-organized, easily understood structure.

[0060] According to the present invention, furthermore, when documents are classified into document groups using a document attribute whose value is qualitative data, the same document can be associated with a plurality of document groups, and correct aggregation results are still obtained in that situation. A document set containing qualitative data, which is open to multiple interpretations, can therefore be analyzed naturally.

[0061] According to the present invention, furthermore, because a classification dimension can be defined in units of document meta groups, the operation of specifying classification dimensions becomes more efficient. Moreover, when a hierarchical structure is defined between document groups, changing the classification criterion of one document group does not affect other document groups, so the characteristics of each document group are easier to understand and document groups can be changed flexibly.

[0062] According to the present invention, furthermore, when documents are classified into document groups using a document attribute whose value is qualitative data, a plurality of classification dimensions can be defined for a single document attribute, so a document set containing qualitative data that is open to multiple interpretations can be analyzed naturally.

[0063] According to the present invention, furthermore, when an attribute whose value is text is aggregated, a summary of the text of one or more documents, the words contained in the text, the frequencies of those words, and the like can be extracted as aggregate values, which enables a wider variety of aggregation that makes better use of the attribute values.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a document group analysis support apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of classification using document attributes according to the present invention.

FIG. 3 is a diagram showing a tree that describes an example of the structure between document groups according to the present invention.

FIG. 4 is a diagram showing the data format used in the document group information management unit according to the present invention.

FIG. 5 is a diagram showing an example of configured document groups and two classification dimensions defined with them.

FIG. 6 is a diagram showing an example of configured document groups and two classification dimensions defined with them.

FIG. 7 is a diagram showing an example of configured document groups and two classification dimensions defined with them.

FIG. 8 is a diagram showing an example of configured document groups and two classification dimensions defined with them.

FIG. 9 is a diagram showing an example of configured document groups and two classification dimensions defined with them.

[Explanation of symbols]

1 ... document information management unit, 2 ... document group generation unit, 3 ... document group information management unit, 4 ... classification dimension information, 5 ... aggregation processing unit, 6 ... aggregation result display unit, 7 ... display device, 8 ... input device.

Continuation of front page: (72) Inventor: Katsuhiko Fujita, c/o Ricoh Co., Ltd., 1-3-6 Nakamagome, Ota-ku, Tokyo. F-terms (reference): 5B049 AA02 GG09; 5B075 NR12 UU06

Claims

[Claims]

1. A document group analysis support apparatus for supporting the task of obtaining information necessary for decision making using a plurality of documents, comprising: a document information management unit that manages each document so that it can be used by each attribute contained in the document; a document group generation unit that classifies the plurality of documents based on a specified classification criterion to generate document groups; a document group information management unit that manages information on the generated document groups; an aggregation processing unit that has a classification dimension composed of one or more document groups and that, according to the classification dimension, performs aggregation using the number of member documents and the attribute values of the member documents; and an aggregation result display unit that displays information obtained as a result of the aggregation.

2. The document group analysis support device according to claim 1, wherein each document group generated by the document group generation unit is configured according to a classification criterion for one attribute, and the document group information management unit manages each document group in association with information on the attribute of that document group.

3. The document group analysis support device according to claim 1, wherein the document group generation means allows the same document to belong to a plurality of different document groups, the document group information management means manages the documents belonging to the plurality of different document groups, and the aggregation processing means, when performing aggregation based on a plurality of document groups, handles a document that belongs to more than one of those document groups only once.

4. The document group analysis support apparatus according to claim 1, further comprising document meta group management means for defining and managing a document meta group that can contain one or more document groups configured with respect to the same attribute, wherein, when a document group is specified, the specification is made in units of document meta groups.

5. The document group analysis support device according to claim 1, wherein the aggregation processing means has a plurality of classification dimensions each composed of one or more document groups, and performs aggregation using the number of member documents and the attribute values of the member documents according to the plurality of classification dimensions.

6. The document group analysis support device according to any one of claims 1 to 5, wherein the aggregation processing means allows a plurality of classification dimensions to be specified for one attribute and performs cross-tabulation between the plurality of classification dimensions.

7. The document group analysis support device according to claim 1, wherein the aggregation processing means, when aggregating an attribute whose value is text, is capable of extracting, as an aggregate value, a summary of the text of one or more documents, a word contained in the text, the frequency of the word, or the like.

8. A document group analysis support method for supporting the task of obtaining information necessary for decision making using a plurality of documents, comprising: a document information management step of managing each document so that it can be used by each attribute contained in the document; a document group generation step of classifying the plurality of documents based on a specified classification criterion to generate document groups; a document group information management step of managing information on the generated document groups; an aggregation processing step that has a classification dimension composed of one or more document groups and that, according to the classification dimension, performs aggregation using the number of member documents and the attribute values of the member documents; and an aggregation result display step of displaying information obtained as a result of the aggregation.

9. The document group analysis support method according to claim 8, wherein each document group generated in the document group generation step is configured according to a classification criterion for one attribute, and the document group information management step manages each document group in association with information on the attribute of that document group.

10. The document group generation step enables the same document to belong to a plurality of different document groups, and the document group information management step manages documents belonging to the plurality of different document groups, 10. The document group according to claim 8, wherein, in the tabulation processing step, when counting is performed based on a plurality of document groups, a document that belongs to and overlaps between the document groups is handled only once. Analysis support method.

11. The document group analysis support method according to claim 8, further comprising a document meta group management step of defining and managing a document meta group that can contain one or more document groups configured with respect to the same attribute, wherein, when a document group is specified, the specification is made in units of document meta groups.

12. The document group analysis support method according to any one of claims 8 to 11, wherein the aggregation processing step has a plurality of classification dimensions each composed of one or more document groups, and performs aggregation using the number of member documents and the attribute values of the member documents according to the plurality of classification dimensions.

13. The document group analysis support method according to any one of claims 8 to 12, wherein the aggregation processing step allows a plurality of classification dimensions to be specified for one attribute and performs cross-tabulation between the plurality of classification dimensions.

14. The document group analysis support method according to claim 8, wherein the aggregation processing step, when aggregating an attribute whose value is text, is capable of extracting, as an aggregate value, a summary of the text of one or more documents, a word contained in the text, the frequency of the word, or the like.

15. A computer-readable storage medium in which a program for causing the apparatus to function as the document group analysis support apparatus according to claim 1 is recorded.

16. Managing document information so that each of a plurality of documents can be used for each attribute included in the document, classifying the plurality of documents based on a specified classification criterion, and generating a document group. It manages information on the generated document group, has a classification dimension composed of one or more document groups, and performs aggregation using the number of belonging documents and attribute values of the belonging documents according to the classification dimension. A computer-readable storage medium in which a program for supporting a task of obtaining information necessary for decision making using a plurality of documents by displaying information obtained as a result of aggregation is provided.

17. The computer-readable storage medium according to claim 16, wherein each generated document group is configured according to a classification criterion for one attribute, and the information on the document groups is managed by associating each document group with information on the attribute of that document group.

18. The computer-readable storage medium according to claim 16, wherein the same document is allowed to belong to a plurality of different generated document groups, the documents belonging to the plurality of different document groups are managed, and, when the aggregation is performed based on a plurality of document groups, a document that belongs to more than one of those document groups is handled only once.

19. The computer-readable storage medium according to claim 16, wherein a document meta group that can contain one or more document groups configured with respect to the same attribute is defined and managed, and, when a document group is specified for a classification dimension in the aggregation, the specification is made in units of document meta groups.

20. The computer-readable storage medium according to claim 16, wherein the aggregation has a plurality of classification dimensions each composed of one or more document groups and uses the number of member documents and the attribute values of the member documents according to the plurality of classification dimensions.

21. The computer-readable storage medium according to any one of claims 16 to 20, wherein the aggregation allows a plurality of classification dimensions to be specified for one attribute and cross-tabulation to be performed between the plurality of classification dimensions.

22. The computer-readable storage medium according to any one of claims 16 to 21, wherein, when the aggregation is performed on an attribute whose value is text, a summary of the text of one or more documents, a word contained in the text, the frequency of the word, or the like can be extracted as an aggregate value.