JP2001034617A

JP2001034617A - Device and method for information analysis support and storage medium

Info

Publication number: JP2001034617A
Application number: JP11203834A
Authority: JP
Inventors: Katsuhiko Fujita; 克彦藤田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-07-16
Filing date: 1999-07-16
Publication date: 2001-02-09

Abstract

PROBLEM TO BE SOLVED: To enable a user to perform information extraction from text character strings and text classification as needed within a common framework, and to obtain processing results that combine the two. SOLUTION: A document record holding means 3 stores a plurality of document records, each containing a text character string and information on the attributes of that string. Each of a plurality of information extraction means 8 extracts, from a text character string, a character string that satisfies a predetermined specific condition. An information extraction instruction means 9 allows selection of the means 8 to be applied to designated text character strings contained in one or more document records selected by a document record selection means 7. An attribute addition means 10 adds the character string extracted by the selected means 8 to the document record as attribute information corresponding to the text character string from which it was extracted.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information analysis support device, an information analysis support method, and a storage medium.

[0002]

[Description of the Related Art] Various information analysis support devices have conventionally been developed and used.

[0003] Some of them have a function of extracting information contained in a text character string (for example, proper nouns, numerical values, and dates) by using natural language processing techniques or character string pattern matching techniques.

[0004] Others express the relationships among a plurality of text character strings as numerical values based on some similarity calculation method, and classify similar text character strings as belonging to the same category.

[0005]

[Problem to Be Solved by the Invention] However, because the prior art described above aims to show the user the result of processing by one specific function, it is not configured to let the user switch among those functions as needed or to combine their results to produce deeper analysis, and it is therefore not convenient enough.

[0006] An object of the present invention is to allow information extraction from text character strings and text classification to be performed as needed from a common framework, so that the user can obtain processing results that combine them.

[0007]

[Means for Solving the Problem] The invention according to claim 1 is an information analysis support device comprising: document record holding means for storing a plurality of document records, each consisting of a text character string and information on the attributes of that text character string; document record selecting means for enabling selection of one or more document records from the document records stored in the document record holding means; sort criterion attribute designating means for enabling selection of an attribute used as the criterion for sorting the selected one or more document records; document record sorting means for sorting the one or more document records selected by the document record selecting means in accordance with the nature of the information of the attribute designated by the sort criterion attribute designating means; one or more types of information extracting means for extracting, from the text character string, a character string that satisfies a predetermined specific condition; information extraction instructing means for enabling selection of the information extracting means to be applied to a designated text character string contained in the one or more document records selected by the document record selecting means; and attribute adding means for adding the character string extracted by the selected information extracting means to the document record as information of the attribute corresponding to the text character string from which it was extracted.

[0008] Accordingly, when there are a plurality of document records, information extraction is performed by selecting, according to the purpose, and applying a menu that extracts character strings satisfying a predetermined specific condition from the text character strings contained in those records. The results can then be used in the same way as the attribute information originally contained in the document records, so that related document records can be organized.

[0009] The invention according to claim 2 is an information analysis support device comprising: document record holding means for storing a plurality of document records, each consisting of a text character string and information on the attributes of that text character string; document record selecting means for enabling selection of one or more document records from the document records stored in the document record holding means; sort criterion attribute designating means for enabling selection of an attribute used as the criterion for sorting the selected one or more document records; document record sorting means for sorting the one or more document records selected by the document record selecting means in accordance with the nature of the information of the attribute designated by the sort criterion attribute designating means; one or more types of text classifying means for classifying a plurality of the text character strings based on a predetermined method; classification instructing means for enabling selection of the text classifying means to be applied to a designated text character string contained in the one or more document records selected by the document record selecting means; and attribute adding means for adding the result of classification by the selected text classifying means to the document record as information of the attribute corresponding to the classified text character string.

[0010] Accordingly, when there are a plurality of document records, classification is performed by selecting, according to the purpose, and applying a menu that classifies the text character strings contained in them based on a predetermined method. Because the results can be used in the same way as the attribute information originally contained in the document records, classification that combines a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, which improves the efficiency of information analysis work.

[0011] The invention according to claim 3 is an information analysis support method comprising: a step of selecting, from one or more types of menus capable of extracting a character string that satisfies a predetermined specific condition from the text character string contained in a document record consisting of a text character string and attribute information of that text character string, the menu to be applied to a designated text character string contained in one or more of the document records selected from a plurality of document records; a step of extracting the character string according to the selected menu; and a step of adding the extracted character string to the document record as information of the attribute corresponding to the text character string from which it was extracted.

[0012] Accordingly, when there are a plurality of document records, information extraction is performed by selecting, according to the purpose, and applying a menu that extracts character strings satisfying a predetermined specific condition from the text character strings contained in those records. The results can then be used in the same way as the attribute information originally contained in the document records, so that related document records can be organized.

[0013] The invention according to claim 4 is an information analysis support method comprising: a step of selecting, from one or more types of menus capable of classifying, based on a predetermined method, the text character string contained in a document record consisting of a text character string and attribute information of that text character string, the menu to be applied to a designated text character string contained in one or more of the document records selected from a plurality of document records; a step of classifying the text character string according to the selected menu; and a step of adding the classification result to the document record as information of the attribute corresponding to the classified text character string.

[0014] Accordingly, when there are a plurality of document records, classification is performed by selecting, according to the purpose, and applying a menu that classifies the text character strings contained in them based on a predetermined method. Because the results can be used in the same way as the attribute information originally contained in the document records, classification that combines a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, which improves the efficiency of information analysis work.

[0015] The invention according to claim 5 is a computer-readable storage medium storing a program that causes a computer to execute: a step of receiving a selection of one or more document records from among a plurality of document records, each consisting of a text character string and attribute information of that text character string; a step of receiving a selection of an attribute used as the criterion for sorting the selected one or more document records; a step of sorting the one or more document records whose selection was received in the document record selection receiving step, in accordance with the nature of the information of the attribute whose selection was received in the attribute selection receiving step; a step of selecting, from one or more types of menus capable of extracting a character string that satisfies a predetermined specific condition from the text character string contained in the document record, the menu to be applied to a designated text character string contained in one or more of the document records selected from the plurality of document records; a step of extracting the character string according to the selected menu; and a step of adding the extracted character string to the document record as information of the attribute corresponding to the text character string from which it was extracted.

[0016] Accordingly, when there are a plurality of document records, information extraction is performed by selecting, according to the purpose, and applying a menu that extracts character strings satisfying a predetermined specific condition from the text character strings contained in those records. The results can then be used in the same way as the attribute information originally contained in the document records, so that related document records can be organized.

[0017] The invention according to claim 6 is a computer-readable storage medium storing a program that causes a computer to execute: a step of receiving a selection of one or more document records from among a plurality of document records, each consisting of a text character string and attribute information of that text character string; a step of receiving a selection of an attribute used as the criterion for sorting the selected one or more document records; a step of sorting the one or more document records whose selection was received in the document record selection receiving step, in accordance with the nature of the information of the attribute whose selection was received in the attribute selection receiving step; a step of selecting, from one or more types of menus capable of classifying the text character string contained in the document record based on a predetermined method, the menu to be applied to a designated text character string contained in one or more of the document records selected from the plurality of document records; a step of classifying the text character string according to the selected menu; and a step of adding the classification result to the document record as information of the attribute corresponding to the classified text character string.

[0018] Accordingly, when there are a plurality of document records, classification is performed by selecting, according to the purpose, and applying a menu that classifies the text character strings contained in them based on a predetermined method. Because the results can be used in the same way as the attribute information originally contained in the document records, classification that combines a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, which improves the efficiency of information analysis work.

[0019]

[Embodiments of the Invention] [First Embodiment of the Invention] FIG. 1 is a functional block diagram of an information analysis support device 1 according to a first embodiment of the present invention. The means constituting the information analysis support device 1 of FIG. 1 are connected to one another and configured so that document record data and control information can be exchanged among them as needed.

[0020] First, document record data is received by the input means 2 and stored in the document record holding means 3, which assigns each record a unique document identifier. Here, a document record consists of a text character string constituting the content of a document and zero or more pieces of attribute information. Attribute information is information on the attributes of the text character string and consists of an attribute name and a value for that attribute (hereinafter, the "attribute value"). The attribute name is a label indicating what the attribute is, and may or may not be included in the document record. The attribute value is the actual content corresponding to that label. Examples of attribute information include so-called bibliographic items such as the document's creation date, input date, and author, but attribute information need not be limited to bibliographic items. For example, various pieces of information that can be extracted from the document record (such as the length of the text character string it contains) can, once extracted, become attribute information such as "length of text character string".

[0021] FIG. 2 is a schematic diagram of the data structure of the document record holding means 3. "Input date", "input by", "length of text character string", and so on, shown in the top row of FIG. 2, are attribute names, and the data in the rows below each of them are attribute values (each document record also has a unique document identifier, which is omitted from FIG. 2). On an actual computer, this information is converted into a general data storage format and held in that form.

[0022] The document record display output means 4 displays one or more document records stored in the document record holding means 3 in a table format as shown in FIG. 3.

[0023] The document record sorting means 5 sorts the selected one or more document records by the magnitude of the values of the attribute designated by the sort criterion attribute designating means 6. Document records are selected via the document record selecting means 7, which lets the user select one or more document records from those displayed. This is easily implemented with the method common in user interfaces of placing a pointer on the target document record among those displayed and instructing selection there.

[0024] The attribute serving as the sort criterion is selected via the sort criterion attribute designating means 6. This is easily implemented with the common user interface method of, for example, placing a pointer on the desired criterion attribute among the attributes in the top row of the display in FIG. 3 and instructing selection there.

[0025] When document records are selected and an attribute serving as the sort criterion is designated, sorting is performed and the result is output and displayed by the document record display output means 4. A detailed description of the specific sorting method is omitted. The document records in FIG. 3 are shown sorted in ascending order by the input date attribute.
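Because sorting follows the nature of the attribute's values, dates written as "1999/1/5" should not simply be compared as text. The sketch below shows one way to do this, assuming dates use a "YYYY/M/D" notation, numbers are stored as `int`/`float`, and records lacking the attribute go last; `sort_records` and `_sort_key` are hypothetical helpers, not the patent's method.

```python
from datetime import datetime


def _sort_key(value):
    """Map a raw attribute value to a comparable key by its nature.

    Assumption: dates are written "YYYY/M/D", numbers are int/float, and
    anything else is compared as text. The leading tag keeps different
    kinds of values from being compared with each other directly.
    """
    if isinstance(value, (int, float)):
        return (0, value)
    if isinstance(value, str):
        try:
            return (1, datetime.strptime(value, "%Y/%m/%d"))
        except ValueError:
            return (2, value)
    return (3, str(value))


def sort_records(records, attribute, descending=False):
    """Sort the selected records by one attribute.

    Records that lack the attribute keep their relative order and go last.
    """
    has = [r for r in records if attribute in r["attributes"]]
    lacks = [r for r in records if attribute not in r["attributes"]]
    has.sort(key=lambda r: _sort_key(r["attributes"][attribute]),
             reverse=descending)
    return has + lacks


records = [
    {"id": 1, "text": "a", "attributes": {"input_date": "1999/1/30"}},
    {"id": 2, "text": "b", "attributes": {"input_date": "1999/1/5"}},
    {"id": 3, "text": "c", "attributes": {}},
]
print([r["id"] for r in sort_records(records, "input_date")])  # [2, 1, 3]
```

Compared as plain strings, "1999/1/30" would sort before "1999/1/5"; parsing the value first restores chronological order.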

[0026] A plurality of information extracting means 8 are provided (n of them in the example of FIG. 1, where n is an integer of 2 or more); each extracts, from a text character string, a character string that satisfies a predetermined specific condition. Specifically, they include the following:
--- "Proper name information extracting means", which extracts proper names (company names, product names, and so on)
--- "Date/time information extracting means", which extracts information on dates and times
--- "Monetary amount information extracting means", which extracts information on monetary amounts
These can be implemented with natural language processing techniques or character string pattern matching techniques. For example, the "proper name information extracting means" can be implemented by morphological analysis of the character string using a company name dictionary, and the "date/time information extracting means" by pattern matching of the character string against a pattern dictionary of date/time notations.
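As an illustration of extractors built on pattern matching and dictionary lookup, the sketch below assumes a small hypothetical pattern set and company dictionary. Plain substring lookup stands in for dictionary-driven morphological analysis, which would require a real Japanese analyzer; the names `EXTRACTORS`, `extract_dates`, and `extract_amounts` are illustrative only.

```python
import re

# Hypothetical pattern dictionary for date notations.
DATE_PATTERNS = [
    re.compile(r"\d{4}/\d{1,2}/\d{1,2}"),
    re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
]
# Hypothetical pattern for yen amounts such as "9,800円".
AMOUNT_PATTERN = re.compile(r"\d[\d,]*円")


def extract_dates(text):
    return [m.group() for p in DATE_PATTERNS for m in p.finditer(text)]


def extract_amounts(text):
    return AMOUNT_PATTERN.findall(text)


def make_name_extractor(company_dictionary):
    """Substring lookup standing in for dictionary-based morphological analysis."""
    def extract(text):
        return [name for name in company_dictionary if name in text]
    return extract


# The menu from which the information extraction instructing means chooses.
EXTRACTORS = {
    "proper_name": make_name_extractor(["○○", "△△"]),
    "date": extract_dates,
    "amount": extract_amounts,
}

text = "○○社は1999/1/15に新製品を9,800円で発表"
for name, extractor in EXTRACTORS.items():
    print(name, extractor(text))
```

Registering every extractor under a name in one mapping mirrors the idea of a selectable menu: adding a new kind of extraction only means adding an entry.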

[0027] The information extraction instructing means 9 is used to specify which of the plurality of information extracting means 8 is applied to the text character strings contained in the selected document records. This is easily implemented with the common user interface method of displaying a list of the available information extracting means 8 and letting the user choose from it.

[0028] When the information extracting means 8 designated via the information extraction instructing means 9 is applied to the text character strings of the document records selected by the document record selecting means 7, an extraction result is obtained for each document record. The attribute adding means 10 adds that result to the document record as an attribute.

[0029] For example, when the "proper name extracting means" is applied, via the information extraction instructing means 9, to the text character string "Company ○○'s printer prints beautifully" (○○社のプリンタは印字が奇麗) contained in the first document record of FIG. 3, "○○" is obtained as a company name. The attribute adding means 10 takes "○○" as the attribute value for the attribute "company name" and adds this attribute information to the document record. This can be implemented by the ordinary process of appending the attribute information to the end of the document record read from the document record holding means 3 and storing the record back in the document record holding means 3.
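The cooperation of the selecting means 7, the instructing means 9, and the attribute adding means 10 can be sketched as a single function that runs one chosen extractor over the selected records. This is a hypothetical sketch: `add_extracted_attribute` is an illustrative name, and joining multiple hits with commas and leaving unselected records untouched are assumptions.

```python
def add_extracted_attribute(records, selected_ids, extractor, attribute_name):
    """Apply one extracting function to the selected records and append
    its result as a new attribute (hypothetical helper).

    Unselected records are left untouched, so a display can show that
    they were not processed.
    """
    for record in records:
        if record["id"] not in selected_ids:
            continue
        found = extractor(record["text"])
        # Assumption: multiple hits are joined into one attribute value;
        # an empty string means "processed, nothing found".
        record["attributes"][attribute_name] = ",".join(found)
    return records


COMPANY_NAMES = ["○○", "△△"]  # hypothetical company name dictionary


def extract_company(text):
    return [n for n in COMPANY_NAMES if n in text]


records = [
    {"id": 1, "text": "○○社のプリンタは印字が奇麗", "attributes": {}},
    {"id": 2, "text": "△△社のスキャナは速い", "attributes": {}},
]
add_extracted_attribute(records, {1}, extract_company, "company")
print(records[0]["attributes"])  # {'company': '○○'}
print(records[1]["attributes"])  # {}
```

Once stored this way, the extracted value is indistinguishable from an original attribute, so the same sorting code can use it as a criterion.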

[0030] FIG. 4 is a schematic diagram of the display produced by the document record display output means 4 after document records with input dates from 1999/1/1 to 1999/1/30 have been selected (an easy task for the user thanks to the sort result produced by the document record sorting means 5), and the result of applying the "proper name extracting means" to their text character strings has been added to those target records as attribute information. Documents that were not processed are displayed in a way that makes this apparent.

[0031] In the information analysis support device 1 of this embodiment, the information extracting means 8 can thus be used to add new attribute information to document records. Combining this with the processing provided by the document record sorting means 5 makes information analysis that was previously difficult easy.

[0032] For example, if the company name, a new attribute obtained as a result of information extraction, is designated as the sort criterion attribute and sorting is applied, the records shown in FIG. 4 are rearranged and displayed as shown in FIG. 5. The document records containing text character strings that describe each company are then grouped together. Even with a much larger number of document records, this kind of rearrangement using information extraction results remains possible, so information analysis becomes much easier than before.
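Grouping records by an extracted attribute amounts to sorting on it and then collecting runs of equal values. A minimal sketch using `itertools.groupby`, assuming records without the attribute are simply skipped; `group_by_attribute` is a hypothetical name.

```python
from itertools import groupby


def group_by_attribute(records, attribute):
    """Sort by an extracted attribute and collect record ids into groups.

    Records without the attribute are skipped (assumption).
    """
    keyed = [r for r in records if attribute in r["attributes"]]
    # groupby only merges adjacent equal keys, so sort on the same key first.
    keyed.sort(key=lambda r: r["attributes"][attribute])
    return {
        value: [r["id"] for r in group]
        for value, group in groupby(keyed,
                                    key=lambda r: r["attributes"][attribute])
    }


records = [
    {"id": 1, "attributes": {"company": "○○"}},
    {"id": 2, "attributes": {"company": "△△"}},
    {"id": 3, "attributes": {"company": "○○"}},
    {"id": 4, "attributes": {}},
]
groups = group_by_attribute(records, "company")
print(groups)
```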

[0033] The benefit varies depending on which kinds of information the provided information extracting means 8 extract, but since sorting by attribute values is generally effective, it is important to provide the (technically feasible) extractors needed for the intended purpose and to make them selectable.

[0034] Further, incorporating processing of the kind found in general spreadsheet software (such as counting records and cross-tabulation) into the information analysis support device 1 of this embodiment enables further analysis support, and this is easy to implement.
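Spreadsheet-style cross-tabulation over attributes could look like the following sketch built on `collections.Counter`. It assumes hypothetical attribute names `company` and `category` produced by earlier extraction or classification steps; `cross_tabulate` is an illustrative helper.

```python
from collections import Counter


def cross_tabulate(records, row_attr, col_attr):
    """Count records for each (row value, column value) pair.

    Only records that carry both attributes are counted (assumption).
    Returns sorted row labels, sorted column labels, and a full table.
    """
    counts = Counter(
        (r["attributes"][row_attr], r["attributes"][col_attr])
        for r in records
        if row_attr in r["attributes"] and col_attr in r["attributes"]
    )
    rows = sorted({k[0] for k in counts})
    cols = sorted({k[1] for k in counts})
    # Counter returns 0 for pairs that never occurred.
    table = {(a, b): counts[(a, b)] for a in rows for b in cols}
    return rows, cols, table


records = [
    {"attributes": {"company": "○○", "category": "printing"}},
    {"attributes": {"company": "○○", "category": "price"}},
    {"attributes": {"company": "△△", "category": "printing"}},
    {"attributes": {"company": "○○", "category": "printing"}},
]
rows, cols, table = cross_tabulate(records, "company", "category")
for a in rows:
    print(a, [table[(a, b)] for b in cols])
```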

[0035] Next, a specific example of how the information analysis support device 1 is configured on a computer will be described. FIG. 6 is a block diagram outlining the computer constituting the information analysis support device 1. As shown in FIG. 6, in the information analysis support device 1, a CPU 11, a ROM 12, and a RAM 13 are connected by a bus 14. The CPU 11 centrally controls each unit of the information analysis support device 1. The ROM 12 stores a BIOS and the like. The RAM 13 stores data rewritably and serves as a work area for the CPU 11.

[0036] Connected to the bus 14 are an input device 15 such as a keyboard, an output device 16 consisting of a display, a CD-ROM drive 17, and a hard disk 18. The functions of the means 2 to 10 shown in FIG. 1 are realized based on a program that the CD-ROM drive 17 reads from a CD-ROM 19, which is a recording medium, and that is stored on the hard disk 18. Document records are input to the input means 2 via the input device 15, displayed by the document record display output means 4 via the output device 16, and stored by the document record holding means 3 by saving them on the hard disk 18. The recording medium is not limited to the CD-ROM 19; various recording media such as a floppy disk or a magneto-optical disk can be used.

[0037] [Second Embodiment of the Invention] FIG. 7 is a functional block diagram of an information analysis support device 21 according to a second embodiment of the present invention. The means constituting the information analysis support device 21 of FIG. 7 are connected to one another and configured so that document record data and control information can be exchanged among them as needed.

[0038] In FIG. 7, the input means 2, document record holding means 3, document record display output means 4, document record selecting means 7, sort criterion attribute designating means 6, and document record sorting means 5 are the same as in the first embodiment, so their detailed description is omitted.

[0039] A plurality of text classifying means 22 are provided (n of them in the example of FIG. 7, where n is an integer of 2 or more), each of which classifies a given plurality of text character strings into a plurality of categories or clusters. Various text classification techniques have been proposed and implemented; examples include the following.

[0040] --Means 1: Category classification based on the presence or absence of words
First, each text character string is decomposed into words using morphological analysis. Then, based on criteria on the presence or absence of words set in advance for each target category (for example, "if the word ○○ is contained, classify the string into category ○○"), it is determined which category each text character string belongs to.
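Means 1 can be sketched as a lookup against a category-to-words table. This minimal version assumes whitespace-separated English text in place of morphological analysis, and that a text is placed in every category whose words it contains; the `CRITERIA` table and function names are hypothetical.

```python
def tokenize(text):
    # Stand-in for morphological analysis: assumes space-separated words.
    return set(text.lower().split())


# Hypothetical criteria table in the spirit of FIG. 8: category -> words.
CRITERIA = {
    "printing": ["print", "printing", "ink"],
    "price": ["price", "cheap", "expensive"],
}


def categorize(text, criteria=CRITERIA):
    """Return every category with at least one criterion word in the text."""
    words = tokenize(text)
    return [cat for cat, keys in criteria.items() if words & set(keys)]


print(categorize("the printer ink is cheap"))  # ['printing', 'price']
print(categorize("fast scanner"))              # []
```

Matching whole tokens rather than substrings is why "printer" does not trigger "print" here; with a real morphological analyzer the criterion words would be compared against its output instead.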

[0041] --Means 2: Clustering based on similarity derived from shared words
First, each text string is decomposed into words using morphological analysis. Next, the similarity between two text strings is calculated from, for example, the number of words the two strings have in common (for instance, when comparing documents of equal length, two strings sharing three words are scored as more similar than two strings sharing only one). Once the similarity has been calculated for every pair of strings, strings that are similar to each other are merged into the same cluster.
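A minimal sketch of Means 2, under several assumptions not stated in the source: the strings are already split into words (the morphological analysis step is not shown), a pair is scored by its shared-word count divided by the geometric mean of the two vocabulary sizes (so that, for strings of equal length, three shared words outscore one), and pairs scoring at or above a hypothetical `threshold` are merged by single-linkage clustering. The function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def shared_word_similarity(a: set[str], b: set[str]) -> float:
    """Number of shared words, normalized by the geometric mean of the set sizes (assumed)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cluster_by_shared_words(docs: list[list[str]], threshold: float) -> list[int]:
    """Single-linkage clustering: pairs at or above `threshold` end up in one cluster."""
    word_sets = [set(words) for words in docs]
    parent = list(range(len(docs)))  # union-find forest, one tree per cluster

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Score every pair of strings and merge the clusters of similar pairs.
    for i, j in combinations(range(len(docs)), 2):
        if shared_word_similarity(word_sets[i], word_sets[j]) >= threshold:
            parent[find(i)] = find(j)

    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(docs))]
```

For example, two printer opinions sharing "printer", "print", and "quality" fall into one cluster, and two copier complaints sharing "copier", "paper", and "jam" fall into another. Single linkage is only one possible choice of "merge similar strings"; the source does not name a linkage rule.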

[0042] --Means 3: Clustering of text strings by distance in a space built with singular value decomposition of a string-by-word matrix
First, each text string is decomposed into words using morphological analysis. Next, a matrix expressing which words occur in which text strings is constructed (for example, by assigning text strings to rows and words to columns, and entering at each row-column position the frequency of that word in that text string). Singular value decomposition of this matrix is then used to construct a space in which the text strings are placed, and clustering is performed based on distances in that space.
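The following is a sketch of Means 3 in the style of latent semantic analysis, assuming pre-tokenized strings and the third-party NumPy library. Each string's coordinates are taken as the rows of U_k S_k from a truncated SVD, and strings closer than a hypothetical `max_dist` (Euclidean) are grouped by single linkage. The rank `k`, the distance measure, and the linkage rule are all assumptions; the source only says clustering uses distances in the SVD space.

```python
import numpy as np


def lsa_cluster(docs: list[list[str]], k: int, max_dist: float) -> list[int]:
    """Cluster tokenized strings by distance in a k-dimensional SVD space (assumed LSA-style)."""
    vocab = sorted({w for words in docs for w in words})
    col = {w: j for j, w in enumerate(vocab)}

    # Rows = text strings, columns = words, cells = frequency of the word in the string.
    a = np.zeros((len(docs), len(vocab)))
    for i, words in enumerate(docs):
        for w in words:
            a[i, col[w]] += 1

    # Truncated SVD: each string is placed at the corresponding row of U_k * S_k.
    u, s, _vt = np.linalg.svd(a, full_matrices=False)
    coords = u[:, :k] * s[:k]

    # Single-linkage grouping: strings within max_dist of a cluster member join it.
    labels = [-1] * len(docs)
    next_label = 0
    for start in range(len(docs)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(docs)):
                if labels[j] == -1 and np.linalg.norm(coords[i] - coords[j]) < max_dist:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Keeping only the top `k` singular values discards minor variation, so strings with related vocabulary land near each other even when their exact word lists differ.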

[0043] FIG. 8 is a schematic diagram of the data structure of the per-category word-presence classification criteria used for Means 1, "category classification based on the presence or absence of words". The left column holds the category name (the value), and the right column holds the word or words that must be contained, which serve as the criterion. Classification with these criteria is performed, for example, by the following procedure. First, the selected document records are read from the document record holding means 3, and the text string contained in each is decomposed into words by morphological analysis. The words are then compared in turn against the words contained in the criteria data structure shown in FIG. 8. If there is a match, the corresponding category name is returned; if there is no match, the value "none" is returned as the category name.
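The procedure above can be sketched as follows. The criteria table is a hypothetical stand-in for FIG. 8 (category name, then its listed words), the input is assumed to be already split into words, and a category is assumed to match when any one of its listed words is present; the source does not say whether multiple listed words are combined with "any" or "all".

```python
# Hypothetical FIG. 8-style criteria: (category name, words that indicate it).
CRITERIA: list[tuple[str, list[str]]] = [
    ("XX", ["XX"]),
    ("YY", ["YY", "YY-brand"]),
]


def categorize(words: list[str], criteria: list[tuple[str, list[str]]]) -> str:
    """Return the first category whose listed words occur in `words`, else "none"."""
    present = set(words)
    for category, listed_words in criteria:
        # Assumption: ANY listed word is enough for the category to match.
        if any(w in present for w in listed_words):
            return category
    return "none"
```

Because the table is checked in order, a string mentioning two companies gets the category listed first; a different tie rule would need an explicit design decision.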

[0044] FIG. 9 shows a display example of document records to which this "category classification based on the presence or absence of words" is applied.

[0045] The classification instruction means 23 is used to specify which of the plurality of text classification means 22 is to be applied to the text strings contained in the selected document records. It can easily be implemented with a method common in user interfaces: displaying a list of the available text classifications and letting the user select one from it.

[0046] When all of the document records shown in FIG. 9 are selected with the document record selection means 7 and the classification instruction means 23 is used to request, for example, "category classification based on the presence or absence of words", a category is assigned to the text string contained in each selected document record. For example, the text string in the first document record, "Company XX's printer prints beautifully", contains the word "Company XX", so the category name "XX" is obtained. The classification instruction means 23 makes the obtained result usable in the same way as other attribute information.

[0047] That is, when the text classification means 22 designated through the classification instruction means 23 is applied to the text strings of the document records selected by the document record selection means 7, the name or number of the category or cluster resulting from the classification is obtained for each document record. The attribute adding means 24 adds this result to the document record as an attribute. This can be realized by an ordinary process: reading the document record from the document record holding means 3, appending the attribute information to the end of the record, and storing it back in the document record holding means 3.
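A minimal sketch of how the classification instruction means 23 and the attribute adding means 24 could fit together, assuming a record is a dict holding a `"text"` field plus attributes, and that the store is a plain list. The menu dict `CLASSIFIERS`, the toy classifier, and `add_classification_attribute` are hypothetical names; following [0048], the attribute is named after the classification that produced it.

```python
from typing import Callable

# Assumed record shape: the text string plus named attribute values.
Record = dict[str, str]


def keyword_classifier(text: str) -> str:
    """Toy stand-in for one text classification means (hypothetical)."""
    return "XX" if "XX" in text else "none"


# Menu of available classification means, as a selectable list might present them.
CLASSIFIERS: dict[str, Callable[[str], str]] = {
    "word-presence": keyword_classifier,
}


def add_classification_attribute(store: list[Record], selected: list[int], menu_choice: str) -> None:
    """Apply the chosen classifier to each selected record and store the result as an attribute."""
    classify = CLASSIFIERS[menu_choice]
    for idx in selected:
        record = dict(store[idx])                        # read the record from the store
        record[menu_choice] = classify(record["text"])   # append the attribute at the end
        store[idx] = record                              # write the record back
```

Since Python dicts keep insertion order, the new attribute lands at the end of the record, mirroring the "append, then store again" process described above.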

[0048] FIG. 10 shows, as displayed through the document record display output means 4, the result of applying the attribute adding means 24 after "category classification based on the presence or absence of words" has been performed on all of the document records shown in FIG. 9. The attribute name is set to, for example, the name of the classification criteria data that was used.

[0049] Once the results have become attribute information, the document record sorting means 5 can sort the records on that attribute, as in the first embodiment of the invention. This makes the classification results easy to manipulate, and information analysis becomes much easier than with conventional classification systems.

[0050] In this embodiment of the present invention, providing a plurality of text classification means 22 with different characteristics yields their results as separate attribute information. Combined processing therefore becomes possible: for example, the records can first be classified with one text classification means 22 and sorted on the resulting attribute, and then only the document records that share the same value of that attribute can be selected and classified again with a different means.

[0051] As a concrete example, when one wants to classify opinions about a competitor's products by their content, one can first classify the records by company, select from the result only the document records containing opinions about the target company, and then classify those records by content similarity (for example, with Means 2 or Means 3).
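The combined workflow of [0050] and [0051] can be sketched as below: classify by company, sort on the new attribute, keep only the target company's records, then classify that subset again. Both classifiers are hypothetical keyword stand-ins; in the source the second step would use a content-similarity method such as Means 2 or Means 3, not a keyword test.

```python
def by_company(text: str) -> str:
    # Hypothetical first classifier: which company an opinion mentions.
    for company in ("XX", "YY"):
        if company in text:
            return company
    return "none"


def by_topic(text: str) -> str:
    # Hypothetical stand-in for a content-similarity classifier (Means 2 or 3).
    return "print quality" if "print" in text else "other"


records = [
    {"text": "YY copier jams"},
    {"text": "XX printer prints beautifully"},
    {"text": "XX scanner is slow"},
]

# Step 1: first classification, stored as an attribute on each record.
for r in records:
    r["company"] = by_company(r["text"])

# Step 2: sort on the new attribute, then keep only the target company's records.
records.sort(key=lambda r: r["company"])
target = [r for r in records if r["company"] == "XX"]

# Step 3: second classification on that subset only, stored as another attribute.
for r in target:
    r["topic"] = by_topic(r["text"])
```

Because each result is just another attribute, the second pass touches only the filtered subset, and the other records keep their earlier attributes unchanged.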

[0052] As described above, using the information analysis support device 21 enables information analysis that combines various functions.

[0053] Further analysis support becomes possible by incorporating into the present invention processing of the kind found in general spreadsheet software (such as counting data items and cross-tabulation); this can be easily implemented.
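As one illustration of the spreadsheet-style processing mentioned above, a cross-tabulation over two attributes can be sketched with the standard library's `collections.Counter`, assuming records are dicts of attribute values. The function name `cross_tab` is hypothetical, and records missing an attribute are counted under an empty string.

```python
from collections import Counter


def cross_tab(records: list[dict[str, str]], row_attr: str, col_attr: str) -> Counter:
    """Count records for each (row value, column value) pair, like a spreadsheet cross-tab."""
    return Counter((r.get(row_attr, ""), r.get(col_attr, "")) for r in records)
```

For example, crossing a "company" attribute from one classification with a "topic" attribute from another shows how many opinions about each company fall into each topic; a missing pair simply counts as zero.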

[0054] It also goes without saying that combining the information analysis support device 11 with the information analysis support device 21 increases the information that can be handled as attributes and broadens the range of combined uses.

[0055] A specific example of a computer configuration of the information analysis support device 21 is the same as that described for the information analysis support device 1 with reference to FIG. 6, so a detailed description is omitted.

【００５６】[0056]

[Effects of the Invention]
According to the invention of claim 1, when there are a plurality of document records, information is extracted by selecting and applying, according to the purpose, a menu for extracting character strings that satisfy a predetermined specific condition from the text strings contained in those records. The extraction results can be used in the same way as the attribute information contained in the document records from the start, so related document records can be organized.

[0057] According to the invention of claim 2, when there are a plurality of document records, classification is performed by selecting and applying, according to the purpose, a menu for classifying the text strings contained in those records based on a predetermined method. Because the results can be used in the same way as the attribute information contained in the document records from the start, classification combining a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, improving the efficiency of information analysis work.

[0058] According to the invention of claim 3, when there are a plurality of document records, information is extracted by selecting and applying, according to the purpose, a menu for extracting character strings that satisfy a predetermined specific condition from the text strings contained in those records. The extraction results can be used in the same way as the attribute information contained in the document records from the start, so related document records can be organized.

[0059] According to the invention of claim 4, when there are a plurality of document records, classification is performed by selecting and applying, according to the purpose, a menu for classifying the text strings contained in those records based on a predetermined method. Because the results can be used in the same way as the attribute information contained in the document records from the start, classification combining a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, improving the efficiency of information analysis work.

[0060] According to the invention of claim 5, when there are a plurality of document records, information is extracted by selecting and applying, according to the purpose, a menu for extracting character strings that satisfy a predetermined specific condition from the text strings contained in those records. The extraction results can be used in the same way as the attribute information contained in the document records from the start, so related document records can be organized.

[0061] According to the invention of claim 6, when there are a plurality of document records, classification is performed by selecting and applying, according to the purpose, a menu for classifying the text strings contained in those records based on a predetermined method. Because the results can be used in the same way as the attribute information contained in the document records from the start, classification combining a plurality of classification results becomes possible. Relationships with other attributes also become easier to see, improving the efficiency of information analysis work.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of an information analysis support device according to a first embodiment of the present invention.

FIG. 2 is a schematic diagram of the data structure of the document record holding means of the information analysis support device.

FIG. 3 is an explanatory diagram showing a display example of document records on the document record display output means of the information analysis support device.

FIG. 4 is an explanatory diagram showing a display example of document records on the document record display output means of the information analysis support device.

FIG. 5 is an explanatory diagram showing a display example of document records on the document record display output means of the information analysis support device.

FIG. 6 is a block diagram showing an outline of a computer constituting the information analysis support device.

FIG. 7 is a functional block diagram of an information analysis support device according to a second embodiment of the present invention.

FIG. 8 is a schematic diagram of the data structure of an example of classification criteria used by the information analysis support device.

FIG. 9 is an explanatory diagram showing a display example of document records to which the classification criteria are applied.

FIG. 10 is an explanatory diagram showing a display example of the result of applying the attribute adding means of the information analysis support device when classification is performed according to the classification criteria.

[Explanation of symbols]

1 Information analysis support device
3 Document record holding means
5 Document record sorting means
6 Sort reference attribute designation means
7 Document record selection means
8 Information extraction means
9 Information extraction instruction means
10 Attribute adding means
21 Information analysis support device
22 Text classification means
23 Classification instruction means
24 Attribute adding means

Claims

[Claims]

1. An information analysis support device comprising: document record holding means for storing a plurality of document records, each comprising a text character string and attribute information of the text character string; document record selection means for selecting one or more document records from the document records stored in the document record holding means; sort reference attribute designation means for enabling selection of an attribute to serve as a reference for sorting the one or more selected document records; document record sorting means for sorting the one or more document records selected by the document record selection means according to the nature of the information of the attribute designated by the sort reference attribute designation means; one or more types of information extraction means, each for extracting from the text character string a character string that satisfies a predetermined specific condition; information extraction instruction means for enabling selection of the information extraction means to be applied to a designated text character string contained in the one or more selected document records; and attribute adding means for adding the character string extracted by the selected information extraction means to the document record as information of the attribute corresponding to the text character string from which it was extracted.

2. An information analysis support device comprising: document record holding means for storing a plurality of document records, each comprising a text character string and attribute information of the text character string; document record selection means for selecting one or more document records from the document records stored in the document record holding means; sort reference attribute designation means for enabling selection of an attribute to serve as a reference for sorting the one or more selected document records; document record sorting means for sorting the one or more document records selected by the document record selection means according to the nature of the information of the attribute designated by the sort reference attribute designation means; one or more types of text classification means, each for classifying a plurality of the text character strings based on a predetermined method; classification instruction means for enabling selection of the text classification means to be applied to a designated text character string contained in the one or more selected document records; and attribute adding means for adding the classification result obtained by the selected text classification means to the document record as information of the attribute corresponding to the classified text character string.

3. An information analysis support method comprising: a step of selecting, from one or more types of menus each enabling extraction of a character string that satisfies a predetermined specific condition from the text character string contained in a document record comprising a text character string and attribute information of the text character string, the menu to be applied to a designated text character string contained in one or more document records selected from a plurality of the document records; a step of extracting the character string according to the selected menu; and a step of adding the extracted character string to the document record as information of the attribute corresponding to the text character string from which it was extracted.

4. An information analysis support method comprising: a step of selecting, from one or more types of menus each enabling the text character string contained in a document record comprising a text character string and attribute information of the text character string to be classified based on a predetermined method, the menu to be applied to a designated text character string contained in one or more document records selected from a plurality of the document records; a step of classifying the text character string according to the selected menu; and a step of adding the classification result to the document record as information of the attribute corresponding to the classified text character string.

5. A computer-readable storage medium storing a program that causes a computer to execute: a step of receiving a selection of one or more document records from a plurality of document records, each comprising a text character string and attribute information of the text character string; a step of receiving a selection of an attribute to serve as a reference for sorting the selected document records; a step of sorting the one or more document records received in the document record selection receiving step according to the nature of the information of the attribute received in the attribute selection receiving step; a step of selecting, from one or more types of menus each enabling extraction of a character string that satisfies a predetermined specific condition from the text character string contained in the document record, the menu to be applied to a designated text character string contained in one or more document records selected from the plurality of document records; a step of extracting the character string according to the selected menu; and a step of adding the extracted character string to the document record as information of the attribute corresponding to the text character string from which it was extracted.

6. A computer-readable storage medium storing a program that causes a computer to execute: a step of receiving a selection of one or more document records from a plurality of document records, each comprising a text character string and attribute information of the text character string; a step of receiving a selection of an attribute to serve as a reference for sorting the selected document records; a step of sorting the one or more document records received in the document record selection receiving step according to the nature of the information of the attribute received in the attribute selection receiving step; a step of selecting, from one or more types of menus each enabling the text character string contained in the document record to be classified based on a predetermined method, the menu to be applied to a designated text character string contained in one or more document records selected from the plurality of document records; a step of classifying the text character string according to the selected menu; and a step of adding the classification result to the document record as information of the attribute corresponding to the classified text character string.