JPH01188934A

JPH01188934A - Automatic document sorting device

Info

Publication number: JPH01188934A
Application number: JP63013063A
Authority: JP
Inventors: Atsushi Tamura; 淳田村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-01-22
Filing date: 1988-01-22
Publication date: 1989-07-28
Anticipated expiration: 2009-02-02
Also published as: JPH069054B2

Abstract

PURPOSE: To classify documents effectively by examining a group of sample documents to obtain keyword appearance-frequency information for each field, thereby identifying keywords with high discriminating power and the degree of that power. CONSTITUTION: In the preparation process, automatic keyword extraction means 2 extracts keywords from each sample document. Positive score table creation means 71 counts the appearance frequency of the extracted keywords and obtains their chi-square values. Keywords with high discriminating power are then selected, and a score representing each keyword's degree of contribution to each field is calculated from the chi-square value. The calculated scores are stored in score table storage means 8. In the classification process, means 2 extracts keywords from a document received from document input means 1. The score of each keyword is read out by referring to means 8 and added to each field. This is performed on a fixed region starting at the head of the document, and the document is classified into the field showing the highest score.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to an automatic document classification device.

(Prior Art) Conventionally, documents were classified by hand, which was inefficient. In the method of classifying a document into a specific item when a certain keyword appears, the classification itself can be performed automatically, but the correspondence between keywords and classification destinations had to be established manually in advance.

(Problems to Be Solved by the Invention) As described above, conventional document classification relies on manual work, so although it is accurate, it takes time and is costly.

An object of the present invention is to eliminate these conventional drawbacks and to provide a novel automatic document classification device that classifies documents automatically by using the appearance-frequency information of keywords for each classification.

(Means for Solving the Problems) The automatic document classification device of the present invention comprises:
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) single classification means for receiving the score of each field from the score calculation means and determining one classification destination on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it; and
(h) classification result display means for receiving the classification result from the classification means and displaying it.

The second automatic document classification device of the present invention comprises:
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive/negative score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution and a negative score representing the negative contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) single classification means for receiving the score of each field from the score calculation means and determining one classification destination on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it; and
(h) classification result display means for receiving the classification result from the classification means and displaying it.

The third automatic document classification device of the present invention comprises:
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) multiple classification means for receiving the score of each field from the score calculation means and determining a plurality of classification destinations on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it; and
(h) classification result display means for receiving the classification result from the classification means and displaying it.

The fourth automatic document classification device of the present invention comprises:
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive/negative score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution and a negative score representing the negative contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) multiple classification means for receiving the score of each field from the score calculation means and determining a plurality of classification destinations on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it; and
(h) classification result display means for receiving the classification result from the classification means and displaying it.

FIG. 1 is a block diagram of the automatic document classification device. In FIG. 1, 1 is document input means, 2 is automatic keyword extraction means, 3 is score calculation means, 4 is single classification means, 5 is classification result storage means, 6 is classification result display means, 7 is positive score table creation means, and 8 is score table storage means.

FIG. 4 is a block diagram of the automatic document classification device. In FIG. 4, 1 is document input means, 2 is automatic keyword extraction means, 3 is score calculation means, 4 is single classification means, 5 is classification result storage means, 6 is classification result display means, 7 is positive/negative score table creation means, and 8 is score table storage means.

FIG. 5 is a block diagram of the automatic document classification device. In FIG. 5, 1 is document input means, 2 is automatic keyword extraction means, 3 is score calculation means, 4 is multiple classification means, 5 is classification result storage means, 6 is classification result display means, 7 is positive score table creation means, and 8 is score table storage means.

FIG. 6 is a block diagram of the automatic document classification device. In FIG. 6, 1 is document input means, 2 is automatic keyword extraction means, 3 is score calculation means, 4 is multiple classification means, 5 is classification result storage means, 6 is classification result display means, 7 is positive/negative score table creation means, and 8 is score table storage means.

(Operation) In the present invention, examining a group of sample documents yields the appearance-frequency information of keywords in each field, which reveals both the keywords with high discriminating power and how high that power is. In the second and fourth inventions, not only how readily a keyword appears in a given field but also how rarely it appears there is taken into account, so that the information is used effectively and documents are classified efficiently. The first and second inventions classify a document into a single classification destination, and the third and fourth inventions can classify a document into a plurality of classification destinations.

(Example 1) The document classification procedure using the first device of the present invention is described below. The procedure is broadly divided into two parts: a preparation process, performed on sample data to examine the relationship between keyword appearance frequency and field, and a classification process, in which documents are actually classified.

First, the preparation process is described with reference to FIGS. 1 and 2. In the preparation process, document input means 1, automatic keyword extraction means 2, positive score table creation means 71, and score table storage means 8 are applied to the sample documents. The preparation procedure is as follows. First, in step 11, automatic keyword extraction means 2 extracts keywords from a sample document input through document input means 1. Step 11 basically extracts the nouns and sa-hen verb stems (stems of verbs formed with "suru") in the document. In addition, character strings that consist of a single character type and are not registered in the dictionary inside automatic keyword extraction means 2 are also extracted.
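As a rough illustration only, the extraction of unregistered strings made of a single character type might be sketched as follows. The specification does not say how character types are delimited; the Unicode ranges, the function name `unknown_runs`, and the sample dictionary are all assumptions made for this sketch, and the extraction of nouns and sa-hen verb stems (which needs a morphological analyzer) is not shown.

```python
import re

# Assumed character types: katakana, kanji, and Latin letters.
# Hiragana runs are ignored here because they are mostly function words.
RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+|[\u4E00-\u9FFF]+|[A-Za-z]+")

def unknown_runs(text, dictionary):
    """Return maximal same-character-type runs that are not in the dictionary."""
    return [m.group() for m in RUN.finditer(text) if m.group() not in dictionary]

# Hypothetical dictionary contents; "エンジン" is treated as unregistered.
sample_dictionary = {"新型", "テスト"}
candidates = unknown_runs("新型エンジンをテストする", sample_dictionary)
```

Here `candidates` holds only the katakana run that the assumed dictionary does not contain.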

The appearance frequency of the keywords extracted in step 11 is counted in step 12 by positive score table creation means 71, giving the appearance frequency $x_{ij}$ of the i-th keyword in the j-th field. Steps 11 and 12 are repeated as long as sample data remains. Once all the sample data has been examined, in step 13 positive score table creation means 71 obtains the chi-square value $\chi_i^2$ of each keyword from the appearance frequencies $x_{ij}$. Specifically, equations (1) and (2) are used.

$$\chi_i^2 = \sum_{j=1}^{n} \frac{(x_{ij} - a_{ij})^2}{a_{ij}} \tag{1}$$

$$a_{ij} = \frac{\sum_{k=1}^{n} x_{ik} \cdot \sum_{l=1}^{M} x_{lj}}{\sum_{l=1}^{M} \sum_{k=1}^{n} x_{lk}} \tag{2}$$

Here, $x_{ij}$ is the actual appearance frequency of the i-th keyword in the j-th field, $a_{ij}$ is the theoretical frequency of the i-th keyword in the j-th field, M is the number of distinct words, and n is the number of fields. The theoretical frequency is the appearance frequency a keyword would have if it appeared uniformly across all fields.
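A minimal sketch of equations (1) and (2) follows, assuming the frequency table is held as a list of rows (one row per keyword, one column per field). The function names and the sample counts are illustrative and do not come from the specification.

```python
def expected_frequencies(x):
    """Theoretical frequency a[i][j] per equation (2): row total * column total / grand total."""
    row_totals = [sum(row) for row in x]
    col_totals = [sum(col) for col in zip(*x)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(x, a):
    """Chi-square value of each keyword per equation (1)."""
    return [
        sum((xij - aij) ** 2 / aij for xij, aij in zip(x_row, a_row) if aij > 0)
        for x_row, a_row in zip(x, a)
    ]

# Made-up counts: 3 keywords (rows) observed in 2 fields (columns).
x = [[8, 2], [5, 5], [1, 9]]
a = expected_frequencies(x)
chi2 = chi_square(x, a)
```

In this made-up table the second keyword is spread almost exactly as the field sizes predict, so its chi-square value is close to zero, while the first and third keywords are concentrated in one field and score much higher.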

Next, in step 14, positive score table creation means 71 selects each i-th keyword that satisfies inequality (2') as a keyword with discriminating power. The threshold θ is set by weighing processing time against accuracy.

$$\chi_i^2 > \theta \tag{2'}$$

Let m be the number of keywords selected in step 14.
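The selection in step 14 amounts to a threshold filter. A small sketch, assuming the chi-square values are kept in a dictionary keyed by keyword (the keywords, values, and threshold below are made up):

```python
def select_discriminative(chi2_by_keyword, theta):
    """Keep the keywords whose chi-square value strictly exceeds theta."""
    return [kw for kw, value in chi2_by_keyword.items() if value > theta]

chi2_by_keyword = {"engine": 4.46, "report": 0.04, "market": 5.40}  # made-up values
selected = select_discriminative(chi2_by_keyword, theta=1.0)
m = len(selected)  # number of selected keywords, as in the text
```

A larger θ shrinks the score table and speeds up classification at the cost of discarding weaker evidence, which is the trade-off the text refers to.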

In step 15, positive score table creation means 71 calculates, from the chi-square value $\chi_i^2$, the score $w_{ij}$ indicating the degree of contribution of the i-th keyword to the j-th field. The positive contribution, which has an affirmative influence on the j-th field, is denoted by the score $w_{ij}^{+}$ and is defined by equations (3a) and (3b).

When $x_{ij} \ge a_{ij}$:

$$w_{ij}^{+} = \chi_i^2 \cdot \frac{(x_{ij} - a_{ij})^2}{\sum_{k:\, x_{ik} \ge a_{ik}} (x_{ik} - a_{ik})^2} \tag{3a}$$

When $x_{ij} < a_{ij}$:

$$w_{ij}^{+} = 0 \tag{3b}$$

In equation (3a), 1 ≤ i ≤ m, 1 ≤ j ≤ n, and 1 ≤ k ≤ n.
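The positive score can be sketched for one keyword as below, on the reading that the chi-square value of the keyword is shared among the fields where it appears at least as often as expected, in proportion to the squared deviation. The function name and the sample numbers are assumptions for illustration.

```python
def positive_scores(x_i, a_i, chi2_i):
    """Positive scores of one keyword for every field, per equations (3a) and (3b)."""
    # Normalizing sum over the fields where the keyword is not under-represented.
    over = sum((x - a) ** 2 for x, a in zip(x_i, a_i) if x >= a)
    if over == 0:
        return [0.0] * len(x_i)  # keyword matches expectation everywhere
    return [
        chi2_i * (x - a) ** 2 / over if x >= a else 0.0
        for x, a in zip(x_i, a_i)
    ]

# Made-up keyword: observed counts, theoretical frequencies, chi-square value.
x_i = [7, 6, 2]
a_i = [5.0, 5.0, 5.0]
chi2_i = sum((x - a) ** 2 / a for x, a in zip(x_i, a_i))  # (4 + 1 + 9) / 5
w_plus = positive_scores(x_i, a_i, chi2_i)
```

Under this reading the positive scores of a keyword add up to its chi-square value, so a highly discriminating keyword carries more weight in classification.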

The completed score table, of size m × n, is stored in score table storage means 8 in step 16. This completes the preparation process.

Next, the classification process is described with reference to FIGS. 1 and 3. In the classification process, document input means 1, automatic keyword extraction means 2, score calculation means 3, single classification means 41, classification result storage means 5, classification result display means 6, and score table storage means 8 are applied to the document to be classified. The classification procedure is as follows. First, in step 21, automatic keyword extraction means 2 extracts keywords from the document input through document input means 1.

Step 21 basically extracts the nouns and sa-hen verb stems in the document. In addition, character strings that consist of a single character type and are not registered in the dictionary inside automatic keyword extraction means 2 are also extracted. Next, in step 22, for each keyword extracted in step 21, score calculation means 3 refers to score table storage means 8, reads out the scores of that keyword, and adds them to the respective fields. Steps 21 and 22 are performed on a fixed region starting at the head of the text. The target region is either a fixed number of sentences from the head or the region up to the point where a fixed number of keywords has been extracted, and is determined from the characteristics of the sample data. When processing of the target region is finished, the total score $W_j$ of the j-th field has been calculated over the data in the target region according to equation (4). If the same keyword appears more than once, its score is added once for each occurrence.

$$W_j = \sum_{i} w_{ij} \tag{4}$$

Once the total score $W_j$ of each field has been calculated, in step 23 single classification means 41 classifies the document into the field showing the highest score, that is, into the j-th field satisfying (5).

$$W_j \ge W_k \quad \text{for all } k \tag{5}$$

Finally, the classification destination determined in step 23 is stored in step 24 by classification result storage means 5 and displayed by classification result display means 6.
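Steps 22 and 23 can be sketched as a table lookup followed by an argmax. This is an illustrative sketch only: the score table contents are made up, keywords absent from the table are assumed to contribute nothing, and ties are resolved in favor of the lowest field index, which the specification does not address.

```python
def field_totals(keywords, score_table, n_fields):
    """Total score of each field per equation (4); repeated keywords count each time."""
    totals = [0.0] * n_fields
    for kw in keywords:
        # Assumption: a keyword with no table entry adds nothing to any field.
        for j, w in enumerate(score_table.get(kw, [0.0] * n_fields)):
            totals[j] += w
    return totals

def classify_single(totals):
    """Index of the field with the highest total score, per (5)."""
    return max(range(len(totals)), key=totals.__getitem__)

# Made-up score table: keyword -> score for each of 2 fields.
score_table = {"engine": [3.0, 0.0], "market": [0.0, 2.0]}
keywords = ["engine", "market", "engine", "weather"]  # from the target region
totals = field_totals(keywords, score_table, n_fields=2)
destination = classify_single(totals)
```

Restricting `keywords` to those extracted from the head of the document corresponds to the fixed target region described in the text.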

(Example 2) The document classification procedure using the second device of the present invention is described below.

First, the preparation process is described with reference to FIGS. 4 and 2. In the preparation process, document input means 1, automatic keyword extraction means 2, positive/negative score table creation means 72, and score table storage means 8 are applied to the sample documents. The preparation procedure is as follows. Means bearing the same numbers as in FIG. 1 have the same functions.

In the second invention, in step 15 of FIG. 2, positive/negative score table creation means 72 calculates, from the chi-square value $\chi_i^2$, the score $w_{ij}$ indicating the degree of contribution of the i-th keyword to the j-th field. The positive contribution, which has an affirmative influence on the j-th field, is denoted by the score $w_{ij}^{+}$, and the negative contribution, which has an adverse influence, is denoted by the score $w_{ij}^{-}$; they are defined by equations (3a) and (3c), respectively. The scores $w_{ij}^{+}$ and $w_{ij}^{-}$ are referred to collectively as the score $w_{ij}$.

When $x_{ij} \ge a_{ij}$:

$$w_{ij}^{+} = \chi_i^2 \cdot \frac{(x_{ij} - a_{ij})^2}{\sum_{k:\, x_{ik} \ge a_{ik}} (x_{ik} - a_{ik})^2} \tag{3a}$$

When $x_{ij} < a_{ij}$:

$$w_{ij}^{-} = -\chi_i^2 \cdot \frac{(x_{ij} - a_{ij})^2}{\sum_{k:\, x_{ik} < a_{ik}} (x_{ik} - a_{ik})^2} \tag{3c}$$

In equations (3a) and (3c), 1 ≤ i ≤ m, 1 ≤ j ≤ n, and 1 ≤ k ≤ n.
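A sketch of the positive and negative scores of one keyword follows. It assumes that the negative score mirrors equation (3a), with the normalizing sum taken over the fields where the keyword appears less often than expected; that normalization, the function name, and the sample numbers are assumptions of this sketch rather than statements of the specification.

```python
def signed_scores(x_i, a_i, chi2_i):
    """Score of one keyword for every field: positive where x >= a, negative where x < a."""
    over = sum((x - a) ** 2 for x, a in zip(x_i, a_i) if x >= a)
    under = sum((x - a) ** 2 for x, a in zip(x_i, a_i) if x < a)
    scores = []
    for x, a in zip(x_i, a_i):
        deviation = (x - a) ** 2
        if x >= a:
            # Positive contribution; zero if the keyword never exceeds expectation.
            scores.append(chi2_i * deviation / over if over else 0.0)
        else:
            # Negative contribution (assumed symmetric form); under > 0 on this branch.
            scores.append(-chi2_i * deviation / under)
    return scores

# Made-up keyword: over-represented in fields 0 and 1, under-represented in field 2.
w = signed_scores([7, 6, 2], [5.0, 5.0, 5.0], 2.8)
```

With negative scores in the table, a keyword that is rare in a field actively lowers that field's total when it appears in the document being classified.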

Next, the classification process is described with reference to FIGS. 4 and 3. In the classification process, document input means 1, automatic keyword extraction means 2, score calculation means 3, single classification means 41, classification result storage means 5, classification result display means 6, and score table storage means 8 are applied to the document to be classified. The classification procedure is the same as in the first invention.

(Example 3) The document classification procedure using the third device of the present invention is described below.

First, the preparation process is described with reference to FIGS. 5 and 2. In the preparation process, as in the first invention, document input means 1, automatic keyword extraction means 2, positive score table creation means 71, and score table storage means 8 are applied to the sample documents. The preparation procedure is the same as in the first invention, and means bearing the same numbers as in FIG. 1 have the same functions.

Next, the classification process is described with reference to FIGS. 5 and 3. In the classification process, document input means 1, automatic keyword extraction means 2, score calculation means 3, multiple classification means 42, classification result storage means 5, classification result display means 6, and score table storage means 8 are applied to the document to be classified. Means bearing the same numbers as in FIG. 1 have the same functions. The third invention allows a plurality of classification destinations. In step 23 of FIG. 3, the document is classified into each field whose score is at least a fixed fraction of the sum of the scores of all fields, that is, into each j-th field satisfying equation (6a).

$$W_j \ge \alpha \sum_{k} W_k, \quad 0 < \alpha < 1 \tag{6a}$$

Alternatively, the document is classified into each field whose score is at least a fixed fraction of the highest score, that is, into each j-th field satisfying equation (6b),

$$W_j \ge \beta \max_{k} (W_k), \quad 0 < \beta < 1 \tag{6b}$$

or by a combined method such as the logical OR of the two methods above. The values of α and β are set in view of the trade-off between missed classifications and classification noise and the nature of the classification structure.
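The two selection rules and their logical OR can be sketched as below. The rule based on the sum of the scores is written as "score at least α times the sum over all fields", following the description in the text; the function name, the optional-parameter interface, and the sample totals are assumptions made for illustration.

```python
def classify_multiple(totals, alpha=None, beta=None):
    """Fields selected by the sum rule (alpha), the maximum rule (beta), or their OR."""
    chosen = set()
    if alpha is not None:
        # Score is at least alpha times the sum of the scores of all fields.
        threshold = alpha * sum(totals)
        chosen |= {j for j, w in enumerate(totals) if w >= threshold}
    if beta is not None:
        # Score is at least beta times the highest score.
        threshold = beta * max(totals)
        chosen |= {j for j, w in enumerate(totals) if w >= threshold}
    return sorted(chosen)

totals = [6.0, 5.0, 1.0]  # made-up field totals
by_sum = classify_multiple(totals, alpha=0.4)
by_max = classify_multiple(totals, beta=0.9)
combined = classify_multiple(totals, alpha=0.4, beta=0.9)
```

Lower values of α or β admit more fields (fewer missed classifications, more noise) and higher values admit fewer. Note that if every total is zero, both rules as written select every field, so a practical implementation would likely need a separate check for that case.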

(Example 4) The document classification procedure using the fourth device of the present invention is described below.

First, the preparation process is described with reference to FIGS. 6 and 2. In the preparation process, as in the second invention, document input means 1, automatic keyword extraction means 2, positive/negative score table creation means 72, and score table storage means 8 are applied to the sample documents.

Next, the classification process is described with reference to FIGS. 6 and 3. In the classification process, document input means 1, automatic keyword extraction means 2, positive/negative score calculation means 3, multiple classification means 42, classification result storage means 5, classification result display means 6, and score table storage means 8 are applied to the document to be classified.

Means bearing the same numbers as in FIG. 1 have the same functions.

(Effects of the Invention) According to the present invention, documents can be classified automatically, efficiently, and effectively without manual work, which reduces time and cost.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the first invention, FIG. 2 is a flowchart of the preparation process, FIG. 3 is a flowchart of the classification process, FIG. 4 is a block diagram of the second invention, FIG. 5 is a block diagram of the third invention, and FIG. 6 is a block diagram of the fourth invention. In the figures:
1: document input means
2: automatic keyword extraction means
3: score calculation means
5: classification result storage means
6: classification result display means
8: score table storage means
41: single classification means
42: multiple classification means
71: positive score table creation means
72: positive/negative score table creation means

Claims

[Claims]

1. An automatic document classification device comprising the following (a) to (h):
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) single classification means for receiving the score of each field from the score calculation means and determining one classification destination on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it;
(h) classification result display means for receiving the classification result from the classification means and displaying it.

2. An automatic document classification device comprising the following (a) to (h):
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive/negative score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution and a negative score representing the negative contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) single classification means for receiving the score of each field from the score calculation means and determining one classification destination on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it;
(h) classification result display means for receiving the classification result from the classification means and displaying it.

3. An automatic document classification device comprising the following (a) to (h):
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) multiple classification means for receiving the score of each field from the score calculation means and determining a plurality of classification destinations on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it;
(h) classification result display means for receiving the classification result from the classification means and displaying it.

4. An automatic document classification device comprising the following (a) to (h):
(a) document input means for inputting an electronic document;
(b) automatic keyword extraction means for receiving a document from the document input means and automatically extracting the keywords in the document;
(c) positive/negative score table creation means for, when a sample document is input to the document input means, calculating, on the basis of statistical values derived from the appearance frequencies of the keywords extracted by the automatic keyword extraction means, a positive score representing the positive contribution and a negative score representing the negative contribution of each keyword to each field, and creating a score table;
(d) score table storage means for storing the score table created by the score table creation means;
(e) score calculation means for, when a document to be classified is input to the document input means, taking as input the keywords extracted by the automatic keyword extraction means, obtaining the score corresponding to each keyword by referring to the score table storage means, and calculating a score for each field of the input document;
(f) multiple classification means for receiving the score of each field from the score calculation means and determining a plurality of classification destinations on the basis of those scores;
(g) classification result storage means for receiving the classification result from the classification means and storing it;
(h) classification result display means for receiving the classification result from the classification means and displaying it.