JP2001256251A - Device and system for automatically evaluating document information

Info

Publication number: JP2001256251A
Application number: JP2000064037A
Authority: JP
Inventors: Hiroyuki Kondo; 広幸近藤
Original assignee: NEC Software Chubu Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2000-03-08
Filing date: 2000-03-08
Publication date: 2001-09-21

Abstract

PROBLEM TO BE SOLVED: To eliminate the need to manually prepare keywords that specify delivery destinations in advance when sorting complaint information, such as e-mail, and delivering it to the appropriate persons in charge. SOLUTION: E-mail is treated as natural language text, so the same method can also be used for inquiries and complaints received by voice or fax. Information provided in natural language is automatically evaluated and sorted by inferring from its content the delivery destination to which it should be sorted. For example, the system infers who should receive a complaint sent to a manufacturer's complaint handling department and sorts it accordingly. Any format is acceptable as long as the input is expressed in natural language, and titles and keywords are not restricted. The category analogy rules can be preset, but they can also be learned in advance from sample data or learned through repeated operation.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to analyzing the content of a received e-mail or similar message and delivering it to the most suitable person in charge. In particular, it relates to an automatic document information evaluation device and an automatic document information evaluation system that collate the content of a received e-mail against words, extracted in advance from sample data, that determine each delivery destination; deliver the e-mail to the destination with the highest similarity; and revise those destination-determining words as appropriate.

[0002]

[Prior Art] Manufacturers routinely receive inquiries about the functions and operation of the products and services they ship, as well as complaints about the products and services themselves. A manufacturer needs to analyze this feedback thoroughly, respond where a response is due, and draw on whatever is useful for developing its next products and services. To do so, it must first route each user's message promptly and correctly to the person best placed to handle it.

[0003] On the other hand, when such feedback reaches hundreds or thousands of messages a day, merely routing them to the appropriate persons in charge consumes an enormous number of man-hours, which makes this need hard to meet. Now that the Internet has developed and users can easily reach manufacturers by e-mail and similar means, this tendency is expected to become ever more pronounced. Moreover, the problem is not limited to manufacturers: residents frequently raise issues about municipal services, and similar problems are thought to exist throughout society.

[0004] For example, to address this problem, Japanese Patent Application Laid-Open No. 10-4428 describes automatic sorting of e-mail by providing: keyword extraction means that, when an e-mail is received, analyzes its title and syntax and extracts keywords from them; keyword setting means that sets keywords in advance for each mailbox to which mail may be sorted; and sorting means that collates the keywords from the two means and sends the mail to the mailbox whose keywords match.

[0005]

[Problems to Be Solved by the Invention] Like the e-mail system described above, the present invention is directed to processing a large volume of received mail quickly and automatically. In that e-mail system, however, keywords had to be set in advance for each mailbox to which mail is sorted. Moreover, it gave no description of how to proceed when a received e-mail is forwarded to the wrong mailbox.

[0006] The present invention addresses these problems and, in addition, treats an e-mail as natural language text, so that the same technique can also be applied to inquiries and complaints arriving by voice (converted to text by speech recognition) or fax (converted to text by character recognition). For information provided in natural language (free-form Japanese), the invention infers from the content the category into which it should be classified, assigns an index, and evaluates and classifies it automatically. For example, it infers who should receive a complaint that has arrived at a manufacturer's complaint handling department (category inference) and classifies the complaint accordingly. The input may be in any format as long as it is expressed in natural language, and there are no restrictions on titles or keywords. The category analogy rules can be set in advance, but they can also be learned beforehand from sample data or learned as operation continues. Because conventional sorting systems, like the e-mail system described above, mainly rely on keywords set in advance, this is considered a major feature of the present invention.

[0007]

[Means for Solving the Problems] The automatic document information evaluation device of the first invention of the present application comprises preprocessing, which generates analogy rules for classifying documents, and classification processing, which uses the analogy rules generated in the preprocessing to classify complaint data (complaints, inquiries, and the like) by person in charge and deliver it. The preprocessing has: an analogy rule generation sample data file that stores, for every delivery destination, complaint data whose delivery destination is already known; an analogy rule generation unit that reads the analogy rule generation sample data file and generates analogy rules, each consisting of a word or word string, for every delivery destination; and an analogy rule file that stores the analogy rules generated by the analogy rule generation unit. The classification processing has: an input unit that receives the complaint data and converts it to text format; a text file that stores the complaint data converted to text format; a natural language analysis unit that reads the complaint data stored in the text file, performs morphological analysis, and outputs the resulting words as constituent word data; a constituent word data file that stores the constituent word data; and a classification unit that applies the analogy rules in the analogy rule file for each delivery destination to the constituent word data read from the constituent word data file, calculates a similarity for each, and delivers the complaint data to the delivery destination with the highest similarity.

[0008] In the automatic document information evaluation device of the second invention of the present application, in the first invention, when the analogy rules are generated in the preprocessing, complaint data whose delivery destinations are already known and which has been sorted in advance by delivery destination is input for all delivery destinations, whereby analogy rules for determining the delivery destination are generated for all delivery destinations.

[0009] In the automatic document information evaluation device of the third invention of the present application, in the first invention, the complaint data includes any of e-mail, voice information received by telephone, voice mail, and fax information.

[0010] In the automatic document information evaluation device of the fourth invention of the present application, in the first or second invention, when the analogy rules are generated in the preprocessing and complaint data whose delivery destinations are already known and which has been sorted in advance by delivery destination is input for all delivery destinations, the data is divided into two groups: complaint data delivered to a given delivery destination and complaint data delivered to any other destination. For every word or word string in the complaint data of that delivery destination, the proportion of its occurrences across the two groups is obtained and compared with a predetermined threshold to generate the analogy rules for that delivery destination. This is repeated for all delivery destinations.

[0011] In the automatic document information evaluation device of the fifth invention of the present application, in the first invention, in the classification processing, the constituent word data, obtained by converting the complaint data to text format and decomposing it into a word sequence by morphological analysis, is compared and collated with the analogy rules of every delivery destination to calculate similarities. If the maximum of the calculated similarities is equal to or greater than a predetermined threshold, the complaint data is delivered to the delivery destination that yielded the maximum; if it is below the threshold, the complaint data is not delivered.

[0012] In the automatic document information evaluation device of the sixth invention of the present application, in the first invention, the analogy rule generation sample data file consists of complaint data or of constituent word data corresponding to complaint data.

[0013] In the automatic document information evaluation device of the seventh invention of the present application, in the first invention, complaint data that was not delivered because its similarity was below the threshold, complaint data that was delivered to the wrong destination, and the like are additionally registered in the analogy rule generation sample data file as sample data for the destination to which they should have been delivered, and the analogy rule generation unit is started so that learning takes place by revising the analogy rules.

[0014] In the automatic document information evaluation device of the eighth invention of the present application, in the first or seventh invention, the input unit is started automatically when complaint data is received. When complaint data arises that was not delivered because it fell below the threshold, or that was delivered to the wrong destination, it is automatically added to the analogy rule generation sample data file as sample data for the destination to which it should have been delivered, and the analogy rule generation unit is started automatically so that the analogy rules are revised automatically.

[0015] The automatic document information evaluation system of the ninth invention of the present application is a system that automatically delivers complaint data received by e-mail, telephone, voice mail, fax, or the like to a person in charge. It takes as input complaint data whose delivery destinations are already known and obtains analogy rules, each consisting of a word or word string, for all delivery destinations; converts newly received complaint data to text format and performs morphological analysis to create constituent word data made up of the resulting words; applies the analogy rules of each delivery destination to the constituent word data to calculate similarities; and delivers the complaint data to the delivery destination with the highest similarity.

[0016] In the automatic document information evaluation system of the tenth invention of the present application, in the ninth invention, analogy rules are generated using as sample data HTML files that contain a topic the user wishes to obtain and HTML files that do not; the latest HTML files are obtained from an HTML file search system such as a Web robot on the Internet; and HTML files containing the desired topic are retrieved on the basis of the analogy rules.

[0017]

[Embodiments of the Invention] Next, embodiments of the present invention will be described in detail with reference to the drawings.

[0018] FIG. 1 is a configuration diagram of the automatic document information evaluation device of the present invention implemented on a general-purpose computer or personal computer. The device sorts complaint data, inquiries, and the like (hereinafter referred to as complaint data) that arrive by e-mail, voice mail, telephone, fax, or similar means according to the person in charge who will handle them. The "preprocessing" on the left side of FIG. 1 shows the configuration used to create in advance, for each person in charge, the analogy rules applied when classifying complaint data. The right side shows the configuration of the "classification processing", which sorts complaint data according to the analogy rules created by the left-side configuration. The speech recognition device and character recognition device for incoming complaint data are omitted from the figure; voice and fax data are assumed to be output as text-format data by those devices.

[0019] The "preprocessing" configuration includes an analogy rule generation sample data file 105, which holds sample complaint data for which the receiving person in charge is known; an analogy rule generation unit 101, which generates analogy rules from that file; and an analogy rule file 106, which stores the generated analogy rules. The analogy rule generation sample data file 105 consists of text data corresponding to complaints or of constituent word data; in the latter case, the processing of the natural language analysis unit 103 within the analogy rule generation unit 101 (not shown there) is unnecessary.

[0020] The "classification processing" configuration includes an input unit 102, which receives complaint data 107 arriving as e-mail or in another form, converts it to text format, and outputs it to a text file 108; a natural language analysis unit 103, which reads the text file 108, decomposes it into words including nouns and verbs, and outputs the result to a constituent word data file 109; a classification unit 104, which reads the constituent word data file 109 and the analogy rule file 106 and determines the person in charge to whom the complaint data 107 is to be delivered; and a classification result 110.

[0021] The analogy rule generation unit 101 generates in advance the analogy rule file 106, which is used to classify complaint data 107 by the person in charge (category) who will handle it. Complaint data 107 consists of complaints from users, including grievances about the use of products, as well as consultations and inquiries about products, received by e-mail, voice mail, telephone, fax, or the like. The analogy rules are output for each person in charge at the delivery destination by the following method.

[0022] First, the analogy rule generation sample data 105 is prepared. For each person in charge who handles complaints, a number of samples of complaint data 107 whose delivery destination is already determined are prepared (for example, about 50 to 100 per person in charge). The analogy rule generation unit 101 applies to them the same processing as the natural language analysis unit 103, which is built into its own processing, to extract constituent words, and then generates the analogy rules. If constituent word data is supplied as the sample data, this processing is unnecessary. Hereinafter, the sample data of the person in charge for whom analogy rules are being generated is called positive example data, and the sample data of the other persons in charge is called negative example data. In other words, the positive example data of person A is negative example data from the standpoint of person B.

[0023] Next, characteristic words that are used frequently are extracted from the positive example data. Words are selected from among all the constituent words obtained by natural language analysis. As the extraction criterion, the following values are computed for every word in the positive example data, and the words whose values satisfy the respective selection criteria are extracted.

[0024]

(Equation 1)

[0025] Here,
a: the number of occurrences of a given word X in the positive example data
b: the number of occurrences of words other than X in the positive example data
d: the number of occurrences of word X in the negative example data
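The image for Equation 1 is not reproduced in this text. As a hedged reconstruction only, a form consistent with the definitions of a, b, and d above, with the verbal descriptions of recall and precision in [0026], and with the worked example in [0027] (a word occurring once in the positive example data and never in the negative example data has recall 1) would be:

```latex
\[
\text{recall}(X) = \frac{a}{a + d},
\qquad
\text{precision}(X) = \frac{a}{a + b}
\]
```

This is an assumption inferred from the surrounding text, not the published formula.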

[0026] FIG. 4 tabulates the words found in one set of positive example data: when the positive example data for a given delivery destination consists of the words A, B, ..., X, Y, it shows the occurrence frequency and related values for each. Here, recall and precision have the following meanings.
Recall: the proportion of all occurrences of the word, across positive and negative example data combined, that fall in the positive example data (the skew in usage). It is a measure of how distinctive the word is; the larger the value, the more distinctive the word.
Precision: a measure of how frequently the word occurs within the positive example data. It is used, for example, to reject a word whose recall is high but whose precision is too low.

[0027] A word is extracted only if it satisfies both of the following conditions.
recall > recall threshold
precision > (mean precision over all words in the positive example data - threshold for the difference from the mean precision)
To illustrate recall and precision with a concrete example, suppose that "word A" occurs exactly once in some positive example data and never in the negative example data. The recall of "word A" is then 1, so it exceeds the recall threshold and becomes a candidate for extraction; yet it would be a problem to select a rare word that is used in only one place. If we suppose that
mean precision = 0.03
threshold for the difference from the mean precision = 0.01
then only words with precision > 0.02 are selected. Therefore, unless the positive example data contains extremely few words, "word A" will not be extracted.
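The two extraction conditions can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_words` and its parameters are hypothetical, recall is assumed to be a / (a + d) and precision a / (a + b), and the mean precision is assumed to be taken over distinct words.

```python
from collections import Counter


def select_words(positive_words, negative_words, recall_threshold, precision_margin):
    """Return {word: (recall, precision)} for words passing both criteria.

    positive_words / negative_words are flat lists of constituent words.
    """
    pos = Counter(positive_words)
    neg = Counter(negative_words)
    total_pos = sum(pos.values())  # equals a + b for every word
    if total_pos == 0:
        return {}

    stats = {}
    for word, a in pos.items():
        d = neg.get(word, 0)
        recall = a / (a + d)       # assumed form of recall
        precision = a / total_pos  # assumed form of precision
        stats[word] = (recall, precision)

    # Assumption: the mean is taken over distinct words in the positive data.
    mean_precision = sum(p for _, p in stats.values()) / len(stats)
    floor = mean_precision - precision_margin
    return {
        word: rp
        for word, rp in stats.items()
        if rp[0] > recall_threshold and rp[1] > floor
    }


positive = ["rule"] * 6 + ["complaint"] * 3 + ["rare"]
negative = ["complaint"] * 3 + ["other"] * 5
selected = select_words(positive, negative, recall_threshold=0.6, precision_margin=0.1)
print(sorted(selected))
```

In this toy data, "rare" has recall 1 but is rejected by the precision condition, which mirrors the "word A" example above; "complaint" is rejected because half of its occurrences are in the negative example data.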

[0028] In this way, the recall and precision of each word are checked against the two criteria above to carry out "word extraction". The extracted words are then used to generate the analogy rules for the complaints to be delivered to each person in charge. The following information is output as analogy rules. When words extracted as analogy rules (word information) appear consecutively in the positive example data, the word string formed by joining them becomes an analogy rule; when they do not appear consecutively, each single word becomes an analogy rule.
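The step of joining consecutive extracted words into word strings can be sketched like this. The function name `build_rules` is hypothetical, and the sketch assumes that each maximal run of consecutive extracted words becomes one rule; the text does not say whether the individual words of a run are also kept as separate single-word rules.

```python
def build_rules(documents, selected):
    """Collect analogy rules as word tuples.

    documents: list of word lists (positive example data, one list per sample).
    selected:  set of words that passed the extraction criteria.
    """
    rules = set()
    for words in documents:
        run = []
        for word in words:
            if word in selected:
                run.append(word)  # extend the current run of extracted words
            else:
                if run:
                    rules.add(tuple(run))
                run = []
        if run:  # flush a run that reaches the end of the sample
            rules.add(tuple(run))
    return rules


docs = [["the", "analogy", "rule", "is", "applied", "to", "complaint"]]
rules = build_rules(docs, {"analogy", "rule", "complaint"})
print(sorted(rules))
```

Here "analogy" and "rule" are adjacent, so they form the two-word rule ("analogy", "rule"), while "complaint" stands alone as a single-word rule.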

[0029] Next, a weight is set for each selected analogy rule.

[0030] The recall of the analogy rule (its distinctiveness) is obtained by the following formula and used as the weight of the analogy rule.

[0031]

(Equation 2)

[0032] Here,
a: the number of occurrences of analogy rule x in the positive example data
d: the number of occurrences of analogy rule x in the negative example data

Once the analogy rules have been set up by the above processing, complaint data 107 can actually be delivered, and determination of the person in charge of each item of complaint data 107 begins. If the input medium of the complaint data 107 is voice or fax, speech recognition processing or text recognition processing is applied and the result is output to the text file 108 as text data. Electronic text such as e-mail is accepted as is and output to the text file 108 as text data.

[0033] The text file 108 output by the input unit 102 is read by the natural language analysis unit 103, which performs morphological analysis, lists the constituent words, and outputs them to the constituent word data file 109. At this stage only nouns (including proper nouns) and verbs are extracted, and verbs are normalized to a single conjugated form.

[0034] The similarity between the constituent words and each existing category in the analogy rule file 106 is calculated, and the complaint is assigned to the category with the highest similarity. Various methods of calculating similarity are conceivable; for example, using the analogy rule file created by the analogy rule generation unit 101, it can be done as follows.
(1) Extract from the text the words or word strings that appear in the analogy rules. If a word string pattern stored in an analogy rule exists in the text, that word string pattern is extracted.
(2) Calculate the similarity. For example, the value of the following formula is taken as the similarity.

[0035]

(Equation 3)

[0036] Here,
k: a coefficient
n: the number of analogy rule words or word string patterns present in the text
Ri: the weight of the analogy rule
Ni: the number of words making up the analogy rule
W: the total number of words in the text data
(3) Determine the destination person in charge.
The similarity is calculated for every category. If the largest value exceeds the threshold, the text data is deemed to belong to that category, and the original complaint data 105 is delivered to the person in charge of that category as the classification result 110. If the largest value does not exceed the threshold, the complaint is delivered to the person in charge of handling complaints that could not be classified.

[0037] The delivery destination is determined as described above. However, when the similarity calculated for some complaint data 107 does not exceed the threshold, or when the data is not delivered to the correct destination, a person checks the complaint data 107 and delivers it to the correct destination. In that case, the complaint data is added, as new analogy rule generation sample data 105, to the existing sample data of the correct delivery destination. The analogy rule generation sample data files 105 for the other delivery destinations are left unchanged, the analogy rule generation unit 101 is started, and the analogy rule file 106 is updated. The analogy rules of the destination that received the additional sample data may naturally be updated because positive example data was added; the analogy rules of the other destinations may also be updated because, conversely, negative example data was added for them. In this way the analogy rules grow progressively better.
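The feedback step can be sketched with a small sample store. The class `SampleStore` and its methods are hypothetical names used for illustration; the sketch only shows why one correction affects every destination: the added sample is positive example data for the correct destination and negative example data for all others.

```python
from collections import defaultdict


class SampleStore:
    """Holds analogy rule generation samples per delivery destination."""

    def __init__(self):
        self.samples = defaultdict(list)  # destination -> list of word lists

    def add_correction(self, correct_destination, words):
        """Register a misrouted or undelivered complaint under its correct destination."""
        self.samples[correct_destination].append(list(words))

    def positive_negative(self, destination):
        """Return flat word lists: (positive example data, negative example data)."""
        positive = [w for doc in self.samples.get(destination, []) for w in doc]
        negative = [
            w
            for dest, docs in self.samples.items()
            if dest != destination
            for doc in docs
            for w in doc
        ]
        return positive, negative


store = SampleStore()
store.add_correction("A", ["printer", "jam"])
store.add_correction("B", ["invoice", "late"])
pos_a, neg_a = store.positive_negative("A")
pos_b, neg_b = store.positive_negative("B")
print(pos_a, neg_a)
```

After each correction (or after a batch of them), the rules for every destination would be regenerated from these positive and negative word lists.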

[0038] Furthermore, with a view to automating the above processing, the input unit 102 can be started automatically whenever new complaint data 107 is received. In addition, as described above, complaint data 107 that could not be delivered correctly can be added automatically as new sample data to the analogy rule generation sample data file 105, and the analogy rule generation unit 101 started, each time such complaint data occurs, each time a certain number of such items has accumulated, or each time a certain period has elapsed. This makes automation possible, including the learning function.

[0039] Next, the operation of the present invention will be described using a concrete example document.

[0040] The automatic document information evaluation device and automatic document information evaluation system of the present invention consist of the two processes described below. In the analogy rule generation unit 101, which corresponds to the "preprocessing", analogy rules for delivering complaints to each person in charge are created. To create them, analogy rule generation sample data files 105, containing complaints whose receiving person in charge is determined in advance, are prepared with multiple samples per person in charge, in roughly equal numbers, and the processing described above is carried out to create the analogy rules for each person in charge.

[0041] For example, suppose that an analogy rule is to be generated for person in charge 1. The example of claim data 107 shown in FIG. 2(a) is sample data that has already been assigned to person in charge 1. Suppose that the words "analogy", "rule", "claim", and "person in charge" are extracted from this sample data, and that the analogy rule of FIG. 2(b), which satisfies the recall and precision criteria described above, is created. (In the example analogy rule of FIG. 2(b), the weights are values set separately according to the processing described above.) This completes the "preprocessing".
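The selection of characteristic words for one person in charge can be sketched roughly as below. This is not the procedure of the specification: the recall and precision criteria and the weight setting are defined earlier in the description and are not reproduced here. The sketch assumes instead a simple ratio test (the share of documents containing a word that belong to the target destination), and the function name `generate_rule`, the parameter `min_ratio`, and the use of the ratio itself as the weight are all illustrative assumptions.

```python
from collections import Counter


def generate_rule(samples, destination, min_ratio=0.8):
    """Build a {word: weight} rule for one destination.

    samples: list of (destination, list_of_words) pairs, i.e. sample data
    whose distribution destination is already known.
    """
    positive = Counter()   # documents of the target destination containing the word
    negative = Counter()   # documents of all other destinations containing the word
    for dest, words in samples:
        target = positive if dest == destination else negative
        target.update(set(words))          # count each word once per document

    rule = {}
    for word, pos_count in positive.items():
        ratio = pos_count / (pos_count + negative[word])
        if ratio >= min_ratio:             # assumed threshold test
            rule[word] = ratio             # assumed weight: the ratio itself
    return rule
```

In this sketch a word such as "claim", which appears in the sample data of every person in charge, falls below the ratio threshold and is left out of the rule, whereas words seen only in one person's samples are kept.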

[0042] Next, the "classification processing" for claim data 107 that actually arrives can begin. Claims and questions received by telephone, fax, e-mail, and so on are converted into a text file by the input unit 102. Claims received by telephone undergo speech recognition processing, and claims received by fax undergo character recognition processing. Data that is unnecessary for classification, such as mail headers, is removed.
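For the e-mail case, the removal of headers by the input unit 102 might look like the following sketch, which uses Python's standard `email` package. The function name `mail_to_text` is hypothetical, decoding of transfer encodings is omitted for brevity, and the speech recognition and character recognition paths are outside the scope of the example.

```python
from email import message_from_string


def mail_to_text(raw_mail: str) -> str:
    """Return only the body text of a raw e-mail, dropping all headers."""
    msg = message_from_string(raw_mail)
    if msg.is_multipart():
        # Keep only the plain-text parts of a multipart message.
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return body.strip()
```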

[0043] Next, the natural language analysis unit 103 applies morphological analysis to the text file 108, extracts the nouns and verbs that make up the text file 108, and generates the constituent word data file 109. At this point the inflected forms of verbs are normalized. For example, when the claim of the claim data 105 illustrated in FIG. 3(a) arrives, the constituent word data shown in FIG. 3(b) is output. Next, the classification unit 104 compares this data with the analogy rule file 106 of each person in charge, created by the analogy rule generation unit 101, and calculates the similarity to each analogy rule. To examine the similarity of this claim to person in charge 1, the analogy rule created for person in charge 1 in FIG. 2(b) is used to calculate the similarity of the constituent word data of FIG. 3(b). The rules (words or word strings) contained in the analogy rule of FIG. 2(b) are the words used to calculate the similarity. For the words extracted in this way, the similarity is calculated with the similarity formula described above, using their frequency of appearance in FIG. 3(b) and the weights in the analogy rule of FIG. 2(b). FIG. 3(c) shows the extracted words.
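The calculation for one person in charge can be illustrated as follows. The similarity formula itself is defined earlier in the description and is not reproduced in this passage, so the sketch assumes the simplest form consistent with the text: the sum, over the rule words found in the claim, of appearance frequency multiplied by rule weight. The function name and the weights in the usage note are hypothetical and are not the values of FIG. 2(b).

```python
from collections import Counter


def similarity(constituent_words, rule):
    """Score constituent word data against one destination's analogy rule.

    constituent_words: list of words produced by morphological analysis.
    rule: {word: weight} analogy rule for a single person in charge.
    Returns (score, matched) where matched maps each rule word found in the
    claim to its frequency of appearance.
    """
    freq = Counter(constituent_words)
    matched = {word: freq[word] for word in rule if freq[word] > 0}
    # Assumed formula: weighted sum of appearance frequencies.
    score = sum(rule[word] * count for word, count in matched.items())
    return score, matched
```

With an assumed rule `{"analogy": 2.0, "rule": 1.0, "claim": 0.5, "person": 1.5}` and the words `["claim", "analogy", "rule", "arrive", "claim"]`, the matched words are "analogy", "rule", and "claim" (twice), and the score is 2.0 + 1.0 + 0.5 × 2 = 4.0.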

[0044] For the claim of FIG. 3(a), the similarity is calculated using the analogy rules of all persons in charge. If the largest of these values is greater than a separately determined threshold, the claim is distributed to the person in charge whose analogy rule produced that similarity. If the maximum similarity is smaller than the threshold, the claim is distributed to the person in charge of handling claims that could not be classified.
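The distribution decision reduces to taking the maximum similarity and comparing it with the threshold, as in the sketch below. The function name `route` and the fallback label `"unclassified"` are hypothetical. The description does not say what happens when the maximum equals the threshold; the sketch follows claim 5, which distributes when the value is equal to or greater than the threshold.

```python
def route(scores, threshold, fallback="unclassified"):
    """Pick the distribution destination from per-destination similarities.

    scores: {destination: similarity} for all persons in charge.
    Returns the best destination, or the fallback handler when the best
    similarity is below the threshold (or when there are no scores at all).
    """
    if not scores:
        return fallback
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else fallback
```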

[0045] The analogy rules are rebuilt as appropriate at certain times (for example, when an important claim is received whose calculated similarity is low) or when a claim has been distributed to the wrong person in charge. In other words, the rules used for analogy can be learned. For example, when the distribution destination was wrong, the claim data 107 is likewise returned to the person in charge of handling claims that could not be classified, and in that case the distribution destination is decided by a person. Here too, the data can be added as sample data for the person in charge who should have received it, and the analogy rules can be revised again by the analogy rule generation unit 101. As described above, for claim data 107 whose distribution destination cannot be determined because the similarity is at or below the threshold, and for claim data 107 that was distributed incorrectly, a person must select the correct distribution destination. Such claim data 107 can then be added to the analogy rule generation sample data file 105 for the correct distribution destination so that the analogy rules are revised again. Furthermore, this processing can be executed automatically, so that progressively better analogy rules are maintained at all times.

[0046] Next, a second embodiment of the present invention will be described.

[0047] The HTML files of the hypertext system provided by the Web servers on the Internet can be collected by a Web robot, which has a function for collecting HTML files. A Web robot moves from one WWW server to the next, indiscriminately collecting and accumulating the HTML files managed by many WWW servers so that they can be searched by, for example, an HTML file search system. Because many of the Web pages held by WWW servers are updated daily, the Web robot periodically revisits the WWW servers to detect updated Web pages.

[0048] Thereafter, when the URLs of HTML files for newly added or updated information collected by the Web robot are received, the receiving system accesses the corresponding HTML file using the URL and obtains the document. Applying the same operations to this document as to the claim data 107 described earlier makes it possible to select the necessary information. As a supplementary note on the sample data in this embodiment, several dozen HTML files of Web pages that cover the topic the user wants are supplied as positive example data. Likewise, several dozen HTML files that do not cover that topic are supplied as negative example data. In this case, however, it is desirable to choose negative examples whose topics are close to those of the positive examples rather than entirely unrelated to them.

[0049] For example, if HTML files about singer A are wanted, HTML files covering singer A serve as positive example data, while pages covering singer B, singer C, and so on are chosen as negative example data. Analogy rules are generated from this sample data. For content on topics other than the desired singer A, the distribution destination can be set to something like a trash bin that discards it.
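Before an HTML file can be handled like claim data, its markup has to be reduced to plain text. The specification does not describe this step, so the following is only one possible sketch using Python's standard `html.parser` module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, ignoring scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting text would then pass through morphological analysis and similarity calculation in the same way as a text file 108, with the trash bin playing the role of the destination for pages that do not match the desired topic.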

[0050]

[Effects of the Invention] The first effect is that a person does not need to supply, in advance, keywords for determining the distribution destination. Instead, claims and Q&A mail distributed in the past are prepared for each distribution destination as sample data, words that characterize each destination are selected from that data automatically, and these words become the analogy rules for selecting the destination. For newly received mail, the content of the claim is classified automatically according to the analogy rules. If the address of the person in charge is assigned to each category, a claim about a vacuum cleaner, for example, goes directly to the person in charge of vacuum cleaners, which saves labor in departments that receive large volumes of information.

[0051] The second effect is that, when the similarity is below the threshold and no distribution destination can be determined, or when the wrong distribution destination has been selected, the claim data can be input as sample data for the correct distribution destination, relearning can be performed, and the existing analogy rules can be revised. This relearning can also be automated.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the present invention.

FIG. 2 is a sample claim used to explain the preprocessing in the embodiment of the present invention.

FIG. 3 is an example of a claim to be distributed, used to explain the classification processing in the embodiment of the present invention.

FIG. 4 is an example of sample data decomposed into words in the preprocessing of the embodiment of the present invention.

[Explanation of symbols]

101 Analogy rule generation unit
102 Input unit
103 Natural language analysis unit
104 Classification unit
105 Sample data for analogy rule generation
106 Analogy rule file
107 Claim data
108 Text file
109 Constituent word data file
110 Classification result

Claims

[Claims]

1. An automatic document information evaluation apparatus comprising preprocessing that generates analogy rules for classifying documents, and classification processing that classifies claim data, including complaints and inquiries, by person in charge and distributes it using the analogy rules generated in the preprocessing, wherein the preprocessing includes an analogy rule generation sample data file that stores, for all distribution destinations, claim data whose distribution destination is already known, an analogy rule generation unit that takes the analogy rule generation sample data file as input and generates, for each distribution destination, an analogy rule consisting of words or word strings, and an analogy rule file that stores the analogy rules generated by the analogy rule generation unit,
and wherein the classification processing includes an input unit that inputs the claim data and converts it into text format, a text file that stores the claim data converted into text format, a natural language analysis unit that inputs the claim data converted into the text file, performs morphological analysis, and outputs constituent word data composed of the decomposed words, a constituent word data file that stores the constituent word data, and a classification unit that applies the analogy rule of each distribution destination from the analogy rule file to the constituent word data input from the constituent word data file, calculates a similarity, and distributes the claim data to the distribution destination having the highest similarity.

2. The automatic document information evaluation apparatus according to claim 1, wherein, when generating analogy rules in the preprocessing, claim data whose distribution destination is already known for all distribution destinations and which has been sorted in advance by distribution destination is input, and analogy rules for determining the distribution destination are generated for all distribution destinations.

3. The automatic document information evaluation apparatus according to claim 1, wherein the claim data includes any of electronic mail, voice information by telephone, voice mail, and fax information.

4. The automatic document information evaluation apparatus according to claim 1, wherein, when generating analogy rules in the preprocessing, claim data whose distribution destination is already known for all distribution destinations and which has been sorted in advance by distribution destination is input; the claim data is divided into claim data distributed to a given distribution destination and claim data distributed to other destinations; for every word or word string contained in the claim data of that destination, the ratio of its appearance frequencies in the two groups is calculated; the analogy rule for that destination is generated by comparing the ratio with a predetermined threshold; and this is carried out for all distribution destinations.

5. The automatic document information evaluation apparatus according to claim 1, wherein, in the classification processing, the claim data is converted into text-format data, the constituent word data obtained by decomposing the text data into a word sequence through morphological analysis is compared with the analogy rule of each distribution destination to calculate a similarity, this is carried out for all distribution destinations, and the claim data is distributed to the distribution destination that yielded the maximum value when that maximum is equal to or greater than a predetermined threshold and is not distributed when it is less than the threshold.

6. The automatic document information evaluation apparatus according to claim 1, wherein the analogy rule generation sample data file includes claim data or constituent word data corresponding to the claim data.

7. The automatic document information evaluation apparatus according to claim 1, wherein, in distributing claim data, claim data that was not distributed because it fell below the threshold, or claim data that was distributed to an incorrect distribution destination, is additionally registered in the analogy rule generation sample data file as sample data of the distribution destination to which it should have been distributed, and learning is performed by starting the analogy rule generation unit and revising the analogy rules.

8. The automatic document information evaluation apparatus according to claim 1 or 7, wherein the input unit is started automatically when claim data is received, and, when claim data that was not distributed because it fell below the threshold or claim data that was distributed to an incorrect distribution destination occurs, it is automatically additionally registered in the analogy rule generation sample data file as sample data of the distribution destination to which it should have been distributed, and the analogy rule generation unit is started automatically so that the analogy rules are revised automatically.

9. An automatic document information evaluation system that automatically distributes claim data received by e-mail, telephone, voice mail, or fax to the persons in charge, wherein claim data whose distribution destination is already known is input and analogy rules composed of words or word strings are obtained for all distribution destinations, newly received claim data is converted into text format and subjected to morphological analysis to create constituent word data composed of the decomposed words, the analogy rule of each distribution destination is applied to the constituent word data to calculate a similarity, and the claim data is distributed to the distribution destination having the maximum similarity.

10. The automatic document information evaluation system according to claim 9, wherein analogy rules are generated using, as sample data, HTML files that cover a topic whose acquisition is desired and HTML files that do not cover that topic, the latest HTML files are obtained from an HTML file search system such as a Web robot on the Internet, and HTML files having the desired topic are retrieved based on the analogy rules.