JPH11250072A

JPH11250072A - Information sorting method, device therefor and storage medium stored with information sorting program

Info

Publication number: JPH11250072A
Application number: JP10045770A
Authority: JP
Inventors: Tatsuya Muramoto; 達也村本; Seiji Washisaki; 誠司鷲崎
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-02-26
Filing date: 1998-02-26
Publication date: 1999-09-17

Abstract

PROBLEM TO BE SOLVED: To resolve the poor classification accuracy caused by ambiguous keyword extraction, and the difficulty of real-time processing caused by long learning times, by associating words extracted from the classification target with an existing hierarchical knowledge system. SOLUTION: A word extracting part 2 performs morphological analysis on the classification target information and on the related reference information obtained by a reference information acquiring part 1, segmenting the text into words and assigning a part of speech to each word. Among the tagged words, nouns and adjectival verbs are extracted, their occurrence frequencies are obtained, and the words with the highest occurrence frequency are passed to a retrieving part 3. The part 3 searches the hierarchical knowledge system 4 with the words extracted by the part 2 and associates classification items with each word to obtain classification candidates. A classification destination deciding part 5 performs a calculation using the word occurrence frequencies obtained by the part 2 and the classification item frequencies obtained by the part 3, sorts the resulting values, and decides the top-ranked item as the classification destination item.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an information classification method and apparatus, and to a storage medium storing an information classification program. More particularly, it relates to a method, apparatus, and storage medium that analyze the frequency of words in a piece of information and associate those words with a hierarchical knowledge system, thereby classifying the information into an appropriate classification destination among classification items organized in advance.

[0002]

[Description of the Related Art] One conventional information classification technique decomposes the text in a piece of information into words by morphological analysis or a similar technique, extracts from those words keywords that were prepared in advance to characterize the information, and classifies the information into the classification destination corresponding to those keywords. An example is the automatic classification function of the e-mail organizing software "Visual Mail".

[0003] In another classification method, features are learned by using previously classified information as the correct answers, and the classification destination is then determined from the features of the information to be classified.

[0004]

[Problems to Be Solved by the Invention] However, with the above method of classifying by keywords prepared in advance, it is difficult to extract from the information appropriate keywords that characterize it, so classification accuracy is poor. Furthermore, the method of determining the classification destination by learning features requires a long learning time, which makes it difficult to apply to systems that require real-time processing.

[0005] Thus, in the conventional methods described above, classification accuracy can deteriorate because of the ambiguity of the keywords taken from the information, and the time needed to learn the features of previously classified information is an obstacle to real-time processing. The present invention has been made in view of these points. Its object is to provide an information classification method and apparatus, and a storage medium storing an information classification program, that extract words from the information to be classified and from the information it refers to, and associate the extracted words with an existing hierarchical knowledge system, thereby solving the conventional problems of poor classification accuracy caused by ambiguous keyword extraction and of real-time processing hindered by long learning times.

[0006]

[Means for Solving the Problems] FIG. 1 is a diagram for explaining the principle of the present invention. The present invention (claim 1) is an information classification method for classifying information to be classified into an appropriate classification destination. The method acquires the reference information referred to by the classification target information (step 1), extracts words useful for classification from the classification target information and the reference information (step 2), associates the extracted words with classification items of a hierarchical knowledge system (step 3), and determines a classification destination from among the associated classification items and classifies the classification target information (step 4).

[0007] In the present invention (claim 2), when the reference information is acquired, the document of the classification target information is analyzed to obtain structure information, and access is made based on the structure information to acquire reference information including link information and related information. In the present invention (claim 3), when words useful for classification are extracted, the text in the classification target information and the reference information is morphologically analyzed, nouns and adjectival verbs are extracted from among the parts of speech of the words segmented by the morphological analysis, the extracted words are sorted in descending order of appearance frequency, and the words with the highest appearance frequency are extracted.

[0008] In the present invention (claim 4), when the classification destination is determined, the sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated with them through the hierarchical knowledge system is taken, and the classification item with the largest sum of products is determined as the classification destination. FIG. 2 is a diagram of the principle configuration of the present invention. The present invention (claim 5) is an information classification apparatus for classifying information to be classified into an appropriate classification destination, comprising: reference information acquisition means 1 for acquiring the reference information referred to by the classification target information; word extraction means 2 for extracting words useful for classification from the classification target information and the reference information; classification item association means 3 for associating the words extracted by the word extraction means 2 with classification items of a hierarchical knowledge system 4; and classification destination determination means 5 for determining a classification destination from among the associated classification items.

[0009] In the present invention (claim 6), the reference information acquisition means 1 includes means for analyzing the document of the classification target information to obtain structure information, and means for accessing based on the structure information to acquire reference information including link information and related information. In the present invention (claim 7), the word extraction means 2 includes means for morphologically analyzing the text in the classification target information and the reference information, and means for extracting nouns and adjectival verbs from among the parts of speech of the words segmented by the morphological analysis, sorting them in descending order of appearance frequency, and extracting the words with the highest appearance frequency.

[0010] In the present invention (claim 8), the classification destination determination means 5 includes means for taking the sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated with them through the hierarchical knowledge system, and determining the classification item with the largest sum of products as the classification destination. The present invention (claim 9) is a storage medium storing an information classification program for classifying information to be classified into an appropriate classification destination, the program comprising: a reference information acquisition process for acquiring the reference information referred to by the classification target information; a word extraction process for extracting words useful for classification from the classification target information and the reference information; a classification item association process for associating the words extracted by the word extraction process with classification items of a hierarchical knowledge system; and a classification destination determination process for determining a classification destination from among the associated classification items.

[0011] In the present invention (claim 10), the reference information acquisition process includes a process of analyzing the document of the classification target information to obtain structure information, and a process of accessing based on the structure information to acquire reference information including link information and related information. In the present invention (claim 11), the word extraction process includes a process of morphologically analyzing the text in the classification target information and the reference information, and a process of extracting nouns and adjectival verbs from among the parts of speech of the words segmented by the morphological analysis, sorting them in descending order of appearance frequency, and extracting the words with the highest appearance frequency.

[0012] In the present invention (claim 12), the classification destination determination process includes a process of taking the sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated with them through the hierarchical knowledge system, and determining the classification item with the largest sum of products as the classification destination. As described above, the present invention extracts words not only from the classification target information but also from the reference information that it refers to. This makes it possible to extract words that are more useful for classification, which enables accurate classification.

[0013] In addition, because the extracted words are associated with an existing hierarchical knowledge system, accurate classification is possible even when no specific keyword is extracted from the classification target information, and learning before classification becomes unnecessary.

[0014]

[Embodiments of the Invention] FIG. 3 shows the configuration of the classification apparatus of the present invention. The classification apparatus shown in the figure comprises a reference information acquisition unit 1 that acquires the reference information referred to by the classification target information, a word extraction unit 2 that extracts words from the classification target information and the reference information, a search unit 3 that associates the words extracted by the word extraction unit 2 with the hierarchical knowledge system, an existing hierarchical knowledge system 4, and a classification destination determination unit 5 that determines the classification destination from among the candidates obtained by the search unit 3.

[0015] The reference information acquisition unit 1 analyzes the input classification target information and, based on its structure information, acquires reference information (link information), that is, related information and information for supplementary explanation. It then transfers the classification target information and the reference information to the word extraction unit 2. The word extraction unit 2 performs morphological analysis on the received classification target information and reference information, segmenting the text into words and assigning a part of speech to each word. From the tagged words it extracts the nouns and adjectival verbs, obtains their appearance frequencies, and transfers the words with the highest appearance frequency to the search unit 3.

[0016] The search unit 3 searches the hierarchical knowledge system 4 with the words extracted by the word extraction unit 2, associates each word with the corresponding classification items, and thereby obtains classification candidates. The classification destination determination unit 5 performs a calculation using the word appearance frequencies obtained by the word extraction unit 2 and the classification item frequencies obtained by the search unit 3, sorts the resulting values, and determines the top-ranked item as the classification destination item.

[0017]

[Examples] Embodiments of the present invention are described below with reference to the drawings. The following describes the process of classifying the information of a web page written in HTML on the Internet, taking as an example the case where an Internet directory-type search engine, typified by Yahoo Japan (http://www.yahoo.co.jp/) and NTT DIRECTORY (http://navi.ntt.co.jp/), is used as the existing hierarchical knowledge system 4. In this case, the classification destinations are the classification items of the directory-type search engine.

[0018] The reference information acquisition unit 1 analyzes the document of the classification target information and acquires reference information based on structure information called tags. FIG. 4 shows an example of an HTML document in one embodiment of the present invention. When a web page on the Internet is written in the language called HTML (Hyper Text Markup Language), as shown in the figure, the unit looks for the tags <a href="URL"> ... </a> and <frame src="URL"> and acquires the reference information by accessing the URL (Uniform Resource Locator) written in them. In the example of FIG. 4, it accesses http://aaa.bbb.com/ and http://ccc.ddd.com/ to acquire reference information. This reference information, also called link information, is likely to be information related to the target information or information that supplements it. By also taking the reference information into account, accurate classification becomes possible even when the classification target information itself does not contain enough text.
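The tag-based URL collection described in paragraph [0018] might be sketched as follows. This is only an illustration using Python's standard `html.parser` module, not the implementation used in the patent; the names `LinkCollector` and `extract_reference_urls` and the sample markup are assumptions made for the example, and actually fetching each URL is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects URLs from <a href="..."> and <frame src="..."> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        wanted = {"a": "href", "frame": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Drop padding such as href=" URL ".
                self.urls.append(value.strip())


def extract_reference_urls(html_text):
    """Return the reference URLs found in an HTML document, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


# Hypothetical stand-in for the HTML document of FIG. 4.
sample = """
<html><body>
<a href=" http://aaa.bbb.com/ ">related page</a>
<frame src="http://ccc.ddd.com/">
</body></html>
"""
print(extract_reference_urls(sample))
```

Each returned URL would then be fetched, and its text handed to the word extraction step together with the text of the classification target itself.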

[0019] The word extraction unit 2 first runs the text of the classification target information and the reference information through an existing morphological analyzer such as ChaSen (http://cactus.aist-nara.ac.jp/lab/nlt/chasen.html) to split it into words. As a result, the text in the classification target information and the reference information is decomposed into words, and the part of speech of each word is determined. FIG. 5 shows an example of the result of decomposing the classification target information and the reference information into words. From the decomposed words, nouns and adjectival verbs are extracted and sorted by appearance frequency, and those with a high appearance frequency are adopted. FIG. 6 shows an example of extracted words and their appearance counts in one embodiment of the present invention. The example in the figure shows the extracted words sorted by appearance frequency; here the word "特許庁" (Patent Office) has the highest appearance frequency.
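The part-of-speech filtering and frequency sorting of paragraph [0019] can be illustrated with a short sketch. It assumes the morphological analyzer has already produced (surface form, part of speech) pairs; the token list, the tag names `noun` and `adjectival_verb`, and the function `top_words` are made up for the example and do not reflect ChaSen's real output format.

```python
from collections import Counter

# Hypothetical analyzer output: (surface form, part of speech) pairs.
tokens = [
    ("特許庁", "noun"), ("の", "particle"), ("発明", "noun"),
    ("特許庁", "noun"), ("は", "particle"), ("便利", "adjectival_verb"),
    ("特許庁", "noun"), ("発明", "noun"), ("する", "verb"),
]

# Only these parts of speech are kept as classification clues.
KEPT_POS = {"noun", "adjectival_verb"}


def top_words(tokens, limit=3):
    """Count kept words and return them in descending order of frequency."""
    counts = Counter(surface for surface, pos in tokens if pos in KEPT_POS)
    return counts.most_common(limit)


print(top_words(tokens))
```

The `limit` parameter stands in for however many top-ranked words are passed on to the search step; the source does not state a specific number.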

[0020] The search unit 3 associates each word extracted by the word extraction unit 2 with classification items of the hierarchical knowledge system 4. Specifically, it uses a module such as the directory search service Yahoo Japan (http://www.yahoo.co.jp/) which, when a word is entered as a search term, outputs the entries stored in the hierarchical knowledge system 4 that contain the search term together with the classification item of the hierarchical knowledge system 4 under which each entry is stored. The classification items and their frequencies are obtained from these search results. The frequency of a classification item is the number of entries in the search results that fall under that classification item.

[0021] FIG. 7 is an example of a search result in one embodiment of the present invention. In the figure, each "Title n" is an entry containing the search term, and each "Genre ..." is the classification item of the hierarchical knowledge system 4 under which that entry is stored. In this example, for Genre: [Hobby / Life]-[Hobby]-[Other]-[Invention]-[]-[], Titles 2, 7, 9, and 10 fall under it, so its frequency is 4. In this way, candidates for the classification destination are obtained by associating words with the hierarchical knowledge system 4.
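Counting classification item frequencies from a search result, as in the FIG. 7 example, could look like the sketch below. The result list is fabricated for illustration: only the fact that Titles 2, 7, 9, and 10 fall under [Hobby / Life]-[Hobby]-[Other]-[Invention] comes from the text, and the second genre path is a placeholder.

```python
from collections import Counter

INVENTION = ("Hobby / Life", "Hobby", "Other", "Invention")
PLACEHOLDER = ("Placeholder genre",)  # stands in for all other genres

# Hypothetical search results for one search term: (title, genre path).
results = [
    (f"Title {n}", INVENTION if n in {2, 7, 9, 10} else PLACEHOLDER)
    for n in range(1, 11)
]


def category_frequencies(results):
    """Count how many result entries fall under each classification item."""
    return Counter(path for _title, path in results)


freq = category_frequencies(results)
print(freq[INVENTION])
```

Representing a genre as a tuple of path components keeps it hashable, so it can serve directly as a counter key.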

[0022] In the classification destination determination unit 5, let $Fw_{i}$ be the appearance frequency of word i obtained by the word extraction unit 2, and let $Fc_{ij}$ be the frequency of classification item j obtained by the search unit 3 when word i is used as the search term. The value

[0023]

(Equation 1) $$\sum_{i} Fw_{i} \times Fc_{ij}$$

[0024] is calculated for each classification item j, the values are sorted, and the top-ranked item is adopted. FIG. 8 shows an example of classification items and sorting results in one embodiment of the present invention. In this example, [Hobby / Life]-[Hobby]-[Other]-[Invention] is determined as the classification destination. Although the above embodiment has been described in terms of the components of FIG. 3, the invention is not limited to this example. The components of FIG. 3 can be built as a program and stored on a disk device connected to the computer used as the classification apparatus, or on a portable storage medium such as a floppy disk or CD-ROM, and the invention can then be easily realized by installing the program when it is to be carried out.
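The sum-of-products scoring used to pick the classification destination can be sketched as follows, under the assumption that word frequencies and per-word classification item frequencies are already available as dictionaries. The function name `score_categories` and all sample numbers are illustrative and do not come from the figures.

```python
def score_categories(word_freq, category_freq):
    """Rank classification items by the sum over words of Fw_i * Fc_ij.

    word_freq:     {word: appearance frequency Fw_i}
    category_freq: {word: {category: frequency Fc_ij}}
    """
    scores = {}
    for word, fw in word_freq.items():
        for category, fc in category_freq.get(word, {}).items():
            scores[category] = scores.get(category, 0) + fw * fc
    # Highest score first; the first entry is the classification destination.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up frequencies for two words and three classification items.
word_freq = {"w1": 5, "w2": 3}
category_freq = {
    "w1": {"A": 4, "B": 1},
    "w2": {"A": 2, "C": 6},
}
ranking = score_categories(word_freq, category_freq)
print(ranking)
```

With these numbers, item A scores 5 × 4 + 3 × 2 = 26, item C scores 3 × 6 = 18, and item B scores 5 × 1 = 5, so A would be chosen.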

[0025] The present invention is not limited to the above embodiment and can be modified and applied in various ways within the scope of the claims.

[0026]

[Effects of the Invention] As described above, according to the present invention, words are extracted from the classification target information and from the information it refers to and are associated with a hierarchical knowledge system, which makes it possible to classify the information without prior learning. Furthermore, accurate classification is possible even when no specific keyword is extracted.

[Brief description of the drawings]

FIG. 1 is a diagram for explaining the principle of the present invention.

FIG. 2 is a diagram of the principle configuration of the present invention.

FIG. 3 is a configuration diagram of the classification apparatus of the present invention.

FIG. 4 is an example of an HTML document in one embodiment of the present invention.

FIG. 5 is an example of a morphological analysis result in one embodiment of the present invention.

FIG. 6 is an example of extracted words and appearance counts in one embodiment of the present invention.

FIG. 7 is an example of a search result in one embodiment of the present invention.

FIG. 8 is an example of classification items and sorting results in one embodiment of the present invention.

[Explanation of symbols]

1: reference information acquisition unit, reference information acquisition means
2: word extraction unit, word extraction means
3: search unit, classification item association means
4: hierarchical knowledge system
5: classification destination determination unit, classification destination determination means

Claims

1. An information classification method for classifying information to be classified into an appropriate classification destination, comprising: acquiring reference information referred to by the classification target information; extracting words useful for classification from the classification target information and the reference information; associating the extracted words with classification items of a hierarchical knowledge system; and determining a classification destination from among the associated classification items and classifying the classification target information.

2. The information classification method according to claim 1, wherein acquiring the reference information comprises analyzing a document of the classification target information to acquire structure information, and accessing based on the structure information to acquire reference information including link information and related information.

3. The information classification method according to claim 1, wherein extracting the words useful for classification comprises morphologically analyzing the text information in the classification target information and the reference information, extracting nouns and adjectival verbs from among the parts of speech of the words segmented by the morphological analysis, sorting them in descending order of appearance frequency, and extracting the words with the highest appearance frequency.

4. The information classification method according to claim 1, wherein determining the classification destination comprises taking a sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated using the hierarchical knowledge system, and determining the classification item with the largest sum of products as the classification destination.

5. An information classification apparatus for classifying information to be classified into an appropriate classification destination, comprising: reference information acquisition means for acquiring reference information referred to by the classification target information; word extraction means for extracting words useful for classification from the classification target information and the reference information; classification item association means for associating the words extracted by the word extraction means with classification items of a hierarchical knowledge system; and classification destination determination means for determining a classification destination from among the associated classification items.

6. The information classification apparatus according to claim 5, wherein the reference information acquisition means includes means for analyzing a document of the classification target information to acquire structure information, and means for accessing based on the structure information to acquire reference information including link information and related information.

7. The information classification apparatus according to claim 5, wherein the word extraction means includes means for morphologically analyzing the text information in the classification target information and the reference information, and means for extracting nouns and adjectival verbs from among the parts of speech of the words segmented by the morphological analysis, sorting them in descending order of appearance frequency, and extracting the words with the highest appearance frequency.

8. The information classification apparatus according to claim 5, wherein the classification destination determination means includes means for taking a sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated using the hierarchical knowledge system, and determining the classification item with the largest sum of products as the classification destination.

9. A storage medium storing an information classification program for classifying information to be classified into an appropriate classification destination, the program comprising: a reference information acquisition process of acquiring reference information referred to by the classification target information; a word extraction process of extracting words useful for classification from the classification target information and the reference information; a classification item association process of associating the words extracted by the word extraction process with classification items of a hierarchical knowledge system; and a classification destination determination process of determining a classification destination from among the associated classification items.

10. The storage medium storing the information classification program according to claim 9, wherein the reference information acquisition process includes a process of analyzing a document of the classification target information to acquire structure information, and a process of accessing based on the structure information to acquire reference information including link information and related information.

11. The storage medium storing the information classification program according to claim 9, wherein the word extraction process includes a process of morphologically analyzing the text information in the classification target information and the reference information, and a process of extracting nouns and adjectival verbs from among the parts of speech of the words segmented by the morphological analysis, sorting them in descending order of appearance frequency, and extracting the words with the highest appearance frequency.

12. The storage medium storing the information classification program according to claim 9, wherein the classification destination determination process includes a process of taking a sum of products of the appearance frequencies of the extracted words and the frequencies of the classification items associated using the hierarchical knowledge system, and determining the classification item with the largest sum of products as the classification destination.