JP2001306594A

JP2001306594A - Information retrieval device and storage medium stored with information retrieval program

Info

Publication number: JP2001306594A
Application number: JP2000117405A
Authority: JP
Inventors: Shinichiro Tsudaka; 新一郎津高; Hidekazu Arita; 英一有田; Hiroyoshi Konaka; 裕喜小中
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2000-04-19
Filing date: 2000-04-19
Publication date: 2001-11-02

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval device that overcomes a limitation of conventional devices: because ordinary retrieval results are presented merely as a list of documents, a user cannot efficiently acquire knowledge starting from those results. SOLUTION: An information retrieval means 102 performs a basic search of a document database 101 based on the user's information request. A search result classification means 103 classifies the documents in the resulting list into document sets of mutually similar documents. A characteristic word extraction means 104 and a characteristic relation extraction means 105 extract characteristic words and characteristic relations from each document set of the search result. Based on the classified groups and the extracted words and relations, an output information editing means 106 generates the information for an output screen that the user can operate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to an information retrieval device that searches a document database in accordance with a user's information request, and to a storage medium storing an information retrieval program. By automatically classifying the search results and finding the characteristic words and characteristic relations in the retrieved documents, the device supports the user's interactive search behavior and provides output information that enables efficient knowledge acquisition.

[0002]

PRIOR ART

A conventional information retrieval device typically calculates, for each document retrieved from the document database in response to a user's search request, its relevance to the user's information request, and presents the user with a list of documents sorted by that relevance.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

In such conventional devices, however, the relevance score does not necessarily reflect the user's intent or intuition. Moreover, because the search result is merely a list of documents, it is insufficient to support efficient knowledge acquisition that starts from the search result.

[0004] Furthermore, as is conspicuous with recent Internet search engines, when the target is an enormous collection of documents, the document list returned by a search is often enormous as well, and it is practically impossible for the user to check every entry in the list.

[0005] The present invention has been made in view of the above problems. Its object is to provide an interactive information retrieval device, and a storage medium storing an information retrieval program, that enable efficient knowledge acquisition starting from the results of an information search.

[0006]

MEANS FOR SOLVING THE PROBLEMS

A first information retrieval device according to the present invention searches a previously constructed document database in accordance with an information request input by a user. It comprises: information retrieval means for performing a basic search of the document database based on the user's information request; search result classification means for classifying the documents in the list of search results obtained by the information retrieval means into document sets; characteristic word extraction means for extracting characteristic words from each of the documents; characteristic relation extraction means for extracting relations between characteristic words from each of the documents; and output information editing means for generating, based on the results obtained from the search result classification means, the characteristic word extraction means, and the characteristic relation extraction means, the information for an output screen that the user can operate.

[0007] A second information retrieval device according to the present invention searches a previously constructed document database in accordance with an information request input by a user. It comprises: information retrieval means for performing a basic search of the document database based on the user's information request; characteristic word extraction means for extracting characteristic words from each of the documents in the search result obtained by the information retrieval means; search result classification means for classifying the documents into document sets of mutually similar documents based on the characteristic words extracted by the characteristic word extraction means; characteristic relation extraction means for extracting relations between characteristic words from each of the documents; and output information editing means for generating, based on the results obtained from the search result classification means and the characteristic relation extraction means, the information for an output screen that the user can operate.

[0008] A third information retrieval device according to the present invention searches a previously constructed document database in accordance with an information request input by a user. It comprises: information retrieval means for performing a basic search of the document database based on the user's information request; characteristic word extraction means for extracting characteristic words from each of the documents in the search result obtained by the information retrieval means; characteristic relation extraction means for extracting relations between characteristic words from each of the documents; search result classification means for classifying the documents into document sets of mutually similar documents based on the characteristic words and the relations between characteristic words extracted by the characteristic word extraction means and the characteristic relation extraction means; and output information editing means for generating, based on the results obtained from the search result classification means, the information for an output screen that the user can operate.

[0009] In a fourth information retrieval device according to the present invention, in any of the first to third information retrieval devices, the output information editing means includes means for displaying, simultaneously with the output screen, screen information on which the user's next action can be input.

[0010] In a fifth information retrieval device according to the present invention, in any of the first to third information retrieval devices, the search result classification means includes means for classifying again a document set that the search result classification means has already classified.

[0011] In a sixth information retrieval device according to the present invention, in any of the first to third information retrieval devices, the information retrieval means includes means for executing the search again using, as the information request, at least one of the characteristic words and the relations between characteristic words extracted by the characteristic word extraction means and the characteristic relation extraction means.

[0012] A first storage medium according to the present invention stores an information retrieval program that searches a previously constructed document database in accordance with an information request input by a user. The program comprises: an information retrieval process for performing a basic search of the document database based on the user's information request; a search result classification process for classifying the documents in the list of search results obtained by the information retrieval process into document sets; a characteristic word extraction process for extracting characteristic words from each of the documents; a characteristic relation extraction process for extracting relations between characteristic words from each of the documents; and an output information editing process for generating, based on the results obtained from the search result classification process, the characteristic word extraction process, and the characteristic relation extraction process, the information for an output screen that the user can operate.

[0013] A second storage medium according to the present invention stores an information retrieval program that searches a previously constructed document database in accordance with an information request input by a user. The program comprises: an information retrieval process for performing a basic search of the document database based on the user's information request; a characteristic word extraction process for extracting characteristic words from each of the documents in the search result obtained by the information retrieval process; a search result classification process for classifying the documents into document sets of mutually similar documents based on the characteristic words extracted by the characteristic word extraction process; a characteristic relation extraction process for extracting relations between characteristic words from each of the documents; and an output information editing process for generating, based on the results obtained from the search result classification process and the characteristic relation extraction process, the information for an output screen that the user can operate.

[0014] A third storage medium according to the present invention stores an information retrieval program that searches a previously constructed document database in accordance with an information request input by a user. The program comprises: an information retrieval process for performing a basic search of the document database based on the user's information request; a characteristic word extraction process for extracting characteristic words from each of the documents in the search result obtained by the information retrieval process; a characteristic relation extraction process for extracting relations between characteristic words from each of the documents; a search result classification process for classifying the documents into document sets of mutually similar documents based on the characteristic words and the relations between characteristic words extracted by the characteristic word extraction process and the characteristic relation extraction process; and an output information editing process for generating, based on the results obtained from the search result classification process, the information for an output screen that the user can operate.

[0015] In a fourth storage medium according to the present invention, in any of the first to third storage media, the output information editing process includes a process for displaying, simultaneously with the output screen, screen information on which the user's next action can be input.

[0016] In a fifth storage medium according to the present invention, in any of the first to third storage media, the search result classification process includes a process for classifying again a document set that the search result classification process has already classified.

[0017] In a sixth storage medium according to the present invention, in any of the first to third storage media, the information retrieval process includes a process for executing the search again using, as the information request, at least one of the characteristic words and the relations between characteristic words extracted by the characteristic word extraction process and the characteristic relation extraction process.

[0018]

EMBODIMENTS OF THE INVENTION

Embodiment 1. FIG. 1 is a diagram of the principle configuration of the present invention. Reference numeral 101 denotes a previously constructed document database. 102 denotes information retrieval means for performing a basic search of the document database 101 based on the user's information request. 103 denotes search result classification means for classifying the documents in the list of search results obtained by the information retrieval means 102 into document sets. 104 denotes characteristic word extraction means for extracting a list of characteristic words from each document in the search result. 105 denotes characteristic relation extraction means for extracting a list of relations between characteristic words from each document in the search result. 106 denotes output information editing means for generating, based on the results of the search result classification means 103, the characteristic word extraction means 104, and the characteristic relation extraction means 105, the information for an output screen that the user can operate.

[0019] In the principle configuration of FIG. 1, the user first inputs the keywords of an information request to the information retrieval means 102, which performs a basic search of the document database 101 and returns a list of documents.

[0020] Next, the search result classification means 103 classifies the documents in the list of search results based on keywords recorded in the documents, such as the date and time the document was entered, the document's provider, and the provider's industry, and obtains a classified document list.

[0021] The list of search results is also transferred to the characteristic word extraction means 104 and the characteristic relation extraction means 105, which extract the characteristic words and the relations between characteristic words contained in each retrieved document. Here, the characteristic words and the relations between characteristic words are lists of the words, and of the relations between words, contained in a document. Each word is assigned a weight representing its importance, for example a real number. That is, the features of each document are expressed as a real-valued vector in which each element corresponds to a word. A characteristic relation is a data structure consisting of one or more words contained in the document and the links connecting them. A link between words indicates that the words at its two ends are closely related in that document. More specifically, the words may have a grammatical dependency relation in the document, their surface positions may lie within a specific threshold distance of each other, or their positions in the logical structure of the text may lie within a threshold (for example, they appear in the same sentence). Using these characteristic relations, the features of each document are expressed, for example, as a real-valued vector in which each element corresponds to a relation between words.
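
As an illustration only, the surface-distance criterion for links could be sketched as follows in Python. The function name `extract_relations`, the whitespace tokenization, the default threshold, and the use of a raw count as the relation weight are all assumptions made for this sketch; the specification does not prescribe how relations are detected or weighted.

```python
from collections import Counter
from itertools import combinations


def extract_relations(sentences, max_distance=3):
    """Count unordered word pairs that co-occur in one sentence
    with positions at most max_distance apart (assumed criterion)."""
    relations = Counter()
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenization, for illustration
        for (i, a), (j, b) in combinations(enumerate(words), 2):
            if a != b and j - i <= max_distance:
                relations[tuple(sorted((a, b)))] += 1
    return relations


document = ["search results are classified", "classified results support search"]
relations = extract_relations(document, max_distance=2)
print(relations[("classified", "results")])
```

Each key of the returned counter plays the role of one link, and the mapping as a whole can be read as the relation-indexed feature vector of the document. A grammatical dependency parser could replace the distance test without changing the shape of the result.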

[0022] Word weights have long been studied in the field of information retrieval; candidates include raw frequency, normalized frequency, and a value indicating how characteristic a word is (TF*IDF). The present invention does not prescribe which weight to use. The characteristic word extraction means 104 could be made more efficient, for example by caching document features in an internal database rather than computing the features of the specified documents each time, but the present invention does not prescribe such details. Characteristic relations can be weighted in the same way as characteristic words, but the present invention does not prescribe what to use as the relation weight.
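
For readers unfamiliar with TF*IDF, a minimal sketch follows. The variant used here, term frequency normalized by document length multiplied by log(N / df), is one common choice and is an assumption of this example, as are the function name `tf_idf` and the whitespace tokenization; the specification leaves the weighting open.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document,
    with weight = (count / document length) * log(N / document frequency)."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents each word appears.
    df = Counter(word for words in tokenized for word in set(words))
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        vectors.append(
            {w: (c / len(words)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return vectors


vectors = tf_idf([
    "patent search engine",
    "patent document cluster",
    "patent search cluster",
])
```

With this weighting, a word that occurs in every document (here "patent") receives weight zero, while a word confined to a single document is weighted most heavily, which matches the intent of a "characteristic" word.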

[0023] Next, the output information editing means 106 generates the information for an output screen that the user can operate, based on the results of the search result classification means 103, the characteristic word extraction means 104, and the characteristic relation extraction means 105. This output screen also serves as the screen on which the user inputs the next action (a new search, or reclassification of the search results).

[0024] On the output screen, the user can add a characteristic word or a characteristic relation as a keyword of the information request and search again through the process described above.

[0025] The retrieved documents can also be reclassified using the extracted characteristic words, the characteristic relations, or a combination of both. For reclassification, the document list obtained as the search result, or an already classified document set, is input to the search result classification means 103 and reclassified using characteristic words, characteristic relations, or a combination of both. The result is transferred to the output information editing means 106, which generates information for a user-operable output screen like the one described above.

[0026] According to Embodiment 1, extracting characteristic words and the characteristic relations among them from the basic search results yields high-quality search results (search information). The following (1), (2), and (3) can then be carried out interactively and efficiently, giving the user a high degree of support in acquiring knowledge.
(1) The search result is not a mere list; the retrieved documents are automatically classified into document sets that closely match the information request, which helps the user decide which documents to actually access.
(2) Furthermore, by selecting one or more of the groups generated above and instructing automatic classification again on the resulting document set, the user can progressively narrow down the search results.
(3) Using the list of extracted characteristic words and characteristic relations, the user can choose suitable words, relations, or combinations of them to run a narrowing search or a search on related topics (associative search).

[0027] Embodiment 2. FIG. 2 is a configuration diagram showing Embodiment 2 of the information retrieval device of the present invention. The configuration (system) shown in FIG. 2 comprises an input unit 201, an information retrieval service unit 202, an information retrieval execution unit 203, a document database 204, a document classification service unit 205, a search result classification execution unit 206, a characteristic word extraction unit 207, a characteristic relation extraction unit 208, an output information editing unit 209, and an output unit 210.

[0028] The input unit 201 accepts instructions to the system sent from a user terminal. An instruction from the user is either an information request for a search or an instruction to reclassify search results. The input unit 201 determines which it is: in the former case it forwards the information request to the information retrieval service unit 202, and in the latter case it forwards the reclassification instruction to the document classification service unit 205. The information request for a search is described first.
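
The routing performed by the input unit 201 can be pictured with a small sketch. The dictionary layout of an instruction (`kind`, `query`, `document_ids`) and the callables standing in for units 202 and 205 are hypothetical; the specification does not define a message format.

```python
def dispatch(instruction, search_service, classification_service):
    """Route a user instruction to the search or the classification service."""
    kind = instruction["kind"]
    if kind == "search":
        return search_service(instruction["query"])
    if kind == "reclassify":
        return classification_service(instruction["document_ids"])
    raise ValueError(f"unsupported instruction: {kind!r}")


# Stand-ins for units 202 and 205, for demonstration only.
result = dispatch(
    {"kind": "reclassify", "document_ids": ["d1", "d2"]},
    search_service=lambda query: ("search", query),
    classification_service=lambda ids: ("reclassify", ids),
)
```

Rejecting unknown instruction kinds explicitly is a design choice of the sketch; the text only states that exactly two kinds of instruction exist.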

[0029] The information retrieval service unit 202 first forwards the information request received from the input unit 201 to the information retrieval execution unit 203. The information retrieval execution unit 203 searches the previously constructed document database 204 and returns to the information retrieval service unit 202 a list of documents in the document database 204 sorted by relevance to the user's information request. A system corresponding to the information retrieval execution unit 203 can readily be realized with known techniques, so its details are not prescribed; the only conditions are that it accept logical combinations of words (AND, OR, NOT) as input and that it return a document list sorted by relevance.
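
A toy search backend that meets these two conditions might look like the sketch below. The nested-tuple query syntax, the function names, and the relevance score (total occurrences of the non-negated query terms) are assumptions chosen for brevity, not something the specification defines.

```python
def matches(query, words):
    """Evaluate a query such as ("AND", "a", ("NOT", "b")) against a set of words."""
    if isinstance(query, str):
        return query in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op!r}")


def positive_terms(query):
    """Collect the query terms that are not underneath a NOT."""
    if isinstance(query, str):
        return [query]
    op, *args = query
    if op == "NOT":
        return []
    return [term for a in args for term in positive_terms(a)]


def search(database, query):
    """Return the IDs of matching documents, highest assumed relevance first."""
    terms = positive_terms(query)
    hits = []
    for doc_id, text in database.items():
        words = text.lower().split()
        if matches(query, set(words)):
            score = sum(words.count(t) for t in terms)
            hits.append((score, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))  # ties broken by document ID
    return [doc_id for _, doc_id in hits]


database = {
    "d1": "search engine for patent search",
    "d2": "patent search and image search",
    "d3": "image database",
}
ranked = search(database, ("OR", "search", "image"))
```

The returned list of document IDs is the form in which results are handed on to classification, since the later stages operate on sets of document IDs.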

[0030] The search result classification execution unit 206 receives as input the documents forwarded from the information retrieval service unit 202 (in practice, a set of document IDs in the document database 204). It first calls the characteristic word extraction unit 207 to obtain the characteristic words of the specified documents in the document database 204. Here, the characteristic words are a list of words contained in a document, and each word is assigned a weight representing its importance, for example a real number. That is, the features of a document are expressed as a real-valued vector in which each element corresponds to a word.

[0031] Word weights have long been studied in the field of information retrieval; candidates include raw frequency, normalized frequency, and a value indicating how characteristic a word is (TF*IDF). The present invention does not prescribe which weight to use. The characteristic word extraction unit 207 could be made more efficient, for example by caching document features in an internal database rather than computing the features of the specified documents each time, but such details are not prescribed.
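
The caching idea can be illustrated with Python's standard `functools.lru_cache`, used here as an in-memory stand-in for the "internal database" mentioned in the text. The `DOCUMENTS` dictionary and the function `document_features` are hypothetical, and raw frequency is used as the weight only to keep the example short.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical stand-in for the document database, keyed by document ID.
DOCUMENTS = {
    "d1": "patent search engine search",
    "d2": "image database",
}


@lru_cache(maxsize=None)
def document_features(doc_id):
    """Compute raw word frequencies once per document and reuse them afterwards."""
    counts = Counter(DOCUMENTS[doc_id].lower().split())
    # Return an immutable value so the cached result cannot be modified by callers.
    return tuple(sorted(counts.items()))


document_features("d1")  # computed on the first call
document_features("d1")  # served from the cache on the second call
```

A persistent store would be needed if features must survive restarts or if documents change; cache invalidation is outside the scope of this sketch.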

[0032] The search result classification execution unit 206 also calls the characteristic relation extraction unit 208 to obtain the characteristic relations of the specified documents in the document database 204. Here, a characteristic relation is a data structure consisting of one or more words contained in a document and the links connecting them. A link between words indicates that the words at its two ends are closely related in that document. More specifically, the words may have a grammatical dependency relation in the document, their surface positions may lie within a specific threshold distance of each other, or their positions in the logical structure of the text may lie within a threshold (for example, they appear in the same sentence). Using these characteristic relations, the features of a document are expressed as a real-valued vector in which each element corresponds to a relation. Characteristic relations can be weighted in the same way as characteristic words, but the present invention does not prescribe what to use as the relation weight. As with the characteristic word extraction unit 207, the characteristic relation extraction unit 208 could be made more efficient, for example by caching in an internal database, but such details are not prescribed.

[0033] Next, the search result classification execution unit 206 combines the document feature vectors obtained for each document in the input document group into a matrix such as the one shown in FIG. 3. Each row of the matrix corresponds to a document, and each column to a word or a relation. Such a matrix is hereinafter called a feature matrix. When the feature matrix is constructed, two constants (α, β) may be prepared to balance the weights of the characteristic words and the characteristic relations: weights representing characteristic words are multiplied by α, and weights representing characteristic relations by β. With (α, β) = (1, 0) the classification considers only characteristic words, and with (α, β) = (0, 1) it considers only characteristic relations.
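
The construction of the feature matrix with the constants (α, β) can be sketched as follows. The input format (one dictionary of word weights and one dictionary of relation weights per document) and the function name `build_feature_matrix` are assumptions of this example.

```python
def build_feature_matrix(word_vectors, relation_vectors, alpha=1.0, beta=1.0):
    """Build a matrix whose rows are documents and whose columns are
    all words followed by all relations, scaled by alpha and beta."""
    words = sorted({w for vec in word_vectors for w in vec})
    relations = sorted({r for vec in relation_vectors for r in vec})
    matrix = []
    for word_vec, relation_vec in zip(word_vectors, relation_vectors):
        row = [alpha * word_vec.get(w, 0.0) for w in words]
        row += [beta * relation_vec.get(r, 0.0) for r in relations]
        matrix.append(row)
    return words + relations, matrix


word_vectors = [{"search": 0.5, "patent": 0.2}, {"search": 0.1}]
relation_vectors = [{("patent", "search"): 1.0}, {}]

# (alpha, beta) = (1, 0): only characteristic words influence the classification.
columns, matrix = build_feature_matrix(word_vectors, relation_vectors, alpha=1.0, beta=0.0)
```

Intermediate settings such as (0.5, 0.5) blend the two kinds of feature; how to choose them is not discussed in the text.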

【００３４】Next, the search result classification execution unit 206 classifies mutually similar documents into document sets based on the feature matrix. Clustering is a known technique for automatically classifying documents by grouping those whose rows in the feature matrix are close to one another, and several algorithms have been proposed (reference example: E. Rasmussen: Clustering Algorithms, in W. B. Frakes, R. Baeza-Yates, editors, Information Retrieval, Prentice Hall, 1992). The present invention does not specify the clustering algorithm employed by the search result classification execution unit 206, but as a result of this processing, documents whose features are similar to one another can be grouped. Since the feature matrix is composed of characteristic words and characteristic relations, documents that are similar in content are grouped together as a result.
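Because the clustering algorithm is left open, any concrete example is only one possible choice. The sketch below uses a simple single-pass "leader" clustering with cosine similarity, chosen here purely for brevity; the function names, the threshold value, and the sample matrix are assumptions and not part of the patent.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two rows; 0.0 if either row is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_cluster(matrix: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Assign each row to the first cluster whose leader (first member) is similar enough."""
    clusters: List[List[int]] = []  # each cluster is a list of row indices
    for i, row in enumerate(matrix):
        for members in clusters:
            if cosine(row, matrix[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar leader found: start a new cluster
    return clusters


# Illustrative (made-up) feature matrix with four documents and three features.
sample = [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 4.0], [0.0, 1.0, 5.0]]
print(leader_cluster(sample))
```

A production system would more likely use an established method such as k-means or hierarchical agglomerative clustering; the leader scheme is order-dependent and is shown only because it fits in a few lines.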

【００３５】Next, the search result classification execution unit 206 also obtains, according to the classification result, a list of the characteristic words or relations in each group, as shown in FIG. 4. One method relies on the fact that many clustering algorithms perform classification based on a virtual set of features for each group, called the cluster center: the features with large weights are taken from the cluster center of each group. As another method, characteristic words and characteristic relations may be extracted again from the documents classified into each cluster, and the features with large weights taken from them.

【００３６】The features may be selected either by explicitly specifying the size of the list (the number of words or relations) and taking features in descending order of weight, or by building the list only from words or relations whose weight is at or above a certain value. As in the construction of the feature matrix, two constants may be prepared and multiplied by the weights representing characteristic words and the weights representing characteristic relations, respectively, before the list is built. These constants may be determined independently of the constants used when generating the feature matrix.
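The two selection strategies just described (a fixed list size, or a minimum weight) can be sketched as below. The column-wise mean is used here as a simple stand-in for a cluster center; `centroid`, `select_features`, and the sample data are hypothetical names and values introduced for illustration only.

```python
from typing import List, Optional, Tuple


def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of the rows in one group (a simple stand-in for a cluster center)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_features(
    columns: List[str],
    weights: List[float],
    top_k: Optional[int] = None,
    min_weight: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Rank features by weight, then keep those above min_weight and/or the first top_k."""
    ranked = sorted(zip(columns, weights), key=lambda cw: cw[1], reverse=True)
    if min_weight is not None:
        ranked = [cw for cw in ranked if cw[1] >= min_weight]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Illustrative (made-up) group of two documents over three features.
feature_names = ["fraud", "tampering", "credit number -> illegal use"]
group_rows = [[3.0, 2.0, 2.0], [1.0, 0.0, 4.0]]
center = centroid(group_rows)
print(select_features(feature_names, center, top_k=2))
print(select_features(feature_names, center, min_weight=2.0))
```

The separate word and relation constants mentioned in the text could be applied by scaling `weights` before calling `select_features`; that step is omitted here to keep the sketch short.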

【００３７】The output information editing unit 209 receives the data (1) and (2) below from the search result classification execution unit 206 and generates output screen information for supporting interactive information search behavior by the user. This output screen also serves as the screen on which the user inputs the next re-classification or re-search action.
(1) A list of the documents belonging to each group
(2) A list of the words and relations characterizing each group, with their respective weights

【００３８】The output unit 210 transfers the screen information received from the output information editing unit 209 to the user terminal.

【００３９】On the output screen, the user can add a characteristic word or a characteristic relation as a keyword of the information request and perform a search again through the process described above.

【００４０】To re-classify a retrieved document set, the user gives a re-classification instruction to the input unit 201, and the input unit 201 transfers the instruction to the document classification service unit 205. Based on the instruction, the document classification service unit 205 requests the document set to be re-classified from the output information editing unit 209 and transfers this document set to the search result classification execution unit 206. It then transfers the processing result to the output information editing unit 209, which generates information for an output screen, operable by the user, similar to the one described above.
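Re-classification can be pictured as running the same classification on only the rows of the selected document set and mapping the result back to the original document indices. The sketch below illustrates that idea under stated assumptions: `reclassify` and the toy `split_by_first_column` function are hypothetical names, and any clustering function could be passed in, since no particular algorithm is prescribed.

```python
from typing import Callable, List

Matrix = List[List[float]]
Clusters = List[List[int]]


def reclassify(matrix: Matrix, members: List[int],
               cluster_fn: Callable[[Matrix], Clusters]) -> Clusters:
    """Re-cluster only the documents in `members`, returning original row indices."""
    sub_matrix = [matrix[i] for i in members]
    local_clusters = cluster_fn(sub_matrix)
    # Map positions within the sub-matrix back to indices in the full matrix.
    return [[members[j] for j in group] for group in local_clusters]


def split_by_first_column(sub: Matrix) -> Clusters:
    """Toy stand-in for a clustering algorithm: split on whether column 0 is positive."""
    groups = [[i for i, row in enumerate(sub) if row[0] > 0],
              [i for i, row in enumerate(sub) if row[0] <= 0]]
    return [g for g in groups if g]


# Illustrative (made-up) matrix; documents 1, 2 and 3 form the set to re-classify.
full_matrix = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 4.0]]
print(reclassify(full_matrix, [1, 2, 3], split_by_first_column))
```

Keeping the original indices lets the output screen continue to refer to documents by their position in the first search result.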

【００４１】In the second embodiment, characteristic words and characteristic relations are extracted before the document group in the retrieved document list is automatically classified, and the retrieved document group is classified into sets of similar documents using both. This yields extremely high-quality search results (search information) and helps the user decide, interactively and efficiently, which documents to actually access. The search results can also be narrowed down by instructing automatic classification again. In addition, the list of characteristic words and characteristic relations extracted in the course of automatic classification enables narrowing searches as well as searches on related topics (associative search).

【００４２】Alternatively, characteristic words and characteristic relations may be extracted before the automatic classification of the document group in the retrieved document list, the retrieved document group classified into sets of similar documents using only the extracted characteristic words, and the classified document sets edited together with the separately extracted characteristic relations for output. This shortens the time until output.

【００４３】[0043]

[Examples]

The present invention will be described below through a specific embodiment with reference to the drawings. The present invention is not limited to this embodiment, and various modifications and applications are possible within the scope of the claims.

【００４４】In this embodiment, the information retrieval apparatus of the present invention is applied to a search engine on the WWW (World Wide Web).

【００４５】FIG. 5 shows an example of the search request input screen in one embodiment of the present invention; it is the initial screen displayed on the user terminal. At the top of the screen is a field for entering the words that express the information request, and the figure shows that the user has entered an information request consisting of two words, "Internet" and "e-commerce". The next row holds the basic settings for the search method; here it specifies that multiple search terms are combined with an AND condition and that the search is case-sensitive for alphabetic characters. The row after that holds the settings for the list display of search results; here it specifies that results are listed ten at a time, starting with the documents that contain the most search terms. The following row holds the settings for cluster processing of the search results; here it specifies that cluster processing is performed with the number of clusters determined automatically. The description below follows this input example and refers to FIG. 2.

【００４６】The input unit 201 forwards the user's request to either the information search service unit 202 or the document classification service unit 205, according to the type of request sent from the user terminal through an input screen such as the one above. In this embodiment the request is an information request for information retrieval, so it is forwarded to the information search service unit 202. The information search service unit 202 generates a search expression from the forwarded information request and search conditions. From the information request and search conditions of this embodiment, the search expression (Internet) AND (e-commerce) is generated and transferred to the information search execution unit 203.
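The step from entered terms to a search expression can be sketched as follows. The function name `build_search_expression` and its `operator` parameter are assumptions made for this illustration; the patent only states that such an expression is generated from the information request and the search conditions.

```python
from typing import List


def build_search_expression(terms: List[str], operator: str = "AND") -> str:
    """Join the non-empty search terms into a parenthesized Boolean expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    cleaned = [t.strip() for t in terms if t.strip()]
    return f" {operator} ".join(f"({t})" for t in cleaned)


print(build_search_expression(["Internet", "e-commerce"]))
```

Wrapping each term in parentheses keeps multi-word terms unambiguous when the expression is passed on to a text search engine.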

【００４７】FIG. 6 shows an example of the document list of the information search result in this embodiment; it is the result of the information search execution unit 203 searching the document database 204 with the above search expression. An ordinary text search system of the kind used for the information search execution unit 203 can also return information other than that shown here, but the figure shows only the minimum information needed for the following description. That is, for each document in the search result, the document ID within the document database 204 (a three-digit number in this example) and the document title (introduced to make the description easier to follow) are assumed to be returned.

【００４８】The information search service unit 202 transfers the search results shown in FIG. 6 to the search result classification execution unit 206. The search result classification execution unit 206 first calls the characteristic word extraction unit 207 to obtain the characteristic words of the transferred documents. It also calls the characteristic relation extraction unit 208 to obtain the characteristic relations of the transferred documents. FIG. 7 is an example of a resulting document feature vector, obtained for the first element of the list (ID = 001, title = "The dangers of e-commerce"). To keep the description simple, only two characteristic words of this document, (fraud): 3 and (tampering): 2, and one characteristic relation, (credit number → illegal use): 2, are shown. The real numbers represent strength. The characteristic relation given here is based on a dependency relation, and → denotes the relation from the dependent word to the head word.
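One way to hold the feature vector of the document with ID 001 in memory is sketched below. The `DocumentFeatures` class, its field names, and the extra "privacy" column are hypothetical and introduced only for illustration; the weights are the ones quoted in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DocumentFeatures:
    doc_id: str
    title: str
    words: Dict[str, float] = field(default_factory=dict)
    # Key is (dependent word, head word), mirroring the "dependent → head" notation.
    relations: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def as_vector(self, columns):
        """Return weights in the given column order; absent features count as 0."""
        return [
            self.relations.get(c, 0.0) if isinstance(c, tuple) else self.words.get(c, 0.0)
            for c in columns
        ]


doc_001 = DocumentFeatures(
    doc_id="001",
    title="The dangers of e-commerce",
    words={"fraud": 3.0, "tampering": 2.0},
    relations={("credit number", "illegal use"): 2.0},
)
# "privacy" is a made-up column showing how a feature absent from this document maps to 0.
feature_columns = ["fraud", "tampering", "privacy", ("credit number", "illegal use")]
print(doc_001.as_vector(feature_columns))
```

Storing relations as ordered pairs preserves the direction of the dependency, so (credit number, illegal use) and (illegal use, credit number) would be distinct features.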

【００４９】Next, the search result classification execution unit 206 combines the document feature vectors obtained for each document in the input document set into a matrix like the one shown in FIG. 3. FIG. 8 is an example of the feature matrix in this embodiment and illustrates the feature matrix for the search result list of FIG. 6. As in FIG. 7, to keep the description simple, FIG. 8 shows only five main characteristic words and two characteristic relations for the ten documents obtained as the search result; in practice, the number of characteristic words and characteristic relations is not limited to these.

【００５０】Next, the search result classification execution unit 206 executes a clustering algorithm on the feature matrix of FIG. 8. As noted above, several clustering algorithms have been proposed, so the search result classification execution unit 206 in this embodiment is assumed to implement a suitable one. FIG. 9 illustrates the classification result matrix obtained by executing the clustering algorithm on the feature matrix of FIG. 8. As shown in FIG. 9, in this example the ten documents are automatically classified into two groups, one of six documents and one of four.

【００５１】FIG. 10 illustrates the list of characteristic words and characteristic relations obtained, together with the classification result matrix, as the processing result of the search result classification execution unit 206. FIG. 10 shows the characteristic words and characteristic relations of the first document group together with their weights.

【００５２】The processing result of the search result classification execution unit 206 described above is transferred to the output information editing unit 209. Based on the transferred data, the output information editing unit 209 generates output screen information for supporting the user's interactive information search behavior. This output screen also serves as the screen on which the user inputs the next re-search or re-classification action. The output unit 210 transfers the output screen information received from the output information editing unit 209 to the user's terminal.

【００５３】FIGS. 11 and 12 show examples of the output screen information in this embodiment, that is, specific examples of the output screen information transferred to the user's terminal by the output unit 210.

【００５４】FIG. 11 shows the input form for re-search generated as a result of the above search, together with the list of retrieved documents. The input form for re-search is at the top, and the titles of the retrieved documents are listed below it. Clicking a title displays the content of that document. In the input form of FIG. 11, the selection menus for search method, result display, cluster processing, and so on are the same as those shown in FIG. 5, and the search term field is pre-filled with the previously entered terms, so the user can easily edit these terms or add new ones and search again. The main characteristic words and characteristic relations extracted during the automatic classification processing are also displayed. Clicking the button attached to one of these enters it into the search term field, making it easy to search again.

【００５５】FIG. 12 shows information on the document groups (labelled clusters on the screen) generated as a result of the automatic classification. The figure shows that cluster 1, consisting of four documents, and cluster 2, consisting of six documents, were generated. Displayed after each cluster number are the characteristic words and characteristic relations of that cluster and their strengths in the cluster. Listed after these are the titles of the articles classified into the cluster. Clicking a title displays the content of that document. By selecting the button at the head of a cluster and clicking the re-clustering button at the bottom, the documents in that cluster can be re-clustered to obtain more detailed information.

【００５６】[0056]

[Effects of the Invention]

As described above, the present invention makes it possible to automatically classify information search results, to structure search results by automatically re-classifying a subset of them, and to support the next stage of searching by combining the characteristic words and relations extracted in the course of automatic classification. Together, these support the user's knowledge acquisition based on information retrieval.

[Brief description of the drawings]

FIG. 1 is a configuration diagram illustrating Embodiment 1 of the information retrieval apparatus according to the present invention.

FIG. 2 is a configuration diagram illustrating Embodiment 2 of the information retrieval apparatus according to the present invention.

FIG. 3 is a diagram showing an example of the feature matrix of a document list in Embodiment 2 of the present invention.

FIG. 4 is a diagram showing an example of a list of the characteristic words or characteristic relations of a document group in Embodiment 2 of the present invention.

FIG. 5 is a diagram showing an example of the search request input screen in one embodiment of the present invention.

FIG. 6 is a diagram showing an example of the document list of an information search result in one embodiment of the present invention.

FIG. 7 is a diagram showing an example of a document feature vector in one embodiment of the present invention.

FIG. 8 is a diagram showing an example of the feature matrix of a document list in one embodiment of the present invention.

FIG. 9 is a diagram showing an example of the classification result matrix in one embodiment of the present invention.

FIG. 10 is a diagram showing an example of characteristic words and characteristic relations in one embodiment of the present invention.

FIG. 11 is a diagram showing an example of the output screen in one embodiment of the present invention.

FIG. 12 is a diagram showing another example of the output screen in one embodiment of the present invention.

[Explanation of symbols]

101, 204: document database; 102: information search means; 103: search result classification means; 104: characteristic word extraction means; 105: characteristic relation extraction means; 106: output information editing means; 201: input unit; 202: information search service unit; 203: information search execution unit; 205: document classification service unit; 206: search result classification execution unit; 207: characteristic word extraction unit; 208: characteristic relation extraction unit; 209: output information editing unit; 210: output unit.

Continuation of the front page: (72) Inventor Hiroki Konaka, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-term (reference): 5B075 ND03 NK31 NR12 PQ02

Claims

[Claims]

1. An information retrieval apparatus for searching a previously constructed document database according to an information request input by a user, comprising: information search means for performing a basic information search on the document database based on the information request of the user; search result classification means for classifying the document group in the list of search results obtained by the information search means into document sets; characteristic word extraction means for extracting characteristic words from each of the document groups; characteristic relation extraction means for extracting relations between characteristic words from each of the document groups; and output information editing means for generating information for an output screen operable by the user based on the results obtained from the search result classification means, the characteristic word extraction means, and the characteristic relation extraction means.

2. An information retrieval apparatus for searching a previously constructed document database according to an information request input by a user, comprising: information search means for performing a basic information search on the document database based on the information request of the user; characteristic word extraction means for extracting characteristic words from each document group of the search result obtained by the information search means; search result classification means for classifying the document group into sets of mutually similar documents based on the characteristic words extracted by the characteristic word extraction means; characteristic relation extraction means for extracting relations between characteristic words from each of the document groups; and output information editing means for generating information for an output screen operable by the user based on the results obtained from the search result classification means and the characteristic relation extraction means.

3. An information retrieval apparatus for searching a previously constructed document database according to an information request input by a user, comprising: information search means for performing a basic information search on the document database based on the information request of the user; characteristic word extraction means for extracting characteristic words from each document group of the search result obtained by the information search means; characteristic relation extraction means for extracting relations between characteristic words from each of the document groups; search result classification means for classifying the document group into sets of mutually similar documents based on the characteristic words and the relations between characteristic words extracted by the characteristic word extraction means and the characteristic relation extraction means; and output information editing means for generating information for an output screen operable by the user based on the results obtained from the search result classification means.

4. The information retrieval apparatus according to claim 1, wherein the output information editing means includes means for displaying, together with the output screen, screen information through which the user's next action can be input.

5. The information retrieval apparatus according to claim 1, wherein the search result classification means includes means for re-classifying a document set classified by the search result classification means.

6. The information retrieval apparatus according to claim 1, wherein the information search means includes means for executing a search again using, as the information request, at least one of the characteristic words and the relations between characteristic words extracted by the characteristic word extraction means and the characteristic relation extraction means.

7. A storage medium storing an information retrieval program for searching a previously constructed document database according to an information request input by a user, the information retrieval program comprising: an information search process for performing a basic information search on the document database based on the information request of the user; a search result classification process for classifying the document group in the list of search results obtained by the information search process into document sets; a characteristic word extraction process for extracting characteristic words from each of the document groups; a characteristic relation extraction process for extracting relations between characteristic words from each of the document groups; and an output information editing process for generating information for an output screen operable by the user based on the results obtained from the search result classification process, the characteristic word extraction process, and the characteristic relation extraction process.

8. A storage medium storing an information retrieval program for searching a previously constructed document database according to an information request input by a user, the information retrieval program comprising: an information search process for performing a basic information search; a characteristic word extraction process for extracting characteristic words from each document group of the search result obtained by the information search process; a search result classification process for classifying the document group into sets of mutually similar documents based on the characteristic words extracted by the characteristic word extraction process; a characteristic relation extraction process for extracting relations between characteristic words from each of the document groups; and an output information editing process for generating information for an output screen operable by the user based on the results obtained from the search result classification process and the characteristic relation extraction process.

9. A storage medium storing an information retrieval program for searching a previously constructed document database according to an information request input by a user, the information retrieval program comprising: an information search process for performing a basic information search; a characteristic word extraction process for extracting characteristic words from each document group of the search result obtained by the information search process; a characteristic relation extraction process for extracting relations between characteristic words from each of the document groups; a search result classification process for classifying the document group into sets of mutually similar documents based on the characteristic words and the relations between characteristic words extracted by the characteristic word extraction process and the characteristic relation extraction process; and an output information editing process for generating information for an output screen operable by the user based on the results obtained from the search result classification process.

10. The storage medium according to claim 7, wherein the output information editing process includes a process of displaying, together with the output screen, screen information through which the user's next action can be input.

11. The storage medium according to claim 7, wherein the search result classification process includes a process of re-classifying a document set classified by the search result classification process.

12. The storage medium according to any one of claims 7 to 9, wherein the information search process includes a process of executing a search again using, as the information request, at least one of the characteristic words and the relations between characteristic words extracted by the characteristic word extraction process and the characteristic relation extraction process.